DIMACS Center, CoRE Building, Rutgers University

**Organizers:****Tanya Berger-Wolf**, University of Illinois at Chicago, tanyabw@uic.edu**Mark Goldberg**, RPI, goldberg@cs.rpi.edu**Malik Magdon-Ismail**, RPI, magdon@cs.rpi.edu**Fred Roberts**, DIMACS, froberts@dimacs.rutgers.edu**William "Al" Wallace**, RPI, wallaw@rpi.edu

Invited Speakers:

Title: Dynamic Network Analysis with ORA: Key Challenges

Dynamic network analysis as a field represents the merger of link analysis, social networks, and multi-agent systems. But merging these three approaches pose several challenges that need to be met before the promise of scalable, robust, dynamic assessment of networks is possible. A series of challenges will be discussed and the current approach to meeting each challenge will be described. Challenges include: network extraction, missing data, certainty estimates, geo-spatial networks, change detection.

Title: Optimization Problems in Social Networks

A social network - the graph of relationships and interactions within a group of individuals - plays a fundamental role as a medium for the spread of information, ideas, influence, or diseases among its members. An idea or innovation will appear, and it can either die out quickly or make significant inroads into the population. Similarly, an infectious disease may either affect a large share of the population, or be confined to a small fraction.

The collective behavior of individuals and the spread of diseases in a social network have a long history of study in sociology and epidemiology. In this talk, we will investigate graph-theoretic optimization problems relating to the spread of information or diseases. Specifically, we will focus on two types of questions: influence maximization, wherein we seek to identify influential individuals to start a cascade of an innovation to maximize the expected number of eventual adopters; and infection minimization, wherein we seek to remove nodes so as to keep a given infected component small.

We will present constant factor and bicriteria algorithms for versions of these problems, and also touch on many open problems and issues regarding competition among multiple innovators. (This talk represents past joint work with Jon Kleinberg, Eva Tardos, Elliot Anshelevich, Ara Hayrapetyan, Martin Pal, and Zoya Svitkina, as well as ongoing work with Shishir Bharathi and Mahyar Salek.)

Title: Structure and Evolution of Online Social Networks

Online social networks have become major and driving phenomena on the web. In this talk we focus on the properties of two large online social networks, namely, the LiveJournal network of friends and the Flickr network of contacts. In the context of LiveJournal, we formulate a simple and general model of social networks that can explain the success of Milgram's famous experiment that gave rise to `six-degrees of separation'. In the context of Flickr, we focus on the temporal evolution of social networks from a graph-structure point of view.

Title: Learning What Makes a Society Tick: From Communication Data to Social Mechanism

The social evolution of a community can be captured by the dynamics of its social groups. We present recent results on a machine learning approach to discovering the agent dynamics that drives the evolution of social groups. The task is to infer the "laws" governing a society's agent based social dynamics by observing only its communication data. We break this task into two steps. The first is to infer the group dynamics from communication dynamics (macro-laws). The second is to infer agent "micro-laws" from the group dynamics.

We set up the problem by introducing a parameterized probabilistic model for the agent dynamics. The model identification problem is then formulated as a mixed optimization problem. To solve the optimization problem efficiently, we develop heuristic expectation-maximization style algorithms. We present the results of extensive experiments on simulated data as well as some results on real communities of newsgroups, blogs and emails.

Title: Collective Classification in Network Data

Collective classification is the process of collectively inferring labels of nodes and/links in a network. This process is different from the classic supervised learning setting, where classification is typically done on each object independently (i.e., without taking into account any underlying relationships between objects). In this talk, I will provide an overview of existing approaches to collective classification; discuss when they perform well and more importantly when they do not; show the usefulness of a network's global structural characteristics in classifying network data (in particular, when missing class labels abound); present pitfalls in experimental methodology associated with network data; and lastly outline future work on classification in dynamic heterogeneous networks.

This is joint work with Brian Gallagher (LLNL), Lise Getoor (UMD), and her students: Prithviraj Sen, Galileo Namata, and Mustafa Bilgic.

Title: Recent Research on Statistical Models for Social Networks, with a Focus on Imputation

This talk highlights the wide range of statistical analyses that are part of network science. Of particular importance are the exponential family of random graph distributions, known as p*, and recent work on robustness and resistance of network data when actors and/or relational ties are missing or removed. This is joint work with Garry Robins and Douglas Steinley.

Contributed Talks

Title: Mixed Membership Stochastic Blockmodels

Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.

Title: High-Performance Combinatorial Techniques for Analyzing Massive Dynamic Interaction Networks

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, the world-wide web, communication networks, and scientific computing can be intuitively modeled as graphs. In this paper, we discuss high performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data in the order of billions of entities. For tractable analysis of massive temporal data sets, we need holistic techniques that supplement existing static graph algorithms with relevant ideas from dynamic graph algorithms, social network analysis, and parallel algorithms for combinatorial problems. For instance, in order to design scalable parallel algorithms, it is crucial to estimate and exploit network characteristics such as the degree distribution and the graph diameter. We present a computational framework for the topological analysis of dynamic interaction data: we experiment with several graph representations, identify key analysis kernels to be optimized, and discuss parallel algorithms for large-scale graph analysis. In recent work, we have designed efficient parallel techniques for graph traversal, connectivity and centrality problems that process static graphs with billions of vertices and edges. We intend to extend these algorithms for studying temporal data, and our current research focus is on a prototype open-source toolkit for large-scale dynamic network analysis.

Title: Effect of Public Policies and Individual Behavior on the Co-evolution of Social Networks and Infectious Disease Dynamics

Human behavior, social networks, and the civil infrastructures are closely intertwined. Understanding their co-evolution is critical for designing public policies and decision support for disaster planning. For example, human behaviors and day to day activities of individuals create dense social interactions that are characteristic of modern urban societies. These dense social networks provide a perfect fabric for fast, uncontrolled disease propagation. Conversely, people's behavior in response to public policies and their perception of how the crisis is unfolding as a result of disease outbreak can dramatically alter the normally stable social interactions. Effective planning and response strategies must take these complicated interactions into account. In this paper, we describe a computer simulation based approach to study these issues using public health and computational epidemiology as an illustrative example.

Title: Persistence and periodicity in a dynamic proximity network

The topology of social networks can be understood as being inherently dynamic, with edges having a distinct position in time. Most characterizations of dynamic networks discretize time by converting temporal information into a sequence of network "snapshots" for further analysis. Here we study a highly resolved data set of a dynamic proximity network of 66 individuals. We show that the topology of this network evolves over a very broad distribution of time scales, that its behavior is characterized by strong periodicities driven by external calendar cycles, and that the conversion of inherently continuous-time data into a sequence of snapshots can produce highly biased estimates of network structure. We suggest that dynamic social networks exhibit a natural time scale and that the best conversion of such dynamic data to a discrete sequence of networks is done at this natural rate.

Title: Betweenness Centrality Measure in Dynamic Networks

In this paper we propose three methods of measuring betweenness of individuals in networks which are best modeled as graphs with explicit time ordering on their edges. The betweenness centrality index is one of the basic measure in the analysis of social networks, but most of the work done for measuring the betweenness index of individuals is based on the aggregate representation of the network. Many network problems are based on fundamental relationship involving time. We incorporate the time factor in the aggregate graph representation of social networks to create dynamic networks. We define and measure the betweenness in this dynamic framework. We compare the three betweenness with the standard betweenness measure for the same network. We show that by incorporating the exact times of interactions among individuals in a network, we can better study the betweenness of individuals in the underlying network.

Title: Empirical Analysis of an Evolving Social Network

Social networks evolve over time, driven by the shared activities and affiliations of their members, by similarity of individuals' attributes, and by the closure of short network cycles. We analyzed a dynamic social network comprising 43,553 students, faculty, and staff at a large university, in which interactions between individuals are inferred from time-stamped e-mail headers recorded over one academic year and are matched with affiliations and attributes. We found that network evolution is dominated by a combination of effects arising from network topology itself and the organizational structure in which the network is embedded. In the absence of global perturbations, average network properties appear to approach an equilibrium state, whereas individual properties are unstable.

Title: Cuts, Hubs and Webs in Peer-to-peer Concurrent Flow Networks

We model peer-to-peer networks by a weighted graphs G where every vertex (peer-to-peer) pair has a demand D(i,j) and every edge has a capacity c(e). A concurrent flow of throughput z is a flow function on the paths in G such that the total flow on all paths with end vertices (i,j) is zD(i,j) and the total flow on all paths containing edge e is at most c(e). A maximum concurrent flow is a flow function maximizing z, which can be solved in polynomial time as a linear program. The solution determines a set of critical edges that must be saturated by every maximum concurrent flow function. The critical edges identify a sparsest cut or multipartite cut (web) partitioning the vertices into a canonically determined number of parts.

We describe a flow circulation mechanism for dynamically updating a max concurrent flow function given small changes in the demands/capacities. We describe a hierarchal max concurrent flow function which is employed to characterize flow through and terminal flow at a vertex and provide a robust measure of hub vertices. These cuts, webs and hubs are all canonically determined by the demand and capacity functions and provide a strong foundation to peer-to-peer network communication.

Title: Embedding Resources in Social Network Analysis

Social Network Analysis has studied the structure of relationships among agents in order to gain insight into the properties of groups as well as the social positions of individuals. Their methodology focuses on the relational properties of individuals and groups and ignores the non-relational properties of agents. But it is clear that the explanation of many social phenomena can be understood only by factoring in non-relational properties into the relational matrix. For instance, the vulnerability of members of a community to a certain infection depends both on the pattern of contacts among the community members as well as the differing biological capacities of individuals to resist that type of infection; or the vulnerability of members of a community to an economic hardship depends both on the pattern of relationships among the members as well the pattern of distribution of resources among members. In this paper we introduce techniques for studying the resources available to individual members of a community as a result of the network of the relationships in the community as well as the pattern of distribution of resources possessed by each community member.

Title: Capturing Causality in Communications Graphs

We analyze dynamic graphs formed from email data to identify high impact messages and look for evidence of implied causality for real life events. We introduce a new method for identifying anomalies or events from dynamic communication graphs. The method is robust to false alarms. Experiments show it is effective in identifying the formation of communities within communications networks.

Title: Epidemic thresholds for infections on networks

Over the last ten years, the field of mathematical epidemiology has piqued the interest of complex systems researchers, which has resulted in a tremendous volume of work exploring the consequences of population structure on disease propagation. Much of this research focuses on computing thresholds for whether a disease will become an epidemic, and in practice, several different thresholds are often used interchangeably. Here, we'll summarize some recent work that attempts to clarify the relationships among different threshold criteria, discuss conditions under which topology and infection characteristics can be decoupled in the computation of a threshold, and connect current and classical threshold theorems to results in spectral graph theory.

Previous: Program

Workshop Index

DIMACS Homepage

Contacting the Center

Document last modified on August 8, 2007.