Estimating time-varying networks

Amr Ahmed

doi:10.1214/09-AOAS308

2008

https://doi.org/10.1214/09-AOAS308

32 pages

Abstract

Stochastic networks are a plausible representation of the relational information among entities in dynamic systems such as living cells or social communities. While there is a rich literature on estimating a static or temporally invariant network from observation data, little has been done toward estimating time-varying networks from time series of entity attributes. In this paper we present two new machine learning methods for estimating time-varying networks, which both build on a temporally smoothed ℓ1-regularized logistic regression formalism that can be cast as a standard convex-optimization problem and solved efficiently using generic solvers scalable to large networks. We report promising results on recovering simulated time-varying networks. For real data sets, we reverse engineer the latent sequence of temporally rewiring political networks between Senators from the US Senate voting records and the latent evolving regulatory networks underlying 588 genes across the life cycle of Drosophila melanogaster from the microarray time course.

This reprint differs from the original in pagination and typographic detail. (Running head: Kolar, Song, Ahmed and Xing.)

Each of these characteristics adds a degree of complexity to the interpretation and analysis of networks. In this paper we present a new methodology and analysis that address a particular aspect of dynamic network analysis: how can one reverse engineer networks that are latent, and topologically evolving over time, from time series of nodal attributes? While there is a rich and growing literature on modeling time-invariant networks, much less has been done toward modeling dynamic networks that are rewiring over time. We refer to these time- or condition-specific circuitries as time-varying networks, which are ubiquitous in various complex systems. Consider the following two real-world problems:
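
As a side illustration of the formalism named in the abstract, the following is a minimal sketch of one plausible reading of "temporally smoothed ℓ1-regularized logistic regression". It assumes binary node attributes coded as -1/+1, a Gaussian kernel that reweights samples by their distance in time from the target time point, and a plain proximal-gradient solver. The kernel choice, the bandwidth, the solver, and all function names are assumptions made for illustration; they are not taken from the paper and should not be read as the authors' implementation.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def kernel_weights(times, t0, bandwidth):
    # Assumed Gaussian kernel: samples observed near t0 get larger weight.
    w = np.exp(-0.5 * ((times - t0) / bandwidth) ** 2)
    return w / w.sum()

def weighted_l1_logistic(X, y, w, lam, n_iter=500):
    # Minimise sum_i w_i * log(1 + exp(-y_i * x_i . theta)) + lam * ||theta||_1
    # by proximal gradient descent (ISTA); the problem is convex.
    theta = np.zeros(X.shape[1])
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lipschitz = 0.25 * np.linalg.eigvalsh(X.T @ (w[:, None] * X)).max()
    step = 1.0 / max(lipschitz, 1e-12)
    for _ in range(n_iter):
        margins = np.clip(y * (X @ theta), -50.0, 50.0)
        grad = -(X.T @ (w * y / (1.0 + np.exp(margins))))
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

def estimate_neighborhood(data, times, node, t0, bandwidth, lam):
    # Regress one node on all others; nonzero coefficients are read as edges
    # incident to `node` at time t0.
    y = data[:, node]
    X = np.delete(data, node, axis=1)
    w = kernel_weights(times, t0, bandwidth)
    theta = weighted_l1_logistic(X, y, w, lam)
    others = [j for j in range(data.shape[1]) if j != node]
    return {others[k] for k in np.flatnonzero(np.abs(theta) > 1e-8)}
```

Calling `estimate_neighborhood` for every node and every time point of interest would yield one estimated edge set per time point. In this sketch a larger `lam` gives sparser neighborhoods, and a larger `bandwidth` shares more samples across time, so the estimated network changes more slowly.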

Network or Graph: A mathematical structure that represents objects and their interactions. Objects are represented by nodes or vertices (often denoted by a set V) and interactions by links or edges (often denoted by a set E). Mathematically, a graph G is defined as a tuple G = (V, E). Mathematicians use the term graph, whereas scientists from other disciplines usually use the term network for the same concept. Throughout this text, we use these terms interchangeably.

Social Network: A network where objects represent people and their interactions represent some sort of relationship among people. For example, two individuals may be connected to each other if they have studied at the same school or play for the same football team.

Clusters: A group of nodes (representing objects) that are densely connected to each other and sparsely connected to other nodes in the network. Formally, a clustering of a static graph G = (V, E) is defined by a set C of subsets of V: C = {c_1, c_2, ..., c_l} such that V = c_1 ∪ c_2 ∪ ... ∪ c_l.

Small World Network: A graph with two characteristic properties. First, the average path length, i.e. the average number of edges that must be traversed to get from one node to another, is low compared to a random graph of equivalent size. Second, transitivity among nodes is high, i.e. many sets of three nodes are connected to each other by three edges.

Scale Free Network: A graph whose degree distribution follows a power law, where the power-law exponent is usually between 2 and 3. In other words, most nodes have only a few connections (low degree) and a few nodes have many connections (high degree).

Definition

Network Science has emerged as an interdisciplinary field of study to model many physical and real-world systems. A network, although it consists only of a set of nodes and edges, is a very powerful structure for representing a wide variety of systems, such as people related through social relations and airports related through flights. Formally, we can define a dynamic network as a network that undergoes structural changes over time. The analysis and visualization of these networks is the study of algorithms, methods, tools and techniques that help us understand these networks and extract applicable knowledge from them. The study of dynamic networks forms a new and cross-disciplinary area of study with research opportunities and applications in many diverse fields.
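
The definitions above can be made concrete with a short sketch. It stores a graph G = (V, E) as adjacency sets, computes the two small-world quantities (average path length and transitivity), and treats a dynamic network as a sequence of edge-list snapshots whose differences are the structural changes. The function names and the snapshot representation are illustrative assumptions, not part of the source text.

```python
from collections import deque
from itertools import combinations

def build_adjacency(nodes, edges):
    # Undirected graph G = (V, E) stored as a dict of neighbor sets.
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def average_path_length(adj):
    # Mean shortest-path length, counted in edges, over all connected
    # ordered pairs of distinct nodes (breadth-first search per source).
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def transitivity(adj):
    # Fraction of connected triples (a - v - b) that are closed by an edge a - b.
    closed, triples = 0, 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

def edge_changes(snapshots):
    # A dynamic network as a list of edge lists, one per time step.
    # Returns (added, removed) edge sets for each consecutive pair.
    changes = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        p = {frozenset(e) for e in prev}
        c = {frozenset(e) for e in curr}
        changes.append((c - p, p - c))
    return changes

# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = build_adjacency(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
print(average_path_length(adj), transitivity(adj))
```

A small-world graph in the sense defined above would show a low `average_path_length` together with a high `transitivity` relative to a random graph with the same number of nodes and edges.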
