Mining Concurrent Topical Activity in Microblog Streams
Abstract
Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined by both exogenous and endogenous factors. Teasing apart different topics and resolving their individual, concurrent activity timelines is a key challenge in extracting knowledge from microblog streams. Meeting this challenge requires methods that expose latent signals by exploiting term correlations across posts and over time. Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used as a reference. We mine the temporal structure of topical activity using two methods based on non-negative matrix factorization. We show that, for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule. Our results show that, given appropriate techniques to detect latent signals, Twitter can be used as a social sensor to extract topical-temporal information on real-world events at high temporal resolution.
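As an illustration of the general idea only, the following minimal sketch factorizes a small term-by-time matrix with plain non-negative matrix factorization, using the standard multiplicative update rules. It is not the authors' pipeline: the abstract does not specify which two NMF-based methods are used, and the matrix layout (rows are terms, columns are time bins), the toy vocabulary, the synthetic counts, and the helper name `nmf` are all assumptions made for this example.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (terms x time bins) as W @ H.

    W (terms x k) holds term weights per topic; H (k x time bins) holds
    one activity timeline per topic. Uses multiplicative updates for the
    Frobenius-norm objective. Hypothetical helper, for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_bins = V.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_bins)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic data (assumed, not from the paper): two topics that are
# active in different, partly overlapping time windows.
terms = ["swim", "pool", "medal", "sprint", "track", "relay"]
topic_terms = np.array(
    [[3, 3, 1, 0, 0, 0],
     [0, 0, 1, 3, 3, 2]], dtype=float
).T  # shape: 6 terms x 2 topics
topic_time = np.array(
    [[0, 4, 5, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 2, 5, 4, 0]], dtype=float
)  # shape: 2 topics x 8 time bins
V = topic_terms @ topic_time  # term-by-time count matrix

W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

for j in range(H.shape[0]):
    top = [terms[i] for i in np.argsort(W[:, j])[::-1][:3]]
    print(f"topic {j}: top terms {top}, timeline {np.round(H[j], 2)}")
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch each row of `H` plays the role of a topic's activity timeline, which is the kind of signal one would compare against an external schedule. Real microblog data would also need tokenization, term weighting, and a choice of the number of topics, none of which is covered here.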