Post Summarization of Microblogs of Sporting Events
26th International World Wide Web (WWW) Conference, 2017
https://doi.org/10.1145/3041021.3054146
Abstract
Every day, 645 million Twitter users generate approximately 58 million tweets. This raises the question of whether a summary of events can be generated from this rich set of tweets alone. Key challenges in post summarization of microblogs include circumventing spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which identifies key events. LTC uses k-means clustering, and we explore three distance measures for the clustering: Euclidean distance, cosine similarity and Manhattan distance. We collected three original data sets of Twitter microblog posts covering sporting events: one cricket match and two football matches. The match summaries generated by LTC were compared against standard summaries taken from the sports sections of various news outlets, yielding up to 81% precision, 58% recall and 62% F-measure on different data sets. In addition, we report results of all three varian...
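The abstract names two technical ingredients: k-means clustering with a choice of three distance measures, and evaluation by precision, recall and F-measure against reference summaries. The Python sketch below illustrates both under stated assumptions. It is not LTC itself: how LTC builds its lexical and temporal feature vectors, chooses k and matches detected events to reference summaries is not described in this abstract. The function names (`kmeans`, `precision_recall_f1`), the mean-based centroid update used for every metric, and the exact-match scoring over event labels are all hypothetical choices made for illustration.

```python
import math
import random


# --- The three distance measures named in the abstract ------------------

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# --- Plain k-means with a pluggable distance (illustrative, not LTC) ----

def kmeans(points, k, distance, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (labels, centroids).

    Assumption of this sketch: centroids are always updated as the
    arithmetic mean of their members, whichever distance is used.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# --- Scoring detected events against a reference summary ----------------

def precision_recall_f1(detected, reference):
    """Exact-match precision, recall and F-measure over sets of event labels."""
    detected, reference = set(detected), set(reference)
    hits = len(detected & reference)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Comparing the three measures then amounts to changing one argument, e.g. `kmeans(vectors, 2, manhattan)` versus `kmeans(vectors, 2, cosine_distance)`, where `vectors` stands for hypothetical tweet feature vectors. Keeping the mean update for all three metrics is a simplification; a stricter treatment would pair Manhattan distance with a per-coordinate median update (k-medians).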