

Post Summarization of Microblogs of Sporting Events

26th International World Wide Web (WWW) Conference, 2017

https://doi.org/10.1145/3041021.3054146

Abstract

Every day 645 million Twitter users generate approximately 58 million tweets. This raises the question of whether a summary of events can be generated from this rich set of tweets alone. Key challenges in post summarization from microblog posts include circumventing spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which identifies key events. LTC uses k-means clustering, and we explore three distance measures for clustering: Euclidean distance, cosine similarity and Manhattan distance. We collected three original data sets of Twitter microblog posts covering sporting events: one cricket match and two football matches. The match summaries generated by LTC were compared against standard summaries taken from the sports sections of various news outlets, yielding up to 81% precision, 58% recall and 62% F-measure on different data sets. In addition, we also report results of all three varian...
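
To make the role of the distance measure concrete, the following is a minimal sketch of k-means clustering with a pluggable distance function, covering the three measures named in the abstract. It is not the authors' LTC implementation: the feature vectors, the function names (`kmeans`, `cosine_distance`, and so on), the initialization from the first k points, and the mean-based centroid update are all assumptions made for illustration. The abstract does not say how tweets are turned into vectors, so the points below are hypothetical two-dimensional features.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(points: List[Vector], k: int, distance: Distance,
           max_iterations: int = 20) -> List[int]:
    """Return one cluster label per point.

    Assumptions (not taken from the paper): the first k points seed the
    centroids, and centroids are updated with the coordinate-wise mean
    regardless of the distance measure. The mean is only the exact
    minimizer for squared Euclidean distance.
    """
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else []


# Hypothetical feature vectors standing in for vectorized tweets.
POINTS = [(1.0, 0.0), (5.0, 5.0), (1.2, 0.1), (5.1, 4.9), (0.9, 0.2), (4.8, 5.2)]

for name, fn in [("euclidean", euclidean),
                 ("manhattan", manhattan),
                 ("cosine", cosine_distance)]:
    print(name, kmeans(POINTS, 2, fn))
```

On this toy data all three measures separate the points into the same two groups. On real tweet vectors they can disagree, because cosine distance ignores vector length while Euclidean and Manhattan distance do not.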
