Summarizing Disaster Related Event from Microblog
2017
Abstract
The Information Retrieval Lab at DA-IICT, India participated in the text summarization task of the Data Challenge track of SMERP 2017. The SMERP 2017 track organizers provided the Italy earthquake tweet dataset along with a set of topics describing important information required during a disaster-related incident. The main goal of the task is to assess how well a participant's system summarizes, in 300 words, important tweets that are relevant to a given topic. We formulated text summarization as a clustering problem; our approach is based on extractive summarization. We submitted runs in both levels with different methodologies. We performed query expansion on the topics using WordNet. In the first level, we calculated the cosine similarity score between tweets and the expanded query. In the second level, we used a language model with Jelinek-Mercer smoothing to calculate the relevance score between tweets and the expanded query. We have selected tweets above a relevanc...
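The two scoring functions the abstract names — cosine similarity against the expanded query, and query likelihood under a unigram language model with Jelinek-Mercer smoothing — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; the function names and the λ = 0.5 default are assumptions.

```python
import math
from collections import Counter

def cosine_score(tweet_terms, query_terms):
    """Cosine similarity between the term-frequency vectors of a tweet and a query."""
    tv, qv = Counter(tweet_terms), Counter(query_terms)
    dot = sum(tv[t] * qv[t] for t in qv)
    norm = (math.sqrt(sum(v * v for v in tv.values()))
            * math.sqrt(sum(v * v for v in qv.values())))
    return dot / norm if norm else 0.0

def jm_score(tweet_terms, query_terms, collection_tf, collection_len, lam=0.5):
    """Log query likelihood under a unigram language model with Jelinek-Mercer
    smoothing: P(w|d) = lam * tf(w,d)/|d| + (1 - lam) * cf(w)/|C|."""
    tf, dlen = Counter(tweet_terms), len(tweet_terms)
    score = 0.0
    for w in query_terms:
        p_doc = tf[w] / dlen if dlen else 0.0
        p_col = collection_tf.get(w, 0) / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p > 0:  # skip terms unseen in both document and collection
            score += math.log(p)
    return score
```

Tweets whose score exceeds a chosen relevance threshold would then be passed on to the clustering stage.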
Related papers
Journal of Global Tourism Research, 2017
In Japan, where natural disasters occur frequently, obtaining and delivering accurate information promptly when a disaster strikes is essential to minimize damage. Information from traditional mass media contains a great deal of general content unrelated to the disaster, so there are limits to delivering necessary information to residents in the affected area. On the other hand, Twitter, one of the most popular social media platforms, is expected to play an important role during disasters because of its simplicity, promptness, and wide propagation. However, because of its huge user base, there are too many tweets, which hinders timely extraction of relevant information. Disaster information is also useful for business travellers and tourists; they are less informed about the area, and the challenge is to provide them with accurate information promptly. Our study proposes a system to assist real-time understanding of a disaster by efficiently extracting relevant information from messages tweeted during two typhoons. First, binary classification is applied to extract disaster tweets from the tweet stream; using the BNS method, an improvement in accuracy is confirmed. Clustering is then applied to the disaster tweets, which are grouped into 15 generated clusters. The result yields an F-measure of 0.59.
Nowadays, social networking sites are the fastest medium for delivering news to users, compared to newspapers and television. Many social networking sites exist, and one of them is Twitter. Twitter allows a large number of users to share and post their views and ideas on any particular event. According to a recent survey, 340 million tweets are sent on Twitter daily, spanning many different topics, and only 4% of posts on Twitter contain relevant news data. It is not possible for any human to read all the posts to extract meaningful information related to specific events. One solution to this problem is to apply a summarization technique. In this paper, we use an algorithm that employs a frequency-count technique along with some NLP features to summarize the event specified by the user. This automatic summarization algorithm handles the numerous, short, dissimilar, and noisy nature of tweets. We believe our novel approach helps users as well as researchers.
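A frequency-count scorer of the kind this abstract describes can be sketched as follows: rank each tweet by the average corpus frequency of its words and keep the top-k. This is an illustrative sketch, not the paper's algorithm; the function name and length normalization are assumptions.

```python
from collections import Counter

def frequency_summarize(tweets, k=2):
    """Frequency-count extractive summary: score each tweet by the average
    corpus frequency of its words, then keep the top-k tweets."""
    word_freq = Counter(w for tw in tweets for w in tw.lower().split())

    def score(tw):
        toks = tw.lower().split()
        # Length-normalized so long tweets are not favored automatically.
        return sum(word_freq[w] for w in toks) / len(toks) if toks else 0.0

    return sorted(tweets, key=score, reverse=True)[:k]
```

A real system would additionally strip stopwords, URLs, and user mentions before counting.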
cs.uccs.edu
Microblogs like Twitter are becoming increasingly popular and serve as a source of ample data on breaking news, public opinion, etc. However, it can be hard to find relevant, meaningful information from the enormous amount of activity on a microblog. Previous work has explored the use of clustering algorithms to create multi-post summaries as a way of understanding the vast amount of microblog activity. Clustering of microblog data is notoriously difficult because of non-standard orthography, noisiness, limited sets of features, and ambiguity as to the correct number of clusters. We examine several methods of making standard natural language processing techniques more amenable to the domain of Twitter including normalization, term expansion, improved feature selection, noise reduction, and estimation of the number of natural clusters in a set of posts. We show that these techniques can be used to improve the quality of extractive summaries of Twitter posts, providing valuable tools for understanding and utilizing microblog data.
Proceedings of the ACM Web Conference 2022
Microblogging platforms like Twitter have been heavily leveraged to report and exchange information about natural disasters. The real-time data on these sites is highly helpful in gaining situational awareness and planning aid efforts. However, disaster-related messages are immersed in a high volume of irrelevant information. The situational data of disaster events also vary greatly in terms of information types, ranging from general situational awareness (caution, infrastructure damage, casualties) to individual needs or content not related to the crisis. It thus requires efficient methods to handle data overload and prioritize various types of information. This paper proposes an interpretable classification-summarization framework that first classifies tweets into different disaster-related categories and then summarizes those tweets. Unlike existing work, our classification model can provide explanations or rationales for its decisions. In the summarization phase, we employ an Integer Linear Programming (ILP) based optimization technique along with the help of rationales to generate summaries of event categories. Extensive evaluation on large-scale disaster events shows that (a) our model can classify tweets into disaster-related categories with an 85% Macro F1 score and high interpretability, and (b) the summarizer achieves a 5-25% improvement in ROUGE-1 F-score over most state-of-the-art approaches.
2017
Microblogging sites like Twitter are increasingly being used for aiding relief operations during disaster events. In such situations, identifying actionable information like needs and availabilities of various types of resources is critical for effective coordination of post disaster relief operations. However, such critical information is usually submerged within a lot of conversational content, such as sympathy for the victims of the disaster. Hence, automated IR techniques are needed to find and process such information. In this paper, we utilize word vector embeddings along with fastText sentence classification algorithm to perform the task of classification of tweets posted during natural disasters.
Every day, 645 million Twitter users generate approximately 58 million tweets. This motivates the question of whether it is possible to generate a summary of events from this rich set of tweets alone. Key challenges in summarizing microblog posts include circumnavigating spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which identifies key events. LTC uses k-means clustering, and we explore the use of various distance measures for clustering: Euclidean, cosine, and Manhattan distance. We collected three original data sets of Twitter microblog posts covering sporting events: a cricket match and two football matches. The match summaries generated by LTC were compared against standard summaries taken from the sports sections of various news outlets, yielding up to 81% precision, 58% recall, and 62% F-measure on the different data sets. In addition, we report results from all three variants of the recall-oriented understudy for gisting evaluation (ROUGE) software, a tool which compares and scores automatically generated summaries against standard summaries.
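The three distance measures this abstract compares, and the k-means assignment step they plug into, can be sketched as follows. This is a generic illustration of interchangeable distance functions, not the LTC implementation; the function names are assumptions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / (na * nb) if na and nb else 0.0)

def assign(points, centroids, dist):
    """One k-means assignment step: map each point to its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]
```

Swapping `dist` between the three functions is how such a study would compare the measures while keeping the rest of the pipeline fixed.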
IRJMETS Publication, 2021
In the big data era, there has been an explosion in the amount of text data from a variety of sources. This volume of text needs to be effectively summarized to be useful. Twitter is increasingly becoming an ideal platform for accessing real-time responses from the crowd about ongoing public events, but the large number of messages from users often leads to the information-overload problem. Thus, to make use of Twitter's real-time nature, it is imperative to develop effective methods for automatically detecting events from a Twitter stream and objectively summarizing them. The aim of the project is to produce a summary of events as seen by the audience, using tweets relating to a match. The model is built for the example of the India vs England T20 5th ODI held on 20th March, 2021. The proposed framework consists of three key components: identification of important moments from the Twitter stream, extraction of representative tweets from each of the moments, and summarization of the tweets to produce an overall summary. Standardisation is done on the best summary obtained from the three chosen methods: TextRank, TF-IDF based scoring, and frequency-based scoring.
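Of the three methods this abstract names, TF-IDF based scoring is the most mechanical and can be sketched as follows: weight each term by its frequency in a tweet, discounted by how many tweets contain it. A hedged sketch over tokenized tweets; the function name and the bare log IDF (no smoothing) are assumptions.

```python
import math
from collections import Counter

def tfidf_scores(tweets):
    """Score each tokenized tweet by the sum of TF-IDF weights of its terms.

    tweets: list of token lists, e.g. [["goal", "goal"], ["goal", "foul"]].
    Returns one score per tweet; higher means more distinctive content.
    """
    n = len(tweets)
    df = Counter()                       # document frequency per term
    for tw in tweets:
        df.update(set(tw))
    idf = {t: math.log(n / df[t]) for t in df}
    scores = []
    for tw in tweets:
        tf = Counter(tw)
        scores.append(sum(tf[t] * idf[t] for t in tf))
    return scores
```

The highest-scoring tweet per detected moment would then be chosen as that moment's representative.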
International Journal of Rough Sets and Data Analysis, 2016
Summary generation is an important process in conditions where the user needs to obtain the key features of a document without having to go through the whole document itself. Summarization is basically of two types: 1) single-document summarization and 2) multi-document summarization. Here, however, the microblogging environment is considered, which imposes a restriction on the number of characters in a post; therefore, single-document summarizers are not applicable to this setting. There are many features along which microblog posts can be summarized, for example, a post's topic, its posting time, and the occurrence of the event. This paper proposes a method that includes the temporal features of the microblog posts to develop an extractive summary of the event from the posts, which further increases the quality of the summary created, as it includes all the key features.
2018
This paper addresses the challenge of tweet stream filtering and summarization, an important task for keeping users up to date on topics they care about without overwhelming them with irrelevant and redundant posts. To cut down the noise and shield users from unwanted posts, the tweet stream is filtered and a concise summary containing relevant and non-redundant posts is generated. Rather than relying on a traditional threshold filter based only on tweet content, we exploit social signals as well as query-dependent features to train a binary classifier that attempts to filter out irrelevant tweets with respect to the topic of interest. The core intuition is that using a machine learning algorithm allows us to overcome the issue of threshold setting and to examine how effective social signals are in tweet filtering. Unlike existing approaches that generate a summary by iteratively selecting top-weighted tweets, we formulate summary generation as an optimization problem to...

References (5)
- SMERP ECIR 2017 guidelines, http://www.computing.dcu.ie/~dganguly/smerp2017/
- Bagdouri, M., Oard, D.W.: CLIP at TREC 2015: Microblog and LiveQA. In: TREC (2015)
- Tan, L., Roegiest, A., Clarke, C.L.: University of Waterloo at TREC 2015 Microblog Track. In: TREC (2015)
- Tan, L., Roegiest, A., Clarke, C.L., Lin, J.: Simple dynamic emission strategies for microblog filtering. In: Proc. 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1009-1012. ACM (2016)
- Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proc. 19th International Conference on World Wide Web, pp. 851-860. ACM (2010)