Summarizing Disaster Related Event from Microblog
2017
Abstract
The Information Retrieval Lab at DA-IICT, India participated in the text summarization task of the Data Challenge track of SMERP 2017. The SMERP 2017 track organizers provided the Italy earthquake tweet dataset along with a set of topics describing important information required during a disaster-related incident. The main goal of the task is to assess how well a participant's system summarizes, in 300 words, important tweets that are relevant to a given topic. We formulated text summarization as a clustering problem; our approach is based on extractive summarization. We submitted runs in both levels with different methodologies. We performed query expansion on the topics using WordNet. In the first level, we calculated the cosine similarity score between tweets and the expanded query. In the second level, we used a language model with Jelinek-Mercer smoothing to calculate the relevance score between tweets and the expanded query. We have selected tweets above a relevanc...
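The two scoring functions the abstract names — cosine similarity against the expanded query, and query likelihood under a unigram language model with Jelinek-Mercer smoothing — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; the function names and the λ = 0.5 default are assumptions.

```python
import math
from collections import Counter

def cosine_score(tweet_terms, query_terms):
    """Cosine similarity between the term-frequency vectors of a tweet and a query."""
    tv, qv = Counter(tweet_terms), Counter(query_terms)
    dot = sum(tv[t] * qv[t] for t in qv)
    norm = (math.sqrt(sum(v * v for v in tv.values()))
            * math.sqrt(sum(v * v for v in qv.values())))
    return dot / norm if norm else 0.0

def jm_score(tweet_terms, query_terms, collection_tf, collection_len, lam=0.5):
    """Log query likelihood under a unigram language model with Jelinek-Mercer
    smoothing: P(w|d) = lam * tf(w,d)/|d| + (1 - lam) * cf(w)/|C|."""
    tf, dlen = Counter(tweet_terms), len(tweet_terms)
    score = 0.0
    for w in query_terms:
        p_doc = tf[w] / dlen if dlen else 0.0
        p_col = collection_tf.get(w, 0) / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p > 0:  # skip terms unseen in both document and collection
            score += math.log(p)
    return score
```

Tweets whose score exceeds a chosen relevance threshold would then be passed on to the clustering stage.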
Related papers
Journal of Global Tourism Research, 2017
In Japan, where natural disasters occur frequently, obtaining and delivering accurate information promptly when a disaster strikes is essential to minimize damage. Information from traditional mass media contains a great deal of general content unrelated to the disaster, so there are limits to delivering necessary information to residents in the affected area. On the other hand, Twitter, one of the most popular social media platforms, is expected to play an important role during disasters because of its simplicity, promptness, and wide propagation. However, because of its huge user base, there are too many tweets, which hinders timely extraction of relevant information. Disaster information is also useful for business travellers and tourists; they are less informed about the area, and the challenge is to provide them with accurate information promptly. Our study proposes a system to assist real-time understanding of a disaster by efficiently extracting relevant information from messages tweeted during two typhoons. First, binary classification is applied to extract disaster tweets from the tweet stream; using the BNS method, an improvement in accuracy is confirmed. Clustering is then applied to the disaster tweets, which are grouped into 15 generated clusters. The result yields an F-measure of 0.59.
Nowadays, social networking sites are the fastest medium for delivering news to users, compared to newspapers and television. Many social networking sites exist, and one of them is Twitter. Twitter allows a large number of users to share and post their views and ideas on any particular event. According to a recent survey, 340 million tweets are sent on Twitter daily, spanning many different topics, and only 4% of posts on Twitter contain relevant news data. It is not possible for any human to read all the posts to extract meaningful information related to specific events. One solution to this problem is to apply a summarization technique. In this paper, we use an algorithm that employs a frequency-count technique along with some NLP features to summarize the event specified by the user. This automatic summarization algorithm handles the numerous, short, dissimilar, and noisy nature of tweets. We believe our novel approach helps users as well as researchers.
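A frequency-count scorer of the kind this abstract describes can be sketched as follows: rank each tweet by the average corpus frequency of its words and keep the top-k. This is an illustrative sketch, not the paper's algorithm; the function name and length normalization are assumptions.

```python
from collections import Counter

def frequency_summarize(tweets, k=2):
    """Frequency-count extractive summary: score each tweet by the average
    corpus frequency of its words, then keep the top-k tweets."""
    word_freq = Counter(w for tw in tweets for w in tw.lower().split())

    def score(tw):
        toks = tw.lower().split()
        # Length-normalized so long tweets are not favored automatically.
        return sum(word_freq[w] for w in toks) / len(toks) if toks else 0.0

    return sorted(tweets, key=score, reverse=True)[:k]
```

A real system would additionally strip stopwords, URLs, and user mentions before counting.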
cs.uccs.edu
Microblogs like Twitter are becoming increasingly popular and serve as a source of ample data on breaking news, public opinion, etc. However, it can be hard to find relevant, meaningful information from the enormous amount of activity on a microblog. Previous work has explored the use of clustering algorithms to create multi-post summaries as a way of understanding the vast amount of microblog activity. Clustering of microblog data is notoriously difficult because of non-standard orthography, noisiness, limited sets of features, and ambiguity as to the correct number of clusters. We examine several methods of making standard natural language processing techniques more amenable to the domain of Twitter including normalization, term expansion, improved feature selection, noise reduction, and estimation of the number of natural clusters in a set of posts. We show that these techniques can be used to improve the quality of extractive summaries of Twitter posts, providing valuable tools for understanding and utilizing microblog data.
Proceedings of the ACM Web Conference 2022
Microblogging platforms like Twitter have been heavily leveraged to report and exchange information about natural disasters. The real-time data on these sites is highly helpful in gaining situational awareness and planning aid efforts. However, disaster-related messages are immersed in a high volume of irrelevant information. The situational data of disaster events also vary greatly in terms of information types, ranging from general situational awareness (caution, infrastructure damage, casualties) to individual needs or content not related to the crisis. It thus requires efficient methods to handle data overload and prioritize various types of information. This paper proposes an interpretable classification-summarization framework that first classifies tweets into different disaster-related categories and then summarizes those tweets. Unlike existing work, our classification model can provide explanations or rationales for its decisions. In the summarization phase, we employ an Integer Linear Programming (ILP) based optimization technique along with the help of rationales to generate summaries of event categories. Extensive evaluation on large-scale disaster events shows that (a) our model can classify tweets into disaster-related categories with an 85% Macro F1 score and high interpretability, and (b) the summarizer achieves a 5-25% improvement in ROUGE-1 F-score over most state-of-the-art approaches.
2017
Microblogging sites like Twitter are increasingly being used for aiding relief operations during disaster events. In such situations, identifying actionable information like needs and availabilities of various types of resources is critical for effective coordination of post disaster relief operations. However, such critical information is usually submerged within a lot of conversational content, such as sympathy for the victims of the disaster. Hence, automated IR techniques are needed to find and process such information. In this paper, we utilize word vector embeddings along with fastText sentence classification algorithm to perform the task of classification of tweets posted during natural disasters.
Every day, 645 million Twitter users generate approximately 58 million tweets. This motivates the question of whether it is possible to generate a summary of events from this rich set of tweets alone. Key challenges in summarizing microblog posts include circumnavigating spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which identifies key events. LTC uses k-means clustering, and we explore the use of various distance measures for clustering: Euclidean, cosine, and Manhattan distance. We collected three original data sets of Twitter microblog posts covering sporting events: a cricket match and two football matches. The match summaries generated by LTC were compared against standard summaries taken from the sports sections of various news outlets, yielding up to 81% precision, 58% recall, and 62% F-measure on the different data sets. In addition, we report results from all three variants of the recall-oriented understudy for gisting evaluation (ROUGE) software, a tool which compares and scores automatically generated summaries against standard summaries.
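The three distance measures this abstract compares, and the k-means assignment step they plug into, can be sketched as follows. This is a generic illustration of interchangeable distance functions, not the LTC implementation; the function names are assumptions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / (na * nb) if na and nb else 0.0)

def assign(points, centroids, dist):
    """One k-means assignment step: map each point to its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]
```

Swapping `dist` between the three functions is how such a study would compare the measures while keeping the rest of the pipeline fixed.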
IRJMETS Publication, 2021
In the big data era, there has been an explosion in the amount of text data from a variety of sources. This volume of text needs to be effectively summarized to be useful. Twitter is increasingly becoming an ideal platform for accessing real-time responses from the crowd about ongoing public events, but the large number of messages from users often leads to the information-overload problem. Thus, to make use of Twitter's real-time nature, it is imperative to develop effective methods for automatically detecting events from a Twitter stream and objectively summarizing them. The aim of the project is to produce a summary of events as seen by the audience, using tweets relating to a match. The model is built for the example of the India vs England T20 5th ODI held on 20th March, 2021. The proposed framework consists of three key components: identification of important moments from the Twitter stream, extraction of representative tweets from each of the moments, and summarization of the tweets to produce an overall summary. Standardisation is done on the best summary obtained from the three chosen methods: TextRank, TF-IDF based scoring, and frequency-based scoring.
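Of the three methods this abstract names, TF-IDF based scoring is the most mechanical and can be sketched as follows: weight each term by its frequency in a tweet, discounted by how many tweets contain it. A hedged sketch over tokenized tweets; the function name and the bare log IDF (no smoothing) are assumptions.

```python
import math
from collections import Counter

def tfidf_scores(tweets):
    """Score each tokenized tweet by the sum of TF-IDF weights of its terms.

    tweets: list of token lists, e.g. [["goal", "goal"], ["goal", "foul"]].
    Returns one score per tweet; higher means more distinctive content.
    """
    n = len(tweets)
    df = Counter()                       # document frequency per term
    for tw in tweets:
        df.update(set(tw))
    idf = {t: math.log(n / df[t]) for t in df}
    scores = []
    for tw in tweets:
        tf = Counter(tw)
        scores.append(sum(tf[t] * idf[t] for t in tf))
    return scores
```

The highest-scoring tweet per detected moment would then be chosen as that moment's representative.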
International Journal of Rough Sets and Data Analysis, 2016
Summary generation is an important process in conditions where the user needs to obtain the key features of a document without having to go through the whole document itself. Summarization is basically of two types: 1) single-document summarization and 2) multi-document summarization. Here, however, the microblogging environment is considered, which imposes a restriction on the number of characters in a post; therefore, single-document summarizers are not applicable to this setting. There are many features along which microblog posts can be summarized, for example, a post's topic, its posting time, and the occurrence of the event. This paper proposes a method that includes the temporal features of the microblog posts to develop an extractive summary of the event from the posts, which further increases the quality of the summary created, as it includes all the key features.
2018
This paper addresses the challenge of tweet stream filtering and summarization, an important task for keeping users up to date on topics they care about without overwhelming them with irrelevant and redundant posts. To cut down the noise and shield users from unwanted posts, the tweet stream is filtered and a concise summary containing relevant and non-redundant posts is generated. Rather than relying on a traditional threshold filter based only on tweet content, we exploit social signals as well as query-dependent features to train a binary classifier that attempts to filter out irrelevant tweets with respect to the topic of interest. The core intuition is that using a machine learning algorithm allows us to overcome the issue of threshold setting and to examine how effective social signals are in tweet filtering. Unlike existing approaches that generate a summary by iteratively selecting top-weighted tweets, we formulate summary generation as an optimization problem to...

References (5)
- SMERP ECIR 2017 guidelines, http://www.computing.dcu.ie/~dganguly/smerp2017/
- Bagdouri, M., Oard, D.W.: CLIP at TREC 2015: Microblog and LiveQA. In: TREC (2015)
- Tan, L., Roegiest, A., Clarke, C.L.: University of Waterloo at TREC 2015 Microblog Track. In: TREC (2015)
- Tan, L., Roegiest, A., Clarke, C.L., Lin, J.: Simple dynamic emission strategies for microblog filtering. In: Proc. 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1009-1012. ACM (2016)
- Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proc. 19th International Conference on World Wide Web, pp. 851-860. ACM (2010)