International Conference on Advances in Social Networks Analysis and Mining, 2015
Microblogs such as Twitter are characterized by the richness and recency of information shared by their users during major events. However, it is very challenging to automatically mine for information, or for users sharing certain information, because of the huge variety of unstructured data streamed through such microblogs. This work proposes a ranking and classification model for identifying users sharing useful information during a specified event. The model is based on a novel set of features that can be computed in real time. These features are designed to take into account both the on- and off-topic activities of a user. Once users are characterized by a feature vector, a supervised machine learning tool is trained to classify users as either prominent or not. Our model has been tested on data shared during a flooding disaster event and performed very well. The achieved results show the effectiveness of the proposed model for both the classification and ranking of prominent users in such events, and also the importance of adjusting the on-topic features by the off-topic ones when describing users' activities.
Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
During specific real-world events, some users of microblogging platforms could provide exclusive information about those events. The identification of such prominent users depends on several factors such as the freshness and the relevance of their shared information. This work proposes a probabilistic model for the identification of prominent users in microblogs during specific events. The model is based on learning and classifying user behavior over time using Mixture of Gaussians Hidden Markov Models. A user is characterized by a temporal sequence of feature vectors describing their activities. The features computed at each time-stamp are designed to reflect both the on- and off-topic activities of users. To validate the efficacy of our proposed model, we have conducted experiments on data collected from Twitter during the Herault floods that occurred in France. The achieved results show that learning the time-series of users' actions is better than learning just those actions without temporal information.
Detecting prominent microblog users over crisis events phases
Inf. Syst., 2018
During crisis events such as disasters, the need for real-time information retrieval (IR) from microblogs becomes essential. However, the huge amount and variety of information shared in real time during such events overcomplicate this task. Unlike existing IR approaches based on content analysis, we propose to tackle this problem with user-centric IR approaches, by identifying and tracking prominent microblog users who are likely to share relevant and exclusive information at an early stage of each analyzed event phase. This approach ensures real-time access to the valuable microblog information required by the emergency teams. In this approach, we propose a phase-aware probabilistic model for predicting and ranking prominent microblog users over time according to their behavior using Mixture of Gaussians Hidden Markov Models (MoG-HMM). The model utilizes a new user representation which takes into account both the user and the event specificities over time. Thi...
ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT
Text detection and recognition in a natural environment are key components of many applications, ranging from business card digitization to shop indexation in a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as in content gathered from Internet media and in modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC) which has been held since 2003 both in ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, and the wide variety of scenes. The dataset is comprised of 18,000 images which contain text belonging to 9 languages. The challenge is comprised of three tasks related to text detection and sc...
Content shared in microblogs during disasters is expressed in various formats and languages. This diversity makes the information retrieval process more complex and computationally infeasible in real time. To address this, we propose a classification model for the identification of prominent users who are sharing relevant and exclusive information during the disaster. Users who have shared at least one tweet about the disaster are modeled using three kinds of time-sensitive features: topical, social and geographical. Then, these users are classified into two classes using a linear Support Vector Machine (SVM) to evaluate them over the extracted features and identify the most prominent ones. The first results on an actual dataset show that our model achieves high accuracy, detecting most of the prominent users. Moreover, we demonstrate that all the proposed features used by our model are indispensable to achieve this high accuracy.
Lecture Notes in Business Information Processing, 2015
This paper presents a learning-based approach for the selection of relevant feature categories in the context of information retrieval from microblogs during unexpected disasters. Our information retrieval strategy consists of identifying prominent microblog users who are likely to share relevant and exclusive information in a disaster case. To identify these users, we evaluate the effectiveness of the state-of-the-art features characterizing microblog users for the identification of prominent users in a specific context. We experimented with different sets of feature categories to determine those that discriminate prominent users from non-prominent ones interacting on Twitter during the 2014 Herault floods that occurred in France. The achieved results show that on- and off-topical user activity features are the most representative features for identifying prominent users in a disaster context. We also note that SVM outperforms the ANN learning algorithm for this classification context, especially when it is trained with additional spatial features.
The response phase in a disaster case is often considered to be the most critical in terms of saving lives and dealing with irreversible damage. The timely provision of geospatial information is crucial in the decision-making process. Thus, there is a need for the integration of heterogeneous spatial databases which are inherently distributed and created under different projects by various organizations. The integration of all relevant data for timely decision making is a challenging task due to syntactic, schematic and semantic heterogeneity. This paper aims to propose a framework for the integration of heterogeneous spatial databases using novel approaches, such as web services and ontologies. We focus on providing solutions for the three levels of heterogeneity, in order to be able to interrogate the content of the different databases conveniently. Based on the proposed framework, we implemented a use case using heterogeneous data belonging to La Rochelle city in France.
Microblogs have proved their potential to attract people from all over the world to voluntarily express what is happening around them during unexpected events. However, retrieving relevant information from the huge amount of data shared in real time in these microblogs remains complex. This paper proposes a new system named MASIR for real-time information retrieval from microblogs during unexpected events. MASIR is based on a decentralized and collaborative multi-agent approach analyzing the profiles of users interested in a given event in order to detect the most prominent ones that have to be tracked in real time. Real-time monitoring of these users enables direct access to valuable fresh information. Our experiments show that MASIR simplifies the real-time detection and tracking of the most prominent users by exploring both the old and fresh information shared during the event, and outperforms the standard centrality measures by using a time-sensitive ranking model.
During crisis events such as disasters, the need for real-time information retrieval (IR) from microblogs remains essential. However, the huge amount and the variety of the shared information in real time during such events over-complicate this task. Unlike existing IR approaches based on content analysis, we propose to tackle this problem with user-centric IR approaches, by solving the wide spectrum of methodological and technological barriers inherent to: 1) the collection of the evaluated users' data, 2) the modeling of user behavior, 3) the analysis of user behavior, and 4) the prediction and tracking of prominent users in real time. In this context, we detail the different proposed approaches in this dissertation leading to the prediction of prominent users who are likely to share the targeted relevant and exclusive information on one hand and enabling emergency responders to have real-time access to the required information in all formats (i.e. text, image, video, l...
Papers by Imen Bizid