Papers by Ali Hürriyetoğlu
Smelly, dense, and spreaded: The Object Detection for Olfactory References (ODOR) dataset
Expert systems with applications, Jun 1, 2024
Validating Digital Traces with Survey Data: The Use Case of Religiosity

arXiv (Cornell University), Jun 10, 2024
Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures.
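As a rough illustration of "sharing only the learned parameters" in a centralized horizontal setting, the sketch below averages client parameter vectors weighted by local dataset size. This FedAvg-style aggregation is an assumption chosen for illustration; the review does not prescribe it, and the function name, client weights, and dataset sizes are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Aggregate client parameters on a central server.

    Each client trains on its own local data and sends only its
    parameter vector; raw data never leaves the client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    # Weighted mean per parameter, weights proportional to local data size.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: three sites with two model parameters each.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 10, 20]
global_weights = federated_average(weights, sizes)
print(global_weights)
```

In a full training loop the server would send `global_weights` back to the clients for the next round of local training.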
A Computational Analysis of the Ideological Landscape of Turkey and Electoral Behavior
Zenodo (CERN European Organization for Nuclear Research), Nov 5, 2023
arXiv (Cornell University), May 28, 2024
We provide a summary of the fifth edition of the CASE workshop that is held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.

Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict if a sentence contains a causal relation or not. This is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans per causal sentence. This could be seen as a supervised sequence labeling task. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper summarizes the work of the 17 teams that submitted their results to our competition and 12 system description papers that were received. The best F1 scores achieved for Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing approaches involved pretrained language models fine-tuned to the targeted task. We further discuss these approaches and analyze errors across participants' systems in this paper.
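To make the two ranking metrics concrete, here is a minimal sketch of binary F1 (F1 of the positive class only) and macro F1 (unweighted mean of per-class F1). It is not the shared task's scoring script; the helper names and the toy labels are hypothetical, and macro F1 is shown over plain class labels rather than over Cause, Effect and Signal spans.

```python
from typing import Hashable, Sequence


def f1_for_label(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 for one label, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: only the positive class (label 1) is scored."""
    return f1_for_label(y_true, y_pred, 1)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Macro F1: unweighted mean of per-label F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example: 1 = causal sentence, 0 = non-causal sentence.
gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
print(round(binary_f1(gold, pred), 4))
print(round(macro_f1(gold, pred), 4))
```

Binary F1 ignores how well the negative class is predicted, while macro F1 gives every class the same weight regardless of its frequency.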
Zero-Shot Ranking Socio-Political Texts with Transformer Language Models to Reduce Close Reading Time
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

arXiv (Cornell University), Nov 22, 2022
The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict if a sentence contains a causal relation or not. This is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans per causal sentence. This could be seen as a supervised sequence labeling task. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper summarizes the work of the 17 teams that submitted their results to our competition and 12 system description papers that were received. The best F1 scores achieved for Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing approaches involved pre-trained language models fine-tuned to the targeted task. We further discuss these approaches and analyze errors across participants' systems in this paper.
arXiv (Cornell University), Nov 21, 2022
We provide a summary of the fifth edition of the CASE workshop that is held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.

This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
Supporting Experts to Handle Tweet Collections About Significant Events
Lecture Notes in Computer Science, 2017
We introduce Relevancer that processes a tweet set and enables generating an automatic classifier from it. Relevancer satisfies information needs of experts during significant events. Enabling experts to combine automatic procedures with expertise is the main contribution of our approach and the added value of the tool. Even a small amount of feedback enables the tool to distinguish between relevant and irrelevant information effectively. Thus, Relevancer facilitates the quick understanding of and proper reaction to events presented on Twitter.

Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter
Studies in computational intelligence, Nov 18, 2017
Given a stream of Twitter messages about an event, we investigate the predictive power of features generated from words and temporal expressions in the messages to estimate the time to event (TTE). From labeled training data average TTE values of the predictive features are learned, so that when they occur in an event-related tweet the TTE estimate can be provided for that tweet. We utilize temporal logic rules and a historical context integration function to improve the TTE estimation precision. In experiments on football matches and music concerts we show that the estimates of the method are off by 4 and 10 h in terms of mean absolute error on average, respectively. We find that the type and size of the event affect the estimation quality. An out-of-domain test on music concerts shows that models and hyperparameters trained and optimized on football matches can be used to estimate the remaining time to concerts. Moreover, mixing in concert events in training improves the precision of the average football event estimate.
Given a stream of Twitter messages about an event, we investigate the predictive power of temporal expressions in the messages to estimate the time to event (TTE). From labeled training data we learn average TTE estimates of temporal expressions and combinations thereof, and define basic rules to compute the time to event from temporal expressions, so that when they occur in a tweet that mentions an event we can generate a prediction. We show in a case study on soccer matches that our estimations are off by about eight hours on average in terms of mean absolute error.
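The core idea of learning an average TTE per temporal expression and scoring it with mean absolute error can be sketched as follows. This is a simplified illustration, not the authors' implementation: the substring matching, the function names, and the toy training pairs are assumptions, and the papers' rules, expression combinations, and historical context integration are not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def learn_average_tte(training: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Learn the mean time to event (in hours) for each temporal expression."""
    by_expression: Dict[str, List[float]] = defaultdict(list)
    for expression, hours in training:
        by_expression[expression].append(hours)
    return {expr: mean(values) for expr, values in by_expression.items()}


def estimate_tte(tweet: str, averages: Dict[str, float]) -> Optional[float]:
    """Average the learned TTE of every known expression found in the tweet.

    Returns None when the tweet contains no known temporal expression.
    """
    text = tweet.lower()
    matched = [hours for expr, hours in averages.items() if expr in text]
    return mean(matched) if matched else None


def mean_absolute_error(estimates: List[float], actuals: List[float]) -> float:
    """Mean absolute difference between estimated and true TTE, in hours."""
    return mean([abs(e - a) for e, a in zip(estimates, actuals)])


# Hypothetical labeled data: (temporal expression, hours until the event).
training = [("tomorrow", 30.0), ("tomorrow", 20.0), ("tonight", 6.0), ("tonight", 4.0)]
averages = learn_average_tte(training)

estimates = [
    estimate_tte("Match tomorrow, can't wait!", averages),
    estimate_tte("Kick-off tonight", averages),
]
print(estimates, mean_absolute_error(estimates, [22.0, 7.0]))
```

Tweets without any learned expression yield no estimate in this sketch, which is one reason a real system would need additional rules or context from earlier tweets.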

Twitter is a social network that contains information about city events (concerts, festivals, etc.), city problems (traffic, collisions, and road incidents), the news, people's feelings, and more. For these reasons, many studies use tweet data to detect information that supports smart city management. This paper discusses ways of finding citizens' problems and their locations from tweet data. Tweets in Turkish from the Aegean Region of Turkey were used for the study. The aim is to build a smart system that detects citizens' problems and extracts the problems' exact locations from tweet texts. First, the collected data was analyzed for information about any city event or any citizen's complaint or request about a problem. Once it was established that tweets mentioning a city problem can be detected, two datasets were created. The first consists of tweets that contain event information or a problem, and the second consists of tweets with other information not related to our study. A Naive Bayes classifier was then trained on the annotated tweets and tested on a separate set of tweets. The accuracy, precision, recall, and F-measure of the classifier are given. A location recognizer, which finds Turkish place names in a text, was created and applied to the tweets marked as information-containing by the classifier in order to detect the location of the problem precisely. The first findings of the project are promising. The high accuracy obtained by the classifier shows that it is suitable for our study. We plan to improve the location recognizer and to detect place names in real-time tweet data.

The American Historical Review, Mar 1, 2023
We are all familiar with the famous Shakespearian statement that a rose by any other name would smell as sweet. Yet throughout history the scent of the rose has absorbed, as if by a kind of magnetic attraction, a multitude of meanings. It has often been the flower of power. Contained in expensive casting bottles, the scent of rose oil (commonly known as rose attar) was distributed to the mistresses of Henry VIII. The potent scent evoked an intimacy with the body of the monarch and was a representation of Tudor rule. In the eighteenth and nineteenth centuries, European visitors to the Ottoman Empire noted the use of rose-water, sprinkled on the hands, to welcome guests in an expression of hospitality. The corpses of medieval men and women who emitted the odor of sanctity instead of pungent putrefaction were often said to smell of roses. The scent of the rose has also been central to Islam: the first rose supposedly bloomed from the tear of the Prophet, and the indispensability of rose water in Islamic culture has left its impact in material culture in the form of the gulab, a rounded bottle with an elongated spout for pouring the perfume. The rose has never merely smelled sweet; it has evoked a whole range of meanings.
Event Extraction for Balkan Languages
Decision Support Systems, Nov 1, 2012
* endorsed by SIGNLL-ACL's Special Interest Group on Natural Language Learning
* endorsed by SIGANN-ACL's Special Interest Group for Annotation
* Extended versions of the best papers will be chosen for a special issue of the Decision Support Systems journal (published by Elsevier).
arXiv (Cornell University), Aug 1, 2020

arXiv (Cornell University), Dec 12, 2016
We investigate what distinguishes reported dreams from other personal narratives. The continuity hypothesis, stemming from psychological dream analysis work, states that most dreams refer to a person's daily life and personal concerns, similar to other personal narratives such as diary entries. Differences between the two texts may reveal the linguistic markers of dream text, which could be the basis for new dream analysis work and for the automatic detection of dream descriptions. We used three text analytics methods: text classification, topic modeling, and text coherence analysis, and applied these methods to a balanced set of texts representing dreams, diary entries, and other personal stories. We observed that dream texts could be distinguished from other personal narratives nearly perfectly, mostly based on the presence of uncertainty markers and descriptions of scenes. Important markers for non-dream narratives are specific time expressions and conversational expressions. Dream texts also exhibit a lower discourse coherence than other personal narratives.