Towards Well-Grounded Phrase-Level Polarity Analysis
Lecture Notes in Computer Science, 2011
ABSTRACT We propose a new rule-based system for phrase-level polarity analysis and show how it benefits from empirically validating its polarity composition through surveys with human subjects. The system’s two-layer architecture and its underlying structure, i.e. its composition model, are presented. Two functions for polarity aggregation are introduced that operate on newly defined semantic categories. These categories detach a word’s syntactic behavior from its semantic behavior. An experimental setup is described that we use to carry out a thorough evaluation. It incorporates a newly created German-language data set that is made freely and publicly available. This data set contains polarity annotations at word-level, phrase-level and sentence-level and facilitates comparability between different studies as well as reproducibility of our results.
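To illustrate the general idea of compositional polarity aggregation over semantic categories, the following is a minimal sketch. The lexicon, the category names (polar, shifter, intensifier) and the aggregation rule are hypothetical and do not reproduce the paper's actual composition model or its two aggregation functions.

```python
# Minimal sketch of phrase-level polarity aggregation (illustrative only;
# lexicon, categories and rules are hypothetical, not the paper's model).

# Word-level prior polarities for polar words, in [-1.0, 1.0].
PRIOR = {"gut": 0.8, "schlecht": -0.7}

# Semantic categories, detached from syntactic word class.
CATEGORY = {"gut": "polar", "schlecht": "polar",
            "nicht": "shifter", "sehr": "intensifier"}


def aggregate(words):
    """Left-to-right aggregation: intensifiers scale, shifters invert,
    polar words contribute their scaled prior polarity."""
    polarity, scale = 0.0, 1.0
    for w in words:
        cat = CATEGORY.get(w, "neutral")
        if cat == "intensifier":
            scale *= 1.5
        elif cat == "shifter":
            scale *= -1.0
        elif cat == "polar":
            polarity += scale * PRIOR[w]
            scale = 1.0  # modifier scope ends at the polar word
    return max(-1.0, min(1.0, polarity))  # clamp to [-1.0, 1.0]


if __name__ == "__main__":
    print(aggregate(["nicht", "sehr", "gut"]))  # negated, intensified positive -> negative
    print(aggregate(["sehr", "schlecht"]))      # intensified negative
```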
ABSTRACT We present methods for labeling queries for a specialized search engine: a people search engine. We propose several methods of varying complexity, ranging from simple probabilistic models to Conditional Random Fields. All methods are evaluated on a manually annotated corpus of queries submitted to a people search engine. Additionally, we analyze this corpus with respect to typical search patterns and their distribution.
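As a rough illustration of the simple probabilistic end of that spectrum, the sketch below labels each query token with its most probable label estimated from annotated queries. The tag set and training examples are made up; the paper's actual features, label inventory and CRF configuration are not reproduced here.

```python
# Minimal sketch of a simple probabilistic query labeler (hypothetical tag
# set and toy training data; not the paper's actual method).
from collections import Counter, defaultdict

# Toy training data: (token, label) pairs from annotated people-search queries.
TRAIN = [
    ("john", "FIRSTNAME"), ("smith", "LASTNAME"),
    ("anna", "FIRSTNAME"), ("mueller", "LASTNAME"),
    ("berlin", "LOCATION"), ("lawyer", "PROFESSION"),
]

# Estimate P(label | token) by counting label occurrences per token.
counts = defaultdict(Counter)
for token, label in TRAIN:
    counts[token][label] += 1


def label_query(query, default="OTHER"):
    """Assign each token its most frequent training label, else a default."""
    labels = []
    for token in query.lower().split():
        if counts[token]:
            labels.append(counts[token].most_common(1)[0][0])
        else:
            labels.append(default)
    return list(zip(query.split(), labels))


if __name__ == "__main__":
    print(label_query("Anna Mueller Berlin"))
```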
Language statistics are widely used to characterize and better understand language. In parallel, the number of text mining and information retrieval methods has grown rapidly over the last decades, with many algorithms evaluated on standardized corpora, often drawn from newspapers. However, there have so far been almost no attempts to link the areas of natural language processing and language statistics in order to properly characterize those evaluation corpora and to help others pick the most appropriate algorithms for their particular corpus. We believe no results in the field of natural language processing should be published without quantitatively describing the corpora used. Only then can the real value of proposed methods be determined and their transferability to corpora originating from different genres or domains be estimated. We lay the groundwork for a language engineering process by gathering and defining a set of textual characteristics we consider valuable with respect to building natural language processing systems. We carry out a case study on the analysis of automotive repair orders and explicitly call upon the scientific community to provide feedback and help establish a good practice of corpus-aware evaluations.
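The sketch below computes a few simple surface statistics of the kind such a quantitative corpus description might include. The chosen metrics (token count, vocabulary size, type-token ratio, average sentence length) are illustrative assumptions, not the paper's defined set of textual characteristics.

```python
# Minimal sketch of computing simple corpus characteristics (illustrative
# metric selection; not the paper's actual characteristic set).
import re


def corpus_characteristics(sentences):
    """Compute simple surface statistics over a list of sentences."""
    tokens = [t.lower() for s in sentences for t in re.findall(r"\w+", s)]
    types = set(tokens)
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "vocabulary_size": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    corpus = ["Replace front brake pads.", "Check engine oil and replace filter."]
    print(corpus_characteristics(corpus))
```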
Recently, textual characteristics, i.e. certain language statistics, have been proposed to compare corpora originating from different genres and domains, to give guidance in language engineering processes and to estimate the transferability of natural language processing algorithms from one corpus to another. However, until now it has been unclear how these textual characteristics behave for different-sized corpora. We monitor the behavior of 7 textual characteristics across 4 genres (news articles, Wikipedia articles, general web text and forum posts) and 10 corpus sizes, ranging from 100 to 3,000,000 sentences. We show that certain textual characteristics are almost constant across corpus sizes and thus might be used to reliably compare different-sized corpora, while others are highly corpus-size-dependent and thus may only be used to compare similar- or same-sized corpora. Moreover, we find that although textual characteristics vary from genre to genre, their behavior for increasing corpus sizes is quite similar.
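The following sketch shows one way such size-dependence could be probed: subsample increasingly large sub-corpora and track a single characteristic (here the type-token ratio, which is well known to be size-dependent). The sample sizes and the characteristic are illustrative assumptions, not the 7 characteristics and 10 sizes used in the study.

```python
# Minimal sketch of tracking one textual characteristic across corpus sizes
# (illustrative; not the study's actual characteristics or size grid).
import random
import re


def type_token_ratio(sentences):
    tokens = [t.lower() for s in sentences for t in re.findall(r"\w+", s)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ttr_by_corpus_size(sentences, sizes, seed=0):
    """Sample increasingly large sub-corpora and report the TTR of each."""
    rng = random.Random(seed)
    return {n: type_token_ratio(rng.sample(sentences, min(n, len(sentences))))
            for n in sizes}


if __name__ == "__main__":
    corpus = [f"sentence number {i} about topic {i % 50}" for i in range(10_000)]
    print(ttr_by_corpus_size(corpus, sizes=[100, 1_000, 10_000]))
```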
An analysis of a diachronically organised corpus of German-language newspaper articles and blog posts on economy and finance is presented using a prototype dictionary of affect in German. The changes in the frequency of occurrence of positive and negative polarity words are rendered as return time series and the properties of this time series are described. The returns and
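As a small illustration of rendering frequency changes as a return time series, the sketch below computes log returns over consecutive frequency observations. The frequency values are made up, and the abstract does not specify whether simple or log returns were used; log returns are assumed here.

```python
# Minimal sketch of turning a frequency time series into log returns
# (hypothetical frequencies; return definition assumed, not taken from the paper).
import math


def log_returns(series):
    """r_t = log(f_t / f_{t-1}) for consecutive, strictly positive frequencies."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


if __name__ == "__main__":
    # Hypothetical daily relative frequencies of negative polarity words.
    neg_freq = [0.012, 0.015, 0.011, 0.014, 0.013]
    print(log_returns(neg_freq))
```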