Academia.edu

Multiword Expressions

274 papers
359 followers
About this topic
Multiword expressions (MWEs) are linguistic units composed of two or more words that function as a single semantic or syntactic entity. They include phrases such as idioms, collocations, and fixed expressions, which exhibit properties distinct from their individual components, often challenging traditional models of syntax and semantics in natural language processing.
The paper presents the results of two contrastive studies, Italian-English and Italian-French, on gestural-kinesic idiomatic expressions (IEs), that is, expressions characterized by reference to gestures and other kinesic behaviors... more
This paper describes an algorithm for automatically extracting multiword expressions (MWEs) from a corpus. The algorithm is node-based, i.e., it extracts MWEs that contain the item specified by the user, using a fixed window-size around the... more
In this paper, we will focus on the structural, cognitive, and cross-cultural properties of proverbs. The first part examines the structural characteristics of proverbial sentences, while the second part explores the cognitive framework... more
This comparative study examines the impact of WhatsApp-based instruction versus text-based instruction on intentional media-related vocabulary learning and learners' perceptions. Specifically, it investigates the effects of WhatsApp... more
This paper is aimed at describing the main differences between spoken and written English. More specifically, attention is paid to the different examples which are classified as predicative Prepositional Phrases (PPs) in the International... more
Particle Verbs (PVs) are a very frequent and productive word class in German. They can occur in different syntactic paradigms. In verb-first and verb-second clauses which do not contain auxiliary verbs they occur syntactically separated.... more
Noun-noun compounds are complex words with two simplex nouns as constituents. In English and German, the first constituent represents the modifier of the compound, and the second constituent represents the head. A compound may have... more
We shed light on aspects of the relation between the semantics and the syntactic flexibility of multiword expressions by investigating fixed adjective similes (FS), a predicative multiword expression class not studied in this respect... more
This study explored how second language (L2) speakers' use of multiword sequences in speech predicted perceived fluency ratings while controlling for their utterance fluency. A total of 102 Japanese speakers of English delivered an... more
The PARSEME Shared Task on automatic identification of verbal multiword expressions aims at identifying such expressions in running texts. Typology of verbal multiword expressions, very detailed annotation guidelines and gold-standard... more
Although the interest of literature in word combinations has significantly increased over the last decades, the full classification of their types and comprehensive collection of their forms is far from complete and flawless. This paper... more
In support verb constructions (SVCs), such as 'have poise', the support verb is explicitly assumed to be a verb, here 'have'. However, during the last 50 years, the notion of SVC has been extended to a large range of new cases. With this new... more
This paper aims to analyze the structure and meanings of the Latin indefinite pronouns that can be traced back to the Indo-European root *kw-e-/*kw-i. All of them are morphologically derivational forms: this property is supported by... more
This study focuses on investigating semantic errors in English to Indonesian translation using DeepL Translate, with the aim of evaluating the extent of semantic accuracy of this translation tool. This study uses a qualitative approach... more
The objective of this study is to investigate how learners of Italian as a second or foreign language search for new meanings in online Italian dictionaries. Using eye-tracking technology, we carried out experiments inviting users to do... more
UD_Greek-GUD (GUD) is the most recent Universal Dependencies (UD) treebank for Standard Modern Greek (SMG) and the first SMG UD treebank to annotate Verbal Multiword Expressions (VMWEs). GUD contains material from fiction texts and... more
The precise identification of light verb constructions is crucial for the successful functioning of several NLP applications. In order to facilitate the development of an algorithm that is capable of recognizing them, a manually annotated... more
The Szeged Corpus is the largest manually annotated database containing the possible morphological analyses and lemmas for each word form. In this work, we present its latest version, Szeged Corpus 2.5, in which the new harmonized... more
In this introductory chapter, we first present the topic and context of this volume. We then summarize its contributions, which have been collected through an open call for submissions and a peer-reviewing process.
for my studies, since I could not pursue empirical research without their help. I am grateful to all of my committee members, Jill Morford, Joan Bybee, Bill Croft, and Andy Wedel, for their mentorship. My research is better due to the... more
Abstract: This article analyzes the results of zero-shot cross-lingual transfer of automatic linguistic annotation in the CoBaLD standard from Russian to closely related and unrelated languages. The study shows that... more
Corpus-driven and corpus-based studies convincingly demonstrate that language consists of conventionalized multiword units to a much greater degree than was previously assumed. This gives rise to the... more
There has been a consensus among language researchers regarding the apparent advantages of learning lexical chunks. Conventional pedagogies (e.g., memorizing, drilling, input flooding, typographic enhancement) have been utilized in... more
In this paper, we look at the manual construction of a lexicon of emotion terms in Old English organised as a wordnet lexicon and based on a pre-existing dataset which categorises emotion terms on the basis of cognitive criteria. This is... more
Identifying and translating MultiWord Expressions (MWEs) in a text represents a key issue for numerous applications of Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we present a method aiming to... more
MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a... more
There is a great deal of knowledge available on the Web, which represents a great opportunity for automatic, intelligent text processing and understanding, but the major problems are finding the legitimate sources of information and the... more
We present here the enhancement of the Romanian wordnet with a new type of information, very useful in language processing, namely types of verbal multi-word expressions. All verb literals made of two or more words are attached a label... more
Reviewers: Prof. Dr. Yovka Tisheva, Assoc. Prof. Dr. Atanas Atanasov. Contents: Symbols and abbreviations used; Introduction; Chapter I: Bulgarian linguistics on prepositional expressions. It is accepted that the existence of hybrid... more
We present MWE-Finder, an application that enables a user to search for multiword expressions (MWEs) in large Dutch text corpora. Components of many MWEs in Dutch can occur in multiple forms, need not be adjacent, and can occur in... more
This paper introduces and demonstrates MWE-Finder, an application to search for flexible multiword expressions (MWEs) in Dutch text corpora, starting from an example. If the example is in canonical form, the application automatically... more
This paper proposes a canonical form for Multiword Expressions (MWEs), in particular for the Dutch language. The canonical form can be enriched with all kinds of annotations that can be used to describe the properties of the MWE and its... more
In this paper we showcase and evaluate MWE-Finder, a system that allows users to search for occurrences of an MWE in a large Dutch text corpus. To this end, we conduct three small case studies, and discuss the results in detail. We make... more
Over the last decade, the prominence of statistical NLP applications that use syntactic rather than only word-based shallow clues increased very significantly. This prominence triggered the creation of large scale treebanks, i.e., corpora... more
Genuine lexical writing assistants that attempt to detect lexical errors such as miscollocations are traditionally less common in Computer Assisted Language Learning than spell and grammar checkers. However, there is empirical evidence of... more
We report on UD_Greek-GUD (henceforth GUD), the most recent Universal Dependencies (UD) treebank of Standard Modern Greek (SMG). GUD adheres to UD.v2 (de Marneffe et al., 2021) and is the first SMG UD treebank to annotate Verbal Multiword... more
Medieval French is known to be relatively hard to parse, with several possible sources of confusion for automatic parsers, among which its flexible word order and its graphical and syntactic variation, both synchronically and... more
We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. We specifically target discontinuity, an under-explored aspect that poses a significant... more
Currently, the high volume of international information exchange involves a wide range of localities. As each locality comes with its own distinctive dialect, the need for an effective means of language translation is becoming more and... more
Clichés, as trite expressions, are predominantly multiword expressions, but not all MWEs are clichés. We conduct a preliminary examination of the problem of determining how clichéd a text is, taken as a whole, by comparing it to a... more
This paper describes the adaptation and extension of an existing morphological system and its integration into an intranet service of a large international bank. The system includes a tool for the analysis and extraction of simple and... more
Having a quality annotated corpus is essential, especially for applied research. Despite the recent focus of the Web science community on cyberbullying research, the community still does not have standard benchmarks. In this paper, we... more
This paper describes the combinatorial properties of agere in association with nouns that designate a scenic activity. I therefore examine combinations such as fabulam, tragoediam, comoediam, partes, gestum, personam agere. The aim of the... more
This paper addresses Persian Complex Predicates (CPs) from an Applied/Pedagogical Construction Grammar (PCxG) stance. PCxG is an approach to foreign language pedagogy that emphasises the importance of constructions (form-meaning... more