Academia.edu

Corpora and Computational Linguistics

58 papers
228 followers
About this topic
Corpora and Computational Linguistics is the study of language through the analysis of large collections of textual data (corpora) using computational methods. This field combines linguistic theory with computer science techniques to model, analyze, and understand language patterns, structures, and usage in various contexts.
This overview presents the Author Profiling and Deception Detection in Arabic (APDA) shared task at PAN@FIRE 2019. This year's task had two main aims: i) to profile the age, gender and native language of a Twitter user; ii) to...
Irony and satire in Orwell's Animal Farm are investigated lexically in the current paper in order to determine the correlation between the two concepts. The researcher adopts a qualitative method of analysis, focusing on chapter 10. The...
Corpus Linguistics Syllabus for Students and Corpus Learners.
Language is not something that never grows; as civilization and development reach the world, the language grows along with them. As scholars have noted, human language is a key pillar for expressing and understanding events. Language also...
Identifying the authorship of an anonymous or doubtful document is a cornerstone of automatic forensic applications. It is also a challenging task for both humans and computers. Clustering documents according to...
4th EDITION, SUPPLEMENTED AND IMPROVED. Greek is the easiest and most perfect writing system in the world. It records speech even better than a tape recorder: with a tape recorder, homophones give rise to misunderstandings, whereas with the Greek writing system...
Climate change has led to rising sea levels and warmer sea surface temperatures. These factors contribute greatly to the intensity of hurricanes and of the floods they cause. Projections estimate there will be an increase of 45% to 87% in the...
The goal of the Style Change Detection task is to determine whether a document was written by more than one author and, if so, to identify which paragraph (or, more generally, which portion of text) corresponds to each author. The...
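The task definition above can be made concrete with a small sketch of its expected output: one author label per paragraph. It assumes, hypothetically, that paragraphs are already split, that each paragraph is represented by its character trigram counts, and that a paragraph joins the most similar existing author profile when cosine similarity reaches a made-up threshold of 0.3, otherwise starting a new author. This illustrates the task format only; it is not a method taken from any of the listed papers.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lower-cased text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_authors(paragraphs: list[str], threshold: float = 0.3) -> list[int]:
    """Give each paragraph an author label 1, 2, ... (threshold is hypothetical)."""
    profiles: list[Counter] = []  # one accumulated n-gram profile per author
    labels: list[int] = []
    for paragraph in paragraphs:
        vec = char_ngrams(paragraph)
        sims = [cosine(vec, profile) for profile in profiles]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            profiles[best].update(vec)  # merge the paragraph into that author's profile
            labels.append(best + 1)
        else:
            profiles.append(Counter(vec))  # no close match: start a new author
            labels.append(len(profiles))
    return labels


def has_style_change(labels: list[int]) -> bool:
    """A document counts as multi-author if more than one label was assigned."""
    return len(set(labels)) > 1
```

With toy paragraphs, `assign_authors(["aaaa aaaa aaaa", "zzzz zzzz zzzz", "aaaa aaaa aaaa"])` returns `[1, 2, 1]`, so `has_style_change` reports a multi-author document. Real systems would tune the features and the threshold on training data.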
The quantitative parameters of function words (auxiliary parts of speech) are calculated for the vocabulary and the text of the electronic corpus of Ivan Franko's major prose. Their statistical characteristics are identified, together with what is shared and what differs both for the writer's individual works and...
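As a rough illustration of what parameters "in the vocabulary and in the text" can mean, the sketch below computes the share of function words among word types (vocabulary) and among word tokens (running text). The function-word set is a small hypothetical sample, tokenization is assumed to be done already, and the abstract does not say which parameters the study actually uses.

```python
from collections import Counter

# Hypothetical sample of Ukrainian function words (conjunctions, prepositions,
# particles); a real study would use a complete, part-of-speech-tagged inventory.
FUNCTION_WORDS = {"і", "й", "та", "а", "але", "в", "у", "на", "з", "до", "що", "не", "би"}


def function_word_stats(tokens: list[str]) -> dict[str, float]:
    """Share of function words among tokens (text) and among types (vocabulary)."""
    words = [t.lower() for t in tokens]
    counts = Counter(words)
    fw_tokens = sum(c for w, c in counts.items() if w in FUNCTION_WORDS)
    fw_types = sum(1 for w in counts if w in FUNCTION_WORDS)
    return {
        "token_share": fw_tokens / len(words) if words else 0.0,
        "type_share": fw_types / len(counts) if counts else 0.0,
    }
```

For the tokens `["Він", "пішов", "на", "поле", "до", "брата", "на", "світанку"]`, function words make up 3 of 8 tokens (0.375) but only 2 of 7 types, which is why text and vocabulary figures are worth reporting separately.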
A collective volume on the contributions of Dr. Saida El Alami, professor of Andalusian and Moroccan studies at Sidi Mohamed Ben Abdellah University in Fez. A select group of researchers contributed to this book (see the table of contents in the file), at the start of my study of the contributions of this professor, who...
Nepal, Aaradh, and Francesco Perono Cacciafoco. (2024). Minoan Cryptanalysis: Computational Approaches to Deciphering Linear A and Assessing Its Connections with Language Families from the Mediterranean and the Black Sea Areas. In Revesz,...
"Fragmentation" is a well-worn watchword in contemporary biblical studies. But is endless fragmentation across the traditional domains of epistemology, methodology and hermeneutics the inevitable future for the postmodern exercise of...
This research aims to evaluate and refine the sentence similarity tool attached to BERT, developed by Google, which natural language processing research relies on heavily, especially for improving machine translation output, by...
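One common way to score sentence similarity with BERT-style models is to pool token embeddings into one vector per sentence and compare the two vectors with cosine similarity. The sketch below shows only that comparison step. It assumes the token embeddings have already been produced by some model (no BERT library is called here), and mean pooling is one common choice, not necessarily the one used by the tool the paper evaluates.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average token embeddings dimension-wise into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_similarity(tokens_a: list[list[float]], tokens_b: list[list[float]]) -> float:
    """Cosine similarity of mean-pooled token embeddings (embeddings assumed given)."""
    return cosine(mean_pool(tokens_a), mean_pool(tokens_b))
```

With toy two-dimensional embeddings, `sentence_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])` is approximately 1.0 and `sentence_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]])` is 0.0; real BERT embeddings have hundreds of dimensions.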
Historically, the lone wolf has been associated with different movements, ranging from the propaganda of the deed in the 19th century to the leaderless resistance of white-supremacist groups in the 1980s and 90s. More recently, it is...
This paper describes a new project entitled "Applying computer-aided methods to discourse analysis". The project aims to develop an e-learning environment dedicated to documenting, evaluating and teaching the use of corpus...
The article presents a study conducted within the framework of discourse complexology, an integral scientific domain that has united linguists, cognitive scientists, psychologists and programmers dealing with the problems of discourse...
Stylometric authorship attribution is one of the essential approaches in text mining. The present research presents a stylometric method, Stylometric Authorship Ranking Attribution (SARA), that overcomes the usual problems which are...
The text aims to present the digitized works of Blaže Koneski included in the Gralis-mak corpus and to explain them as a model for open educational resources. The material is a rich treasury for researchers...
Contemporary studies on interpersonal communication confirm that in order to understand and model this multifaceted process, not only speech itself but also other components, including gestures, facial expressions, or body postures, must...
Register studies describe the situational and linguistic characteristics of particular registers. There are also studies that make comparisons across registers. These studies have shown that different registers have systematic and...
Text reuse is becoming a serious issue in many fields, and research shows that it is much harder to detect when it occurs across languages. The recent rise in multilingual content on the Web has increased cross-language text reuse to an...
The involvement of technological applications in everyday life is increasing. In particular, the growing market for personal speech assistants is remarkable. Mexico is home to many vernacular languages; however, almost all of them have...
This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of different document features: typed and untyped character n-grams, and word n-grams. We experimented with two...
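The hierarchical clustering step described above can be sketched as follows. The sketch assumes untyped character trigrams only (the paper also uses typed character n-grams and word n-grams), average linkage over cosine similarity, and a hypothetical stopping threshold of 0.5; each resulting cluster is read as one author. It is an illustration of the technique, not the team's actual configuration.

```python
from collections import Counter
from itertools import combinations
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lower-cased text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_authors(docs: list[str], threshold: float = 0.5) -> list[set[int]]:
    """Average-linkage agglomerative clustering; stops below `threshold`."""
    vecs = [char_ngrams(d) for d in docs]
    sim = {(i, j): cosine(vecs[i], vecs[j]) for i, j in combinations(range(len(docs)), 2)}
    clusters = [{i} for i in range(len(docs))]  # start with one cluster per document

    def linkage(a: set[int], b: set[int]) -> float:
        # Average pairwise similarity between the members of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) < threshold:
            break  # remaining clusters are too dissimilar to share an author
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters
```

On the toy documents `["aaaa aaaa", "aaaa aaaa aaaa", "zzzz zzzz", "zzzz zzzz zzzz"]`, the sketch returns the clusters `{0, 1}` and `{2, 3}`. Cutting the hierarchy with a similarity threshold, rather than fixing the number of clusters, suits author clustering because the number of authors is usually not known in advance.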
Latin phrases are an integral part of the language of educated speakers in many (European) languages. Besides lexical units of Latin origin that have already been adapted to the orthography of the respective host language and calques,...
Contribution to the conference "Accidents, Hitches, and Glitches in Linguistic Research", Rome, 23-24 February 2023. The Nordic Dialect Corpus (henceforth NDC) represents the most comprehensive digital resource for the study of North...
This paper investigates the use of textual distributional similarity measures in the context of comparable corpora. We address the issue of measuring the relatedness between documents by extracting, measuring and ranking their...
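A minimal sketch of ranking documents by relatedness, assuming a monolingual setting, simple word tokenization with a regular expression, and cosine similarity over raw word counts as the similarity measure. The paper investigates several distributional measures, and its comparable corpora may be multilingual, so this shows only one illustrative choice.

```python
from collections import Counter
import math
import re


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; \\w+ is a deliberately simple tokenizer."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_documents(source: str, candidates: list[str]) -> list[tuple[int, float]]:
    """Return (candidate index, score) pairs, most related first."""
    src = bag_of_words(source)
    scores = [(i, cosine(src, bag_of_words(c))) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

`rank_documents("corpus linguistics studies language", ["weather forecast for today", "computational linguistics and corpus studies"])` places candidate 1 first; candidate 0 shares no words with the source and scores 0.0.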
This article adds to the available work on Natural Language Processing by demonstrating how programming languages such as R (R CORE TEAM, 2020) can be useful in authorship detection and in identifying the...
In this paper, we address the style change detection problem in the PAN'18 author identification task. The task is to determine whether a text was written by a single author or not. We consider a supervised problem statement with the whole...
The paper investigates a method for the style breach detection task. We developed a method based on mapping sentences into a high-dimensional vector space, in which each sentence vector depends on the previous and next sentence vectors. As the main...
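The idea that each sentence vector depends on its neighbours can be illustrated with a hypothetical weighting: each contextual vector mixes 25% of the previous sentence, 50% of the current one and 25% of the next, and a breach is flagged between two sentences when the cosine similarity of their contextual vectors drops below 0.7. The weights, the threshold and the input vectors are assumptions for illustration; the abstract does not give the paper's actual formulation.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def contextualize(vectors: list[list[float]], w_side: float = 0.25) -> list[list[float]]:
    """Mix each sentence vector with its neighbours (hypothetical weights).

    At the document edges the missing neighbour is replaced by the sentence itself.
    """
    out = []
    n = len(vectors)
    for i, cur in enumerate(vectors):
        prev = vectors[i - 1] if i > 0 else cur
        nxt = vectors[i + 1] if i < n - 1 else cur
        out.append([w_side * p + (1 - 2 * w_side) * c + w_side * q
                    for p, c, q in zip(prev, cur, nxt)])
    return out


def style_breaches(vectors: list[list[float]], threshold: float = 0.7) -> list[int]:
    """Indices i where a breach is flagged between sentence i and sentence i + 1."""
    ctx = contextualize(vectors)
    return [i for i in range(len(ctx) - 1) if cosine(ctx[i], ctx[i + 1]) < threshold]
```

For six toy sentence vectors where the first three are `[1, 0]` and the last three are `[0, 1]`, `style_breaches` returns `[2]`, a breach between the third and fourth sentences. Because mixing in neighbours also blurs the contrast at a true boundary (the similarity there is 0.6 rather than 0), the threshold has to be tuned together with the weights.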
This notebook paper documents the approach our team adopted for the Author Masking task at PAN 2016. To mask the author's identity, we use a simple translation-based approach. From the source language (English), the...
...integration of ICT in teaching through the application of e-learning technologies and the creation of digital educational content in the form of online OERs. This project will help teachers compare the...
Non-native English speakers have limited linguistic options for expressing the emotion of sadness. To address this issue, the current interdisciplinary study links the disciplines of Psychology, Linguistics, Computer Science, and Big Data...
The pressure to do things in a new way pushes the researcher to submit a composed, distorted version of existing work. At this point, plagiarism takes place in the researcher's mind. Plagiarism detection as currently conceived does not ensure...