Arabic NLP

1,000 papers

About this topic

Arabic Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics focused on the interaction between computers and the Arabic language. It involves the development of algorithms and models to enable machines to understand, interpret, and generate Arabic text and speech, addressing unique linguistic features and challenges of the language.

Key research themes

1. How can multidialectal Arabic NLP be advanced to address the diversity and complexity of Arabic dialects in tasks like Named Entity Recognition?

This research area focuses on developing robust NLP models that handle multiple Arabic dialects simultaneously, overcoming the challenge posed by the linguistic diversity, morphological richness, and lack of standardized dialectal resources. It is crucial because Arabic dialects differ considerably from Modern Standard Arabic (MSA) and from each other, leading to poor performance of MSA-centric tools on dialectal texts, thus hindering real-world applications such as information retrieval, machine translation, and question answering.

ARDIAL-BERT: Advancing Multidialectal Arabic Named Entity Recognition Through Continual Pretraining

by alimi tahar

2025, IEEE Transactions on Artificial Intelligence

Key finding: Proposed ARDIAL-BERT, the first multidialectal NER model covering major Arabic dialects (Levantine, Maghrebi, Egyptian, Gulf), and demonstrated that continual pretraining on regionally grouped datasets notably improves NER...

CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing

by Mai Oudah

2022

Key finding: Introduced CAMeL Tools, an open-source Python toolkit supporting morphological modeling, dialect identification, named entity recognition, and sentiment analysis tailored to Arabic and its dialects. The toolkit addresses...

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic

by Ramy Nagah Eskander

2024

Key finding: Developed MADAMIRA, a fast tool for morphological analysis and disambiguation applicable to both MSA and dialectal Arabic. It integrates morphologically rich analysis, diacritization, POS tagging, and tokenization using...

2. What are the effective methodologies for constructing and utilizing large-scale Arabic language corpora and lexicons to support NLP applications, including dialectal variations?

This theme explores the creation, structuring, and use of large Arabic language corpora and lexical resources to enhance NLP tasks. Given the diglossic nature of Arabic, with its standard and multiple dialectal forms, language resources must represent this diversity. Properly designed corpora and lexicons enable better empirical analysis, lexicography, and semantic understanding, and they help overcome the scarcity of annotated data for dialects, which is a key bottleneck in Arabic NLP development.

Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon

by Maryam Aminian

2015

Key finding: Created Tharwa, a pioneering three-way lexicon connecting Egyptian Dialectal Arabic, Modern Standard Arabic, and English, covering over 73,000 dialect entries. It includes detailed linguistic features such as POS, gender,...
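
A three-way lexicon of this kind can be pictured as a set of records indexed from each of its three sides. The sketch below is a minimal illustration, assuming a simplified record (Egyptian form, MSA form, English gloss, POS, gender); the class and field names are hypothetical, and the two sample entries are illustrative Egyptian/MSA/English pairs, not rows taken from Tharwa.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class LexEntry:
    """One row of a hypothetical Egyptian Arabic / MSA / English lexicon."""
    egy: str                      # Egyptian Arabic form
    msa: str                      # Modern Standard Arabic equivalent
    eng: str                      # English gloss
    pos: str                      # part-of-speech tag
    gender: Optional[str] = None  # grammatical gender, if applicable


class ThreeWayLexicon:
    """Indexes every entry by each of its three surface forms."""

    FIELDS = ("egy", "msa", "eng")

    def __init__(self, entries: Iterable[LexEntry]) -> None:
        self._index: Dict[str, Dict[str, List[LexEntry]]] = {
            field: defaultdict(list) for field in self.FIELDS
        }
        for entry in entries:
            for field in self.FIELDS:
                self._index[field][getattr(entry, field)].append(entry)

    def lookup(self, field: str, form: str) -> List[LexEntry]:
        # .get() avoids inserting empty lists into the defaultdict on a miss
        return list(self._index[field].get(form, []))


# Illustrative entries only (not from Tharwa)
demo_lexicon = ThreeWayLexicon([
    LexEntry("عربية", "سيارة", "car", "NOUN", "fem"),
    LexEntry("دلوقتي", "الآن", "now", "ADV"),
])
```

Looking up `demo_lexicon.lookup("eng", "car")` returns the entry whose Egyptian form is عربية, which is the kind of bridge such a resource offers between MSA-centric tools and dialectal text.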

LDC Arabic treebanks and associated corpora: Data divisions manual

by Mona Diab

2022

Key finding: Established a standardized set of rules for consistent data division of Arabic treebanks (Modern Standard Arabic and Egyptian dialects) into train, development, and test splits. This methodological contribution enables...

Building an international corpus of Arabic (ICA): Progress of Compilation Stage

by Magdy Nagi

2021, 7th Int. Conf. on Language Eng. Cairo, Egypt

Key finding: Outlined the design and compilation of ICA, an effort to build a large, representative Arabic corpus encompassing diverse genres and regional varieties, addressing the shortage of Arabic corpora for linguistic research and...

1.5 billion words Arabic Corpus

by Ibrahim Abu El-Khair

2025, arXiv (Cornell University)

Key finding: Presented a large-scale, free Arabic corpus comprising over 1.5 billion words collected from 5 million newspaper articles across 8 countries over 14 years. The corpus offers diverse, multi-source, multi-country data...

3. What are the specific challenges and linguistic features of Arabic that must be addressed in NLP, and how can morphological structures like schemes and multiword expressions enhance Arabic NLP systems?

Arabic’s unique linguistic characteristics—such as rich morphology, complex word formation via roots and schemes, orthographic ambiguity due to optional diacritics, diglossia, and pervasive use of multiword expressions—pose significant challenges in NLP. Research focuses on modeling these features accurately, including leveraging scheme-based abstractions to reduce vocabulary sparsity and compiling annotated repositories of multiword expressions to improve language understanding and processing accuracy.

Exploring the Potential of Schemes in Building NLP Tools for Arabic Language

by Mohamed Aziz Ben Mohamed

2023, International Arab Journal of Information Technology

Key finding: Explored the use of Arabic morphological schemes (templates guiding root-based derivation) as abstractions to reduce model sparsity in NLP. Demonstrated a vocabulary reduction of over 90% when converting text to schemes, with a...
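
To see why schemes shrink the vocabulary, consider mapping each word to its template by replacing its root consonants, in order, with the traditional placeholder letters ف ع ل. The sketch below is a toy illustration, not the paper's method: it assumes the triliteral root is already known (for example from a morphological analyzer), that input is undiacritized, and it ignores weak and geminated roots. The function names are hypothetical.

```python
from typing import Optional

SLOTS = "فعل"  # placeholder radicals used by traditional Arabic grammar


def to_scheme(word: str, root: str) -> Optional[str]:
    """Replace the root consonants of `word`, in order, with ف ع ل."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    out, i = [], 0
    for ch in word:
        if i < 3 and ch == root[i]:
            out.append(SLOTS[i])  # i-th radical -> i-th placeholder
            i += 1
        else:
            out.append(ch)        # affix or long-vowel letter is kept as-is
    return "".join(out) if i == 3 else None  # None: root not found in order


def from_scheme(scheme: str, root: str) -> str:
    """Inverse direction: interdigitate a root into a scheme."""
    out, i = [], 0
    for ch in scheme:
        if i < 3 and ch == SLOTS[i]:
            out.append(root[i])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# Four distinct words (two roots) collapse to two schemes
words = [("كاتب", "كتب"), ("دارس", "درس"), ("مكتوب", "كتب"), ("مدروس", "درس")]
schemes = {to_scheme(w, r) for w, r in words}
```

Here `schemes` is `{"فاعل", "مفعول"}`: the same collapse happens for every root that shares a scheme, which is where the vocabulary reduction comes from.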

Building an Arabic multiword expressions repository

by Mona Diab

2024

Key finding: Compiled a manually curated, morphosyntactically annotated repository of approximately 5,000 Arabic multiword expressions (MWEs), categorized by syntactic type and enriched with context-sensitive morphological analysis...
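
One common way to apply such a repository is greedy longest-match tagging over a tokenized sentence. The sketch below is a minimal version, assuming whitespace tokenization with clitics left attached; the three expressions in the sample set are illustrative, not drawn from the repository, and `build_index` and `tag_mwes` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, expression)
Index = Dict[str, List[Tuple[str, ...]]]


def build_index(mwes: Iterable[str]) -> Index:
    """Map each MWE's first token to its token tuples, longest first."""
    index: Index = {}
    for mwe in mwes:
        toks = tuple(mwe.split())
        index.setdefault(toks[0], []).append(toks)
    for cands in index.values():
        cands.sort(key=len, reverse=True)
    return index


def tag_mwes(tokens: Sequence[str], index: Index) -> List[Span]:
    """Greedy left-to-right tagging that prefers the longest expression."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        for cand in index.get(tokens[i], []):
            if tuple(tokens[i:i + len(cand)]) == cand:
                spans.append((i, i + len(cand), " ".join(cand)))
                i += len(cand)  # skip past the matched expression
                break
        else:
            i += 1  # no expression starts at this token
    return spans


MWE_INDEX = build_index(["على الرغم من", "رأس المال", "رأس المال العامل"])
```

Because candidates are sorted longest first, "رأس المال العامل" (working capital) wins over its prefix "رأس المال" (capital) when both match.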

Challenges in Arabic Natural Language Processing

by Khaled Shaalan et al.

2018

Key finding: Reviewed critical linguistic challenges in Arabic NLP arising from Arabic's derivational and inflectional morphology, free word order, diglossia (Classical Arabic, MSA, and dialects), and orthographic ambiguities due to absent...

All papers in Arabic NLP

Une ressource linguistique arabe pour la morphologie computationnelle basée sur le modèle sémitique [An Arabic Linguistic Resource for Computational Morphology Based on the Semitic Model]

by Alexis Neme

2025

We developed an original approach to Arabic traditional morphology, involving new concepts in Semitic lexicology, morphology, and grammar for standard written Arabic. This new methodology for handling the rich and complex Semitic...

Restoring Arabic vowels through omission-tolerant dictionary lookup

by Alexis Neme

2025, Language Resources and Evaluation

Vowels in Arabic are optional orthographic symbols written as diacritics above or below letters. In Arabic texts, typically more than 97 percent of written words do not explicitly show any of the vowels they contain; that is to say, depending on the author, genre and field, less than 3 percent of words include any explicit vowel. Although numerous studies have addressed restoring omitted vowels in speech technologies, little attention has been given to this problem in work dedicated to written Arabic technologies. In this research, we present Arabic-Unitex, an Arabic language resource, with emphasis on vowel representation and encoding. Specifically, we present two dozen rules formalizing a detailed description of vowel omission in written text. They are typographical rules integrated into large-coverage resources for morphological annotation. For restoring vowels, our resources are capable of identifying words in which the vowels are not shown, as well as words in which the vowels are partially or fully included. By taking these rules into account, our resources are able to compute and restore, for each word form, a list of compatible fully vowelized candidates through omission-tolerant dictionary lookup. In our previous studies, we proposed a straightforward encoding of the taxonomy of verbs (Neme, 2011) and of broken plurals. While traditional morphology is based on derivational rules, our description is based on inflectional ones. The breakthrough lies in reversing the traditional root-and-pattern Semitic model into pattern-and-root, giving precedence to patterns over roots. The lexicon is built and updated manually and contains 76,000 fully vowelized lemmas. It is then inflected by means of finite-state transducers (FSTs), generating 6 million forms. The coverage of these inflected forms is extended by formalized grammars, which accurately describe agglutinations around a core verb, noun, adjective or preposition.
A laptop needs one minute to generate the 6 million inflected forms in a 340-Megabyte flat file, which is compressed in two minutes into 11 Megabytes for fast retrieval. Our program performs the analysis of 5,000 words/second for running text (20 pages/second).
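
The core idea of omission-tolerant lookup can be sketched in a few lines: index fully vowelized forms by their undiacritized skeleton, then keep only the candidates whose marks include every mark the input actually shows. This is a simplified reading under stated assumptions: a tiny hand-made lexicon and a single compatibility rule (the marks observed on each letter must be a subset of the candidate's marks on that letter). It does not reproduce the two dozen typographical rules or the FST-generated lexicon of Arabic-Unitex, and the class and function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

# Tanwin, short vowels, shadda and sukun (U+064B..U+0652) plus superscript alef (U+0670)
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def skeleton(word: str) -> str:
    """Strip all diacritics, leaving only the base letters."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def units(word: str) -> List[Tuple[str, Set[str]]]:
    """Split a word into (base letter, set of marks on that letter)."""
    out: List[Tuple[str, Set[str]]] = []
    for ch in word:
        if ch in DIACRITICS:
            if not out:
                raise ValueError("diacritic with no base letter")
            out[-1][1].add(ch)
        else:
            out.append((ch, set()))
    return out


class VowelRestorer:
    def __init__(self, vowelized_forms: Iterable[str]) -> None:
        self._index = defaultdict(list)
        for form in vowelized_forms:
            self._index[skeleton(form)].append(form)

    def candidates(self, word: str) -> List[str]:
        """Fully vowelized forms compatible with the (partial) marks in `word`."""
        observed = units(word)
        result = []
        for form in self._index.get(skeleton(word), []):
            full = units(form)
            # Same skeleton means same letters; each observed mark set must be
            # contained in the candidate's mark set at the same position.
            if all(seen <= have for (_, seen), (_, have) in zip(observed, full)):
                result.append(form)
        return result


FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
KATABA = "ك" + FATHA + "ت" + FATHA + "ب" + FATHA  # kataba, "he wrote"
KUTIBA = "ك" + DAMMA + "ت" + KASRA + "ب" + FATHA  # kutiba, "it was written"
KUTUB = "ك" + DAMMA + "ت" + DAMMA + "ب"           # kutub, "books"
restorer = VowelRestorer([KATABA, KUTIBA, KUTUB])
```

An unvowelized كتب returns all three candidates, a damma on the first letter narrows the list to kutiba and kutub, and a kasra on the second letter leaves only kutiba.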

A lexicon of Arabic verbs constructed on the basis of Semitic taxonomy and using finite-state transducers

by Alexis Neme

2025, First International Workshop on Lexical Resources

We describe a lexicon of Arabic verbs constructed on the basis of Semitic patterns and used in a resource-based method of morphological annotation of written Arabic text. The annotated output is a graph of morphemes with accurate...

Une ressource sur la langue arabe pour la morphologie computationnelle basée sur le modèle sémitique [A Resource on the Arabic Language for Computational Morphology Based on the Semitic Model]

by Alexis Neme

2025

A natural path for Arabic morphology consists in adopting or adapting both the traditional Semitic model and finite-state technologies. On the one hand, we have to facilitate the linguist’s task of lexical encoding by proposing a familiar...

WASA: A Web Application for Sequence Annotation

by Mona Diab

2025

Data annotation is an important and necessary task for all NLP applications. Designing and implementing a web-based application that enables many annotators to annotate and enter their input into one central database is not a trivial...

Proceedings of the Third Arabic Natural Language Processing Workshop

by Mona Diab

2025

This paper presents a language identification system designed to detect the language of each word, in its context, in multilingual documents as generated in social media by bilingual/multilingual communities, in our case speakers of...

DIRA: Dialectal Arabic Information Retrieval Assistant

by Mona Diab

2025

DIRA is a query expansion tool that generates search terms in Standard Arabic and/or its dialects when provided with queries in English or Standard Arabic. The retrieval of dialectal Arabic text has recently become necessary due to the...
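
A query-expansion step of this kind can be sketched as a table lookup from a concept to its Standard Arabic and dialectal surface forms. This is a minimal sketch, not DIRA's implementation: the table, the variety codes (MSA, EGY, LEV, GLF) and the function names are hypothetical, and the sample forms are common words for "now" in each variety rather than entries from DIRA's resources.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical expansion table: concept key -> {variety code: surface forms}
EXPANSIONS: Dict[str, Dict[str, List[str]]] = {
    "now": {"MSA": ["الآن"], "EGY": ["دلوقتي"], "LEV": ["هلق"], "GLF": ["الحين"]},
}


def concept_of(term: str, table: Dict[str, Dict[str, List[str]]]) -> Optional[str]:
    """Accept either an English concept key or one of its MSA forms."""
    if term in table:
        return term
    for concept, variants in table.items():
        if term in variants.get("MSA", []):
            return concept
    return None


def expand_query(terms: Sequence[str], varieties: Sequence[str],
                 table: Dict[str, Dict[str, List[str]]] = EXPANSIONS) -> List[str]:
    """Replace each known term by its forms in the requested varieties."""
    expanded: List[str] = []
    for term in terms:
        concept = concept_of(term, table)
        if concept is None:
            forms = [term]  # unknown terms pass through unchanged
        else:
            forms = [f for v in varieties for f in table[concept].get(v, [])]
        for form in forms:
            if form not in expanded:  # de-duplicate while keeping order
                expanded.append(form)
    return expanded
```

For example, `expand_query(["now"], ["MSA", "EGY"])` yields the MSA and Egyptian search terms الآن and دلوقتي, so a retrieval engine can match documents written in either variety.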

NLP-Based Multilingual Chatbots for Farmer Advisory Systems

by IJETRM Journal

2025, International Journal of Engineering Technology Research & Management (IJETRM)

Most developing economies continue to be overdependent on agriculture, yet farmers fail to access professional knowledge and information in a timely fashion that may help them generate higher yields, reduce risks, and make decisions. In...

Author Profiling for Hate Speech Detection

by Pushkar Mishra

2025, ArXiv

The rapid growth of social media in recent years has fed into some highly undesirable phenomena such as proliferation of abusive and offensive language on the Internet. Previous research suggests that such hateful content tends to come...

Author Profiling for Abuse Detection

by Pushkar Mishra

2025

The rapid growth of social media in recent years has fed into some highly undesirable phenomena such as proliferation of hateful and offensive language on the Internet. Previous research suggests that such abusive content tends to come...

A Hybrid Approach to Contextual Information Extraction in Low-Resource Igbo

by UZOARU GODSON

2025

Extracting contextual information from low-resource languages such as Igbo remains a significant challenge due to limited linguistic data. This paper proposes a novel hybrid approach that leverages both global and subword-level...

Twitter Sentiment Analysis Using Machine Learning and Deep Learning Techniques

by Noor Mahmoud Alkudah

2025, Journal of Computer Science

This research investigates the use of Machine Learning (ML) and Deep Learning, including BiLSTM approaches, for Sentiment Analysis (SA) of consumer reviews on social media sites. Businesses increasingly depend on online reviews to...

Text Analysis in Mongolian Language

by Chuluundorj Begz

2025, Text Analysis in Mongolian Language

The relevance of textual analysis appears in numerous case studies across fields of social, business and academic communication. A central question in multilingual research is to develop a universal concept representation using a variety...

Overview of the Track on Author Profiling and Deception Detection in Arabic

by Wajdi Zaghouani

2025, FIRE (Working Notes)

This overview presents the Author Profiling and Deception Detection in Arabic (APDA) shared task at PAN@FIRE 2019. This year's task had two main aims: i) to profile the age, gender and native language of a Twitter user; ii) to...

Review of Machine Learning

by Vinayak K

2025

In recent times, machine learning and deep learning have quickly risen to prominence as highly effective instruments across a multitude of domains, encompassing areas such as image and speech interpretation, the processing of natural...

SERTUS Dataset Collection From Spontaneous Environments

by Latifa Iben Nasr

2025, SERTUS Dataset Collection from Spontaneous Environments

This paper introduces SERTUS (Speech Emotion Recognition TUnisian Spontaneous), an extensive dataset collection intended to propel research in Speech Emotion Recognition (SER), particularly within the realm of Tunisian Dialect (TD)...

Emotion Recognition from Spontaneous Tunisian Dialect Speech

by Latifa Iben Nasr

2025, Emotion Recognition from Spontaneous Tunisian Dialect Speech

Emotional expressions are a fundamental aspect of human communication, with speech being one of the most natural modes of interaction. Speech Emotion Recognition (SER) is a significant research topic in Natural Language Processing (NLP),...

Tunisian Dialect Speech Corpus: Construction and Emotion Annotation

by Latifa Iben Nasr

2025, Tunisian Dialect Speech Corpus: Construction and Emotion Annotation

Speech Emotion Recognition (SER) using Natural Language Processing (NLP) for underrepresented dialects faces significant challenges due to the lack of annotated corpora. This research addresses this issue by constructing and annotating...

Cogent Arts & Humanities

by RACHID ED-DALI

2025, Cogent Arts & Humanities

Artificial intelligence (AI) tools such as DeepSeek R1 and ChatGPT 4.5 have emerged as promising aids in Arabic-English literary translation. This study aims to compare the translation performance of these two systems using a...

ARDIAL-BERT: Advancing Multidialectal Arabic Named Entity Recognition Through Continual Pretraining

by alimi tahar

2025, IEEE Transactions on Artificial Intelligence

Named Entity Recognition (NER) is among the main tasks of Natural Language Processing (NLP). NER is a critical and fundamental component of several NLP applications, including Information Retrieval (IR), Question Answering (QA) and Machine Translation (MT). While several NER models have emerged for formal languages such as English and Modern Standard Arabic (MSA), NER for Arabic dialects remains in its infancy. We noted that most recent research has focused on a single Arabic dialect, hence the absence of a fully multidialectal NER model. In this paper, we present ARDIAL-BERT, the first multidialectal NER model, built through continual pretraining followed by fine-tuning on publicly available Arabic dialect datasets grouped by region (Levantine, Maghrebi, Egyptian and Gulf). The model builds on the most recent version of the BERT transformers, chosen after several experiments with various BERT NER models. Our contribution has two parts. First, we built ARDIAL-NER, an Arabic multidialect dataset extracted from existing NER datasets. ARDIAL-NER was manually annotated and contains a total of 53,539 entities across 369,372 tokens in 21,683 sentences. Second, we conducted continual pretraining using additional unannotated data, and then fine-tuned the model on new annotated NER datasets. The continual learning system can be applied at different levels: updating model parameters, incorporating new language data, and training on new labels. Our results demonstrate its effectiveness to varying degrees, with the approach achieving superior results compared with both the baselines and previous models. This demonstrates the value of grouping Arabic dialects by region and of selecting data that matches the baseline transformers well.
Impact Statement: ARDIAL-BERT represents a significant breakthrough in Arabic multidialectal Named Entity Recognition (NER), providing the first comprehensive model capable of accurately recognizing entities across major Arabic dialects, including Levantine, Maghrebi, Egyptian, and Gulf dialects. By leveraging a novel approach that combines continual pretraining with fine-tuning on regionally grouped datasets, this work addresses a critical gap in Natural Language Processing (NLP), enabling advancements in applications such as information retrieval, machine translation, and question answering for Arabic dialects.
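
Dataset statistics like those reported for ARDIAL-NER (entities, tokens, sentences) are typically computed from a token-per-line file. The sketch below assumes a CoNLL-style layout (token and tag separated by whitespace, a blank line between sentences) and BIO tags; the paper's actual file format is not stated here, and `corpus_stats` is a hypothetical helper. An `I-` tag that does not continue an entity of the same type is counted as the start of a new entity.

```python
from collections import Counter
from typing import Dict


def corpus_stats(conll_text: str) -> Dict[str, object]:
    """Count sentences, tokens and BIO entities in CoNLL-style text."""
    sentences = tokens = 0
    entity_types: Counter = Counter()
    prev_tag = "O"
    in_sentence = False
    for raw in conll_text.splitlines():
        line = raw.strip()
        if not line:  # blank line closes the current sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            prev_tag = "O"
            continue
        tag = line.split()[-1]  # last column holds the NER tag
        tokens += 1
        in_sentence = True
        starts_entity = tag.startswith("B-") or (
            tag.startswith("I-") and prev_tag[2:] != tag[2:]
        )
        if starts_entity:
            entity_types[tag[2:]] += 1
        prev_tag = tag
    if in_sentence:  # file may not end with a blank line
        sentences += 1
    return {
        "sentences": sentences,
        "tokens": tokens,
        "entities": sum(entity_types.values()),
        "by_type": dict(entity_types),
    }


SAMPLE = "\n".join([
    "محمد B-PER", "صلاح I-PER", "لعب O", "في O", "ليفربول B-ORG",
    "",
    "القاهرة B-LOC", "جميلة O",
])
```

On this two-sentence sample the function reports 7 tokens and 3 entities (one each of PER, ORG and LOC); running it per regional subset would give the per-region breakdown used when grouping dialects.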

The Correlation between Irony and Satire in Orwell’s Animal Farm: A Qualitative Lexical Analysis

by Mohamed Saber

2025, Journal of the Faculty of Arts- B.S.U

Irony and satire in Orwell's Animal Farm are lexically investigated in the current paper, in order to find out the correlation between both concepts. The researcher adopts a qualitative method of analysis, focusing on chapter 10. The...

Mono- and cross-lingual paraphrased text reuse and extrinsic plagiarism detection

by Muhammad Sharjeel

2025

UPPC - Urdu Paraphrase Plagiarism Corpus

by Muhammad Sharjeel

2025

Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions is...

Natural Tunisian Speech Preprocessing for Features Extraction

by Latifa Iben Nasr

2025, 2023 IEEE/ACIS 23rd International Conference on Computer and Information Science (ICIS)

In this paper, we describe the process of building a corpus for Tunisian Speech Emotion Recognition (SER). To the best of our knowledge, it is the first work in the SER field that uses spontaneous speech emotion in Tunisian dialect...

200 Questions About Transfer Learning and Transformers

by Saman Siadati

2025

The field of artificial intelligence (AI) is evolving at an unprecedented pace, with transfer learning and transformer-based models now forming the backbone of many state-of-the-art systems. This book, 200 Questions About Transfer...

The Question Answering Systems : A Survey

by Mohamed Haggag

2025

Question Answering (QA) is a specialized area in the field of Information Retrieval (IR). The QA systems are concerned with providing relevant answers in response to questions proposed in natural language. QA is therefore composed of...

Lexis and Syntax of Medicine Product Warnings in the Philippines

by Shielanie Dacumos

2025, International Journal on Natural Language Computing

In the Philippines, parents refused to have their children receive anti-measles and anti-dengue vaccines, which led to a disease outbreak. This might not have happened if product warnings had been given and explained to the parents. Indeed, product...

Developing Products Update-Alert System for E-Commerce Websites Users using Html Data and Web Scraping Technique

by Ebele Onyedinma

2025, International Journal on Natural Language Computing

Websites are regarded as domains of limitless information which anyone and everyone can access. The new trend of technology has shaped the way we do and manage our businesses. Today, advancements in Internet technology have given rise to...

Fake News Detection Using NLP and Logistic Regression

by Likitha T

2025, International Journal of Modern Education and Computer Science (IJMECS)

The widespread dissemination of fake news across digital platforms has emerged as a critical issue, undermining public trust and influencing societal discourse. This paper presents a lightweight yet effective fake news detection system...

Characterizing Asymmetries in the TenTen Corpus Family Membership: An Implicit Hierarchy in Multilingual Digital Tools

by David Bordonaba Plou

2025, Digital Studies/Le champ numérique

In this work, we examine the limitations of digital tools in facilitating cross-linguistic and cross-cultural research from a humanistic perspective. Our primary objective is to draw comparisons between the TenTen corpora, assessing their...

Bridging the Divide: How AI and Linguistics Inform and Challenge Each Other

by Htein Win

2025, AI

This paper explores the interplay between artificial intelligence (AI) in natural language processing (NLP) and linguistics, offering NLP engineers actionable methodologies (e.g., syntactic probes, evaluation metrics) and linguists...

Detecting Fake News Using Hybrid Machine Learning Models

by IJIRCST I

2025, International Journal of Innovative Research in Computer Science and Technology (IJIRCST)

The increasing diffusion of misinformation in online media has raised alarm as a significant threat to information credibility and societal trust. The ease of disseminating false information across social media platforms, news websites,...

Linguistic feature based learning model for fake news detection and classification

by Dr. Anshika Choudhary

2025, Elsevier

Social media is used as a dominant source of news distribution among users. Major decisions, such as political ones, are promoted on social media to sway users' decisions in their favor. However, the...

On the Relevance of Query Expansion Using Parallel Corpora and Word Embeddings to Boost Text Document Retrieval Precision

by Ismaïl Biskri

2025, International Journal on Natural Language Computing

In this paper we implement a document retrieval system using the Lucene tool and we conduct some experiments in order to compare the efficiency of two different weighting schemes: the well-known TF-IDF and BM25. Then, we expand queries...
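
For reference, the two weighting schemes being compared can be written in a few lines. This sketch makes assumptions that the paper does not state: one common TF-IDF variant (raw term frequency times log(N/df)) and the Okapi BM25 formula with the widely used defaults k1 = 1.2 and b = 0.75 and the non-negative IDF log(1 + (N - df + 0.5)/(df + 0.5)). Lucene's own similarity classes differ in details such as normalization, so scores will not match Lucene exactly.

```python
import math
from collections import Counter
from typing import List, Sequence

Doc = Sequence[str]


def tfidf_scores(query: Doc, docs: Sequence[Doc]) -> List[float]:
    """Sum of tf * log(N / df) over query terms present in each document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(sum(tf[t] * math.log(n / df[t]) for t in query if tf[t]))
    return scores


def bm25_scores(query: Doc, docs: Sequence[Doc],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 with term-frequency saturation and length normalization."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # k1 caps the effect of repeated terms; b scales by document length
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


DOCS = [["arabic", "nlp", "arabic"], ["english", "nlp"],
        ["arabic", "dialects", "ner", "corpus"]]
```

With the query `["arabic"]`, both schemes rank the first document highest, but BM25 dampens the gain from the second occurrence of the term and penalizes the longer third document, which is the behavioral difference such an experiment measures.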

A comprehensive review of advances in transformer, GAN, and attention mechanisms: Their role in multimodal learning and applications across NLP

by MD ARIFUR RAHMAN

2025, International Journal of Science and Research Archive

The emergence and subsequent development of deep learning, specifically transformer-based architectures, Generative Adversarial Networks (GANs), and attention mechanisms, have had revolutionary implications on Natural Language Processing...

Development of Smart Voice Agent with Case Study (Libyan Voice Assistant)

by mohamed arteimi

2025, Academy journal for basic and applied sciences

The paper presents the creation of an end-to-end voice assistant system designed for a lesser-resourced dialect of Arabic, Libyan Tripolitanian, which lacks native support in commercial ASR and NLP applications. To remediate this...

Weakly Supervised Deep Learning for Arabic Tweet Sentiment Analysis on Education Reforms: Leveraging Pre-Trained Models and LLMs With Snorkel

by Prof. Farrukh Nadeem

2025

This study introduces a novel approach to sentiment classification of Arabic tweets regarding educational reforms in Saudi Arabia. The complexity of the Arabic language, with its numerous dialects, poses challenges for natural language...

Mapping Arabic Wikipedia into the Named Entities Taxonomy

by Mark Lee

2025

This paper describes a comprehensive set of experiments conducted in order to classify Arabic Wikipedia articles into predefined sets of Named Entity classes. We tackle the task using four different classifiers, namely: Naïve Bayes, Multinomial...
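
Of the classifiers listed, multinomial Naive Bayes is the simplest to sketch. The class below is a hypothetical minimal implementation with add-one (Laplace) smoothing, not the paper's setup, and the toy training data (a few Arabic cue words for person and location articles, such as ولد "was born" and مدينة "city") is invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha smoothing."""

    def fit(self, docs: Sequence[Sequence[str]], labels: Sequence[str],
            alpha: float = 1.0) -> "MultinomialNB":
        self.alpha = alpha
        self.priors = Counter(labels)  # number of documents per class
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)  # token counts per class
        self.vocab = {tok for doc in docs for tok in doc}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, doc: Sequence[str]) -> Optional[str]:
        n_docs = sum(self.priors.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c, n_c in self.priors.items():
            lp = math.log(n_c / n_docs)  # log prior
            for tok in doc:
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                lp += math.log((self.counts[c][tok] + self.alpha) /
                               (self.totals[c] + self.alpha * v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best


# Invented toy data: cue words from person (PER) and location (LOC) articles
TRAIN_DOCS = [["ولد", "شاعر", "توفي"], ["شاعر", "ولد"],
              ["مدينة", "تقع", "عاصمة"], ["مدينة", "تقع"]]
TRAIN_LABELS = ["PER", "PER", "LOC", "LOC"]
model = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS)
```

Working in log space avoids underflow when an article contributes hundreds of tokens, which is the usual reason this formulation is preferred over multiplying raw probabilities.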

SED-UA-SMALL: Ukrainian synthetic dataset for text embedding models

by Dmytro I. Martjanov

2025, International Systems and Networks

This paper presents the Small Synthetic Embedding Dataset, a fully synthetic dataset in Ukrainian designed for training, fine-tuning, and evaluating text embedding models. The use of large language models (LLMs) allows for controlling the...

International Journal of Research Publication and Reviews

by Doris Chinedu Asogwa

2025, International Journal of Research Publication and Reviews

Automatic Text Classification is a machine learning task that automatically assigns a given text document to a set of pre-defined categories based on the features extracted from its textual content. Most online communication forums,...

Projet de base de données textuelles pour l'Institut de la Langue Française [A Textual Database Project for the Institut de la Langue Française]

by Etienne Brunet

2025, HAL (Le Centre pour la Communication Scientifique Directe)

Beyond the Determiner "Al-": Expanding the Determiner Class in Arabic, and Elimination of Lexical Ambiguities by Grammars

by Alexis Neme et al.

2025, IEEE Access (ISSN: 2169-3536)

Arabic nouns can be marked for definiteness or indefiniteness. The definite article is the prefix "Al-", which confines the determiner class to a single element, "Al-". This topic is generally discussed under noun inflections, such as...
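
The ambiguity at stake is easy to reproduce: a word-initial ال is not always the article. For example, التزام (iltizam, "commitment") begins with ال but contains no article, while الهام can be read as ال + هام ("the important") or as إلهام ("inspiration") written without its hamza. The sketch below covers only "Al-", not the expanded determiner class the paper proposes; it assumes a hypothetical lexicon of undiacritized, hamza-normalized forms and returns every reading the lexicon supports instead of blindly stripping the prefix. Choosing among the readings would be the job of grammars such as those the paper describes; `article_readings` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

ARTICLE = "ال"
Reading = Tuple[str, str]  # (determiner, stem); "" means no article


def article_readings(word: str, lexicon: Iterable[str]) -> List[Reading]:
    """List every analysis of `word` with or without the article Al-."""
    lex = set(lexicon)
    readings: List[Reading] = []
    if word in lex:
        readings.append(("", word))  # the initial ال belongs to the stem
    stem = word[len(ARTICLE):]
    if word.startswith(ARTICLE) and stem in lex:
        readings.append((ARTICLE, stem))  # definite article + stem
    return readings


# Hypothetical lexicon of undiacritized, hamza-normalized forms;
# "الهام" stands for إلهام ("inspiration") with the hamza dropped.
LEXICON = {"كتاب", "التزام", "هام", "الهام"}
```

A naive prefix stripper would turn التزام into the non-word تزام; checking both readings against the lexicon keeps the correct analysis and surfaces the genuinely ambiguous الهام with two readings.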

Multi-domain Urdu fake news detection using pre-trained ensemble model

by Sheetal Harris

2025, Multi-domain Urdu fake news detection using pre-trained ensemble model

Fake News (FN) dissemination on websites and online platforms influences human behaviours, sociopolitical domains, and the sovereignty of a country. The outpour of biased news and propaganda on online portals can be addressed by...

Combining interlingua with SMT

by Stephanie Seneff

2025, Conference of the Association for Machine Translation in the Americas

Large Language Models and Microlects Express a Zeitgeist

by Ellis D Cooper

2025, American Journal of Computer Science and Technology

This article gives mathematical pseudocode for large language model training based on a dataset from a corpus, inference, and chat with possibly lengthy human prompts and generated replies. It introduces the concepts of "microlect" and...

Ge'ez Grammar Error Handling Using Neural Machine Translation Approach

by Eshete Derb

2025

The goal of natural language processing (NLP), which has recently gained popularity, is to improve the capacity of computers to comprehend and interact with human language. Consequently, to converse using natural language, it is crucial...

AMWAL: Named Entity Recognition for Arabic Financial News

by Muhammad S. Abdo

2025

Financial Named Entity Recognition (NER) presents a pivotal task in extracting structured information from unstructured financial data, especially when extending its application to languages beyond English. In this paper, we present...

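منزلة توثيق الحافظ العجلي 261هـ بين أئمة الجرح والتعديل من خلال كتاب تمييز الرجال [The Standing of al-Hafiz al-'Ijli's (261 AH) Authentication of Narrators among the Imams of al-Jarh wa al-Ta'dil, as Seen in the Book Tamyiz al-Rijal]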

by عبدالله نوفل

2025

Question answering systems: the story till the Arabic linked data

by N. Doumi

2025, International Journal of Artificial Intelligence and Soft Computing

A question answering system (QAS) is essential to satisfy the need to query information available in various formats, including structured data (ontologies, databases) or unstructured data (documents, the web). The QAS provides a correct response...
