Motaz Saad

Detect Arabic Fake News Through Deep Learning Models and Transformers

Social Science Research Network, 2023

New Results - Machine translation and language modeling

Arabic Text Classification: Text Preprocessing, Term Weighting, and Morphological Analysis

Text mining has drawn increasing attention recently and has been applied in domains including web mining and sentiment analysis. Text preprocessing is an important stage in text mining. The main problems in text mining are structuring text data and its very high dimensionality. Natural language processing and morphological tools can be employed to reduce the dimensionality of text data. In addition, term weighting schemes can be used to enhance the representation of text as a feature vector. Research in the field of Arabic text mining is still fairly limited. This book presents and compares the impact of text preprocessing on Arabic text classification using popular text classification algorithms. Text preprocessing includes applying different term weighting schemes and Arabic morphological analysis (stemming and light stemming). Text classification algorithms are applied to 7 Arabic corpora. Results show that light stemming with term pruning is the best feature reduction technique; Support Vector Machines and Naive Bayes variations outperform other algorithms; and weighting schemes affect the performance of distance-based classifiers.
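
As a rough illustration of term weighting combined with term pruning, the sketch below computes TF-IDF weights over tokenized documents and drops terms that appear in too few documents. The abstract does not name the weighting schemes that were compared, so TF-IDF is an assumption here, and the function name `tfidf`, the `min_df` parameter, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs, min_df=1):
    """Return one {term: weight} dict per document (docs is a list of token lists)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Term pruning: keep only terms found in at least min_df documents.
    vocab = {term for term, count in df.items() if count >= min_df}
    vectors = []
    for doc in docs:
        tf = Counter(term for term in doc if term in vocab)
        # Weight = raw term frequency * log inverse document frequency.
        vectors.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return vectors

# Toy, already-tokenized documents (illustrative only).
docs = [
    ["news", "sport", "news"],
    ["news", "economy"],
    ["sport", "economy", "rare"],
]
vectors = tfidf(docs, min_df=2)
```

With `min_df=2`, the term `rare` occurs in only one document and is pruned, which shrinks the feature space before classification.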

AraBEM at WANLP 2022 Shared Task: Propaganda Detection in Arabic Tweets

Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Propaganda is information or ideas that an organised group or government spreads to influence people's opinions, especially by not giving all the facts or by secretly emphasising only one way of looking at the points. Automatically detecting propaganda-related language is a challenging task that researchers in the NLP community have recently started to address. This paper presents the participation of our team AraBEM in the propaganda detection shared task on Arabic tweets. Our system utilised a pre-trained BERT model to perform multi-class binary classification. It attained a best score of 0.602 micro-F1, ranking third on subtask 1, which identifies the propaganda techniques as a multilabel classification problem with a baseline of 0.079.
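
For readers unfamiliar with the metric quoted above, the following minimal sketch shows how micro-averaged F1 is commonly computed for multilabel predictions: true positives, false positives, and false negatives are pooled over all tweets and labels before the score is taken. It is not the shared task's official scorer, and the label names in the example are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of label sets."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        gold_labels, pred_labels = set(gold_labels), set(pred_labels)
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # labels predicted but not in gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical label sets for three tweets (illustrative only).
gold = [{"loaded_language", "name_calling"}, {"no_technique"}, {"smears"}]
pred = [{"loaded_language"}, {"no_technique"}, {"name_calling"}]
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=2, so 4/7
```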

Detect Arabic Fake News Through Deep Learning Models and Transformers

SSRN Electronic Journal

motazsaad/fit-bot-android: v2.0_Release

Full Changelog: https://github.com/motazsaad/fit-bot-android/compare...

motazsaad/Arabic-News: v1.0

Full Changelog: https://github.com/motazsaad/Arabic-News/commits/v1.0

Motazsaad/Osac-Corpus: V1.0

Full Changelog: https://github.com/motazsaad/osac-corpus/commits/v1.0

Motazsaad/Arabic-Light-Stemmer: V1.1

Full Changelog: https://github.com/motazsaad/arabic-light-stemmer/co...

Motazsaad/Arabic-Light-Stemming-Py: V1.0

Full Changelog: https://github.com/motazsaad/arabic-light-stemming-p...

motazsaad/arwikiExtracts: v1.0

Full Changelog: https://github.com/motazsaad/arwikiExtracts/commits/...

motazsaad/Arabic-Stories-Corpus: v1.0

Full Changelog: https://github.com/motazsaad/Arabic-Stories-Corpus/c...

Motazsaad/BBC-Crawler: V1.0

Full Changelog: https://github.com/motazsaad/bbc-crawler/commits/v1.0

Extracting Comparable Articles from Wikipedia and Measuring their Comparabilities

V International Conference on Corpus Linguistics (CILC2013)

Detecting and Counting People's Faces in Images Using Convolutional Neural Networks

2021 Palestinian International Conference on Information and Communication Technology (PICICT), 2021

Cross Language Concept Mining

Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data

Natural Language Engineering, 2018

The objective of this article is to address the comparability of documents that are extracted from different sources and written in different languages. These documents are not necessarily translations of each other. This material is referred to as multilingual comparable corpora. These language resources are useful for multilingual natural language processing applications, especially for low-resourced language pairs. In this paper, we collect data in Arabic, English, and French. Two corpora are built using available hyperlinks for Wikipedia and Euronews. Euronews is an aligned multilingual (Arabic, English, and French) corpus of 34k documents collected from the Euronews website. A more challenging issue is to build a comparable corpus from two different and independent media having two distinct editorial lines, such as the British Broadcasting Corporation (BBC) and Al Jazeera (JSC). To build such a corpus, we propose to use the Cross-Lingual Latent Semantic approach....

A Lexical Distance Study of Arabic Dialects

Procedia Computer Science, 2018

Diglossia is a very common phenomenon in Arabic-speaking communities, where the spoken language is different from both Classical Arabic (CA) and Modern Standard Arabic (MSA). The spoken language is characterised as a number of dialects used in everyday communication as well as informal writing. In this paper, we highlight the lexical relation between MSA and Dialectal Arabic (DA) in more than one Arabic region. We conduct a computational cross-dialectal lexical distance study to measure the similarities and differences between the dialects and MSA. We exploit several methods from Natural Language Processing (NLP) and Information Retrieval (IR), such as the Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Hellinger Distance (HD), and apply them to different Arabic dialectal corpora. We measure the overlap among all the dialects and compute the frequencies of the most frequent words in every dialect. The results are informative and indicate that Levantine dialects are very similar to each other and, furthermore, that Palestinian appears to be the closest to MSA.
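
To make one of the measures named above concrete, the sketch below computes the Hellinger distance between the unigram word distributions of two token lists. It illustrates the standard formula only and is not the paper's experimental pipeline; the helper names and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def word_distribution(tokens):
    """Relative frequency of each word in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in the range [0, 1]."""
    vocab = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(w, 0.0)) - math.sqrt(q.get(w, 0.0))) ** 2 for w in vocab
    )
    return math.sqrt(squared) / math.sqrt(2)

# Toy token lists standing in for two dialect corpora (illustrative only).
corpus_a = word_distribution(["w1", "w2", "w2", "w3"])
corpus_b = word_distribution(["w2", "w3", "w3", "w4"])
distance = hellinger(corpus_a, corpus_b)
```

A distance of 0 means the two corpora have identical word distributions, and a distance of 1 means they share no vocabulary at all.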
