Amman Arab University for Graduate Studies
Management Information System
Much attention has been paid to the relative effectiveness of Interactive Query Expansion (IQE) versus Automatic Query Expansion (AQE). This research shows that the automatic query expansion (collection-dependent) strategy performs better than no query expansion: AQE improves 57% of the queries, with an average precision of 43.2. IQE, in turn, gives better average precision than the collection-dependent AQE strategy: the best IQE decision improves 86% of the queries, with an average precision of 44.1. The evaluation also reveals that the value of n in the AQE strategy that gives the optimal average precision over the whole query set is one.
- by Basel Bani-Ismail and +2
- Query Expansion, Average Precision
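Both results above rest on two measures: average precision computed per query, and the share of queries whose score rises after expansion. The following is a minimal Python sketch of both, assuming binary relevance judgments. The function names and toy data are hypothetical, and the abstract does not say exactly how its averages were aggregated.

```python
def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision for one query (binary relevance)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def percent_improved(baseline_ap, expanded_ap):
    """Percentage of queries whose AP is strictly higher after expansion."""
    improved = sum(1 for b, e in zip(baseline_ap, expanded_ap) if e > b)
    return 100.0 * improved / len(baseline_ap)


if __name__ == "__main__":
    # Relevant documents found at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
    print(percent_improved([0.20, 0.50, 0.40], [0.30, 0.50, 0.60]))
```

In this sketch a query counts as improved only when its AP strictly increases. Ties count as not improved, which is one possible convention.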
We describe the architecture of a question answering system and methodically evaluate the contribution of each system component to accuracy. The system differs from most question answering systems in relying on data redundancy rather than on complicated linguistic analysis of either the questions or the candidate answers. This matters because a wrong answer is often worse than no answer. Our Question Answering (QA) system takes natural language questions expressed in Arabic and attempts to provide short answers. To handle this problem, this work combines traditional information retrieval techniques with a sophisticated natural language processing approach. The answer is identified through keyword matching over simple structures extracted from both the question and the candidate documents selected by the IR system. For this step, we used an existing tagger to identify proper names and other crucial lexical items and to build lexical entries. We also provide an analysis of Arabic question forms and attempt to formulate the kinds of answers that users find more appropriate.
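The keyword-matching step can be sketched as ranking candidate passages by how many question keywords they share. This is a minimal Python illustration under stated assumptions: whitespace tokenization, a hypothetical stop-word list, and no tagging or morphological analysis. It therefore omits the proper-name tagger and the structure extraction that the abstract describes.

```python
# Hypothetical stop-word list; a real system would use a fuller Arabic list.
STOP_WORDS = {"ما", "من", "هو", "هي", "في", "متى", "أين"}


def keywords(text):
    """Whitespace tokens of the text minus stop words."""
    return {tok for tok in text.split() if tok not in STOP_WORDS}


def rank_passages(question, passages):
    """Return (score, index) pairs, best first; score = number of shared keywords."""
    q = keywords(question)
    scored = [(len(q & keywords(p)), i) for i, p in enumerate(passages)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    question = "ما عاصمة الأردن"  # "What is the capital of Jordan?"
    passages = [
        "عمان هي عاصمة الأردن",      # "Amman is the capital of Jordan"
        "يقع البحر الميت في الأردن",  # "The Dead Sea lies in Jordan"
        "القاهرة عاصمة مصر",         # "Cairo is the capital of Egypt"
    ]
    best_score, best_index = rank_passages(question, passages)[0]
    print(passages[best_index], best_score)
```

Ties are broken by passage order. A fuller system would instead prefer passages that the IR stage ranked higher or that contain a tagged proper name of the expected type.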
This paper proposes an improvement to Arabic information retrieval systems. The proposed system relies on stem-based query expansion, which adds different morphological variations of each index term used in the query. The method is applied to an Arabic corpus. The roots of the query terms are derived; then, for each derived root, all words in the corpus that descend from that root are collected into a distinct class. Afterward, each class is reformulated by co-occurrence ...
- by Riyad al-Shalabi and +1
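The class-building and expansion steps can be sketched as follows. This minimal Python sketch assumes that a root-extraction function is already available; the hypothetical `ROOT_OF` table stands in for it. The co-occurrence reformulation step, which the abstract cuts off, is not modeled.

```python
from collections import defaultdict

# Hypothetical stand-in for a real root extractor: word -> root.
ROOT_OF = {
    "كتاب": "كتب", "مكتبة": "كتب", "كاتب": "كتب",
    "مدرسة": "درس", "دروس": "درس",
}


def build_root_classes(corpus_words, root_of):
    """Group every corpus word under the root it derives from."""
    classes = defaultdict(set)
    for word in corpus_words:
        root = root_of.get(word)
        if root is not None:
            classes[root].add(word)
    return classes


def expand_query(query_terms, classes, root_of):
    """Add every corpus word that shares a root with a query term."""
    expanded = set(query_terms)
    for term in query_terms:
        root = root_of.get(term)
        if root is not None:
            expanded |= classes.get(root, set())
    return expanded


if __name__ == "__main__":
    corpus = ["كتاب", "مكتبة", "كاتب", "مدرسة", "دروس"]
    classes = build_root_classes(corpus, ROOT_OF)
    print(expand_query(["كتاب"], classes, ROOT_OF))
```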
… Information Retrieval, the results are not encouraging. Proper names are problematic for cross-language information retrieval (CLIR), so detecting and extracting proper nouns in Arabic is key to improving the effectiveness of such systems. The value of the information in a text is usually determined by the proper nouns of people, places, and organizations, and these must be detected before that information can be collected. Unlike proper nouns in English and many other languages, Arabic proper nouns do not start with a capital letter, so special treatment is required to find them in text. Little research has been conducted in this area, and most efforts have relied on a number of heuristic rules for finding proper nouns in text. In this research we use a new technique to retrieve proper nouns from Arabic text. It uses a set of keywords and particular rules that represent the words that might form a proper noun and the relationships between them.
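A keyword-and-rule approach of this general kind can be sketched as follows. Trigger keywords signal that a name follows, and a simple rule decides where the name ends. In this minimal Python illustration the trigger list, the stop-word list, and the single stopping rule are all hypothetical; it is not the paper's rule set.

```python
# Hypothetical trigger keywords mapped to the entity type they introduce.
TRIGGERS = {
    "الدكتور": "PERSON",      # "Dr."
    "مدينة": "LOCATION",      # "city (of)"
    "شركة": "ORGANIZATION",   # "company"
}
# Hypothetical words that end a candidate name.
STOP_WORDS = {"في", "من", "إلى", "أمس", "اليوم"}


def extract_proper_nouns(text, max_len=3):
    """Collect up to max_len tokens after each trigger keyword."""
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        label = TRIGGERS.get(tokens[i])
        if label is None:
            i += 1
            continue
        j = i + 1
        name = []
        # Rule: a name runs until another trigger, a stop word, or max_len tokens.
        while (j < len(tokens) and len(name) < max_len
               and tokens[j] not in TRIGGERS and tokens[j] not in STOP_WORDS):
            name.append(tokens[j])
            j += 1
        if name:
            found.append((" ".join(name), label))
        i = j
    return found


if __name__ == "__main__":
    # "Dr. Ahmad Khalid visited the city of Amman yesterday"
    print(extract_proper_nouns("زار الدكتور أحمد خالد مدينة عمان أمس"))
```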
The Holy Quran is the greatest miracle for Muslims everywhere and at all times; therefore, it is valid for every time and place. Research into the Holy Quran that aims to uncover new miracles within it is considered a kind of worship for Muslim researchers, since it facilitates the Islamic mission and clarifies the vague picture of Islam throughout the world. From this perspective, the researchers selected the obscure miracle of the number 19 in the Holy Quran as the subject of this study.
Many algorithms have been applied to the problem of text classification. Most of the work in this area has been carried out on English text, and very little on Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging. This paper presents an implementation of three automatic text classification techniques for Arabic text. A corpus of 1,445 Arabic text documents belonging to nine categories was automatically classified using the kNN, Rocchio, and naïve Bayes algorithms. The results reveal that naïve Bayes performed best, followed by kNN and then Rocchio.
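Since naïve Bayes performed best of the three, here is a minimal multinomial naïve Bayes sketch in Python, using add-one (Laplace) smoothing and log probabilities. It assumes documents are already tokenized and preprocessed. The abstract does not say which naïve Bayes variant or smoothing the paper used, and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns the counts the classifier needs."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab, len(docs)


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    class_docs, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    docs = [
        (["هدف", "مباراة"], "sport"), (["هدف", "فريق"], "sport"),
        (["سوق", "سعر"], "economy"), (["سعر", "بنك"], "economy"),
    ]
    model = train_nb(docs)
    print(predict_nb(model, ["هدف"]), predict_nb(model, ["سعر", "بنك"]))
```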
This study analyzes the performance of the Enhanced Associativity Based Routing (EABR) protocol using two measures: operation complexity (OC) and communication complexity (CC). OC is the number of steps required to perform a protocol operation, while CC is the number of messages exchanged in performing it. The values given are worst-case. EABR was analyzed in terms of CC and OC, and the results were compared with those of another routing technique, Associativity Based Routing (ABR). The results show that EABR can outperform ABR in many circumstances during route reconstruction.
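To make the two measures concrete, the sketch below counts them for a generic hop-limited flood, the kind of localized query a route-repair step might broadcast. This hypothetical Python illustration is not the EABR or ABR reconstruction procedure. It takes CC as the number of broadcast transmissions and OC as the number of forwarding rounds. It also assumes that every node receiving the query forwards it exactly once while hops remain, which is the worst case.

```python
def local_flood_cost(adj, source, ttl):
    """Worst-case cost of a hop-limited flood from source.

    Returns (oc, cc, reached): oc = forwarding rounds, cc = broadcast
    transmissions, reached = nodes that received the query.
    """
    reached = {source}
    frontier = [source]
    rounds = 0
    messages = 0
    hops_left = ttl
    while frontier and hops_left > 0:
        rounds += 1
        next_frontier = []
        for node in frontier:
            messages += 1  # each forwarding node broadcasts exactly once
            for neighbour in adj[node]:
                if neighbour not in reached:  # duplicate receptions are dropped
                    reached.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
        hops_left -= 1
    return rounds, messages, reached


if __name__ == "__main__":
    # Hypothetical four-node chain A - B - C - D.
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(local_flood_cost(chain, "A", ttl=2))
```

Comparing two protocols with such a counter means running each one's actual reconstruction steps on the same topologies. This sketch only shows how the two counts are accumulated.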
Building an effective stemmer for Arabic has long been an active research topic in IR. Arabic is a very rich language with complex morphology, and its structure is very different from, and more difficult than, that of other languages. Many linguistic and light stemmers have been developed for Arabic, but they still have many weaknesses and problems. In this paper we introduce a new light stemming technique, compare it with other stemmers in use, and show how it improves search effectiveness.
- by Riyad al-Shalabi and +1
- Technology
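Light stemming strips a small set of frequent prefixes and suffixes without trying to reach the root. A minimal Python sketch follows. The affix lists are loosely modeled on widely used Arabic light stemmers, and the three-letter minimum stem length is an assumption. This is not the new technique the paper introduces.

```python
# Affix lists loosely modeled on common Arabic light stemmers (assumption);
# longer affixes are tried first.
PREFIXES = sorted(["وال", "بال", "كال", "فال", "لل", "ال", "و"], key=len, reverse=True)
SUFFIXES = sorted(["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"], key=len, reverse=True)
MIN_STEM = 3  # hypothetical: never cut a word below three letters


def light_stem(word):
    """Strip at most one prefix, then suffixes while the stem stays long enough."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                word = word[: -len(s)]
                changed = True
                break
    return word


if __name__ == "__main__":
    for w in ["والمكتبات", "الكتاب", "معلمون"]:
        print(w, light_stem(w))
```

The minimum-length guard is the main design lever. A lower value strips more aggressively and conflates more words, which tends to raise recall at some cost in precision.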
Developing an efficient compression scheme for Arabic text is a difficult task. This paper applies dynamic Huffman coding, a variable-length bit coding scheme, to compressing Arabic text. Experiments were performed on both Arabic and English text, and the compression efficiency for the two languages was compared. The compression rate was also compared against the size of the file being compressed. As the file size increases, the compression ratio decreases for both Arabic and English text. The experimental results show that both the average message length and the compression efficiency are better for Arabic text than for English text. The results also show that the main factor affecting the compression ratio and the average message length is the frequency of the symbols in the text.
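Average code length and coding efficiency can both be computed from a Huffman code. The Python sketch below builds a *static* Huffman code with `heapq`, not the dynamic (adaptive) variant the paper uses, because static coding is far shorter to show. Here, average length is taken as bits per symbol and efficiency as zero-order entropy divided by that average. Both are assumptions about how the abstract's measures are defined.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(text):
    """Code length (bits) of each symbol in a static Huffman code for text."""
    freq = Counter(text)
    if len(freq) <= 1:
        return {s: 1 for s in freq}  # a lone symbol still needs one bit
    # Heap entries: (weight, tie-breaker, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compression_stats(text):
    """Return (average code length in bits/symbol, efficiency = entropy / average)."""
    if not text:
        raise ValueError("text must be non-empty")
    freq = Counter(text)
    n = len(text)
    lengths = huffman_code_lengths(text)
    avg_len = sum(freq[s] * lengths[s] for s in freq) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in freq.values())
    return avg_len, entropy / avg_len


if __name__ == "__main__":
    print(compression_stats("السلام عليكم ورحمة الله"))
    print(compression_stats("peace be upon you"))
```

Because both numbers depend only on the symbol frequency distribution, the sketch is consistent with the abstract's observation that symbol frequency is the dominant factor.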
Feature selection is necessary for effective text classification, and dataset preprocessing is essential for obtaining sound results and good performance. This paper investigates the effectiveness of feature selection by comparing the performance of different classifiers under feature selection with stemming and without stemming. The evaluation used a BBC Arabic dataset and several classification algorithms: decision tree (DT), k-nearest neighbors (KNN), naïve Bayes (NB), and multinomial naïve Bayes (NBM). The experimental results are presented in terms of precision, recall, F-measure, accuracy, and model-building time.
- by Riyad al-Shalabi and +1
- Text Classification
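The abstract does not name its feature-selection criterion. Purely as an illustration, the sketch below uses chi-square term scoring, a common choice for text classification. Each term is scored against each category from a 2×2 document-count table, and the k best-scoring terms are kept. The function names and toy data are hypothetical.

```python
def chi_square(term, category, docs):
    """Chi-square association between a term and a category (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, in category
        elif has_term:
            b += 1  # term present, other category
        elif in_cat:
            c += 1  # term absent, in category
        else:
            d += 1  # term absent, other category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Keep the k terms with the highest chi-square score over any category."""
    vocab = {t for tokens, _ in docs for t in tokens}
    categories = {label for _, label in docs}
    best = {t: max(chi_square(t, cat, docs) for cat in categories) for t in vocab}
    return sorted(best, key=lambda t: (-best[t], t))[:k]


if __name__ == "__main__":
    docs = [
        (["goal", "match"], "sport"), (["goal", "team"], "sport"),
        (["market", "price"], "economy"), (["price", "bank"], "economy"),
    ]
    print(select_features(docs, 2))
```

Chi-square also rewards strong *negative* association, so a term that never occurs in a category can still score highly for it. Taking the maximum over categories, as done here, is one of several common ways to combine the per-category scores.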
Word sense ambiguity is widespread in natural languages: a word may carry several distinct meanings. Humans can determine the appropriate meaning from the context in which the word occurs. Arabic is highly polysemous, so in many situations it is essential to disambiguate word senses. This paper studies and compares the performance of a search engine before and after expanding the query through interactive word sense disambiguation (WSD). We found that expanding polysemous query terms with more specific synonyms narrows the search to the specific targeted request and thus increases both precision and recall. On the other hand, expanding the query with a more general (polysemous) synonym broadens the search, which decreases precision.
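The interactive expansion step can be sketched as asking the user which sense of a polysemous term is meant and then adding that sense's synonyms to the query. In this minimal Python sketch the sense inventory is a hypothetical toy, and a callback simulates the user's choice. It shows the specific-synonym expansion that the abstract reports as helpful.

```python
# Hypothetical sense inventory: polysemous word -> list of (gloss, synonyms).
SENSES = {
    "عين": [
        ("eye", ["بصر"]),
        ("water spring", ["ينبوع"]),
        ("spy", ["جاسوس"]),
    ],
}


def expand_with_sense(query_terms, choose_sense):
    """Expand each polysemous term with synonyms of the sense the user picks.

    choose_sense(term, glosses) stands in for the interactive step and must
    return an index into glosses, or None to leave the term unexpanded.
    """
    expanded = []
    for term in query_terms:
        expanded.append(term)
        senses = SENSES.get(term)
        if not senses:
            continue
        idx = choose_sense(term, [gloss for gloss, _ in senses])
        if idx is not None:
            expanded.extend(senses[idx][1])
    return expanded


if __name__ == "__main__":
    # Simulate a user who means "water spring" in the query "عين ماء".
    pick_spring = lambda term, glosses: glosses.index("water spring")
    print(expand_with_sense(["عين", "ماء"], pick_spring))
```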
This paper presents a new algorithm for extracting the quadriliteral Arabic root (a four-consonant string) from its morphological derivatives. Our stemming algorithm starts by excluding prefixes and then checks the word from the last letter back to the first. A temporary vector stores the suffix letters as they are removed, and another vector stores the root. Particles and the definite article are removed before the suffix and the root are partitioned. The algorithm was tested on a sample of 145 words derived from quadriliteral Arabic verbs, achieving 95% accuracy in the initial results.
- by Riyad al-Shalabi and +1
- Engineering
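The steps above can be sketched as follows. This minimal Python illustration assumes small hypothetical lists of particles, verbal prefixes, and suffix letters, since the abstract does not give the paper's lists. It also handles only unvowelled words. Like the described algorithm, it removes the particle and the definite article first. It then scans from the last letter back, moving suffix letters into one vector and keeping the root in another.

```python
PARTICLES = ("و", "ف")                 # hypothetical: conjunction particles
ARTICLE = "ال"                         # definite article
VERB_PREFIXES = ("ي", "ت", "ن", "أ")   # hypothetical: imperfect-verb prefixes
SUFFIX_LETTERS = set("ةتاونيهكم")      # hypothetical: letters that may form suffixes


def extract_quad_root(word):
    """Return (root or None, removed suffix) for an unvowelled word."""
    w = word
    # 1. Remove a leading particle and the definite article first.
    if len(w) > 4 and w[0] in PARTICLES:
        w = w[1:]
    if len(w) > 4 and w.startswith(ARTICLE):
        w = w[len(ARTICLE):]
    # 2. Exclude one verbal prefix if the word is still longer than four letters.
    if len(w) > 4 and w[0] in VERB_PREFIXES:
        w = w[1:]
    # 3. Scan from the last letter back: suffix letters go to a temporary vector.
    suffix_vec = []
    root_vec = list(w)
    while len(root_vec) > 4 and root_vec[-1] in SUFFIX_LETTERS:
        suffix_vec.append(root_vec.pop())
    suffix = "".join(reversed(suffix_vec))  # restore reading order
    if len(root_vec) != 4:
        return None, suffix  # not reducible to four letters by these rules
    return "".join(root_vec), suffix


if __name__ == "__main__":
    for w in ["يدحرجون", "الدحرجة", "وتدحرجت"]:
        print(w, extract_quad_root(w))
```

Because a root letter can coincide with a suffix letter, this sketch stops stripping as soon as four letters remain. A real implementation would need pattern checks to resolve such cases.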
Part-of-speech tagging is the process of assigning grammatical part-of-speech tags to words based on their context. Many automated tagging systems have been developed for English, many other Western languages, and some Asian languages, achieving accuracy rates of 95% to 98%. A tagged corpus carries more useful information than an untagged one, so it can be used to extract grammatical and linguistic information. That information can then serve many applications, such as building dictionaries and grammars of a language from real language data. Tagged corpora are also useful for detailed quantitative analysis of text.
- by Riyad al-Shalabi and +1
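As a small example of extracting linguistic information from a tagged corpus, the sketch below builds a word-to-tag frequency lexicon. It then uses that lexicon as a most-frequent-tag baseline tagger. This is a hypothetical Python illustration with a toy corpus and tag names chosen for the example, and it is not taken from the paper.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_corpus):
    """Map each word to a Counter of the tags it receives in the corpus."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_corpus:
        lexicon[word][tag] += 1
    return lexicon


def most_frequent_tag(words, lexicon, default="NOUN"):
    """Baseline tagger: each word gets its most frequent tag; unknown words get default."""
    tagged = []
    for w in words:
        tags = lexicon.get(w)  # .get avoids inserting unknown words into the defaultdict
        tagged.append((w, tags.most_common(1)[0][0] if tags else default))
    return tagged


if __name__ == "__main__":
    corpus = [
        ("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "PREP"),
        ("المدرسة", "NOUN"), ("ذهب", "VERB"), ("ذهب", "NOUN"),  # ذهب: "went" / "gold"
    ]
    lexicon = build_lexicon(corpus)
    print(most_frequent_tag(["ذهب", "كتاب"], lexicon))
```

A lexicon like this is also the raw material for the quantitative analyses mentioned above, such as tag distributions or ambiguity rates per word.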