Contribution to Semantic Analysis of Arabic Language

2012, Advances in Artificial Intelligence

https://doi.org/10.1155/2012/620461

Abstract

We propose a new approach for determining the appropriate sense of ambiguous Arabic words. The approach relies on an algorithm, based on information retrieval measures, that identifies the context of use closest to the sentence containing the word to be disambiguated. A context of use is a set of sentences that indicates a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous word, the exact string-matching algorithm, and the corpus. We combine measures from information retrieval (Harman, Croft, and Okapi) with the Lesk algorithm to select the correct sense from among those proposed.
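The core idea of the abstract, choosing the sense whose contexts of use are closest to the input sentence, can be illustrated with a minimal Lesk-style sketch. This is not the authors' implementation: it uses a plain token-overlap count instead of the Harman, Croft, or Okapi measures, and the function names (`overlap`, `pick_sense`) and the transliterated toy data are assumptions made for illustration only.

```python
from collections import Counter


def overlap(sentence_tokens, context_tokens):
    """Count tokens shared by two token lists (multiset intersection)."""
    a, b = Counter(sentence_tokens), Counter(context_tokens)
    return sum((a & b).values())


def pick_sense(sentence_tokens, contexts_by_sense):
    """Return (best_sense, scores): each sense is scored by its best-matching context."""
    scores = {
        sense: max((overlap(sentence_tokens, ctx) for ctx in contexts), default=0)
        for sense, contexts in contexts_by_sense.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: the transliterated word "ayn" with two candidate senses.
contexts_by_sense = {
    "eye": [
        ["doctor", "examined", "ayn", "patient"],
        ["ayn", "sees", "light"],
    ],
    "spring": [
        ["water", "flows", "ayn", "mountain"],
        ["village", "drinks", "ayn", "water"],
    ],
}
sentence = ["cold", "water", "ayn", "mountain"]

best, scores = pick_sense(sentence, contexts_by_sense)
print(best, scores)
```

In a fuller pipeline the overlap count would be replaced by a weighted similarity measure, and the tokens would be stemmed and filtered for stop words before comparison.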

FAQs

What is the effectiveness of the proposed method for Arabic word sense disambiguation?

The proposed method achieved a disambiguation rate of up to 76% using a modified Lesk algorithm tested on fifty ambiguous Arabic words in various contexts.

How does the chosen context window size influence disambiguation accuracy?

In the reported experiments, a context window of three words yielded the highest precision and recall.
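As a rough illustration of what a context window means here, the sketch below collects the words surrounding a target word. It assumes the window size counts words on each side of the target, which the answer above does not specify; the function name `context_window` and the placeholder tokens are hypothetical.

```python
def context_window(tokens, index, size=3):
    """Return up to `size` tokens on each side of tokens[index], excluding the target."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


# Placeholder tokens standing in for a tokenized sentence.
tokens = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9"]

middle = context_window(tokens, 4)   # target is "w5"
near_start = context_window(tokens, 1)  # target is "w2"; the left side is truncated
print(middle)
print(near_start)
```

The window is simply truncated at sentence boundaries rather than padded, which keeps the returned list free of artificial tokens.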

What are the limitations observed in the current Arabic WSD approach?

Key limitations include the difficulty of sentence segmentation and the inherent ambiguity of Arabic, which make selecting test samples complex and subjective.

Which algorithms are integrated into the proposed Arabic WSD methodology?

The methodology combines stemming, stop word elimination, approximate string matching, and similarity measures including Harman, Croft, and Okapi.
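To make the role of a similarity measure concrete, here is a minimal sketch of an Okapi BM25-style score between a sentence (treated as the query) and candidate contexts of use (treated as documents). It is a textbook formulation, not necessarily the exact variant used in the paper: the parameters `k1` and `b`, the `+ 1` inside the logarithm (a common way to keep the IDF non-negative), and the toy data are all assumptions. The Harman and Croft measures are not shown.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Score `doc` against `query` using BM25 statistics computed over `docs`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        # Smoothed IDF; the "+ 1.0" keeps the value non-negative (assumption).
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


# Hypothetical contexts of use, already stemmed and stripped of stop words.
contexts = [
    ["water", "flows", "spring", "mountain"],
    ["doctor", "examined", "eye", "patient"],
    ["village", "drinks", "spring", "water"],
]
query = ["cold", "water", "mountain"]

ranked = sorted(contexts, key=lambda d: bm25_score(query, d, contexts), reverse=True)
print(ranked[0])
```

Rarer terms contribute more to the score, so a context sharing an uncommon word with the sentence outranks one sharing only a frequent word.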

What algorithm provides the highest accuracy for stemming Arabic words?

The Al-Shalabi-Kanaan algorithm achieves approximately 90% accuracy in extracting three-letter roots from Arabic words, a key step for semantic analysis.

References (26)

  1. Zouaghi A., Zrigui M., Antoniadis G.: Automatic Understanding of Spontaneous Arabic Speech - A Numerical Model, TAL 49(1): 141-166, 2008.
  2. Belgacem, M., Zouaghi, A., Zrigui, M., and Antoniadis, G.: Amelioration of the performance of a semantic analyzer for the comprehension of the spontaneous Arabic speech, AIPR, USA, 2009.
  3. Al-Shalabi R., Kanaan G., and Al-Serhan H.: New approach for extracting Arabic roots. ACIT, Egypt, 2003.
  4. Cormen T.H., Leiserson C.E., Rivest R.L.: Introduction to algorithms, Chapter 34, pp 853-885, MIT Press, 1990.
  5. Harman, D.: An experimental study of factors important in document ranking, The ACM Conference on Research and Development in Information Retrieval, 1986.
  6. Croft, W.: Experiments with representation in a document retrieval system, Research and development, 1983.
  7. Robertson, S., Walker, M., and Gatford, M.: Okapi at TREC-3, TREC-3, NIST special publication, 1994.
  8. Lesk M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, 5th annual international conference on Systems documentation, pp. 24-26, 1986.
  9. Banerjee, S., Pedersen, T.: Adapting the Lesk Algorithm for Word Sense Disambiguation to WordNet, Submitted in partial fulfillment of the requirements for the degree of Master of Science. 2002.
  10. Deerwester S., Dumais S.T., Furnas G.W., Landauer T.K., Harshman R.: Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, 41, pp. 391-407, 1990.
  11. Pedersen T. and Bruce R.: Distinguishing Word Senses in Untagged Text, Second Conference on Empirical Methods in Natural Language Processing, 1997.
  12. Agirre E. and Edmonds P.: Word Sense Disambiguation: Algorithms and Applications, Springer, 2006.
  13. Diab M. and Resnik P.: An unsupervised method for word sense tagging using parallel corpora, ACL, pp. 255-262, Philadelphia, 2002.
  14. Elmougy S., Taher H. and Noaman H.: Naïve Bayes Classifier for Arabic Word Sense Disambiguation. In proceeding of the NLP, 2008.
  15. Zouaghi A., Merhbene L., Zrigui M.: Word Sense disambiguation for Arabic language using the variants of the Lesk algorithm, ICAI'11, USA, 2011.
  16. Merhbene L., Zouaghi A., Zrigui M.: Ambiguous Arabic Words Disambiguation, SNPD 2010, 157-164, UK, 2010.
  17. Merhbene L., Zouaghi A., Zrigui M.: Arabic Word Sense Disambiguation. ICAART (1) 2010: 652-655, 2010.
  18. Elloumi M.: Comparison of Strings Belonging to the Same Family, Information Sciences, Vol. 111, Elsevier Publishing Co., Amsterdam, pp. 49-63, 1998.
  19. Pedersen T. and Bruce R.: Distinguishing Word Senses in Untagged Text, Second Conference on Empirical Methods in Natural Language Processing, 1997.
  20. Sidorov G., Gelbukh A.: Word Sense Disambiguation in a Spanish Explanatory Dictionary, TALN'01, pp. 398-402, Tours, 2001.
  21. Vasilescu F., Langlais P., Lapalme J.: Evaluating Variants of the Lesk Approach for Disambiguating Words, LREC, Portugal, 2004.
  22. Vasilescu F.: Monolingual corpus disambiguation by the approaches of Lesk, University of Montreal, Faculty of Arts and Sciences; Paper presented at the Faculty of Graduate Studies to obtain the rank of Master of Science (MSc) in computer science, 2003.
  23. Al-Sulaiti L., Atwell E.: The design of a corpus of contemporary Arabic, International Journal of Corpus Linguistics, vol. 11, pp. 135-171, 2006.
  24. Yarowsky D.: One sense per collocation, ARPA Workshop on Human Language Technology, Princeton, pp. 266-7, 1993.
  25. Zouaghi A., Merhbene L., Zrigui M.: A hybrid approach for Arabic word sense disambiguation, IJCPOL, 2012 (to appear).
  26. Zouaghi A., Merhbene L., Zrigui M.: Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation, Artificial Intelligence Review, DOI: 10.1007/s10462-011-9249-3, 2011.