NGram Approach for Semantic Similarity on Arabic Short Text
2022, International Journal of Advanced Computer Science and Applications
https://doi.org/10.14569/IJACSA.2022.0131199
Abstract
Measuring the semantic similarity between words requires a method that can simulate human thought. The use of computers to quantify and compare semantic similarities has become an important research area in various fields, including artificial intelligence, knowledge management, information retrieval, and natural language processing. Computational semantics requires efficient measures of concept similarity, and such measures still need to be developed. Several computational measures quantify semantic similarity based on knowledge resources such as the WordNet taxonomy, and several measures based on taxonomical parameters have been applied to optimize the expression for content semantics. This paper presents a new similarity measure for quantifying the semantic similarity between concepts, words, sentences, short texts, and long texts, based on NGram features and on synonyms of those NGrams drawn from the same domain. The proposed algorithm was tested on 700 tweets, and its semantic similarity values were compared with cosine similarity values computed on the same dataset. A domain expert analyzed the results manually and concluded that, within the selected domain, the values produced by the proposed algorithm reflected the semantic similarity between the dataset's short texts better than the cosine similarity values did.
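The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: compare two short texts by the overlap of their word NGrams after mapping each NGram to a synonym-group representative, and compute a bag-of-words cosine similarity as the baseline. The Jaccard overlap, the function names, the `synonyms` dictionary, and the English toy sentences are all assumptions made for illustration; they are not the paper's algorithm, lexicon, or data.

```python
from collections import Counter
from math import sqrt


def word_ngrams(tokens, n_max=2):
    """Return the set of word n-grams (as tuples) for n = 1..n_max."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def canonical(ngram, synonyms):
    """Map an n-gram to its synonym-group representative, if one is listed."""
    return synonyms.get(ngram, ngram)


def ngram_synonym_similarity(text_a, text_b, synonyms, n_max=2):
    """Jaccard overlap of canonicalized n-gram sets (an assumed scoring rule)."""
    set_a = {canonical(g, synonyms) for g in word_ngrams(text_a.split(), n_max)}
    set_b = {canonical(g, synonyms) for g in word_ngrams(text_b.split(), n_max)}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (the baseline)."""
    vec_a, vec_b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = sqrt(sum(c * c for c in vec_a.values()))
    norm_b = sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical domain synonym table: n-gram -> representative n-gram.
synonyms = {("automobile",): ("car",)}

text_a = "the car is fast"
text_b = "the automobile is fast"

print(ngram_synonym_similarity(text_a, text_b, {}))        # no synonym knowledge
print(ngram_synonym_similarity(text_a, text_b, synonyms))  # with synonym table
print(cosine_similarity(text_a, text_b))                   # lexical baseline
```

In this toy case the synonym table raises the NGram score because "automobile" and "car" are treated as the same unit, while the cosine baseline sees only the surface tokens. The scores of the two measures are on different scales, so the sketch says nothing about which one ranks text pairs better; that judgment came from the domain expert's manual analysis of the 700 tweets.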