
A new semantic relatedness measurement using WordNet features

2013, Knowledge and Information Systems

https://doi.org/10.1007/S10115-013-0672-4

Abstract

Computing semantic similarity/relatedness between concepts or words is an important problem in many research fields. Information-theoretic approaches exploit the notion of Information Content (IC), which gives a better understanding of a concept's semantics. In this paper, we present a comprehensive survey of IC metrics together with a critical study. We then propose a new intrinsic method for computing the IC of a particular concept from taxonomical features extracted from an ontology. This approach quantifies the subgraph formed by the concept's subsumers, using depth and descendant count as taxonomical parameters. In the second part of the paper, we integrate this IC metric into a new parameterized multi-strategy approach for measuring the semantic relatedness of words. This measure exploits WordNet features such as the noun "is a" taxonomy, the nominalization relation (which allows the verb "is a" taxonomy to be used as well), and the words shared (overlaps) between glosses. We evaluated our work and compared it with related work on a wide set of benchmarks designed for word semantic similarity/relatedness tasks. The results show that our IC method and our new relatedness measure correlate better with human judgments than related approaches.
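The abstract does not give the proposed IC formula, so the sketch below does not reproduce it. As a point of comparison only, it implements an earlier intrinsic IC, the one by Seco, Veale and Hayes (2004), which derives IC purely from a concept's hyponym (descendant) count, and plugs it into Resnik's similarity, the IC of the most informative common subsumer. The toy taxonomy and every function name here are hypothetical, and single inheritance is assumed for simplicity; the proposed method instead quantifies the whole subgraph of a concept's subsumers using depth as well as descendant counts.

```python
import math

# Hypothetical toy "is a" taxonomy (child -> parent). Single inheritance is
# assumed for simplicity, whereas WordNet nouns may have several hypernyms.
PARENT = {
    "organism": "entity", "artifact": "entity",
    "animal": "organism", "plant": "organism",
    "dog": "animal", "cat": "animal", "tree": "plant",
    "vehicle": "artifact", "car": "vehicle", "bicycle": "vehicle",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the concept followed by all of its subsumers, up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def count_hyponyms():
    """Count the descendants (direct and indirect hyponyms) of each concept."""
    counts = {c: 0 for c in CONCEPTS}
    for c in CONCEPTS:
        for subsumer in ancestors(c)[1:]:
            counts[subsumer] += 1
    return counts


HYPONYMS = count_hyponyms()


def intrinsic_ic(concept):
    """Seco et al. (2004): IC(c) = 1 - log(hypo(c) + 1) / log(max_nodes).

    The root gets IC 0 and every leaf gets IC 1.
    """
    return 1.0 - math.log(HYPONYMS[concept] + 1) / math.log(len(CONCEPTS))


def resnik(a, b):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(intrinsic_ic(c) for c in common)


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "tree"), ("dog", "car")]:
        print(pair, round(resnik(*pair), 3))
```

With this toy taxonomy, the score for dog/cat (sharing the specific subsumer "animal") exceeds dog/tree (sharing only "organism"), and dog/car scores 0 because their only common subsumer is the root. An intrinsic IC needs no annotated corpus, which is the motivation the abstract gives for computing IC from taxonomical features alone.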
