UQeResearch: Semantic Textual Similarity Quantification

2015, Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

https://doi.org/10.18653/V1/S15-2022

Abstract

This paper presents an approach for estimating the Semantic Textual Similarity (STS) of full English sentences, as specified in Shared Task 2 of SemEval-2015. The semantic similarity of sentence pairs is quantified from three perspectives: structural, syntactic, and semantic. The numerical values of the resulting similarity measures are then used to train a regression ensemble. Although none of the three sets of measures can represent the semantic similarity of two sentences on its own, our experimental results show that combining these features assesses the semantic similarity of the sentences accurately. In the English subtask, our system's best run ranked 35th out of 73 system runs, with an average Pearson correlation of 0.7189 over five test sets, 0.08 correlation points below the best submitted run.
