Academia.edu

Short-Text Semantic Similarity

7 papers
2 followers
About this topic
Short-Text Semantic Similarity is a subfield of natural language processing that focuses on measuring the degree of similarity in meaning between short text segments, such as sentences or phrases. It employs various computational techniques, including vector space models and deep learning, to quantify semantic relationships and enhance understanding of textual content.
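As a baseline for the vector space models mentioned above, here is a minimal Python sketch of bag-of-words cosine similarity. The naive regex tokenizer and raw term-frequency weighting are assumptions made for illustration. Practical systems usually add stemming, TF-IDF weighting, or learned embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/digits (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two short texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

For "the cat sat on the mat" and "a cat sat on a mat" this returns 0.5. For "a car" and "an automobile" it returns 0.0 even though the two phrases mean the same thing. That zero for paraphrases with no shared words is the gap the semantic methods below aim to close.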

Key research themes

1. What semantic knowledge sources and methodological frameworks can most effectively measure short-text semantic similarity?

This research theme focuses on classifying and evaluating semantic similarity measurement techniques according to the kind of knowledge they draw on: string-based, corpus-based, knowledge-based, and hybrid methods. Understanding these frameworks is critical to developing effective algorithms that capture semantic similarity beyond surface lexical matching. This is particularly challenging for short texts, which offer limited context and are highly ambiguous.

Key finding: This comprehensive review categorizes short-text similarity methods into string-based, corpus-based, knowledge-based, and hybrid techniques, identifying four semantic knowledge bases and eight corpus resources as external...
Key finding: The paper divides STS methods into topological/knowledge-based, statistical/corpus-based, and string-based categories, with special emphasis on WordNet taxonomy for topological methods. It contributes a novel hybrid approach...
Key finding: Proposes a corpus-based semantic word similarity measure integrated with a modified and normalized Longest Common Subsequence (LCS) algorithm for text similarity. Experimentation on multiple datasets shows superior...
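The normalized LCS component can be sketched as follows. The normalization used here is an assumption: squared LCS length divided by the product of the two string lengths, which is one common choice in the STS literature. The excerpt does not specify the paper's exact modification, and the corpus-based word similarity it is combined with is not shown.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (classic DP, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of `a`
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping either char
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))
```

For example, normalized_lcs("kitten", "sitting") is 16/42, about 0.38, because the longest common subsequence is "ittn". Applied to word pairs, a string score like this can match spelling and morphological variants that a corpus-based measure might treat as unrelated.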

2. How can lexical, syntactic, and semantic features be integrated via machine learning to improve short-text semantic similarity prediction?

This research area investigates the integration of multiple linguistic feature types—lexical overlap, syntactic structures, semantic relations—through supervised machine learning techniques like Support Vector Machines (SVM). The goal is to construct robust feature representations that capture semantic equivalence or similarity between short texts, enabling systems to generalize across languages and domains, including resource-scarce settings.
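The feature-based approach can be illustrated with a small Python sketch that turns one sentence pair into a vector of lexical features. The particular features and helper names are hypothetical: unigram and bigram Jaccard overlap, containment in each direction, and length ratio. The systems described here also add syntactic and semantic features, such as BLEU on base phrases or WordNet-based scores.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercased word tokens (hypothetical, minimal tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(seq: list[str], n: int) -> set[tuple[str, ...]]:
    """Set of contiguous n-grams in a token sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(s1: str, s2: str) -> list[float]:
    """A small, hypothetical lexical feature vector for one sentence pair."""
    t1, t2 = tokens(s1), tokens(s2)
    set1, set2 = set(t1), set(t2)
    shared = len(set1 & set2)
    return [
        jaccard(set1, set2),                           # unigram overlap
        jaccard(ngrams(t1, 2), ngrams(t2, 2)),         # bigram overlap (crude word-order signal)
        shared / len(set1) if set1 else 0.0,           # fraction of s1's words found in s2
        shared / len(set2) if set2 else 0.0,           # fraction of s2's words found in s1
        min(len(t1), len(t2)) / max(len(t1), len(t2)) if t1 and t2 else 0.0,  # length ratio
    ]
```

Each pair yields one row. Stacking the rows into a matrix and fitting a regressor against gold similarity scores gives the supervised model; scikit-learn's `sklearn.svm.SVR` is one option, but the regressor is not part of this sketch.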

Key finding: This work presents an SVM-based system utilizing diverse linguistically motivated features including distributional, conceptual, semantic similarity measures, and multiword expressions. It performed well on SemEval-2015 Task...
Key finding: Employs a supervised learning regression model combining lexical, syntactic, and semantic metrics such as word overlap, BLEU scores on base phrases, named entity preservation, and predicate-argument alignment. While...
Key finding: Proposes a bag-of-words statistical model augmented with a part-of-speech weighting scheme as a proxy for deeper syntactic information, enhancing semantic similarity measurement without requiring resource-heavy parsing. It...
Key finding: Offers a systematic analysis and classification of how syntactic information (word order, POS tagging, parsing, semantic role labeling) is used in STS algorithms, evaluated on the Microsoft Research Paraphrase Corpus. ...
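The part-of-speech weighting idea can be sketched as a weighted Jaccard overlap over (word, tag) pairs. The tagset (Universal POS tags), the weight values, and the choice of weighted Jaccard are all assumptions for illustration, not the published weighting scheme. Input sentences are assumed to be already tagged, since no tagger is included.

```python
# Hypothetical weights: content-word classes count more than function words.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.9, "ADJ": 0.7, "ADV": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for every other tag (DET, ADP, PRON, ...)


def pos_weighted_overlap(s1: list[tuple[str, str]], s2: list[tuple[str, str]]) -> float:
    """Weighted Jaccard overlap of (lowercased word, POS tag) pairs."""
    a = {(w.lower(), t) for w, t in s1}
    b = {(w.lower(), t) for w, t in s2}

    def weight(items: set[tuple[str, str]]) -> float:
        # Sum of the POS weights of the given (word, tag) pairs.
        return sum(POS_WEIGHTS.get(t, DEFAULT_WEIGHT) for _, t in items)

    union_w = weight(a | b)
    return weight(a & b) / union_w if union_w else 0.0
```

With these weights, "The dog barked" vs "The cat barked" scores 1/3, because a mismatched noun is costly. "The dog barked" vs "A dog barked" scores 1.9/2.1, about 0.90, because a mismatched determiner barely matters. Unweighted Jaccard gives 0.5 for both pairs.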

3. What role do lexico-syntactic pattern-based corpus methods play in capturing semantic similarity in short texts without reliance on fine-grained semantic resources?

This theme examines approaches that derive semantic similarity from lexico-syntactic patterns mined from large corpora rather than from curated lexical resources such as WordNet, with the aim of overcoming coverage limitations and resource constraints. It focuses on pattern-based methods that use finite-state transducers and corpus mining to capture semantic relations robustly, and on their utility in short-text similarity and relation extraction.
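The pattern-based idea can be approximated with regular expressions standing in for finite-state transducers. This sketch uses two Hearst-style patterns ("X such as Y" and "X and/or other Y") with single-word slots. It normalizes each pair's count by the total extraction counts of the two words. The patterns, the normalization, and the function names are assumptions for illustration. They are far simpler than a production system built on a larger hand-crafted pattern set.

```python
import math
import re
from collections import Counter

# Two illustrative lexico-syntactic patterns (regex stand-ins for transducers).
PATTERNS = [
    re.compile(r"\b(\w+) such as (\w+)\b"),             # "animals such as dogs"
    re.compile(r"\b(\w+),? (?:and|or) other (\w+)\b"),  # "cats and other animals"
]


def extract_pairs(corpus: str) -> Counter:
    """Count unordered word pairs matched by any pattern in the corpus."""
    pairs = Counter()
    text = corpus.lower()
    for pattern in PATTERNS:
        for x, y in pattern.findall(text):
            pairs[tuple(sorted((x, y)))] += 1
    return pairs


def pattern_similarity(pairs: Counter, w1: str, w2: str) -> float:
    """Pair count normalized by the geometric mean of each word's extraction count."""
    freq = Counter()
    for (a, b), c in pairs.items():
        freq[a] += c
        freq[b] += c
    if not freq[w1] or not freq[w2]:
        return 0.0
    return pairs[tuple(sorted((w1, w2)))] / math.sqrt(freq[w1] * freq[w2])
```

On the toy corpus "Animals such as dogs are loyal. Cats and other animals sleep a lot. Pets such as dogs need walks. Fruits such as apples are sweet.", dogs/animals scores 0.5 and dogs/pets about 0.71. Dogs/apples scores 0.0, since no pattern links them.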

Key finding: Introduces PatternSim, a novel corpus-based semantic similarity measure leveraging 18 hand-crafted lexico-syntactic patterns encoded as finite-state transducers applied to massive corpora (WaCkypedia, ukWaC). Without relying...
Key finding: Applies various similarity matching methods, ranging from simple word overlap to dependency graph matching and feature-based vector similarity incorporating lexical, syntactic, and semantic features, for multiple-choice...
Key finding: Develops a system originally designed for textual entailment that uses multiple WordNet-based word-to-word similarity measures aggregated at sentence level to assess semantic textual similarity. Achieves competitive Pearson...
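Sentence-level aggregation of word-to-word scores can be sketched as follows. Each word is matched to its most similar word in the other sentence, the best scores are averaged, and the two directions are then averaged. This greedy max-and-average scheme is one common aggregation and is assumed here. The toy similarity table is a hypothetical stand-in for a WordNet measure, for example Wu-Palmer similarity via NLTK's `wup_similarity`. No IDF weighting or stopword handling is applied.

```python
from typing import Callable

WordSim = Callable[[str, str], float]

# Hypothetical stand-in for a WordNet-based word similarity measure.
TOY_SIMS = {
    frozenset({"car", "automobile"}): 1.0,
    frozenset({"car", "vehicle"}): 0.8,
    frozenset({"drove", "rode"}): 0.6,
}


def toy_word_sim(w1: str, w2: str) -> float:
    """Identical words score 1.0; other pairs are looked up in the toy table."""
    if w1 == w2:
        return 1.0
    return TOY_SIMS.get(frozenset({w1, w2}), 0.0)


def directional_score(src: list[str], tgt: list[str], sim: WordSim) -> float:
    """Average, over words in src, of the best similarity to any word in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(sim(w, v) for v in tgt) for w in src) / len(src)


def sentence_similarity(s1: str, s2: str, sim: WordSim = toy_word_sim) -> float:
    """Symmetric score: mean of the two directional scores (whitespace tokenization)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return (directional_score(t1, t2, sim) + directional_score(t2, t1, sim)) / 2
```

With the toy table, "she drove the car" vs "she rode the automobile" scores about 0.9. Swapping in a real WordNet measure only requires passing a different `sim` function.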

All papers in Short-Text Semantic Similarity

Text classification typically performs best with large training sets, but short texts are very common on the World Wide Web. Can we use resampling and data augmentation to construct larger texts using similar terms? Several current...
The paper describes a software system that assesses the degree of semantic similarity between two given short texts in Serbian. It explains the basic principles on which the system works, as well as the phases of the system's development and evaluation. It also describes...
This paper outlines and categorizes ways of using syntactic information in a number of algorithms for determining the semantic similarity of short texts. We consider the use of word order information, part-of-speech tagging, parsing and...
This paper presents and categorizes the ways syntactic information is used in a number of algorithms for determining the semantic similarity of short texts. The performance of the algorithms was evaluated using the results of the test...
Although the task of semantic textual similarity (STS) has gained prominence in the last few years, annotated STS datasets for model training and evaluation, particularly those with fine-grained similarity scores, remain scarce for...
This paper presents POST STSS, a method of determining short-text semantic similarity in which part-of-speech tags are used as indicators of the deeper syntactic information usually extracted by more advanced tools like parsers and...
Measuring the semantic similarity of short texts is a noteworthy problem since short texts are widely used on the Internet, in the form of product descriptions or captions, image and webpage tags, news headlines, etc. This paper describes...
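Of the syntactic signals listed above, word order is the simplest to illustrate. This sketch computes a word-order similarity in the style of Li et al. (2006). Each sentence becomes a vector of word positions over the joint word set, and the score is 1 - ||r1 - r2|| / ||r1 + r2||. It is simplified by assuming whitespace tokenization and exact word matching only. The original formulation also lets an absent word borrow the position of its most similar word.

```python
import math


def order_vector(joint: list[str], tokens: list[str]) -> list[int]:
    """1-based position of each joint-set word in `tokens` (first occurrence), or 0 if absent."""
    return [tokens.index(w) + 1 if w in tokens else 0 for w in joint]


def word_order_similarity(s1: str, s2: str) -> float:
    """1 - ||r1 - r2|| / ||r1 + r2|| over the joint word set (exact matches only)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # unique words, in first-seen order
    r1, r2 = order_vector(joint, t1), order_vector(joint, t2)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0
```

For "a quick brown dog jumps over the lazy fox" vs "a quick brown fox jumps over the lazy dog", the score is 1 - sqrt(50)/sqrt(1090), about 0.79. The two sentences contain exactly the same words, so any bag-of-words measure would rate them identical.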