Short-Text Semantic Similarity

7 papers

About this topic

Short-Text Semantic Similarity is a subfield of natural language processing concerned with measuring how similar in meaning two short text segments, such as sentences or phrases, are. It uses computational techniques, including vector space models and deep learning, to quantify the semantic relationship between texts.
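As a rough illustration of the vector space models mentioned above, the sketch below represents each text as a vector of term counts and compares the two vectors by cosine similarity. The tokenizer and the function names are assumptions made for this example and are not taken from any of the listed papers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Counter returns 0 for terms that are missing from vec_b.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction in the vector space
    return dot / (norm_a * norm_b)
```

Raw term counts only reward shared words, so two paraphrases with no words in common score 0.0 here. That gap is what the corpus-based and knowledge-based methods surveyed on this page try to close.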

Key research themes

1. What semantic knowledge sources and methodological frameworks can most effectively measure short-text semantic similarity?

This research theme focuses on classifying and evaluating various semantic similarity measurement techniques by leveraging different semantic knowledge sources such as string-based, corpus-based, knowledge-based, and hybrid methods. Understanding these frameworks is critical to developing effective algorithms that capture semantic similarity beyond surface lexical matching, which is particularly challenging in short texts due to limited context and high ambiguity.
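To make the string-based category concrete, here is a minimal sketch of one surface measure, the Dice coefficient over character n-grams. The choice of measure, the default n-gram size, and the function names are assumptions for illustration; the surveys listed under this theme cover many other variants.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice_similarity(text_a, text_b, n=2):
    """Dice coefficient: twice the shared n-grams over the total n-grams."""
    grams_a, grams_b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    if not grams_a or not grams_b:
        return 0.0  # text shorter than n characters yields no n-grams
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

A measure like this scores "car" against "automobile" as 0.0 because the two strings share no bigrams, which is the kind of purely lexical limitation that corpus-based, knowledge-based, and hybrid methods are meant to address.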

by Asad Abdi

2022, Soft Computing

Key finding: This comprehensive review categorizes short-text similarity methods into string-based, corpus-based, knowledge-based, and hybrid techniques, identifying four semantic knowledge bases and eight corpus resources as external...

Semantic Textual Similarity Methods, Tools, and Applications: A Survey

by Goutam Majumder

2022, Computación y Sistemas

Key finding: The paper divides STS methods into topological/knowledge-based, statistical/corpus-based, and string-based categories, with special emphasis on WordNet taxonomy for topological methods. It contributes a novel hybrid approach...

Semantic similarity of short texts

by Diana Inkpen

2015, Current Issues in Linguistic Theory

Key finding: Proposes a corpus-based semantic word similarity measure integrated with a modified and normalized Longest Common Subsequence (LCS) algorithm for text similarity. Experimentation on multiple datasets shows superior...
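The normalized LCS component mentioned in this key finding can be sketched as follows. The normalization used here (squared LCS length divided by the product of the two string lengths) is one common choice and is an assumption of this example; the paper's specific modifications and its corpus-based word similarity measure are not reproduced.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length squared, divided by the product of both lengths."""
    if not a or not b:
        return 0.0
    length = lcs_length(a, b)
    return (length * length) / (len(a) * len(b))
```

Squaring the LCS length keeps the score in the range 0 to 1 while penalizing a short match inside a much longer string.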

2. How can lexical, syntactic, and semantic features be integrated via machine learning to improve short-text semantic similarity prediction?

This research area investigates how multiple linguistic feature types (lexical overlap, syntactic structures, and semantic relations) can be integrated through supervised machine learning techniques such as Support Vector Machines (SVM). The goal is to construct robust feature representations that capture semantic equivalence or similarity between short texts, enabling systems to generalize across languages and domains, including resource-scarce settings.
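A minimal sketch of the feature-construction step is shown below, assuming three simple lexical features chosen only for illustration. In a real system a vector like this, extended with syntactic and semantic features, would be passed to a supervised learner such as an SVM; the training step is omitted here, and none of the names come from the listed papers.

```python
def ngrams(tokens, n):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(set_a, set_b):
    """Jaccard overlap of two sets, defined as 0.0 when both are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def pair_features(text_a, text_b):
    """Feature vector for a text pair: unigram overlap, bigram overlap,
    and the ratio of the shorter length to the longer length."""
    tok_a, tok_b = text_a.lower().split(), text_b.lower().split()
    longest = max(len(tok_a), len(tok_b))
    return [
        overlap(ngrams(tok_a, 1), ngrams(tok_b, 1)),
        overlap(ngrams(tok_a, 2), ngrams(tok_b, 2)),
        min(len(tok_a), len(tok_b)) / longest if longest else 0.0,
    ]
```

Keeping each feature in the range 0 to 1 means no single feature dominates by scale when the vector is handed to a learner.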

MiniExperts: An SVM Approach for Measuring Semantic Textual Similarity

by Hernani Costa and

2015, 9th Int. Workshop on Semantic Evaluation (SemEval'15)

Key finding: This work presents an SVM-based system utilizing diverse linguistically motivated features including distributional, conceptual, semantic similarity measures, and multiword expressions. It performed well on SemEval-2015 Task...

UOW: Semantically Informed Text Similarity

by Miguel Rios

2022

Key finding: Employs a supervised learning regression model combining lexical, syntactic, and semantic metrics such as word overlap, BLEU scores on base-phrases, named entity preservation, and predicate-argument alignment. While...

Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity

by Vuk Batanović

2015, Computer Science and Information Systems

Key finding: Proposes a bag-of-words statistical model augmented with a part-of-speech weighting scheme as a proxy for deeper syntactic information, enhancing semantic similarity measurement without requiring resource-heavy parsing. It...
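The general idea of weighting a bag-of-words model by part of speech can be sketched as below. The weight values, the tag names, and the assumption that the input arrives already tagged as (word, tag) pairs are all hypothetical; the paper's actual weighting scheme is not given in this summary.

```python
import math
from collections import defaultdict

# Hypothetical weights: content-bearing tags count more than function words.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6}
DEFAULT_WEIGHT = 0.2


def weighted_vector(tagged_tokens):
    """Map each word to the sum of the weights of its occurrences."""
    vector = defaultdict(float)
    for word, tag in tagged_tokens:
        vector[word.lower()] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return vector


def pos_weighted_similarity(tagged_a, tagged_b):
    """Cosine similarity between two POS-weighted bag-of-words vectors."""
    vec_a, vec_b = weighted_vector(tagged_a), weighted_vector(tagged_b)
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With these weights, a mismatch between determiners ("the dog barks" versus "a dog barks") lowers the score far less than a mismatch between nouns would.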

Evaluation and Classification of Syntax Usage in Determining Short-Text Semantic Similarity

by Vuk Batanović

2014, Telfor Journal

Key finding: Offers a systematic analysis and classification of syntactic information usage (including word order, POS tagging, parsing, and semantic role labeling) in STS algorithms, evaluated on the Microsoft Research Paraphrase Corpus....

3. What role do lexico-syntactic pattern-based corpus methods play in capturing semantic similarity in short texts without reliance on fine-grained semantic resources?

This theme examines approaches that extract semantic similarity via lexico-syntactic patterns from large corpora instead of curated semantic lexical resources like WordNet, aiming to overcome coverage limitations and resource constraints. It focuses on pattern-based methods employing finite-state transducers and corpus mining to capture semantic relations robustly and their utility in short-text similarity and relation extraction.

by Cédrick Fairon

2025

Key finding: Introduces PatternSim, a novel corpus-based semantic similarity measure leveraging 18 hand-crafted lexico-syntactic patterns encoded as finite-state transducers applied to massive corpora (WACYPEDIA, UKWAC). Without relying...
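The pattern-based idea can be illustrated with a much simplified sketch. It assumes two regular-expression patterns as stand-ins for the 18 finite-state-transducer patterns described above, and it uses the raw extraction count as the score; PatternSim's actual patterns, normalization, and re-ranking are not reproduced, and all names below are hypothetical.

```python
import re
from collections import Counter

# Two illustrative lexico-syntactic patterns over single-word terms.
PATTERNS = [
    re.compile(r"\b(\w+) such as (\w+)\b"),
    re.compile(r"\b(\w+) and other (\w+)\b"),
]


def extract_pair_counts(sentences):
    """Count how often each unordered word pair is matched by a pattern."""
    counts = Counter()
    for sentence in sentences:
        for pattern in PATTERNS:
            for left, right in pattern.findall(sentence.lower()):
                counts[frozenset((left, right))] += 1
    return counts


def pattern_score(counts, word_a, word_b):
    """Raw number of pattern extractions linking the two words."""
    return counts[frozenset((word_a.lower(), word_b.lower()))]
```

For example, calling `extract_pair_counts` on a small corpus containing "Fruits such as apples are sweet." and "Apples and other fruits are healthy." links "apples" and "fruits" twice, with no lookup in a resource such as WordNet.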

by Diego Molla

2025

Key finding: Applies various similarity matching methods (ranging from simple word overlap to dependency graph matching and feature-based vector similarity incorporating lexical, syntactic, and semantic features) for multiple-choice...

SAGAN: an approach to semantic textual similarity based on textual entailment

by Julio Gabriel Martinez Castillo

2023

Key finding: Develops a system originally designed for textual entailment that uses multiple WordNet-based word-to-word similarity measures aggregated at sentence level to assess semantic textual similarity. Achieves competitive Pearson...
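Aggregating word-to-word similarities at the sentence level can be sketched as follows. The small lookup table is a hypothetical stand-in for WordNet-based measures, and the aggregation (average best match in each direction, then the mean of both directions) is one common choice rather than the system's documented method.

```python
# Hypothetical word-to-word scores standing in for WordNet-based measures.
TOY_WORD_SIM = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("fast", "quick")): 0.9,
}


def word_similarity(word_a, word_b):
    """Look up a word pair; identical words score 1.0, unknown pairs 0.0."""
    if word_a == word_b:
        return 1.0
    return TOY_WORD_SIM.get(frozenset((word_a, word_b)), 0.0)


def directed_score(source, target):
    """Average, over source words, of the best match among target words."""
    if not source or not target:
        return 0.0
    best = [max(word_similarity(w, t) for t in target) for w in source]
    return sum(best) / len(source)


def sentence_similarity(text_a, text_b):
    """Symmetric sentence score: mean of the two directed scores."""
    tok_a, tok_b = text_a.lower().split(), text_b.lower().split()
    return (directed_score(tok_a, tok_b) + directed_score(tok_b, tok_a)) / 2
```

Averaging both directions keeps the score symmetric, so a short text fully covered by a long one does not score as highly as two texts that cover each other.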

All papers in Short-Text Semantic Similarity

A Data Augmentation Approach to Short Text Classification

by Ryan R Rosario

Text classification typically performs best with large training sets, but short texts are very common on the World Wide Web. Can we use resampling and data augmentation to construct larger texts using similar terms? Several current...

Softverski sistem za određivanje semantičke sličnosti kratkih tekstova na srpskom jeziku / A software system for determining the semantic similarity of short texts in Serbian

by Vuk Batanović

2011, Proceedings of the 19th Telecommunications forum (TELFOR 2011)

This paper describes a software system that assesses the degree of semantic similarity between two given short texts in Serbian. It explains the basic principles on which the system operates, as well as the phases of its development and evaluation. It also describes...

Evaluation and Classification of Syntax Usage in Determining Short-Text Semantic Similarity

by Vuk Batanović

2014, Telfor Journal

This paper outlines and categorizes ways of using syntactic information in a number of algorithms for determining the semantic similarity of short texts. We consider the use of word order information, part-of-speech tagging, parsing and...

Evaluacija i klasifikacija korišćenja sintaksnih informacija u određivanju semantičke sličnosti kratkih tekstova / Evaluation and Classification of Syntax Information Usage in Determining Short-Text Semantic Similarity

by Vuk Batanović

2013, Proceedings of the 21st Telecommunications forum (TELFOR 2013)

This paper presents and categorizes ways of using syntactic information in several algorithms for determining the semantic similarity of short texts. The performance of the algorithms was evaluated using the results of the test...

Fine-grained Semantic Textual Similarity for Serbian

by Vuk Batanović and

2018, Proceedings of the 11th International Language Resources and Evaluation Conference (LREC 2018)

Although the task of semantic textual similarity (STS) has gained in prominence in the last few years, annotated STS datasets for model training and evaluation, particularly those with fine-grained similarity scores, remain scarce for...

Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity

by Vuk Batanović

2015, Computer Science and Information Systems

This paper presents POST STSS, a method of determining short-text semantic similarity in which part-of-speech tags are used as indicators of the deeper syntactic information usually extracted by more advanced tools like parsers and...

by Vuk Batanović

2013, Decision Support Systems

Measuring the semantic similarity of short texts is a noteworthy problem since short texts are widely used on the Internet, in the form of product descriptions or captions, image and webpage tags, news headlines, etc. This paper describes...
