Sentence Similarity

16 papers

0 followers

About this topic

Sentence similarity is a computational linguistics task that measures the degree of semantic equivalence between two sentences. It involves analyzing syntactic structures, word meanings, and contextual usage to quantify how closely related the sentences are in terms of meaning, often utilizing algorithms and models from natural language processing.
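As a rough illustration of what "quantifying" similarity can mean, the sketch below scores two sentences by the cosine of their bag-of-words count vectors. This is only a lexical baseline chosen for illustration, not a method taken from any paper listed on this page; the helper names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the two bag-of-words count vectors."""
    vec_a = Counter(tokenize(sentence_a))
    vec_b = Counter(tokenize(sentence_b))
    # Dot product over the tokens the two sentences share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("The cat sat", "The cat ran"))
```

A purely lexical score like this ignores word meaning and context, which is why the research themes below look at ontologies, distributional semantics, and syntax.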

Key research themes

1. How do domain-specific string-based and ontology-informed methods improve biomedical sentence similarity measurement with reproducibility?

This research theme centers on the development and evaluation of sentence similarity methods tailored to the biomedical domain, emphasizing the importance of reproducible experimental setups. Due to the highly specialized vocabulary, complex syntactic structures, and abundant acronyms in biomedical texts, conventional sentence similarity models often underperform. The theme addresses the integration of string-based techniques and ontology-based semantic methods, the impact of preprocessing stages and Named Entity Recognition (NER) tools on method performance, and the establishment of reproducible resources and protocols to enhance experimental rigor and comparability.

A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art

by Ana M Garcia-Serrano

2024, PLOS ONE

Key finding: Introduces LiBlock, a novel aggregated string-based sentence similarity measure that significantly outperforms all evaluated state-of-the-art machine learning models and most ontology-based methods on multiple biomedical...

Noun phrase based weighting scheme for sentence similarity measurement

by siti sakira kamaruddin

2024, Journal of Fundamental and Applied Sciences

Key finding: Proposes a hybrid similarity measure combining word embeddings with named-entity based semantic similarity, addressing semantic variation and noise in short biomedical texts. The approach achieves enhanced performance in...

Noun phrase based weighting scheme for sentence similarity measurement

by siti sakira kamaruddin

2024, Journal of Fundamental and Applied Sciences

Key finding: Demonstrates that weighting sentence similarity calculations by noun phrase (NP) importance, as opposed to standard term frequency, significantly improves semantic similarity measures applicable to text categorization and...

2. What are the impacts of lexical, syntactic, and semantic features, combined with vector-based distributional representations, on general domain sentence similarity?

This theme investigates the multifaceted role of lexical overlap, syntactic structure, and semantic frame alignment in modeling sentence similarity within general or cross-domain corpora. It explores supervised machine learning integration of these heterogeneous features, the utilization of distributional semantic models such as Random Indexing and Latent Semantic Analysis, and the embedding of syntactic and semantic information directly into vector representations (e.g., via vector permutations or recursive autoencoders). The research seeks to identify complementary strengths of diverse features to improve semantic textual similarity estimates beyond lexical matching alone.
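To make the point about going "beyond lexical matching alone" concrete, here is a minimal sketch that extracts two features for a sentence pair: word-set overlap and a word-order-sensitive bigram overlap. The feature names and the function `similarity_features` are assumptions made for illustration; the systems listed below use richer features (BLEU over base-phrases, distributional vectors) and feed them to a trained regression model, which is not shown here.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard(set_a, set_b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def similarity_features(sentence_a, sentence_b):
    """Return a small feature dictionary for a sentence pair.

    word_jaccard   : overlap of word sets (ignores word order)
    bigram_overlap : overlap of adjacent word pairs (sensitive to order)
    """
    tokens_a, tokens_b = tokenize(sentence_a), tokenize(sentence_b)
    bigrams_a = set(zip(tokens_a, tokens_a[1:]))
    bigrams_b = set(zip(tokens_b, tokens_b[1:]))
    return {
        "word_jaccard": jaccard(set(tokens_a), set(tokens_b)),
        "bigram_overlap": jaccard(bigrams_a, bigrams_b),
    }


if __name__ == "__main__":
    # Same words, different meaning: only the order-sensitive feature notices.
    print(similarity_features("dog bites man", "man bites dog"))
```

In a supervised setup, a dictionary like this would become one row of the feature matrix, with a human similarity rating as the regression target.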

UOW: Semantically Informed Text Similarity

by Miguel Rios

2022

Key finding: Employs a supervised regression model combining lexical metrics (word overlap and cosine similarity), syntactic similarity via BLEU scores over base-phrases, and semantic similarity based on named entity preservation and...

UNIBA: Distributional Semantics for Textual Similarity

by Annalina Caputo

2023

Key finding: Utilizes distributional semantic spaces constructed via Random Indexing, Latent Semantic Analysis, and an innovative vector permutation method to inject syntactic information into representations. Results show that combining...

Penn: Using Word Similarities to better Estimate Sentence Similarity

by Sneha Jha

2016

Key finding: Compares multiple word embedding methodologies (recursive autoencoders, eigenword spectral methods, and selector generalizations) to generate word-level similarities that are aggregated at the sentence level through...

3. How do hybrid and deep learning approaches integrating lexical relationships and sentence structure advance sentence similarity estimation?

This research focus explores the development of hybrid methodologies that combine deep learning models (e.g., CNNs, RNNs, BERT) with lexical knowledge-based techniques (e.g., WordNet) to measure semantic similarity between sentences. It emphasizes the need to incorporate lexical relationships, syntactic structures, word order, and semantic nuances such as determiners and negations. The goal is to improve similarity measures by capturing compositional semantic phenomena beyond simple lexical overlap, with particular attention to datasets and tasks where paraphrase detection and nuanced semantic differences are critical.

A Novel Hybrid Methodology of Measuring Sentence Similarity

by Yongmin Yoo

2024, Symmetry

Key finding: Proposes a hybrid sentence similarity measurement method combining deep learning architectures (CNN, RNN, BERT) with lexical relationship analysis based on WordNet, integrating cosine similarity on embedding vectors and...

Sentence paraphrase detection: When determiners and word order make the difference

by Nghĩa Phạm

2024

Key finding: Introduces a challenging evaluation dataset emphasizing semantic differences stemming from word order swaps and determiner replacements in paraphrase detection. Results reveal that compositional distributional semantics...

by M. Ishizuka

2023, Proceedings of the AAAI Conference on Artificial Intelligence

Key finding: Demonstrates that raw similarity scores between sentence pairs can be misleading for textual entailment prediction and proposes a supervised method that jointly learns non-linear transformations of multiple lexical similarity...

All papers in Sentence Similarity

Role of Conjunctions and Students' Cognitive Characteristics in Argumentative Essay Writing

by Teti Sobari

2025

Most high school students are able to write arguments. However, most students are still unable to develop complex writing. The purpose of this research was to investigate the students' argumentative writing which displays various...

Role of Conjunctions and Students' Cognitive Characteristics in Argumentative Essay Writing

by Teti Sobari and

2025, International Journal of Learning, Teaching and Educational Research

Noun phrase based weighting scheme for sentence similarity measurement

by siti sakira kamaruddin

2024, Journal of Fundamental and Applied Sciences

The need for effective text similarity measures has led many previous studies to propose different text weighting schemes to enhance the overall performance of sentence similarity measures. Term Frequency Inverse Document Frequency (TF...

A Novel Hybrid Methodology of Measuring Sentence Similarity

by Yongmin Yoo

2024, Symmetry

The problem of measuring sentence similarity is an essential issue in the natural language processing area. It is necessary to measure the similarity between sentences accurately. Sentence similarity measuring is the task of finding...

You Are Your Words: Modeling Students' Vocabulary Knowledge with Natural Language Processing Tools

by Danielle McNamara

2024

The current study investigates the degree to which the lexical properties of students’ essays can inform stealth assessments of their vocabulary knowledge. In particular, we used indices calculated with the natural language processing...

Automatic summarization of scientific publications using a feature selection approach

by Jean-Charles Lamirel

2023, International Journal on Digital Libraries

Feature Maximization is a feature selection method that deals efficiently with textual data: to design systems that are altogether language-agnostic, parameter-free and do not require additional corpora to function. We propose to evaluate...

IIIT Hyderabad in Summarization and Knowledge Base Population at TAC 2011

by Harshil Jain

2023

In this report, we present details about the participation of IIIT Hyderabad in Guided Summarization and Knowledge Base Population tracks at TAC 2011. We have enhanced our summarization system with knowledge based measures. Wikipedia...

Measuring the sentence level similarity

by Ercan Canhasi

2023, Advances in Architecture and Engineering

This article describes a method used to calculate the similarity between short English texts, specifically of sentence length. The described algorithm calculates semantic and word order similarities of two sentences. In order to do so, it...

Automatic summarization of scientific publications using a feature selection approach

by Nicolas Dugué

2023, International Journal on Digital Libraries

A Comprehensive Comparative Study of Word and Sentence Similarity Measures

by Issa Atoum

2022, International Journal of Computer Applications

Sentence similarity is considered the basis of many natural language tasks such as information retrieval, question answering and text summarization. The semantic meaning between compared text fragments is based on the words' semantic...

DUC 2005

by Hoa Dang

2022, Proceedings of the Workshop on Task-Focused Summarization and Question Answering - SumQA '06

The Document Understanding Conference (DUC) 2005 evaluation had a single user-oriented, question-focused summarization task, which was to synthesize from a set of 25-50 documents a well-organized, fluent answer to a complex question. The...

Overview of DUC 2005

by Hoa Dang

2022, Proceedings of the Document Understanding …

The focus of DUC 2005 was on developing new evaluation methods that take into account variation in content in human-authored summaries. Therefore, DUC 2005 had a single user-oriented, question-focused summarization task that allowed the...

IIIT Hyderabad in Summarization and Knowledge Base Population at TAC 2011

by Harshit Jain

2022

A Comprehensive Comparative Study of Word and Sentence Similarity Measures

by Ahmed Ali Otoom

2022, International Journal of Computer Applications

Sentiment and Sentence Similarity as Predictors of Integrated and Independent L2 Writing Performance

by Kutay Uzun

2022

This study aimed to utilize sentiment and sentence similarity analyses, two Natural Language Processing techniques, to see if and how well they could predict L2 Writing Performance in integrated and independent task conditions. The data...

Psychological Features for Automatic Text Summarization

by David Losada

2022, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

Automatically summarizing a document requires conveying the important points of a large document in only a few sentences. Extractive strategies for summarization are based on selecting the most important sentences from the input...

Table 2. Summarization DUC datasets and tasks. The table reports the main statistics of the collections and how we used them in our experiments (train or test).

... document), centroid (cosine overlap of the sentence with the centroid vector of the document or cluster), and length. All feature values are linearly combined, yielding an aggregated score for each sentence. These scores are used to build an initial ranking of sentences. Finally, a re-ranking module removes sentences that are too similar to sentences already in the ranking. The resulting ranked set of sentences is used to produce a summary of the desired size.

Table 3. Test results (Single-Document Summarization). ROUGE-2 and ROUGE-SU4 scores are reported together with their 95% confidence intervals (in brackets). For each collection and performance measure the highest score is bolded.

Table 4. Test results (Multi-Document Summarization). ROUGE-2 and ROUGE-SU4 scores are reported together with their 95% confidence intervals (in brackets). For each collection and performance measure the highest score is bolded.

... extract document passages that describe real stories or events.

Table 5. Mean Squared Error of regression models built by different model selection strategies.

... coefficients towards zero. Another class of approaches transforms the features and then fits a model using the transformed features:

Table 6. Features included in the regression models fitted by the best subset selection method.
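The ranking procedure described in the excerpt above (feature values combined linearly, then a re-ranking pass that drops sentences too similar to ones already chosen) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: only the centroid and length features are used, and the weights `w_centroid` and `w_length`, the `redundancy_threshold`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(vec_a, vec_b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, max_sentences=2, redundancy_threshold=0.7,
              w_centroid=1.0, w_length=0.1):
    """Rank sentences by a linear feature combination, then filter redundancy."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # centroid = summed term counts of the document
    max_len = max((sum(v.values()) for v in vectors), default=1) or 1

    # Initial ranking: weighted sum of centroid overlap and relative length.
    scored = []
    for i, vec in enumerate(vectors):
        score = (w_centroid * cosine(vec, centroid)
                 + w_length * sum(vec.values()) / max_len)
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Re-ranking: skip a sentence that is too similar to one already selected.
    selected = []
    for _, i in scored:
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold
               for j in selected):
            selected.append(i)
        if len(selected) == max_sentences:
            break
    return [sentences[i] for i in sorted(selected)]  # restore document order


if __name__ == "__main__":
    doc = ["The cat sat on the mat.",
           "The cat sat on the mat.",
           "Dogs bark loudly at night."]
    print(summarize(doc))
```

Sentence similarity enters twice here: once to score each sentence against the centroid, and once to detect redundancy between candidate sentences.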

The LIA summarization system at DUC-2007

by Juan-Manuel Torres-Moreno

2021

This paper presents the LIA summarization systems participating in DUC 2007. This is the second participation of the LIA at DUC and we will discuss our systems in both main and update tasks. The system proposed for the main task is the...

A New Hybrid Farsi Text Summarization Technique Based on Term Co-Occurrence and Conceptual Property of the Text

by Mohsen Sharifi

2021, 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing

The importance of text summarization grows rapidly as the amount of information increases exponentially. This paper presents a new hybrid summarization technique that combines statistical properties of documents with Farsi linguistic...

A New Hybrid Farsi Text Summarization Technique Based on Term Co-Occurrence and Conceptual Property of the Text

by Behrouz Minaei

2016, 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing

MCMR: Maximum coverage and minimum redundant text summarization model

by Ramiz M Aliguliyev and

2016

Improving the Estimation of Word Importance for News Multi-Document Summarization

by Ani Nenkova

2016, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

In this paper, we propose a supervised model for ranking word importance that incorporates a rich set of features. Our model is superior to prior approaches for identifying words used in human summaries. Moreover we show that an...

Measuring importance and query relevance in topic-focused multi-document summarization

by Ani Nenkova

2016, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions - ACL '07

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning...

A Cosine Maximization Minimization approach for User Oriented Multi-Document Update Summarization

by Juan-Manuel Torres-Moreno

2015

This paper presents a User-Oriented Multi-Document Update Summarization system based on a maximization-minimization approach. Our system relies on two main concepts. The first one is the cross summaries sentence redundancy removal which...

A Maximization-Minimization Approach for Update Text Summarization

by Juan-Manuel Torres-Moreno

2015

The work presents an update summarization system that uses a combination of two techniques to generate extractive summaries which focus on new but relevant information. A fast maximization-minimization approach is used to select sentences...

A Cosine Maximization-Minimization approach for User-Oriented Multi-Document Update Summarization

by Juan-Manuel Torres-Moreno

2015

The LIA-Thales summarization system at DUC-2007

by M. El-bèze and

2015, Document Understanding Conference (DUC), Rochester, USA, April

Multiple Document Summarization Using Principal Component Analysis Incorporating Semantic Vector Space Model

by Amit Gupta

2015

Text Summarization is very effective in relevant assessment tasks. The Multiple Document Summarizer presents a novel approach to select sentences from documents according to several heuristic features. Summaries are generated modeling the...

FastSum: Fast and accurate query-based multi-document summarization

by Frank Schilder and

2014

We present a fast query-based multi-document summarizer called FastSum based solely on word-frequency features of clusters, documents and topics. Summary sentences are ranked by a regression SVM. The summarizer does not use any expensive...

Exploiting Category-Specific Information for Multi-Document Summarization

by Jun Ping Ng

2013, International Conference on Computational Linguistics (COLING)

We show that by making use of information common to document sets belonging to a common category, we can improve the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in...

Language Technologies Research Center IIIT Hyderabad

by Praveen Bysani

2013

A progressive summary helps a user to monitor changes in evolving news topics over a period of time. Detecting novel information is the essential part of progressive summarization that differentiates it from normal multi document...

Figure 1: Stages in a Multi Document Summarizer

The focus of this paper is only on extractive summarization; henceforth the term summarization/summarizer implies sentence extractive multi document summarization. Our summarizer has 4 major stages as shown in Figure 1, ...

Table 2: Average ROUGE-2, ROUGE-SU4 recall scores for TAC 2009, cluster B

Modeling novelty and feature combination using support vector regression for update summarization

by Praveen Bysani

2011, 7th International Conference on …
