Academia.edu

Sentence Similarity

16 papers · 0 followers

About this topic
Sentence similarity is a computational linguistics task that measures the degree of semantic equivalence between two sentences. It involves analyzing syntactic structures, word meanings, and contextual usage to quantify how closely related the sentences are in terms of meaning, often utilizing algorithms and models from natural language processing.
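The definition above can be made concrete with a minimal sketch: the simplest way to quantify how closely two sentences align is the cosine between their bag-of-words count vectors. This is a lexical baseline only — it ignores the syntactic structure and contextual usage the definition mentions — and the function name and whitespace tokenization are illustrative choices, not from any specific paper here.

```python
import math
from collections import Counter

def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    # Dot product only over words the two sentences share.
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

Identical sentences score 1.0 and sentences with no shared words score 0.0; everything the research themes below add (ontologies, embeddings, syntax, word order) is an attempt to do better than this purely lexical signal.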

Key research themes

1. How do domain-specific string-based and ontology-informed methods improve biomedical sentence similarity measurement with reproducibility?

This research theme centers on the development and evaluation of sentence similarity methods tailored to the biomedical domain, emphasizing the importance of reproducible experimental setups. Due to the highly specialized vocabulary, complex syntactic structures, and abundant acronyms in biomedical texts, conventional sentence similarity models often underperform. The theme addresses the integration of string-based techniques and ontology-based semantic methods, the impact of preprocessing stages and Named Entity Recognition (NER) tools on method performance, and the establishment of reproducible resources and protocols to enhance experimental rigor and comparability.

Key finding: Introduces LiBlock, a novel aggregated string-based sentence similarity measure that significantly outperforms all evaluated state-of-the-art machine learning models and most ontology-based methods on multiple biomedical...
Key finding: Proposes a hybrid similarity measure combining word embeddings with named-entity based semantic similarity, addressing semantic variation and noise in short biomedical texts. The approach achieves enhanced performance in...
Key finding: Demonstrates that weighting sentence similarity calculations by noun phrase (NP) importance, as opposed to standard term frequency, significantly improves semantic similarity measures applicable to text categorization and...
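The internals of LiBlock are not given in these summaries, but the string-based family it belongs to builds on simple token-overlap statistics. A hypothetical sketch of one such building block, the Dice coefficient over token sets (the function name and lowercasing step are illustrative, not LiBlock's actual formulation):

```python
def dice_similarity(s1: str, s2: str) -> float:
    """Dice coefficient over token sets: 2*|A ∩ B| / (|A| + |B|)."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are trivially identical
    return 2 * len(a & b) / (len(a) + len(b))
```

In the biomedical setting described above, the preprocessing and NER stages matter precisely because measures like this depend entirely on how acronyms and multi-word entity mentions get tokenized before overlap is counted.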

2. What are the impacts of lexical, syntactic, and semantic features, combined with vector-based distributional representations, on general domain sentence similarity?

This theme investigates the multifaceted role of lexical overlap, syntactic structure, and semantic frame alignment in modeling sentence similarity within general or cross-domain corpora. It explores supervised machine learning integration of these heterogeneous features, the utilization of distributional semantic models such as Random Indexing and Latent Semantic Analysis, and the embedding of syntactic and semantic information directly into vector representations (e.g., via vector permutations or recursive autoencoders). The research seeks to identify complementary strengths of diverse features to improve semantic textual similarity estimates beyond lexical matching alone.

Key finding: Employs a supervised regression model combining lexical metrics (word overlap and cosine similarity), syntactic similarity via BLEU scores over base-phrases, and semantic similarity based on named entity preservation and...
Key finding: Utilizes distributional semantic spaces constructed via Random Indexing, Latent Semantic Analysis, and an innovative vector permutation method to inject syntactic information into representations. Results show that combining...
Key finding: Compares multiple word embedding methodologies — recursive autoencoders, eigenword spectral methods, and selector generalizations — to generate word-level similarities that are aggregated at the sentence level through...
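The supervised feature-combination idea running through this theme can be sketched minimally: extract several heterogeneous similarity features per sentence pair, then combine them with learned weights. The weights below are hypothetical stand-ins for a trained regression model's coefficients, and the two features (Jaccard overlap and count-vector cosine) are only the simplest representatives of the lexical/syntactic/semantic feature families described above.

```python
import math
from collections import Counter

def overlap_feature(s1: str, s2: str) -> float:
    """Jaccard overlap of token sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(s1.split()), set(s2.split())
    return len(a & b) / max(len(a | b), 1)

def cosine_feature(s1: str, s2: str) -> float:
    """Cosine similarity of bag-of-words count vectors."""
    v1, v2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical weights standing in for coefficients a regression model would learn.
WEIGHTS = {"overlap": 0.5, "cosine": 0.5}

def similarity_score(s1: str, s2: str) -> float:
    """Linear combination of per-pair features, as in a supervised regressor."""
    feats = {"overlap": overlap_feature(s1, s2), "cosine": cosine_feature(s1, s2)}
    return sum(WEIGHTS[k] * v for k, v in feats.items())
```

The point of the supervised setup is exactly that the weights need not be hand-set: complementary features (lexical, syntactic via BLEU over base-phrases, semantic via named entities) each contribute what the others miss.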

3. How do hybrid and deep learning approaches integrating lexical relationships and sentence structure advance sentence similarity estimation?

This research focus explores the development of hybrid methodologies that combine deep learning models (e.g., CNNs, RNNs, BERT) with lexical knowledge-based techniques (e.g., WordNet) to measure semantic similarity between sentences. It emphasizes the need to incorporate lexical relationships, syntactic structures, word order, and semantic nuances such as determiners and negations. The goal is to improve similarity measures by capturing compositional semantic phenomena beyond simple lexical overlap, with particular attention to datasets and tasks where paraphrase detection and nuanced semantic differences are critical.

Key finding: Proposes a hybrid sentence similarity measurement method combining deep learning architectures (CNN, RNN, BERT) with lexical relationship analysis based on WordNet, integrating cosine similarity on embedding vectors and...
Key finding: Introduces a challenging evaluation dataset emphasizing semantic differences stemming from word order swaps and determiner replacements in paraphrase detection. Results reveal that compositional distributional semantics...
Key finding: Demonstrates that raw similarity scores between sentence pairs can be misleading for textual entailment prediction and proposes a supervised method that jointly learns non-linear transformations of multiple lexical similarity...
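A recurring hybrid pattern in this theme is aggregating word-level similarities (from WordNet or embeddings) into a sentence-level score by best-match alignment. The sketch below uses a tiny hypothetical word-similarity table as a stand-in for those knowledge-based or embedding-based word scores; the table entries, function names, and symmetric-average aggregation are illustrative assumptions, not a specific paper's method.

```python
# Hypothetical word-level similarities standing in for WordNet- or
# embedding-based scores; identical words score 1.0, unlisted pairs 0.0.
WORD_SIM = {("car", "automobile"): 0.9, ("fast", "quick"): 0.8}

def word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    return WORD_SIM.get((w1, w2), WORD_SIM.get((w2, w1), 0.0))

def aligned_similarity(s1: str, s2: str) -> float:
    """For each word, take its best match in the other sentence, then
    average the two directions (a common aggregation scheme)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(word_sim(w, v) for v in t2) for w in t1) / len(t1)
    best2 = sum(max(word_sim(w, v) for v in t1) for w in t2) / len(t2)
    return (best1 + best2) / 2
```

Note that this alignment is order-insensitive, which is precisely the weakness the word-order-swap evaluation dataset above is designed to expose.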

All papers in Sentence Similarity

Most high school students are able to write arguments. However, most students are still unable to develop complex writing. The purpose of this research was to investigate the students' argumentative writing which displays various... more
by Teti Sobari and 1 more
The need for effective text similarity measures has led many previous studies to propose different text weighting schemes to enhance the overall performance of sentence similarity measures. Term Frequency Inverse Document Frequency (TF... more
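The TF-IDF weighting scheme this abstract refers to can be sketched directly: a term's weight is its frequency within the sentence times the log inverse of how many documents in the collection contain it, so common function words are down-weighted. The function name, tokenization, and exact idf variant (log(N/df) with no smoothing) are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf_weights(sentence: str, corpus: list) -> dict:
    """TF-IDF weights for each term of a sentence against a small corpus.
    tf = count / sentence length; idf = log(N / document frequency)."""
    tokens = sentence.lower().split()
    tf = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc.lower().split())
        idf = math.log(n_docs / df) if df else 0.0
        weights[term] = (count / len(tokens)) * idf
    return weights
```

A term appearing in every document gets idf = log(1) = 0 and therefore zero weight, which is the behavior alternative weighting schemes in this line of work try to refine.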
The problem of measuring sentence similarity is an essential issue in the natural language processing area. It is necessary to measure the similarity between sentences accurately. Sentence similarity measuring is the task of finding... more
The current study investigates the degree to which the lexical properties of students’ essays can inform stealth assessments of their vocabulary knowledge. In particular, we used indices calculated with the natural language processing... more
Feature Maximization is a feature selection method that deals efficiently with textual data: to design systems that are altogether language-agnostic, parameter-free and do not require additional corpora to function. We propose to evaluate... more
In this report, we present details about the participation of IIIT Hyderabad in Guided Summarization and Knowledge Base Population tracks at TAC 2011. We have enhanced our summarization system with knowledge-based measures. Wikipedia... more
This article describes a method used to calculate the similarity between short English texts, specifically of sentence length. The described algorithm calculates semantic and word order similarities of two sentences. In order to do so, it... more
Sentence similarity is considered the basis of many natural language tasks such as information retrieval, question answering and text summarization. The semantic meaning between compared text fragments is based on the words' semantic... more
The Document Understanding Conference (DUC) 2005 evaluation had a single user-oriented, question-focused summarization task, which was to synthesize from a set of 25-50 documents a well-organized, fluent answer to a complex question. The... more
The focus of DUC 2005 was on developing new evaluation methods that take into account variation in content in human-authored summaries. Therefore, DUC 2005 had a single user-oriented, question-focused summarization task that allowed the... more
This study aimed to utilize sentiment and sentence similarity analyses, two Natural Language Processing techniques, to see if and how well they could predict L2 Writing Performance in integrated and independent task conditions. The data... more
Automatically summarizing a document requires conveying the important points of a large document in only a few sentences. Extractive strategies for summarization are based on selecting the most important sentences from the input... more
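The extractive strategy this abstract describes — score sentences, keep the most important ones — can be sketched with the simplest possible scorer: average corpus frequency of a sentence's words. The function name, the top-k cutoff, and frequency-based scoring are illustrative assumptions; real systems layer much richer features on this skeleton.

```python
from collections import Counter

def extract_summary(sentences: list, k: int = 2) -> list:
    """Score each sentence by the average corpus frequency of its words
    and keep the top-k, preserving the original sentence order."""
    freqs = Counter(w for s in sentences for w in s.lower().split())

    def score(s):
        words = s.lower().split()
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Sentences built from the document's most repeated content words rise to the top, which is the basic intuition behind frequency-driven extractive summarizers discussed throughout these papers.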
This paper presents the LIA summarization systems participating in DUC 2007. This is the second participation of the LIA at DUC and we will discuss our systems in both main and update tasks. The system proposed for the main task is the... more
The importance of text summarization grows rapidly as the amount of information increases exponentially. This paper presents a new hybrid summarization technique that combines statistical properties of documents with Farsi linguistic... more
In this paper, we propose a supervised model for ranking word importance that incorporates a rich set of features. Our model is superior to prior approaches for identifying words used in human summaries. Moreover we show that an... more
The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning... more
This paper presents a User-Oriented Multi-Document Update Summarization system based on a maximization-minimization approach. Our system relies on two main concepts. The first one is the cross summaries sentence redundancy removal which... more
The work presents an update summarization system that uses a combination of two techniques to generate extractive summaries which focus on new but relevant information. A fast maximization-minimization approach is used to select sentences... more
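The maximization-minimization idea described in these update-summarization abstracts — maximize relevance to the query while minimizing redundancy with sentences already delivered — can be sketched as a greedy loop. The function names, the λ trade-off parameter, and the use of bag-of-words cosine as the underlying similarity are hypothetical stand-ins for the actual system's components.

```python
import math
from collections import Counter

def _cos(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences."""
    v1, v2 = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def select_update_sentences(candidates, query, previous, k=2, lam=0.7):
    """Greedy maximization-minimization: pick sentences that maximize query
    relevance while minimizing redundancy with prior summaries and picks."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mm_score(s):
            redundancy = max((_cos(s, p) for p in previous + selected), default=0.0)
            return lam * _cos(s, query) - (1 - lam) * redundancy
        best = max(pool, key=mm_score)
        selected.append(best)
        pool.remove(best)
    return selected
```

A candidate that repeats an earlier summary sentence is penalized by its high redundancy term, so the loop prefers sentences carrying new but still query-relevant information — the "novel yet relevant" behavior these systems aim for.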
Text Summarization is very effective in relevant assessment tasks. The Multiple Document Summarizer presents a novel approach to select sentences from documents according to several heuristic features. Summaries are generated modeling the... more
We present a fast query-based multi-document summarizer called FastSum based solely on word-frequency features of clusters, documents and topics. Summary sentences are ranked by a regression SVM. The summarizer does not use any expensive... more
We show that by making use of information common to document sets belonging to a common category, we can improve the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in... more
A progressive summary helps a user to monitor changes in evolving news topics over a period of time. Detecting novel information is the essential part of progressive summarization that differentiates it from normal multi document... more