
Semantic Similarity Calculation

20 papers
54 followers
About this topic
Semantic similarity calculation is the process of quantifying the degree of similarity in meaning between two or more linguistic entities, such as words, phrases, or texts, using various computational methods and models. This field integrates concepts from linguistics, computer science, and artificial intelligence to enhance natural language processing applications.
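To make the definition above concrete, the toy sketch below (not taken from any of the papers indexed here) scores sentence pairs with plain token overlap. Because the second pair shares meaning but almost no vocabulary, the naive score collapses to zero, which is the gap the knowledge-based and corpus-based methods surveyed below aim to close.

```python
# Toy baseline: Jaccard overlap of token sets. Purely lexical, so paraphrases
# with disjoint vocabulary score near zero, motivating semantic measures.
def jaccard_similarity(text_a: str, text_b: str) -> float:
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

print(jaccard_similarity("a quick glance at the report", "a quick look at the report"))  # ~0.71
print(jaccard_similarity("physicians treat patients", "doctors care for the sick"))      # 0.0
```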

Key research themes

1. How can ontology and lexical taxonomy structures improve semantic similarity and relatedness measurement?

This research area focuses on exploiting structured knowledge bases, such as WordNet and domain-specific ontologies, to calculate semantic similarity and relatedness. These methods leverage hierarchical relationships (hypernymy/hyponymy), synonyms, and sometimes meronymy to compute similarity measures that reflect human-like semantic closeness. The importance lies in achieving interpretable, knowledge-driven similarity metrics that outperform purely corpus-based methods in precision and enable applications such as ontology matching, information retrieval, and word sense disambiguation.

Key finding: Proposed novel edge-counting search algorithms (BDLS and UBFS) incorporating syn/antonym, hyper/hyponym, and hol/meronym links in the WordNet taxonomy with differentiated weights, achieving high correlation (0.921) with human...
Key finding: Presented SemSimp, a parametric semantic similarity method leveraging information content and weighted ontologies derived from both digital resource datasets and ontology structure; extensive evaluation shows it outperforms...
Key finding: Introduced an information content-based approach combining corpus statistics with WordNet's taxonomy for semantic similarity in information retrieval; demonstrated that incorporating the information content of the lowest...
Key finding: Provided a comprehensive review distinguishing knowledge-based and distributional methods for computing semantic similarity of words and word senses; emphasized the importance of knowledge bases like WordNet to represent...
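As a concrete illustration of the taxonomy-based measures this theme describes, the sketch below uses NLTK's WordNet interface to score a word pair with a path-based measure (Wu-Palmer) and an information-content measure (Resnik). NLTK and the Brown-corpus IC file are assumptions of this sketch; it is not a reimplementation of BDLS, UBFS, SemSimp, or any other specific method cited above.

```python
# Minimal sketch of taxonomy-based similarity using NLTK's WordNet interface.
# Assumes the 'wordnet' and 'wordnet_ic' corpora have been downloaded, e.g.
# nltk.download('wordnet'); nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Information-content statistics estimated from the Brown corpus.
brown_ic = wordnet_ic.ic("ic-brown.dat")

def taxonomy_similarity(word1: str, word2: str):
    """Return (Wu-Palmer, Resnik) scores over the best-matching noun-sense pair."""
    best_wup, best_res = 0.0, 0.0
    for s1 in wn.synsets(word1, pos=wn.NOUN):
        for s2 in wn.synsets(word2, pos=wn.NOUN):
            wup = s1.wup_similarity(s2) or 0.0     # depth-based, uses the taxonomy only
            res = s1.res_similarity(s2, brown_ic)  # IC of the lowest common subsumer
            best_wup = max(best_wup, wup)
            best_res = max(best_res, res)
    return best_wup, best_res

print(taxonomy_similarity("car", "automobile"))  # near-synonyms: high scores
print(taxonomy_similarity("car", "banana"))      # unrelated: low scores
```

Edge-counting variants such as NLTK's path_similarity or lch_similarity can be swapped into the same loop to approximate other measures from this family.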

2. What corpus-based and distributional semantic models best capture semantic textual similarity in practical applications?

This line of research investigates methods that use statistical information from large corpora and distributional semantics to compute semantic similarity of words, sentences, or documents. These approaches rely on co-occurrence patterns, word embeddings, and vector space models to model meaning based on context and usage frequencies. They aim to deliver scalable, domain-independent solutions often used in natural language processing tasks such as semantic textual similarity, document clustering, and short text similarity.

Key finding: Surveyed semantic textual similarity approaches spanning topological (WordNet-based), statistical, and string-based methods; proposed a novel sentence similarity method integrating WordNet synsets with uni-gram language...
Key finding: Compared three semantic similarity methods (cosine similarity with tf-idf vectors, cosine similarity with word embeddings, and soft cosine similarity with word embeddings) for short news text; found that cosine similarity using...
Key finding: Applied distributional vector space models including Random Indexing and Latent Semantic Analysis to semantic textual similarity tasks, demonstrating consistent outperformance over baseline metrics; additionally introduced...
Key finding: Evaluated semantic similarity models within constrained and dynamic IoT/MEC/5G environments, showing that a distributional profile-based semantic model achieved competitive results compared to state-of-the-art corpus-based...
Key finding: Developed a semantic relatedness measure leveraging the Web as a knowledge source through search engine frequency data; demonstrated domain-independence and universality by outperforming traditional lexical-resource-bound...
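To ground this theme, here is a minimal baseline of the kind compared in the findings above: cosine similarity over tf-idf vectors. scikit-learn is an assumption of this sketch rather than a toolkit used by the cited papers, and the three sentences are toy data.

```python
# Minimal corpus-based textual similarity baseline: cosine over tf-idf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The central bank raised interest rates again this quarter.",
    "Interest rates were increased by the central bank once more.",
    "The football team won the championship after a late goal.",
]

# Fit a tf-idf model on the toy corpus and vectorize every document.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
tfidf = vectorizer.fit_transform(documents)

# Pairwise cosine similarities; documents 0 and 1 should score highest.
scores = cosine_similarity(tfidf)
print(scores.round(3))
```

Swapping the tf-idf vectors for averaged word embeddings, or using a soft-cosine kernel over pairwise embedding similarities, yields the other two variants mentioned in the comparison above.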

3. Can lexico-syntactic pattern-based and hybrid lexical-corpus methods provide effective semantic similarity without reliance on hand-crafted knowledge bases?

This theme explores semantic similarity measures derived from automatically harvested lexical patterns and statistical co-occurrence, often implemented via pattern extraction or web-based statistics. The goal is to achieve wide coverage and reasonable precision without depending on curated resources like WordNet, which have limited domain coverage. These methods facilitate scalable semantic similarity computation applicable to named entity similarity, relation extraction, and semantic search.

Key finding: Proposed PatternSim, a corpus-based semantic similarity measure that exploits a rich set of lexico-syntactic finite-state transducer patterns to extract semantic relations from large corpora; achieved correlations up to 0.739...
Key finding: Developed an automatic method combining web search engine page counts with a novel pattern extraction and clustering algorithm to compute word semantic similarity; integration with support vector machines optimized the...
Key finding: Employed a supervised regression model combining lexical, syntactic, and semantic metrics such as named entity preservation and predicate-argument alignments to predict sentence-level semantic similarity; demonstrated that...
Key finding: Adapted a textual entailment system to compute graded semantic textual similarity by combining multiple WordNet-based word-to-word similarity measures aggregated at sentence level; results indicate the potential of...
Key finding: Introduced a novel approach leveraging historical digitized book corpora to compute semantic similarity between words by statistically comparing their occurrence patterns over specific historical windows; preliminary findings...
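The sketch below illustrates the two ingredients this theme combines: co-occurrence statistics expressed as page-count measures and a lexico-syntactic pattern for harvesting relations. The page counts are placeholder numbers (no search-engine API is called), and the single regular expression is a toy stand-in for the finite-state-transducer patterns of systems such as PatternSim.

```python
# Sketch of web-count-based similarity measures plus a toy lexico-syntactic pattern.
# All counts below are placeholders, not real search-engine results.
import math
import re

def web_jaccard(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard coefficient estimated from page counts for P, Q, and 'P AND Q'."""
    denom = count_p + count_q - count_pq
    return count_pq / denom if denom > 0 else 0.0

def web_pmi(count_p: int, count_q: int, count_pq: int, total_pages: int) -> float:
    """Pointwise mutual information estimated from page counts."""
    if min(count_p, count_q, count_pq) == 0:
        return 0.0
    return math.log2((count_pq / total_pages) /
                     ((count_p / total_pages) * (count_q / total_pages)))

# Placeholder counts for a word pair such as ("car", "automobile").
print(web_jaccard(4_000_000, 1_500_000, 900_000))
print(web_pmi(4_000_000, 1_500_000, 900_000, 10_000_000_000))

# A toy "X such as Y" pattern applied to a text snippet, in the spirit of
# pattern-based relation extraction.
pattern = re.compile(r"(\w+) such as (\w+)")
snippet = "vehicles such as cars are taxed differently from bicycles"
print(pattern.findall(snippet))  # [('vehicles', 'cars')]
```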

All papers in Semantic Similarity Calculation

Semantic similarity measurement between words is a tedious task in web mining, information extraction and natural language processing. The semantic similarity measurement between entities is required in Web mining applications such as...
by Qin Lu
Statistics-based collocation extraction approaches suffer from (1) a low precision rate, because high co-occurrence bi-grams may be syntactically unrelated and are thus not true collocations; and (2) a low recall rate, because some true...
A recommender system is a subclass of information filtering system. It identifies similarity among users or items and can be used as an information filtering tool in online social networks. Collaborative filtering recommendations are based on...
We describe how to benefit from broad cultural trends through the quantitative analysis of a vast digital book collection representing the digested history of humanity. Our research has revealed that appropriately comparing...
This paper identifies the factors that have an impact on mobile recommender systems. Recommender systems have become a technology that has been widely used by various online applications in situations where there is an information...
Semantic relation is an important concept in information science. Nowadays it is widely used in the semantic web. This paper aims to present a measure to automatically determine the semantic relation between words using the web as a knowledge source....
Despite challenges like concept drift or temporal dynamics, recommender systems (RS) have grown in popularity due to their usefulness in meeting customers' needs by helping them find things they might like based on past purchases and interests. Despite...
Measuring similarity between words using a search engine based on page counts alone is a challenging task. Search engines consider a document as a bag of words, ignoring the position of words in a document. In order to measure semantic...
Semantic similarity is a central concept that extends across numerous fields such as artificial intelligence, natural language processing, cognitive science and psychology. Accurate measurement of semantic similarity between words is...
Matrix-factorization (MF) based models have become popular for building collaborative filtering (CF) recommender systems due to their high accuracy and scalability. Most current matrix-factorization models do not have acceptable...
Computing the textual similarity between terms (or short text expressions) that have the same meaning but which are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional...
Collaborative filtering is one of the most used approaches for providing recommendations in various online environments. Even though collaborative recommendation methods have been widely utilized due to their simplicity and ease of use,...
Extracting a subset of a given OWL ontology that captures all the ontology's knowledge about a specified set of terms is a well-understood task. This task can be based, for instance, on locality-based modules (LBMs). These come in two...
Description-logic-based languages have become the standard representation scheme for ontologies. They formalize domain knowledge using interrelated concepts contained in terminologies. The manual definition of terminologies is an...
Recommender systems are typically provided as Web 2.0 services and are part of the range of applications that support large-scale social networks, enabling on-line recommendations to be made based on the use of networked...
Recommender systems are efficient and widely used tools that address the information overload problem, providing users with the most appropriate content by considering their personal preferences (mostly ratings). In addition to...
Clustering is one of the successful model-based collaborative filtering techniques that deals with the problem of sparsity and provides quality recommendations. In the proposed work, the fuzzy c-means clustering technique is...
Estimating word relatedness is essential in natural language processing (NLP) and in many other related areas. Corpus-based word relatedness has its advantages over knowledge-based supervised measures. There are many corpus-based...
The information available on the web is increasing daily. Searching the web is difficult because of the huge volume of data, and a drawback of keyword search is that it simply mines data based on the keyword given...
Collaborative filtering (CF) is the personalized recommendation algorithm most widely used in e-commerce. CF still needs to be improved so that it can make adequate recommendations and solve problems such as...
The rapid growth of Internet technologies and the availability of web tools have created an opportunity to develop a robust and user-friendly web service model for medical care, and it demands urgent solutions as the uncertainty of disease spread...
Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute...
The proliferation of the Internet has made people rely on virtual recommendations. Recommender systems help by giving important recommendations. Collaborative filtering is the most successful and widely used approach in designing...
A recommendation system is a subclass of information filtering system. It identifies similarity among users or items and can be used as an information filtering tool in online social networks. Collaborative filtering recommendations are based...
Semantic similarity between words is fundamental to various fields such as Cognitive Science, Artificial Intelligence, Natural Language Processing and Information Retrieval. According to Baeza-Yates and Neto [2], an Information Retrieval...
Semantic similarity plays a significant role in the areas of Web mining, Information Retrieval, NLP and Text mining. Even though it is exploited in various applications, accurately measuring semantic similarity still remains a challenging...
Recommender Systems (RSs) are software tools and techniques that are used to produce recommendations for the users of a certain application in such a way that the recommendations generated are likely to be liked by the users. Popular...
Over the past few decades, various recommendation system paradigms have been developed for both research and industrial purposes to satisfy the needs and preferences of users when they deal with enormous data. The collaborative filtering...
Finding relevant scholarly papers is an important task for researchers. Such a literature search involves identifying drawbacks in existing works and proposing new approaches that address them. However, the growing number of scientific...
Semantic similarity measures play vital roles in information retrieval, natural language processing and paraphrase detection. With the growing number of plagiarism cases in both the commercial and research communities, designing efficient tools and...
In this paper, we propose two exponential similarity measures for collaborative filtering in recommender systems. The proposed similarity measures are used to estimate the distance between two users or items. Furthermore, an algorithm is...
In this paper, we build a hybrid Web-based metric for computing semantic relatedness between words. The method exploits page counts, titles, snippets and URLs returned by a Web search engine. Our technique uses traditional information...