Academia.edu

Similarity Measures

370 papers
251 followers
About this topic
Similarity measures are quantitative metrics used to assess the degree of similarity between two or more entities, such as objects, datasets, or patterns. These measures are fundamental in various fields, including statistics, machine learning, and information retrieval, facilitating tasks like clustering, classification, and recommendation.

Key research themes

1. How can component-wise and higher-order dissimilarity measures enhance similarity assessments in heterogeneous and complex data spaces?

This research area focuses on devising and analyzing dissimilarity/similarity measures tailored for complex real-world objects represented by heterogeneous, multi-component data. Traditional metric or Euclidean assumptions often fail in such unconventional spaces, where data comprise mixed types (numerical, categorical, time series, graphs). Component-wise dissimilarities allow each heterogeneous component to be compared using a domain-appropriate sub-measure, often combined through weighted convex combinations. Theoretical and experimental studies explore how these weighted measures affect metric properties and Euclidean embeddability. Further, the concept of meta-distances introduces higher-order similarities that consider the relative similarities of objects with respect to the entire dataset, thereby capturing richer relational patterns beyond pairwise comparisons. Such measures prove essential for improving pattern recognition and local classification performance in complex domains.
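As a minimal sketch of the idea (the sub-measures, field names, and weights below are illustrative assumptions, not taken from any of the cited papers), a component-wise dissimilarity over mixed numeric/categorical records can be written as a weighted convex combination of per-component sub-measures:

```python
def numeric_diss(a, b):
    # domain-appropriate sub-measure for a numeric component
    return abs(a - b)

def categorical_diss(a, b):
    # simple overlap sub-measure for a categorical component
    return 0.0 if a == b else 1.0

def combined_diss(x, y, weights):
    # x, y: (numeric, categorical) tuples; weights form a convex combination
    # (non-negative, summing to 1), one weight per component
    subs = [numeric_diss(x[0], y[0]), categorical_diss(x[1], y[1])]
    return sum(w * d for w, d in zip(weights, subs))
```

Note that even when every sub-measure is metric, the weighted combination over heterogeneous components need not yield a Euclidean-embeddable dissimilarity matrix, which is exactly the phenomenon the first key finding below examines.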

Key finding: The paper formalizes component-wise dissimilarity measures to accommodate heterogeneous real-world data described by mixed features and demonstrates that such dissimilarities often produce non-Euclidean matrices, limiting...
Key finding: Introducing meta-distances constructed from primary classical distances by incorporating an adjunct dissimilarity factor encoding higher-order similarity relationships among all objects in a dataset, this study demonstrates...
Key finding: The 'brsim' R package operationalizes the Brainerd-Robinson similarity coefficient designed for compositional data, facilitating significance testing through permutation methods and hierarchical clustering analyses. Its...
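The Brainerd-Robinson coefficient referenced above is simple to state: for two compositional profiles expressed as percentages, it equals 200 minus the sum of absolute percentage differences, so 200 means identical composition and 0 means completely disjoint. The cited 'brsim' package is in R; the following Python sketch only illustrates the coefficient itself, not the package's permutation testing or clustering features:

```python
def brainerd_robinson(a, b):
    # a, b: raw counts per category; convert each profile to percentages,
    # then subtract the summed absolute differences from the maximum of 200
    pa = [100 * x / sum(a) for x in a]
    pb = [100 * x / sum(b) for x in b]
    return 200 - sum(abs(p - q) for p, q in zip(pa, pb))
```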

2. How can semantic and fuzzy similarity measures be parametrically adapted and combined to improve conceptual reasoning and decision-making?

This theme covers theoretical and applied advancements in parametrically flexible similarity measures designed for semantic resources and fuzzy set representations. Semantic similarity methods leverage information content and ontology-based taxonomies with weights informed by either resource frequency or ontology structure, allowing improved assessment of concept relatedness capturing both statistical and domain-specific knowledge. Similarly, in fuzzy logic, combining distance and similarity measures into unified parametric forms addresses challenges in fuzzy set comparison, avoiding ambiguous interpretations when sets are disjoint or partially overlapping. Parametric adjustments and combinations enable tailoring similarity measures to better reflect nuanced semantic or fuzzy relationships, thereby enhancing applications such as semantic retrieval, multi-attribute decision making, and reasoning under uncertainty.
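The idea of fusing similarity and distance into one parametric form can be sketched as follows. The particular sub-measures here (a min/max overlap similarity and a normalized Hamming distance over finite fuzzy sets) and the OWA weights are illustrative assumptions, not the exact formulation of any paper in this theme:

```python
def owa(values, weights):
    # ordered weighted averaging: weights are applied to values sorted descending
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

def fuzzy_overlap_sim(A, B):
    # min/max overlap similarity between membership vectors of finite fuzzy sets
    return sum(min(a, b) for a, b in zip(A, B)) / sum(max(a, b) for a, b in zip(A, B))

def fuzzy_hamming_dist(A, B):
    # normalized Hamming distance between membership vectors
    return sum(abs(a - b) for a, b in zip(A, B)) / len(A)

def combined_measure(A, B, weights=(0.6, 0.4)):
    # fuse a similarity and the complement of a distance through an OWA operator;
    # adjusting the weights tailors the measure to the application
    return owa([fuzzy_overlap_sim(A, B), 1 - fuzzy_hamming_dist(A, B)], weights)
```

Because the two fused values need not agree when sets partially overlap, the combined form avoids relying on a single measure whose interpretation becomes ambiguous for disjoint or partially overlapping sets.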

Key finding: The paper presents SemSim_p, a parametric semantic similarity method that improves upon its predecessor by adjusting ontology concept weights and normalization factors. Experiments using the ACM Computing Classification...
Key finding: This work formulates a novel combined measure of similarity and distance between fuzzy sets using an ordered weighted averaging (OWA) operator. It overcomes limitations when similarity or distance measures are used...
Key finding: Addressing limitations in existing picture fuzzy set similarity measures, the paper introduces a parametric similarity measure with three adjustable parameters (m1, m2, m3) enabling flexible decision-making styles. Analytical...

3. What novel similarity measures improve performance and interpretability in collaborative filtering and image similarity tasks?

This research cluster investigates new or hybrid similarity metrics tailored to enhance the effectiveness of collaborative filtering (CF) recommender systems and image similarity assessment. For CF, combining classical numerical similarity measures (e.g., cosine, Pearson correlation) with Jaccard similarity—which emphasizes presence/absence of ratings rather than rating magnitude—has been shown to produce superior neighbor identification and recommendation accuracy. In image similarity, beyond traditional pixel-wise metrics (PSNR, SSIM), novel approaches leverage fuzzy set solutions derived via max–min and min–max compositions or convolutional neural networks (CNNs) to capture nuanced perceptual similarities and increase robustness to noise. These advances address key challenges including sparsity, noise sensitivity, and semantic expressiveness, advancing both theory and practical applications in recommendation systems and image quality assessment.
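The hybrid idea described above can be sketched directly: Jaccard captures which items two users both rated, cosine captures how similarly they rated them, and a product fuses the two. The product is one illustrative fusion choice made for this sketch; it is not claimed to be the exact combination used in the cited papers:

```python
import math

def cosine(u, v):
    # cosine similarity over co-rated items; u, v are dicts: item -> rating
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def jaccard(u, v):
    # presence/absence overlap: ignores rating magnitudes entirely
    return len(set(u) & set(v)) / len(set(u) | set(v))

def hybrid_sim(u, v):
    # fuse structural overlap (Jaccard) with rating agreement (cosine)
    return jaccard(u, v) * cosine(u, v)
```

The multiplicative fusion penalizes neighbor pairs that agree on only a tiny co-rated set, which is precisely the sparsity failure mode of cosine or Pearson used alone.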

Key finding: Experiments on MovieLens and FilmTrust datasets demonstrate that hybrid similarity measures combining Jaccard similarity—which captures rating presence—and numerical measures like cosine and Pearson outperform any single...
Key finding: Extending prior findings, this paper empirically validates that fusing Jaccard similarity with classical numerical similarity metrics yields significant improvements in collaborative filtering performance. Rigorous testing on...
Key finding: The study proposes a novel fuzzy-based image similarity measure that computes similarity using the greatest and smallest fuzzy sets derived as symmetrical solutions of fuzzy relation equations for image blocks. Evaluation on...
Key finding: Introducing a deep learning-based approach, this paper develops a CNN model to assess similarity between UML class diagram images, thereby enabling automatic, objective evaluation of student diagrams in education. The model...
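For reference, PSNR, the traditional pixel-wise baseline that the fuzzy and CNN approaches above aim to improve on, is easy to state: it is the log-scaled ratio of the maximum possible pixel value to the mean squared error between two images. A minimal sketch for grayscale images flattened to equal-length pixel sequences:

```python
import math

def psnr(img1, img2, max_val=255.0):
    # peak signal-to-noise ratio between two equal-size grayscale images,
    # given as flat sequences of pixel intensities in [0, max_val]
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

Being purely pixel-wise, PSNR is blind to structure and perception, which is the main motivation for SSIM and the learned measures discussed in this theme.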

All papers in Similarity Measures

Computer-assisted consensus in medical imaging involves automatic comparison of morphological abnormalities observed by physicians in images. We built an ontology of morphological abnormalities in breast pathology to assist inter-observer...
At present, many experts in the field of information technology have designed and developed algorithms to solve stemming problems, especially in Arabic. However, among the many stemming analyses for Arabic, there is no standardization of a good...
Every textbook is built upon the foundation of key concepts. Books that contain concepts that share some common properties and are semantically related are more lucid and intelligible than those that contain many unrelated concepts. These...
measure between intuitionistic fuzzy sets and its application to
K-means clustering is a method of grouping data by looking for similarities between the attributes of data points; it can cope with high-dimensional data because of the simplicity of its algorithm. The disadvantage of the...
Clustering is one of the most widely used machine learning techniques in data processing. Clustering has a wide range of applications, including market research, pattern recognition, data analysis, and image processing, among others. The...
Memory is a complex phenomenon, and musical memory is especially interesting because it can involve so many facets: a visual image of the score, an aural recollection of the melody, the kinesthetic response of a performer, an analytical...
Clustering is widely used to explore and understand large collections of data. In this thesis, we introduce LIMBO, a scalable hierarchical categorical clustering algorithm based on the Information Bottleneck (IB) framework for quantifying...
Most ontology alignment tools use terminological techniques as the initial step and then apply structural techniques to refine the results. Since each terminological similarity measure considers some features of similarity,...
In this paper, the definition of intuitionistic fuzzy parameterised fuzzy soft sets (ifpfs-sets) is introduced along with their properties. Two operations on ifpfs-sets, namely union and intersection, are introduced. Also, some examples of these...
The goal of image or video quality assessment is to evaluate whether a distorted image or video is of good quality by quantifying the difference between the original and distorted images or videos. In this paper, to assess the visual quality...
Using the example of the 'h-related' publication dataset created for a previous study on the literature of Hirsch-type measures (Zhang et al. in J Informetr 5(3):583-593, 2011) and updated for the present paper, we attempt to study the...
Public policies concerned with the reduction of poverty increasingly rely on identifying the most deprived households with the use of statistical targeting techniques. Targeting methods aim to measure deprivation as accurately as possible...
Estimation of texture similarity is fundamental to many material recognition tasks. This study uses fine-grained human perceptual similarity ground-truth to provide a comprehensive evaluation of 51 texture feature sets. We conduct two...
As each day passes, the world's NT requirements increase due to growing population and technological advancements. Currently, traditional technologies are inadequate to support the requirement. It is vital to investigate...
The proposed approach consists in comparing and then ranking proximity measures in a topological context in order to select the best measure for carrying out a topological correspondence analysis. The similarity measures...
Proceedings, Seminar Nasional PESAT 2005, Auditorium Universitas Gunadarma, Jakarta, 23-24 August 2005, ISSN: 1858-2559. The Importance of the Role of Language in Computer-Based Information Interoperability Due to Semantic Diversity. I Wayan ...
Electronic news has become increasingly popular since the beginning of the internet's growth. Through the internet, electronic news is packaged in such a way that it can deliver up-to-date information to the public. However, this...
Abstract. Authorship analysis has become a decisive tool for the analysis of digital documents in the forensic sciences. We propose an Authorship Verification method based on analyzing the similarities...
Information is rising exponentially over the Internet. The World Wide Web has emerged as a treasure trove of knowledge and provides relevant information pertaining to any exclusive topic as per the individual's demand....
Recommender Systems (RSs) work as a personal agent for individuals who are not able to make decisions from the potentially overwhelming number of alternatives available on the World Wide Web (or simply Web). Neighborhood-based algorithms...
Reliable data and a robust conceptual framework are two necessary preconditions for anti-poverty measures to be effective and achieve their goal of bringing people out of poverty. Both preconditions are far from met in the case of Roma...
This paper is a joint effort between five institutions that introduces several novel similarity measures and combines them to carry out a multimodal segmentation evaluation. The new similarity measures proposed are based on the location...
The spell-checker approach is designed to validate and correct misspelled words by providing a list of alternative words that are more closely related to the erroneous one. Currently, English-language spell checkers are well...
Fuzzy risk analysis is widely used in risk assessment of components by linguistic terms. Fuzzy numbers are used to quantify the associated uncertainty. This study employs fuzzy risk analysis to evaluate processes for implementing...
Very expressive Description Logics in the SH family have worst-case complexity ranging from EXPTIME to double NEXPTIME. In spite of this, they are very popular with modellers and serve as the foundation of the Web Ontology Language (OWL),...
Plagiarism of digital documents is a serious problem in today's era. Plagiarism refers to the use of someone else's data, language, and writing without proper acknowledgment of the original source. Plagiarism can be of different types. This...
Collaborative filtering (CF), one of the most widely employed methodologies for recommender systems, has drawn undeniable attention due to its effectiveness and simplicity. Nevertheless, a few papers have been published on the CF-based...
CBIR (content-based image retrieval) is the process that focuses on providing efficient retrieval of digital images from a huge collection/database of images. Many researchers and PhD scholars are working on this topic, so...
In today's world of the internet, with a whole lot of e-documents, such as HTML pages and digital libraries, occupying a considerable amount of cyberspace, organizing these documents has become a practical need. Clustering is an important...
TOPSIS, developed in 1981 by Hwang and Yoon, is one of the known multi-criteria decision-making (MCDM) methods. In 2015, the group decision-making method based on TOPSIS under a fuzzy soft environment was defined and applied to a...
There are a number of challenging problems in Content Based Image Retrieval (CBIR), particularly concerning the structure of the image and the image database. Separating an image into its constituent parts is a major task in this area. In fact, an image...
This thesis makes an original contribution to knowledge in the field of data objects' comparison, where the objects are described by attributes of fuzzy or heterogeneous (numeric and symbolic) data types.
The minimum backward Fréchet distance (MBFD) problem is a natural optimization problem for the weak Fréchet distance, a variant of the well-known Fréchet distance. In this problem, a threshold ε and two polygonal curves, T1 and T2, are...
Document clustering is an unsupervised machine learning technique that organizes a large collection of documents into smaller, topic-homogeneous, meaningful sub-collections (clusters). Traditional document clustering approaches use...
Research Summary: We propose using text matching to measure the technological similarity between patents. Technology experts from different fields validate the new similarity measure and its improvement on measures based on the United...
Different fields such as linguistics, teaching, and computing have demonstrated special interest in the study of sign languages (SL). However, the processes of teaching and learning these languages become complex, since it is unusual to find...
Face recognition is a complex visual classification task which plays an important role in computer vision, image processing, and pattern recognition. SMWT is proposed to extract the features in images before using the PCA and histogram...
Several methods exist in the classification literature to quantify the similarity between two time series data sets. Applications of these methods range from the traditional Euclidean-type metric to the more advanced Dynamic Time Warping...
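Of the time-series measures this entry mentions, Dynamic Time Warping (DTW) is the standard step up from a Euclidean-type metric because it aligns sequences that are locally shifted or stretched in time. A minimal O(nm) dynamic-programming sketch:

```python
def dtw(s, t):
    # dynamic time warping distance between two numeric sequences:
    # D[i][j] holds the cheapest cumulative alignment cost of s[:i] and t[:j]
    n, m = len(s), len(t)
    INF = float('inf')
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # extend the best of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Unlike a Euclidean metric, DTW can score a sequence and its locally stretched copy as identical, which is why it dominates in time-series classification benchmarks.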
In this paper, we propose a new approach for checking spelling errors committed in the Arabic language. This approach is almost independent of the dictionary used, since we introduced the concept of morphological analysis into the...
Introduced is a new algorithm for the classification of numerical data using the theory of fuzzy soft sets, named the Fuzzy Soft Set Classifier (FSSC). The algorithm uses the fuzzy approach in the pre-processing stage to obtain features, and...