
dissimilarity measure

36 papers
1 follower
About this topic
A dissimilarity measure is a quantitative metric used to assess the degree of difference or divergence between two or more objects, data points, or sets. It is commonly employed in statistics, machine learning, and data analysis to facilitate clustering, classification, and comparison of datasets.
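To make the definition concrete, the sketch below builds a pairwise Euclidean dissimilarity matrix for a small invented data set and hands it to a standard hierarchical clustering step, mirroring the clustering use case mentioned above. This is a minimal sketch using standard SciPy calls; the data and the cluster count are assumptions.

```python
# Minimal sketch: a pairwise dissimilarity matrix (Euclidean here) feeding
# a standard clustering step. Data are invented for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
d = pdist(points, metric="euclidean")   # condensed pairwise dissimilarities
print(squareform(d))                    # full symmetric dissimilarity matrix
labels = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
print(labels)                           # recovers the two obvious clusters
```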

Key research themes

1. How can distance and divergence measures be unified and generalized to quantify dissimilarity between probability distributions?

This research theme focuses on the theoretical foundations and general frameworks for directed distances—often called divergences—used to measure dissimilarity between probability distributions in statistics and related fields such as machine learning and information theory. Unifying classical divergences like Kullback-Leibler, Pearson’s chi-square, and newer cumulative divergences helps in understanding statistical inference, goodness-of-fit, and model estimation. This line of work clarifies properties like non-negativity, reflexivity, and asymmetry of divergences and investigates continuous parameterizations covering known special cases.
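As a concrete point of reference for the divergences named above, the following sketch computes the Kullback-Leibler and Pearson chi-square divergences between two discrete distributions, illustrating non-negativity and the asymmetry of KL. It is a minimal illustration only, not code from any of the surveyed papers.

```python
# Minimal sketch: two classical directed distances between discrete
# probability distributions p and q.
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def pearson_chi_square(p, q):
    """Pearson's chi-square divergence between p and q; assumes q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))

p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
print(pearson_chi_square(p, q))                  # >= 0, zero only when p == q
```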

Key finding: Introduces a comprehensive general framework for various directed distances (divergences) between probability distributions, including density-based divergences like Kullback-Leibler and Pearson’s chi-square, as well as…
Key finding: Extends classical symmetric Csiszár divergences (f-divergences) to quantum states by optimizing over quantum measurements and state purifications, thus developing a new family of quantum distances measuring dissimilarity…
Key finding: Presents a systematic survey of total variation and related distances (e.g., Kolmogorov-Smirnov, Wasserstein, Hellinger) between prominent probability distributions including univariate and multivariate Gaussians, Poissons,…

2. What novel and application-specific dissimilarity and similarity measures improve clustering and classification performance for complex or uncertain data types?

This research area explores the design and evaluation of dissimilarity measures tailored for complex data types such as categorical data, fuzzy sets, neutrosophic sets, interval-valued data, and data with heterogeneous components. The goal is to enhance cluster quality, classification accuracy, and decision-making under uncertainty by capturing data-specific relationships beyond standard numeric distances. Methodological advances include learning-based adaptive dissimilarities for categorical data and refined metrics based on algebraic, geometric, or statistical principles for fuzzy and neutrosophic sets, often applied in pattern recognition, decision making, and medical diagnosis.
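For orientation, the sketch below contrasts the plain simple-matching dissimilarity used by classical k-Modes with a frequency-weighted variant in which disagreement on common values costs more than disagreement on rare ones. Both are textbook baselines on invented data, not the specific learning-based or global measures proposed in the papers below.

```python
# Minimal sketch: two baseline dissimilarities for categorical records.
from collections import Counter

def simple_matching(x, y):
    """Number of attributes on which two categorical records disagree."""
    return sum(a != b for a, b in zip(x, y))

def frequency_weighted(x, y, freqs):
    """Mismatch cost scaled by the relative frequency of the two values:
    disagreeing on rare values counts less than on common ones."""
    return sum((freqs[i][a] + freqs[i][b]) / 2.0
               for i, (a, b) in enumerate(zip(x, y)) if a != b)

data = [("red", "small"), ("red", "large"), ("blue", "small")]
# Per-attribute relative frequencies estimated from the data set itself.
freqs = [{v: c / len(data) for v, c in Counter(col).items()}
         for col in zip(*data)]
print(simple_matching(data[0], data[2]))            # 1 mismatching attribute
print(frequency_weighted(data[0], data[2], freqs))  # 0.5 after weighting
```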

Key finding: Proposes a novel Learning-Based Dissimilarity measure for categorical data, which estimates the dissimilarity between attribute values based on the confusion likelihood derived from models trained to predict attribute values…
Key finding: Analyzes component-wise dissimilarity measures suited for complex objects described by heterogeneous features (real numbers, categorical, graphs, etc.), which often generate non-metric spaces. The work discusses how combining…
Key finding: Identifies limitations in existing simple matching and frequency-based dissimilarity measures used by the popular k-Modes clustering for categorical data, and introduces a novel dissimilarity that incorporates global…
Key finding: Develops a new cosine similarity measure for Interval-Valued Neutrosophic Sets (IVNS) based on Bhattacharya’s distance, addressing the imprecise, incomplete, and inconsistent information representation common in real-world…
Key finding: Extends the classical Hausdorff distance to neutrosophic refined sets, allowing for repeated occurrences of truth, indeterminacy, and falsity membership degrees, and proposes corresponding similarity measures. The methods are…

3. How can distance and similarity measures be adapted and applied in functional and fuzzy data analysis contexts, and what are their properties and practical utilities?

This theme targets the theoretical development and application of distance measures specialized for fuzzy numbers, functional data, intuitionistic fuzzy sets, and related constructs. It includes the formulation of new fuzzy distance definitions, entropy measures derived from similarity concepts, and adaptations of standard distances for non-standard data types. The research emphasizes how these measures address challenges in uncertainty quantification, scale measurement, classification, testing equality of variability, and image similarity assessment.
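To ground the fuzzy-distance idea, the sketch below computes an α-cut-based distance between triangular fuzzy numbers by averaging the endpoint gaps of their cuts across membership levels. This illustrates the general construction only; the algorithms in the papers below differ in detail, and the parameterization here is an assumption.

```python
# Minimal sketch: an alpha-cut-based distance between triangular fuzzy
# numbers (a, b, c), where b is the peak of the membership function.
import numpy as np

def alpha_cut(tri, alpha):
    """Interval [left, right] of a triangular fuzzy number at level alpha."""
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)

def fuzzy_distance(t1, t2, n_levels=101):
    """Mean over alpha levels of the average endpoint gap of the two cuts."""
    gaps = []
    for alpha in np.linspace(0.0, 1.0, n_levels):
        l1, r1 = alpha_cut(t1, alpha)
        l2, r2 = alpha_cut(t2, alpha)
        gaps.append((abs(l1 - l2) + abs(r1 - r2)) / 2.0)
    return float(np.mean(gaps))

print(fuzzy_distance((1, 2, 3), (2, 3, 4)))  # 1.0 for a unit shift
```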

Key finding: Proposes a novel algorithm for computing distances between fuzzy numbers (especially triangular and trapezoidal forms) that generalizes to n-dimensional fuzzy spaces, addressing shortcomings of previous fuzzy distance…
Key finding: Introduces a novel similarity measure for intuitionistic fuzzy sets (IFS) and develops an advanced entropy measure based on this similarity via a new axiomatic approach. The proposed entropy effectively captures uncertainty…
Key finding: Develops an image similarity index leveraging the greatest and smallest fuzzy set solutions of fuzzy relation equations, applied to image blocks normalized to fuzzy relations. The measure outperforms traditional indices like…

All papers in dissimilarity measure

Biological Mass Spectrometry is used to analyse peptides and proteins. A mass spectrum generates a list of measured mass-to-charge ratios and intensities of ionised peptides, which is called a peak-list. In order to classify the…
In the field of Human-Computer Interaction (HCI), gesture recognition is becoming increasingly important as a mode of communication, in addition to the more common visual, aural and oral modes, and is of particular interest to designers…
TOPS diagrams are concise descriptions of the structural topology of proteins, and their comparison usually relies on a structural alignment of the corresponding vertex-ordered and vertex- and edge-labelled graphs. Such an approach…
Self-organizing maps (SOM) have been applied to analyze the similarities of chemical compounds and to select from a given pool of descriptors the smallest and most relevant subset needed to build robust QSAR models based on fuzzy ARTMAP. …
Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community…
Topographic mapping offers a very flexible tool to inspect large quantities of high-dimensional data in an intuitive way. Often, electronic data are inherently non-Euclidean, and modern data formats are connected to dedicated non-Euclidean…
Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that allow simultaneously clustering data and inferring their topological structure, such that additional features, for…
This document is provided as an electronic appendix to the article "A new dissimilarity measure for finding semantic structure in category fluency data with implications for understanding memory organization in schizophrenia". …
In this paper, we propose to classify medical images using dissimilarities computed between collections of regions of interest. The images are mapped into a dissimilarity space using an image dissimilarity measure, and a standard vector…
A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pairwise dissimilarity…
This paper focuses on how to reduce cheating and minimize errors while automatically grading paper-based multiple-choice questions (MCQ) by making the whole process relatively fast, less expensive, more credible, and fairer, especially…
Documents exist in different formats. When we have document images, in order to access some part, preferably all, of the information contained in those images, we have to deploy a document image analysis application. Document images can be…
Dominant color descriptor (DCD) is one of the color descriptors proposed by MPEG-7 that has been extensively used for image retrieval. Among the color descriptors, DCD describes the salient color distributions in an image or a region of…
This article presents the -distance, a family of distances between images recursively decomposed into segments and represented by multi-level feature vectors. Such a structure is a quad-, quin-, or nona-tree resulting from a fixed and…
This paper presents a tunable content-based music retrieval (CBMR) system suitable for the retrieval of music audio clips. The audio clips are represented as extracted feature vectors. The CBMR system is expert-tunable by altering the…
Relevance feedback (RF) is an iterative process that refines the retrievals by utilizing the user's feedback on previously retrieved results. Traditional RF techniques solely use the short-term learning experience and do not exploit the…
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted, directed graph that generalizes both the shortest-path and the commute-time (or resistance) distances. This measure, called the…
Advancing vulnerability science depends in part on identifying common themes from multiple, independent vulnerability assessments. Such insights are difficult to produce when the assessments use dissimilar, often qualitative, measures. …
Many epidemiological studies involve analysis of clusters of diseases to infer locations of environmental hazards that could be responsible for the disease. This approach is, however, only suitable for sedentary populations or diseases with…
There are many procedures in the available literature to perform prediction in ungauged basins. Commonly, the Euclidean metric is used as a proxy for hydrologic dissimilarity. Here we propose a procedure to find a metric on the basis…
This paper presents a new approach to clustering fuzzy data, called the Extensional Tree (ET) clustering algorithm, by defining a dendrogram over fuzzy data and using a new metric between fuzzy numbers based on α-cuts. All the…
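One of the papers above introduces a family of link-based dissimilarities that interpolates between the shortest-path and commute-time distances on a graph. As a point of reference only, the sketch below computes the classical commute-time distance from the pseudoinverse of the graph Laplacian on an invented toy graph; it is not the generalized measure from that paper.

```python
# Minimal sketch: commute-time distance on a small undirected graph,
# computed from the Moore-Penrose pseudoinverse of the graph Laplacian.
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])  # symmetric adjacency matrix (toy graph)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian
Lp = np.linalg.pinv(L)           # pseudoinverse of the Laplacian
vol = A.sum()                    # graph volume: sum of all node degrees

def commute_time(i, j):
    """Expected steps for a random walk to travel from node i to j and back."""
    return vol * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])

print(commute_time(0, 1), commute_time(0, 3))  # the remote node 3 is farther
```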