
dissimilarity measure

36 papers
1 follower
About this topic
A dissimilarity measure is a quantitative metric used to assess the degree of difference or divergence between two or more objects, data points, or sets. It is commonly employed in statistics, machine learning, and data analysis to facilitate clustering, classification, and comparison of datasets.
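To make the definition concrete, the sketch below builds a pairwise Euclidean dissimilarity matrix for a small invented data set and hands it to a standard hierarchical clustering step, mirroring the clustering use case mentioned above. This is a minimal sketch using standard SciPy calls; the data and the cluster count are assumptions.

```python
# Minimal sketch: a pairwise dissimilarity matrix (Euclidean here) feeding
# a standard clustering step. Data are invented for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
d = pdist(points, metric="euclidean")   # condensed pairwise dissimilarities
print(squareform(d))                    # full symmetric dissimilarity matrix
labels = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
print(labels)                           # recovers the two obvious clusters
```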

Key research themes

1. How can distance and divergence measures be unified and generalized to quantify dissimilarity between probability distributions?

This research theme focuses on the theoretical foundations and general frameworks for directed distances—often called divergences—used to measure dissimilarity between probability distributions in statistics and related fields such as machine learning and information theory. Unifying classical divergences like Kullback-Leibler, Pearson’s chi-square, and newer cumulative divergences helps in understanding statistical inference, goodness-of-fit, and model estimation. This line of work clarifies properties like non-negativity, reflexivity, and asymmetry of divergences and investigates continuous parameterizations covering known special cases.
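As a concrete point of reference for the divergences named above, the following sketch computes the Kullback-Leibler and Pearson chi-square divergences between two discrete distributions, illustrating non-negativity and the asymmetry of KL. It is a minimal illustration only, not code from any of the surveyed papers.

```python
# Minimal sketch: two classical directed distances between discrete
# probability distributions p and q.
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def pearson_chi_square(p, q):
    """Pearson's chi-square divergence between p and q; assumes q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))

p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
print(pearson_chi_square(p, q))                  # >= 0, zero only when p == q
```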

Key finding: Introduces a comprehensive general framework for various directed distances (divergences) between probability distributions, including density-based divergences like Kullback-Leibler and Pearson’s chi-square, as well as…
Key finding: Extends classical symmetric Csiszár divergences (f-divergences) to quantum states by optimizing over quantum measurements and state purifications, thus developing a new family of quantum distances measuring dissimilarity…
Key finding: Presents a systematic survey of total variation and related distances (e.g., Kolmogorov-Smirnov, Wasserstein, Hellinger) between prominent probability distributions including univariate and multivariate Gaussians, Poissons,…

2. What novel and application-specific dissimilarity and similarity measures improve clustering and classification performance for complex or uncertain data types?

This research area explores the design and evaluation of dissimilarity measures tailored for complex data types such as categorical data, fuzzy sets, neutrosophic sets, interval-valued data, and data with heterogeneous components. The goal is to enhance cluster quality, classification accuracy, and decision-making under uncertainty by capturing data-specific relationships beyond standard numeric distances. Methodological advances include learning-based adaptive dissimilarities for categorical data and refined metrics based on algebraic, geometric, or statistical principles for fuzzy and neutrosophic sets, often applied in pattern recognition, decision making, and medical diagnosis.
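For orientation, the sketch below contrasts the plain simple-matching dissimilarity used by classical k-Modes with a frequency-weighted variant in which disagreement on common values costs more than disagreement on rare ones. Both are textbook baselines on invented data, not the specific learning-based or global measures proposed in the papers below.

```python
# Minimal sketch: two baseline dissimilarities for categorical records.
from collections import Counter

def simple_matching(x, y):
    """Number of attributes on which two categorical records disagree."""
    return sum(a != b for a, b in zip(x, y))

def frequency_weighted(x, y, freqs):
    """Mismatch cost scaled by the relative frequency of the two values:
    disagreeing on rare values counts less than on common ones."""
    return sum((freqs[i][a] + freqs[i][b]) / 2.0
               for i, (a, b) in enumerate(zip(x, y)) if a != b)

data = [("red", "small"), ("red", "large"), ("blue", "small")]
# Per-attribute relative frequencies estimated from the data set itself.
freqs = [{v: c / len(data) for v, c in Counter(col).items()}
         for col in zip(*data)]
print(simple_matching(data[0], data[2]))            # 1 mismatching attribute
print(frequency_weighted(data[0], data[2], freqs))  # 0.5 after weighting
```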

Key finding: Proposes a novel Learning-Based Dissimilarity measure for categorical data, which estimates the dissimilarity between attribute values based on the confusion likelihood derived from models trained to predict attribute values…
Key finding: Analyzes component-wise dissimilarity measures suited for complex objects described by heterogeneous features (real numbers, categorical, graphs, etc.), which often generate non-metric spaces. The work discusses how combining…
Key finding: Identifies limitations in existing simple matching and frequency-based dissimilarity measures used by the popular k-Modes clustering for categorical data, and introduces a novel dissimilarity that incorporates global…
Key finding: Develops a new cosine similarity measure for Interval-Valued Neutrosophic Sets (IVNS) based on Bhattacharya’s distance, addressing the imprecise, incomplete, and inconsistent information representation common in real-world…
Key finding: Extends the classical Hausdorff distance to neutrosophic refined sets, allowing for repeated occurrences of truth, indeterminacy, and falsity membership degrees, and proposes corresponding similarity measures. The methods are…

3. How can distance and similarity measures be adapted and applied in functional and fuzzy data analysis contexts, and what are their properties and practical utilities?

This theme targets the theoretical development and application of distance measures specialized for fuzzy numbers, functional data, intuitionistic fuzzy sets, and related constructs. It includes the formulation of new fuzzy distance definitions, entropy measures derived from similarity concepts, and adaptations of standard distances for non-standard data types. The research emphasizes how these measures address challenges in uncertainty quantification, scale measurement, classification, testing equality of variability, and image similarity assessment.
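To ground the fuzzy-distance idea, the sketch below computes an α-cut-based distance between triangular fuzzy numbers by averaging the endpoint gaps of their cuts across membership levels. This illustrates the general construction only; the algorithms in the papers below differ in detail, and the parameterization here is an assumption.

```python
# Minimal sketch: an alpha-cut-based distance between triangular fuzzy
# numbers (a, b, c), where b is the peak of the membership function.
import numpy as np

def alpha_cut(tri, alpha):
    """Interval [left, right] of a triangular fuzzy number at level alpha."""
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)

def fuzzy_distance(t1, t2, n_levels=101):
    """Mean over alpha levels of the average endpoint gap of the two cuts."""
    gaps = []
    for alpha in np.linspace(0.0, 1.0, n_levels):
        l1, r1 = alpha_cut(t1, alpha)
        l2, r2 = alpha_cut(t2, alpha)
        gaps.append((abs(l1 - l2) + abs(r1 - r2)) / 2.0)
    return float(np.mean(gaps))

print(fuzzy_distance((1, 2, 3), (2, 3, 4)))  # 1.0 for a unit shift
```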

Key finding: Proposes a novel algorithm for computing distances between fuzzy numbers (especially triangular and trapezoidal forms) that generalizes to n-dimensional fuzzy spaces, addressing shortcomings of previous fuzzy distance…
Key finding: Introduces a novel similarity measure for intuitionistic fuzzy sets (IFS) and develops an advanced entropy measure based on this similarity via a new axiomatic approach. The proposed entropy effectively captures uncertainty…
Key finding: Develops an image similarity index leveraging the greatest and smallest fuzzy set solutions of fuzzy relation equations, applied to image blocks normalized to fuzzy relations. The measure outperforms traditional indices like…

All papers in dissimilarity measure

Biological Mass Spectrometry is used to analyse peptides and proteins. A mass spectrum generates a list of measured mass-to-charge ratios and intensities of ionised peptides, which is called a peak-list. In order to classify the…
In the field of Human-Computer Interaction (HCI), gesture recognition is becoming increasingly important as a mode of communication, in addition to the more common visual, aural and oral modes, and is of particular interest to designers…
TOPS diagrams are concise descriptions of the structural topology of proteins, and their comparison usually relies on a structural alignment of the corresponding vertex-ordered and vertex- and edge-labelled graphs. Such an approach…
Self-organizing maps (SOM) have been applied to analyze the similarities of chemical compounds and to select from a given pool of descriptors the smallest and most relevant subset needed to build robust QSAR models based on fuzzy ARTMAP. …
Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community…
Topographic mapping offers a very flexible tool to inspect large quantities of high-dimensional data in an intuitive way. Often, electronic data are inherently non-Euclidean, and modern data formats are connected to dedicated non-Euclidean…
Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that allow simultaneously clustering data and inferring their topological structure, such that additional features, for…
This document is provided as an electronic appendix to the article "A new dissimilarity measure for finding semantic structure in category fluency data with implications for understanding memory organization in schizophrenia". …
In this paper, we propose to classify medical images using dissimilarities computed between collections of regions of interest. The images are mapped into a dissimilarity space using an image dissimilarity measure, and a standard vector…
A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pairwise dissimilarity…
This paper focuses on how to reduce cheating and minimize errors while automatically grading paper-based multiple-choice questions (MCQ) by making the whole process relatively fast, less expensive, more credible, and fairer, especially…
Documents exist in different formats. When we have document images, in order to access some part, preferably all, of the information contained in those images, we have to deploy a document image analysis application. Document images can be…
Dominant color descriptor (DCD) is one of the color descriptors proposed by MPEG-7 that has been extensively used for image retrieval. Among the color descriptors, DCD describes the salient color distributions in an image or a region of…
This article presents the -distance, a family of distances between images recursively decomposed into segments and represented by multi-level feature vectors. Such a structure is a quad-, quin-, or nona-tree resulting from a fixed and…
This paper presents a tunable content-based music retrieval (CBMR) system suitable for the retrieval of music audio clips. The audio clips are represented as extracted feature vectors. The CBMR system is expert-tunable by altering the…
Relevance feedback (RF) is an iterative process that refines the retrievals by utilizing the user's feedback on previously retrieved results. Traditional RF techniques solely use the short-term learning experience and do not exploit the…
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted, directed graph that generalizes both the shortest-path and the commute-time (or resistance) distances. This measure, called the…
Advancing vulnerability science depends in part on identifying common themes from multiple, independent vulnerability assessments. Such insights are difficult to produce when the assessments use dissimilar, often qualitative, measures. …
Many epidemiological studies involve analysis of clusters of diseases to infer locations of environmental hazards that could be responsible for the disease. This approach is, however, only suitable for sedentary populations or diseases with…
There are many procedures in the available literature to perform prediction in ungauged basins. Commonly, the Euclidean metric is used as a proxy for hydrologic dissimilarity. Here we propose a procedure to find a metric on the basis…
This paper presents a new approach to clustering fuzzy data, called the Extensional Tree (ET) clustering algorithm, by defining a dendrogram over fuzzy data and using a new metric between fuzzy numbers based on α-cuts. All the…
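One of the papers above introduces a family of link-based dissimilarities that interpolates between the shortest-path and commute-time distances on a graph. As a point of reference only, the sketch below computes the classical commute-time distance from the pseudoinverse of the graph Laplacian on an invented toy graph; it is not the generalized measure from that paper.

```python
# Minimal sketch: commute-time distance on a small undirected graph,
# computed from the Moore-Penrose pseudoinverse of the graph Laplacian.
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])  # symmetric adjacency matrix (toy graph)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian
Lp = np.linalg.pinv(L)           # pseudoinverse of the Laplacian
vol = A.sum()                    # graph volume: sum of all node degrees

def commute_time(i, j):
    """Expected steps for a random walk to travel from node i to j and back."""
    return vol * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])

print(commute_time(0, 1), commute_time(0, 3))  # the remote node 3 is farther
```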