Papers by Agnes Vathy-Fogarassy
In this paper a new locality-based clustering algorithm is introduced for partitioning undirected, unweighted graphs. The algorithm applies a new similarity measure based on the common neighbors of the vertices. The clusters produced by the proposed algorithm are compared with those generated by the Girvan-Newman method, and running times are measured on different data sets. The efficiency and advantages of the proposed method, as well as our results, are demonstrated through several application examples.
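The abstract does not give the similarity formula, so the Python sketch below uses a hypothetical stand-in: the Jaccard overlap of the closed neighborhoods of two adjacent vertices. Edges whose similarity falls below an assumed `threshold` are dropped, and the remaining connected components are reported as clusters. The function names and the thresholding rule are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict


def neighbor_similarity(adj, u, v):
    """Jaccard overlap of the closed neighborhoods N[u] and N[v] (stand-in measure)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def cluster_by_common_neighbors(edges, threshold=0.5):
    """Keep edges whose endpoints share enough neighbors; return connected components."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    strong = defaultdict(set)  # graph containing only the "strong" edges
    for u, v in edges:
        if neighbor_similarity(adj, u, v) >= threshold:
            strong[u].add(v)
            strong[v].add(u)
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal over strong edges
            node = stack.pop()
            component.add(node)
            for nxt in strong[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(component)
    return clusters


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(cluster_by_common_neighbors(edges))  # [{0, 1, 2}, {3, 4, 5}]
```

The bridge edge scores only 2/6 because its endpoints share few neighbors, so it is cut while the triangle edges survive.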

As data analysis tasks often have to deal with complex data structures, nonlinear dimensionality reduction methods play an important role in exploratory data analysis. A number of nonlinear dimensionality reduction techniques have been proposed in the literature (e.g. Sammon mapping, Locally Linear Embedding). These techniques attempt to preserve either the local or the global geometry of the original data, and they perform metric or non-metric dimensionality reduction. Nevertheless, most of them are difficult to apply to large data sets. There is a need for new algorithms that combine vector quantisation and mapping methods in order to visualise the data structure in a low-dimensional vector space. In this paper we define a new class of algorithms that quantify and disclose the data structure using topology representing networks and apply different mapping methods for the low-dimensional visualisation. Not only are existing methods combined for this purpose, but a novel group of mapping methods (Topology Representing Network Maps) is also introduced as part of this class. Topology Representing Network Maps utilise the main benefits of topology representing networks and of multidimensional scaling methods to disclose the real structure of the data set under study. To determine the main properties of the topology representing network based mapping methods, a detailed analysis of classical benchmark examples (the Wine and Optical Recognition of Handwritten Digits data sets) is presented.
As data analysis tasks often involve huge and complex data sets, there is a need for new algorithms that combine vector quantization and mapping methods to visualize the hidden data structure in a low-dimensional vector space. In this paper a new class of algorithms is defined. Topology representing networks are applied to quantify and disclose the data structure, and different nonlinear mapping algorithms are applied to map the quantized data for low-dimensional visualization. To evaluate the main properties of the resulting topology representing network based mapping methods, a detailed analysis based on the Wine benchmark example is given.

Acta Polytechnica Hungarica, vol. 13, no. 2, pp. 209-228, 2016, DOI: 10.12700/APH.13.2.2016.2.12
Production flow analysis includes various families of components and groups of machines. Machine-part cell formation means the optimal design of manufacturing cells consisting of similar machines producing similar products from a similar set of components. Most algorithms reorder the machine-part incidence matrix. We generalize this classical concept to handle more than two elements of the production process (e.g. machine-part-product-resource-operator). Applying this extended concept requires an efficient optimization algorithm for the simultaneous grouping of these elements. For this purpose, we propose a novel co-clustering technique based on crossing minimization of layered bipartite graphs. The method has been implemented as a MATLAB toolbox. The efficiency of the proposed approach and the developed tools is demonstrated by realistic case studies. The log-linear scalability of the algorithm is proven theoretically and confirmed experimentally.
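Crossing minimization on a two-layer bipartite graph can be illustrated with the classic barycenter heuristic, a standard technique from layered graph drawing that is swapped in here because the abstract does not specify the authors' optimizer. The Python sketch below reorders the rows (machines) and columns (parts) of a 0/1 incidence matrix; the multi-layer generalization described in the paper is not shown, and all names are hypothetical.

```python
def barycenter(positions):
    """Mean position of a node's neighbors; nodes without neighbors go last."""
    return sum(positions) / len(positions) if positions else float("inf")


def barycenter_reorder(matrix, iterations=4):
    """Alternately sort rows and columns of a 0/1 incidence matrix by barycenter."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order, col_order = list(range(n_rows)), list(range(n_cols))
    for _ in range(iterations):
        # Rows are ordered by the mean current position of the columns they touch.
        col_pos = {c: p for p, c in enumerate(col_order)}
        row_order.sort(key=lambda r: barycenter(
            [col_pos[c] for c in range(n_cols) if matrix[r][c]]))
        # Columns are then ordered by the mean position of their rows.
        row_pos = {r: p for p, r in enumerate(row_order)}
        col_order.sort(key=lambda c: barycenter(
            [row_pos[r] for r in range(n_rows) if matrix[r][c]]))
    return row_order, col_order


# Machines (rows) x parts (columns); two hidden cells are interleaved.
incidence = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
rows, cols = barycenter_reorder(incidence)
print(rows, cols)  # [1, 3, 0, 2] [0, 2, 1, 3]
for r in rows:
    print([incidence[r][c] for c in cols])
```

Rows and columns that share many incidences end up adjacent, so the interleaved cells appear as two diagonal blocks. Because Python's sort is stable, ties keep their current order, which lets the ordering settle after one pass on this toy input.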

Journal of Mathematical Modelling and Algorithms, vol. 7, no. 4, pp. 351-370, 2008, DOI: 10.1007/s10852-008-9092-y
In practical data mining tasks, high-dimensional data has to be analyzed. In most cases it is very informative to map and visualize the hidden structure of a complex data set in a low-dimensional space. In this paper a new class of mapping algorithms is defined. These algorithms combine topology representing networks with different nonlinear mapping algorithms. While the former aim to quantify the data and disclose the real structure of the objects, the nonlinear mapping algorithms visualize the quantized data in the low-dimensional vector space. We survey the techniques based on these methods and present the results of a detailed analysis of them. The primary aim of this analysis was to examine the preservation of distances and neighborhood relations of the objects; preservation of neighborhood relations was analyzed in both local and global environments. To evaluate the main properties of the examined methods, we present the outcome of the analysis on a synthetic and a real benchmark example.
Fuzzy c-Medoid Graph Clustering
Artificial Intelligence and Soft Computing, pp 738-748, DOI 10.1007/978-3-319-07176-3_64, 2014
We present a modified fuzzy c-medoid algorithm to find central objects in graphs. Initial cluster centres are determined by graph centrality measures. Cluster centres are fine-tuned by minimizing fuzzy-weighted geodesic distances calculated by Dijkstra's algorithm. Cluster validity indices show a significant improvement over standard fuzzy c-medoid clustering.
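A minimal Python sketch of fuzzy c-medoid clustering on a graph, under stated assumptions: the graph is connected and given as an adjacency dict of `(neighbor, weight)` pairs, geodesic distances come from Dijkstra's algorithm, and the initial medoids are simply the highest-degree nodes (degree centrality is an assumption; the abstract only says "graph centrality measures"). The membership update follows the usual fuzzy c-means form with fuzzifier `m` and is not taken from the paper; the function names are hypothetical.

```python
import heapq


def dijkstra(adj, source):
    """Geodesic distances from source; adj maps node -> [(neighbor, weight), ...]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def fuzzy_c_medoids(adj, c, m=2.0, max_iter=20):
    """Fuzzy c-medoids on a connected graph using geodesic distances."""
    nodes = sorted(adj)
    geo = {u: dijkstra(adj, u) for u in nodes}
    # Hypothetical initialisation: the c nodes with the highest degree centrality.
    medoids = sorted(nodes, key=lambda v: -len(adj[v]))[:c]
    for _ in range(max_iter):
        memb = {}
        for k in nodes:
            d = [geo[med][k] for med in medoids]
            if 0.0 in d:  # k is itself a medoid: crisp membership
                memb[k] = [1.0 if di == 0.0 else 0.0 for di in d]
            else:  # fuzzy c-means style membership
                memb[k] = [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d) for di in d]
        # New medoid of cluster i minimises the fuzzy-weighted geodesic distance sum.
        new = [min(nodes, key=lambda x: sum(memb[k][i] ** m * geo[x][k] for k in nodes))
               for i in range(c)]
        if new == medoids:
            break
        medoids = new
    labels = {k: max(range(c), key=lambda i: memb[k][i]) for k in nodes}
    return medoids, labels


# Two triangles joined by the bridge edge (2, 3), unit weights.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = {v: [] for v in range(6)}
for a, b in edges:
    graph[a].append((b, 1.0))
    graph[b].append((a, 1.0))
medoids, labels = fuzzy_c_medoids(graph, c=2)
print(medoids, labels)  # [2, 3] {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```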

Applications of Fuzzy Sets Theory; pp. 195-202, 2007
Different clustering algorithms are based on different similarity or distance measures (e.g. Euclidean distance, Minkowski distance, Jaccard coefficient). The Jarvis-Patrick clustering method uses the number of common neighbors among the k-nearest neighbors of objects to disclose the clusters. The main drawback of this algorithm is that its parameters define an overly crisp cutting criterion, which makes it difficult to find a good parameter set. In this paper we extend the similarity measure of the Jarvis-Patrick algorithm in two ways: (i) fuzzification of one of the parameters, and (ii) widening the scope of the other parameter. The suggested fuzzy similarity measure can be applied in various forms in different clustering and visualization techniques (e.g. hierarchical clustering, MDS, VAT). We give several application examples to illustrate the efficiency of the proposed fuzzy similarity measure in clustering. These examples show that clustering techniques based on the proposed fuzzy similarity measure are able to detect clusters with different sizes, shapes and densities. It is also shown that outliers can be detected with the proposed measure.
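The classic Jarvis-Patrick rule links two objects when each is in the other's k-nearest-neighbor list and they share at least `kt` of those neighbors. The Python sketch below implements that crisp rule and, for contrast, a hypothetical graded similarity (the shared-neighbor fraction). The graded function only illustrates the idea of softening the cutting criterion; it is not the fuzzy measure proposed in the paper, and all names are assumptions.

```python
import math


def knn_sets(points, k):
    """Indices of the k nearest neighbors of every point (ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: (math.dist(p, points[j]), j))
        result.append(set(others[:k]))
    return result


def graded_similarity(nn, i, j, k):
    """Hypothetical graded similarity: shared-neighbor fraction, 0 unless mutual k-NN."""
    if j not in nn[i] or i not in nn[j]:
        return 0.0
    return len(nn[i] & nn[j]) / k


def jarvis_patrick(points, k, kt):
    """Crisp Jarvis-Patrick: link mutual k-NN pairs sharing at least kt neighbors."""
    nn = knn_sets(points, k)
    label = list(range(len(points)))  # union-find parent array

    def find(x):
        while label[x] != x:
            x = label[x]
        return x

    for i in range(len(points)):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= kt:
                label[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


points = [(0,), (1,), (2,), (50,), (51,), (52,)]
print(jarvis_patrick(points, k=2, kt=1))  # [0, 0, 0, 1, 1, 1]
nn = knn_sets(points, k=2)
print(graded_similarity(nn, 0, 1, k=2), graded_similarity(nn, 0, 3, k=2))  # 0.5 0.0
```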

Foundations of Information and Knowledge Systems, 4th International Symposium, pp. 313-330, 2006
Clustering is an important tool for exploring the hidden structure of large databases. There are several algorithms based on different approaches (hierarchical, partitional, density-based, model-based, etc.). Most of these algorithms have some shortcomings: e.g. they are not able to detect clusters with non-convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to initialization. In this paper we introduce a new clustering algorithm based on the synergistic combination of hierarchical, graph-theoretic minimal spanning tree based clustering and partitional Gaussian mixture model-based clustering. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters, thereby decreasing the influence of the user on the clustering results. As the illustrative examples show, the proposed algorithm can detect clusters of arbitrary shape in the data and does not suffer from the numerical problems of Gaussian mixture based clustering algorithms.
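The minimal spanning tree stage of such a hybrid can be sketched in Python as below: build the MST of the complete Euclidean graph with Kruskal's algorithm and cut its `n_clusters - 1` longest edges. How the authors couple this partition with the Gaussian mixture stage is not described in the abstract, so that step is omitted; the function name is hypothetical.

```python
import math


def mst_clusters(points, n_clusters):
    """Kruskal MST on the complete Euclidean graph, then cut the n_clusters-1 longest edges."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    mst = []
    for w, i, j in edges:  # Kruskal: accept edges that join two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # mst is already sorted by weight; drop the n_clusters-1 longest edges.
    kept = mst[: len(mst) - (n_clusters - 1)]
    parent[:] = range(n)  # rebuild components from the kept edges only
    for _, i, j in kept:
        parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, 2))  # [0, 0, 0, 1, 1, 1]
```

Cutting the longest tree edges separates groups regardless of their shape, which is the property the abstract relies on; the mixture model would then refine such a partition.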

World Academy of Science, Engineering and Technology (WASET), pp. 7-12, 2005
Most fuzzy clustering algorithms have some shortcomings: e.g. they are not able to detect clusters with non-convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to initialization. This paper studies the synergistic combination of the hierarchical, graph-theoretic minimal spanning tree based clustering algorithm with the partitional Gath-Geva fuzzy clustering algorithm. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters, thereby decreasing the influence of the user on the clustering results. For the analysis of the resulting fuzzy clusters, a new tool based on a fuzzy similarity measure is presented. The calculated similarities of the clusters can be used for the hierarchical clustering of the resulting fuzzy clusters, which is useful for cluster merging and for visualizing the clustering results. As the illustrative examples show, the proposed algorithm can detect clusters of arbitrary shape in the data and does not suffer from the numerical problems of the classical Gath-Geva fuzzy clustering algorithm.

Fuzzy Sets and Systems, Volume 286, 1 March 2016, Pages 157-172, DOI:10.1016/j.fss.2015.06.022, 2016
Clustering high-dimensional data and identifying central nodes in a graph are complex and computationally expensive tasks. We use the k-nn graph of high-dimensional data as an efficient representation of the hidden structure of the clustering problem. Initial cluster centers are determined by graph centrality measures. Cluster centers are fine-tuned by minimizing fuzzy-weighted geodesic distances. The shortest-path based representation is analogous to the concept of transitive closure. Therefore, our algorithm is capable of clustering networks, or even more complex and abstract objects, based on their partially known pairwise similarities.
The algorithm is shown to be effective in identifying senior researchers in a co-author network, central cities in topographical data, and clusters of documents represented by high-dimensional feature vectors.
With the spread of information systems, a huge amount of data has accumulated in them. Since strategically important information can be hidden in this mass of data, such information may be very valuable. Data mining and knowledge discovery methods help extract this hidden knowledge from large amounts of data. These methods can be applied in numerous areas, for example commerce, telecommunications, finance and health care.

One of the most important goals in teaching Mathematics and Physics is to develop and encourage thinking based on understanding and to acquaint learners with the two-way connection between real situations and models. The learning and teaching of Mathematics and Physics should result in knowledge that can be used in practical life as well as in other subjects and professions, which confirms the usefulness of Maths and Physics. It is very important to demonstrate concepts in a visually clear and robust way. The technical development and spread of computer science, telecommunications and the Internet offer many new opportunities to users. With the help of a computer presentation, the subject matter can be outlined in a spectacular manner. The Internet offers unlimited freedom, making it possible to gather the latest information and material; furthermore, 3D images can be displayed with the help of virtual reality tools. It is not surprising that such virtual reality tools have long been used in education. Practically every subject can benefit from 3D presentation, modern demonstration techniques and a high level of interactivity. Our goal is to create models that help learners with gaps in their education to develop their perception of space, to acquire the mathematical knowledge required to learn a profession successfully, and to analyse and understand the concepts and processes of Physics in more detail. Correct perception of space and recognition of colours and shapes are necessary for getting acquainted with the world. Such correct perception of reality is essential for moving around, using transport and practically all walks of life.
This technology was chosen because virtual reality can present any 3D world, whether real or abstract. Institutions that already use these tools in training learners are highly satisfied with the effectiveness and efficiency of such systems.

Lecture Notes in Computer Science, 2007
Different clustering algorithms are based on different similarity or distance measures (e.g. Euclidean distance, Minkowski distance, Jaccard coefficient). The Jarvis-Patrick clustering method uses the number of common neighbors among the k-nearest neighbors of objects to disclose the clusters. The main drawback of this algorithm is that its parameters define an overly crisp cutting criterion, which makes it difficult to find a good parameter set. In this paper we extend the similarity measure of the Jarvis-Patrick algorithm in two ways: (i) fuzzification of one of the parameters, and (ii) widening the scope of the other parameter. The suggested fuzzy similarity measure can be applied in various forms in different clustering and visualization techniques (e.g. hierarchical clustering, MDS, VAT). We give several application examples to illustrate the efficiency of the proposed fuzzy similarity measure in clustering. These examples show that clustering techniques based on the proposed fuzzy similarity measure are able to detect clusters with different sizes, shapes and densities. It is also shown that outliers can be detected with the proposed measure.

Lecture Notes in Computer Science, 2008
In practical data mining problems, high-dimensional data has to be analyzed. In most of these cases it is very informative to map and visualize the hidden structure of a complex data set in a low-dimensional space. The aim of this paper is to propose a new mapping algorithm based on both the topology and the metric of the data. The Topology Representing Network (TRN) used here combines neural gas vector quantization and the competitive Hebbian learning rule in such a way that the hidden data structure is approximated by a compact graph representation. TRN is able to define a low-dimensional manifold in the high-dimensional feature space. If such a manifold exists, multidimensional scaling and/or Sammon mapping of the graph distances can be used to form the map of the TRN (TRNMap). The systematic analysis of the algorithms that can be used for data visualization and the numerical examples presented in this paper demonstrate that the resulting map gives a good representation of the topology and the metric of complex data sets, and that the component plane representation of TRNMap is useful for exploring the hidden relations among the features.
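The TRNMap pipeline can be sketched with NumPy under simplifying assumptions: the prototypes are taken as given instead of being trained by neural gas, the edge-ageing step of the original TRN is omitted, and classical (Torgerson) MDS is used rather than metric MDS or Sammon mapping. Competitive Hebbian learning connects the two prototypes nearest to each sample, Floyd-Warshall yields the graph (geodesic) distances, and MDS maps them to a low-dimensional space. All function names are hypothetical.

```python
import numpy as np


def chl_graph(data, prototypes):
    """Competitive Hebbian learning: link the two prototypes nearest to each sample."""
    n = len(prototypes)
    adj = np.zeros((n, n), dtype=bool)
    for x in data:
        a, b = np.argsort(np.linalg.norm(prototypes - x, axis=1))[:2]
        adj[a, b] = adj[b, a] = True
    return adj


def geodesic_distances(prototypes, adj):
    """All-pairs shortest paths over the TRN edges (Floyd-Warshall)."""
    eucl = np.linalg.norm(prototypes[:, None, :] - prototypes[None, :, :], axis=2)
    dist = np.where(adj, eucl, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(len(dist)):
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def classical_mds(dist, dim=2):
    """Torgerson MDS: double-centre squared distances, keep the top eigenvectors."""
    n = len(dist)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Toy example: four prototypes on a quarter circle, samples between neighbors.
angles = np.linspace(0.0, np.pi / 2, 4)
prototypes = np.c_[np.cos(angles), np.sin(angles)]
mid = (angles[:-1] + angles[1:]) / 2
data = np.c_[np.cos(mid), np.sin(mid)]

adj = chl_graph(data, prototypes)
geo = geodesic_distances(prototypes, adj)
trn_map = classical_mds(geo, dim=1)
print(np.round(trn_map.ravel(), 3))
```

On this toy input the learned graph is a path along the arc, so the one-dimensional map reproduces the geodesic distances exactly, whereas MDS on straight-line Euclidean distances would shorten the arc.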

Lecture Notes in Computer Science, 2006
Clustering is an important tool for exploring the hidden structure of large databases. There are several algorithms based on different approaches (hierarchical, partitional, density-based, model-based, etc.). Most of these algorithms have some shortcomings: e.g. they are not able to detect clusters with non-convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to initialization. In this paper we introduce a new clustering algorithm based on the synergistic combination of hierarchical, graph-theoretic minimal spanning tree based clustering and partitional Gaussian mixture model-based clustering. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters, thereby decreasing the influence of the user on the clustering results. As the illustrative examples show, the proposed algorithm can detect clusters of arbitrary shape in the data and does not suffer from the numerical problems of Gaussian mixture based clustering algorithms.
Lecture Notes in Computer Science, 2014
We present a modified fuzzy c-medoid algorithm to find central objects in graphs. Initial cluster centres are determined by graph centrality measures. Cluster centres are fine-tuned by minimizing fuzzy-weighted geodesic distances calculated by Dijkstra's algorithm. Cluster validity indices show a significant improvement over standard fuzzy c-medoid clustering.
Vector Quantisation and Topology Based Graph Representation
SpringerBriefs in Computer Science, 2013
Graph-Based Clustering Algorithms
SpringerBriefs in Computer Science, 2013