Hierarchical Quasi-Clustering Methods for Asymmetric Networks
2014
Abstract
This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data. We show that this output structure is equivalent to a finite quasi-ultrametric space and study admissibility with respect to two desirable properties. We prove that a modified version of single linkage is the only admissible quasi-clustering method. Moreover, we show stability of the proposed method and we establish invariance properties fulfilled by it. Algorithms are further developed and the value of quasi-clustering analysis is illustrated with a study of internal migration within the United States.
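As a rough illustration of the kind of output the abstract describes, the sketch below computes directed min-max chain costs on a small asymmetric dissimilarity matrix. It assumes, without confirmation from the abstract itself, that the "modified version of single linkage" assigns to each ordered pair (x, x') the smallest achievable maximum link dissimilarity over directed chains from x to x', and that a quasi-ultrametric is a function that vanishes on the diagonal and satisfies the strong triangle inequality without being required to be symmetric. The function names `directed_minimax_costs` and `is_quasi_ultrametric` and the example matrix `A` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def directed_minimax_costs(A):
    """Return u where u[i][j] is the minimum, over directed chains from i to j,
    of the largest dissimilarity along the chain (assumed form of the method).

    Uses a Floyd-Warshall style recursion in the (min, max) algebra.
    """
    n = len(A)
    u = [row[:] for row in A]          # copy so the input is not modified
    for i in range(n):
        u[i][i] = 0                    # a node reaches itself at zero cost
    for k in range(n):                 # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_quasi_ultrametric(u):
    """Check zero diagonal and the strong triangle inequality
    u[i][j] <= max(u[i][k], u[k][j]); symmetry is NOT required."""
    n = len(u)
    if any(u[i][i] != 0 for i in range(n)):
        return False
    return all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i, j, k in product(range(n), repeat=3)
    )


# Hypothetical asymmetric dissimilarities between three nodes.
A = [
    [0, 1, 4],
    [3, 0, 2],
    [5, 6, 0],
]

u = directed_minimax_costs(A)
print(u)                        # [[0, 1, 2], [3, 0, 2], [5, 5, 0]]
print(is_quasi_ultrametric(u))  # True
```

In this toy example the output stays asymmetric (`u[0][1]` is 1 while `u[1][0]` is 3), which is the property the abstract emphasizes. The cubic-time triple loop is used here only for readability and is not claimed to be the algorithm developed in the paper.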