Hierarchical Quasi-Clustering Methods for Asymmetric Networks

2014

Abstract

This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering to asymmetric networks in which the output structure preserves the asymmetry of the input data. We show that this output structure is equivalent to a finite quasi-ultrametric space and study admissibility with respect to two desirable properties. We prove that a modified version of single linkage is the only admissible quasi-clustering method. Moreover, we establish the stability of the proposed method and the invariance properties it satisfies. We also develop algorithms and illustrate the value of quasi-clustering analysis with a study of internal migration within the United States.
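The abstract only names "a modified version of single linkage." The sketch below is a minimal illustration that rests on one assumption: this method assigns to each ordered pair (x, x') the smallest threshold δ at which there is a directed chain from x to x' whose links all have dissimilarity at most δ. In other words, it is a minimax directed path cost. The computation uses a Floyd-Warshall-style recursion in the (min, max) semiring. That is our choice of algorithm and not necessarily the one developed in the paper. The names `directed_single_linkage` and `is_quasi_ultrametric` and the three-node matrix are hypothetical. The checker tests the quasi-ultrametric conditions as we understand them: a zero diagonal, positive off-diagonal values, and the strong triangle inequality u(x, x') ≤ max(u(x, x''), u(x'', x')). Symmetry is not required.

```python
from typing import List

Matrix = List[List[float]]


def directed_single_linkage(a: Matrix) -> Matrix:
    """Minimax directed path cost for every ordered pair (assumed definition).

    a[i][j] is the asymmetric dissimilarity from node i to node j, and
    off-diagonal entries are assumed positive. u[i][j] is the smallest delta
    such that a directed chain i -> ... -> j exists with every link <= delta.
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    for i in range(n):
        u[i][i] = 0.0
    # Floyd-Warshall in the (min, max) semiring: after round k, u[i][j] is
    # the best bottleneck over chains whose intermediate nodes are in 0..k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_quasi_ultrametric(u: Matrix, tol: float = 1e-12) -> bool:
    """Check u(x,x) = 0, u(x,y) > 0 for x != y, and the strong triangle
    inequality u(x,y) <= max(u(x,z), u(z,y)). Symmetry is NOT required."""
    n = len(u)
    for i in range(n):
        if u[i][i] != 0:
            return False
        for j in range(n):
            if i != j and u[i][j] <= 0:
                return False
            for k in range(n):
                if u[i][j] > max(u[i][k], u[k][j]) + tol:
                    return False
    return True


# Hypothetical asymmetric network on three nodes.
A = [[0, 1, 5],
     [4, 0, 2],
     [3, 6, 0]]
U = directed_single_linkage(A)
# U[0][1] == 1.0 but U[1][0] == 3.0: the output stays asymmetric.
print(U, is_quasi_ultrametric(U))
```

The raw matrix `A` fails the strong triangle inequality because A[0][2] = 5 exceeds max(A[0][1], A[1][2]) = 2. After the minimax closure, every ordered pair satisfies it, and the direction-dependent values remain.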
