Semi-supervised Classification from Discriminative Random Walks

2008, Lecture Notes in Computer Science

https://doi.org/10.1007/978-3-540-87479-9_29

Abstract

This paper describes a novel technique, called D-walks, to tackle semi-supervised classification problems in large graphs. We introduce a betweenness measure based on passage times during random walks of bounded length. These walks are further constrained to start and end in nodes of the same class, which defines a distinct betweenness for each class. Unlabeled nodes are assigned to the class with the highest betweenness. Forward and backward recurrences are derived to compute the passage times efficiently. D-walks handles directed or undirected graphs with a time complexity linear in the number of edges, the maximum walk length considered and the number of classes. Experiments on various real-life databases show that D-walks outperforms NetKit [5], the approach of Zhou and Schölkopf and the regularized Laplacian kernel. The benefit of D-walks is particularly noticeable when few labeled nodes are available. The computation time of D-walks is also substantially lower in all cases.
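The pipeline summarized in the abstract (class-constrained walks of bounded length, forward and backward recurrences, then an argmax over class betweenness) can be illustrated with a minimal Python sketch. It rests on assumptions of ours, not on details stated here: transition probabilities are edge weights normalized per source node, a walk for class y starts uniformly at a labeled node of class y and stops at its first return to any labeled node of that class, and the betweenness of a node is its expected number of visits strictly inside such a walk, conditioned on the walk length being at most `max_len`. The helper names `transition_matrix`, `class_betweenness` and `dwalk_classify` are hypothetical, and the paper's exact normalization may differ.

```python
from collections import defaultdict


def transition_matrix(edges, n):
    """Row-normalize (u, v, weight) triples into sparse transition rows.

    Assumption: an undirected edge is listed once in each direction.
    Nodes without out-edges get an empty row (walks simply stop there).
    """
    out = [defaultdict(float) for _ in range(n)]
    for u, v, w in edges:
        out[u][v] += w
    rows = []
    for row in out:
        total = sum(row.values())
        rows.append({v: w / total for v, w in row.items()} if total > 0 else {})
    return rows


def class_betweenness(P, S, max_len):
    """Expected visits to each node strictly inside walks that start uniformly
    in S and end at the first return to S, conditioned on length <= max_len.
    (Our own formalization of the per-class betweenness, not the paper's code.)"""
    n = len(P)
    in_s = [False] * n
    for s in S:
        in_s[s] = True

    # Forward: alpha[t][q] = P(walk is at q at step t, no visit to S at steps 1..t-1).
    first = [0.0] * n
    for s in S:
        for q, p in P[s].items():
            first[q] += p / len(S)
    alpha = [None, first]
    for t in range(1, max_len):
        nxt = [0.0] * n
        for u in range(n):
            if not in_s[u] and alpha[t][u] > 0:  # walks already back in S have stopped
                for q, p in P[u].items():
                    nxt[q] += alpha[t][u] * p
        alpha.append(nxt)

    # Backward: beta[k][q] = P(walk from q first reaches S after exactly k steps).
    beta = [None, [sum(p for v, p in P[q].items() if in_s[v]) for q in range(n)]]
    for k in range(1, max_len):
        prev = beta[k]
        beta.append([sum(p * prev[v] for v, p in P[q].items() if not in_s[v])
                     for q in range(n)])

    # Probability that a walk returns to S within max_len steps.
    z = sum(alpha[l][s] for l in range(1, max_len + 1) for s in S)
    scores = [0.0] * n
    if z == 0:
        return scores
    for q in range(n):
        if in_s[q]:
            continue  # class-y labeled nodes are only walk endpoints
        # cum[k] = P(first reach S from q within k steps)
        cum = [0.0]
        for k in range(1, max_len + 1):
            cum.append(cum[-1] + beta[k][q])
        # Visit q at step t, then finish within the remaining max_len - t steps.
        scores[q] = sum(alpha[t][q] * cum[max_len - t] for t in range(1, max_len)) / z
    return scores


def dwalk_classify(P, labels, max_len):
    """labels maps labeled node -> class; returns a class for every other node."""
    classes = sorted(set(labels.values()))
    score = {y: class_betweenness(P, [q for q, c in labels.items() if c == y], max_len)
             for y in classes}
    return {q: max(classes, key=lambda y: score[y][q])
            for q in range(len(P)) if q not in labels}
```

On the path 0-1-2-3-4 with node 0 labeled `a` and node 4 labeled `b`, `dwalk_classify(P, {0: 'a', 4: 'b'}, 6)` assigns node 1 to `a` and node 3 to `b`. Each recurrence step touches every edge once, so one class costs O(`max_len` × edges) in this sketch, in line with the linear complexity stated in the abstract.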

References

  1. Callut, J.: First Passage Times Dynamics in Markov Models with Applications to HMM Induction, Sequence Classification, and Graph Mining. PhD thesis, Université catholique de Louvain (October 2007)
  2. Chebotarev, P., Shamis, E.: The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control 58(9), 1505-1514 (1997)
  3. Chebotarev, P., Shamis, E.: On proximity measures for graph vertices. Automation and Remote Control 59(10), 1443-1459 (1998)
  4. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, Heidelberg (1983)
  5. Macskassy, S.A., Provost, F.: Classification in networked data: A toolkit and a univariate case study. J. Mach. Learn. Res. 8, 935-983 (2007)
  6. Newman, M.E.J.: A measure of betweenness centrality based on random walks. Social Networks 27, 39-54 (2005)
  7. Norris, J.R.: Markov Chains. Cambridge University Press, United Kingdom (1997)
  8. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the Web. Technical Report, Computer Systems Laboratory, Stanford University (1998)
  9. Rabiner, L., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
  10. Smola, A.J., Kondor, R.: Kernels and regularization on graphs. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 144-158. Springer, Heidelberg (2003)
  11. Szummer, M., Jaakkola, T.: Partially labeled classification with Markov random walks. In: Advances in Neural Information Processing Systems, vol. 14, pp. 945-952 (2002)
  12. Tsuda, K., Noble, W.S.: Learning kernels from biological networks by maximizing entropy. Bioinformatics 20(1), 326-333 (2004)
  13. Viger, F., Latapy, M.: Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 440-449. Springer, Heidelberg (2005)
  14. Zhou, D., Huang, J., Schölkopf, B.: Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pp. 1036-1043. ACM, New York (2005)
  15. Zhou, D., Schölkopf, B.: Learning from labeled and unlabeled data using random walks. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 237-244. Springer, Heidelberg (2004)