Affinity Prediction in Online Social Networks
2014
Abstract
Link prediction is the problem of inferring whether potential edges between pairs of vertices in a graph will be present or absent in the near future. The task usually relies on information from the vertices and edges observed so far, from which a number of edge scoring methods can be built. These methods typically assess local structures of the observed graph, assuming that vertices that are closer during the original observation period are more likely to form a link in the future. In this paper we explore the combination of local and global features to conduct link prediction in online social networks. The contributions of the paper are twofold: a) we evaluate a number of strategies that combine global and local features, addressing the locality assumption of link prediction scoring methods, and b) we use only network topology-based features, avoiding information-based or transaction-based features that add heavy computational costs to the methods. We evaluate our proposal using real-world data provided by Skout Inc., an affinity online social network with millions of users around the world. Our results show that our proposal is feasible.
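To make the idea of a local edge scoring method concrete, the following is a minimal sketch, not the method evaluated in the paper. It scores every unlinked vertex pair of a small toy graph with two standard local measures, common neighbors and the Jaccard coefficient, and ranks the pairs by score. The toy edge list and the helper names (`build_adjacency`, `rank_candidates`) are assumptions made for illustration only; global features are not covered here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Local score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Local score: shared neighbors divided by the size of the union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidates(adj, score):
    """Rank all currently unlinked pairs by a scoring function, best first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy graph, used only to illustrate the scoring step.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranked = rank_candidates(adj, common_neighbors)
```

In this toy graph the pair ("a", "d") shares two neighbors and is ranked first, which reflects the locality assumption described above: pairs that are already close in the observed graph receive the highest scores.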