Entity Matching in Online Social Networks
SocialCom 2013
Abstract
In recent years, Online Social Networks (OSNs) have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and its own set of services and functionalities. To take advantage of the full range of services that OSNs offer, users often create accounts on several OSNs using the same or different personal information. Retrieving all available data about an individual from several OSNs and merging it into one profile can be useful for many purposes. In this paper, we present a method for solving the Entity Resolution (ER) problem of matching user profiles across multiple OSNs. Our algorithm matches two user profiles from two different OSNs by applying machine learning techniques to features extracted from each of the profiles. Using supervised learning and the extracted features, we constructed several classifiers, which were trained and then used to rank the probability that two user profiles from two different OSNs belong to the same individual. These classifiers used 27 features of three main types: name-based features (e.g., the Soundex values of two names), features based on general user information (e.g., the cosine similarity between two user profiles), and features based on social network topology (e.g., the number of mutual friends in two users' friend lists). The experimental study uses real-life data collected from two popular OSNs, Facebook and Xing. The proposed algorithm was evaluated on the task of identifying user profiles across the two OSNs, and its classification performance, measured by AUC, was 0.982.
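The three feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the 27 features are computed, so the function names, the profile fields (`name`, `about`, `friends`), the bag-of-words cosine similarity, and the matching of friends by normalized name are all assumptions made for illustration.

```python
import math
from collections import Counter

# Standard American Soundex digit groups.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}

def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is emitted again
    return (result + "000")[:4]

def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts as bag-of-words term-count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def mutual_friend_count(friends_a, friends_b):
    """Count friends appearing in both lists (assumption: matched by name)."""
    normalize = lambda n: " ".join(n.lower().split())
    return len({normalize(n) for n in friends_a}
               & {normalize(n) for n in friends_b})

def pair_features(profile_a, profile_b):
    """Build a small feature dict for one candidate pair of profiles."""
    return {
        "soundex_match": int(soundex(profile_a["name"]) == soundex(profile_b["name"])),
        "info_cosine": cosine_similarity(profile_a["about"], profile_b["about"]),
        "mutual_friends": mutual_friend_count(profile_a["friends"], profile_b["friends"]),
    }
```

A feature dictionary like the one returned by `pair_features` could be computed for each candidate pair of profiles and passed to a supervised classifier that scores how likely the two profiles are to belong to the same person. For example, `soundex("Robert")` and `soundex("Rupert")` both return `"R163"`, so the two names would count as a Soundex match.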