User demographics prediction based on mobile data
2013, Pervasive and Mobile Computing
https://doi.org/10.1016/J.PMCJ.2013.07.009
Abstract
Demographics prediction is an important component of user profile modeling. Accurate prediction of users' demographics can benefit many applications, ranging from web search and personalization to behavioral targeting. In this paper, we focus on how to predict users' demographics, including "gender", "job type", "marital status", "age" and "number of family members", based on mobile data such as users' usage logs, physical activities and environmental contexts. The core idea is to build a supervised learning framework in which each user is represented as a feature vector and the user's demographics are the prediction targets. The most important step is to construct features from raw data, after which supervised learning models can be applied. We propose a feature construction framework, CFC (contextual feature construction), in which each feature is defined as the conditional probability of one user activity under given contexts. In addition to employing standard supervised learning models, we propose a regularized multi-task learning framework to model the different demographics prediction tasks collectively. We also propose a cost-sensitive classification framework for the regression tasks, in order to benefit from existing dimension reduction methods. Finally, because training instances are limited, we employ ensemble methods to avoid overfitting. The experimental results show that the framework achieves classification accuracies on "gender", "job" and "marital status" as high as 96%, 83% and 86%, respectively, and Root Mean Square Error (RMSE) on "age" and "number of family members" as low as 0.69 and 0.66, respectively, under leave-one-out evaluation.
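The abstract defines each CFC feature as the conditional probability of a user activity under given contexts. A minimal sketch of that idea follows, assuming a user's log can be reduced to (activity, context) pairs and that the probability is estimated by simple counting. The function name `contextual_features`, the activity and context labels, and the choice to return 0.0 for unseen contexts are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def contextual_features(events, activities, contexts):
    """Estimate P(activity | context) for one user by counting.

    events: iterable of (activity, context) pairs from the user's log.
    Returns a dict keyed by (activity, context).
    """
    events = list(events)  # the log is traversed twice below
    pair_counts = Counter(events)
    context_counts = Counter(context for _, context in events)

    features = {}
    for context in contexts:
        total = context_counts[context]
        for activity in activities:
            # Assumption: a context never observed for this user gives 0.0
            # instead of a division by zero.
            features[(activity, context)] = (
                pair_counts[(activity, context)] / total if total else 0.0
            )
    return features


# Hypothetical log for a single user.
log = [("call", "home"), ("call", "home"), ("sms", "home"), ("call", "work")]
features = contextual_features(log, ["call", "sms"], ["home", "work", "commute"])
```

Flattening the returned dictionary in a fixed (activity, context) order would give one feature vector per user, which is the representation a supervised model expects.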