Learning to Rank in Entity Relationship Graphs
2019, INFORMS Journal on Computing
https://doi.org/10.1287/IJOC.2018.0837
Abstract
Many real-world data sets are modeled as entity relationship graphs or heterogeneous information networks. In these graphs, nodes represent entities and edges represent relationships. ObjectRank extends the well-known PageRank authority flow-based ranking method to entity relationship graphs using an authority flow weight vector (W). The vector W assigns a different authority flow-based importance (weight) to each edge type, based on domain knowledge or personalization. Our contribution in this paper is a Learning to Rank framework that learns W for entity relationship graphs in the context of authority flow. We show that the problem is similar to learning a recursive scoring function. We present a two-phase iterative solution and multiple variants of learning. In pointwise learning, we learn W, and hence the scoring function, from the scores of a sample of nodes. In pairwise learning, we learn W from given preferences for pairs of nodes. To demonstrate our contribution in a real setting, we apply our framework to learn the rank, with high accuracy, for a real-world challenge of predicting future citations in a bibliographic archive, that is, the FutureRank score. Our extensive experiments show that with a small amount of training data and a limited number of iterations, our Learning to Rank approach learns W with high accuracy. Learning works well with pairwise training data in large graphs.
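To make the role of W concrete, the following is a minimal sketch of authority flow ranking with per-edge-type weights, plus a count of violated pairwise preferences for a given W. It is an illustration under stated assumptions, not the paper's two-phase learning procedure: the function names, the toy graph, the damping value, and the choice to split each edge type's weight evenly over a node's outgoing edges of that type (the usual ObjectRank convention) are all assumptions introduced here.

```python
from collections import defaultdict


def authority_flow_scores(nodes, edges, weights, damping=0.85, iters=100, tol=1e-12):
    """ObjectRank-style power iteration (illustrative sketch).

    nodes:   list of node ids
    edges:   list of (source, target, edge_type) triples
    weights: dict mapping edge_type -> authority flow weight (the vector W)
    """
    # Number of outgoing edges of each type per source node. Assumption:
    # the weight of an edge type is shared equally among those edges.
    out_count = defaultdict(int)
    for src, _dst, etype in edges:
        out_count[(src, etype)] += 1

    base = (1.0 - damping) / len(nodes)          # uniform random-jump share
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: base for n in nodes}
        for src, dst, etype in edges:
            rate = weights[etype] / out_count[(src, etype)]
            nxt[dst] += damping * rate * scores[src]
        delta = max(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:                           # stop once scores are stable
            break
    return scores


def pairwise_violations(scores, preferences):
    """Count preferences (a, b), read as 'a should outrank b', that scores break."""
    return sum(1 for a, b in preferences if scores[a] <= scores[b])


# Hypothetical toy bibliographic graph: one author, three papers.
nodes = ["a1", "p1", "p2", "p3"]
edges = [
    ("a1", "p1", "writes"),
    ("a1", "p3", "writes"),
    ("p1", "p2", "cites"),
    ("p3", "p2", "cites"),
]
preferences = [("p2", "p1")]  # training signal: p2 should outrank p1

for W in ({"writes": 0.2, "cites": 0.7}, {"writes": 0.2, "cites": 0.0}):
    s = authority_flow_scores(nodes, edges, W)
    print(W, "violations:", pairwise_violations(s, preferences))
```

Because the scores depend on W through a recursive computation, a pairwise learner in this setting would search for a W that drives the violation count (or a smooth surrogate of it) down, while a pointwise learner would instead compare the computed scores with known target scores for a sample of nodes.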
Raschid, Sayyadi, and Hristidis: Learning to Rank in Entity Relationship Graphs