Learning to Rank in Entity Relationship Graphs

2019, INFORMS Journal on Computing

https://doi.org/10.1287/IJOC.2018.0837

Abstract

Many real-world data sets are modeled as entity relationship graphs or heterogeneous information networks. In these graphs, nodes represent entities and edges represent relationships. ObjectRank extends the well-known PageRank authority flow-based ranking method to entity relationship graphs using an authority flow weight vector (W). The vector W assigns a different authority flow-based importance (weight) to each edge type based on domain knowledge or personalization. In this paper, our contribution is a framework for Learning to Rank in entity relationship graphs that learns W in the context of authority flow. We show that the problem is similar to learning a recursive scoring function. We present a two-phase iterative solution and multiple variants of learning. In pointwise learning, we learn W, and hence the scoring function, from the scores of a sample of nodes. In pairwise learning, we learn W from given preferences for pairs of nodes. To demonstrate our contribution in a real setting, we apply our framework to a real-world challenge, predicting future citations in a bibliographic archive (that is, the FutureRank score), and learn the rank with high accuracy. Our extensive experiments show that with a small amount of training data and a limited number of iterations, our Learning to Rank approach learns W with high accuracy. Learning works well with pairwise training data in large graphs.
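To make the role of W concrete, the following is a minimal sketch of authority flow with per-edge-type weights, written in the spirit of ObjectRank. It is an illustration under stated assumptions, not the paper's learning procedure: the function names, the toy graph, the damping value, and the uniform base set are all hypothetical choices made for this example. The helper `violated_pairs` only shows how pairwise preferences could be checked against the scores that a candidate W produces.

```python
from collections import defaultdict


def authority_flow_scores(nodes, edges, W, damping=0.85, iters=50):
    """Power iteration with per-edge-type weights (illustrative sketch).

    nodes: list of node ids
    edges: list of (src, dst, edge_type) tuples
    W:     dict mapping edge_type -> authority flow weight (assumed in [0, 1])
    """
    # Count outgoing edges per (source, type) so that the weight of an
    # edge type is split evenly among a node's edges of that type.
    out_count = defaultdict(int)
    for src, _dst, etype in edges:
        out_count[(src, etype)] += 1

    scores = {n: 1.0 / len(nodes) for n in nodes}
    base = (1.0 - damping) / len(nodes)  # uniform base set (assumption)
    for _ in range(iters):
        nxt = {n: base for n in nodes}
        for src, dst, etype in edges:
            nxt[dst] += damping * scores[src] * W[etype] / out_count[(src, etype)]
        scores = nxt
    return scores


def violated_pairs(scores, prefs):
    """Count preferences (a, b), read as 'a should outrank b', that fail."""
    return sum(1 for a, b in prefs if scores[a] <= scores[b])


# Hypothetical toy graph: paper p1 cites paper p2 and is written by author a1.
nodes = ["p1", "p2", "a1"]
edges = [("p1", "p2", "cites"), ("p1", "a1", "written_by")]

W_citation_heavy = {"cites": 0.7, "written_by": 0.1}
W_author_heavy = {"cites": 0.1, "written_by": 0.7}

scores_a = authority_flow_scores(nodes, edges, W_citation_heavy)
scores_b = authority_flow_scores(nodes, edges, W_author_heavy)

prefs = [("p2", "a1")]  # training preference: p2 should outrank a1
print(violated_pairs(scores_a, prefs), violated_pairs(scores_b, prefs))
```

In this toy setting the citation-heavy W satisfies the preference and the author-heavy W violates it, which is the kind of signal a pairwise learner could use to adjust W. A pointwise variant would instead compare the computed scores against given target scores for a sample of nodes.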
