Learning to Rank Academic Experts in the DBLP Dataset

Abstract

Expert finding is an information retrieval task concerned with identifying the most knowledgeable people on a specific topic, based on documents that describe people's activities. The task takes a user query as input and returns a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
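
As an illustration of the rank aggregation idea described above, the following minimal Python sketch fuses several ranked lists of candidate experts with a Borda count. Borda count is used here only as one familiar example of a rank aggregation rule; the abstract does not say which fusion techniques were evaluated. The function name `borda_fuse`, the candidate names and the three example rankings are all hypothetical, and the treatment of candidates missing from a list (they receive zero points from it) is a simplifying assumption.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Combine several ranked lists of candidates with a Borda count.

    Each ranking is a list ordered from most to least expert.
    Hypothetical helper for illustration, not the article's method.
    """
    candidates = {c for ranking in rankings for c in ranking}
    n = len(candidates)
    scores = defaultdict(float)
    for ranking in rankings:
        # The top candidate earns n - 1 points, the next n - 2, and so on.
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
        # Assumption: candidates absent from this list earn nothing from it.
    # Sort by descending score; break ties by name so the output is stable.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical rankings produced by three estimators of expertise
# (text-based, citation-graph-based and profile-based).
text_rank = ["alice", "bob", "carol"]
citation_rank = ["bob", "alice", "dave"]
profile_rank = ["bob", "carol", "alice"]

fused = borda_fuse([text_rank, citation_rank, profile_rank])
print(fused)
```

An unsupervised rule of this kind needs no relevance judgments, whereas the supervised learning-to-rank framework would instead learn how to weight the individual estimators from labeled queries.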

Catarina Moreira is a researcher at the Intelligent Agents and Synthetic Characters Group at INESC-ID. She is also pursuing a PhD in quantum decision support systems at Instituto Superior Técnico, Technical University of Lisbon. She received her Master's degree in enterprise information systems and artificial intelligence in 2011. Her main research interests are machine learning algorithms, high-dimensional indexing structures, quantum cognitive models and quantum probabilistic graphical models.

Pável Calado has an MSc degree in Computer Science from the Federal University of Minas Gerais (UFMG), where he also obtained his PhD in 2004. He is currently an Assistant Professor at the Computer Science and Engineering Department of Instituto Superior Técnico (IST), and a researcher at INESC-ID, in Lisbon. His research interests include information retrieval, information systems, and digital libraries.

Bruno Martins is an Assistant Professor at the Computer Science and Engineering Department of Instituto Superior Técnico (IST), a school of Engineering, Science and Technology in the University of Lisbon. He is also a researcher at the Data Management and Information Retrieval group of INESC-ID. His main research interests are related to the general areas of text mining and information retrieval.