Researcher affiliation extraction from homepages

2009, Proceedings of the 2009 Workshop on …

https://doi.org/10.3115/1699750.1699752

Abstract

This research focuses on extracting researcher affiliation information from personal homepages using advanced natural language processing techniques. It highlights the importance of analyzing unstructured text found on these homepages, which often contains valuable academic information. The study presents methodologies for identifying key sentences and relationships, contributing to the field of scholarly data analysis.

References (207)

  1. Brad Adelberg. 1998. Nodose -a tool for semi- automatically extracting structured and semistruc- tured data from text documents. ACM SIGMOD, 27(2):283-294.
  2. Javier Artiles, Julio Gonzalo, and Satoshi Sekine. 2009. Weps 2 evaluation campaign: overview of the web people search clustering task. In 2nd Web Peo- ple Search Evaluation Workshop (WePS 2009), 18th WWW Conference.
  3. A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schu- bert, and T. Vicsek. 2002. Evolution of the so- cial network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3- 4):590 -614.
  4. Kedar Bellare, Partha Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCal- lum, and Mark Dredze. 2007. Lightly-supervised attribute extraction for web search. In Proceedings of NIPS 2007 Workshop on Machine Learning for Web Search.
  5. Mary Elaine Califf and Raymond J. Mooney. 1999. Relational learning of pattern-match rules for in- formation extraction. In Proceedings of the Six- teenth National Conference on Artificial Intelli- gence, pages 328-334.
  6. Xiwen Cheng, Peter Adolphs, Feiyu Xu, Hans Uszko- reit, and Hong Li. 2009. Gossip galore -a self- learning agent for exchanging pop trivia. In Pro- ceedings of the Demonstrations Session at EACL 2009, pages 13-16, Athens, Greece, April. Associa- tion for Computational Linguistics.
  7. Dayne Freitag. 1998. Information extraction from html: Application of a general machine learning ap- proach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 517- 523.
  8. A. A Goodrum, K. W McCain, S. Lawrence, and C. L Giles. 2001. Scholarly publishing in the internet age: a citation analysis of computer science liter- ature. Information Processing and Management, 37:661-675, September.
  9. Raymond Kosala and Hendrik Blockeel. 2000. Web mining research: A survey. SIGKDD Explorations, 2:1-15.
  10. Nicholas Kushmerick. 2000. Wrapper induction: Ef- ficiency and expressiveness. Artificial Intelligence, 118:2000.
  11. John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Prob- abilistic models for segmenting and labeling se- quence data. In Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kauf- mann, San Francisco, CA.
  12. Bing Liu and Kevin Chen-Chuan-Chang. 2004. Edito- rial: special issue on web content mining. SIGKDD Explor. Newsl., 6(2):1-4.
  13. Andrew Kachites McCallum. 2002. Mal- let: A machine learning for language toolkit. http://mallet.cs.umass.edu.
  14. M. E. J. Newman. 2001. The structure of scientific collaboration networks. In Proceedings National Academy of Sciences USA, pages 404-418.
  15. Marius Pas ¸ca. 2009. Outclassing Wikipedia in open- domain information extraction: Weakly-supervised acquisition of attributes over conceptual hierarchies. In Proceedings of the 12th Conference of the Eu- ropean Chapter of the ACL (EACL 2009), Athens, Greece, March.
  16. Celine Robardet and Eric Fleury. 2009. Communi- ties detection and the analysis of their dynamics in collaborative networks. Int. J. Web Based Commu- nities, 5(2):195-211.
  17. Yasmin H. Said, Edward J. Wegman, Walid K. Shara- bati, and John T. Rigsby. 2008. Social networks of author-coauthor relationships. Computational Statistics & Data Analysis, 52(4):2177-2184.
  18. Satoshi Sekine. 2006. On-demand information ex- traction. In Proceedings of the COLING/ACL 2006
  19. Main Conference Poster Sessions, pages 731-738, Sydney, Australia, July. Association for Computa- tional Linguistics.
  20. György Szarvas, Richárd Farkas, and András Kocsor. 2006. A multilingual named entity recognition sys- tem using boosting and c4.5 decision tree learning algorithms. DS2006, LNAI, 4265:267-278.
  21. Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. An annotation scheme for citation function. In Proceedings of the 7th SIGdial Workshop on Dis- course and Dialogue, pages 80-87, Sydney, Aus- tralia, July. Association for Computational Linguis- tics.
  22. Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the conll-2003 shared task: Language-independent named entity recognition. In Walter Daelemans and Miles Osborne, editors, Pro- ceedings of CoNLL-2003, pages 142-147. Edmon- ton, Canada.
  23. Y. Yang, C. M. Au Yeung, M. J. Weal, and H. Davis. 2009. The researcher social network: A social net- work based on metadata of scientific publications.
  24. Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. 2002. Pebl: positive example based learn- ing for web page classification using svm. In KDD '02: Proceedings of the eighth ACM SIGKDD in- ternational conference on Knowledge discovery and data mining, pages 239-248, New York, NY, USA. ACM.
  25. E. Amitay. 1998. Using common hypertext links to identify the best phrasal description of target web documents. In Proc. of the SIGIR'98 Post Confe- rence Workshop on Hypertext Information Re- trieval for the Web, Melbourne, Australia.
  26. G. Attardi, A. Gulli, and F. Sebastiani. 1999. Theseus: categorization by context. In Proceedings of the 8th International World Wide Web Conference.
  27. A. Baxter, P. Christen, T. Churches. 2003. A compar- ison of fast blocking methods for record linkage. In ACM SIGKDD'03 Workshop on Data Cleaning, Record Linkage and Object consolidation. Wash- ington DC.
  28. A. Broder, S. Glassman, M. Manasse, and G. Zweig. 1997. Syntactic clustering of the Web. In Proceed- ings of the Sixth International World Wide Web Conference, pp. 391-404.
  29. C.J.C. Burges. 1998. A tutorial on support vector ma- chines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167.
  30. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. 1998. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference.
  31. K. Chakrabarti, V. Ganti, J. Han, and D. Xin. 2006. Ranking objects based on relationships. In SIG- MOD '06: Proceedings of the 2006 ACM SIG- MOD international conference on Management of data, pages 371-382, New York, NY, USA. ACM.
  32. B. Davison. 2000. Topical locality in the web. In SI- GIR'00: Proceedings of the 23rd annual interna- tional ACM SIGIR conference on Research and development in information retrieval, pages 272- 279, New York, NY, USA. ACM.
  33. I.P. Fellegi, and A.B. Sunter. A Theory for Record Linkage, Journal of the American Statistical Asso- ciation, 64, (1969), 1183-1210.
  34. C. L. Giles, K. Bollacker, and S. Lawrence. 1998. CiteSeer: An automatic citation indexing system. In IanWitten, Rob Akscyn, and Frank M. Shipman III, editors, Digital Libraries 98 -The Third ACM Conference on Digital Libraries, pages 89-98, Pittsburgh, PA, June 23-26. ACM Press.
  35. T.H. Haveliwala, A. Gionis, D. Klein, and P. Indyk. 2002. Evaluating strategies for similarity search on the web. In WWW '02: Proceedings of the 11th in- ternational conference on World Wide Web, pages 432-442, New York, NY, USA. ACM.
  36. K. Jarvelin, and J. Kekalainen. 2000. IR Evaluation Methods for Retrieving Highly Relevant Docu- ments. In Proceedings of the 23rd Annual Interna- tional ACM SIGIR Conference on Research and Development in Information Retrieval (SI- GIR2000).
  37. S. Lawrence, C.L. Giles, and K. Bollacker. 1999. Dig- ital libraries and Autonomous Citation Indexing. IEEE Computer, 32(6):67-71.
  38. A. McCallum, K. Nigam, J. Rennie, and K. Seymore. 1999. Building Domain-specific Search Engines with Machine Learning Techniques. In Proceed- ings of the AAAI-99 Spring Symposium on Intelli- gent Agents in Cyberspace.
  39. A. McCallum, K. Nigam, and L. Ungar. 2000. Effi- cient clustering of high-dimensional data sets with application to reference matching. In Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discov- ery and Data Mining.
  40. O.A. McBryan. 1994. Genvl and wwww: Tools for taming the web. In In Proceedings of the First In- ternational World Wide Web Conference, pages 79-90.
  41. H. Nanba, M. Okumura. 1999. Towards Multi-paper Summarization Using Reference Information. In Proc. of the 16 th International Joint Conference on Artificial Intelligence, pp.926-931.
  42. H. Nanba, T. Abekawa, M. Okumura, and S. Saito. 2004. Bilingual PRESRI: Integration of Multiple Research Paper Databases. In Proc. of RIAO 2004, 195-211.
  43. L. Parsons, E. Haque, H. Liu. 2004. Subspace cluster- ing for high dimensional data: a review. SIGKDD Explorations 6(1): 90-105.
  44. S.E. Robertson, S. Walker, and M. Beaulieu. 1999. Okapi at TREC-7: automatic ad hoc, filtering, VLC and filtering tracks. In Proceedings of TREC'99.
  45. S. Shi, R. Song, and J-R Wen. 2006. Latent Additivity: Combining Homogeneous Evidence. Technique report, MSR-TR-2006-110, Microsoft Research, August 2006.
  46. S. Shi, F. Xing, M. Zhu, Z.Nie, and J.-R. Wen. 2006. Pseudo-Anchor Extraction for Search Vertical Ob- jects. In Proc. of the 2006 ACM 15th Conference on Information and Knowledge Management. Ar- lington, USA.
  47. Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. 2005. Object-level ranking: bringing order to web objects. InWWW'05: Proceedings of the 14th international conference on World Wide Web, pages 567-574, New York, NY, USA. ACM. References arXiv. 2005. arxiv.org archive. http://arxiv.org.
  48. R. Barzilay, K.R. McKeown, and M. Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of the 37th an- nual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 550-557. Association for Computational Linguistics Morristown, NJ, USA.
  49. R. Brandow, K. Mitze, and L.F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing and management, 31(5):675-685.
  50. A.L. Brown, J.D. Day, and R.S. Jones. 1983. The development of plans for summarizing texts. Child Development, pages 968-979.
  51. Stanley Chen and Ronald Rosenfeld. 1999. A Gaus- sian prior for smoothing maximum entropy models. Technical report, Carnegie Mellon University, Pitts- burgh, PA.
  52. James R. Curran and Stephen Clark. 2003. Investigat- ing GIS and smoothing for maximum entropy tag- gers. In Proceedings of the 10th Conference of the European Chapter of the Association for Computa- tional Linguistics, pages 91-98, Budapest, Hungary, 12-17 April.
  53. E. Frank, M.A. Hall, G. Holmes, R. Kirkby, B. Pfahringer, I.H. Witten, and L. Trigg. 2005. Weka-a machine learning workbench for data min- ing. The Data Mining and Knowledge Discovery Handbook, pages 1305-1314.
  54. M. Johnson, S. Geman, S. Canon, Z. Chi, and S. Rie- zler. 1999. Estimators for stochastic 'unification- based' grammars. In Proceedings of the 37th Meet- ing of the ACL, pages 535-541, University of Mary- land, MD.
  55. W. Kintsch and T.A. Van Dijk. 1978. Toward a model of text comprehension and production. Psychologi- cal review, 85(5):363-94.
  56. K. Knight and D. Marcu. 2000. Statistics-based summarization-step one: Sentence compression. In Proceedings of the National Conference on Artifi- cial Intelligence, pages 703-710. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
  57. J. Kupiec, J. Pedersen, and F. Chen. 1995. A train- able document summarizer. In Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information re- trieval, pages 68-73. ACM New York, NY, USA.
  58. M. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of english: The penn treebank.
  59. T. Murphy, T. McIntosh, and J.R. Curran. 2006. Named entity recognition for astronomy literature. In Proceedings of the 2006 Australasian Language Technology Workshop (ALTW).
  60. K. Nigam, J. Lafferty, and A. McCallum. 1999. Us- ing maximum entropy for text classification. In Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61-67, Stockholm, Sweden.
  61. Adwait Ratnaparkhi. 1996. A maximum entropy part- of-speech tagger. In Proceedings of the EMNLP Conference, pages 133-142, Philadelphia, PA.
  62. J.C. Reynar and A. Ratnaparkhi. 1997. A maximum entropy approach to identifying sentence bound- aries. In Proceedings of the fifth conference on Ap- plied natural language processing, pages 16-19.
  63. R. Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer, Speech and Language, 10:187-228.
  64. J.M. Swales. 1990. Genre analysis: English in aca- demic and research settings. Cambridge University Press.
  65. S. Teufel and M. Moens. 2002. Summarising scientific articles -experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445.
  66. S. Teufel, J. Carletta, and M. Moens. 1999. An anno- tation scheme for discourse-level argumentation in research articles. In Proceedings of EACL 1999.
  67. S. Teufel, A. Siddharthan, and D. Tidhar. 2006. Auto- matic classification of citation function. In Proceed- ings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 103-110.
  68. S. Teufel. 1999. Argumentative zoning: Information extraction from scientific text. Ph.D. thesis, Univer- sity of Edinburgh, Edinburgh, UK.
  69. Guo-Wei Bian and Shun-Yuan Teng. 2008. Integrat- ing Query Translation and Text Classification in a Cross-Language Patent Access System, Proceeding of the 7 th TCIR Workshop Meeting: 341-346.
  70. Stephane Clinchant and Jean-Michel Renders. 2008. XRCE's Participation to Patent Mining Task at NTCIR-7, Proceedings of the 7 th TCIR Workshop Meeting: 351-353.
  71. Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2004. Overview of Patent Retrieval Task at NTCIR-4, Working otes of the 4 th TCIR Work- shop: 225-232.
  72. Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2005. Overview of Patent Retrieval Task at NTCIR-5, Proceedings of the 5 th TCIR Workshop Meeting: 269-277.
  73. Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2007. Overview of the Patent Retrieval Task at NTCIR-6 Workshop, Proceedings of the 6 th TCIR Workshop Meeting: 359-365.
  74. Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, and Takehito Utsuro. 2008. Overview of the Patent Translation Task at the NTCIR-7 Workshop, Pro- ceedings of the 7 th TCIR Workshop Meeting: 389- 400.
  75. Marti A. Hearst. 1992. Automatic Acquisition of Hy- ponyms from Large Text Corpora, Proceedings of the 14 th International Conference on Computation- al Linguistics: 539-545.
  76. Daisuke Ikeda, Toshiaki Fujiki, and Manabu Okumu- ra. 2006. Automatically Linking News Articles to Blog Entries, Proceedings of AAAI Spring Sympo- sium Series Computational Approaches to Analyz- ing Weblogs: 78-82.
  77. Masaki Itagaki, Takako Aikawa, and Xiaodong He. 2007. Automatic Validation of Terminology Trans- lation Consistency with Statistical Method, Pro- ceedings of MT summit XI: 269-274.
  78. Hideo Itoh, Hiroko Mano, and Yasushi Ogawa. 2002. Term Distillation for Cross-db Retrieval, Working otes of the 3 rd TCIR Workshop Meeting, Part III: Patent Retrieval Task: 11-14.
  79. Makoto Iwayama, Atsushi, Fujii, Noriko Kando, and Akihiko Takano. 2002. Overview of Patent Re- trieval Task at NTCIR-3, Working otes of the 3 rd TCIR Workshop Meeting, Part III: Patent Re- trieval Task: 1-10.
  80. Makoto Iwayama, Atsushi Fujii, and Noriko Kando. 2005. Overview of Classification Subtask at NTCIR-5 Patent Retrieval Task, Proceedings of the 5 th TCIR Workshop Meeting: 278-286.
  81. Makoto Iwayama, Atsushi Fujii, and Noriko Kando. 2007. Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task, Proceedings of the 6 th TCIR Workshop Meeting: 366-372.
  82. Noriko Kando, Kazuko Kuriyama, Toshihiko Nozue, Koji Eguchi, Hiroyuki Kato, and Soichiro Hidaka. 1999. Overview of IR Tasks at the first NTCIR Workshop, Proceedings of the 1 st TCIR Work- shop on Research in Japanese Text Retrieval and Term Recognition: 11-44.
  83. Noriko Kando, Kazuko Kuriyama, and Makoto Yo- shioka. 2001. Overview of Japanese and English Information Retrieval Tasks (JEIR) at the Second NTCIR Workshop, Proceedings of the 2 nd TCIR Workshop Meeting: 4-37 -4-60.
  84. Hisao Mase and Makoto Iwayama. 2008. NTCIR-7 Patent Mining Experiments at Hitachi, Proceedings of the 7 th TCIR Workshop Meeting: 365-368.
  85. Hidetsugu Nanba. 2007. Query Expansion using an Automatically Constructed Thesaurus, Proceedings of the 6 th TCIR Workshop Meeting: 414-419.
  86. Hidetsugu Nanba, Natsumi Anzen, and Manabu Okumura:a. 2008. Automatic Extraction of Citation Information in Japanese Patent Applications, Inter- national Journal on Digital Libraries, 9(2): 151- 161.
  87. Hidetsugu Nanba, Atsushi Fujii, Makoto Iwayama, and Taiichi Hashimoto:b. 2008. Overview of the Patent Mining Task at the NTCIR-7 Workshop, Proceedings of the 7 th TCIR Workshop Meeting: 325-332.
  88. Hidetsugu Nanba:c. 2008. Hiroshima City University at NTCIR-7 Patent Mining Task. Proceedings of the 7 th TCIR Workshop Meeting: 369-372.
  89. Hidetsugu Nanba, Hideaki Kamaya, Toshiyuki Take- zawa, Manabu Okumura, Akihiro Shinmori, and Hidekazu Tanigawa. 2009. Automatic Translation of Scholarly Terms into Patent Terms, Journal of Information Processing Society Japan TOD, 2(1): 81-92. (in Japanese)
  90. Gerald Salton. 1971. The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
  91. Masatsugu Tonoike. Mitsuhiro Kida, Toshihiro Taka- gi, Yasuhiro Sakai, Takehito Utsuro, and Satoshi Sato. 2005. Translation Estimation for Technical Terms using Corpus Collected from the Web, Pro- ceedings of the Pacific Association for Computa- tional Linguistics: 325-331.
  92. Salah Aït-Mokhtar, Jean-Pierre Chanod, and Claude Roux. 2002. Robustness beyond shallowness: in- cremental dependency parsing. Natural Language Engineering, 8(2/3):121-144.
  93. Robin Barrow. 2008. Education and the Body: Prole- gomena. British Journal of Educational Studies 56(3):272-285.
  94. Lutz Bornmann and Hans-Dieter Daniel. 2003. Begu- tachtung durch Fachkollegen in der Wissenschaft. Stand der Forschung zur Reliabilität, Fairness und Validität des Peer-Review-Verfahrens. Universität auf dem Prüfstand. Konzepte und Befunde der Hochschulforschung. (S. Schwarz and U. Teichler, Eds.). Campus Verlag Frankfurt/New York: 207- 225.
  95. Richard Holmes. 1997. Genre analysis, and the social sciences: An investigation of the structure of re- search article discussion sections in three disci- plines. English for Specific Purposes, 16(4):321- 337.
  96. Noriko Kando. 1997. Text-level structure of research papers: Implications for text-based information processing systems. Proceedings of the 19th Brit- ish Computer Society Annual Colloquium of Infor- mation Retrieval Research, Sheffield University, Sheffield, UK, 68-81.
  97. Elizabeth D. Liddy. 1991. The discourse-level struc- ture of empirical abstracts: an exploratory study. Information Processing and Management, 27(1):55-81.
  98. Frédérique Lisacek, Christine Chichester, Aaron Kap- lan, and Ágnes Sandor. 2005. Discovering para- digm shift patterns in biomedical abstracts: appli- cation to neurodegenerative diseases. First Interna- tional Symposium on Semantic Mining in Biomedi- cine, Cambridge, UK, April 11-13, 2005.
  99. Yanping Lu. 2005. Editorial Peer Review in Educa- tion: Mapping the Field. Australian Association for Research in Education 2004 conference papers, Melbourne, Australia (Jeffery, P. L., Ed.):1-19.
  100. Yanping Lu. 2008. Peer review and its contribution to manuscript quality: an Australian perspective. Learned Publishing, 21(3):307-316.
  101. Eric G. Meinberg and Peter J. Stern. 2003. Incidence of Wrong-Site Surgery Among Hand Surgeons. The Journal of Bone and Joint Surgery (American) 85:193-197.
  102. Yoko Mizuta, Anna Korhonen, Tony Mullen, and Nigel Collier. 2006. Zone analysis in biology arti- cles as a basis for information extraction. Interna- tional Journal of Medical Informatics, 75(6):468- 87.
  103. Michaela Montesi and John Mackenzie Owen. 2008. Research journal articles as document genres: ex- ploring their role in knowledge organization. Jour- nal of Documentation, 64(1):143-167.
  104. Robert N. Oddy, Elizabeth D. Liddy, Bhaskaran Balakrishnan, Ann Bishop, Joseph Elewononi and Eileen Martin. 1992. Towards the use of situational information in information retrieval. Journal of Documentation, 48(2):123-171.
  105. Yang Ruiying and Desmond Allison. 2004. Research articles in applied linguistics: structures from a functional perspective. English for Specific Pur- poses, 23(3):264-279.
  106. Ágnes Sándor, Aaron Kaplan and Gilbert Rondeau. 2006. Discourse and citation analysis with concept- matching. International Symposium: Discourse and document (ISDD), Caen, France, June 15- 16, 2006.
  107. Ágnes Sándor. 2007. Modeling metadiscourse con- veying the author's rhetorical strategy in biomedi- cal research abstracts. Revue Française de Linguis- tique Appliquée 200(2):97-109.
  108. Ágnes Sándor. 2009. Automatic detection of dis- course indicating emerging risk. To appear in Critical Approaches to Discourse Analysis across Disciplines. Risk as Discourse -Discourse as Risk: Interdisciplinary perspectives.
  109. Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445.
  110. Victoria Uren, Simon Buckingham Shum, Clara Mancini, and Gangmin Li. 2007. Modelling Natu- ralistic Argumentation in Research Literatures: Representation and Interaction Design Issues. In- ternational Journal of Intelligent Systems, (Special Issue on Computational Models of Natural Argu- ment, Eds: C. Reed and F. Grasso), 22(1):17-47.
  111. Richard Whitley and Jochen Gläser. 2007. The Changing Governance of Sciences: The Advent Of Research Evaluation Systems. Springer References
  112. Joan C. Bartlett and Tomasz Neugebauer. 2008. A task-based information retrieval interface to support bioinformatics analysis. In IIiX '08: Proceedings of the second international symposium on Information interaction in context, pages 97-101, New York, NY, USA. ACM.
  113. Nicholas J. Belkin. 1994. Design principles for electronic textual resources: Investigating users and uses of scholarly information. In Current Issues in Computational Linguistics: In Honour of Donald Walker.Kluwer, pages 1-18. Kluwer.
  114. Katriina Bystrm, Katriina Murtonen, Kalervo Jrvelin, Kalervo Jrvelin, and Kalervo Jrvelin. 1995. Task complexity affects information seeking and use. In Information Processing and Management, pages 191-213.
  115. Juliet Corbin and Anselm L. Strauss. 2008. Basics of qualitative research : techniques and procedures for developing grounded theory. Sage, 3rd edition.
  116. John W Ely, Jerome A Osheroff, Paul N Gorman, Mark H Ebell, M Lee Chambliss, Eric A Pifer, and P Zoe Stavri. 2000. A taxonomy of generic clini- cal questions: classification study. British Medical Journal, 321:429-432.
  117. Barney G. Glaser and Anselm L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qual- itative Research. Aldine de Gruyter, New York.
  118. Andreas Henrich and Volker Luedecke. 2007. Char- acteristics of geographic information needs. In GIR '07: Proceedings of the 4th ACM workshop on Ge- ographical information retrieval, pages 1-6, New York, NY, USA. ACM.
  119. W. R. Hersh. 2008. Information Retrieval. Springer. Information Retrieval for biomedical researchers.
  120. Vahed Qazvinian and Dragomir R. Radev. 2008. Sci- entific paper summarization using citation summary networks. In The 22nd International Conference on Computational Linguistics (COLING 2008), Mach- ester, UK, August.
  121. G. Salton and M. J. McGill. 1983. Introduction to modern information retrieval. McGraw-Hill, New York.
  122. Karen Spark Jones. 1998. Automatic summarizing: factors and directions. In I. Mani and M. Maybury, editors, Advances in Automatic Text Summarisation. MIT Press, Cambridge MA.
  123. Robert S Taylor. 1962. Process of asking questions. American Documentation, 13:391-396, October.
  124. Simone Teufel and Marc Moens. 2002. Summa- rizing scientific articles: experiments with rele- vance and rhetorical status. Computional Linguis- tics, 28(4):409-445.
  125. Elaine G. Toms. 2000. Understanding and facilitating the browsing of electronic text. International Jour- nal of Human-Computing Studies, 52(3):423-452.
  126. D Tran, C Dubay, P Gorman, and W. Hersh. 2004. Ap- plying task analysis to describe and facilitate bioin- formatics tasks. Studies in Health Technology and Informatics, 107107(Pt 2):818-22.
  127. Stephen Wan and Cécile Paris. 2008. In-browser sum- marisation: Generating elaborative summaries bi- ased towards the reading context. In The 46th An- nual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Paper, Columbus, Ohio, June.
  128. Stephen Wan, Cécile Paris, and Robert Dale. 2009. Whetting the appetite of scientists: Producing sum- maries tailored to the citation context. In Proceed- ings of the Joint Conference on Digital Libraries. References
  129. Vahed Qazvinian and Dragomir R. Radev. Scien- tific paper summarization using citation sum- mary networks. In COLING 2008, Manchester, UK, 2008.
  130. Dragomir R. Radev, Mark Joseph, Bryan Gibson, and Pradeep Muthukrishnan. A Bibliometric and Network Analysis of the Field of Computa- tional Linguistics. JASIST, 2009 to appear. References J.D. Anderson and M.A. Hofmann. 2006. A fully faceted syntax for Library of Congress subject headings. Cataloging & Classification Quarterly, 43(1):7-38.
  131. K. Antelman, E. Lynema, and A.K. Pace. 2006. To- ward a twenty-first century library catalog. Infor- mation technology and libraries, 25(3):128-138.
  132. David Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Ma- chine Learning Research, 3:993-1022.
  133. W. Dakka and P.G. Ipeirotis. 2008. Automatic extrac- tion of useful facet hierarchies from text databases. In IEEE 24th International Conference on Data En- gineering, 2008. ICDE 2008, pages 466-475.
  134. W. Dakka, P.G. Ipeirotis, and K.R. Wood. 2005. Au- tomatic construction of multifaceted browsing inter- faces. In Proceedings of the 14th ACM international conference on Information and knowledge manage- ment, pages 768-775. ACM New York, NY, USA.
  135. J. English, M.A. Hearst, R. Sinha, K. Swearingen, and K.-P. Yee. 2001. Examining the usability of web site search. Unpublished Manuscript, http://flamenco.berkley.edu/papers/epicurious- study.pdf.
  136. Christiane Fellbaum, editor. 1998. WordNet: An Elec- tronic Lexical Database. MIT Press.
  137. M.A. Hearst, J. English, R. Sinha, K. Swearingen, and K.-P. Yee. 2002. Finding the flow in web site search. Communications of the ACM, 45(9), September.
  138. M.A. Hearst. 2000. Next Generation Web Search: Set- ting Our Sites. IEEE Data Engineering Bulletin, 23(3):38-48.
  139. M.A. Hearst. 2006a. Clustering Versus Faceted Cat- egories For Information Exploration. Communca- tions Of The Acm, 49(4):59-61.
  140. M.A. Hearst. 2006b. Design recommendations for hierarchical faceted search interfaces. In SIGIR'06 Workshop On Faceted Search, Seattle, Wa, August.
  141. K. Hornbaek and E. Frøkjaer. 1999. Do Thematic Maps Improve Information Retrieval. Human-Computer Interaction (INTERACT'99), pages 179-186.
  142. A.J. Kleiboemer, M.B. Lazear, and J.O. Pedersen. 1996. Tailoring a retrieval system for naive users. In Proceedings of the Fifth Annual Symposium on Doc- ument Analysis and Information Retrieval (SDAIR '96), Las Vegas, NV.
  143. J. Koren, Y. Zhang, and X. Liu. 2008. Personalized interactive faceted search. WWW '08: Proceeding of the 17th international conference on World Wide Web.
  144. Bernardo Magnini. 2000. Integrating subject field codes into WordNet. In Proc. of LREC 2000, Athens, Greece.
  145. Rada Mihalcea and Dan I. Moldovan. 2001.
  146. Ez.wordnet: Principles for automatic generation of a coarse grained wordnet. In Proc. of FLAIRS Con- ference 2001, May.
  147. Roberto Navigli, Paola Velardi, and Aldo Gangemi. 2003. Ontology learning and its application to auto- mated terminology translation. Intelligent Systems, 18(1):22-31.
  148. T.A. Olson. 2007. Utility of a faceted catalog for scholarly research. Library Hi Tech, 25(4):550-561.
  149. W. Pratt, M.A. Hearst, and L. Fagan. 1999. A knowledge-based approach to organizing retrieved documents. In Proceedings of 16th Annual Con- ference on Artificial Intelligence(AAAI 99), Orlando, FL. K. Rodden, W. Basalaj, D. Sinclair, and K. R. Wood. 2001. Does organisation by similarity assist im- age browsing? In Proceeedings of ACM CHI 2001, pages 190-197.
  150. D.M. Russell, M. Slaney, Y. Qu, and M. Hous- ton. 2006. Being literate with large document collections: Observational studies and cost struc- ture tradeoffs. In Proceedings of the 39th Annual Hawaii International Conference on System Sci- ences (HICSS'06).
  151. Mark Sanderson and Bruce Croft. 1999. Deriving con- cept hierarchies from text. In Proceedings of SIGIR 1999.
  152. E. Stoica and M. Hearst. 2004. Nearly-automated metadata hierarchy creation. In Companion Pro- ceedings of HLT-NAACL'04, pages 117-120.
  153. E. Stoica, M.A. Hearst, and M. Richardson. 2007. Au- tomating Creation of Hierarchical Faceted Metadata Structures. In Human Language Technologies: The Annual Conference of the North American Chap- ter of the Association for Computational Linguistics (NAACL-HLT 2007), pages 244-251.
  154. K.-P. Yee, K. Swearingen, K. Li, and M.A. Hearst. 2003. Faceted metadata for image search and browsing. In Proceedings of ACM CHI 2003, pages 401-408. ACM New York, NY, USA.
  155. V. Zelevinsky, J. Wang, and D. Tunkelang. 2008. Sup- porting Exploratory Search for the ACM Digital Li- brary. In Workshop on Human-Computer Interac- tion and Information Retrieval (HCIR'08).
  156. Chia-Hui Chang, Chun-Nan Hsu, and Shao-Cheng Lui. 2003. Automatic information extraction from semi- Support Syst., 35(1):129-147.
  157. Eli Cortez, Altigran S. da Silva, Marcos André Gonc ¸alves, Filipe Mesquita, and Edleno S. de Moura. 2007. FLUX-CIM: flexible unsuper- vised extraction of citation metadata. In Proc. JCDL '07, pages 215-224, New York, NY, USA. ACM.
  158. Isaac G. Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: An open-source CRF reference string parsing package. In LREC '08, Marrakesh, Morrocco, May.
  159. Junfei Geng and Jun Yang. 2004. Autobib: automatic extraction of bibliographic information on the web. pages 193-204, July.
  160. Erik Hetzner. 2008. A simple method for citation metadata extraction using hidden markov models. In Proc. JCDL '08, pages 280-284, New York, NY, USA. ACM.
  161. Fiona Fui-Hoon Nah. 2004. A study on tolerable waiting time: how long are web users willing to wait? Behaviour & Information Technology Special Issue on HCI in MIS, 23(3), May-June.
  162. Fuchun Peng and Andrew McCallum. 2004. Accurate information extraction from research papers using conditional random fields. pages 329-336. HLT-NAACL.
  163. Kristie Seymore, Andrew McCallum, and Roni Rosenfeld. 1999. Learning hidden markov model structure for information extraction. In AAAI'99 Workshop on Machine Learning for Information Extraction.
  164. Xin Xin, Juanzi Li, Jie Tang, and Qiong Luo. 2008. Academic conference homepage understanding using constrained hierarchical conditional random fields. In Proc. CIKM '08, pages 1301-1310, New York, NY, USA. ACM.
  165. Kai-Hsiang Yang, Shui-Shi Chen, Ming-Tai Hsieh, Hahn-Ming Lee, and Jan-Ming Ho. 2008. CRE: An automatic citation record extractor for publication list pages. In Proc. WMWA'08 of PAKDD-2008, Osaka, Japan, May.
  166. Yanhong Zhai and Bing Liu. 2005. Web data extraction based on partial tree alignment. In Proc. WWW '05, pages 76-85, New York, NY, USA. ACM.
  167. Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, and Wei-Ying Ma. 2006. Simultaneous record detection and attribute labeling in web data extraction. In Proc. KDD '06, pages 494-503, New York, NY, USA. ACM.
  168. Gregory Crane. 1987. From the old to the new: integrating hypertext into traditional scholarship. In Proceedings of the ACM conference on Hypertext, pages 51-55, Chapel Hill, North Carolina, United States. ACM.
  169. Gregory Crane. 2006. What do you do with a million books. D-Lib Magazine, 12(3).
  170. Aron Culotta, Andrew McCallum, and Jonathan Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 296-303, Morristown, NJ, USA. Association for Computational Linguistics.
  171. Andreas Doms and Michael Schroeder. 2005. GoPubMed: exploring PubMed with the gene ontology. Nucl. Acids Res., 33(suppl 2):783-786, July.
  172. Andrea Ernst-Gerlach and Gregory Crane, 2008. Identifying Quotations in Reference Works and Primary Materials, pages 78-87.
  173. Isaac Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: an open-source CRF reference string parsing package. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/.
  174. Okan Kolak and Bill N. Schilit. 2008. Generating links by mining quotations. In Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 117-126, Pittsburgh, PA, USA. ACM.
  175. John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.
  176. Frank Lester. 2007. Backlinks: Alternatives to the citation index for determining impact. Journal of Electronic Publishing, 10(2).
  177. Andrew Kachites McCallum. 2002. MALLET: a machine learning for language toolkit. http://mallet.cs.umass.edu.
  178. Matteo Romanello. 2007. A semantic linking system for canonical references to electronic corpora. Prague. To appear in the proceedings of ECAL 2007, Electronic Corpora of Ancient Languages, held in Prague, November 2007.
  179. Matteo Romanello. 2008. A semantic linking framework to provide critical value-added services for e-journals on classics. In Susanna Mornati and Leslie Chan, editors, ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing held in Toronto, Canada, 25-27 June 2008.
  180. David A. Smith and Gregory Crane. 2001. Disambiguating geographic names in a historical digital library. In ECDL '01: Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries, pages 127-136, London, UK. Springer-Verlag.
  181. Neel Smith. 2009. Citation in classical studies. Digital Humanities Quarterly, 3(1).
  182. Shannon Bradshaw. 2003. Reference directed indexing: Redeeming relevance for subject search in citation indexes. In Proceedings of the 7th ECDL, pages 499-510.
  183. Eugene Garfield, Irving H. Sher, and Richard J. Torpie. 1964. The use of citation data in writing the history of science. Institute for Scientific Information, Philadelphia, Pennsylvania.
  184. Thorsten Joachims. 1999. Making large-scale support vector machine learning practical. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in kernel methods: support vector learning, pages 169-184. MIT Press, Cambridge, MA, USA.
  185. Dain Kaplan and Takenobu Tokunaga. 2008. Sighting citation sites: A collective-intelligence approach for automatic summarization of research papers using c-sites. In ASWC 2008 Workshops Proceedings.
  186. Andrew Kehler. 2004. The (non)utility of predicate-argument frequencies for pronoun interpretation. In Proceedings of 2004 North American chapter of the Association for Computational Linguistics annual meeting, pages 289-296.
  187. M. M. Kessler. 1963. Bibliographic coupling between scientific papers. American Documentation, 14(1):10-25.
  188. LDC2001T02. 2001. Message understanding conference (MUC) 7.
  189. Daniel Marcu. 2000. The rhetorical parsing of unrestricted texts: A surface-based approach. Computational Linguistics, 26(3):395-448.
  190. Hidetsugu Nanba, Noriko Kando, and Manabu Okumura. 2000. Classification of research papers using citation links and citation types: Towards automatic review article generation. In Proceedings of 11th SIG/CR Workshop, pages 117-134.
  191. Hidetsugu Nanba, Takeshi Abekawa, Manabu Okumura, and Suguru Saito. 2004. Bilingual presri integration of multiple research paper databases. In Proceedings of RIAO 2004, pages 195-211, Avignon, France.
  192. Vincent Ng and Claire Cardie. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 104-111.
  193. J. Nie. 2002. Towards a unified approach to clir and multilingual ir. In Workshop on Cross Language Information Retrieval: A Research Roadmap in the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 8-14.
  194. Masaki Noguchi, Kenta Miyoshi, Takenobu Tokunaga, Ryu Iida, Mamoru Komachi, and Kentaro Inui. 2008. Multiple purpose annotation using SLAT - Segment and link-based annotation tool -. In Proceedings of 2nd Linguistic Annotation Workshop, pages 61-64, May.
  195. John O'Connor. 1982. Citing statements: Computer recognition and use to improve retrieval. Information Processing & Management, 18(3):125-131.
  196. Vahed Qazvinian and Dragomir R. Radev. 2008. Scientific paper summarization using citation summary networks.
  197. Anna Ritchie, Simone Teufel, and Stephen Robertson. 2006. How to find better index terms through citations. In Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?, pages 25-32, Sydney, Australia, July. Association for Computational Linguistics.
  198. Anna Ritchie, Stephen Robertson, and Simone Teufel. 2008. Comparing citation contexts for information retrieval. In CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, pages 213-222, New York, NY, USA. ACM.
  199. Serge Sharoff. 2006. Creating general-purpose corpora using automated search engine queries. In WaCky! Working papers on the Web as Corpus. Gedit.
  200. H. Small. 1973. Co-citation in the scientific literature: A new measure of the relationship between two documents. JASIS, 24:265-269.
  201. Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521-544.
  202. Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. Automatic classification of citation function. In Proceedings of EMNLP-06.
  203. Sandra A. Thompson and William C. Mann. 1987. Rhetorical structure theory: A framework for the analysis of texts. Pragmatics, 1(1):79-105.
  204. Vladimir N. Vapnik. 1998. Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons.
  205. Web-Scale NLP 2008. 2008. http://research.microsoft.com/ur/asia/research/NLP.aspx.
  206. M. Weinstock. 1971. Citation indexes. Encyclopedia of Library and Information Science, 5:16-41.
  207. Ying Zhang, Fei Huang, and Stephan Vogel. 2005. Mining translations of oov terms from the web through. In International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 03), pages 669-670.