Researcher affiliation extraction from homepages

István Nagy; Richárd Farkas; Márk Jelasity

doi:10.3115/1699750.1699752

István T. Nagy

2009, Proceedings of the 2009 Workshop on …

https://doi.org/10.3115/1699750.1699752

107 pages

Abstract (AI-generated)

This research focuses on extracting researchers' affiliation information (affiliation, position type, start year and end year) from their personal homepages. It highlights the importance of analyzing the unstructured text found on these homepages, which often contains valuable academic information. The study presents methods for recognizing the relevant named entities (a CRF tagger, compared with a gazetteer and regular-expression baseline), for subject detection, and for assigning the detected entities to affiliation records, contributing to the field of scholarly data analysis.

Figures (52)

Table 1: The size of the textual corpus which contains affiliation information. … freely available for non-commercial use*.

Table 4: Accuracies of subject detection methods.

To find predicted relationships among the other types of entities (affiliation, position type, start year, end year) we used a very simple heuristic. As the affiliation slot is the head of the tuple, we simply assigned every other detected entity to the nearest affiliation and regarded the earlier predicted year token as the start year.
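The nearest-affiliation heuristic can be illustrated with a short Python sketch. It is not the authors' implementation: the `(label, text, token_offset)` input format and the label names `AFF`, `POS` and `YEAR` are hypothetical, "nearest" is assumed to mean token distance, and treating a second, later year as the end year is our assumption, since the text only says how the start year is chosen.

```python
def assign_roles(entities):
    """Group detected entities into affiliation tuples.

    entities: list of (label, text, token_offset) triples. The labels "AFF",
    "POS" and "YEAR" are hypothetical names for affiliation, position type
    and year entities.
    """
    affiliations = [e for e in entities if e[0] == "AFF"]
    # The affiliation is the head of each tuple.
    records = {aff: {"affiliation": aff[1], "position_type": None, "years": []}
               for aff in affiliations}
    for label, text, offset in entities:
        if label == "AFF" or not affiliations:
            continue
        # Attach the entity to the nearest affiliation (ties: the earlier one).
        nearest = min(affiliations, key=lambda a: (abs(a[2] - offset), a[2]))
        if label == "POS":
            records[nearest]["position_type"] = text
        elif label == "YEAR":
            records[nearest]["years"].append((offset, text))
    tuples = []
    for aff in affiliations:
        rec = records[aff]
        years = [text for _, text in sorted(rec.pop("years"))]
        # The earlier year token is the start year; using a second, later
        # year as the end year is an assumption of this sketch.
        rec["start_year"] = years[0] if years else None
        rec["end_year"] = years[1] if len(years) > 1 else None
        tuples.append(rec)
    return tuples
```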

Table 2: The results achieved by CRF.

… evaluation scheme, while Table 3 lists the results of a baseline method which labels every occurrence of an entry from the university and position type gazetteers and identifies years using regular expressions. This comparison highlights the fact that simply labeling every occurrence of these easily recognisable classes does not work: it gives extremely low precision, so contextual information has to be leveraged.
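A context-free baseline of the kind described for Table 3 might look like the sketch below: every gazetteer entry found in the text is labeled, and years are found with a regular expression. The gazetteer contents, the label names and the year pattern (1900 to 2099) are illustrative assumptions. Because nothing in it looks at context, it also labels mentions unrelated to the homepage owner's career, which is consistent with the low precision reported above.

```python
import re

# Hypothetical year pattern: four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

def baseline_tag(text, universities, position_types):
    """Return sorted (start, end, label) spans for every gazetteer/regex match."""
    spans = []
    for label, gazetteer in (("AFF", universities), ("POS", position_types)):
        for entry in gazetteer:
            # Label each occurrence, regardless of context.
            for m in re.finditer(re.escape(entry), text):
                spans.append((m.start(), m.end(), label))
    for m in YEAR_RE.finditer(text):
        spans.append((m.start(), m.end(), "YEAR"))
    return sorted(spans)
```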

Table 3: NER baseline results.

Figure 2. The main process of extracting (a) anchor text in general web search and (b) pseudo-anchor text in academic search.

Before describing our approach in detail, we first recall how anchor text is processed in general Web search. Assume that a collection of documents has been crawled and stored on local disk. In the first step, each web page is parsed and the outlinks (or forward links) within the page are extracted. Each link consists of a URL and its corresponding anchor text. In the second step, all links are accumulated according to their destination URLs (i.e. the anchor texts of all links pointing to the same URL are merged). Thus, we obtain all the anchor text corresponding to each web page. Figure 2(a) demonstrates this process.
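The two steps of general anchor-text processing can be sketched with Python's standard `html.parser`. This is a simplified illustration, assuming the crawled collection is a dict from page URL to HTML; it ignores nested or malformed links and is not the system described in the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Step 1: collect (destination URL, anchor text) pairs from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace inside the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None

def accumulate_anchor_text(pages):
    """Step 2: merge the anchor texts of all links pointing to the same URL."""
    anchors = defaultdict(list)
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for dest, text in parser.links:
            if text:
                anchors[dest].append(text)
    return dict(anchors)
```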

… for each term in the anchor blocks, a discrete degree of being anchor text. The reasons for taking such an approach are twofold. First, we believe that assigning each term a fuzzy degree of being anchor text is more appropriate than a binary judgment as either an anchor term or a non-anchor term. Second, since the importance of a term for a “link” may be determined by many factors in paper search, a machine-learning approach could be more flexible and general than approaches that compute term degrees with a specially designed formula.

Figure 6 shows the performance comparison between the results of two baseline paper ranking algorithms and the results of including pseudo-anchor text in ranking.

Table 2. Statistical significance tests (t-test over nDCG@3).

From Figure 6, we can see that the overall performance is greatly improved by including pseudo-anchor information. Table 2 shows the t-test results, where a “>” indicates that the algorithm in the row outperforms that in the column with a p-value of 0.05 or less, and a “>>” means a p-value of 0.01 or less.
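The significance tests above are run over nDCG@3. As a reference point, here is a sketch of one common nDCG formulation (gain 2^rel - 1, discount log2(rank + 1), after Järvelin and Kekäläinen 2000 in the references). The exact gain and discount used for the reported results are not stated in this excerpt, and normalising against the ideal ordering of the same judged list is a simplifying assumption.

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with gain (2^rel - 1) and discount log2(rank + 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """Normalise by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Per-query nDCG@3 values computed this way are the kind of paired samples a t-test between two ranking algorithms would compare.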

Table 5. Performance comparison between different feature strings and slot size thresholds.

We take all the papers extracted from PDF files as input to run the algorithm. Identical TP-URLs are first eliminated using a hash table (so their candidate anchor blocks are merged). This preprocessing step results in about 1.46 million distinct TP-URLs. The number is larger than our collection size (0.9 million) because some cited papers are not in our paper collection. We tested four kinds of feature strings, all generated from the paper title: unigrams, bigrams, trigrams, and 4-grams. Table 4 shows the slot size distribution corresponding to each kind of feature string. The performance comparison among different feature strings and slot size thresholds is shown in Table 5. Bigrams appear to achieve a good trade-off between accuracy and performance.
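The preprocessing and feature-string generation can be sketched as follows. The `(tp_url, anchor_block)` input shape is hypothetical, and lowercasing plus whitespace tokenisation of titles are assumptions; the text only states that a hash table merges identical TP-URLs and that feature strings are title n-grams.

```python
from collections import defaultdict

def merge_by_tp_url(records):
    """Collapse identical TP-URLs with a hash table, merging their anchor blocks.

    records: iterable of (tp_url, anchor_block) pairs (hypothetical shape).
    """
    merged = defaultdict(list)  # a dict is a hash table keyed by TP-URL
    for tp_url, block in records:
        merged[tp_url].append(block)
    return dict(merged)

def feature_strings(title, n):
    """Word n-grams of a paper title (n = 1..4 in the experiments above)."""
    words = title.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
```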

Table 4. Slot distribution with different feature strings. When the feature strings are fixed, the slot size threshold can be used to tune the trade-off between accuracy and performance.
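The accuracy/performance trade-off controlled by the slot size threshold resembles standard blocking for record linkage (cf. Baxter et al. 2003 in the references). The sketch below assumes, as the tables suggest but the text does not spell out, that a slot is the set of items sharing one feature string and that slots larger than the threshold are skipped. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def title_ngrams(title, n):
    words = title.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def candidate_pairs(titles, n=2, max_slot_size=100):
    """Only items that share a feature string (i.e. a slot) are compared.

    titles: dict item_id -> title. Skipping slots larger than max_slot_size
    saves comparisons at the risk of missing some true matches.
    """
    slots = defaultdict(set)
    for item_id, title in titles.items():
        for fs in title_ngrams(title, n):
            slots[fs].add(item_id)
    pairs = set()
    for members in slots.values():
        if len(members) <= max_slot_size:
            pairs.update(combinations(sorted(members), 2))
    return pairs
```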

Table 1: Teufel’s (1999) Argumentative Zones

2 Argumentative Zoning

Teufel (1999) introduced a new rhetorical analysis for scientific texts called Argumentative Zoning. Each sentence of an article from the scientific literature is classified into one of seven basic rhetorical structures shown in Table 1.

3 Maximum Entropy models

Maximum entropy (ME) or log-linear models are statistical models that can incorporate evidence from a diverse range of complex and potentially overlapping features. Unlike Naive Bayes (NB), the features can be conditionally dependent given the class, which is important since feature sets in NLP rarely satisfy this independence constraint.
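A minimal log-linear (ME) classifier over sentence features can make the definition concrete. This sketch trains with plain stochastic gradient ascent and a simple L2 penalty (the counterpart of a Gaussian prior, cf. Chen and Rosenfeld 1999); the estimation method and feature set used in the paper are not given here, and the toy zone labels in any usage are illustrative only.

```python
import math
from collections import defaultdict

def maxent_proba(weights, features, labels):
    """p(y | x) = exp(sum_f w[f, y]) / Z(x) for a log-linear (ME) model."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(data, labels, epochs=50, lr=0.5, l2=0.01):
    """Stochastic gradient ascent on the conditional log-likelihood.

    data: list of (feature_set, gold_label) pairs. Features may overlap
    freely (e.g. a unigram and a bigram containing it).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, gold in data:
            probs = maxent_proba(weights, features, labels)
            for f in features:
                for y in labels:
                    # Gradient: observed count minus expected count, with an
                    # L2 penalty applied to the weights being updated.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return dict(weights)
```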

… large difference to classification accuracy.

Table 3: Teufel and Moens’s (2002) and our NB performance on CMP-LG

Table 4: History features on the CMP-LG corpus with ME model of unigram/bigram features only

Table 5: Subtractive analysis CMP-LG ME model

Table 8: Subtractive analysis ASTRO ME model

Table 7: Final CMP-LG ME performance

Table 9: Final ASTRO ME model performance

Figure 1: Examples of sentences with the given tags in the astronomical corpus

Table 10: Comparing CMP-LG and ASTRO directly on the basic annotation scheme

Table 10 compares the performance of our Naive Bayes and Maximum Entropy classifiers on the two corpora for just the basic annotation scheme: Background, Own and Other. The features used are the set of Teufel features we have implemented (so they do not include unigram or bigram features).
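For comparison with the ME model, a Naive Bayes classifier over the same kind of feature sets makes the conditional-independence assumption explicit: each feature contributes its own log P(f | c) term. This is a generic multinomial NB sketch with add-one smoothing, not the configuration evaluated in Table 10.

```python
import math
from collections import Counter, defaultdict

def train_nb(data):
    """Collect class and per-class feature counts from (feature_set, label) pairs."""
    class_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for features, label in data:
        feat_counts[label].update(features)
        vocab.update(features)
    return class_counts, feat_counts, vocab

def classify_nb(model, features):
    """argmax_c log P(c) + sum_f log P(f | c), with add-one smoothing."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(feat_counts[label].values()) + len(vocab)
        # Features are treated as independent given the class; unseen ones are skipped.
        score = math.log(n / total) + sum(
            math.log((feat_counts[label][f] + 1) / denom)
            for f in features if f in vocab)
        if score > best_score:
            best, best_score = label, score
    return best
```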

The goal of our study is to classify English research papers (Language L1 = English, Genre G1 = research papers) into a patent classification using a patent data set written in Japanese (Language L2 = Japanese, Genre G2 = patents). Figure 2 shows the system configuration. Our system consists of a "Japanese index creating module" and a "document classification module". In the following, we explain both modules. When a title and abstract pair, as shown in Figure 3, is given, the Japanese index creating module creates a Japanese index, shown in Figure 4, using translation models for research papers and for patents.

We proposed several methods that automatically classify research papers into the IPC system using two translation models. To confirm the effectiveness of our method, we conducted experiments using the data of the NTCIR-7 Patent Mining Task. The results showed that one of our methods, "SMT(Paper)+Index(Patent)", obtained a MAP score of 0.2897. This score was higher than that of "SMT(Paper)", which used translation results from the translation model for research papers, and this indicates that our method is effective for cross-genre, cross-lingual document classification.

Table 3: Recall for top n results (SMT(Paper)+Index(Patent))
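The MAP score and the top-n recall in Table 3 are standard ranking metrics. The sketch below assumes each paper yields a ranked list of IPC codes and a set of correct codes; the codes used in any example are made up, and the official NTCIR-7 scoring may differ in details such as cut-offs.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@i over ranks i that hold a relevant item."""
    hits, precisions = 0, []
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant codes that appear in the top n results."""
    return len(set(ranked[:n]) & relevant) / len(relevant) if relevant else 0.0
```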

Table 2. Causes of silence: 1. Incorrect analysis by the parser; 2. Inadequacy of the framework for the task; 3. Not a SUMMARY or PROBLEM sentence according to our definition.

Figure 1: Principal information needs and tasks of participants with regard to citations. In the first table, information needs are prefixed by ‘md’ for meta-data and ‘co’ for content-oriented. ‘Freq’ indicates the number of occurrences in the results.

Figure 5: A sample pop-up with an automatically generated summary, triggered by a mouse action over the citation. Extracted sentences are grouped together by section titles. Words that match the citation context are coloured and emboldened.

Figure 1: CGI interface used for matching new references to existing papers

Figure 3: Snapshot of the different statistics for a paper

Table 2: Network statistics of the citation and collaboration network. The remaining 771 authors (11,180 - 10,409) are not cited and are therefore removed from the network analysis.

Table 3: Degree Statistics of the citation and collaboration networks

Table 7: Authors with the highest h-index

Table 6: Authors with most incoming citations (the values in parentheses count only non-self-citations)
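The h-index ranking in Table 7 relies on a simple definition: an author has index h if h of their papers have at least h citations each. A minimal sketch follows; the same function could be applied to citation counts that exclude self-citations, as in Table 6's parenthesised values, though how the paper computed its figures is not described here.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```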

Table 8: Authors with the least average shortest path (ASP) length in the author collaboration network
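An author's average shortest path (ASP) length in an unweighted collaboration network can be computed with breadth-first search. This sketch averages over the authors reachable from the source; how the paper treats unreachable authors is not stated here, so that choice is an assumption.

```python
from collections import deque

def average_shortest_path(graph, source):
    """Mean BFS distance from source to every author reachable from it.

    graph: dict author -> set of co-authors (undirected, unweighted).
    Returns infinity when the source has no reachable co-authors.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    others = [d for author, d in dist.items() if author != source]
    return sum(others) / len(others) if others else float("inf")
```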

… of the author. We have been able to annotate 8,578 authors this way: 6,396 male and 2,182 female. The citation text that we have extracted for each paper is a good resource for generating summaries of that paper's contributions. We have previously developed systems that cluster similarity networks to generate short yet informative summaries of individual papers (Qazvinian and Radev 2008) and of more general scientific topics, such as Dependency Parsing and Machine Translation (Radev et al. 2009).

Figure 2: Digital library interface with faceted navigation, continued, from http://berkeley.worldcat.org .

Figure 1: Worldcat consortium digital library interface using faceted navigation. The instance shown is the University of California version, from http://berkeley.worldcat.org .

Figure 3: University of Chicago digital library interface using faceted navigation, using an interface from AquaBrowser.

Figure 1: A web page with a list of references. Paper titles are displayed in bold. {hongchin, jprabawa, kanmy}@comp.nus.edu.sg

Tests were conducted using these 40 pages to obtain the reference string recognition algorithm’s accuracy. A reference string is considered found if there exists, in the set of confirmed reference strings C’, a parsed text segment c that contains the entire title as well as all the authors’ names. Each parsed text segment can only be used to identify one reference string, so if a text segment contains more than one reference string, only one of those reference strings will be considered found. In order to determine the effect of each stage on overall recognition accuracy, some stages of the recognition algorithm were disabled in testing. The results are presented in Table 2. As all test pages come from university domains, all pass the first URL test. When the keyword search is deactivated, all 40 test pages pass Stage 1. Otherwise, 19 pages with reference strings and 6 pages without reference strings pass Stage 1.
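The "found" criterion can be expressed as a small scoring function. This sketch assumes exact, case-sensitive substring matching and a greedy first-match assignment of segments to references; the text fixes only the containment condition and the one-reference-per-segment rule, so both choices are assumptions.

```python
def count_found(references, segments):
    """Count gold references recovered by the confirmed segments C'.

    references: list of (title, [author names]) pairs (gold standard).
    segments: list of confirmed parsed text segments (strings).
    A reference is found if some still-unused segment contains the whole
    title and every author name; each segment identifies at most one reference.
    """
    used = set()
    found = 0
    for title, authors in references:
        for i, seg in enumerate(segments):
            if i in used:
                continue
            if title in seg and all(a in seg for a in authors):
                used.add(i)  # this segment cannot be credited again
                found += 1
                break
    return found
```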

Table 1: List of classifier features

… information about the token; 3) Contextual features, which are lexical or local features of a token’s neighbours. Table 1 gives an exhaustive list of the features used in FireCite.
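Contextual features reuse a neighbour's lexical or local features, tagged with the neighbour's relative position. Only this third group is fully described in the surviving text, so the local features below (lowercase form, capitalisation, digits, punctuation) and the window size are hypothetical stand-ins rather than FireCite's actual list.

```python
def token_features(tokens, i, window=1):
    """Features for tokens[i]: its own local features plus contextual ones.

    The concrete feature names are hypothetical examples.
    """
    def local(tok):
        return {
            "lower=" + tok.lower(),
            "is_capitalized=" + str(tok[:1].isupper()),
            "is_digit=" + str(tok.isdigit()),
            "is_punct=" + str(all(not c.isalnum() for c in tok)),
        }

    feats = set(local(tokens[i]))
    for offset in range(-window, window + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(tokens):
            continue
        # Contextual features: a neighbour's features, prefixed with its offset.
        feats.update(f"{offset:+d}:{f}" for f in local(tokens[j]))
    return feats
```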

Figure 5: Screenshot of FireCite prototype illustrating (a) the reference string library, (b) button appended to each reference string, and (c) button state after the reference string has been added to the list.

Table 4: Performance evaluation of the system.

Table 1: Corpus composition

To our knowledge, this is the first corpus constructed in the context of paper summarization … We then linked each c-site to its anchor, each anchor to its reference, and any background information to the c-site it supplemented. We also decided to annotate entire sentences, even if only part of a sentence referred to the cited paper. Table 1 outlines our corpus. Analysis of the corpus provided some interesting insights, though a larger corpus is required to confirm the frequency and validity of such phenomena. The more salient discoveries are itemized below. These phenomena may also co-occur.

Table 2: Evaluation results for coreference resolution against the MUC-7 formal corpus.

… salient for increased performance. We also extended this list by adding a cosine-similarity metric between two noun phrases; it uses bag-of-words to create a vector for each noun phrase (where each word is a term in the vector) to compute their similarity. The intuition behind this is that noun phrases with more similar surface forms should be more likely to corefer.

… resolution with coreference chains. This is because coreference chains match noun phrases that appear with other noun phrases to which they refer, a characteristic present in these three categories. On the other hand, cue phrases do not detect any c-site sentence that does not use keywords (e.g. “In addition”). In the following section we discuss our implementation of a coreference-chain-based extraction technique, and how we then applied it to the c-site extraction task. An analysis of the results then follows.
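The bag-of-words cosine metric between two noun phrases can be written in a few lines. Lowercasing and whitespace tokenisation are assumptions of this sketch; the text does not say how noun phrases were tokenised.

```python
import math
from collections import Counter

def np_cosine(np1, np2):
    """Cosine similarity of bag-of-words vectors for two noun phrases."""
    v1 = Counter(np1.lower().split())  # each word is one vector dimension
    v2 = Counter(np2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0
```

Phrases with identical surface forms score 1, and phrases sharing no words score 0, matching the intuition that similar surface forms are more likely to corefer.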

Table 3: Features used for coreference resolution.

Table 4: Evaluation results for c-site extraction w/o background information

References (207)

Brad Adelberg. 1998. NoDoSE: a tool for semi-automatically extracting structured and semistructured data from text documents. ACM SIGMOD, 27(2):283-294.
Javier Artiles, Julio Gonzalo, and Satoshi Sekine. 2009. WePS 2 evaluation campaign: overview of the web people search clustering task. In 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference.
A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. 2002. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4):590-614.
Kedar Bellare, Partha Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum, and Mark Dredze. 2007. Lightly-supervised attribute extraction for web search. In Proceedings of NIPS 2007 Workshop on Machine Learning for Web Search.
Mary Elaine Califf and Raymond J. Mooney. 1999. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 328-334.
Xiwen Cheng, Peter Adolphs, Feiyu Xu, Hans Uszkoreit, and Hong Li. 2009. Gossip Galore: a self-learning agent for exchanging pop trivia. In Proceedings of the Demonstrations Session at EACL 2009, pages 13-16, Athens, Greece, April. Association for Computational Linguistics.
Dayne Freitag. 1998. Information extraction from HTML: Application of a general machine learning approach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 517-523.
A. A. Goodrum, K. W. McCain, S. Lawrence, and C. L. Giles. 2001. Scholarly publishing in the internet age: a citation analysis of computer science literature. Information Processing and Management, 37:661-675, September.
Raymond Kosala and Hendrik Blockeel. 2000. Web mining research: A survey. SIGKDD Explorations, 2:1-15.
Nicholas Kushmerick. 2000. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118:2000.
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.
Bing Liu and Kevin Chen-Chuan Chang. 2004. Editorial: special issue on web content mining. SIGKDD Explor. Newsl., 6(2):1-4.
Andrew Kachites McCallum. 2002. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu.
M. E. J. Newman. 2001. The structure of scientific collaboration networks. In Proceedings National Academy of Sciences USA, pages 404-418.
Marius Paşca. 2009. Outclassing Wikipedia in open-domain information extraction: Weakly-supervised acquisition of attributes over conceptual hierarchies. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), Athens, Greece, March.
Celine Robardet and Eric Fleury. 2009. Communities detection and the analysis of their dynamics in collaborative networks. Int. J. Web Based Communities, 5(2):195-211.
Yasmin H. Said, Edward J. Wegman, Walid K. Sharabati, and John T. Rigsby. 2008. Social networks of author-coauthor relationships. Computational Statistics & Data Analysis, 52(4):2177-2184.
Satoshi Sekine. 2006. On-demand information extraction. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 731-738, Sydney, Australia, July. Association for Computational Linguistics.
György Szarvas, Richárd Farkas, and András Kocsor. 2006. A multilingual named entity recognition system using boosting and c4.5 decision tree learning algorithms. DS2006, LNAI, 4265:267-278.
Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. An annotation scheme for citation function. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pages 80-87, Sydney, Australia, July. Association for Computational Linguistics.
Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Walter Daelemans and Miles Osborne, editors, Proceedings of CoNLL-2003, pages 142-147. Edmonton, Canada.
Y. Yang, C. M. Au Yeung, M. J. Weal, and H. Davis. 2009. The researcher social network: A social network based on metadata of scientific publications.
Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. 2002. PEBL: positive example based learning for web page classification using SVM. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 239-248, New York, NY, USA. ACM.
E. Amitay. 1998. Using common hypertext links to identify the best phrasal description of target web documents. In Proc. of the SIGIR'98 Post Conference Workshop on Hypertext Information Retrieval for the Web, Melbourne, Australia.
G. Attardi, A. Gulli, and F. Sebastiani. 1999. Theseus: categorization by context. In Proceedings of the 8th International World Wide Web Conference.
A. Baxter, P. Christen, and T. Churches. 2003. A comparison of fast blocking methods for record linkage. In ACM SIGKDD'03 Workshop on Data Cleaning, Record Linkage and Object Consolidation. Washington DC.
A. Broder, S. Glassman, M. Manasse, and G. Zweig. 1997. Syntactic clustering of the Web. In Proceedings of the Sixth International World Wide Web Conference, pp. 391-404.
C.J.C. Burges. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. 1998. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference.
K. Chakrabarti, V. Ganti, J. Han, and D. Xin. 2006. Ranking objects based on relationships. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 371-382, New York, NY, USA. ACM.
B. Davison. 2000. Topical locality in the web. In SIGIR'00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 272-279, New York, NY, USA. ACM.
I.P. Fellegi and A.B. Sunter. 1969. A theory for record linkage. Journal of the American Statistical Association, 64:1183-1210.
C. L. Giles, K. Bollacker, and S. Lawrence. 1998. CiteSeer: An automatic citation indexing system. In Ian Witten, Rob Akscyn, and Frank M. Shipman III, editors, Digital Libraries 98: The Third ACM Conference on Digital Libraries, pages 89-98, Pittsburgh, PA, June 23-26. ACM Press.
T.H. Haveliwala, A. Gionis, D. Klein, and P. Indyk. 2002. Evaluating strategies for similarity search on the web. In WWW '02: Proceedings of the 11th international conference on World Wide Web, pages 432-442, New York, NY, USA. ACM.
K. Järvelin and J. Kekäläinen. 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000).
S. Lawrence, C.L. Giles, and K. Bollacker. 1999. Digital libraries and Autonomous Citation Indexing. IEEE Computer, 32(6):67-71.
A. McCallum, K. Nigam, J. Rennie, and K. Seymore. 1999. Building Domain-specific Search Engines with Machine Learning Techniques. In Proceedings of the AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace.
A. McCallum, K. Nigam, and L. Ungar. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining.
O.A. McBryan. 1994. GENVL and WWWW: Tools for taming the web. In Proceedings of the First International World Wide Web Conference, pages 79-90.
H. Nanba and M. Okumura. 1999. Towards Multi-paper Summarization Using Reference Information. In Proc. of the 16th International Joint Conference on Artificial Intelligence, pp. 926-931.
H. Nanba, T. Abekawa, M. Okumura, and S. Saito. 2004. Bilingual PRESRI: Integration of Multiple Research Paper Databases. In Proc. of RIAO 2004, 195-211.
L. Parsons, E. Haque, and H. Liu. 2004. Subspace clustering for high dimensional data: a review. SIGKDD Explorations 6(1): 90-105.
S.E. Robertson, S. Walker, and M. Beaulieu. 1999. Okapi at TREC-7: automatic ad hoc, filtering, VLC and filtering tracks. In Proceedings of TREC'99.
S. Shi, R. Song, and J-R Wen. 2006. Latent Additivity: Combining Homogeneous Evidence. Technical report, MSR-TR-2006-110, Microsoft Research, August 2006.
S. Shi, F. Xing, M. Zhu, Z. Nie, and J.-R. Wen. 2006. Pseudo-Anchor Extraction for Search Vertical Objects. In Proc. of the 2006 ACM 15th Conference on Information and Knowledge Management. Arlington, USA.
Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. 2005. Object-level ranking: bringing order to web objects. In WWW'05: Proceedings of the 14th international conference on World Wide Web, pages 567-574, New York, NY, USA. ACM.
arXiv. 2005. arxiv.org archive. http://arxiv.org.
R. Barzilay, K.R. McKeown, and M. Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 550-557. Association for Computational Linguistics, Morristown, NJ, USA.
R. Brandow, K. Mitze, and L.F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing and Management, 31(5):675-685.
A.L. Brown, J.D. Day, and R.S. Jones. 1983. The development of plans for summarizing texts. Child Development, pages 968-979.
Stanley Chen and Ronald Rosenfeld. 1999. A Gaussian prior for smoothing maximum entropy models. Technical report, Carnegie Mellon University, Pittsburgh, PA.
James R. Curran and Stephen Clark. 2003. Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, pages 91-98, Budapest, Hungary, 12-17 April.
E. Frank, M.A. Hall, G. Holmes, R. Kirkby, B. Pfahringer, I.H. Witten, and L. Trigg. 2005. Weka: a machine learning workbench for data mining. The Data Mining and Knowledge Discovery Handbook, pages 1305-1314.
M. Johnson, S. Geman, S. Canon, Z. Chi, and S. Riezler. 1999. Estimators for stochastic 'unification-based' grammars. In Proceedings of the 37th Meeting of the ACL, pages 535-541, University of Maryland, MD.
W. Kintsch and T.A. Van Dijk. 1978. Toward a model of text comprehension and production. Psychological Review, 85(5):363-394.
K. Knight and D. Marcu. 2000. Statistics-based summarization, step one: Sentence compression. In Proceedings of the National Conference on Artificial Intelligence, pages 703-710. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
J. Kupiec, J. Pedersen, and F. Chen. 1995. A trainable document summarizer. In Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pages 68-73. ACM New York, NY, USA.
M. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank.
T. Murphy, T. McIntosh, and J.R. Curran. 2006. Named entity recognition for astronomy literature. In Proceedings of the 2006 Australasian Language Technology Workshop (ALTW).
K. Nigam, J. Lafferty, and A. McCallum. 1999. Using maximum entropy for text classification. In Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61-67, Stockholm, Sweden.
Adwait Ratnaparkhi. 1996. A maximum entropy part-of-speech tagger. In Proceedings of the EMNLP Conference, pages 133-142, Philadelphia, PA.
J.C. Reynar and A. Ratnaparkhi. 1997. A maximum entropy approach to identifying sentence boundaries. In Proceedings of the fifth conference on Applied natural language processing, pages 16-19.
R. Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer, Speech and Language, 10:187-228.
J.M. Swales. 1990. Genre analysis: English in academic and research settings. Cambridge University Press.
S. Teufel and M. Moens. 2002. Summarising scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445.
S. Teufel, J. Carletta, and M. Moens. 1999. An annotation scheme for discourse-level argumentation in research articles. In Proceedings of EACL 1999.
S. Teufel, A. Siddharthan, and D. Tidhar. 2006. Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 103-110.
S. Teufel. 1999. Argumentative zoning: Information extraction from scientific text. Ph.D. thesis, University of Edinburgh, Edinburgh, UK.
Guo-Wei Bian and Shun-Yuan Teng. 2008. Integrating Query Translation and Text Classification in a Cross-Language Patent Access System. Proceedings of the 7th NTCIR Workshop Meeting: 341-346.
Stephane Clinchant and Jean-Michel Renders. 2008. XRCE's Participation to Patent Mining Task at NTCIR-7. Proceedings of the 7th NTCIR Workshop Meeting: 351-353.
Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2004. Overview of Patent Retrieval Task at NTCIR-4. Working Notes of the 4th NTCIR Workshop: 225-232.
Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2005. Overview of Patent Retrieval Task at NTCIR-5. Proceedings of the 5th NTCIR Workshop Meeting: 269-277.
Atsushi Fujii, Makoto Iwayama, and Noriko Kando. 2007. Overview of the Patent Retrieval Task at NTCIR-6 Workshop. Proceedings of the 6th NTCIR Workshop Meeting: 359-365.
Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, and Takehito Utsuro. 2008. Overview of the Patent Translation Task at the NTCIR-7 Workshop. Proceedings of the 7th NTCIR Workshop Meeting: 389-400.
Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. Proceedings of the 14th International Conference on Computational Linguistics: 539-545.
Daisuke Ikeda, Toshiaki Fujiki, and Manabu Okumura. 2006. Automatically Linking News Articles to Blog Entries. Proceedings of AAAI Spring Symposium Series Computational Approaches to Analyzing Weblogs: 78-82.
Masaki Itagaki, Takako Aikawa, and Xiaodong He. 2007. Automatic Validation of Terminology Translation Consistency with Statistical Method. Proceedings of MT Summit XI: 269-274.
Hideo Itoh, Hiroko Mano, and Yasushi Ogawa. 2002. Term Distillation for Cross-db Retrieval. Working Notes of the 3rd NTCIR Workshop Meeting, Part III: Patent Retrieval Task: 11-14.
Makoto Iwayama, Atsushi Fujii, Noriko Kando, and Akihiko Takano. 2002. Overview of Patent Retrieval Task at NTCIR-3. Working Notes of the 3rd NTCIR Workshop Meeting, Part III: Patent Retrieval Task: 1-10.
Makoto Iwayama, Atsushi Fujii, and Noriko Kando. 2005. Overview of Classification Subtask at NTCIR-5 Patent Retrieval Task. Proceedings of the 5th NTCIR Workshop Meeting: 278-286.
Makoto Iwayama, Atsushi Fujii, and Noriko Kando. 2007. Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task. Proceedings of the 6th NTCIR Workshop Meeting: 366-372.
Noriko Kando, Kazuko Kuriyama, Toshihiko Nozue, Koji Eguchi, Hiroyuki Kato, and Soichiro Hidaka. 1999. Overview of IR Tasks at the first NTCIR Workshop. Proceedings of the 1st NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition: 11-44.
Noriko Kando, Kazuko Kuriyama, and Makoto Yoshioka. 2001. Overview of Japanese and English Information Retrieval Tasks (JEIR) at the Second NTCIR Workshop. Proceedings of the 2nd NTCIR Workshop Meeting: 4-37 to 4-60.
Hisao Mase and Makoto Iwayama. 2008. NTCIR-7 Patent Mining Experiments at Hitachi. Proceedings of the 7th NTCIR Workshop Meeting: 365-368.
Hidetsugu Nanba. 2007. Query Expansion using an Automatically Constructed Thesaurus. Proceedings of the 6th NTCIR Workshop Meeting: 414-419.
Hidetsugu Nanba, Natsumi Anzen, and Manabu Okumura. 2008a. Automatic Extraction of Citation Information in Japanese Patent Applications. International Journal on Digital Libraries, 9(2): 151-161.
Hidetsugu Nanba, Atsushi Fujii, Makoto Iwayama, and Taiichi Hashimoto. 2008b. Overview of the Patent Mining Task at the NTCIR-7 Workshop. Proceedings of the 7th NTCIR Workshop Meeting: 325-332.
Hidetsugu Nanba. 2008c. Hiroshima City University at NTCIR-7 Patent Mining Task. Proceedings of the 7th NTCIR Workshop Meeting: 369-372.
Hidetsugu Nanba, Hideaki Kamaya, Toshiyuki Takezawa, Manabu Okumura, Akihiro Shinmori, and Hidekazu Tanigawa. 2009. Automatic Translation of Scholarly Terms into Patent Terms. Journal of Information Processing Society Japan TOD, 2(1): 81-92. (in Japanese)
Gerald Salton. 1971. The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
Masatsugu Tonoike, Mitsuhiro Kida, Toshihiro Takagi, Yasuhiro Sakai, Takehito Utsuro, and Satoshi Sato. 2005. Translation Estimation for Technical Terms using Corpus Collected from the Web. Proceedings of the Pacific Association for Computational Linguistics: 325-331.
Salah Aït-Mokhtar, Jean-Pierre Chanod, and Claude Roux. 2002. Robustness beyond shallowness: in- cremental dependency parsing. Natural Language Engineering, 8(2/3):121-144.
Robin Barrow. 2008. Education and the Body: Prole- gomena. British Journal of Educational Studies 56(3):272-285.
Lutz Bornmann and Hans-Dieter Daniel. 2003. Begu- tachtung durch Fachkollegen in der Wissenschaft. Stand der Forschung zur Reliabilität, Fairness und Validität des Peer-Review-Verfahrens. Universität auf dem Prüfstand. Konzepte und Befunde der Hochschulforschung. (S. Schwarz and U. Teichler, Eds.). Campus Verlag Frankfurt/New York: 207- 225.
Richard Holmes. 1997. Genre analysis, and the social sciences: An investigation of the structure of re- search article discussion sections in three disci- plines. English for Specific Purposes, 16(4):321- 337.
Noriko Kando. 1997. Text-level structure of research papers: Implications for text-based information processing systems. Proceedings of the 19th Brit- ish Computer Society Annual Colloquium of Infor- mation Retrieval Research, Sheffield University, Sheffield, UK, 68-81.
Elizabeth D. Liddy. 1991. The discourse-level struc- ture of empirical abstracts: an exploratory study. Information Processing and Management, 27(1):55-81.
Frédérique Lisacek, Christine Chichester, Aaron Kap- lan, and Ágnes Sandor. 2005. Discovering para- digm shift patterns in biomedical abstracts: appli- cation to neurodegenerative diseases. First Interna- tional Symposium on Semantic Mining in Biomedi- cine, Cambridge, UK, April 11-13, 2005.
Yanping Lu. 2005. Editorial Peer Review in Educa- tion: Mapping the Field. Australian Association for Research in Education 2004 conference papers, Melbourne, Australia (Jeffery, P. L., Ed.):1-19.
Yanping Lu. 2008. Peer review and its contribution to manuscript quality: an Australian perspective. Learned Publishing, 21(3):307-316.
Eric G. Meinberg and Peter J. Stern. 2003. Incidence of Wrong-Site Surgery Among Hand Surgeons. The Journal of Bone and Joint Surgery (American) 85:193-197.
Yoko Mizuta, Anna Korhonen, Tony Mullen, and Nigel Collier. 2006. Zone analysis in biology articles as a basis for information extraction. International Journal of Medical Informatics, 75(6):468-487.
Michaela Montesi and John Mackenzie Owen. 2008. Research journal articles as document genres: exploring their role in knowledge organization. Journal of Documentation, 64(1):143-167.
Robert N. Oddy, Elizabeth D. Liddy, Bhaskaran Balakrishnan, Ann Bishop, Joseph Elewononi and Eileen Martin. 1992. Towards the use of situational information in information retrieval. Journal of Documentation, 48(2):123-171.
Yang Ruiying and Desmond Allison. 2004. Research articles in applied linguistics: structures from a functional perspective. English for Specific Purposes, 23(3):264-279.
Ágnes Sándor, Aaron Kaplan and Gilbert Rondeau. 2006. Discourse and citation analysis with concept-matching. International Symposium: Discourse and document (ISDD), Caen, France, June 15-16, 2006.
Ágnes Sándor. 2007. Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts. Revue Française de Linguistique Appliquée 200(2):97-109.
Ágnes Sándor. 2009. Automatic detection of discourse indicating emerging risk. To appear in Critical Approaches to Discourse Analysis across Disciplines. Risk as Discourse - Discourse as Risk: Interdisciplinary perspectives.
Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445.
Victoria Uren, Simon Buckingham Shum, Clara Mancini, and Gangmin Li. 2007. Modelling Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. International Journal of Intelligent Systems, (Special Issue on Computational Models of Natural Argument, Eds: C. Reed and F. Grasso), 22(1):17-47.
Richard Whitley and Jochen Gläser. 2007. The Changing Governance of Sciences: The Advent of Research Evaluation Systems. Springer.
References
Joan C. Bartlett and Tomasz Neugebauer. 2008. A task-based information retrieval interface to support bioinformatics analysis. In IIiX '08: Proceedings of the second international symposium on Information interaction in context, pages 97-101, New York, NY, USA. ACM.
Nicholas J. Belkin. 1994. Design principles for electronic textual resources: Investigating users and uses of scholarly information. In Current Issues in Computational Linguistics: In Honour of Donald Walker, pages 1-18. Kluwer.
Katriina Byström and Kalervo Järvelin. 1995. Task complexity affects information seeking and use. Information Processing and Management, pages 191-213.
Juliet Corbin and Anselm L. Strauss. 2008. Basics of qualitative research: techniques and procedures for developing grounded theory. Sage, 3rd edition.
John W Ely, Jerome A Osheroff, Paul N Gorman, Mark H Ebell, M Lee Chambliss, Eric A Pifer, and P Zoe Stavri. 2000. A taxonomy of generic clinical questions: classification study. British Medical Journal, 321:429-432.
Barney G. Glaser and Anselm L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine de Gruyter, New York.
Andreas Henrich and Volker Luedecke. 2007. Characteristics of geographic information needs. In GIR '07: Proceedings of the 4th ACM workshop on Geographical information retrieval, pages 1-6, New York, NY, USA. ACM.
W. R. Hersh. 2008. Information Retrieval. Springer. Information Retrieval for biomedical researchers.
Vahed Qazvinian and Dragomir R. Radev. 2008. Scientific paper summarization using citation summary networks. In The 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK, August.
G. Salton and M. J. McGill. 1983. Introduction to modern information retrieval. McGraw-Hill, New York.
Karen Spärck Jones. 1998. Automatic summarizing: factors and directions. In I. Mani and M. Maybury, editors, Advances in Automatic Text Summarisation. MIT Press, Cambridge MA.
Robert S Taylor. 1962. Process of asking questions. American Documentation, 13:391-396, October.
Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445.
Elaine G. Toms. 2000. Understanding and facilitating the browsing of electronic text. International Journal of Human-Computer Studies, 52(3):423-452.
D Tran, C Dubay, P Gorman, and W. Hersh. 2004. Applying task analysis to describe and facilitate bioinformatics tasks. Studies in Health Technology and Informatics, 107(Pt 2):818-822.
Stephen Wan and Cécile Paris. 2008. In-browser summarisation: Generating elaborative summaries biased towards the reading context. In The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Paper, Columbus, Ohio, June.
Stephen Wan, Cécile Paris, and Robert Dale. 2009. Whetting the appetite of scientists: Producing summaries tailored to the citation context. In Proceedings of the Joint Conference on Digital Libraries.
References
Vahed Qazvinian and Dragomir R. Radev. Scientific paper summarization using citation summary networks. In COLING 2008, Manchester, UK, 2008.
Dragomir R. Radev, Mark Joseph, Bryan Gibson, and Pradeep Muthukrishnan. A Bibliometric and Network Analysis of the Field of Computational Linguistics. JASIST, 2009 to appear.
References
J.D. Anderson and M.A. Hofmann. 2006. A fully faceted syntax for Library of Congress subject headings. Cataloging & Classification Quarterly, 43(1):7-38.
K. Antelman, E. Lynema, and A.K. Pace. 2006. Toward a twenty-first century library catalog. Information technology and libraries, 25(3):128-138.
David Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.
W. Dakka and P.G. Ipeirotis. 2008. Automatic extraction of useful facet hierarchies from text databases. In IEEE 24th International Conference on Data Engineering, 2008. ICDE 2008, pages 466-475.
W. Dakka, P.G. Ipeirotis, and K.R. Wood. 2005. Automatic construction of multifaceted browsing interfaces. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 768-775. ACM New York, NY, USA.
J. English, M.A. Hearst, R. Sinha, K. Swearingen, and K.-P. Yee. 2001. Examining the usability of web site search. Unpublished Manuscript, http://flamenco.berkley.edu/papers/epicurious-study.pdf.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.
M.A. Hearst, J. English, R. Sinha, K. Swearingen, and K.-P. Yee. 2002. Finding the flow in web site search. Communications of the ACM, 45(9), September.
M.A. Hearst. 2000. Next Generation Web Search: Setting Our Sites. IEEE Data Engineering Bulletin, 23(3):38-48.
M.A. Hearst. 2006a. Clustering Versus Faceted Categories For Information Exploration. Communications of the ACM, 49(4):59-61.
M.A. Hearst. 2006b. Design recommendations for hierarchical faceted search interfaces. In SIGIR'06 Workshop On Faceted Search, Seattle, WA, August.
K. Hornbaek and E. Frøkjaer. 1999. Do Thematic Maps Improve Information Retrieval. Human-Computer Interaction (INTERACT'99), pages 179-186.
A.J. Kleiboemer, M.B. Lazear, and J.O. Pedersen. 1996. Tailoring a retrieval system for naive users. In Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR '96), Las Vegas, NV.
J. Koren, Y. Zhang, and X. Liu. 2008. Personalized interactive faceted search. WWW '08: Proceeding of the 17th international conference on World Wide Web.
Bernardo Magnini. 2000. Integrating subject field codes into WordNet. In Proc. of LREC 2000, Athens, Greece.
Rada Mihalcea and Dan I. Moldovan. 2001. EZ.WordNet: Principles for automatic generation of a coarse grained WordNet. In Proc. of FLAIRS Conference 2001, May.
Roberto Navigli, Paola Velardi, and Aldo Gangemi. 2003. Ontology learning and its application to auto- mated terminology translation. Intelligent Systems, 18(1):22-31.
T.A. Olson. 2007. Utility of a faceted catalog for scholarly research. Library Hi Tech, 25(4):550-561.
W. Pratt, M.A. Hearst, and L. Fagan. 1999. A knowledge-based approach to organizing retrieved documents. In Proceedings of 16th Annual Conference on Artificial Intelligence (AAAI 99), Orlando, FL.
K. Rodden, W. Basalaj, D. Sinclair, and K. R. Wood. 2001. Does organisation by similarity assist image browsing? In Proceedings of ACM CHI 2001, pages 190-197.
D.M. Russell, M. Slaney, Y. Qu, and M. Houston. 2006. Being literate with large document collections: Observational studies and cost structure tradeoffs. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).
Mark Sanderson and Bruce Croft. 1999. Deriving concept hierarchies from text. In Proceedings of SIGIR 1999.
E. Stoica and M. Hearst. 2004. Nearly-automated metadata hierarchy creation. In Companion Proceedings of HLT-NAACL'04, pages 117-120.
E. Stoica, M.A. Hearst, and M. Richardson. 2007. Automating Creation of Hierarchical Faceted Metadata Structures. In Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), pages 244-251.
K.-P. Yee, K. Swearingen, K. Li, and M.A. Hearst. 2003. Faceted metadata for image search and browsing. In Proceedings of ACM CHI 2003, pages 401-408. ACM New York, NY, USA.
V. Zelevinsky, J. Wang, and D. Tunkelang. 2008. Supporting Exploratory Search for the ACM Digital Library. In Workshop on Human-Computer Interaction and Information Retrieval (HCIR'08).
Chia-Hui Chang, Chun-Nan Hsu, and Shao-Cheng Lui. 2003. Automatic information extraction from semi- Support Syst., 35(1):129-147.
Eli Cortez, Altigran S. da Silva, Marcos André Gonçalves, Filipe Mesquita, and Edleno S. de Moura. 2007. FLUX-CIM: flexible unsupervised extraction of citation metadata. In Proc. JCDL '07, pages 215-224, New York, NY, USA. ACM.
Isaac G. Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: An open-source CRF reference string parsing package. In LREC '08, Marrakesh, Morocco, May.
Junfei Geng and Jun Yang. 2004. Autobib: automatic extraction of bibliographic information on the web. pages 193-204, July.
Erik Hetzner. 2008. A simple method for citation metadata extraction using hidden markov models. In Proc. JCDL '08, pages 280-284, New York, NY, USA. ACM.
Fiona Fui-Hoon Nah. 2004. A study on tolerable waiting time: how long are web users willing to wait? Behaviour & Information Technology Special Issue on HCI in MIS, 23(3), May-June.
Fuchun Peng and Andrew McCallum. 2004. Accurate information extraction from research papers using conditional random fields. In HLT-NAACL, pages 329-336.
Kristie Seymore, Andrew McCallum, and Roni Rosenfeld. 1999. Learning hidden markov model structure for information extraction. In AAAI'99 Workshop on Machine Learning for Information Extraction.
Xin Xin, Juanzi Li, Jie Tang, and Qiong Luo. 2008. Academic conference homepage understanding using constrained hierarchical conditional random fields. In Proc. CIKM '08, pages 1301-1310, New York, NY, USA. ACM.
Kai-Hsiang Yang, Shui-Shi Chen, Ming-Tai Hsieh, Hahn-Ming Lee, and Jan-Ming Ho. 2008. CRE: An automatic citation record extractor for publication list pages. In Proc. WMWA'08 of PAKDD-2008, Osaka, Japan, May.
Yanhong Zhai and Bing Liu. 2005. Web data extraction based on partial tree alignment. In Proc. WWW '05, pages 76-85, New York, NY, USA. ACM.
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, and Wei-Ying Ma. 2006. Simultaneous record detection and attribute labeling in web data extraction. In Proc. KDD '06, pages 494-503, New York, NY, USA. ACM.
References
Gregory Crane. 1987. From the old to the new: integrating hypertext into traditional scholarship. In Proceedings of the ACM conference on Hypertext, pages 51-55, Chapel Hill, North Carolina, United States. ACM.
Gregory Crane. 2006. What do you do with a million books. D-Lib Magazine, 12(3).
Aron Culotta, Andrew McCallum, and Jonathan Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 296-303, Morristown, NJ, USA. Association for Computational Linguistics.
Andreas Doms and Michael Schroeder. 2005. GoPubMed: exploring PubMed with the gene ontology. Nucl. Acids Res., 33(suppl 2):783-786, July.
Andrea Ernst-Gerlach and Gregory Crane. 2008. Identifying Quotations in Reference Works and Primary Materials, pages 78-87.
Isaac Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: an open-source CRF reference string parsing package. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/.
Okan Kolak and Bill N. Schilit. 2008. Generating links by mining quotations. In Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 117-126, Pittsburgh, PA, USA. ACM.
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.
Frank Lester. 2007. Backlinks: Alternatives to the citation index for determining impact. Journal of Electronic Publishing, 10(2).
Andrew Kachites McCallum. 2002. MALLET: a machine learning for language toolkit. http://mallet.cs.umass.edu.
Matteo Romanello. 2007. A semantic linking system for canonical references to electronic corpora. Prague. To be published in the proceedings of the ECAL 2007 Electronic Corpora of Ancient Languages, held in Prague, November 2007.
Matteo Romanello. 2008. A semantic linking framework to provide critical value-added services for e-journals on classics. In Susanna Mornati and Leslie Chan, editors, ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing held in Toronto, Canada, 25-27 June 2008.
David A. Smith and Gregory Crane. 2001. Disambiguating geographic names in a historical digital library. In ECDL '01: Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries, pages 127-136, London, UK. Springer-Verlag.
Neel Smith. 2009. Citation in classical studies. Digital Humanities Quarterly, 3(1).
Shannon Bradshaw. 2003. Reference directed indexing: Redeeming relevance for subject search in citation indexes. In Proceedings of the 7th ECDL, pages 499-510.
Eugene Garfield, Irving H. Sher, and Richard J. Torpie. 1964. The use of citation data in writing the history of science. Institute for Scientific Information, Philadelphia, Pennsylvania.
Thorsten Joachims. 1999. Making large-scale support vector machine learning practical. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in kernel methods: support vector learning, pages 169-184. MIT Press, Cambridge, MA, USA.
Dain Kaplan and Takenobu Tokunaga. 2008. Sighting citation sites: A collective-intelligence approach for automatic summarization of research papers using c-sites. In ASWC 2008 Workshops Proceedings.
Andrew Kehler. 2004. The (non)utility of predicate-argument frequencies for pronoun interpretation. In Proceedings of 2004 North American chapter of the Association for Computational Linguistics annual meeting, pages 289-296.
M. M. Kessler. 1963. Bibliographic coupling between scientific papers. American Documentation, 14(1):10-25.
LDC2001T02. 2001. Message understanding conference (MUC) 7.
Daniel Marcu. 2000. The rhetorical parsing of unrestricted texts: A surface-based approach. Computational Linguistics, 26(3):395-448.
Hidetsugu Nanba, Noriko Kando, and Manabu Okumura. 2000. Classification of research papers using citation links and citation types: Towards automatic review article generation. In Proceedings of 11th SIG/CR Workshop, pages 117-134.
Hidetsugu Nanba, Takeshi Abekawa, Manabu Okumura, and Suguru Saito. 2004. Bilingual PRESRI: integration of multiple research paper databases. In Proceedings of RIAO 2004, pages 195-211, Avignon, France.
Vincent Ng and Claire Cardie. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 104-111.
J. Nie. 2002. Towards a unified approach to CLIR and multilingual IR. In Workshop on Cross Language Information Retrieval: A Research Roadmap in the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 8-14.
Masaki Noguchi, Kenta Miyoshi, Takenobu Tokunaga, Ryu Iida, Mamoru Komachi, and Kentaro Inui. 2008. Multiple purpose annotation using SLAT - Segment and link-based annotation tool -. In Pro- ceedings of 2nd Linguistic Annotation Workshop, pages 61-64, May.
John O'Connor. 1982. Citing statements: Computer recognition and use to improve retrieval. Information Processing & Management, 18(3):125-131.
Vahed Qazvinian and Dragomir R. Radev. 2008. Scientific paper summarization using citation summary networks.
Anna Ritchie, Simone Teufel, and Stephen Robertson. 2006. How to find better index terms through citations. In Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?, pages 25-32, Sydney, Australia, July. Association for Computational Linguistics.
Anna Ritchie, Stephen Robertson, and Simone Teufel. 2008. Comparing citation contexts for information retrieval. In CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, pages 213-222, New York, NY, USA. ACM.
Serge Sharoff. 2006. Creating general-purpose corpora using automated search engine queries. In WaCky! Working papers on the Web as Corpus. Gedit.
H. Small. 1973. Co-citation in the scientific literature: A new measure of the relationship between two documents. JASIS, 24:265-269.
Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521-544.
Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. Automatic classification of citation function. In Proceedings of EMNLP-06.
Sandra A. Thompson and William C. Mann. 1987. Rhetorical structure theory: A framework for the analysis of texts. Pragmatics, 1(1):79-105.
Vladimir N. Vapnik. 1998. Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons.
Web-Scale NLP 2008. 2008. http://research.microsoft.com/ur/asia/research/NLP.aspx.
M. Weinstock. 1971. Citation indexes. Encyclopedia of Library and Information Science, 5:16-41.
Ying Zhang, Fei Huang, and Stephan Vogel. 2005. Mining translations of oov terms from the web through. In International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 03), pages 669-670.