Web-based affiliation matching

Erhard Rahm; David Aumüller

2009

Abstract

Authors of scholarly publications state their affiliation in various forms. This heterogeneity makes bibliographic analysis of institutions impossible unless the affiliation data is comprehensively cleaned and consolidated. We investigate automatic approaches to consolidating affiliation data in order to reduce manual work and support scalable affiliation analysis. In particular, we propose to set up a reference database of affiliation strings found in publications. A key step in this task is affiliation matching: comparing different affiliation strings to determine whether they refer to the same institution. For affiliation matching we investigate web-based similarity measures that utilize the cognitive power of current search engines. These measures determine the similarity of two affiliations from the degree to which the URLs in the result sets of their respective web searches overlap. We evaluate the effectiveness of affiliation matching based on URL overlap alone and in combination with the Soft TF-IDF similarity measure.
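The URL-overlap idea can be illustrated with a minimal sketch. The abstract does not specify the exact overlap formula, the URL normalization, or how the overlap is combined with Soft TF-IDF, so everything below is an assumption made for illustration: result sets are compared by host name using Jaccard overlap, the combination is a simple weighted average, and the result lists are hand-written stand-ins for what a search engine might return. The names `normalize`, `url_overlap`, and `combined_score` are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to its lowercase host without a leading 'www.'.

    Comparing by host rather than by full URL is an assumption of this sketch.
    """
    host = urlsplit(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def url_overlap(results_a, results_b):
    """Jaccard overlap of the hosts found in two search result lists."""
    hosts_a = {normalize(u) for u in results_a}
    hosts_b = {normalize(u) for u in results_b}
    if not hosts_a or not hosts_b:
        return 0.0  # no evidence from the web search for at least one string
    return len(hosts_a & hosts_b) / len(hosts_a | hosts_b)


def combined_score(overlap, string_sim, weight=0.5):
    """Weighted average of URL overlap and a string similarity in [0, 1].

    string_sim stands in for a Soft TF-IDF score computed elsewhere;
    the weighting scheme is hypothetical.
    """
    return weight * overlap + (1 - weight) * string_sim


# Hand-written example result lists for two affiliation strings (not real search output).
results_a = [
    "http://www.uni-leipzig.de/",
    "http://dbs.uni-leipzig.de/",
    "http://en.wikipedia.org/wiki/Leipzig_University",
]
results_b = [
    "http://uni-leipzig.de/en",
    "http://dbs.uni-leipzig.de/people",
    "http://www.leipzig.de/",
]

print(url_overlap(results_a, results_b))  # 2 shared hosts out of 4 distinct hosts
```

In this sketch two affiliation strings would be treated as a match when the score exceeds a threshold chosen on labeled data; the threshold and the weight are tuning parameters, not values taken from the paper.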

