Weaving Scholarly Legacy Data into Web of Data
2012
Abstract
The Linked Open Data project provides a new publishing paradigm for creating machine-readable, structured data on the Web. The significant presence of datasets describing scholarly publications in the Linked Data cloud underlines the importance of Linked Data for the scientific community and for the open access movement. However, these semantically rich datasets still need to be exploited by, and linked with, real-time applications. This paper reports on a project that addresses this need: we have exploited numerous scholarly datasets and created semantic links to papers in an online journal, the Journal of Universal Computer Science (J.UCS). J.UCS plays an important part in the computer science publishing community and provides a number of innovative features and datasets to its web users. However, these features are published as legacy HTML, which is difficult for machines to understand and query. Motivated by the benefits of the Linked Open Data project, this paper presents an approach that converts J.UCS legacy HTML data into a machine-understandable format (RDF) and interlinks it with other important Linked Data resources. The approach has successfully disambiguated the J.UCS author and publication datasets and interlinked them with DBpedia, DBLP, CiteULike and Faceted DBLP. The triplified and interlinked datasets are available to the scientific and Semantic Web communities for download and for SPARQL querying. Researchers and semantic agents can use this semantically linked dataset to identify semantic associations, build inferencing systems, and extract useful knowledge.
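The conversion and interlinking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the HTML class names (`title`, `author`), the base URI `http://example.org/jucs/`, the helper names, and the choice of Dublin Core, FOAF and `owl:sameAs` as vocabulary are all assumptions made for illustration. The sketch parses a small HTML fragment, emits N-Triples lines for the paper and its authors, and attaches an external link where one has already been found for an author.

```python
from html.parser import HTMLParser

# Vocabulary namespaces (real, widely used); the base URI is hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/jucs/"  # assumption, not the real J.UCS URI scheme


class PaperParser(HTMLParser):
    """Collects text from elements with class 'title' or 'author'.

    The class names are an assumed markup, chosen only for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.field = None
        self.title = ""
        self.authors = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("title", "author"):
            self.field = cls

    def handle_endtag(self, tag):
        self.field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.field is None:
            return
        if self.field == "title":
            self.title = text
        else:
            self.authors.append(text)


def literal(value):
    """Format a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def triplify(html, paper_id, same_as=None):
    """Return N-Triples lines for one paper; same_as maps author name -> external URI."""
    parser = PaperParser()
    parser.feed(html)
    links = same_as or {}
    paper = f"<{BASE}paper/{paper_id}>"
    triples = [f"{paper} <{DC}title> {literal(parser.title)} ."]
    for name in parser.authors:
        author = f"<{BASE}author/{name.lower().replace(' ', '_')}>"
        triples.append(f"{paper} <{DC}creator> {author} .")
        triples.append(f"{author} <{FOAF}name> {literal(name)} .")
        if name in links:
            # Interlinking step: assert identity with an external resource.
            triples.append(f"{author} <{OWL}sameAs> <{links[name]}> .")
    return triples


if __name__ == "__main__":
    page = '<h1 class="title">A Sample Paper</h1><span class="author">Jane Doe</span>'
    external = {"Jane Doe": "http://example.org/external/jane_doe"}  # placeholder link
    for line in triplify(page, "p1", external):
        print(line)
```

The sketch takes the external links as given. In practice, the disambiguation that decides which external resource an author corresponds to is the hard part.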