Papers by Sebastian Hellmann

… Workshop on Linked …, 2012
We have often heard that data is the new oil. In particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. Several approaches have been proposed to extract knowledge from textual documents: extracting named entities, classifying them according to pre-defined taxonomies and disambiguating them through URIs identifying real-world entities. As a step towards interconnecting the Web of documents via those entities, different extractors have been proposed. Although they share the same main purpose (extracting named entities), they differ in numerous aspects such as their underlying dictionary or their ability to disambiguate entities. We have developed NERD, an API and a front-end user interface powered by an ontology to unify various named entity extractors. The unified result output is serialized in RDF according to the NIF specification and published back on the Linked Data cloud. We evaluated NERD with a dataset composed of five TED talk transcripts, a dataset of 1,000 New York Times articles and a dataset of the 217 abstracts of the papers published at WWW 2011.
The Open Linguistics Working Group (OWLG) is an initiative of experts from different fields concerned with linguistic data, including academic linguistics (e.g. typology, corpus linguistics), applied linguistics (e.g. computational linguistics, lexicography and language documentation) and NLP (e.g. from the Semantic Web community).
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The motivation behind NIF is to allow NLP tools to exchange annotations about text documents in RDF. Hence, the main prerequisite is that parts of the documents (i.e. strings) are referenceable by URIs, so that they can be used as subjects in RDF statements.
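The URI prerequisite described above can be sketched in a few lines. This is a minimal illustration, not the normative NIF specification: the document URL is made up, and only three actual NIF Core properties (nif:anchorOf, nif:beginIndex, nif:endIndex) are shown, serialized as plain tuples rather than real RDF.

```python
# Sketch: making a substring of a document addressable by URI so it can
# act as the subject of RDF statements. Uses an RFC 5147-style "char="
# fragment; doc URI and surrounding setup are illustrative assumptions.

def nif_offset_uri(doc_uri: str, start: int, end: int) -> str:
    """Identify the substring doc[start:end] via an offset fragment."""
    return f"{doc_uri}#char={start},{end}"

doc_uri = "http://example.org/doc.txt"
text = "Berlin is the capital of Germany."

# Annotate the token "Berlin" (characters 0..6) as an addressable string.
start, end = 0, len("Berlin")
subject = nif_offset_uri(doc_uri, start, end)

# Statements about that substring, in (subject, predicate, object) form.
triples = [
    (subject, "nif:anchorOf", text[start:end]),
    (subject, "nif:beginIndex", str(start)),
    (subject, "nif:endIndex", str(end)),
]
for s, p, o in triples:
    print(s, p, repr(o))
```

Any NLP tool that agrees on the document URI and the offset scheme can now attach its own annotations to exactly the same subject URI.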
The contributions of this part have described recent activities of the OWLG as a whole and of individual OWLG members aiming to provide linguistic resources as Linked Data.
While the Web of Data, the Web of Documents and Natural Language Processing are well-researched individual fields, approaches to combining all three are fragmented and not yet well aligned. This chapter analyzes current efforts in collaborative knowledge extraction to uncover connection points between the three fields.
Purpose: DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the Web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.
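The live-synchronization idea can be sketched as a loop that re-extracts only the pages reported as changed. This is a hedged toy sketch: the feed of changed pages, the `extract` callable and the in-memory store are all illustrative assumptions, not the actual DBpedia-Live architecture or API.

```python
# Sketch of update-stream-driven synchronization: instead of periodic bulk
# releases, re-extract structured data only for pages the stream reports as
# changed. Feed, extractor and store here are toy stand-ins (assumptions).

from typing import Callable, Dict, Iterable

def sync_live(changed_pages: Iterable[str],
              extract: Callable[[str], Dict[str, str]],
              store: Dict[str, Dict[str, str]]) -> int:
    """Re-extract each changed page, replacing its stale facts; return count."""
    updated = 0
    for page in changed_pages:
        store[page] = extract(page)   # overwrite stale data for this page only
        updated += 1
    return updated

# Toy extractor: pretend each page yields a single label fact.
store: Dict[str, Dict[str, str]] = {}
n = sync_live(["Berlin", "Leipzig"], lambda p: {"rdfs:label": p}, store)
print(n, store["Berlin"])
```

The design point is the granularity: stale data is replaced per page as edits arrive, so freshness no longer depends on a months-long release cycle.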
Knowledge Engineering is a costly, tedious and often time-consuming task, for which light-weight processes are desperately needed. In this paper, we present a new paradigm, Navigation-induced Knowledge Engineering by Example (NKE), to address this problem by producing structured knowledge as a result of users navigating through an information system. Thereby, NKE aims to reduce the costs associated with knowledge engineering by framing it as navigation.
The NLP Interchange Format (NIF) is an RDF/OWL-based format that provides interoperability between Natural Language Processing (NLP) tools, language resources and annotations by allowing NLP tools to exchange annotations about text documents in RDF. Unlike more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform.
The modeling of lexico-semantic resources by means of ontologies is an established practice. Similarly, general-purpose knowledge bases are available, e.g. DBpedia, the nucleus for the Web of Data. In this section, we provide a brief introduction to DBpedia and describe recent internationalization efforts (including the creation of a German version) around it. With DBpedia serving as an entity repository, it is possible to link the Web of Documents with the Web of Data via DBpedia identifiers.
In 2009, a large survey was conducted to collect journal articles and conference papers concerning the Wikipedia project [3]. The findings of this survey were published, but are, as of now, only partially analyzed. The results showed continuous growth in the number of journal articles related to Wikipedia. The number of conference papers grew until five years after the founding of Wikipedia, but afterwards began to decrease year by year.
This paper describes the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (OKFN). The OWLG is an initiative of scholars from diverse fields concerned with linguistic data, including linguistics, NLP, and information science. The primary goal of the working group is to promote the idea of open linguistic resources, to develop means for their representation and to encourage the exchange of ideas across different disciplines.
Recently, the publishing and integration of structured data on the Web gained traction with initiatives such as Linked Data, RDFa and schema.org. In this article we outline some fundamental principles and aspects of the emerging Web of Data. We stress the importance of open licenses as an enabler for collaboration, sharing and reuse of structured data on the Web. We discuss some features of the RDF data model and its suitability for integrating structured data on the Web.
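One feature of the RDF data model that suits Web-scale integration can be shown in a few lines: independently published triple sets merge by plain set union, and shared URIs connect them without a schema-alignment step. All URIs below are made up for the example; triples are modeled as simple tuples rather than a real RDF library.

```python
# Sketch: integrating two independently published datasets under the
# triple model. Merging is set union; shared subject URIs ("ex:Berlin")
# link the facts. All prefixes and URIs here are illustrative assumptions.

dataset_a = {
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Berlin", "ex:population", "3700000"),
}
dataset_b = {
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
}

merged = dataset_a | dataset_b            # integration is just set union
about_berlin = {(p, o) for s, p, o in merged if s == "ex:Berlin"}
print(len(merged), sorted(about_berlin))
```

Contrast this with relational integration, where combining two databases typically requires reconciling table schemas before a single query can span both.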
We present a declarative approach, implemented in a comprehensive open-source framework based on DBpedia, to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, translations and taxonomies (hyponyms, hyperonyms, synonyms, antonyms) for each lexical word. The main focus is on flexibility with regard to the loose schema and configurability towards differing language editions of Wiktionary.
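The "declarative, configurable per language edition" idea can be sketched as patterns living in data rather than in code, so supporting a new edition means adding patterns, not logic. This is a heavily simplified sketch: the wiki markup, the pattern set and the output shape are assumptions for illustration, not the real Wiktionary schema or the framework's actual configuration format.

```python
# Sketch of declarative extraction: per-language-edition rules are plain
# data (regexes in a config dict), applied by one generic function. The
# markup conventions and patterns below are simplified assumptions.

import re

CONFIG = {
    "en": {
        "pos": re.compile(r"^===\s*(Noun|Verb|Adjective)\s*===", re.M),
        "sense": re.compile(r"^#\s*(.+)$", re.M),
    },
}

def extract_entry(markup: str, lang: str = "en") -> dict:
    """Apply the declarative rules of one language edition to an entry."""
    rules = CONFIG[lang]
    return {
        "pos": rules["pos"].findall(markup),
        "senses": rules["sense"].findall(markup),
    }

entry = "===Noun===\n# A domesticated animal.\n# A term of endearment."
print(extract_entry(entry))
```

Because the extractor never hard-codes a language edition's conventions, a German or French edition would be a new `CONFIG` entry rather than new code, which is the flexibility the abstract emphasizes.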
The WoLE 2012 workshop envisions the Semantic Web as a Web of Linked Entities (WoLE), which transparently connects the World Wide Web (WWW) and the Giant Global Graph (GGG) using methods from Information Retrieval (IR) and Natural Language Processing (NLP).
During the past two decades, the use of the Web has spread across multiple countries and cultures. While the Semantic Web is already served in many languages, we are still facing challenges concerning its internationalization. The DBpedia project, a community effort to extract structured information from Wikipedia, already supports multiple languages. This paper presents a graphical tool for creating internationalized mappings for DBpedia.

We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of these tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In addition, the synergistic combination of tools might ultimately yield a boost in precision and recall for common NLP tasks. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. NIF-aware applications will produce output (and possibly also consume input) adhering to the NIF ontology. Unlike more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We evaluate the NIF approach by (1) benchmarking the stability of the NIF URI scheme and (2) providing results of a field study in which we integrated six diverse NLP tools using NIF wrappers.
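The stability concern behind the URI scheme evaluation can be illustrated with a hash-based identifier: include some surrounding context in the hash so that the same surface string at different positions yields distinct identifiers that survive edits elsewhere in the document. The parameters below (context window, digest length, MD5) and the fragment layout are assumptions for the example, not the NIF specification.

```python
# Illustrative sketch of a context-hash string identifier, one idea for
# making string URIs robust against document edits: the identifier depends
# on the anchored string plus a small context window, not on absolute
# position alone. Window size, digest length and format are assumptions.

import hashlib

def context_hash_uri(doc_uri: str, text: str, start: int, end: int,
                     window: int = 10, digest_len: int = 8) -> str:
    anchor = text[start:end]
    left = text[max(0, start - window):start]
    right = text[end:end + window]
    h = hashlib.md5(f"{left}|{anchor}|{right}".encode()).hexdigest()[:digest_len]
    return f"{doc_uri}#hash_{window}_{len(anchor)}_{h}"

text = "Berlin is the capital of Germany."
uri = context_hash_uri("http://example.org/doc.txt", text, 0, 6)
print(uri)
```

A benchmark of such a scheme would then measure how often identifiers remain resolvable as documents are edited, which is the kind of stability question the evaluation above addresses.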