Automatic semantic annotation with KIM

Borislav Popov

Abstract

The KIM platform is designed for automatic semantic annotation, indexing, and retrieval of documents, offering scalable and customizable ontology-based information extraction. It relies on an upper-level ontology and a comprehensive knowledge base to support semantically enhanced information retrieval. Key features include a web user interface and a plug-in for Internet Explorer that provides lightweight semantic annotations, allowing users to explore and interact with the knowledge base.

Alicia Martinez Rebollar

Research in Computing Science, 2019

To improve the information search process, new strategies have been created to enhance searches. Semantic search matches by meaning rather than by literal strings. Semantic search over unstructured documents requires formalizing knowledge through a semantic annotation process. Some annotation proposals use natural language processing tools and ontologies to link document terms; others use the similarity of entities through the weight of the edges, the association between pairs of concepts, or the ontology structure. In this paper we present an alternative for semantic annotation of unstructured documents based on extracting the semantic context of entities. In this approach we detect named entities through a data dictionary created from Wikipedia and link them to instances in the ontology. The context extraction strategy is based on concept similarity: each term is associated with an instance of the ontology, and the similarity between explicit relationships is measured by combining two types of measures, the association between each pair of concepts and the weight of the relationships. The approach was tested with two ontologies and two datasets, in news and business, respectively.

Semantic Annotation of Named Entities using Ontology and Massive World-Knowledge

Atanas Kiryakov, Borislav Popov

Inforex - a web-based tool for text corpus management and semantic annotation

Jan Kocoń

2012

The aim of this paper is to present a system for semantic text annotation called Inforex. Inforex is a web-based system designed for managing and annotating text corpora on the semantic level, including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing, including text segmentation, morphosyntactic analysis and word selection for word sense annotation. Inforex can be accessed from any standards-compliant web browser supporting JavaScript. The user interface takes the form of dynamic HTML pages using AJAX technology. The server part of the system is written in PHP and the data is stored in a MySQL database. The system makes use of some external tools that are installed on the server or can be accessed via web services. The documents are stored in the database in their original format: plain text, XML or HTML. Tokenization and sentence segment...

Marvin: Semantic annotation using multiple knowledge sources

Nikola Milosevic

ArXiv, 2016

People are producing more written material than at any time in history. The increase is so large that professionals in various fields are no longer able to cope with this volume of publications. Text mining tools can help them, and one technique that can aid information retrieval and information extraction is semantic text annotation. In this report we present Marvin, a text annotator written in Java, which can be used as a command line tool and as a Java library. Marvin is able to annotate text using multiple sources, including WordNet, MetaMap, DBPedia and thesauri represented as SKOS.

Semantic annotation for knowledge management: Requirements and a survey of the state of the art

Siegfried Handschuh

Web Semantics: Science …, 2006

Ogmios: a scalable NLP platform for annotating large web document collections

Adeline Nazarenko

2007

FIOBODA - SEMANTIC ANNOTATION FRAMEWORK FOR WEB EXTRACTED DATA

IAEME Publication

IAEME PUBLICATION, 2016

Semantic annotation of web pages is the state-of-the-art technology for achieving the goal of the Semantic Web, which enables sharing and reusing document content across boundaries and applications. The Web is a treasury of knowledge, and efficient tools should be designed to explore its structured and unstructured data. Annotating millions of web pages manually is an impossible task. For high information retrieval rates, automatic annotation of documents is mandatory. Metadata is added to web pages to make them processable by content-based intelligent applications. This paper analyses the problems with current semantic annotation systems and proposes a new ontology-based automatic annotation framework. Ontology-based semantic annotation is one of the best methods for extracting data from the knowledge base. The paper contributes the integration of a modified Manning's sentence boundary detection algorithm, a noun phrase collocation algorithm, and classification using machine learning techniques in the information extraction module, as well as a new data model and ontology for the structured ontology engineering model. The annotation module annotates the output of the information extraction module with the aid of ontologies and dictionaries and stores the resulting annotated data as RDF triples in the annotation database. Reasoning over the annotated data is performed through the RDF repository interface. FIOBODA stands for Financial Instruments Ontology Based Open Document Annotation. Web pages extracted from the financial securities domain are mapped to the finance ontology to extract the subject, predicate and object. An SVM classifier is used to separate correct from incorrect annotations. The correct annotation data is stored in the annotation database and the RDF repository for later use. The proposed framework, to an extent, solves the knowledge bottleneck problem thanks to its reusability and interoperability features.

Towards semantic web information extraction

Borislav Popov

… Web and Web Services, 2003

The approach towards Semantic Web Information Extraction (IE) presented here is implemented in KIM, a platform for semantic indexing, annotation, and retrieval. It combines IE based on a mature text engineering platform (GATE) with Semantic Web-compliant knowledge representation and management. The cornerstone is automatic generation of named-entity (NE) annotations with class and instance references to a semantic repository. A simple upper-level ontology, providing detailed coverage of the most popular entity types (Person, Organization, Location, etc.; more than 250 classes), is designed and used. A knowledge base (KB) with de facto exhaustive coverage of real-world entities of general importance is maintained, used, and constantly enriched. Extensions of the ontology and KB handle all the lexical resources used for IE; most notably, instead of gazetteer lists, aliases of specific entities are kept together with them in the KB. A Semantic Gazetteer uses the KB to generate lookup annotations. Ontology-aware pattern-matching grammars allow precise class information to be handled via rules at the optimal level of generality. The grammars are used to recognize NEs, with class and instance information referring to the KIM ontology and KB. Recognition of identity relations between the entities is used to unify their references to the KB. Based on the recognized NEs, template relation construction is performed via grammar rules. As a result, the KB is enriched with the recognized relations between entities. In the final phase of the IE process, previously unknown aliases and entities are added to the KB with their specific types.
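
The gazetteer step described in this abstract, where entity aliases stored in the KB are matched against text to produce lookup annotations carrying class and instance references, can be illustrated with a minimal Python sketch. This is not KIM's or GATE's implementation: the `Entity` record, the `kb:` identifiers, the sample KB and the functions `build_gazetteer` and `annotate` are all hypothetical names chosen for illustration, and matching is reduced to case-insensitive whole-word lookup with longer aliases preferred.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    uri: str        # hypothetical instance identifier in the KB
    cls: str        # ontology class of the instance
    aliases: tuple  # surface forms kept with the entity, not in separate lists


# Toy knowledge base (illustrative data only).
KB = [
    Entity("kb:Ontotext", "Organization", ("Ontotext", "Ontotext Lab")),
    Entity("kb:London", "City", ("London",)),
]


def build_gazetteer(kb):
    """Index aliases and compile one pattern, longest alias first."""
    index = {}
    for entity in kb:
        for alias in entity.aliases:
            index.setdefault(alias.lower(), []).append(entity)
    ordered = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b",
        re.IGNORECASE,
    )
    return index, pattern


def annotate(text, index, pattern):
    """Return (start, end, instance URI, class) lookup annotations."""
    annotations = []
    for match in pattern.finditer(text):
        # An ambiguous alias yields one annotation per candidate entity.
        for entity in index[match.group(0).lower()]:
            annotations.append((match.start(), match.end(), entity.uri, entity.cls))
    return annotations


if __name__ == "__main__":
    index, pattern = build_gazetteer(KB)
    sample = "Ontotext Lab opened an office in London."
    for start, end, uri, cls in annotate(sample, index, pattern):
        print(sample[start:end], uri, cls)
```

Keeping aliases on the entity record means that adding a newly discovered alias to the KB immediately extends the gazetteer; disambiguation and grammar-based recognition, which the abstract describes as later stages, are outside this sketch.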

Title: Semantic Annotation for Knowledge Management: Requirements and a Survey of the State of the Art

Mina Gharacheh

While much of a company's knowledge can be found in text repositories, current content management systems have limited capabilities for structuring and interpreting documents.

SEKT: Semantically Enabled Knowledge Technologies

peter sure

When working with large corpora of documents it is hard to comprehend and process all the information contained in them. Standard search engines usually rely on word matching and do not take the structure within the corpus into account. We try to overcome that by automatic extraction of the topics covered within the documents, visualization of the corpus and semi-automatic construction of a topic ontology.

References (2)

Atanas Kiryakov, Borislav Popov, Damyan Ognyanoff, Dimitar Manov, Angel Kirilov, Miroslav Goranov. Semantic Annotation, Indexing, and Retrieval. To appear in Elsevier's Journal of Web Semantics, Vol. 1, ISWC2003 special issue (2), 2004. http://www.websemanticsjournal.org/
Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, Angel Kirilov, Miroslav Goranov. KIM - Semantic Annotation Platform. 2nd International Semantic Web Conference (ISWC2003), 20-23 October 2003, Florida, USA. LNAI Vol. 2870, pp. 484-499, Springer-Verlag Berlin Heidelberg 2003.

Related papers

KIM – Semantic Annotation Platform

Borislav Popov

Lecture Notes in Computer Science, 2003

The KIM platform provides a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents. It provides a mature infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. In order to provide a basic level of performance and allow easy bootstrapping of applications, KIM is equipped with an upper-level ontology and a knowledge base providing extensive coverage of entities of general importance. The ontologies and knowledge bases involved are handled using cutting-edge Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning.

Semantic Annotation Framework For Intelligent Information Retrieval Using KIM Architecture

Rajat Goyal

Due to the explosion of information and knowledge on the web and the wide use of search engines to find desired information, the role of knowledge management (KM) is becoming more significant in organizations. Knowledge management in an organization is used to create, capture, store, share, retrieve and manage information efficiently. The Semantic Web, an intelligent and meaningful web, tends to provide a promising platform for knowledge management systems and vice versa, since each can give the other the substance needed for machine-understandable web resources, which in turn will lead to intelligent, meaningful and efficient information retrieval on the web. Today, the challenge for the web community is to integrate the distributed heterogeneous resources on the web with the objective of an intelligent web environment focusing on data semantics and user requirements. Semantic annotation (SA), which is about assigning to the entities in the text links to their semantic descriptions, is being widely used. Various tools such as KIM, Amaya, etc. may be used for semantic annotation.

Semantic annotation, indexing, and retrieval

Borislav Popov

Journal of Web Semantics, 2004

The Semantic Web realization depends on the availability of a critical mass of metadata for the web content, associated with the respective formal knowledge about the world. We claim that the Semantic Web, at its current stage of development, is in critical need of metadata generation and usage schemata that are specific, well-defined and easy to understand. This paper introduces our vision for a holistic architecture for semantic annotation, indexing, and retrieval of documents with regard to extensive semantic repositories. A system (called KIM) implementing this concept is presented in brief and is used for the purposes of evaluation and demonstration. A particular schema for semantic annotation with respect to real-world entities is proposed. The underlying philosophy is that practical semantic annotation is impossible without some particular knowledge modelling commitments. Our understanding is that a system for such semantic annotation should be based upon a simple model of real-world entity classes, complemented with extensive instance knowledge. To ensure the efficiency, ease of sharing, and reusability of the metadata, we introduce an upper-level ontology (of about 250 classes and 100 properties), which starts with some basic philosophical distinctions and then goes down to the most common entity types (people, companies, cities, etc.). Thus it encodes many of the domain-independent commonsense concepts and allows straightforward domain-specific extensions. On the basis of the ontology, a large-scale knowledge base of entity descriptions is bootstrapped, and further extended and maintained. Currently, the knowledge bases usually scale between 10^5 and 10^6 descriptions. Finally, this paper presents a semantically enhanced information extraction system, which provides automatic semantic annotation with references to classes in the ontology and to instances. The system has been running over a continuously growing document collection (currently about 0.5 million news articles), so it has been under constant testing and evaluation for some time now. On the basis of these semantic annotations, we perform semantic-based indexing and retrieval, where users can mix traditional IR (information retrieval) queries and ontology-based ones. We argue that such large-scale, fully automatic methods are essential for the transformation of the current largely textual web into a semantic web.
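
The idea of mixing traditional IR queries with ontology-based ones can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the KIM retrieval API: the `DOCS` collection, the `kb:` identifiers and the `search` function are hypothetical, keyword matching is plain token containment, and class constraints are matched exactly, without the subclass reasoning an ontology-backed index would normally supply.

```python
import re

# Toy annotated collection: each document keeps its text plus the
# (instance URI, class) pairs produced by semantic annotation.
DOCS = {
    "d1": {"text": "Telecom shares rose after the merger.",
           "entities": [("kb:AcmeTelecom", "Company"), ("kb:Sofia", "City")]},
    "d2": {"text": "The merger talks were held in the city.",
           "entities": [("kb:Sofia", "City")]},
    "d3": {"text": "Shares fell sharply.",
           "entities": [("kb:AcmeTelecom", "Company")]},
}


def tokens(text):
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def search(docs, keywords=(), entity_class=None, entity_uri=None):
    """Return ids of documents satisfying every supplied constraint."""
    hits = []
    for doc_id, doc in docs.items():
        words = tokens(doc["text"])
        # Traditional IR part: every keyword must occur in the text.
        if not all(k.lower() in words for k in keywords):
            continue
        # Ontology-based part: restrict by entity class and/or instance.
        ents = doc["entities"]
        if entity_class is not None and not any(c == entity_class for _, c in ents):
            continue
        if entity_uri is not None and not any(u == entity_uri for u, _ in ents):
            continue
        hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    print(search(DOCS, keywords=["merger"], entity_class="Company"))
```

A query such as "documents mentioning a merger that refer to some Company" cannot be expressed with keywords alone, since the company may appear under any name; the entity constraint is what the semantic annotations add.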

A Survey on Semantic Annotation Tools for Knowledge Management

pooja kherwa

International Journal of Engineering Applied Sciences and Technology

Support for information and knowledge exchange is a key issue in the information society. To reduce the time wasted in searching and the associated user frustration, much more selective user access is needed. This is possible through semantic information processing of online documents. Knowledge management in an organisation is used to manage knowledge resources in order to facilitate access to and reuse of knowledge. Semantic annotation is about assigning to the entities in the text links to their semantic descriptions. This sort of metadata provides both class and instance information about the entities. Semantic annotation is applicable to any type of text: web pages, regular documents, etc. For semantic annotation, various manual, semi-automatic and fully automatic tools have been developed by organizations such as mindswap.org, ontotext.org, etc. In this paper, we present an analysis and review of some of these tools according to their applicability for an application d...

From manual to semi-automatic semantic annotation: About ontology-based text annotation tools

Alexander Maedche

Proceedings of the …, 2001

Maskkot - An Entity-centric Annotation Platform

Stefano Bortoli, Andrea Turbati

The Semantic Web is facing the important challenge of maintaining its promise of a real world-wide graph of interconnected resources. Unfortunately, while URIs almost guarantee a direct reference to entities, the relation between the two is not bijective. Many different URI references to the same concepts and entities can arise when, in such a heterogeneous setting as the WWW, people independently build new ontologies, or populate shared ones with new arbitrarily identified individuals. The proliferation of URIs is an unwanted, though natural, effect strictly bound to the same principles which characterize the Semantic Web; reducing this phenomenon will improve the recall of Semantic Search engines, which could rely on explicit links between heterogeneous information sources. To address this problem, in this paper we present an integrated environment combining the semantic annotation and ontology building features available in the Semantic Turkey web browser extension with globally unique identifiers for entities provided by the okkam Entity Name System, thus realizing a valuable resource for preventing the diffusion of multiple URIs on the (Semantic) Web.

DOCUMENT INDEXING - PROVIDING A BASIS FOR SEMANTIC DOCUMENT ANNOTATION

Harald Sack, Clemens Beckstein

A document index represents a concise ordered compilation of the document's most important topics. It provides direct and fast access to the document parts related to the index information. Using structural knowledge of the document itself together with general knowledge about indexing, a 2-layered Index Graph is defined that is further mapped to an ontology representation. By defining suitable metrics, it is shown how the Index Graph can be utilised to augment semantic applications. We have developed a system for supporting the author of a document in the process of index compilation. Other possible applications include document visualisation and semantic document annotation.
