Triplifying Equivalence Set Graphs
2019
Abstract
In order to conduct large-scale semantic analyses, it is necessary to calculate the deductive closure of very large hierarchical structures. Unfortunately, contemporary reasoners cannot be applied at this scale unless they rely on expensive hardware such as a multi-node in-memory cluster. In order to handle large-scale semantic analyses on commodity hardware such as regular laptops, we introduced [1] a novel data structure called Equivalence Set Graph (ESG). An ESG makes it possible to specify compact views of large RDF graphs, thus easing statistical observations such as the number of concepts defined in a graph, the shape of ontological hierarchies, etc. ESGs are built by a procedure presented in [1] that delivers graphs as a set of maps storing nodes and edges. In this demo paper (i) we show how facts entailed by an ESG, and the graph itself, can be specified in RDF following a newly introduced ontology; and (ii) we present two datasets resulting from the triplification of t...
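To make the ESG idea concrete, here is a minimal in-memory sketch: equivalence triples are collapsed into equivalence sets with a union-find structure, and hierarchy edges are then projected onto the set representatives. This is an illustration only, not the construction procedure of [1]; the choice of owl:equivalentClass and rdfs:subClassOf as the equivalence and specialization predicates, and all sample data, are assumptions.

```python
# Minimal sketch of building an Equivalence Set Graph (ESG) in memory.
# Input: RDF triples as (subject, predicate, object) string tuples.
# This is not the procedure of [1]; predicates and data are assumptions.

EQ = "owl:equivalentClass"
SUB = "rdfs:subClassOf"

def build_esg(triples):
    parent = {}

    def find(x):                      # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # 1. Collapse equivalent entities into equivalence sets.
    for s, p, o in triples:
        if p == EQ:
            union(s, o)

    # 2. Project hierarchy edges onto equivalence-set representatives.
    nodes, edges = {}, set()
    for s, p, o in triples:
        if p not in (EQ, SUB):
            continue
        rs, ro = find(s), find(o)
        nodes.setdefault(rs, set()).add(s)
        nodes.setdefault(ro, set()).add(o)
        if p == SUB and rs != ro:
            edges.add((rs, ro))
    return nodes, edges

triples = [
    ("ex:Person", "owl:equivalentClass", "foaf:Person"),
    ("foaf:Person", "rdfs:subClassOf", "ex:Agent"),
]
print(build_esg(triples))
```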
Related papers
2011
As semantic graph database technology grows to address components ranging from large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high-performance computational resources to bear on this task. Our research group built a high-performance hybrid system comprising computational capability for semantic graph database processing utilizing the multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths of the Billion Triple Challenge 2010 dataset.
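As a toy illustration of one of the analyses listed above, the sketch below counts namespace interaction, i.e. how often triples connect resources from different namespaces. It is a single-machine sketch with none of the Cray XMT architecture; the namespace-splitting heuristic and the sample triple are assumptions.

```python
# Toy sketch of a "namespace interaction" analysis: count triples whose
# subject and object live in different namespaces. Not the paper's pipeline.
from collections import Counter

def namespace(iri):
    # Crude namespace split at the last '/' or '#' (assumption).
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[:cut + 1] if cut >= 0 else iri

def namespace_interaction(triples):
    pairs = Counter()
    for s, _, o in triples:
        ns_s, ns_o = namespace(s), namespace(o)
        if ns_s != ns_o:
            pairs[(ns_s, ns_o)] += 1
    return pairs

triples = [
    ("http://dbpedia.org/resource/Rome", "owl:sameAs",
     "http://sws.geonames.org/3169070/"),
]
print(namespace_interaction(triples))
```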
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011
Semantic Web data with annotations is becoming available, the YAGO knowledge base being a prominent example. In this paper we present an approach to computing the closure of large RDF Schema annotated Semantic Web data using standard database technology. In particular, we explore several alternatives for computing the transitive closure of real fuzzy semantic data extracted from YAGO in the PostgreSQL database management system. We benchmark these alternatives and compare them to classical RDF Schema reasoning, providing the first implementation of annotated RDF Schema in persistent storage.
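The core of the approach, computing a transitive closure inside a relational database, can be sketched with a recursive SQL query. The paper targets PostgreSQL; the sketch below uses Python's built-in sqlite3 module only to stay self-contained, and the subclassof table, its contents, and the omission of the fuzzy annotation degrees that the paper propagates alongside the closure are all simplifications.

```python
# Sketch: transitive closure of rdfs:subClassOf via a recursive SQL query.
# sqlite3 is used only so the example is self-contained; the paper uses
# PostgreSQL. Table name and data are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE subclassof (sub TEXT, sup TEXT)")
con.executemany("INSERT INTO subclassof VALUES (?, ?)", [
    ("yago:Singer", "yago:Artist"),
    ("yago:Artist", "yago:Person"),
    ("yago:Person", "yago:Agent"),
])

closure = con.execute("""
    WITH RECURSIVE closure(sub, sup) AS (
        SELECT sub, sup FROM subclassof
        UNION
        SELECT c.sub, s.sup
        FROM closure c JOIN subclassof s ON c.sup = s.sub
    )
    SELECT * FROM closure
""").fetchall()
for row in closure:
    print(row)
```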
LDSR is a collection of datasets from the Linked Open Data (LOD) W3C community project, which have been selected and refined for the purpose of presenting a useful perspective to some of the central LOD datasets and to present a good use-case for large-scale reasoning and data integration. The design objectives are as follows: (i) consistency with respect to the formal semantics, (ii) generality - no specific domain knowledge should be required to comprehend most of the semantics, and (iii) heterogeneity - data from multiple data sources should be included. The current version of LDSR consists of about 440 million explicit statements and includes DBpedia, Geonames, Wordnet, CIA Factbook, lingvoj, and UMBEL. LDSR includes the ontologies of the datasets and the following schemata used by them: SKOS, FOAF, RSS, and Dublin Core.
Master's thesis, August, 2004
The Resource Description Framework (RDF) is a language for metadata assertions about information resources on the World-Wide Web, and is thus a foundation for a future Semantic Web. The atomic constructs of RDF are statements, which are triples consisting of the resource being described, a property, and a property value. A collection of RDF statements can be intuitively understood as a graph: resources are nodes and statements are arcs connecting the nodes. The graph nature of this abstract triple syntax is indeed appealing, but the RDF specification does not distinguish clearly among (1) the term RDF Graph (merely a set of triples, thus not a standard graph), (2) the mathematical concept of a graph, and (3) the graph-like visualization of RDF data ("node and directed-arc diagrams"). This thesis argues that there is a need for an explicit graph representation for RDF, which allows the application of techniques and results from graph theory and which serves as an intermediate model between the abstract triple syntax and task-specific serializations of RDF data. The directed labeled graphs currently used by default suffer from an ambiguous definition and, furthermore, have limitations inherent in any approach representing RDF triple statements by essentially binary (although labeled) edges. As an alternative, it is natural to consider hypergraphs with ternary edges; from this, we derive RDF bipartite graphs as an intermediate graph-based model for RDF. This proposal is complemented by studies of its transformation cost and its "size" compared to a directed labeled graph representation. The thesis furthermore investigates some issues of RDF's graph nature in the light of the new model: RDF maps are studied as maps on graphs, and an approach to decomposing an RDF Graph into data and schema layers is presented. For the processing of RDF data, the notions of connectivity and paths in RDF Graphs are essential; because RDF bipartite graphs incorporate statements and properties as nodes in the graph, it turns out that this model conveys a richer sense of connectivity than the standard directed labeled graph representations. Finally, we explore perspectives for enhancing the expressivity of RDF query languages with a proposal for graph-based query primitives.
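A minimal sketch of the bipartite encoding: each triple becomes a statement node connected to its subject, predicate, and object nodes, so properties appear as ordinary nodes of the graph. The node-naming scheme below is made up for illustration.

```python
# Sketch of the RDF bipartite graph idea: every triple becomes a statement
# node linked to three value nodes (subject, predicate, object), so that
# predicates participate as ordinary nodes. Names are hypothetical.

def to_bipartite(triples):
    nodes, edges = set(), set()
    for i, (s, p, o) in enumerate(triples):
        st = f"_stmt{i}"                 # one node per statement
        nodes.update([st, s, p, o])
        edges.update({(st, "subject", s),
                      (st, "predicate", p),
                      (st, "object", o)})
    return nodes, edges

nodes, edges = to_bipartite([("ex:alice", "foaf:knows", "ex:bob")])
print(sorted(edges))
```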
In this paper we discuss two different translations between RDF (Resource Description Framework) and Conceptual Graphs (CGs). These translations will allow tools like Cogui and Cogitant to import and export RDF(S) documents. The first translation is sound and complete from a reasoning viewpoint, but it is neither visual nor a representation in the spirit of Conceptual Graphs. The second translation has the advantage of being natural and of fully exploiting the CG features, but, on the other hand, it does not apply to the whole of RDF(S). We intend this paper as a preliminary report on ongoing work examining in detail the pros and cons of each approach.
Ontologies are pervading many areas of knowledge representation and management. To date, most research efforts have been spent on the development of sufficiently expressive languages for the representation and querying of ontologies; however, querying efficiency has received attention only recently, especially for ontologies referring to large amounts of data. In fact, it is still uncertain how reasoning tasks will scale when applied to massive amounts of data. This work is a first step in this setting: based on a previous result showing that the SPARQL query language can be mapped to Datalog, we show how efficient querying of big ontologies can be accomplished with a recently developed database-oriented extension of the well-known system DLV. We report our initial results and discuss the benefits of possible alternative data structures for representing RDF graphs in our architecture.
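The SPARQL-to-Datalog correspondence can be illustrated by evaluating a basic graph pattern as a conjunctive query over a triple relation, with each triple pattern acting as a Datalog subgoal. The naive join strategy below is for exposition only and has nothing to do with DLV's evaluation engine.

```python
# Sketch of the SPARQL-to-Datalog idea: a basic graph pattern evaluated as
# a conjunctive query over a 'triple' relation. Illustrative only; this is
# not DLV's evaluation strategy, and the data is hypothetical.

def answer_bgp(pattern, triples):
    # pattern: list of (s, p, o); strings starting with '?' are variables.
    bindings = [{}]
    for tp in pattern:                   # each triple pattern = one subgoal
        new = []
        for b in bindings:
            for fact in triples:
                b2 = dict(b)             # extend a copy of the binding
                if all(
                    (b2.setdefault(t, f) == f) if t.startswith("?")
                    else (t == f)
                    for t, f in zip(tp, fact)
                ):
                    new.append(b2)
        bindings = new
    return bindings

triples = [("ex:alice", "foaf:knows", "ex:bob"),
           ("ex:bob", "foaf:knows", "ex:carol")]
# SELECT ?x ?z WHERE { ?x foaf:knows ?y . ?y foaf:knows ?z }
print(answer_bgp([("?x", "foaf:knows", "?y"),
                  ("?y", "foaf:knows", "?z")], triples))
```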
Lecture Notes in Computer Science, 2015
Billions of RDF triples are currently available on the Web through the Linked Open Data cloud (e.g., DBpedia, LinkedGeoData and New York Times). Governments, universities as well as companies (e.g., BBC, CNN) are also producing huge collections of RDF triples and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, N-Triple, etc.). However, RDF descriptions (i.e., graphs and serializations) are verbose in syntax, often contain redundancies, and could be generated differently even when describing the same resources, which would have a negative impact on various RDF-based applications (e.g., RDF storage, processing time, loading time, similarity measuring, mapping, alignment, and versioning). Hence, to improve RDF processing, we propose here an approach to clean and eliminate redundancies from such RDF descriptions as a means of transforming different descriptions of the same information into one representation, which can then be tuned, depending on the target application (information retrieval, compression, etc.). Experimental tests show significant improvements, namely in reducing RDF description loading time and file size.
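One simple instance of such redundancy elimination is dropping rdfs:subClassOf triples that are already entailed by transitivity (a transitive reduction). The paper's approach covers many more cases, including serialization-level redundancy; the sketch below illustrates only this one, on assumed data.

```python
# Sketch of one kind of redundancy elimination: dropping rdfs:subClassOf
# edges already entailed by transitivity. Only an illustration of the idea;
# the paper handles further redundancy and serialization-level cleaning.

def prune_redundant_subclass(edges):
    # edges: set of (sub, sup) pairs from rdfs:subClassOf triples
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def reachable(a, b, skip):
        # Is b reachable from a without using the edge 'skip'?
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if (n, m) == skip or m in seen:
                    continue
                if m == b:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    return {e for e in edges if not reachable(e[0], e[1], skip=e)}

edges = {("A", "B"), ("B", "C"), ("A", "C")}   # ("A", "C") is redundant
print(prune_redundant_subclass(edges))
```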
2018
Graph-based modelling is becoming more popular, in the sciences and elsewhere, as a flexible and powerful way to exploit data to power world-changing digital applications. Compared to the initial vision of the Semantic Web, knowledge graphs and graph databases are becoming a practical and computationally less formal way to manage graph data. On the other hand, linked data based on Semantic Web standards are a complementary, rather than alternative, approach to dealing with these data, since they still provide a common way to represent and exchange information. In this paper we introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. Using real agrigenomics-related use cases, we show how such a mapping allows for a hybrid approach to the management of networked knowledge, taking advantage of the best of both RDF and property graphs.
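A rough sketch of the RDF-to-property-graph direction: rdf:type triples become node labels, literal-valued triples become node properties, and the remaining triples become relationships. rdf2neo drives this mapping through configuration; the fixed rules and the crude IRI-versus-literal test below are assumptions for illustration.

```python
# Sketch of an RDF-to-property-graph mapping in the spirit of rdf2neo:
# rdf:type -> node labels, literal objects -> node properties, the rest ->
# relationships. rdf2neo's real mapping is configurable; this is fixed.

def rdf_to_property_graph(triples):
    nodes, rels = {}, []
    for s, p, o in triples:
        node = nodes.setdefault(s, {"iri": s, "labels": set(), "props": {}})
        if p == "rdf:type":
            node["labels"].add(o)
        elif not o.startswith('"'):          # crude IRI-vs-literal test
            rels.append((s, p, o))
            nodes.setdefault(o, {"iri": o, "labels": set(), "props": {}})
        else:
            node["props"][p] = o.strip('"')
    return nodes, rels

triples = [
    ("ex:bm42", "rdf:type", "ex:Gene"),
    ("ex:bm42", "ex:symbol", '"BM42"'),
    ("ex:bm42", "ex:encodes", "ex:prot1"),
]
print(rdf_to_property_graph(triples))
```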
2007
This paper presents a minimalist program for RDF, by showing how one can do without several predicates and keywords of the RDF Schema vocabulary, obtaining a simpler language which preserves the original semantics. This approach is beneficial in at least two directions: (a) To have a simple abstract fragment of RDFS easy to formalize and to reason about, which captures the essence of RDFS; (b) To obtain algorithmic properties of deduction and optimizations that are relevant for particular fragments. Among our results are: the identification of a simple fragment of RDFS; the proof that it encompasses the main features of RDFS; a formal semantics and a deductive system for it; sound and complete deductive systems for their sub-fragments; and an O(n log n) complexity bound for ground entailment in this fragment.
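The flavor of such a fragment can be conveyed with the usual core rules (subClassOf and subPropertyOf transitivity, type propagation, domain and range typing), chained to a fixpoint. The naive quadratic loop below only shows the rules; it is neither the paper's deductive system nor its O(n log n) entailment procedure.

```python
# Sketch of forward chaining with the core rules of a minimal RDFS
# fragment (rho-df style). Naive fixpoint loop for exposition only;
# the paper's O(n log n) ground-entailment procedure is different.
SC, SP, TYPE, DOM, RNG = ("rdfs:subClassOf", "rdfs:subPropertyOf",
                          "rdf:type", "rdfs:domain", "rdfs:range")

def closure(triples):
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == SC and p2 == SC and o == s2:
                    new.add((s, SC, o2))        # subClassOf transitivity
                if p == SP and p2 == SP and o == s2:
                    new.add((s, SP, o2))        # subPropertyOf transitivity
                if p == SP and p2 == s:
                    new.add((s2, o, o2))        # use of subproperty
                if p == SC and p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))      # type propagation
                if p == DOM and p2 == s:
                    new.add((s2, TYPE, o))      # domain typing
                if p == RNG and p2 == s:
                    new.add((o2, TYPE, o))      # range typing
        if new <= facts:
            return facts
        facts |= new

print(sorted(closure({
    ("ex:Cat", SC, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
})))
```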
2014 Tenth International Conference on Signal-Image Technology and Internet-Based Systems, 2014
We present a new version of CEDAR, a taxonomic reasoner for large-scale ontologies. This extended version provides fuller support for TBox reasoning, consistency checking, and instance retrieval. CEDAR is built on top of the OSF formalism and based on an entirely new architecture which includes several optimization techniques. Using OSF graph structures, we define a bidirectional mapping between OSF structures and the Resource Description Framework (RDF), allowing OSF queries to be translated into SPARQL for retrieving instances. Experiments were carried out using very large ontologies. The results achieved by CEDAR were compared to those obtained by well-known Semantic Web reasoners such as FaCT++, Pellet, HermiT, TrOWL, and RacerPro. CEDAR performs on a par with the best systems for concept classification and is several orders of magnitude more efficient in terms of response time for Boolean query answering.
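The query-translation step can be illustrated generically: a taxonomic query such as "all instances of a concept, including its subconcepts" maps to a SPARQL 1.1 property-path query. This is not CEDAR's actual OSF-to-SPARQL translation, just the general idea; the concept IRI is invented.

```python
# Generic illustration of translating a taxonomic instance-retrieval query
# into SPARQL 1.1, in the spirit of (but not identical to) CEDAR's
# OSF-to-SPARQL translation. The concept IRI is hypothetical.
def instances_query(concept_iri):
    return f"""
        SELECT ?x WHERE {{
            ?x rdf:type/rdfs:subClassOf* <{concept_iri}> .
        }}
    """

print(instances_query("http://example.org/Vehicle"))
```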
