We consider social peer-to-peer data management systems (PDMS), where each peer maintains both semantic mappings between its schema and those of some acquaintances, and social links with peer friends. In this context, reformulating a query from a peer's schema into other peers' schemas is a hard problem, as it may generate as many rewritings as there are mappings from that peer outward, and so on transitively, possibly traversing the entire network. However, not all the obtained rewritings are relevant to a given query. In this paper, we address this problem by inspecting semantic mappings and social links to find only relevant rewritings. We propose a new notion of 'relevance' of a query with respect to a mapping and, based on this notion, a new semantic query reformulation approach for social PDMS, which achieves great accuracy and flexibility. To rapidly find the most interesting mappings, we combine several techniques: (i) social links are expressed as FOAF (Friend of a Friend) links to characterize peers' friendships, and compact mapping summaries are used to obtain mapping descriptions; (ii) local semantic views are special views that contain information about external mappings; and (iii) gossiping techniques improve the search for relevant mappings. Our experimental evaluation, based on a prototype on top of PeerSim and a simulated network, demonstrates that our solution yields greater recall compared to traditional query translation approaches proposed in the literature.
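To make the reformulation idea concrete, below is a minimal sketch, in Python, of relevance-guided query forwarding: a query is propagated only to acquaintances whose mapping summary overlaps its terms, up to a TTL. All names are hypothetical, and plain Jaccard overlap stands in for the paper's actual relevance notion and for the FOAF/gossiping machinery.

```python
# Hypothetical sketch: propagate a query only along acquaintances whose
# mapping summaries look relevant. Jaccard overlap between query terms and
# a peer's mapping summary is a stand-in for the paper's relevance notion.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def forward_query(peer, query_terms, summaries, friends, ttl, threshold=0.3, seen=None):
    """Return the set of peers deemed relevant for reformulating the query.

    summaries: peer -> iterable of schema terms covered by its mappings
    friends:   peer -> list of acquainted peers (FOAF-style links)
    """
    seen = seen if seen is not None else {peer}
    relevant = set()
    if ttl == 0:
        return relevant
    for nb in friends.get(peer, []):
        if nb in seen:
            continue
        seen.add(nb)
        if jaccard(query_terms, summaries.get(nb, ())) >= threshold:
            relevant.add(nb)
            relevant |= forward_query(nb, query_terms, summaries, friends,
                                      ttl - 1, threshold, seen)
    return relevant

friends = {"p1": ["p2", "p3"], "p2": ["p4"], "p3": []}
summaries = {"p2": {"author", "title"}, "p3": {"gene", "protein"},
             "p4": {"title", "year"}}
print(forward_query("p1", {"title", "author", "year"}, summaries, friends, ttl=2))
# -> {'p2', 'p4'}: p3's mappings are irrelevant, so that branch is pruned
```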
We describe the architecture and main algorithmic design decisions of an XQuery/XPath processing engine over XML collections represented using a self-indexing approach, that is, a compressed representation that allows basic searching and navigational operations directly in compressed form. The goal is a structure that occupies little space and thus permits manipulating large collections in main memory.
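As an illustration of what navigating in compressed form means, here is a toy sketch of tree navigation over a balanced-parentheses encoding, a representation commonly used by self-indexes. Real self-indexes answer these operations in constant or logarithmic time via rank/select structures; the linear scans below are for clarity only and are not the engine's actual data structures.

```python
# Illustrative sketch of the self-indexing idea: the document tree is kept
# as a balanced-parentheses sequence and navigation works directly on that
# compact form. Tree: root(a(b, c), d) -- one '(' per node opening.

BP = "((()())())"

def find_close(bp, i):
    """Index of the ')' matching the '(' at position i."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")

def first_child(bp, i):
    return i + 1 if bp[i + 1] == "(" else None  # None: node is a leaf

def next_sibling(bp, i):
    j = find_close(bp, i) + 1
    return j if j < len(bp) and bp[j] == "(" else None

root = 0
a = first_child(BP, root)
print(a)                    # 1 -> subtree of node a
print(next_sibling(BP, a))  # 7 -> subtree of node d
```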
We outline in this paper the main contributions of the XQueC project. XQueC (XQuery processor and Compressor) is the first compression tool to seamlessly support XQuery evaluation in the compressed domain. It includes a set of data structures that shred the XML document into suitable chunks linked to each other, thus departing from the 'homomorphic' principle adopted by previous XML compressors, according to which the compressed document is homomorphic to the original document. Moreover, to avoid the overhead of compressing and decompressing intermediate query results, XQueC applies 'lazy' decompression by issuing queries directly in the compressed domain.
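A minimal sketch of the lazy-decompression idea: if a container's values are dictionary-encoded, an equality predicate can be answered by encoding the query constant once and comparing codes, never touching decompressed data. The container layout and names below are illustrative, not XQueC's actual structures.

```python
# Values of a container are dictionary-encoded; an equality predicate is
# answered by compressing the constant and comparing codes -- the data
# itself is never decompressed.

class Container:
    def __init__(self, values):
        self.dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
        self.codes = [self.dictionary[v] for v in values]  # compressed column

    def select_eq(self, constant):
        code = self.dictionary.get(constant)  # compress the constant once
        if code is None:
            return []
        return [pos for pos, c in enumerate(self.codes) if c == code]

titles = Container(["db", "ir", "db", "ml"])
print(titles.select_eq("db"))  # [0, 2] -- matched entirely on codes
```

Because the toy dictionary is order-preserving (built over sorted values), range predicates could be evaluated on codes in the same way.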
XML is rapidly becoming one of the most widely adopted technologies for information exchange and representation. As the use of XML becomes more widespread, we foresee the development of active XML rules, i.e., rules explicitly designed for the management of XML information. In particular, we argue that active rules for XML offer a natural paradigm for the rapid development of innovative e-services. In this paper, we show how active rules can be specified in the context of XSLT, a pattern-based language for publishing XML documents (promoted by the W3C) which is receiving strong commercial support, and Lorel, a query language for XML documents that is quite popular in the research world. We demonstrate, through simple examples of active rules for XSLT and Lorel, that active rules can be effective for the implementation of e-commerce services. We also discuss the various issues that need to be considered in adapting the notion of relational triggers to the XML context.
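To fix intuition, here is a hedged sketch of an event-condition-action (ECA) rule over XML, the trigger style the paper adapts from relational databases. The rule model (an event pattern, a boolean condition, an action callback) and all names are our own illustration, not the paper's XSLT or Lorel syntax.

```python
# Hypothetical ECA rule over an XML document: when an element matching the
# event pattern is inserted and the condition holds, the action fires.

import xml.etree.ElementTree as ET

class Rule:
    def __init__(self, pattern, condition, action):
        self.pattern, self.condition, self.action = pattern, condition, action

def insert(parent, tag, text, rules, root):
    """Insert an element, then fire every rule whose event pattern matches."""
    el = ET.SubElement(parent, tag)
    el.text = text
    for r in rules:
        if el in root.iterfind(r.pattern) and r.condition(el):
            r.action(el)
    return el

catalog = ET.Element("catalog")
rules = [Rule("item",                         # event: an <item> inserted
              lambda e: float(e.text) > 100,  # condition: price above 100
              lambda e: print("notify: expensive item", e.text))]
insert(catalog, "item", "250.0", rules, catalog)  # fires the rule
insert(catalog, "item", "20.0", rules, catalog)   # silent
```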
Graph data management tools are nowadays evolving at a great pace. Key drivers of progress in the design and study of data-intensive systems are solutions for the synthetic generation of data and workloads, for use in empirical studies. Current graph generators, however, provide limited or no support for workload generation, or are limited to fixed use cases. Towards addressing these limitations, we demonstrate gMark, the first domain- and query language-independent framework for synthetic graph and query workload generation. Its novel features are: (i) fine-grained control of graph instance and query workload generation via expressive user-defined schemas; (ii) support for expressive graph query languages, including recursion among other features; and (iii) selectivity estimation of the generated queries. During the demonstration, we will showcase the highly tunable generation of graphs and queries through various user-defined schemas and targeted selectivities, and the variety of s...
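The following toy sketch conveys the schema-driven generation idea: a user-defined schema fixes node-type proportions and, per edge predicate, the allowed source and target types and an out-degree. The schema format and all names are ours, not gMark's actual configuration syntax.

```python
# Toy schema-driven graph generator: node types are drawn according to the
# schema's proportions, then edges are emitted per predicate, respecting
# source/target types and a fixed out-degree.

import random

random.seed(7)
schema = {
    "types": {"user": 0.8, "tag": 0.2},        # node-type proportions
    "edges": [("follows", "user", "user", 2),  # (predicate, src, dst, out-degree)
              ("likes", "user", "tag", 1)],
}

def generate(schema, n):
    nodes = [(i, t) for i, t in
             enumerate(random.choices(list(schema["types"]),
                                      weights=schema["types"].values(), k=n))]
    by_type = {}
    for i, t in nodes:
        by_type.setdefault(t, []).append(i)
    triples = []
    for pred, src, dst, deg in schema["edges"]:
        targets = by_type.get(dst, [])
        for s in by_type.get(src, []):
            for d in random.sample(targets, k=min(deg, len(targets))):
                triples.append((s, pred, d))
    return nodes, triples

nodes, triples = generate(schema, 10)
print(len(nodes), "nodes,", len(triples), "edges")
```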
IEEE Transactions on Knowledge and Data Engineering, 2017
Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads with predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high-quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.
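For intuition about the regular path queries gMark generates, here is a minimal sketch of RPQ semantics: the query automaton is evaluated in product with the graph, collecting all node pairs connected by a path whose label sequence the automaton accepts. The automaton encoding and example are ours; gMark generates such queries rather than evaluating them.

```python
# RPQ evaluation by product construction: explore (node, state) pairs of
# the graph and the query automaton; a pair (x, y) is an answer when some
# path from x reaches y in a final state.

def eval_rpq(edges, delta, q0, finals):
    """edges: list of (src, label, dst); delta: (state, label) -> state."""
    adj, nodes = {}, set()
    for s, l, d in edges:
        adj.setdefault(s, []).append((l, d))
        nodes |= {s, d}
    answers = set()
    for x in nodes:
        seen, stack = {(x, q0)}, [(x, q0)]
        while stack:
            n, q = stack.pop()
            if q in finals:
                answers.add((x, n))
            for l, d in adj.get(n, []):
                q2 = delta.get((q, l))
                if q2 is not None and (d, q2) not in seen:
                    seen.add((d, q2))
                    stack.append((d, q2))
    return answers

edges = [("a", "follows", "b"), ("b", "likes", "t")]
delta = {(0, "follows"): 0, (0, "likes"): 1}  # accepts follows* . likes
print(sorted(eval_rpq(edges, delta, 0, {1})))  # [('a', 't'), ('b', 't')]
```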
HAL (Le Centre pour la Communication Scientifique Directe), Mar 23, 2015
Graph databases are becoming pervasive in several application scenarios such as the Semantic Web, social and biological networks, and geographical databases, to name a few. However, specifying a graph query is a cumbersome task for non-expert users because graph databases (i) are usually of large size, hence difficult to visualize, and (ii) do not carry proper metadata, as there is no clear distinction between instances and schemas. We present GPS, a system for interactive path query specification on graph databases, which assists the user in specifying path queries defined by regular expressions. The user is interactively asked to visualize small fragments of the graph and to label nodes of interest as positive or negative, depending on whether or not she would like the nodes as part of the query result. After each interaction, the system prunes the uninformative nodes, i.e., those that do not add any information about the user's goal query. Thus, the system also guides the user to specify her goal query with a minimal number of interactions.
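The pruning idea can be sketched as follows: the system keeps a set of candidate queries and asks the user only about nodes on which the candidates disagree; each label then eliminates the inconsistent candidates. Representing a candidate query by its answer set is a deliberate simplification of GPS, for illustration only.

```python
# Interactive refinement sketch: a node is worth asking about only if the
# surviving candidate queries disagree on whether it belongs to the answer.

def informative(node, candidates):
    verdicts = {node in answers for answers in candidates.values()}
    return len(verdicts) > 1

def refine(candidates, node, is_positive):
    """Keep only candidates consistent with the user's label for node."""
    return {q: a for q, a in candidates.items() if (node in a) == is_positive}

# three hypothetical candidate queries, each with its answer set
candidates = {"a.b": {1, 2}, "a.b*": {1, 2, 3}, "a": {4}}

candidates = refine(candidates, 4, is_positive=False)  # user rejects node 4
print([n for n in [1, 2, 3] if informative(n, candidates)])
# [3] -- nodes 1 and 2 became uninformative and are pruned
```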
Proceedings of the 11th international conference on Extending database technology: Advances in database technology, 2008
Schema mapping algorithms rely on value correspondences, i.e., correspondences among semantically related attributes, to produce complex transformations among data sources. These correspondences are either manually specified or suggested by separate modules called schema matchers. The quality of mappings produced by a mapping generation tool strongly depends on the quality of the input correspondences. In this paper, we introduce the Spicy system, a novel approach to the problem of verifying the quality of mappings. Spicy is based on a three-layer architecture, in which a schema matching module provides input to a mapping generation module. A third module, the mapping verification module, then checks candidate mappings and chooses the ones that best represent transformations of the source into the target. At the core of the system stands a new technique for comparing the structure and actual content of trees, called structural analysis. Experimental results show that, by carefully designing the comparison algorithm, it is possible to achieve both good scalability and high precision in mapping selection.
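As a flavor of comparing tree-shaped data, here is a naive recursive similarity between labeled trees that rewards matching labels and greedily aligns children. Spicy's actual structural analysis is richer (it also compares instance-level features such as value distributions); this sketch only conveys the comparison-by-structure idea.

```python
# Naive structural similarity between trees given as (label, [children]).

def similarity(t1, t2):
    """Score in [0, 1]: half for the label match, half for aligned children."""
    (l1, c1), (l2, c2) = t1, t2
    label_score = 1.0 if l1 == l2 else 0.0
    if not c1 and not c2:
        return label_score
    remaining, child_score = list(c2), 0.0
    for a in c1:                      # greedy one-to-one child alignment
        if not remaining:
            break
        best = max(remaining, key=lambda b: similarity(a, b))
        child_score += similarity(a, best)
        remaining.remove(best)
    child_score /= max(len(c1), len(c2))
    return 0.5 * label_score + 0.5 * child_score

src = ("person", [("name", []), ("email", [])])
tgt = ("person", [("name", []), ("phone", [])])
print(similarity(src, tgt))  # 0.75: same root, one of two children matches
```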
The intent of peer data management systems (PDMS) is to share as much data as possible. However, in many applications leveraging sensitive data, users demand adequate mechanisms to restrict access to authorized parties. In this paper, we study a distributed access control model where data items are stored, queried, and authenticated in a totally decentralized fashion. Our contribution focuses on the design of a comprehensive framework for access control enforcement in PDMS sharing secure data, which blends policy rules defined in a declarative language with distributed key management schemes. The data owner peer decides which data to share and whom to share it with by means of such policies, and the data is encrypted accordingly. To defend against malicious attackers who can compromise peers, the decryption keys are decomposed into pieces scattered amongst peers. We discuss the details of how to adapt distributed encryption schemes to PDMS to enforce robust and resilient access control, and demonstrate the efficiency and scalability of our approach by means of an extensive experimental study.
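The key-decomposition idea can be illustrated with Shamir's (k, n) secret sharing, under which any k peers can jointly reconstruct a decryption key while fewer than k compromised peers learn nothing. The field modulus and toy key below are illustrative; the paper's exact distributed scheme may differ.

```python
# Shamir secret sharing over a prime field: split a key into n shares so
# that any k of them reconstruct it by Lagrange interpolation at x = 0.

import random

P = 2**61 - 1  # Mersenne prime used as the field modulus

def split(secret, n, k):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = 123456789                       # toy decryption key
shares = split(key, n=5, k=3)
assert reconstruct(shares[:3]) == key  # any 3 of 5 shares suffice
assert reconstruct(shares[1:4]) == key
```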
The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools for facilitating these tasks. While this abundance has made such tools available to a broader audience, it has also created some confusion regarding their exact nature, goals, core functionalities, expected features, and basic capabilities. Above all, it has made measuring the performance of these systems, and distinguishing among them, a difficult task. The need for the design and development of comparison standards that allow the evaluation of these tools is becoming apparent. These standards are particularly important to mapping and matching system users, since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing a system against its competitors and motivating improvements and further development. Finally, they are important to researchers, since they serve as illustrations of the limitations of existing systems, triggering further research in the area. In this work, we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques, and describe the main functionalities and characteristics of existing systems.
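Among the basic comparison techniques such benchmarks rely on is scoring a matcher's proposed correspondences against a gold standard with precision, recall, and F-measure, as in the short sketch below (the attribute-pair representation and all names are illustrative).

```python
# Score a matcher's proposed correspondences against a gold standard.

def evaluate(proposed, gold):
    proposed, gold = set(proposed), set(gold)
    tp = len(proposed & gold)                       # true positives
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = {("name", "full_name"), ("addr", "address"), ("dob", "birth_date")}
proposed = {("name", "full_name"), ("addr", "address"), ("dob", "deadline")}
print(evaluate(proposed, gold))  # precision = recall = f1 = 2/3
```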
XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring XML content have been proposed, some in the tradition of database query languages (i.e., SQL, OQL), others more closely inspired by XML. No standard XML query language has yet been agreed upon, but the discussion is ongoing within the World Wide Web Consortium and within many academic institutions and Internet-related major companies. We present a comparison of five representative query languages for XML, highlighting their common features and differences.
Fragmentation techniques for XML data are gaining momentum within both distributed and centralized XML query engines, and pose novel, previously unrecognized challenges to the community. Albeit not novel, and clearly inspired by the classical divide-and-conquer principle, fragmentation of XML trees has proved successful in boosting querying performance and in cutting down memory requirements. However, fragmentation considered so far has been driven by semantics, i.e., built around query predicates. In this paper, we propose a novel fragmentation technique founded on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents. This allows us to predict bounding intervals of structural properties of the output XML fragments, for efficient query processing of distributed XML data. An experimental evaluation confirms the effectiveness of our fragmentation methodology on some representative XML data sets.
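A minimal sketch of structure-driven fragmentation, assuming a single size constraint: subtrees are spilled into separate fragments bottom-up whenever keeping them would push the enclosing fragment past a node-count bound. The real technique also bounds tree-width and tree-depth and is guided by structure histograms.

```python
# Cut an XML tree into fragments of at most MAX_NODES nodes, bottom-up:
# when adding a child subtree overflows the bound, the child is detached
# and becomes its own fragment.

import xml.etree.ElementTree as ET

MAX_NODES = 3

def size(el):
    return 1 + sum(size(c) for c in el)

def fragment(el, fragments):
    """Returns the node count kept in place under el after spilling."""
    total = 1
    for child in list(el):
        total += fragment(child, fragments)
        if total > MAX_NODES:
            el.remove(child)          # spill: child becomes a fragment
            fragments.append(child)
            total -= size(child)
    return total

doc = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")
fragments = []
fragment(doc, fragments)
fragments.append(doc)                 # the residual root fragment
print([f.tag for f in fragments], [size(f) for f in fragments])
# ['e', 'b', 'a'] [1, 3, 2] -- every fragment respects the bound
```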
2006 10th International Database Engineering and Applications Symposium (IDEAS'06), 2006
The problem of securing XML databases is rapidly gaining interest in both academic and industrial research. It becomes even more challenging when XML data are managed and delivered according to the P2P paradigm, as malicious attacks could take advantage of the totally decentralized and untrusted nature of P2P networks. Starting from these considerations, in this paper we propose the guidelines of a distributed framework supporting (i) secure fragmentation of XML documents into P2P XML databases by means of lightweight XPath-based identifiers, and (ii) the creation of trusted groups of peers by means of "self-certifying" XPath links that exploit the benefits of well-known fingerprinting techniques.
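The "self-certifying" link idea can be sketched as follows: the link embeds a cryptographic fingerprint of the fragment it identifies, so any peer can check that a fragment received over the untrusted network was not tampered with. The link format below is our own illustration, not the paper's exact encoding.

```python
# A link carries a fingerprint of the target fragment; verification simply
# recomputes the hash and compares it with the claim embedded in the link.

import hashlib

def self_certifying_link(xpath, fragment_bytes):
    digest = hashlib.sha256(fragment_bytes).hexdigest()[:16]
    return f"{xpath}#fp={digest}"

def verify(link, fragment_bytes):
    claimed = link.rsplit("#fp=", 1)[1]
    return hashlib.sha256(fragment_bytes).hexdigest()[:16] == claimed

frag = b"<order id='42'><item>book</item></order>"
link = self_certifying_link("/orders/order[1]", frag)
print(link)
print(verify(link, frag))                            # True
print(verify(link, frag.replace(b"book", b"dvd")))   # False: tampering detected
```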
Lossy compression techniques have been applied to image and text compression, yielding compression factors that are vastly superior to lossless compression schemes. In this paper, we present a preliminary study of a set of lossy transformations for XML documents that preserve semantics. Inspired by previous techniques, e.g., lossy text compression and literate programming, we apply a simple algorithm to XML syntactic constructs to discard superfluous layout information and redundant text. The resulting XML keeps its human-readability and machine-readability properties. Additionally, the transformation can considerably reduce space occupancy and boost the effectiveness of conventional text compressors, thus representing a promising technology for several data management tasks.
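A minimal sketch of one such semantics-preserving lossy transformation: dropping indentation-only text nodes (pure layout) while keeping real textual content. The paper's transformations go further, e.g., removing redundant text, but whitespace normalization alone already shrinks pretty-printed documents.

```python
# Drop whitespace-only text and tail nodes (layout), keep genuine content.

import xml.etree.ElementTree as ET

def strip_layout(el):
    if el.text is not None and not el.text.strip():
        el.text = None                # indentation before the first child
    if el.tail is not None and not el.tail.strip():
        el.tail = None                # indentation after the element
    for child in el:
        strip_layout(child)

pretty = """<book>
    <title>  Data on the Web  </title>
    <year>1999</year>
</book>"""
root = ET.fromstring(pretty)
strip_layout(root)
print(ET.tostring(root, encoding="unicode"))
# <book><title>  Data on the Web  </title><year>1999</year></book>
```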
Data exchange is the problem of translating data structured under a source schema according to a target schema and a set of source-to-target constraints known as schema mappings. In this paper, we investigate the problem of data exchange in a heterogeneous setting, where the source is a relational database, the target is a graph database, and the schema mappings are defined across them. We study the classical problems considered in data exchange, namely the existence of solutions and query answering. We show that both problems are intractable in the presence of target constraints, even under significant restrictions.
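For intuition, here is a toy relational-to-graph mapping: each rule sends a pattern over source tuples to edges of the target graph. The rule syntax and example schema are ours, for illustration only; the paper studies when target constraints make solutions nonexistent or query answering intractable.

```python
# Toy source-to-target mapping: relational tuples become graph edges.

source = {"Author": [("a1", "Serge"), ("a2", "Ioana")],
          "Wrote":  [("a1", "p1"), ("a2", "p1")]}

def apply_mapping(source):
    # rule 1: Wrote(a, p)  -> edge (a) -[authored]-> (p)
    edges = [(a, "authored", p) for (a, p) in source["Wrote"]]
    # rule 2: Author(a, n) -> edge (a) -[name]-> (n), n as a literal node
    edges += [(a, "name", n) for (a, n) in source["Author"]]
    return edges

for edge in apply_mapping(source):
    print(edge)
```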