Papers by Yannis Vassiliou

Keyword search is the most popular querying technique on semistructured data. Keyword queries are simple and convenient. However, as a consequence of their imprecision, there is usually a huge number of candidate results, of which only very few match the user’s intent. Unfortunately, the existing semantics for keyword queries are ad hoc and generally fail to “guess” the user intent. Therefore, the quality of their answers is poor and the existing algorithms do not scale satisfactorily. In this paper, we introduce the novel concept of cohesive keyword queries for tree data. Intuitively, a cohesiveness relationship on keywords indicates that they should form a cohesive whole in a query result. Cohesive keyword queries allow term nesting and keyword repetition. Cohesive keyword queries bridge the gap between flat keyword queries and structured queries. Although more expressive, they are as simple as flat keyword queries and do not require any schema knowledge. We provide formal semantics...
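To give a rough idea of what a cohesiveness constraint could mean operationally, here is a hypothetical Python sketch; it is not the formal semantics defined in the paper, and the tree, the keyword matches and the specific LCA-based check are invented for illustration.

```python
# Hypothetical sketch: checking a cohesiveness constraint on a labeled tree.
# The tree is given by parent pointers; "matches" maps each keyword to the
# node containing it in a candidate result. A cohesive group is treated as
# satisfied if the lowest common ancestor (LCA) of its keywords lies strictly
# below the LCA of the full keyword set, i.e. the group forms its own subtree.

def ancestors(node, parent):
    """Return the path from a node up to the root (node first)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(nodes, parent):
    """Lowest common ancestor of a list of nodes."""
    common = set(ancestors(nodes[0], parent))
    for n in nodes[1:]:
        common &= set(ancestors(n, parent))
    # the first common node on any upward path is the deepest common ancestor
    for a in ancestors(nodes[0], parent):
        if a in common:
            return a

def group_is_cohesive(group, all_keywords, matches, parent):
    group_lca = lca([matches[k] for k in group], parent)
    full_lca = lca([matches[k] for k in all_keywords], parent)
    return group_lca != full_lca  # the group nests strictly inside the result

# toy tree:  root -> {book -> {title, author}, review}
parent = {"title": "book", "author": "book", "book": "root", "review": "root"}
matches = {"XML": "title", "Smith": "author", "great": "review"}
print(group_is_cohesive({"XML", "Smith"}, matches.keys(), matches, parent))  # True
```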

Lecture Notes in Computer Science, 1999
A Data Warehouse (DW) can be abstractly seen as a set of materialized views defined over relations that are stored in distributed heterogeneous databases. The selection of views for materialization in a DW is thus an important decision problem. The objective is the minimization of the combination of the query evaluation and view maintenance costs. In this paper we expand on our previous work by proposing new heuristic algorithms for the DW design problem. These algorithms are described in terms of a state space search problem, and are guaranteed to deliver an optimal solution by expanding only a small fraction of the states produced by the original exhaustive algorithm. (Research supported by the European Commission under the ESPRIT Program LTR project "DWQ: Foundations of Data Warehouse Quality".)

1.1 Related Work. Many authors in different contexts have addressed the view selection problem. H. Gupta and I.S. Mumick [2] use an A* algorithm to select the set of views that minimizes the total query-response time while keeping the total maintenance time below a certain value. A greedy heuristic is also presented in this work. Both algorithms are based on the theoretical framework developed in [1] using AND/OR view directed acyclic graphs. In [3] a similar problem is considered for selection-join views with indexes; an A* algorithm is provided as well as rules of thumb, under a number of simplifying assumptions. In [10], Yang, Karlapalem and Li propose heuristic approaches that provide a feasible solution based on merging individual optimal query plans. In a context where views are sets of pointer arrays, Roussopoulos also provides in [7] an A* algorithm that optimizes the query evaluation and view maintenance cost.
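The greedy heuristics mentioned above generally follow a benefit-per-step selection pattern. The following is a hypothetical sketch of such a generic greedy view-selection loop, not the specific algorithms proposed in this paper or in the cited work; the view names and cost figures are invented, and interactions between views (savings changing as views are materialized) are deliberately ignored.

```python
# Hypothetical sketch of a greedy view-selection heuristic: repeatedly
# materialize the view with the best (query-cost saving - maintenance cost)
# until no candidate yields a positive net benefit. All numbers are invented.

def greedy_view_selection(candidates, query_saving, maintenance_cost):
    """candidates: iterable of view names;
    query_saving[v]: reduction in total query evaluation cost if v is materialized;
    maintenance_cost[v]: cost of keeping v up to date."""
    selected = []
    remaining = set(candidates)
    while remaining:
        best = max(remaining, key=lambda v: query_saving[v] - maintenance_cost[v])
        if query_saving[best] - maintenance_cost[best] <= 0:
            break  # no remaining view pays for its own maintenance
        selected.append(best)
        remaining.remove(best)
    return selected

savings = {"v_sales_by_region": 120, "v_daily_totals": 45, "v_rare_report": 5}
upkeep = {"v_sales_by_region": 30, "v_daily_totals": 20, "v_rare_report": 25}
print(greedy_view_selection(savings, savings, upkeep))
# ['v_sales_by_region', 'v_daily_totals']
```

A state space search formulation, as used in the paper, would instead treat each set of materialized views as a state and expand states via transitions that add or remove views, pruning states that cannot improve on the best cost found so far.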
ACM SIGMOD Record, 2006
The Knowledge and Database Systems Lab (KDBSL) of the Electrical and Computer Engineering Dept. at the National Technical University of Athens was founded in 1992 by Prof. Timos Sellis and Prof. Yannis Vassiliou. Its activities involve theoretical and applied research in the area of Databases and Information Systems. The lab employs three postdoc researchers (Dr Theodore Dalamagas, Dr Alkis Simitsis, Dr Yannis Stavrakas), several PhD students and many graduate students. It has been involved in many research projects supported by the EU, international institutions, Greek organizations, the Greek Government and industrial companies.

Information Systems, 2001
Previous research has provided metadata models that enable the capturing of the static components of a data warehouse architecture, along with information on different quality factors over these components. This paper complements this work with the modeling of the dynamic parts of the data warehouse. The proposed metamodel of data warehouse operational processes is capable of modeling complex activities, their interrelationships, and the relationship of activities with data sources and execution details. Moreover, the metamodel complements the existing architecture and quality models in a coherent fashion, resulting in a full framework for quality-oriented data warehouse management, capable of supporting the design, administration and especially the evolution of a data warehouse. Finally, we exploit our framework to challenge the widespread belief that data warehouses can be treated as collections of materialized views. We have implemented this metamodel using the language Telos and the metadata repository system ConceptBase.
Advanced Information Systems Engineering
Lecture Notes in Computer Science, 1997
Journal of Ambient Intelligence and Humanized Computing, 2017
CHOOSING A DATABASE QUERY LANGUAGE. Matthias Jarke and Yannis Vassiliou. November 1982, revised April 1984. Center for Research on Information Systems, Computer Applications and Information Systems, Graduate School of Business Administration, New York ...

Abstract. Databases are continuously evolving environments, where design constructs are added, removed or updated quite often. Research has extensively dealt with the problem of database evolution. Nevertheless, problems arise with existing queries, mainly because in most cases their role as integral parts of the environment is not given the proper attention. Furthermore, the queries are not designed to handle database evolution. In this paper, we first introduce a graph-based model that uniformly captures relations, views, constraints and queries. For several cases of database evolution we present rules so that both the syntactic and semantic correctness of queries are retained. To this end, we also extend the query formulation capabilities by annotating SQL queries with information concerning the semantically aware adaptation of a query in the presence of changes in the underlying database.
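To give a rough feel for what such a uniform graph representation might look like in code, here is a hypothetical sketch; it is not the model defined in the paper, and the node kinds, schema elements and method names are invented for illustration.

```python
# Hypothetical sketch of a graph that uniformly represents relations, views
# and queries as nodes, with edges recording which constructs each query or
# view depends on. Names and schema are invented for the example.

from collections import defaultdict

class SchemaGraph:
    def __init__(self):
        self.kind = {}                    # node -> "relation" | "view" | "query"
        self.uses = defaultdict(set)      # node -> constructs it reads from
        self.used_by = defaultdict(set)   # reverse edges, for impact lookups

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_dependency(self, consumer, provider):
        self.uses[consumer].add(provider)
        self.used_by[provider].add(consumer)

    def affected_by(self, changed_node):
        """All queries/views transitively depending on a changed construct."""
        affected, stack = set(), [changed_node]
        while stack:
            for consumer in self.used_by[stack.pop()]:
                if consumer not in affected:
                    affected.add(consumer)
                    stack.append(consumer)
        return affected

g = SchemaGraph()
g.add_node("EMP", "relation")
g.add_node("V_SALARIES", "view")
g.add_node("Q_PAYROLL_REPORT", "query")
g.add_dependency("V_SALARIES", "EMP")
g.add_dependency("Q_PAYROLL_REPORT", "V_SALARIES")
print(g.affected_by("EMP"))   # {'V_SALARIES', 'Q_PAYROLL_REPORT'}
```

A finer-grained model would also represent attributes and conditions as nodes; the sketch only shows the coarse dependency bookkeeping that evolution rules would operate on.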
Abstract. In this paper, we discuss the problem of performing impact prediction for changes that occur in the schema/structure of the data warehouse sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and their sources are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. For many cases of ETL source evolution, we present rules so that both syntactic and semantic correctness of activities are retained. Finally, we experiment with the evaluation of our approach over real-world ETL workflows used in the Greek public sector.
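To make the idea of a policy-annotated graph concrete, the following hypothetical sketch shows one way an evolution event could be propagated over such a dependency graph, with each affected node either adjusting automatically, blocking the change, or being flagged for the administrator. The policy names, the graph and the example are invented and this is not the paper's actual mechanism.

```python
# Hypothetical sketch: propagating an evolution event (e.g. "attribute deleted
# at a source") through a dependency graph whose nodes are annotated with a
# policy. Policies and the toy graph are invented for illustration.

PROPAGATE, BLOCK, PROMPT = "propagate", "block", "prompt"

def propagate_event(consumers, policy, start):
    """consumers[n] = nodes that read from n; policy[n] in {PROPAGATE, BLOCK, PROMPT}.
    Returns (automatically adjusted nodes, nodes needing administrator attention)."""
    adjusted, flagged, frontier = set(), set(), [start]
    while frontier:
        node = frontier.pop()
        for consumer in consumers.get(node, []):
            action = policy.get(consumer, PROMPT)
            if action == BLOCK:
                continue                      # consumer keeps its old semantics
            if action == PROMPT:
                flagged.add(consumer)         # needs a human decision
                continue
            if consumer not in adjusted:      # PROPAGATE: adjust and keep going
                adjusted.add(consumer)
                frontier.append(consumer)
    return adjusted, flagged

consumers = {"src.orders": ["etl_clean_orders"],
             "etl_clean_orders": ["dw.fact_sales", "report_monthly"]}
policy = {"etl_clean_orders": PROPAGATE, "dw.fact_sales": PROPAGATE,
          "report_monthly": PROMPT}
print(propagate_event(consumers, policy, "src.orders"))
# ({'etl_clean_orders', 'dw.fact_sales'}, {'report_monthly'})
```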

Journal of Complex Networks, 2017
Legislators, designers of legal information systems, as well as citizens often face problems due to the interdependence of the laws and the growing number of references needed to interpret them. In this paper, we introduce the "Legislation Network" as a novel approach to address several quite challenging issues for identifying and quantifying the complexity inside the Legal Domain. We have collected an extensive data set from a legislation corpus spanning more than 60 years, as published in the Official Journal of the European Union, and we further analysed it as a complex network, thus gaining insight into its topological structure. Among other issues, we have performed a temporal analysis of the evolution of the Legislation Network, as well as a robust resilience test to assess its vulnerability under specific cases that may lead to possible breakdowns. Results are quite promising, showing that our approach can lead towards an enhanced explanation of the structure and evolution of legislation properties.
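As a toy illustration of the general idea of analysing legislation as a citation network (a hypothetical sketch, not the paper's analysis pipeline; the acts and references below are invented for the example, and networkx is assumed to be available):

```python
# Hypothetical sketch: legislation as a citation network. Acts are nodes, an
# edge A -> B means act A references act B. Acts and references are invented.
import networkx as nx

edges = [
    ("Reg_2016_679", "Dir_95_46"),      # a regulation citing an older directive
    ("Dir_2002_58", "Dir_95_46"),
    ("Reg_2018_1725", "Reg_2016_679"),
    ("Dec_2010_87", "Dir_2002_58"),
]
G = nx.DiGraph(edges)

# Most referenced acts: candidates for being structurally critical.
by_in_degree = sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)
print("most cited:", by_in_degree[:2])

# Crude resilience probe: remove the most cited act and see how the network fragments.
largest_before = max(len(c) for c in nx.weakly_connected_components(G))
G.remove_node(by_in_degree[0][0])
largest_after = max(len(c) for c in nx.weakly_connected_components(G))
print("largest component before/after removal:", largest_before, largest_after)
```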

Towards Automatic Structuring and Semantic Indexing of Legal Documents
Proceedings of the 20th Pan-Hellenic Conference on Informatics, 2016
Over the last years there has been a great increase in the number of freely available legal resources. Portals that allow users to search for legislation using keywords are now commonplace. However, in the vast majority of those portals, legal documents are not stored in a structured format with a rich set of metadata, but in a presentation-oriented manifestation, making it impossible for end users to query semantics about the documents, such as date of enactment, date of repeal, jurisdiction, etc., or to reuse information and establish an interconnection with similar repositories. In this paper, we present an approach for extracting a machine-readable semantic representation of legislation from unstructured document formats. Our method exploits common formats of legal documents to identify blocks of structural and semantic information and models them according to a popular legal meta-schema. Our proposed method is highly extensible and achieves high accuracy for a variety of legal and para-legal documents, especially legislation. Our evaluation results reveal that our methodology can be of great assistance for the automatic structuring and semantic indexing of legal resources.
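As a loose illustration of format-driven structuring of a legal text (a hypothetical sketch only; the patterns, field names and the sample text are invented, and a real legal meta-schema would require far richer modelling):

```python
# Hypothetical sketch: using surface patterns of legal texts to recover a
# crude structural representation. Patterns and the sample text are invented.
import re

SAMPLE = """Law 4172/2013
Article 1
Scope of application
1. This law applies to every natural person.
2. Repealed provisions are listed in Article 72.
Article 2
Definitions
1. For the purposes of this law the following definitions apply."""

def structure(text):
    doc = {"title": text.splitlines()[0].strip(), "articles": []}
    current = None
    for line in text.splitlines()[1:]:
        m = re.match(r"Article\s+(\d+)", line)
        if m:
            current = {"number": int(m.group(1)), "heading": None, "paragraphs": []}
            doc["articles"].append(current)
        elif current is not None and re.match(r"\d+\.", line):
            current["paragraphs"].append(line.strip())
        elif current is not None and current["heading"] is None and line.strip():
            current["heading"] = line.strip()
    return doc

doc = structure(SAMPLE)
print(doc["title"], "-", len(doc["articles"]), "articles")
print(doc["articles"][0]["heading"])   # Scope of application
```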

Multi-dimension Diversification in Legal Information Retrieval
Lecture Notes in Computer Science, 2016
The number of freely available legal data sets is increasing at high speed. Citizens can easily access a lot of information about regulations, court orders, statutes, opinions and analytical documents. Such openness brings undeniable benefits in terms of transparency, participation and availability of new services. However, legal information overload poses new challenges, especially in the field of Legal Information Retrieval. Search result diversification has gained attention as a way to increase user satisfaction in web search. We hypothesize that such a strategy will also be beneficial for search on legal data sets. We address diversification of results in legal search by introducing legal-domain-specific diversification criteria and adopting several state of the art methods from the web search, network analysis and text summarization domains. We evaluate our diversification framework using a real data set from the Common Law domain that we subjectively annotated with relevance judgments for this purpose. Our findings reveal that web search diversification techniques outperform other approaches (e.g. summarization-based, graph-based methods) in the context of legal diversification, and that the diversity criteria we introduce provide distinctively diverse subsets of resulting documents, thus differentiating our proposal with respect to traditional diversification techniques.
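One standard web-search diversification method of the kind adopted here is Maximal Marginal Relevance (MMR) re-ranking. The following is a hypothetical sketch of plain MMR; the relevance scores, similarity function and documents are invented, and it does not reflect the legal-specific criteria introduced in the paper.

```python
# Hypothetical sketch of Maximal Marginal Relevance (MMR) re-ranking: pick, at
# each step, the document that balances relevance to the query against
# similarity to documents already selected. Scores and documents are invented.

def mmr(relevance, similarity, k, lam=0.5):
    """relevance: dict doc -> relevance score; similarity(a, b) -> [0, 1]."""
    selected, candidates = [], set(relevance)
    while candidates and len(selected) < k:
        def score(d):
            max_sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = {"caseA_v1": 0.9, "caseA_v2": 0.88, "caseB": 0.7, "caseC": 0.4}
same_case = lambda a, b: 1.0 if a.split("_")[0] == b.split("_")[0] else 0.1
print(mmr(docs, same_case, k=3))   # ['caseA_v1', 'caseB', 'caseC'] (near-duplicate demoted)
```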

IFIP Advances in Information and Communication Technology, 2016
"Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law." In accordance with the aforementioned declaration on Free Access to Law by the legal information institutes of the world, a plethora of legal information is available through the Internet, while the provision of legal information has never before been easier. Given that law is accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in the context of legal information retrieval, so as to increase user satisfaction. We address diversification of results in legal search by adopting several state of the art methods from the web search domain. We provide an exhaustive evaluation of the methods, using a standard data set from the Common Law domain that we subjectively annotated with relevance judgments for this purpose. Our results reveal that users receive broader insights across the results they get from a legal information retrieval system.
A Graph-based Representation of Database Systems and Applications
Legislation as a complex network: Modelling and analysis of European Union legal sources
Legislators, designers of legal information systems, as well as citizens often face problems due to the interdependence of the laws and the growing number of references needed to interpret them. Quantifying this complexity is not an easy task. In this paper, we introduce the "Legislation Network" as a novel approach to address related problems. We have collected an extensive data set from a legislation corpus spanning more than 60 years, as published in the Official Journal of the European Union, and we further analysed it as a complex network, thus gaining insight into its topological structure. Results are quite promising, showing that our approach can lead towards an enhanced explanation of the structure and evolution of legislation properties.
Journal on Data Semantics, 2012
Extract-Transform-Load (ETL) flows are essential for the success of a data warehouse and the business intelligence and decision support mechanisms that are attached to it. During both the ETL design phase and the entire ETL lifecycle, the ETL architect needs to design and improve an ETL design in a way that satisfies both performance and correctness guarantees and, often, has to choose among various alternative designs. In this paper, we focus on ways to predict the maintenance effort of ETL workflows and we explore techniques for assessing the quality of ETL designs under the prism of evolution. We focus on a set of graph-theoretic metrics for the prediction of evolution impact and we investigate their fit to real-world ETL scenarios. We present our experimental findings and describe the lessons we learned working on real-world cases.
Lecture Notes in Computer Science, 2009
In this paper, we discuss the problem of performing impact prediction for changes that occur in the schema/structure of the data warehouse sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and their sources are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. For many cases of ETL source evolution, we present rules so that both syntactic and semantic correctness of activities are retained. Finally, we experiment with the evaluation of our approach over real-world ETL workflows used in the Greek public sector.
Lecture Notes in Computer Science, 2008
During data warehouse design, the designer frequently encounters the problem of choosing among different alternatives for the same design construct. The behavior of the chosen design in the presence of evolution events is an important parameter for this choice. This paper proposes metrics to assess the quality of the warehouse design from the viewpoint of evolution. We employ a graph-based model to uniformly abstract relations and software modules, like queries, views, reports, and ETL activities. We annotate the warehouse graph with policies for the management of evolution events. The proposed metrics are based on graph-theoretic properties of the warehouse graph to assess the sensitivity of the graph to a set of possible events. We evaluate our metrics with experiments over alternative configurations of the same warehouse schema.
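For intuition only, here is a hypothetical sketch of one very crude graph-theoretic sensitivity measure (not the metrics proposed in the paper): the expected number of modules transitively reachable from relations that are likely to change. The example graph, the event probabilities and the two alternative designs are invented.

```python
# Hypothetical sketch: a crude evolution-sensitivity score for a warehouse
# design graph. consumers[n] lists the modules (views, reports, ETL activities)
# that read from n; event_prob[n] is an assumed likelihood that n changes.
# Graph, probabilities and the two alternative designs are invented.

def reachable(consumers, start):
    seen, stack = set(), [start]
    while stack:
        for nxt in consumers.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def sensitivity(consumers, event_prob):
    """Expected number of modules affected by a random evolution event."""
    return sum(p * len(reachable(consumers, n)) for n, p in event_prob.items())

# Design A: every report reads the base relation directly.
design_a = {"EMP": ["rep1", "rep2", "rep3"]}
# Design B: reports read through an intermediate view.
design_b = {"EMP": ["v_emp"], "v_emp": ["rep1", "rep2", "rep3"]}

event_prob = {"EMP": 0.3}
print(sensitivity(design_a, event_prob))  # 0.9  (3 modules exposed)
print(sensitivity(design_b, event_prob))  # 1.2  (the view plus 3 reports exposed)
```

A realistic metric would also account for the annotated policies, e.g. a view that blocks propagation can shield the reports behind it; the sketch only counts raw reachability.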
Lecture Notes in Computer Science, 2010
In this paper, we address the problem of managing inconsistencies that emerge in ETL processes as a result of evolution operations occurring at their sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and their sources are uniformly modeled as a graph that is annotated with rules for the management of evolution events. Given a change at an element of the graph, our framework detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. We then present the system architecture of a tool called Hecataeus that implements the main concepts of the proposed framework.

Diversifying Microblog Posts
Lecture Notes in Computer Science, 2014
Microblogs have become an important source of information, a medium for following and spreading trends, news and ideas all over the world. As a result, microblog search has emerged as a new option for covering user information needs, especially with respect to timely events, news or trends. However, users are frequently overloaded by the high rate of produced microblogging posts, which often carry no new information with respect to other similar posts. In this paper, we propose a method that helps users effectively harvest information from a microblogging stream, by filtering out redundant data and maximizing diversity among the displayed information. We introduce microblog-post-specific diversification criteria and apply them on heuristic diversification algorithms. We implement the above methods into a prototype system that works with data from Twitter. The experimental evaluation demonstrates the effectiveness of applying our problem-specific diversification criteria, as opposed to applying plain content diversity on microblog posts.
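As a toy illustration of filtering redundant posts from a stream (a hypothetical sketch using plain Jaccard word overlap; this is not the criteria or the heuristic algorithms proposed in the paper, and the example posts and threshold are invented):

```python
# Hypothetical sketch: drop microblog posts that are near-duplicates of posts
# already shown, using Jaccard similarity over lowercased word sets. The
# threshold and the example posts are invented.

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def diversify_stream(posts, threshold=0.6):
    shown = []
    for post in posts:
        if all(jaccard(post, earlier) < threshold for earlier in shown):
            shown.append(post)          # novel enough to display
    return shown

stream = [
    "Earthquake of magnitude 5.2 hits the coast",
    "Magnitude 5.2 earthquake hits the coast tonight",
    "Local elections scheduled for next Sunday",
]
print(diversify_stream(stream))
# keeps the first earthquake post and the elections post, drops the near-duplicate
```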