Managing and mining

Chandu Achugattla


Maurice Van Keulen

2011

After the Twente Data Management workshop on Uncertainty in Databases, held at the University of Twente in June 2006, the speakers and participants expressed their wish for a workshop on the same topic colocated with a large international conference. This Management of Uncertain Data workshop, colocated with the International Conference on Very Large Data Bases (VLDB), is the result of this wish. We received 9 submissions from all over the world.


Probabilistic Databases: Diamonds in the Dirt (Extended Version)

Dan Suciu

2008


A Naive Approach for Handling Uncertainty Inherent in Query Optimization of Probabilistic Databases

Ritika Wason

bvicam.ac.in

Databases today are deterministic: an item is either in the database or not, and a tuple is either in the query result or not. Yet the process of mapping the real world into a database inherently includes ambiguities and uncertainties and is seldom perfect. In today's data-driven, competitive world, a wide range of applications has emerged that need to handle very large, imprecise data sets with inherent uncertainties. Uncertain data is natural in many important real-world applications such as environmental surveillance, market analysis, and quantitative economic research. The data uncertainty in these applications generally results from factors such as data randomness and incompleteness, misaligned schemas, limitations of measuring equipment, delayed data updates, and imprecise queries. Because of the importance of these applications and the rapidly increasing amount of uncertain data being collected, analyzing large collections of uncertain data has become an important task and has attracted growing interest from the database community. Probabilistic databases hold the promise of being a viable means of large-scale uncertainty management, which is increasingly required in many real-world application domains. A probabilistic database is an uncertain database in which the possible worlds have associated probabilities; that is, whether an item belongs to the database is a probabilistic event, with either tuple-existence uncertainty or attribute-value uncertainty. Consequently, whether a tuple is an answer to a query is also a probabilistic event. An important aspect of research and development on uncertain data processing is query answering over uncertain and probabilistic data. Query processing in probabilistic databases remains a computational challenge, as it is fundamentally more complex than in other data models.
There exists a rich collection of powerful, non-trivial techniques and results, some old and some very recent, that could lead to practical management techniques for probabilistic databases. However, all such techniques are limited by the uncertainty inherent in the query result. Hence, there is a need for a general probabilistic model that tackles this uncertainty at the grassroots level. The basic tool for dealing with this uncertainty is probability, which is defined for an event as the proportion of times that the event would occur in repetitions of essentially identical situations. Although useful and successful in many applications, probability theory is in fact appropriate for dealing with only a very special type of uncertainty. Probabilistic databases are all the more susceptible to uncertainty in query results, because those results depend exclusively on the assigned probabilities, whose evaluation is itself uncertain. This makes them a potential area in which this fundamental problem can be addressed and a suitable correction applied to the evaluated probabilities.
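
The possible-worlds reading of a probabilistic database described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the `sightings` table, its probabilities, and the function names are hypothetical, and the sketch assumes tuple-existence uncertainty with independent tuples. It enumerates every possible world by brute force, which is exponential in the number of tuples and only meant to show why a query answer is itself a probabilistic event.

```python
from itertools import product

# Hypothetical tuple-independent table (an assumption for illustration):
# each tuple exists independently with the given probability.
sightings = [
    (("sensor_a", "fire"), 0.5),
    (("sensor_b", "fire"), 0.4),
    (("sensor_c", "smoke"), 0.9),
]


def possible_worlds(table):
    """Yield (world, probability) for every subset of the uncertain tuples."""
    for mask in product([True, False], repeat=len(table)):
        world = [tup for (tup, _), keep in zip(table, mask) if keep]
        prob = 1.0
        for (_, p), keep in zip(table, mask):
            # A tuple contributes p if present in this world, 1 - p if absent.
            prob *= p if keep else 1.0 - p
        yield world, prob


def query_probability(table, predicate):
    """Sum the probabilities of the worlds in which a Boolean query holds."""
    return sum(prob for world, prob in possible_worlds(table) if predicate(world))


def some_fire(world):
    """Boolean query: does any tuple in this world report a fire?"""
    return any(event == "fire" for _, event in world)


p_fire = query_probability(sightings, some_fire)
print(round(p_fire, 6))
```

Under the independence assumption the same value follows directly as 1 - (1 - 0.5) * (1 - 0.4) = 0.7, which is the kind of shortcut that practical query evaluation techniques try to exploit instead of enumerating worlds.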


Query Processing on Probabilistic Data: A Survey

Dan Suciu

Foundations and Trends® in Databases

Probabilistic data is motivated by the need to model uncertainty in large databases. Over the last twenty years or so, both the Database community and the AI community have studied various aspects of probabilistic relational data. This survey presents the main approaches developed in the literature, reconciling concepts developed in parallel by the two research communities. The survey starts with an extensive discussion of the main probabilistic data models and their relationships, followed by a brief overview of model counting and its relationship to probabilistic data. After that, the survey discusses lifted probabilistic inference, a suite of techniques developed in parallel by the Database and AI communities for probabilistic query evaluation. Then, it gives a short summary of query compilation, presenting some theoretical results that highlight the limitations of various query evaluation techniques on probabilistic data. The survey ends with a very brief discussion of some popular probabilistic data sets, systems, and applications that build on this technology.


Structured Querying of Web Text Data: A Technical Challenge

Dan Suciu

CIDR, 2007

The Web contains a huge amount of text that is currently beyond the reach of structured access tools. This unstructured data often contains a substantial amount of implicit structure, much of which can be captured using information extraction (IE) algorithms. By combining an IE system with an appropriate data model and query language, we could enable structured access to all of the Web's unstructured data. We propose a general-purpose query system called the extraction database, or ExDB, which supports SQL-like structured queries over Web text. We also describe the technical challenges involved, motivated in part by our experiences with an early 90M-page prototype.


Graphical models for uncertain data

Lise Getoor

2009

Graphical models are a popular and well-studied framework for compact representation of a joint probability distribution over a large number of interdependent variables, and for efficient reasoning about such a distribution. They have proven useful in a wide range of domains, from natural language processing to computer vision to bioinformatics. In this chapter, we present an approach to using graphical models for managing and querying large-scale uncertain databases.


Probabilistic Ranking Techniques in Relational Databases

Mohamed Soliman

2011

Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo. The series will publish 50- to 125-page publications on topics pertaining to data management. The scope will largely follow the purview of premier information and computer science conferences, such as ACM SIGMOD, VLDB, ICDE, PODS, ICDT, and ACM KDD. Potential topics include, but are not limited to: query languages, database system architectures, transaction management, data warehousing, XML and databases, data stream systems, wide-scale data distribution, multimedia data management, data mining, and related subjects.


Database theory column

Dan Suciu

ACM SIGACT News, 2008

The 26th edition of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) took place from 11 to 13 June 2007. As usual since 1991, the symposium was organized jointly with the ACM SIGMOD International Conference on Management of Data. While SIGMOD focuses on practical aspects of database systems, PODS focuses on the theory of such systems. The joint organization stimulates interaction between systems and theory research. For the first time ever, the joint SIGMOD/PODS conference was held in Asia: it took place in Beijing, China.


Management of probabilistic data

Dan Suciu

2008

Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model, present some fundamental theoretical results for query evaluation on probabilistic databases, and discuss several challenges, open problems, and research directions.


Structured querying of Web text

Dan Suciu

2007



Dan Suciu

2009


Proceedings of the Third International Workshop on Management of Uncertain Data (MUD2009)

Maurice Van Keulen

2009

This is the third edition of the international workshop on Management of Uncertain Data. The previous editions took place in Vienna, Austria and Auckland, New Zealand. The edition in Auckland was a combined event with the workshop on Quality in Databases. Research on uncertain data has grown over the past few years. Since the prequel to this workshop, the Twente Data Management workshop on Uncertain Data, the number of submissions about management of uncertain data to large conferences has grown rapidly.


Handling Uncertainty in Database: An Introduction and Brief Survey

Ahmed Sharaf Eldin

Computer and Information Science, 2015

In recent years, uncertainty management has become an important concern as the presence of uncertain data has increased rapidly. Several advanced technologies have been developed to record large quantities of data continuously, resulting in data that contains errors or may be only partially complete. Instead of dealing with data uncertainty by removing it, we must treat it as a source of information. To deal with such data, a database management system should have special features for handling uncertain data. The aim of this paper is twofold: on the one hand, to introduce the main concepts of uncertainty in databases by focusing on different data management issues in uncertain databases, such as join and query processing, database integration, indexing uncertain data, security and information leakage, and representation formalisms; on the other hand, to provide a survey of current database management systems that deal with uncertain data, presenting their features and comparing them.


Flexible matching of Ear Biometrics

Guy De Tré

2007

When identifying bodies found at the scene of a large-scale disaster, a technique for fast and cheap identification is very useful. Many techniques have been proposed in the past for this purpose. The method developed here allows flexible matching of ante mortem and post mortem pictures taken of the human ear and faces a challenge left unhandled before: the low quality control on the pictures of missing persons. As these pictures can be unsharp and out of profile, the comparison becomes more difficult.


A New Language and Architecture to Obtain Fuzzy Global Dependencies

Mari Aguilar


Management of probabilistic data: foundations and challenges

Dan Suciu

2007


Doctor of Computer Science, Université Claude Bernard Lyon 1

Soumaya Amdouni

2015

In this thesis we focus on the data web services composition problem and study the impact of the uncertainty that may be associated with the output of a service on the service selection and composition processes. This work is motivated by the increasing number of application domains where data web services may return uncertain data, including e-commerce, scientific data exploration, and open web data. We call services that return uncertain data uncertain services. In this dissertation, we propose new models and techniques for the selection and the composition of uncertain data web services. Our techniques are based on well-established fuzzy and probabilistic database theories and can handle the uncertainty efficiently. First, we proposed a composition model that takes into account the user preferences. In our model, user preferences are modelled as fuzzy constraints, and services are described with fuzzy constraints to better characterize the data they access. The composition model also features a composition algebra that allows us to rank the returned results based on their relevance to the user's preferences. Second, we proposed a probabilistic approach to model the uncertainty of the data returned by uncertain data services. Specifically, we extended the web service description standards (e.g., WSDL) to represent the outputs' probabilities. We also extended the service invocation process to take into account the uncertainty of input data. This extension is based on the possible worlds theory used in probabilistic databases. We also proposed a set of probability-aware composition operators that are necessary to orchestrate uncertain data services. Since a composition may admit multiple orchestration plans and not all of them compute the correct probabilities of outputs, we defined a set of conditions to check whether a plan is safe (i.e., computes the probabilities correctly) or not.
We implemented our different techniques and applied them to the real-estate and e-commerce domains. We provide a performance study of our different composition techniques.


Novel Algorithms TempCIPFP for Mining Frequent Patterns using Counting Inference from Probabilistic Temporal Databases and Future Possibilities

Niket Bhargava

2016

In this paper we present TempCIPFP, novel algorithms for mining frequent patterns using counting inference from probabilistic temporal databases, and we discuss future possibilities. We consider the problem of discovering frequent itemsets and association rules among items in a large database of transactions acquired under uncertainty at a given time. Each timestamped transaction has an associated probability that gives the confidence that the transaction occurred at that time. We discuss generalized algorithms for solving this problem that are fundamentally different from the known algorithms. A complete demonstration of the algorithm is presented and discussed. We also show how the best features of the algorithm can be combined into a business system. We address the efficiency of the main phase of most data mining applications: frequent pattern extraction. This problem is mainly related to the...
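
The setting described here, where each transaction carries a probability that it occurred, can be made concrete with a short sketch. This is not the TempCIPFP algorithm, whose details the abstract does not give; it is a plain brute-force illustration of one common notion, expected support, on a hypothetical `transactions` list. Timestamps are omitted, transactions are assumed independent, and all names and numbers are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical probabilistic transaction database (an assumption):
# each entry is (set of items, probability that the transaction occurred).
transactions = [
    ({"bread", "milk"}, 0.9),
    ({"bread", "butter"}, 0.5),
    ({"bread", "milk", "butter"}, 0.6),
    ({"milk"}, 0.8),
]


def expected_support(itemset, db):
    """Expected number of transactions containing the itemset.

    By linearity of expectation this is the sum of the existence
    probabilities of the transactions that contain every item.
    """
    return sum(prob for items, prob in db if itemset <= items)


def frequent_itemsets(db, min_expected_support, max_size=2):
    """Brute-force search for itemsets whose expected support meets the threshold."""
    all_items = sorted(set().union(*(items for items, _ in db)))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(all_items, size):
            candidate = frozenset(combo)
            support = expected_support(candidate, db)
            if support >= min_expected_support:
                result[candidate] = support
    return result


frequent = frequent_itemsets(transactions, min_expected_support=1.4)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

A real miner would prune candidates level by level rather than enumerate every combination; the exhaustive loop is kept here only because it makes the definition of expected support easy to check by hand.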


Quality awareness for managing and mining data

Laure Berti-Equille



Modeling, Querying, and Mining Uncertain XML Data

Pierre Senellart

2011

