Quality awareness for managing and mining data
Abstract
[Figure: classification of information systems (DIS, DW & MIS, VMS, CIS, RS, P2P) by autonomy and heterogeneity]
- 2.1 Introduction
- 2.2 Illustrative Example
- 2.3 General Approach
- 2.4 Modeling Quality Metadata
- 2.4.1 The CWM Metamodel
- 2.4.2 CWM Extension for QoD Metadata Management
- 2.5 Computing Quality Metadata
- 2.5.1 Level I: QoD Profiling Functions
- 2.5.2 Level II: QoD Constraint-Based Functions
- 2.5.3 Level III: QoD Synopses Functions
- 2.5.4 Level IV: QoD Mining Functions
- 2.5.5 Designing Analytic Workflows for QoD Evaluation
- 2.5.6 Computing and Assigning Probabilities to QoD Dimensions
- 2.6 Indexing Quality Metadata
- 2.6.1 Requirements
- 2.6.2 Range-Encoded Bitmap Index for QoD Measures
- 2.7 Extending the Syntax of a Query Language
- 2.7.1 Declaration of Quality Requirements
- 2.7.2 Manipulation of Data and Quality Metadata
- 2.7.3 Quality-Extended Query Processing
- 2.8 Conclusion
- 2.8.1 Summary
- 2.8.2 Research Perspectives
- 4.1 Introduction
- 4.2 Quality-Aware Integration of Biomedical Data
- 4.2.1 Problem Statement
- 4.2.2 Related Work
- 4.2.3 Contributions and Perspectives
- 4.3 Quality-Driven Query in Mediation Systems
- 4.3.1 Problem Statement
- 4.3.2 Related Work
- 4.3.3 Contributions and Perspectives
- 4.4 Monitoring the Quality of Stream Data
- 4.4.1 Problem Statement
- 4.4.2 Prospective Work
- 4.5 Conclusion
- GIMS, http://www.cs.man.ac.uk/img/gims/
- DataFoundry, http://www.llnl.gov/CASC/datafoundry/
- TAMBIS, http://imgproj.cs.man.ac.uk/tambis/
- P/FDM, http://www.csd.abdn.ac.uk/~gjlk/mediator/
- DiscoveryLink, http://www.research.ibm.com/journal/sj/402/haas.html
- EMBL, European Molecular Biology Laboratory: http://www.embl-heidelberg.de/
- Febrl, http://datamining.anu.edu.au/software/febrl/febrldoc/
- NCBI Reference Sequences, http://www.ncbi.nlm.nih.gov/RefSeq/
Bibliography
- AGGARWAL, CHARU. 2007. Data Streams: Models and Algorithms. Springer.
- AGRAWAL, RAKESH, IMIELINSKI, TOMASZ, & SWAMI, ARUN N. 1993. Mining Association Rules Between Sets of Items in Large Databases. Pages 207-216 of: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington, DC, USA.
- ANANTHAKRISHNA, ROHIT, CHAUDHURI, SURAJIT, & GANTI, VENKATESH. 2002. Eliminating Fuzzy Duplicates in Data Warehouses. Pages 586-597 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
- ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 1999. Consistent Query Answers in Inconsistent Databases. Pages 68-79 of: Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Philadelphia, PA, USA.
- ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2000. Specifying and Querying Database Repairs Using Logic Programs with Exceptions. Pages 27-41 of: Proceedings of the 4th International Conference on Flexible Query Answering Systems, FQAS 2000. Warsaw, Poland.
- ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2003. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming (TPLP), 3(4-5), 393-424.
- BALLOU, DONALD P., & TAYI, GIRI KUMAR. 1989. Methodology for Allocating Resources for Data Quality Enhancement. Commun. ACM, 32(3), 320-329.
- BALLOU, DONALD P., & TAYI, GIRI KUMAR. 1999. Enhancing Data Quality in Data Warehouse Environments. Commun. ACM, 42(1), 73-78.
- BANSAL, NIKHIL, BLUM, AVRIM, & CHAWLA, SHUCHI. 2002. Correlation Clustering. Page 238 of: Proceedings of 43rd Symposium on Foundations of Computer Science, FOCS 2002. Vancouver, BC, Canada.
- BARBARÁ, DANIEL, GARCIA-MOLINA, HECTOR, & PORTER, DARYL. 1990. A Probabilistic Relational Data Model. Pages 60-74 of: Proceedings of the 2nd International Conference on Extending Database Technology, EDBT 1990. Lecture Notes in Computer Science, vol. 416. Venice, Italy.
- BARGA, ROGER S., & PU, CALTON. 1993. Accessing Imprecise Data: An Approach Based on Intervals. IEEE Data Eng. Bull., 16(2), 12-15.
- BASU, AYANENDRANATH, HARRIS, IAN R., & BASU, SRABASHI. 1997. Minimum Distance Estimation: The Approach Using Density-Based Distances. Handbook of Statistics, 15, 21-48.
- BATINI, CARLO, & SCANNAPIECO, MONICA. 2006. Data Quality: Concepts, Methodologies and Techniques. Data-Centric Systems and Applications. Springer-Verlag.
- BATINI, CARLO, CATARCI, TIZIANA, & SCANNAPIECO, MONICA. 2004. A Survey of Data Quality Issues in Cooperative Systems. In: Tutorial of the 23rd International Conference on Conceptual Modeling, ER 2004. Shanghai, China.
- BAXTER, ROHAN A., CHRISTEN, PETER, & CHURCHES, TIM. 2003. A Comparison of Fast Blocking Methods for Record Linkage. Pages 27-29 of: Proceedings of the KDD'03 Workshop on Data Cleaning, Record Linkage and Object Consolidation. Washington, DC, USA.
- BENJELLOUN, OMAR, SARMA, ANISH DAS, HALEVY, ALON, & WIDOM, JENNIFER. 2005 (June). The Symbiosis of Lineage and Uncertainty. Technical Report 2005-39. Stanford InfoLab, Stanford University, CA, USA.
- BERENGUER, GEMA, ROMERO, RAFAEL, TRUJILLO, JUAN, SERRANO, MANUEL, & PIATTINI, MARIO. 2005. A Set of Quality Indicators and Their Corresponding Metrics for Conceptual Models of Data Warehouses. Pages 95-104 of: Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2005. Lecture Notes in Computer Science, vol. 3589. Copenhagen, Denmark.
- BERNSTEIN, PHILIP A., BERGSTRAESSER, THOMAS, CARLSON, JASON, PAL, SHANKAR, SANDERS, PAUL, & SHUTT, DAVID. 1999. Microsoft Repository Version 2 and the Open Information Model. Inf. Syst., 24(2), 71-98.
- BERTI-ÉQUILLE, LAURE. 1999b. Qualité des données multi-sources et recommandation multi-critère. Pages 185-204 of: Actes du congrès francophone INFormatique des ORganisations et Systèmes d'Information Décisionnels, INFORSID 1999. Toulon, France.
- BERTI-ÉQUILLE, LAURE. 1999c. Quality and Recommendation of Multi-Source Data for Assisting Technological Intelligence Applications. Pages 282-291 of: Proceedings of the International Conference on Database and Expert Systems Applications, DEXA'99. Lecture Notes in Computer Science, vol. 1677. Florence, Italy.
- BERTI-ÉQUILLE, LAURE. 2001. Integration of Biological Data and Quality-driven Source Negotiation. Pages 256-269 of: Proceedings of the 20th International Conference on Conceptual Modeling, ER'2001. Lecture Notes in Computer Science, vol. 2224. Yokohama, Japan.
- BERTI-ÉQUILLE, LAURE. 2002. Annotation et recommandation collaboratives de documents selon leur qualité. Revue Ingénierie des Systèmes d'Information (ISI-NIS), Numéro Spécial "Recherche et Filtrage d'Information", 7(1-2/2002), 125-156.
- BERTI-ÉQUILLE, LAURE. 2003. Quality-Extended Query Processing for Distributed Sources. In: Proceedings of the 1st International Workshop on Data Quality in Cooperative Information Systems, DQCIS'2003. Siena, Italy.
- BERTI-ÉQUILLE, LAURE. 2003a. Quality-based Recommendation of XML Documents. Journal of Digital Information Management, 1(3), 117-128.
- BERTI-ÉQUILLE, LAURE. 2004. Quality-Adaptive Query Processing over Distributed Sources. Pages 285-296 of: Proceedings of the 9th International Conference on Information Quality, ICIQ 2004. Massachusetts Institute of Technology, Cambridge, MA, USA.
- BERTI-ÉQUILLE, LAURE, & MOUSSOUNI, FOUZIA. 2005. Quality-Aware Integration and Warehousing of Genomic Data. Pages 442-454 of: Proceedings of the 10th International Conference on Information Quality, ICIQ 2005. Massachusetts Institute of Technology, Cambridge, MA, USA.
- BERTI-ÉQUILLE, LAURE, MOUSSOUNI, FOUZIA, & ARCADE, ANNE. 2001. Integration of Biological Data on Transcriptome. Revue ISI-NIS, Numéro Spécial Interopérabilité et Intégration des Systèmes d'Information, 6(3/2001), 61-86.
- BERTI-ÉQUILLE, LAURE. 2006a. Data Quality Awareness: a Case Study for Cost-Optimal Association Rule Mining. Knowl. Inf. Syst., 11(2), 191-215.
- BERTI-ÉQUILLE, LAURE. 2006b. Qualité des données. Techniques de l'Ingénieur, H3700, 1-19.
- BERTI-ÉQUILLE, LAURE. 2006c. Quality-Aware Association Rule Mining. Pages 440-449 of: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, (PAKDD 2006). Lecture Notes in Artificial Intelligence, vol. 3918. Springer.
- BERTOSSI, LEOPOLDO E., & BRAVO, LORETO. 2005. Consistent Query Answers in Virtual Data Integration Systems. Pages 42-83 of: Inconsistency Tolerance, Dagstuhl Seminar. Lecture Notes in Computer Science, vol. 3300. Schloss Dagstuhl, Germany.
- BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2003. Query Answering in Inconsistent Databases. Pages 43-83 of: Logics for Emerging Applications of Databases, Dagstuhl Seminar. Schloss Dagstuhl, Germany.
- BERTOSSI, LEOPOLDO E., & SCHWIND, CAMILLA. 2004. Database Repairs and Analytic Tableaux. Ann. Math. Artif. Intell., 40(1-2), 5-35.
- BHATTACHARYA, INDRAJIT, & GETOOR, LISE. 2004. Iterative Record Linkage for Cleaning and Integration. Pages 11-18 of: Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004. Paris, France.
- BILENKO, MIKHAIL, & MOONEY, RAYMOND J. 2003. Adaptive Duplicate Detection Using Learnable String Similarity Measures. Pages 39-48 of: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA.
- BILENKO, MIKHAIL, BASU, SUGATO, & SAHAMI, MEHRAN. 2005. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping. Pages 58-65 of: Proceedings of the 5th IEEE International Conference on Data Mining, ICDM 2005. Houston, TX, USA.
- BILKE, ALEXANDER, BLEIHOLDER, JENS, BÖHM, CHRISTOPH, DRABA, KARSTEN, NAUMANN, FELIX, & WEIS, MELANIE. 2005. Automatic Data Fusion with HumMer. Pages 1251-1254 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
- BOSC, PATRICK, LIETARD, NADIA, & PIVERT, OLIVIER. 2006. About Inclusion-Based Generalized Yes/No Queries in a Possibilistic Database Context. Pages 284-289 of: Proceedings of the 16th International Symposium on Foundations of Intelligent Systems, ISMIS 2006. Bari, Italy.
- BOUZEGHOUB, MOKRANE, & PERALTA, VERÓNIKA. 2004. A Framework for Analysis of Data Freshness. Pages 59-67 of: Proceedings of the 1st International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
- BRAUMANDL, REINHARD, KEIDL, MARKUS, KEMPER, ALFONS, KOSSMANN, DONALD, SELTZSAM, STEFAN, & STOCKER, KONRAD. 2001. ObjectGlobe: Open Distributed Query Processing Services on the Internet. IEEE Data Eng. Bull., 24(1), 64-70.
- BRAVO, LORETO, & BERTOSSI, LEOPOLDO E. 2003. Logic Programs for Consistently Querying Data Integration Systems. Pages 10-15 of: Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI-03. Acapulco, Mexico.
- BREUNIG, MARKUS M., KRIEGEL, HANS-PETER, NG, RAYMOND T., & SANDER, JÖRG. 2000. LOF: Identifying Density-Based Local Outliers. Pages 93-104 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA.
- BRIGHT, LAURA, & RASCHID, LOUIQA. 2002. Using Latency-Recency Profiles for Data Delivery on the Web. Pages 550-561 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
- BRY, FRANÇOIS. 1997. Query Answering in Information Systems with Integrity Constraints. Pages 113-130 of: Integrity and Internal Control in Information Systems, IFIP TC11 Working Group 11.5, First Working Conference on Integrity and Internal Control in Information Systems: Increasing the Confidence in Information Systems, IICIS. Zurich, Switzerland.
- BUECHI, MARTIN, BORTHWICK, ANDREW, WINKEL, ADAM, & GOLDBERG, ARTHUR. 2003. ClueMaker: A Language for Approximate Record Matching. Pages 207-223 of: Proceedings of the 8th International Conference on Information Quality, ICIQ 2003. MIT, Cambridge, MA, USA.
- CALÌ, ANDREA, LEMBO, DOMENICO, & ROSATI, RICCARDO. 2003. On the Decidability and Complexity of Query Answering Over Inconsistent and Incomplete Databases. Pages 260-271 of: Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS. San Diego, CA, USA.
- CARREIRA, PAULO J. F., & GALHARDAS, HELENA. 2004. Execution of Data Mappers. Pages 2-9 of: Proceedings of the 1st International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
- CARUSO, FRANCESCO, COCHINWALA, MUNIR, GANAPATHY, UMA, LALK, GAIL, & MISSIER, PAOLO. 2000. Telcordia's Database Reconciliation and Data Quality Analysis Tool. Pages 615-618 of: Proceedings of 26th International Conference on Very Large Data Bases, VLDB 2000. Cairo, Egypt.
- CAVALLO, ROGER, & PITTARELLI, MICHAEL. 1987. The Theory of Probabilistic Databases. Pages 71-81 of: Proceedings of 13th International Conference on Very Large Data Bases, VLDB 1987. Brighton, England.
- CERI, STEFANO, COCHRANE, ROBERTA, & WIDOM, JENNIFER. 2000. Practical Applications of Triggers and Constraints: Success and Lingering Issues. Pages 254-262 of: Proceedings of 26th International Conference on Very Large Data Bases, VLDB 2000. Cairo, Egypt.
- CHARIKAR, MOSES, GURUSWAMI, VENKATESAN, & WIRTH, ANTHONY. 2003. Clustering with Qualitative Information. Pages 524-533 of: Proceedings of 44th Symposium on Foundations of Computer Science, FOCS 2003. Cambridge, MA, USA.
- CHAUDHURI, SURAJIT, GANJAM, KRIS, GANTI, VENKATESH, & MOTWANI, RAJEEV. 2003. Robust and Efficient Fuzzy Match for Online Data Cleaning. Pages 313-324 of: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. San Diego, CA, USA.
- CHAUDHURI, SURAJIT, GANTI, VENKATESH, & MOTWANI, RAJEEV. 2005. Robust Identification of Fuzzy Duplicates. Pages 865-876 of: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005. Tokyo, Japan.
- CHAUDHURI, SURAJIT, GANTI, VENKATESH, & KAUSHIK, RAGHAV. 2006. A Primitive Operator for Similarity Joins in Data Cleaning. Page 5 of: Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006. Atlanta, GA, USA.
- CHENG, REYNOLD, KALASHNIKOV, DMITRI V., & PRABHAKAR, SUNIL. 2003. Evaluating Probabilistic Queries over Imprecise Data. Pages 551-562 of: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. San Diego, CA, USA.
- CHO, JUNGHOO, & GARCIA-MOLINA, HECTOR. 2000. Synchronizing a Database to Improve Freshness. Pages 117-128 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA.
- CHOENNI, SUNIL, BLOK, HENK ERNST, & LEERTOUWER, ERIK. 2006. Handling Uncertainty and Ignorance in Databases: A Rule to Combine Dependent Data. Pages 295-309 of: Proceedings of 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006. Lecture Notes in Computer Science, vol. 3882. Singapore.
- CHOMICKI, JAN. 2006. Consistent Query Answering: Opportunities and Limitations. Pages 527-531 of: Proceedings of 2nd International Workshop on Logical Aspects and Applications of Integrity Constraints, LAAIC 2006. Krakow, Poland.
- CHRISTEN, PETER, CHURCHES, TIM, & HEGLAND, MARKUS. 2004. Febrl - A Parallel Open Source Data Linkage System. Pages 638-647 of: Proceedings of the 8th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2004. Lecture Notes in Computer Science, vol. 3056. Sydney, Australia.
- COHEN, WILLIAM W., RAVIKUMAR, PRADEEP, & FIENBERG, STEPHEN E. 2003. A Comparison of String Distance Metrics for Name-Matching Tasks. Pages 73-78 of: Proceedings of IJCAI-03 Workshop on Information Integration on the Web, IIWeb-03. Acapulco, Mexico.
- COULON, CÉDRIC, PACITTI, ESTHER, & VALDURIEZ, PATRICK. 2005. Consistency Management for Partial Replication in a High Performance Database Cluster. Pages 809-815 of: Proceedings of 11th International Conference on Parallel and Distributed Systems, ICPADS 2005, vol. 1. Fukuoka, Japan.
- CUI, YINGWEI, & WIDOM, JENNIFER. 2003. Lineage Tracing for General Data Warehouse Transformations. VLDB J., 12(1), 41-58.
- CULOTTA, ARON, & MCCALLUM, ANDREW. 2005. Joint Deduplication of Multiple Record Types in Relational Data. Pages 257-258 of: Proceedings of the 2005 ACM International Conference on Information and Knowledge Management, CIKM 2005. Bremen, Germany.
- DALVI, NILESH N., & SUCIU, DAN. 2004. Efficient Query Evaluation on Probabilistic Databases. Pages 864-875 of: Proceedings of the 30th International Conference on Very Large Data Bases, VLDB 2004. Toronto, ON, Canada.
- DASU, TAMRAPARNI, & JOHNSON, THEODORE. 2003. Exploratory Data Mining and Data Cleaning. John Wiley.
- DASU, TAMRAPARNI, JOHNSON, THEODORE, MUTHUKRISHNAN, S., & SHKAPENYUK, VLADISLAV. 2002. Mining Database Structure or How To Build a Data Quality Browser. Pages 240-251 of: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, WI, USA.
- DASU, TAMRAPARNI, VESONDER, GREGG T., & WRIGHT, JON R. 2003. Data Quality Through Knowledge Engineering. Pages 705-710 of: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003. Washington, DC, USA.
- DEMPSTER, ARTHUR PENTLAND, LAIRD, NAN M., & RUBIN, DONALD B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39, 1-38.
- DOMINGOS, PEDRO, & HULTEN, GEOFF. 2001. Catching up with the Data: Research Issues in Mining Data Streams. In: Proceedings of 2001 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2001. Santa Barbara, CA, USA.
- DONG, XIN, HALEVY, ALON Y., & MADHAVAN, JAYANT. 2005. Reference Reconciliation in Complex Information Spaces. Pages 85-96 of: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, MD, USA.
- DUMOUCHEL, WILLIAM, VOLINSKY, CHRIS, JOHNSON, THEODORE, CORTES, CORINNA, & PREGIBON, DARYL. 1999. Squashing Flat Files Flatter. Pages 6-15 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999. San Diego, CA, USA.
- ELFEKY, MOHAMED G., ELMAGARMID, AHMED K., & VERYKIOS, VASSILIOS S. 2002. TAILOR: A Record Linkage Tool Box. Pages 17-28 of: Proceedings of the 18th International Conference on Data Engineering, ICDE 2002. San Jose, CA, USA.
- ELMAGARMID, AHMED K., IPEIROTIS, PANAGIOTIS G., & VERYKIOS, VASSILIOS S. 2007. Duplicate Record Detection: A Survey. IEEE Trans. Knowl. Data Eng., 19(1), 1-16.
- EMBURY, SUZANNE M., BRANDT, SUE M., ROBINSON, JOHN S., SUTHERLAND, IAIN, BISBY, FRANK A., GRAY, W. ALEX, JONES, ANDREW C., & WHITE, RICHARD J. 2001. Adapting Integrity Enforcement Techniques for Data Reconciliation. Inf. Syst., 26(8), 657-689.
- ENGLISH, LARRY. 2002. Process Management and Information Quality: How Improving Information Production Processes Improves Information (Product) Quality. Pages 206-209 of: Proceedings of the Seventh International Conference on Information Quality, ICIQ 2002. MIT, Cambridge, MA, USA.
- ENGLISH, LARRY P. 1999. Improving Data Warehouse and Business Information Quality. Wiley.
- FAGIN, RONALD, KOLAITIS, PHOKION G., MILLER, RENÉE J., & POPA, LUCIAN. 2003. Data Exchange: Semantics and Query Answering. Pages 207-224 of: Proceedings of 9th International Conference on Database Theory, ICDT 2003. Lecture Notes in Computer Science, vol. 2572. Siena, Italy.
- FALOUTSOS, CHRISTOS. 2002. Sensor Data Mining: Similarity Search and Pattern Analysis. In: Tutorial of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
- FALOUTSOS, CHRISTOS, & LIN, KING-IP. 1995. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Pages 163-174 of: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, CA, USA.
- FELLEGI, IVAN P., & SUNTER, A.B. 1969. A Theory for Record Linkage. Journal of the American Statistical Association, 64, 1183-1210.
- FLESCA, SERGIO, FURFARO, FILIPPO, & PARISI, FRANCESCO. 2005. Consistent Query Answers on Numerical Databases Under Aggregate Constraints. Pages 279-294 of: Proceedings of 10th International Symposium on Database Programming Languages, DBPL 2005. Trondheim, Norway.
- FOX, CHRISTOPHER J., LEVITIN, ANANY, & REDMAN, THOMAS. 1994. The Notion of Data and Its Quality Dimensions. Inf. Process. Manage., 30(1), 9-20.
- GALHARDAS, HELENA, FLORESCU, DANIELA, SHASHA, DENNIS, & SIMON, ERIC. 2000. AJAX: An Extensible Data Cleaning Tool. Page 590 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA.
- GALHARDAS, HELENA, FLORESCU, DANIELA, SHASHA, DENNIS, SIMON, ERIC, & SAITA, CRISTIAN-AUGUSTIN. 2001. Declarative Data Cleaning: Language, Model, and Algorithms. Pages 371-380 of: Proceedings of 27th International Conference on Very Large Data Bases, VLDB 2001. Roma, Italy.
- GELENBE, EROL, & HÉBRAIL, GEORGES. 1986. A Probability Model of Uncertainty in Data Bases. Pages 328-333 of: Proceedings of the Second International Conference on Data Engineering, ICDE 1986. Los Angeles, CA, USA.
- GRAHNE, GÖSTA. 2002. Information Integration and Incomplete Information. IEEE Data Eng. Bull., 25(3), 46-52.
- GRAVANO, LUIS, IPEIROTIS, PANAGIOTIS G., JAGADISH, H. V., KOUDAS, NICK, MUTHUKRISHNAN, S., PIETARINEN, LAURI, & SRIVASTAVA, DIVESH. 2001. Using q-grams in a DBMS for Approximate String Processing. IEEE Data Eng. Bull., 24(4), 28-34.
- GRAVANO, LUIS, IPEIROTIS, PANAGIOTIS G., KOUDAS, NICK, & SRIVASTAVA, DIVESH. 2003. Text Joins for Data Cleansing and Integration in an RDBMS. Pages 729-731 of: Proceedings of the 19th International Conference on Data Engineering, ICDE 2003. Bangalore, India.
- GUÉRIN, EMILIE, MOUSSOUNI, FOUZIA, & BERTI-ÉQUILLE, LAURE. 2001. Intégration des données sur le transcriptome. Pages 219-228 of: Actes de la journée de travail bi-thématique du GDR-PRC I3. Lyon, France.
- GUÉRIN, EMILIE, MARQUET, GWENAELLE, BURGUN, ANITA, LORÉAL, OLIVIER, BERTI-ÉQUILLE, LAURE, LESER, ULF, & MOUSSOUNI, FOUZIA. 2005. Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW. Pages 158-174 of: Proceedings of the 2nd International Workshop on Data Integration in the Life Sciences, DILS 2005. San Diego, CA, USA.
- GUHA, SUDIPTO, RASTOGI, RAJEEV, & SHIM, KYUSEOK. 2001. Cure: An Efficient Clustering Algorithm for Large Databases. Inf. Syst., 26(1), 35-58.
- GUO, HONGFEI, LARSON, PER-ÅKE, RAMAKRISHNAN, RAGHU, & GOLDSTEIN, JONATHAN. 2004. Relaxed Currency and Consistency: How to Say "Good Enough" in SQL. Pages 815-826 of: Proceedings of the ACM SIGMOD International Conference on Management of Data. Paris, France.
- GUO, HONGFEI, LARSON, PER-ÅKE, & RAMAKRISHNAN, RAGHU. 2005. Caching with 'Good Enough' Currency, Consistency, and Completeness. Pages 457-468 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
- HALEVY, ALON Y. 2001. Answering Queries Using Views: A Survey. VLDB J., 10(4), 270-294.
- HERNÁNDEZ, MAURICIO A., & STOLFO, SALVATORE J. 1998. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem. Data Min. Knowl. Discov., 2(1), 9-37.
- HJALTASON, GÍSLI R., & SAMET, HANAN. 2003. Properties of Embedding Methods for Similarity Searching in Metric Spaces. IEEE Trans. Pattern Anal. Mach. Intell., 25(5), 530-549.
- HOU, WEN-CHI, & ZHANG, ZHONGYANG. 1995. Enhancing Database Correctness: a Statistical Approach. Pages 223-232 of: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, CA, USA.
- HULL, RICHARD, & ZHOU, GANG. 1996. A Framework for Supporting Data Integration Using the Materialized and Virtual Approaches. Pages 481-492 of: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. Montreal, Quebec, Canada.
- HULTEN, GEOFF, SPENCER, LAURIE, & DOMINGOS, PEDRO. 2001. Mining Time-Changing Data Streams. Pages 97-106 of: Proceedings of the 7th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2001. San Francisco, CA, USA.
- HUNG, EDWARD, GETOOR, LISE, & SUBRAHMANIAN, V. S. 2003. PXML: A Probabilistic Semistructured Data Model and Algebra. Page 467 of: Proceedings of the 19th International Conference on Data Engineering, ICDE'03. Bangalore, India.
- IBRAHIM, HAMIDAH. 2002. A Strategy for Semantic Integrity Checking in Distributed Databases. Pages 139-144 of: Proceedings of 9th International Conference on Parallel and Distributed Systems, ICPADS 2002. Taiwan, ROC.
- IMIELINSKI, TOMASZ, & LIPSKI, WITOLD JR. 1984. Incomplete Information in Relational Databases. J. ACM, 31(4), 761-791.
- JARKE, MATTHIAS, JEUSFELD, MANFRED A., QUIX, CHRISTOPH, & VASSILIADIS, PANOS. 1999. Architecture and Quality in Data Warehouses: An Extended Repository Approach. Inf. Syst., 24(3), 229-253.
- JARO, MATTHEW A. 1989. Advances in Record Linking Methodology as Applied to the 1985 Census of Tampa Florida. Journal of the American Statistical Association, 84(406), 414-420.
- JARO, MATTHEW A. 1995. Probabilistic Linkage of Large Public Health Data Files. Statistics in Medicine, 14, 491-498.
- KAHN, BEVERLY K., STRONG, DIANE M., & WANG, RICHARD Y. 2002. Information Quality Benchmarks: Product and Service Performance. Commun. ACM, 45(4), 184-192.
- KALASHNIKOV, DMITRI V., & MEHROTRA, SHARAD. 2006. Domain-Independent Data Cleaning via Analysis of Entity-Relationship Graph. ACM Transactions on Database Systems, 31(2), 716-767.
- KARAKASIDIS, ALEXANDROS, VASSILIADIS, PANOS, & PITOURA, EVAGGELIA. 2005. ETL Queues for Active Data Warehousing. Pages 28-39 of: Proceedings of the 2nd International ACM SIGMOD 2005 Workshop on Information Quality in Information Systems, IQIS 2005. Baltimore, MD, USA.
- KAUFMAN, L., & ROUSSEEUW, PETER J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley.
- KNORR, EDWIN M., & NG, RAYMOND T. 1998. Algorithms for Mining Distance-Based Outliers in Large Datasets. Pages 392-403 of: Proceedings of 24th International Conference on Very Large Data Bases, VLDB 1998. New York City, NY, USA.
- KORN, FLIP, MUTHUKRISHNAN, S., & ZHU, YUNYUE. 2003. Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases. Pages 536-547 of: Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003. Berlin, Germany.
- LABRINIDIS, ALEXANDROS, & ROUSSOPOULOS, NICK. 2003. Balancing Performance and Data Freshness in Web Database Servers. Pages 393-404 of: Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003. Berlin, Germany.
- LACROIX, ZOE, & CRITCHLOW, TERENCE (eds). 2003. Bioinformatics: Managing Scientific Data. Morgan Kaufmann.
- LAKSHMANAN, LAKS V. S., & SADRI, FEREIDOON. 1994. Modeling Uncertainty in Deductive Databases. Pages 724-733 of: Proceedings of the 5th International Conference on Database and Expert Systems Applications, DEXA'94. Lecture Notes in Computer Science, vol. 856. Athens, Greece.
- LAVRAČ, NADA, FLACH, PETER A., & ZUPAN, BLAZ. 1999. Rule Evaluation Measures: A Unifying View. Pages 174-185 of: Proceedings of the Intl. Workshop on Inductive Logic Programming, ILP 1999. Bled, Slovenia.
- LAZARIDIS, IOSIF, & MEHROTRA, SHARAD. 2004. Approximate Selection Queries over Imprecise Data. Pages 140-152 of: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004. Boston, MA, USA.
- LEE, LILLIAN. 2001. On the Effectiveness of the Skew Divergence for Statistical Language Analysis. Artificial Intelligence and Statistics, 65-72.
- LEE, MONG-LI, HSU, WYNNE, & KOTHARI, VIJAY. 2004. Cleaning the Spurious Links in Data. IEEE Intelligent Systems, 19(2), 28-33.
- LEE, SUK KYOON. 1992. An Extended Relational Database Model for Uncertain and Imprecise Information. Pages 211-220 of: Proceedings of the 18th International Conference on Very Large Data Bases, VLDB 1992. Vancouver, Canada.
- LEMBO, DOMENICO, LENZERINI, MAURIZIO, & ROSATI, RICCARDO. 2002. Source Inconsistency and Incompleteness in Data Integration. In: Proceedings of the 9th International Workshop on Knowledge Representation meets Databases, KRDB 2002, vol. 54. Toulouse, France.
- LI, CHEN. 2003. Computing Complete Answers to Queries in the Presence of Limited Access Patterns. VLDB J., 12(3), 211-227.
- LI, WEN-SYAN, PO, OLIVER, HSIUNG, WANG-PIN, CANDAN, K. SELÇUK, & AGRAWAL, DIVYAKANT. 2003. Freshness-Driven Adaptive Caching for Dynamic Content Web Sites. Data Knowl. Eng., 47(2), 269-296.
- LIEPINS, GUNAR E., & UPPULURI, V. R. 1991. Data Quality Control: Theory and Pragmatics. New York, NY, USA: Marcel Dekker, Inc. ISBN 0-8247-8354-9.
- LIM, EE-PENG, SRIVASTAVA, JAIDEEP, PRABHAKAR, SATYA, & RICHARDSON, JAMES. 1993. Entity Identification in Database Integration. Pages 294-301 of: Proceedings of the 9th International Conference on Data Engineering, ICDE 1993. Vienna, Austria.
- LIN, JINXIN, & MENDELZON, ALBERTO O. 1998. Merging Databases Under Constraints. Int. J. Cooperative Inf. Syst., 7(1), 55-76.
- LOSHIN, D. 2001. Enterprise Knowledge Management: The Data Quality Approach. Morgan Kaufmann.
- LOW, WAI LUP, LEE, MONG-LI, & LING, TOK WANG. 2001. A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning. Inf. Syst., 26(8), 585-606.
- MANNINO, MICHAEL V., CHU, PAICHENG, & SAGER, THOMAS. 1988. Statistical Profile Estimation in Database Systems. ACM Comput. Surv., 20(3), 191-221.
- MARQUET, GWENAELLE, BURGUN, ANITA, MOUSSOUNI, FOUZIA, GUÉRIN, EMILIE, LE DUFF, FRANCK, & LORÉAL, OLIVIER. 2003. BioMeKe: an Ontology-Based Biomedical Knowledge Extraction System Devoted to Transcriptome Analysis. Studies in Health Technology and Informatics, 95, 80-86.
- MARTINEZ, ALEXANDRA, & HAMMER, JOACHIM. 2005. Making Quality Count in Biological Data Sources. Pages 16-27 of: Proceedings of the 2nd International ACM SIGMOD 2005 Workshop on Information Quality in Information Systems, IQIS 2005. Baltimore, MD, USA.
- MCCALLUM, ANDREW, NIGAM, KAMAL, & UNGAR, LYLE H. 2000. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Pages 169-178 of: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000. Boston, MA, USA.
- MCCALLUM, ANDREW, BELLARE, KEDAR, & PEREIRA, FERNANDO. 2005. A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Pages 388-396 of: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, UAI'05. Edinburgh, Scotland, UK.
- MCCLEAN, SALLY I., SCOTNEY, BRYAN W., & SHAPCOTT, MARY. 2001. Aggregation of Imprecise and Uncertain Information in Databases. IEEE Trans. Knowl. Data Eng., 13(6), 902-912.
- MIHAILA, GEORGE A., RASCHID, LOUIQA, & VIDAL, MARIA-ESTHER. 2000. Using Quality of Data Metadata for Source Selection and Ranking. Pages 93-98 of: Proceedings of the 3rd International Workshop on the Web and Databases, WebDB 2000. Dallas, TX, USA.
- MONGE, ALVARO E. 2000. Matching Algorithms within a Duplicate Detection System. IEEE Data Eng. Bull., 23(4), 14-20.
- MONGE, ALVARO E., & ELKAN, CHARLES. 1996. The Field Matching Problem: Algorithms and Applications. Pages 267-270 of: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, KDD 1996. Portland, OR, USA.
- MOTRO, AMIHAI, & ANOKHIN, PHILIPP. 2006. FusionPlex: Resolution of Data Inconsistencies in the Integration of Heterogeneous Information Sources. Information Fusion, 7(2), 176-196.
- MOTRO, AMIHAI, & RAKOV, IGOR. 1998. Estimating the Quality of Databases. Pages 298-307 of: Proceedings of the 3rd International Conference on Flexible Query Answering Systems, FQAS'98. Roskilde, Denmark.
- MÜLLER, HEIKO, & NAUMANN, FELIX. 2003. Data Quality in Genome Databases. Pages 269-284 of: Proceedings of the 8th International Conference on Information Quality, ICIQ 2003. MIT, Cambridge, MA, USA.
- MÜLLER, HEIKO, LESER, ULF, & FREYTAG, JOHANN CHRISTOPH. 2004. Mining for Patterns in Contradictory Data. Pages 51-58 of: Proceedings of the 1st International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
- MYLOPOULOS, JOHN, BORGIDA, ALEXANDER, JARKE, MATTHIAS, & KOUBARAKIS, MANOLIS. 1990. Telos: Representing Knowledge About Information Systems. ACM Trans. Inf. Syst., 8(4), 325-362.
- NAJJAR, FAÏZA, & SLIMANI, YAHYA. 1999. Cardinality Estimation of Distributed Join Queries. Pages 66-70 of: Proceedings of the 10th International DEXA Workshop on Parallel & Distributed Databases: Innovative Applications & New Architectures. Florence, Italy.
- NASH, ALAN, & LUDÄSCHER, BERTRAM. 2004. Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. Pages 422-440 of: Proceedings of the 9th International Conference on Extending Database Technology, EDBT 2004. Lecture Notes in Computer Science, vol. 2992. Heraklion, Crete, Greece.
- NAUMANN, FELIX. 2002. Quality-Driven Query Answering for Integrated Information Systems. Lecture Notes in Computer Science, vol. 2261. Springer-Verlag.
- NAUMANN, FELIX, LESER, ULF, & FREYTAG, JOHANN CHRISTOPH. 1999. Quality-Driven Integration of Heterogenous Information Systems. Pages 447-458 of: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999. Edinburgh, Scotland, UK.
- NAUMANN, FELIX, FREYTAG, JOHANN CHRISTOPH, & LESER, ULF. 2004. Completeness of Integrated Information Sources. Inf. Syst., 29(7), 583-615.
- NAVARRO, GONZALO. 2001. A Guided Tour to Approximate String Matching. ACM Comput. Surv., 33(1), 31-88.
- NEWCOMBE, H.B., KENNEDY, J.M., AXFORD, S.J., & JAMES, A.P. 1959. Automatic Linkage of Vital Records. Science, 954-959.
- NEWCOMBE, HOWARD B., & KENNEDY, JAMES M. 1962. Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information. Commun. ACM, 5(11), 563-566.
- OLSON, JACK E. 2003. Data Quality: The Accuracy Dimension. Morgan Kaufmann.
- OLSTON, CHRISTOPHER, & WIDOM, JENNIFER. 2005. Efficient Monitoring and Querying of Distributed, Dynamic Data via Approximate Replication. IEEE Data Eng. Bull., 28(1), 11-18.
- OMG. 2003 (March). Common Warehouse Metamodel (CWM), Specification Version 1.1. Tech. rept. Object Management Group.
- PANG, HWEEHWA, JAIN, ARPIT, RAMAMRITHAM, KRITHI, & TAN, KIAN-LEE. 2005. Verifying Completeness of Relational Query Results in Data Publishing. Pages 407-418 of: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, MD, USA.
- PAPADIMITRIOU, SPIROS, SUN, JIMENG, & FALOUTSOS, CHRISTOS. 2005. Streaming Pattern Discovery in Multiple Time-Series. Pages 697-708 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
- PARSONS, SIMON. 1996. Current Approaches to Handling Imperfect Information in Data and Knowledge Bases. IEEE Trans. Knowl. Data Eng., 8(3), 353-372.
- PASULA, HANNA, MARTHI, BHASKARA, MILCH, BRIAN, RUSSELL, STUART J., & SHPITSER, ILYA. 2002. Identity Uncertainty and Citation Matching. Pages 1401-1408 of: Proceedings of Advances in Neural Information Processing Systems 15, NIPS 2002. Vancouver, BC, Canada.
- PEARSON, RONALD K. 2005. Mining Imperfect Data: Dealing with Contamination and Incomplete Records. Philadelphia: SIAM.
- PEIM, MARTIN, FRANCONI, ENRICO, & PATON, NORMAN W. 2003. Estimating the Quality of Answers When Querying Over Description Logic Ontologies. Data Knowl. Eng., 47(1), 105-129.
- PERALTA, VERÓNIKA. 2006 (November). Data Quality Evaluation in Data Integration Systems. Ph.D. thesis, Université de Versailles, France & Universidad de la República, Uruguay.
- PETROPOULOS, MICHALIS, DEUTSCH, ALIN, & PAPAKONSTANTINOU, YANNIS. 2006. Interactive Query Formulation over Web Service-Accessed Sources. Pages 253-264 of: Proceedings of the ACM SIGMOD International Conference on Management of Data. Chicago, IL, USA.
- POOLE, JOHN, CHANG, DAN, TOLBERT, DOUGLAS, & MELLOR, DAVID. 2003. Common Warehouse Metamodel Developer's Guide. New York: John Wiley & Sons Inc.
- PRADHAN, SHEKHAR. 2003. Argumentation Databases. Pages 178-193 of: Proceedings of 19th International Conference on Logic Programming, ICLP 2003. Mumbai, India.
- PYLE, DORIAN. 1999. Data Preparation for Data Mining. Morgan Kaufmann.
- QUASS, DALLAN, & STARKEY, P. 2003. A Comparison of Fast Blocking Methods for Record Linkage. Pages 40-42 of: Proceedings of the KDD 2003 Workshop on Data Cleaning, Record Linkage and Object Consolidation. Washington, DC, USA.
- RAHM, ERHARD, & DO, HONG HAI. 2000. Data Cleaning: Problems and Current Approaches. IEEE Data Eng. Bull., 23(4), 3-13.
- RAMAMRITHAM, KRITHI. 1993. Real-Time Databases. Distributed and Parallel Databases, 1(2), 199-226.
- RAMAMRITHAM, KRITHI, & CHRYSANTHIS, PANOS K. 1992. In Search of Acceptability Criteria: Database Consistency Requirements and Transaction Correctness Properties. Pages 212-230 of: Proceedings of International Workshop on Distributed Object Management, IWDOM 1992. Edmonton, AB, Canada.
- RAMAN, VIJAYSHANKAR, & HELLERSTEIN, JOSEPH M. 2001. Potter's Wheel: An Interactive Data Cleaning System. Pages 381-390 of: Proceedings of 27th International Conference on Very Large Data Bases, VLDB 2001. Roma, Italy.
- RE, CHRISTOPHER, DALVI, NILESH N., & SUCIU, DAN. 2006. Query Evaluation on Probabilistic Databases. IEEE Data Eng. Bull., 29(1), 25-31.
- REDMAN, THOMAS. 2001. Data Quality: The Field Guide. Digital Press, Elsevier.
- RISTAD, ERIC SVEN, & YIANILOS, PETER N. 1998. Learning String-Edit Distance. IEEE Trans. Pattern Anal. Mach. Intell., 20(5), 522-532.
- SAMPAIO, SANDRA DE F. MENDES, DONG, CHAO, & SAMPAIO, PEDRO. 2005. Incorporating the Timeliness Quality Dimension in Internet Query Systems. Pages 53-62 of: Proceedings of the International Workshop on Web Information Systems Engineering, WISE 2005. New York, NY, USA.
- SANTIS, LUCA DE, SCANNAPIECO, MONICA, & CATARCI, TIZIANA. 2003. Trusting Data Quality in Cooperative Information Systems. Pages 354-369 of: Proceedings of CoopIS, DOA, and ODBASE - OTM Confederated International Conferences. Catania, Sicily, Italy.
- SARAWAGI, SUNITA, & KIRPAL, ALOK. 2004. Efficient Set Joins on Similarity Predicates. Pages 743-754 of: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris, France.
- SAYYADIAN, MAYSSAM, LEE, YOONKYONG, DOAN, ANHAI, & ROSENTHAL, ARNON. 2005. Tuning Schema Matching Software using Synthetic Scenarios. Pages 994-1005 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
- SEGEV, ARIE, & FANG, WEIPING. 1990. Currency-Based Updates to Distributed Materialized Views. Pages 512-520 of: Proceedings of the 6th International Conference on Data Engineering, ICDE 1990. Los Angeles, CA, USA.
- SHETH, AMIT P., WOOD, CHRISTOPHER, & KASHYAP, VIPUL. 1993. Q-Data: Using Deductive Database Technology to Improve Data Quality. Pages 23-56 of: Proceedings of the International Workshop on Programming with Logic Databases, ILPS. Vancouver, BC, Canada.
- SIMITSIS, ALKIS, VASSILIADIS, PANOS, & SELLIS, TIMOS K. 2005. Optimizing ETL Processes in Data Warehouses. Pages 564-575 of: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005. Tokyo, Japan.
- SINGLA, PARAG, & DOMINGOS, PEDRO. 2005. Collective Object Identification. Pages 1636-1637 of: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI-05. Edinburgh, Scotland, UK.
- STOCKINGER, KURT. 2002. Bitmap Indices for Speeding Up High-Dimensional Data Analysis. Pages 881-890 of: Proceedings of the 13th International Database and Expert Systems Applications Conference, DEXA 2002. Lecture Notes in Computer Science, vol. 2453. Aix-en-Provence, France.
- TAN, P.N., KUMAR, V., & SRIVASTAVA, J. 2002. Selecting the Right Interestingness Measure for Association Patterns. Pages 32-41 of: Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2002. Edmonton, AB, Canada.
- TEJADA, SHEILA, KNOBLOCK, CRAIG A., & MINTON, STEVEN. 2001. Learning Object Identification Rules for Information Integration. Inf. Syst., 26(8), 607-633.
- TEJADA, SHEILA, KNOBLOCK, CRAIG A., & MINTON, STEVEN. 2002. Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification. Pages 350-359 of: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002. Edmonton, AB, Canada.
- THANGAVEL, ALPHONSE THANARAJ. 1999. A Clean Data Set of EST-confirmed Splice Sites from Homo Sapiens and Standards for Clean-up Procedures. Nucleic Acids Res., 27(13), 2627-2637.
- THEODORATOS, DIMITRI, & BOUZEGHOUB, MOKRANE. 1999. Data Currency Quality Factors in Data Warehouse Design. Page 15 of: Proceedings of the International Workshop on Design and Management of Data Warehouses, DMDW'99. Heidelberg, Germany.
- THEODORATOS, DIMITRI, & BOUZEGHOUB, MOKRANE. 2001. Data Currency Quality Satisfaction in the Design of a Data Warehouse. Int. J. Cooperative Inf. Syst., 10(3), 299-326.
- THOR, ANDREAS, & RAHM, ERHARD. 2007. MOMA - A Mapping-based Object Matching System. Pages 247-258 of: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007. Asilomar, CA, USA.
- VAILLANT, BENOÎT, LENCA, PHILIPPE, & LALLICH, STÉPHANE. 2004. A Clustering of Interestingness Measures. Pages 290-297 of: Proceedings of the 7th International Conference on Discovery Science, DS 2004. Padova, Italy.
- VASSILIADIS, PANOS, VAGENA, ZOGRAFOULA, SKIADOPOULOS, SPIROS, KARAYANNIDIS, NIKOS, & SELLIS, TIMOS K. 2001. ARKTOS: Towards the Modeling, Design, Control and Execution of ETL Processes. Inf. Syst., 26(8), 537-561.
- VASSILIADIS, PANOS, SIMITSIS, ALKIS, GEORGANTAS, PANOS, & TERROVITIS, MANOLIS. 2003. A Framework for the Design of ETL Scenarios. Pages 520-535 of: Proceedings of the 15th International Conference on Advanced Information Systems Engineering, CAiSE 2003. Klagenfurt, Austria.
- VERYKIOS, VASSILIOS S., MOUSTAKIDES, GEORGE V., & ELFEKY, MOHAMED G. 2003. A Bayesian Decision Model for Cost Optimal Record Matching. VLDB J., 12(1), 28-40.
- WANG, KE, ZHOU, SENQIANG, YANG, QIANG, & YEUNG, JACK MAN SHUN. 2005. Mining Customer Value: from Association Rules to Direct Marketing. Data Min. Knowl. Discov., 11(1), 57-79.
- WANG, RICHARD Y. 1998. A Product Perspective on Total Data Quality Manage- ment. Commun. ACM, 41(2), 58-65.
- WANG, RICHARD Y., STOREY, VEDA C., & FIRTH, CHRISTOPHER P. 1995. A Framework for Analysis of Data Quality Research. IEEE Trans. Knowl. Data Eng., 7(4), 623-640.
- WANG, RICHARD Y., ZIAD, MOSTAPHA, & LEE, YANG W. 2002. Data Quality. Advances in Database Systems, vol. 23. Kluwer Academic Publishers.
- WEIS, MELANIE, & MANOLESCU, IOANA. 2007. XClean in Action. Pages 259-262 of: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007. Asilomar, CA, USA.
- WEIS, MELANIE, & NAUMANN, FELIX. 2004. Detecting Duplicate Objects in XML Documents. Pages 10-19 of: Proceedings of the First International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
- WEIS, MELANIE, NAUMANN, FELIX, & BROSY, FRANZISKA. 2006. A Duplicate Detection Benchmark for XML (and Relational) Data. In: Proceedings of the 3rd International ACM SIGMOD 2006 Workshop on Information Quality in Information Systems, IQIS 2006. Chicago, IL, USA.
- WIDOM, JENNIFER. 2005. Trio: A System for Integrated Management of Data, Accuracy, and Lineage. Pages 262-276 of: Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005. Asilomar, CA, USA.
- WIJSEN, JEF. 2003. Condensed Representation of Database Repairs for Consistent Query Answering. Pages 378-393 of: Proceedings of 9th International Conference on Database Theory, ICDT 2003. Siena, Italy.
- WINKLER, WILLIAM E. 1999. The State of Record Linkage and Current Research Problems. Tech. Rept. Statistics of Income Division, Internal Revenue Service Publication R99/04. U.S. Bureau of the Census, Washington, DC, USA.
- WINKLER, WILLIAM E. 2004. Methods for Evaluating and Creating Data Quality. Inf. Syst., 29(7), 531-550.
- WINKLER, WILLIAM E., & THIBAUDEAU, YVES. 1991. An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census. Tech. Rept. Statistical Research Report Series RR91/09. U.S. Bureau of the Census, Washington, DC, USA.
- WU, KESHENG, OTOO, EKOW J., & SHOSHANI, ARIE. 2006. Optimizing Bitmap Indices with Efficient Compression. ACM Trans. Database Syst., 31(1), 1-38.
- XIONG, MING, LIANG, BIYU, LAM, KAM-YIU, & GUO, YANG. 2006. Quality of Service Guarantee for Temporal Consistency of Real-Time Transactions. IEEE Trans. Knowl. Data Eng., 18(8), 1097-1110.
- ZHANG, TIAN, RAMAKRISHNAN, RAGHU, & LIVNY, MIRON. 1996. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Pages 103-114 of: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. Montreal, Quebec, Canada.
- ZHAO, XIAOFEI, & HUANG, ZHIQIU. 2006. A Formal Framework for Reasoning on Metadata Based on CWM. Pages 371-384 of: Proceedings of 25th International Conference on Conceptual Modeling, ER 2006. Lecture Notes in Computer Science, vol. 4215. Tucson, AZ, USA.
- ZHU, YUNYUE, & SHASHA, DENNIS. 2002. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. Pages 358-369 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
- ZHUGE, YUE, GARCIA-MOLINA, HECTOR, & WIENER, JANET L. 1997. Multiple View Consistency for Data Warehousing. Pages 289-300 of: Proceedings of the 13th International Conference on Data Engineering, ICDE 1997. Birmingham, UK.
- 1 Data Sources Characteristics for MSISs
- 2 Applications and Data Types Coverage
- 1 Taxonomy of Existing Techniques for Entity Resolution
- 2 Correlation Clustering Example from (Bansal et al., 2002)
- 1 CWM Packages
- 2 Relational Metamodel of CRM_DB
- 3 CWM Relational Data Instance Metamodel (OMG, 2003)
- 4 QoD Extension to CWM Data Instance Metamodel
- 5 Example of QoD Metadata Associated to CRM_DB PRODUCT Table
- 2.6 Example of Analytic Workflow for QoD Evaluation of CRM_DB
- 7 Syntax of Quality Contract Type Creation
- 8 Syntax of Call Specification in Quality Contract Type Declaration
- 9 Syntax of Quality Contract Creation
- 10 Checking QoD Constraints on CRM_DB Granularity Levels
- 11 Syntax of QWITH Queries
- 12 Quality-Extended Query Processing
- 1 General Framework of Data Quality Awareness for the KDD Process
- 3.2 Different Levels for Measuring Data Quality
- 3 Classification Probabilities
- 4 Decision Areas for Rule Post-Selection
- 5 Decision Costs for Rule Selection with a Priori Probability in [0.1,0.5] without Misclassification
- 6 Decision Costs for Rule Selection with Different Data Quality Variations without Misclassification for the a Priori Probability ¼ ¼ ¾¼¼
- 7 Amplitude of Cost Variations Depending on Data Quality Variations without Misclassification for the a Priori Probability ¼ ¼ ¾¼¼
- 8 Decision Status on Rule Selection for Data Quality Variations without Misclassification for ¼ ¼ ¾¼¼
- 9 Decision Costs for Rule Selection with a Priori Probability in [0.1,0.5] with Misclassification
- 1 Problems and Current Solutions for Data Quality Management
- 2 Problem Setting
- 3 Decision Models for Handling Duplicates (Batini & Scannapieco, 2006)
- 4 Main Data Transformation Operators for ETL
- 5 Data Cleaning Prototypes
- 1 CRM_DB Example
- 2 Join over FK Attributes Considered as Correct
- 3 Join over Incorrect FK Attributes
- 4 Join over Non-FK Attributes Considered as Correct
- 5 Join over Incorrect Non-FK Attributes
- 6 Join over Deduplicated FK Attributes
- 7 Join over Deduplicated Non-FK Attributes
- 8 Examples of Level I Basic Functions
- 9 Examples of Level II Functions
- 10 Example of Statistics for Specifying SCs on Attribute Values
- 11 Examples of Level III Synopses Techniques
- 12 Examples of Level IV Classification and Partitioning Methods
- 13 Descriptive Metadata for Partitioning Methods
- 14 Assigning Probabilities to QoD Dimensions for a DB Object Instance
- 15 Range-Encoded Bitmap Index with Binning for QoD Measures
- 16 Examples of Quality Contract Type Declaration
- 17 Example of Contract Declaration
- 18 Examples of Acceptability Values per QoD Dimension
- 19 Examples of Simple QWITH Queries with EXACT Constraint Checking Mode
- 20 Examples of Join QWITH Queries with EXACT Constraint Checking Mode
- 21 Examples of QWITH Queries in the EXACT Constraint Checking Mode
- 3.1 Marketing Example
- 2 Fusion Function Examples for Scoring Quality Dimensions of Association Rules
- 3 Example of Costs of Various Decisions for Classifying Association Rules Based on Data Quality