Quality awareness for managing and mining data

Abstract

[Figure: axes Autonomy and Heterogeneity (tick labels: no, semi, totally, yes); systems plotted: DIS, DW & MIS, VMS, CIS, RS, P2P]

Contents

  2.1 Introduction
  2.2 Illustrative Example
  2.3 General Approach
  2.4 Modeling Quality Metadata
    2.4.1 The CWM Metamodel
    2.4.2 CWM Extension for QoD Metadata Management
  2.5 Computing Quality Metadata
    2.5.1 Level I: QoD Profiling Functions
    2.5.2 Level II: QoD Constraint-Based Functions
    2.5.3 Level III: QoD Synopses Functions
    2.5.4 Level IV: QoD Mining Functions
    2.5.5 Designing Analytic Workflows for QoD Evaluation
    2.5.6 Computing and Assigning Probabilities to QoD Dimensions
  2.6 Indexing Quality Metadata
    2.6.1 Requirements
    2.6.2 Range-Encoded Bitmap Index for QoD Measures
  2.7 Extending the Syntax of a Query Language
    2.7.1 Declaration of Quality Requirements
    2.7.2 Manipulation of Data and Quality Metadata
    2.7.3 Quality-Extended Query Processing
  2.8 Conclusion
    2.8.1 Summary
    2.8.2 Research Perspectives

  4.1 Introduction
  4.2 Quality-Aware Integration of Biomedical Data
    4.2.1 Problem Statement
    4.2.2 Related Work
    4.2.3 Contributions and Perspectives
  4.3 Quality-Driven Query in Mediation Systems
    4.3.1 Problem Statement
    4.3.2 Related Work
    4.3.3 Contributions and Perspectives
  4.4 Monitoring the Quality of Stream Data
    4.4.1 Problem Statement
    4.4.2 Prospective Work
  4.5 Conclusion
  32. GIMS, http://www.cs.man.ac.uk/img/gims/
  33. DataFoundry, http://www.llnl.gov/CASC/datafoundry/ 3 TAMBIS, http://imgproj.cs.man.ac.uk/tambis/
  34. P/FDM, http://www.csd.abdn.ac.uk/ gjlk/mediator/
  35. DiscoveryLink, http://www.research.ibm.com/journal/sj/402/haas.html 6 EMBL, European Molecular Biology Laboratory: http://www.embl-heidelberg.de/ 7 Febrl, http://datamining.anu.edu.au/software/febrl/febrldoc/ 8 NCBI References Sequences http://www.ncbi.nlm.nih.gov/RefSeq/ Bibliography AGGARWAL, CHARU. 2007. Data Streams: Models and Algorithms. Springer. AGRAWAL, RAKESH, IMIELINSKI, TOMASZ, & SWAMI, ARUN N. 1993. Mining Association Rules Between Sets of Items in Large Databases. Pages 207-216 of: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington, DC, USA.
  36. ANANTHAKRISHNA, ROHIT, CHAUDHURI, SURAJIT, & GANTI, VENKATESH. 2002. Eliminating Fuzzy Duplicates in Data Warehouses. Pages 586-597 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
  37. ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 1999. Con- sistent Query Answers in Inconsistent Databases. Pages 68-79 of: Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Philadelphia, PA, USA.
  38. ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2000. Speci- fying and Querying Database Repairs Using Logic Programs with Exceptions. Pages 27-41 of: Proceedings of the 4th International Conference on Flexible Query Answering Systems, FQAS 2000. Warsaw, Poland.
  39. ARENAS, MARCELO, BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2003. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Prac- tice of Logic Programming (TPLP), 3(4-5), 393-424.
  40. BALLOU, DONALD P., & TAYI, GIRI KUMAR. 1989. Methodology for Allocating Resources for Data Quality Enhancement. Commun. ACM, 32(3), 320-329.
  41. BALLOU, DONALD P., & TAYI, GIRI KUMAR. 1999. Enhancing Data Quality in Data Warehouse Environments. Commun. ACM, 42(1), 73-78.
  42. BANSAL, NIKHIL, BLUM, AVRIM, & CHAWLA, SHUCHI. 2002. Correlation Clus- tering. Page 238 of: Proceedings of 43rd Symposium on Foundations of Computer Science, FOCS 2002. Vancouver, BC, Canada.
  43. BARBARÁ, DANIEL, GARCIA-MOLINA, HECTOR, & PORTER, DARYL. 1990. A Probalilistic Relational Data Model. Pages 60-74 of: Proceedings of the 2nd Inter- national Conference on Extending Database Technology, EDBT 1990. Lecture Notes in Computer Science, vol. 416. Venice, Italy.
  44. BARGA, ROGER S., & PU, CALTON. 1993. Accessing Imprecise Data: An Approach Based on Intervals. IEEE Data Eng. Bull., 16(2), 12-15.
  45. BASU, AYANENDRANATH, HARRIS, IAN R., & BASU, SRABASHI. 1997. Minimum Distance Estimation: The Approach Using Density-Based Distances. Handbook of Statistics, 15, 21-48.
  46. BATINI, CARLO, & SCANNAPIECO, MONICA. 2006. Data Quality: Concepts, Method- ologies and Techniques. Data-Centric Systems and Applications. Springer-Verlag.
  47. BATINI, CARLO, TIZIANA, CATARCI, & SCANNAPIECO, MONICA. 2004. A Survey of Data Quality Issues in Cooperative Systems. In: Tutorial of the 23rd Interna- tional Conference on Conceptual Modeling, ER 2004. Shanghai, China.
  48. BAXTER, ROHAN A., CHRISTEN, PETER, & CHURCHES, TIM. 2003. A Comparison of Fast Blocking Methods for Record Linkage. Pages 27-29 of: Proceedings of the KDD'03 Workshop on Data Cleaning, Record Linkage and Object Consolidation. Washington, DC, USA.
  49. BENJELLOUN, OMAR, SARMA, ANISH DAS, HALEVY, ALON, & WIDOM, JEN- NIFER. 2005 (June). The Symbiosis of Lineage and Uncertainty. Technical Report 2005-39. Stanford InfoLab, Stanford University, CA, USA.
  50. BERENGUER, GEMA, ROMERO, RAFAEL, TRUJILLO, JUAN, SERRANO, MANUEL, & PIATTINI, MARIO. 2005. A Set of Quality Indicators and Their Corresponding Metrics for Conceptual Models of Data Warehouses. Pages 95-104 of: Proceed- ings of the 7th International Conference on Data Warehousing and Knowledge Discov- ery, DaWaK 2005. Lecture Notes in Computer Science, vol. 3589. Copenhagen, Denmark.
  51. BERNSTEIN, PHILIP A., BERGSTRAESSER, THOMAS, CARLSON, JASON, PAL, SHANKAR, SANDERS, PAUL, & SHUTT, DAVID. 1999. Microsoft Repository Ver- sion 2 and the Open Information Model. Inf. Syst., 24(2), 71-98.
  52. BERTI-ÉQUILLE, LAURE. 1999b. Qualité des données multi-sources et recomman- dation multi-critère. Pages 185-204 of: Actes du congrès francophone INFormatique des ORganisations et Systèmes d'Information Décisionnels, INFORSID 1999. Toulon, France.
  53. BERTI-ÉQUILLE, LAURE. 1999c. Quality and Recommendation of Multi-Source Data for Assisting Technological Intelligence Applications. Pages 282-291 of: Proceedings of the International Conference on Database and Expert Systems Applica- tions, (DEXA'99). Lecture Notes in Computer Science, vol. 1677. Florence, Italy.
  54. BERTI-ÉQUILLE, LAURE. 2001. Integration of Biological Data and Quality-driven Source Negotiation. Pages 256-269 of: Proceedings of the 20th International Confer- ence on Conceptual Modeling, ER'2001. Lecture Notes in Computer Science, vol. 2224. Yokohama, Japan.
  55. BERTI-ÉQUILLE, LAURE. 2002. Annotation et recommandation collaboratives de documents selon leur qualité. Revue Ingénieire des Systèmes d'Information (ISI- NIS), Numéro Spécial "Recherche et Filtrage d'Information", 7(1-2/2002), 125-156.
  56. BERTI-ÉQUILLE, LAURE. 2003. Quality-Extended Query Processing for Distributed Sources. In: Proceedings of the 1rst International Workshop on Data Quality in Coop- erative Information Systems, DQCIS'2003. Siena, Italy.
  57. BERTI-ÉQUILLE, LAURE. 2003a. Quality-based Recommendation of XML Docu- ments. Journal of Digital Information Management, 1(3), 117-128.
  58. BERTI-ÉQUILLE, LAURE. 2004. Quality-Adaptive Query Processing over Dis- tributed Sources. Pages 285-296 of: Proceedings of the 9th International Conference on Information Quality, ICIQ 2004. Massachusetts Institute of Technology, Cam- bridge, MA, USA.
  59. BERTI-ÉQUILLE, LAURE, & MOUSSOUNI, FOUZIA. 2005. Quality-Aware Integra- tion and Warehousing of Genomic Data. Pages 442-454 of: Proceedings of the 10th International Conference on Information Quality, ICIQ 2005. Massachusetts Institute of Technology, Cambridge, MA, USA.
  60. BERTI-ÉQUILLE, LAURE, MOUSSOUNI, FOUZIA, & ARCADE, ANNE. 2001. Inte- gration of Biological Data on Transcriptome. Revue ISI-NIS, Numéro Spécial In- teropérabilité et Intégration des Systèmes d'Information, 6(3/2001), 61-86.
  61. BERTI-ÉQUILLE, LAURE. 2006a. Data Quality Awareness: a Case Study for Cost- Optimal Association Rule Mining. Knowl. Inf. Syst., 11(2), 191-215.
  62. BERTI-ÉQUILLE, LAURE. 2006b. Qualité des données. Techniques de l'Ingénieur, H3700, 1-19.
  63. BERTI-ÉQUILLE, LAURE. 2006c. Quality-Aware Association Rule Mining. Pages 440-449 of: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, (PAKDD 2006). Lecture Notes in Artificial Intelligence, vol. 3918. Springer.
  64. BERTOSSI, LEOPOLDO E., & BRAVO, LORETO. 2005. Consistent Query Answers in Virtual Data Integration Systems. Pages 42-83 of: Inconsistency Tolerance, Dagstuhl Seminar. Lecture Notes in Computer Science, vol. 3300. Schloss Dagstuhl, Germany.
  65. BERTOSSI, LEOPOLDO E., & CHOMICKI, JAN. 2003. Query Answering in Incon- sistent Databases. Pages 43-83 of: Logics for Emerging Applications of Databases, Dagstuhl Seminar. Schloss Dagstuhl, Germany.
  66. BERTOSSI, LEOPOLDO E., & SCHWIND, CAMILLA. 2004. Database Repairs and Analytic Tableaux. Ann. Math. Artif. Intell., 40(1-2), 5-35.
  67. BHATTACHARYA, INDRAJIT, & GETOOR, LISE. 2004. Iterative Record Linkage for Cleaning and Integration. Pages 11-18 of: Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004. Paris, France.
  68. BILENKO, MIKHAIL, & MOONEY, RAYMOND J. 2003. Adaptive Duplicate Detec- tion Using Learnable String Similarity Measures. Pages 39-48 of: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA.
  69. BILENKO, MIKHAIL, BASU, SUGATO, & SAHAMI, MEHRAN. 2005. Adaptive Prod- uct Normalization: Using Online Learning for Record Linkage in Comparison Shopping. Pages 58-65 of: Proceedings of the 5th IEEE International Conference on Data Mining, ICDM 2005. Houston, TX, USA.
  70. BILKE, ALEXANDER, BLEIHOLDER, JENS, BÖHM, CHRISTOPH, DRABA, KARSTEN, NAUMANN, FELIX, & WEIS, MELANIE. 2005. Automatic Data Fusion with Hum- Mer. Pages 1251-1254 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
  71. BOSC, PATRICK, LIETARD, NADIA, & PIVERT, OLIVIER. 2006. About Inclusion- Based Generalized Yes/No Queries in a Possibilistic Database Context. Pages 284-289 of: Proceedings of the 16th International Symposium on Foundations of Intel- ligent Systems, ISMIS 2006. Bari, Italy.
  72. BOUZEGHOUB, MOKRANE, & PERALTA, VERÓNIKA. 2004. A Framework for Anal- ysis of Data Freshness. Pages 59-67 of: Proceedings of the 1st International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
  73. BRAUMANDL, REINHARD, KEIDL, MARKUS, KEMPER, ALFONS, KOSSMANN, DONALD, SELTZSAM, STEFAN, & STOCKER, KONRAD. 2001. ObjectGlobe: Open Distributed Query Processing Services on the Internet. IEEE Data Eng. Bull., 24(1), 64-70.
  74. BRAVO, LORETO, & BERTOSSI, LEOPOLDO E. 2003. Logic Programs for Consis- tently Querying Data Integration Systems. Pages 10-15 of: Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI-03. Acapulco, Mexico.
  75. BREUNIG, MARKUS M., KRIEGEL, HANS-PETER, NG, RAYMOND T., & SANDER, JÖRG. 2000. LOF: Identifying Density-Based Local Outliers. Pages 93-104 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA.
  76. BRIGHT, LAURA, & RASCHID, LOUIQA. 2002. Using Latency-Recency Profiles for Data Delivery on the Web. Pages 550-561 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
  77. BRY, FRANÇOIS. 1997. Query Answering in Information Systems with Integrity Constraints. Pages 113-130 of: Integrity and Internal Control in Information Sys- tems, IFIP TC11 Working Group 11.5, First Working Conference on Integrity and Inter- nal Control in Information Systems: Increasing the confidence in Information Systems, IICIS. Zurich, Switzerland.
  78. BUECHI, MARTIN, BORTHWICK, ANDREW, WINKEL, ADAM, & GOLDBERG, ARTHUR. 2003. ClueMaker: A Language for Approximate Record Matching. Pages 207-223 of: Proceedings of the 8th International Conference on Information Quality, ICIQ 2003. MIT, Cambridge, MA, USA.
  79. CALÌ, ANDREA, LEMBO, DOMENICO, & ROSATI, RICCARDO. 2003. On the Decid- ability and Complexity of Query Answering Over Inconsistent and Incomplete Databases. Pages 260-271 of: Proceedings of the 22nd ACM SIGACT-SIGMOD- SIGART Symposium on Principles of Database Systems, PODS. San Diego, CA, USA. CARREIRA, PAULO J. F., & GALHARDAS, HELENA. 2004. Execution of Data Map- pers. Pages 2-9 of: Proceedings of the 1st International ACM SIGMOD 2004 Work- shop on Information Quality in Information Systems, IQIS 2004. Paris, France.
  80. CARUSO, FRANCESCO, COCHINWALA, MUNIR, GANAPATHY, UMA, LALK, GAIL, & MISSIER, PAOLO. 2000. Telcordia's Database Reconciliation and Data Quality Analysis Tool. Pages 615-618 of: Proceedings of 26th International Conference on Very Large Data Bases, VLDB 2000. Cairo, Egypt.
  81. CAVALLO, ROGER, & PITTARELLI, MICHAEL. 1987. The Theory of Probabilistic Databases. Pages 71-81 of: Proceedings of 13th International Conference on Very Large Data Bases, VLDB 1987. Brighton, England.
  82. CERI, STEFANO, COCHRANE, ROBERTA, & WIDOM, JENNIFER. 2000. Practical Applications of Triggers and Constraints: Success and Lingering Issues. Pages 254-262 of: Proceedings of 26th International Conference on Very Large Data Bases, VLDB 2000. Cairo, Egypt.
  83. CHARIKAR, MOSES, GURUSWAMI, VENKATESAN, & WIRTH, ANTHONY. 2003. Clustering with Qualitative Information. Pages 524-533 of: Proceedings of 44th Symposium on Foundations of Computer Science, FOCS 2003. Cambridge, MA, USA. CHAUDHURI, SURAJIT, GANJAM, KRIS, GANTI, VENKATESH, & MOTWANI, RA- JEEV. 2003. Robust and Efficient Fuzzy Match for Online Data Cleaning. Pages 313-324 of: Proceedings of the 2003 ACM SIGMOD International Conference on Man- agement of Data. San Diego, CA, USA.
  84. CHAUDHURI, SURAJIT, GANTI, VENKATESH, & MOTWANI, RAJEEV. 2005. Robust Identification of Fuzzy Duplicates. Pages 865-876 of: Proceedings of the 21st In- ternational Conference on Data Engineering, ICDE 2005. Tokyo, Japan.
  85. CHAUDHURI, SURAJIT, GANTI, VENKATESH, & KAUSHIK, RAGHAV. 2006. A Prim- itive Operator for Similarity Joins in Data Cleaning. Page 5 of: Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006. Atlanta, GA, USA.
  86. CHENG, REYNOLD, KALASHNIKOV, DMITRI V., & PRABHAKAR, SUNIL. 2003. Evaluating Probabilistic Queries over Imprecise Data. Pages 551-562 of: Pro- ceedings of the 2003 ACM SIGMOD International Conference on Management of Data. San Diego, CA, USA.
  87. CHO, JUNGHOO, & GARCIA-MOLINA, HECTOR. 2000. Synchronizing a Database to Improve Freshness. Pages 117-128 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA.
  88. CHOENNI, SUNIL, BLOK, HENK ERNST, & LEERTOUWER, ERIK. 2006. Handling Uncertainty and Ignorance in Databases: A Rule to Combine Dependent Data. Pages 295-309 of: Proceedings of 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006. Lecture Notes in Computer Science, vol. 3882. Singapore.
  89. CHOMICKI, JAN. 2006. Consistent Query Answering: Opportunities and Limi- tations. Pages 527-531 of: Proceedings of 2nd International Workshop on Logical Aspects and Applications of Integrity Constraints, LAAIC 2006. Krakow, Poland.
  90. CHRISTEN, PETER, CHURCHES, TIM, & HEGLAND, MARKUS. 2004. Febrl -A Par- allel Open Source Data Linkage System. Pages 638-647 of: Proceedings of the 8th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2004. Lecture Notes in Computer Science, vol. 3056. Sydney, Australia.
  91. COHEN, WILLIAM W., RAVIKUMAR, PRADEEP, & FIENBERG, STEPHEN E. 2003. A Comparison of String Distance Metrics for Name-Matching Tasks. Pages 73-78 of: Proceedings of IJCAI-03 Workshop on Information Integration on the Web, IIWeb- 03. Acapulco, Mexico.
  92. COULON, CÉDRIC, PACITTI, ESTHER, & VALDURIEZ, PATRICK. 2005. Consistency Management for Partial Replication in a High Performance Database Cluster. Pages 809-815 of: Proceedings of 11th International Conference on Parallel and Dis- tributed Systems, ICPADS 2005, vol. 1. Fuduoka, Japan.
  93. CUI, YINGWEI, & WIDOM, JENNIFER. 2003. Lineage Tracing for General Data Warehouse Transformations. VLDB J., 12(1), 41-58.
  94. CULOTTA, ARON, & MCCALLUM, ANDREW. 2005. Joint Deduplication of Multiple Record Types in Relational Data. Pages 257-258 of: Proceedings of the 2005 ACM International Conference on Information and Knowledge Management, CIKM 2005. Bremen, Germany.
  95. DALVI, NILESH N., & SUCIU, DAN. 2004. Efficient Query Evaluation on Proba- bilistic Databases. Pages 864-875 of: Proceedings of the 30th International Confer- ence on Very Large Data Bases, VLDB 2004. Toronto, ON, Canada.
  96. DASU, TAMRAPARNI, & JOHNSON, THEODORE. 2003. Exploratory Data Mining and Data Cleaning. John Wiley.
  97. DASU, TAMRAPARNI, JOHNSON, THEODORE, MUTHUKRISHNAN, S., & SHKAPENYUK, VLADISLAV. 2002. Mining Database Structure or How To Build a Data Quality Browser. Pages 240-251 of: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, WI, USA.
  98. DASU, TAMRAPARNI, VESONDER, GREGG T., & WRIGHT, JON R. 2003. Data Qual- ity Through Knowledge Engineering. Pages 705-710 of: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003. Washington, DC, USA.
  99. DEMPSTER, ARTHUR PENTLAND, LAIRD, NAN M., & RUBIN, DONALD B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39, 1-38.
  100. DOMINGOS, PEDRO, & HULTEN, GEOFF. 2001. Catching up with the Data: Re- search Issues in Mining Data Streams. In: Proceedings of 2001 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD2001. Santa Barbara, CA, USA.
  101. DONG, XIN, HALEVY, ALON Y., & MADHAVAN, JAYANT. 2005. Reference Recon- ciliation in Complex Information Spaces. Pages 85-96 of: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, MD, USA. DUMOUCHEL, WILLIAM, VOLINSKY, CHRIS, JOHNSON, THEODORE, CORTES, CORINNA, & PREGIBON, DARYL. 1999. Squashing Flat Files Flatter. Pages 6-15 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Dis- covery and Data Mining, KDD 1999. San Diego, CA, USA.
  102. ELFEKY, MOHAMED G., ELMAGARMID, AHMED K., & VERYKIOS, VASSILIOS S. 2002. TAILOR: A Record Linkage Tool Box. Pages 17-28 of: Proceedings of the 18th International Conference on Data Engineering, ICDE 2002. San Jose, CA, USA.
  103. ELMAGARMID, AHMED K., IPEIROTIS, PANAGIOTIS G., & VERYKIOS, VASSIL- IOS S. 2007. Duplicate Record Detection: A Survey. IEEE Trans. Knowl. Data Eng., 19(1), 1-16.
  104. EMBURY, SUZANNE M., BRANDT, SUE M., ROBINSON, JOHN S., SUTHERLAND, IAIN, BISBY, FRANK A., GRAY, W. ALEX, JONES, ANDREW C., & WHITE, RICHARD J. 2001. Adapting Integrity Enforcement Techniques for Data Rec- onciliation. Inf. Syst., 26(8), 657-689.
  105. ENGLISH, LARRY. 2002. Process Management and Information Quality: How Improving Information Production Processes Improves Information (Product) Quality. Pages 206-209 of: Proceedings of the Seventh International Conference on Information Quality, ICIQ 2002. MIT, Cambridge, MA, USA.
  106. ENGLISH, LARRY P. 1999. Improving Data Warehouse and Business Information Qual- ity. Wiley.
  107. FAGIN, RONALD, KOLAITIS, PHOKION G., MILLER, RENÉE J., & POPA, LUCIAN. 2003. Data Exchange: Semantics and Query Answering. Pages 207-224 of: Pro- ceedings of 9th International Conference on Database Theory, ICDT 2003. Lecture Notes in Computer Science, vol. 2572. Siena, Italy.
  108. FALOUTSOS, CHRISTOS. 2002. Sensor Data Mining: Similarity Search and Pattern Analysis. In: Tutorial of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
  109. FALOUTSOS, CHRISTOS, & LIN, KING-IP. 1995. FastMap: A Fast Algorithm for In- dexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Pages 163-174 of: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, CA, USA.
  110. FELLEGI, IVAN P., & SUNTER, A.B. 1969. A Theory for Record Linkage. Journal of the American Statistical Association, 64, 1183-1210.
  111. FLESCA, SERGIO, FURFARO, FILIPPO, & PARISI, FRANCESCO. 2005. Consistent Query Answers on Numerical Databases Under Aggregate Constraints. Pages 279-294 of: Proceedings of 10th International Symposium on Database Programming Languages, DBPL 2005. Trondheim, Norway.
  112. FOX, CHRISTOPHER J., LEVITIN, ANANY, & REDMAN, THOMAS. 1994. The Notion of Data and Its Quality Dimensions. Inf. Process. Manage., 30(1), 9-20.
  113. GALHARDAS, HELENA, FLORESCU, DANIELA, SHASHA, DENNIS, & SIMON, ERIC. 2000. AJAX: An Extensible Data Cleaning Tool. Page 590 of: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX, USA. GALHARDAS, HELENA, FLORESCU, DANIELA, SHASHA, DENNIS, SIMON, ERIC, & SAITA, CRISTIAN-AUGUSTIN. 2001. Declarative Data Cleaning: Language, Model, and Algorithms. Pages 371-380 of: Proceedings of 27th International Con- ference on Very Large Data Bases, VLDB 2001. Roma, Italy.
  114. GELENBE, EROL, & HÉBRAIL, GEORGES. 1986. A Probability Model of Uncertainty in Data Bases. Pages 328-333 of: Proceedings of the Second International Conference on Data Engineering, ICDE 1986. Los Angeles, CA, USA.
  115. GRAHNE, GÖSTA. 2002. Information Integration and Incomplete Information. IEEE Data Eng. Bull., 25(3), 46-52.
  116. GRAVANO, LUIS, IPEIROTIS, PANAGIOTIS G., JAGADISH, H. V., KOUDAS, NICK, MUTHUKRISHNAN, S., PIETARINEN, LAURI, & SRIVASTAVA, DIVESH. 2001. Us- ing q-grams in a DBMS for Approximate String Processing. IEEE Data Eng. Bull., 24(4), 28-34.
  117. GRAVANO, LUIS, IPEIROTIS, PANAGIOTIS G., KOUDAS, NICK, & SRIVASTAVA, DI- VESH. 2003. Text Joins for Data Cleansing and Integration in an RDBMS. Pages 729-731 of: Proceedings of the 19th International Conference on Data Engineering, ICDE 2003. Bangalore, India.
  118. GUÉRIN, EMILIE, MOUSSOUNI, FOUZIA, & BERTI-ÉQUILLE, LAURE. 2001. Inté- gration des données sur le transcriptome. Pages 219-228 of: Actes de la journée de travail bi-thématique du GDR-PRC I3. Lyon, France.
  119. GUÉRIN, EMILIE, MARQUET, GWENAELLE, BURGUN, ANITA, LORÉAL, OLIVIER, BERTI-ÉQUILLE, LAURE, LESER, ULF, & MOUSSOUNI, FOUZIA. 2005. Integrat- ing and Warehousing Liver Gene Expression Data and Related Biomedical Re- sources in GEDAW. Pages 158-174 of: Proceedings of the 2nd International Work- shop on Data Integration in the Life Sciences, DILS 2005. San Diego, CA, USA.
  120. GUHA, SUDIPTO, RASTOGI, RAJEEV, & SHIM, KYUSEOK. 2001. Cure: An Efficient Clustering Algorithm for Large Databases. Inf. Syst., 26(1), 35-58.
  121. GUO, HONGFEI, LARSON, PER-ÅKE, RAMAKRISHNAN, RAGHU, & GOLDSTEIN, JONATHAN. 2004. Relaxed Currency and Consistency: How to Say "Good Enough" in SQL. Pages 815-826 of: Proceedings of the ACM SIGMOD Interna- tional Conference on Management of Data. Paris, France.
  122. GUO, HONGFEI, LARSON, PER-ÅKE, & RAMAKRISHNAN, RAGHU. 2005. Caching with 'Good Enough' Currency, Consistency, and Completeness. Pages 457-468 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
  123. HALEVY, ALON Y. 2001. Answering Queries Using Views: A Survey. VLDB J., 10(4), 270-294.
  124. HERNÁNDEZ, MAURICIO A., & STOLFO, SALVATORE J. 1998. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem. Data Min. Knowl. Discov., 2(1), 9-37.
  125. HJALTASON, GÍSLI R., & SAMET, HANAN. 2003. Properties of Embedding Meth- ods for Similarity Searching in Metric Spaces. IEEE Trans. Pattern Anal. Mach. Intell., 25(5), 530-549.
  126. HOU, WEN-CHI, & ZHANG, ZHONGYANG. 1995. Enhancing Database Correct- ness: a Statistical Approach. Pages 223-232 of: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, CA, USA.
  127. HULL, RICHARD, & ZHOU, GANG. 1996. A Framework for Supporting Data In- tegration Using the Materialized and Virtual Approaches. Pages 481-492 of: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. Montreal, Quebec, Canada.
  128. HULTEN, GEOFF, SPENCER, LAURIE, & DOMINGOS, PEDRO. 2001. Mining time- changing data streams. Pages 97-106 of: Proceedings of the 7th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2001. San Francisco, CA, USA.
  129. HUNG, EDWARD, GETOOR, LISE, & SUBRAHMANIAN, V. S. 2003. PXML: A Prob- abilistic Semistructured Data Model and Algebra. Page 467 of: Proceedings of the 19th International Conference on Data Engineering, ICDE'03. Bangalore, India.
  130. IBRAHIM, HAMIDAH. 2002. A Strategy for Semantic Integrity Checking in Dis- tributed Databases. Pages 139-144 of: Proceedings of 9th International Conference on Parallel and Distributed Systems, ICPADS 2002. Taiwan, ROC.
  131. IMIELINSKI, TOMASZ, & LIPSKI, WITOLD JR. 1984. Incomplete Information in Relational Databases. J. ACM, 31(4), 761-791.
  132. JARKE, MATTHIAS, JEUSFELD, MANFRED A., QUIX, CHRISTOPH, & VASSILIADIS, PANOS. 1999. Architecture and Quality in Data Warehouses: An Extended Repository Approach. Inf. Syst., 24(3), 229-253.
  133. JARO, MATTHEW A. 1989. Advances in Record Linking Methodology as Applied to the 1985 Census of Tampa Florida. Journal of the American Statistical Society, 64, 1183-1210.
  134. JARO, MATTHEW A. 1995. Probabilistic Linkage of Large Public Health Data File. Statistics in Medicine, 14, 491-498.
  135. KAHN, BEVERLY K., STRONG, DIANE M., & WANG, RICHARD Y. 2002. Informa- tion Quality Benchmarks: Product and Service Performance. Commun. ACM, 45(4), 184-192.
  136. KALASHNIKOV, DMITRI V. & MEHROTRA, SHARAD. 2006. Domain-Independent Data Cleaning via Analysis of Entity-Relationship Graph. ACM Transactions on Database Systems, 31(2), 716-767.
  137. KARAKASIDIS, ALEXANDROS, VASSILIADIS, PANOS, & PITOURA, EVAGGELIA. 2005. ETL Queues for Active Data Warehousing. Pages 28-39 of: Proceedings of the 2nd International ACM SIGMOD 2005 Workshop on Information Quality in Information Systems, IQIS 2005. Baltimore, MA, USA.
  138. KAUFMAN, L., & ROUSSEEUW, PETER J. 1990. Finding Groups in Data: An Introduc- tion to Cluster Analysis. John Wiley.
  139. KNORR, EDWIN M., & NG, RAYMOND T. 1998. Algorithms for Mining Distance- Based Outliers in Large Datasets. Pages 392-403 of: Proceedings of 24rd Interna- tional Conference on Very Large Data Bases, VLDB 1998. New York City, NY, USA.
  140. KORN, FLIP, MUTHUKRISHNAN, S., & ZHU, YUNYUE. 2003. Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases. Pages 536-547 of: Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003. Berlin, Germany.
  141. LABRINIDIS, ALEXANDROS, & ROUSSOPOULOS, NICK. 2003. Balancing Perfor- mance and Data Freshness in Web Database Servers. Pages 393-404 of: Proceed- ings of 29th International Conference on Very Large Data Bases, VLDB 2003. Berlin, Germany.
  142. LACROIX, ZOE, & CRITCHLOW, TERENCE (eds). 2003. Bioinformatics: Managing Scientific Data. Morgan Kaufmann.
  143. LAKSHMANAN, LAKS V. S., & SADRI, FEREIDOON. 1994. Modeling Uncertainty in Deductive Databases. Pages 724-733 of: Proceedings of the 5th International Conference on Database and Expert Systems Applications, DEXA'94. Lecture Notes in Computer Science, vol. 856. Athens, Greece.
  144. LAVRA Č, NADA, FLACH, PETER A., & ZUPAN, BLAZ. 1999. Rule Evaluation Mea- sures: A Unifying View. Pages 174-185 of: Proceedings of the Intl. Workshop on Inductive Logic Programming, ILP 1999. Bled, Slovenia.
  145. LAZARIDIS, IOSIF, & MEHROTRA, SHARAD. 2004. Approximate Selection Queries over Imprecise Data. Pages 140-152 of: Proceedings of the 20th International Con- ference on Data Engineering, ICDE 2004. Boston, MA, USA.
  146. LEE, LILLIAN. 2001. On the Effectiveness of the Skew Divergence for Statistical Language Analysis. Artificial Intelligence and Statistics, 65-72.
  147. LEE, MONG-LI, HSU, WYNNE, & KOTHARI, VIJAY. 2004. Cleaning the Spurious Links in Data. IEEE Intelligent Systems, 19(2), 28-33.
  148. LEE, SUK KYOON. 1992. An Extended Relational Database Model for Uncertain and Imprecise Information. Pages 211-220 of: Proceedings of the 18th International Conference on Very Large Data Bases, VLDB 1992. Vancouver, Canada.
  149. LEMBO, DOMENICO, LENZERINI, MAURIZIO, & ROSATI, RICCARDO. 2002. Source Inconsistency and Incompleteness in Data Integration. In: Proceedings of the 9th International Workshop on Knowledge Representation meets Databases, KRDB 2002, vol. 54. Toulouse, France.
  150. LI, CHEN. 2003. Computing Complete Answers to Queries in the Presence of Lim- ited Access Patterns. VLDB J., 12(3), 211-227.
  151. LI, WEN-SYAN, PO, OLIVER, HSIUNG, WANG-PIN, CANDAN, K. SELÇUK, & AGRAWAL, DIVYAKANT. 2003. Freshness-Driven Adaptive Caching for Dy- namic Content Web Sites. Data Knowl. Eng., 47(2), 269-296.
  152. LIEPINS, GUNAR E., & UPPULURI, V. R. 1991. Data Quality Control: Theory and Pragmatics. New York, NY, USA: Marcel Dekker, Inc. 0-8247-8354-9.
  153. LIM, EE-PENG, SRIVASTAVA, JAIDEEP, PRABHAKAR, SATYA, & RICHARDSON, JAMES. 1993. Entity Identification in Database Integration. Pages 294-301 of: Proceedings of the 9th International Conference on Data Engineering, ICDE 1993. Vi- enna, Austria.
  154. LIN, JINXIN, & MENDELZON, ALBERTO O. 1998. Merging Databases Under Con- straints. Int. J. Cooperative Inf. Syst., 7(1), 55-76.
  155. LOSHIN, D. 2001. Enterprise Knowledge Management: The Data Quality Approach. Morgan Kaufmann.
  156. LOW, WAI LUP, LEE, MONG-LI, & LING, TOK WANG. 2001. A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning. Inf. Syst., 26(8), 585-606.
  157. MANNINO, MICHAEL V., CHU, PAICHENG, & SAGER, THOMAS. 1988. Statistical Profile Estimation in Database Systems. ACM Comput. Surv., 20(3), 191-221.
  158. MARQUET, GWENAELLE, BURGUN, ANITA, MOUSSOUNI, FOUZIA, GUÉRIN, EM- ILIE, LE DUFF, FRANCK, & LORÉAL, OLIVIER. 2003. BioMeKe: an Ontology- Based Biomedical Knowledge Extraction System Devoted to Transcriptome Analysis. Studies in Health Technology and Informatics, 95, 80-86.
  159. MARTINEZ, ALEXANDRA, & HAMMER, JOACHIM. 2005. Making Quality Count in Biological Data Sources. Pages 16-27 of: Proceedings of the 2nd International ACM SIGMOD 2005 Workshop on Information Quality in Information Systems, IQIS 2005. Baltimore, MA, USA.
  160. MCCALLUM, ANDREW, NIGAM, KAMAL, & UNGAR, LYLE H. 2000. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Pages 169-178 of: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000. Boston, MA, USA.
  161. MCCALLUM, ANDREW, BELLARE, KEDAR, & PEREIRA, FERNANDO. 2005. A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Pages 388-396 of: Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, UAI'05. Edinburgh, Scotland, UK.
  162. MCCLEAN, SALLY I., SCOTNEY, BRYAN W., & SHAPCOTT, MARY. 2001. Aggregation of Imprecise and Uncertain Information in Databases. IEEE Trans. Knowl. Data Eng., 13(6), 902-912.
  163. MIHAILA, GEORGE A., RASCHID, LOUIQA, & VIDAL, MARIA-ESTHER. 2000. Using Quality of Data Metadata for Source Selection and Ranking. Pages 93-98 of: Proceedings of the 3rd International Workshop on the Web and Databases, WebDB 2000. Dallas, TX, USA.
  164. MONGE, ALVARO E. 2000. Matching Algorithms within a Duplicate Detection System. IEEE Data Eng. Bull., 23(4), 14-20.
  165. MONGE, ALVARO E., & ELKAN, CHARLES. 1996. The Field Matching Problem: Algorithms and Applications. Pages 267-270 of: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, KDD 1996. Portland, OR, USA.
  166. MOTRO, AMIHAI, & ANOKHIN, PHILIPP. 2006. FusionPlex: Resolution of Data Inconsistencies in the Integration of Heterogeneous Information Sources. Information Fusion, 7(2), 176-196.
  167. MOTRO, AMIHAI, & RAKOV, IGOR. 1998. Estimating the Quality of Databases. Pages 298-307 of: Proceedings of the 3rd International Conference on Flexible Query Answering Systems, FQAS'98. Roskilde, Denmark.
  168. MÜLLER, HEIKO, & NAUMANN, FELIX. 2003. Data Quality in Genome Databases. Pages 269-284 of: Proceedings of the 8th International Conference on Information Quality, ICIQ 2003. MIT, Cambridge, MA, USA.
  169. MÜLLER, HEIKO, LESER, ULF, & FREYTAG, JOHANN CHRISTOPH. 2004. Mining for Patterns in Contradictory Data. Pages 51-58 of: Proceedings of the 1st International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
  170. MYLOPOULOS, JOHN, BORGIDA, ALEXANDER, JARKE, MATTHIAS, & KOUBARAKIS, MANOLIS. 1990. Telos: Representing Knowledge About Information Systems. ACM Trans. Inf. Syst., 8(4), 325-362.
  171. NAJJAR, FAÏZA, & SLIMANI, YAHYA. 1999. Cardinality Estimation of Distributed Join Queries. Pages 66-70 of: Proceedings of the 10th International DEXA Workshop on Parallel & Distributed Databases: Innovative Applications & New Architectures. Florence, Italy.
  172. NASH, ALAN, & LUDÄSCHER, BERTRAM. 2004. Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. Pages 422-440 of: Proceedings of the 9th International Conference on Extending Database Technology, EDBT 2004. Lecture Notes in Computer Science, vol. 2992. Heraklion, Crete, Greece.
  173. NAUMANN, FELIX. 2002. Quality-Driven Query Answering for Integrated Information Systems. Lecture Notes in Computer Science, vol. 2261. Springer-Verlag.
  174. NAUMANN, FELIX, LESER, ULF, & FREYTAG, JOHANN CHRISTOPH. 1999. Quality-Driven Integration of Heterogenous Information Systems. Pages 447-458 of: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999. Edinburgh, Scotland, UK.
  175. NAUMANN, FELIX, FREYTAG, JOHANN CHRISTOPH, & LESER, ULF. 2004. Completeness of Integrated Information Sources. Inf. Syst., 29(7), 583-615.
  176. NAVARRO, GONZALO. 2001. A Guided Tour to Approximate String Matching. ACM Comput. Surv., 33(1), 31-88.
  177. NEWCOMBE, H.B., KENNEDY, J.M., AXFORD, S.J., & JAMES, A.P. 1959. Automatic Linkage of Vital Records. Science, 954-959.
  178. NEWCOMBE, HOWARD B., & KENNEDY, JAMES M. 1962. Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information. Commun. ACM, 5(11), 563-566.
  179. OLSON, JACK E. 2003. Data Quality: The Accuracy Dimension. Morgan Kaufmann.
  180. OLSTON, CHRISTOPHER, & WIDOM, JENNIFER. 2005. Efficient Monitoring and Querying of Distributed, Dynamic Data via Approximate Replication. IEEE Data Eng. Bull., 28(1), 11-18.
  181. OMG. 2003 (March). Common Warehouse Metamodel (CWM), Specification Version 1.1. Tech. rept. Object Management Group.
  182. PANG, HWEEHWA, JAIN, ARPIT, RAMAMRITHAM, KRITHI, & TAN, KIAN-LEE. 2005. Verifying Completeness of Relational Query Results in Data Publishing. Pages 407-418 of: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, MD, USA.
  183. PAPADIMITRIOU, SPIROS, SUN, JIMENG, & FALOUTSOS, CHRISTOS. 2005. Streaming Pattern Discovery in Multiple Time-Series. Pages 697-708 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
  184. PARSONS, SIMON. 1996. Current Approaches to Handling Imperfect Information in Data and Knowledge Bases. IEEE Trans. Knowl. Data Eng., 8(3), 353-372.
  185. PASULA, HANNA, MARTHI, BHASKARA, MILCH, BRIAN, RUSSELL, STUART J., & SHPITSER, ILYA. 2002. Identity Uncertainty and Citation Matching. Pages 1401-1408 of: Proceedings of Advances in Neural Information Processing Systems 15, NIPS 2002. Vancouver, BC, Canada.
  186. PEARSON, RONALD K. 2005. Mining Imperfect Data: Dealing with Contamination and Incomplete Records. Philadelphia: SIAM.
  187. PEIM, MARTIN, FRANCONI, ENRICO, & PATON, NORMAN W. 2003. Estimating the Quality of Answers When Querying Over Description Logic Ontologies. Data Knowl. Eng., 47(1), 105-129.
  188. PERALTA, VERÓNIKA. 2006 (November). Data Quality Evaluation in Data Integration Systems. Ph.D. thesis, Université de Versailles, France & Universidad de la República, Uruguay.
  189. PETROPOULOS, MICHALIS, DEUTSCH, ALIN, & PAPAKONSTANTINOU, YANNIS. 2006. Interactive Query Formulation over Web Service-Accessed Sources. Pages 253-264 of: Proceedings of the ACM SIGMOD International Conference on Management of Data. Chicago, IL, USA.
  190. POOLE, JOHN, CHANG, DAN, TOLBERT, DOUGLAS, & MELLOR, DAVID. 2003. Common Warehouse Metamodel Developer's Guide. New York: John Wiley & Sons Inc.
       PRADHAN, SHEKHAR. 2003. Argumentation Databases. Pages 178-193 of: Proceedings of 19th International Conference on Logic Programming, ICLP 2003. Mumbai, India.
  191. PYLE, DORIAN. 1999. Data Preparation for Data Mining. Morgan Kaufmann.
  192. QUASS, DALLAN, & STARKEY, P. 2003. A Comparison of Fast Blocking Methods for Record Linkage. Pages 40-42 of: Proceedings of the KDD 2003 Workshop on Data Cleaning, Record Linkage and Object Consolidation. Washington, DC, USA.
  193. RAHM, ERHARD, & DO, HONG HAI. 2000. Data Cleaning: Problems and Current Approaches. IEEE Data Eng. Bull., 23(4), 3-13.
  194. RAMAMRITHAM, KRITHI. 1993. Real-Time Databases. Distributed and Parallel Databases, 1(2), 199-226.
  195. RAMAMRITHAM, KRITHI, & CHRYSANTHIS, PANOS K. 1992. In Search of Acceptability Criteria: Database Consistency Requirements and Transaction Correctness Properties. Pages 212-230 of: Proceedings of International Workshop on Distributed Object Management, IWDOM 1992. Edmonton, AB, Canada.
  196. RAMAN, VIJAYSHANKAR, & HELLERSTEIN, JOSEPH M. 2001. Potter's Wheel: An Interactive Data Cleaning System. Pages 381-390 of: Proceedings of 27th International Conference on Very Large Data Bases, VLDB 2001. Roma, Italy.
  197. RE, CHRISTOPHER, DALVI, NILESH N., & SUCIU, DAN. 2006. Query Evaluation on Probabilistic Databases. IEEE Data Eng. Bull., 29(1), 25-31.
  198. REDMAN, THOMAS. 2001. Data Quality: The Field Guide. Digital Press, Elsevier.
  199. RISTAD, ERIC SVEN, & YIANILOS, PETER N. 1998. Learning String-Edit Distance. IEEE Trans. Pattern Anal. Mach. Intell., 20(5), 522-532.
  200. SAMPAIO, SANDRA DE F. MENDES, DONG, CHAO, & SAMPAIO, PEDRO. 2005. Incorporating the Timeliness Quality Dimension in Internet Query Systems. Pages 53-62 of: Proceedings of the International Workshop on Web Information Systems Engineering, WISE 2005. New York, NY, USA.
  201. SANTIS, LUCA DE, SCANNAPIECO, MONICA, & CATARCI, TIZIANA. 2003. Trusting Data Quality in Cooperative Information Systems. Pages 354-369 of: Proceedings of CoopIS, DOA, and ODBASE - OTM Confederated International Conferences. Catania, Sicily, Italy.
  202. SARAWAGI, SUNITA, & KIRPAL, ALOK. 2004. Efficient Set Joins on Similarity Predicates. Pages 743-754 of: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris, France.
  203. SAYYADIAN, MAYSSAM, LEE, YOONKYONG, DOAN, ANHAI, & ROSENTHAL, ARNON. 2005. Tuning Schema Matching Software using Synthetic Scenarios. Pages 994-1005 of: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005. Trondheim, Norway.
  204. SEGEV, ARIE, & FANG, WEIPING. 1990. Currency-Based Updates to Distributed Materialized Views. Pages 512-520 of: Proceedings of the 6th International Conference on Data Engineering, ICDE 1990. Los Angeles, CA, USA.
  205. SHETH, AMIT P., WOOD, CHRISTOPHER, & KASHYAP, VIPUL. 1993. Q-Data: Using Deductive Database Technology to Improve Data Quality. Pages 23-56 of: Proceedings of the International Workshop on Programming with Logic Databases, ILPS. Vancouver, BC, Canada.
  206. SIMITSIS, ALKIS, VASSILIADIS, PANOS, & SELLIS, TIMOS K. 2005. Optimizing ETL Processes in Data Warehouses. Pages 564-575 of: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005. Tokyo, Japan.
  207. SINGLA, PARAG, & DOMINGOS, PEDRO. 2005. Collective Object Identification. Pages 1636-1637 of: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI-05. Edinburgh, Scotland, UK.
  208. STOCKINGER, KURT. 2002. Bitmap Indices for Speeding Up High-Dimensional Data Analysis. Pages 881-890 of: Proceedings of the 13th International Database and Expert Systems Applications Conference, DEXA 2002. Lecture Notes in Computer Science, vol. 2453. Aix-en-Provence, France.
  209. TAN, P.N., KUMAR, V., & SRIVASTAVA, J. 2002. Selecting the Right Interestingness Measure for Association Patterns. Pages 32-41 of: Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2002. Edmonton, AB, Canada.
  210. TEJADA, SHEILA, KNOBLOCK, CRAIG A., & MINTON, STEVEN. 2001. Learning Object Identification Rules for Information Integration. Inf. Syst., 26(8), 607-633.
  211. TEJADA, SHEILA, KNOBLOCK, CRAIG A., & MINTON, STEVEN. 2002. Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification. Pages 350-359 of: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002. Edmonton, AB, Canada.
  212. THANGAVEL, ALPHONSE THANARAJ. 1999. A Clean Data Set of EST-confirmed Splice Sites from Homo Sapiens and Standards for Clean-up Procedures. Nucleic Acids Res., 27(13), 2627-2637.
  213. THEODORATOS, DIMITRI, & BOUZEGHOUB, MOKRANE. 1999. Data Currency Quality Factors in Data Warehouse Design. Page 15 of: Proceedings of the International Workshop on Design and Management of Data Warehouses, DMDW'99. Heidelberg, Germany.
  214. THEODORATOS, DIMITRI, & BOUZEGHOUB, MOKRANE. 2001. Data Currency Quality Satisfaction in the Design of a Data Warehouse. Int. J. Cooperative Inf. Syst., 10(3), 299-326.
  215. THOR, ANDREAS, & RAHM, ERHARD. 2007. MOMA - A Mapping-based Object Matching System. Pages 247-258 of: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007. Asilomar, CA, USA.
  216. VAILLANT, BENOÎT, LENCA, PHILIPPE, & LALLICH, STÉPHANE. 2004. A Clustering of Interestingness Measures. Pages 290-297 of: Proceedings of the 7th International Conference on Discovery Science, DS 2004. Padova, Italy.
  217. VASSILIADIS, PANOS, VAGENA, ZOGRAFOULA, SKIADOPOULOS, SPIROS, KARAYANNIDIS, NIKOS, & SELLIS, TIMOS K. 2001. ARKTOS: Towards the Modeling, Design, Control and Execution of ETL Processes. Inf. Syst., 26(8), 537-561.
  218. VASSILIADIS, PANOS, SIMITSIS, ALKIS, GEORGANTAS, PANOS, & TERROVITIS, MANOLIS. 2003. A Framework for the Design of ETL Scenarios. Pages 520-535 of: Proceedings of the 15th International Conference on Advanced Information Systems Engineering, CAiSE 2003. Klagenfurt, Austria.
  219. VERYKIOS, VASSILIOS S., MOUSTAKIDES, GEORGE V., & ELFEKY, MOHAMED G. 2003. A Bayesian Decision Model for Cost Optimal Record Matching. VLDB J., 12(1), 28-40.
  220. WANG, KE, ZHOU, SENQIANG, YANG, QIANG, & YEUNG, JACK MAN SHUN. 2005. Mining Customer Value: from Association Rules to Direct Marketing. Data Min. Knowl. Discov., 11(1), 57-79.
  221. WANG, RICHARD Y. 1998. A Product Perspective on Total Data Quality Management. Commun. ACM, 41(2), 58-65.
  222. WANG, RICHARD Y., STOREY, VEDA C., & FIRTH, CHRISTOPHER P. 1995. A Framework for Analysis of Data Quality Research. IEEE Trans. Knowl. Data Eng., 7(4), 623-640.
  223. WANG, RICHARD Y., ZIAD, MOSTAPHA, & LEE, YANG W. 2002. Data Quality. Advances in Database Systems, vol. 23. Kluwer Academic Publishers.
  224. WEIS, MELANIE, & MANOLESCU, IOANA. 2007. XClean in Action. Pages 259-262 of: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007. Asilomar, CA, USA.
  225. WEIS, MELANIE, & NAUMANN, FELIX. 2004. Detecting Duplicate Objects in XML Documents. Pages 10-19 of: Proceedings of the First International ACM SIGMOD 2004 Workshop on Information Quality in Information Systems, IQIS 2004. Paris, France.
  226. WEIS, MELANIE, NAUMANN, FELIX, & BROSY, FRANZISKA. 2006. A Duplicate Detection Benchmark for XML (and Relational) Data. In: Proceedings of the 3rd International ACM SIGMOD 2006 Workshop on Information Quality in Information Systems, IQIS 2006. Chicago, IL, USA.
  227. WIDOM, JENNIFER. 2005. Trio: A System for Integrated Management of Data, Accuracy, and Lineage. Pages 262-276 of: Proceedings of 2nd Biennial Conference on Innovative Data Systems Research. Asilomar, CA, USA.
  228. WIJSEN, JEF. 2003. Condensed Representation of Database Repairs for Consistent Query Answering. Pages 378-393 of: Proceedings of 9th International Conference on Database Theory, ICDT 2003. Siena, Italy.
  229. WINKLER, WILLIAM E. 1999. The State of Record Linkage and Current Research Problems. Tech. Rept. Statistics of Income Division, Internal Revenue Service Publication R99/04. U.S. Bureau of the Census, Washington, DC, USA.
  230. WINKLER, WILLIAM E. 2004. Methods for Evaluating and Creating Data Quality. Inf. Syst., 29(7), 531-550.
  231. WINKLER, WILLIAM E., & THIBAUDEAU, YVES. 1991. An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census. Tech. Rept. Statistical Research Report Series RR91/09. U.S. Bureau of the Census, Washington, DC, USA.
  232. WU, KESHENG, OTOO, EKOW J., & SHOSHANI, ARIE. 2006. Optimizing Bitmap Indices with Efficient Compression. ACM Trans. Database Syst., 31(1), 1-38.
  233. XIONG, MING, LIANG, BIYU, LAM, KAM-YIU, & GUO, YANG. 2006. Quality of Service Guarantee for Temporal Consistency of Real-Time Transactions. IEEE Trans. Knowl. Data Eng., 18(8), 1097-1110.
  234. ZHANG, TIAN, RAMAKRISHNAN, RAGHU, & LIVNY, MIRON. 1996. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Pages 103-114 of: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. Montreal, Quebec, Canada.
  235. ZHAO, XIAOFEI, & HUANG, ZHIQIU. 2006. A Formal Framework for Reasoning on Metadata Based on CWM. Pages 371-384 of: Proceedings of 25th International Conference on Conceptual Modeling, ER 2006. Lecture Notes in Computer Science, vol. 4215. Tucson, AZ, USA.
  236. ZHU, YUNYUE, & SHASHA, DENNIS. 2002. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. Pages 358-369 of: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002. Hong Kong, China.
  237. ZHUGE, YUE, GARCIA-MOLINA, HECTOR, & WIENER, JANET L. 1997. Multiple View Consistency for Data Warehousing. Pages 289-300 of: Proceedings of the 13th International Conference on Data Engineering, ICDE 1997. Birmingham, UK.
  238. Data Sources Characteristics for MSISs
  239. 2 Applications and Data Types Coverage
  240. 1 Taxonomy of Existing Techniques for Entity Resolution
  241. 2 Correlation Clustering Example from (Bansal et al., 2002)
  242. 1 CWM Packages
  243. 2 Relational Metamodel of CRM_DB
  244. 3 CWM Relational Data Instance Metamodel (OMG, 2003)
  245. 4 QoD Extension to CWM Data Instance Metamodel
  246. 5 Example of QoD Metadata Associated to CRM_DB PRODUCT table
       2.6 Example of Analytic Workflow for QoD evaluation of CRM_DB
  247. 7 Syntax of Quality Contract Type Creation
  248. 8 Syntax of Call Specification in Quality Contract Type Declaration
  249. 9 Syntax of Quality Contract Creation
  250. 10 Checking QoD Constraints on CRM_DB Granularity Levels
  251. 11 Syntax of QWITH queries
  252. 12 Quality-Extended Query Processing
  253. 1 General Framework of Data Quality Awareness for the KDD Process
       3.2 Different Levels for Measuring Data Quality
  254. 3 Classification Probabilities
  255. 4 Decision Areas for Rule Post-Selection
  256. 5 Decision Costs for Rule Selection with a Priori Probability in [0.1,0.5] without Misclassification
  257. 6 Decision Costs for Rule Selection with Different Data Quality Variations without Misclassification for the a Priori Probability ¼ ¼ ¾¼¼
  258. 7 Amplitude of Cost Variations Depending on Data Quality Variations without Misclassification for the a priori Probability ¼ ¼ ¾¼¼
  259. 8 Decision Status on Rule Selection for Data Quality Variations without Misclassification for ¼ ¼ ¾¼¼
  260. 9 Decision Costs for Rule Selection with a Priori Probability in [0.1,0.5] with Misclassification
  261. 1 Problems and Current Solutions for Data Quality Management
  262. 2 Problem Setting
  263. 3 Decision Models for Handling Duplicates (Batini & Scannapieco, 2006)
  264. 4 Main Data Transformation Operators for ETL
  265. Data Cleaning Prototypes
  266. 1 CRM_DB Example
  267. 2 Join over FK attributes considered as correct
  268. 3 Join over incorrect FK attributes
  269. 4 Join over non FK attributes considered as correct
  270. 5 Join over incorrect non FK attributes
  271. 6 Join over Deduplicated FK attributes
  272. 7 Join over Deduplicated non FK attributes
  273. 8 Examples of Level I Basic Functions
  274. 9 Examples of Level II Functions
  275. 10 Example of Statistics for Specifying SCs on Attributes Values
  276. 11 Examples of Level III Synopses Techniques
  277. 12 Examples of Level IV Classification and Partitioning Methods
  278. 13 Descriptive Metadata for Partitioning Methods
  279. Assigning Probabilities to QoD Dimension for a DB Object Instance
  280. 15 Range-Encoded Bitmap Index with Binning for QoD Measures
  281. 16 Examples of Quality Contract Type Declaration
  282. 17 Example of Contract Declaration
  283. 18 Examples of Acceptability Values per QoD Dimension
  284. 19 Examples of Simple QWITH Queries with EXACT Constraint Checking Mode
  285. 20 Examples of Join QWITH Queries with EXACT Constraint Checking Mode
  286. 21 Examples of QWITH Queries in the Exact Constraint Checking Mode
  287. Marketing Example
  288. 2 Fusion Function Examples for Scoring Quality Dimensions of Association Rule
  289. 3 Example of Costs of Various Decisions for Classifying Association Rules Based on Data Quality