User-Oriented Approach to Data Quality Evaluation
2020, JUCS - Journal of Universal Computer Science
Abstract
The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, which makes it executable and enables the scanning of data objects and the detection of data quality defects and anomalies. The proposed approach was applied to open data sets, ...
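To make the three components and the PIM-to-PSM step more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation, which uses graphical DSLs. The data object is assumed to be a list of flat records (dictionaries). The PIM is approximated by declarative `Requirement` entries, and the PSM by the executable checks that `compile_requirements` produces from them. The names `Requirement`, `Defect`, `RULES`, `compile_requirements` and `evaluate`, as well as the rule vocabulary, are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sketch: a data object is assumed to be a list of flat records.
Record = dict[str, Any]
Check = Callable[[Any, Any], bool]


@dataclass(frozen=True)
class Requirement:
    """Hypothetical platform-independent (PIM-like) quality requirement."""
    field: str
    rule: str       # name of a rule in the hypothetical RULES vocabulary
    arg: Any = None


@dataclass(frozen=True)
class Defect:
    """A detected violation: which record, which field, which rule."""
    record_index: int
    field: str
    rule: str


# Hypothetical rule vocabulary: maps rule names to executable predicates.
RULES: dict[str, Check] = {
    "not_empty": lambda value, _arg: value is not None and str(value).strip() != "",
    "max_length": lambda value, arg: value is None or len(str(value)) <= arg,
    "one_of": lambda value, arg: value in arg,
}


def compile_requirements(reqs: list[Requirement]) -> list[tuple[Requirement, Check]]:
    """PIM -> PSM step: bind each declarative requirement to an executable check."""
    checks = []
    for req in reqs:
        if req.rule not in RULES:
            raise ValueError(f"unknown rule: {req.rule}")
        checks.append((req, RULES[req.rule]))
    return checks


def evaluate(data_object: list[Record], checks: list[tuple[Requirement, Check]]) -> list[Defect]:
    """Evaluation process: scan every record and collect every failed check."""
    defects = []
    for i, record in enumerate(data_object):
        for req, predicate in checks:
            if not predicate(record.get(req.field), req.arg):
                defects.append(Defect(i, req.field, req.rule))
    return defects


if __name__ == "__main__":
    # Illustrative, made-up records; not data from the paper.
    companies = [
        {"name": "Alpha SIA", "status": "active"},
        {"name": "", "status": "active"},
        {"name": "Beta AS", "status": "unknown"},
    ]
    requirements = [
        Requirement("name", "not_empty"),
        Requirement("status", "one_of", ("active", "liquidated")),
    ]
    for defect in evaluate(companies, compile_requirements(requirements)):
        print(defect)
```

Keeping the requirements as plain data, separate from the predicates that execute them, mirrors the split the abstract describes. The same user-defined requirements could be bound to a different execution platform by swapping the rule vocabulary rather than rewriting the requirements.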