Automated approach for quality assessment of RDF resources
BMC Medical Informatics and Decision Making
https://doi.org/10.1186/S12911-023-02182-8

Abstract
Introduction: The Semantic Web community provides a common Resource Description Framework (RDF) that allows resources to be represented such that they can be linked. To maximize the potential of linked data (machine-actionable, interlinked resources on the Web), a certain level of quality of RDF resources should be established, particularly in the biomedical domain, in which concepts are complex and high-quality biomedical ontologies are in high demand. However, it is unclear which existing quality metrics for RDF resources can be automated, and automation is required given the multitude of RDF resources. Therefore, we aim to determine these metrics and demonstrate an automated approach to assess such metrics of RDF resources.

Methods: An initial set of metrics is identified through literature, standards, and existing tooling. Of these, metrics are selected that fulfil these criteria: (1) objective; (2) automatable; and (3) foundational. Selected metrics are represented in RDF and semantical...