Academia.edu

Data Integrity

11,236 papers
2,787 followers
About this topic
Data integrity refers to the accuracy, consistency, and reliability of data throughout its lifecycle. It encompasses the processes and measures that ensure data remains unaltered and trustworthy, protecting it from unauthorized access, corruption, or loss, thereby maintaining its quality and usability for decision-making and analysis.

Key research themes

1. How are data integrity challenges identified and mitigated in cloud computing and big data environments?

This research theme focuses on the specific challenges of ensuring data correctness, security, and integrity when large volumes of data are outsourced or managed in cloud and big data platforms. It is crucial due to the intrinsic loss of control, multi-tenancy, and resource heterogeneity in cloud and big data systems, which expose data to a variety of potential modifications, losses, and attacks. The theme explores verification schemes, taxonomies, and frameworks that minimize computational and communication overhead while providing reliable assurance of data integrity.

Key finding: This paper presents a comprehensive taxonomy of data integrity schemes tailored for cloud storage, highlighting design challenges such as computational efficiency, storage and communication costs, and reduced I/O. It...
Key finding: This study introduces a model emphasizing the preservation of data integrity across the entire big data lifecycle, especially considering velocity, volume, and variety aspects. It identifies main integrity challenges...
Key finding: This research proposes an efficient data integrity verification framework based on cross-referencing Bloom filters specifically designed for object-based big data transfers. By leveraging the space efficiency and insertion...
Key finding: Through a systematic review of over 50 scholarly works, this paper identifies ongoing cloud data integrity challenges, including sophisticated attacks and regulatory compliance issues. It proposes a hybrid security...
Key finding: This work presents a multi-layered security framework integrating lightweight encryption, distributed data dispersion across multiple clouds, and access control enforced via private keys for secure cloud storage....
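To make the kind of low-overhead verification these findings survey concrete, here is a minimal sketch of block-level spot-checking: record a digest per block before outsourcing, then challenge a random sample of blocks instead of re-reading everything. It runs against a local file purely for illustration; in a real scheme the challenged blocks (or compact proofs over them) would come from the remote provider, and the block size and sample count below are arbitrary assumptions, not any surveyed scheme's parameters.

```python
import hashlib
import random

BLOCK = 4096  # illustrative block size

def block_digests(path: str) -> list[bytes]:
    """One-time pass before outsourcing: record a SHA-256 digest
    for every fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            digests.append(hashlib.sha256(chunk).digest())
    return digests

def spot_check(path: str, digests: list[bytes], samples: int = 10) -> bool:
    """Verify a random sample of blocks. Sampling trades certainty for
    low I/O and bandwidth: if a fraction p of blocks is corrupt, a check
    of s blocks misses them with probability about (1 - p)**s."""
    indices = random.sample(range(len(digests)), min(samples, len(digests)))
    with open(path, "rb") as f:
        for i in indices:
            f.seek(i * BLOCK)
            if hashlib.sha256(f.read(BLOCK)).digest() != digests[i]:
                return False
    return True
```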

2. What computational and system-level methods advance practical integrity verification of large-scale and untrusted data storage?

This theme investigates algorithmic and architectural innovations that enable efficient integrity verification for large, untrusted datasets, especially in scenarios constrained by limited trusted memory or high-frequency data operations. Methods include hybrid verification schemes combining log structures and hash trees, fine-grained policy enforcement in operating system kernels, and compact probabilistic data structures. These advancements target minimizing bandwidth, computational costs, and nondeterminism in integrity assurances while facilitating practical deployment in real-world systems such as secure processors and distributed file systems.

Key finding: This paper introduces an adaptive tree-log integrity checking scheme that hybridizes hash tree and log-hash methods. It achieves guaranteed worst-case bandwidth overhead bounds and, under typical program behaviors with...
Key finding: XFilter proposes an LSM-independent fine-grained file-level policy framework within the Linux Integrity Measurement Architecture (IMA) subsystem. Through new matching mechanisms (XLabel and XList), it enables selective and...
Key finding: The CRBF framework innovatively applies cross-referencing Bloom filters for integrity verification in object-based big data transfer systems, supporting parallel and unordered data object transfers. It significantly reduces...
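The CRBF scheme itself is not reproduced here, but the sketch below shows its underlying primitive, a plain Bloom filter: the sender inserts a digest of every transferred object, and the receiver can test membership of each arriving object in any order. A negative answer is definitive (the object was corrupted or never sent), while positives carry a small false-positive rate set by the bit-array size and hash count; all sizes and the SHA-256-based hashing are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over an m-bit array with k hash positions
    per item. No false negatives: if any position is unset, the item
    was definitely never added."""

    def __init__(self, m_bits: int = 4096, k_hashes: int = 4):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0  # a big int used as the bit array

    def _positions(self, item: bytes):
        # Derive k positions by hashing the item with k salts.
        for salt in range(self.k):
            digest = hashlib.sha256(salt.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Sender side: record every shipped object's digest.
sent = BloomFilter()
sent.add(hashlib.sha256(b"object-1 payload").digest())

# Receiver side: a digest that tests negative flags loss or tampering.
print(hashlib.sha256(b"object-1 payload").digest() in sent)  # True
print(hashlib.sha256(b"tampered payload").digest() in sent)  # False
```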

3. How do domain-specific and regulatory considerations influence data integrity management models and frameworks?

Data integrity considerations are shaped not only by technical mechanisms but also by sector-specific data quality requirements, ownership protections, and privacy regulations. This research area examines frameworks for master data quality evaluation, legal data protection regimes, anti-fraud authentication mechanisms in financial services, and integrity validation in emerging scientific domains. The focus is on aligning data integrity approaches with compliance standards, ownership assurances (e.g., watermarking), privacy rights (e.g., GDPR), and the reliability needs of AI-driven materials science or financial systems, ensuring integrity management integrates technical and normative dimensions.

Key finding: This paper develops a data quality evaluation model tailored for master data repositories grounded in international standards (ISO/IEC 25012, 25024, and 8000-1000). It addresses unique master data features distinct from...
Key finding: The study provides a detailed examination of personal data protection regulatory frameworks emphasizing fundamental privacy rights, specifically referencing GDPR and related global standards. It highlights how personal and...
Key finding: The research explores the application of digital watermarking techniques as a data integrity and ownership protection mechanism in databases. Watermark embedding into data objects enhances resistance to unauthorized...
Key finding: This qualitative analysis identifies critical integrity and security challenges in online banking, emphasizing authentication weaknesses, data confidentiality breaches, and system architectural flaws. It presents fraud risks...
Key finding: The paper highlights a crisis in data integrity within AI-assisted materials science, pinning reproducibility issues and fraudulent data on inadequate application of fundamental physical validation methods like f-sum rules...
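As a companion to the watermarking finding above, here is a heavily simplified sketch in the spirit of key-based relational watermarking; it is not the cited paper's scheme, and the attribute names, single-bit mark, and HMAC keying are all illustrative assumptions. A secret key deterministically chooses a bit per tuple, hidden in the least significant bit of an error-tolerant numeric attribute; detection counts matches, and a near-perfect match rate across many tuples supports an ownership claim.

```python
import hashlib
import hmac

SECRET = b"owner-secret-key"  # hypothetical owner's key

def mark_bit(primary_key: str) -> int:
    """Reproducible pseudo-random bit derived from the owner's key
    and the tuple's primary key."""
    return hmac.new(SECRET, primary_key.encode(), hashlib.sha256).digest()[0] & 1

def embed(row: dict) -> dict:
    # Overwrite the LSB of an error-tolerant numeric attribute.
    row["reading"] = (row["reading"] & ~1) | mark_bit(row["id"])
    return row

def detect(rows: list[dict]) -> float:
    """Fraction of tuples whose LSB matches the keyed mark:
    about 1.0 on marked data, about 0.5 on unrelated data."""
    hits = sum((r["reading"] & 1) == mark_bit(r["id"]) for r in rows)
    return hits / len(rows)

rows = [embed({"id": f"r{i}", "reading": 100 + i}) for i in range(1000)]
print(detect(rows))  # 1.0 for the marked relation
```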

All papers in Data Integrity

We present FastVer, a high-performance key-value store with strong data integrity guarantees. FastVer is built as an extension of FASTER, an open-source, high-performance key-value store. It offers the same key-value API as FASTER plus an...
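FastVer's actual machinery (built on FASTER, with verification amortized for performance) is beyond a listing entry, so the hedged sketch below only illustrates the contract such a verified key-value store offers: a read either returns the last written value or fails loudly. The in-process dictionaries and per-key SHA-256 digests are illustrative assumptions, not FastVer's design; production systems keep the trusted state compact (for example, a single tree root) rather than one digest per key.

```python
import hashlib

class VerifiedKV:
    """Toy verified key-value store: a small trusted map of digests
    guards an untrusted store, and every read is checked against it."""

    def __init__(self):
        self._store = {}    # stands in for the untrusted store
        self._trusted = {}  # trusted state: key -> SHA-256 digest

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value
        self._trusted[key] = hashlib.sha256(value).digest()

    def get(self, key: str) -> bytes:
        value = self._store[key]
        if hashlib.sha256(value).digest() != self._trusted[key]:
            raise RuntimeError(f"integrity violation on key {key!r}")
        return value

kv = VerifiedKV()
kv.put("balance", b"100")
kv._store["balance"] = b"999"  # simulate tampering in the untrusted store
kv.get("balance")              # raises RuntimeError
```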
The aim of this paper is to show a method that is able to detect a particular class of semantic inconsistencies in a deductive system (DS). A DS verified by this method contains a set of first-order production rules, and a description...
Despite the dominance of the service sector in the last decades, there is still a need for a strong foundation for service design and innovation. Little attention has been paid to service modelling, particularly in the collaboration context....
Rule-based approaches to data quality often use business rules or integrity rules for data monitoring purposes. Integrity rules are constraints on data derived from business rules and expressed in a formal form in order to allow...
The U.S. Geological Survey is evaluating potentially useful surrogate instruments and methods for inferring the physical characteristics of suspended sediments. Instruments operating on bulk acoustic, bulk and digital optic, laser, and...
Data is inherently dirty and there has been a sustained effort to come up with different approaches to clean it. A large class of data repair algorithms relies on data-quality rules and integrity constraints to detect and repair the data. A...
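Rule-based repair of the kind described above starts by detecting violations of integrity constraints. The sketch below checks one common constraint type, a functional dependency; the zip → city example and the helper name are illustrative assumptions, not the paper's algorithm. A repair algorithm would then pick a minimal set of cell updates that removes all such violations.

```python
from collections import defaultdict

def fd_violations(rows: list[dict], lhs: list[str], rhs: list[str]) -> dict:
    """Group rows by their lhs values; any group mapping to more than
    one rhs value violates the functional dependency lhs -> rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    return {key: vals for key, vals in seen.items() if len(vals) > 1}

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},      # violates zip -> city
    {"zip": "94103", "city": "San Francisco"},
]
print(fd_violations(rows, ["zip"], ["city"]))
# {('10001',): {('New York',), ('NYC',)}}
```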
Academic transcripts are essential documents in higher education, reflecting students' academic performance and capabilities. However, the current management of transcript data at Halu Oleo University (UHO) lacks safeguards against...
Health information systems are increasingly complex, and their development is presented as a challenge for software development companies offering quality, maintainable and interoperable products. HL7 (Health level 7) International, an...
This research work proposes a method for managing, securing, and validating the health data distribution records using a genetic-based hashing algorithm in a decentralized environment. The reason behind choosing blockchain is to secure...
Completion of this thesis would not have been possible without the support and contribution of many people. It is a great honor for me to thank some of those many, to whom I owe my deepest gratitude. I would like to express my deepest...
An educational record contains information directly related to an individual's academic performance which needs to be shared as a part of the educational system among various stakeholders consisting of students, schools, companies,...
To answer user queries, a data integration system employs a set of semantic mappings between the mediated schema and the schemas of data sources. In dynamic environments sources often undergo changes that invalidate the mappings. Hence,...
Molecular biology offers a large, complex and volatile domain that tests knowledge representation techniques to the limit of their fidelity, precision, expressivity and adaptability. The discipline of molecular biology and bioinformatics...
The complex physical processes controlling ceiling and visibility (for example, the formation, evolution and motion of low cloud, precipitation and fog) and the diverse seasonal and geographic influences that modulate these controls...
The development of machine learning (ML) technologies provides a new direction for cryptanalysis. Several ML studies in the field of cryptanalysis were carried out to identify the cryptographic algorithm used, find out the...
The Italian National Statistics Institute is currently integrating its various legacy spatio-temporal data collections. The SIT-IN project has delivered a first release, whose development relied on web and relational technologies to...
Global positioning systems first became available for private use in 1995. Since the introduction of NAVSTAR-GPS (Navigation System with Time and Ranging - Global Positioning System) and GLONASS (Global'naya Navigatsionnaya Sputnikovaya...
Peer-to-peer systems have considerably evolved since their original conception in the '90s. The idea of distributing files using the user's terminal as a relay has now been widely extended to embrace virtually any form of...
Web services are pivotal in contemporary software development, connecting diverse systems over standardized protocols and enabling data sharing across platforms. From the early days of the Simple Object Access Protocol (SOAP) to the...
This paper studies the problem of increasing the efficiency in controlling burst errors that are caused by external noise in the context of digital data transmission. For this purpose, the utilization of special weighted checksum...
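The burst-error entry above relies on weighted checksums; the sketch below shows the family's simplest well-known member, Fletcher-16, rather than the paper's specific codes (an assumption made for illustration). Because sum2 accumulates the running values of sum1, each byte is effectively weighted by its position, so reorderings and many burst patterns that cancel out in a plain sum still change the checksum.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: a simple weighted checksum. sum1 is a plain byte
    sum; sum2 accumulates running values of sum1, weighting each byte
    by its position in the message."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

# A plain sum cannot tell these apart; the weighted part can.
assert sum(b"AB") == sum(b"BA")
assert fletcher16(b"AB") != fletcher16(b"BA")
```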
Exploiting the rich traces of users' Web interaction promises to enable cross-application user modeling techniques, which is particularly interesting for applications that have a small user population or that are used...
Data exchange and interoperability between clinical information systems represent a crucial issue in the context of patient record data collection. An XML representation schema adapted to end-stage renal disease (ESRD) patients was...
Pharmaceutical manufacturing relies on robust Quality Assurance (QA) and Quality Control (QC) systems to ensure product safety, efficacy, and compliance with stringent regulations. This paper explores current trends, challenges, and...
The Internet of Medical Things (IoMT) is revolutionizing healthcare through real-time monitoring and personalized interventions. However, its rapid adoption raises urgent ethical and regulatory challenges, particularly concerning data...
NoSQL stores are emerging as an efficient alternative to relational database management systems in the context of big data. Many actors in this domain consider that to gain a wider adoption, several extensions have to be integrated. Some...
Real-world data (RWD) and real-world evidence (RWE) are now central to healthcare decision-making, supporting regulatory submissions, health technology assessments (HTA), and scientific communication. Yet patients whose data fuel these...
The annual Hajj presents diversified negative experiences to millions of pilgrims worldwide. The negative experiences and recommendations to overcome them as per pilgrims' feedback are yet to be analyzed from an aggregated perspective in...
This paper explores the integration of blockchain technology into healthcare systems to enhance cybersecurity and safeguard sensitive patient data. As healthcare organizations increasingly digitize their operations, they become vulnerable...
There is a growing need for strong methods to guarantee the accuracy and reliability of data due to the widespread use of next-generation AI in automated processes. This research delves into new approaches to rethink AI system quality...
Formal methods can bring many advantages to software practitioners and their adoption has often been advocated. In recent years usage of formal techniques has certainly increased; nevertheless, there is still ample room for further adoption...
Web form spamming is a growing cybersecurity threat that disrupts digital services and compromises data integrity. Traditional defenses like CAPTCHA are increasingly ineffective against sophisticated bots. This study proposes a...
Recent studies have indicated that companies are increasingly experiencing Data Quality (DQ) related problems as more and more complex data are being collected. In order to address such problems, literature suggests the implementation of...
Digital watermarking has been used with multimedia data over the past years. Recently it has become applicable in relational database systems, not only to secure copyright ownership but also to ensure data content integrity. Further, it is...
Cloud storage allows users to easily store their data and enjoy high-quality cloud applications without needing to install anything on local hardware or software systems. Such a service also gives users control of their outsourced...
Many applications that make use of sensor networks require secure communication. Because asymmetric-key solutions are difficult to implement in such a resource-constrained environment, symmetric-key methods coupled with a priori key...
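The sensor-network entry above motivates symmetric-key message authentication; the minimal sketch below uses an HMAC with a pre-distributed key. The key value and the SHA-256 choice are illustrative assumptions (constrained nodes often use lighter primitives), but the structure, a shared key plus a keyed tag on every message, is the standard symmetric-key approach.

```python
import hashlib
import hmac

SHARED_KEY = b"pre-distributed-key"  # hypothetical a priori shared key

def tag(message: bytes) -> bytes:
    """Sender: compute a MAC over the message with the shared key."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes) -> bool:
    """Receiver: recompute the MAC; compare_digest avoids timing leaks."""
    return hmac.compare_digest(tag(message), received_tag)

reading = b"node7:temp=21.4"
t = tag(reading)
print(verify(reading, t))             # True: intact and authentic
print(verify(b"node7:temp=99.9", t))  # False: modification detected
```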
This paper proposes a system which can be used for the early indication of fungus-affected plants in an agricultural field. As the yield of a crop depends on the healthy growth of the plants, the status of the plants needs to be monitored...
To use multi-station data to measure charge transport for IC and CG flashes independent of a Lightning Mapping Array (LMA); to use multi-station ∆E data constrained by LMA data to see time and space dependence of charge transport along...
Blockchain is a groundbreaking technology that provides a distributed and decentralized environment in which networked nodes can connect to each other without the need for a central authority. It has the...
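The tamper-evidence that entry describes comes from hash-linking blocks; the minimal sketch below shows that core mechanism only, with no consensus or networking, and with illustrative field names. Altering any past payload changes that block's hash and breaks every link downstream of it.

```python
import hashlib
import json
import time

def make_block(payload: dict, prev_hash: str) -> dict:
    """Link a block to its predecessor by hashing its own contents
    together with the previous block's hash."""
    block = {"payload": payload, "prev_hash": prev_hash,
             "timestamp": time.time()}
    body = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(body).hexdigest()
    return block

def chain_is_valid(chain: list[dict]) -> bool:
    """Recompute every block's hash and check each prev_hash link."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["hash"] != recomputed:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block({"event": "init"}, prev_hash="0" * 64)
nxt = make_block({"event": "update"}, prev_hash=genesis["hash"])
print(chain_is_valid([genesis, nxt]))  # True until any field is altered
```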
Achieving semantic interoperability is a current challenge in the field of data integration: semantic conflicts occur when the participating sources and receivers use different or implicit data assumptions....
The advent of cloud computing has transformed the management and storage of data by providing scalable and flexible services. However, it has also raised critical security concerns related to data confidentiality, authentication, and...
Data Grids make it possible to view heterogeneous, distributed, and dynamic informational resources as if they were a uniform, stable, secure, and reliable database. According to this view, current proposals for data integration on Grids are...
Model-based language specification has applications in the implementation of language processors, the design of domain-specific languages, model-driven software development, data integration, text mining, natural language processing, and...
In today's digital landscape, data integrity and authenticity are critical to secure communication, financial transactions, and information exchange. Digital signatures, which rely on asymmetric cryptography, provide a robust mechanism...
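As a concrete companion to the digital-signature entry above, the sketch below signs and verifies a message with Ed25519 via the widely used Python cryptography package; the package choice and the message are illustrative assumptions, and any asymmetric signature scheme gives the same integrity-and-authenticity contract. Verification succeeds only if the message is byte-for-byte unchanged and the signature was produced by the matching private key.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

document = b"transfer 100 units to account 42"
signature = private_key.sign(document)

public_key.verify(signature, document)  # silent success: intact, authentic
try:
    public_key.verify(signature, document + b"!")  # any change breaks it
except InvalidSignature:
    print("tampering detected")
```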