
Data Replication

1,578 papers
581 followers
About this topic
Data replication is the process of storing copies of data in multiple locations or systems to ensure consistency, availability, and reliability. It is commonly used in database management and distributed computing to enhance data accessibility and fault tolerance.
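As a minimal sketch of this idea (a toy in-memory example with invented names, not any particular system's API): every write is propagated to all replicas, so a read can still be answered after a site failure.

```python
# Toy illustration of data replication: every write is copied to all
# replicas, so a read succeeds as long as at least one replica is up.
class Replica:
    def __init__(self):
        self.store = {}
        self.up = True

replicas = [Replica() for _ in range(3)]

def replicated_write(key, value):
    for r in replicas:                 # propagate the write to every copy
        if r.up:
            r.store[key] = value

def replicated_read(key):
    for r in replicas:                 # any available copy can answer
        if r.up and key in r.store:
            return r.store[key]
    return None

replicated_write("x", 42)
replicas[0].up = False                 # one site fails...
assert replicated_read("x") == 42      # ...but the data stays available
```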

Key research themes

1. How do data replication protocols balance availability, consistency, and efficiency in distributed systems?

This research area investigates the design, analysis, and performance evaluation of data replication protocols that ensure consistent and available access to replicated data under various failure conditions and system constraints. It matters because the trade-offs between availability, fault tolerance, communication overhead, and consistency determine the effectiveness of replicated data management in distributed and cloud environments. Understanding these protocols aids in deploying resilient, high-performance distributed systems; a minimal sketch of one such protocol appears after the key findings below.

Key finding: The paper analyzes original available copy protocols and two variants (naive and optimistic), demonstrating through Markov models that these variants nearly match the original in availability and reliability while not...
Key finding: This study differentiates failure unavailability (due to site failures) from conflict unavailability (due to concurrent access conflicts) and reviews techniques to improve replica availability by refining replica control (RC)...
Key finding: This research develops a genetic programming approach that automatically generates and evolves data replication strategies, optimizing the trade-off between availability and operation cost. It demonstrates that novel...
Key finding: This survey synthesizes the mechanisms of various replication techniques in grid environments, highlighting their impact on availability, fault tolerance, and performance in geographically distributed systems. It emphasizes...
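As a rough illustration of the available-copies idea in the first finding above (a minimal sketch under simplified failure assumptions, with invented names, not the exact protocol from any of these papers): writes reach every currently available replica, and a recovering replica must copy the freshest live state before rejoining.

```python
# Sketch of an available-copies style protocol: writes reach every live
# replica; a replica that was down must catch up before serving reads again.
class Replica:
    def __init__(self):
        self.store = {}
        self.version = 0
        self.up = True

def write_all_available(replicas, key, value):
    version = max(r.version for r in replicas) + 1
    for r in replicas:
        if r.up:                       # failed sites simply miss the write
            r.store[key] = value
            r.version = version

def recover(replica, replicas):
    # Before rejoining, copy state from the freshest live replica.
    source = max((r for r in replicas if r.up), key=lambda r: r.version)
    replica.store = dict(source.store)
    replica.version = source.version
    replica.up = True

replicas = [Replica() for _ in range(3)]
replicas[2].up = False
write_all_available(replicas, "x", 1)  # replica 2 misses this write
recover(replicas[2], replicas)         # ...and catches up on recovery
assert replicas[2].store["x"] == 1
```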

2. What middleware-level approaches integrate transactional concurrency control and group communication to enable scalable, consistent data replication?

This theme explores middleware designs that sit between applications and databases to achieve consistent and scalable data replication without requiring intrusive modifications to the underlying database systems. The research examines combining transactional protocols with group communication primitives to reduce redundant computation, maintain one-copy serializability, and optimize communication overhead, which is important for systems like web farms and distributed object platforms; a simplified sketch of the primary-site pattern appears after the key findings below.

Key finding: Proposes a middleware-level replication engine combining transactional concurrency control with group communication to maintain one-copy serializability. Introduces protocols that execute transactions at a primary site to...
Key finding: Describes the implementation and performance evaluation of a replication framework supporting pessimistic and optimistic active replication using atomic broadcast primitives. The prototype confirms that in large-scale...
Key finding: Introduces EA2-IMDG, which leverages in-memory data grids (IMDGs) to reduce latency and improve scalability of replication and task scheduling in grid systems. By distributing data in RAM across nodes, it minimizes disk I/O...
Key finding: Proposes ICTSDC, which tightly couples task scheduling and data replication by leveraging a self-adaptive Dwarf Mongoose Optimization (SADMO) algorithm. The model optimizes objectives such as bottleneck reduction, migration...
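A highly simplified sketch of the primary-site pattern described in the first finding above (illustrative names only; real middleware uses a group communication layer with an atomic broadcast primitive, for which a shared list stands in here): the transaction executes once at the primary, and only its writeset is delivered, in a single total order, to the other replicas.

```python
# Sketch of middleware-level primary-copy replication: execute the
# transaction once at the primary, then apply only its writeset at every
# replica in the same total order.
from typing import Callable, Dict

class Site:
    def __init__(self):
        self.db: Dict[str, int] = {"balance": 100}

    def apply_writeset(self, writeset: Dict[str, int]):
        self.db.update(writeset)       # replicas skip re-execution

primary, *secondaries = [Site() for _ in range(3)]
total_order = []                       # stand-in for atomic broadcast

def run_transaction(txn: Callable[[Dict[str, int]], Dict[str, int]]):
    writeset = txn(primary.db)         # execute once, at the primary site
    primary.apply_writeset(writeset)
    total_order.append(writeset)       # deliver in a single global order
    for site in secondaries:
        site.apply_writeset(writeset)

run_transaction(lambda db: {"balance": db["balance"] - 30})
assert all(s.db["balance"] == 70 for s in [primary] + secondaries)
```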

3. How are data replication strategies in cloud environments optimized for multi-objective goals including provider cost, energy consumption, performance, and SLA satisfaction?

This research theme focuses on dynamic and static replication strategies in cloud systems that consider economic factors, energy efficiency, and SLA requirements alongside performance metrics. Approaches include elastic replica management, economic modeling, heuristic optimization, and data mining-based methods to balance replication overhead against provider profit and tenant QoS demands, addressing the challenges created by cloud heterogeneity and large-scale distributed data; a back-of-the-envelope sketch of the cost-model idea appears after the key findings below.

Key finding: Proposes a dynamic replication strategy balancing provider profit and tenant SLA satisfaction, using a cost model that enables replication only when necessary. By incorporating both response time and economic benefit in...
Key finding: Surveys cloud data replication strategies across multiple dimensions, including static/dynamic operation, workload balancing approaches, replica factor adjustment, and objective functions. Emphasizes that effective cloud...
Key finding: Introduces E2ARS, a static, multi-objective replication strategy that jointly reduces cloud provider energy consumption and expenditure under SLA constraints. Employs optimization algorithms that leverage cloud heterogeneity...
Key finding: Develops a hybrid replication strategy based on quorum voting structures that balances availability and access operation costs across varying scenarios. The approach supports flexible configuration of read/write quorums to...
Key finding: Proposes a novel algorithm combining particle swarm optimization (PSO) with a fuzzy logic system for replica placement and replacement in cloud environments. The method optimizes conflicting objectives such as service time...
Key finding: Introduces GUEES, a hybrid algorithm combining the Sealion Optimization Model and Grey Wolf Optimizer to identify frequent data access patterns for informed data replication. By prioritizing data queues and evaluating storage...
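As a back-of-the-envelope sketch of the cost-model idea in the first finding above (the formula and thresholds are invented for illustration, not taken from the paper): a new replica is created only when the SLA penalty it is expected to avoid exceeds the cost of running it.

```python
# Sketch of a profit-driven replication decision: add a replica only when
# the SLA penalties it is expected to avoid outweigh its running cost.
def should_replicate(requests_per_hour: float,
                     sla_violation_rate: float,
                     penalty_per_violation: float,
                     replica_cost_per_hour: float,
                     expected_violation_reduction: float) -> bool:
    avoided_penalty = (requests_per_hour * sla_violation_rate
                       * expected_violation_reduction * penalty_per_violation)
    return avoided_penalty > replica_cost_per_hour

# A hot object missing its SLA 5% of the time is worth replicating...
assert should_replicate(1000, 0.05, 0.10, 2.0, 0.8)
# ...while a rarely accessed object is cheaper to leave alone.
assert not should_replicate(10, 0.05, 0.10, 2.0, 0.8)
```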

All papers in Data Replication

In data grids, reservation is an accepted way to provide scheduling and quality of service. Users need access to data stored across geographically distributed environments, which can be addressed by replication, and an action taken to reach...
Data grids regularly deal with huge amounts of data. It is a fundamental challenge to ensure efficient access to such widely distributed data sets. Creating replicas at suitable sites through a data replication strategy can increase the...
Eventual consistency is in demand nowadays in geo-replicated services that need to be highly scalable and available. According to the CAP constraints, when network partitions may arise, a distributed service must choose between being...
Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number h of documents in the data set that are the most similar to a given query (here a query is either a simple...
In a context of heterogeneous resource usage, performance remains the traditional criterion for capacity planning. Nowadays, however, taking energy into account has become a necessity. This article...
Big Data applications make it possible to analyze large amounts of data that are not necessarily structured, though at the same time they present new challenges. For example, predicting the performance of frameworks such as Hadoop can be a costly...
The identification of the highest yielding cultivar for a specific environment on the basis of both genotype (G) and genotype × environment (GE) interaction would be useful to breeders and producers since yield estimates based only on G...
The emergence and widespread adoption of Grid computing has been fueled by continued growth in both our understanding of application requirements and the sophistication of the technologies used to meet these requirements. We provide an...
Trying to remember something now typically improves your ability to remember it later. However, after watching a video of a simulated bank robbery, participants who verbally described the robber were 25% worse at identifying the robber in...
The goal of the Globe project is to design and build a middleware platform that facilitates the development of large-scale distributed applications, such as those found on the Internet. To demonstrate the feasibility of our design and to...
Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling...
by Son Vi
In recent years, collaborative editing systems such as wikis, Google Docs and version control systems have become very popular. In order to improve reliability, fault tolerance and availability, shared data is replicated in these systems. User...
Spatial data analysis applications are emerging from a wide range of domains such as building information management, environmental assessments and medical imaging. Time-consuming computational geometry algorithms make these applications...
This report presents the V7 milestone of the Phase-Locked Quantum-Plasma Processor. We confirm that localised phase nodes can be copied between heterogeneous fields while conserving functionality. Experiments cover 2-D and 3-D...
The increasing volume and complexity of Big Data have led to the development of distributed processing frameworks such as Apache Spark, particularly its Python interface, PySpark, which allows for large-scale data processing in cloud...
In this paper, we present an optimal solution for the problem of multimedia object placement for hybrid transparent data replication. The performance objective is to minimize the total access cost by considering both transmission...
Scheduling in traditional distributed systems has mainly been studied for system performance parameters without data transmission requirements. With the emergence of Data Grids (DGs) and Data Centers, data-aware scheduling has become a...
Data replication is a promising technique for increasing access performance and data availability in Data Grid (DG) systems. Current work on data replication in Grid systems focuses on infrastructure for replication and mechanisms for...
This paper advances the study of democratic trajectories: whether democracies deepen, stagnate, erode or break down over time. We show that econometric panel models usually neglect cumulative effects, which are implicitly central to many...
Alhamdulillah, thanks God, without Him I am nothing. I would like to express my sincere thanks to my major adviser Dr. Mitchell L. Neilsen for his guidance, encouragement, help, and support for the completion of my thesis. Without his...
In the second section, we discuss in general terms how the Varieties of Democracy (V-Dem) project differs from extant indices and how the novel approach taken by V-Dem might assist the work of activists, professionals, and scholars...
Patients with schizophrenia frequently demonstrate hypofrontality in tasks that require executive processing; however, questions remain as to whether prefrontal cortex dysfunctions are specific to schizophrenia, or a general feature...
To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this...
A real-time distributed computing system uses heterogeneous networked computers to solve a single problem, so coordinating activities among the computers is a complex task, and deadlines make it more complex. The performance of the system depends on...
Purpose: Genome-wide association studies (GWAS) have identified 6q25, which incorporates the oestrogen receptor α gene (ESR1), as a quantitative trait locus for areal bone mineral density (BMDa) of the hip and lumbar spine. The aim of...
Cloud computing is an emerging computing paradigm in which applications and data services are provided over the Internet. Cloud computing is attractive to business owners as it eliminates the requirement for users to plan...
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using...
We are on the verge of an enormous data explosion, and the amount of data that companies must process keeps growing. To meet this challenge, Google developed MapReduce, a parallel programming model which is...
Multi-tier architectures provide a means for building scalable distributed services. Caching is a classical technique for enhancing the performance of systems (e.g. database servers, or web servers). Although caching solutions have been...
Cloud computing marks a new step toward dematerialized computing infrastructure. The cloud provides computing resources, software or hardware, accessible remotely as a service...
We report the results of a BioBlitz held on 24–25 October 2008 at the Wesselman Woods Nature Preserve in Evansville, Indiana, Van...
Cloud providers aim to maximise their profits while satisfying tenant requirements, e.g., performance. Relational database management systems face many obstacles in achieving this goal. Therefore, the use of NoSQL databases becomes...
With the rapid growth of emerging applications like social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety of data to be processed continues to increase rapidly. Effective management...
The increasing adoption of multi-cloud database systems has transformed enterprise data management, enabling enhanced scalability, reliability, and cost efficiency. However, managing databases across multiple cloud providers introduces...
In a replicated environment, the same data are often distributed to several sites in order to improve data availability, fault tolerance and access speed. However, when a replica is modified by a user, the other replicas become stale. In...
This paper explores a game-theoretic model for task allocation in distributed systems, where processors with varying speeds and external load factors are considered strategic players. The goal is to understand the impact of processors'...
The television industry has undergone a massive transformation with the rise of Internet Protocol Television (IPTV). Unlike traditional cable and satellite TV, IPTV leverages the internet to deliver content, offering flexibility,...
In a sensor network, information from multiple nodes must usually be aggregated in order to accomplish a certain task. A natural way to view this information gathering is in terms of interactions between nodes that are producers of...
The demand for replicability of behavioral results across laboratories is viewed as a burden in behavior genetics. We demonstrate how it can become an asset offering a quantitative criterion that guides the design of better ways to...
The issue of data replication is considered in the context of a restricted system model motivated by certain distributed data-warehousing applications. A new replica management protocol is defined for this model in which global...
This study was conducted to evaluate the significance and magnitude of the effect of genotype × environment (GE) interaction on corn grain yield, and to determine the best genotype for each major corn-growing region in the Philippines...
Rice breeders consider grain yield and milled rice percentages in developing cultivars, but usually do not consider gross income. This study's objectives were to identify rice genotypes that produced high and stable expected gross incomes...
The Mariposa distributed data manager uses an economic model for managing the allocation of both storage objects and queries to servers. In this paper, we present extensions to the economic model which support replica management, as well...
One of the basic services in grids is the transfer of data between remote machines. Files may be transferred at the explicit request of the user or as part of delegated resource management services, such as data replication or job...