Virtual Disk Snapshots

16 papers

21 followers

About this topic

A virtual disk snapshot captures the state, data, and configuration of a virtual machine's disk at a specific point in time. Preserving the virtual machine's environment in this way lets users revert to that state when needed, which supports backup, recovery, and testing processes.

Key research themes

1. How can virtualization-aware file systems enhance the flexibility and management of virtual disk snapshots beyond conventional coarse-grained rollback?

This research theme focuses on overcoming the inherent limitations of conventional virtual disks, mainly their coarse-grained, all-or-nothing rollback and lack of internal structure that impede fine-grained sharing, searching, and secure management. Virtualization-aware file systems (VAFS) integrate versioning, mobility, and access control features of virtual disks with the fine-grained sharing and usability benefits of distributed file systems. This hybrid approach aims to enable more flexible virtual machine (VM) snapshotting, minimize storage overhead, and improve security and manageability of virtual disk snapshots in dynamic VM environments.

Virtualization Aware File Systems: Getting Beyond the Limitations of Virtual Disks

by Ben Pfaff

2016

Key finding: Introduced Ventana, a virtualization aware file system that blends the versioning, isolation, and mobility of virtual disks with the fine-grained sharing and access-controlled features of distributed file systems. Unlike...

2. What strategies optimize storage efficiency and reliability in cloud-based virtual disk snapshots using deduplication and erasure coding?

This theme explores techniques to reduce storage costs and improve fault tolerance for virtual disk snapshots, especially within Infrastructure-as-a-Service (IaaS) clouds hosting HPC or large-scale applications. Deduplication eliminates redundant data chunks across snapshots or VMs, considerably reducing space and I/O bandwidth requirements. Erasure codes, particularly Reed-Solomon coding, provide fault tolerance with less storage overhead compared to replication but require optimization for bandwidth efficiency during data writes. Together, these approaches address challenges of large-scale snapshot storage, balancing performance, reliability, and cost.
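The deduplication idea described above can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 content addressing; the `ChunkStore` class, its method names, and the 4 KiB `CHUNK_SIZE` are hypothetical choices made for illustration and are not taken from any of the papers listed here.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (an assumption)


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once,
    keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def put_snapshot(self, disk_image):
        """Store a snapshot and return its recipe (ordered chunk digests)."""
        recipe = []
        for offset in range(0, len(disk_image), CHUNK_SIZE):
            chunk = disk_image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Chunks that are already present are not stored a second time.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get_snapshot(self, recipe):
        """Rebuild a snapshot from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Two snapshots of an 8-chunk disk that differ in a single chunk.
base = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))
modified = base[:3 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + base[4 * CHUNK_SIZE:]

store = ChunkStore()
recipe_base = store.put_snapshot(base)
recipe_modified = store.put_snapshot(modified)
```

In this example the two snapshots occupy 16 chunks logically, but only 9 unique chunks are stored, because the second snapshot shares 7 chunks with the first. Real systems differ in chunking granularity, index design, and whether deduplication runs inline or offline.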

A Survey and Classification of Storage Deduplication Systems

by João Paulo

2022, ACM Computing Surveys

Key finding: Provided a comprehensive classification framework for deduplication approaches relevant to virtual disk snapshots, detailing design decisions regarding granularity, locality, timing, indexing, technique, and scope....

Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds

by LEONARDO SANTIAGO NAVA GOMEZ

2022

Key finding: Developed a Reed-Solomon erasure coding algorithm optimized for concurrent large-scale data dumps to local disks in IaaS, reducing bandwidth consumption while achieving high reliability. Integrated within a checkpoint-restart...

BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds

by Franck Cappello

2022, Journal of Parallel and Distributed Computing

Key finding: Introduced BlobCR, a checkpoint-restart framework enabling incremental snapshots of entire VM disks using selective copy-on-write techniques. By asynchronously persisting VM disk snapshots with low overhead, BlobCR reduces...

3. How do advanced snapshot management techniques improve scalability and efficiency of virtual disk snapshots in distributed storage systems and clouds?

This research area investigates algorithms and system designs that manage large volumes of virtual disk snapshots by optimizing snapshot creation, storage, and retrieval in distributed environments. Focus areas include reducing snapshot size and fragmentation, increasing snapshot accessibility, and supporting operations like suspend-resume and migration. Techniques such as copy-on-write with snapshot disentanglement, versioning file systems optimized for multi-deployment and multi-snapshot operations, and hybrid remote replication methods aim to balance storage overhead, network bandwidth use, and I/O performance for scalable snapshot management.
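As a rough illustration of the copy-on-write technique mentioned above, the sketch below models a block device whose snapshots start empty and preserve a block's old contents only when that block is first overwritten. The `CowDisk` class and its methods are hypothetical names for illustration; this is a generic textbook scheme, not the design of Thresher, BlobCR, or any other system listed here.

```python
class CowDisk:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.live = [bytes(block_size) for _ in range(num_blocks)]
        self.snapshots = {}  # name -> {block index: preserved old contents}

    def snapshot(self, name):
        """Create a snapshot; nothing is copied at creation time."""
        self.snapshots[name] = {}

    def write(self, index, data):
        """Overwrite one block, preserving the old contents for every
        snapshot that has not yet saved this block."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        for preserved in self.snapshots.values():
            # setdefault keeps the contents from snapshot time, so later
            # overwrites of the same block do not clobber them.
            preserved.setdefault(index, self.live[index])
        self.live[index] = data

    def read(self, index, snapshot=None):
        """Read from the live disk, or from a named snapshot."""
        if snapshot is None:
            return self.live[index]
        # Blocks never overwritten since the snapshot are read from live.
        return self.snapshots[snapshot].get(index, self.live[index])

    def rollback(self, name):
        """Restore the live disk to the state captured by a snapshot."""
        for index, old in list(self.snapshots[name].items()):
            # Going through write() lets other snapshots preserve
            # the contents that are about to be replaced.
            self.write(index, old)


disk = CowDisk(num_blocks=4, block_size=4)
disk.write(0, b"aaaa")
disk.snapshot("s1")
disk.write(0, b"bbbb")
disk.write(0, b"cccc")
```

After these writes, the snapshot holds exactly one preserved block (the original contents of block 0), so its storage cost grows with the number of distinct blocks changed rather than with the disk size. The trade-off in this scheme is extra work on the first write to each block after a snapshot.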

Going back and forth: efficient multideployment and multisnapshotting on clouds

by Swati Gupta

2012, Proceedings of the 20th …

Key finding: Designed a virtual file system optimized for concurrent deployment and snapshotting of large numbers of VMs in IaaS clouds. By leveraging lazy transfer schemes combined with object versioning on a distributed storage service...

Going Back and Forth: Efficient Multi-Deployment and Multi-Snapshotting on Clouds

by Raghu Y

2011

Key finding: Through detailed design principles and an implementation based on versioning storage services, this work provided a scalable method for handling the common cloud patterns of deploying and snapshotting large VM collections....

Thresher: An Efficient Storage Manager for Copy-on-write Snapshots

by Liuba Shrira

2021, Proceedings of the Annual Conference on Usenix 06 Annual Technical Conference

Key finding: Introduced Thresher, a snapshot storage management system that efficiently segregates snapshots based on application-provided rankings to discriminate snapshot importance. By combining ranked segregation with techniques like...

Hybrid Replication: Optimizing Network Bandwidth and Primary Storage Performance for Remote Replication

by Philip Shilane

2022, 2016 IEEE International Conference on Networking, Architecture and Storage (NAS)

Key finding: Proposed a hybrid remote replication technique that dynamically partitions storage extents between continuous and snapshot replication based on overwrite characteristics, achieving the low network bandwidth of snapshot...

All papers in Virtual Disk Snapshots

On Self-Propulsion Assessment of Marine Vehicles

by abdi kukner

2025, Brodogradnja

Estimation of ship self-propulsion is important for the selection of the propulsion system and the main engine so that the ship can move forward with the required speed. Resistance characteristics of the vessel or the open-water...

Nested QoS: Providing Flexible Performance in Shared IO Environment

by Peter Varman

2024

YOLO: Speeding up VM Boot Time by reducing I/O operations

by Thùy Lâm Nguyễn

2024

Several works have shown that the time to boot one virtual machine (VM) can last up to a few minutes in highly consolidated cloud scenarios. This time is critical as VM boot duration defines how an application can react w.r.t. demands’...

IaaS Cloud as a virtual environment for experimentation in checkpoint analysis

by E. Fadón

2024, Journal of computer science and technology

Cloud computing offers access to computing resources, allowing remote access to software, storage and data processing through the Internet. Infrastructure as a Service (IaaS) is a flexible space which can be used as an...

I/O Performance of Virtualized Cloud Environments

by Shane Canon

2024, OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)

The scientific community is exploring the suitability of cloud infrastructure to handle High Performance Computing (HPC) applications. The goal of Magellan, a project funded through DOE ASCR, is to investigate the potential role of cloud...

Figure 1: IOR Results: Comparison of Direct vs Buffered I/O on NERSC systems

Figure 2: IOR Results: I/O Performance on All Platforms

Figure 3: IOR Results: Comparison of Amazon platforms

Figure 4: Multinode MPI Shared Filesystem Results on NERSC global scratch and Amazon Cluster Compute instances

Figure 6: Large Scale Test Results Histogram and Kernel Density Plots

Table 1: Amazon EC2 Instance Types - Architecture. Source: http://aws.amazon.com/ec2/instance-types/

Going back and forth

by syeda begum

2024, Proceedings of the 20th international symposium on High performance distributed computing

Infrastructure as a Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short...

A checkpointing mechanism for virtual clusters using memory-bound time-multiplexed data transfers

by International Journal of Electrical and Computer Engineering (IJECE)

2024, International Journal of Electrical and Computer Engineering (IJECE)

Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of virtual machines (VMs) offer an attractive fault tolerance capability for cloud data centers. However, existing mechanisms have suffered...

Figure 1. Architecture of Mekha

... all events occurring in the VC when performing VC checkpoint and restart operations. It controls VMs by sending instructions through the agents, which are daemon processes running on every host and sitting between the coordinator and VMs. There are two types of VM instances in Mekha: active and shadow instances. These instances are spawned, destroyed, and monitored by the agent running on the same physical host. Active instances are primary VMs that execute user tasks and are members of a VC. When Mekha starts a VC checkpoint operation, it spawns a shadow instance for an active instance; this shadow instance serves as a transient in-memory storage to store the state of the corresponding active instance. After the checkpoint operation is completed, the coordinator instructs these shadow instances to lazily save their state to persistent storage before terminating them once the state-saving process finishes.

Figure 3. An FSM diagram of the MTD algorithm

Figure 4. Phases and transmissions of the layer-2 frames during a Mekha checkpoint operation

Figure 5. The virtual cluster architecture used in our experiments

The experiments were conducted using three physical hosts, i.e., Host1, Host2, and Host3, as shown in Figure 5. Host1 and Host2 are used for running a VC while Host3 is used for running the coordinator. Host1 and Host2 are Dell PowerEdge R720 servers. Each server has two hyper-threading 10 cores Xeon E5 2600v2 2.8GHz CPUs and 192GB DDR3 RAM. The local SSD storage is a 500GB Samsung 970 EVO Plus NVMe card connected to the PCIe 3.0 of the server. Both hosts are connected to a 36TB Dell PS41105E EqualLogic SAN storage via a 10Gbps network and use the iSCSI protocol to access the storage. Each server has two Ethernet network interface cards (NICs); the first NIC is connected to a 10Gbps Ethernet network (the data network in the figure) and the second NIC is connected to a 1Gbps network (the management network). Host3 is a server that has a 1Gbps Ethernet NIC connected to the management network. The network time protocol (NTP) is used to synchronize time among these hosts. Figure 5 shows the VC architecture used in our experiments. All VMs in a VC are connected to a virtual network created over the 10Gbps Ethernet network. We use Open vSwitch [32] to create virtual switches on Host1 and Host2 and link them together using a GRE tunnel over the 10Gbps Ethernet network.

Figure 6. Comparing checkpoint performance metrics for different checkpoint mechanisms on the H1 and H2 VCs in (a) average checkpoint overheads per checkpoint operation (sec) and (b) average checkpoint latency per checkpoint operation (sec)

The latency of a VC checkpoint operation is the duration from the time the VC checkpoint mechanism starts to the time it ends. The average checkpoint latency (avg.latency) refers to the latency incurred by a single checkpoint operation. From Figure 6(b), we can see that the avg.latencies for each mechanism depend on the write IOPS and write bandwidth of the checkpoint storage. Our experiments have revealed that most of the checkpoint latency is spent on saving VC state to storage, suggesting that latency is highly influenced by the storage’s write IOPS and write bandwidth. As a result, avg.latencies for SSD storage are significantly lower than those for SAN storage due to the superior performance of SSD; for example, Mekha’s avg.latencies using SSD storage are 85.6% and 88.5% lower than those using SAN storage on H1 and H2 VCs respectively.

Figure 8. The trimmed checkpoint phase diagram shows the impact of imbalance in the durations of VM memory page transfers of (a) IMVCCR and (b) Mekha on the H2 cluster

DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop

by Jason Ansel

2023, arXiv (Cornell University)

DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well known applications, including...

I/O performance of virtualized cloud environments

by Shane Canon

2023, Proceedings of the second international workshop on Data intensive computing in the clouds

Transparent checkpoint-restart over infiniband

by Kapil Arya

2023, Proceedings of the 23rd international symposium on High-performance parallel and distributed computing

InfiniBand is widely used for low-latency, high-throughput cluster computing. Saving the state of the InfiniBand network as part of distributed checkpointing has been a long-standing challenge for researchers. Because of a lack of a...

Virtualizing Disk Performance

by Anna Povzner

2023, 2008 IEEE Real-Time and Embedded Technology and Applications Symposium

Large- and small-scale storage systems frequently serve a mixture of workloads, an increasing number of which require some form of performance guarantee. Providing guaranteed disk performance (the equivalent of a "virtual disk") is...

Virtualizing disk performance with Fahrrad

by Anna Povzner

2023

Towards a scalable, fault-tolerant, self-adaptive storage for the clouds

by Maria Rosa Adell Perez

2023

An O(1) Method for Storage Snapshots

by Brian Stuart

2023, 9th International Workshop on Plan 9

The capability of taking snapshots is approaching ubiquity as a feature of file systems and data storage arrays. Here, we present an approach to structuring and managing snapshots in a storage space that provides for rapid creation and...

Towards a scalable, fault-tolerant, self-adaptive storage for the clouds

by Maria alessandra Escobar perez

2023

Damaris: Leveraging Multicore Parallelism to Mask I/O Jitter

by A. Gabriel

2023

Abstract: With exascale computing on the horizon, the performance variability of I/O systems represents a key challenge in sustaining high performance. In many HPC applications, I/O is concurrently performed by all processes, which leads to...

Asynchronous snapshots of actor systems for latency-sensitive applications

by Hanspeter Mössenböck

2023, Proceedings of the 16th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes

The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or moving running applications to different machines, e.g., for debugging...

Distributed Virtual Disk Storage System

by Muhammad Arshad

2023

Distributed Virtual Disk Storage System (DVDSS) is a reliable and fully decentralized storage system which is based on the concept of Distributed Storage and Virtual Disk. It uses a client/server architecture and utilizes all free space of...

Silver: A Scalable, Distributed, Multi-versioning, Always Growing (Ag) File System

by Amy Tai

2022

The storage needs of users have shifted from just needing to store data to requiring a rich interface which enables the efficient query of versions, snapshots and creation of clones. Providing these features in a distributed file system...

Building Hierarchical Grid Storage Using the Gfarm Global File System and the JuxMem Grid Data-Sharing Service

by Majd Ghareeb

2022, Lecture Notes in Computer Science

As more and more large-scale applications need to generate and process very large volumes of data, the need for adequate storage facilities is growing. It becomes crucial to efficiently and reliably store and retrieve large sets of data...

Blizzard: fast, cloud-scale block storage for cloud-oblivious applications

by Osama Khan

2022, Networked Systems Design and Implementation

Blizzard is a high-performance block store that exposes cloud storage to cloud-oblivious POSIX and Win32 applications. Blizzard connects clients and servers using a network with full-bisection bandwidth, allowing clients to access any...

BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds

by Franck Cappello

2022, Journal of Parallel and Distributed Computing

Infrastructure-as-a-Service (IaaS) cloud computing is gaining significant interest in industry and academia as an alternative platform for running HPC applications. Given the need to provide fault tolerance, support for suspend-resume and...

FlexStore: A Software Defined, Energy Adaptive Distributed Storage Framework

by KRISHNA KANT

2022, 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems

In this paper we propose a flexible and scalable distributed storage framework called flexStore that can adapt to variations in available or consumable power and demonstrate its performance in the context of deduplicated virtual machine...

Resource monitoring and management with OVIS to enable HPC in cloud computing environments

by Matthew Wong

2022, 2009 IEEE International Symposium on Parallel & Distributed Processing

Using the cloud computing paradigm, a host of companies promise to make huge compute resources available to users on a pay-as-you-go basis. These resources can be configured on the fly to provide the hardware and operating system of...

Checkpointing as a Service in Heterogeneous Cloud Environments

by aaina arora

2022, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform...

Silecs Grid 5000 Le Volet Data Center De Silecs Presentation et Exemples D Experiences

by Zigla Atour

2022, TILECS - Towards an Infrastructure for Large-Scale Experimental Computer Science

An advanced and established infrastructure for the data-center facets of Computer Science: large-scale and distributed; shared (many involved laboratories and institutions); designed for reconfigurability, observability, reproducible research...

Providing Consistent State to Distributed Storage System

by Ram Prasad Reddy Sadi

2022

In cloud storage systems, users must be able to shut down the application when not in use and restart it from the last consistent state when required. BlobSeer is a data storage application, specially designed for distributed systems,...

Distributed Virtual Disk Storage System

by Muhammad Arshad

2022

Management of virtual machine images in heterogeneous clouds

by Paula Prata

2022, International Journal of Computational Science and Engineering

This paper presents the image server of the VISOR cloud agnostic virtual machine images management service. An evaluation approach is also described and the results are discussed. VISOR is not intended to fit in a specific cloud framework...

HydraVM: Low-Cost, Transparent High Availability for Virtual Machines

by Mustafa Uysal

2022

Existing approaches to providing high availability (HA) for virtualized environments require a backup VM for every primary running VM. These approaches are expensive in memory because the backup VM requires the same amount of memory as...

Increasing Reliability through Dynamic Virtual Clustering

by Dan Stanzione

2022

In a scientific community that increasingly relies upon High Performance Computing (HPC) for large scale simulations and analysis, the reliability of hardware and applications devoted to HPC is extremely important. While hardware...

VMAR: Optimizing I/O Performance and Resource Utilization in the Cloud

by N. Fuller

2022, Lecture Notes in Computer Science

A key enabler for standardized cloud services is the encapsulation of software and data into VM images. With the rapid evolution of the cloud ecosystem, the number of VM images is growing at high speed. These images, each containing...

Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications

by Engr. Osama Ali Khan

2022

Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds

by LEONARDO SANTIAGO NAVA GOMEZ

2022

With increasing interest among mainstream users to run HPC applications, Infrastructure-as-a-Service (IaaS) cloud computing platforms represent a viable alternative to the acquisition and maintenance of expensive hardware, often out of...

Providing Consistent State to Distributed Storage System

by Ragunathan Thirumalaisamy

2022, Computers

Developing Checkpointing and Recovery Procedures with the Storage Services of Amazon Web Services

by Rafaela Brum

2022, 49th International Conference on Parallel Processing - ICPP : Workshops

Checkpoint/Recovery is a common technique for imbuing a program or system with fault-tolerant qualities. It allows tasks to recover after some interruption, failure, or task abortion.

A Reed-Solomon Code for Disk Storage, and Efficient Recovery Computations for Erasure-Coded Disk Storage

by Mark Manasse

2022

Reed-Solomon erasure codes provide efficient simple techniques for redundantly encoding information so that the failure of a few disks in a disk array doesn't compromise the availability of data. This paper presents a technique...

When we take an arbitrary $m$ by $m$ submatrix of this coding matrix, evaluating the determinant by picking available columns from the identity portion of the matrix makes it clear that the determinant is equal (up to sign, which is irrelevant over a field of characteristic two) to the determinant of the 0 by 0, 1 by 1, 2 by 2, or ...

For our use, let $g$ be a generator of the finite field. The elements from which we generate our Vandermonde matrix are $g^0$ (i.e., 1), $g^1$ (i.e., $g$), and $g^2$. More explicitly, if our coding matrix is $M$, for $i < m$ and $j < m$, $M_{ij} = \delta_{ij}$, and $M_{i(m+k)} = g^{ik}$ for $k = 0$, 1, or 2. More explicitly, $M$ can be defined as follows:

PeerStripe: A P2P-Based Large-File Storage for Desktop Grids

by Chreston Miller

2022

In desktop grids the use of off-the-shelf shared components makes the use of dedicated resources economically nonviable and increases the complexity of design of efficient storage systems that are required to address the exponentially...

Reliable and Efficient Checkpoint/Recovery in Shared Grid Environments

by Tanzima Islam

2022

In a Fine-Grained Cycle Sharing (FGCS) system [1], machine owners voluntarily share their unused CPU cycles with guest jobs, as long as their performance degradation is tolerable. However, for guest users, these free computation resources...

FREM: A fast restart mechanism for general Checkpoint/Restart

by Zhiling Lan

2022

As failure rates keep increasing in large systems, the applications running atop them restart more frequently than ever. Existing research on checkpoint/restart mainly focuses on optimizing checkpoint operation, without paying much...

Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications

by Osama Khan

2022

Toward a high availability cloud: Techniques and challenges

by cuong pham

2021

Cloud computing in its many forms has become the key computing-infrastructure that supports business and more recently governmental computing across the globe. With its geographical spread and value proposition comes the need to provide...

A Reed-Solomon Code for Disk Storage, and Efficient Recovery Computations for Erasure-Coded Disk Storage

by Mark S Manasse

2021

Reed-Solomon erasure codes provide efficient simple techniques for redundantly encoding information so that the failure of a few disks in a disk array doesn’t compromise the availability of data. This paper presents a technique for...

Monitoring the BlobSeer distributed data-management platform using the MonALISA framework

by Ancuța Costan

2021

How to use MonALISA to monitor the BlobSeer distributed data-management platform. Abstract: Grid monitoring is an active research area, covering both the monitoring of resources and the...

Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems

by Fabio Kon

2021, Proceedings of the 3rd international workshop on Middleware for grid computing - MGC '05

Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more...

Strategies for Checkpoint Storage on Opportunistic Grids

by Fabio Kon

2021, IEEE Distributed Systems Online

This article evaluates several strategies for storing checkpoint data in an opportunistic grid environment, including replication, parity information, and erasure coding. This evaluation compares the computational overhead, storage...

Toward a high availability cloud: Techniques and challenges

by Zbigniew Kalbarczyk

2021, IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012)
