Academia.edu

Virtual Disk Snapshots

16 papers
21 followers
About this topic
Virtual disk snapshots are a technology used in virtualization that captures the state, data, and configuration of a virtual machine's disk at a specific point in time. This allows for the preservation of the virtual machine's environment, enabling users to revert to that state if needed, facilitating backup, recovery, and testing processes.

Key research themes

1. How can virtualization-aware file systems enhance the flexibility and management of virtual disk snapshots beyond conventional coarse-grained rollback?

This research theme focuses on overcoming the inherent limitations of conventional virtual disks, mainly their coarse-grained, all-or-nothing rollback and lack of internal structure that impede fine-grained sharing, searching, and secure management. Virtualization-aware file systems (VAFS) integrate versioning, mobility, and access control features of virtual disks with the fine-grained sharing and usability benefits of distributed file systems. This hybrid approach aims to enable more flexible virtual machine (VM) snapshotting, minimize storage overhead, and improve security and manageability of virtual disk snapshots in dynamic VM environments.

Key finding: Introduced Ventana, a virtualization-aware file system that blends the versioning, isolation, and mobility of virtual disks with the fine-grained sharing and access-controlled features of distributed file systems. Unlike...

2. What strategies optimize storage efficiency and reliability in cloud-based virtual disk snapshots using deduplication and erasure coding?

This theme explores techniques to reduce storage costs and improve fault tolerance for virtual disk snapshots, especially within Infrastructure-as-a-Service (IaaS) clouds hosting HPC or large-scale applications. Deduplication eliminates redundant data chunks across snapshots or VMs, considerably reducing space and I/O bandwidth requirements. Erasure codes, particularly Reed-Solomon coding, provide fault tolerance with less storage overhead compared to replication but require optimization for bandwidth efficiency during data writes. Together, these approaches address challenges of large-scale snapshot storage, balancing performance, reliability, and cost.
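As a rough illustration of the deduplication idea described in this theme, the sketch below stores fixed-size chunks in a content-addressed table so that chunks shared between snapshots are kept only once. It is a minimal sketch under stated assumptions, not the method of any listed paper: the class `DedupStore`, the constant `CHUNK_SIZE`, fixed-size chunking, and SHA-256 indexing are all hypothetical choices made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class DedupStore:
    """Content-addressed chunk store shared by all snapshots (illustrative)."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes, each stored once
        self.snapshots = {}  # snapshot name -> ordered list of digests

    def put_snapshot(self, name, image):
        """Split a disk image into chunks and record only unseen chunks."""
        digests = []
        for offset in range(0, len(image), CHUNK_SIZE):
            chunk = image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.snapshots[name] = digests

    def get_snapshot(self, name):
        """Reassemble a snapshot from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.snapshots[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

With two three-chunk snapshots that differ in a single chunk, the store holds four chunks rather than six. Real systems differ in the design decisions named in the classification above (granularity, timing, indexing, scope); this sketch fixes one simple point in that space.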

Key finding: Provided a comprehensive classification framework for deduplication approaches relevant to virtual disk snapshots, detailing design decisions regarding granularity, locality, timing, indexing, technique, and scope....
Key finding: Developed a Reed-Solomon erasure coding algorithm optimized for concurrent large-scale data dumps to local disks in IaaS, reducing bandwidth consumption while achieving high reliability. Integrated within a checkpoint-restart...
Key finding: Introduced BlobCR, a checkpoint-restart framework enabling incremental snapshots of entire VM disks using selective copy-on-write techniques. By asynchronously persisting VM disk snapshots with low overhead, BlobCR reduces...
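The storage trade-off between erasure coding and replication mentioned in this theme can be seen with the simplest erasure code, a single XOR parity block. This is deliberately not Reed-Solomon coding, which tolerates several simultaneous losses and needs finite-field arithmetic; the sketch below tolerates exactly one lost block, and the function names `xor_blocks`, `encode`, and `recover` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

For k data blocks, the parity block adds 1/k storage overhead while surviving one failure, whereas surviving one failure through replication requires a full second copy.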

3. How do advanced snapshot management techniques improve scalability and efficiency of virtual disk snapshots in distributed storage systems and clouds?

This research area investigates algorithms and system designs that manage large volumes of virtual disk snapshots by optimizing snapshot creation, storage, and retrieval in distributed environments. Focus areas include reducing snapshot size and fragmentation, increasing snapshot accessibility, and supporting operations like suspend-resume and migration. Techniques such as copy-on-write with snapshot disentanglement, versioning file systems optimized for multi-deployment and multi-snapshot operations, and hybrid remote replication methods aim to balance storage overhead, network bandwidth use, and I/O performance for scalable snapshot management.

Key finding: Designed a virtual file system optimized for concurrent deployment and snapshotting of large numbers of VMs in IaaS clouds. By leveraging lazy transfer schemes combined with object versioning on a distributed storage service...
Key finding: Through detailed design principles and an implementation based on versioning storage services, this work provided a scalable method for handling the common cloud patterns of deploying and snapshotting large VM collections....
Key finding: Introduced Thresher, a snapshot storage management system that efficiently segregates snapshots based on application-provided rankings to discriminate snapshot importance. By combining ranked segregation with techniques like...
Key finding: Proposed a hybrid remote replication technique that dynamically partitions storage extents between continuous and snapshot replication based on overwrite characteristics, achieving the low network bandwidth of snapshot...
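To make the snapshot mechanics behind this theme concrete, the following sketch models a virtual disk as a chain of block layers: taking a snapshot freezes the current layer and directs later writes to a fresh one, so unchanged blocks are never duplicated. It assumes a layered overlay design (sometimes called redirect-on-write) rather than the specific copy-on-write schemes of the listed papers, and the class `CowDisk` and its methods are hypothetical.

```python
class CowDisk:
    """Block device with layered snapshots (illustrative only)."""

    def __init__(self, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.layers = [{}]  # oldest first; the last layer is writable

    def write(self, index, data):
        """Write one block into the current writable layer."""
        if not 0 <= index < self.num_blocks:
            raise IndexError(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.layers[-1][index] = data

    def read(self, index, snapshot=None):
        """Read a block as of a snapshot id, or the live state if None."""
        top = len(self.layers) if snapshot is None else snapshot + 1
        for layer in reversed(self.layers[:top]):
            if index in layer:
                return layer[index]
        return bytes(self.block_size)  # never-written blocks read as zeros

    def snapshot(self):
        """Freeze the current layer and return its snapshot id."""
        self.layers.append({})
        return len(self.layers) - 2

    def revert(self, snapshot):
        """Discard every write made after the given snapshot."""
        self.layers = self.layers[:snapshot + 1] + [{}]
```

Snapshot creation here costs one empty dictionary regardless of disk size, while reads may walk the whole chain; that growing read cost is one reason long snapshot chains are usually merged or otherwise managed.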

All papers in Virtual Disk Snapshots

Estimation of ship self-propulsion is important for the selection of the propulsion system and the main engine so that the ship can move forward with the required speed. Resistance characteristics of the vessel or the open-water...
Several works have shown that the time to boot one virtual machine (VM) can last up to a few minutes in highly consolidated cloud scenarios. This time is critical as VM boot duration defines how an application can react w.r.t. demands’...
Cloud Computing offers the possibility of computing resources, allowing remote access to software, storage and data processing through the Internet. Infrastructure as a Service (IaaS) is a flexible space which can be used as an...
The scientific community is exploring the suitability of cloud infrastructure to handle High Performance Computing (HPC) applications. The goal of Magellan, a project funded through DOE ASCR, is to investigate the potential role of cloud...
Infrastructure as a Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short...
Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of virtual machines (VMs) offer an attractive fault tolerance capability for cloud data centers. However, existing mechanisms have suffered...
DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well-known applications, including...
InfiniBand is widely used for low-latency, high-throughput cluster computing. Saving the state of the InfiniBand network as part of distributed checkpointing has been a long-standing challenge for researchers. Because of a lack of a...
Large- and small-scale storage systems frequently serve a mixture of workloads, an increasing number of which require some form of performance guarantee. Providing guaranteed disk performance, the equivalent of a "virtual disk", is...
The capability of taking snapshots is approaching ubiquity as a feature of file systems and data storage arrays. Here, we present an approach to structuring and managing snapshots in a storage space that provides for rapid creation and...
Abstract: With exascale computing on the horizon, the performance variability of I/O systems represents a key challenge in sustaining high performance. In many HPC applications, I/O is concurrently performed by all processes, which leads to...
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or moving running applications to different machines, e.g., for debugging...
Distributed Virtual Disk Storage System (DVDSS) is a reliable and fully decentralized storage system which is based on the concept of Distributed Storage and Virtual Disk. It uses a client/server architecture and utilizes all free space of...
The storage needs of users have shifted from just needing to store data to requiring a rich interface which enables the efficient query of versions, snapshots and creation of clones. Providing these features in a distributed file system...
As more and more large-scale applications need to generate and process very large volumes of data, the need for adequate storage facilities is growing. It becomes crucial to efficiently and reliably store and retrieve large sets of data...
Blizzard is a high-performance block store that exposes cloud storage to cloud-oblivious POSIX and Win32 applications. Blizzard connects clients and servers using a network with full-bisection bandwidth, allowing clients to access any...
Infrastructure-as-a-Service (IaaS) cloud computing is gaining significant interest in industry and academia as an alternative platform for running HPC applications. Given the need to provide fault tolerance, support for suspend-resume and...
In this paper we propose a flexible and scalable distributed storage framework called flexStore that can adapt to variations in available or consumable power and demonstrate its performance in the context of deduplicated virtual machine...
Using the cloud computing paradigm, a host of companies promise to make huge compute resources available to users on a pay-as-you-go basis. These resources can be configured on the fly to provide the hardware and operating system of...
A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform...
An advanced and established infrastructure for the data-center facets of Computer Science: large-scale and distributed; shared (many involved laboratories and institutions); designed for reconfigurability, observability, reproducible research...
In cloud storage systems, users must be able to shut down the application when not in use and restart it from the last consistent state when required. BlobSeer is a data storage application, specially designed for distributed systems,...
This paper presents the image server of the VISOR cloud-agnostic virtual machine image management service. An evaluation approach is also described and the results are discussed. VISOR is not intended to fit in a specific cloud framework...
Existing approaches to providing high availability (HA) for virtualized environments require a backup VM for every primary running VM. These approaches are expensive in memory because the backup VM requires the same amount of memory as...
In a scientific community that increasingly relies upon High Performance Computing (HPC) for large scale simulations and analysis, the reliability of hardware and applications devoted to HPC is extremely important. While hardware...
A key enabler for standardized cloud services is the encapsulation of software and data into VM images. With the rapid evolution of the cloud ecosystem, the number of VM images is growing at high speed. These images, each containing...
With increasing interest among mainstream users to run HPC applications, Infrastructure-as-a-Service (IaaS) cloud computing platforms represent a viable alternative to the acquisition and maintenance of expensive hardware, often out of...
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
Checkpoint/Recovery is a common technique for imbuing a program or system with fault-tolerant qualities. It allows tasks to recover after some interruption, failure or task abortion.
Reed-Solomon erasure codes provide efficient simple techniques for redundantly encoding information so that the failure of a few disks in a disk array doesn't compromise the availability of data. This paper presents a technique...
In desktop grids the use of off-the-shelf shared components makes the use of dedicated resources economically nonviable and increases the complexity of design of efficient storage systems that are required to address the exponentially...
In a Fine-Grained Cycle Sharing (FGCS) system [1], machine owners voluntarily share their unused CPU cycles with guest jobs, as long as their performance degradation is tolerable. However, for guest users, these free computation resources...
As the failure rate keeps increasing in large systems, the applications running atop them restart more frequently than ever. Existing research on checkpoint/restart mainly focuses on optimizing checkpoint operation, without paying much...
Cloud computing in its many forms has become the key computing infrastructure that supports business and more recently governmental computing across the globe. With its geographical spread and value proposition comes the need to provide...
Reed-Solomon erasure codes provide efficient simple techniques for redundantly encoding information so that the failure of a few disks in a disk array doesn’t compromise the availability of data. This paper presents a technique for...
How to use MonALISA to monitor the BlobSeer distributed data management platform. Abstract: Grid monitoring is an active research area, targeting both the monitoring of resources and the...
Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more...
This article evaluates several strategies for storing checkpoint data in an opportunistic grid environment, including replication, parity information, and erasure coding. This evaluation compares the computational overhead, storage...