Performance of a mass storage system for video-on-demand
Performance of A Mass Storage System for Video-On-Demand. Jenwei Hsieh, Mengjou Lin, Jonathan C. L. Liu, David H. C. Du and Thomas M. Ruwart. Distributed Multimedia Center, University of Minnesota; Advanced Technology Group, Apple Computer, Inc. ...
The InTENsity PowerWall is a display system used for high-resolution visualization of very large volumetric data sets. The display is linked to two separate computing environments consisting of more than a dozen computer systems. Linking these systems is a common shared storage subsystem that allows a great deal of flexibility in the way visualization data can be generated and displayed. These visualization applications demand very high bandwidth performance from the storage subsystem and associated file system. The InTENsity PowerWall system presents a real-world application environment in which to apply a distributed performance testing framework under development at the Laboratory for Computational Science and Engineering at the University of Minnesota. This testing framework allows us to perform focused, coordinated performance testing of the hardware and software components of storage area networks and shared file systems. [2] We use this framework to evaluate various performance characteristics of the PowerWall system's storage area network. We describe our testing approach and some of the results of our testing, and conclude by describing the direction of our future work in this area.
Disk subsystems span the range of configuration complexity from single disk drives to large installations of disk arrays. They can be directly attached to individual computer systems or configured as larger, shared-access Storage Area Networks (SANs). It is a significant task to evaluate the performance of these subsystems, especially when considering the range of performance requirements of any particular installation and application. Storage subsystems can be designed to meet different performance criteria such as bandwidth, transactions per second, latency, capacity, connectivity, etc., but the question of how the subsystem will perform depends on the software and hardware layering and the number of layers an I/O request must traverse in order to perform the actual operation. As an I/O request traverses more and more software and hardware layers, alignment and request-size fragmentation can result in performance anomalies that can degrade the overall bandwidth and transaction rates. Layer traversal can have a significant negative impact on the observed performance of even the fastest hardware components. This paper walks through the Storage Subsystem Hierarchy, defining these layers; presents a method for testing in single- and multiple-computer environments; and demonstrates the significance of careful, in-depth evaluation of storage subsystem performance.
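As a rough illustration of the alignment and request-size fragmentation issue described above, the following sketch estimates how a single logical request is split when it crosses stripe (or block) boundaries at a lower layer. The stripe size, offsets, and request sizes are arbitrary illustrative values, not parameters taken from the paper.

```python
def fragments(offset, size, stripe_size):
    """Count the sub-requests a single I/O of `size` bytes starting at
    `offset` is split into when a lower layer operates on fixed stripes."""
    first = offset // stripe_size
    last = (offset + size - 1) // stripe_size
    return last - first + 1

stripe = 64 * 1024                                       # hypothetical 64 KiB stripe unit
aligned = fragments(0, 256 * 1024, stripe)               # -> 4 sub-requests
misaligned = fragments(512, 256 * 1024, stripe)          # -> 5 sub-requests (one extra boundary crossed)
print(aligned, misaligned)
```

Each extra boundary crossing adds a sub-request that every lower layer must then service, which is one way a small misalignment near the top of the stack degrades bandwidth and transaction rates at the bottom.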
Applications at the Army High Performance Computing Research Center's (AHPCRC) Graphics and Visualization Laboratory (GVL) at the University of Minnesota require a tremendous amount of I/O bandwidth, and this appetite for data is growing. Silicon Graphics workstations are used to perform the post-processing, visualization, and animation of multi-terabyte datasets produced by scientific simulations performed on AHPCRC supercomputers. The M.A.X. (Maximum Achievable Xfer) project was designed to find the maximum achievable I/O performance of the Silicon Graphics CHALLENGE/Onyx-class machines that run these applications. Running a fully configured Onyx machine with 12 150 MHz R4400 processors, 512 MB of 8-way interleaved memory, and 31 fast/wide SCSI-2 channels, each with a Ciprico disk array controller, we were able to achieve a maximum sustained transfer rate of 509.8 megabytes per second. However, after analyzing the results it became clear that the true maximum transfer rate is somewhat beyond this figure, and we will need to do further testing with more disk array controllers in order to find the true maximum.
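For a rough sense of scale, the aggregate rate quoted above can be divided across the channels to get an average per-channel figure; the back-of-the-envelope below uses only the numbers stated in the abstract.

```python
aggregate_mb_s = 509.8      # sustained aggregate transfer rate from the abstract
channels = 31               # fast/wide SCSI-2 channels, one Ciprico array each

per_channel = aggregate_mb_s / channels
print(f"average per channel: {per_channel:.1f} MB/s")   # ~16.4 MB/s

# Fast/wide SCSI-2 tops out at 20 MB/s per channel, so the measured aggregate
# is already close to the channel limit, consistent with the suggestion that
# the true machine maximum lies somewhat above the measured figure.
```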
With the unprecedented proliferation of high-speed communication services, many people will have high-speed access to virtually unlimited amounts of information in many forms of media. A tremendous amount of research effort is being applied to the problems of how to provide these multimedia services to tens and hundreds of thousands of concurrent users. The I/O problems alone are significant and represent a form of data-intensive computing. This paper attempts to place analytical boundaries on the problems of designing and building a scalable multimedia storage server. The result is a compelling case for network-attached storage as well as a set of equations that can be used to estimate the hardware requirements for such a system.
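The abstract does not reproduce the sizing equations themselves, but a minimal sketch of the kind of estimate they enable might look like the following. The stream rate, per-drive bandwidth, per-link bandwidth, and derating factor are illustrative assumptions, not figures from the paper.

```python
def size_server(streams, stream_mb_s=0.5, drive_mb_s=4.0, link_mb_s=100.0,
                utilization=0.7):
    """Estimate the disk drives and network links needed to feed `streams`
    concurrent media streams, derating hardware by `utilization`."""
    total_mb_s = streams * stream_mb_s
    drives = -(-total_mb_s // (drive_mb_s * utilization))   # ceiling division
    links = -(-total_mb_s // (link_mb_s * utilization))
    return total_mb_s, int(drives), int(links)

print(size_server(10_000))   # e.g. 10,000 concurrent MPEG-1-class streams
```

Even under these generous assumptions the totals grow linearly with the user count, which is the core of the argument for spreading the load across network-attached storage rather than funneling it through a single server.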
Interactive smooth-motion animation of high resolution ocean circulation calculations
'Challenges of Our Changing Global Environment'. Conference Proceedings. OCEANS '95 MTS/IEEE
The authors describe how their recent high-resolution North Atlantic ocean circulation calculations were visualized and animated. The calculations, performed on the Cray T3D at the Pittsburgh Supercomputing Center, required display technology beyond ...
16th IEEE Symposium on Mass Storage Systems in cooperation with the 7th NASA Goddard Conference on Mass Storage Systems and Technologies (Cat. No.99CB37098)
The bandwidth performance of a Fibre Channel Arbitrated Loop (FCAL) is roughly defined to be 100 megabytes (10^6 bytes) per second. Furthermore, FCAL is capable of a theoretical peak of 40,000 I/O operations (transactions) per second. These performance levels, however, are largely not realized by the applications that use Fibre Channel as an interface to disk subsystems. The bandwidth and transaction performance of an Arbitrated Loop is sensitive to both the number of devices on the loop and the physical length of the loop. This study focuses on the effects of these two factors on the observed performance of Fibre Channel Arbitrated Loop as the number of nodes is scaled from 2 to 97 devices and as the physical length of the loop is scaled from 50 meters to several kilometers. To summarize, this study shows that performance decreases significantly for very long loops and explains how this can be partially avoided. Also, the loop propagation delay on loops with many devices has only a moderate effect on performance. Finally, the effects of length tend to dominate the effects of population for very long, highly populated loops. The factors associated with the performance of long-distance, highly populated networks and shorter-distance, lightly populated I/O channels have been well studied. Fibre Channel allows for the connection of a relatively large number of devices to a single I/O channel. At the same time it has many characteristics of a low-latency, high-bandwidth network. Typically, when Fibre Channel is implemented in a storage environment, there are relatively few disk drives on a relatively short arbitrated loop. The propagation delay imposed by the physical length of the loop is normally insignificant compared to the latencies imposed by the disk drives themselves (i.e., rotational and seek latencies). Furthermore, the small number of disk drives that populate the loop do not contribute any significant propagation delay overall. The Fibre Channel architecture makes it possible to extend an arbitrated loop well beyond the "typical" physical length and population scales. This can be done in order to accommodate, for example, direct access to physically remote disk drives or a completely populated loop of disk devices. Such a system exists at the University of Minnesota, where a 128-processor Origin 2000 computer system at one facility (the Minnesota Supercomputer Institute) is attached directly to a high-performance, high-capacity disk storage subsystem located at another facility (the Laboratory for Computational Science and Engineering) at a distance of approximately 3.8 kilometers. Since bandwidth performance is critical in this application, the effects of the extended distance of the loop needed to be considered. The infrastructure installed to support this system turned out to be an ideal test bed on which to construct a highly populated 30-kilometer loop. This, along with generous equipment loans from Seagate Technology, MTI, Finisar, Ciprico, and Ancor Communications, made it possible to investigate the effects of distance and loop population on the observed performance of a Fibre Channel Arbitrated Loop storage subsystem.
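To give a feel for why loop length and population matter, the sketch below estimates the delay one full traversal of an arbitrated loop sees at different lengths and node counts. The 5 microseconds-per-kilometer figure for light in fiber is a standard approximation; the per-port forwarding delay is an illustrative assumption, not a value measured in the study.

```python
FIBER_DELAY_US_PER_KM = 5.0   # ~5 microseconds per kilometer in optical fiber
PORT_DELAY_US = 0.24          # assumed per-node forwarding delay (a few words
                              # at gigabit rates); illustrative, not measured

def loop_pass_us(length_km, nodes):
    """Delay for one full pass around the loop: the signal crosses the whole
    fiber length once and is retransmitted by every port."""
    return length_km * FIBER_DELAY_US_PER_KM + nodes * PORT_DELAY_US

for km, n in [(0.05, 8), (3.8, 8), (30.0, 97)]:
    print(f"{km:5.2f} km, {n:3d} nodes: {loop_pass_us(km, n):7.1f} us per loop pass")
```

On a 50-meter, lightly populated loop this delay is negligible next to millisecond-scale disk seek and rotational latencies, but on a very long loop the arbitration and transfer hand-shakes each pay it repeatedly, which is consistent with the study's observation that length dominates population for very long, highly populated loops.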
22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05)
This paper describes a simple but effective method to generate and observe the effects of external vibration on the read and write bandwidth of a disk drive. Furthermore, it quantifies these effects as a first-order numerical approximation. This paper is not intended to rate disk drives relative to manufacturer, model, size, form factor, etc. Rather, it is simply intended to answer the simple questions of "Is there a performance impact of external vibration on a disk drive?" and "How significant is that impact?" After testing several 3.5- and 2.5-inch consumer-grade and enterprise-class disk drives, the conclusion is that the consumer-grade disk drives are more sensitive to external vibration. In the presence of external vibration caused by adjacent disk drive seek operations, the bandwidth performance of a consumer-grade disk drive "feeling" these vibrations will decrease about 10%-15% when reading data and about 25%-40% when writing data. The final qualitative result of this study is that disk drive packaging is likely the most significant factor in reducing the vibrational effects.
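A minimal sketch of the kind of bandwidth measurement such a study relies on is shown below: it times large sequential reads from a device and reports throughput. The device path is a placeholder, and a real measurement would also need direct I/O (to bypass the page cache) and a controlled vibration source, such as seeks on adjacent drives, both of which are omitted here.

```python
import os
import time

def read_bandwidth(path, total_bytes=256 * 2**20, block=4 * 2**20):
    """Time sequential reads of up to `total_bytes` from `path`, return MB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        remaining = total_bytes
        while remaining > 0:
            buf = os.read(fd, min(block, remaining))
            if not buf:                      # end of device/file reached early
                break
            remaining -= len(buf)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return (total_bytes - remaining) / elapsed / 1e6

# Example (placeholder device path; requires permission to read the raw device):
# print(read_bandwidth("/dev/sdX"))
```

Running such a measurement with and without the vibration source active, and comparing the two rates, yields exactly the kind of first-order percentage degradation the abstract reports.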
22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05)
Obtaining consistent bandwidth with predictable latency from disk-based storage systems has proven difficult due to the storage system's inability to understand Quality of Service (QoS) requirements. In this paper, we present a feasibility study of QoS with the Object-based Storage Device (OSD) specification. We look at OSD's ability to provide QoS guarantees for consistent bandwidth with predictable latency. Included in this paper is a description of the QoS requirements of a sample application and how these requirements are translated into parameters that are then communicated to, and interpreted by, the OSD. Implementation problems led to the failure of a hard real-time QoS model, but this failure is not due to the OSD protocol. The paper concludes with a description of how well the Revision 9 OSD standard (OSDR9) is able to accommodate QoS. We provide suggestions for improving the OSD specification and its ability to communicate QoS requirements.
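As a concrete but hypothetical illustration of translating application requirements into storage-level QoS parameters, the sketch below converts a video playback requirement into a bandwidth, latency, and request-size triple. The field names and the conversion are invented for illustration and do not correspond to actual OSDR9 attribute fields.

```python
from dataclasses import dataclass

@dataclass
class QosRequest:
    bandwidth_mb_s: float     # sustained bandwidth the device must deliver
    latency_ms: float         # worst-case per-request latency bound
    request_bytes: int        # size of each individual read request

def video_stream_qos(width, height, bytes_per_pixel, fps, frames_per_request=1):
    """Translate an uncompressed video playback requirement into a QoS request."""
    frame_bytes = width * height * bytes_per_pixel
    return QosRequest(
        bandwidth_mb_s=frame_bytes * fps / 1e6,
        latency_ms=1000.0 / fps * frames_per_request,   # must arrive before the next frame is due
        request_bytes=frame_bytes * frames_per_request,
    )

print(video_stream_qos(1920, 1080, 3, 30))   # ~187 MB/s, ~33 ms, ~6.2 MB requests
```

The point of such a translation is that the application reasons in frames and deadlines while the device reasons in bytes and milliseconds; some layer has to perform this mapping before the device can be asked to honor it.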
[1993] Proceedings Twelfth IEEE Symposium on Mass Storage systems
The authors outline the design concepts and issues being explored, using a large-scale, high-performance file server, in the High Performance File Server (HPFS) test bed at the US Army High Performance Computing Research Center (AHPCRC) at the University of Minnesota. They present the implementation of a variety of concepts described in the Mass Storage System Reference Model being developed by the IEEE Storage System Standards Working Group. The concepts implemented by this project include the separation of data and control within a distributed file system manager to effectively use a separate, high-speed data path managed by a Bitfile Mover. This project has been completed to the point of a successful demonstration of the functionality of the AFS Client interface using Bitfile Movers to read data over a HIPPI (High-Performance Parallel Interface) channel.
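The central idea, separating the control path (metadata and authorization) from a high-speed data path handled by a mover, can be sketched as below. The class and method names are invented for illustration and are not the HPFS or Reference Model interfaces.

```python
# Hypothetical sketch of control/data separation: the client asks a file
# manager (control path) where the bits live, then pulls the data directly
# from a bitfile mover over the fast channel (e.g. HIPPI).

class FileManager:                      # control path only: no user data flows here
    def open_bitfile(self, name):
        # authorize the request and return a descriptor naming the mover
        # and the extents that hold the bitfile
        return {"mover": "hippi-mover-0", "extents": [(0, 8 * 2**20)]}

class BitfileMover:                     # data path: moves bytes over the fast channel
    def read(self, extent):
        offset, length = extent
        return bytes(length)            # stand-in for an actual HIPPI transfer

def client_read(name, manager, movers):
    desc = manager.open_bitfile(name)                             # small control message
    mover = movers[desc["mover"]]
    return b"".join(mover.read(e) for e in desc["extents"])       # bulk data, fast path

data = client_read("/afs/example/file", FileManager(), {"hippi-mover-0": BitfileMover()})
print(len(data))
```

The payoff of this split is that the file manager never becomes a bottleneck for bulk data: it handles only small control exchanges while the movers saturate the fast channel.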
IEEE Transactions on Parallel and Distributed Systems, 2015
Dual-core execution (DCE) is an execution paradigm proposed to utilize chip multiprocessors to improve the performance of single-threaded applications. Previous research has shown that DCE provides a complexity-effective approach to building a highly scalable instruction window and achieves significant latency-hiding capabilities. In this paper, we propose to optimize DCE for power efficiency and/or transient-fault recovery. In DCE, a program is first processed (speculatively) in the front processor and then reexecuted by the back processor. Such reexecution is the key to eliminating the centralized structures that are normally associated with very large instruction windows. In this paper, we exploit the computational redundancy in DCE to improve its reliability and its power efficiency. The main contributions include: 1) DCE-based redundancy checking for transient-fault tolerance and a complexity-effective approach to achieving full redundancy coverage, and 2) novel techniques to improve the power/energy efficiency of DCE-based execution paradigms. Our experimental results demonstrate that, with the proposed simple techniques, the optimized DCE can effectively achieve transient-fault tolerance or significant performance enhancement in a power/energy-efficient way. Compared to the original DCE, the optimized DCE has similar speedups (34 percent on average) over single-core processors while reducing the energy overhead from 93 percent to 31 percent.
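At a very high level, the redundancy check amounts to comparing the outcomes of the front and back executions and treating any mismatch as a detected transient fault. The Python sketch below is only a software analogy of that comparison, with a toy fault-injection step; it does not model the microarchitecture or the paper's mechanisms.

```python
import random

def run_program(inputs, inject_fault=False):
    """Stand-in for one execution of the program (front or back processor)."""
    result = sum(x * x for x in inputs)
    if inject_fault:                        # emulate a transient fault flipping one bit
        result ^= 1 << random.randrange(16)
    return result

def redundancy_check(inputs):
    front = run_program(inputs, inject_fault=random.random() < 0.1)
    back = run_program(inputs)              # re-execution on the back processor
    if front != back:
        return "mismatch: transient fault detected, recover by re-executing"
    return "results match"

print(redundancy_check(list(range(100))))
```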