Benchmarking Dependability of MapReduce Systems

2012, 2012 IEEE 31st Symposium on Reliable Distributed Systems

https://doi.org/10.1109/SRDS.2012.12

Abstract

MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks for analyzing and comparing the effectiveness of these proposals are still missing. To date, most MapReduce fault-tolerance solutions have been evaluated using microbenchmarks in ad-hoc and overly simplified settings, which may not be representative of real-world applications. This paper presents MRBS, a comprehensive benchmark suite for evaluating the dependability of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios, such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS can inject various types of faults at different rates. It also considers different application workloads and dataloads, and produces extensive reliability, availability and performance statistics. We illustrate the use of MRBS with Hadoop clusters running on Amazon EC2 and on a private cloud.
