Efficient System-Enforced Deterministic Parallelism

2010, arXiv (Cornell University)

https://doi.org/10.48550/ARXIV.1005.3450

Abstract

Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however, especially if we desire system-enforced determinism, ensuring precise repeatability of arbitrarily buggy or malicious software. Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations. Determinator's kernel provides only single-threaded, "shared-nothing" address spaces interacting via deterministic synchronization. An untrusted user-level runtime uses distributed computing techniques to emulate familiar abstractions such as Unix processes, file systems, and shared memory multithreading. The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster. Coarse-grained parallel benchmarks perform and scale comparably to, and sometimes better than, conventional systems, though determinism is costly for fine-grained parallel applications.
