Analysis of Threading Libraries for High Performance Computing

IEEE Transactions on Computers

https://doi.org/10.1109/TC.2020.2970706

Abstract

With the appearance of multi-/many-core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency brought by new software paradigms. POSIX threads (Pthreads) was widely adopted for that purpose and remains the most used threading solution on current hardware. Lightweight thread (LWT) libraries emerged as an alternative, offering lighter mechanisms to tackle the massive concurrency of current hardware. In this paper, we analyze in detail the most representative threading libraries, including Pthread- and LWT-based solutions. In addition, to examine the suitability of LWTs for different use cases, we develop a set of microbenchmarks consisting of OpenMP patterns commonly found in current parallel codes, and we compare the results using threading libraries and OpenMP implementations. Moreover, we study the semantics offered by threading libraries in order to expose the similarities among different LWT application programming interfaces and their advantages over Pthreads. This study shows that LWT libraries outperform solutions based on operating system threads when tasks and nested parallelism are required.

FAQs

What performance attributes distinguish LWT libraries like Argobots from traditional OS threads?

The study finds that Argobots' user-level threads (ULTs) run up to 12 times faster than Pthreads under high contention because of their lower management overhead.

How do lightweight thread libraries address the overhead of conventional OS thread management?

The analysis reveals that LWTs minimize OS involvement by managing threads in user space, resulting in lower context-switching costs and less overhead during thread creation.
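
The idea of scheduling threads in user space can be illustrated with a small conceptual sketch. It is only an analogy built on Python generators, not the API of Argobots or any other LWT library; the names `worker` and `run_round_robin` are hypothetical and chosen for illustration. Each `yield` stands in for a cooperative context switch that is handled entirely by the program's own scheduler loop, without a system call.

```python
from collections import deque


def worker(name, steps, log):
    """A toy user-level thread: each `yield` is a voluntary context switch."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler; no OS call is involved


def run_round_robin(tasks):
    """Run generator-based tasks to completion on a single OS thread."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
            ready.append(task)  # still runnable: requeue at the back
            switches += 1
        except StopIteration:
            pass                # task finished; drop it from the queue
    return switches


log = []
switches = run_round_robin([worker("A", 2, log), worker("B", 3, log)])
```

Here the "context switch" is just a queue operation and a generator resume, which hints at why creation and switching can be much cheaper in user space than with kernel-managed threads. Real LWT libraries save and restore full execution contexts and run their schedulers on several OS threads, which this sketch does not attempt to model.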

What are the benefits of using the Generic Lightweight Threads (GLT) API?

The paper demonstrates that the GLT API unifies various LWT libraries, improving portability and consistency, thus facilitating easier migration across different LWT implementations.

How does the performance of OpenMP implementations compare with LWT libraries?

In experimental scenarios, the OpenMP implementations are found to be up to 25 times slower than LWTs like Argobots in nested parallel structures due to overhead from OS thread management.

What practical challenges hinder the adoption of lightweight thread libraries in HPC?

The lack of standardization and the complexity involved in transitioning from Pthreads to LWT libraries are significant barriers to their widespread adoption in high-performance computing.
