

Enhancing OpenMP Tasking Model: Performance and Portability

2021, OpenMP: Enabling Massive Node-Level Parallelism

https://doi.org/10.1007/978-3-030-85262-7_3

Abstract

OpenMP, the de facto standard programming model for symmetric multiprocessing in HPC, has seen its performance improved continuously by the community, through both implementation enhancements and specification extensions. Furthermore, the language has evolved from a prescriptive nature, as defined by the thread-centric model, to a descriptive one, as defined by the task-centric model. However, the overhead of orchestrating tasks is still relatively high, so applications that exploit very fine-grained parallelism, and systems with a large number of cores, may fail to scale. In this work, we propose to include the concept of the Task Dependency Graph (TDG) in the specification by introducing a new clause, named taskgraph, attached to task or target directives. By design, the TDG alleviates the overhead associated with the OpenMP tasking model, and it also facilitates linking OpenMP with other programming models that support task parallelism. According to our experiments, a GCC implementation of taskgraph significantly reduces the execution time of fine-grained task applications and increases their scalability with respect to the number of threads.
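To make the kind of workload discussed above concrete, the following is a minimal sketch of a fine-grained task dependency graph written with standard OpenMP `depend` clauses only. The abstract does not give the syntax of the proposed taskgraph clause, so it is deliberately not used here; the function name `run_tdg`, the block count, and the arithmetic are illustrative assumptions, not taken from the paper. The same graph shape is rebuilt on every outer iteration, which is presumably the kind of repeated orchestration cost a recorded TDG is meant to reduce.

```c
#include <assert.h>

#define NBLOCKS 8  /* illustrative block count, not from the paper */

/* Hypothetical helper: builds and runs a small task dependency graph
 * `iterations` times and returns the sum of the consumer results.
 * Compiles with or without -fopenmp; without it the pragmas are
 * ignored and the tasks run serially in program order. */
long run_tdg(int iterations)
{
    assert(iterations >= 0);

    long x[NBLOCKS] = {0};
    long y[NBLOCKS] = {0};

    #pragma omp parallel
    #pragma omp single
    {
        for (int it = 0; it < iterations; ++it) {
            /* Assumption: a region like this loop body, whose graph has
             * the same shape every iteration, is where the proposed
             * taskgraph clause would apply. Its syntax is not shown in
             * the abstract, so only standard clauses appear below. */

            /* Producer tasks: each one updates a single block of x. */
            for (int i = 0; i < NBLOCKS; ++i) {
                #pragma omp task depend(inout: x[i]) firstprivate(i) shared(x)
                x[i] += i + 1;
            }

            /* Consumer tasks: each one reads two neighbouring blocks of
             * x, so the runtime must order it after both producers. */
            for (int i = 1; i < NBLOCKS; ++i) {
                #pragma omp task depend(in: x[i - 1], x[i]) \
                                 depend(inout: y[i]) firstprivate(i) shared(x, y)
                y[i] += x[i] + x[i - 1];
            }
        }
    } /* implicit barrier: all tasks have completed here */

    long sum = 0;
    for (int i = 0; i < NBLOCKS; ++i)
        sum += y[i];
    return sum;
}
```

Each task here does a single addition, so with a conventional runtime the cost of creating tasks and resolving their dependencies is likely to dominate the useful work. Calling `run_tdg(4)` from a driver and timing it at different thread counts is one simple way to observe that overhead.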
