Investigating Performance Benefits from OpenACC Kernel Directives
Parallel Computing: Accelerating Computational Science and Engineering (CSE), Mar 1, 2014
OpenACC is a high-level programming model that uses directives for offloading computation to accelerators. This paper explores the benefit of using OpenACC performance tuning directives to manually specify GPU scheduling, compared with the scheduling OpenACC applies by default. We specified the scheduling manually using the gang and vector clauses of a directive, and applied this to matrix-matrix multiply and Classical Gram-Schmidt orthonormalisation test cases. We then tested on the NVIDIA M2090 and K20 GPGPUs, with both the PGI and CAPS implementations of OpenACC. The speedup realised by tuning the gang and vector values ranged from 1.0 to 3.1 in the test cases examined. This shows that the gang and vector values can have a large impact on performance, and that in some cases the compilers are able to select ideal gang and vector values automatically.
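To illustrate what manual scheduling with gang and vector clauses can look like, here is a minimal sketch of a matrix-matrix multiply inside an OpenACC kernels region. It is not the code used in the paper: the function name `matmul`, the row-major layout, and the values `gang(64)` and `vector(128)` are assumptions chosen only for illustration. Without an OpenACC-enabled compiler the pragmas are ignored and the loops run serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: c = a * b for row-major n x n matrices.
 * The gang and vector values below are illustrative assumptions,
 * not the tuned values reported in the paper. */
void matmul(int n, const float *restrict a, const float *restrict b,
            float *restrict c)
{
    assert(n > 0);
    size_t len = (size_t)n * (size_t)n;

    #pragma acc kernels copyin(a[0:len], b[0:len]) copyout(c[0:len])
    {
        /* Outer loop: iterations distributed across 64 gangs. */
        #pragma acc loop independent gang(64)
        for (int i = 0; i < n; i++) {
            /* Inner loop: iterations mapped to vector lanes, length 128. */
            #pragma acc loop independent vector(128)
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                /* Dot product of row i of a and column j of b, run sequentially. */
                #pragma acc loop seq
                for (int k = 0; k < n; k++) {
                    sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
                }
                c[(size_t)i * n + j] = sum;
            }
        }
    }
}
```

Removing the `gang(64)` and `vector(128)` arguments leaves the scheduling to the compiler, which is the default behaviour the manually tuned variants are compared against.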
Papers by Benjamin Eagan