A Performance Analysis of Vector Length Agnostic Code
2019 International Conference on High Performance Computing & Simulation (HPCS)
https://doi.org/10.1109/HPCS48598.2019.9188238
Abstract
Vector extensions are a popular means of exploiting data parallelism in applications. Over recent years, the most commonly used extensions have grown in both vector length and number of vector instructions. However, code portability remains a problem across a compute continuum. Hence, vector length agnostic (VLA) architectures have been proposed for future generations of ARM and RISC-V processors. With these architectures, code is vectorized independently of the vector length of the target hardware platform. It is therefore possible to tune software to a generic vector length. To understand the performance impact of VLA code compared to vector length specific code, we analyze the current capabilities of code generation for ARM's SVE architecture. Our experiments show that VLA code reaches about 90% of the performance of vector length specific code, i.e., an overhead of about 10% is incurred due to global predication of instructions. Furthermore, we show that code performance does not increase proportionally with vector length because of the higher memory demands.
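The difference between VLA code and vector length specific code can be illustrated with a small scalar model. The sketch below is not the code evaluated in the paper and does not use SVE intrinsics; it only mimics the loop structure in plain Python. All function names are hypothetical, and `whilelt` is merely named after the SVE instruction that builds a loop predicate. The VLA variant runs every lane operation under a predicate and needs no remainder loop, while the fixed-length variant assumes a known vector length and handles the tail separately.

```python
def whilelt(i, n, vl):
    """Model of a loop predicate: lane k is active while i + k < n."""
    return [i + k < n for k in range(vl)]


def daxpy_vla(a, x, y, vl):
    """y <- a*x + y in VLA style: vl is only a run-time parameter.

    Every lane operation is guarded by the predicate, so the final,
    partially filled vector is handled by the same loop body.
    """
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        pred = whilelt(i, n, vl)          # recomputed each iteration
        for k, active in enumerate(pred):
            if active:                    # predicated lane operation
                out[i + k] = a * x[i + k] + out[i + k]
        i += vl                           # advance by the hardware vector length
    return out


def daxpy_fixed(a, x, y, vl):
    """y <- a*x + y in vector length specific style (assumed for comparison).

    The main loop processes full vectors without predication; leftover
    elements go through a separate scalar remainder loop.
    """
    n = len(x)
    out = list(y)
    main = n - n % vl
    for i in range(0, main, vl):
        for k in range(vl):               # unpredicated full-width operation
            out[i + k] = a * x[i + k] + out[i + k]
    for i in range(main, n):              # scalar tail
        out[i] = a * x[i] + out[i]
    return out
```

Calling `daxpy_vla(2.0, x, y, vl)` with different values of `vl` yields the same result without changing the function, which is the portability property the abstract describes. The per-iteration predicate handling is the kind of extra work that, in this simplified model, corresponds to the predication overhead reported above.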