
Expressing pipeline parallelism using TBB constructs

Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11 - SPLASH '11 Workshops

https://doi.org/10.1145/2095050.2095074

Abstract

Task-based libraries such as Intel's Threading Building Blocks (TBB) provide higher levels of abstraction than threads for parallel programming. Work remains, however, to determine how straightforward it is to use these libraries to express various patterns of parallelism. This case study focuses on a particular pattern: pipeline parallelism. We attempted to transform three representative applications (content-based image retrieval, compression, and video encoding) to pipelines using TBB. We successfully converted two of the three applications. In the successful cases we discuss our transformation process and contrast the expressivity and performance of our implementations to existing Pthreads versions; in the unsuccessful case, we detail what the challenges were and propose potential solutions.
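
To make the construct concrete, the following is a minimal sketch (not code from the paper) of how a three-stage pipeline can be expressed with TBB's parallel_pipeline. The token type, stage bodies, and token limit are illustrative assumptions; the paper's implementations may instead use TBB's class-based pipeline/filter interface.

    // Minimal sketch of a three-stage TBB pipeline (illustrative only).
    #include <tbb/parallel_pipeline.h>
    #include <cstddef>
    #include <iostream>

    int main() {
        int next = 0;
        const int n_items = 100;              // assumed number of input items
        const std::size_t max_tokens = 8;     // bound on items in flight

        tbb::parallel_pipeline(
            max_tokens,
            // Input stage: serial, produces one token per call and stops when done.
            tbb::make_filter<void, int>(
                tbb::filter_mode::serial_in_order,
                [&](tbb::flow_control& fc) -> int {
                    if (next >= n_items) { fc.stop(); return 0; }
                    return next++;
                }) &
            // Middle stage: parallel, transforms each token independently.
            tbb::make_filter<int, int>(
                tbb::filter_mode::parallel,
                [](int x) { return x * x; }) &
            // Output stage: serial and in order, consumes the results.
            tbb::make_filter<int, void>(
                tbb::filter_mode::serial_in_order,
                [](int y) { std::cout << y << '\n'; }));
        return 0;
    }

The token limit bounds how many items are in flight at once, and a stage declared parallel may process several tokens concurrently while the serial input and output stages preserve ordering.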

FAQs

What challenges were encountered when transforming applications to TBB pipelines?

The study found that manually analyzing the dependencies between pipeline stages was the most time-consuming part of the transformation process. Developers faced additional challenges from TBB's structural restrictions, which required non-obvious code modifications.
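
As one illustration of such a restriction (our example, not the paper's code): a TBB filter body maps each input token to exactly one output token, so a Pthreads stage that enqueued a variable number of items per input must be restructured, for instance by packing the items into the token itself. The token type and splitting routine below are hypothetical.

    // Illustration (not from the paper) of restructuring a stage whose Pthreads
    // version emitted a variable number of chunks per input item: the chunks are
    // packed into a single token instead of being enqueued individually.
    #include <tbb/parallel_pipeline.h>
    #include <cstddef>
    #include <string>
    #include <vector>

    struct Chunks {                           // token type carrying a batch of outputs
        std::vector<std::string> parts;
    };

    // Hypothetical splitting routine; stands in for application-specific logic.
    Chunks split_into_chunks(const std::string& item) {
        Chunks c;
        for (std::size_t i = 0; i < item.size(); i += 4)
            c.parts.push_back(item.substr(i, 4));
        return c;
    }

    // The stage produces exactly one Chunks token per input string; the next
    // stage iterates over the token's parts instead of dequeuing items one by one.
    auto make_split_stage() {
        return tbb::make_filter<std::string, Chunks>(
            tbb::filter_mode::parallel,
            [](const std::string& item) { return split_into_chunks(item); });
    }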

How did performance metrics compare between TBB and Pthreads implementations?

The converted dedup application executed up to 2.13 times faster than its Pthreads counterpart, while the ferret application performed comparably. This indicates that TBB can be a performance-competitive alternative to Pthreads.

What insights were gained regarding boilerplate code reduction in TBB?

The TBB implementation of ferret reduced the source code from 437 to 376 SLOC by minimizing the boilerplate needed for thread management. The smaller codebase is easier to maintain and understand.

What implications arise from the use of nested pipelines in the dedup application?

Nested pipelines in dedup improved parallelism at the cost of increased memory usage and processing overhead. The approach helped ensure that all tokens were processed, and the added parallelism compensated for the performance costs observed.
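
A hedged sketch of the nested-pipeline idea (our illustration; the block structure, grain size, and helper names are assumptions, not the dedup sources): a function like the one below, invoked from the body of an outer pipeline filter, runs an inner parallel_pipeline over the sub-blocks of a single coarse token.

    // Sketch of a nested pipeline (illustrative; not the dedup sources). Called
    // from an outer filter, it processes the sub-blocks of one coarse block with
    // an inner parallel_pipeline and returns how many were handled.
    #include <tbb/parallel_pipeline.h>
    #include <cstddef>
    #include <vector>

    struct Block { std::vector<int> sub_blocks; };   // hypothetical coarse unit

    // Hypothetical per-sub-block work; stands in for chunking/compression/hashing.
    int process_sub_block(int v) { return v * 2 + 1; }

    int process_block(Block& blk) {
        std::size_t idx = 0;
        int count = 0;
        tbb::parallel_pipeline(
            4,  // assumed inner token limit
            // Inner input stage: walks the sub-blocks of this one outer token.
            tbb::make_filter<void, int>(
                tbb::filter_mode::serial_in_order,
                [&](tbb::flow_control& fc) -> int {
                    if (idx >= blk.sub_blocks.size()) { fc.stop(); return 0; }
                    return blk.sub_blocks[idx++];
                }) &
            // Inner parallel stage: independent work per sub-block.
            tbb::make_filter<int, int>(
                tbb::filter_mode::parallel,
                [](int v) { return process_sub_block(v); }) &
            // Inner output stage: collects results in order.
            tbb::make_filter<int, void>(
                tbb::filter_mode::serial_in_order,
                [&](int) { ++count; }));
        return count;
    }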

What limitations were identified in the TBB implementation for x264?

The x264 pipeline struggled with enforcing frame dependencies, leading to challenges in ensuring correct task execution order. This demonstrated that TBB's pipeline construct might not be well-suited for all forms of application parallelization.
