Multi-threaded library for many-core systems
2009, 2009 IEEE International Symposium on Parallel & Distributed Processing
https://doi.org/10.1109/IPDPS.2009.5161104
Abstract
Multi-core processors are here, and many-core nodes and processors will soon follow. We believe that effective utilization of these systems will require lightweight threading. MAESTRO is a prototype runtime designed to provide simple, very lightweight threads, and synchronization between those threads, on modern commodity (x86) hardware. Eventually, the runtime should be able to monitor hardware and application scheduling state in real time, dynamically analyze that information, and control both the hardware and the scheduler based on its analysis.