Automatic scheduling for cache only memory architectures
2002
https://doi.org/10.1109/HICSS.1998.649258
Abstract
For parallel and distributed systems to gain wider acceptance than they have to date, they must become significantly easier to program. Fundamentally, parallel programming is more difficult than sequential programming as long as data and computation must be distributed by the programmer. Cache Only Memory Architectures (COMAs) provide a Distributed Shared Memory (DSM) where data distribution is performed automatically and transparently. This paper generalizes this idea to achieve the same distribution for computation, thus arriving at an automatic and transparent form of scheduling. Where COMA literature normally makes no assumptions concerning the parallel programs which use the DSM, we use special compiler techniques originally developed for multithreaded and dataflow architectures. Having done so, we can specify ways of significantly simplifying the basic COMA coherency protocols, while at the same time enabling automatic, transparent, adaptive run-time scheduling.