Evaluation of the Raw Microprocessor

2004, ACM SIGARCH Computer Architecture News

https://doi.org/10.1145/1028176.1006733

Abstract

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor...
