Design space exploration for real-time embedded stream processors

2004, IEEE Micro

https://doi.org/10.1109/MM.2004.25

Abstract

We present a design framework for rapidly exploring the design space of stream processors in real-time embedded systems. Stream processors support hundreds of arithmetic units in a programmable processor by organizing them into clusters of functional units. However, a given real-time requirement can be met by many combinations of arithmetic units per cluster, number of clusters, and clock frequency, and each combination has a different power consumption. We have developed a design exploration tool that explores this trade-off and provides a heuristic that minimizes power consumption in the (functional units, clusters, frequency) design space. Our design methodology relates instruction-level parallelism, subword parallelism, and data parallelism to the organization of the functional units in an embedded stream processor. We show that the power minimization methodology also provides insight into the functional unit utilization of the processor. The design exploration tool exploits the static nature of signal processing workloads, which makes the exploration extremely fast and yields an initial lower-bound estimate of the real-time performance of the embedded processor. A sensitivity analysis of the tool's results with respect to technology and modeling parameters also lets the designer check the robustness of the design exploration.
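The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool or heuristic: it is an exhaustive search over a toy model, and every name and formula in it is an assumption made for illustration. The hypothetical `explore` function assumes that instruction-level parallelism (`ilp`) caps the number of useful arithmetic units per cluster, that data parallelism (`dp`) caps the number of useful clusters, and that relative power scales as the number of arithmetic units times the cube of the clock frequency (from P ~ C V^2 f with V taken as proportional to f).

```python
from itertools import product


def explore(total_ops, deadline_s, ilp, dp,
            alu_options=range(1, 6),
            cluster_options=(1, 2, 4, 8, 16, 32, 64),
            f_max_hz=1e9):
    """Return (alus_per_cluster, clusters, f_hz, relative_power) for the
    lowest-power point that meets the deadline, or None if none does.

    All models here are illustrative assumptions, not the paper's models.
    """
    best = None
    for a, c in product(alu_options, cluster_options):
        # Toy performance model: ILP caps useful ALUs per cluster,
        # data parallelism caps useful clusters.
        ops_per_cycle = min(a, ilp) * min(c, dp)
        # Slowest clock that still finishes total_ops within the deadline.
        f_hz = total_ops / ops_per_cycle / deadline_s
        if f_hz > f_max_hz:
            continue  # this configuration cannot meet real time
        # Toy power model: capacitance ~ a * c, voltage assumed ~ frequency.
        power = a * c * (f_hz / 1e9) ** 3
        if best is None or power < best[3]:
            best = (a, c, f_hz, power)
    return best


if __name__ == "__main__":
    print(explore(total_ops=1e6, deadline_s=1e-3, ilp=3, dp=32))
```

Under these assumptions, a workload with `ilp=3` and `dp=32` selects 3 arithmetic units per cluster and 32 clusters: units beyond the available parallelism add power without lowering the required clock frequency. A real exploration would replace both toy models with measured schedules and technology-specific power data.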
