Memory Optimizations for Time-Predictable Embedded Software

2009

Abstract

Real-time constraints require a system to accomplish its assigned functionality within a certain timeframe. This requirement is critical for hard real-time applications, such as safety device controllers, where the worst-case behavior of the system determines whether it is feasible with respect to its timing specifications. There is often a need to improve this worst-case performance so that the system can be realized with efficient use of its resources. The rule remains, however, that no performance enhancement may compromise the system's timing predictability: the property that its performance can be bounded and guaranteed to meet its timing constraints under all possible scenarios. Because of the still-unresolved gap between processor and memory performance, memory accesses remain the dominant performance bottleneck of most applications today. Embedded systems generally include fast on-chip memory to reduce execution time. To obtain the best performance gain from this resource, it is crucial to design a suitable management scheme. Popular approaches that target average-case performance, typically guided by profiling, cannot be directly adapted to improve worst-case performance effectively, because the worst-case execution path may shift as the program is optimized. There is thus a need for new approaches specifically targeted at optimizing worst-case performance in a time-predictable manner.
