Energy optimization of multi-level processor cache architectures
1995, Proceedings of the 1995 international symposium on Low power design - ISLPED '95
https://doi.org/10.1145/224081.224090
5 pages
Abstract
To optimize the performance and power of a processor's cache, a multiple-divided module (MDM) cache architecture is proposed that saves power in the memory peripherals as well as in the bit array. For an M×B-divided MDM cache, the access latency is equivalent to that of the smallest module, while the power consumption is only 1/(M×B) that of a regular, non-divided cache. Based on this architecture and given transistor budgets for on-chip processor caches, this paper extends the investigation to analyze the energy effects of cache parameters in a multi-level cache design. The analysis is based on execution of the SPECint92 benchmark programs, using the miss ratios of a RISC processor.
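The 1/(M×B) claim in the abstract can be illustrated with a minimal sketch. This is a hypothetical idealization, not the paper's model: it assumes the cache is divided into M×B equally sized modules and that only one module's bit array and peripherals are activated per access, so dynamic energy per access scales down by the division factor.

```python
# Idealized illustration of the MDM energy claim (assumption: only one
# of the M*B modules is activated per access, and module energy scales
# linearly with module size).
def mdm_energy_per_access(e_undivided, m, b):
    """Dynamic energy per access for an MxB-divided MDM cache."""
    return e_undivided / (m * b)

e_full = 1.0  # normalized energy of the non-divided cache
print(mdm_energy_per_access(e_full, 4, 2))  # 4x2 division -> 0.125
```

Under this idealization, a 4×2 division spends one eighth of the undivided cache's access energy, matching the 1/(M×B) figure.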
Related papers
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000
The memory hierarchy of high-performance and embedded processors has been shown to be one of the major energy consumers. For example, the Level-1 (L1) instruction cache (I-Cache) of the StrongARM processor accounts for 27% of the power dissipation of the whole chip, whereas the instruction fetch unit (IFU) and the I-Cache of Intel's Pentium Pro processor are the single most important power-consuming modules, with 14% of the total power dissipation [2]. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the I-Cache and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, it can effectively eliminate the need for high utilization of the more expensive I-Cache. We propose, implement, and evaluate five techniques for dynamic analysis of the program instruction access behavior, which is then used to proactively guide the access of the L0-Cache. The basic idea is that only the most frequently executed portions of the code should be stored in the L0-Cache, since this is where the program spends most of its time. We present experimental results to evaluate the effectiveness of our scheme in terms of performance and energy dissipation for a series of SPEC95 benchmarks. We also discuss the performance and energy tradeoffs that are involved in these dynamic schemes. Results for these benchmarks indicate that more than 60% of the dissipated energy in the I-Cache subsystem can be saved.
Microprocessors and Microsystems, 2002
The line size/performance trade-offs of off-chip second-level caches are revisited in light of energy efficiency. Based on a mix of applications representing server and mobile computer system usage, we show that while the large line sizes (128 bytes) typically used maximize performance, they result in high power dissipation owing to limited exploitation of spatial locality. In contrast, small blocks (32 bytes) are found to cut the energy-delay product by more than a factor of 2 with only a moderate performance loss of less than 25%. As a remedy, prefetching, if applied selectively, is shown to avoid the performance losses of small blocks while keeping power consumption low.
2005
Instruction caches typically consume 27% of the total power in modern high-end embedded systems. We propose a compiler-managed instruction store architecture (K-store) that places the computation-intensive loops in a scratchpad-like SRAM memory and allocates the remaining instructions to a regular instruction cache. At runtime, execution is switched dynamically between the instructions in the traditional instruction cache and the ones in the K-store by inserting jump instructions. The necessary jump instructions add 0.038% on average to the total dynamic instruction count. We compare the performance and energy consumption of our K-store with that of a conventional instruction cache of equal size. When used in lieu of an 8KB, 4-way set-associative instruction cache, K-store provides a 32% reduction in energy and a 7% reduction in execution time. Unlike loop caches, K-store maps the frequent code in a reserved address space and hence can switch between the kernel memory and the instruction cache without any noticeable performance penalty.
To meet the ever-increasing computing requirements of the embedded market, multiprocessor chips have been proposed as the way forward. In this work we investigate the energy consumption of these embedded MPSoC systems. One efficient way to reduce energy consumption is to reconfigure the cache memories. This approach has been applied to single-cache-level, single-processor architectures, but has not yet been investigated for multiprocessor architectures with two cache levels. The main contribution of this paper is to explore a two-level-cache (L1/L2) multiprocessor architecture by estimating its energy consumption. Using a simulation platform, we first build a multiprocessor architecture, and then propose a new algorithm that tunes the two-level cache hierarchy (L1 and L2). The cache-tuning approach is based on three parameters: cache size, line size, and associativity. To find the best cache configuration, the application is divided into several execution intervals, and for each interval we generate the best cache configuration. Finally, the approach is validated using a set of open-source benchmarks (SPEC 2006, Splash-2, MediaBench), and we discuss the performance in terms of speedup and energy reduction.
Proceedings of the 20th symposium on Great lakes symposium on VLSI - GLSVLSI '10, 2010
On-chip memory organization is one of the most important aspects that can influence overall system behavior in multiprocessor systems. Following the trend set by high-performance processors, high-end embedded cores are moving from single-level on-chip caches to a two-level on-chip cache hierarchy. Whereas in the embedded world there is general consensus on private L1 caches, for L2 there is still no dominant architectural paradigm. Cache architectures that work for high-performance computers turn out to be inefficient for embedded systems, mainly due to power-efficiency issues. This paper presents a virtual platform for design space exploration of L2 cache architectures in low-power Multi-Processor Systems-on-Chip (MPSoCs). The tool contains several L2 cache templates, and new architectures can be easily added using our flexible plug-in system. Given a set of constraints for a specific system (power, area, performance), our tool will perform extensive exploration to find the cache organization that best suits our needs. Through some practical experiments, we show how it is possible to select the optimal L2 cache, and how this kind of tool can help designers avoid some common misconceptions. Benchmarking results in the experiments section show that, for a case study with multiple processors running communicating tasks allocated on different cores, the private L2 cache organization still performs better than the shared one.
IEEE Transactions on Very Large Scale Integration Systems, 2003
Microprocessor performance has been improved by increasing the capacity of on-chip caches. However, the performance gain comes at the price of static energy consumption due to subthreshold leakage current in cache memory arrays. This paper compares three techniques for reducing static energy consumption in on-chip level-1 and level-2 caches. One technique employs low-leakage transistors in the memory cell. Another technique, power supply switching, can be used to turn off memory cells and discard their contents. A third alternative is dynamic threshold modulation, which places memory cells in a standby state that preserves cell contents. In our experiments, we explore the energy and performance tradeoffs of these techniques. We also investigate the sensitivity of microprocessor performance and energy consumption to additional cache latency caused by leakage-reduction techniques.
2010 12th International Conference on Computer Modelling and Simulation, 2010
With the increasing processor-memory performance gap, it has become important to gauge the performance of cache architectures in order to evaluate their impact on the energy requirements and throughput of the system. Multilevel caches are increasingly prevalent in high-end processors. Additionally, the recent drive towards multicore systems has necessitated the use of multilevel cache hierarchies for shared memory architectures. This paper presents simplified and accurate mathematical models to estimate the energy consumption and the impact on throughput of multilevel caches in single-core systems.
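The kind of model the abstract above describes can be sketched in its most generic textbook form. This is not the paper's actual model; it is the standard recursive miss-penalty formulation for an L1/L2 hierarchy, and all parameter values below are placeholders.

```python
# Generic two-level cache latency/energy model (standard textbook form,
# not the model from the paper; all numbers are illustrative placeholders).
def amat(t_l1, t_l2, t_mem, m_l1, m_l2):
    """Average memory access time: L1 hit time plus the miss path
    through L2 and, on an L2 miss, main memory (cycles)."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)

def energy_per_access(e_l1, e_l2, e_mem, m_l1, m_l2):
    """Expected dynamic energy per memory reference, weighting each
    level's access energy by the probability of reaching it."""
    return e_l1 + m_l1 * (e_l2 + m_l2 * e_mem)

# Example: 5% L1 miss ratio, 20% L2 (local) miss ratio.
print(amat(1, 10, 100, 0.05, 0.2))                   # ~2.5 cycles
print(energy_per_access(1.0, 5.0, 50.0, 0.05, 0.2))  # ~1.75 units
```

Because every L1 miss pays the L2 access and every L2 miss additionally pays the memory access, both quantities follow the same weighted-sum shape; this is also why shrinking L1 miss ratio has an outsized effect on both latency and energy.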
2014 International Conference on Electronics and Communication Systems (ICECS), 2014
Minimizing the power consumption of chip multiprocessors has drawn the attention of researchers in recent years. A single chip contains a number of processor cores and correspondingly large caches. Recent research shows that on-chip caches consume the largest share of the total power consumed by the chip. Reducing on-chip cache size may reduce on-chip power consumption, but it degrades performance. In this paper we present a study of reducing cache capacity and analyze its effect on power and performance. We reduce the number of available cache banks and observe the effect on dynamic and static energy. Experimental evaluation shows that for most benchmarks we obtain a significant reduction in static energy, which can help control chip temperature. We use CACTI and a full-system simulator for our experiments.
Communications on Applied Electronics
The search goes on for another groundbreaking development to reduce the ever-increasing disparity between CPU performance and storage. There have been encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip design, but far less progress has been made on computer storage, with a material negative effect on system performance. A great deal of research effort has been put into finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy-saving techniques, grouped by whether they save dynamic energy, leakage energy, or both. The aim of this work is to compile a quick reference guide to energy-saving techniques from 2013 to 2016 for engineers, researchers, and students.

References (7)
- S. Date, N. Shibata, S. Mutoh, and J. Yamada, "1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5 µm Multi-Threshold CMOS," Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
- S. T. Chu, "A 25 ns Low Power Full-CMOS 1Mbit (128Kx8) SRAM," Journal of Solid State Circuits, vol. 23, pp. 1078-1084, Oct. 1988.
- D. T. Wong, "An 11 ns 8Kx18 CMOS Static RAM with 0.5 µm Devices," Journal of Solid State Circuits, vol. 23, pp. 1095-1103, Oct. 1988.
- B. Amrutur, and M. Horowitz, "Techniques to Reduce Power in Fast Wide Memories," Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 92-93, Oct. 1994.
- K. Itoh, K. Sasaki, and Y. Nakagome, "Trends in Low- Power RAM Circuit Technologies," Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 84-87, Oct. 1994.
- A. J. Smith, "Cache Memories," Computing Surveys, pp. 473-530, Sep. 1982.
- J. Gee, M. D. Hill, D. N. Pnevmatikatos, and A.J. Smith, "Cache Performance of the Spec92 Benchmark Suite," IEEE Micro, pp. 17-27, Aug. 1993.