Papers by Luis C. Aparicio

The WCET computation is one of the main challenges in hard real-time systems, since all further analysis is based on this value. The complexity of this problem leads existing analysis methods to compute WCET bounds instead of the exact WCET. In this work we propose a technique to compute the exact instruction fetch contribution to the WCET (IFC-WCET) in the presence of an LRU instruction cache. We prove that an exact computation does not need to analyze the full exponential number of possible execution paths, but only a bounded subset of them. In the benchmark codes we have studied, the IFC-WCET is up to 62% lower than a bound computed with a widely used approach, and the number of execution paths relevant for the analysis is an extremely small fraction of all possible paths.
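As a rough illustration of the cache behavior such an analysis must capture, the sketch below simulates a single LRU set over the instruction-line trace of one execution path and adds up fetch cycles; the function name, parameters and latencies are hypothetical, and this is not the paper's IFC-WCET algorithm.

```python
# Minimal sketch of an LRU instruction cache set of associativity `assoc`,
# updated with the sequence of instruction-line addresses along one path.
# Illustrative only; not the formulation used in the paper.

from collections import OrderedDict

def fetch_cost_along_path(line_addresses, assoc=4, hit_cycles=1, miss_cycles=10):
    """Return the instruction-fetch cycles of one path under an LRU cache set."""
    lru = OrderedDict()          # keys: cached line addresses, most recent last
    cycles = 0
    for line in line_addresses:
        if line in lru:
            cycles += hit_cycles
            lru.move_to_end(line)        # refresh LRU position
        else:
            cycles += miss_cycles
            if len(lru) >= assoc:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return cycles

print(fetch_cost_along_path([0, 1, 2, 0, 1, 3, 0]))
```

The worst-case fetch contribution of a program region would then be the maximum of this cost over the (pruned) set of feasible paths.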

2010 IEEE 16th International Conference on Embedded and Real-Time Computing Systems and Applications, 2010
In multitasking real-time systems it is required to compute the WCET of each task and also the effects of interferences between tasks in the worst case. This is complex with the variable-latency hardware usually found in the fetch path of commercial processors. Some methods disable cache replacement so that it is easier to model the cache behavior. Lock-MS is an ILP-based method to obtain the best selection of memory lines to be locked in a dynamic locking instruction cache. In this paper we first propose a simple memory architecture implementing next-line tagged prefetch, specially designed for hard real-time systems. Then, we extend Lock-MS to add support for hardware instruction prefetch. Our results show that the WCET of a system with prefetch and an instruction cache sized at 5% of the total code size is better than that of a system with no prefetch and a cache sized at 80% of the code. We also evaluate the effects of the prefetch penalty on the resulting WCET, showing that a system without prefetch penalties reaches a worst-case performance of 95% of the ideal case. This highlights the importance of a good prefetch design. Finally, the computation time of our analysis method is relatively short, analyzing tasks of 96 KB with 10^65 paths in less than 3 minutes.
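For readers unfamiliar with the scheme, the snippet below models the generic behavior of next-line tagged prefetch (in the spirit of Smith's classic tagged prefetch): each line carries a tag bit set when it was prefetched, and a demand miss, or the first reference to a prefetched line, triggers a prefetch of the next sequential line. It is a behavioral sketch with invented structures and does not describe the memory architecture proposed in the paper.

```python
# Schematic next-line tagged prefetch: lines brought in by prefetch carry a
# "tagged" bit; a demand miss or the first demand hit on a tagged line
# prefetches the next sequential line. Illustrative only; no eviction modeled.

cache = {}  # line address -> {"tagged": bool}

def prefetch(line):
    if line not in cache:
        cache[line] = {"tagged": True}   # brought in speculatively

def access(line):
    if line not in cache:
        cache[line] = {"tagged": False}  # demand miss: fetch the line...
        prefetch(line + 1)               # ...and prefetch its successor
        return "miss"
    if cache[line]["tagged"]:
        cache[line]["tagged"] = False    # first use of a prefetched line
        prefetch(line + 1)               # keep the prefetch one line ahead
    return "hit"

print([access(l) for l in range(0, 5)])  # ['miss', 'hit', 'hit', 'hit', 'hit']
```

On straight-line code, after the first miss every subsequent line is found already prefetched.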

Journal of Systems Architecture, 2013
In real-time systems, time is usually so critical that other parameters such as energy consumption are often not even considered. However, optimizing the worst-case energy consumption can be a key factor in systems with severe power-supply limitations. In this paper we study several memory architectures using combined time and energy optimization models for real-time multitasking systems. Each task is modeled using Lock-MS, a method to optimize the WCET of a task, with an added set of constraints that model the WCEC (worst-case energy consumption) in the same way. Our tested hardware components focus on instruction fetching, including a lockable cache, a line buffer and a sequential prefetch buffer. We test a variety of instruction fetch alternatives, optimizing time and energy consumption. Our results show that the accuracy of the estimation of the number of context switches in the worst case may strongly affect the resulting WCEC (by up to 8 times in our experiments), and that optimizing the WCEC may provide execution times similar to those obtained by optimizing the WCET, with up to 5 times less energy consumption. Additionally, optimization functions combining WCET and WCEC with different weights show very interesting WCET-WCEC trade-offs. This confirms that methodologies testing such optimizations at design time could be very helpful to provide a precise system setup.
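The time/energy trade-off mentioned above can be pictured as a weighted combination of the two worst-case metrics; the expression below is only a generic sketch with a design-time weight α (and, in practice, some normalization, since cycles and energy have different units), not the exact objective function used in the paper:

\[
\min_{x}\;\; \alpha \cdot \mathrm{WCET}(x) \;+\; (1-\alpha) \cdot \mathrm{WCEC}(x),
\qquad 0 \le \alpha \le 1,
\]

where x stands for the locking and configuration decisions of the underlying ILP, α = 1 optimizes time only, and α = 0 optimizes energy only.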

Cálculo del WCET en Presencia de Memorias Cache
Abstract: Computing a bound on the WCET of a task depends on many factors, both hardware and software, for instance the cache memories, the branch predictors or the compiler used. Handling all of them together makes this computation especially complicated. To date, all works that try to compute a tight WCET bound consider each of these factors separately in order to reduce the complexity of the problem. In this work we point out some problems that have not yet been solved and that we consider of great interest for obtaining a tight WCET bound. We describe in particular detail those we are currently working on, namely accesses to memory addresses unknown at compile time and the loss of information in analytical methods.
Memorias Cache en Sistemas de Tiempo Real
Abstract: Computing a bound on the WCET of a task (on a modern processor) is one of the main challenges in the study of RTS, since some of the hardware components of the processor have variable latency. A particular case, widely used in current processors, is the cache memory. These memories are used to ...
WCET computation is one of the main challenges in the study of HRTS, since it is needed to guarantee the timing requirements. Moreover, modern processors have hardware components with a variable latency not known at compile time, which makes the problem even harder. In particular, the WCET computation problem in the presence of caches has exponential complexity. In this work we propose two techniques targeted at computing the WCET accurately in the presence of both instruction and data caches. Both techniques drastically reduce the number of states to analyze by pruning all the paths located outside the time-critical path.
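One generic way to realize this kind of pruning is a branch-and-bound over partial paths: a partial path is discarded as soon as even its most pessimistic completion cannot exceed the worst complete path found so far. The sketch below illustrates that idea on an acyclic region with invented data structures (succ, cost, remaining_bound); it is not the concrete mechanism of the two techniques proposed here.

```python
# Branch-and-bound style pruning of execution paths. Illustrative only.

def worst_path_cost(succ, cost, remaining_bound, start, end):
    """succ[n]: successors of node n; cost[n]: worst-case cost of node n;
    remaining_bound[n]: upper bound on the cost of any completion after n."""
    best = 0
    stack = [(start, cost[start])]
    while stack:
        node, acc = stack.pop()
        if node == end:
            best = max(best, acc)
            continue
        for nxt in succ[node]:
            # prune: even the most pessimistic continuation cannot beat `best`
            if acc + cost[nxt] + remaining_bound[nxt] <= best:
                continue
            stack.append((nxt, acc + cost[nxt]))
    return best

succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 1, "b": 10, "c": 2, "d": 1}
remaining_bound = {"b": 1, "c": 1, "d": 0}
print(worst_path_cost(succ, cost, remaining_bound, "a", "d"))  # 12 (a -> b -> d)
```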
Cálculo del peor caso en la cache de datos

Journal of Systems Architecture, 2011
In multitasking real-time systems it is required to compute the WCET of each task and also the effects of interferences between tasks in the worst case. This is very complex with variable-latency hardware, such as instruction cache memories or, to a lesser extent, the line buffers usually found in the fetch path of commercial processors. Some methods disable cache replacement so that it is easier to model the cache behavior. The difficulty in these cache-locking methods lies in obtaining a good selection of the memory lines to be locked into the cache. In this paper, we propose an ILP-based method to select the best lines to be loaded and locked into the instruction cache at each context switch (dynamic locking), taking into account both intra-task and inter-task interferences, and we compare it with static locking. Our results show that, without a cache, the spatial locality captured by a line buffer doubles the performance of the processor. When adding a lockable instruction cache, dynamic locking systems are schedulable with a cache size between 12.5% and 50% of the cache size required by static locking. Additionally, the computation time of our analysis method does not depend on the number of possible paths in the task. This allows us to analyze large codes in a relatively short time (100 KB with 10^65 paths in less than 3 minutes).
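As a hedged illustration of what an ILP-based line selection looks like, the toy model below (written with the PuLP library) chooses which memory lines to lock under a single capacity constraint, charging a miss penalty to every fetch of an unlocked line. The line names, costs and constraint are invented, and the model is far simpler than the actual Lock-MS formulation, which also captures paths and inter-task interferences.

```python
# Toy ILP in the spirit of "choose which memory lines to lock": binary
# variable x[l] = 1 if line l is locked; a locked line turns its fetches
# into hits. Costs and the single capacity constraint are invented.

from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, value

lines = ["L0", "L1", "L2", "L3"]
fetches = {"L0": 120, "L1": 80, "L2": 15, "L3": 60}   # worst-case fetch counts
miss_penalty = 10                                      # cycles per miss
cache_capacity = 2                                     # lockable lines

x = LpVariable.dicts("lock", lines, cat=LpBinary)

prob = LpProblem("toy_lock_selection", LpMinimize)
# Minimize worst-case fetch cycles: unlocked lines pay the miss penalty.
prob += lpSum(fetches[l] * miss_penalty * (1 - x[l]) for l in lines)
prob += lpSum(x[l] for l in lines) <= cache_capacity   # cache capacity

prob.solve()
print({l: int(value(x[l])) for l in lines})  # expected: lock L0 and L1
```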

ACM Transactions on Embedded Computing Systems, 2015
In multitasking real-time systems, the worst-case execution time (WCET) of each task and also the effects of interferences between tasks in the worst-case scenario need to be calculated. This is especially complex in the presence of data caches. In this article, we propose a small instruction-driven data cache (256 bytes) that effectively exploits locality. It works by preselecting a subset of memory instructions that will have data cache replacement permission. The selection of such instructions is based on data reuse theory. Since each selected memory instruction replaces its own data cache line, pollution is prevented and task performance becomes independent of the size of the associated data structures. We have modeled several memory configurations using the Lock-MS WCET analysis method. Our results show that, on average, our data cache effectively services 88% of the program data of the tested benchmarks. Such results double the worst-case performance of our tested multitasking expe...
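A minimal sketch of the replacement-permission idea is shown below: each preselected memory instruction owns one dedicated buffer line, only those instructions may replace their line on a miss, and every other access bypasses the cache. The class, the PCs and the dictionary-based lookup are illustrative only; the actual selection based on data reuse theory is not modeled.

```python
# Minimal sketch of "replacement permission per memory instruction":
# preselected instructions (by PC) each own one line and may replace it on a
# miss; all other accesses bypass the cache. Illustrative only.

class InstructionDrivenCache:
    def __init__(self, allowed_pcs):
        # one dedicated line per preselected instruction
        self.line_of = {pc: None for pc in allowed_pcs}

    def access(self, pc, data_line):
        if data_line in self.line_of.values():
            return "hit"                    # the data is already buffered
        if pc in self.line_of:
            self.line_of[pc] = data_line    # replace this instruction's own line
            return "miss (allocated)"
        return "miss (bypass)"              # no replacement permission: no pollution

cache = InstructionDrivenCache(allowed_pcs={0x400100})
print(cache.access(0x400100, 42))   # miss (allocated)
print(cache.access(0x400200, 42))   # hit: reuse of a buffered line
print(cache.access(0x400200, 99))   # miss (bypass): cannot pollute the buffer
```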

Journal of Systems Architecture, 2015
In this paper we propose a new hardware data cache (FAFB, fully-associative FIFO tagged buffers) to complement the data cache in processors. It provides predictability when exploiting temporal reuse in array data structures, i.e., it allows an accurate WCET analysis, which is required in real-time systems. With our hardware proposal, compiler transformations that exploit such reuse (essentially tiling) can be safely applied. Moreover, our proposal has other features of particular interest to embedded systems, where a set of well-tuned applications run on a hardware platform which may be constrained in size, complexity and energy consumption. In order to test the most uncommon features of the FAFBs (predictability and effectiveness with a small size), we perform a worst-case analysis on several kernel algorithms for embedded and real-time computing, showing the interaction between tiling and our hardware architecture. Our results show that the number of data cache misses is reduced by a factor of between 1.3 and 19 on such algorithms.
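As an illustration of why such buffers combine well with tiling, the sketch below models a small fully associative buffer with FIFO replacement: once a tile fits in the buffer, every revisit of that tile hits, and FIFO replacement keeps the behavior easy to predict. The class and sizes are invented for illustration; the tagged association of buffers with particular array structures is not modeled.

```python
# Minimal model of a small fully associative buffer with FIFO replacement,
# in the spirit of the FAFB proposal. Illustrative only.

from collections import deque

class FIFOBuffer:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.fifo = deque()                  # line addresses in arrival order

    def access(self, line):
        if line in self.fifo:
            return True                      # hit: FIFO order is not updated
        if len(self.fifo) >= self.num_lines:
            self.fifo.popleft()              # evict the oldest line
        self.fifo.append(line)
        return False                         # miss

# With tiling, a tile that fits in the buffer is reused entirely from it:
buf = FIFOBuffer(num_lines=4)
tile = [0, 1, 2, 3]
first_pass = [buf.access(l) for l in tile]    # all misses (cold)
second_pass = [buf.access(l) for l in tile]   # all hits: temporal reuse captured
print(first_pass, second_pass)
```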
A Small and Effective Data Cache for Real-Time Multitasking Systems
2012 IEEE 18th Real-Time and Embedded Technology and Applications Symposium, 2012