Papers by Jalil Boukhobza

MONTRES-NVM: An External Sorting Algorithm for Hybrid Memory
2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)
DRAM technology is approaching its scaling limit, and the use of emerging NVM is seen as one possible solution to this issue. As NVM technologies are not yet mature and do not outperform DRAM, several studies anticipate hybrid main memories containing both DRAM and PCM NVM. Redesigning applications for such systems is mandatory, as PCM does not have the same performance model as DRAM. In this context, we designed a hybrid-memory-aware sorting algorithm called MONTRES-NVM. Since an NVM-based hybrid memory presents a performance gap between DRAM and PCM, we believe that the sorting algorithm falls into the external sorting category. In effect, we extended our previously designed flash-based external sorting algorithm, MONTRES, to hybrid memory by taking advantage of byte addressability and of the performance asymmetry between reads and writes. MONTRES-NVM enhances the performance of merge sort on PCM by more than 60%, of merge sort on DRAM by 3-40%, and of MONTRES (on a hybrid memory) by 3-33%, depending on the proportion of already sorted data in the dataset.
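MONTRES-NVM itself cannot be reconstructed from an abstract, but the sensitivity to already-sorted data can be illustrated with a small run-aware merge sketch. This is purely illustrative: the function name and in-memory setting are ours, and the DRAM/PCM placement logic that MONTRES-NVM actually manages is omitted.

```python
import heapq

def run_aware_sort(data):
    """Illustration of why pre-sorted data helps an external merge
    sort (as in the MONTRES family): split the input into maximal
    non-decreasing runs, then k-way merge them. More pre-sorted data
    means fewer, longer runs, hence a cheaper merge phase.
    (Sketch only -- not the paper's algorithm.)"""
    runs, start = [], 0
    for i in range(1, len(data) + 1):
        # a run ends where the order breaks (or at end of input)
        if i == len(data) or data[i] < data[i - 1]:
            runs.append(data[start:i])  # maximal sorted run
            start = i
    return list(heapq.merge(*runs)), len(runs)
```

On a fully sorted input the function finds a single run, so the merge degenerates to a copy; the more disorder, the more runs must be merged.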
arXiv (Cornell University), Sep 10, 2013
Video decoding is considered one of the most compute- and energy-intensive applications on energy-constrained mobile devices. Specific processing units, such as DSPs, are added to these devices in order to optimize performance and energy consumption. However, in DSP video decoding, the inter-processor communication overhead may have a considerable impact on performance and energy consumption. In this paper, we propose to evaluate this overhead and analyze its impact on performance and energy consumption as compared to GPP decoding. Our work revealed that the GPP can be the best choice in many cases due to a significant overhead in DSP decoding, which may represent 30% of the total decoding energy.
arXiv (Cornell University), Aug 31, 2012
Today, flash memories are widely used in the embedded system domain. NAND flash memories are the building block of most secondary storage systems. Such memories present many benefits in terms of data density, I/O performance, shock resistance and power consumption. Nevertheless, flash does not come without constraints: the write/erase granularity asymmetry and the limited lifetime bring the need for specific management. This can be done through the operating system using dedicated Flash File Systems (FFSs). In this document, we present general concepts about FFSs and implementation examples, namely JFFS2, YAFFS2 and UBIFS, the most commonly used flash file systems. We then give performance evaluation results for these FFSs.
2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
In this demo paper, we present an architecture that leverages unused but volatile Cloud resources to run big data jobs. It is based on a learning algorithm that accurately predicts the future availability of resources to automatically scale the running jobs. We also designed a mechanism that avoids interference between the big data jobs and co-resident workloads. Our solution is based on open-source components such as Kubernetes and Apache Spark.

K-MLIO: Enabling K-Means for Large Data-Sets and Memory-Constrained Embedded Systems
2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
Machine Learning (ML) algorithms are increasingly used in embedded systems to perform different tasks such as clustering and pattern recognition. These algorithms are both compute and memory intensive, whereas embedded devices offer lower hardware capabilities than traditional ML platforms. K-means clustering is one of the most widely used ML algorithms. In the case of large data-sets, our analysis showed that, on average, more than 70% of the execution time is spent on I/Os. In this paper, we present a version of K-means that drastically reduces the number of I/Os by scanning the data-set only once, as opposed to the traditional version, which reads it several times, according to the number of iterations performed. Our evaluation showed that the proposed strategy reduces the overall execution time on large data-sets by 60% on average while lowering the number of I/O operations by 90%, with a precision comparable to the traditional K-means implementation.
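The paper's exact algorithm is not reproduced here; as a rough sketch of the single-scan idea, a chunk-at-a-time pass that refines centroids with a running mean could look like the following (the function name, the seeding scheme, and the update rule are our assumptions, not K-MLIO's):

```python
import numpy as np

def one_pass_kmeans(chunks, k, rng=np.random.default_rng(0)):
    """Illustrative single-scan clustering: centroids are refined
    chunk by chunk so the data-set is read from storage only once.
    (Sketch only -- not the actual K-MLIO algorithm.)"""
    centroids = None
    counts = np.zeros(k)
    for chunk in chunks:  # each chunk fits in memory
        if centroids is None:
            # seed the k centroids from the first chunk
            idx = rng.choice(len(chunk), size=k, replace=False)
            centroids = chunk[idx].astype(float)
        # assign each point of the chunk to its nearest centroid
        d = np.linalg.norm(chunk[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # incremental (running-mean) centroid update per cluster
        for j in range(k):
            pts = chunk[labels == j]
            if len(pts):
                counts[j] += len(pts)
                centroids[j] += (pts.sum(axis=0) - len(pts) * centroids[j]) / counts[j]
    return centroids
```

A conventional K-means would re-read every chunk at each iteration; here each chunk is loaded exactly once, which is where the I/O saving comes from.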
Session details: Theme: System software and security: EMBS - embedded systems track
Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
Performance Evaluation: Special issue of the 27th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)

2020 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
Cloud data center capacities are over-provisioned to handle demand peaks and hardware failures, which leads to low resource utilization. One way to improve resource utilization, and thus reduce the total cost of ownership, is to offer unused resources (referred to as ephemeral resources) at a lower price. However, reselling resources needs to meet customers' expectations in terms of Quality of Service. The goal is thus to maximize the amount of reclaimed resources while avoiding SLA penalties. To achieve that, cloud providers have to estimate their future utilization to provide availability guarantees. The prediction should include a safety margin on resources to react to unpredictable workloads. The challenge is to find the safety margin that provides the best trade-off between the amount of resources to reclaim and the risk of SLA violations. Most state-of-the-art solutions consider a fixed safety margin for all types of metrics (e.g., CPU, RAM). However, a single fixed margin does not account for workload variations over time, which may lead to SLA violations and/or poor utilization. To tackle these challenges, we propose ReLeaSER, a Reinforcement Learning strategy for optimizing the utilization of ephemeral resources in the cloud. ReLeaSER dynamically tunes the safety margin at the host level for each resource metric. The strategy learns from past prediction errors (those that caused SLA violations). Our solution significantly reduces SLA violation penalties, on average by 2.7× and up to 3.4×. It also considerably improves the cloud providers' (CPs') potential savings, by 27.6% on average and up to 43.6%.
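ReLeaSER's actual agent is a reinforcement learner; as a much simpler stand-in that conveys the feedback loop, a per-metric margin could be grown after a violation and shrunk otherwise. All class names, constants, and the update rule below are illustrative assumptions, not taken from the paper.

```python
class SafetyMarginTuner:
    """Toy stand-in for per-metric safety-margin tuning: grow the
    margin when the prediction error exceeded it (an SLA violation
    would have occurred) and shrink it slowly otherwise.
    One instance per (host, metric) pair. Constants are illustrative."""
    def __init__(self, initial=0.10, grow=2.0, shrink=0.99,
                 lo=0.01, hi=0.50):
        self.margin = initial
        self.grow, self.shrink = grow, shrink
        self.lo, self.hi = lo, hi

    def reclaimable(self, predicted_free):
        """Fraction of predicted-free capacity safe to resell."""
        return max(0.0, predicted_free - self.margin)

    def update(self, prediction_error):
        """prediction_error: actual_use - predicted_use for this metric."""
        if prediction_error > self.margin:      # margin was too small
            self.margin = min(self.hi, self.margin * self.grow)
        else:                                   # margin was safe
            self.margin = max(self.lo, self.margin * self.shrink)
```

The trade-off the paper targets is visible even in this toy: a larger margin means fewer violations but less resource to resell, and vice versa.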
2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
In this paper, we propose RISCLESS, a Reinforcement Learning strategy to exploit unused Cloud resources. Our approach consists in using a small proportion of stable on-demand resources alongside the ephemeral ones in order to guarantee customers' SLAs and reduce the overall costs. The approach decides when and how much stable resource to allocate in order to fulfill customers' demands. RISCLESS improved Cloud Providers' (CPs') profits by an average of 15.9% compared to past strategies. It also reduced the SLA violation time by 36.7% while increasing the amount of used ephemeral resources by 19.5%.

COPS: Cost Based Object Placement Strategies on Hybrid Storage System for DBaaS Cloud
2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017
Solid State Drives (SSDs) are integrated together with Hard Disk Drives (HDDs) in Hybrid Storage Systems (HSS) for Cloud environments. When it comes to storing data, placement strategies are used to find the best location (SSD or HDD). These strategies should minimize the cost of data placement while satisfying Service Level Objectives (SLOs). This paper presents two Cost-based Object Placement Strategies (COPS) for DBaaS objects in HSS: a Genetic-based approach (G-COPS) and an ad hoc Heuristic approach (H-COPS) based on incremental optimization. While G-COPS proved closer to the optimal solution for small instances, H-COPS showed better scalability, as it approached the exact solution even for large instances (to within 10% on average). In addition, H-COPS showed small execution times (a few seconds) even for large instances, which makes it a good candidate for use at runtime. Both H-COPS and G-COPS performed better than state-of-the-art solutions, as they satisfied SLOs while reducing the overall cost by more than 40% for both small and large problem instances.
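The paper's heuristic performs incremental optimization; the basic cost/SLO trade-off it navigates can be sketched with a much simpler greedy placement. The tier and object fields below are our own illustrative model, not the paper's formulation.

```python
def place_objects(objects, tiers):
    """Toy greedy cost-based placement in the spirit of H-COPS
    (sketch only): each object must land on a tier whose latency
    meets the object's SLO and that has enough free space; among
    feasible tiers, the cheapest one wins."""
    placement = {}
    for obj in objects:
        feasible = [t for t in tiers
                    if t["latency_ms"] <= obj["slo_ms"]
                    and t["free_gb"] >= obj["size_gb"]]
        if not feasible:
            raise ValueError(f"no tier satisfies SLO of {obj['name']}")
        # pick the feasible tier with the lowest storage cost
        best = min(feasible,
                   key=lambda t: t["cost_per_gb"] * obj["size_gb"])
        best["free_gb"] -= obj["size_gb"]
        placement[obj["name"]] = best["name"]
    return placement
```

Latency-sensitive objects end up on the fast (expensive) SSD tier, while objects with loose SLOs fall through to the cheap HDD tier.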
Simulation of Real PC Disk I/O Traces
HAL (Le Centre pour la Communication Scientifique Directe), Oct 4, 2006
On windows file access modes: a performance study
Proceedings of the 4th international symposium on Information and communication technologies, Jan 3, 2005

Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, 2022
HPC systems are composed of multiple tiers of storage, from the top high-performance tier (high-speed SSDs) to the bottom capacity-oriented one (tapes). File placement in such an architecture is managed through prefetchers (bottom-up) and eviction policies (top-down). Most state-of-the-art work focuses on the former while using flavors of the LRU, LFU and FIFO algorithms for the latter. LRU was long considered the best choice. However, recent studies have shown that the simplicity of FIFO can make it more scalable than LRU because of metadata management, and thus more adequate in several cases. In this paper, we propose a new eviction policy based on predicted file lifetimes. It is comparable to FIFO in terms of metadata overhead and simplicity (and thus scalability), while giving a hit ratio comparable to LRU's (or even 10% better for some tested traces). We also propose a naive multi-tier heterogeneous storage simulator implementation to evaluate such policies.
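The paper's lifetime predictor is not reproduced here; assuming a lifetime oracle is given, the eviction side can be sketched as a FIFO-like structure ordered by predicted expiry instead of insertion order. Class and parameter names are ours.

```python
import heapq

class LifetimeEvictionCache:
    """Sketch of a lifetime-based eviction policy: each cached file
    carries a predicted expiry time (insertion time + predicted
    lifetime), and the file predicted to die soonest is evicted
    first. Metadata cost is one heap entry per file, close to FIFO's
    queue. The lifetime predictor itself is assumed given."""
    def __init__(self, capacity, predict_lifetime):
        self.capacity = capacity
        self.predict_lifetime = predict_lifetime  # file_id -> lifetime
        self.files = {}          # file_id -> predicted expiry
        self.heap = []           # (expiry, file_id), lazily cleaned

    def access(self, file_id, now):
        """Return True on a cache hit, False on a miss (then insert)."""
        if file_id in self.files:
            return True
        if len(self.files) >= self.capacity:
            # evict the resident file with the earliest predicted expiry,
            # skipping stale heap entries for already-evicted files
            while True:
                expiry, victim = heapq.heappop(self.heap)
                if victim in self.files and self.files[victim] == expiry:
                    del self.files[victim]
                    break
        expiry = now + self.predict_lifetime(file_id)
        self.files[file_id] = expiry
        heapq.heappush(self.heap, (expiry, file_id))
        return False
```

Like FIFO, eviction is O(log n) with one small record per file; unlike FIFO, a short-lived file can be evicted before older but longer-lived ones.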
Teaching Real-Time Scheduling Analysis with Cheddar

Introduction to the Special Issue on Memory and Storage Systems for Embedded and IoT Applications
ACM Transactions on Embedded Computing Systems, 2022
With the rapid advances in sensing and communication technologies, embedded systems (e.g., IoT and edge devices) have evolved tremendously in recent years. At the same time, data-intensive applications are increasingly used on such platforms. However, embedded systems usually have limited energy, computing power, and memory/storage space. Thus, new non-volatile memories (NVMs), such as STT-MRAM, PCM, and flash memory, have emerged as popular alternatives in the design of the CPU cache, main memory, and storage to improve the performance, reduce the energy consumption, and increase the memory/storage capacity of embedded systems for novel embedded and IoT applications. This special issue received nearly 30 submissions and involved numerous reviewers selected for their expertise on the precise topics of each article. It thus represents a collective effort from the research community and industry participants on an international scale. From the many excellent submissions received, twelve articles are included in this special issue. They tackle some of the most recent and impactful design issues of memory and storage systems for embedded and IoT applications, ranging from system components (e.g., CPU cache, main memory, and storage) to embedded applications, and fall into four categories: (1) on-chip cache and scratchpad memory designs, (2) NVM main memory and hybrid DRAM-NVM main memory system designs, (3) enhanced storage systems with flash memory, and (4) various techniques to resolve the security, memory and I/O bottleneck, and real-time issues of embedded and IoT applications. In this special issue, the first three articles are about on-chip cache and scratchpad memory designs.
For embedded systems, the on-chip area is a scarce resource, and high-density STT-MRAM provides a larger on-chip cache capacity. However, STT-MRAM has a read disturbance issue, and its special read/write characteristics also introduce new CPU cache design issues, thus creating new challenges when applying STT-MRAM as the CPU cache. On the other hand, memory access stability is critical to the service quality of embedded systems. Although scratchpad memory (SPM) provides a good service rate improvement for embedded systems, new performance estimation models are needed to accurately estimate the performance of such systems with on-chip SPM. The article “CORIDOR: using COherence and tempoRal locality to mitigate read Disturbance errOR in STT-RAM caches” proposes to apply STT-RAM as the Last-Level Cache of the CPU to enlarge the on-chip cache capacity. However, STT-RAM suffers from the read-disturbance error (RDE) issue and requires running restore operations, thus imposing latency and energy penalties. This article presents a design called CORIDOR to resolve the RDE issue by avoiding the restore for the
Toolbox for Dimensioning Windows Storage Systems
Third International Conference on the Quantitative Evaluation of Systems - (QEST'06), 2006
Performance and Power Consumption of SSD Based Systems: Experimental Results
In this chapter, we aim to describe the measurements performed on real storage systems in order to understand the performance and energy behavior of SSDs and HDDs. To this end, we rely on the methodology and the tools described in the previous chapter.

ACM Transactions on Storage, 2021
Cloud federation enables service providers to collaborate to provide better services to customers. For cloud storage services, optimizing customer object placement for a member of a federation is a real challenge. Storage, migration, and latency costs need to be considered, and these costs are contradictory in some cases. In this article, we model object placement as a multi-objective optimization problem. The proposed model takes into account parameters related to the local infrastructure, the federated environment, customer workloads, and their SLAs. To solve this problem, we propose CDP-NSGAII IR, a Constraint Data Placement matheuristic based on NSGAII with Injection and Repair functions. The injection function aims to enhance the quality of the solutions: it consists in computing some solutions with an exact method and injecting them into the initial population of NSGAII. The repair function ensures that the solutions obey the problem constraints and thus prevents exploring lar...
Evaluation of Performance and Power Consumption of Storage Systems
Flash Memory Integration, 2017
In this chapter, we illustrate some general methods for evaluating the performance and power consumption of storage systems based on NAND flash memory. We consider both devices based on an FTL (Flash Translation Layer) and systems based on FFSs (Flash File Systems). A performance/power consumption analysis of a storage system is always performed under a reference workload, i.e. a benchmark. The first section of this chapter deals with benchmarking for storage systems. The second section introduces the main metrics of performance and power consumption. The third section gives a state-of-the-art overview of measurement-based performance/power consumption analysis studies. Finally, the fourth section addresses simulation-based studies.
Analysis of Memory Performance: Mixed Rank Performance Across Microarchitectures
The two primary performance measurements in storage and memory systems are latency and throughput. It is interesting to see how the way memory DIMMs are populated on the server board impacts performance. The system bus speed is important when communicating over the Quick Path Interconnect (QPI) to the other CPU's local memory resources. This is a crucial part of the performance of systems with Non-Uniform Memory Access (NUMA). This paper investigates best-practice approaches to optimizing performance that apply to the last few CPU and chipset generations.