Papers by Zbigniew Kalbarczyk

Pre-processed Tracing Data for Popular Microservice Benchmarks
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and covers selected types of requests in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://github.com/James-QiuHaoran/firm). The dataset was preprocessed from the raw data generated in FIRM's tracing system. The dataset is split by the microservice component on which the performance anomaly is located (as the file name suggests). Each dataset is in CSV format and fields are separated by commas. Each line consists of the tracing ID and the duration (in 10^(-3) ms) of each component. Execution paths are specified in `execution_paths.txt` in each directory.
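As a rough illustration of the row layout described in this abstract (a tracing ID followed by one duration per component, in units of 10^(-3) ms), the following Python sketch parses such rows and converts the durations to milliseconds. The function name `parse_trace_rows`, the sample values, the absence of a header row, and the column order are assumptions made for illustration; they are not confirmed by the dataset itself.

```python
import csv
import io


def parse_trace_rows(text):
    """Parse rows assumed to look like: trace_id,duration_1,...,duration_n.

    Durations are assumed to be in units of 10^-3 ms and are converted to ms.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        trace_id = record[0]
        durations_ms = [float(value) / 1000.0 for value in record[1:]]
        rows.append((trace_id, durations_ms))
    return rows


# Hypothetical sample data, not taken from the released dataset.
sample = "abc123,1500,250,4000\ndef456,900,100,2500\n"
rows = parse_trace_rows(sample)
for trace_id, durations_ms in rows:
    print(trace_id, durations_ms, "total:", sum(durations_ms))
```

Which column corresponds to which component would have to be read from `execution_paths.txt` in the relevant directory; the sketch does not attempt that mapping.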

Measuring the Resiliency of Extreme-Scale Computing Environments
Springer series in reliability engineering, 2016
This chapter presents a case study on how to characterize the resiliency of large-scale computers. The analysis focuses on the failures and errors of Blue Waters, the Cray hybrid (CPU/GPU) supercomputer at the University of Illinois at Urbana-Champaign. The characterization is performed by a joint analysis of several data sources, which include workload and error/failure logs as well as manual failure reports. We describe LogDiver, a tool to automate the data preprocessing and metric computation that measure the impact of system errors and failures on user applications, i.e., the compiled programs launched by user jobs that can execute across one or more XE (CPU) or XK (CPU+GPU) nodes. Results include (i) a characterization of the root causes of single node failures; (ii) a direct assessment of the effectiveness of system-level failover and of memory, processor, network, GPU accelerator, and file system error resiliency; (iii) an analysis of system-wide outages; (iv) analysis of application resiliency to system-related errors; and (v) insight into the relationship between application scale and resiliency across different error categories.
Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems
2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

F-Pro: a Fast and Flexible Provenance-Aware Message Authentication Scheme for Smart Grid
2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)
Successful attacks against smart grid systems often exploited the insufficiency of checking mechanisms; e.g., commands are largely executed without checking whether they are issued by the legitimate source and whether they are transmitted through the right network path and hence have undergone all necessary mediation and scrutiny. While adding such enhanced security checking into smart grid systems will significantly raise the bar for attackers, there are two key challenges: 1) the need for real-time operation, and 2) the need for flexibility, i.e., the scheme needs to be applicable to different deployment settings/communication models and counter various types of attacks. In this work, we design and implement F-Pro, a transparent, bump-in-the-wire solution for fast and flexible message authentication that addresses both challenges. Specifically, by using a lightweight hash-chaining-based scheme that supports provenance verification, F-Pro achieves less than 2 milliseconds end-to-end proving and verifying delay for single-hop or 2-hop communication in a variety of smart grid communication models, when implemented on a low-cost BeagleBoard-X15 platform.
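To make the general idea of hash chaining for provenance more concrete, here is a minimal Python sketch in which each hop on a path extends a running HMAC tag, so that a verifier who knows the expected hop keys can check both the message and the path it took. This is a generic illustration, not F-Pro's actual construction: the function names, key handling, tag layout, and sample command are all assumptions.

```python
import hashlib
import hmac


def extend_chain(prev_tag: bytes, hop_key: bytes, message: bytes) -> bytes:
    """One hop binds its key to the message and to the tag it received."""
    return hmac.new(hop_key, prev_tag + message, hashlib.sha256).digest()


def build_tag(message: bytes, hop_keys) -> bytes:
    """Chain the tag through every hop, starting from an empty tag at the source."""
    tag = b""
    for key in hop_keys:
        tag = extend_chain(tag, key, message)
    return tag


def verify_path(message: bytes, tag: bytes, expected_hop_keys) -> bool:
    """Recompute the chain for the expected path and compare in constant time."""
    return hmac.compare_digest(build_tag(message, expected_hop_keys), tag)


# Hypothetical keys and command for a 2-hop path (source, then gateway).
keys = [b"source-key", b"gateway-key"]
command = b"OPEN breaker 7"
tag = build_tag(command, keys)

print(verify_path(command, tag, keys))                  # expected path
print(verify_path(command, tag, list(reversed(keys))))  # wrong hop order
print(verify_path(b"CLOSE breaker 7", tag, keys))       # tampered command
```

In this sketch the verifier shares a symmetric key with every hop; a real deployment would also need key distribution and replay protection (for example, sequence numbers or timestamps), which are left out here.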

Towards longitudinal analysis of a population's electronic health records using factor graphs
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 2016
In this feasibility study, we demonstrate the use of a factor-graph-based probabilistic graphical model approach to process longitudinal data derived from a population's electronic health records (EHR). Processing of EHR allows for forecasting patient-specific health complications and inference of population-level statistics on several epidemiological factors. As a case study, we provide preliminary results and demonstrate feasibility of our approach by processing the EHR of a diabetic cohort in Singapore. Our model passes the feasibility test as we are able to forecast a series of health complications of a new patient based on the factor functions inferred from EHR of 100 diabetic patients spanning 10 years. This forecast gives both the caregivers and the patient a better view of the patient's health in the coming years and increases the patient's motivation to stay healthy and conform to the medication plan. Furthermore, our approach identifies commonly occurring health complications in the population that warrant hospital readmissions, which helps a physician/clinician decide when to intervene to avoid complications in order to improve the patient's quality of life and minimize the cost of care.

37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), 2007
It is my great pleasure to welcome you to DSN 2007. DCCS 2007 continues a tradition of technical excellence in the form of a productive blend of contributions from industry and academia. This year's DCCS program consists of 53 excellent papers that cover a broad range of dependability issues and show the depth and breadth of our community's research and practice in the development and assessment of highly reliable and secure computing systems and applications. A total of 16 sessions are planned: 15 consisting of regular papers and one devoted to practical experience reports. The technical preparations for the symposium commenced in September 2006, when the DCCS 2007 Web page was opened to accept submissions. A total of 212 papers from 27 countries and five continents were received. Of them, 48 were accepted as regular papers and five were accepted as practical experience reports. Each regular paper was assigned at least five reviewers (two Program Committee members and three external reviewers), while practical experience reports were each reviewed by three Program Committee members. More than 93% of the reviews were returned. The Program Committee as a whole greatly appreciated the thoroughness of the reviews. Technical excellence and originality were the foremost criteria for selection, and the Committee also tried to maintain the right balance of theory and practice. The technical program for DCCS 2007 would not have been possible without the active support and help of numerous individuals. First of all, we would like to thank all the authors for taking an interest in and supporting DCCS 2007. In the end, it was their contributions that allowed us to present a strong program. I would like to express my deep appreciation to the 47 Program Committee members and the 390 external reviewers for their unprecedented dedication and hard work in evaluating the papers.
I thank the PC members for devoting their time to the reviewing process and for making the program committee meeting (in "somewhat cold and snowy" Urbana in the middle of February) a success. Over the last several months, many individuals helped me in many ways on this unforgettable journey. In particular, I would like to thank Mootaz Elnozahy and Rick Schlichting for their support and mentoring. I am grateful to Neeraj Suri for his help in running the PC meeting. Special thanks are also due to Mohamed Kaâniche, as conference coordinator, for his enthusiasm and excellent job in interfacing with the DSN Steering Committee, helping with the PC meeting, and making sure that we met all the deadlines. Finally, I thank the administrative and technical staff in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign for their assistance in hosting the PC meeting. Thank you for attending the symposium. I believe that you have an outstanding program ahead and hope that you will be able not only to enjoy it, but also to contribute to it by engaging in many fruitful discussions with your colleagues. Enjoy the Symposium and enjoy Edinburgh!
Dependable flight control system using data diversity with error recovery
Computer Systems: Science & Engineering, 1994

2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC), 2017
The components and systems involved in railway operation are subject to stringent reliability and safety requirements, but up until now the cyber security of those same systems has been largely under-explored. In this work, we examine a widely used railway technology, track beacons or balises, which provide a train with its position on the track and often assist with accurate stopping at stations. Balises have been identified as one potential weak link in train signalling systems. We evaluate an automatic train stop controller that is used in real deployment and show that attackers who can compromise the availability or integrity of the balises' data can cause the trains to stop dozens of meters away from the right position, disrupting train service. To address this risk, we have developed a novel countermeasure that ensures the correct stopping of the trains in the presence of attacks, with only a small extra stopping delay.

Machine Learning in the Hands of a Malicious Adversary: A Near Future If Not Reality
Game Theory and Machine Learning for Cyber Security, 2021
Machine learning and artificial intelligence are being adopted in a wide range of applications for automation and flexibility. Cyber security is no different: researchers and engineers have been investigating the use of data‐driven technologies to harden the security of cyberinfrastructure and the possibility of attackers exploiting vulnerabilities in such technology (e.g. adversarial machine learning). However, not much work has investigated how attackers might try to take advantage of machine learning and AI technology against us. This chapter discusses the potential advances in targeted attacks through the utilization of machine learning techniques. In this chapter, we introduce a new concept of AI‐driven malware which advances already sophisticated cyber threats (i.e. advanced targeted attacks) that are on the rise. Furthermore, we demonstrate our prototype AI‐driven malware, built on top of a set of statistical learning technologies, on two distinct cyber‐physical systems (i.e. the Raven‐II surgical robot and a building automation system). Our experimental results demonstrate that with the support of AI technology, malware can mimic human attackers in deriving attack payloads that are customized to the target system and in determining the most opportune time to trigger the attack payload so as to maximize the chance of success in realizing the malicious intent. No public records report a real threat driven by machine learning models. However, such advanced malware might already exist and simply remain undetected. We hope this chapter motivates further research on advanced offensive technologies, not to favor the adversaries, but to know them and be prepared.

Prediction of adenocarcinoma development using game theory
Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, Jul 1, 2017
Recent research shows that gene expression changes appear to correlate well with the progression of many types of cancers. Using changes in gene expression as a basis, this paper proposes a data-driven 2-player game-theoretic model to predict the risk of adenocarcinoma based on Nash equilibrium. A key innovation in this work is the pay-off function which is a weighted composite of the expression of a cohort of tumor-suppressor genes (as one player) and an analogous cohort of oncogenes (as the other player). Another novelty of the model is its ability to predict the risk that a healthy sample will develop adenocarcinoma, if its associated gene expression is comparable to that of early-stage tumor samples. The model is validated using two of the largest publicly available adenocarcinoma datasets. The results show that i) the model is able to distinguish between healthy and cancerous samples with an accuracy of 93%, and ii) 95% of the healthy samples said to be at risk had gene express...
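For readers unfamiliar with the Nash equilibrium concept this abstract relies on, the Python sketch below enumerates the pure-strategy Nash equilibria of a generic two-player game given as two payoff matrices. It illustrates the equilibrium concept only and is not the paper's model: the function name and the payoff values are hypothetical, and the paper's gene-expression-based pay-off function is not reproduced here.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating alone.

    payoff_a[i][j] is the row player's payoff and payoff_b[i][j] the column
    player's payoff when the row player picks i and the column player picks j.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Row player cannot do better by switching rows within column j.
            row_is_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
            # Column player cannot do better by switching columns within row i.
            col_is_best = all(payoff_b[i][j] >= payoff_b[i][m] for m in range(n_cols))
            if row_is_best and col_is_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical 2x2 payoffs (a prisoner's-dilemma-style game), for illustration only.
payoff_a = [[-1, -3], [0, -2]]
payoff_b = [[-1, 0], [-3, -2]]
equilibria = pure_nash_equilibria(payoff_a, payoff_b)
print(equilibria)
```

Games without a pure-strategy equilibrium return an empty list here; finding mixed-strategy equilibria requires a different method.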
Go with the flow
Chemistry & Industry, 2013
2014 IEEE International Congress on Big Data, 2014
This paper explores a hybrid approach of intrusion detection through knowledge discovery from big data using Latent Dirichlet Allocation (LDA). We identify the "hidden" patterns of operations conducted by both normal users and malicious users from a large volume of network/systems logs, by mapping this problem to the topic modeling problem and leveraging the well-established LDA models and learning algorithms. This new approach potentially complements the strengths of signature-based and anomaly-based methods.
Dependable Computing and Communications Symposium (DCCS)
Flavio Junqueira (Yahoo, Spain) Zbigniew Kalbarczyk (UIUC, USA) Nagarajan Kandasamy (Drexel U., USA) Vana Kelogeraki (UC Riverside, USA) Sy-Yen Kuo (National Taiwan University, Taiwan) Fabio Martinelli (CNR, Pisa, Italy) Shivakant Mishra (U. Colorado, USA) Simin Nadjm-Tehrani (Linkoping U., Sweden) Takashi Nanya (U. Tokyo, Japan) Andras Pataricza (BUTE, Hungary) Michael Paulitsch (EADS Innovation Works, Germany) Leonardo Querzoni (U. Rome, Italy) Hari-Govind Ramasamy (IBM, USA) Michael Reiter (U. North Carolina ...

2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS), 2016
This paper studies false data injection attacks against automatic generation control (AGC), a fundamental control system used in all power grids to maintain the grid frequency at a nominal value. Attacks on the sensor measurements for AGC can cause frequency excursion that triggers remedial actions such as disconnecting customer loads or generators, leading to blackouts and potentially costly equipment damage. We derive an attack impact model and analyze an optimal attack, consisting of a series of false data injections, that minimizes the remaining time until the onset of remedial actions, leaving the shortest time for the grid to counteract. We show that, based on eavesdropped sensor data and a few feasible-to-obtain system constants, the attacker can learn the attack impact model and achieve the optimal attack in practice. This paper provides an essential understanding of the limits of the physical impact of false data injections on power grids, and provides an analysis framework to guide the protection of sensor data links. Our analysis and algorithms are validated by experiments on a physical 16-bus power system testbed and extensive simulations based on a 37-bus power system model.
Networked Systems Design and Implementation (NSDI '19), 2019
Security requirements in the cloud have led to the development of new monitoring techniques that can be broadly categorized as virtual machine introspection (VMI) techniques. VMI monitoring aims to provide high-fidelity monitoring while keeping the monitor secure by leveraging the isolation provided by virtualization. This work shows that not all hypervisor activity is hidden from the guest virtual machine (VM), and the guest VM can detect when the hypervisor performs an action on the guest VM, such as a VMI monitoring check. We call this technique hypervisor introspection and demonstrate how a malicious insider could utilize this technique to evade a passive VMI system.
2.1, 2.2, 2.3 Characteristics of jobs running on System A (in brown, left in each subfigure) and System B (in green, right in each subfigure)
arXiv (Cornell University), Apr 24, 2020

arXiv (Cornell University), Sep 18, 2017
Modern trains rely on balises (communication beacons) located on the track to provide location information as they traverse a rail network. Balises, such as those conforming to the Eurobalise standard, were not designed with security in mind and are thus vulnerable to cyber attacks targeting data availability, integrity, or authenticity. In this work, we discuss data integrity threats to balise transmission modules and use high-fidelity simulation to study the risks posed by data integrity attacks. To mitigate such risk, we propose a practical two-layer solution: at the device level, we design a lightweight and low-cost cryptographic solution to protect the integrity of the location information; at the system layer, we devise a secure hybrid train speed controller to mitigate the impact under various attacks. Our simulation results demonstrate the effectiveness of our proposed solutions.
IEEE Computer, Mar 1, 2020