Papers by Thibault Debatty

Cyber attacks have become a major factor in the world today, and their effects can be devastating. Protecting corporate and government networks has become an increasingly difficult challenge, as new persistent malware infections can remain undetected for long periods of time. In this paper we introduce the Multi-agent ranking framework (MARK), a novel approach to Advanced Persistent Threat detection based on behavioral analysis and pattern recognition. Such behavior-based mechanisms for discovering and eliminating new sophisticated threats are lacking in current detection systems, but research in this domain is gaining importance and traction. Our goal is to take a hands-on approach to detection by actively hunting for threats, instead of passively waiting for events and alerts to signal abnormal behavior. We devise a framework that can be easily deployed as a stand-alone multi-agent system or used to complement many Security Information and Event Management systems. The MARK framework incorporates both established and novel detection techniques, and facilitates the integration of new data sources and detection agent modules through plug-ins. Throughout our testing and evaluation, high true detection rates and acceptable false positive rates were obtained, demonstrating the usefulness of the framework.
Graph-based APT detection
In this paper we propose a new algorithm to detect Advanced Persistent Threats (APTs) that relies on a graph model of HTTP traffic. We also implement a complete detection system with a web interface that allows interactive analysis of the data. We perform a complete parameter study and experimental evaluation using data collected on a real network. The results show that the performance of our system is comparable to that of currently available antivirus software, although antivirus products use signatures to detect known malware while our algorithm relies solely on behavior analysis to detect new, undocumented attacks.
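The graph model of HTTP traffic mentioned in the abstract could, for instance, link domains that the same client requests close together in time. The sketch below is a hypothetical reconstruction of that idea — the paper's exact node and edge definitions are not given in the abstract, and the `window` parameter is an assumption:

```python
from collections import defaultdict

def build_http_graph(requests, window=5.0):
    """Link domains that the same client requests within `window` seconds.

    `requests` is a list of (timestamp, client, domain) tuples. This is an
    illustrative model only: the paper's actual graph (node/edge semantics,
    weights) may differ.
    """
    graph = defaultdict(set)
    by_client = defaultdict(list)
    for ts, client, domain in sorted(requests):
        by_client[client].append((ts, domain))
    for events in by_client.values():
        for i, (ts_i, dom_i) in enumerate(events):
            for ts_j, dom_j in events[i + 1:]:
                if ts_j - ts_i > window:
                    break  # events are time-sorted, no later match possible
                if dom_i != dom_j:
                    graph[dom_i].add(dom_j)
                    graph[dom_j].add(dom_i)
    return graph
```

Anomalous command-and-control traffic would then show up as unusual structure in such a graph (e.g. isolated, periodically re-contacted domains).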
Building a Cyber Range for training CyberDefense Situation Awareness
In cyberspace, maintaining a high level of cyber-defense situation awareness (CDSA) is critical for supporting the decision-making process. This can only be trained by simulating real incidents as realistically as possible, which makes a Cyber Range an essential tool: it can simulate complex networks and involve large numbers of participants. In this paper we present the important role of Cyber Ranges in improving CDSA, then show how a Cyber Range can be implemented to support such training.

arXiv (Cornell University), Mar 15, 2021
Intrusion Detection Systems (IDS) are now an essential element when it comes to securing computers and networks. Despite the extensive research efforts in the field, handling sources' reliability remains an open issue. To address this problem, this paper proposes a novel contextual discounting method based on sources' reliability and their ability to distinguish between normal and abnormal behavior. Dempster-Shafer theory, a general framework for reasoning under uncertainty, is used to construct an evidential classifier. The NSL-KDD dataset, a significantly revised and improved version of the KDD-CUP'99 dataset, provides the basis for assessing the performance of our new detection approach. While giving comparable results on the KDDTest+ dataset, our approach outperformed several other state-of-the-art methods on the more challenging KDDTest-21 dataset.
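The Dempster-Shafer machinery the abstract relies on can be sketched in a few lines. The sketch below shows only classical (non-contextual) discounting and Dempster's rule on the two-class frame {normal, abnormal}; the paper's contextual discounting, which refines the single reliability factor into one per context, is not reproduced here:

```python
from itertools import product

FRAME = frozenset({"N", "A"})  # N = normal, A = abnormal

def discount(mass, alpha):
    """Classical Shafer discounting: keep a fraction `alpha` (the source's
    reliability) of each belief mass and transfer the rest to total
    ignorance, i.e. the whole frame."""
    out = {s: alpha * v for s, v in mass.items()}
    out[FRAME] = out.get(FRAME, 0.0) + (1.0 - alpha)
    return out

def combine(m1, m2):
    """Dempster's rule of combination: conjunctive pooling of two mass
    functions, then normalization of the conflicting mass."""
    raw, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass assigned to incompatible hypotheses
    return {s: v / (1.0 - conflict) for s, v in raw.items()}
```

An evidential classifier would discount each sensor's mass function by its estimated reliability before combining them and deciding on the class with the strongest support.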

In this paper, we compare the efficiency of two binary classifiers. The first uses the Weighted Ordered Weighted Averaging (WOWA) aggregation function, whose coefficients are learned by a genetic algorithm. The second is based on an artificial neural network trained with backpropagation. Both are trained for use in a multi-criteria decision system; such systems are increasingly common in the cyber-defense field. We compare the performance of the two classifiers using two criteria: the Area Under the Curve (AUC) of a Receiver Operating Characteristic (ROC) curve and the AUC of a Precision-Recall (P-R) curve. The second criterion is better suited to imbalanced datasets, which are common in the cyber-security field. We perform a complete parameter study of these classifiers to optimize their performance. The dataset used for this work is a pool of Hypertext Preprocessor (PHP) files analyzed by a multi-agent PHP webshell detector. We obtain good results, especially for the neural network, and highlight the advantage of the genetic-algorithm method, which allows a physical interpretation of the result.
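The WOWA operator mentioned above blends per-source importance weights with OWA weights acting on the rank of each score. A minimal sketch, using the common piecewise-linear interpolation of the cumulative OWA weights (the paper learns the weight vectors with a genetic algorithm; this function only evaluates the aggregation):

```python
def wowa(values, p, w):
    """Weighted OWA: aggregate `values` using importance weights `p`
    (one per source, summing to 1) and OWA weights `w` (one per rank,
    summing to 1). With uniform `w` it reduces to the weighted mean;
    with uniform `p` it reduces to plain OWA."""
    n = len(values)
    cum_w = [0.0]
    for wi in w:
        cum_w.append(cum_w[-1] + wi)
    def w_star(x):
        # piecewise-linear interpolation of the cumulative OWA weights
        i = min(int(x * n), n - 1)
        frac = x * n - i
        return cum_w[i] + frac * (cum_w[i + 1] - cum_w[i])
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    total, acc = 0.0, 0.0
    for idx in order:  # walk values from largest to smallest
        new_acc = acc + p[idx]
        total += (w_star(new_acc) - w_star(acc)) * values[idx]
        acc = new_acc
    return total
```

The "physical interpretation" advantage mentioned in the abstract comes from the fact that the learned `p` and `w` directly express how important each criterion is and how optimistic or pessimistic the aggregation is.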
This thesis was possible thanks to the help and support of many. First of all, I would like to thank my two advisors, Pietro Michiardi and Wim Mees, for their valuable advice, coaching, support and patience. I naturally thank the members of the thesis committee for taking the time to read this thesis: Giovanni Neglia, Jean-Michel Dricot, Marc Dacier and Elena Baralis. I would like to specially thank Olivier Thonnard, without whom this thesis would definitely not have been possible. Many thanks go to my friends and colleagues from the Royal Military Academy, from Eurecom, and from the Symantec office at Eurecom. And last but not least, I would like to thank my wife Axelle for supporting me, particularly during the days and nights preceding each paper submission deadline.

Springer eBooks, 2021
In the last decade, the use of machine learning techniques in anomaly-based intrusion detection systems has seen much success. However, recent studies have shown that machine learning in general, and deep learning in particular, are vulnerable to adversarial attacks in which the attacker attempts to fool models by supplying deceptive input. Research in computer vision, where this vulnerability was first discovered, has shown that adversarial images designed to fool a specific model can also deceive other machine learning models. In this paper, we investigate the transferability of adversarial network traffic against multiple machine learning-based intrusion detection systems. Furthermore, we analyze the robustness of the ensemble intrusion detection system, known for its better accuracy compared to a single model, against the transferability of adversarial attacks. Finally, we examine Detect & Reject as a defensive mechanism to limit the effect of the transferability property of adversarial network traffic against machine learning-based intrusion detection systems.

arXiv (Cornell University), Feb 22, 2016
In this paper we propose an online approximate k-nn graph building algorithm, which is able to quickly update a k-nn graph using a flow of data points. One very important step of the algorithm consists in using the current distributed graph to search for the neighbors of a new node. Hence we also propose a distributed partitioning method based on balanced k-medoids clustering, which we use to optimize the distributed search process. Finally, we present the improved sequential search procedure that is used inside each partition. We also perform an experimental evaluation of the different algorithms, in which we study the influence of the parameters and compare the results of our algorithms to the existing state of the art. This evaluation confirms that the fast online k-nn graph building algorithm produces a graph that is highly similar to the graph produced by an offline exhaustive algorithm, while requiring fewer similarity computations.
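The key step described above — using the current graph itself to find the neighbors of a new node — can be sketched as a greedy graph-based search from a few random seeds, followed by a symmetric update of the neighbor lists. This is an illustrative single-machine sketch under assumed parameter names (`restarts`, `expansions`), not the paper's exact distributed algorithm:

```python
import heapq, random

def add_point(graph, dist, new, k=5, restarts=4, expansions=20):
    """Add `new` to an approximate k-nn graph stored as
    {node: [(distance, neighbor), ...]}.

    Greedy descent: from each random seed, repeatedly move to the
    neighbor closest to `new` while the distance strictly improves.
    All visited nodes become candidate neighbors.
    """
    if not graph:
        graph[new] = []
        return
    candidates = {}
    for seed in random.sample(list(graph), min(restarts, len(graph))):
        node = seed
        for _ in range(expansions):
            candidates[node] = dist(node, new)
            nbrs = [n for _, n in graph[node]]
            if not nbrs:
                break
            best = min(nbrs, key=lambda n: dist(n, new))
            if dist(best, new) >= candidates[node]:
                break  # local minimum reached
            node = best
    found = heapq.nsmallest(k, ((d, n) for n, d in candidates.items()))
    graph[new] = found
    for d, n in found:
        # symmetric update: the new node may displace an older neighbor
        graph[n] = heapq.nsmallest(k, graph[n] + [(d, new)])
```

The distributed version searches only inside the partition (from the balanced k-medoids clustering) that the new point falls into, which is what makes the online update cheap.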

Computers & Security, Jun 1, 2023
Due to the numerous advantages of machine learning (ML) algorithms, many applications now incorporate them. However, many studies in the field of image classification have shown that ML models can be fooled by a variety of adversarial attacks that exploit their inherent vulnerability. This raises many questions in the cybersecurity field, where a growing number of researchers are investigating the feasibility of such attacks against machine learning-based security systems, such as intrusion detection systems. The majority of this research demonstrates that it is possible to fool a model using features extracted from a raw data source, but it does not take into account the real implementation of such attacks, i.e., the reverse transformation from theory to practice. The real implementation of these adversarial attacks would be influenced by various constraints that would make their execution more difficult. The purpose of this study was therefore to investigate the actual feasibility of adversarial attacks, specifically evasion attacks, against network-based intrusion detection systems (NIDS), demonstrating that it is entirely possible to fool these ML-based IDSs using our proposed adversarial algorithm while assuming as many constraints as possible in a black-box setting. In addition, since it is critical to design defense mechanisms to protect ML-based IDSs against such attacks, a defensive scheme is presented. Realistic botnet traffic traces are used to assess this work. Our goal is to create adversarial botnet traffic that can avoid detection while still performing all of its intended malicious functionality.
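The constraint idea at the heart of this abstract — only some traffic features can be changed, and only within valid ranges — can be illustrated with a toy black-box evasion loop. This is not the paper's algorithm; the function, its parameters, and the random-perturbation strategy are purely illustrative:

```python
import random

def evade(classifier, sample, mutable_idx, bounds, tries=200, step=0.05):
    """Black-box evasion sketch: perturb only the mutable features of a
    flow (e.g. timing- or padding-related counters) within their valid
    bounds until the black-box `classifier` stops flagging it (returns 0).
    Returns an evading sample, or None if the budget is exhausted."""
    x = list(sample)
    for _ in range(tries):
        if classifier(x) == 0:  # 0 = classified benign
            return x
        i = random.choice(mutable_idx)
        lo, hi = bounds[i]
        # clamp to the valid range so the flow stays realistic
        x[i] = min(hi, max(lo, x[i] + random.uniform(-step, step)))
    return None
```

The point of the sketch is the shape of the problem: immutable features and hard bounds are exactly the "reverse transformation" constraints that make practical attacks harder than feature-space ones.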

arXiv (Cornell University), Mar 13, 2023
Nowadays, numerous applications incorporate machine learning (ML) algorithms due to their prominent achievements. However, many studies in the field of computer vision have shown that ML can be fooled by intentionally crafted instances, called adversarial examples, which exploit the intrinsic vulnerability of ML models. This recent research raises many concerns in the cybersecurity field, and an increasing number of researchers are studying the feasibility of such attacks on security systems based on ML algorithms, such as Intrusion Detection Systems (IDS). The feasibility of such adversarial attacks would be influenced by various domain-specific constraints, which can increase the difficulty of crafting adversarial examples. Despite the considerable amount of research in this area, much of it focuses on showing that it is possible to fool a model using features extracted from the raw data, but does not address the practical side, i.e., the reverse transformation from theory to practice. For this reason, we propose a review of several important papers in order to provide a comprehensive analysis. Our analysis highlights some challenges that have not been addressed in the reviewed papers.

Attack Detection in SS7
Communications in Computer and Information Science, 2022
arXiv (Cornell University), Apr 20, 2021
Nowadays, Deep Neural Networks (DNNs) report state-of-the-art results in many machine learning areas, including intrusion detection. Nevertheless, recent studies in computer vision have shown that DNNs can be vulnerable to adversarial attacks capable of deceiving them into misclassification by injecting specially crafted data. In security-critical areas, such attacks can cause serious damage; therefore, in this paper, we examine the effect of adversarial attacks on deep learning-based intrusion detection. In addition, we investigate the effectiveness of adversarial training as a defense against such attacks. Experimental results show that with sufficient distortion, adversarial examples are able to mislead the detector, and that the use of adversarial training can improve the robustness of intrusion detection.
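The crafting of adversarial examples with "sufficient distortion" can be illustrated with the Fast Gradient Sign Method on a toy logistic model. This is a stand-in for the attacks studied in the paper, which target deep networks and must additionally respect traffic-feature validity limits:

```python
import math

def fgsm(w, b, x, y, eps):
    """FGSM on a logistic model p = sigmoid(w.x + b): shift every feature
    by `eps` in the sign of the cross-entropy loss gradient, so the model's
    confidence in the true label `y` decreases."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad = [(p - y) * wi for wi in w]        # d(cross-entropy)/d(x_i)
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

Adversarial training then simply mixes such perturbed samples (with their correct labels) back into the training set, which is the defense evaluated in the paper.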

Future Generation Computer Systems
Nowadays, intrusion detection systems based on deep learning deliver state-of-the-art performance. However, recent research has shown that specially crafted perturbations, called adversarial examples, are capable of significantly reducing the performance of these intrusion detection systems. The objective of this paper is to design an efficient transfer learning-based adversarial detector, and then to assess the effectiveness of multiple strategically placed adversarial detectors compared to a single one. In our experiments, we implement existing state-of-the-art models for intrusion detection and attack them with a set of chosen evasion attacks. To detect those adversarial attacks, we design and implement multiple transfer learning-based adversarial detectors, each receiving a subset of the information passed through the IDS. By combining their respective decisions, we show that multiple detectors can further improve the detectability of adversarial traffic compared to a single detector in the case of a parallel IDS design.
Scalable Graph Building from Text Data
JMLR: Workshop and Conference Proceedings 29:1–13, BIGMINE 2014
In this paper we propose NNCTPH, a new MapReduce algorithm that is able to build an approximate k-NN graph from large text datasets. The algorithm uses a modified version of Context Triggered Piecewise Hashing to bin the input data into buckets, and an exhaustive search inside the buckets to build the graph. It also uses multiple stages to join the different unconnected subgraphs. We experimentally test the algorithm on datasets consisting of the subjects of spam emails. Although the algorithm is still at an early development stage, it already proves to be four times faster than a MapReduce implementation of NN-Descent, for the same quality of produced graph.
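The bucket-then-exhaustive-search structure described above can be sketched in a few lines. Here `bucket_key` stands in for the modified Context Triggered Piecewise Hashing (any locality hash that groups similar items works), and the multi-stage joining of disconnected subgraphs is omitted for brevity:

```python
from itertools import combinations

def knn_graph_by_buckets(items, bucket_key, dist, k):
    """Bin items by a locality hash, then run an exhaustive pairwise
    search inside each bucket and keep the k nearest neighbors per item.
    Items landing in different buckets are never compared, which is
    what makes the approach scale (and approximate)."""
    buckets = {}
    for it in items:
        buckets.setdefault(bucket_key(it), []).append(it)
    graph = {it: [] for it in items}
    for members in buckets.values():
        for a, b in combinations(members, 2):
            d = dist(a, b)
            graph[a].append((d, b))
            graph[b].append((d, a))
    return {it: sorted(nbrs)[:k] for it, nbrs in graph.items()}
```

In the MapReduce formulation, the binning is the map phase and the per-bucket exhaustive search is the reduce phase.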
Building a Cyber Range for training CyberDefense Situation Awareness
2019 International Conference on Military Communications and Information Systems (ICMCIS), 2019


Scalable k-NN based text clustering, with Alessandro Lulli, Thibault Debatty, Matteo Dell'Amico, and Pietro Michiardi

Communications in Computer and Information Science, 2021