Assumption-free anomaly detection in time series
2005, Proceedings of the 17th International Conference on Scientific and Statistical Database Management
Abstract
Recent advancements in sensor technology have made it possible to collect enormous amounts of data in real time. However, because of the sheer volume of data, most of it will never be inspected by an algorithm, much less a human being. One way to mitigate this problem is to perform ...
Related papers
arXiv: Applications, 2020
One of the contemporary challenges in anomaly detection is the ability to detect, and differentiate between, both point and collective anomalies within a data sequence or time series. The anomaly package has been developed to provide users with a choice of anomaly detection methods and, in particular, provides an implementation of the recently proposed CAPA family of anomaly detection algorithms. This article describes the methods implemented whilst also highlighting their application to simulated data as well as real data examples contained in the package.
IEEE Transactions on Neural Networks and Learning Systems, 2021
Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This paper presents a systematic and comprehensive evaluation of unsupervised and semi-supervised deep-learning-based methods for anomaly detection and diagnosis on multivariate time series data from cyber-physical systems. Unlike previous works, we vary the model and the post-processing of model errors (i.e., the scoring functions) independently of each other, through a grid of 10 models and 4 scoring functions, comparing these variants to state-of-the-art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time points. Through experiments, we find that the existing evaluation metrics either do not take events into account, or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. We propose a new metric to overcome these drawbacks, namely the composite F-score (Fc1), for evaluating time-series anomaly detection. Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and the choice of scoring function often matters more than the choice of the underlying model. We also find that a simple, channel-wise model, the Univariate Fully-Connected Auto-Encoder with the dynamic Gaussian scoring function, emerges as a winning candidate for both anomaly detection and diagnosis, beating state-of-the-art algorithms.
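The composite F-score mentioned in this abstract combines point-wise precision with event-wise recall. A minimal sketch of that idea, assuming a set-based representation of predictions and inclusive ground-truth event ranges (the function and variable names are illustrative, not from the paper):

```python
def composite_f1(pred, events):
    """Sketch of a composite F-score: harmonic mean of point-wise
    precision and event-wise recall.

    pred:   set of predicted anomalous time indices
    events: list of (start, end) inclusive ground-truth anomaly events
    """
    truth = set()
    for s, e in events:
        truth.update(range(s, e + 1))

    # Point-wise precision: flagged points that fall inside some event.
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0

    # Event-wise recall: events with at least one flagged point.
    detected = sum(1 for s, e in events
                   if any(t in pred for t in range(s, e + 1)))
    recall = detected / len(events) if events else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, flagging one point inside one of two events while also raising one false alarm gives precision 0.5 and event recall 0.5, hence a score of 0.5.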
Journal of Artificial Intelligence Research, 2021
The existence of an anomaly detection method that is optimal for all domains is a myth. Thus, there exists a plethora of anomaly detection methods, which increases every year, for a wide variety of domains. But a strength can also be a weakness: given this massive library of methods, how can one select the best method for their application? Current literature is focused on creating new anomaly detection methods or large frameworks for experimenting with multiple methods at the same time. However, especially as the literature continues to expand, an extensive evaluation of every anomaly detection method is simply not feasible. To reduce this evaluation burden, we present guidelines to intelligently choose the optimal anomaly detection methods based on the characteristics the time series displays, such as seasonality, trend, level change, concept drift, and missing time steps. We provide a comprehensive experimental validation and survey of twelve anomaly detection methods over different time series characteristics to form guidelines based on several metrics: the AUC (Area Under the Curve), windowed F-score, and Numenta Anomaly Benchmark (NAB) scoring model. Applying our methodologies can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods, especially in an online setting.
Footnotes: 1. Similar analysis (Emmott et al., 2015) has been performed before to compute the influence of meta-data on anomaly detection, but for feature-vector-based datasets instead of time series. 2. See the Appendix for a table of all acronyms and their definitions. 3. See https://github.com/dn3kmc/jair anomaly detection for all source code implementations, Jupyter notebooks demonstrating how to determine characteristics, and datasets.
2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC)
Anomaly detection is an active research topic in many different fields such as intrusion detection, network monitoring, system health monitoring, IoT healthcare, etc. However, many existing anomaly detection approaches require either human intervention or domain knowledge, and may suffer from high computational complexity, consequently hindering their applicability in real-world scenarios. Therefore, a lightweight and ready-to-go approach that is able to detect anomalies in real time is highly sought-after. Such an approach could be easily and immediately applied to perform time series anomaly detection on any commodity machine. The approach could provide timely anomaly alerts, thereby enabling appropriate countermeasures to be undertaken as early as possible. With these goals in mind, this paper introduces ReRe, a Real-time Ready-to-go proactive Anomaly Detection algorithm for streaming time series. ReRe employs two lightweight Long Short-Term Memory (LSTM) models to predict and jointly determine whether or not an upcoming data point is anomalous based on short-term historical data points and two long-term self-adaptive thresholds. Experiments based on real-world time-series datasets demonstrate the good performance of ReRe in real-time anomaly detection without requiring human intervention or domain knowledge.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Anomalies in time series appear consecutively, forming anomaly segments. Applying the classical point-based evaluation metrics to evaluate the detection performance of segments leads to considerable underestimation, so most related studies resort to point adjustment. This operation treats all points within a segment equally as true positives when only one individual point alarms, resulting in significant overestimation and creating an illusion of superior performance. This paper proposes smoothing point adjustment, a novel range-based evaluation protocol for time series anomaly detection. Our protocol reflects detection performance impartially by carefully considering the specific location and frequency of alarms in the raw results. It is achieved by smoothly determining the adjustment range and rewarding early detection via a ranging function and a rewarding function. Compared with other evaluation metrics, experiments on different datasets show that our protocol can yield a performance ranking of various methods more consistent with the desired situation.
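For context, the classical point adjustment operation this abstract criticizes can be sketched in a few lines; the set-based representation and names are illustrative, not from the paper:

```python
def point_adjust(pred, events):
    """Classical point adjustment: if any point inside a ground-truth
    anomaly segment is flagged, every point of that segment is counted
    as detected. As the abstract notes, this inflates point-wise scores.

    pred:   set of predicted anomalous time indices
    events: list of (start, end) inclusive ground-truth anomaly segments
    """
    adjusted = set(pred)
    for s, e in events:
        segment = range(s, e + 1)
        if any(t in pred for t in segment):
            # A single alarm anywhere in the segment marks all of it.
            adjusted.update(segment)
    return adjusted
```

A detector that fires on just one point of a long segment is credited with the entire segment, which is exactly the overestimation the proposed smoothing protocol is designed to avoid.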
2017
This white paper is about finding anomalies in time series, which we encounter in almost every system. I usually keep notes when I work on projects, and this paper is based on my experiences and the notes I took while working on anomaly detection systems.
Advanced Information Networking and Applications
During the past decade, many anomaly detection approaches have been introduced in different fields such as network monitoring, fraud detection, and intrusion detection. However, they require an understanding of the data pattern and often need a long off-line period to build a model or network for the target data. Providing real-time and proactive anomaly detection for streaming time series without human intervention and domain knowledge is highly valuable, since it greatly reduces human effort and enables appropriate countermeasures to be undertaken before disastrous damage, failure, or another harmful event occurs. However, this issue has not been well studied yet. To address it, this paper proposes RePAD, a Real-time Proactive Anomaly Detection algorithm for streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes short-term historical data points to predict and determine whether or not the upcoming data point is a sign that an anomaly is likely to happen in the near future. By dynamically adjusting the detection threshold over time, RePAD is able to tolerate minor pattern changes in time series and detect anomalies either proactively or on time. Experiments based on two time series datasets collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to proactively detect anomalies and provide early warnings in real time without human intervention and domain knowledge.
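The dynamically adjusted threshold idea can be sketched as follows. This is a hedged illustration, not RePAD's exact formulation: the LSTM predictor is omitted, and the choice of raw prediction errors and the factor k = 3 are assumptions made here for the sketch.

```python
from statistics import mean, stdev

def dynamic_threshold(recent_errors, k=3.0):
    """Sketch of a self-adaptive threshold: flag a point when its
    prediction error exceeds the mean plus k standard deviations of
    recent errors. The window of recent errors moves with the stream,
    so the threshold adapts to gradual pattern changes."""
    if len(recent_errors) < 2:
        return float("inf")  # not enough history to judge yet
    return mean(recent_errors) + k * stdev(recent_errors)

def is_anomalous(recent_errors, new_error, k=3.0):
    """True if the newest prediction error breaks the adaptive threshold."""
    return new_error > dynamic_threshold(recent_errors, k)
```

Because the threshold is recomputed from a sliding window of errors, a slow drift in the series raises the baseline error and the threshold together, while a sudden spike still stands out.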
Lecture Notes in Computer Science, 2023
Anomaly detection (AD) in numerical temporal data series is a prominent task in many domains, including the analysis of industrial equipment operation, the processing of IoT data streams, and the monitoring of appliance energy consumption. The life-cycle of an AD application with a Machine Learning (ML) approach requires data collection and preparation, algorithm design and selection, training, and evaluation. All these activities contain repetitive tasks which could be supported by tools. This paper describes ODIN AD, a framework assisting the life-cycle of AD applications in the phases of data preparation, prediction performance evaluation, and error diagnosis.
2020 28th European Signal Processing Conference (EUSIPCO), 2021
The systematic collection of data has become an intrinsic process in all aspects of modern life. From industrial and healthcare machines to wearable sensors, an unprecedented amount of data is becoming available for mining and information retrieval. In particular, anomaly detection plays a key role in a wide range of applications and has been studied extensively. However, many anomaly detection methods are unsuitable in practical scenarios, where streaming data of large volume arrive in nearly real time at devices with limited resources. Dimensionality reduction has been extensively used to enable efficient processing for numerous high-level tasks. In this paper, we propose a computationally efficient, yet highly accurate, framework for anomaly detection of streaming data in lower-dimensional spaces, utilizing a modification of the symbolic aggregate approximation for dimensionality reduction and a statistical hypothesis test based on the Kullback-Leibler divergence.
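The two building blocks named in this abstract, a SAX-style symbolic reduction and a Kullback-Leibler comparison of symbol distributions, can be sketched roughly as below. The alphabet size (4), segment count, and smoothing constant are illustrative choices made here, not the paper's; the standard Gaussian breakpoints for a 4-symbol alphabet are used.

```python
import math

# Standard Gaussian breakpoints for a 4-symbol SAX alphabet.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]

def sax_symbols(window, n_segments=8):
    """Sketch of SAX: z-normalize the window, average it piecewise,
    and map each segment mean to a symbol via Gaussian breakpoints.
    Any tail shorter than a full segment is ignored in this sketch."""
    mu = sum(window) / len(window)
    sd = (sum((x - mu) ** 2 for x in window) / len(window)) ** 0.5 or 1.0
    z = [(x - mu) / sd for x in window]
    seg = len(z) // n_segments
    means = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]
    return [sum(m > b for b in BREAKPOINTS) for m in means]

def symbol_dist(symbols, alphabet=4):
    """Empirical distribution over the symbol alphabet."""
    counts = [0] * alphabet
    for s in symbols:
        counts[s] += 1
    return [c / len(symbols) for c in counts]

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) between two symbol distributions, smoothed by eps
    so that zero-probability symbols do not produce infinities."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

In a streaming setting one would compare the symbol distribution of the current window against a reference window: near-identical distributions give a divergence near zero, while a distributional shift pushes it up past a test threshold.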
arXiv (Cornell University), 2022
Detecting anomalies in time series data is important in a variety of fields, including system monitoring, healthcare, and cybersecurity. While the abundance of available methods makes it difficult to choose the most appropriate method for a given application, each method has its strengths in detecting certain types of anomalies. In this study, we compare six unsupervised anomaly detection methods of varying complexity to determine whether more complex methods generally perform better and if certain methods are better suited to certain types of anomalies. We evaluated the methods using the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We analyzed the results on a dataset and anomaly type level after adjusting the necessary hyperparameters for each method. Additionally, we assessed the ability of each method to incorporate prior knowledge about anomalies and examined the differences between point-wise and sequence-wise features. Our experiments show that classical machine learning methods generally outperform deep learning methods across a range of anomaly types.