Functional Isolation Forest
2019, arXiv
Abstract
For the purpose of monitoring the behavior of complex infrastructures (e.g., aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi-continuous time, so that anomalies that may jeopardize the smooth operation of the system of interest can be detected quickly. The statistical analysis of such massive functional data raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to anomaly detection, originally designed for finite-dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to progressively isolate any trajectory from the others, a key ingredient to t...
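The abstract describes the mechanism only in outline: split the function space at random until each curve is isolated, and flag curves that are isolated early. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. It assumes curves sampled on a common grid `t` and an L2 inner product approximated by the trapezoid rule. Each splitting direction is drawn from a hypothetical dictionary of Brownian-motion sample paths, chosen purely for illustration. The remaining pieces follow the original Isolation Forest of Liu et al.:

* each node splits at a threshold drawn uniformly between the minimum and maximum projection;
* each tree is grown on a subsample of size ψ to depth ⌈log2 ψ⌉;
* the anomaly score is s = 2^(-E[h(x)]/c(ψ)), where h(x) is the path length and c(ψ) is the average path length of an unsuccessful binary-search-tree lookup.

All function names (`fit_fif`, `score`, `brownian_atom`, and so on) are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (Liu et al.)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = np.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def brownian_atom(t, rng):
    """Hypothetical dictionary: one Brownian-motion sample path on the grid t."""
    increments = rng.standard_normal(t.size - 1) * np.sqrt(np.diff(t))
    return np.concatenate([[0.0], np.cumsum(increments)])


def l2_inner(X, d, t):
    """Trapezoid-rule approximation of <x, d> in L2 for each row x of X."""
    prod = X * d
    return 0.5 * np.sum((prod[:, 1:] + prod[:, :-1]) * np.diff(t), axis=1)


def build_tree(X, t, rng, depth, max_depth):
    n = X.shape[0]
    if n <= 1 or depth >= max_depth:
        return {"size": n}  # external node
    atom = brownian_atom(t, rng)  # random splitting direction
    proj = l2_inner(X, atom, t)
    lo, hi = proj.min(), proj.max()
    if lo == hi:
        return {"size": n}  # all curves project identically; stop splitting
    thr = rng.uniform(lo, hi)  # uniform threshold, as in Isolation Forest
    go_left = proj < thr
    return {
        "atom": atom,
        "thr": thr,
        "left": build_tree(X[go_left], t, rng, depth + 1, max_depth),
        "right": build_tree(X[~go_left], t, rng, depth + 1, max_depth),
    }


def path_length(x, node, t, depth=0):
    if "atom" not in node:
        # c(size) estimates the depth of the subtree left unbuilt at this leaf
        return depth + c_factor(node["size"])
    p = l2_inner(x[None, :], node["atom"], t)[0]
    child = node["left"] if p < node["thr"] else node["right"]
    return path_length(x, child, t, depth + 1)


def fit_fif(X, t, n_trees=100, sample_size=64, seed=0):
    rng = np.random.default_rng(seed)
    psi = min(sample_size, X.shape[0])  # subsample size; assumed >= 2
    max_depth = int(np.ceil(np.log2(psi)))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(X.shape[0], size=psi, replace=False)
        trees.append(build_tree(X[idx], t, rng, 0, max_depth))
    return trees, psi


def score(X, t, trees, psi):
    """s = 2^(-E[h(x)] / c(psi)); values close to 1 flag likely anomalies."""
    mean_h = np.array([np.mean([path_length(x, tree, t) for tree in trees]) for x in X])
    return 2.0 ** (-mean_h / c_factor(psi))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 100)
    X = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((50, t.size))
    X[0] = 3 * np.sin(2 * np.pi * t)  # synthetic amplitude anomaly
    trees, psi = fit_fif(X, t)
    s = score(X, t, trees, psi)
    print("most anomalous curve:", int(np.argmax(s)))
```

In this toy setup, the large-amplitude curve usually separates from the noisy sine curves after one or two splits, so it receives the highest score. The dictionary decides which kinds of abnormal shapes are easy to isolate. Replacing the Brownian paths with, say, cosines or wavelets is one way to read the "flexible" splitting the abstract refers to.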
References (24)
- M.M. Breunig, H.-P. Kriegel, R.T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, volume 29, pages 93-104. ACM, 2000.
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15:1-15:58, 2009.
- Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista. The UCR time series classification archive, July 2015. URL www.cs.ucr.edu/~eamonn/time_series_data/.
- G. Claeskens, M. Hubert, L. Slaets, and K. Vakili. Multivariate functional halfspace depth. Journal of the American Statistical Association, 109(505):411-423, 2014.
- A. Cuevas, M. Febrero, and R. Fraiman. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 22(3):481-496, 2007.
- J.H.J. Einmahl and D.M. Mason. Generalized quantile processes. The Annals of Statistics, 20(2):1062-1078, 1992.
- F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis. Springer-Verlag, New York, 2006.
- S. Hariri, M. Carrasco Kind, and R.J. Brunner. Extended isolation forest. arXiv e-prints, 2018. URL https://arxiv.org/abs/1811.02141.
- M. Hubert, P.J. Rousseeuw, and P. Segaert. Multivariate functional outlier detection. Statistical Methods & Applications, 24(2):177-202, 2015.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
- J. Li, J.A. Cuesta-Albertos, and R.Y. Liu. DD-classifier: Nonparametric classification procedure based on DD-plot. Journal of the American Statistical Association, 107(498):737-753, 2012.
- F.T. Liu, K.M. Ting, and Z. Zhou. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 413-422. IEEE Computer Society, 2008.
- F.T. Liu, K.M. Ting, and Z. Zhou. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6:1-39, 2012.
- S.G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397-3415, 1993.
- V. Maz'ya. Sobolev Spaces: with Applications to Elliptic Partial Differential Equations. Springer-Verlag, Berlin Heidelberg, 2011.
- K. Mosler. Depth statistics. In Robustness and Complex Data Structures: Festschrift in Honour of Ursula Gather, pages 17-34. Springer, Berlin Heidelberg, 2013.
- K. Mosler and P. Mozharovskyi. Fast DD-classification of functional data. Statistical Papers, 58(4):1055-1089, 2017.
- C. Park, J.Z. Huang, and Y. Ding. A computable plug-in estimator of minimum volume sets for novelty detection. Operations Research, 58(5):1469-1480, 2010.
- W. Polonik. Minimum volume sets and generalized quantile processes. Stochastic Processes and their Applications, 69(1):1-24, 1997.
- J.O. Ramsay and B.W. Silverman. Functional Data Analysis. Springer-Verlag, New York, 2005.
- B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443-1471, 2001.
- C. Scott and R. Nowak. Learning minimum volume sets. Journal of Machine Learning Research, 7:665-704, 2006.
- I. Steinwart, D. Hush, and C. Scovel. A classification framework for anomaly detection. Journal of Machine Learning Research, 6:211-232, 2005.
- R. Vert and J.-P. Vert. Consistency and convergence rates of one-class SVMs and related algorithms. Journal of Machine Learning Research, 7:817-854, 2006.