Multivariate abrupt change detectors
2019
Abstract
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Multivariate abrupt change detectors
Sarra Houidi, F. Auger, Houda Ben Attia Sethom, Laurence Miègeville, Dominique Fourer
Related papers
American Journal of Theoretical and Applied Statistics, 2015
Changepoint detection is the problem of estimating the point at which the statistical properties of a sequence of observations change. Over the years, several multiple changepoint search algorithms have been proposed to overcome this challenge, including the binary segmentation algorithm, the segment neighbourhood algorithm and the Pruned Exact Linear Time (PELT) algorithm. The PELT algorithm is exact and, under mild conditions, has a computational cost that is linear in the number of data points. PELT is more accurate than binary segmentation and faster than other exact search methods. However, there is little literature on the sensitivity/power of the PELT algorithm as the changepoints approach the extremes and as the size of change increases. In this paper, we implemented the PELT algorithm, which uses the common approach of detecting changepoints by minimising a cost function over possible numbers and locations of changepoints. The study used simulated data to determine the power of the PELT test in relation to the size of the change and the location of the changepoints. It was observed that the power of the test, for a given size of change, is almost the same at all changepoint locations, and that the power of the test increases with the size of the change.
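The penalised-cost formulation described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the quadratic (L2) cost and the penalty value are illustrative assumptions, and it is optimal partitioning without PELT's pruning step, so it runs in O(n²) rather than linear time.

```python
import numpy as np

def penalized_changepoints(y, beta):
    """Minimise the sum of per-segment L2 costs plus a penalty beta per
    changepoint (optimal partitioning, i.e. PELT without pruning)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Cumulative sums give O(1) segment costs: the L2 cost of y[s:t]
    # under a constant-mean model is sum(y^2) - (sum y)^2 / (t - s).
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(s, t):
        return c2[t] - c2[s] - (c1[t] - c1[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)         # F[t] = best cost of y[:t]
    F[0] = -beta                       # so the first segment pays no penalty
    last = np.zeros(n + 1, dtype=int)  # last[t] = previous changepoint
    for t in range(1, n + 1):
        cands = [F[s] + seg_cost(s, t) + beta for s in range(t)]
        s_star = int(np.argmin(cands))
        F[t], last[t] = cands[s_star], s_star
    cps, t = [], n                     # backtrack the changepoint locations
    while t > 0:
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

PELT itself additionally prunes candidate start points that can never be optimal, which is what yields the linear computational cost the abstract refers to.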
International Journal of Climatology, 2011
The time series of measurements of hydro-meteorological variables often suffer from imperfections such as missing data, outliers and discontinuities in the mean values. A discontinuity in the mean can result from instrumental offsets and their corrections, or from changes in the monitoring station or in the surrounding environment. If the discontinuities can be identified with reasonable precision, a correction of the erroneous data can be made. Several authors have put great effort into developing techniques to identify non-climatic inhomogeneities; the resulting statistical methods are especially effective when the series contains a single change point, while their performance declines when the series contains multiple change points or inhomogeneous segments (a portion of the series bounded by two complementary shifts). These limitations also affect the standard normal homogeneity test (SNHT), one of the most effective and widely applied tests. We present a composite method of homogeneity testing, the standard normal homogenization composite method (SNHCM), including the SNHT as one component, which improves the SNHT's performance with multiple change points and inhomogeneous segments. A number of comparisons among the new method, the SNHT and a powerful optimal segmentation method (OSM-CM) are illustrated in the paper. The SNHCM demonstrates change-point detection performance similar to, or better than, that of the SNHT and very close to that of the OSM-CM. The SNHCM is effective in recognizing complex patterns of discontinuities, especially inhomogeneous segments, which represent a severe problem for the SNHT; on the contrary, the SNHT performs slightly better only when the series contains a single change point, and the difference between the two methods is negligible.
Compared to the OSM-CM, the SNHCM provides very similar performance, with some favourable features deriving from the fact that it is computationally lighter, simpler to implement, can easily handle very long series and is based on statistical hypothesis tests with a well-defined and adjustable significance level. Undocumented artificial change points are a common adversity, and the challenge of their detection is reflected by the extensive literature on the subject.
Journal of Physics: Conference Series, 2008
Cornell University - arXiv, 2021
Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by a readjustment and return to the initial state. A base distribution of the "in-control" state changes to an "out-of-control" distribution for unknown periods of time. Likelihood-based sequential and retrospective tools are proposed for the detection and estimation of each pair of change-points, and the accuracy of the obtained change-point estimates is assessed. The proposed methods offer simultaneous control of the familywise false alarm and false readjustment rates at pre-chosen levels.
We present a method to quantify abrupt changes (or changepoints) in data series, represented as a function of depth or time. These changes are often the result of climatic or environmental variations and can be manifested in multiple datasets as different responses, but all datasets can have the same changepoint locations/timings. The method we present uses transdimensional Markov chain Monte Carlo to infer probability distributions on the number and locations (in depth or time) of changepoints, the mean values between changepoints and, if required, the noise variance associated with each dataset being considered. This latter point is important as we generally will have limited information on the noise, such as estimates only of measurement uncertainty, and in most cases it is not practical to make repeat sampling/measurement to assess other contributions to the variation in the data. We describe the main features of the approach (and describe the mathematical formulation in supplementary material), and demonstrate its validity using synthetic datasets, with known changepoint structure (number and locations of changepoints) and distribution of noise variance for each dataset. We show that when using multiple data, we expect to achieve better resolution of the changepoint structure than when we use each dataset individually. This is conditional on the validity of the assumption of common changepoints between different datasets. We then apply the method to two sets of real geochemical data, both from peat cores, taken from NE Australia and eastern Tibet. Under the assumption that changes occur at the same time for all datasets, we recover solutions consistent with those previously inferred qualitatively from independent data and interpretations. However, our approach provides a quantitative estimate of the relative probability of the inferred changepoints, allowing an objective assessment of the significance of each change.
arXiv: Methodology, 2018
Structural change detection problems are often encountered in analytics and econometrics, where the performance of a model can be significantly affected by unforeseen changes in the underlying relationships. Although these problems have a comparatively long history in statistics, the number of studies done in the context of multivariate data under nonparametric settings is still small. In this paper, we propose a consistent method for detecting multiple structural changes in a system of related regressions over a large dimensional variable space. In most applications, practitioners also do not have a priori information on the relevance of different variables, and therefore, both locations of structural changes as well as the corresponding sparse regression coefficients need to be estimated simultaneously. The method combines nonparametric energy distance minimization principle with penalized regression techniques. After showing asymptotic consistency of the model, we compare the pro...
Environmetrics
We propose the Multiple Changepoint Isolation (MCI) method for detecting multiple changes in the mean and covariance of a functional process. We first introduce a pair of projections to represent the variability "between" and "within" the functional observations. We then present an augmented fused lasso procedure to split the projections into multiple regions robustly. These regions act to isolate each changepoint away from the others so that the powerful univariate CUSUM statistic can be applied region-wise to identify the changepoints. Simulations show that our method accurately detects the number and locations of changepoints under many different scenarios. These include light and heavy tailed data, data with symmetric and skewed distributions, sparsely and densely sampled changepoints, and mean and covariance changes. We show that our method outperforms a recent multiple functional changepoint detector and several univariate changepoint detectors applied to our proposed projections. We also show that MCI is more robust than existing approaches and scales linearly with sample size. Finally, we demonstrate our method on a large time series of water vapor mixing ratio profiles from atmospheric emitted radiance interferometer measurements.
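The univariate CUSUM statistic that the abstract applies region-wise can be sketched for the standard single mean-change setting; the function name is illustrative, and this is the textbook formulation rather than the MCI paper's exact procedure.

```python
import numpy as np

def cusum_changepoint(x):
    """Estimate a single mean-change location as the index k maximising
    |S_k|, where S_k is the cumulative sum of (x_i - mean(x))."""
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x - x.mean())
    # Exclude the final point, where the centred cumulative sum is 0.
    return int(np.argmax(np.abs(s[:-1]))) + 1  # new regime starts at x[k]
```

Applying such a single-change statistic separately within regions that each isolate one changepoint, as MCI does, sidesteps the usual difficulty of multiple changes masking one another.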
The Annals of Statistics
Introduction. Change-point detection has received enormous attention due to the emergence of an increasing amount of temporal data. It is a process of detecting mean, variance, or distributional changes in time-ordered observations, and becomes an integrated part of modeling, estimation and inference. Comprehensive reviews of various existing approaches to the inference of multiple change-points (MCP) can be found in, for instance, Chen and Gupta (2012) and Aue and Horváth (2013). The determination of the number of change-points K in a dataset has been central to multiple change-point analysis for decades. It is often approached as a model selection problem, since K drives the model dimension. The Bayesian information criterion (BIC, Schwarz (1978)) has become very popular in change-point problems, for instance in Yao (1988), Bai and Perron (1998), Braun, Braun and Müller (2000), Fryzlewicz (2014), Zou et al. (2014) and Wang, Zou and Yin (2018), and the asymptotic consistency of the resulting estimator of K has been established in particular contexts of interest. While the BIC is well grounded for general models, different BIC terms are required to adapt to different contexts of MCP problems, and more importantly, the optimal penalization magnitude usually varies with the model and error distribution (Hannart and Naveau (2012), Zhang and Siegmund (2007)). Several ad-hoc criteria for the change-point problem were also proposed, for instance, by Lavielle (2005) and Birgé and Massart (2001). Although these approaches could be visually useful in practice, their theoretical justification remains an open problem. This article develops a new procedure that attempts to circumvent those limitations while improving the performance of existing criteria. Our strategy is to select the number of change
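As a rough sketch of the BIC-based selection of K that this introduction discusses (not any particular paper's criterion): for each candidate K, fit a piecewise-constant mean and trade goodness of fit against a log(n) penalty per parameter. The function name, the assumption that candidates are ranked by importance, and the exact penalty form are illustrative.

```python
import numpy as np

def bic_select_k(y, candidate_cps):
    """Choose the number of change-points K by BIC: fit a piecewise-
    constant mean using the first K candidate change-points (assumed
    ranked by importance) and penalise the extra parameters."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_k, best_bic = 0, np.inf
    for k in range(len(candidate_cps) + 1):
        bounds = [0] + sorted(candidate_cps[:k]) + [n]
        rss = sum(((y[a:b] - y[a:b].mean()) ** 2).sum()
                  for a, b in zip(bounds, bounds[1:]))
        # Each change-point adds one mean parameter and one location;
        # both are charged the same log(n) penalty here for simplicity.
        bic = n * np.log(rss / n) + 2 * k * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

The introduction's point is precisely that the appropriate penalty magnitude (here hard-coded as 2 log(n) per change-point) in practice depends on the model and the error distribution.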
Journal of the American Statistical Association, 2021
Whilst there is a plethora of algorithms for detecting changes in mean in univariate time series, almost all struggle in real applications where there is autocorrelated noise or where the mean fluctuates locally between the abrupt changes that one wishes to detect. In these cases, default implementations, which are often based on assumptions of a constant mean between changes and independent noise, can lead to substantial over-estimation of the number of changes. We propose a principled approach to detect such abrupt changes that models local fluctuations as a random walk process and autocorrelated noise via an AR(1) process. We then estimate the number and location of changepoints by minimising a penalised cost based on this model. We develop a novel and efficient dynamic programming algorithm, DeCAFS, that can solve this minimisation problem, despite the additional challenge of dependence across segments due to the autocorrelated noise, which makes existing algorithms inapplicable. Theory and empirical results show that our approach has greater power at detecting abrupt changes than existing approaches. We apply our method to measuring gene expression levels in bacteria.
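The signal model the abstract describes, a local mean drifting as a random walk between abrupt jumps, observed through AR(1) noise, can be simulated with a short sketch. The function name and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_rw_ar1(n, jumps, sigma_rw=0.05, phi=0.7, sigma_eps=0.5, seed=1):
    """Simulate y_t = mu_t + eps_t, where mu_t drifts as a random walk
    between abrupt jumps and eps_t is AR(1) autocorrelated noise."""
    rng = np.random.default_rng(seed)
    mu = np.cumsum(rng.normal(0.0, sigma_rw, n))  # local random-walk drift
    for t, size in jumps:                         # abrupt mean changes
        mu[t:] += size
    eps = np.zeros(n)
    for t in range(1, n):                         # AR(1) noise
        eps[t] = phi * eps[t - 1] + rng.normal(0.0, sigma_eps)
    return mu + eps
```

On a series like this, a detector assuming a constant mean between changes and independent noise tends to flag many spurious changes, which is exactly the failure mode the abstract targets.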
Journal of Multivariate Analysis, 2013
The nonparametric test for change-point detection proposed by Gombay and Horváth is revisited and extended in the broader setting of empirical process theory. The resulting testing procedure for potentially multivariate observations is based on a sequential generalization of the functional multiplier central limit theorem and on modifications of Gombay and Horváth's seminal approach that appear to improve the finite-sample behavior of the tests. A large number of candidate test statistics based on processes indexed by lower-left orthants and half-spaces are considered and their performance is studied through extensive Monte Carlo experiments involving univariate, bivariate and trivariate data sets. Finally, practical recommendations are provided and the tests are illustrated on trivariate hydrological data.

References (7)
- M. Basseville, "Detecting changes in signals and systems - a survey," Automatica, vol. 24, pp. 309-326, May 1988.
- J. D. Healy, "A note on multivariate CUSUM procedures," Technometrics, vol. 29, pp. 409-412, 1987.
- P. Delacourt, C. Wellekens, "DISTBIC: a speaker-based segmentation for audio data indexing," Speech Communication, vol. 32, pp. 111-126, September 2000.
- B. Zhou, J. H. L. Hansen, "Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion," IEEE Transactions on Speech and Audio Processing, vol. 13, July 2005.
- D. Nikovski, A. Jain, "Fast adaptive algorithms for abrupt change detection," Machine Learning, vol. 79, no. 3, pp. 283-306, December 2010.
- N. Giri, "On the likelihood ratio test of a normal multivariate testing problem," The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 181-189, March 1964.
- H. Hotelling, "A generalized T test and measure of multivariate dispersion," Proc. of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California, Los Angeles and Berkeley, pp. 23-41, 1951.