Multivariate abrupt change detectors
2019
Abstract
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Multivariate abrupt change detectors
Sarra Houidi, F. Auger, Houda Ben Attia Sethom, Laurence Miègeville, Dominique Fourer
Related papers
American Journal of Theoretical and Applied Statistics, 2015
Changepoint detection is the problem of estimating the point at which the statistical properties of a sequence of observations change. Over the years, several multiple changepoint search algorithms have been proposed to overcome this challenge, including the binary segmentation algorithm, the segment neighbourhood algorithm and the Pruned Exact Linear Time (PELT) algorithm. The PELT algorithm is exact and, under mild conditions, has a computational cost that is linear in the number of data points. PELT is more accurate than binary segmentation and faster than other exact search methods. However, there is little literature on the sensitivity/power of the PELT algorithm as the changepoints approach the extremes and as the size of change increases. In this paper, we implemented the PELT algorithm, which uses the common approach of detecting changepoints by minimising a cost function over possible numbers and locations of changepoints. The study used simulated data to determine the power of the PELT test in relation to the size of the change and the location of the changepoints. It was observed that the power of the test, for a given size of change, is almost the same at all changepoint locations, and that the power of the test increases with the size of the change.
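The penalised-cost formulation described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the quadratic (L2) cost and the penalty value are illustrative assumptions, and it is optimal partitioning without PELT's pruning step, so it runs in O(n²) rather than linear time.

```python
import numpy as np

def penalized_changepoints(y, beta):
    """Minimise the sum of per-segment L2 costs plus a penalty beta per
    changepoint (optimal partitioning, i.e. PELT without pruning)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Cumulative sums give O(1) segment costs: the L2 cost of y[s:t]
    # under a constant-mean model is sum(y^2) - (sum y)^2 / (t - s).
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(s, t):
        return c2[t] - c2[s] - (c1[t] - c1[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)         # F[t] = best cost of y[:t]
    F[0] = -beta                       # so the first segment pays no penalty
    last = np.zeros(n + 1, dtype=int)  # last[t] = previous changepoint
    for t in range(1, n + 1):
        cands = [F[s] + seg_cost(s, t) + beta for s in range(t)]
        s_star = int(np.argmin(cands))
        F[t], last[t] = cands[s_star], s_star
    cps, t = [], n                     # backtrack the changepoint locations
    while t > 0:
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

PELT itself additionally prunes candidate start points that can never be optimal, which is what yields the linear computational cost the abstract refers to.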
International Journal of Climatology, 2011
The time series of measurements of hydro-meteorological variables often suffer from imperfections such as missing data, outliers and discontinuities in the mean values. A discontinuity in the mean can result from instrumental offsets and their corrections, or from changes in the monitoring station or in the surrounding environment. If the discontinuities can be identified with reasonable precision, a correction of the erroneous data can be made. Several authors have put great effort into developing techniques to identify non-climatic inhomogeneities; the resulting statistical methods are especially effective when the series contains a single change point, while their performance declines when the series contains multiple change points or inhomogeneous segments (a portion of the series bounded by two complementary shifts). These limitations also affect the standard normal homogeneity test (SNHT), one of the most effective and widely applied tests. We present a composite method of homogeneity testing, the standard normal homogenization composite method (SNHCM), including the SNHT as one component, which improves the SNHT's performance with multiple change points and inhomogeneous segments. A number of comparisons among the new method, the SNHT and a powerful optimal segmentation method (OSM-CM) are illustrated in the paper. The SNHCM demonstrates change-point detection performance similar to, or better than, that of the SNHT and very close to that of the OSM-CM. The SNHCM is effective in recognizing complex patterns of discontinuities, especially inhomogeneous segments, which represent a severe problem for the SNHT; on the contrary, the SNHT performs slightly better only when the series contains a single change point, and the difference between the two methods is negligible.
Compared to the OSM-CM, the SNHCM provides very similar performance, with some favourable features deriving from the fact that it is computationally lighter, simpler to implement, can easily handle very long series and is based on statistical hypothesis tests with a well-defined and adjustable significance level. Undocumented artificial change points are a common adversity, and the challenge of their detection is reflected by the extensive literature on the subject.
Journal of Physics: Conference Series, 2008
Cornell University - arXiv, 2021
Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by a readjustment and return to the initial state. A base distribution of the "in-control" state changes to an "out-of-control" distribution for unknown periods of time. Likelihood-based sequential and retrospective tools are proposed for the detection and estimation of each pair of change-points, and the accuracy of the obtained change-point estimates is assessed. The proposed methods offer simultaneous control of the familywise false alarm and false readjustment rates at pre-chosen levels.
We present a method to quantify abrupt changes (or changepoints) in data series, represented as a function of depth or time. These changes are often the result of climatic or environmental variations and can be manifested in multiple datasets as different responses, but all datasets can have the same changepoint locations/timings. The method we present uses transdimensional Markov chain Monte Carlo to infer probability distributions on the number and locations (in depth or time) of changepoints, the mean values between changepoints and, if required, the noise variance associated with each dataset being considered. This latter point is important as we generally will have limited information on the noise, such as estimates only of measurement uncertainty, and in most cases it is not practical to make repeat sampling/measurement to assess other contributions to the variation in the data. We describe the main features of the approach (and describe the mathematical formulation in supplementary material), and demonstrate its validity using synthetic datasets, with known changepoint structure (number and locations of changepoints) and distribution of noise variance for each dataset. We show that when using multiple data, we expect to achieve better resolution of the changepoint structure than when we use each dataset individually. This is conditional on the validity of the assumption of common changepoints between different datasets. We then apply the method to two sets of real geochemical data, both from peat cores, taken from NE Australia and eastern Tibet. Under the assumption that changes occur at the same time for all datasets, we recover solutions consistent with those previously inferred qualitatively from independent data and interpretations. However, our approach provides a quantitative estimate of the relative probability of the inferred changepoints, allowing an objective assessment of the significance of each change.
arXiv: Methodology, 2018
Structural change detection problems are often encountered in analytics and econometrics, where the performance of a model can be significantly affected by unforeseen changes in the underlying relationships. Although these problems have a comparatively long history in statistics, the number of studies done in the context of multivariate data under nonparametric settings is still small. In this paper, we propose a consistent method for detecting multiple structural changes in a system of related regressions over a large dimensional variable space. In most applications, practitioners also do not have a priori information on the relevance of different variables, and therefore, both locations of structural changes as well as the corresponding sparse regression coefficients need to be estimated simultaneously. The method combines nonparametric energy distance minimization principle with penalized regression techniques. After showing asymptotic consistency of the model, we compare the pro...
Environmetrics
We propose the Multiple Changepoint Isolation (MCI) method for detecting multiple changes in the mean and covariance of a functional process. We first introduce a pair of projections to represent the variability "between" and "within" the functional observations. We then present an augmented fused lasso procedure to split the projections into multiple regions robustly. These regions act to isolate each changepoint away from the others so that the powerful univariate CUSUM statistic can be applied region-wise to identify the changepoints. Simulations show that our method accurately detects the number and locations of changepoints under many different scenarios. These include light and heavy tailed data, data with symmetric and skewed distributions, sparsely and densely sampled changepoints, and mean and covariance changes. We show that our method outperforms a recent multiple functional changepoint detector and several univariate changepoint detectors applied to our proposed projections. We also show that MCI is more robust than existing approaches and scales linearly with sample size. Finally, we demonstrate our method on a large time series of water vapor mixing ratio profiles from atmospheric emitted radiance interferometer measurements.
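The univariate CUSUM statistic that the abstract applies region-wise can be sketched for the standard single mean-change setting; the function name is illustrative, and this is the textbook formulation rather than the MCI paper's exact procedure.

```python
import numpy as np

def cusum_changepoint(x):
    """Estimate a single mean-change location as the index k maximising
    |S_k|, where S_k is the cumulative sum of (x_i - mean(x))."""
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x - x.mean())
    # Exclude the final point, where the centred cumulative sum is 0.
    return int(np.argmax(np.abs(s[:-1]))) + 1  # new regime starts at x[k]
```

Applying such a single-change statistic separately within regions that each isolate one changepoint, as MCI does, sidesteps the usual difficulty of multiple changes masking one another.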
The Annals of Statistics
Introduction. Change-point detection has received enormous attention due to the emergence of an increasing amount of temporal data. It is a process of detecting mean, variance, or distributional changes in time-ordered observations, and becomes an integrated part of modeling, estimation and inference. Comprehensive reviews of various existing approaches to the inference of multiple change-points (MCP) can be found in, for instance, Chen and Gupta (2012) and Aue and Horváth (2013). The determination of the number of change-points K in a dataset has been central to multiple change-point analysis for decades. It is often approached as a model selection problem, since K drives the model dimension. The Bayesian information criterion (BIC, Schwarz (1978)) has become very popular in change-point problems, for instance in Yao (1988), Bai and Perron (1998), Braun, Braun and Müller (2000), Fryzlewicz (2014), Zou et al. (2014) and Wang, Zou and Yin (2018), and the asymptotic consistency of the resulting estimator of K has been established in particular contexts of interest. While the BIC is well grounded for general models, different BIC terms are required to adapt to different contexts of MCP problems, and more importantly, the optimal penalization magnitude usually varies with the model and error distribution (Hannart and Naveau (2012), Zhang and Siegmund (2007)). Several ad-hoc criteria for the change-point problem were also proposed, for instance, by Lavielle (2005) and Birgé and Massart (2001). Although these approaches could be visually useful in practice, their theoretical justification remains an open problem. This article develops a new procedure that attempts to circumvent those limitations while improving the performance of existing criteria. Our strategy is to select the number of change
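As a rough sketch of the BIC-based selection of K that this introduction discusses (not any particular paper's criterion): for each candidate K, fit a piecewise-constant mean and trade goodness of fit against a log(n) penalty per parameter. The function name, the assumption that candidates are ranked by importance, and the exact penalty form are illustrative.

```python
import numpy as np

def bic_select_k(y, candidate_cps):
    """Choose the number of change-points K by BIC: fit a piecewise-
    constant mean using the first K candidate change-points (assumed
    ranked by importance) and penalise the extra parameters."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_k, best_bic = 0, np.inf
    for k in range(len(candidate_cps) + 1):
        bounds = [0] + sorted(candidate_cps[:k]) + [n]
        rss = sum(((y[a:b] - y[a:b].mean()) ** 2).sum()
                  for a, b in zip(bounds, bounds[1:]))
        # Each change-point adds one mean parameter and one location;
        # both are charged the same log(n) penalty here for simplicity.
        bic = n * np.log(rss / n) + 2 * k * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

The introduction's point is precisely that the appropriate penalty magnitude (here hard-coded as 2 log(n) per change-point) in practice depends on the model and the error distribution.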
Journal of the American Statistical Association, 2021
Whilst there is a plethora of algorithms for detecting changes in mean in univariate time series, almost all struggle in real applications where there is autocorrelated noise or where the mean fluctuates locally between the abrupt changes that one wishes to detect. In these cases, default implementations, which are often based on assumptions of a constant mean between changes and independent noise, can lead to substantial over-estimation of the number of changes. We propose a principled approach to detect such abrupt changes that models local fluctuations as a random walk process and autocorrelated noise via an AR(1) process. We then estimate the number and location of changepoints by minimising a penalised cost based on this model. We develop a novel and efficient dynamic programming algorithm, DeCAFS, that can solve this minimisation problem, despite the additional challenge of dependence across segments due to the autocorrelated noise, which makes existing algorithms inapplicable. Theory and empirical results show that our approach has greater power at detecting abrupt changes than existing approaches. We apply our method to measuring gene expression levels in bacteria.
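The signal model the abstract describes, a local mean drifting as a random walk between abrupt jumps, observed through AR(1) noise, can be simulated with a short sketch. The function name and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_rw_ar1(n, jumps, sigma_rw=0.05, phi=0.7, sigma_eps=0.5, seed=1):
    """Simulate y_t = mu_t + eps_t, where mu_t drifts as a random walk
    between abrupt jumps and eps_t is AR(1) autocorrelated noise."""
    rng = np.random.default_rng(seed)
    mu = np.cumsum(rng.normal(0.0, sigma_rw, n))  # local random-walk drift
    for t, size in jumps:                         # abrupt mean changes
        mu[t:] += size
    eps = np.zeros(n)
    for t in range(1, n):                         # AR(1) noise
        eps[t] = phi * eps[t - 1] + rng.normal(0.0, sigma_eps)
    return mu + eps
```

On a series like this, a detector assuming a constant mean between changes and independent noise tends to flag many spurious changes, which is exactly the failure mode the abstract targets.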
Journal of Multivariate Analysis, 2013
The nonparametric test for change-point detection proposed by Gombay and Horváth is revisited and extended in the broader setting of empirical process theory. The resulting testing procedure for potentially multivariate observations is based on a sequential generalization of the functional multiplier central limit theorem and on modifications of Gombay and Horváth's seminal approach that appear to improve the finite-sample behavior of the tests. A large number of candidate test statistics based on processes indexed by lower-left orthants and half-spaces are considered and their performance is studied through extensive Monte Carlo experiments involving univariate, bivariate and trivariate data sets. Finally, practical recommendations are provided and the tests are illustrated on trivariate hydrological data.

References (7)
- M. Basseville, "Detecting changes in signals and systems - a survey," Automatica, vol. 24, pp. 309-326, May 1988.
- J. D. Healy, "A note on multivariate CUSUM procedures," Technometrics, vol. 29, pp. 409-412, 1987.
- P. Delacourt, C. Wellekens, "DISTBIC: a speaker-based segmentation for audio data indexing," Speech Communication, vol. 32, pp. 111-126, September 2000.
- B. Zhou, J. H. L. Hansen, "Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion," IEEE Transactions on Speech and Audio Processing, vol. 13, July 2005.
- D. Nikovski, A. Jain, "Fast adaptive algorithms for abrupt change detection," Machine Learning, vol. 79, no. 3, pp. 283-306, December 2010.
- N. Giri, "On the likelihood ratio test of a normal multivariate testing problem," The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 181-189, March 1964.
- H. Hotelling, "A generalized T test and measure of multivariate dispersion," Proc. of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California, Los Angeles and Berkeley, pp. 23-41, 1951.