Canonical Autocorrelation Analysis
2015, arXiv (Cornell University)
Abstract
We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivariate correlations within just one set of variables. This is useful when searching for hidden parsimonious structures in data, each involving only a small subset of all features. In addition, the discovered correlations are highly interpretable, as they are formed by pairs of sparse linear combinations of the original features. We show how CAA can serve as a tool for anomaly detection when anomalous data do not follow the expected structure of correlations. We illustrate the utility of CAA in two application domains where single-class and unsupervised learning of correlation structures are particularly relevant: breast cancer diagnosis and radiation threat detection. When applied to the Wisconsin Breast Cancer data, single-class CAA is competitive with supervised methods reported in the literature. On the radiation threat detection task, unsupervised CAA performs significantly better than an unsupervised alternative prevalent in the domain, while providing valuable additional insights for threat analysis.
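To make the anomaly-detection idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a pair of sparse weight vectors with disjoint supports has already been found; here `u` and `v` are hand-picked, hypothetical values, the data are synthetic, and scoring a point by its distance from a line fitted in the projected plane is an illustrative assumption rather than the procedure used in the paper.

```python
import numpy as np

def projection_residuals(X, u, v, slope, intercept):
    """Perpendicular distance of each row from a line in the (X @ u, X @ v) plane."""
    a, b = X @ u, X @ v
    return np.abs(b - (slope * a + intercept)) / np.sqrt(1.0 + slope**2)

rng = np.random.default_rng(0)
n, d = 500, 6

# Synthetic "normal" data: features 0-3 share one hidden factor, 4-5 are noise.
z = rng.standard_normal(n)
X = 0.3 * rng.standard_normal((n, d))
X[:, [0, 1, 2, 3]] += z[:, None]

# Hypothetical sparse pair with disjoint supports (assumed, not learned here).
u = np.array([1, 1, 0, 0, 0, 0]) / np.sqrt(2)
v = np.array([0, 0, 1, 1, 0, 0]) / np.sqrt(2)

# Correlation between the two sparse linear combinations on normal data.
corr = np.corrcoef(X @ u, X @ v)[0, 1]

# Fit the expected linear relation on normal data only (single-class setting).
slope, intercept = np.polyfit(X @ u, X @ v, deg=1)
normal_scores = projection_residuals(X, u, v, slope, intercept)

# Synthetic anomalies: the two feature groups are driven by independent factors,
# so the expected correlation structure is broken.
m = 50
A = 0.3 * rng.standard_normal((m, d))
A[:, [0, 1]] += rng.standard_normal(m)[:, None]
A[:, [2, 3]] += rng.standard_normal(m)[:, None]
anomaly_scores = projection_residuals(A, u, v, slope, intercept)

print(f"correlation on normal data: {corr:.3f}")
print(f"mean score, normal:  {normal_scores.mean():.3f}")
print(f"mean score, anomaly: {anomaly_scores.mean():.3f}")
```

In this toy setting the anomalous rows lie farther from the fitted line than the normal rows, which is the kind of departure from the expected correlation structure that the abstract describes. Learning `u` and `v` from data under sparsity constraints is the part CAA contributes and is not attempted here.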