Abstract
Canonical correlation analysis (CCA) is a family of multivariate statistical methods for the analysis of paired sets of variables. Since it was first proposed, CCA has been extended, for instance, to extract relations between two sets of variables when the sample size is small relative to the data dimensionality, when the relations are non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis, including its regularised, kernel, and sparse variants. The deep and Bayesian CCA extensions are also briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpr...
Received February 2017; revised -; accepted -. ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2017.