Data dimensional reduction and principal components analysis
2019, Procedia Computer Science
https://doi.org/10.1016/J.PROCS.2019.12.111

Abstract
Research in the fields of machine learning and intelligent systems addresses the essential problem of developing computer algorithms that can deal with huge amounts of data and then utilize these data intelligently to solve a variety of real-world problems. In many applications, interpreting data with a large number of variables in a meaningful way requires reducing the number of variables and working with linear combinations of the original data. Principal Component Analysis (PCA) is an unsupervised learning technique that uses sophisticated mathematical principles to reduce the dimensionality of large datasets. The goal of this paper is to provide a complete understanding of PCA in the context of machine learning and dimensionality reduction. It explains the mathematical foundations of the method and describes its relationship with Singular Value Decomposition (SVD) when PCA is calculated using the covariance matrix. In addition, using MATLAB, the paper shows the usefulness of PCA for representing and visualizing the Iris dataset with a smaller number of variables.
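As a companion to the abstract, the following is a minimal MATLAB sketch of the workflow the paper describes: center the data, take the SVD, and project the four-dimensional Iris measurements onto the first two principal components. It assumes the `fisheriris` dataset and `gscatter`, both shipped with MATLAB's Statistics and Machine Learning Toolbox; it is an illustration, not the paper's exact script.

```matlab
% PCA via SVD of the mean-centered Iris data (a sketch; assumes the
% Statistics and Machine Learning Toolbox for fisheriris and gscatter).
load fisheriris                 % meas: 150x4 matrix of measurements
X = meas - mean(meas);          % center each variable at zero mean
[~, S, V] = svd(X, 'econ');     % columns of V are the principal directions
scores = X * V(:, 1:2);         % project onto the first two components
gscatter(scores(:,1), scores(:,2), species)   % 2-D view of the 4-D data
xlabel('PC 1'); ylabel('PC 2');
```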
FAQs
How does PCA effectively reduce dimensionality in large datasets?
For the Iris dataset, the paper demonstrates that PCA can reduce the data from four dimensions to two while retaining 99.96% of the data variation.
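A hedged sketch for checking a figure like that: the per-component variances are the eigenvalues of the covariance matrix, so the fraction retained by the first two components follows directly (the exact percentage depends on how the data are scaled).

```matlab
% Proportion of total variance retained by the first two principal
% components (eigenvalues of the covariance matrix, largest first).
load fisheriris                            % meas: 150x4 Iris measurements
latent = sort(eig(cov(meas)), 'descend');  % per-component variances
retained = sum(latent(1:2)) / sum(latent)  % fraction kept with 2 of 4 dims
```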
What role does SVD play in PCA calculations?
When PCA is computed from the covariance matrix, SVD supplies the eigenvalues and eigenvectors that define the principal components used for dimensionality reduction.
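The link is direct: for a centered data matrix X with n rows, the covariance matrix is X'X/(n-1), so each covariance eigenvalue equals a squared singular value of X divided by (n-1). A quick numerical check of that identity, again using the built-in Iris data:

```matlab
% Check that the SVD of the centered data reproduces the eigenvalues
% of the covariance matrix: lambda_i = sigma_i^2 / (n-1).
load fisheriris
X = meas - mean(meas);                 % center the data
n = size(X, 1);
s = svd(X);                            % singular values, largest first
lambda = sort(eig(cov(X)), 'descend'); % covariance eigenvalues
max(abs(s.^2 / (n-1) - lambda))        % essentially zero (round-off only)
```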
What percentage of variance is typically retained using PCA?
In practice, including principal components that cover about 70-80% of the data variation is often sufficient.
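In code, that rule of thumb amounts to keeping the smallest number of components whose cumulative explained variance crosses the chosen threshold; a minimal sketch, with an illustrative 80% cutoff:

```matlab
% Choose the smallest k whose cumulative explained variance reaches 80%.
load fisheriris
latent = sort(eig(cov(meas)), 'descend');        % per-component variances
explained = 100 * cumsum(latent) / sum(latent);  % cumulative percent variance
k = find(explained >= 80, 1)                     % number of components to keep
```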
What methods were compared to PCA in the related work?
The research references methods such as Non-Negative Matrix Factorization and wavelet-based techniques, which have met with varying success in different contexts.
What practical applications are suggested for PCA based on this research?
The research indicates that PCA derived from SVD can be applied to image classification and EEG signal analysis.
References (10)
- Qiao, Hanli. (2015) "New SVD based initialization strategy for non-negative matrix factorization" Pattern Recognition Letters 63: 71-77.
- Kumar, Ranjeet; A. Kumar; and G. K. Singh. (2015) "Electrocardiogram signal compression based on singular value decomposition (SVD) and adaptive scanning wavelet difference reduction (ASWDR) technique" AEU-International Journal of Electronics and Communications 69.12: 1810-1822.
- Houari, Rima; et al. (2016) "Dimensionality reduction in data mining: A Copula approach." Expert Systems with Applications 64: 247-260.
- Menon, Vineetha; Qian Du; and James E. Fowler. (2016) "Fast SVD with random Hadamard projection for hyperspectral dimensionality reduction." IEEE Geoscience and Remote Sensing Letters 13.9: 1275-1279.
- Kumar, Manoj; and Ankita Vaish. (2017) "An efficient encryption-then-compression technique for encrypted images using SVD." Digital Signal Processing 60: 81-89.
- Olive, David J. (2017) "Principal component analysis." Robust Multivariate Analysis. Springer, Cham, 189-217.
- Feng, Jun; et al. (2018) "A Secure Higher-Order Lanczos-Based Orthogonal Tensor SVD for Big Data Reduction." IEEE Transactions on Big Data.
- Jolliffe, Ian T. (2002) "Principal Component Analysis", 2nd ed. Springer Series in Statistics.
- Poole, David. (2015) "Linear Algebra: A Modern Introduction", 4th ed. Cengage Learning.
- Dua, Dheeru; and Efi Karra Taniskidou. (2017) "UCI Machine Learning Repository." Available: http://archive.ics.uci.edu/ml