Math Club
v1 · canonical standard

Lesson 118 — Principal Component Analysis (PCA)

Hotelling 1933: diagonalize covariance to find directions of maximum variance. Scores, explained variance, scree plot. Connection with SVD. Applications in ML, finance, genomics.

Used in: 3rd year of high school (ages 17-18) · Equiv. German Stochastik LK · Equiv. Singapore H2 Math Statistics · Equiv. Japanese advanced Math B

\Sigma = V \Lambda V^T, \quad z_k = v_k^T (x - \bar{x})
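
A minimal NumPy sketch of this pipeline, on a small synthetic dataset of our own (the shapes, scales, and seed below are illustrative assumptions, not data from the lesson):

```python
import numpy as np

# Illustrative synthetic data: N = 200 samples, d = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)               # center: x - x_bar
Sigma = Xc.T @ Xc / (len(Xc) - 1)     # sample covariance Sigma
lam, V = np.linalg.eigh(Sigma)        # Sigma = V Lambda V^T (eigh returns ascending order)
lam, V = lam[::-1], V[:, ::-1]        # reorder so lambda_1 >= lambda_2 >= ...
Z = Xc @ V                            # scores: column k holds z_k = v_k^T (x - x_bar)

# Sample variance of each score column equals the matching eigenvalue.
print(np.allclose(Z.var(axis=0, ddof=1), lam))   # True
```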

Mathematical definition

Setup and sample covariance

"The covariance matrix Σ\Sigma is always symmetric and positive semidefinite. Its eigenvalues are nonnegative and the eigenvectors form an orthonormal basis of Rd\mathbb{R}^d." — Introduction to Applied Linear Algebra (VMLS), §10.1

Principal components

Optimality

"The principal components are the eigenvectors of the data covariance matrix, ordered by decreasing eigenvalue. The first principal component captures the maximum variance; successive components capture maximum residual variance subject to orthogonality." — Understanding Linear Algebra, §7.1

Connection with SVD
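
If the centered data matrix X_c (rows = samples) has SVD X_c = U S W^T, then Σ = X_c^T X_c / (N-1) = W (S²/(N-1)) W^T: the right singular vectors are the principal directions and λ_k = σ_k²/(N-1). A sketch checking this equivalence on synthetic data of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) * np.array([3.0, 1.5, 0.7, 0.2])
Xc = X - X.mean(axis=0)
N = len(Xc)

# Route 1: eigendecomposition of the sample covariance (descending order).
lam = np.linalg.eigvalsh(Xc.T @ Xc / (N - 1))[::-1]

# Route 2: singular values of the centered data matrix.
s = np.linalg.svd(Xc, compute_uv=False)

assert np.allclose(lam, s**2 / (N - 1))       # lambda_k = sigma_k^2 / (N - 1)
```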

Reconstruction and approximation error

\hat{x}_i = \bar{x} + \sum_{k=1}^{K} z_{ik}\, v_k
what this means · Keeping K components minimizes the mean squared reconstruction error among all rank-K projections (Eckart-Young applied to PCA).

Reconstruction error: \frac{1}{N} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2 = \sum_{k=K+1}^{d} \lambda_k (with eigenvalues taken from the 1/N-normalized covariance).
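
A sketch verifying this identity end to end (synthetic data of our own; Σ is normalized by 1/N here so the identity is exact):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
xbar = X.mean(axis=0)
Xc = X - xbar

lam, V = np.linalg.eigh(Xc.T @ Xc / len(Xc))   # 1/N-normalized covariance
lam, V = lam[::-1], V[:, ::-1]

K = 2
Vk = V[:, :K]
Xhat = xbar + (Xc @ Vk) @ Vk.T                 # x_hat = x_bar + sum_k z_k v_k

mse = ((X - Xhat) ** 2).sum(axis=1).mean()     # (1/N) sum_i ||x_i - x_hat_i||^2
assert np.allclose(mse, lam[K:].sum())         # equals sum of discarded eigenvalues
```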

Worked examples

Exercise list

30 exercises · 7 with worked solution (23%)

Application 17 · Understanding 3 · Modeling 5 · Challenge 2 · Proof 3
  1. Ex. 118.1 · Understanding

    Why is it necessary to center the data (subtract the mean) before applying PCA?

  2. Ex. 118.2 · Application

    Given , compute the covariance and principal components.

  3. Ex. 118.3 · Application

    Eigenvalues of Σ: . Compute the explained variance of each PC and the cumulative variance. For which K does the cumulative variance reach 90%?

  4. Ex. 118.4 · Application

    PCs of a 2D dataset: , . Compute the scores of in both components.

  5. Ex. 118.5 · Application

    Using the data from the previous exercise, reconstruct retaining only PC1. What is the reconstruction error?

  6. Ex. 118.6 · Understanding · Answer key

    Why is computing PCA via the SVD of the data matrix preferable to eigendecomposing the covariance matrix directly?

  7. Ex. 118.7 · Application · Answer key

    Compute the PCs of . What is the variance explained by each component?

  8. Ex. 118.8 · Application · Answer key

    With standardized data (z-score), what is the total variance? What does the Kaiser criterion of retaining only PCs with eigenvalue greater than 1 mean?

  9. Ex. 118.9 · Application

    Dataset with eigenvalues (total 10). Compute the mean squared reconstruction error when keeping K = 1 and K = 2 components.

  10. Ex. 118.10 · Modeling

    PCA of 10 stocks' returns resulted in a PC1 with positive loadings of similar magnitude on all stocks. What does PC1 represent economically? How would a portfolio manager use this information?

  11. Ex. 118.11 · Application · Answer key

    SVD of with samples gave singular values . Compute the corresponding covariance eigenvalues and the explained variance by PC1.

  12. Ex. 118.12 · Understanding · Answer key

    Prove that the scores of different principal components are uncorrelated with each other.

  13. Ex. 118.13 · Application

    A dataset has 50 standardized features. With K = 10 PCs capturing 95% variance, how many parameters are needed to represent covariance via rank-K PCA versus full covariance?

  14. Ex. 118.14 · Modeling · Answer key

    Explain what the first 3 PCs of the Brazilian yield curve represent. Why do these 3 factors explain ~99% of the variance?

  15. Ex. 118.15 · Application

    Explain the difference between performing PCA with and without prior standardization (z-score). When should you NOT standardize?

  16. Ex. 118.16 · Proof

    Show that projection onto the first K PCs minimizes the mean squared reconstruction error among all rank-K linear projections. What is the value of the minimum error in terms of the eigenvalues?

  17. Ex. 118.17 · Application

    Eigenvalues in descending order: 12, 8, 3, 1, 1, 1, 1, 1. Mentally construct the scree plot and identify the "elbow". How many PCs should you retain for 80% variance? (A verification sketch follows the exercise list.)

  18. Ex. 118.18 · Modeling

    Describe the Eigenfaces method (Turk-Pentland 1991) for facial recognition using PCA. What dimensionality reduction is achieved relative to the original number of pixels?

  19. Ex. 118.19 · Application

    What is the conceptual difference between PCA and ICA (Independent Component Analysis)? In what type of problem is ICA necessary?

  20. Ex. 118.20 · Application

    What happens when the covariance matrix is the identity (Σ = I)? What does this imply for PCA and dimensionality reduction?

  21. Ex. 118.21 · Proof

    Prove that eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal. Use this to justify the orthogonality of PCs.

  22. Ex. 118.22 · Modeling · Answer key

    Explain what a PCA biplot shows. How do you interpret the direction and length of feature arrows and the position of samples?

  23. Ex. 118.23 · Application

    Why is classical PCA sensitive to outliers? What is the idea of Robust PCA for handling this problem?

  24. Ex. 118.24 · Application

    Dataset: N = 1000 samples, d = 100 standardized features. PCA with K = 5 PCs explains 80% of the variance. Compute the data compression factor (ratio between original and PCA storage; see the sketch after this list).

  25. Ex. 118.25 · Challenge

    Explain the idea of Kernel PCA. How does replacing the inner product with a kernel allow capturing non-linear structure? What is the computational complexity?

  26. Ex. 118.26 · Application

    Explain the "dual trick" of PCA: when d > N (more features than samples), how do you compute PCA efficiently? What is the complexity in each case?

  27. Ex. 118.27 · Application

    PCA of 2023 ENEM microdata (5 grades: CN, CH, LC, MT, Essay) resulted in PC1 with similar magnitude positive loadings for all grades. Interpret PC1. What could PC2 represent?

  28. Ex. 118.28 · Proof

    Prove that the sample variance of the k-th score equals the k-th eigenvalue λ_k of the covariance matrix. Use the connection with SVD.

  29. Ex. 118.29 · Modeling

    In the 1000 Genomes Project, PCA of genomic data from ~2500 people across 26 populations reveals clusters by continent. Explain how this is possible and what the first 3 PCs represent genetically.

  30. Ex. 118.30 · Challenge

    Describe the Probabilistic PCA model (Tipping-Bishop 1999). What are the advantages over classical PCA? How does this model reduce to classical PCA in a limiting case?
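
Several of the numeric exercises can be checked with a few lines of NumPy. A sketch for Ex. 118.17 and Ex. 118.24 (the storage convention in the second part is one common choice, stated here as an assumption):

```python
import numpy as np

# Ex. 118.17: cumulative explained variance for the given eigenvalues.
lam = np.array([12, 8, 3, 1, 1, 1, 1, 1], dtype=float)
cum = np.cumsum(lam) / lam.sum()
print(np.round(cum, 3))            # [0.429 0.714 0.821 0.857 0.893 0.929 0.964 1.   ]
print(np.argmax(cum >= 0.80) + 1)  # K = 3 components reach 80% of the variance

# Ex. 118.24: compression factor, assuming we store the scores (N x K),
# the loadings (d x K), and the mean vector (d) instead of the full N x d matrix.
N, d, K = 1000, 100, 5
print(N * d / (N * K + d * K + d))  # ~17.9x compression
```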

Sources

  • Understanding Linear Algebra — David Austin · Grand Valley State University · CC-BY-SA · Chapter 7: PCA via SVD, explained variance, scree plot, applications.
  • Introduction to Applied Linear Algebra (VMLS) — Stephen Boyd, Lieven Vandenberghe · Stanford University · CC-BY-NC-ND · Ch. 10: rigorous PCA theory, optimality, SVD connection, ML applications.
  • OpenIntro Statistics — Diez, Çetinkaya-Rundel, Barr · CC-BY-SA · §8.3: statistical perspective, explained variance, component interpretation, real data exercises.

Updated on 2026-05-11 · Author(s): Clube da Matemática

Found an error? Open an issue on GitHub or submit a PR — open source forever.