PCA = Eigenvectors of Covariance

Principal components are eigenvectors of the covariance matrix — equivalently, the SVD of the centered data.

Principal Component Analysis sounds like an algorithm. It is really a sentence in linear algebra: the principal components of a dataset are the eigenvectors of its covariance matrix. Center the data so its mean is zero, form Σ = (1/n) Xᵀ X, and diagonalize. The eigenvectors point along the directions of greatest variance. The eigenvalues are those variances. That is the entire method.
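In code, that recipe is only a few lines of NumPy. A minimal sketch, assuming X is an n × d array of samples (the function name and the toy data are illustrative):

```python
import numpy as np

def pca_eig(X, k):
    """PCA by diagonalizing the covariance of the centered data.
    Returns the top-k principal axes (as columns) and the variance along each."""
    Xc = X - X.mean(axis=0)                  # center: subtract the column means
    cov = (Xc.T @ Xc) / len(Xc)              # Σ = (1/n) Xᵀ X, a d × d matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric Σ: real eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]        # sort directions by variance, largest first
    return eigvecs[:, order[:k]], eigvals[order[:k]]

# toy usage: a correlated 2D cloud
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
axes, variances = pca_eig(X, k=2)
print(axes, variances)   # principal directions and their variances
```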

Equivalently — and this is the form that scales — the principal components are the right singular vectors of the centered data matrix X. The Singular Value Decomposition X = U Σ_s Vᵀ (writing Σ_s for the diagonal matrix of singular values, to keep it distinct from the covariance Σ) packages everything: the columns of V are the principal directions, and the squared singular values σₖ² are n times the variances. Among all rank-k approximations of X, the truncation to the top k singular components minimizes the Frobenius error — the Eckart–Young theorem makes this optimality precise. And among all linear projections onto a k-dimensional subspace, PCA preserves the maximum amount of variance.
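The equivalence is easy to check numerically. A small sketch on random toy data (nothing here is specific to the demos below):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)
n = len(Xc)

# route 1: eigendecomposition of the covariance
eigvals, eigvecs = np.linalg.eigh((Xc.T @ Xc) / n)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]      # descending order

# route 2: SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(eigvals, s**2 / n))                   # variances are σₖ² / n
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T)))       # same axes, up to sign
```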

Interactive: Covariance & its Eigenvectors

Drag points to reshape a 2D cloud. The covariance matrix and its eigenvalues update live; the principal axes are drawn as arrows extending ±2σ along each direction.
[Widget readout, example state: covariance Σ = (1/n) Xᵀ X = [[13.95, 7.41], [7.41, 4.53]]; eigenvalues (variance per PC) λ₁ = 18.025 (97.5% of variance), λ₂ = 0.461 (2.5%); PC1 angle: 28.8°.]

The teal arrow is the first principal component — the direction of maximum variance. It is the unit eigenvector of Σ with the larger eigenvalue. The violet arrow is PC2, always orthogonal because Σ is symmetric. Arrow length is ±2σ along that axis. Drag a point to deform the cloud and watch how Σ, its eigenvalues, and the principal axes update in real time.
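If you want to reproduce the readout yourself, the computation behind the widget is short. A sketch, assuming points is an (n, 2) array (the function name is made up for illustration):

```python
import numpy as np

def principal_axes_2d(points):
    """Covariance, eigenvalues, and PC1 angle for a 2D cloud, as in the
    readout above; each arrow's half-length is 2·sqrt(eigenvalue)."""
    centered = points - points.mean(axis=0)
    cov = (centered.T @ centered) / len(centered)        # 2 × 2 covariance Σ
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # put PC1 first
    pc1_angle = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
    return cov, eigvals, pc1_angle
```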

Interactive: SVD on an Image

An image is just a matrix. Sliding k changes how many singular components you keep — the rank-k SVD is the best low-rank reconstruction in the Frobenius norm. Watch the image sharpen as k grows.

SVD on a centered data matrix is PCA. Here the "data" is a 32×32 grayscale image. Keeping only the top k singular values gives the best rank-k approximation in the Frobenius norm — the Eckart–Young theorem.

[Widget readout, example state: original rank 32 vs. reconstruction rank 4; Frobenius error ‖A − A_k‖ / ‖A‖ = 10.4%; variance kept Σᵢ₌₁ᵏ σᵢ² / Σ σᵢ² = 98.9%; storage ratio k(m+n+1) / mn = 0.25×; the singular value spectrum is shown below as a scree plot.]

The scree plot shows how singular values decay. Real-world images and datasets are typically low-rank in this sense — a handful of components captures most of the variance. PCA exploits exactly this fact: project onto the top-k right singular vectors and you preserve the maximum possible variance for any k-dimensional linear projection.
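A sketch of the same computation for any grayscale image stored as a matrix A (the helper name and toy data are illustrative); it reproduces the three readouts and the scree curve:

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation in the Frobenius norm (Eckart–Young),
    plus the three readouts shown above."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)   # ‖A − A_k‖ / ‖A‖
    variance_kept = (s[:k]**2).sum() / (s**2).sum()           # Σᵢ₌₁ᵏ σᵢ² / Σ σᵢ²
    m, n = A.shape
    storage_ratio = k * (m + n + 1) / (m * n)                  # k(m+n+1) / mn
    return A_k, rel_error, variance_kept, storage_ratio, s     # s is the scree curve

# toy usage: a random 32×32 "image"
A = np.random.default_rng(2).random((32, 32))
A4, err, kept, ratio, spectrum = rank_k_approx(A, k=4)
print(f"error {err:.1%}, variance kept {kept:.1%}, storage {ratio:.2f}x")
```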

Interactive: Project a 3D Cloud to 2D

An elongated 3D point cloud with its three principal axes drawn as colored arrows. Toggle the projection and the cloud collapses onto the plane spanned by PC1 and PC2 — the optimal 2D linear summary of the data.

A 3D point cloud shaped like an elongated ellipsoid. The three colored arrows are the principal axes — eigenvectors of the 3×3 covariance matrix, sorted by eigenvalue. Drag to orbit. Toggle the projection to collapse the cloud onto the PC1-PC2 plane.

[Widget readout: top-2 PCs preserve 98.3% of variance; PC1: λ = 15.276 (85.4% of variance), PC2: λ = 2.303 (12.9%), PC3: λ = 0.299 (1.7%).]

Optimal linear dimensionality reduction means: of all projections of R³ onto a 2-dimensional subspace, the projection onto the PC1-PC2 plane loses the least variance. The third eigenvalue is exactly the variance you throw away when you collapse onto that plane.
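The projection itself is one matrix multiply. A sketch with a synthetic ellipsoidal cloud (the shape and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3)) * np.array([4.0, 1.5, 0.5])   # elongated 3D cloud
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
variances = s**2 / len(Xc)            # eigenvalues of the 3 × 3 covariance
Y = Xc @ Vt[:2].T                     # n × 2 coordinates in the PC1-PC2 plane

kept = variances[:2].sum() / variances.sum()
print(f"top-2 PCs preserve {kept:.1%} of the variance")
print(f"variance thrown away = λ₃ = {variances[2]:.3f}")
```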

The math objects

  • Centered data matrix X: n rows of d-dimensional samples, with the column mean subtracted. Centering matters — it's why PCA captures variance rather than absolute position.
  • Covariance Σ = (1/n) Xᵀ X: a d × d symmetric positive semidefinite matrix. The diagonal entries are per-feature variances, the off-diagonals measure how features co-vary.
  • Eigendecomposition Σ = V Λ Vᵀ: by the spectral theorem, Σ is diagonalizable in an orthonormal basis. The columns of V are the principal axes; the diagonal entries of Λ are the variances along them.
  • SVD X = U Σ_s Vᵀ: the same V appears here. The connection is algebraic: Σ = (1/n) V Σ_s² Vᵀ, so eigenvalues of Σ are σₖ² / n.
  • Eckart–Young: truncating to the top k singular components gives the best rank-k approximation in any unitarily invariant norm — Frobenius and spectral norm included.
  • Whitening: mapping data with V Λ^(-1/2) Vᵀ leaves a cloud with identity covariance — useful as preprocessing for many downstream methods (a short sketch follows this list).
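As promised above, a minimal whitening sketch using the V Λ^(-1/2) Vᵀ map (the small eps is a numerical guard against zero eigenvalues, not part of the math):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Map centered data with V Λ^(-1/2) Vᵀ so the result has identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / len(Xc)
    eigvals, V = np.linalg.eigh(cov)
    W = V @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ V.T   # V Λ^(-1/2) Vᵀ
    return Xc @ W

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated cloud
Xw = whiten(X)
print(np.round((Xw.T @ Xw) / len(Xw), 3))                 # ≈ identity matrix
```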

Key takeaways

  • Principal components = eigenvectors of Σ = (1/n) Xᵀ X = right singular vectors of X. Three names, one geometry.
  • Each eigenvalue is the variance along that PC. Their sum is the total variance (the trace of Σ).
  • PCA is the optimal linear dimensionality reduction: of all rank-k linear projections, the top-k SVD truncation preserves the most variance.
  • The SVD form is preferred numerically: never form Xᵀ X explicitly — that squares the condition number (see the sketch below).
  • PCA is linear. Real data often lies on curved manifolds; that's where kernel PCA, autoencoders, and t-SNE pick up the story.
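
The condition-number point is easy to see numerically: the condition number of Xᵀ X is the square of the condition number of X. A tiny sketch with a deliberately ill-conditioned matrix (the scaling is contrived to make the effect obvious):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4)) @ np.diag([1.0, 1.0, 1.0, 1e-4])   # one nearly-flat direction

print(np.linalg.cond(X))        # on the order of 1e4
print(np.linalg.cond(X.T @ X))  # on the order of 1e8: the condition number is squared
```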