Every machine learning idea is a mathematical object. Through 29 interactive demonstrations across loss landscapes, eigendecompositions, kernels, automatic differentiation, Bayesian inference, information theory, equivariance, and manifolds, see the math that quietly runs every model.
See also: Linear Algebra for eigenvectors and SVD, Optimization for the foundations of gradient methods, and Probability for the priors and likelihoods behind Bayesian methods.
A loss function is a surface — training is rolling down it. Compare gradient descent, momentum, and Newton's method on real landscapes.
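A minimal sketch of the comparison, assuming only NumPy and a toy quadratic bowl rather than the demo's landscapes:

```python
import numpy as np

# Toy ill-conditioned quadratic f(x) = 0.5 * x^T A x; its gradient is A @ x.
A = np.array([[10.0, 0.0], [0.0, 1.0]])
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])

# Gradient descent: small steps straight downhill.
x = x0.copy()
for _ in range(50):
    x = x - 0.05 * grad(x)
print("gradient descent:", x)

# Heavy-ball momentum: accumulate velocity to damp zig-zagging.
x, v = x0.copy(), np.zeros(2)
for _ in range(50):
    v = 0.9 * v - 0.05 * grad(x)
    x = x + v
print("momentum:", x)

# Newton's method: rescale by the Hessian (here just A); one step solves a quadratic.
x = x0 - np.linalg.solve(A, grad(x0))
print("newton:", x)
```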
Least squares is orthogonal projection onto the column space of the design matrix. Regularization is projection with a soft constraint.
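A minimal sketch of the projection view, assuming only NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                # design matrix
y = rng.normal(size=50)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta                            # projection of y onto col(X)
print(np.allclose(X.T @ (y - y_hat), 0))    # residual is orthogonal to col(X): True

lam = 1.0                                   # ridge: solve (X^T X + lam I) beta = X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta))   # the penalty shrinks beta: True
```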
Principal components are eigenvectors of the covariance matrix — equivalently, the SVD of the centered data.
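A minimal sketch showing the two routes agree, assuming only NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                      # center the data first

# Route 1: eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
cov = Xc.T @ Xc / (len(Xc) - 1)
_, eigvecs = np.linalg.eigh(cov)
pcs_eig = eigvecs[:, ::-1]

# Route 2: right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs_svd = Vt.T

# Same components, up to a sign flip per direction.
print(np.allclose(np.abs(pcs_eig), np.abs(pcs_svd)))
```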
Positive-definite kernels are inner products in implicit Hilbert spaces. The kernel trick makes nonlinear linear.
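A minimal sketch of the kernel trick for a degree-2 polynomial kernel, assuming only NumPy; `phi` here is the explicit feature map the kernel never has to build:

```python
import numpy as np

def phi(x):
    # Explicit feature map for k(x, y) = (x . y + 1)^2 on R^2.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_implicit = (x @ y + 1.0) ** 2            # one kernel evaluation
k_explicit = phi(x) @ phi(y)               # inner product in the lifted space
print(np.isclose(k_implicit, k_explicit))  # True
```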
Reverse-mode automatic differentiation on a computational DAG. The chain rule, organized for efficiency.
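A minimal sketch of reverse-mode AD on a tiny DAG; the `Node` class and operator names are illustrative, not any particular library's API:

```python
import math

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents     # pairs of (parent node, local derivative)
        self.grad = 0.0

    def backward(self, seed=1.0):
        # Chain rule: push the incoming adjoint to every parent,
        # scaled by the local derivative recorded at construction time.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

def mul(a, b): return Node(a.value * b.value, [(a, b.value), (b, a.value)])
def add(a, b): return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])
def sin(a):    return Node(math.sin(a.value), [(a, math.cos(a.value))])

x, y = Node(2.0), Node(3.0)
f = add(mul(x, y), sin(x))       # f(x, y) = x*y + sin(x)
f.backward()
print(x.grad, y.grad)            # df/dx = y + cos(x), df/dy = x
```

A real implementation traverses the graph in reverse topological order so each node is visited once; this naive recursion revisits shared nodes.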
Posterior equals likelihood times prior, normalized. Watch beliefs update as evidence arrives.
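A minimal sketch of conjugate updating for coin flips; the prior is Beta(1, 1) and the data are made up for illustration:

```python
# Beta-Bernoulli: posterior is proportional to likelihood times prior,
# and a Beta prior stays a Beta after every observation.
alpha, beta = 1.0, 1.0                 # Beta(1, 1): uniform belief about the coin's bias
data = [1, 1, 0, 1, 0, 1, 1, 1]        # observed flips, 1 = heads

for flip in data:
    alpha += flip                      # each head bumps alpha
    beta += 1 - flip                   # each tail bumps beta
    print(f"after {flip}: Beta({alpha:.0f}, {beta:.0f}), "
          f"mean {alpha / (alpha + beta):.3f}")
```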
Entropy, KL divergence, and cross-entropy — why log-loss is the natural objective for classification.
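A minimal sketch, assuming only NumPy, of the identity that ties the three together, H(p, q) = H(p) + KL(p ‖ q):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])          # "true" label distribution
q = np.array([0.5, 0.3, 0.2])          # model's predicted distribution

entropy = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))
kl = np.sum(p * np.log(p / q))

# Cross-entropy = entropy + KL, so minimizing log-loss minimizes KL(p || q).
print(np.isclose(cross_entropy, entropy + kl))   # True
```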
Convolution as group action. CNNs are equivariant maps — representation theory inside every neural network.
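A minimal sketch of the equivariance claim in one dimension, assuming only NumPy: circular convolution commutes with circular shifts.

```python
import numpy as np

def circ_conv(signal, kernel):
    # Plain circular (wrap-around) convolution of a 1-D signal.
    n = len(signal)
    return np.array([sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, -1.0])
w = np.array([0.5, 0.3, 0.2])
shift = lambda s, t: np.roll(s, t)

# Shifting then convolving equals convolving then shifting: equivariance.
print(np.allclose(circ_conv(shift(x, 2), w), shift(circ_conv(x, w), 2)))   # True
```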
High-dimensional data lives on low-dimensional manifolds. t-SNE, UMAP, and autoencoders unfold them.
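A minimal sketch of the underlying picture, assuming only NumPy: a "Swiss roll" of points is 3-D on paper but generated by a single intrinsic coordinate, and ambient nearest neighbors stay close in that coordinate, which is the structure these methods try to unfold.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(3.0, 9.0, size=500)                  # 1-D intrinsic coordinate
X = np.column_stack([t * np.cos(t),                  # roll it up into 3-D
                     rng.uniform(0.0, 1.0, size=500),
                     t * np.sin(t)])

i = 250
dists = np.linalg.norm(X - X[i], axis=1)
nearest = np.argsort(dists)[1:6]                     # 5 nearest ambient neighbors
print(np.abs(t[nearest] - t[i]))                     # all small: neighbors share t
```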