Positive-definite kernels are inner products in implicit Hilbert spaces. The kernel trick turns nonlinear problems into linear ones.
A kernel is a function K(x, y) that measures similarity between two inputs. Mercer's theorem says that whenever K is symmetric and positive-definite, there is some feature map φ — possibly into an infinite-dimensional space — for which K(x, y) = ⟨φ(x), φ(y)⟩. The kernel is literally an inner product in that implicit space.
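For one concrete kernel the feature map can be written out by hand. The sketch below takes the degree-2 polynomial kernel (1 + xᵀy)² on 2D inputs and checks numerically that it equals an ordinary dot product of six-dimensional feature vectors. The function names `poly2_kernel` and `poly2_features` are illustrative, not taken from any library.

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (1 + x.y)^2, evaluated directly."""
    return (1.0 + x @ y) ** 2

def poly2_features(x):
    """Explicit feature map for the same kernel on 2D inputs.

    Expanding (1 + x1*y1 + x2*y2)^2 gives six terms; the sqrt(2)
    factors make the dot product of two feature vectors match them.
    """
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1**2, x2**2, s * x1 * x2])

rng = np.random.default_rng(0)
x = rng.normal(size=2)
y = rng.normal(size=2)

via_kernel = poly2_kernel(x, y)                          # never builds phi
via_features = poly2_features(x) @ poly2_features(y)     # builds phi explicitly
print(np.isclose(via_kernel, via_features))
```

The kernel call costs one 2D dot product, while the explicit route needs six features per point. For higher degrees or the RBF kernel the explicit route grows quickly or becomes impossible, which is why working through K alone matters.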
The kernel trick is the engineering payoff: any algorithm that touches its inputs only through inner products (SVMs, PCA, ridge regression, and linear regression in its dual form) can be run inside the lifted feature space without ever computing φ. You replace every xᵀy with a K(x, y) call and get a nonlinear method without changing the algorithm itself; the price is an n × n kernel matrix over the training points. Three kernels cover most of the territory: linear xᵀy (no lift), polynomial (1 + xᵀy)^d (finite lift to all monomials of degree ≤ d), and the RBF exp(−γ‖x − y‖²) (lift to an infinite-dimensional space of radial basis functions).
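The three kernels named above can be sketched as functions that return a full Gram matrix for two sets of points. This is a minimal NumPy version; the function names and default values of `d` and `gamma` are choices made for illustration.

```python
import numpy as np

def linear_kernel(X, Y):
    """K[i, j] = x_i . y_j (no lift)."""
    return X @ Y.T

def polynomial_kernel(X, Y, d=3):
    """K[i, j] = (1 + x_i . y_j)^d."""
    return (1.0 + X @ Y.T) ** d

def rbf_kernel(X, Y, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once.
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * (X @ Y.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X, d=2)
K_rbf = rbf_kernel(X, X, gamma=0.5)
print(K_lin)
print(K_poly)
print(K_rbf)
```

Any inner-product-based algorithm can then be given `K_lin`, `K_poly`, or `K_rbf` in place of the matrix of plain dot products.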
We train a kernel-ridge classifier α = (K + λI)⁻¹ y on the displayed points, where y here is the vector of ±1 class labels and λ > 0 is the ridge penalty. The coloured background is the continuous decision function f(x) = Σᵢ αᵢ K(xᵢ, x); the white curve is the decision boundary f(x) = 0. Switch from linear to polynomial or RBF and the boundary morphs from a straight line into circles, lobes, and closed contours, all without ever computing φ explicitly.
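The fit and the decision function fit in a few lines. The sketch below assumes an RBF kernel, ±1 labels, and a made-up dataset of two rings with 8 points each; the data, `gamma`, and `lam` are illustrative and are not the values used by the demo.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * (X @ Y.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def fit_kernel_ridge(K, y, lam=1e-2):
    """Solve (K + lam * I) alpha = y; solving is more stable than inverting."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def decision_function(X_new, X_train, alpha, gamma=1.0):
    """f(x) = sum_i alpha_i * K(x_i, x) for each row x of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Hypothetical data: inner ring (label -1) and outer ring (label +1).
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([0.5 * ring, 2.0 * ring])
y = np.concatenate([-np.ones(8), np.ones(8)])

alpha = fit_kernel_ridge(rbf_kernel(X, X), y)
train_pred = np.sign(decision_function(X, X, alpha))
f_origin = decision_function(np.array([[0.0, 0.0]]), X, alpha)[0]
print(train_pred)
print(f_origin)
```

Evaluating `decision_function` on a dense grid and drawing the zero level set would give the kind of boundary curve described above.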
Slide the lift parameter t from 0 to 1 to apply the explicit feature map φ(x, y) = (x, y, x² + y²), where x and y here are the two coordinates of a single point. At t = 0 the two concentric-circle classes lie flat in the plane, one ring nested inside the other, so no straight line in 2D separates them. At t = 1 they sit at different heights, and a horizontal plane (the linear classifier in the lifted space) cleanly slices them apart. This is the geometric content of the kernel trick: a nonlinear boundary in 2D is a linear one in 3D.
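The lift can be reproduced directly. The sketch below assumes the slider scales the third coordinate linearly, so a point maps to (x, y, t·(x² + y²)); that interpolation, the ring radii, and the name `lift` are assumptions made for illustration.

```python
import numpy as np

def lift(points, t=1.0):
    """Map each 2D point (x, y) to (x, y, t * (x^2 + y^2)).

    The linear dependence on t is an assumption about how the slider
    interpolates between the flat picture (t = 0) and the full lift (t = 1).
    """
    height = t * (points**2).sum(axis=1)
    return np.column_stack([points, height])

# Hypothetical data: two concentric rings of radius 0.5 and 2.0.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([0.5 * ring, 2.0 * ring])
y = np.concatenate([-np.ones(8), np.ones(8)])

flat = lift(X, t=0.0)     # every point has height 0
lifted = lift(X, t=1.0)   # inner ring at height 0.25, outer ring at 4.0

# A horizontal plane halfway between the two heights is a linear classifier in 3D.
threshold = 0.5 * (0.25 + 4.0)
pred = np.where(lifted[:, 2] > threshold, 1.0, -1.0)
print(np.array_equal(pred, y))
```

In the original plane, the separating plane z = threshold corresponds to the circle x² + y² = threshold, which is the nonlinear boundary a kernel method would find without building the third coordinate.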
Points
Kernel matrix K (n = 16)
Hover over any cell in the matrix and the corresponding pair of points lights up on the left.
Every kernel is a similarity score on pairs of inputs. Switch kernels and the matrix repaints: linear is the Gram matrix of inner products xᵀy, which has rank at most 2 for points in the plane; polynomial sharpens it nonlinearly; and RBF is a band-diagonal glow that decays with distance. Same points, three different geometries, none of which requires knowing φ.
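The structure of these matrices can be checked numerically. The sketch below builds the three kernel matrices for 16 random 2D points (a stand-in for the displayed points, not the demo's data), confirms each is symmetric with no significantly negative eigenvalues, and counts the numerically nonzero eigenvalues. The helper `numerical_rank` and its tolerance are illustrative choices.

```python
import numpy as np

def numerical_rank(K, rel_tol=1e-8):
    """Count eigenvalues larger than rel_tol times the largest one."""
    eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies
    return int((eigvals > rel_tol * eigvals.max()).sum())

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(16, 2))  # 16 hypothetical points in the plane

K_lin = X @ X.T
K_poly = (1.0 + X @ X.T) ** 2
sq_dist = (X**2).sum(axis=1)[:, None] + (X**2).sum(axis=1)[None, :] - 2.0 * (X @ X.T)
K_rbf = np.exp(-1.0 * np.maximum(sq_dist, 0.0))

for name, K in [("linear", K_lin), ("polynomial", K_poly), ("rbf", K_rbf)]:
    min_eig = np.linalg.eigvalsh(K).min()
    print(name, "rank:", numerical_rank(K), "min eigenvalue:", min_eig)
```

The linear matrix has rank 2 because the points live in two dimensions, and the degree-2 polynomial matrix has rank at most 6 because its feature space has six monomials. The RBF matrix has ones on its diagonal, since every point is at distance zero from itself.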