Lossy compression: the best rate achievable at a given distortion. The math behind JPEG and MP3.
Lossless coding answers the question “how few bits do I need to recover the source exactly?” The answer is the entropy H(X). Most of the world, though, does not need exact recovery. A photograph that's a fraction of a percent off pixel-by-pixel is still the same photograph; an audio signal off by a tenth of a decibel is still the same song. Lossy compression — JPEG, MP3, MPEG, every modern codec — trades a small, controlled amount of fidelity for a large reduction in bits.
Rate-distortion theory is the mathematics of that trade-off. You fix a distortion measure d(x, x̂) — typically squared error — and a tolerable average distortion D. The rate-distortion function is the smallest rate R achievable at that distortion:
R(D) = min_{q(x̂|x) : E[d(X, X̂)] ≤ D} I(X; X̂)
The minimum is over all conditional distributions q(x̂ | x) that meet the distortion budget; the objective is the mutual information between source and reconstruction. Lossless source coding is exactly the special case D = 0: for a discrete source, R(0) recovers H(X). For a Gaussian source N(0, σ²) under squared-error distortion the answer collapses to a single line:
R(D) = ½ log₂(σ² / D), 0 < D ≤ σ²; R(D) = 0, D > σ².
That single formula is enough to derive the half-bit rule of thumb every codec engineer carries around: doubling your tolerable distortion saves you exactly half a bit per sample.
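The half-bit rule falls straight out of the formula: R(D) − R(2D) = ½ log₂ 2 = ½ bit, regardless of where you are on the curve. A minimal sketch (the function name `rate_gaussian` is ours, not from any library):

```python
import math

def rate_gaussian(variance, D):
    """R(D) = (1/2) log2(variance / D) for a N(0, variance) source
    under squared-error distortion; zero once D exceeds the variance."""
    if D >= variance:
        return 0.0
    return 0.5 * math.log2(variance / D)

sigma2 = 1.0
r1 = rate_gaussian(sigma2, 0.01)   # ~3.32 bits/sample
r2 = rate_gaussian(sigma2, 0.02)   # ~2.82 bits/sample
print(r1 - r2)                     # doubling D saves exactly 0.5 bit
```

Note the hard cutoff at D = σ²: once you tolerate distortion equal to the source variance, sending nothing and reconstructing x̂ = 0 already meets the budget, so the rate is zero.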
For a Gaussian source N(0, σ²) with squared-error distortion, R(D) = (1/2) log₂(σ²/D) for D ≤ σ², and 0 otherwise. Drag along the curve to pick an operating point.
At D = 0 the curve diverges: perfect reconstruction of a continuous source needs infinitely many bits per sample. (For a discrete source the curve would instead flatten out at the Shannon entropy H(X) — the lossless corner of rate-distortion.) Slide right and the curve falls fast: doubling the tolerable distortion saves exactly half a bit per sample, which is why modest distortion budgets enable enormous compression ratios.
Lloyd's algorithm — better known as k-means — finds a set of k centres so the average squared distance from each sample to its nearest centre is locally minimised. The rate is log₂ k bits per sample because you only have to send the index of the chosen centre. As k grows, rate rises and distortion falls — this is rate-distortion in discrete form. Sweep the slider and watch your operating point trace the empirical R(D) curve below.
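The quantizer behind the demo can be sketched in a few lines of numpy. This is a toy 1D version under our own assumptions (Gaussian samples, fixed iteration count), not the demo's actual code:

```python
import numpy as np

def lloyd(samples, k, iters=50, seed=0):
    """Lloyd's algorithm (k-means) in 1D: returns k centres and the
    mean squared distortion of quantizing each sample to its nearest centre."""
    rng = np.random.default_rng(seed)
    centres = rng.choice(samples, size=k, replace=False)
    for _ in range(iters):
        # Assign each sample to its nearest centre...
        idx = np.abs(samples[:, None] - centres[None, :]).argmin(axis=1)
        # ...then move each centre to the centroid of its cell.
        for j in range(k):
            cell = samples[idx == j]
            if cell.size:
                centres[j] = cell.mean()
    idx = np.abs(samples[:, None] - centres[None, :]).argmin(axis=1)
    distortion = np.mean((samples - centres[idx]) ** 2)
    return centres, distortion

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 10_000)
for k in (2, 4, 8, 16):
    _, D = lloyd(x, k)
    # Rate is log2(k) bits/sample: you only transmit the centre's index.
    print(f"k={k:2d}  rate={np.log2(k):.0f} bits  distortion={D:.4f}")
```

Each doubling of k buys one more bit of rate, and the measured distortion drops accordingly — the empirical curve this traces sits above the Gaussian R(D) bound, as any real quantizer must.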
A small grayscale image is split into 8×8 blocks. Each block is run through a 2D DCT, then only the lowest-frequency coefficients are kept (in zig-zag order, as JPEG does). Drop the quality and you save bits at the cost of distortion — exactly rate-distortion in practice.
The DCT concentrates the energy of natural images in a few low-frequency coefficients — the upper-left of each 8×8 block. Dropping the rest barely changes the picture but slashes the bit budget. The error map on the right shows where the distortion lives: edges and texture, where the high-frequency content was. JPEG, MP3, and most modern codecs are elaborate variations on this single move.
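The transform-and-truncate move can be sketched end to end in numpy. This is a self-contained toy (orthonormal DCT matrix, a synthetic smooth block, our own helper names), not JPEG's quantization pipeline:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so C @ C.T == I."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def zigzag(n=8):
    """Index pairs ordered diagonal by diagonal, lowest frequencies first,
    alternating direction along each diagonal (JPEG-style zig-zag)."""
    pairs = [(i, j) for i in range(n) for j in range(n)]
    return sorted(pairs, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))

def compress_block(block, keep, C=dct_matrix()):
    """2D DCT the block, zero all but the first `keep` zig-zag coefficients,
    and inverse-transform back to pixels."""
    coeffs = C @ block @ C.T
    mask = np.zeros_like(coeffs)
    for (i, j) in zigzag()[:keep]:
        mask[i, j] = 1.0
    return C.T @ (coeffs * mask) @ C

# A smooth synthetic 8x8 "image" block: energy concentrates at low frequencies.
rng = np.random.default_rng(0)
block = rng.normal(size=(8, 8)).cumsum(axis=0).cumsum(axis=1)
for keep in (4, 16, 64):
    rec = compress_block(block, keep)
    print(f"keep {keep:2d}/64 coefficients  MSE={np.mean((block - rec) ** 2):.4f}")
```

Because the DCT is orthonormal, keeping all 64 coefficients reconstructs the block exactly; dropping the high-frequency tail raises the error only slightly on smooth content, which is the whole bet that JPEG makes.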