Shannon's mathematical theory of communication. Through 28 interactive demonstrations covering entropy, mutual information, source coding, channel capacity, error correction, and the rate-distortion frontier, see the math that makes every modem, codec, and compressor possible.
See also: Probability for the random variables behind every entropy formula, Algorithms for information-theoretic complexity bounds, and Machine Learning for KL divergence and cross-entropy as loss functions.
H(X) = −Σ p log p. The fundamental measure of uncertainty in a probability distribution.
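A minimal sketch of the formula in Python (the function and example distributions are illustrative, not taken from the demo):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))  # biased coin: ~0.469 bits, less uncertain
```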
How much knowing one variable tells you about another. The chain rule for entropy.
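A small sketch of how mutual information and the chain rule fall out of joint and marginal entropies (the toy joint distribution below is invented for illustration):

```python
import math
from collections import Counter

def H(counts):
    """Entropy in bits of a distribution given as unnormalized counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Toy joint sample of (X, Y): two mostly-agreeing binary variables.
pairs = [(0, 0)] * 4 + [(0, 1)] * 1 + [(1, 0)] * 1 + [(1, 1)] * 4
H_xy = H(Counter(pairs).values())
H_x = H(Counter(x for x, _ in pairs).values())
H_y = H(Counter(y for _, y in pairs).values())

print(H_x + H_y - H_xy)  # mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)
print(H_xy - H_x)        # chain rule: H(Y|X) = H(X,Y) - H(X)
```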
Shannon's first theorem: entropy is the optimal compression bound. Huffman codes come within one bit of it per symbol.
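A compact Huffman construction sketched with Python's heapq; it shows the average code length landing between H and H + 1 (the symbols and weights are made up for the example):

```python
import heapq, math

def huffman_code(freqs):
    """Build a prefix code by repeatedly merging the two lightest subtrees."""
    heap = [[w, [sym, ""]] for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # left branch gets a 0
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # right branch gets a 1
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
code = huffman_code(freqs)
avg = sum(freqs[s] * len(code[s]) for s in freqs)
H = -sum(p * math.log2(p) for p in freqs.values())
print(code)
print(avg, H)  # average length sits within one bit of the entropy
```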
Long sequences from a source are almost certainly typical — and almost equally probable.
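A quick numerical check of the idea, assuming a Bernoulli source (the parameters are chosen arbitrarily): the per-symbol log-probability of one long sample concentrates at the entropy.

```python
import math, random

random.seed(0)
p, n = 0.3, 100_000
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# One long i.i.d. Bernoulli(p) sequence and its empirical log-probability.
seq = [1 if random.random() < p else 0 for _ in range(n)]
log_prob = sum(math.log2(p) if s else math.log2(1 - p) for s in seq)

print(-log_prob / n, H)  # the sample's -log p / n is almost exactly H
```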
Shannon's noisy channel coding theorem: the maximum reliable rate of information through noise.
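For the binary symmetric channel the theorem's limit has a closed form, C = 1 − H(p); a minimal sketch:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p: C = 1 - H(p)."""
    return 1.0 - h2(p)

for p in (0.0, 0.05, 0.11, 0.5):
    print(p, bsc_capacity(p))  # capacity falls to zero as the noise approaches 1/2
```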
Hamming codes, decoding spheres, and Reed-Solomon. Recovery from corruption.
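A sketch of the Hamming(7,4) code: four data bits, three parity bits, and a syndrome that points at any single flipped bit (the bit layout follows the textbook positions 1..7):

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword with parity bits at positions 1, 2, 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parities; the syndrome is the 1-based position of a single error."""
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1  # flip the corrupted bit back
    return c

word = hamming74_encode([1, 0, 1, 1])
noisy = word[:]
noisy[5] ^= 1                             # corrupt one bit
print(hamming74_correct(noisy) == word)   # True: the error is repaired
```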
Differential entropy h(X) = −∫ f log f. Why the Gaussian, as the maximum-entropy density for a given variance, is nature's default distribution.
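A small comparison, assuming the closed form h(X) = ½ log₂(2πeσ²) for the Gaussian: among densities with equal variance, the Gaussian has the largest differential entropy.

```python
import math

def gaussian_h(sigma2):
    """Differential entropy of N(0, sigma^2) in bits: 0.5 * log2(2*pi*e*sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma2)

def uniform_h(sigma2):
    """Differential entropy of a uniform density with the same variance."""
    width = math.sqrt(12 * sigma2)   # Var(U[0, w]) = w^2 / 12
    return math.log2(width)

# For equal variance, the Gaussian strictly beats the uniform in differential entropy.
print(gaussian_h(1.0), uniform_h(1.0))  # ~2.05 vs ~1.79 bits
```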
The cost in extra bits of coding for the wrong distribution. Cross-entropy and the bridge to ML.
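A minimal sketch of the two quantities and how they relate (the distributions are made up):

```python
import math

def kl(p, q):
    """D(p||q) = sum p log2(p/q): extra bits paid for coding p with a code designed for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p log2 q, the quantity used as a loss in ML."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.7, 0.2, 0.1]  # true distribution
q = [0.5, 0.3, 0.2]  # model distribution
print(kl(p, q))                                     # the penalty in extra bits
print(cross_entropy(p, q), entropy(p) + kl(p, q))   # H(p,q) = H(p) + D(p||q)
```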
Lossy compression: the best rate achievable at a given distortion. The math behind JPEG and MP3.
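For a Gaussian source under squared-error distortion the frontier has the closed form R(D) = ½ log₂(σ²/D); a minimal sketch of the trade-off:

```python
import math

def gaussian_rate_distortion(sigma2, D):
    """R(D) = 0.5 * log2(sigma^2 / D) bits per sample, dropping to 0 once D >= sigma^2."""
    return 0.0 if D >= sigma2 else 0.5 * math.log2(sigma2 / D)

# Each halving of the allowed distortion costs an extra half bit per sample.
for D in (1.0, 0.5, 0.25, 0.125):
    print(D, gaussian_rate_distortion(1.0, D))
```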