Shannon's noisy channel coding theorem: the maximum rate at which information can move reliably through noise.
Shannon's second great theorem — the noisy channel coding theorem — answers a question that for decades had seemed obviously hopeless: how much information can you reliably push through a channel that corrupts what it carries? The intuitive answer had always been “less and less, and eventually nothing.” Shannon showed in 1948 that the truth is sharper and stranger. Every channel has a single number — its capacity C — and below that rate you can communicate with arbitrarily small error. Above it, errors are inevitable.
The capacity is the maximum mutual information between input X and output Y, taken over every input distribution: C = max_{P(X)} I(X; Y). For the two fundamental binary models, the formulas are clean. The binary symmetric channel (BSC), which flips each bit with probability p, has C = 1 − H(p). The binary erasure channel (BEC), which erases each bit with probability ε but tells you which ones are missing, has C = 1 − ε. The shapes of the two curves reveal the difference between “you don't know which bits are wrong” and “you know exactly where the gaps are.”
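Both formulas are a few lines of code. A minimal sketch (the function names are mine, not from any particular library), with H the binary entropy:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2(p) - (1-p) log2(1-p), with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1 - binary_entropy(p)

def bec_capacity(eps: float) -> float:
    """Capacity of the binary erasure channel with erasure probability eps."""
    return 1 - eps

print(bsc_capacity(0.11))  # ≈ 0.500 — half a bit per channel use survives
print(bec_capacity(0.11))  # 0.89 — erasures cost exactly their rate
```

At the same noise level of 11%, flips cost you half the channel while erasures cost only 11% of it, which is the whole story of the two curves in one comparison.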
Binary symmetric channel: every input bit is independently flipped with probability p. Outlined output bits are the ones the channel corrupted.
Capacity C = 1 − H(p), where H is the binary entropy. The channel is useless only at p = 1/2 — when the bit you receive is independent of the bit that was sent, every bit you transmit is wasted. Surprisingly, a channel that flips every single bit (p = 1) is fully usable: just invert every output.
Binary erasure channel: each input bit either arrives intact or is replaced with a known erasure symbol ‘?’. The receiver knows which bits are missing — a much friendlier failure mode than a flip.
Capacity C = 1 − ε. The drop is purely linear because the receiver always knows where the gaps are: information is lost at exactly the rate the channel erases it. Compare to the BSC, where the receiver has to guess which bits to trust.
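The "loses exactly what is erased" claim can be checked directly (a sketch under my own naming, using '?' as the erasure symbol from the figure): the fraction of bits that arrive intact converges to 1 − ε, which is the capacity.

```python
import random

ERASED = "?"  # the known erasure symbol

def bec(bits, eps, rng):
    """Pass bits through a binary erasure channel: erase each independently with probability eps."""
    return [ERASED if rng.random() < eps else b for b in bits]

rng = random.Random(42)
n, eps = 100_000, 0.3
out = bec([rng.randint(0, 1) for _ in range(n)], eps, rng)
fraction_intact = sum(b != ERASED for b in out) / n
print(fraction_intact)  # close to 1 - eps = 0.7: each use delivers about 0.7 bits
```

Every surviving bit is worth a full bit because its position is known; nothing is spent figuring out which bits to trust.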
The two curves tell different stories. The BEC line drops linearly to zero at ε = 1 — perfectly graceful degradation. The BSC curve is concave: it bottoms out at p = 1/2 (the channel becomes useless because every received bit is independent of the sent bit), then climbs back to 1 at p = 1 (a channel that always flips is just a noiseless channel with a known relabeling). Half-noise is the worst possible noise.
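Tabulating both curves side by side (a quick sketch, reimplementing the binary entropy H inline) shows the shapes described above: the BSC column is symmetric about p = 1/2, bottoming out at zero, while the BEC column falls in a straight line.

```python
import math

def H(p):  # binary entropy, with the convention H(0) = H(1) = 0
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print("noise   BSC C   BEC C")
for x in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"{x:5.2f}   {1 - H(x):5.3f}   {1 - x:5.2f}")
```

The symmetry C(p) = C(1 − p) for the BSC is visible in the table: a channel that flips 90% of bits is exactly as good as one that flips 10%, because the receiver can simply invert its guess.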