Channel Capacity

Shannon's noisy channel coding theorem: the maximum reliable rate of information through noise.

Shannon's second great theorem — the noisy channel coding theorem — answers a question that for decades had seemed obviously hopeless: how much information can you reliably push through a channel that corrupts what it carries? The intuitive answer had always been “less and less, and eventually nothing.” Shannon showed in 1948 that the truth is sharper and stranger. Every channel has a single number — its capacity C — and below that rate you can communicate with arbitrarily small error. Above it, errors are inevitable.

The capacity is the maximum mutual information between input X and output Y, taken over every input distribution: C = max_{P(X)} I(X;Y). For the two fundamental binary models, the formulas are clean. The binary symmetric channel (BSC), which flips each bit with probability p, has C = 1 − H(p). The binary erasure channel (BEC), which deletes each bit with probability ε but tells you which ones are missing, has C = 1 − ε. The shapes of the two curves reveal the difference between “you don't know which bits are wrong” and “you know exactly where the gaps are.”
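
To make the formulas concrete, here is a minimal Python sketch (not from the article; the function names are illustrative) of the binary entropy and the two capacities:

    import math

    def binary_entropy(p: float) -> float:
        """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p: float) -> float:
        """Binary symmetric channel: C = 1 - H(p)."""
        return 1.0 - binary_entropy(p)

    def bec_capacity(eps: float) -> float:
        """Binary erasure channel: C = 1 - eps."""
        return 1.0 - eps

    print(f"{bsc_capacity(0.10):.3f}")  # 0.531 bits per use
    print(f"{bec_capacity(0.25):.3f}")  # 0.750 bits per use

These two values match the readouts in the interactive demos below.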

Interactive: Binary Symmetric Channel

A reproducible bit stream is sent through a channel that flips each bit independently with probability p. Watch the noise pattern in the middle row and the corrupted output below — outlined cells mark the bits the channel changed. The capacity readout falls as p climbs and crashes to zero at p = 1/2.

Binary symmetric channel: every input bit is independently flipped with probability p. Outlined output bits are the ones the channel corrupted.

Readout: bit-error rate 8.8% · capacity C 0.531 bits · lost per use 0.469 bits

Capacity C = 1 − H(p), where H is the binary entropy. The channel is useless only at p = 1/2 — when the bit you receive is independent of the bit that was sent, every bit you transmit is wasted. Surprisingly, a channel that flips every single bit (p = 1) is fully usable: just invert every output.
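
Both claims are easy to check empirically. A small simulation sketch (pure Python; the bsc helper is my own naming, not from the article):

    import random

    def bsc(bits, p, rng):
        """Flip each bit independently with probability p."""
        return [b ^ (rng.random() < p) for b in bits]

    rng = random.Random(42)
    sent = [rng.randint(0, 1) for _ in range(10_000)]

    # Moderate noise: the empirical flip rate tracks p.
    received = bsc(sent, 0.10, rng)
    flips = sum(s != r for s, r in zip(sent, received))
    print(f"empirical flip rate at p = 0.10: {flips / len(sent):.3f}")

    # p = 1 flips every bit, so the receiver decodes perfectly by inverting.
    received = bsc(sent, 1.0, rng)
    print("p = 1.0 decoded by inversion:", [1 - r for r in received] == sent)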

Interactive: Binary Erasure Channel

Same input stream, different failure mode: every bit either arrives intact or is replaced with a known erasure '?'. Capacity is the simple linear C = 1 − ε.

Binary erasure channel: each input bit either arrives intact or is replaced with a known erasure symbol ‘?’. The receiver knows which bits are missing — a much friendlier failure mode than a flip.

Readout: erasure rate 20.0% · capacity C 0.750 bits · lost per use 0.250 bits

Capacity C = 1 − ε. The drop is purely linear because the receiver always knows where the gaps are — it loses information at exactly the rate the channel destroys it. Compare to the BSC, where the receiver has to guess which bits to trust.
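
The same kind of sketch works for the erasure channel (again pure Python; the bec helper is illustrative):

    import random

    def bec(bits, eps, rng):
        """Replace each bit by the erasure symbol '?' with probability eps."""
        return ['?' if rng.random() < eps else b for b in bits]

    rng = random.Random(7)
    sent = [rng.randint(0, 1) for _ in range(10_000)]
    received = bec(sent, 0.25, rng)

    # The receiver sees exactly which positions were destroyed ...
    erased = sum(1 for r in received if r == '?')
    # ... and every surviving symbol is fully trustworthy.
    print(f"fraction erased:    {erased / len(sent):.3f}")      # about 0.25
    print(f"bits learned / use: {1 - erased / len(sent):.3f}")  # about 0.75 = 1 - eps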

Interactive: Capacity Curves Side by Side

The two channels' capacities plotted together. The BEC line drops in a straight line to zero. The BSC curve is convex and dies only at the very middle, where the received bit becomes statistically independent of the sent bit. Drag along the x-axis to compare the two at any noise level.

The two curves tell different stories. The BEC line drops linearly to zero at ε = 1 — perfectly graceful degradation. The BSC curve is convex: it bottoms out at p = 1/2 (the channel becomes useless because every received bit is independent of the sent bit), then climbs back to 1 at p = 1 (a channel that always flips is just a noiseless channel with a known relabeling). Half-noise is the worst possible noise.
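
A short sketch (self-contained Python, repeating the entropy helper) tabulates both curves and makes the convex-versus-linear contrast plain:

    import math

    def h2(p):
        """Binary entropy in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print("noise   C_BSC = 1 - H(p)   C_BEC = 1 - eps")
    for k in range(11):
        x = k / 10
        print(f" {x:.1f}        {1 - h2(x):.3f}             {1 - x:.3f}")

The BSC column is symmetric about its zero at 0.5, while the BEC column marches straight down to zero.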

The math objects

  • Discrete memoryless channel: a transition matrix P(Y | X) on input alphabet 𝒳 and output alphabet 𝒴, applied independently to each symbol.
  • Mutual information I(X;Y): the reduction in uncertainty about X you gain by observing Y. Equivalently H(Y) − H(Y|X) or H(X) − H(X|Y). Symmetric in X and Y (computed numerically in the sketch after this list).
  • Capacity C = max_{P(X)} I(X;Y): the supremum of mutual information over input distributions. A property of the channel alone — the maximum is over the transmitter's strategy.
  • Coding theorem (achievability): for any rate R < C and any ε > 0, there exists a block length n and a codebook of size 2^{nR} with maximum decoding error below ε. The proof uses random codes and the asymptotic equipartition property.
  • Converse: for any rate R > C, the average decoding error is bounded away from zero no matter how clever the code. Reliable communication above capacity is impossible.
  • BSC capacity 1 − H(p): follows because the uniform input maximizes H(Y) (which equals 1 bit) while H(Y|X) = H(p) for any input. The capacity-achieving distribution is uniform over {0, 1}.
  • BEC capacity 1 − ε: with probability 1 − ε the receiver learns the bit; with probability ε it learns nothing. On average it learns 1 − ε bits per use.
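
The mutual-information bullet can be checked numerically. A sketch (Python, not from the article) builds the joint distribution for a BSC with uniform input — which the BSC bullet says is capacity-achieving — and computes I(X;Y) = H(Y) − H(Y|X) directly:

    import math

    def entropy(probs):
        """Shannon entropy in bits of a probability vector."""
        return -sum(q * math.log2(q) for q in probs if q > 0)

    p = 0.10          # BSC flip probability
    px = [0.5, 0.5]   # uniform input achieves capacity for the BSC

    # Joint P(x, y): y equals x with probability 1 - p, is flipped with probability p.
    joint = {(x, y): px[x] * ((1 - p) if x == y else p) for x in (0, 1) for y in (0, 1)}

    py = [joint[0, y] + joint[1, y] for y in (0, 1)]
    h_y = entropy(py)                                    # = 1 bit (uniform output)
    h_y_given_x = sum(px[x] * entropy([1 - p, p]) for x in (0, 1))  # = H(p)

    print(f"I(X;Y)   = {h_y - h_y_given_x:.3f} bits")    # 0.531
    print(f"1 - H(p) = {1 - entropy([p, 1 - p]):.3f}")   # matches the BSC capacity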

Key takeaways

  • Capacity is the maximum mutual information between input and output, optimized over input distributions.
  • Below capacity, arbitrarily reliable communication is possible with long enough codes — Shannon's noisy channel theorem.
  • Above capacity, errors are unavoidable — the converse to Shannon's theorem.
  • BSC capacity C = 1 − H(p) is convex and dies only at p = 1/2 — the “coin-flip channel” is useless.
  • BEC capacity C = 1 − ε is linear — known-location erasures are the gentlest possible damage.
  • A BSC at p = 1 is fully usable: the receiver just inverts every bit. Half-noise is the worst noise.