The Asymptotic Equipartition Property

Long sequences from a source are almost certainly typical — and almost equally probable.

Take an i.i.d. source X with entropy H(X) and look at a very long sample sequence X₁, X₂, …, X_n. The Asymptotic Equipartition Property — the AEP — says that almost every such sequence has essentially the same probability: roughly 2^(−nH(X)). Equivalently, the per-symbol log-probability concentrates on H(X):

−(1/n) log P(X₁…X_n) → H(X) almost surely as n → ∞. This is just the law of large numbers applied to the random variable −log p(X), whose expected value is exactly the entropy.
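
A minimal numerical sketch of this convergence (assuming NumPy; the three-symbol distribution below is illustrative, not one used in the article): sample many length-n sequences, compute −(1/n) log₂ P for each, and watch the sample mean settle on H(X) as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.7, 0.2, 0.1])        # illustrative source distribution, not from the article
    H = -(p * np.log2(p)).sum()          # entropy H(X) in bits

    def per_symbol_rate(n, samples=600):
        """Sample length-n i.i.d. sequences and return -(1/n) log2 P(sequence) for each."""
        seqs = rng.choice(len(p), size=(samples, n), p=p)
        logP = np.log2(p)[seqs].sum(axis=1)   # log2 P(x1..xn) = sum of log2 p(xi)
        return -logP / n

    for n in (10, 100, 1000):
        rates = per_symbol_rate(n)
        print(f"n={n:5d}  mean={rates.mean():.4f}  std={rates.std():.4f}  (H = {H:.4f})")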

The consequence is dramatic. Take a binary source: out of the 2^n possible sequences of length n, almost all of the probability mass sits on a much smaller subset — the typical set T_ε^(n), of size ≈ 2^(nH(X)). When the source is biased, that subset is a tiny sliver of the full cube. This is the mathematical reason compression works: long messages from a structured source live on a vanishingly small fraction of the possible sequence space, and we only need enough bits to address that small fraction.

Interactive: Watch the Histogram Concentrate

Pick a source distribution. We sample 600 length-n sequences and plot a histogram of −(1/n) log P(seq) for each. As n grows, the histogram tightens around H(X). The shaded band marks the ε-typical region: sequences whose per-symbol rate falls within ε of H.
Widget readout for the selected source: H(X) = 0.610, sample mean 0.614, sample std 0.222, 52.2% of samples typical.

Each bar is the count of sampled length-n sequences whose per-symbol log-probability lies in that bin. The dashed line marks H(X). As you crank up n, the histogram tightens around H — that is the AEP. Sequences whose rate lands inside the shaded band [H − ε, H + ε] are the ε-typical sequences. The standard deviation should shrink like roughly 1/√n.
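
A quick check of that 1/√n claim, as a sketch assuming NumPy; the bias p = 0.15 is an illustrative choice, not taken from the widget. The standard deviation of −(1/n) log₂ P across samples should drop by about √10 each time n grows tenfold, matching the closed-form value √(p(1−p)) · |log₂(p/(1−p))| / √n.

    import numpy as np

    rng = np.random.default_rng(1)
    p = 0.15                                   # illustrative bias, not from the article
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def rate_std(n, samples=2000):
        """Std of the per-symbol rate -(1/n) log2 P over many length-n Bernoulli(p) sequences."""
        ones = rng.binomial(n, p, size=samples)               # number of 1s in each sequence
        logP = ones * np.log2(p) + (n - ones) * np.log2(1 - p)
        return (-logP / n).std()

    for n in (10, 100, 1000, 10000):
        predicted = np.sqrt(p * (1 - p)) * abs(np.log2(p / (1 - p))) / np.sqrt(n)
        print(f"n={n:6d}  std={rate_std(n):.4f}  predicted ~{predicted:.4f}")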

Interactive: Typical vs Atypical Sequences

The full sequence space has 2^n elements. The typical set has only ≈ 2^(nH(p)). For a fair coin those numbers agree. As bias grows, the typical set becomes a vanishing sliver of the cube — yet it still carries almost all of the probability.

For a binary source with bias p, the full sequence space has 2^n elements, but the typical set has only ≈ 2^(nH(p)) of them. The ratio is exact — the rectangles are area-scaled.

Widget readout at n = 20: H(p) = 0.610, 2^n = 1,048,576, 2^(nH) ≈ 4,695, ratio ≈ 223×.

For a fair coin (p = 1/2) the typical set is the entire sequence space — there is nothing to compress. Push p toward 1 and the typical set becomes a vanishing sliver of the full cube, even though it still carries almost all of the probability. That gap is exactly Shannon's compression budget.
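
A sketch of that size comparison in code, assuming NumPy; the values n = 20 and p = 0.15 are chosen to roughly match the readout above and are otherwise illustrative.

    import numpy as np

    def binary_entropy(p):
        """H(p) in bits for a Bernoulli(p) source."""
        if p in (0.0, 1.0):
            return 0.0
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    n, p = 20, 0.15                  # roughly matches the readout above; illustrative values
    H = binary_entropy(p)
    full = 2.0 ** n                  # all length-n binary sequences
    typical = 2.0 ** (n * H)         # approximate size of the typical set
    print(f"H(p) = {H:.3f}")
    print(f"2^n = {full:,.0f},  2^(nH) ≈ {typical:,.0f},  ratio ≈ {full / typical:,.0f}×")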

Interactive: Achievable Compression Rate

Sample a long binary sequence with bias p, then encode typical sequences with ⌈n(H + ε)⌉ bits and atypical ones literally with n bits. As n grows, the achieved per-symbol rate converges to H(p) — Shannon's source coding theorem in action.

“Compress” by encoding typical sequences with ⌈n(H + ε)⌉ bits and atypical ones literally with n bits. The achieved rate converges to H(p) — Shannon's source coding theorem.

Widget readout: H(p) limit 0.610, achieved rate 0.837, 58.4% of sequences typical, overhead 22.7%.

Increase n and the achieved rate squeezes down toward H(p), the information-theoretic floor. Tighten ε (smaller band) and the floor is closer, but the typical-set probability shrinks unless n is large enough to compensate. This is exactly the trade-off in Shannon's source coding theorem: for any rate above H, there exists an n large enough that we can encode typical sequences within budget and ignore the vanishing tail.
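
A sketch of this coding scheme, assuming NumPy; the values of p, ε, and the block lengths are illustrative. Each sampled block gets one flag bit plus either ⌈n(H + ε)⌉ bits (typical) or n bits (atypical); with ε held fixed the achieved rate approaches H + ε, and shrinking ε pushes the floor toward H provided n keeps up.

    import numpy as np

    rng = np.random.default_rng(2)
    p, eps = 0.15, 0.05                              # illustrative bias and tolerance
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def achieved_rate(n, blocks=2000):
        """Average bits/symbol: ceil(n(H+eps)) bits for typical blocks, n bits for atypical, +1 flag bit."""
        ones = rng.binomial(n, p, size=blocks)
        rate = -(ones * np.log2(p) + (n - ones) * np.log2(1 - p)) / n
        typical = np.abs(rate - H) <= eps
        bits = np.where(typical, np.ceil(n * (H + eps)), n) + 1
        return bits.mean() / n, typical.mean()

    for n in (50, 200, 1000, 5000):
        r, frac = achieved_rate(n)
        print(f"n={n:5d}  achieved={r:.3f}  H={H:.3f}  typical={frac:.1%}")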

The math objects

  • The empirical entropy rate: for a sequence x = (x₁, …, x_n) drawn i.i.d. from a source with distribution p, the per-symbol log-probability is −(1/n) log P(x) = −(1/n) Σ log p(x_i). Its expected value is H(X), and the law of large numbers makes it converge to H(X) in probability.
  • The typical set T_ε^(n): all sequences x with |−(1/n) log P(x) − H(X)| ≤ ε. By construction every typical sequence has probability between 2^(−n(H+ε)) and 2^(−n(H−ε)) — they are all roughly equiprobable, hence the name.
  • Two AEP guarantees (Cover & Thomas, Theorem 3.1.2): for every ε > 0 and large enough n, P(X^n ∈ T_ε^(n)) ≥ 1 − ε, and (1 − ε) · 2^(n(H − ε)) ≤ |T_ε^(n)| ≤ 2^(n(H + ε)). Almost all of the probability sits on a set of size approximately 2^(nH); both guarantees are checked numerically in the sketch after this list.
  • Source coding consequence: any rate R > H(X) is achievable — assign one short codeword to each typical sequence (about ⌈n(H + ε)⌉ bits suffice to address them all) and accept a vanishingly small failure probability on the atypical tail. No rate below H is achievable. This is Shannon's first theorem.
  • Why “equipartition”: within the typical set, sequences differ in probability only by factors of 2^(±nε), which becomes negligible compared to the absolute probability 2^(−nH). For practical purposes the typical set is uniform.
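
A brute-force check of those two guarantees for a Bernoulli(p) source, as a sketch assuming NumPy; p, ε, and the block lengths are illustrative. Grouping sequences by their count of 1s makes the typical-set probability and size exact without enumerating all 2^n sequences, and the printout shows both bounds being met once n is large enough.

    import numpy as np
    from math import comb

    p, eps = 0.15, 0.1                           # illustrative values, not from the article
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def typical_set_stats(n):
        """Exact P(T_eps^(n)) and |T_eps^(n)| for Bernoulli(p), grouping sequences by their number of 1s."""
        prob, size = 0.0, 0
        for k in range(n + 1):
            logP = k * np.log2(p) + (n - k) * np.log2(1 - p)   # log2-prob of any sequence with k ones
            if abs(-logP / n - H) <= eps:                      # all comb(n, k) such sequences are typical together
                prob += comb(n, k) * p**k * (1 - p)**(n - k)
                size += comb(n, k)
        return prob, size

    for n in (16, 64, 256, 1024):
        prob, size = typical_set_stats(n)
        lower, upper = (1 - eps) * 2 ** (n * (H - eps)), 2 ** (n * (H + eps))
        print(f"n={n:5d}  P(T)={prob:.4f} (goal ≥ {1 - eps:.2f})  |T|={size:.3g}  bounds [{lower:.3g}, {upper:.3g}]")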

Key takeaways

  • −(1/n) log P(X₁…X_n) converges to H(X): the per-symbol log-probability obeys the law of large numbers, and its limit is exactly the entropy.
  • Almost all probability mass lives on the typical set, of size ≈ 2^(nH(X)) — much smaller than the full 2^n cube when the source is biased.
  • Within the typical set, every sequence has approximately the same probability ≈ 2^(−nH(X)). That's the “equipartition”.
  • Shannon's source coding theorem follows immediately: H(X) is the optimal per-symbol bit rate, achievable by encoding only typical sequences.
  • The AEP is asymptotic: at small n the histogram is wide. The magic only happens once sequences are long enough for the law of large numbers to take hold.