Source Coding & Huffman

Shannon's first theorem: entropy is the optimal compression bound. Huffman codes come within one bit of it.

A source coder turns symbols into bits. The question Shannon answered in 1948, and the one every compression algorithm since has inherited, is: how few bits, on average, can you possibly use? The answer is exactly the entropy of the source. The source-coding theorem states that the average code length L of any uniquely-decodable code satisfies L ≥ H(X). And a constructive scheme matches the bound: Huffman coding brings L within one bit of H(X) for any discrete source, and the gap vanishes entirely when the optimal lengths −log₂ pᵢ are integers, i.e. when every probability is a power of ½.

The recipe is intuitive. Common symbols deserve short codes; rare symbols can afford long ones. Huffman's algorithm just keeps merging the two least-probable nodes into a new parent until a single binary tree remains. Reading the tree top-down — left as 0, right as 1 — gives a prefix-free codebook, the optimal one. The Kraft inequality Σ 2^(−ℓᵢ) ≤ 1 turns out to be the combinatorial budget that all prefix codes must respect: each codeword of length ℓ consumes 2^(−ℓ) of a unit pie. Equality means the tree is full and no budget is wasted.
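
The loop is short enough to write out. A minimal sketch in Python (my own illustration, not the widget's code): a heap holds the forest, and each iteration fuses the two lightest trees, prefixing 0 to one side's codes and 1 to the other's.

    import heapq
    from itertools import count

    def huffman_code(probs):
        """Build a Huffman codebook {symbol: bitstring} from {symbol: probability}."""
        tie = count()  # tie-breaker so the heap never compares the dict payloads
        heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, left = heapq.heappop(heap)   # lightest tree gets bit 0
            p1, _, right = heapq.heappop(heap)  # next lightest gets bit 1
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (p0 + p1, next(tie), merged))
        return heap[0][2]

    print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (up to tie-breaking)

The example distribution is dyadic, so the code lengths equal −log₂ pᵢ exactly and L = H.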

Interactive: Huffman Tree Builder

Pick a probability distribution (or sketch your own with the sliders) and step through Huffman's algorithm. At each merge the two lowest-probability nodes in the forest become children of a new parent. The codes for each symbol fall out of the final tree.
[Interactive widget: five-symbol preset E, T, A, O, I; the tree is built in four merges. Readout: H(X) = 2.287 bits, Huffman average L = 2.326 bits, gap L − H = 0.039 bits.]

Step through the algorithm: at each merge, the two lowest-probability nodes in the forest are combined under a new internal parent. Read codewords by walking the final tree top-down (left = 0, right = 1). Shannon's source-coding theorem guarantees L ≥ H(X); Huffman comes within one bit.
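
If the preset uses the relative frequencies of E, T, A, O, I in typical English text (roughly 12.7 : 9.1 : 8.2 : 7.5 : 7.0, renormalized to sum to 1; an assumption on my part), the widget's readout falls out directly:

    from math import log2

    # Assumed preset: English letter frequencies for E, T, A, O, I,
    # renormalized. The exact numbers are not stated in the text.
    freq = {"E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0}
    total = sum(freq.values())
    p = {s: f / total for s, f in freq.items()}

    # Entropy in bits per symbol.
    H = -sum(q * log2(q) for q in p.values())

    # Codeword lengths read off the Huffman tree for this distribution
    # (O and I merge first, then A and T, then E with the O-I node).
    lengths = {"E": 2, "T": 2, "A": 2, "O": 3, "I": 3}
    L = sum(p[s] * lengths[s] for s in p)

    print(f"H = {H:.3f}, L = {L:.3f}, gap = {L - H:.3f}")
    # H = 2.287, L = 2.326, gap = 0.039: matching the readout above.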

Interactive: Compression Compared

The same text encoded three ways: a fixed-length code (⌈log₂ k⌉ bits per symbol), Huffman's variable-length code, and the entropy lower bound. The Huffman bar should sit between fixed and entropy — never below the entropy line, never more than one bit above it.

IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE HOWEVER LITTLE KNOWN THE FEELINGS OR VIEWS OF SUCH A MAN MAY BE ON HIS FIRST ENTERING A NEIGHBOURHOOD THIS TRUTH IS SO WELL FIXED IN THE MINDS OF THE SURROUNDING FAMILIES

[Interactive widget readout: symbols k = 23, length 231 chars, Huffman / fixed = 83.2%, slack L − H = 9.4 bits total.]
First 30 symbols, Huffman-encoded:
0011011001111010001011000111000101101101100011110010111110100001111010000000000010101101000101010010101011111110110100000001010010

Three encodings of the same stream. The fixed-length code wastes bits whenever the distribution is non-uniform. Huffman shrinks toward the entropy lower bound (Shannon's theorem); the gap L − H stays under one bit per symbol. With a single repeated symbol the entropy is zero, yet Huffman must still spend 1 bit per symbol; that degenerate case is where block coding does meaningfully better.
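
These numbers can be checked offline. The readout's k = 23 and 231 characters match the excerpt with spaces stripped, so presumably spaces are not encoded; under that assumption, a sketch using the merge-cost identity (the Huffman total equals the sum of the weights of all merges, so no explicit tree is needed):

    import heapq
    from collections import Counter
    from math import ceil, log2

    text = ("IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN "
            "POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE HOWEVER "
            "LITTLE KNOWN THE FEELINGS OR VIEWS OF SUCH A MAN MAY BE ON HIS "
            "FIRST ENTERING A NEIGHBOURHOOD THIS TRUTH IS SO WELL FIXED IN "
            "THE MINDS OF THE SURROUNDING FAMILIES").replace(" ", "")

    counts = Counter(text)
    n, k = len(text), len(counts)            # n = 231, k = 23

    # Entropy of the empirical distribution, bits per symbol.
    H = -sum(c / n * log2(c / n) for c in counts.values())

    # Total Huffman bits: sum the weight of every merge.
    heap = list(counts.values())
    heapq.heapify(heap)
    huffman_bits = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        huffman_bits += a + b
        heapq.heappush(heap, a + b)

    fixed_bits = n * ceil(log2(k))           # 5 bits/symbol, fixed
    print(f"fixed {fixed_bits}, huffman {huffman_bits} "
          f"({huffman_bits / fixed_bits:.1%}), entropy {n * H:.1f}")

The output should reproduce the 83.2% ratio and the 9.4-bit slack shown in the readout.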

Interactive: The Kraft Inequality

A binary tree of depth 3-5; click any node to claim it as a codeword. Each claim consumes 2^(−ℓ) of the budget [0, 1] and grays out its ancestors and descendants (which would break the prefix property). The bar shows Σ 2^(−ℓᵢ); equality with 1 means the tree is full.

Click any node to claim it as a codeword. The Kraft inequality says Σ 2^(−ℓᵢ) ≤ 1 for any prefix code — every length-ℓ codeword consumes 2^(−ℓ) of the unit budget. A claim grays out its ancestors and descendants (they would no longer give a prefix-free code). Equality Σ = 1 happens exactly when the tree is full — no wasted budget. Try the Huffman 1, 2, 3, 3 preset: ½ + ¼ + ⅛ + ⅛ = 1, the canonical full tree.
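
The budget arithmetic is easy to check in a couple of lines (a small sketch; kraft_sum is my name, not the widget's):

    from fractions import Fraction

    def kraft_sum(lengths):
        """Budget used by codewords of the given lengths: sum of 2^(-l)."""
        return sum(Fraction(1, 2 ** l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))  # 1: the full-tree preset, zero waste
    print(kraft_sum([2, 2, 3]))     # 5/8: valid, but budget left unused
    print(kraft_sum([1, 1, 2]))     # 5/4 > 1: no prefix code exists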

The math objects

  • Source-coding theorem (Shannon, 1948): for any uniquely-decodable code over a source X with entropy H(X) bits, the average codeword length L = Σ pᵢ ℓᵢ satisfies L ≥ H(X). Conversely, there exists a prefix code with L < H(X) + 1. Block-coding the source closes the remaining gap to zero in the limit.
  • Prefix code: a code in which no codeword is a prefix of another. Equivalent to placing codewords at the leaves of a binary tree, which is why every prefix code can be drawn and decoded by walking from the root.
  • Kraft inequality: Σ 2^(−ℓᵢ) ≤ 1. Necessary and sufficient for codeword lengths {ℓᵢ} to admit a prefix code; the sufficiency direction is constructive (see the sketch after this list). Equality means the binary tree is full: no leaf is wasted.
  • Huffman's algorithm: repeatedly merge the two lowest-probability nodes into a new parent until a single tree remains. The final tree is the optimal prefix code for the given distribution. Cost: O(n log n) with a heap.
  • Optimality gap L − H: always less than 1 bit for Huffman, and exactly zero when every probability is a power of ½ (so the optimal length ℓᵢ = −log₂ pᵢ is an integer). Real compressors close the gap further with arithmetic coding and block coding.
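
The sufficiency half of Kraft deserves a concrete demonstration: given any lengths with Σ 2^(−ℓᵢ) ≤ 1, codewords can be handed out greedily in order of increasing length (canonical coding). A minimal sketch, with a function name of my choosing:

    def code_from_lengths(lengths):
        """Assign prefix-free codewords to lengths satisfying Kraft."""
        assert sum(2 ** -l for l in lengths) <= 1, "Kraft violated"
        codes, next_code, prev_len = [], 0, 0
        for l in sorted(lengths):
            next_code <<= l - prev_len      # grow to the new length
            codes.append(format(next_code, f"0{l}b"))
            next_code += 1                  # skip the claimed subtree
            prev_len = l
        return codes

    print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']

Each assignment steps the running counter past the subtree the new codeword claims, which is exactly the 2^(−ℓ) slice of budget from the widget above.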

Key takeaways

  • Entropy H(X) is the absolute lower bound on the average code length of any lossless source code.
  • Huffman coding constructs the optimal prefix code: greedy bottom-up merging of the two smallest probabilities.
  • The Huffman gap L − H stays under one bit per symbol — and reaches zero when every probability is a power of ½.
  • The Kraft inequality Σ 2^(−ℓᵢ) ≤ 1 is the budget every prefix code obeys; equality means the binary tree is full.
  • Fixed-length coding wastes bits whenever the source distribution is non-uniform — exactly the redundancy compressors exploit.