Shannon's first theorem: entropy is the optimal compression bound. Huffman codes get within one bit of it.
A source coder turns symbols into bits. The question Shannon answered in 1948, and the question every compression algorithm since has inherited, is: how few bits, on average, can you possibly use? The answer is exactly the entropy of the source. The source-coding theorem states that the average code length L of any uniquely decodable code satisfies L ≥ H(X). And a constructive scheme matches the bound: Huffman coding gets L within one bit of H(X) for any discrete source, and hits H(X) exactly when every probability is a negative power of two, so the ideal codeword lengths ℓᵢ = −log₂ pᵢ are integers.
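To make the bound concrete, here is a minimal Python sketch (the four-symbol distribution is illustrative, not taken from the demo): it computes H(X) and the ideal lengths −log₂ pᵢ. Because this source is dyadic, the ideal lengths are integers and Huffman will meet the entropy with zero gap.

```python
import math

# A dyadic toy source: every probability is a negative power of two,
# so the ideal lengths -log2(p) are integers and Huffman hits H(X) exactly.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

H = -sum(p * math.log2(p) for p in probs.values())    # entropy, bits/symbol
ideal = {s: -math.log2(p) for s, p in probs.items()}  # ideal codeword lengths

print(f"H(X) = {H} bits/symbol")  # 1.75
print(ideal)                      # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```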
The recipe is intuitive. Common symbols deserve short codes; rare symbols can afford long ones. Huffman's algorithm just keeps merging the two least-probable nodes into a new parent until a single binary tree remains. Reading the tree top-down, left as 0 and right as 1, gives a prefix-free codebook, and an optimal one: no prefix code has a smaller average length. The Kraft inequality Σ 2^(−ℓᵢ) ≤ 1 turns out to be the combinatorial budget that all prefix codes must respect: each codeword of length ℓ consumes 2^(−ℓ) of a unit pie. Equality means the tree is full and no budget is wasted.
Step through the algorithm: at each merge, the two lowest-probability nodes in the forest are combined under a new internal parent. Read codewords by walking the final tree top-down (left = 0, right = 1). Shannon's source-coding theorem guarantees L ≥ H(X); Huffman comes within one bit.
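The whole merge loop fits in a dozen lines. A minimal sketch, assuming Python's heapq as the forest of nodes; huffman_code and the tie-break counter are illustrative names, not part of the demo.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Merge the two least-probable nodes under a new parent until one
    tree remains; prepend '0' for the left branch and '1' for the right."""
    tiebreak = count()  # unique second key so equal probabilities never
                        # force Python to compare the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}  (up to 0/1 relabeling)
```

Note that ties can be broken either way; different tie-breaks give different trees but the same average length, which is why the codebook is "an optimal one" rather than "the" optimal one.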
IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE HOWEVER LITTLE KNOWN THE FEELINGS OR VIEWS OF SUCH A MAN MAY BE ON HIS FIRST ENTERING A NEIGHBOURHOOD THIS TRUTH IS SO WELL FIXED IN THE MINDS OF THE SURROUNDING FAMILIES
Three encodings of the same stream. The fixed-length code wastes bits whenever the distribution is non-uniform. Huffman shrinks toward the entropy lower bound (Shannon's theorem); the gap L − H stays under one bit. With a single repeated symbol the entropy is zero and Huffman degenerates to a useless 1-bit code, the extreme case where coding blocks of symbols together does meaningfully better. Here the Huffman stream is 82.4% the length of the fixed-length one.
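The caption's numbers can be sanity-checked offline. A sketch under the same setup: count symbol frequencies in the passage above, then compare a fixed-length code (⌈log₂ of the alphabet size⌉ bits per symbol) against Huffman's average length and the entropy. huffman_lengths is an illustrative helper, and the printed ratio should land near the 82.4% quoted here.

```python
import heapq
import math
from collections import Counter
from itertools import count

def huffman_lengths(freqs):
    """Codeword lengths via Huffman's merge loop: combine the two
    rarest nodes and add one bit of depth to every symbol beneath them."""
    tiebreak = count()  # keeps ties from comparing the dict payloads
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, a = heapq.heappop(heap)
        f1, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

text = ("IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN "
        "POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE HOWEVER "
        "LITTLE KNOWN THE FEELINGS OR VIEWS OF SUCH A MAN MAY BE ON HIS "
        "FIRST ENTERING A NEIGHBOURHOOD THIS TRUTH IS SO WELL FIXED IN "
        "THE MINDS OF THE SURROUNDING FAMILIES")
freqs = Counter(text)
n = len(text)

fixed = math.ceil(math.log2(len(freqs)))                    # fixed-length bits/symbol
lengths = huffman_lengths(freqs)
L = sum(freqs[s] * l for s, l in lengths.items()) / n       # Huffman bits/symbol
H = -sum(f / n * math.log2(f / n) for f in freqs.values())  # entropy bits/symbol

print(f"fixed {fixed}  Huffman {L:.3f}  H(X) {H:.3f}  ratio {L / fixed:.1%}")
```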
Click any node to claim it as a codeword. The Kraft inequality says Σ 2^(−ℓᵢ) ≤ 1 for any prefix code — every length-ℓ codeword consumes 2^(−ℓ) of the unit budget. A claim grays out its ancestors and descendants (they would no longer give a prefix-free code). Equality Σ = 1 happens exactly when the tree is full — no wasted budget. Try the Huffman 1, 2, 3, 3 preset: ½ + ¼ + ⅛ + ⅛ = 1, the canonical full tree.
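The budget check is one line per codebook. A small sketch using exact fractions, so equality versus slack is unambiguous; kraft_sum is an illustrative name.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Budget used by a set of codeword lengths: the sum of 2^(-l)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1   -- the full-tree preset, no slack
print(kraft_sum([2, 2, 2]))     # 3/4 -- slack: a prefix-free code exists,
                                #        but a quarter of the budget is unused
```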