Error-Correcting Codes

Hamming codes, decoding spheres, and Reed-Solomon. Recovery from corruption.

Shannon's noisy channel coding theorem promises that reliable communication is possible up to channel capacity — but it does not tell you how. Error-correcting codes are the constructive answer. They take k information symbols, append redundancy, and transmit n > k symbols. A clever receiver can then recover the original message even when some of those symbols flip mid-flight.

The geometry behind every code is the same. Each codeword sits at the center of a decoding sphere in Hamming space: the ball of all received words within distance t of it. A code with minimum distance d supports disjoint spheres of radius t = ⌊(d−1)/2⌋, so it can detect any pattern of up to d−1 errors and correct up to t. Hamming(7,4) has d = 3 and corrects one bit error per block; Reed-Solomon codes work over larger alphabets and correct symbol errors, which is why they protect CDs, QR codes, and the data packets sent home by the Voyager probes.
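The distance arithmetic above can be checked directly. A minimal Python sketch (the helper names here are illustrative, not from any library):

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def error_budget(d_min):
    """(detectable, correctable) error counts for minimum distance d_min."""
    return d_min - 1, (d_min - 1) // 2

# Hamming(7,4) has d_min = 3: detects 2 errors, corrects 1.
print(error_budget(3))   # (2, 1)
# A Reed-Solomon code with n = 7, k = 3 has d = n - k + 1 = 5.
print(error_budget(5))   # (4, 2)
```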

Interactive: Hamming(7,4) and the Three Parity Circles

Four data bits become seven transmitted bits with three parity bits. The parity bits are arranged as the three regions of a Venn diagram, so each data bit lies in a distinct combination of parity circles and the pattern of failing checks pinpoints a flipped bit. Toggle data bits to re-encode; click any bit on the diagram to inject an error.
data bits (d3 d5 d6 d7): 1 0 1 1
codeword sent: 0 1 1 0 0 1 1
received: 0 1 1 0 0 1 1
distance: 0 bits
decoded data: 1 0 1 1

Toggle any data bit to re-encode; the parity bits update so every circle contains an even number of ones. Click any bit on the diagram to flip it (simulating channel noise). Failing parity circles turn red — the binary number formed by which circles fail is the syndrome, and it names the bit position to flip back. Try flipping two bits to see how the decoder is fooled into "correcting" the wrong one.
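The encode/syndrome-decode cycle the widget performs can be sketched in a few lines of Python, using the standard bit layout p1 p2 d3 p4 d5 d6 d7 (the function names are my own):

```python
def hamming74_encode(d3, d5, d6, d7):
    """Encode 4 data bits into 7 bits in positions 1..7: p1 p2 d3 p4 d5 d6 d7."""
    p1 = d3 ^ d5 ^ d7   # circle covering positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # circle covering positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # circle covering positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def hamming74_decode(r):
    """Syndrome-decode a received 7-bit word, fixing at most one flipped bit."""
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s4 = r[3] ^ r[4] ^ r[5] ^ r[6]
    syndrome = 4 * s4 + 2 * s2 + s1   # read in binary: the 1-based error position
    r = r[:]
    if syndrome:
        r[syndrome - 1] ^= 1          # flip the named bit back
    return [r[2], r[4], r[5], r[6]]   # data bits d3 d5 d6 d7

sent = hamming74_encode(1, 0, 1, 1)   # [0, 1, 1, 0, 0, 1, 1], as in the demo
noisy = sent[:]
noisy[4] ^= 1                         # channel flips bit 5
print(hamming74_decode(noisy))        # [1, 0, 1, 1] -- recovered
```

Flipping two bits instead of one produces a nonzero syndrome that names the wrong position, which is exactly the failure mode the demo invites you to explore.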

Interactive: Decoding Spheres

Every code partitions the space of received words into decoding regions, one per codeword. For a code with minimum distance d, the spheres of radius ⌊(d−1)/2⌋ around codewords are guaranteed disjoint — every word inside a sphere decodes uniquely. Switch between codes to see how minimum distance controls correctability.
received word:
d_min: 3
corrects: 1
decodes to: 000
distance: 1

d_min = 3, so this code corrects 1 error. Each sphere has radius 1 — every received word lies in exactly one sphere. The cube shows every binary word of length 3. Codewords are large filled dots; non-codewords are color-coded by which codeword they decode to. The dashed circles are the decoding spheres of radius t = ⌊(d_min − 1)/2⌋. Click any vertex to choose a received word — the decoder maps it to the nearest codeword by Hamming distance.
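Nearest-codeword decoding on the 3-bit cube can be sketched as follows, assuming the two-codeword repetition code {000, 111}, which matches the demo's d_min = 3:

```python
from itertools import combinations, product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    """Minimum pairwise Hamming distance over all codeword pairs."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

def nearest_codeword(received, code):
    """Minimum-distance decoding: map a received word to its closest codeword."""
    return min(code, key=lambda c: hamming(received, c))

# 3-bit repetition code (an assumption matching the cube demo)
code = ["000", "111"]
d = min_distance(code)    # 3
t = (d - 1) // 2          # spheres of radius 1
for word in ("".join(bits) for bits in product("01", repeat=3)):
    print(word, "->", nearest_codeword(word, code))
```

Every length-3 word lies within distance 1 of exactly one codeword, so the two radius-1 spheres tile the whole cube; codes with that property are called perfect.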

Interactive: Reed-Solomon over GF(11)

Treat k data symbols as the coefficients of a polynomial of degree at most k−1 and evaluate it at n distinct points in a finite field. Any k of those evaluations recover the polynomial via Lagrange interpolation, so the code corrects up to ⌊(n−k)/2⌋ symbol errors — exactly the Singleton-bound limit.
data symbols: 2 10 6
codeword (mod 11): 2 7 2 9 6 4 3
received: 2 7 2 0 6 4 3
decoded data: 2 10 6
true polynomial: 2 + 10x + 6x^2
recovered polynomial: 2 + 10x + 6x^2

Encode k = 3 symbols by treating them as the coefficients of a polynomial and evaluating at n = 7 points in GF(11). The Singleton bound says d ≤ n − k + 1 = 5, so this code corrects up to 2 symbol errors. Push the error slider beyond 2 and watch the recovered polynomial diverge from the true one — decoding is no longer guaranteed.
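The encode-and-interpolate cycle can be sketched in Python, assuming evaluation points x = 0..6 in GF(11), which reproduces the demo's numbers (the helper names are illustrative):

```python
P = 11  # the field GF(11); 11 is prime, so inverses come from Fermat's little theorem

def rs_encode(data, n, p=P):
    """Evaluate the polynomial with coefficients `data` at x = 0..n-1, mod p."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(data)) % p
            for x in range(n)]

def lagrange_interpolate(points, p=P):
    """Recover the coefficients of the unique degree < k polynomial through k points."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis = [1]   # coefficients of the running product prod_{j != i} (x - xj)
        denom = 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for m, a in enumerate(basis):
                new[m] = (new[m] - xj * a) % p     # multiply by (x - xj)
                new[m + 1] = (new[m + 1] + a) % p
            basis = new
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, p - 2, p) % p      # yi / prod (xi - xj) mod p
        for m, a in enumerate(basis):
            coeffs[m] = (coeffs[m] + scale * a) % p
    return coeffs

data = [2, 10, 6]                  # p(x) = 2 + 10x + 6x^2
print(rs_encode(data, 7))          # [2, 7, 2, 9, 6, 4, 3]
# Any k = 3 clean evaluations rebuild the polynomial:
print(lagrange_interpolate([(0, 2), (4, 6), (6, 3)]))   # [2, 10, 6]
```

In the demo's received word the evaluation at x = 3 is corrupted, but interpolating from any three uncorrupted points still yields the true polynomial; a full error-correcting decoder (e.g. Berlekamp-Welch) finds such a consistent set automatically.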

The math objects

  • Block code (n, k, d): a set of M = q^k codewords of length n over an alphabet of size q, with minimum pairwise Hamming distance d. The rate is k/n; the redundancy is n − k.
  • Hamming distance: the number of positions in which two equal-length words differ. The minimum distance d determines what the code can do: detect any pattern of up to d − 1 errors, or correct up to ⌊(d − 1)/2⌋.
  • Decoding spheres: the ball of radius t = ⌊(d−1)/2⌋ around each codeword. Spheres around distinct codewords are disjoint, so a received word inside any sphere decodes uniquely to its center.
  • Syndrome: the result of the parity-check equations applied to the received word. A zero syndrome means no detected error; for Hamming(7,4) the syndrome read in binary is exactly the bit position of a single error.
  • Singleton bound: for any (n, k, d) code, k + d ≤ n + 1. Codes that meet this bound are called MDS (maximum distance separable). Reed-Solomon codes are MDS.
  • Reed-Solomon: evaluations of a degree-<k polynomial at n distinct points in a finite field. Any k uncorrupted evaluations recover the polynomial via Lagrange interpolation; corrupting more than ⌊(n−k)/2⌋ of them defeats unique decoding.
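The Singleton-bound bullet can be verified exhaustively for a small Reed-Solomon code (the parameters here are chosen for speed, not taken from the demo):

```python
from itertools import product

P, N, K = 11, 5, 2   # small RS code over GF(11): n = 5, k = 2

def encode(data):
    """Evaluate the polynomial with coefficients `data` at x = 0..N-1."""
    return tuple(sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
                 for x in range(N))

# All 11^2 = 121 codewords, and the minimum pairwise Hamming distance.
codewords = [encode(d) for d in product(range(P), repeat=K)]
d_min = min(sum(a != b for a, b in zip(u, v))
            for i, u in enumerate(codewords) for v in codewords[:i])
print(d_min, N - K + 1)   # 4 4  (meets the Singleton bound, so the code is MDS)
```

The equality d = n − k + 1 falls out of the polynomial view: two distinct degree-<k polynomials can agree on at most k − 1 of the n evaluation points.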

Key takeaways

  • Redundancy is geometry: codewords sit far apart in Hamming space, and decoding picks the nearest one.
  • A code with minimum distance d detects d − 1 errors and corrects ⌊(d − 1)/2⌋.
  • Hamming(7,4) corrects any single-bit error; the syndrome literally names the position to flip.
  • Reed-Solomon generalizes to symbol errors over a finite field via polynomial interpolation — used in CDs, DVDs, QR codes, and deep-space links.
  • The Singleton bound k + d ≤ n + 1 is the universal speed limit; Reed-Solomon achieves it.