Understand fundamental limits: decision trees, entropy, and why comparison-based sorting requires Ω(n log n) comparisons
How many questions do you need to find a needle in a haystack? Information theory gives us a precise answer: the entropy of the probability distribution. This beautiful connection between information and computation reveals fundamental limits on what algorithms can achieve.
In this section, you'll discover why comparison-based sorting must be Ω(n log n), how to build optimal binary search trees for non-uniform access patterns, and witness an adversary prove lower bounds by choosing the worst-case answers.
Every comparison-based algorithm is a decision tree: internal nodes are comparisons, and leaves are outcomes. With n! possible orderings, a sorting tree needs depth at least ⌈log₂(n!)⌉.
Consider binary search on a sorted array of 7 elements: each comparison eliminates half of the remaining possibilities, so the decision tree has depth ⌈log₂(7)⌉ = 3.
With 4 possible outcomes, any algorithm needs at least ⌈log₂(4)⌉ = 2 comparisons in the worst case. Each comparison gives 1 bit of information, and we need enough bits to distinguish all outcomes.
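To make the counting argument concrete, here is a minimal Python sketch (the helper name `comparison_lower_bound` is ours, not from the text) that computes ⌈log₂(#outcomes)⌉ for the examples above and for sorting:

```python
import math

def comparison_lower_bound(num_outcomes):
    """Worst-case number of yes/no comparisons needed to distinguish
    num_outcomes possible answers: ceil(log2(num_outcomes))."""
    return math.ceil(math.log2(num_outcomes))

print(comparison_lower_bound(7))                   # searching among 7 positions -> 3
print(comparison_lower_bound(4))                   # 4 possible outcomes -> 2
print(comparison_lower_bound(math.factorial(10)))  # sorting 10 items: 10! orderings -> 22
```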
Entropy H(X) = -Σ p(x) log₂ p(x) measures the average "bits of surprise" in a distribution: low entropy means predictable, high entropy means unpredictable. It is also the theoretical minimum for the expected number of yes/no questions needed to identify an outcome.

The surprise of a single outcome is its self-information, I(x) = -log₂ p(x); entropy is the expected value of this quantity over the distribution.
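As a quick illustration (a sketch, with example distributions chosen by us), the entropy of a uniform distribution over four outcomes is exactly 2 bits, matching the two comparisons needed above, while a skewed distribution carries less surprise per outcome:

```python
import math

def entropy(probs):
    """H(X) = -sum of p(x) * log2 p(x): the average bits of surprise."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes: 2.0 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: about 1.36 bits, more predictable
```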
When some keys are accessed more frequently, a balanced BST isn't optimal. Dynamic programming finds the tree minimizing expected search cost — frequently accessed keys should be near the root.
| i \ j | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0.10 | 0.40 | 1.10 | 1.50 | 1.80 |
| B | - | 0.20 | 0.80 | 1.20 | 1.50 |
| C | - | - | 0.40 | 0.80 | 1.10 |
| D | - | - | - | 0.20 | 0.40 |
| E | - | - | - | - | 0.10 |
cost[i][j] = minimum expected search cost for keys i through j. The diagonal entries cost[i][i] are simply the access probabilities: p(A) = 0.10, p(B) = 0.20, p(C) = 0.40, p(D) = 0.20, p(E) = 0.10.
Dynamic programming: for each subproblem (keys i..j), try each key k as the root. Cost = cost(i..k-1) + cost(k+1..j) + P(i..j), where P(i..j) is the total probability of the range: placing a root pushes every key in the range one level deeper, so the whole range pays one extra expected comparison.
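The recurrence translates directly into a short Python sketch. The probabilities below are read off the table's diagonal (our assumption about how the table was built); running it reproduces the 1.80 entry for the full range A..E, with C at the root.

```python
import math

keys = ["A", "B", "C", "D", "E"]
p    = [0.10, 0.20, 0.40, 0.20, 0.10]   # access probabilities (the table's diagonal)
n = len(keys)

# cost[i][j]: minimum expected search cost for keys i..j; empty ranges cost 0.
cost = [[0.0] * n for _ in range(n)]
best_root = [[None] * n for _ in range(n)]

def c(i, j):
    """cost[i][j], treating i > j as an empty range with cost 0."""
    return cost[i][j] if i <= j else 0.0

for length in range(1, n + 1):            # solve subproblems in order of range length
    for i in range(n - length + 1):
        j = i + length - 1
        total_p = sum(p[i:j + 1])         # P(i..j): every key pays one comparison at the root
        best = math.inf
        for k in range(i, j + 1):         # try each key k as the root
            candidate = c(i, k - 1) + c(k + 1, j) + total_p
            if candidate < best:
                best, best_root[i][j] = candidate, keys[k]
        cost[i][j] = best

print(round(cost[0][n - 1], 2), best_root[0][n - 1])   # 1.8 C
```

Printing the whole cost matrix should recover the table above row by row.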
An adversary proves a lower bound by answering each query so as to keep as many possibilities alive as possible. To prove sorting requires Ω(n log n) comparisons, imagine an adversary who answers each comparison to maximize the number of permutations still consistent with its answers; any algorithm must ask enough questions to distinguish all n! permutations.
For each comparison, the adversary counts how many remaining permutations have element i before j and how many have j before i, then gives whichever answer eliminates fewer possibilities.
This forces any algorithm to make at least ⌈log₂(n!)⌉ comparisons, because each comparison can eliminate at most half the remaining possibilities. Since log₂(n!) = Θ(n log n) by Stirling's approximation, this is exactly the Ω(n log n) bound.
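Here is a small Python sketch of that adversary (names such as `adversary_answer` are ours): it keeps the set of permutations consistent with its answers so far and, for each comparison, keeps the larger half, so the algorithm gains at most one bit per question.

```python
import itertools
import math

def adversary_answer(consistent, i, j):
    """Answer the comparison 'does element i come before element j?' so that
    as many permutations as possible stay consistent with all answers so far."""
    before = [perm for perm in consistent if perm.index(i) < perm.index(j)]
    after  = [perm for perm in consistent if perm.index(i) > perm.index(j)]
    return (True, before) if len(before) >= len(after) else (False, after)

n = 4
consistent = list(itertools.permutations(range(n)))   # all n! candidate orderings
print(len(consistent), "orderings; lower bound =",
      math.ceil(math.log2(math.factorial(n))), "comparisons")

# Whatever the algorithm's first question is, at least half the orderings survive.
answer, consistent = adversary_answer(consistent, 0, 1)
print(answer, "-", len(consistent), "orderings remain after one comparison")
```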
You now understand the deep connection between information and algorithms:

- Every comparison-based algorithm is a decision tree, so distinguishing N outcomes needs depth at least ⌈log₂(N)⌉; for sorting, N = n! gives the Ω(n log n) bound.
- Entropy H(X) is the theoretical minimum for the expected number of yes/no questions needed to identify an outcome.
- Dynamic programming builds the optimal binary search tree for a non-uniform access distribution.
- Adversary arguments prove lower bounds by answering each query to keep as many possibilities alive as possible.
Next: We'll explore probabilistic data structures like Bloom filters, skip lists, and HyperLogLog — trading certainty for massive space savings.