Information-Theoretic Bounds

Understand fundamental limits: decision trees, entropy, and why comparison-based sorting requires Ω(n log n) comparisons

Information Theory Meets Algorithms

How many questions do you need to find a needle in a haystack? Information theory gives us a precise answer: the entropy of the probability distribution. This beautiful connection between information and computation reveals fundamental limits on what algorithms can achieve.

In this section, you'll discover why comparison-based sorting must be Ω(n log n), how to build optimal binary search trees for non-uniform access patterns, and witness an adversary prove lower bounds by choosing the worst-case answers.

Demo 1: Decision Trees and Lower Bounds

Every comparison-based algorithm is a decision tree: internal nodes are comparisons, and leaves are outcomes. With n! possible orderings, a sorting tree needs depth at least ⌈log₂(n!)⌉.
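To see how fast that bound grows, here is a small Python sketch (illustrative, not part of the demo) that evaluates ⌈log₂(n!)⌉ exactly and compares it with n log₂ n:

```python
import math

def sorting_lower_bound(n: int) -> int:
    """Minimum worst-case comparisons for any comparison sort:
    a decision tree with n! leaves needs depth >= ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in [4, 8, 16, 64, 1024]:
    bound = sorting_lower_bound(n)
    nlogn = n * math.log2(n)
    print(f"n={n:5d}  ceil(log2(n!))={bound:6d}  n*log2(n)={nlogn:10.1f}")
```

The two columns track each other closely, which is exactly the Ω(n log n) statement in quantitative form.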


Binary search on a sorted array of 7 elements. Each comparison eliminates half the remaining possibilities. The tree depth is ⌈log₂(7)⌉ = 3.

[Interactive decision-tree visualization. Internal nodes are comparisons (x vs A[3] at the root, then x vs A[1] or x vs A[5]); leaves are outcomes (Found at 0, 2, 4, 6). Readouts for the state shown: max depth 2 (worst case), 4 outcomes (leaves), 3 comparisons (internal nodes), information bound log₂(leaves) = 2.00.]

Information-Theoretic Lower Bound

With 4 possible outcomes, any algorithm needs at least ⌈log₂(4)⌉ = 2 comparisons in the worst case. Each comparison gives 1 bit of information, and we need enough bits to distinguish all outcomes.
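To check the bound numerically, here is a small sketch (my own illustrative code, not the demo's) that counts midpoint probes in a standard binary search. Because it also accounts for the "not found" outcome, the worst case comes out to ⌈log₂(n+1)⌉ probes, which equals 3 for the 7-element array above:

```python
import math
from typing import Sequence

def binary_search_probes(a: Sequence[int], x: int) -> int:
    """Return the number of midpoint probes binary search makes for x."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] == x:
            return probes
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes  # x not present

for n in [7, 15, 100, 1000]:
    a = list(range(n))
    worst = max(binary_search_probes(a, x) for x in range(-1, n + 1))
    print(f"n={n:5d}  worst-case probes={worst}  ceil(log2(n+1))={math.ceil(math.log2(n + 1))}")
```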

Demo 2: Shannon Entropy

Entropy H(X) = -Σ p(x) log₂ p(x) measures the average "bits of surprise" in a distribution. It's the theoretical minimum for the expected number of yes/no questions to identify an outcome.

Shannon Entropy

H(X) = -Σ p(x) log₂ p(x) measures the average "surprise" or information content. Low entropy = predictable. High entropy = unpredictable.
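The formula translates directly into code. A minimal sketch (not part of the interactive demo):

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits.
    Terms with p(x) = 0 contribute nothing (the limit of p*log p is 0)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))   # ~0.469 bits: a biased, more predictable coin
print(entropy([0.25] * 4))   # 2.0 bits: four equally likely outcomes
print(entropy([1.0]))        # 0.0 bits: a certain outcome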

Binary Entropy Function

[Interactive plot of the binary entropy function H(p) for p from 0 to 1. At the slider setting p = 0.50, H(0.50) = 1.0000 bits, the maximum.]

General Entropy

With 4 outcomes at p(x) = 25.0% each (the uniform distribution), the current entropy is 2.0000 bits. The possible range runs from 0 bits (a certain outcome) to the maximum of 2.00 bits (uniform), so this setting achieves 100.0% of the maximum.

Information Content per Outcome

x = 0: I = 2.00 bits, p·I = 0.500
x = 1: I = 2.00 bits, p·I = 0.500
x = 2: I = 2.00 bits, p·I = 0.500
x = 3: I = 2.00 bits, p·I = 0.500

I(x) = -log₂(p(x)) = "bits of surprise" when x occurs
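The per-outcome lines above can be reproduced directly. A short sketch assuming the demo's uniform distribution over four outcomes:

```python
import math

p = [0.25, 0.25, 0.25, 0.25]  # the uniform distribution from the demo

total = 0.0
for x, px in enumerate(p):
    info = -math.log2(px)   # I(x): bits of surprise when x occurs
    total += px * info      # p(x) * I(x): contribution to the entropy
    print(f"x = {x}: I = {info:.2f} bits, p*I = {px * info:.3f}")
print(f"H(X) = sum of p*I = {total:.4f} bits")
```

The contributions sum to 2.0000 bits, matching the entropy reported above.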

Why This Matters for Algorithms

  • Expected comparisons in search ≥ entropy of distribution
  • Huffman coding achieves average length ≈ entropy (see the sketch after this list)
  • Sorting n items needs log₂(n!) ≈ n log₂ n bits minimum
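To make the Huffman bullet concrete, here is a compact, heap-based sketch with made-up probabilities; the function name and frequencies are mine, for illustration only. It builds Huffman code lengths and compares the average code length to the distribution's entropy:

```python
import heapq
import math

def huffman_code_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Build a Huffman tree with a min-heap and return each symbol's code length."""
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)  # tie-breaker so tuples never compare the dicts
    while len(heap) > 1:
        p1, _, lengths1 = heapq.heappop(heap)
        p2, _, lengths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**lengths1, **lengths2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical symbol probabilities, chosen only for illustration.
probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
H = -sum(p * math.log2(p) for p in probs.values())
print(lengths)
print(f"average code length = {avg_len:.3f} bits, entropy = {H:.3f} bits")
```

The average code length always lands at or slightly above the entropy, never below it.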

Demo 3: Optimal Binary Search Trees

When some keys are accessed more frequently, a balanced BST isn't optimal. Dynamic programming finds the tree minimizing expected search cost — frequently accessed keys should be near the root.

Access Frequencies

Key    Frequency    Probability
A      1            10%
B      2            20%
C      4            40%
D      2            20%
E      1            10%

Optimal BST structure: C at the root, with B and D as its children, A below B, and E below D.

Optimal BST cost: 1.800 expected comparisons
Balanced BST cost: 1.900 expected comparisons
That's a 5.3% savings by optimizing for access patterns!
DP Cost Table

i\j    A       B       C       D       E
A      0.10    0.40    1.10    1.50    1.80
B      -       0.20    0.80    1.20    1.50
C      -       -       0.40    0.80    1.10
D      -       -       -       0.20    0.40
E      -       -       -       -       0.10

cost[i][j] = minimum expected search cost for keys i through j

Dynamic Programming: For each subproblem (keys i..j), try each key k as the root. Cost = cost(i..k-1) + cost(k+1..j) + P(i..j), where P(i..j) is the total probability of the range; choosing a root puts every key in the range one comparison deeper, which adds P(i..j) to the expected cost.
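A minimal sketch of that recurrence in Python, using the demo's access probabilities (successful searches only; "miss" probabilities between keys are not modeled):

```python
def optimal_bst_cost(p: list[float]) -> list[list[float]]:
    """cost[i][j] = minimum expected comparisons to search among keys i..j,
    where p[k] is the access probability of key k (successful searches only)."""
    n = len(p)
    prefix = [0.0]                                 # prefix sums so P(i..j) is O(1)
    for pk in p:
        prefix.append(prefix[-1] + pk)
    cost = [[0.0] * n for _ in range(n)]
    for length in range(1, n + 1):                 # subproblem size
        for i in range(0, n - length + 1):
            j = i + length - 1
            weight = prefix[j + 1] - prefix[i]     # P(i..j)
            best = float("inf")
            for k in range(i, j + 1):              # try each key as the root
                left = cost[i][k - 1] if k > i else 0.0
                right = cost[k + 1][j] if k < j else 0.0
                best = min(best, left + right + weight)
            cost[i][j] = best
    return cost

p = [0.10, 0.20, 0.40, 0.20, 0.10]   # A..E from the demo
cost = optimal_bst_cost(p)
print(f"optimal expected comparisons for A..E: {cost[0][4]:.3f}")  # 1.800
```

The entries of the returned table match the DP cost table above, and the top-right corner reproduces the 1.800 expected comparisons.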

Demo 4: The Adversary Argument

An adversary proves lower bounds by answering each query to maximize the remaining possibilities. If the adversary can always keep at least half of the remaining orderings, any algorithm needs ⌈log₂(n!)⌉ comparisons to sort.

Adversary Argument

To prove sorting requires Ω(n log n) comparisons, imagine an adversary who answers each comparison to maximize the number of remaining possibilities. Any algorithm must ask enough questions to distinguish all n! permutations.

For n = 3 elements there are 3! = 6 permutations; log₂(3!) ≈ 2.58, so the lower bound is ⌈log₂(3!)⌉ = 3 comparisons.

[Interactive demo: choose a pair of elements to compare and the adversary answers; a counter tracks the remaining possibilities. Initially all 6 permutations are possible (2.58 bits of uncertainty): [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,2,1], [3,1,2].]

The Adversary's Strategy

For each comparison, the adversary counts how many of the remaining permutations place element i before j and how many place j before i. It answers with whichever outcome eliminates fewer possibilities, i.e., keeps the larger set consistent.

This forces any algorithm to make at least ⌈log₂(n!)⌉ comparisons, because each comparison can eliminate at most half the remaining possibilities.
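Here is a small simulation of that strategy for n = 3, written as an illustrative sketch (not the demo's code): the adversary tracks every ordering consistent with its answers so far and always retains the larger half.

```python
from itertools import permutations
import math

def adversary_answer(consistent: list[tuple], i: int, j: int):
    """Answer 'is element i before element j?' so that as many
    orderings as possible remain consistent."""
    yes = [p for p in consistent if p.index(i) < p.index(j)]
    no = [p for p in consistent if p.index(i) > p.index(j)]
    return (True, yes) if len(yes) >= len(no) else (False, no)

n = 3
consistent = list(permutations(range(n)))
print(f"start: {len(consistent)} orderings, bound = ceil(log2(n!)) = "
      f"{math.ceil(math.log2(math.factorial(n)))}")

# A simple algorithm asks about these (hypothetical) pairs in turn:
for i, j in [(0, 1), (1, 2), (0, 2)]:
    answer, consistent = adversary_answer(consistent, i, j)
    print(f"is {i} before {j}? {answer}  ->  {len(consistent)} orderings remain")
```

After each answer at most half of the orderings are eliminated, so three comparisons are needed before a single ordering remains, matching ⌈log₂(3!)⌉ = 3.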

Information-Theoretic Tools Acquired!

You now understand the deep connection between information and algorithms:

  • Decision trees model comparison-based algorithms
  • Entropy bounds expected comparisons from below
  • Optimal BSTs minimize cost for non-uniform distributions
  • Adversary arguments prove lower bounds constructively

Next: We'll explore probabilistic data structures like Bloom filters, skip lists, and HyperLogLog — trading certainty for massive space savings.