Understand fundamental limits: decision trees, entropy, and why comparison-based sorting requires Ω(n log n) comparisons
How many questions do you need to find a needle in a haystack? Information theory gives us a precise answer: the entropy of the probability distribution. This beautiful connection between information and computation reveals fundamental limits on what algorithms can achieve.
In this section, you'll discover why comparison-based sorting must be Ω(n log n), how to build optimal binary search trees for non-uniform access patterns, and witness an adversary prove lower bounds by choosing the worst-case answers.
Every comparison-based algorithm is a decision tree: internal nodes are comparisons, and leaves are outcomes. With n! possible orderings, a sorting tree needs depth at least ⌈log₂(n!)⌉.
Consider binary search on a sorted array of 7 elements: each comparison eliminates half of the remaining possibilities, so the decision tree has depth ⌈log₂(7)⌉ = 3.
With 4 possible outcomes, any algorithm needs at least ⌈log₂(4)⌉ = 2 comparisons in the worst case. Each comparison gives 1 bit of information, and we need enough bits to distinguish all outcomes.
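To make the counting argument concrete, here is a minimal Python sketch (the helper name `comparison_lower_bound` is ours, not from the text) that computes ⌈log₂(#outcomes)⌉ for the examples above and for sorting:

```python
import math

def comparison_lower_bound(num_outcomes):
    """Worst-case number of yes/no comparisons needed to distinguish
    num_outcomes possible answers: ceil(log2(num_outcomes))."""
    return math.ceil(math.log2(num_outcomes))

print(comparison_lower_bound(7))                   # searching among 7 positions -> 3
print(comparison_lower_bound(4))                   # 4 possible outcomes -> 2
print(comparison_lower_bound(math.factorial(10)))  # sorting 10 items: 10! orderings -> 22
```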
Entropy H(X) = -Σ p(x) log₂ p(x) measures the average "bits of surprise" in a distribution: low entropy means predictable, high entropy means unpredictable. It is also the theoretical minimum for the expected number of yes/no questions needed to identify an outcome.

The surprise of a single outcome is its self-information, I(x) = -log₂ p(x); entropy is the expected value of this quantity over the distribution.
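As a quick illustration (a sketch, with example distributions chosen by us), the entropy of a uniform distribution over four outcomes is exactly 2 bits, matching the two comparisons needed above, while a skewed distribution carries less surprise per outcome:

```python
import math

def entropy(probs):
    """H(X) = -sum of p(x) * log2 p(x): the average bits of surprise."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes: 2.0 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: about 1.36 bits, more predictable
```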
When some keys are accessed more frequently, a balanced BST isn't optimal. Dynamic programming finds the tree minimizing expected search cost — frequently accessed keys should be near the root.
| i \ j | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0.10 | 0.40 | 1.10 | 1.50 | 1.80 |
| B | - | 0.20 | 0.80 | 1.20 | 1.50 |
| C | - | - | 0.40 | 0.80 | 1.10 |
| D | - | - | - | 0.20 | 0.40 |
| E | - | - | - | - | 0.10 |
cost[i][j] = minimum expected search cost for keys i through j. The diagonal entries cost[i][i] are simply the access probabilities: p(A) = 0.10, p(B) = 0.20, p(C) = 0.40, p(D) = 0.20, p(E) = 0.10.
Dynamic programming: for each subproblem (keys i..j), try each key k as the root. Cost = cost(i..k-1) + cost(k+1..j) + P(i..j), where P(i..j) is the total probability of the range: placing a root pushes every key in the range one level deeper, so the whole range pays one extra expected comparison.
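The recurrence translates directly into a short Python sketch. The probabilities below are read off the table's diagonal (our assumption about how the table was built); running it reproduces the 1.80 entry for the full range A..E, with C at the root.

```python
import math

keys = ["A", "B", "C", "D", "E"]
p    = [0.10, 0.20, 0.40, 0.20, 0.10]   # access probabilities (the table's diagonal)
n = len(keys)

# cost[i][j]: minimum expected search cost for keys i..j; empty ranges cost 0.
cost = [[0.0] * n for _ in range(n)]
best_root = [[None] * n for _ in range(n)]

def c(i, j):
    """cost[i][j], treating i > j as an empty range with cost 0."""
    return cost[i][j] if i <= j else 0.0

for length in range(1, n + 1):            # solve subproblems in order of range length
    for i in range(n - length + 1):
        j = i + length - 1
        total_p = sum(p[i:j + 1])         # P(i..j): every key pays one comparison at the root
        best = math.inf
        for k in range(i, j + 1):         # try each key k as the root
            candidate = c(i, k - 1) + c(k + 1, j) + total_p
            if candidate < best:
                best, best_root[i][j] = candidate, keys[k]
        cost[i][j] = best

print(round(cost[0][n - 1], 2), best_root[0][n - 1])   # 1.8 C
```

Printing the whole cost matrix should recover the table above row by row.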
An adversary proves a lower bound by answering each query so as to keep as many possibilities alive as possible. To prove sorting requires Ω(n log n) comparisons, imagine an adversary who answers each comparison to maximize the number of permutations still consistent with its answers; any algorithm must ask enough questions to distinguish all n! permutations.
For each comparison, the adversary counts how many remaining permutations have element i before j and how many have j before i, then gives whichever answer eliminates fewer possibilities.
This forces any algorithm to make at least ⌈log₂(n!)⌉ comparisons, because each comparison can eliminate at most half the remaining possibilities. Since log₂(n!) = Θ(n log n) by Stirling's approximation, this is exactly the Ω(n log n) bound.
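Here is a small Python sketch of that adversary (names such as `adversary_answer` are ours): it keeps the set of permutations consistent with its answers so far and, for each comparison, keeps the larger half, so the algorithm gains at most one bit per question.

```python
import itertools
import math

def adversary_answer(consistent, i, j):
    """Answer the comparison 'does element i come before element j?' so that
    as many permutations as possible stay consistent with all answers so far."""
    before = [perm for perm in consistent if perm.index(i) < perm.index(j)]
    after  = [perm for perm in consistent if perm.index(i) > perm.index(j)]
    return (True, before) if len(before) >= len(after) else (False, after)

n = 4
consistent = list(itertools.permutations(range(n)))   # all n! candidate orderings
print(len(consistent), "orderings; lower bound =",
      math.ceil(math.log2(math.factorial(n))), "comparisons")

# Whatever the algorithm's first question is, at least half the orderings survive.
answer, consistent = adversary_answer(consistent, 0, 1)
print(answer, "-", len(consistent), "orderings remain after one comparison")
```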
You now understand the deep connection between information and algorithms:

- Every comparison-based algorithm is a decision tree, so distinguishing N outcomes needs depth at least ⌈log₂(N)⌉; for sorting, N = n! gives the Ω(n log n) bound.
- Entropy H(X) is the theoretical minimum for the expected number of yes/no questions needed to identify an outcome.
- Dynamic programming builds the optimal binary search tree for a non-uniform access distribution.
- Adversary arguments prove lower bounds by answering each query to keep as many possibilities alive as possible.
Next: We'll explore probabilistic data structures like Bloom filters, skip lists, and HyperLogLog — trading certainty for massive space savings.