Continuous entropy h(X) = −∫ f log f. Why the Gaussian is nature's default distribution.
Shannon's entropy was defined for discrete random variables. To push the same idea into the continuous world we replace the sum with an integral and arrive at differential entropy: for a random variable X with density f(x), h(X) = −∫ f(x) log f(x) dx, with log taken base 2 throughout so that h is measured in bits. The shape of the formula is identical to its discrete cousin, but its behaviour is meaningfully different: h can be negative, and its value depends on the units you measure x in, because rescaling X to cX adds log c to the entropy.
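To see both claims in numbers, here is a minimal sketch (plain NumPy, not tied to the article's interactive code) that estimates h(X) for a standard Gaussian on a grid, then re-expresses the same variable in coarser units and watches the entropy go negative:

```python
import numpy as np

def differential_entropy_bits(f, lo, hi, n=200_000):
    """Estimate h(X) = -integral of f(x) log2 f(x) dx on [lo, hi] with a Riemann sum."""
    x, dx = np.linspace(lo, hi, n, retstep=True)
    p = f(x)
    integrand = np.where(p > 0, -p * np.log2(p), 0.0)  # treat 0 * log 0 as 0
    return integrand.sum() * dx

gauss = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
print(differential_entropy_bits(gauss, -10, 10))            # ~ 2.0471 bits

# Units matter: if Y = X/10 (the same quantity in coarser units),
# h(Y) = h(X) - log2(10), which is already negative here.
scaled = lambda y: 10 * gauss(10 * y)                       # density of X/10
print(differential_entropy_bits(scaled, -1, 1))             # ~ 2.0471 - 3.3219 = -1.27
```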
Differential entropy gets interesting through the maximum-entropy principle. Among all densities satisfying a given set of constraints, the one that maximizes h(X) is, in a precise sense, the "least biased" choice — it assumes nothing beyond the constraints themselves. Three classic results follow: the uniform distribution is max-entropy on a bounded interval, the exponential is max-entropy on the positive reals at a fixed mean, and the Gaussian is max-entropy on the whole real line at a fixed mean and variance. That last result is one of the deepest reasons Gaussians appear everywhere — in measurement noise, in statistical models, in the central limit theorem, in physics.
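All three results come with clean closed forms: h = log₂(b − a) for the uniform on [a, b], h = log₂(eμ) for the exponential with mean μ, and h = ½ log₂(2πeσ²) for the Gaussian. A quick sketch cross-checking them against scipy.stats, whose .entropy() method returns differential entropy in nats (the values of a, b, μ, σ below are arbitrary, chosen for illustration):

```python
import numpy as np
from scipy import stats

LN2 = np.log(2)  # scipy's .entropy() is in nats; divide by ln 2 to get bits

a, b, mu, sigma = 0.0, 4.0, 1.5, 1.0

checks = [
    ("uniform on [a, b]",    stats.uniform(loc=a, scale=b - a), np.log2(b - a)),
    ("exponential, mean mu", stats.expon(scale=mu),             np.log2(np.e * mu)),
    ("gaussian, sd sigma",   stats.norm(scale=sigma),
                             0.5 * np.log2(2 * np.pi * np.e * sigma**2)),
]
for name, dist, closed_form_bits in checks:
    scipy_bits = float(dist.entropy()) / LN2
    print(f"{name:22s} scipy: {scipy_bits:.4f}  closed form: {closed_form_bits:.4f} bits")
```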
The classic max-entropy density on the real line at fixed mean and variance. Push σ below 1/√(2πe) ≈ 0.242 to see h(X) go negative.
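That threshold falls straight out of the Gaussian's closed-form entropy:

```latex
h(X) = \tfrac{1}{2}\log_2\!\bigl(2\pi e\sigma^2\bigr) < 0
\iff 2\pi e\sigma^2 < 1
\iff \sigma < \frac{1}{\sqrt{2\pi e}} \approx 0.2420.
```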
Three different constraint sets, three different max-entropy distributions. Adjust each panel's constraint and watch its PDF (and entropy) change.
Each density is the unique max-entropy choice given its constraint. Replace any of these distributions with a different shape that meets the same constraint, and h(X) will be strictly smaller — that is what "maximum entropy" literally means.
All three densities have mean 0 and variance 1. The Gaussian wins — on the real line at fixed (μ, σ²), no other density has a higher differential entropy.
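As a concrete check (the non-Gaussian competitors here, a standardized Laplace and a standardized uniform, are assumptions chosen for illustration; any other mean-0, variance-1 density would also land below the Gaussian):

```python
import numpy as np

# Closed-form differential entropies (bits) of three mean-0, variance-1 densities.
# Laplace and uniform are illustrative stand-ins, not necessarily the figure's exact panel.
h_gauss   = 0.5 * np.log2(2 * np.pi * np.e)   # N(0, 1):                  ~ 2.0471 bits
h_laplace = np.log2(2 * np.e / np.sqrt(2))    # Laplace, scale b = 1/sqrt(2): ~ 1.9427 bits
h_uniform = np.log2(np.sqrt(12.0))            # Uniform, width sqrt(12):  ~ 1.7925 bits
print(h_gauss, h_laplace, h_uniform)          # the Gaussian is strictly on top
```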
Mixture (1 − t)·Gaussian + t·Laplace, dashed white. At t = 0 the mixture is the pure Gaussian and h(X) = 2.0471 bits; as t grows, h(X) stays at or below that value.
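A numerical check of that claim, sketched under the assumption that the Laplace component is standardized to mean 0 and variance 1 (scale b = 1/√2), so the mixture keeps unit variance for every t:

```python
import numpy as np

x = np.linspace(-20, 20, 400_001)
dx = x[1] - x[0]
gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
b = 1 / np.sqrt(2)                                      # Laplace scale for variance 1
laplace = np.exp(-np.abs(x) / b) / (2 * b)

for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p = (1 - t) * gauss + t * laplace                   # still mean 0, variance 1
    h = np.sum(np.where(p > 0, -p * np.log2(p), 0.0)) * dx
    print(f"t={t:.2f}  h={h:.4f} bits")                 # 2.0471 at t=0, strictly less after
```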
This is the maximum-entropy theorem in concrete form. Constrain the first two moments of a real-valued random variable, and the unique density that maximizes h(X) is the Gaussian. Anything else with the same mean and variance — heavier tails, lighter tails, compact support — has strictly less entropy.
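The proof is two lines and worth seeing. For any density g with mean μ and variance σ², write φ for the N(μ, σ²) density. Because ln φ(x) is a quadratic in x, the expectation ∫ g ln φ is pinned down by g's first two moments, so it equals ∫ φ ln φ = −h(φ). Non-negativity of the KL divergence then gives

```latex
0 \;\le\; D(g\,\|\,\varphi)
  \;=\; \int g \ln\frac{g}{\varphi}
  \;=\; -h(g) \;-\; \int g \ln\varphi
  \;=\; -h(g) \;+\; h(\varphi),
```

so h(g) ≤ h(φ), with equality exactly when g = φ. The base of the logarithm only rescales both sides, so the inequality holds in bits just as it does in nats.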