Simple function approximation, Littlewood's three principles, and Lusin's theorem
In calculus, we integrate continuous functions. In Lebesgue theory, the class of integrable functions is vastly larger — but not every function can be integrated. The right condition is measurability: a function f : X → ℝ is measurable (with respect to a σ-algebra F on X) if for every Borel set B ⊆ ℝ, the preimage f⁻¹(B) = {x ∈ X : f(x) ∈ B} belongs to F.
This preimage condition ensures that we can "measure" the set of points where f takes values in any reasonable range. In practice, it suffices to check the condition for sets of the form (−∞, a] for all a ∈ ℝ, since these generate the Borel σ-algebra. Every continuous function is measurable (preimages of open sets are open, hence Borel), but measurability is much more permissive: monotone functions, limits of continuous functions, and even the characteristic function of the rationals are all measurable.
Measurable functions are closed under the algebraic operations (sums, products, scalar multiples) and under pointwise limits. The closure under limits is the decisive advantage over continuous functions: if fₙ → f pointwise and each fₙ is measurable, then f is measurable. This stability under limits is exactly what is needed for a powerful integration theory.
The definition of measurability through preimages is both natural and powerful. A function f : (X, F) → (ℝ, B(ℝ)) is measurable if and only if any of the following equivalent conditions hold: (1) f⁻¹((a, ∞)) ∈ F for all a ∈ ℝ, (2) f⁻¹([a, ∞)) ∈ F for all a, (3) f⁻¹((−∞, a)) ∈ F for all a, or (4) f⁻¹((−∞, a]) ∈ F for all a. Any one of these implies the full Borel-preimage condition because these sets generate B(ℝ).
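As a sketch of why these conditions are interchangeable, each family can be built from the others using countable set operations, which F is closed under. For example, starting from condition (1):

```latex
f^{-1}\big([a,\infty)\big) \;=\; \bigcap_{n=1}^{\infty} f^{-1}\!\Big(\big(a - \tfrac{1}{n},\, \infty\big)\Big),
\qquad
f^{-1}\big((-\infty, a)\big) \;=\; X \setminus f^{-1}\big([a,\infty)\big),
```

and similarly f⁻¹((−∞, a]) = X \ f⁻¹((a, ∞)). Since F is a σ-algebra, countable intersections and complements of sets in F stay in F, so each family of preimages determines the others.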
The preimage perspective also makes compositions well-behaved: if f : X → ℝ is measurable and g : ℝ → ℝ is Borel measurable, then g ∘ f is measurable. This is because (g ∘ f)⁻¹(B) = f⁻¹(g⁻¹(B)), and g⁻¹(B) is Borel when g is Borel measurable and B is Borel. In particular, |f|, f², max(f, 0), and min(f, 0) are all measurable when f is.
For extended real-valued functions (taking values in [−∞, +∞]), measurability is defined analogously, and the sets f⁻¹({+∞}) and f⁻¹({−∞}) must be measurable. This extension is essential for dealing with suprema, infima, and limits of sequences of functions, which can naturally take infinite values.
A simple function is a measurable function that takes only finitely many values. Every simple function can be written as φ = Σᵢ₌₁ⁿ aᵢ · χ_{Eᵢ}, where a₁, …, aₙ are distinct real numbers and E₁, …, Eₙ are pairwise disjoint measurable sets whose union is X. Here χ_E denotes the characteristic (indicator) function of E.
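As a concrete (and deliberately limited) sketch, a simple function on X = [0, 1) can be encoded as a list of (value, set) pairs; here the sets Eᵢ are taken to be disjoint half-open intervals for illustration, although in general they may be arbitrary measurable sets:

```python
# A simple function in canonical form phi = sum_i a_i * chi_{E_i}:
# distinct values a_i paired with pairwise disjoint sets E_i covering X = [0, 1).
# Encoding each E_i as a half-open interval [lo, hi) is an illustrative assumption.
phi = [
    (2.0, (0.0, 0.25)),   # phi = 2 on [0, 0.25)
    (-1.0, (0.25, 0.5)),  # phi = -1 on [0.25, 0.5)
    (0.0, (0.5, 1.0)),    # phi = 0 on [0.5, 1)
]

def evaluate(phi, x):
    """Evaluate phi(x) by locating the unique piece E_i that contains x."""
    for value, (lo, hi) in phi:
        if lo <= x < hi:
            return value
    raise ValueError(f"{x} is not in the domain")

print(evaluate(phi, 0.1))   # 2.0
print(evaluate(phi, 0.3))   # -1.0
```

Disjointness of the Eᵢ is what makes `evaluate` well-defined: each point lies in exactly one piece.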
Simple functions play the role in Lebesgue theory that step functions play in Riemann integration. Their integral is defined in the obvious way: ∫ φ dμ = Σᵢ aᵢ · μ(Eᵢ). This definition is unambiguous and satisfies all the linearity and monotonicity properties we expect. The key difference from Riemann step functions is that the sets Eᵢ can be any measurable sets, not just intervals.
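The finite-sum formula is easy to make concrete. In this hedged sketch the sets Eᵢ are encoded as half-open intervals (an illustrative assumption), so that Lebesgue measure is just μ([lo, hi)) = hi − lo:

```python
# Integral of a simple function: int phi dmu = sum_i a_i * mu(E_i).
# Each E_i is encoded as a half-open interval [lo, hi) for concreteness,
# so mu(E_i) = hi - lo under Lebesgue measure.
phi = [(2.0, (0.0, 0.25)), (-1.0, (0.25, 0.5)), (0.0, (0.5, 1.0))]

def integral(phi):
    """Compute sum_i a_i * mu(E_i) for an interval-encoded simple function."""
    return sum(value * (hi - lo) for value, (lo, hi) in phi)

print(integral(phi))   # 2*0.25 + (-1)*0.25 + 0*0.5 = 0.25
```

For general measurable sets Eᵢ only the computation of μ(Eᵢ) changes; the sum itself is the same.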
The approximation theorem states that every non-negative measurable function f is the pointwise limit of an increasing sequence of simple functions: 0 ≤ φ₁ ≤ φ₂ ≤ … ≤ f with φₙ → f pointwise. If f is bounded, the convergence is uniform. This approximation from below is the bridge that connects simple functions to the full Lebesgue integral — we define ∫ f dμ = limₙ ∫ φₙ dμ.
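The standard proof uses the dyadic construction φₙ = min(⌊2ⁿf⌋/2ⁿ, n), which rounds f down to the nearest multiple of 2⁻ⁿ and truncates at height n. A minimal Python sketch, where the test function f(x) = x² and the sample grid are illustrative choices:

```python
import math

def phi_n(f, n):
    """Standard dyadic approximation phi_n = min(floor(2^n f)/2^n, n).
    Each phi_n is simple (finitely many values) and phi_n increases to f."""
    def phi(x):
        return min(math.floor(2**n * f(x)) / 2**n, n)
    return phi

f = lambda x: x * x                      # a non-negative measurable function on [0, 1]
xs = [i / 100 for i in range(101)]       # sample grid for the checks below

for n in (1, 2, 3, 8):
    p = phi_n(f, n)
    assert all(phi_n(f, n)(x) <= phi_n(f, n + 1)(x) for x in xs)  # increasing in n
    assert all(p(x) <= f(x) for x in xs)                          # approximates from below
    assert max(f(x) - p(x) for x in xs) < 2**-n                   # error < 2^-n where f < n
```

The last assertion is the uniform-convergence statement for bounded f: once n exceeds sup f, the truncation at height n is inactive and the error is everywhere below 2⁻ⁿ.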
J.E. Littlewood articulated three guiding principles that capture the essence of Lebesgue's theory: (1) every measurable set is "nearly" a finite union of intervals, (2) every measurable function is "nearly" continuous, and (3) every convergent sequence of measurable functions "nearly" converges uniformly. The word "nearly" is made precise by allowing exceptions on sets of arbitrarily small measure.
The second principle is formalized by Lusin's theorem: if f : [a, b] → ℝ is measurable and ε > 0, there exists a closed set F ⊆ [a, b] with m([a, b] \ F) < ε such that f restricted to F is continuous. In other words, a measurable function is continuous except on a set of arbitrarily small measure. This does not mean f is "almost everywhere continuous" — the exceptional set changes with ε — but it means measurable functions are never far from being continuous.
The third principle is Egoroff's theorem: if fₙ → f pointwise almost everywhere on a set E of finite measure, then for every ε > 0, there exists a measurable subset F ⊆ E with m(E \ F) < ε such that fₙ → f uniformly on F. Egoroff's theorem is a powerful tool because uniform convergence is much easier to work with than pointwise convergence — and Egoroff tells us that pointwise convergence is "almost" uniform.
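A standard illustration (chosen here for concreteness) is fₙ(x) = xⁿ on E = [0, 1): the sequence converges pointwise to 0, but sup over [0, 1) of xⁿ is 1 for every n, so convergence is not uniform on all of E. Discarding a sliver (1 − δ, 1) of measure δ restores uniform convergence, since sup over [0, 1 − δ] is (1 − δ)ⁿ → 0:

```python
# Egoroff's theorem illustrated with f_n(x) = x^n on E = [0, 1):
# removing the set (1 - delta, 1) of measure delta makes the
# convergence f_n -> 0 uniform on the remaining set [0, 1 - delta].

def sup_error(n, right_endpoint, samples=1000):
    """Approximate sup of |x^n - 0| over [0, right_endpoint] on a finite grid.
    Since x^n is increasing, the sup is attained at the right endpoint."""
    xs = [right_endpoint * i / samples for i in range(samples + 1)]
    return max(x**n for x in xs)

delta = 0.1
for n in (10, 50, 100):
    print(n, sup_error(n, 1 - delta))   # (1 - delta)^n, shrinking toward 0
```

The δ here plays the role of the measure bound m(E \ F) < ε in the theorem: the exceptional sliver can be made as small as desired, at the cost of slower uniform convergence on what remains.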
The theory of measurable functions is the foundation on which the Lebesgue integral is built. The construction proceeds in stages: first define the integral for simple functions (a finite sum), then extend to non-negative measurable functions (via approximation by simple functions), and finally to general measurable functions (by writing f = f⁺ − f⁻ and integrating each part separately).
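The decomposition in the final stage is f⁺ = max(f, 0) and f⁻ = max(−f, 0); both parts are non-negative and measurable, with f = f⁺ − f⁻ and |f| = f⁺ + f⁻. A small sketch checking these identities pointwise, with f(x) = x³ − x as an illustrative sign-changing function:

```python
# Positive/negative part decomposition: f+ = max(f, 0), f- = max(-f, 0).
# The integral of f is then defined as int f+ dmu - int f- dmu,
# provided at least one of the two integrals is finite.

def pos_part(f):
    return lambda x: max(f(x), 0.0)

def neg_part(f):
    return lambda x: max(-f(x), 0.0)

f = lambda x: x**3 - x            # changes sign on [-2, 2]
fp, fn = pos_part(f), neg_part(f)

for x in (-1.5, -0.5, 0.5, 1.5):
    assert abs(fp(x) - fn(x) - f(x)) < 1e-12        # f = f+ - f-
    assert abs(fp(x) + fn(x) - abs(f(x))) < 1e-12   # |f| = f+ + f-
    assert fp(x) >= 0 and fn(x) >= 0                # both parts non-negative
```

At each point at most one of f⁺, f⁻ is nonzero, which is why the decomposition is the minimal way to write f as a difference of non-negative functions.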
The closure of measurable functions under limits is the key property that makes the convergence theorems (Monotone Convergence, Dominated Convergence) possible. Without it, we could not guarantee that the limit of a sequence of integrable functions is even measurable, let alone integrable. This is the fundamental reason Lebesgue's theory is superior to Riemann's for the purposes of modern analysis.
In probability theory, measurable functions are called random variables. The preimage condition ensures that events like {X ≤ a} are in the σ-algebra of events and can be assigned probabilities. The distribution of a random variable is the pushforward measure μ_X(B) = P(X⁻¹(B)), and expectations are Lebesgue integrals. Measure theory and probability are thus two faces of the same coin.
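The pushforward construction is easiest to see on a finite sample space, where all sums are finite. In this hedged sketch the space Ω, the measure P, and the random variable X are arbitrary illustrative choices; the point is that E[X] computed on Ω agrees with the integral against the distribution μ_X (the change-of-variables formula):

```python
# Pushforward measure on a finite sample space: mu_X(B) = P(X in B),
# and the expectation can be computed either on Omega or against mu_X.
from collections import defaultdict

P = {"a": 0.5, "b": 0.25, "c": 0.25}   # probability measure on Omega = {a, b, c}
X = {"a": 1.0, "b": 1.0, "c": 3.0}     # random variable X : Omega -> R

# distribution of X: push each point mass forward to the value X takes there
mu_X = defaultdict(float)
for omega, p in P.items():
    mu_X[X[omega]] += p

e_omega = sum(X[w] * P[w] for w in P)            # E[X] as an integral over Omega
e_range = sum(v * m for v, m in mu_X.items())    # E[X] as an integral against mu_X
print(dict(mu_X), e_omega)                       # {1.0: 0.75, 3.0: 0.25} 1.5
```

For a general random variable the finite sums become Lebesgue integrals, but the identity ∫ X dP = ∫ x dμ_X(x) is the same.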