Posterior equals likelihood times prior, normalized. Watch beliefs update as evidence arrives.
Bayesian inference treats unknown parameters as random variables and uses probability to encode what you believe about them. Start with a prior P(θ) — your beliefs before seeing data. Observe data D, and compute the likelihood P(D | θ) — how plausible the data is for each θ. Bayes' rule combines them into the posterior:
P(θ | D) = P(D | θ) · P(θ) / P(D)
The denominator P(D) — the evidence — just normalizes things; the shape of the posterior is set by the numerator. For a handful of special prior–likelihood pairs called conjugate families, the posterior lives in the same family as the prior, and the update is a one-line formula. The first two demos below show the classic cases: Beta–Binomial for proportions, and Normal–Normal for an unknown mean. The third shows that a familiar workhorse — ridge regression — is just a MAP estimate in disguise.
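To see the mechanics before any conjugate shortcut, here is a minimal grid sketch for a coin's heads-probability: multiply likelihood by prior pointwise, then divide by the sum, which plays the role of P(D). The grid, flat prior, and 7-heads-in-10-flips data are illustrative, not part of the demo.

```python
import numpy as np

# Grid approximation of Bayes' rule for a coin's heads-probability theta.
# Illustrative sketch: the grid, the flat prior, and the data are made up.
theta = np.linspace(0.001, 0.999, 999)            # candidate values of theta
prior = np.ones_like(theta) / theta.size          # flat prior P(theta), sums to 1

heads, tails = 7, 3                                # observed data D
likelihood = theta**heads * (1 - theta)**tails     # P(D | theta), up to a constant

unnorm = likelihood * prior                        # numerator of Bayes' rule
evidence = unnorm.sum()                            # P(D): just the normalizer
posterior = unnorm / evidence                      # P(theta | D), sums to 1

print(theta[np.argmax(posterior)])                 # mode ≈ 0.7, the empirical proportion
```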
Each flip moves the posterior parameters by exactly one: a head bumps α, a tail bumps β. With no data the posterior equals the prior. As evidence accumulates the curve narrows and centres on the empirical proportion — the prior's influence dwindles. Try a confident prior like Beta(8, 8) and watch how many flips it takes for the posterior to forget it.
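The demo's update rule fits in a few lines. A minimal sketch, assuming the Beta(8, 8) prior suggested above and a made-up flip sequence (scipy is used only to query the resulting posterior):

```python
from scipy.stats import beta

# Conjugate Beta-Binomial update: each head bumps alpha, each tail bumps beta.
a, b = 8.0, 8.0                             # confident prior Beta(8, 8)
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]      # 1 = head, 0 = tail (illustrative data)

for flip in flips:
    a += flip                               # head: alpha += 1
    b += 1 - flip                           # tail: beta += 1

posterior = beta(a, b)
print(a, b)                                 # Beta(15, 11) after 7 heads and 3 tails
print(posterior.mean())                     # ~0.58: pulled toward 0.7, still remembering the prior
```

With no flips the loop never runs and the posterior is just the prior, exactly as the demo shows.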
Click anywhere on the strip below to add an observation. The posterior over μ narrows toward the sample mean as more data arrives.
With known data variance σ², the posterior over μ is Gaussian: the precisions add, and the posterior mean is a precision-weighted average of the prior mean and the sample mean. With one observation, the posterior already shifts noticeably; with many, it pins down μ almost exactly — and you can see the prior's pull fade.
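A sketch of that closed-form update, with an illustrative prior, a known σ², and a few hand-picked observations standing in for clicks:

```python
import numpy as np

# Normal-Normal update for an unknown mean mu with known data variance sigma^2.
# The prior, sigma^2, and observations below are illustrative.
mu0, tau0 = 0.0, 1.0 / 4.0             # prior N(0, 2^2): mean 0, precision 1/4
sigma2 = 1.0                           # known data variance
x = np.array([1.8, 2.3, 2.1])          # observations (one per click on the strip)

n = len(x)
tau_post = tau0 + n / sigma2                                   # precisions add
mu_post = (tau0 * mu0 + (n / sigma2) * x.mean()) / tau_post    # precision-weighted average
sd_post = np.sqrt(1.0 / tau_post)

print(mu_post, sd_post)                # mean pulled almost to x-bar, spread tighter than the prior
```

Each new observation adds another 1/σ² to the posterior precision, which is why the curve keeps narrowing.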
Click on the canvas to add a data point; click an existing point to remove it. Slide λ to strengthen the Gaussian prior on the slope — the MAP line bends toward zero slope (ridge regression).
MLE picks the parameters that maximize the data likelihood — for Gaussian noise that's ordinary least squares. MAP adds a prior: a zero-mean Gaussian prior on the slope contributes a λ‖β‖² penalty, exactly recovering ridge regression. As λ → 0 the MAP line collapses to MLE; as λ → ∞ it flattens to ȳ. The point: MAP = MLE + regularizer, and the regularizer is a prior in disguise.
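A sketch of that equivalence on synthetic data, penalizing only the slope so the intercept stays free to absorb the means; the data and λ values are made up:

```python
import numpy as np

# MAP with a zero-mean Gaussian prior on the slope = ridge regression on the slope.
# Synthetic data and lambda values are illustrative; the intercept is unpenalized.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.5 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)

def map_fit(x, y, lam):
    """Minimize sum((y - a - b*x)**2) + lam * b**2 over intercept a and slope b."""
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc + lam)    # shrunken (penalized) slope
    a = y.mean() - b * x.mean()        # intercept absorbs the means
    return a, b

print(map_fit(x, y, 0.0))    # lambda -> 0: ordinary least squares, i.e. the MLE line
print(map_fit(x, y, 5.0))    # moderate prior: slope shrinks toward zero
print(map_fit(x, y, 1e6))    # lambda -> infinity: slope ~ 0, the line flattens to y-bar
```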