Barcodes & Persistence Diagrams

Master the visual representations of persistence: barcodes, persistence diagrams, and how to interpret them.

Visualizing Persistence

The output of persistent homology is a collection of birth-death pairs. Two standard visualizations make this data interpretable: barcodes and persistence diagrams. Both encode the same information but highlight different aspects.

These visualizations are the "fingerprint" of a dataset's topology — they capture the multi-scale structure in a way that can be compared, analyzed, and even used as features for machine learning.

Persistence Barcode

Each horizontal bar represents one topological feature. The left endpoint is its birth time and the right endpoint is its death time, so the length of the bar is the feature's persistence. Long bars indicate significant features; short bars are typically noise. Bars extending to infinity represent essential features, which never die during the filtration.
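As a rough illustration of the description above, the following sketch draws a barcode as plain text. The function name `render_barcode` and the `width` parameter are hypothetical, chosen for this example only; real projects would normally use a plotting library instead.

```python
from math import inf, isinf

def render_barcode(intervals, width=40):
    """Return one text line per bar, longest-lived feature first.

    `intervals` is a list of (birth, death) pairs; death may be math.inf
    for essential features. Births are assumed to be finite.
    """
    # Collect the finite endpoints to fix the horizontal scale.
    finite = [x for b, d in intervals for x in (b, d) if not isinf(x)]
    lo, hi = min(finite), max(finite)
    scale = (width - 1) / (hi - lo) if hi > lo else 1.0

    lines = []
    # Sort by persistence (death - birth), longest bars on top.
    for b, d in sorted(intervals, key=lambda bd: bd[1] - bd[0], reverse=True):
        start = round((b - lo) * scale)
        if isinf(d):
            # Essential feature: run to the right edge and add an arrow.
            bar = "-" * (width - start) + ">"
        else:
            end = round((d - lo) * scale)
            bar = "-" * max(end - start, 1)  # always draw at least one cell
        lines.append(" " * start + bar)
    return lines

# Example: one essential feature, one long bar, one short bar.
for line in render_barcode([(0, inf), (0, 1), (1, 3)], width=7):
    print(line)
```

The left edge of each line is the birth time and the right edge is the death time, so a bar's printed length is proportional to its persistence.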


Persistence Diagram

Each point (b, d) represents a feature born at time b and dying at time d. Points near the diagonal have low persistence (noise); points far from the diagonal are significant. The diagonal line is birth = death (features that exist for zero time).


Barcodes vs Diagrams

Barcodes

  • Easy to see at which scale features exist
  • Natural for time-series/filtration view
  • Length directly shows persistence
  • Can get cluttered with many features

Diagrams

  • Compact representation for many features
  • Easy to see persistence (distance from diagonal)
  • Natural for stability theorems
  • Can define distances between diagrams

The Diagonal and Noise

In the persistence diagram, the diagonal line d = b represents features with zero persistence — they're born and immediately die. While no actual features lie exactly on the diagonal, points near the diagonal have very short lifespans and are typically considered noise.

persistence = d - b = vertical distance from diagonal

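Thresholding on persistence is a common way to filter noise: choose a cutoff and keep only the features whose persistence d - b exceeds it. Features above the threshold are treated as "topological signal"; those below are attributed to noise from sampling or measurement error. The cutoff itself is a modeling choice that depends on the data and the scale of interest.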
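A minimal sketch of this filtering step, assuming a diagram is stored as a list of (birth, death) tuples. The helper names `persistence` and `split_by_threshold`, the sample values, and the cutoff of 0.5 are all illustrative assumptions.

```python
from math import inf

def persistence(pair):
    """Persistence of a feature: death minus birth."""
    birth, death = pair
    return death - birth

def split_by_threshold(diagram, threshold):
    """Split a diagram into (signal, noise) by comparing persistence to a cutoff."""
    signal = [p for p in diagram if persistence(p) > threshold]
    noise = [p for p in diagram if persistence(p) <= threshold]
    return signal, noise

# Made-up diagram: one essential feature, one long-lived feature,
# and two short-lived features close to the diagonal.
diagram = [(0.0, inf), (0.0, 0.25), (0.5, 2.5), (1.0, 1.125)]
signal, noise = split_by_threshold(diagram, threshold=0.5)

print("signal:", signal)
print("noise: ", noise)
```

Essential features have infinite persistence, so they survive any finite threshold.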

Stability Theorem

One of the most important properties of persistent homology is stability: small changes in the input lead to small changes in the output. Formally, if two (suitably tame) functions f and g on the same space differ by at most ε at every point, that is ||f - g||_∞ ≤ ε, then their persistence diagrams differ by at most ε in the bottleneck distance.

d_B(Dgm(f), Dgm(g)) ≤ ||f - g||_∞
Bottleneck distance ≤ L∞ distance between functions

This means persistent homology is robust to noise and sampling variations — critical for real-world data analysis applications.
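The bound can be checked on a toy example. The sketch below computes 0-dimensional sublevel-set persistence for a function sampled along a line, using a union-find structure and the elder rule (when two components merge, the younger one dies). The function names and the sample values are assumptions made for this illustration. Rather than computing the bottleneck distance exactly, it evaluates the cost of one particular matching, which is an upper bound on the bottleneck distance.

```python
import math

def sublevel_persistence_1d(values):
    """0-dimensional sublevel-set persistence of samples along a line."""
    n = len(values)
    parent = [None] * n          # None means the vertex has not entered yet
    birth = {}                   # component root -> birth value

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    # Add vertices in order of increasing function value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this value.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:     # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # Components that never merge away are essential.
    pairs.extend((birth[r], math.inf) for r in {find(i) for i in range(n)})
    return sorted(pairs)

def matching_cost(dgm_a, dgm_b):
    """L-infinity cost of pairing the i-th point of dgm_a with the i-th of dgm_b.

    Assumes both diagrams have the same number of points. This is the cost of
    ONE matching, so it only bounds the bottleneck distance from above.
    """
    cost = 0.0
    for (b1, d1), (b2, d2) in zip(dgm_a, dgm_b):
        death_gap = 0.0 if d1 == d2 else abs(d1 - d2)   # handles inf == inf
        cost = max(cost, abs(b1 - b2), death_gap)
    return cost

f = [0, 3, 1, 4, 2, 5]
g = [0.5, 2.5, 1.5, 3.5, 2.5, 4.5]           # f perturbed by at most 0.5
eps = max(abs(a - b) for a, b in zip(f, g))   # sup-norm distance ||f - g||

dgm_f = sublevel_persistence_1d(f)
dgm_g = sublevel_persistence_1d(g)
print(dgm_f, dgm_g, eps, matching_cost(dgm_f, dgm_g))
```

In this example the matching cost does not exceed eps, which is consistent with the stability bound, since the bottleneck distance can be no larger than the cost of any single matching.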

Comparing Diagrams

To compare two persistence diagrams, we need a notion of distance. The two standard choices are:

Bottleneck Distance (d_B)

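Consider all matchings between the points of the two diagrams, where any point may also be matched to the diagonal. The cost of a matching is the largest distance between a matched pair, usually measured in the L∞ norm. The bottleneck distance is the smallest cost achievable over all matchings.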

Wasserstein Distance (W_p)

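Like the bottleneck distance, but the cost of a matching is the sum of the p-th powers of the distances between matched pairs, and W_p is the p-th root of the smallest such sum. Because every matched pair contributes, it is more sensitive to the overall distribution of features. As p → ∞ it recovers the bottleneck distance.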
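Both distances can be sketched with a brute-force search over matchings. This is only practical for diagrams with a handful of points, and it assumes finite (birth, death) pairs and the L∞ norm as the distance between points, which is one common convention among several. The function names are hypothetical; dedicated TDA libraries provide far more efficient implementations.

```python
from itertools import permutations

def _cost(p, q):
    """L-infinity distance between two points; None means 'the diagonal'."""
    if p is None and q is None:
        return 0.0                      # diagonal matched to diagonal is free
    if p is None:
        return (q[1] - q[0]) / 2        # distance from q to the nearest diagonal point
    if q is None:
        return (p[1] - p[0]) / 2
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def _matchings(dgm_a, dgm_b):
    """Yield the list of pair costs for every possible matching."""
    # Pad each diagram with enough diagonal slots to absorb the other's points.
    left = list(dgm_a) + [None] * len(dgm_b)
    right = list(dgm_b) + [None] * len(dgm_a)
    for perm in permutations(range(len(right))):
        yield [_cost(left[i], right[j]) for i, j in enumerate(perm)]

def bottleneck(dgm_a, dgm_b):
    """Smallest possible value of the largest pair cost."""
    return min(max(costs, default=0.0) for costs in _matchings(dgm_a, dgm_b))

def wasserstein(dgm_a, dgm_b, p=2):
    """p-th root of the smallest possible sum of p-th powers of pair costs."""
    best = min(sum(c ** p for c in costs) for costs in _matchings(dgm_a, dgm_b))
    return best ** (1 / p)

# Made-up diagrams: A has one strong feature and one near-diagonal point,
# B has a slightly shifted copy of the strong feature only.
A = [(0.0, 4.0), (1.0, 1.5)]
B = [(0.25, 3.75)]
print(bottleneck(A, B), wasserstein(A, B, p=1), wasserstein(A, B, p=2))
```

In the best matching the strong features are paired with each other and the near-diagonal point in A is sent to the diagonal. The bottleneck distance records only the worst of those two costs, while the Wasserstein distance accumulates both.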

Key Takeaways

  • Barcodes — horizontal bars showing [birth, death] intervals
  • Persistence diagrams — scatter plot of (birth, death) points
  • Vertical distance from diagonal = persistence, a common proxy for significance
  • Stability theorem — small input changes → small diagram changes
  • Bottleneck/Wasserstein — distances between diagrams