See TDA solving real problems: protein structure analysis, single-cell genomics, time series, and network analysis.
Topological Data Analysis has found applications across science, engineering, and medicine. The key insight is that shape matters — and TDA provides rigorous tools to quantify shape in high-dimensional data where our geometric intuition fails.
This page explores how persistent homology is used in protein structure analysis, single-cell genomics, materials science, time series analysis, and image processing.
Proteins fold into complex 3D structures with pockets and cavities that determine their function. Drug molecules bind to these pockets, so identifying and characterizing them is crucial for drug design.
TDA has been used to predict protein-ligand binding sites, classify protein families, and analyze conformational changes. The key advantage is that topological features are robust to small deformations and don't require manual feature engineering.
Single-cell RNA sequencing (scRNA-seq) data lives in a roughly 20,000-dimensional gene expression space, one dimension per gene. Each cell is a point in this space, and cells cluster by type. But cell types often form continuous trajectories (differentiation) rather than discrete clusters.
Tools like Mapper and persistent homology have revealed unexpected structure in cell populations, including circular paths corresponding to cell cycle and branching differentiation trajectories.
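As a rough illustration of how Mapper can expose a circular path, the sketch below runs a stripped-down Mapper on points sampled from a circle (a stand-in for cell-cycle structure, not real expression data). The filter (the x-coordinate), the interval cover, the single-linkage threshold `eps`, and all function names are assumptions chosen for this example; real Mapper implementations offer many more cover and clustering options.

```python
import math

def connected_components(indices, points, eps):
    """Single-linkage clusters: points closer than eps end up in one cluster."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) < eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges); nodes are clusters, edges join clusters sharing points."""
    lo, hi = min(filter_values), max(filter_values)
    # Interval length such that n_intervals overlapping intervals cover [lo, hi].
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        start = lo + k * step
        end = hi if k == n_intervals - 1 else start + length
        members = [i for i, f in enumerate(filter_values) if start <= f <= end]
        nodes.extend(connected_components(members, points, eps))
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges

# 60 evenly spaced points on the unit circle, filtered by x-coordinate.
points = [(math.cos(2 * math.pi * i / 60), math.sin(2 * math.pi * i / 60))
          for i in range(60)]
filter_values = [x for x, _ in points]
nodes, edges = mapper(points, filter_values, n_intervals=4, overlap=0.3, eps=0.15)
print(len(nodes), len(edges))
```

With these settings the two middle intervals each split into a top and a bottom arc, so the resulting graph is a single closed loop: the kind of circular structure the text attributes to the cell cycle.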
Porous materials like zeolites have complex internal structure with channels and cavities. The size and connectivity of these pores determine the material's properties (filtration, catalysis, gas storage).
TDA has been used to screen millions of hypothetical materials, predict properties like gas adsorption capacity, and classify crystal structures.
A 1D time series can be embedded into higher dimensions using delay coordinates: x(t) → (x(t), x(t+τ), x(t+2τ), ...), where τ is a fixed time delay. This Takens embedding reconstructs the attractor geometry, which TDA can then analyze.
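A minimal sketch of the delay-coordinate map, using a sine wave as a toy signal. The function name `delay_embedding` and the parameter names `dim` and `tau` are illustrative choices, not taken from any particular library.

```python
import math

def delay_embedding(series, dim, tau):
    """Return the delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(n_vectors)
    ]

# Sine wave with 100 samples per period. A delay of 25 samples is a quarter
# period, so each 2D delay vector is (sin, cos) and the cloud lies on a circle.
samples_per_period = 100
signal = [math.sin(2 * math.pi * t / samples_per_period) for t in range(300)]
cloud = delay_embedding(signal, dim=2, tau=25)
radii = [math.hypot(x, y) for x, y in cloud]
print(len(cloud), min(radii), max(radii))
```

The periodic signal becomes a loop in the embedded space, which is exactly the kind of feature that one-dimensional persistent homology detects. In practice `dim` and `tau` have to be tuned to the signal.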
TDA on time series has been used for anomaly detection in manufacturing, classifying cardiac arrhythmias, and detecting early warning signals of financial crashes.
Images can be viewed as functions (pixel intensity) on a grid, and TDA can extract topological features from the sublevel sets. These features capture texture and shape information not easily accessible to standard methods.
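To make the sublevel-set idea concrete, here is a small sketch that computes 0-dimensional persistence (connected components) of a grayscale grid with a union-find structure. It assumes 4-connectivity between pixels and the usual elder rule (when two components merge, the one born later dies); the function name and the toy image are made up for illustration, and higher-dimensional features would need a full persistent homology library.

```python
def sublevel_h0(image):
    """0-dimensional sublevel-set persistence pairs (birth, death) of a 2D grid."""
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over pixels added so far
    birth = {}   # birth value of each component, stored at its root

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:  # sweep the threshold upward
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:  # neighbour outside the grid or not yet added
                continue
            a, b = find(p), find(q)
            if a == b:
                continue
            if birth[a] > birth[b]:
                a, b = b, a  # make a the older component
            if value > birth[b]:
                pairs.append((birth[b], value))  # younger component dies here
            parent[b] = a
    pairs.append((order[0][0], float("inf")))  # the global minimum never dies
    return sorted(pairs)

# Four dark corners separated by a bright cross.
image = [
    [1, 5, 2],
    [5, 5, 5],
    [3, 5, 4],
]
diagram = sublevel_h0(image)
print(diagram)
```

Each corner is a local minimum that starts its own component; the three younger ones die when the bright cross joins everything at intensity 5.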
Applications include tumor detection in medical imaging, texture classification, and shape analysis in computer vision.
Persistent homology produces a persistence diagram, but how do you feed this into a machine learning pipeline? Several approaches have been developed:
Persistence landscapes convert the diagram into a sequence of functions that can be averaged, compared with L² distance, and used as features.
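One way to make the sequence of functions concrete, assuming the usual persistence landscape definition: each point (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th function at t is the k-th largest tent value. The function name and the toy diagram are illustrative.

```python
def landscape(diagram, k, t):
    """Value at t of the k-th landscape function (k = 1 is the outermost)."""
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0

diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.5 * i for i in range(9)]  # t = 0.0, 0.5, ..., 4.0

# Sampling each function on a fixed grid gives vectors that can be
# averaged across diagrams or compared with a discretized L2 distance.
lambda_1 = [landscape(diagram, 1, t) for t in grid]
lambda_2 = [landscape(diagram, 2, t) for t in grid]
print(lambda_1)
print(lambda_2)
```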
Persistence images smooth the diagram into a weighted sum of Gaussians and discretize the result on a grid, producing a fixed-size vector suitable for any ML model.
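A simplified sketch of this vectorization, under several assumptions: points are moved to (birth, persistence) coordinates, each Gaussian is weighted by its point's persistence, and the surface is sampled at pixel centres rather than integrated over each pixel. The function name, parameter names, and toy diagram are illustrative.

```python
import math

def persistence_image(diagram, resolution, bounds, sigma):
    """Flattened resolution x resolution image of a finite persistence diagram."""
    b_min, b_max, p_min, p_max = bounds
    # Work in (birth, persistence) coordinates.
    points = [(b, d - b) for b, d in diagram]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    vector = []
    for i in range(resolution):      # rows index persistence
        pers = p_min + (i + 0.5) * (p_max - p_min) / resolution
        for j in range(resolution):  # columns index birth
            birth = b_min + (j + 0.5) * (b_max - b_min) / resolution
            value = sum(
                p * norm * math.exp(
                    -((birth - b) ** 2 + (pers - p) ** 2) / (2.0 * sigma ** 2)
                )
                for b, p in points   # weight = persistence p
            )
            vector.append(value)
    return vector

diagram = [(0.25, 0.9), (0.55, 0.6)]  # one prominent feature, one near the diagonal
vector = persistence_image(diagram, resolution=10, bounds=(0, 1, 0, 1), sigma=0.1)
brightest = vector.index(max(vector))
print(len(vector), divmod(brightest, 10))
```

The output length depends only on `resolution`, not on the number of points in the diagram, and the persistence weighting keeps near-diagonal (short-lived) features from dominating the vector.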
Neural network layers that operate directly on persistence diagrams, learning task-specific representations.
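The core requirement for such a layer is that its output must not depend on the order in which diagram points are listed. A minimal sketch, assuming a Deep Sets style design (a shared per-point transformation followed by sum pooling): the weights below are hypothetical fixed numbers, whereas in a real network they would be learned by gradient descent.

```python
import math

def relu(x):
    return max(0.0, x)

def diagram_layer(diagram, weights, biases):
    """Sum over points of relu(W @ (birth, death) + bias); order-independent."""
    features = [0.0] * len(weights)
    for birth, death in diagram:
        for i, ((w_birth, w_death), bias) in enumerate(zip(weights, biases)):
            features[i] += relu(w_birth * birth + w_death * death + bias)
    return features

# Hypothetical parameters: unit 0 responds to persistence (death - birth),
# unit 1 responds to births later than 0.25.
weights = [(-1.0, 1.0), (1.0, 0.0)]
biases = [0.0, -0.25]

diagram = [(0.0, 1.0), (0.5, 2.0), (1.0, 1.5)]
out = diagram_layer(diagram, weights, biases)
out_shuffled = diagram_layer(list(reversed(diagram)), weights, biases)
print(out, out_shuffled)
```

Because the pooling is a sum, diagrams with different numbers of points all map to a feature vector of the same length, which downstream layers can consume.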
Define kernels (similarity functions) between persistence diagrams for use with SVMs and kernel regression.
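As one concrete example of such a kernel, the sketch below implements the persistence scale-space kernel as it is usually stated: a sum of Gaussians between points of the two diagrams, minus the same Gaussians evaluated against points mirrored across the diagonal. The function name and the bandwidth value are illustrative, and other kernels (for example sliced Wasserstein kernels) are defined differently.

```python
import math

def scale_space_kernel(diagram_f, diagram_g, sigma):
    """Persistence scale-space kernel between two finite diagrams."""
    total = 0.0
    for b1, d1 in diagram_f:
        for b2, d2 in diagram_g:
            dist_sq = (b1 - b2) ** 2 + (d1 - d2) ** 2
            # Distance to the mirror image (d2, b2) of the second point.
            mirror_sq = (b1 - d2) ** 2 + (d1 - b2) ** 2
            total += (math.exp(-dist_sq / (8.0 * sigma))
                      - math.exp(-mirror_sq / (8.0 * sigma)))
    return total / (8.0 * math.pi * sigma)

F = [(0.0, 1.0), (0.2, 0.5)]
G = [(0.1, 0.9)]
k_ff = scale_space_kernel(F, F, sigma=0.5)
k_gg = scale_space_kernel(G, G, sigma=0.5)
k_fg = scale_space_kernel(F, G, sigma=0.5)
# Squared distance between the diagrams in the kernel's feature space.
print(k_ff + k_gg - 2.0 * k_fg)
```

The mirrored term makes points on the diagonal contribute nothing, so features with negligible persistence barely affect the similarity. The resulting Gram matrix can be passed to any kernel method that accepts precomputed kernels.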