Applications in Data Science & Biology

See TDA solving real problems: protein structure analysis, single-cell genomics, materials science, time series, and image analysis.

TDA in the Real World

Topological Data Analysis has found applications across science, engineering, and medicine. The key insight is that shape matters — and TDA provides rigorous tools to quantify shape in high-dimensional data where our geometric intuition fails.

This page explores how persistent homology is used in protein structure analysis, single-cell genomics, materials science, time series analysis, and image processing.

Protein Structure Analysis

Detecting Binding Pockets and Cavities

Proteins fold into complex 3D structures with pockets and cavities that determine their function. Drug molecules bind to these pockets, so identifying and characterizing them is crucial for drug design.

Build alpha complex from atom coordinates
Compute H₂ (voids) to find enclosed cavities
Persistence indicates cavity size and stability
Compare cavities across protein conformations
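To make the H₂ step concrete, here is a minimal sketch of the standard column-reduction algorithm for persistent homology over Z/2. It is not a protein pipeline: in practice the alpha complex would be built from atom coordinates by a dedicated library, whereas this toy uses a hand-made filtration of a hollow tetrahedron whose interior is filled in at a later scale. The function name persistence_intervals and the filtration values are assumptions chosen for illustration.

```python
from itertools import combinations


def persistence_intervals(filtration):
    """Persistence intervals by Z/2 column reduction.

    filtration: list of (simplex, value), each simplex a tuple of vertex ids.
    Returns (dimension, birth, death) triples; death is None for classes
    that never die.
    """
    # Order by value, then dimension, so every face precedes its cofaces.
    simplices = sorted(
        ((tuple(sorted(s)), v) for s, v in filtration),
        key=lambda sv: (sv[1], len(sv[0]), sv[0]),
    )
    index = {s: i for i, (s, _) in enumerate(simplices)}
    columns, pivot_owner, paired, intervals = [], {}, set(), []

    for j, (simplex, value) in enumerate(simplices):
        col = set()
        if len(simplex) > 1:
            col = {index[f] for f in combinations(simplex, len(simplex) - 1)}
        # Add earlier columns (mod 2) until the lowest entry is unclaimed.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        columns.append(col)
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth_simplex, birth = simplices[i]
            if value > birth:  # drop zero-length intervals
                intervals.append((len(birth_simplex) - 1, birth, value))

    # Simplices that were never paired create classes that live forever.
    for j, (simplex, value) in enumerate(simplices):
        if j not in paired:
            intervals.append((len(simplex) - 1, value, None))
    return intervals


# Toy "cavity": a hollow tetrahedron that is filled in at scale 5.0.
vertices = range(4)
filtration = (
    [((v,), 0.0) for v in vertices]
    + [(e, 1.0) for e in combinations(vertices, 2)]
    + [(t, 2.0) for t in combinations(vertices, 3)]
    + [((0, 1, 2, 3), 5.0)]
)
intervals = persistence_intervals(filtration)
voids = [iv for iv in intervals if iv[0] == 2]
```

The single H₂ interval in voids is born when the four faces close up and dies when the interior is filled. A larger gap between those two scales corresponds to a larger, more stable cavity.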

TDA has been used to predict protein-ligand binding sites, classify protein families, and analyze conformational changes. The key advantage is that topological features are robust to small deformations and don't require manual feature engineering.

Single-Cell RNA Sequencing

Understanding Cell Populations

scRNA-seq data lives in ~20,000-dimensional gene expression space. Each cell is a point in this space, and cells cluster by type. But cell types often form continuous trajectories (differentiation) rather than discrete clusters.

H₀ reveals distinct cell populations
H₁ detects circular paths in expression space (e.g., the cell cycle)
Mapper algorithm visualizes branching trajectories
Robust to dropout noise in sparse scRNA-seq data
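The H₀ computation in the list above can be sketched with a union-find structure: in a Vietoris-Rips filtration every point starts as its own component at scale 0, and a component dies each time the shortest remaining edge merges two components. The coordinates below are hypothetical 2D stand-ins for cells (real scRNA-seq data would usually be normalized and reduced in dimension first), and the function name h0_persistence is an assumption.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of H0 classes in a Vietoris-Rips filtration.

    All components are born at scale 0. Processing edges by increasing
    length (as in Kruskal's algorithm), each merge kills one component.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite deaths; one component never dies


# Two hypothetical, well-separated groups of cells.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = h0_persistence(cells)

# Components still alive at scale 1.0: the immortal one plus late deaths.
n_populations = 1 + sum(d > 1.0 for d in deaths)
```

A long gap between the small within-group death scales and the one large death scale is what signals two distinct populations here.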

Tools like Mapper and persistent homology have revealed unexpected structure in cell populations, including circular paths corresponding to the cell cycle and branching differentiation trajectories.

Materials Science

Porous Materials and Zeolites

Porous materials like zeolites have complex internal structure with channels and cavities. The size and connectivity of these pores determine the material's properties (filtration, catalysis, gas storage).

H₁ detects channels through the material
H₂ detects enclosed cavities
The distribution of persistence values reflects the pore size distribution
Compare materials by topological fingerprints
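One simple way to turn a persistence diagram into a comparable fingerprint is a Betti curve: the number of features alive at each scale. The sketch below uses that summary as an assumption; the source does not say which fingerprint is meant, and the two H₂ diagrams are made-up toy values rather than measurements of real materials.

```python
import numpy as np


def betti_curve(diagram, grid):
    """Number of intervals [birth, death) that contain each grid value."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    alive = (births <= grid[None, :]) & (grid[None, :] < deaths)
    return alive.sum(axis=0)


grid = np.linspace(0.0, 4.0, 41)

# Hypothetical H2 diagrams (birth, death): a few large cavities
# versus several small ones.
material_a = [(1.0, 3.0), (1.2, 3.1)]
material_b = [(1.0, 1.5), (1.1, 1.4), (0.9, 1.6)]

curve_a = betti_curve(material_a, grid)
curve_b = betti_curve(material_b, grid)

# L1 distance between the two fingerprints (Riemann sum on the grid).
step = grid[1] - grid[0]
distance = np.abs(curve_a - curve_b).sum() * step
```

Betti curves discard the pairing between births and deaths, so richer summaries or diagram distances may be preferable when that information matters.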

TDA has been used to screen millions of hypothetical materials, predict properties like gas adsorption capacity, and classify crystal structures.

Time Series Analysis

Takens Embedding and Recurrence

A 1D time series can be embedded into higher dimensions using delay coordinates: x(t) → (x(t), x(t+τ), x(t+2τ), ...). Under the conditions of Takens' theorem, this Takens embedding reconstructs the attractor geometry up to a smooth change of coordinates, which TDA can then analyze.

H₁ detects periodic/quasi-periodic behavior (loops)
Chaotic attractors have rich topological structure
Persistence landscapes enable statistical tests
Applications: financial data, EEG, climate, sensor data
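The delay-coordinate map described above can be sketched in a few lines of NumPy. The function name takens_embedding and the choice of dimension and delay are assumptions for illustration; in practice both parameters have to be selected for the data at hand.

```python
import numpy as np


def takens_embedding(series, dimension, delay):
    """Delay-coordinate embedding: row t is (x[t], x[t+delay], ...)."""
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dimension and delay")
    return np.column_stack(
        [series[k * delay : k * delay + n_points] for k in range(dimension)]
    )


# A sine wave sampled at 100 points per period, four periods in total.
t = np.linspace(0.0, 8.0 * np.pi, 400, endpoint=False)
signal = np.sin(t)

# A delay of a quarter period turns (sin t, sin(t + pi/2)) into a circle.
cloud = takens_embedding(signal, dimension=2, delay=25)
radii = np.linalg.norm(cloud, axis=1)
```

The periodic signal becomes a closed loop in the embedded space, which is the kind of structure an H₁ computation on the point cloud would pick up.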

TDA on time series has been used for anomaly detection in manufacturing, classifying cardiac arrhythmias, and detecting early-warning signals ahead of financial crashes.

Image Analysis

Texture and Shape Features

Images can be viewed as functions (pixel intensity) on a grid, and TDA can extract topological features from the sublevel sets. These features capture texture and shape information not easily accessible to standard methods.

Sublevel set filtration on grayscale images
H₀ tracks connected dark regions (bright regions when the superlevel set filtration is used instead)
H₁ detects holes in regions (texture)
Cubical complexes for efficient computation
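For H₀ only, the sublevel set filtration of an image can be sketched without a full cubical complex: add pixels in order of increasing intensity, join each new pixel to its neighbours that are already present, and record a death whenever two dark regions merge. The sketch assumes pixels act as vertices with 4-neighbour adjacency, which is one of several possible conventions, and the function name sublevel_h0 and the tiny image are illustrative.

```python
def sublevel_h0(image):
    """H0 intervals of the sublevel set filtration of a 2D intensity grid."""
    rows, cols = len(image), len(image[0])
    pixels = sorted(
        (image[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    parent = {}  # pixel -> parent pixel (union-find forest)
    birth = {}   # pixel -> intensity at which it entered the filtration

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    intervals = []
    for value, r, c in pixels:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # outside the image or brighter than `value`
            root_p, root_q = find(p), find(q)
            if root_p == root_q:
                continue
            # Elder rule: the component with the later birth dies.
            old, young = sorted((root_p, root_q), key=lambda x: birth[x])
            if birth[young] < value:  # drop zero-length intervals
                intervals.append((birth[young], value))
            parent[young] = old
    # The component of the darkest pixel never dies.
    intervals.append((pixels[0][0], None))
    return intervals


# Two dark basins (left and right columns) separated by a brighter ridge.
image = [
    [0, 9, 3],
    [1, 7, 4],
    [9, 9, 9],
]
intervals = sublevel_h0(image)
```

Here the right-hand basin is born at intensity 3 and merges into the older left-hand basin when the ridge pixel of intensity 7 is added.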

Applications include tumor detection in medical imaging, texture classification, and shape analysis in computer vision.

TDA + Machine Learning

Persistent homology produces a persistence diagram, but how do you feed this into a machine learning pipeline? Several approaches have been developed:

Persistence Landscapes

Convert the diagram into a sequence of functions that can be averaged, compared with the L² distance, and used as features.
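As a minimal sketch, the k-th landscape function at t is the k-th largest value of the "tent" max(0, min(t − b, d − t)) over all diagram points (b, d). The code below evaluates this on a fixed grid; the function name persistence_landscape, the grid, and the toy diagrams are assumptions for illustration.

```python
import numpy as np


def persistence_landscape(diagram, grid, n_layers):
    """Landscape functions lambda_1..lambda_n sampled on `grid`."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births, deaths = diagram[:, :1], diagram[:, 1:]
    # One tent function per interval: shape (n_points, n_grid).
    tents = np.maximum(0.0, np.minimum(grid - births, deaths - grid))
    # Sort each grid column in decreasing order; row k is lambda_{k+1}.
    tents = -np.sort(-tents, axis=0)
    layers = np.zeros((n_layers, len(grid)))
    k = min(n_layers, tents.shape[0])
    layers[:k] = tents[:k]
    return layers


grid = np.linspace(0.0, 4.0, 9)  # step 0.5
landscape_a = persistence_landscape([(0.0, 4.0), (1.0, 3.0)], grid, n_layers=3)
landscape_b = persistence_landscape([(0.0, 4.0)], grid, n_layers=3)

# Landscapes live in a vector space, so they can be averaged and compared.
mean_landscape = (landscape_a + landscape_b) / 2.0
step = grid[1] - grid[0]
l2_distance = np.sqrt(((landscape_a - landscape_b) ** 2).sum() * step)
```

The flattened array can be passed to any model that expects fixed-length feature vectors.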

Persistence Images

Discretize the diagram as a weighted sum of Gaussians, producing a fixed-size vector suitable for any ML model.
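A rough sketch of the idea: move each point to (birth, persistence) coordinates, place a Gaussian on it weighted by its persistence, and sample the sum on a pixel grid. This version evaluates the Gaussians at pixel centers and omits the normalization constant, which is a simplification of the usual construction that integrates over each pixel. The function name persistence_image and all parameter values are assumptions.

```python
import numpy as np


def persistence_image(diagram, resolution=20, sigma=0.1, extent=(0.0, 1.0)):
    """Fixed-size vector from a diagram of (birth, death) pairs."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]

    lo, hi = extent
    centers = lo + (np.arange(resolution) + 0.5) * (hi - lo) / resolution
    # Columns index birth, rows index persistence.
    birth_axis, pers_axis = np.meshgrid(centers, centers)

    image = np.zeros((resolution, resolution))
    for b, p in zip(births, persistences):
        sq_dist = (birth_axis - b) ** 2 + (pers_axis - p) ** 2
        # Linear weight p: points near the diagonal contribute little.
        image += p * np.exp(-sq_dist / (2.0 * sigma**2))
    return image.ravel()


# One prominent feature and one short-lived feature (toy values).
vector = persistence_image([(0.225, 0.85), (0.5, 0.55)])
peak = np.unravel_index(np.argmax(vector), (20, 20))
```

The output length depends only on the resolution, not on the number of points in the diagram, which is what makes it usable with standard ML models.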

PersLay / Deep Learning

Neural network layers that operate directly on persistence diagrams, learning task-specific representations.
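The core requirement for such layers is permutation invariance: a diagram is an unordered set of points, so the output must not depend on their order. The NumPy forward pass below is a sketch in that spirit (a weighted sum of per-point features), not the PersLay API. The weights are random rather than learned, and the name diagram_layer is hypothetical.

```python
import numpy as np


def diagram_layer(diagram, weights, bias):
    """Permutation-invariant layer: sum_p w(p) * relu(W^T p + b)."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    # Per-point weight: persistence, so near-diagonal points matter less.
    point_weights = diagram[:, 1] - diagram[:, 0]
    # Per-point features, shape (n_points, n_features).
    features = np.maximum(0.0, diagram @ weights + bias)
    # Summing over points removes any dependence on their order.
    return (point_weights[:, None] * features).sum(axis=0)


rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 8))  # would be trained in a real model
bias = rng.normal(size=8)

diagram = [(0.0, 1.0), (0.2, 0.5), (0.3, 0.35)]
output = diagram_layer(diagram, weights, bias)
output_shuffled = diagram_layer(diagram[::-1], weights, bias)
```

In a trainable version the weights, bias, and possibly the per-point weight function would be optimized together with the rest of the network.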

Kernel Methods

Define kernels (similarity functions) between persistence diagrams for use with SVMs and kernel regression.
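One published example is the persistence scale-space kernel of Reininghaus et al., which sums Gaussian similarities between all pairs of points and subtracts the same term for the point mirrored across the diagonal. The sketch below follows that formula with the bandwidth sigma left as a free parameter; the function name and the toy diagrams are assumptions, and the result could be passed to an SVM as a precomputed kernel.

```python
import math


def scale_space_kernel(diagram_f, diagram_g, sigma=1.0):
    """Persistence scale-space kernel between two (birth, death) diagrams."""
    total = 0.0
    for p_birth, p_death in diagram_f:
        for q_birth, q_death in diagram_g:
            direct = (p_birth - q_birth) ** 2 + (p_death - q_death) ** 2
            # Same distance, with q reflected across the diagonal.
            mirrored = (p_birth - q_death) ** 2 + (p_death - q_birth) ** 2
            total += math.exp(-direct / (8.0 * sigma)) - math.exp(
                -mirrored / (8.0 * sigma)
            )
    return total / (8.0 * math.pi * sigma)


diagram_f = [(0.0, 2.0), (0.5, 1.0)]
diagram_g = [(0.1, 1.8)]

k_fg = scale_space_kernel(diagram_f, diagram_g)
k_gf = scale_space_kernel(diagram_g, diagram_f)
k_diag = scale_space_kernel([(1.0, 1.0)], diagram_g)
```

Subtracting the mirrored term makes points on the diagonal contribute nothing, so features with zero persistence do not affect the similarity.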

Key Takeaways

Protein analysis — detect binding pockets and cavities (H₂)
Single-cell genomics — find cell trajectories and cycles
Materials science — characterize porous structures
Time series — Takens embedding + TDA for dynamics
Images — texture and shape from sublevel set filtrations
ML integration — landscapes, images, kernels, deep learning