Self-Organizing Map (SOM)

A Self-Organizing Map is an unsupervised neural network that projects high-dimensional data onto a low-dimensional grid, preserving topological relationships. Explore the concept, watch it learn in real time, and try a simple use case.

Topology-preserving • Unsupervised • Dimensionality reduction • Clustering

Quick overview

  • What: A grid of “neurons,” each with a weight vector; they compete to represent input samples.
  • How: Find the best-matching unit (BMU), then nudge it and its neighbors toward the input.
  • Why: The grid self-organizes into a map of the data manifold, revealing structure and clusters.
  • When: Visualizing high-D data, clustering with topology, vector quantization, and prototyping.

What is a SOM?

A Self-Organizing Map (SOM), also called a Kohonen map, is a 2D lattice of units. Each unit has a weight vector in the same space as the input. During training, inputs pull the most similar unit—the best-matching unit (BMU)—and its neighbors closer, so nearby units in the lattice specialize in nearby regions of the data space. The result is a topology-preserving projection: neighborhoods in data map to neighborhoods on the grid.

Core mechanics

  • BMU: For input x, find unit i minimizing distance d(x, wᵢ) (usually Euclidean).
  • Neighborhood: Units near the BMU in grid coordinates receive a larger update.
  • Update: wⱼ ← wⱼ + α(t) · hⱼ,ᵇᵐᵤ(t) · (x − wⱼ), with decaying α and neighborhood width (sketched in code after this list).
  • Convergence: Start broad and strong; narrow and fine-tune over time.
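
A minimal NumPy sketch of these steps, assuming a Gaussian neighborhood and Euclidean distance; the grid size, schedules, and toy data are illustrative choices, not requirements:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: a 10x10 grid of units over 3-D inputs.
    grid_h, grid_w, dim = 10, 10, 3
    weights = rng.random((grid_h, grid_w, dim))      # one weight vector per unit
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)  # grid coordinates rⱼ

    def train_step(x, t, T, alpha0=0.5, sigma0=5.0):
        """One SOM update for input x at step t of T total steps."""
        alpha = alpha0 * np.exp(-t / T)              # decaying learning rate α(t)
        sigma = sigma0 * np.exp(-t / T)              # decaying neighborhood width σ(t)
        # 1. BMU: the unit whose weight vector is closest to x (Euclidean).
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # 2. Neighborhood: Gaussian in *grid* coordinates around the BMU.
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma**2))
        # 3. Update: pull every unit toward x, weighted by the neighborhood.
        weights[...] += alpha * h[..., None] * (x - weights)

    T = 5000
    data = rng.random((1000, dim))                   # toy inputs in [0, 1]^3
    for t in range(T):
        train_step(data[rng.integers(len(data))], t, T)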

When and why to use SOMs

Great fits

  • Topology-aware clustering: You want cluster structure and neighborhood relationships.
  • Visualization: Map complex vectors (e.g., spectra, embeddings) onto an interpretable 2D grid.
  • Vector quantization: Build codebooks for compression or fast nearest-neighbor lookups.
  • Prototyping: Quick, unsupervised exploration before committing to labels or models.

Considerations

  • Parameters matter: Learning-rate and neighborhood schedules strongly affect outcomes.
  • Grid bias: The pre-set grid size/shape constrains resolution and topology.
  • Scale sensitivity: Inputs should be normalized; feature scaling changes the geometry.
  • Alternatives: For pure visualization consider t-SNE/UMAP; for clustering consider k-means/GMM.
Tip: Normalize features (e.g., to [0,1] or z-score). Start with a larger neighborhood that decays roughly exponentially over the training horizon.
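
Both rescalings are one line per feature in NumPy (toy mixed-scale data shown for illustration):

    import numpy as np

    X = np.random.default_rng(0).random((500, 4)) * [1, 10, 100, 1000]  # mixed scales

    X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))    # to [0, 1]
    X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)                     # zero mean, unit variance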

Live simulation: train a SOM

[Interactive simulation: panels for Data, Neurons, and BMU trace, with live readouts of the current epoch, step, learning rate, and radius.]

BMU: argminᵢ ∥x − wᵢ∥₂
Update: wⱼ ← wⱼ + α(t) · exp(−∥rⱼ − rᵇᵐᵤ∥² / (2σ(t)²)) · (x − wⱼ)
α(t), σ(t): decay from initial values to small final values over training

Controls

Keyboard: Space start/pause, . step, r reset

Use case: palette learning for color quantization

Train a small SOM in RGB space to learn a palette from an image, then map each pixel to its nearest neuron. This is vector quantization guided by topology: similar colors become neighboring codewords on the grid.

Compression • Faster rendering • Artistic stylization
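
A compact sketch of that pipeline, assuming RGB pixels already scaled to [0, 1]; the 8×8 grid and schedules are illustrative defaults:

    import numpy as np

    rng = np.random.default_rng(0)

    def train_color_som(pixels, grid=8, T=20_000, alpha0=0.5, sigma0=4.0):
        """Learn a grid x grid palette from an (N, 3) array of RGB pixels in [0, 1]."""
        w = rng.random((grid, grid, 3))
        coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid),
                                      indexing="ij"), axis=-1)
        for t in range(T):
            x = pixels[rng.integers(len(pixels))]
            alpha = alpha0 * np.exp(-t / T)
            sigma = sigma0 * np.exp(-t / T)
            bmu = np.unravel_index(np.argmin(np.linalg.norm(w - x, axis=-1)),
                                   (grid, grid))
            h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma**2))
            w += alpha * h[..., None] * (x - w)
        return w.reshape(-1, 3)                  # flattened palette of codewords

    def quantize(pixels, palette, chunk=100_000):
        """Map each pixel to its nearest palette entry (chunked to bound memory)."""
        out = np.empty_like(pixels)
        for i in range(0, len(pixels), chunk):
            d = np.linalg.norm(pixels[i:i + chunk, None, :] - palette[None], axis=-1)
            out[i:i + chunk] = palette[np.argmin(d, axis=1)]
        return out

    # Usage with a hypothetical (H, W, 3) float image `img` in [0, 1]:
    # palette = train_color_som(img.reshape(-1, 3))
    # quantized = quantize(img.reshape(-1, 3), palette).reshape(img.shape)

Because neighboring units converge to similar colors, the flattened palette keeps the grid's smooth color ordering, which is exactly the topology guidance described above.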

Parameter tuning

  • Learning rate α(t): Start larger (e.g., 0.3–0.7) and decay exponentially to ~0.01.
  • Radius σ(t): Begin near max(gridW, gridH)/2 and decay to 1–2.
  • Schedule: Use epoch-based decay: α(t)=α₀·exp(−t/T), σ(t)=σ₀·exp(−t/T) (see the helper after this list for choosing the time constant).
  • Epochs: Enough for a rough-ordering phase followed by fine-tuning (e.g., 20–100).
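
The T in these schedules is a time constant, and it can be chosen so α and σ actually land near the intended final values by the end of training; a small helper makes that explicit (the default values are illustrative):

    import numpy as np

    def schedules(T, alpha0=0.5, alpha_end=0.01, sigma0=5.0, sigma_end=1.0):
        """Exponential decay that reaches the target final values by step T."""
        t = np.arange(T)
        tau_a = T / np.log(alpha0 / alpha_end)   # alpha decays to alpha_end by step T
        tau_s = T / np.log(sigma0 / sigma_end)   # sigma decays to sigma_end by step T
        return alpha0 * np.exp(-t / tau_a), sigma0 * np.exp(-t / tau_s)

    alpha, sigma = schedules(T=10_000)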

Diagnostics

  • U‑Matrix: Visualize distances between neighboring units—ridges suggest cluster boundaries.
  • Quantization error: Average ∥x − wᵇᵐᵤ∥; should decrease and plateau.
  • Topographic error: Fraction of samples whose first and second BMUs are non-adjacent on the grid (computed in the sketch after this list).
  • Stability: Run multiple seeds to assess consistency.
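
Both numeric diagnostics fit in a few lines of NumPy; a sketch, treating two grid cells as adjacent when they touch (diagonals included), which is one common convention:

    import numpy as np

    def som_diagnostics(X, weights):
        """Quantization and topographic error for an (H, W, D) weight grid."""
        H, W, D = weights.shape
        flat = weights.reshape(-1, D)
        coords = np.array([(i, j) for i in range(H) for j in range(W)])
        d = np.linalg.norm(X[:, None, :] - flat[None], axis=-1)      # (N, H*W)
        order = np.argsort(d, axis=1)
        bmu1, bmu2 = order[:, 0], order[:, 1]
        qe = d[np.arange(len(X)), bmu1].mean()                       # avg ∥x − w_bmu∥
        grid_dist = np.abs(coords[bmu1] - coords[bmu2]).max(axis=1)  # Chebyshev distance
        te = (grid_dist > 1).mean()     # fraction with non-adjacent 1st/2nd BMUs
        return qe, te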

Pitfalls

  • Unscaled features: Dominant dimensions distort topology—normalize first.
  • Too-small grid: Over-merges clusters; too-large grid: under-trains and looks noisy.
  • Too-fast decay: Map freezes early; too-slow decay: never stabilizes.
  • Random init bias: Try PCA-based init for faster ordering when available (a sketch follows).
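
A sketch of one common PCA-based initialization, spanning the grid across the first two principal components and spreading it over roughly ±2 standard deviations (the spread factor is a conventional choice, not a fixed rule):

    import numpy as np

    def pca_init(X, grid_h, grid_w):
        """Initialize a (grid_h, grid_w, D) weight grid on the top-2 PCA plane."""
        mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)  # rows of Vt = PCs
        scale = 2 * s[:2] / np.sqrt(len(X) - 1)   # ~2 std devs along each component
        u = np.linspace(-1, 1, grid_h)[:, None, None] * scale[0] * Vt[0]
        v = np.linspace(-1, 1, grid_w)[None, :, None] * scale[1] * Vt[1]
        return mean + u + v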