What is a SOM?
A Self-Organizing Map (SOM), also called a Kohonen map, is a 2D lattice of units. Each unit has a weight vector in the same space as the input. During training, inputs pull the most similar unit—the best matching unit (BMU)—and its neighbors closer, so nearby units in the lattice specialize in nearby regions of the data space. The result is a topology-preserving projection: neighborhoods in data map to neighborhoods on the grid.
Core mechanics
- BMU: For input x, find unit i minimizing distance d(x, wᵢ) (usually Euclidean).
- Neighborhood: Units near the BMU in grid coordinates receive a larger update.
- Update: wⱼ ← wⱼ + α(t) · hⱼ,ᵇᵐᵤ(t) · (x − wⱼ), with decaying α and neighborhood width.
- Convergence: Start broad and strong; narrow and fine-tune over time.
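The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the 10×10 grid, 3-D inputs, and the Gaussian neighborhood are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a 10x10 grid of units with 3-D weight vectors.
grid_h, grid_w, dim = 10, 10, 3
weights = rng.random((grid_h, grid_w, dim))
# Grid coordinates of every unit, used by the neighborhood function.
coords = np.stack(
    np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1
)

def train_step(x, weights, alpha, sigma):
    """One SOM update: find the BMU, then pull it and its neighbors toward x."""
    # BMU: the unit whose weight vector is closest to x (Euclidean distance).
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood h, measured in *grid* coordinates around the BMU.
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma**2))
    # Update rule: w <- w + alpha * h * (x - w), applied to every unit at once.
    weights += alpha * h[..., None] * (x - weights)
    return bmu

x = rng.random(dim)
train_step(x, weights, alpha=0.5, sigma=3.0)
```

A full training loop would call `train_step` over many inputs while decaying `alpha` and `sigma`, as described above.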
When and why to use SOMs
Great fits
- Topology-aware clustering: You want cluster structure and neighborhood relationships.
- Visualization: Map complex vectors (e.g., spectra, embeddings) onto an interpretable 2D grid.
- Vector quantization: Build codebooks for compression or fast nearest-neighbor lookups.
- Prototyping: Quick, unsupervised exploration before committing to labels or models.
Considerations
- Parameters matter: Learning-rate and neighborhood schedules strongly affect outcomes.
- Grid bias: The pre-set grid size/shape constrains resolution and topology.
- Scale sensitivity: Inputs should be normalized; feature scaling changes the geometry.
- Alternatives: For pure visualization consider t-SNE/UMAP; for clustering consider k-means/GMM.
Use case: palette learning for color quantization
Train a small SOM in RGB space to learn a palette from an image, then map each pixel to its nearest neuron. This is vector quantization guided by topology: similar colors become neighboring codewords on the grid.
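A rough sketch of that pipeline, assuming synthetic random "pixels" in place of a real image and a 4×4 grid (a 16-color palette); grid size, epochs, and schedules are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "image": 1000 random RGB pixels in [0, 1].
pixels = rng.random((1000, 3))

# A small 4x4 SOM in RGB space -> a 16-color palette.
gh, gw = 4, 4
palette = rng.random((gh, gw, 3))
coords = np.stack(np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij"), axis=-1)

epochs = 30
for t in range(epochs):
    alpha = 0.5 * np.exp(-t / epochs)                           # decaying learning rate
    sigma = max(1.0, (max(gh, gw) / 2) * np.exp(-t / epochs))   # decaying radius
    for x in pixels[rng.permutation(len(pixels))[:200]]:        # sample a batch
        d = np.linalg.norm(palette - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma**2))
        palette += alpha * h[..., None] * (x - palette)

# Quantize: replace each pixel with its nearest codeword (neuron color).
flat = palette.reshape(-1, 3)
idx = np.argmin(np.linalg.norm(pixels[:, None, :] - flat[None, :, :], axis=-1), axis=1)
quantized = flat[idx]
```

Because the SOM preserves topology, neighboring cells of `palette` hold similar colors, which makes the learned codebook easy to inspect as a small swatch grid.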
Parameter tuning
- Learning rate α(t): Start larger (e.g., 0.3–0.7) and decay exponentially to ~0.01.
- Radius σ(t): Begin near max(gridW, gridH)/2 and decay to 1–2.
- Schedule: Use epoch-based decay: α(t)=α₀·exp(−t/T), σ(t)=σ₀·exp(−t/T).
- Epochs: Enough for a coarse ordering phase followed by fine-tuning (e.g., 20–100).
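The schedules above translate directly to code. A minimal sketch, assuming α₀ = 0.5, σ₀ = 5, and T = 50 epochs (all illustrative):

```python
import numpy as np

alpha0, sigma0, T = 0.5, 5.0, 50

def alpha(t):
    """Exponentially decaying learning rate: alpha(t) = alpha0 * exp(-t / T)."""
    return alpha0 * np.exp(-t / T)

def sigma(t):
    """Exponentially decaying radius, floored at 1 so the BMU always has neighbors."""
    return max(1.0, sigma0 * np.exp(-t / T))
```

Flooring σ at 1 (rather than letting it decay to zero) keeps some topological smoothing active through the fine-tuning phase.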
Diagnostics
- U‑Matrix: Visualize distances between neighboring units—ridges suggest cluster boundaries.
- Quantization error: Average ∥x − wᵇᵐᵤ∥; should decrease and plateau.
- Topographic error: Fraction of samples where first and second BMU are non-adjacent.
- Stability: Run multiple seeds to assess consistency.
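The quantization- and topographic-error diagnostics can be computed directly from the data and the weight grid. A sketch, assuming `weights` has shape `(grid_h, grid_w, dim)` and "adjacent" means the 8-neighborhood (a design choice; some definitions use only 4 neighbors):

```python
import numpy as np

def quantization_error(data, weights):
    """Mean distance from each sample to its BMU's weight vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def topographic_error(data, weights):
    """Fraction of samples whose first and second BMUs are not grid-adjacent."""
    gh, gw, dim = weights.shape
    flat = weights.reshape(-1, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Convert flat unit indices back to (row, col) grid coordinates.
    r1, c1 = np.divmod(first, gw)
    r2, c2 = np.divmod(second, gw)
    adjacent = (np.abs(r1 - r2) <= 1) & (np.abs(c1 - c2) <= 1)
    return 1.0 - adjacent.mean()
```

Tracking both during training is informative: quantization error can keep falling even as topographic error rises, which signals a map that fits the data but is folding over itself.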
Pitfalls
- Unscaled features: Dominant dimensions distort topology—normalize first.
- Too-small grid: Over-merges clusters; too-large grid: under-trains and looks noisy.
- Too-fast decay: Map freezes early; too-slow decay: never stabilizes.
- Random init bias: Try PCA-based init for faster ordering when available.
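A minimal sketch of PCA-based initialization, spreading the grid over the plane of the top two principal components; the exact scaling by singular values is one reasonable choice among several, not a fixed convention.

```python
import numpy as np

def pca_init(data, gh, gw):
    """Initialize a (gh, gw, dim) weight grid on the top-two principal-component
    plane of `data`, so the map starts roughly ordered instead of random."""
    mean = data.mean(axis=0)
    # Principal directions via SVD of the centered data; vt rows are components.
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    # Spread each grid axis along PC1/PC2, scaled by the data's spread there.
    scale = s[:2] / np.sqrt(len(data))
    rows = np.linspace(-1, 1, gh)
    cols = np.linspace(-1, 1, gw)
    weights = (mean
               + rows[:, None, None] * scale[0] * vt[0]
               + cols[None, :, None] * scale[1] * vt[1])
    return weights
```

Starting from this ordered plane typically shortens the coarse-ordering phase, since the grid already respects the data's dominant directions.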