What is a SOM?
A Self-Organizing Map (SOM), also called a Kohonen map, is a 2D lattice of units. Each unit has a weight vector in the same space as the input. During training, inputs pull the most similar unit—the best matching unit (BMU)—and its neighbors closer, so nearby units in the lattice specialize in nearby regions of the data space. The result is a topology-preserving projection: neighborhoods in data map to neighborhoods on the grid.
Core mechanics
- BMU: For input x, find unit i minimizing distance d(x, wᵢ) (usually Euclidean).
- Neighborhood: Units near the BMU in grid coordinates receive a larger update.
- Update: wⱼ ← wⱼ + α(t) · hⱼ,ᵇᵐᵤ(t) · (x − wⱼ), with decaying α and neighborhood width.
- Convergence: Start broad and strong; narrow and fine-tune over time.
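The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the 10×10 grid, 3-D inputs, and the Gaussian neighborhood are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a 10x10 grid of units with 3-D weight vectors.
grid_h, grid_w, dim = 10, 10, 3
weights = rng.random((grid_h, grid_w, dim))
# Grid coordinates of every unit, used by the neighborhood function.
coords = np.stack(
    np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1
)

def train_step(x, weights, alpha, sigma):
    """One SOM update: find the BMU, then pull it and its neighbors toward x."""
    # BMU: the unit whose weight vector is closest to x (Euclidean distance).
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood h, measured in *grid* coordinates around the BMU.
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma**2))
    # Update rule: w <- w + alpha * h * (x - w), applied to every unit at once.
    weights += alpha * h[..., None] * (x - weights)
    return bmu

x = rng.random(dim)
train_step(x, weights, alpha=0.5, sigma=3.0)
```

A full training loop would call `train_step` over many inputs while decaying `alpha` and `sigma`, as described above.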
When and why to use SOMs
Great fits
- Topology-aware clustering: You want cluster structure and neighborhood relationships.
- Visualization: Map complex vectors (e.g., spectra, embeddings) onto an interpretable 2D grid.
- Vector quantization: Build codebooks for compression or fast nearest-neighbor lookups.
- Prototyping: Quick, unsupervised exploration before committing to labels or models.
Considerations
- Parameters matter: Learning-rate and neighborhood schedules strongly affect outcomes.
- Grid bias: The pre-set grid size/shape constrains resolution and topology.
- Scale sensitivity: Inputs should be normalized; feature scaling changes the geometry.
- Alternatives: For pure visualization consider t-SNE/UMAP; for clustering consider k-means/GMM.
Use case: palette learning for color quantization
Train a small SOM in RGB space to learn a palette from an image, then map each pixel to its nearest neuron. This is vector quantization guided by topology: similar colors become neighboring codewords on the grid.
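A rough sketch of that pipeline, assuming synthetic random "pixels" in place of a real image and a 4×4 grid (a 16-color palette); grid size, epochs, and schedules are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "image": 1000 random RGB pixels in [0, 1].
pixels = rng.random((1000, 3))

# A small 4x4 SOM in RGB space -> a 16-color palette.
gh, gw = 4, 4
palette = rng.random((gh, gw, 3))
coords = np.stack(np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij"), axis=-1)

epochs = 30
for t in range(epochs):
    alpha = 0.5 * np.exp(-t / epochs)                           # decaying learning rate
    sigma = max(1.0, (max(gh, gw) / 2) * np.exp(-t / epochs))   # decaying radius
    for x in pixels[rng.permutation(len(pixels))[:200]]:        # sample a batch
        d = np.linalg.norm(palette - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma**2))
        palette += alpha * h[..., None] * (x - palette)

# Quantize: replace each pixel with its nearest codeword (neuron color).
flat = palette.reshape(-1, 3)
idx = np.argmin(np.linalg.norm(pixels[:, None, :] - flat[None, :, :], axis=-1), axis=1)
quantized = flat[idx]
```

Because the SOM preserves topology, neighboring cells of `palette` hold similar colors, which makes the learned codebook easy to inspect as a small swatch grid.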
Parameter tuning
- Learning rate α(t): Start larger (e.g., 0.3–0.7) and decay exponentially to ~0.01.
- Radius σ(t): Begin near max(gridW, gridH)/2 and decay to 1–2.
- Schedule: Use epoch-based decay: α(t)=α₀·exp(−t/T), σ(t)=σ₀·exp(−t/T).
- Epochs: Enough for a coarse ordering phase followed by fine-tuning (e.g., 20–100).
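The schedules above translate directly to code. A minimal sketch, assuming α₀ = 0.5, σ₀ = 5, and T = 50 epochs (all illustrative):

```python
import numpy as np

alpha0, sigma0, T = 0.5, 5.0, 50

def alpha(t):
    """Exponentially decaying learning rate: alpha(t) = alpha0 * exp(-t / T)."""
    return alpha0 * np.exp(-t / T)

def sigma(t):
    """Exponentially decaying radius, floored at 1 so the BMU always has neighbors."""
    return max(1.0, sigma0 * np.exp(-t / T))
```

Flooring σ at 1 (rather than letting it decay to zero) keeps some topological smoothing active through the fine-tuning phase.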
Diagnostics
- U‑Matrix: Visualize distances between neighboring units—ridges suggest cluster boundaries.
- Quantization error: Average ∥x − wᵇᵐᵤ∥; should decrease and plateau.
- Topographic error: Fraction of samples where first and second BMU are non-adjacent.
- Stability: Run multiple seeds to assess consistency.
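The quantization- and topographic-error diagnostics can be computed directly from the data and the weight grid. A sketch, assuming `weights` has shape `(grid_h, grid_w, dim)` and "adjacent" means the 8-neighborhood (a design choice; some definitions use only 4 neighbors):

```python
import numpy as np

def quantization_error(data, weights):
    """Mean distance from each sample to its BMU's weight vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def topographic_error(data, weights):
    """Fraction of samples whose first and second BMUs are not grid-adjacent."""
    gh, gw, dim = weights.shape
    flat = weights.reshape(-1, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Convert flat unit indices back to (row, col) grid coordinates.
    r1, c1 = np.divmod(first, gw)
    r2, c2 = np.divmod(second, gw)
    adjacent = (np.abs(r1 - r2) <= 1) & (np.abs(c1 - c2) <= 1)
    return 1.0 - adjacent.mean()
```

Tracking both during training is informative: quantization error can keep falling even as topographic error rises, which signals a map that fits the data but is folding over itself.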
Pitfalls
- Unscaled features: Dominant dimensions distort topology—normalize first.
- Too-small grid: Over-merges clusters; too-large grid: under-trains and looks noisy.
- Too-fast decay: Map freezes early; too-slow decay: never stabilizes.
- Random init bias: Try PCA-based init for faster ordering when available.
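A minimal sketch of PCA-based initialization, spreading the grid over the plane of the top two principal components; the exact scaling by singular values is one reasonable choice among several, not a fixed convention.

```python
import numpy as np

def pca_init(data, gh, gw):
    """Initialize a (gh, gw, dim) weight grid on the top-two principal-component
    plane of `data`, so the map starts roughly ordered instead of random."""
    mean = data.mean(axis=0)
    # Principal directions via SVD of the centered data; vt rows are components.
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    # Spread each grid axis along PC1/PC2, scaled by the data's spread there.
    scale = s[:2] / np.sqrt(len(data))
    rows = np.linspace(-1, 1, gh)
    cols = np.linspace(-1, 1, gw)
    weights = (mean
               + rows[:, None, None] * scale[0] * vt[0]
               + cols[None, :, None] * scale[1] * vt[1])
    return weights
```

Starting from this ordered plane typically shortens the coarse-ordering phase, since the grid already respects the data's dominant directions.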