Anomaly Detection — Theory, Math, and Interactive Simulations

Theory and math

Definition: An anomaly is an observation that deviates so much from the majority that it raises suspicion it was generated by a different mechanism.

Problem settings:

Unsupervised

Semi-supervised (normal only)

Supervised (labeled anomalies)

Z-score detector: For scalar data \(x\) with mean \(\mu\) and standard deviation \(\sigma\), score is \(z=\frac{x-\mu}{\sigma}\). Flag if \(|z|>\tau\). Robust variant replaces \(\mu,\sigma\) with median and \(\mathrm{MAD}=\mathrm{median}(|x-\mathrm{median}(x)|)\), scaling \(\sigma\approx 1.4826\cdot \mathrm{MAD}\).
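As a minimal sketch of the two detectors just described, the following uses only the Python standard library. The function names and the sample `readings` are illustrative assumptions, not part of the original page.

```python
import statistics

def zscore_flags(xs, tau=3.0):
    """Classical z-score: flag x when |x - mean| / std > tau."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        return [False] * len(xs)  # constant data: nothing deviates
    return [abs(x - mu) / sigma > tau for x in xs]

def robust_zscore_flags(xs, tau=3.0):
    """Robust variant: median for location, 1.4826 * MAD for scale."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        raise ValueError("MAD is zero: at least half the values equal the median")
    sigma_hat = 1.4826 * mad
    return [abs(x - med) / sigma_hat > tau for x in xs]

# Hypothetical readings: seven values near 10 and one gross outlier.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
print(zscore_flags(readings))         # the outlier inflates the mean and std
print(robust_zscore_flags(readings))  # median and MAD are barely affected
```

On these eight values the classical score of 25.0 is about 2.64, so \(\tau=3\) misses it, while its robust score is about 50. This masking effect is the usual argument for the MAD variant on small or contaminated samples.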

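Multivariate Gaussian: With mean \(\boldsymbol{\mu}\) and covariance \(\Sigma\), the Mahalanobis distance is \[ D_M(\mathbf{x})=\sqrt{(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}. \] If the data are Gaussian with known \(\boldsymbol{\mu}\) and \(\Sigma\), \(D_M^2\) follows \(\chi^2_k\) (k = dimensions), enabling thresholds via quantiles. With parameters estimated from data, the \(\chi^2_k\) quantile is an approximation that improves as the sample grows.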
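The sketch below works through the two-dimensional case with the standard library only. It assumes \(\boldsymbol{\mu}\) and \(\Sigma\) are already known, and it relies on the fact that for \(k=2\) the \(\chi^2\) quantile has the closed form \(-2\ln(1-p)\). The function names are illustrative.

```python
import math

def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance for 2-D points and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def chi2_quantile_2dof(p):
    """Quantile of chi-square with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)

def is_anomaly(x, mu, cov, p=0.99):
    """Flag x when its squared distance exceeds the chi-square p-quantile."""
    return mahalanobis_sq_2d(x, mu, cov) > chi2_quantile_2dof(p)

# Correlated features: the same Euclidean distance gives different scores.
cov = ((2.0, 1.0), (1.0, 2.0))
print(mahalanobis_sq_2d((1.0, 1.0), (0.0, 0.0), cov))   # along the correlation
print(mahalanobis_sq_2d((1.0, -1.0), (0.0, 0.0), cov))  # against the correlation
```

The two example points are equally far from the mean in Euclidean terms, but the one that runs against the correlation scores three times higher. For more than two dimensions a numerical library would normally supply the inverse and the quantile.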

Density-based view: Learn \(p(\mathbf{x})\). Flag if \(p(\mathbf{x})<\epsilon\). Approximations: kernel density estimation, Gaussian mixtures, normalizing flows.
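To make the density rule concrete, here is a small one-dimensional Gaussian kernel density estimate in plain Python. The bandwidth, the cutoff `eps`, and the training values are illustrative assumptions; in practice both would be tuned on validation data.

```python
import math

def kde_density(x, train, bandwidth):
    """Gaussian kernel density estimate of p(x) from 1-D training values."""
    if bandwidth <= 0 or not train:
        raise ValueError("need a positive bandwidth and non-empty training data")
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in train)

def is_low_density(x, train, bandwidth, eps):
    """Flag x when the estimated density falls below eps."""
    return kde_density(x, train, bandwidth) < eps

# Hypothetical normal data clustered around zero.
train = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(kde_density(0.0, train, bandwidth=0.5))
print(is_low_density(5.0, train, bandwidth=0.5, eps=1e-3))
```

A smaller bandwidth makes the estimate spikier and flags more points between training samples. A larger one smooths the density and flags fewer.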

Reconstruction-based: Learn to reconstruct normals (PCA, autoencoders). Score \(s(\mathbf{x})=\lVert \mathbf{x}-\hat{\mathbf{x}}\rVert\). Flag if \(s>\tau\).

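Time series: Common baselines are EWMA smoothing and residual modeling. EWMA: \[ m_t=\alpha x_t+(1-\alpha)m_{t-1},\quad e_t=x_t-m_t. \] Flag if \(|e_t|>k\cdot \hat{\sigma}_e\), where \(\hat{\sigma}_e\) is the residual standard deviation estimated on normal data. For seasonal series, remove the seasonal component with STL/ETS or Fourier terms before residual detection.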
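The EWMA rule can be sketched as follows. This version estimates \(\hat{\sigma}_e\) from a warm-up window that is assumed to be anomaly-free and initialises \(m_0=x_0\). Both choices, along with the function name and the synthetic series, are assumptions of this sketch.

```python
import statistics

def ewma_flags(xs, alpha=0.3, k=3.0, warmup=10):
    """Flag x_t when |e_t| > k * sigma_e, with e_t = x_t - m_t.

    sigma_e is the standard deviation of the first `warmup` residuals,
    which are assumed to contain no anomalies.
    """
    if warmup < 2 or len(xs) <= warmup:
        raise ValueError("need warmup >= 2 and more samples than the warm-up")
    m = xs[0]  # initialise the moving average at the first observation
    residuals = []
    for t, x in enumerate(xs):
        if t > 0:
            m = alpha * x + (1 - alpha) * m
        residuals.append(x - m)
    sigma_e = statistics.pstdev(residuals[:warmup])
    # Points inside the warm-up window are never flagged.
    return [t >= warmup and abs(e) > k * sigma_e for t, e in enumerate(residuals)]

# Synthetic series: small oscillation around 10 with one spike at t = 20.
series = [10.0 + 0.1 * (-1) ** t for t in range(30)]
series[20] = 15.0
flags = ewma_flags(series)
print([t for t, f in enumerate(flags) if f])
```

Because the spike also pulls \(m_t\) upward, a few points right after it can be flagged as well. Excluding flagged points from the update is one common way to limit that effect.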

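PCA for anomalies: Let \(X\in\mathbb{R}^{n\times d}\) be mean-centered, and let \(U_r\in\mathbb{R}^{d\times r}\) hold the top \(r\) eigenvectors of its covariance (the principal subspace). Reconstruction \(\hat{\mathbf{x}}=U_r U_r^\top \mathbf{x}\). Residual \(s(\mathbf{x})=\lVert \mathbf{x}-\hat{\mathbf{x}}\rVert\), the reconstruction score defined above. Large \(s\Rightarrow\) anomalous. In terms of the discarded eigenvectors \(\mathbf{u}_{r+1},\dots,\mathbf{u}_d\): \[ s^2 = \sum_{j=r+1}^{d} (\mathbf{u}_j^\top \mathbf{x})^2. \]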
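For intuition, the two-dimensional case with one retained component can be written without a linear algebra library: the principal axis of a 2x2 covariance \(\begin{pmatrix}a&b\\b&c\end{pmatrix}\) lies at angle \(\tfrac12\operatorname{atan2}(2b,\,a-c)\). The sketch below fits the axis on inliers only and scores new points by their distance from it. Names and data are illustrative assumptions.

```python
import math

def fit_principal_axis(points):
    """Return (mean, unit vector of the top principal axis) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n           # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / n           # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # cov(x, y)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (mx, my), (math.cos(theta), math.sin(theta))

def pca_residual(x, mean, u):
    """Norm of the part of (x - mean) orthogonal to the principal axis u."""
    vx, vy = x[0] - mean[0], x[1] - mean[1]
    # projection onto the discarded direction (-u_y, u_x)
    return abs(-u[1] * vx + u[0] * vy)

# Hypothetical inliers lying along the line y = x.
inliers = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
mean, axis = fit_principal_axis(inliers)
print(pca_residual((1.0, -1.0), mean, axis))  # off the axis: large residual
print(pca_residual((3.0, 3.0), mean, axis))   # on the axis: near zero
```

The second point is far from the training data yet has almost no residual. That is the known blind spot of residual-only scoring: anomalies that lie inside the principal subspace need a complementary in-subspace distance check.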

Thresholding: Choose \(\tau\) via:

Statistical quantile

Target false positive rate

Expected contamination

Validation ROC/PR
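One way to turn a target false positive rate into a threshold is to take an empirical quantile of scores computed on data believed to be normal. The sketch below assumes the flagging rule is `score > tau`. The helper name and the synthetic scores are illustrative.

```python
import math

def threshold_for_target_fpr(normal_scores, target_fpr):
    """Pick tau so that at most target_fpr of the normal scores exceed it."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(normal_scores)
    n = len(ordered)
    allowed = math.floor(target_fpr * n)  # normals we accept as false alarms
    return ordered[n - allowed - 1]

def empirical_fpr(normal_scores, tau):
    """Fraction of normal scores that the rule score > tau would flag."""
    return sum(s > tau for s in normal_scores) / len(normal_scores)

# Hypothetical anomaly scores from a validation set of normal samples.
scores = [float(i) for i in range(1, 101)]
tau = threshold_for_target_fpr(scores, 0.05)
print(tau, empirical_fpr(scores, tau))
```

The same helper covers the expected-contamination option when it is applied to unlabeled scores with the contamination fraction in place of the false positive rate.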

Beware of dataset shift: if the “normal” distribution drifts, fixed thresholds break. Refit, recalibrate, or use adaptive methods.

Class imbalance is extreme in anomaly detection. Precision-Recall is more informative than ROC when positives are rare.
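A short worked example shows why. The counts below are hypothetical: 100,000 samples, 100 of them anomalous, and a detector with 90% recall and a 1% false positive rate.

```python
def rates(tp, fp, fn, tn):
    """Recall (TPR), false positive rate, and precision from confusion counts."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }

# 100 anomalies: 90 caught, 10 missed. 99,900 normals: 999 false alarms.
r = rates(tp=90, fp=999, fn=10, tn=98901)
print(r)
```

The ROC point (FPR 0.01, TPR 0.90) looks strong, yet precision is only about 8%: roughly eleven of every twelve alerts are false alarms. The precision-recall view exposes this because it ignores the large pool of true negatives.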

IDS intuition: Model typical traffic features (per-flow bytes, packet rate, unique ports). Outliers may indicate scans, exfiltration, or beaconing.

Medical imaging intuition: Train a model (PCA/autoencoder) on healthy images. Lesions or artifacts yield high reconstruction error or segmentation residuals.

Fake content intuition: Inconsistencies in frequency spectra, resampling footprints, or lighting geometry can be anomalous relative to genuine data.

Interactive simulations

Experiment with classic detectors: z-score and MAD, EWMA for time series, and PCA-based reconstruction. Tune parameters and see anomalies light up.

Simulator 1 — distributions and time series

Legend: Normal, Anomaly, Model line / bounds.

Controls: Mode, Samples (n), Anomaly ratio (%), Threshold τ / k, EWMA α, Normal σ.

Outputs: TPR, FPR, and counts.

Simulator 2 — PCA reconstruction in 2D

Legend: Inliers, Outliers (high residual), Principal axis.

Controls: Samples, Inlier spread, Cluster angle (°), Outlier ratio (%), Residual threshold τ.

Outputs: Residual stats and flagged count.

Simulator 3 — synthetic imaging anomalies

Panels: Base image, Detected mask.

Controls: Threshold, Lesion intensity, Noise level, Artifact strength, Z-threshold τ.

Outputs: Pixel-level detection rate and false positives.

Applications overview

Healthcare monitoring with smartphones and wearables

Signals: Heart rate (HR), heart rate variability (HRV), step count, accelerometer, SpO₂, skin temperature, sleep stages.

Use case (logs): Model expected HR given activity and time of day. The residual is \(e_t = \mathrm{HR}_t - \widehat{\mathrm{HR}}_t(\text{activity}, t)\). Persistently large residuals may indicate sensor misplacement, arrhythmias, or artifacts. Always confirm clinically before acting.

Method sketch: Train a baseline on “normal” days. Apply EWMA to the residuals for online alerts. Set thresholds from robust per-user statistics (median, MAD, percentiles) so they adapt to each user.

Data hygiene: Battery drops, missing data, and device swaps produce anomalies too. Add quality flags and imputation rules.
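The residual idea for wearables can be sketched with a deliberately simple baseline: a per-activity median and scaled MAD stands in for \(\widehat{\mathrm{HR}}\), and an alert fires only when the residual stays large for several consecutive samples. The record layout, the activity labels, and the parameters `k` and `min_run` are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def fit_baseline(records):
    """records: (activity, hr) pairs from normal days.

    Returns {activity: (median_hr, robust_sigma)}.
    """
    by_activity = defaultdict(list)
    for activity, hr in records:
        by_activity[activity].append(hr)
    baseline = {}
    for activity, hrs in by_activity.items():
        med = statistics.median(hrs)
        mad = statistics.median([abs(h - med) for h in hrs])
        baseline[activity] = (med, 1.4826 * mad)
    return baseline

def persistent_alerts(stream, baseline, k=3.0, min_run=3):
    """Indices where |residual| > k * sigma has held for min_run samples in a row."""
    run, alerts = 0, []
    for i, (activity, hr) in enumerate(stream):
        med, sigma = baseline[activity]
        run = run + 1 if abs(hr - med) > k * sigma else 0
        if run >= min_run:
            alerts.append(i)
    return alerts

# Hypothetical training data and a short live stream.
normal_days = [("sleep", h) for h in (52, 54, 55, 56, 58)] + \
              [("walk", h) for h in (90, 95, 100, 105, 110)]
baseline = fit_baseline(normal_days)
stream = [("sleep", 56), ("sleep", 95), ("sleep", 57),
          ("sleep", 96), ("sleep", 97), ("sleep", 98), ("walk", 105)]
print(persistent_alerts(stream, baseline))
```

The isolated reading of 95 during sleep is ignored, while the three-sample run triggers an alert. The same heart rate while walking would be within the baseline.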

Cybersecurity — network anomalies and IDS

Features: Per-flow bytes, packets, duration, inter-arrival time stats, TCP flags, destination port entropy, unique dests per source.

Patterns: Port scans → many short flows to many ports; exfiltration → sustained large egress bytes; beaconing → periodic connections with low variance.

Approach: Windowed features + robust scaling + distance/density methods (e.g., Mahalanobis). Tune thresholds to meet alert budgets; feedback loops reduce noise.

Caveat: Encrypted traffic hides payloads. Side-channel features remain useful but require careful baselining per network segment.
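Two of the patterns above reduce to simple windowed features. The sketch below counts unique destination ports per source per minute as a scan indicator, and computes the coefficient of variation of inter-arrival times as a beaconing indicator. The flow tuple layout, the addresses, and the `max_ports` cutoff are hypothetical; real thresholds need baselining per network segment.

```python
import statistics
from collections import defaultdict

def unique_ports_per_minute(flows):
    """flows: (timestamp_seconds, src_ip, dst_port) tuples.

    Returns {(src_ip, minute_index): number of distinct destination ports}.
    """
    ports = defaultdict(set)
    for ts, src, port in flows:
        ports[(src, int(ts // 60))].add(port)
    return {key: len(seen) for key, seen in ports.items()}

def scan_suspects(flows, max_ports=50):
    """(src_ip, minute) windows that touched more than max_ports distinct ports."""
    counts = unique_ports_per_minute(flows)
    return sorted(key for key, n in counts.items() if n > max_ports)

def interarrival_cv(timestamps):
    """Std / mean of gaps between connections; values near 0 suggest periodicity."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        raise ValueError("all timestamps are identical")
    return statistics.pstdev(gaps) / mean_gap

# Hypothetical traffic: one ordinary client and one host sweeping 100 ports.
flows = [(float(i), "10.0.0.5", 443) for i in range(10)]
flows += [(60.0 + 0.5 * i, "10.0.0.9", 1 + i) for i in range(100)]
print(scan_suspects(flows))
print(interarrival_cv([0, 60, 120, 180, 240]))  # perfectly periodic
```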

Medical imaging — lesions and artefacts

Modalities: MRI, CT, X-ray, ultrasound. Anomalies include unexpected bright/dark regions, motion streaks, metal artefacts, coil failures.

Pipelines: Train on healthy scans with autoencoders or diffusion models. Use reconstruction error or uncertainty maps. For segmentation, use U-Net with out-of-distribution scoring on features.

Validation: Calibrate per site and scanner. Dice/IoU for lesion masks, pixel AUROC for detection, clinical reader studies for utility.
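The two overlap metrics named above follow directly from pixel counts: Dice is \(2TP/(2TP+FP+FN)\) and IoU is \(TP/(TP+FP+FN)\). A minimal sketch on nested lists of 0/1 values follows. Returning 1.0 when both masks are empty is a convention assumed here, not something the source specifies.

```python
def dice_and_iou(pred, truth):
    """Dice and IoU for two binary masks given as equal-shape nested lists."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Hypothetical 2x3 masks: two pixels agree, one false positive, one miss.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(dice_and_iou(pred, truth))
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image. Averages over many images can still differ.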

Detecting fake details

Targets: Spliced regions, resampling, face swaps, AI-generated edits, inconsistent reflections/shadows/specularities.

Signals: JPEG blocking inconsistencies, PRNU sensor noise mismatch, frequency spectra anomalies, copy-move self-similarity, eye specular highlights mismatch.

Workflow: Localize suspect regions → compute forensic features → threshold via robust stats → optional human-in-the-loop review.

Simple use cases

Wearable HR spike: Unusually high HR during sleep; cross-check with the accelerometer. If there is no movement, the spike is either a sensor misread or a possible tachycardia pattern, so flag it for review.

Step count reset: Sudden zeros at midday. The anomaly may be an app restart rather than inactivity; handle it with auto-resume heuristics.

Network scan burst: Spike in unique destination ports per minute → alert as potential scan.

CT metal artifact: High-frequency streaks localized near implants → artifact-aware masking before analysis.

Copy-move forgery: Duplicate texture blocks with different lighting → high self-similarity but inconsistent gradients.

Start simple: robust baselines + interpretable thresholds. Then layer representation learning where it truly reduces false positives or reveals subtle patterns.

Evaluation and deployment tips

Metrics: Precision, recall, F1; AUROC and AUPRC; time-to-detect for streaming.

Calibration: Convert scores into probabilities with Platt scaling or isotonic regression on held-out validation sets; maintain the target alert rate.

Drift: Monitor population statistics with the population stability index (PSI) or the Kolmogorov-Smirnov (KS) test; schedule retraining; keep backtesting windows.

Explainability: Show top contributing features, nearest neighbors, or reconstruction heatmaps.

Human in the loop: Triage queue with context and feedback improves precision over time.

Privacy: Minimize PII, aggregate where possible, and enforce purpose limitation for sensitive domains.

Safety: Never act automatically on medical anomalies without clinical confirmation. Treat alerts as prompts for review.
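The PSI drift check mentioned under Drift compares binned proportions between a reference window and a current window: \(\mathrm{PSI}=\sum_i (a_i-e_i)\ln(a_i/e_i)\). The sketch below uses fixed bin edges and a small floor `eps` for empty bins. Both are assumptions of this example.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over shared bin edges."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of the bin holding v
        # floor at eps so empty bins do not produce log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Hypothetical feature: balanced in the reference window, skewed afterwards.
reference = [0.0] * 50 + [1.0] * 50
current = [0.0] * 20 + [1.0] * 80
print(psi(reference, reference, edges=[0.5]))
print(psi(reference, current, edges=[0.5]))
```

A commonly quoted rule of thumb treats values above roughly 0.25 as a substantial shift. That cutoff is a convention rather than a statistical guarantee and is best checked against backtests.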
