Data-parallel training: gradient bucketing and overlap

Why DDP feels like magic until you look at the allreduce schedule.

May 19, 2026 · 1 min · mc

A minimal, reproducible Docker setup for ML experiments

Pin everything, mount data, never bake it. The boring setup that stopped me losing runs.

March 11, 2026 · 1 min · mc

KV cache, and why LLM inference is memory-bound

The cache that makes autoregressive decoding fast is also the first thing to exhaust memory.

February 8, 2026 · 2 min · mc