KV cache, and why LLM inference is memory-bound

The cache that makes autoregressive decoding fast is also why decoding is the first thing to run out of memory.

February 8, 2026 · 2 min · mc

Notes on cgroups v2: memory limits that actually hold

Why my OOM kills moved around after switching to the unified hierarchy, and the three knobs that matter.

March 9, 2025 · 2 min · mc