Inference on mc

Inference on mc · notes
https://hk.crepuscule.uk/tags/inference/
Recent content in Inference on mc · notes
Sun, 08 Feb 2026 15:20:00 +0800

KV cache, and why LLM inference is memory-bound
https://hk.crepuscule.uk/posts/kv-cache/
Sun, 08 Feb 2026 15:20:00 +0800
The cache that makes autoregressive decoding fast also makes it the thing that runs out of memory first.