Data-parallel training: gradient bucketing and overlap

Why DDP feels like magic until you look at the allreduce schedule.

May 19, 2026 · 1 min · mc

Tuning TCP BBR and fq on a high-latency link

Switching off loss-based congestion control on a long, fat path, and the fq gotcha for UDP.

November 3, 2025 · 1 min · mc

A first, honest look at io_uring

Two ring buffers, one syscall, and the mental model that finally made it click.

May 22, 2025 · 2 min · mc