Data-parallel training: gradient bucketing and overlap

Why DDP feels like magic until you look at the allreduce schedule.

May 19, 2026 · 1 min · mc