Distributed

Data-parallel training: gradient bucketing and overlap

Why DDP feels like magic until you look at the allreduce schedule.

May 19, 2026 · 1 min · mc
