Speeding up distributed learning by hiding communication delays

Communication often dominates training time in distributed learning, especially across slow networks. LOSCAR-SGD tackles this by combining three cost-reduction techniques: sparsification (sending only important model parameters), local training (letting workers take multiple steps between synchronizations), and overlap (continuing optimization while waiting for data to arrive). The key innovation is a merge rule that safely incorporates delayed information without discarding the progress made during communication. The accompanying theory shows how sparsity, overlap, and mismatched worker speeds affect convergence on smooth non-convex problems.
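
As a rough illustration of how the three ingredients might fit together, the following Python sketch simulates a few workers in a single process. Everything in it is an assumption made for illustration: the helper names (top_k, local_steps, merge_delayed, simulate), the toy quadratic loss, the choice to sparsify each worker's local update, and in particular the merge rule, which simply swaps a worker's own transmitted update for the delayed average. It is not LOSCAR-SGD's actual merge rule, which the summary above does not spell out.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def local_steps(x, target, lr, steps):
    """Run `steps` gradient steps on the toy loss 0.5 * ||x - target||^2."""
    for _ in range(steps):
        x = [xi - lr * (xi - ti) for xi, ti in zip(x, target)]
    return x


def merge_delayed(x_now, own_sent, avg_sent):
    """Hypothetical merge rule (an assumption, not the paper's rule).

    Replaces the worker's own transmitted update with the delayed average,
    while leaving untouched whatever progress x_now gained in the meantime.
    """
    return [xi - s + a for xi, s, a in zip(x_now, own_sent, avg_sent)]


def simulate(targets, k, rounds=50, lr=0.1, h=4, overlap=2):
    """Toy run: one target vector per worker, all models start at zero."""
    dim = len(targets[0])
    models = [[0.0] * dim for _ in targets]
    for _ in range(rounds):
        sent = []
        for w, target in enumerate(targets):
            start = models[w]
            # Local training: h steps without any communication.
            models[w] = local_steps(start, target, lr, h)
            delta = [a - b for a, b in zip(models[w], start)]
            # Sparsification: transmit only the k largest entries.
            sent.append(top_k(delta, k))
        # What an all-reduce would eventually deliver to every worker.
        avg = [sum(s[j] for s in sent) / len(sent) for j in range(dim)]
        for w, target in enumerate(targets):
            # Overlap: keep optimizing while the average is "in flight".
            models[w] = local_steps(models[w], target, lr, overlap)
            # Merge the delayed average into the newer local model.
            models[w] = merge_delayed(models[w], sent[w], avg)
    return models
```

In this toy setup the merge leaves the average of the worker models unchanged, so the steps taken during the overlap window are kept rather than overwritten. For example, simulate([[1.0, 0.0, -2.0], [3.0, 4.0, 2.0]], k=1) returns two models whose average is close to the mean of the two targets, although the individual models do not agree with each other; a real method would need more than this sketch to reach consensus.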