arXiv:1908.06077 [cs.LG]

NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization

Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy

Published 2019-08-16 (Version 1)

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf.
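The sketch below illustrates the kind of nonuniform stochastic gradient quantization the abstract refers to: each coordinate is normalized by the gradient's Euclidean norm and stochastically rounded onto exponentially spaced levels. The specific level set {0, 2^-s, ..., 1/2, 1}, the function name nuq_quantize, and the NumPy implementation are illustrative assumptions rather than the authors' reference code; the paper's exact level placement and the lossless encoding of the quantized values are specified in the full text.

    import numpy as np

    def nuq_quantize(g, s=3, rng=None):
        """Stochastically quantize gradient g onto nonuniform, exponentially
        spaced levels {0, 2**-s, ..., 1/2, 1}, scaled by ||g||_2.

        A minimal sketch; the full scheme also specifies how the quantized
        values are encoded before transmission.
        """
        rng = np.random.default_rng() if rng is None else rng
        norm = np.linalg.norm(g)
        if norm == 0.0:
            return np.zeros_like(g), norm

        # Candidate levels: 0 followed by 2^-s, 2^-(s-1), ..., 1.
        levels = np.concatenate(([0.0], 2.0 ** np.arange(-s, 1)))
        r = np.abs(g) / norm  # normalized magnitudes in [0, 1]

        # Locate the level interval [levels[idx], levels[idx + 1]] containing each r_i.
        idx = np.searchsorted(levels, r, side="right") - 1
        idx = np.clip(idx, 0, len(levels) - 2)
        lo, hi = levels[idx], levels[idx + 1]

        # Unbiased stochastic rounding: round up with probability equal to the
        # relative position of r_i inside its interval, so E[q_i] = r_i.
        p = (r - lo) / (hi - lo)
        q = np.where(rng.random(r.shape) < p, hi, lo)

        # Receiver reconstructs the gradient estimate as norm * sign(g) * q.
        return np.sign(g) * q, norm

Under this sketch, a worker would transmit the scalar norm together with the signs and the small level indices instead of full-precision coordinates, which is where the communication savings come from.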

Related articles:
arXiv:1909.09145 [cs.LG] (Published 2019-09-18)
Detailed comparison of communication efficiency of split learning and federated learning
arXiv:2004.02738 [cs.LG] (Published 2020-04-06)
Evaluating the Communication Efficiency in Federated Learning Algorithms
arXiv:2107.10996 [cs.LG] (Published 2021-07-23)
Communication Efficiency in Federated Learning: Achievements and Challenges