arXiv:2006.08517 [cs.LG]

The Limit of the Batch Size

Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh

Published 2020-06-15, Version 1

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce ImageNet/ResNet-50 training time from 29 hours to around 1 minute. In this paper, we study the limit of the batch size, which we think may provide guidance to AI supercomputer and algorithm designers. We provide detailed numerical optimization instructions for step-by-step comparison. Moreover, it is important to understand the generalization and optimization performance of huge-batch training. Hoffer et al. introduced the "ultra-slow diffusion" theory to large-batch training. However, our experimental results contradict the conclusion of Hoffer et al. We provide comprehensive experimental results and detailed analysis to study the limitations of batch size scaling and of the "ultra-slow diffusion" theory. For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies of the performance of many state-of-the-art optimization schemes under this setting. We propose an optimization recipe that improves the top-1 test accuracy by 18% compared to the baseline.

Categories: cs.LG, cs.CV, cs.DC, stat.ML

Keywords: batch size, ultra-slow diffusion, state-of-the-art optimization schemes, current distributed deep learning systems, efficient approach

Related articles:

arXiv:2406.02294 [cs.LG] (Published 2024-06-04)

Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling

Arthur Müller, Felix Grumbach, Matthia Sabatelli

arXiv:1711.00489 [cs.LG] (Published 2017-11-01)

Don't Decay the Learning Rate, Increase the Batch Size

Samuel L. Smith, Pieter-Jan Kindermans, Quoc V. Le

arXiv:1206.6472 [cs.LG] (Published 2012-06-27)

An Efficient Approach to Sparse Linear Discriminant Analysis

Luis Francisco Sanchez Merchante, Yves Grandvalet, Gérard Govaert