arXiv Analytics


arXiv:1810.00122 [cs.LG]

On the Convergence and Robustness of Batch Normalization

Yongqiang Cai, Qianxiao Li, Zuowei Shen

Published 2018-09-29, Version 1

Despite its empirical success, the theoretical underpinnings of the stability, convergence and acceleration properties of batch normalization (BN) remain elusive. In this paper, we approach this problem from a modeling perspective, performing a thorough theoretical analysis of BN applied to a simplified model: ordinary least squares (OLS). We discover that gradient descent on OLS with BN has interesting properties, including a scaling law, convergence for arbitrary learning rates for the weights, asymptotic acceleration effects, and insensitivity to the choice of learning rates. We then demonstrate numerically that these findings are not specific to the OLS problem and hold qualitatively for more complex supervised learning problems. This points to a new direction towards uncovering the mathematical principles that underlie batch normalization.
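The abstract mentions gradient descent on OLS with BN but does not reproduce the formulation. The following is a minimal numerical sketch of that kind of setting, under assumptions of our own rather than the authors' exact model: zero-mean Gaussian inputs, noiseless targets y = X u, a prediction a * (X w) / sigma where sigma is the batch RMS of X w (the batch-mean term of BN is dropped since the inputs are zero-mean), and arbitrarily chosen learning rates lr_a and lr_w. It only illustrates the experiment described, not the paper's results.

```python
import numpy as np

# Hypothetical BN-on-OLS sketch (not the authors' exact setup).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))          # zero-mean inputs
u_true = rng.normal(size=d)
y = X @ u_true                        # noiseless OLS targets

a = 1.0                               # trainable output scale
w = rng.normal(size=d)                # trainable direction weights
lr_a, lr_w = 0.1, 10.0                # deliberately large step for w (assumption)

for step in range(2001):
    z = X @ w
    sigma = np.sqrt(np.mean(z ** 2)) + 1e-12   # batch RMS of w^T x
    resid = a * z / sigma - y
    loss = 0.5 * np.mean(resid ** 2)
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.3e}")

    # gradients of the mean-squared error w.r.t. a and w
    grad_a = np.mean(resid * z) / sigma
    grad_w = a * (X.T @ resid / (n * sigma)
                  - (resid @ z) * (X.T @ z) / (n ** 2 * sigma ** 3))
    a -= lr_a * grad_a
    w -= lr_w * grad_w
```

Because the normalized output is invariant to the scale of w, the gradient with respect to w is orthogonal to w and shrinks as its norm grows, so even a large lr_w does not destabilize training; this is consistent with, though not a substitute for, the convergence claims in the abstract.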

Related articles:
arXiv:1811.09358 [cs.LG] (Published 2018-11-23)
A Sufficient Condition for Convergences of Adam and RMSProp
arXiv:2109.03194 [cs.LG] (Published 2021-09-07)
On the Convergence of Decentralized Adaptive Gradient Methods
arXiv:2010.12711 [cs.LG] (Published 2020-10-23)
On Convergence and Generalization of Dropout Training