arXiv Analytics


arXiv:1810.00122 [cs.LG]

On the Convergence and Robustness of Batch Normalization

Yongqiang Cai, Qianxiao Li, Zuowei Shen

Published 2018-09-29, Version 1

Despite its empirical success, the theoretical underpinnings of the stability, convergence and acceleration properties of batch normalization (BN) remain elusive. In this paper, we approach this problem from a modeling perspective, performing a thorough theoretical analysis of BN applied to a simplified model: ordinary least squares (OLS). We discover that gradient descent on OLS with BN has interesting properties, including a scaling law, convergence for arbitrary learning rates for the weights, asymptotic acceleration effects, and insensitivity to the choice of learning rates. We then demonstrate numerically that these findings are not specific to the OLS problem and hold qualitatively for more complex supervised learning problems. This points to a new direction towards uncovering the mathematical principles that underlie batch normalization.
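The abstract mentions gradient descent on OLS with BN but does not reproduce the formulation. The following is a minimal numerical sketch of that kind of setting, under assumptions of our own rather than the authors' exact model: zero-mean Gaussian inputs, noiseless targets y = X u, a prediction a * (X w) / sigma where sigma is the batch RMS of X w (the batch-mean term of BN is dropped since the inputs are zero-mean), and arbitrarily chosen learning rates lr_a and lr_w. It only illustrates the experiment described, not the paper's results.

```python
import numpy as np

# Hypothetical BN-on-OLS sketch (not the authors' exact setup).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))          # zero-mean inputs
u_true = rng.normal(size=d)
y = X @ u_true                        # noiseless OLS targets

a = 1.0                               # trainable output scale
w = rng.normal(size=d)                # trainable direction weights
lr_a, lr_w = 0.1, 10.0                # deliberately large step for w (assumption)

for step in range(2001):
    z = X @ w
    sigma = np.sqrt(np.mean(z ** 2)) + 1e-12   # batch RMS of w^T x
    resid = a * z / sigma - y
    loss = 0.5 * np.mean(resid ** 2)
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.3e}")

    # gradients of the mean-squared error w.r.t. a and w
    grad_a = np.mean(resid * z) / sigma
    grad_w = a * (X.T @ resid / (n * sigma)
                  - (resid @ z) * (X.T @ z) / (n ** 2 * sigma ** 3))
    a -= lr_a * grad_a
    w -= lr_w * grad_w
```

Because the normalized output is invariant to the scale of w, the gradient with respect to w is orthogonal to w and shrinks as its norm grows, so even a large lr_w does not destabilize training; this is consistent with, though not a substitute for, the convergence claims in the abstract.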

Related articles:
arXiv:1811.09358 [cs.LG] (Published 2018-11-23)
A Sufficient Condition for Convergences of Adam and RMSProp
arXiv:2109.03194 [cs.LG] (Published 2021-09-07)
On the Convergence of Decentralized Adaptive Gradient Methods
arXiv:2010.12711 [cs.LG] (Published 2020-10-23)
On Convergence and Generalization of Dropout Training