arXiv:1805.10694 [stat.ML]
Towards a Theoretical Understanding of Batch Normalization
Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, Thomas Hofmann
Published 2018-05-27, Version 1
Normalization techniques such as Batch Normalization have been applied very successfully to the training of deep neural networks. Yet, despite the apparent empirical benefits of Batch Normalization, the reasons behind its success remain largely hypothetical. We therefore aim to provide a more thorough theoretical understanding from an optimization perspective. Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where, under certain assumptions, Batch Normalization can provably accelerate optimization with gradient-based methods. We thereby turn Batch Normalization from an effective practical heuristic into a provably convergent algorithm in these settings. Furthermore, we substantiate our analysis with empirical evidence suggesting that our theoretical results hold in a broader context.
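For context, a minimal NumPy sketch of the batch normalization operation discussed in the abstract follows; the function name, the epsilon value, and the toy dimensions are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # Normalize each feature across the batch to zero mean and
        # unit variance, then apply a learnable affine transform.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    # Illustrative usage: a batch of 4 samples with 3 features.
    x = np.random.randn(4, 3)
    gamma = np.ones(3)   # scale parameter, learned during training
    beta = np.zeros(3)   # shift parameter, learned during training
    y = batch_norm(x, gamma, beta)
    print(y.mean(axis=0))  # approximately 0 per feature
    print(y.std(axis=0))   # approximately 1 per feature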