arXiv Analytics

arXiv:1805.10694 [stat.ML]

Towards a Theoretical Understanding of Batch Normalization

Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, Thomas Hofmann

Published 2018-05-27 (Version 1)

Normalization techniques such as Batch Normalization have been applied very successfully to the training of deep neural networks. Yet, despite these apparent empirical benefits, the reasons behind the success of Batch Normalization remain largely hypothetical. We therefore aim to provide a more thorough theoretical understanding from an optimization perspective. Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where, under certain assumptions, Batch Normalization can provably accelerate optimization with gradient-based methods. We thereby turn Batch Normalization from an effective practical heuristic into a provably convergent algorithm for these settings. Furthermore, we substantiate our analysis with empirical evidence suggesting that our theoretical results hold in a broader context.
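
For reference, the operation the paper analyzes is the standard Batch Normalization forward pass. Below is a minimal NumPy sketch of that operation on a mini-batch; it is purely illustrative and not the analysis setting of the paper, and the function name batch_norm and the parameters gamma, beta, and eps are generic choices rather than notation from the article.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift.

    x:     (batch_size, num_features) pre-activations
    gamma: (num_features,) learned scale
    beta:  (num_features,) learned shift
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta              # learned rescaling restores expressiveness

# Example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 5.0 + 2.0
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.var(axis=0))     # approximately 0 and 1 per feature
```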

Related articles:
arXiv:1908.06395 [stat.ML] (Published 2019-08-18)
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
arXiv:2007.07365 [stat.ML] (Published 2020-07-14)
Towards a Theoretical Understanding of the Robustness of Variational Autoencoders
arXiv:1903.01435 [stat.ML] (Published 2019-03-04)
Optimistic Adaptive Acceleration for Optimization