arXiv:1706.10239 [cs.LG]

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

Published 2017-06-30, Version 1

It is widely observed that deep learning models with learned parameters generalize well, even when they have many more parameters than training samples. We systematically investigate why deep neural networks often generalize well, and reveal the difference between minima (with the same training error) that generalize well and those that do not. We show that it is the characteristics of the loss landscape that explain this good generalization capability. In the loss landscape of deep networks, the volume of the basin of attraction of good minima dominates that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that low-complexity solutions have a small Hessian norm with respect to the model parameters. For deeper networks, extensive numerical evidence supports our arguments.
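The abstract relies on two measurable quantities: the share of random initializations that gradient descent sends to a given minimum (a Monte Carlo estimate of the volume of its basin of attraction) and the curvature, or Hessian, at that minimum. The following is a minimal one-dimensional sketch of how both can be measured. It assumes a hypothetical toy loss built as the pointwise minimum of two quadratic wells that both reach loss 0, standing in for "the same training error". The names (`loss`, `grad`, `curvature`, `basin_fractions`), the well parameters, learning rate, and sampling range are all illustrative assumptions, not the paper's networks or experimental setup.

```python
import random

# Hypothetical toy loss (illustrative assumption, not from the paper):
# two wells with the same minimum loss 0, a flat one at x = -2
# (second derivative 0.5) and a sharp one at x = +2 (second derivative 8).
A_FLAT, M_FLAT = 0.25, -2.0
A_SHARP, M_SHARP = 4.0, 2.0


def loss(x: float) -> float:
    """Pointwise minimum of two quadratic wells; both minima have loss 0."""
    return min(A_FLAT * (x - M_FLAT) ** 2, A_SHARP * (x - M_SHARP) ** 2)


def grad(x: float) -> float:
    """Gradient of whichever well is active (has the smaller value) at x."""
    if A_FLAT * (x - M_FLAT) ** 2 <= A_SHARP * (x - M_SHARP) ** 2:
        return 2 * A_FLAT * (x - M_FLAT)
    return 2 * A_SHARP * (x - M_SHARP)


def curvature(f, x: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of f''(x), the 1-D Hessian."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def gradient_descent(x: float, lr: float = 0.1, steps: int = 300) -> float:
    """Plain gradient descent with a fixed learning rate from start point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def basin_fractions(n: int = 2000, low: float = -4.0, high: float = 4.0,
                    seed: int = 0) -> dict:
    """Monte Carlo estimate of the share of uniform random initializations
    in [low, high] that gradient descent sends to each minimum."""
    rng = random.Random(seed)
    counts = {"flat": 0, "sharp": 0}
    for _ in range(n):
        x_final = gradient_descent(rng.uniform(low, high))
        # Label each run by the minimum it ends up closest to.
        if abs(x_final - M_FLAT) < abs(x_final - M_SHARP):
            counts["flat"] += 1
        else:
            counts["sharp"] += 1
    return {name: c / n for name, c in counts.items()}


if __name__ == "__main__":
    print("basin fractions:", basin_fractions())
    print("curvature (flat, sharp):",
          curvature(loss, M_FLAT), curvature(loss, M_SHARP))
```

With these settings the flat well's basin inside the sampling range is the interval [-4, 1.2), so roughly 65% of runs should end at the flat minimum, and the curvature estimates should come out near 0.5 and 8. Note that in this toy the larger basin follows from where the wells are placed as well as from their curvature, so the sketch shows how the two quantities can be measured rather than demonstrating the paper's claim.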

Categories: cs.LG, cs.AI, stat.ML

Keywords: deep learning, loss landscapes, understanding generalization, model parameters, loss function
