arXiv Analytics

arXiv:1802.03487 [cs.LG]

A Critical View of Global Optimality in Deep Learning

Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Published 2018-02-10 (Version 1)

We investigate the loss surface of deep linear and nonlinear neural networks. We show that for deep linear networks with differentiable losses, critical points under the multilinear parameterization inherit the structure of the critical points of the underlying loss under a linear parameterization. As corollaries we obtain "local minima are global" results that subsume most previous results, while also showing how to distinguish global minima from saddle points. For nonlinear neural networks, we prove two theorems showing that even networks with a single hidden layer can have spurious local minima. In particular, for piecewise linear, nonnegative homogeneous activations (e.g., ReLU), we prove that for almost all practical datasets there exist infinitely many local minima that are not global. We conclude by constructing a counterexample involving other activation functions (e.g., sigmoid, tanh, arctan), for which there exists a local minimum strictly inferior to the global minimum.
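The claim about spurious local minima for ReLU networks can be illustrated empirically. The following Python sketch is only an illustration under assumed settings, not the paper's construction: it trains a one-hidden-layer ReLU network on a tiny regression dataset from several random initializations with full-batch gradient descent. Runs that settle at different final loss values are consistent with the abstract's statement that such networks can have local minima that are not global. The dataset, network width, and hyperparameters below are arbitrary choices for illustration.

# Illustrative sketch (not the paper's construction): a one-hidden-layer ReLU
# network trained by full-batch gradient descent from several random starts.
# Different runs often converge to different loss values, consistent with the
# existence of non-global local minima. All hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression dataset: 1-D inputs, targets y = |x| + 0.5*x, which a
# width-2 ReLU network can represent exactly (global minimum has zero loss).
X = np.linspace(-1.0, 1.0, 20).reshape(1, -1)   # shape (d_in=1, n)
Y = np.abs(X) + 0.5 * X                          # shape (1, n)

def loss_and_grads(W1, b1, W2, b2):
    """Squared loss of y_hat = W2 @ relu(W1 @ X + b1) + b2, with gradients."""
    Z = W1 @ X + b1                  # pre-activations, shape (h, n)
    A = np.maximum(Z, 0.0)           # ReLU
    Y_hat = W2 @ A + b2              # predictions, shape (1, n)
    R = Y_hat - Y
    loss = 0.5 * np.mean(R ** 2)

    n = X.shape[1]
    dY_hat = R / n
    dW2 = dY_hat @ A.T
    db2 = dY_hat.sum(axis=1, keepdims=True)
    dA = W2.T @ dY_hat
    dZ = dA * (Z > 0)                # ReLU derivative
    dW1 = dZ @ X.T
    db1 = dZ.sum(axis=1, keepdims=True)
    return loss, dW1, db1, dW2, db2

hidden = 2
for trial in range(5):
    W1 = rng.standard_normal((hidden, 1))
    b1 = rng.standard_normal((hidden, 1))
    W2 = rng.standard_normal((1, hidden))
    b2 = rng.standard_normal((1, 1))
    for _ in range(20000):
        loss, dW1, db1, dW2, db2 = loss_and_grads(W1, b1, W2, b2)
        lr = 0.1
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    print(f"trial {trial}: final loss = {loss:.6f}")

Because the target function is exactly representable here, any run that stalls at a clearly nonzero loss has reached a stationary point that is not global; this only illustrates, and does not prove, the phenomenon established in the paper.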

Related articles: Most relevant | Search more
arXiv:1712.04741 [cs.LG] (Published 2017-12-13)
Mathematics of Deep Learning
arXiv:1710.09513 [cs.LG] (Published 2017-10-26)
Maximum Principle Based Algorithms for Deep Learning
arXiv:1612.07640 [cs.LG] (Published 2016-12-16)
Deep Learning and Its Applications to Machine Health Monitoring: A Survey