arXiv Analytics

arXiv:1802.03487 [cs.LG]

A Critical View of Global Optimality in Deep Learning

Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Published 2018-02-10 (Version 1)

We investigate the loss surface of deep linear and nonlinear neural networks. We show that for deep linear networks with differentiable losses, critical points under the multilinear parameterization inherit the structure of the critical points of the underlying loss under a linear parameterization. As corollaries we obtain "local minima are global" results that subsume most previous results, while also showing how to distinguish global minima from saddle points. For nonlinear neural networks, we prove two theorems showing that even networks with a single hidden layer can have spurious local minima. In particular, for piecewise linear, nonnegative homogeneous activations (e.g., ReLU), we prove that for almost all practical datasets there exist infinitely many local minima that are not global. We conclude by constructing a counterexample involving other activation functions (e.g., sigmoid, tanh, arctan), for which there exists a local minimum strictly inferior to the global minimum.
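The claim about spurious local minima for ReLU networks can be illustrated empirically. The following Python sketch is only an illustration under assumed settings, not the paper's construction: it trains a one-hidden-layer ReLU network on a tiny regression dataset from several random initializations with full-batch gradient descent. Runs that settle at different final loss values are consistent with the abstract's statement that such networks can have local minima that are not global. The dataset, network width, and hyperparameters below are arbitrary choices for illustration.

# Illustrative sketch (not the paper's construction): a one-hidden-layer ReLU
# network trained by full-batch gradient descent from several random starts.
# Different runs often converge to different loss values, consistent with the
# existence of non-global local minima. All hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression dataset: 1-D inputs, targets y = |x| + 0.5*x, which a
# width-2 ReLU network can represent exactly (global minimum has zero loss).
X = np.linspace(-1.0, 1.0, 20).reshape(1, -1)   # shape (d_in=1, n)
Y = np.abs(X) + 0.5 * X                          # shape (1, n)

def loss_and_grads(W1, b1, W2, b2):
    """Squared loss of y_hat = W2 @ relu(W1 @ X + b1) + b2, with gradients."""
    Z = W1 @ X + b1                  # pre-activations, shape (h, n)
    A = np.maximum(Z, 0.0)           # ReLU
    Y_hat = W2 @ A + b2              # predictions, shape (1, n)
    R = Y_hat - Y
    loss = 0.5 * np.mean(R ** 2)

    n = X.shape[1]
    dY_hat = R / n
    dW2 = dY_hat @ A.T
    db2 = dY_hat.sum(axis=1, keepdims=True)
    dA = W2.T @ dY_hat
    dZ = dA * (Z > 0)                # ReLU derivative
    dW1 = dZ @ X.T
    db1 = dZ.sum(axis=1, keepdims=True)
    return loss, dW1, db1, dW2, db2

hidden = 2
for trial in range(5):
    W1 = rng.standard_normal((hidden, 1))
    b1 = rng.standard_normal((hidden, 1))
    W2 = rng.standard_normal((1, hidden))
    b2 = rng.standard_normal((1, 1))
    for _ in range(20000):
        loss, dW1, db1, dW2, db2 = loss_and_grads(W1, b1, W2, b2)
        lr = 0.1
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    print(f"trial {trial}: final loss = {loss:.6f}")

Because the target function is exactly representable here, any run that stalls at a clearly nonzero loss has reached a stationary point that is not global; this only illustrates, and does not prove, the phenomenon established in the paper.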

Related articles: Most relevant | Search more
arXiv:1712.04741 [cs.LG] (Published 2017-12-13)
Mathematics of Deep Learning
arXiv:1710.09513 [cs.LG] (Published 2017-10-26)
Maximum Principle Based Algorithms for Deep Learning
arXiv:1612.07640 [cs.LG] (Published 2016-12-16)
Deep Learning and Its Applications to Machine Health Monitoring: A Survey