arXiv:1909.04653 [cs.LG]

Towards Understanding the Importance of Shortcut Connections in Residual Networks

Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

Published 2019-09-10 (Version 1)

Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite this great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.
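The training regime described in the abstract can be illustrated with a toy numerical experiment. The sketch below is a simplified stand-in, not the paper's exact model: it assumes the shortcut connection acts as a fixed identity-like direction added to the trainable first-layer filter (so gradients flow even when the filter starts at 0), uses a teacher-student regression over non-overlapping patches, initializes the filter at 0 and the second-layer weights in a small ball, and renormalizes the effective filter after each step as a stand-in for the paper's "proper normalization". All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 4, 8, 512                 # patches per input, patch dim, samples

# Assumed shortcut: a fixed unit-norm direction; the effective first-layer
# filter is (shortcut + w), so the network is non-degenerate at w = 0.
shortcut = np.ones(k) / np.sqrt(k)

# Teacher network: unit-norm effective filter and arbitrary output weights.
v_star = rng.normal(size=k)
v_star /= np.linalg.norm(v_star)
w_star = v_star - shortcut
a_star = rng.normal(size=p)

X = rng.normal(size=(n, p, k))      # n inputs, each split into p patches

def forward(X, w, a):
    pre = X @ (shortcut + w)        # (n, p) pre-activations, shared filter
    h = np.maximum(pre, 0.0)        # ReLU
    return h @ a, pre

y, _ = forward(X, w_star, a_star)

# Initialization scheme from the abstract: w at 0, a arbitrary in a ball.
w = np.zeros(k)
a = rng.normal(size=p)
a *= 0.5 / np.linalg.norm(a)

init_loss = 0.5 * np.mean((forward(X, w, a)[0] - y) ** 2)

lr = 0.05
for step in range(500):
    pred, pre = forward(X, w, a)
    err = pred - y                              # (n,)
    h = np.maximum(pre, 0.0)
    grad_a = h.T @ err / n                      # d(loss)/d(a)
    mask = (pre > 0).astype(float)              # ReLU subgradient
    grad_w = np.einsum('n,np,npk->k', err, mask * a, X) / n
    a -= lr * grad_a
    w -= lr * grad_w
    # Normalization step (assumption): project the effective filter
    # (shortcut + w) back onto the unit sphere after each update.
    v = shortcut + w
    w = v / np.linalg.norm(v) - shortcut

loss = 0.5 * np.mean((forward(X, w, a)[0] - y) ** 2)
print(f"loss: {init_loss:.4f} -> {loss:.4f}")
```

In this toy run the loss decreases steadily from its value at the zero initialization, consistent with the abstract's claim that gradient descent with normalization escapes the bad region; the exact model, normalization, and convergence rate analyzed in the paper differ in detail.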

Comments: Thirty-third Conference on Neural Information Processing Systems, 2019
Categories: cs.LG, math.OC, stat.ML