arXiv:2204.10782 [cs.LG]

On Feature Learning in Neural Networks with Global Convergence Guarantees

Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna

Published 2022-04-22, Version 1

We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
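To make the mean-field setup concrete, here is a minimal NumPy sketch of the shallow case, assuming a tanh activation, synthetic Gaussian data, and a step size scaled with the width m; it illustrates mean-field scaling with input dimension d >= n, and is not the paper's exact model, proof setting, or hyperparameters. Full-batch gradient descent discretizes the gradient flow, and on this toy problem the training loss is driven toward zero while the first-layer weights move appreciably, unlike under a 1/sqrt(m) NTK-style scaling.

    # Hypothetical sketch, not the paper's exact setup: a wide shallow network under
    # mean-field scaling, f(x) = (1/m) * sum_i a_i * tanh(w_i . x), trained by
    # full-batch gradient descent (a discretization of gradient flow) on squared loss,
    # with input dimension d no less than the number of training points n.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 20, 32, 2048                        # training set size, input dim (d >= n), width

    X = rng.standard_normal((n, d)) / np.sqrt(d)  # synthetic inputs with unit-order norms
    y = rng.standard_normal(n)                    # synthetic targets
    W = rng.standard_normal((m, d))               # first-layer weights
    a = rng.standard_normal(m)                    # output weights

    lr = 0.5 * m      # step size scaled with m so the mean-field dynamics move at O(1) speed
    for step in range(2001):
        H = np.tanh(X @ W.T)                      # hidden activations, shape (n, m)
        f = H @ a / m                             # mean-field 1/m prefactor (vs. 1/sqrt(m) in the NTK regime)
        r = (f - y) / n                           # residuals of the loss L = (1/(2n)) * ||f - y||^2
        grad_a = H.T @ r / m
        grad_W = ((r[:, None] * a[None, :] * (1.0 - H**2)).T @ X) / m
        a -= lr * grad_a
        W -= lr * grad_W
        if step % 500 == 0:
            print(f"step {step:5d}  train loss {0.5 * np.sum((f - y) ** 2) / n:.3e}")

Swapping the 1/m prefactor for 1/sqrt(m) (and dropping the m-scaled step size) gives an NTK-style parameterization of the same architecture, which is the kind of counterpart the paper compares against empirically.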

Comments: Accepted by the 10th International Conference on Learning Representations (ICLR 2022)
Categories: cs.LG, math.OC, math.PR, stat.ML
Related articles:
arXiv:2306.04815 [cs.LG] (Published 2023-06-07)
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
arXiv:2303.08433 [cs.LG] (Published 2023-03-15)
The Benefits of Mixup for Feature Learning
arXiv:1809.03267 [cs.LG] (Published 2018-09-07)
Feature Learning for Meta-Paths in Knowledge Graphs