arXiv:2006.16495 [stat.ML]

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Published 2020-06-30, Version 1

Learning-to-learn (using optimization algorithms to learn a new optimizer) has successfully trained efficient optimizers in practice. This approach relies on meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates. However, there were few theoretical guarantees on how to avoid meta-gradient explosion/vanishing problems, or how to train an optimizer with good generalization performance. In this paper, we study the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues that look similar to gradient explosion/vanishing problems. We also characterize when it is necessary to compute the meta-objective on a separate validation set instead of the original training set. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks.
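The meta-gradient explosion described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a diagonal quadratic f(w) = 0.5 * sum_i h_i * w_i^2, plain gradient descent with step size eta for T steps, and two candidate meta-objectives: the final loss f(w_T), and a log-scaled variant (1/T) * log f(w_T). The log-scaled form is an assumption chosen for illustration; the abstract only states that a well-behaved meta-objective exists. All function names and constants are hypothetical.

```python
def run_gd(h, w0, eta, T):
    """Run T gradient-descent steps on f(w) = 0.5 * sum(h_i * w_i**2)."""
    w = list(w0)
    for _ in range(T):
        # The gradient of f with respect to w_i is h_i * w_i.
        w = [wi - eta * hi * wi for hi, wi in zip(h, w)]
    return w


def loss(h, w):
    """Quadratic loss with diagonal Hessian entries h."""
    return 0.5 * sum(hi * wi * wi for hi, wi in zip(h, w))


def meta_grad_final_loss(h, w0, eta, T):
    """Closed-form d/d(eta) of F(eta) = f(w_T).

    Since w_T[i] = (1 - eta * h_i)**T * w0[i], we have
    F'(eta) = -T * sum(h_i**2 * (1 - eta * h_i)**(2T - 1) * w0_i**2).
    """
    return sum(
        -T * hi ** 2 * (1.0 - eta * hi) ** (2 * T - 1) * wi ** 2
        for hi, wi in zip(h, w0)
    )


def meta_grad_log_loss(h, w0, eta, T):
    """d/d(eta) of the assumed meta-objective (1/T) * log f(w_T)."""
    w_T = run_gd(h, w0, eta, T)
    return meta_grad_final_loss(h, w0, eta, T) / (T * loss(h, w_T))


# Illustrative problem: two curvature directions, unit initialization.
h = [1.0, 0.1]
w0 = [1.0, 1.0]
T = 50

# A step size above 2 / max(h) makes |1 - eta * h_i| > 1 in one direction.
eta_large = 2.5
print("final-loss meta-gradient:", meta_grad_final_loss(h, w0, eta_large, T))
print("log-loss meta-gradient:  ", meta_grad_log_loss(h, w0, eta_large, T))
```

In this sketch the final-loss meta-gradient grows exponentially in T once eta exceeds 2 / max(h), while the log-scaled one stays of moderate size. The log-scaled value is still formed as a ratio of two quantities that each grow exponentially with T, so a naive floating-point evaluation can lose precision or overflow for large T, which is in the spirit of the numerical issue the abstract attributes to direct backpropagation.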

Categories: stat.ML, cs.LG

Keywords: learning-to-learn approach, step size, guarantees, avoid meta-gradient explosion/vanishing problems, separate validation set

Related articles:

arXiv:2210.13132 [stat.ML] (Published 2022-10-24)

PAC-Bayesian Offline Contextual Bandits With Guarantees

Otmane Sakhi, Nicolas Chopin, Pierre Alquier

arXiv:2306.03372 [stat.ML] (Published 2023-06-06)

Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret

Jian-Feng Cai, Jingyang Li, Dong Xia

arXiv:2002.02892 [stat.ML] (Published 2020-02-07)

Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model

Nicolas Keriven, Samuel Vaiter