arXiv:2406.10650 [stat.ML]
The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao
Published 2024-06-15 (Version 1)
Adam has become one of the most popular optimizers in deep learning. Despite its practical success, its theoretical properties remain poorly understood. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maximum $\ell_\infty$-margin. Notably, for a general class of diminishing learning rates, this convergence occurs within polynomial time. Our results shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective.
Comments: 33 pages, 2 figures
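The abstract's claim can be illustrated numerically. Below is a minimal, self-contained Python sketch (not from the paper, and all data, hyperparameters, and function names are illustrative assumptions) that trains a linear logistic model on synthetic separable data with full-batch Adam and with plain gradient descent, then reports the $\ell_\infty$- and $\ell_2$-normalized margins $\min_i y_i \langle w, x_i\rangle / \|w\|$. The paper's result predicts that Adam's direction favors the maximum $\ell_\infty$-margin, while gradient descent is known to favor the $\ell_2$ max-margin (Soudry et al., arXiv:1710.10345, listed below). The sketch uses a constant step size for simplicity, whereas the paper's guarantee covers a general class of diminishing learning rates.

```python
import numpy as np

# Toy linearly separable data with labels in {-1, +1}; a synthetic example,
# not taken from the paper.
rng = np.random.default_rng(0)
n, d = 40, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def logistic_loss_grad(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))
    z = y * (X @ w)
    s = -y / (1.0 + np.exp(z))
    return (X * s[:, None]).mean(axis=0)

def normalized_margin(w, ord):
    # min_i y_i <w, x_i> / ||w||_ord  (ell_2- or ell_inf-normalized margin)
    return np.min(y * (X @ w)) / np.linalg.norm(w, ord=ord)

def run_adam(steps=20000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Full-batch Adam with standard bias correction.
    w = np.zeros(d); m = np.zeros(d); v = np.zeros(d)
    for t in range(1, steps + 1):
        g = logistic_loss_grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

def run_gd(steps=20000, lr=0.5):
    # Plain full-batch gradient descent for comparison.
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * logistic_loss_grad(w)
    return w

w_adam, w_gd = run_adam(), run_gd()
print("Adam : ell_inf margin %.4f | ell_2 margin %.4f"
      % (normalized_margin(w_adam, np.inf), normalized_margin(w_adam, 2)))
print("GD   : ell_inf margin %.4f | ell_2 margin %.4f"
      % (normalized_margin(w_gd, np.inf), normalized_margin(w_gd, 2)))
```

With a finite number of steps the iterates only approach the respective max-margin directions slowly, so the gap in normalized margins may be modest; the qualitative trend (Adam relatively larger $\ell_\infty$-normalized margin, gradient descent relatively larger $\ell_2$-normalized margin) is what the theory predicts in the limit.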
Related articles:
arXiv:1906.03559 [stat.ML] (Published 2019-06-09)
The Implicit Bias of AdaGrad on Separable Data
arXiv:1710.10345 [stat.ML] (Published 2017-10-27)
The Implicit Bias of Gradient Descent on Separable Data
arXiv:2309.08044 [stat.ML] (Published 2023-09-14)
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent