arXiv:1710.10345 [stat.ML]

The Implicit Bias of Gradient Descent on Separable Data

Daniel Soudry, Elad Hoffer, Nathan Srebro

Published 2017-10-27, Version 1

We show that gradient descent on an unregularized logistic regression problem with separable data converges to the max-margin solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, and we discuss a multi-class generalization to the cross-entropy loss. Furthermore, we show that this convergence is very slow: it is only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
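The claim in the abstract can be illustrated with a minimal sketch. It is not code from the paper: the two-point dataset, the step size `lr`, the checkpoint schedule, and the helper names `logistic_loss_grad` and `run_gd` are all assumptions chosen for illustration. The data are picked so that the max-margin direction is known by hand, and the script tracks how the normalized gradient-descent iterate aligns with that direction while the loss keeps shrinking.

```python
import numpy as np

# Hypothetical toy data (not from the paper): two separable points, no bias term.
X = np.array([[1.0, 1.0], [-4.0, 0.0]])
y = np.array([1.0, -1.0])
Z = y[:, None] * X  # rows are y_n * x_n, so the margin of w on point n is Z[n] @ w

# For this data the hard-margin solution is w_hat = (0.5, 0.5):
# Z @ w_hat = [1, 2], so only the first point is a support vector.
w_hat_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)


def logistic_loss_grad(w):
    """Gradient of sum_n log(1 + exp(-Z[n] @ w)) with respect to w."""
    margins = Z @ w
    # sigmoid(-m) written as 0.5 * (1 - tanh(m / 2)) to avoid overflow in exp
    weights = 0.5 * (1.0 - np.tanh(margins / 2.0))
    return -(Z * weights[:, None]).sum(axis=0)


def run_gd(num_steps, lr=0.1, checkpoints=(10, 100, 1000, 10000)):
    """Plain gradient descent from w = 0, recording statistics at checkpoints."""
    w = np.zeros(2)
    history = []
    for step in range(1, num_steps + 1):
        w = w - lr * logistic_loss_grad(w)
        if step in checkpoints:
            loss = np.logaddexp(0.0, -(Z @ w)).sum()
            norm = np.linalg.norm(w)
            cosine = (w @ w_hat_dir) / norm
            history.append((step, loss, norm, cosine))
    return w, history


w_final, history = run_gd(10000)
for step, loss, norm, cosine in history:
    print(f"step={step:6d}  loss={loss:.2e}  ||w||={norm:.3f}  cos(w, w_hat)={cosine:.5f}")
```

In a run of this sketch the loss should fall by orders of magnitude between checkpoints, while the norm of `w` grows slowly and the cosine with the max-margin direction creeps toward 1. That pattern is consistent with the abstract's statement that convergence to the max-margin solution is only logarithmic relative to the convergence of the loss.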

Categories: stat.ML, cs.LG

Keywords: gradient descent, implicit bias, unregularized logistic regression problem, validation loss increases, monotone decreasing loss functions

Related articles:

arXiv:2406.10650 [stat.ML] (Published 2024-06-15)

The Implicit Bias of Adam on Separable Data

Chenyang Zhang, Difan Zou, Yuan Cao

arXiv:2302.09376 [stat.ML] (Published 2023-02-18)

Parameter Averaging for SGD Stabilizes the Implicit Bias towards Flat Regions

Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda

arXiv:1906.03559 [stat.ML] (Published 2019-06-09)

The Implicit Bias of AdaGrad on Separable Data

Qian Qian, Xiaoyuan Qian