arXiv Analytics

arXiv:1712.07424 [stat.ML]

ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent

Vishwak Srinivasan, Adepu Ravi Sankar, Vineeth N Balasubramanian

Published 2017-12-20 (Version 1)

Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$, which is conventionally set to a value less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when those assumptions do not necessarily hold. In this paper, we propose a new momentum-based method, $\textit{ADINE}$, which relaxes the constraint $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\ge 1$) can help escape saddle points much faster. Building on this motivation, $\textit{ADINE}$ weighs the previous updates more heavily by setting the momentum parameter $> 1$. We evaluate the proposed algorithm on deep neural networks and show that $\textit{ADINE}$ helps the learning algorithm converge much faster without compromising the generalization error.
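The abstract does not spell out the ADINE update rule, but a heavy-ball-style sketch makes the role of the momentum parameter $m$ concrete. The sketch below is illustrative only: the toy objective, the learning rate, the running-average switching rule, and the values m_low and m_high are assumptions for this example, not the authors' actual ADINE schedule.

```python
# Illustrative sketch (not the authors' exact ADINE rule): heavy-ball SGD
# where the momentum coefficient m is allowed to exceed 1 while progress
# stalls, and falls back to a conventional m < 1 otherwise. The switching
# criterion (compare current loss to a running average) is a hypothetical
# stand-in for the adaptive rule described in the paper.
import numpy as np

def loss(theta):
    # Toy objective with a saddle at the origin and minima at theta[0] = ±sqrt(2).
    return 0.25 * theta[0] ** 4 - theta[0] ** 2 + theta[1] ** 2

def grad(theta):
    return np.array([theta[0] ** 3 - 2.0 * theta[0], 2.0 * theta[1]])

def adaptive_momentum_sgd(theta0, lr=0.01, m_low=0.9, m_high=1.05, steps=500):
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    running = loss(theta)  # running average of the loss
    for _ in range(steps):
        g = grad(theta)
        # Hypothetical adaptivity: weigh previous updates more (m > 1)
        # only while the loss is not improving on its running average,
        # e.g. on a plateau or near a saddle point.
        m = m_high if loss(theta) >= running else m_low
        v = m * v - lr * g      # heavy-ball velocity update
        theta = theta + v
        running = 0.9 * running + 0.1 * loss(theta)
    return theta

print(adaptive_momentum_sgd([1e-3, 1.0]))
```

Starting near the saddle (theta[0] close to 0), the occasional m > 1 steps amplify the accumulated velocity and push the iterate off the flat region faster than a fixed m < 1 would; the exact adaptive rule and its analysis are in the full paper.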

Comments: 8 + 1 pages, 12 figures, accepted at CoDS-COMAD 2018
Categories: stat.ML, cs.LG
Related articles:
arXiv:2207.04922 [stat.ML] (Published 2022-07-11)
On uniform-in-time diffusion approximation for stochastic gradient descent
arXiv:2409.07434 [stat.ML] (Published 2024-09-11)
Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models
arXiv:1912.00018 [stat.ML] (Published 2019-11-29)
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks