{ "id": "1712.07424", "version": "v1", "published": "2017-12-20T11:30:16.000Z", "updated": "2017-12-20T11:30:16.000Z", "title": "ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent", "authors": [ "Vishwak Srinivasan", "Adepu Ravi Sankar", "Vineeth N Balasubramanian" ], "comment": "8 + 1 pages, 12 figures, accepted at CoDS-COMAD 2018", "categories": [ "stat.ML", "cs.LG" ], "abstract": "Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$ which is always suggested to be set to less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when the assumptions do not necessarily hold. In this paper, we propose a new momentum based method $\\textit{ADINE}$, which relaxes the constraint of $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\\ge 1$) can help escape saddles much faster. Using this motivation, we propose our method $\\textit{ADINE}$ that helps weigh the previous updates more (by setting the momentum parameter $> 1$), evaluate our proposed algorithm on deep neural networks and show that $\\textit{ADINE}$ helps the learning algorithm to converge much faster without compromising on the generalization error.", "revisions": [ { "version": "v1", "updated": "2017-12-20T11:30:16.000Z" } ], "analyses": { "keywords": [ "stochastic gradient descent", "adaptive momentum method", "polyaks heavy ball method", "momentum parameter", "higher momentum" ], "note": { "typesetting": "TeX", "pages": 1, "language": "en", "license": "arXiv", "status": "editable" } } }