{ "id": "1712.07424", "version": "v1", "published": "2017-12-20T11:30:16.000Z", "updated": "2017-12-20T11:30:16.000Z", "title": "ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent", "authors": [ "Vishwak Srinivasan", "Adepu Ravi Sankar", "Vineeth N Balasubramanian" ], "comment": "8 + 1 pages, 12 figures, accepted at CoDS-COMAD 2018", "categories": [ "stat.ML", "cs.LG" ], "abstract": "Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$ which is always suggested to be set to less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when the assumptions do not necessarily hold. In this paper, we propose a new momentum based method $\\textit{ADINE}$, which relaxes the constraint of $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\\ge 1$) can help escape saddles much faster. Using this motivation, we propose our method $\\textit{ADINE}$ that helps weigh the previous updates more (by setting the momentum parameter $> 1$), evaluate our proposed algorithm on deep neural networks and show that $\\textit{ADINE}$ helps the learning algorithm to converge much faster without compromising on the generalization error.", "revisions": [ { "version": "v1", "updated": "2017-12-20T11:30:16.000Z" } ], "analyses": { "keywords": [ "stochastic gradient descent", "adaptive momentum method", "polyaks heavy ball method", "momentum parameter", "higher momentum" ], "note": { "typesetting": "TeX", "pages": 1, "language": "en", "license": "arXiv", "status": "editable" } } }