arXiv Analytics

arXiv:1903.01435 [stat.ML]

Optimistic Adaptive Acceleration for Optimization

Jun-Kun Wang, Xiaoyun Li, Ping Li

Published 2019-03-04 (Version 1)

We consider a new variant of \textsc{AMSGrad}. AMSGrad \cite{RKK18} is a popular adaptive-gradient optimization algorithm that is widely used in training deep neural networks. Our new variant assumes that the mini-batch gradients in consecutive iterations have some underlying structure, which makes them sequentially predictable. By exploiting this predictability together with ideas from \textsc{Optimistic Online Learning}, the new algorithm accelerates convergence and enjoys a tighter regret bound. We conduct experiments on training various neural networks on several datasets and show that the proposed method speeds up convergence in practice.
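The abstract does not spell out the update rule, so the following is a minimal sketch of one way an optimistic extra step can be grafted onto AMSGrad, assuming the predicted next gradient is simply the most recent mini-batch gradient; the paper's actual predictor and update may differ. The function name optimistic_amsgrad, the argument grad_fn, and the hyperparameter defaults below are illustrative choices, not taken from the paper.

import numpy as np

def optimistic_amsgrad(grad_fn, w0, steps=1000, lr=1e-3,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of an AMSGrad variant with an optimistic extra step.

    grad_fn(w) returns a (stochastic) gradient at w. The guess for the
    next gradient is simply the current gradient (an assumption made
    for illustration; the paper's predictor may be different).
    """
    w = w0.astype(float)
    m = np.zeros_like(w)       # first-moment estimate
    v = np.zeros_like(w)       # second-moment estimate
    v_hat = np.zeros_like(w)   # running max of v (the AMSGrad correction)

    for _ in range(steps):
        g = grad_fn(w)

        # Standard AMSGrad moment updates.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)

        # Usual AMSGrad step to an intermediate point.
        w_half = w - lr * m / (np.sqrt(v_hat) + eps)

        # Optimistic step: move further using a guess of the next gradient
        # (here, the "same as last" guess g_pred = g).
        g_pred = g
        m_pred = beta1 * m + (1 - beta1) * g_pred
        w = w_half - lr * m_pred / (np.sqrt(v_hat) + eps)

    return w

if __name__ == "__main__":
    # Toy check: noisy gradients of f(w) = ||w||^2 / 2; the iterate
    # should approach zero.
    rng = np.random.default_rng(0)
    grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
    print(optimistic_amsgrad(grad, np.ones(5), steps=2000, lr=1e-2))

The intuition behind the extra step is that when consecutive mini-batch gradients are close to each other, the guess is nearly correct and the iterate effectively takes a larger, better-informed step; when the guess is poor, the step reduces to something close to plain AMSGrad.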

Related articles:
arXiv:1908.06395 [stat.ML] (Published 2019-08-18)
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
arXiv:1805.10694 [stat.ML] (Published 2018-05-27)
Towards a Theoretical Understanding of Batch Normalization
arXiv:1912.12923 [stat.ML] (Published 2019-12-30)
Bayesian Tensor Network and Optimization Algorithm for Probabilistic Machine Learning