arXiv:1903.01435 [stat.ML]
Optimistic Adaptive Acceleration for Optimization
Jun-Kun Wang, Xiaoyun Li, Ping Li
Published 2019-03-04, Version 1
We consider a new variant of \textsc{AMSGrad} \cite{RKK18}, a popular adaptive gradient-based optimization algorithm widely used in training deep neural networks. Our variant assumes that mini-batch gradients in consecutive iterations have some underlying structure that makes them sequentially predictable. By exploiting this predictability together with ideas from the field of \textsc{Optimistic Online Learning}, the new algorithm can accelerate convergence and achieve a tighter regret bound. We conduct experiments on training various neural networks on several datasets to show that the proposed method speeds up convergence in practice.
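To make the idea concrete, below is a minimal Python sketch of an AMSGrad loop augmented with an "optimistic" correction step that uses a prediction of the next mini-batch gradient. The prediction rule shown here (simply reusing the most recent gradient, a common choice in optimistic online learning) and the function names `grad_fn`, `predict_fn`, and `optimistic_amsgrad` are illustrative assumptions for exposition; they are not the paper's exact extrapolation scheme or hyperparameter settings.

```python
import numpy as np

def optimistic_amsgrad(params, grad_fn, predict_fn, lr=1e-3,
                       beta1=0.9, beta2=0.999, eps=1e-8, num_steps=100):
    """Sketch: AMSGrad with an extra optimistic step driven by a
    prediction of the next gradient (illustrative, not the paper's exact rule)."""
    m = np.zeros_like(params)      # first-moment (momentum) estimate
    v = np.zeros_like(params)      # second-moment estimate
    v_hat = np.zeros_like(params)  # element-wise max of past v (AMSGrad)
    w = params.astype(float).copy()
    for t in range(num_steps):
        g = grad_fn(w)                           # mini-batch gradient at w
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)             # keeps the denominator monotone
        denom = np.sqrt(v_hat) + eps
        w_half = w - lr * m / denom              # standard AMSGrad update
        g_pred = predict_fn(g)                   # guess of the *next* gradient
        w = w_half - lr * g_pred / denom         # optimistic correction step
    return w

# Usage sketch: minimize f(w) = ||w||^2 with the last gradient as the prediction.
w0 = np.array([5.0, -3.0])
w_final = optimistic_amsgrad(w0, grad_fn=lambda w: 2 * w,
                             predict_fn=lambda g: g, num_steps=500)
```

The intuition is that when consecutive gradients are predictable, the correction step moves the iterate in the anticipated descent direction before the next gradient is even observed, which is the mechanism optimistic online learning uses to obtain tighter regret bounds.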