arXiv:1212.1824 [cs.LG]

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Ohad Shamir, Tong Zhang

Published 2012-12-08, updated 2012-12-28 (version 2)

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after T rounds, the suboptimality of the last SGD iterate scales as O(log(T)/√T) for non-smooth convex objective functions, and O(log(T)/T) in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and almost match the minimax-optimal rates obtainable by appropriate averaging schemes. We also propose a new and simple averaging scheme, which not only attains optimal rates, but can also be easily computed on-the-fly (in contrast, the suffix averaging scheme proposed in Rakhlin et al. (2011) is not as simple to implement). Finally, we provide some experimental illustrations.
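To make the setting concrete, here is a minimal sketch of SGD on a non-smooth strongly convex toy objective, maintaining a polynomial-decay running average in O(1) memory per step, in the spirit of the on-the-fly averaging scheme the abstract describes. The toy objective f(w) = |w| + (λ/2)w², the constant η, the noise level, and the step-size schedule 1/(λt) are illustrative assumptions, not taken from the paper's experiments.

```python
import math
import random

def sgd_with_running_average(T=10000, lam=1.0, eta=3.0, noise=0.1, seed=0):
    """SGD with noisy subgradients on f(w) = |w| + (lam/2) w^2
    (non-smooth, lam-strongly convex, minimized at w = 0).

    A polynomial-decay running average w_bar is updated on the fly,
    so no history of iterates needs to be stored (unlike suffix
    averaging, which requires knowing T or keeping past iterates).
    """
    rng = random.Random(seed)
    w = 5.0        # initial iterate (arbitrary starting point)
    w_bar = w      # running average, updated in O(1) per step
    for t in range(1, T + 1):
        # Noisy subgradient of f at w: sign(w) + lam*w + zero-mean noise.
        g = math.copysign(1.0, w) + lam * w + rng.gauss(0.0, noise)
        # Classical step size 1/(lam*t) for strongly convex objectives.
        w -= g / (lam * t)
        # Polynomial-decay averaging weight (eta is a small constant).
        alpha = (eta + 1.0) / (t + eta)
        w_bar = (1.0 - alpha) * w_bar + alpha * w
    return w, w_bar
```

After T = 10000 rounds both the last iterate and the running average land close to the minimizer w = 0; the averaged iterate is what the optimal-rate guarantees in the strongly convex case apply to.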
