arXiv:1304.6383 [cs.LG]

The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited

Constantinos Panagiotakopoulos, Petroula Tsampouka

Published 2013-04-23, updated 2014-01-25 (version 2)

We reconsider the stochastic (sub)gradient approach to the unconstrained primal L1-SVM optimization. We observe that if the learning rate is inversely proportional to the number of steps, i.e., the number of times any training pattern is presented to the algorithm, the update rule may be transformed into that of the classical perceptron with margin, in which the margin threshold increases linearly with the number of steps. Moreover, if we cycle repeatedly through the (possibly randomly permuted) training set, the dual variables, defined naturally via the expansion of the weight vector as a linear combination of the patterns on which margin errors were made, are shown to automatically satisfy, at the end of each complete cycle, the box constraints arising in dual optimization. This renders the dual Lagrangian a running lower bound on the primal objective, tending to it at the optimum, and yields an upper bound on the relative accuracy achieved, which provides a meaningful stopping criterion. In addition, we propose a mechanism for presenting the same pattern repeatedly to the algorithm that maintains the above properties. Finally, we give experimental evidence that algorithms constructed along these lines exhibit considerably improved performance.
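
The abstract describes the mechanism only in words; the Python sketch below (not the authors' code) illustrates the idea for one common formulation, P(w) = (lam/2)*||w||^2 + (1/m)*sum_i max(0, 1 - y_i w.x_i), with the Pegasos-style learning rate eta_t = 1/(lam*t). The objective scaling, the relative-accuracy threshold eps, and the toy data are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def sgd_l1_svm(X, y, lam=0.01, max_epochs=200, eps=1e-2, seed=0):
        """Sketch of SGD on the unconstrained primal L1-SVM objective
            P(w) = (lam/2)*||w||^2 + (1/m) * sum_i max(0, 1 - y_i * w.x_i),
        with learning rate eta_t = 1/(lam*t).  Dual variables are read off as
        alpha_i = (# margin errors on pattern i) / t; at the end of each full
        cycle they satisfy 0 <= alpha_i <= 1/m, so the dual objective D(alpha)
        lower-bounds P(w) and the duality gap gives a stopping rule."""
        rng = np.random.default_rng(seed)
        m, d = X.shape
        w = np.zeros(d)
        counts = np.zeros(m)                    # margin-error count per pattern
        t = 0
        for epoch in range(1, max_epochs + 1):
            for i in rng.permutation(m):        # one pass over a random permutation
                t += 1
                eta = 1.0 / (lam * t)
                if y[i] * (w @ X[i]) < 1.0:     # margin error: perceptron-like step
                    counts[i] += 1
                    w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
                else:                           # otherwise only the shrinkage part
                    w = (1.0 - eta * lam) * w
            # End of a complete cycle: t = epoch*m and counts[i] <= epoch,
            # hence alpha_i = counts[i]/t <= 1/m, i.e. alpha is dual-feasible.
            alpha = counts / t
            margins = 1.0 - y * (X @ w)
            primal = 0.5 * lam * (w @ w) + np.mean(np.maximum(0.0, margins))
            # w = (1/lam) * sum_i alpha_i y_i x_i, so the dual value simplifies to:
            dual = alpha.sum() - 0.5 * lam * (w @ w)
            if primal - dual <= eps * abs(primal):   # relative-accuracy stopping rule
                break
        return w

    # Usage on a toy problem (labels must be +/-1).
    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = rng.normal(size=(200, 5))
        y = np.sign(X[:, 0] + 0.3 * rng.normal(size=200))
        w = sgd_l1_svm(X, y, lam=0.1)
        print("training accuracy:", np.mean(np.sign(X @ w) == y))

The point mirrored from the abstract is the telescoping of the 1/(lam*t) updates: the iterate always equals (1/lam) * sum_i alpha_i y_i x_i with alpha_i = counts[i]/t, and since counts[i] cannot exceed the number of completed cycles, alpha lies in the dual box at the end of every cycle, making the dual value a genuine running lower bound on the primal objective.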

Comments: In v2 the numerical results are obtained using the latest release 1.7 of Cygwin and the g++ compiler version 4.5.3. We also consider in the experiments the algorithms SvmSgd and SGD-QN. A slightly shorter version of this paper appeared in ECML/PKDD 2013
Categories: cs.LG, cs.AI