arXiv Analytics

Sign in

arXiv:1405.3229 [cs.LG]AbstractReferencesReviewsResources

Rate of Convergence and Error Bounds for LSTD($λ$)

Manel Tagorti, Bruno Scherrer

Published 2014-05-13Version 1

We consider LSTD($\lambda$), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a $\beta$-mixing assumption, we derive, for any value of $\lambda \in (0,1)$, a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where $\lambda=0$. In particular, our analysis sheds some light on the choice of $\lambda$ with respect to the quality of the chosen linear space and the number of samples, that complies with simulations.

Related articles: Most relevant | Search more
arXiv:1811.09358 [cs.LG] (Published 2018-11-23)
A Sufficient Condition for Convergences of Adam and RMSProp
arXiv:1810.00122 [cs.LG] (Published 2018-09-29)
On the Convergence and Robustness of Batch Normalization
arXiv:2109.03194 [cs.LG] (Published 2021-09-07)
On the Convergence of Decentralized Adaptive Gradient Methods