arXiv:1405.3229 Abstract | arXiv Analytics

arXiv:1405.3229 [cs.LG]Abstract References Reviews Resources

Rate of Convergence and Error Bounds for LSTD($λ$)

Published 2014-05-13Version 1

We consider LSTD($\lambda$), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a $\beta$-mixing assumption, we derive, for any value of $\lambda \in (0,1)$, a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where $\lambda=0$. In particular, our analysis sheds some light on the choice of $\lambda$ with respect to the quality of the chosen linear space and the number of samples, that complies with simulations.

Comments: (2014)

Categories: cs.LG, cs.AI, math.OC, math.ST, stat.TH

Keywords: error bounds, convergence, large markov decision process, eligibility traces algorithm, least-squares temporal-difference algorithm

Related articles: Most relevant | Search more

arXiv:1811.09358 [cs.LG] (Published 2018-11-23)

A Sufficient Condition for Convergences of Adam and RMSProp

Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu

arXiv:1810.00122 [cs.LG] (Published 2018-09-29)

On the Convergence and Robustness of Batch Normalization

Yongqiang Cai, Qianxiao Li, Zuowei Shen

arXiv:2109.03194 [cs.LG] (Published 2021-09-07)

On the Convergence of Decentralized Adaptive Gradient Methods