arXiv:2306.17693 [cs.LG]

Thompson sampling for improved exploration in GFlowNets

Jarrid Rector-Brooks, Kanika Madan, Moksh Jain, Maksym Korablyov, Cheng-Hao Liu, Sarath Chandar, Nikolay Malkin, Yoshua Bengio

Published 2023-06-30, Version 1

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
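For readers unfamiliar with the multi-armed bandit technique the abstract refers to, the following is a minimal sketch of classic Thompson sampling on a Bernoulli bandit. It is not the TS-GFN algorithm: the posterior here is over arm success rates rather than over policies, and the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example arm probabilities are all assumptions made for illustration. It only shows the core idea of sampling from a posterior and acting on the sample.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative, not TS-GFN).

    true_probs: hypothetical success probability of each arm, hidden
                from the learner and used only to simulate rewards.
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000)
    print(pulls)
```

Arms with uncertain posteriors occasionally produce high samples and get tried, while arms with consistently poor outcomes are pulled less over time. This is the exploration behaviour the abstract describes carrying over to trajectory selection, with the posterior over arm rates replaced by an approximate posterior over policies.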

Comments: Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML 2023

Categories: cs.LG

Keywords: thompson sampling, target distribution, amortized variational inference algorithms, off-policy exploration strategies, approximate posterior distribution

Related articles:

arXiv:1811.04471 [cs.LG] (Published 2018-11-11)

Thompson Sampling for Pursuit-Evasion Problems

Zhen Li, Nicholas J. Meyer, Eric B. Laber, Robert Brigantic

arXiv:1209.3352 [cs.LG] (Published 2012-09-15, updated 2014-02-03)

Thompson Sampling for Contextual Bandits with Linear Payoffs

Shipra Agrawal, Navin Goyal

arXiv:2007.00187 [cs.LG] (Published 2020-07-01)

Variable Selection via Thompson Sampling