arXiv:2305.03784 [cs.LG]

Neural Exploitation and Exploration of Contextual Bandits

Yikun Ban, Yuchen Yan, Arindam Banerjee, Jingrui He

Published 2023-05-05, Version 1

In this paper, we study the use of neural networks for exploitation and exploration in contextual multi-armed bandits. Contextual multi-armed bandits have been studied for decades and have found a wide range of applications. Three main techniques address the exploitation-exploration trade-off in bandits: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, a series of neural bandit algorithms have been proposed to adapt to non-linear reward functions, combined with TS or UCB strategies for exploration. In this paper, instead of calculating a large-deviation-based statistical bound for exploration as in previous methods, we propose ``EE-Net,'' a novel neural-based exploitation and exploration strategy. In addition to using a neural network (the exploitation network) to learn the reward function, EE-Net uses another neural network (the exploration network) to adaptively learn the potential gain over the currently estimated reward. We provide an instance-based $\widetilde{\mathcal{O}}(\sqrt{T})$ regret upper bound for EE-Net and show that EE-Net outperforms related linear and neural contextual bandit baselines on real-world datasets.

Comments: Journal Version of EE-Net. arXiv admin note: substantial text overlap with arXiv:2110.03177
Categories: cs.LG
Related articles:
arXiv:1508.03326 [cs.LG] (Published 2015-08-13)
A Survey on Contextual Multi-armed Bandits
arXiv:2310.08702 [cs.LG] (Published 2023-10-12)
ELDEN: Exploration via Local Dependencies
arXiv:2302.04009 [cs.LG] (Published 2023-02-08)
Investigating the role of model-based learning in exploration and transfer