
arXiv:2106.15662 [cs.LG]

Exponential Weights Algorithms for Selective Learning

Published 2021-06-29, Version 1

We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The excess risk incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points. We give an improved algorithm, termed the hybrid exponential weights algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm. We also study a more restrictive family of learning algorithms that are bounded-recall in the sense that when a prediction window of length $w$ is chosen, the learner's decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.
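As a concrete reading of the definitions in the abstract, the excess risk of a chosen model $\hat\ell$ over a window of $w$ data points is its average loss on that window minus the smallest average loss of any model in $\mathcal{L}$ on the same window. The sketch below computes that quantity, together with the standard exponential weights distribution over models. It is an illustration only, not the paper's hybrid algorithm: the loss-matrix layout `losses[t][j]`, the function names, and the learning rate `eta` are all assumptions introduced here.

```python
import math


def excess_risk(losses, start, w, chosen):
    """Excess risk of model `chosen` on the window losses[start:start + w].

    Assumed layout (hypothetical): losses[t][j] is the loss of model j
    on data point t.
    """
    if w <= 0 or start < 0 or start + w > len(losses):
        raise ValueError("window must lie inside the observed sequence")
    window = losses[start:start + w]
    num_models = len(losses[0])
    # Average loss of every model over the window.
    avg = [sum(row[j] for row in window) / w for j in range(num_models)]
    # Chosen model's average loss minus the best achievable average loss.
    return avg[chosen] - min(avg)


def exponential_weights(losses, eta):
    """Textbook exponential weights distribution over models.

    Each model's weight is proportional to exp(-eta * cumulative loss).
    `eta` is an assumed learning-rate parameter, not a value from the paper.
    """
    num_models = len(losses[0])
    cum = [sum(row[j] for row in losses) for j in range(num_models)]
    # Subtract the minimum for numerical stability; the distribution is unchanged.
    base = min(cum)
    weights = [math.exp(-eta * (c - base)) for c in cum]
    total = sum(weights)
    return [x / total for x in weights]
```

For instance, with two models and `losses = [[0, 1], [1, 0], [1, 0], [0, 1]]`, choosing model 0 for the window starting at index 1 with `w = 2` gives an excess risk of 1, while choosing model 1 gives 0.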

Comments: To appear in COLT 2021

Categories: cs.LG, cs.DS, stat.ML

Keywords: data points, selective learning, expected excess risk, average loss, hybrid exponential weights algorithm

Related articles:

arXiv:1902.00033 [cs.LG] (Published 2019-01-31)

Compressed Diffusion

Scott Gigante, Jay S. Stanley III, Ngan Vu, David van Dijk, Kevin Moon, Guy Wolf, Smita Krishnaswamy

arXiv:1802.03936 [cs.LG] (Published 2018-02-12)

On the Needs for Rotations in Hypercubic Quantization Hashing

Anne Morvan, Antoine Souloumiac, Krzysztof Choromanski, Cédric Gouy-Pailler, Jamal Atif

arXiv:2103.08493 [cs.LG] (Published 2021-03-15)

How Many Data Points is a Prompt Worth?