arXiv Analytics

Sign in

arXiv:2106.15662 [cs.LG]AbstractReferencesReviewsResources

Exponential Weights Algorithms for Selective Learning

Mingda Qiao, Gregory Valiant

Published 2021-06-29Version 1

We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The excess risk incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points. We give an improved algorithm, termed the hybrid exponential weights algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm. We also study a more restrictive family of learning algorithms that are bounded-recall in the sense that when a prediction window of length $w$ is chosen, the learner's decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.

Related articles: Most relevant | Search more
arXiv:1902.00033 [cs.LG] (Published 2019-01-31)
Compressed Diffusion
arXiv:1802.03936 [cs.LG] (Published 2018-02-12)
On the Needs for Rotations in Hypercubic Quantization Hashing
arXiv:2103.08493 [cs.LG] (Published 2021-03-15)
How Many Data Points is a Prompt Worth?