arXiv:2312.04027 Abstract | arXiv Analytics

arXiv:2312.04027 [cs.LG]

The sample complexity of multi-distribution learning

Published 2023-12-07, Version 1

Multi-distribution learning generalizes classic PAC learning to handle data drawn from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to an additive error of $\epsilon$. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm with sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao [AHZ23].
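To make the min-max objective concrete, here is a minimal Python sketch on a toy problem. It is not the paper's algorithm. It is a naive, brute-force empirical version of the objective which assumes a finite class of 1-D threshold classifiers (VC dimension 1) and one fixed sample per distribution, and it picks the hypothesis whose worst empirical 0-1 loss across the $k$ samples is smallest. The names `predict`, `empirical_loss` and `minmax_erm`, the threshold grid, and the toy data are all hypothetical illustrations.

```python
from typing import List, Sequence, Tuple

# A labeled example: (feature x, label y in {0, 1}).
Sample = Tuple[float, int]


def predict(t: float, x: float) -> int:
    """Hypothetical threshold classifier h_t(x) = 1 if x >= t else 0."""
    return int(x >= t)


def empirical_loss(t: float, samples: Sequence[Sample]) -> float:
    """Fraction of samples misclassified by h_t (empirical 0-1 loss)."""
    return sum(predict(t, x) != y for x, y in samples) / len(samples)


def minmax_erm(
    thresholds: Sequence[float], datasets: List[Sequence[Sample]]
) -> Tuple[float, float]:
    """Return (t, loss) minimizing the maximum empirical loss over all datasets.

    Each dataset stands in for samples from one of the k distributions.
    Ties are broken in favor of the first threshold in `thresholds`.
    """
    best_t, best_worst = thresholds[0], float("inf")
    for t in thresholds:
        # Worst-case loss of h_t over the k empirical distributions.
        worst = max(empirical_loss(t, s) for s in datasets)
        if worst < best_worst:
            best_t, best_worst = t, worst
    return best_t, best_worst


# Two toy "distributions", each represented by a small fixed sample (made-up data).
sample_a = [(0.1, 0), (0.2, 0), (0.4, 1), (0.8, 1)]
sample_b = [(0.1, 0), (0.5, 0), (0.6, 0), (0.9, 1)]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

best_t, worst_loss = minmax_erm(grid, [sample_a, sample_b])
print(best_t, worst_loss)  # → 0.75 0.25
```

Because this naive approach estimates each loss from a separate sample per distribution, its total sample cost grows roughly like $k$ times the single-distribution cost. The $(d+k)\epsilon^{-2}$ bound above replaces that multiplicative dependence on $k$ and $d$ with an additive one.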

Categories: cs.LG, cs.AI, cs.DS, stat.ML

Keywords: sample complexity, maximum population loss, hypothesis class, multiple distributions, data distributions

Related articles:

arXiv:1906.00264 [cs.LG] (Published 2019-06-01)

Graph-based Discriminators: Sample Complexity and Expressiveness

Roi Livni, Yishay Mansour

arXiv:1207.1366 [cs.LG] (Published 2012-07-04)

Learning Factor Graphs in Polynomial Time & Sample Complexity

Pieter Abbeel, Daphne Koller, Andrew Y. Ng

arXiv:1402.4844 [cs.LG] (Published 2014-02-19, updated 2016-05-26)

Subspace Learning with Partial Information

Alon Gonen, Dan Rosenbaum, Yonina Eldar, Shai Shalev-Shwartz