{ "id": "1904.08544", "version": "v1", "published": "2019-04-18T00:24:17.000Z", "updated": "2019-04-18T00:24:17.000Z", "title": "Memory-Sample Tradeoffs for Linear Regression with Small Error", "authors": [ "Vatsal Sharan", "Aaron Sidford", "Gregory Valiant" ], "comment": "22 pages, to appear in STOC'19", "categories": [ "cs.LG", "stat.ML" ], "abstract": "We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2)\\ldots,$ with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \\langle a_i, x\\rangle + \\eta_i,$ for a fixed $x \\in \\mathbb{R}^d$ with $\\|x\\|_2 = 1$ and with independent noise $\\eta_i$ drawn uniformly from the interval $[-2^{-d/5},2^{-d/5}].$ We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\\Omega(d \\log \\log \\frac{1}{\\epsilon})$ samples to approximate $x$ to $\\ell_2$ error $\\epsilon$ with probability of success at least $2/3$, for $\\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\\epsilon$, $x$ can be recovered to error $\\epsilon$ with probability $1-o(1)$ with memory $O\\left(d^2 \\log(1/\\epsilon)\\right)$ using $d$ examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.", "revisions": [ { "version": "v1", "updated": "2019-04-18T00:24:17.000Z" } ], "analyses": { "keywords": [ "linear regression", "small error", "memory-sample tradeoffs", "first nontrivial lower bounds", "dimensional isotropic gaussian" ], "note": { "typesetting": "TeX", "pages": 22, "language": "en", "license": "arXiv", "status": "editable" } } }