{ "id": "1611.10258", "version": "v1", "published": "2016-11-30T16:42:23.000Z", "updated": "2016-11-30T16:42:23.000Z", "title": "Reliably Learning the ReLU in Polynomial Time", "authors": [ "Surbhi Goel", "Varun Kanade", "Adam Klivans", "Justin Thaler" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\\mathbf{x} \\mapsto \\max(0, \\mathbf{w} \\cdot \\mathbf{x})$ with $\\mathbf{w} \\in \\mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour (2009) where the learner is given access to a distribution $\\cal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\\cal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to any distribution on $\\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\\epsilon = \\Omega(1/\\log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\\epsilon$ must be $\\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs. Our techniques combine kernel methods and polynomial approximations with a \"dual-loss\" approach to convex programming. As a byproduct we obtain a number of applications including the first set of efficient algorithms for \"convex piecewise-linear fitting\" and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.", "revisions": [ { "version": "v1", "updated": "2016-11-30T16:42:23.000Z" } ], "analyses": { "keywords": [ "polynomial time", "reliable agnostic learning model", "unit sphere", "first set", "reliably learning linear threshold functions" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }