{ "id": "1910.04867", "version": "v1", "published": "2019-10-01T17:06:29.000Z", "updated": "2019-10-01T17:06:29.000Z", "title": "The Visual Task Adaptation Benchmark", "authors": [ "Xiaohua Zhai", "Joan Puigcerver", "Alexander Kolesnikov", "Pierre Ruyssen", "Carlos Riquelme", "Mario Lucic", "Josip Djolonga", "Andre Susano Pinto", "Maxim Neumann", "Alexey Dosovitskiy", "Lucas Beyer", "Olivier Bachem", "Michael Tschannen", "Marcin Michalski", "Olivier Bousquet", "Sylvain Gelly", "Neil Houlsby" ], "categories": [ "cs.CV", "cs.LG", "stat.ML" ], "abstract": "Representation learning promises to unlock deep learning for the long tail of vision tasks without expansive labelled datasets. Yet, the absence of a unified yardstick to evaluate general visual representations hinders progress. Many sub-fields promise representations, but each has different evaluation protocols that are either too constrained (linear classification), limited in scope (ImageNet, CIFAR, Pascal-VOC), or only loosely related to representation quality (generation). We present the Visual Task Adaptation Benchmark (VTAB): a diverse, realistic, and challenging benchmark to evaluate representations. VTAB embodies one principle: good representations adapt to unseen tasks with few examples. We run a large VTAB study of popular algorithms, answering questions like: How effective are ImageNet representation on non-standard datasets? Are generative models competitive? Is self-supervision useful if one already has labels?", "revisions": [ { "version": "v1", "updated": "2019-10-01T17:06:29.000Z" } ], "analyses": { "keywords": [ "visual task adaptation benchmark", "evaluate general visual representations hinders", "general visual representations hinders progress" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }