arXiv:1910.04867 [cs.CV]

The Visual Task Adaptation Benchmark

Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

Published 2019-10-01 (Version 1)

Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified yardstick to evaluate general visual representations hinders progress. Many sub-fields promise representations, but each has different evaluation protocols that are either too constrained (linear classification), limited in scope (ImageNet, CIFAR, Pascal-VOC), or only loosely related to representation quality (generation). We present the Visual Task Adaptation Benchmark (VTAB): a diverse, realistic, and challenging benchmark to evaluate representations. VTAB embodies one principle: good representations adapt to unseen tasks with few examples. We run a large VTAB study of popular algorithms, answering questions such as: How effective are ImageNet representations on non-standard datasets? Are generative models competitive? Is self-supervision useful if one already has labels?
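The evaluation protocol implied by this principle can be sketched in a few lines: for each downstream task, adapt the pretrained representation using only a small labeled budget (in the paper's VTAB-1k variant, 1,000 examples per task), then report the mean top-1 accuracy across tasks. The following is a minimal illustrative sketch in Python/NumPy, not the VTAB codebase: make_task generates synthetic stand-ins for the benchmark's vision datasets, and the nearest-centroid adapter stands in for the fine-tuning the paper actually performs.

import numpy as np

rng = np.random.default_rng(0)

def adapt(features, labels):
    # Fit a lightweight adapter (here, class centroids) on the
    # small labeled budget the benchmark allows per task.
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def evaluate(features, labels, classes, centroids):
    # Top-1 accuracy of nearest-centroid prediction on held-out data.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == labels).mean())

def make_task(num_classes, dim=64, n_train=1000, n_test=2000):
    # Synthetic stand-in for one downstream task: Gaussian clusters in
    # feature space, with a fixed budget of n_train labeled examples.
    means = rng.normal(scale=2.0, size=(num_classes, dim))
    def sample(n):
        y = rng.integers(num_classes, size=n)
        x = means[y] + rng.normal(size=(n, dim))
        return x, y
    return sample(n_train), sample(n_test)

tasks = [make_task(c) for c in (10, 5, 20)]

scores = []
for (x_tr, y_tr), (x_te, y_te) in tasks:
    classes, centroids = adapt(x_tr, y_tr)
    scores.append(evaluate(x_te, y_te, classes, centroids))

# The benchmark score is the mean top-1 accuracy across all tasks.
print(f"VTAB-style score: {np.mean(scores):.3f}")

The key design choice this mirrors is that no task-specific representation is learned from scratch: only a small adapter is fit per task, so the averaged score reflects the quality of the shared pretrained features rather than per-task engineering.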
