{ "id": "1812.06080", "version": "v1", "published": "2018-12-14T18:59:03.000Z", "updated": "2018-12-14T18:59:03.000Z", "title": "Online gradient-based mixtures for transfer modulation in meta-learning", "authors": [ "Ghassen Jerfel", "Erin Grant", "Thomas L. Griffiths", "Katherine Heller" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not mutually beneficial, for instance, when tasks are sufficiently dissimilar or change over time. Here, we use the connection between gradient-based meta-learning and hierarchical Bayes (Grant et al., 2018) to propose a mixture of hierarchical Bayesian models over the parameters of an arbitrary function approximator such as a neural network. Generalizing the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017), we present a stochastic expectation maximization procedure to jointly estimate parameter initializations for gradient descent as well as a latent assignment of tasks to initializations. This approach better captures the diversity of training tasks as opposed to consolidating inductive biases into a single set of hyperparameters. Our experiments demonstrate better generalization performance on the standard miniImageNet benchmark for 1-shot classification. We further derive a novel and scalable non-parametric variant of our method that captures the evolution of a task distribution over time as demonstrated on a set of few-shot regression tasks.", "revisions": [ { "version": "v1", "updated": "2018-12-14T18:59:03.000Z" } ], "analyses": { "keywords": [ "online gradient-based mixtures", "transfer modulation", "leverages data-driven inductive bias", "experiments demonstrate better generalization performance", "meta-learning" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }