arXiv Analytics


arXiv:1601.00917 [cs.LG]

Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks

Jie Fu, Hongyin Luo, Jiashi Feng, Tat-Seng Chua

Published 2016-01-05 (Version 1)

The performance of deep neural networks is sensitive to the setting of their hyperparameters (e.g., L2-norm penalties). Recent advances in reverse-mode automatic differentiation have made it possible to optimize hyperparameters with gradients. The standard way of computing these gradients involves a forward and a backward pass, similar to its cousin, back-propagation, which is used for training the weights of neural networks. However, the backward pass usually needs to exactly reverse the training procedure, starting from the trained parameters and working back to the initial random ones. This incurs unaffordable memory consumption, as it requires storing all the intermediate variables. Here we propose to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments carried out on the MNIST dataset show that our approach reduces memory consumption by orders of magnitude without sacrificing effectiveness. Our method makes it feasible, for the first time, to automatically tune hundreds of thousands of hyperparameters of deep neural networks in practice.
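To make the idea concrete, the following is a minimal NumPy sketch of the mechanism the abstract describes: run SGD forward while storing only the initial and final weights, then reverse the trajectory approximately by linearly interpolating between them while accumulating a hypergradient for an L2-penalty hyperparameter. The toy quadratic training and validation losses, step counts, and variable names are illustrative assumptions for the sketch, not the paper's actual experimental setup.

```python
# Sketch of the DrMAD-style shortcut reverse pass (assumed toy setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
D, T, lr = 5, 100, 0.05          # weight dimension, SGD steps, learning rate (assumed)
lam = 0.1                        # L2 penalty hyperparameter to be tuned

# Toy training loss: L_train(w) = 0.5 w^T A w - b^T w  (gradient A w - b, Hessian A)
A = np.eye(D) + 0.1 * rng.standard_normal((D, D)); A = A @ A.T
b = rng.standard_normal(D)
# Toy validation loss: L_val(w) = 0.5 ||w - w_star||^2
w_star = rng.standard_normal(D)

# ---- Forward pass: SGD with L2 penalty; store only the initial and final weights ----
w0 = rng.standard_normal(D)
w = w0.copy()
for _ in range(T):
    grad = A @ w - b + lam * w   # gradient of the regularized training loss
    w = w - lr * grad
wT = w

# ---- Reverse pass: hypergradient dL_val/dlam along the interpolated trajectory ----
dw = wT - w_star                 # gradient of the validation loss at the final weights
dlam = 0.0
for t in range(T, 0, -1):
    # Shortcut path: approximate the step-(t-1) weights by linear interpolation
    w_approx = w0 + (t - 1) / T * (wT - w0)
    # Contribution of lam through the update w_t = w_{t-1} - lr*(grad + lam*w_{t-1})
    dlam += dw @ (-lr * w_approx)
    # Back-propagate through the update: dw <- dw @ (I - lr*(Hessian + lam*I))
    dw = dw - lr * (A @ dw + lam * dw)

print("approximate hypergradient dL_val/dlam:", dlam)
```

In this sketch the memory saving comes from never storing the intermediate weight vectors: only `w0` and `wT` are kept, and the reverse loop reconstructs an approximate trajectory on the fly.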

Related articles:
arXiv:1611.01639 [cs.LG] (Published 2016-11-05)
Representation of uncertainty in deep neural networks through sampling
arXiv:1411.1792 [cs.LG] (Published 2014-11-06)
How transferable are features in deep neural networks?
arXiv:1611.06455 [cs.LG] (Published 2016-11-20)
Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline