arXiv Analytics


arXiv:1601.00917 [cs.LG]

Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks

Jie Fu, Hongyin Luo, Jiashi Feng, Tat-Seng Chua

Published 2016-01-05 (Version 1)

The performance of deep neural networks is sensitive to the setting of their hyperparameters (e.g., L2-norm penalties). Recent advances in reverse-mode automatic differentiation have made it possible to optimize hyperparameters with gradients. The standard way of computing these gradients involves a forward and a backward pass, similar to its cousin, back-propagation, which is used for training the weights of neural networks. However, the backward pass usually needs to exactly reverse the training procedure, starting from the trained parameters and working back to the initial random ones. This incurs unaffordable memory consumption, as it requires storing all the intermediate variables. Here we propose to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments carried out on the MNIST dataset show that our approach reduces memory consumption by orders of magnitude without sacrificing effectiveness. Our method makes it feasible, for the first time, to automatically tune hundreds of thousands of hyperparameters of deep neural networks in practice.
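To make the idea concrete, the following is a minimal NumPy sketch of the mechanism the abstract describes: run SGD forward while storing only the initial and final weights, then reverse the trajectory approximately by linearly interpolating between them while accumulating a hypergradient for an L2-penalty hyperparameter. The toy quadratic training and validation losses, step counts, and variable names are illustrative assumptions for the sketch, not the paper's actual experimental setup.

```python
# Sketch of the DrMAD-style shortcut reverse pass (assumed toy setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
D, T, lr = 5, 100, 0.05          # weight dimension, SGD steps, learning rate (assumed)
lam = 0.1                        # L2 penalty hyperparameter to be tuned

# Toy training loss: L_train(w) = 0.5 w^T A w - b^T w  (gradient A w - b, Hessian A)
A = np.eye(D) + 0.1 * rng.standard_normal((D, D)); A = A @ A.T
b = rng.standard_normal(D)
# Toy validation loss: L_val(w) = 0.5 ||w - w_star||^2
w_star = rng.standard_normal(D)

# ---- Forward pass: SGD with L2 penalty; store only the initial and final weights ----
w0 = rng.standard_normal(D)
w = w0.copy()
for _ in range(T):
    grad = A @ w - b + lam * w   # gradient of the regularized training loss
    w = w - lr * grad
wT = w

# ---- Reverse pass: hypergradient dL_val/dlam along the interpolated trajectory ----
dw = wT - w_star                 # gradient of the validation loss at the final weights
dlam = 0.0
for t in range(T, 0, -1):
    # Shortcut path: approximate the step-(t-1) weights by linear interpolation
    w_approx = w0 + (t - 1) / T * (wT - w0)
    # Contribution of lam through the update w_t = w_{t-1} - lr*(grad + lam*w_{t-1})
    dlam += dw @ (-lr * w_approx)
    # Back-propagate through the update: dw <- dw @ (I - lr*(Hessian + lam*I))
    dw = dw - lr * (A @ dw + lam * dw)

print("approximate hypergradient dL_val/dlam:", dlam)
```

In this sketch the memory saving comes from never storing the intermediate weight vectors: only `w0` and `wT` are kept, and the reverse loop reconstructs an approximate trajectory on the fly.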

Related articles:
arXiv:1611.01639 [cs.LG] (Published 2016-11-05)
Representation of uncertainty in deep neural networks through sampling
arXiv:1411.1792 [cs.LG] (Published 2014-11-06)
How transferable are features in deep neural networks?
arXiv:1611.06455 [cs.LG] (Published 2016-11-20)
Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline