{ "id": "1611.05162", "version": "v1", "published": "2016-11-16T06:34:41.000Z", "updated": "2016-11-16T06:34:41.000Z", "title": "Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks", "authors": [ "Alireza Aghasi", "Nam Nguyen", "Justin Romberg" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "Model reduction is a highly desirable process for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Net-Trim is a layer-wise convex framework to prune (sparsify) deep neural networks. The method is applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. The basic idea is to retrain the network layer by layer keeping the layer inputs and outputs close to the originally trained model, while seeking a sparse transform matrix. We present both the parallel and cascade versions of the algorithm. While the former enjoys computational distributability, the latter is capable of achieving simpler models. In both cases, we mathematically show a consistency between the retrained model and the initial trained network. We also derive the general sufficient conditions for the recovery of a sparse transform matrix. In the case of standard Gaussian training samples of dimension $N$ being fed to a layer, and $s$ being the maximum number of nonzero terms across all columns of the transform matrix, we show that $\\mathcal{O}(s\\log N)$ samples are enough to accurately learn the layer model.", "revisions": [ { "version": "v1", "updated": "2016-11-16T06:34:41.000Z" } ], "analyses": { "keywords": [ "deep neural networks", "layer-wise convex pruning", "sparse transform matrix", "enjoys computational distributability", "general sufficient conditions" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }