{ "id": "1712.09196", "version": "v1", "published": "2017-12-26T07:28:14.000Z", "updated": "2017-12-26T07:28:14.000Z", "title": "The Robust Manifold Defense: Adversarial Training using Generative Models", "authors": [ "Andrew Ilyas", "Ajil Jalal", "Eirini Asteri", "Constantinos Daskalakis", "Alexandros G. Dimakis" ], "categories": [ "cs.CV", "cs.CR", "cs.LG", "stat.ML" ], "abstract": "Deep neural networks are demonstrating excellent performance on several classical vision problems. However, these networks are vulnerable to adversarial examples, minutely modified images that induce arbitrary attacker-chosen output from the network. We propose a mechanism to protect against these adversarial inputs based on a generative model of the data. We introduce a pre-processing step that projects on the range of a generative model using gradient descent before feeding an input into a classifier. We show that this step provides the classifier with robustness against first-order, substitute model, and combined adversarial attacks. Using a min-max formulation, we show that there may exist adversarial examples even in the range of the generator, natural-looking images extremely close to the decision boundary for which the classifier has unjustifiedly high confidence. We show that adversarial training on the generative manifold can be used to make a classifier that is robust to these attacks. Finally, we show how our method can be applied even without a pre-trained generative model using a recent method called the deep image prior. 
We evaluate our method on MNIST, CelebA, and ImageNet and show robustness against current state-of-the-art attacks.", "revisions": [ { "version": "v1", "updated": "2017-12-26T07:28:14.000Z" } ], "analyses": { "keywords": [ "generative model", "robust manifold defense", "adversarial training", "adversarial examples", "induce arbitrary attacker-chosen output" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }