arXiv:1712.09196 [cs.CV]

The Robust Manifold Defense: Adversarial Training using Generative Models

Andrew Ilyas, Ajil Jalal, Eirini Asteri, Constantinos Daskalakis, Alexandros G. Dimakis

Published 2017-12-26 (Version 1)

Deep neural networks achieve excellent performance on several classical vision problems. However, these networks are vulnerable to adversarial examples: minutely modified images that induce arbitrary attacker-chosen outputs from the network. We propose a mechanism to protect against these adversarial inputs based on a generative model of the data. We introduce a pre-processing step that projects an input onto the range of a generative model using gradient descent before feeding it into a classifier. We show that this step provides the classifier with robustness against first-order, substitute-model, and combined adversarial attacks. Using a min-max formulation, we show that adversarial examples may exist even in the range of the generator: natural-looking images extremely close to the decision boundary for which the classifier has unjustifiably high confidence. We show that adversarial training on the generative manifold can be used to make a classifier robust to these attacks. Finally, we show how our method can be applied even without a pre-trained generative model, using a recent method called the deep image prior. We evaluate our method on MNIST, CelebA, and ImageNet and show robustness against current state-of-the-art attacks.
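
The pre-processing step described above can be illustrated with a minimal sketch: given a pre-trained generator, find a latent code whose generated image is closest to the input, and classify that projection instead of the raw input. The snippet below is an assumption-laden illustration, not the authors' released code; the generator, classifier, latent dimension, and optimization hyperparameters are placeholders.

```python
# Minimal sketch of projecting an input onto the range of a generator
# before classification, as described in the abstract. Assumes a
# pre-trained generator G mapping latent codes of size `latent_dim`
# to images with the same shape as x; all names are illustrative.
import torch

def project_onto_generator(x, G, latent_dim=100, steps=200, lr=0.1):
    """Gradient descent over z to approximately minimize ||G(z) - x||^2,
    returning the projection G(z*)."""
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((G(z) - x) ** 2).mean()
        loss.backward()
        opt.step()
    return G(z).detach()

# Usage (hypothetical classifier): classify the projection rather than
# the possibly adversarial input.
# logits = classifier(project_onto_generator(x_adv, generator))
```

The design intent, per the abstract, is that adversarial perturbations largely lie off the generator's range, so the projection removes much of the perturbation; the min-max result then shows why adversarial training on the manifold is still needed for examples that remain in the range.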

Related articles:
- arXiv:2001.00116 [cs.CV] (Published 2020-01-01): Erase and Restore: Simple, Accurate and Resilient Detection of $L_2$ Adversarial Examples
- arXiv:1804.08529 [cs.CV] (Published 2018-04-23): VectorDefense: Vectorization as a Defense to Adversarial Examples
- arXiv:2001.03460 [cs.CV] (Published 2020-01-08): Cloud-based Image Classification Service Is Not Robust To Adversarial Examples: A Forgotten Battlefield