arXiv:1910.07517 [cs.LG]

Adversarial Examples for Models of Code

Noam Yefet, Uri Alon, Eran Yahav

Published 2019-10-15 (Version 1)

We introduce a novel approach for attacking trained models of code with adversarial examples. The main idea is to force a given trained model to make a prediction of the adversary's choice by introducing small perturbations that do not change program semantics. We find these perturbations by deriving the desired prediction with respect to the model's inputs while holding the model weights constant, and following the gradients to slightly modify the input. To defend a model against such attacks, we propose placing a defensive model in front of the downstream model. The defensive model detects unlikely mutations and masks them before feeding the input to the downstream model. We show that our attack succeeds in changing a prediction to the adversary's choice ("targeted attack") up to 89% of the time, and succeeds in changing a given prediction to any incorrect prediction ("non-targeted attack") 94% of the time. With our proposed defense, the success rate of the attack drops drastically for both targeted and non-targeted attacks, at the minor cost of a 2% relative degradation in accuracy when not under attack.
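The abstract describes a gradient-guided attack on a discrete input (e.g., renaming a variable in the program). The following is a minimal sketch of that idea, assuming a PyTorch model that maps variable-name embeddings to label logits; the function, parameter, and helper names here are hypothetical and not taken from the paper's implementation. The sketch freezes the model weights, follows the gradient of the target label's score with respect to one variable's embedding, and then projects the perturbed embedding back onto the nearest real name in the vocabulary.

import torch

def targeted_rename_attack(model, name_embeddings, var_index, target_label,
                           embedding_table, steps=5, step_size=0.1):
    """Nudge one variable's embedding toward the target label, then pick
    the closest legal variable name (model weights stay constant)."""
    emb = name_embeddings.clone().detach()
    emb.requires_grad_(True)
    for _ in range(steps):
        logits = model(emb)                # forward pass; weights are frozen
        loss = -logits[target_label]       # maximize the target label's score
        loss.backward()
        with torch.no_grad():
            # follow the gradient only for the variable we are allowed to rename
            emb[var_index] -= step_size * emb.grad[var_index]
        emb.grad.zero_()
    # project the perturbed embedding onto the nearest name in the vocabulary,
    # so the resulting program remains valid and semantically unchanged
    dists = torch.cdist(emb[var_index].unsqueeze(0), embedding_table)
    return dists.argmin().item()           # index of the adversarial name

A non-targeted variant would instead minimize the score of the originally predicted label; the defense described above would sit in front of such a model, flagging and masking variable names it deems unlikely before the downstream prediction is made.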
