arXiv:1910.07517 [cs.LG]

Adversarial Examples for Models of Code

Noam Yefet, Uri Alon, Eran Yahav

Published 2019-10-15 (Version 1)

We introduce a novel approach for attacking trained models of code with adversarial examples. The main idea is to force a given trained model to make a prediction of the adversary's choice by introducing small perturbations that do not change program semantics. We find these perturbations by deriving the desired prediction with respect to the model's inputs while holding the model weights constant, and following the gradients to slightly modify the input. To defend a model against such attacks, we propose placing a defensive model in front of the downstream model. The defensive model detects unlikely mutations and masks them before feeding the input to the downstream model. We show that our attack succeeds in changing a prediction to the adversary's choice ("targeted attack") up to 89% of the time, and succeeds in changing a given prediction to any incorrect prediction ("non-targeted attack") 94% of the time. With our proposed defense, the success rate of the attack drops drastically for both targeted and non-targeted attacks, at the minor cost of a 2% relative degradation in accuracy when not under attack.
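The abstract describes a gradient-guided attack on a discrete input (e.g., renaming a variable in the program). The following is a minimal sketch of that idea, assuming a PyTorch model that maps variable-name embeddings to label logits; the function, parameter, and helper names here are hypothetical and not taken from the paper's implementation. The sketch freezes the model weights, follows the gradient of the target label's score with respect to one variable's embedding, and then projects the perturbed embedding back onto the nearest real name in the vocabulary.

import torch

def targeted_rename_attack(model, name_embeddings, var_index, target_label,
                           embedding_table, steps=5, step_size=0.1):
    """Nudge one variable's embedding toward the target label, then pick
    the closest legal variable name (model weights stay constant)."""
    emb = name_embeddings.clone().detach()
    emb.requires_grad_(True)
    for _ in range(steps):
        logits = model(emb)                # forward pass; weights are frozen
        loss = -logits[target_label]       # maximize the target label's score
        loss.backward()
        with torch.no_grad():
            # follow the gradient only for the variable we are allowed to rename
            emb[var_index] -= step_size * emb.grad[var_index]
        emb.grad.zero_()
    # project the perturbed embedding onto the nearest name in the vocabulary,
    # so the resulting program remains valid and semantically unchanged
    dists = torch.cdist(emb[var_index].unsqueeze(0), embedding_table)
    return dists.argmin().item()           # index of the adversarial name

A non-targeted variant would instead minimize the score of the originally predicted label; the defense described above would sit in front of such a model, flagging and masking variable names it deems unlikely before the downstream prediction is made.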
