Explaining Deep Learning with Adversarial Attacks

Speaker:  Naveed Akhtar – Perth, WA, Australia
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing


Deep visual models are susceptible to adversarial perturbations of their inputs. Although these signals are carefully crafted, they still appear as noise-like patterns to humans. This observation has led to the argument that deep visual representations are misaligned with human perception. In this talk, we will offer a partial counter-argument by providing evidence of human-meaningful patterns in adversarial perturbations. We will introduce an attack that fools a network into confusing a whole category of objects (source class) with a target label. Our attack also limits the unintended fooling of samples from non-source classes, thereby circumscribing human-defined semantic notions for network fooling. We will demonstrate that our attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we will alter the 'adversarial' objective of our attack to use it as a tool to 'explain' deep visual representations. We will show that by careful channelling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions.

About this Lecture

Number of Slides:  28
Duration:  20 minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.