Compressing Deep Neural Networks for Fun and Profit
Speaker: Dan Alistarh – Klosterneuburg, Austria
Topic(s): Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing
Abstract
Deep learning continues to make significant advances, solving tasks from image classification to translation and reinforcement learning. One aspect of the field receiving considerable attention is how to execute deep models efficiently in resource-constrained environments, such as mobile or end-user devices. This talk focuses on this question and will overview some of the model compression techniques we have developed over the past couple of years and applied in practice at Neural Magic, a Boston-based startup. In particular, I will talk about tools for inducing high weight (kernel) sparsity in convolutional neural networks, as well as techniques for exploiting and enhancing activation sparsity in deep networks with ReLU activations. The lecture will also include a demo showcasing some of the practical speedups we can achieve on real deployments.
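To make the two kinds of sparsity mentioned above concrete, here is a minimal NumPy sketch, not the speaker's actual method: unstructured magnitude pruning (zeroing the smallest-magnitude weights to induce weight sparsity) and measuring the activation sparsity that ReLU produces for free. The function name `magnitude_prune` and the array shapes are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Illustrative unstructured pruning: zero out the smallest-magnitude
    fraction `sparsity` of entries in `weights`."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)

# Weight sparsity: prune 90% of a dense layer's weights by magnitude.
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.9)
weight_sparsity = np.mean(pruned == 0)  # close to 0.9

# Activation sparsity: ReLU zeroes all negative pre-activations, which
# sparse inference engines can exploit at runtime.
acts = np.maximum(0.0, rng.normal(size=(1, 256)))
act_sparsity = np.mean(acts == 0)  # roughly 0.5 for zero-mean inputs
```

In practice, pruning is typically applied gradually during or after training so accuracy can recover, but the thresholding step itself is the one shown here.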
About this Lecture
Number of Slides: 25
Duration: 35 minutes
Languages Available: English
Request this Lecture
To request this particular lecture, please complete this online form.
Request a Tour
To request a tour with this speaker, please complete this online form.
All requests will be sent to ACM headquarters for review.