Design, Compilation, and Acceleration for Deep Neural Networks in IoT Applications
Speaker: Deming Chen – Urbana, IL, United States
Topic(s): Artificial Intelligence, Machine Learning, Computer Vision, Natural language processing
Many new IoT (Internet of Things) applications are driven by the fast creation, adaptation, and enhancement of various types of Deep Neural Networks (DNNs). DNNs are computation-intensive, and without efficient hardware implementations, these promising IoT applications will not be practically realizable. In this talk, we will analyze several challenges facing the AI and IoT community in mapping DNNs to hardware accelerators. In particular, we will evaluate the potential role of FPGAs in accelerating DNNs for both cloud and edge devices. Although FPGAs can provide desirable customized hardware solutions, they are difficult to program and optimize. We will present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include automated hardware/software co-design, the use of configurable DNN IPs, resource allocation across DNN layers, smart pipeline scheduling, Winograd and FFT techniques, and DNN reduction and re-training. We will showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, a bidirectional LSTM for machine translation, and an Inception module (GoogLeNet) for face recognition. We will also present some of our recent work on developing new DNN models and data structures that achieve higher accuracy in several interesting applications, such as crowd counting, music synthesis, and smart sound.
About this Lecture
Number of Slides: 26 - 54
Duration: 30 - 60 minutes
Languages Available: Chinese (Simplified), English