Large-Scale Distributed Optimization for Machine Learning

Speaker:  Dan Alistarh – Klosterneuburg, Austria
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural language processing


Machine learning has made considerable progress over the past decade,matching and even surpassing human performance on a varied set of narrow computational tasks. This progress has been enabled by the widespread availability of large datasets, as well as by improved algorithms and models. Distribution, implemented either through single-node concurrency or through multi-node parallelism, has been third third key ingredient to these advances.

The goal of this talk is to provide an overview of the role of distributed computing in machine learning, with an eye towards the intriguing trade-offs between synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other. The focus will be on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, which is a key tool when training machine learning models, from venerable linear regression, to state-of-the-art neural network architectures. Along the way, we will provide an overview of the ongoing research and open problems in distributed machine learning. The lecture will assume no prior knowledge of machine learning or optimization, beyond familiarity with basic concepts in algebra and analysis.

About this Lecture

Number of Slides:  30
Duration:  45 minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.