Machine Learning and Synthetic Data - Potential and Pitfalls

Speaker: Anura Jayasumana – Fort Collins, CO, United States
Topic(s): Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing

Abstract

Machine Learning (ML) models have become indispensable for solving complex problems. However, to be effective they require sufficient volumes of representative training data. In many problem domains, collecting enough data is expensive or even infeasible.

We will provide an overview of the state of the art in the use of synthetic data for training ML models, as well as of ML techniques for generating such data. Unconventional Data Sets (UDS), which are inherently sparse and incomplete, are quite common in the social and behavioral domains. Our recent research has produced techniques to model and synthesize large sets of complex data objects from UDS while preserving the statistical and topological characteristics of the original data. Synthetic data sets generated with these techniques have yielded marked improvements in the performance of ML algorithms on problems such as detecting individuals' potential for radicalization, classifying video traffic, and detecting phishing websites. The resulting symbiotic relationship between synthetic data generated by ML models and ML models trained on synthetic data promises a bold new field with both creative solutions and perilous pitfalls.

About this Lecture

Number of Slides: 20-25
Duration: 50 minutes
Languages Available: English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.