Machine Learning and Synthetic Data - Potential and Pitfalls
Speaker: Anura Jayasumana – Fort Collins, CO, United States
Topic(s): Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing
Abstract
Machine Learning (ML) models have become indispensable for solving complex problems. However, they require sufficient volumes of representative training data to be effective. There are many problem domains where the collection of sufficient data is expensive or even infeasible.
We will provide an overview of the state of the art in the use of synthetic data for training ML models, as well as ML techniques for generating such data. Unconventional Data Sets (UDS), which are inherently sparse and incomplete, are quite common in social and behavioral domains. Our recent research resulted in techniques to model and synthesize large sets of complex data objects from UDS while maintaining the same statistical and topological characteristics. The synthetic data sets we generated have markedly improved the performance of ML algorithms in problems such as detecting the radicalization potential of individuals, video traffic classification, and phishing website detection. The resulting symbiotic relationship between synthetic data generated by ML models and ML models trained using synthetic data promises a bold new field with creative solutions and perilous pitfalls.
About this Lecture
Number of Slides: 20 - 25
Duration: 50 minutes
Languages Available: English
Last Updated: