Do your data behave gently to your Machine Learning algorithms? What if not?

Speaker:  Swagatam Das – Kolkata, India
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing

Abstract

Many machine learning systems rely on implicit assumptions regarding the regularity of data. For instance, several classifiers assume that all classes have an equal number of representatives, that sub-concepts within classes are characterized by an equal number of representatives, and that all classes exhibit similar class-conditional distributions. Additionally, both classifiers and clustering methods presuppose that all features are defined and observed for every data instance. However, numerous real-world datasets violate one or more of these assumptions, resulting in data irregularities that can introduce unwarranted bias in learning systems or render them unsuitable for the data at hand. 
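To make the class-imbalance assumption concrete, here is a minimal illustrative sketch (not taken from the lecture itself): on a dataset where one class heavily outnumbers the other, even a degenerate classifier that ignores the minority class entirely can report high accuracy, which is exactly the kind of unwarranted bias the abstract warns about. The 95/5 split and the majority-class predictor are hypothetical choices for illustration.

```python
# Hypothetical imbalanced labels: 95% majority class (0), 5% minority (1).
labels = [0] * 95 + [1] * 5

# A degenerate "classifier" that always predicts the majority class.
predictions = [0] * len(labels)

# Overall accuracy looks strong...
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# ...but recall on the minority class is zero: it is never detected.
minority_recall = sum(
    p == y for p, y in zip(predictions, labels) if y == 1
) / labels.count(1)

print(accuracy)         # 0.95
print(minority_recall)  # 0.0
```

This is why plain accuracy is a misleading objective on irregular data, and why the test error bounds mentioned later must be analyzed with care when classes are imbalanced.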

Commencing with a taxonomy of these various data irregularities, this presentation will delve into the significant practical challenges encountered by learning systems when handling one or a combination of such irregularities, especially when pre-processing alone cannot rectify them. Furthermore, we will underscore some fundamental theoretical obstacles in analyzing the behavior of learning systems, such as deriving test error bounds for classifiers on imbalanced datasets, in the presence of irregular data.
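The abstract also notes that classifiers and clustering methods presuppose every feature is observed for every instance. A small illustrative sketch (again hypothetical, not from the lecture) shows how a standard Euclidean distance, the workhorse of many clustering methods, becomes undefined the moment a single feature is missing, so the method cannot proceed without some explicit treatment of the irregularity:

```python
import math

def euclidean(a, b):
    """Standard Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# With fully observed features, the distance is well defined.
complete = euclidean([1.0, 2.0], [4.0, 6.0])

# With one unobserved feature (NaN), NaN propagates through the
# subtraction, square, sum, and square root: the distance is undefined.
partial = euclidean([1.0, float("nan")], [4.0, 6.0])

print(complete)             # 5.0
print(math.isnan(partial))  # True
```

Naive fixes such as dropping the instance or imputing a mean are pre-processing steps, and, as the abstract points out, pre-processing alone cannot always rectify such irregularities.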

About this Lecture

Number of Slides:  90
Duration:  75 minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.