Scalable Machine Learning on Large Sequence Collections

Speaker:  Themis Palpanas – Paris, France
Topic(s):  Information Systems, Search, Information Retrieval, Database Systems, Data Mining, Data Science


Applications in diverse domains increasingly need techniques that can analyze very large collections of sequences, or data series. Such applications come from scientific, manufacturing, and social domains, and in several cases they need to apply machine learning techniques for knowledge extraction. It is not unusual for these applications to involve hundreds of millions to billions of data series, which are often not analyzed in full detail because of their sheer size. However, no existing data management solution (such as relational databases, column stores, array databases, and time series management systems) offers native support for sequences and the corresponding operators necessary for complex analytics. In this talk, we argue for the need to study the theory and foundations of managing big data sequences, and to build corresponding systems that will enable scalable management and analytics of very large sequence collections. We describe recent efforts in designing techniques for indexing and analyzing truly massive collections of data series that will enable scientists to run complex analytics on their data. Finally, we present open research directions in the area of big sequence management.

About this Lecture

Number of Slides:  250
Duration:  55 minutes
Languages Available:  English
Last Updated: 
