What We Learned Building LLM-Powered Text-to-SQL

Speaker:  Fatma Özcan – San Jose, CA, United States
Topic(s):  Information Systems, Search, Information Retrieval, Database Systems, Data Mining, Data Science

Abstract

The advent of Large Language Models (LLMs) has ignited renewed interest in text-to-SQL from both academia and industry. This task remains challenging, as it requires bridging the gap between inherently ambiguous natural language questions and the complex schema and data semantics of the target database.
 
To address these challenges, we have developed and evaluated multiple text-to-SQL solutions. This talk will detail the lessons learned, with a focus on two key contributions. First, we will present CHASE-SQL, a novel multi-agent LLM framework that generates diverse SQL candidates through three distinct pipelines, achieving 76% execution accuracy on the BIRD benchmark. Second, we will discuss a comprehensive study that quantifies the impact of different contextual information sources (including column value examples, few-shot examples, user hints, SQL documentation, and schema structure) on model performance.
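To make the candidate-generation-and-selection pattern concrete, the sketch below shows the general shape of such a system, not the CHASE-SQL implementation itself. The pipeline prompts, the llm_complete helper, and the database path are all hypothetical placeholders, and the majority-vote selection over execution results is a simplified stand-in for the trained selection step described in the talk.

import sqlite3
from collections import Counter

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call; wire up a real client here."""
    raise NotImplementedError

# Illustrative prompting strategies standing in for distinct generation pipelines.
PIPELINES = {
    "divide_and_conquer": "Break the question into sub-questions, write SQL for each, then combine.\n",
    "query_plan":         "Reason step by step about the query execution plan before writing SQL.\n",
    "few_shot":           "Here are example (question, SQL) pairs from similar schemas.\n",
}

def generate_candidates(question: str, schema: str) -> list[str]:
    """Ask each pipeline for one candidate SQL query for the same question."""
    candidates = []
    for strategy in PIPELINES.values():
        prompt = f"{strategy}Schema:\n{schema}\nQuestion: {question}\nSQL:"
        candidates.append(llm_complete(prompt).strip())
    return candidates

def select_by_execution(candidates: list[str], db_path: str) -> str | None:
    """Keep candidates that execute without error and pick the most common result set.
    A real system would use a dedicated selection model rather than majority vote."""
    conn = sqlite3.connect(db_path)
    results = {}
    for sql in candidates:
        try:
            results[sql] = tuple(map(tuple, conn.execute(sql).fetchall()))
        except sqlite3.Error:
            continue  # discard candidates that fail to execute
    if not results:
        return None
    majority = Counter(results.values()).most_common(1)[0][0]
    return next(sql for sql, rows in results.items() if rows == majority)

The point of the sketch is the division of labor: diversity comes from running several differently-prompted generators over the same question and schema, while quality comes from validating candidates against the actual database and choosing among the survivors.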
 
While benchmarks like BIRD and Spider have driven significant innovation, our experience building real-world applications has revealed opportunities for improvement beyond academic metrics. We will conclude by outlining these opportunities for future research and presenting a forward-looking perspective on the evolution of natural language interfaces for data interaction.

About this Lecture

Number of Slides:  n/a
Duration:  n/a minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.