Abstract:
The objective of the text-to-SQL task is to convert natural language queries into SQL queries. However, the emergence of large text-to-SQL datasets spanning multiple domains, such as Spider, introduces the challenge of generalizing effectively to unseen databases. Existing semantic parsing models have struggled to achieve notable performance improvements on these cross-domain datasets. As a result, recent work has focused on leveraging pre-trained language models to address this issue and enhance performance in text-to-SQL tasks; these approaches represent the latest and most promising attempts to tackle the challenges of generalization and performance improvement in this field. I propose an approach that evaluates and uses a Seq2Seq model by supplying the most relevant schema items as input to the encoder, and generates accurate, valid cross-domain SQL queries with the decoder by learning the skeleton of the target SQL query. The proposed approach is evaluated on the Spider dataset, a well-known benchmark for the text-to-SQL task, and achieves promising results: Exact Match accuracy and Execution accuracy are boosted to 72.7% and 80.2% respectively, compared with the best related approaches.
Keywords: Text-to-SQL, Seq2Seq model, BERT, RoBERTa, T5-Base
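To illustrate the pipeline the abstract describes, the following is a minimal sketch, assuming a separate schema item classifier has already ranked the relevant tables and columns. The checkpoint name (t5-base), the input serialization, and the helper generate_sql are illustrative assumptions, not the thesis implementation.

# Minimal sketch (not the thesis implementation): serialize the question together
# with the schema items ranked most relevant by a separate classifier, then let a
# T5 encoder-decoder generate the SQL query.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_sql(question, ranked_schema_items, max_length=128):
    # Concatenate the question with only the top-ranked tables/columns, so the
    # encoder sees just the schema items judged relevant (e.g. "singer.name").
    schema_text = " | ".join(ranked_schema_items)
    source = f"translate to SQL: {question} | {schema_text}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    # Beam-search decoding; a skeleton-aware decoder would additionally condition
    # on a predicted SQL skeleton such as "SELECT _ FROM _ WHERE _".
    output_ids = model.generate(**inputs, num_beams=4, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_sql(
    "How many singers are there?",
    ["singer", "singer.singer_id", "singer.name"],
))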
Citation:
Rushdy, M.S.A. (2023). Text-to-SQL generation using schema item classifier and encoder-decoder architecture [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/23424