LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency

Recently, rewriting database queries with LLMs has become one of the most sought-after use cases. The aim of query rewrite is to output a new query that is equivalent to the original SQL query but has a shorter execution time. However, most existing LLM-based methods use the sequence-to-sequence generation ability of a language model to directly output a rewritten query given an input query, without considering any rewrite rules or DBMS information. Relying solely on the LLM's output query may lead to hallucinations or to syntax or reference errors during generation, especially for long and complicated queries.

To address this, researchers have proposed a novel query rewrite method named LLM-enhanced rule-based rewrite system (LLM-R2), which uses a large language model (LLM) to propose possible rewrite rules for a database rewrite system rather than generating the rewritten query directly. To overcome hallucination, LLM-R2 collects a pool of demonstrations consisting of effective query rewrites found with existing methods and designed baselines. A query representation model is then trained with contrastive learning to select the most useful in-context demonstration for a given query, and that demonstration is included in the prompt to improve the LLM's rewrite rule selection. In addition, to address the challenge of limited training data, curriculum learning is used to schedule the training data from easy to hard.
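The demonstration-selection step can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `embed` function is a toy token-count stand-in for the learned contrastive representation model, the demonstration pool and rule names are hypothetical placeholders, and query length is used as a made-up difficulty score for the easy-to-hard ordering. It is not the paper's implementation.

```python
import math
from collections import Counter

# Hypothetical demonstration pool: (query, rewrite rules that helped).
# The queries and rule names are placeholders, not taken from the paper.
DEMO_POOL = [
    ("SELECT * FROM orders o JOIN customer c ON o.cid = c.id WHERE c.region = 'EU'",
     ["FILTER_INTO_JOIN"]),
    ("SELECT cid, SUM(total) FROM orders GROUP BY cid",
     ["AGGREGATE_PROJECT_MERGE"]),
    ("SELECT name FROM customer WHERE id IN (SELECT cid FROM orders)",
     ["SUBQUERY_TO_JOIN"]),
]


def embed(query):
    """Toy stand-in for the learned representation model: token counts."""
    cleaned = query.lower().replace("(", " ").replace(")", " ").replace(",", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demonstration(query, pool):
    """Return the pool entry whose query is most similar to the input query."""
    q = embed(query)
    return max(pool, key=lambda demo: cosine(q, embed(demo[0])))


def build_prompt(query, demo):
    """Assemble a prompt that asks for rewrite rules, not for a rewritten query."""
    demo_query, demo_rules = demo
    return (
        "Suggest rewrite rules for the SQL query.\n"
        f"Example query: {demo_query}\n"
        f"Example rules: {', '.join(demo_rules)}\n"
        f"Query: {query}\n"
        "Rules:"
    )


def curriculum_order(samples, difficulty):
    """Order training samples from easy to hard under a given difficulty score."""
    return sorted(samples, key=difficulty)


if __name__ == "__main__":
    new_query = "SELECT cid, COUNT(*) FROM orders GROUP BY cid"
    best = select_demonstration(new_query, DEMO_POOL)
    print(build_prompt(new_query, best))
    # Query length is only an illustrative difficulty score.
    print(curriculum_order([q for q, _ in DEMO_POOL], difficulty=len))
```

Because the LLM only names rules and the database rewrite system applies them, the resulting query stays valid by construction; the sketch stops at the prompt and does not call any model.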

In experiments, LLM-R2 was evaluated on three datasets: TPC-H, IMDB, and DSB. It significantly reduced query execution time, with rewritten queries taking on average only 52.5%, 56.0%, and 39.8% of the execution time of the original queries, and 94.5%, 63.1%, and 40.7% of the time of the state-of-the-art baseline method, on the three datasets respectively.
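To make these percentages easier to read, the short sketch below converts each reported time fraction into a speedup factor (1 divided by the fraction). It assumes the figures are listed in the same order as the datasets (TPC-H, IMDB, DSB); the variable and function names are illustrative.

```python
# Reported average execution time as a fraction of the reference time.
# Assumes the figures follow the dataset order TPC-H, IMDB, DSB.
FRACTION_OF_ORIGINAL = {"TPC-H": 0.525, "IMDB": 0.560, "DSB": 0.398}
FRACTION_OF_BASELINE = {"TPC-H": 0.945, "IMDB": 0.631, "DSB": 0.407}


def speedup(fraction):
    """A query taking `fraction` of the reference time is 1/fraction times faster."""
    return 1.0 / fraction


if __name__ == "__main__":
    for name in FRACTION_OF_ORIGINAL:
        vs_original = speedup(FRACTION_OF_ORIGINAL[name])
        vs_baseline = speedup(FRACTION_OF_BASELINE[name])
        print(f"{name}: {vs_original:.2f}x vs original, {vs_baseline:.2f}x vs baseline")
```

Read this way, the gain over the baseline is small on TPC-H and largest on DSB.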

Paper: https://lnkd.in/gd3SVd6H