#304 — Improve LLM SQL generation context by using a richer query on pg_stats
Repo: Twill-AI/twill-llm-engine State: open | Status: open Assignee: meliascosta
Created: 2025-06-17 · Updated: 2025-07-09
Description
Originates from https://github.com/Twill-AI/twill-analytics/issues/24
We want to improve the function that generates a rich description per table for SQL generation.
See existing implementation in https://github.com/Twill-AI/twill-llm-engine/blob/05341a81f75f92fe5d2f58f85c259a0d0f66b9ff/twill_llm_engine/utils/database.py#L110
and related SQL queries to pg_stats here:
https://github.com/Twill-AI/twill-llm-engine/blob/05341a81f75f92fe5d2f58f85c259a0d0f66b9ff/twill_llm_engine/literals/queries.py#L18
We want to extend this context to use the most_common_vals, most_common_freqs, histogram_bounds, and n_distinct columns of pg_stats to generate richer per-column context.
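A rough sketch of what the extended query might look like, written as a Python string constant. The constant name `COLUMN_STATS_QUERY` and the psycopg-style named parameters (`%(schema)s`, `%(table)s`) are assumptions for illustration, not the existing code in queries.py. The `::text::text[]` double cast is a common way to read the `anyarray` columns of pg_stats as plain text arrays.

```python
# Hypothetical constant: the name and the parameter style are assumptions,
# not taken from the existing queries module.
COLUMN_STATS_QUERY = """
SELECT
    attname                         AS column_name,
    null_frac,
    n_distinct,
    most_common_vals::text::text[]  AS most_common_vals,
    most_common_freqs,
    histogram_bounds::text::text[]  AS histogram_bounds
FROM pg_stats
WHERE schemaname = %(schema)s
  AND tablename = %(table)s
ORDER BY attname;
"""
```

pg_stats is only populated after ANALYZE (or autovacuum) has run on the table, so freshly created tables may return no rows.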
AC:
- Some quick research has been conducted to understand what human-understandable values can be derived from these columns (approximate min, max, mean or median; whether the column is categorical; if categorical, what the categories are; etc.)
- A SQL query has been written to calculate derivative values
- The table info function has been modified to include the new context
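As a starting point for the research item above, here is a minimal sketch of how one pg_stats row could be turned into human-readable hints. The function name `describe_column`, the returned keys, and the 0.95 coverage threshold for "categorical" are all assumptions for illustration; the array arguments are expected as text lists (for example after a `::text::text[]` cast). A mean is not derived here because pg_stats does not expose one directly.

```python
from typing import Optional, Sequence


def describe_column(
    n_distinct: float,
    most_common_vals: Optional[Sequence[str]],
    most_common_freqs: Optional[Sequence[float]],
    histogram_bounds: Optional[Sequence[str]],
    categorical_coverage: float = 0.95,  # arbitrary threshold (assumption)
) -> dict:
    """Turn one pg_stats row into human-readable hints (heuristic sketch)."""
    vals = list(most_common_vals or [])
    freqs = list(most_common_freqs or [])
    bounds = list(histogram_bounds or [])
    info: dict = {}

    # n_distinct > 0: estimated number of distinct values.
    # n_distinct < 0: minus the ratio of distinct values to rows
    # (-1 means every row is distinct).
    if n_distinct > 0:
        info["distinct"] = f"about {int(n_distinct)} distinct values"
    elif n_distinct < 0:
        info["distinct"] = f"about {-n_distinct:.0%} of rows are distinct"

    # Assumption: treat the column as categorical when the most common
    # values together cover nearly all rows.
    info["is_categorical"] = bool(vals) and sum(freqs) >= categorical_coverage
    if info["is_categorical"]:
        info["categories"] = vals

    # Histogram bounds split the values not listed in most_common_vals into
    # buckets of roughly equal population, so the first and last bounds
    # approximate min and max, and the middle bound roughly the median.
    if bounds:
        info["approx_min"] = bounds[0]
        info["approx_max"] = bounds[-1]
        info["approx_median"] = bounds[len(bounds) // 2]
    return info
```

For example, `describe_column(3, ["paid", "pending", "failed"], [0.6, 0.3, 0.1], None)` flags the column as categorical and lists the three categories, while a column with only histogram bounds gets approximate min, max, and median. The min/max/median values ignore the most common values, so they are only rough hints for columns where those values dominate.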
Notes
Add implementation notes, blockers, and context here
Related
Add wikilinks to related people, meetings, or other tickets