#304 — Improve LLM SQL generation context by using a richer query on `pg_stats`

Repo: Twill-AI/twill-llm-engine | State: open | Status: open | Assignee: meliascosta

Created: 2025-06-17 · Updated: 2025-07-09

Description

Originates from https://github.com/Twill-AI/twill-analytics/issues/24

We want to improve the function that generates a rich per-table description for SQL generation.

See existing implementation in https://github.com/Twill-AI/twill-llm-engine/blob/05341a81f75f92fe5d2f58f85c259a0d0f66b9ff/twill_llm_engine/utils/database.py#L110

We want to extend this context to use the `most_common_vals`, `most_common_freqs`, `histogram_bounds`, and `n_distinct` columns of `pg_stats` to generate richer per-column context.
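As a rough illustration of the kind of query this implies, here is a minimal sketch. It is not the existing code in `twill_llm_engine/utils/database.py`; the names `PG_STATS_QUERY` and `build_pg_stats_query` are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver. In `pg_stats` the relevant columns are named `most_common_vals`, `most_common_freqs`, `histogram_bounds`, and `n_distinct`. The two `anyarray` columns are cast through `text` to `text[]`, since many drivers cannot decode `anyarray` directly.

```python
# Sketch only: helper and constant names are hypothetical, and the
# %(name)s placeholder style assumes a psycopg-style driver.
PG_STATS_QUERY = """
SELECT
    attname                          AS column_name,
    null_frac,
    n_distinct,
    most_common_vals::text::text[]   AS most_common_vals,
    most_common_freqs,
    histogram_bounds::text::text[]   AS histogram_bounds
FROM pg_stats
WHERE schemaname = %(schema)s
  AND tablename = %(table)s
ORDER BY attname
"""


def build_pg_stats_query(table: str, schema: str = "public") -> tuple[str, dict[str, str]]:
    """Return the query text and its bound parameters for one table."""
    return PG_STATS_QUERY, {"schema": schema, "table": table}
```

Note that `pg_stats` is only populated after the table has been analyzed (`ANALYZE` or autovacuum), so a freshly created table may return no rows.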

AC:

Quick research has been done to understand which human-understandable values can be derived from these columns (approximate min, max, mean or median; whether the column is categorical; if categorical, what the categories are; etc.).
A SQL query has been written to calculate the derived values.
The table info function has been modified to include the new context.
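To make the first criterion more concrete, the sketch below shows one possible way to turn a single `pg_stats` row into human-readable hints. The function name `describe_column` and the thresholds `max_categories` and `min_coverage` are assumptions chosen for illustration, not settled design. Two caveats: a negative `n_distinct` is a ratio of distinct values to rows rather than a count, and `histogram_bounds` excludes the values listed in `most_common_vals`, so the min, max, and median derived from it are only approximate.

```python
from typing import Any, Optional, Sequence


def describe_column(
    n_distinct: float,
    most_common_vals: Optional[Sequence[Any]],
    most_common_freqs: Optional[Sequence[float]],
    histogram_bounds: Optional[Sequence[Any]],
    max_categories: int = 20,    # hypothetical threshold
    min_coverage: float = 0.95,  # hypothetical threshold
) -> dict[str, Any]:
    """Derive rough, human-readable hints from one pg_stats row."""
    vals = list(most_common_vals or [])
    freqs = list(most_common_freqs or [])
    bounds = list(histogram_bounds or [])

    # Fraction of rows covered by the most common values.
    coverage = sum(freqs)

    # Positive n_distinct is an estimated count of distinct values;
    # negative means -(distinct / rows), which we treat as non-categorical.
    is_categorical = 0 < n_distinct <= max_categories and coverage >= min_coverage

    summary: dict[str, Any] = {"is_categorical": is_categorical}
    if is_categorical:
        summary["categories"] = vals
    if bounds:
        # Bounds are sorted by the column's ordering and split the
        # non-MCV values into roughly equal-population buckets.
        summary["approx_min"] = bounds[0]
        summary["approx_max"] = bounds[-1]
        summary["approx_median"] = bounds[len(bounds) // 2]
    return summary
```

For example, a column with `n_distinct = 3` whose most common values cover all rows would be reported as categorical with those three values as its categories, while a unique column with only a histogram would get approximate range hints instead.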

Notes

