#189 — Create evaluation pipeline for LLM

Repo: Twill-AI/twill-llm-engine · State: closed · Status: done · Assignee: meliascosta

Created: 2024-11-12 · Updated: 2024-11-19

Description

AC:

  • There are saved test datasets in LangSmith for:
    • Plots
    • SQL generation
    • KPI tiles
    • General agent
  • The test code is placed under tests/llm_tests
  • There is a make target to run the tests inside the repository
  • Metrics are recorded for every run, keyed by commit hash, and are comparable across runs
  • Documentation has been created in Confluence to explain the evaluation process

For reference see:

Notes

Add implementation notes, blockers, and context here

Add wikilinks to related people, meetings, or other tickets