#189 — Create evaluation pipeline for LLM
Repo: Twill-AI/twill-llm-engine State: closed | Status: done Assignee: meliascosta
Created: 2024-11-12 · Updated: 2024-11-19
Description
AC:
- There are saved test datasets in LangSmith for:
  - Plots
  - SQL generation
  - KPI tiles
  - General agent
- The test code is placed under `tests/llm_tests`
- There is a make target to run the tests inside the repository
- Metrics are recorded for every run, registered by commit hash, and are comparable across runs
- Documentation has been created in Confluence to explain the evaluation process
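The "registered by commit hash" criterion could be met with a small helper like the following sketch. It is not the actual implementation from the repo, just one way to satisfy the AC: metrics from each run are appended to a JSON file keyed by the current git commit, so results from different commits stay comparable. The function names (`current_commit_hash`, `record_metrics`) and the `eval_metrics.json` path are assumptions for illustration.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit_hash() -> str:
    """Short git commit hash of the working tree (assumes a git checkout)."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()


def record_metrics(metrics: dict, commit: str,
                   path: Path = Path("eval_metrics.json")) -> dict:
    """Append one run's metrics under its commit hash and return the full history."""
    history = json.loads(path.read_text()) if path.exists() else {}
    history.setdefault(commit, []).append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    })
    path.write_text(json.dumps(history, indent=2))
    return history
```

A make target could then wrap `pytest tests/llm_tests` and call `record_metrics(..., current_commit_hash())` after the run, letting two commits be diffed directly from the JSON history.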
For reference see:
Notes
Add implementation notes, blockers, and context here
Related
Add wikilinks to related people, meetings, or other tickets