#189 — Create evaluation pipeline for LLM

Repo: Twill-AI/twill-llm-engine · State: closed · Status: done · Assignee: meliascosta

Created: 2024-11-12 · Updated: 2024-11-19

Description

AC:

  • There are saved test datasets in LangSmith for:
    • Plots
    • SQL generation
    • KPI tiles
    • General agent
  • The test code is placed under tests/llm_tests
  • There is a make target to run the tests inside the repository
  • Metrics are recorded for every run, keyed by commit hash, and are comparable across runs
  • Documentation has been created in Confluence to explain the evaluation process

For reference see:

Notes

Add implementation notes, blockers, and context here

Add wikilinks to related people, meetings, or other tickets