#245 — Update LLM evaluation pipeline

Repo: Twill-AI/twill-llm-engine | State: closed | Status: done | Assignee: meliascosta

Created: 2025-01-29 · Updated: 2025-03-13

Description

AC:

  • Testing datasets have been migrated to the new LangSmith account.
  • New metrics have been added to the twill-llm-engine in-repo tests to monitor SQL performance.
  • Datasets have been reviewed and updated so that they work with the current LLM version and remain relevant.
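The ticket does not say which SQL metrics were added. One common choice is execution accuracy: run the generated query and a reference query against the same database and compare the returned rows. The sketch below is a hypothetical illustration, not the metric actually added to twill-llm-engine. It assumes Python's built-in `sqlite3` instead of the engine's real database, and the names `execution_match`, `broken_references`, the `payments` table, and the example dict shape are all invented for illustration. The `broken_references` helper shows how the third criterion (checking that dataset examples still work) could be partly automated by flagging reference queries that no longer execute.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str):
    """Execute SQL and return its rows, or None if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, reference_sql: str) -> float:
    """Hypothetical metric: 1.0 if both queries return the same multiset of rows, else 0.0.

    Rows are compared as a Counter, so differences in row order are ignored.
    A query that fails to execute scores 0.0.
    """
    expected = run_query(conn, reference_sql)
    actual = run_query(conn, predicted_sql)
    if expected is None or actual is None:
        return 0.0
    return 1.0 if Counter(expected) == Counter(actual) else 0.0


def broken_references(conn: sqlite3.Connection, examples: list[dict]) -> list[str]:
    """Hypothetical dataset check: ids of examples whose reference SQL no longer runs."""
    return [ex["id"] for ex in examples if run_query(conn, ex["reference_sql"]) is None]


# Usage with a hypothetical in-memory schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id INTEGER PRIMARY KEY, merchant TEXT, amount REAL);
INSERT INTO payments (merchant, amount) VALUES ('acme', 10.0), ('acme', 5.0), ('globex', 7.5);
""")

score = execution_match(
    conn,
    "SELECT merchant, SUM(amount) FROM payments GROUP BY merchant ORDER BY merchant DESC",
    "SELECT merchant, SUM(amount) FROM payments GROUP BY merchant",
)
print(score)  # 1.0

examples = [
    {"id": "ex-1", "reference_sql": "SELECT * FROM payments"},
    {"id": "ex-2", "reference_sql": "SELECT * FROM invoices"},  # table does not exist
]
print(broken_references(conn, examples))  # ['ex-2']
```

Ignoring row order keeps the metric from penalizing a missing or different `ORDER BY`; if ordering matters for a given question, comparing the raw lists instead of Counters would be the stricter variant.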
