#284 — Try gpt 4.1 in our pipeline
Repo: Twill-AI/twill-llm-engine | State: closed | Status: done | Assignee: meliascosta
Created: 2025-04-16 · Updated: 2025-04-18
Description
AC:
- Regression tests pass for gpt 4.1
- Changes in performance have been documented
Verdict
It passes our regression tests and shows some clear improvements. It is better at instruction following, and it solves a long-standing issue that had resisted prompting: tables appeared in the output twice, once from the call to the SQL query and once more when the model copied the result in markdown.
Current gpt4o output:
Output with gpt4.1: