#284 — Try gpt 4.1 in our pipeline

Repo: Twill-AI/twill-llm-engine | State: closed | Status: done | Assignee: meliascosta

Created: 2025-04-16 · Updated: 2025-04-18

Description

AC:

  • Regression tests pass for gpt 4.1
  • Changes in performance have been documented

Verdict

It passes our regression tests and shows some clear improvements. It is better at instruction following, and it solves a long-standing issue that resisted prompting: tables were output twice, once in the call to the SQL query and again when the model copied the table into its reply as markdown.

Current gpt4o output:

Image

Output with gpt4.1:

Image
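
One way to keep the duplicate-table behavior from regressing would be an automated check on the model's text reply. The sketch below is only an illustration, not the check used in this repo: the function name `contains_markdown_table` and the sample strings are hypothetical, and the detection is a heuristic based on the header row plus delimiter row shape of markdown tables.

```python
import re

# Heuristic for a markdown table delimiter row, e.g. "| --- | :---: |".
# Assumption: the reply uses GitHub-style pipe tables.
_DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def contains_markdown_table(text: str) -> bool:
    """Return True if some line with a pipe is followed by a delimiter row."""
    lines = text.splitlines()
    for header, delimiter in zip(lines, lines[1:]):
        # Requiring a pipe on both lines avoids matching a plain "---" rule.
        if "|" in header and "|" in delimiter and _DELIMITER_ROW.match(delimiter):
            return True
    return False


# Hypothetical replies: one that copies the table, one that does not.
duplicated = "Here are the results:\n| id | amount |\n| --- | ---: |\n| 1 | 10 |"
clean = "The query returned 3 rows.\n\n---\nSee the table above."
```

A regression test could then assert that `contains_markdown_table(reply)` is false whenever the SQL query call has already rendered the table.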
