#153 — Add stop functionality

Repo: Twill-AI/twill-llm-engine · State: open · Assignee: meliascosta

Created: 2024-10-03 · Updated: 2024-10-17

Description

We want to be able to interrupt the LLM generation mid-answer. To accomplish this we can interrupt the `async for` statement here:

https://github.com/Twill-AI/facade/blob/717f05f93d3f2f919ffbb2b65034e66a8fd177f2/app/api/websockets_router.py#L1115

To do so, we can check a context variable that signals whether a message to interrupt generation has been received.
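A minimal sketch of that check, using a hypothetical `stop_requested` ContextVar (the real name and message wiring are not defined yet). One caveat worth noting: a `ContextVar` set in one asyncio task is not visible to an already-running sibling task, so this pattern only works if the websocket receive logic and the generation loop share a task/context; otherwise an `asyncio.Event` is the safer primitive.

```python
import asyncio
import contextvars

# Hypothetical flag; in facade it would be set when a ChatPromptStop arrives.
stop_requested: contextvars.ContextVar[bool] = contextvars.ContextVar(
    "stop_requested", default=False
)

async def token_stream(n=20):
    """Stand-in for the LLM engine's token generator."""
    for i in range(n):
        await asyncio.sleep(0)  # yield control, like a real streaming await
        yield f"token-{i}"

async def generate():
    """Mimics the async-for loop in websockets_router.py."""
    received = []
    async for token in token_stream():
        if stop_requested.get():  # the proposed interruption check
            break
        received.append(token)
    return received

async def demo():
    full = await generate()      # no stop requested: full answer
    stop_requested.set(True)     # simulate ChatPromptStop (same task/context)
    stopped = await generate()   # loop exits on the first check
    return len(full), len(stopped)

print(asyncio.run(demo()))  # (20, 0)
```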

AC:

  • A new message type has been defined in the Confluence docs; it should be something like a `ChatPromptStop` message.
  • The websocket handler inside facade has been modified to receive the new message and set the contextvar accordingly.
  • The interruption logic has been added to the `async for` shown above.
  • The LLM engine's `.answer_in_chat()` function has been modified to shut down gracefully (i.e. handle `GeneratorExit`).
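On the last point: when the consumer breaks out of its `async for` and closes the generator, Python throws `GeneratorExit` into the generator at its suspended `yield`; the generator may clean up but must not yield again. A minimal sketch of what "shut down gracefully" could look like (the real `.answer_in_chat()` body is not shown here; `closed_gracefully` is just an observable stand-in for real resource cleanup):

```python
import asyncio

closed_gracefully = False  # observable side effect for the demo

async def answer_in_chat():
    """Hypothetical sketch of a streaming generator that handles GeneratorExit."""
    global closed_gracefully
    try:
        for i in range(100):
            await asyncio.sleep(0)
            yield f"chunk-{i}"
    except GeneratorExit:
        # Consumer stopped iterating: release resources here,
        # and never yield again (that would raise RuntimeError).
        closed_gracefully = True
        raise

async def consumer():
    gen = answer_in_chat()
    received = []
    async for chunk in gen:
        received.append(chunk)
        if len(received) == 3:
            break  # simulates the interruption
    await gen.aclose()  # throws GeneratorExit into the suspended generator
    return received

print(asyncio.run(consumer()), closed_gracefully)
```

Calling `aclose()` explicitly makes the cleanup deterministic instead of leaving it to garbage collection after the `break`.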

Implementation details

We had some doubts as to how resources would be released when interrupting the generation, so we ran a couple of experiments (see `test_async.py` inside the `/notebooks` folder).

We used the following main routine as an example:

```python
import asyncio
import tracemalloc

# `task()` is defined elsewhere in test_async.py, and tracemalloc
# is assumed to have been started before main() runs.
async def main():
    out = [task() for _ in range(10)]
    print(tracemalloc.get_traced_memory())
    tasks = asyncio.gather(*out)
    res = await tasks
    x = tracemalloc.get_traced_memory()
    print(f"MAIN: finished, current memory {x[0]}. Peak mem: {x[1]}")
    print(res)
    print("Waiting some seconds and ...")  # truncated in the source snippet
```
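The snippet above stops short and depends on a `task()` defined elsewhere in `test_async.py`. As a self-contained stand-in (all names here are hypothetical, not the actual experiment), the sketch below cancels ten memory-holding tasks mid-await and uses `tracemalloc` to confirm the memory is released on interruption:

```python
import asyncio
import tracemalloc

async def task():
    buf = bytearray(1_000_000)  # hold ~1 MB while "working"
    try:
        await asyncio.sleep(60)
        return len(buf)
    except asyncio.CancelledError:
        del buf  # released when the task is interrupted
        raise

async def main():
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    tasks = [asyncio.create_task(task()) for _ in range(10)]
    await asyncio.sleep(0.1)  # let every task start and allocate its buffer
    during = tracemalloc.get_traced_memory()[0]
    for t in tasks:
        t.cancel()  # interrupt mid-await, like a stop request would
    await asyncio.gather(*tasks, return_exceptions=True)
    after = tracemalloc.get_traced_memory()[0]
    return during, after

during, after = asyncio.run(main())
print(f"during cancellation-pending: {during} bytes, after: {after} bytes")
```

Cancelling the tasks destroys their frames, so the ~10 MB held in the buffers is freed once `gather` completes.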
 
## Notes
 
_Add implementation notes, blockers, and context here_
 
## Related
 
_Add wikilinks to related people, meetings, or other tickets_