#153 — Add stop functionality
Repo: Twill-AI/twill-llm-engine · State: open · Assignee: meliascosta
Created: 2024-10-03 · Updated: 2024-10-17
Description
We want to be able to interrupt LLM generation mid-answer. To accomplish this, we can interrupt the `async for` statement here:
To do so, we can check a context var that signals whether a message to interrupt generation has been received.
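As a rough sketch of that check (the names `stop_requested` and `stream_answer` are hypothetical, not from the repo), breaking out of the `async for` when the context var flips could look like:

```python
import contextvars
from typing import AsyncIterator

# Hypothetical context var; the real one would be set by the WebSocket
# handler when a stop message arrives.
stop_requested: contextvars.ContextVar[bool] = contextvars.ContextVar(
    "stop_requested", default=False
)

async def stream_answer(chunks: AsyncIterator) -> AsyncIterator:
    """Yield chunks until a stop message flips the context var."""
    async for chunk in chunks:
        if stop_requested.get():
            break  # interrupt generation mid-answer
        yield chunk
```

Because context vars propagate within a task, the handler and the generation loop see the same value without any shared global state.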
AC:
- A new type of message has been defined in the Confluence docs. It should be something like a `ChatPromptStop` message.
- The WebSocket handler inside `facade` has been modified to receive the new message and set the contextvar accordingly.
- The interruption logic has been added to the `async for` shown above.
- The LLM engine's `.answer_in_chat()` function has been modified to shut down gracefully (i.e. handle `GeneratorExit`).
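The `GeneratorExit` handling in the last item could be sketched roughly as below (the signature and the `on_close` cleanup hook are hypothetical; the real `.answer_in_chat()` lives in the engine). When a consumer calls `.aclose()` on an async generator, Python raises `GeneratorExit` at the suspended `yield`:

```python
from typing import AsyncIterator, Callable

async def answer_in_chat(
    token_stream: AsyncIterator[str],
    on_close: Callable[[], None],
) -> AsyncIterator[str]:
    """Sketch: forward tokens, releasing resources if the consumer
    closes the generator early (hypothetical signature)."""
    try:
        async for token in token_stream:
            yield token
    except GeneratorExit:
        # Raised at the suspended `yield` when .aclose() is called,
        # e.g. after a stop message. Clean up, then re-raise.
        on_close()
        raise
```

Re-raising `GeneratorExit` (or simply returning) is required; swallowing it and yielding again would raise `RuntimeError: async generator ignored GeneratorExit`.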
Implementation details
We had some doubts about how resources would be released when interrupting the generation, so we ran a couple of experiments (see `test_async.py` inside the `/notebooks` folder).
We used the following main routine as an example:
```python
import asyncio
import tracemalloc

async def main():
    # task() is an async helper defined in test_async.py
    out = [task() for _ in range(10)]
    print(tracemalloc.get_traced_memory())
    tasks = asyncio.gather(*out)
    res = await tasks
    x = tracemalloc.get_traced_memory()
    print(f"MAIN: finished, current memory {x[0]}. Peak mem: {x[1]}")
    print(res)
    print("Waiting some seconds and ...")
```
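Related to the resource-release question, a minimal sketch (hypothetical `worker`/`run_experiment` names, not from the repo) of how cancelling a `gather` delivers `asyncio.CancelledError` into each task, so cleanup code runs before the tasks die:

```python
import asyncio

async def worker(log: list) -> None:
    try:
        await asyncio.sleep(10)  # stand-in for a long LLM generation
    except asyncio.CancelledError:
        log.append("cancelled")  # cleanup runs here before the task ends
        raise  # re-raise so the task is marked as cancelled

async def run_experiment() -> list:
    log: list = []
    gathered = asyncio.gather(*(worker(log) for _ in range(3)))
    await asyncio.sleep(0)  # let the workers reach their await point
    gathered.cancel()  # cancels every child task
    try:
        await gathered
    except asyncio.CancelledError:
        pass
    return log
```

Each worker gets a chance to run its `except` block, which is why per-task cleanup (and memory release) happens even when the whole batch is cancelled at once.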
## Notes
_Add implementation notes, blockers, and context here_
## Related
_Add wikilinks to related people, meetings, or other tickets_