fix(backend): persist partial answer when a run is interrupted mid-stream (#3403) #3571
Eilen6316 wants to merge 1 commit into
When a user cancels a streaming response, the partial AI text already shown in the UI was lost on refresh. Thread history is rebuilt from run_events, but RunJournal only writes an AI message in on_llm_end, which never fires for an LLM call cancelled mid-generation, so the half-finished answer was never persisted (issue bytedance#3403).

run_agent now accumulates streamed messages-mode AI chunks by message id during the stream. On an interrupt cancel (not a rollback, which intentionally discards the run), it hands the not-yet-completed partials to the new RunJournal.record_interrupted_ai_messages(). That method writes each partial in the same llm.ai.response / category=message shape that on_llm_end uses, flagged as interrupted, and skips ids that already completed so nothing is duplicated. Incomplete tool calls are dropped because their args may be truncated mid-JSON; user-visible text and reasoning are kept.

Adds unit tests for the journal persistence and the worker chunk accumulation.
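A minimal sketch of the by-id accumulation described above, assuming a simplified stand-in for `AIMessageChunk`. `FakeAIChunk` and `accumulate_partial_ai_chunk` are hypothetical names, not the PR's code. Like the real `AIMessageChunk`, the stand-in supports `+`, which is what lets consecutive chunks of one response fold into a single partial. The sketch also assumes `messages` mode yields `(chunk, metadata)` tuples and skips any other shape; how the real helper treats bare chunks is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeAIChunk:
    # Hypothetical stand-in for langchain_core's AIMessageChunk, reduced to the
    # fields this sketch needs; "+" concatenates the streamed pieces.
    id: Optional[str]
    content: str = ""
    reasoning: str = ""
    tool_call_chunks: List[dict] = field(default_factory=list)

    def __add__(self, other: "FakeAIChunk") -> "FakeAIChunk":
        return FakeAIChunk(
            id=self.id or other.id,
            content=self.content + other.content,
            reasoning=self.reasoning + other.reasoning,
            tool_call_chunks=self.tool_call_chunks + other.tool_call_chunks,
        )


def accumulate_partial_ai_chunk(partials: Dict[str, FakeAIChunk], mode: str, item: object) -> None:
    """Fold one streamed item into `partials`, keyed by message id (hypothetical helper)."""
    if mode != "messages":
        return  # only messages-mode output carries token-level AI chunks
    # Assumption: messages mode yields (chunk, metadata) tuples; other shapes are skipped.
    if not isinstance(item, tuple) or not item:
        return
    chunk = item[0]
    if not isinstance(chunk, FakeAIChunk) or not chunk.id:
        return  # ignore non-AI messages and chunks without an id
    previous = partials.get(chunk.id)
    partials[chunk.id] = chunk if previous is None else previous + chunk


partials: Dict[str, FakeAIChunk] = {}
accumulate_partial_ai_chunk(partials, "messages", (FakeAIChunk(id="m1", content="Hel"), {}))
accumulate_partial_ai_chunk(partials, "messages", (FakeAIChunk(id="m1", content="lo"), {}))
accumulate_partial_ai_chunk(partials, "values", {"messages": []})
print(partials["m1"].content)  # Hello
```

Keying by id rather than appending to one buffer keeps separate responses in the same run from bleeding into each other.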
fancyboi999 left a comment
Traced the full path on this — it's a clean, surgical fix for #3403 and the persistence shape is right:
- `_accumulate_partial_ai_chunk` (worker.py:714) only touches `messages`-mode `AIMessageChunk`s, keys by message id, and merges consecutive chunks with `+`, which is the correct way to rebuild the streamed partial. Good unit coverage in `test_worker_partial_persist.py` (by-id merge, separate ids, and the non-`messages` / non-AI / no-id / bare-chunk guards).
- `record_interrupted_ai_messages` (journal.py:347) writes each partial in the exact shape `on_llm_end` uses (`event_type="llm.ai.response"`, `category="message"`, `content=model_dump()`, plus `_record_message_summary`), so history rebuilds it identically. Dedup via `_completed_message_ids` (populated in `on_llm_end`) and `_partial_message_ids` correctly stops an interrupt from re-writing a message that already completed, and the empty content/reasoning skip avoids persisting tool-call-only chunks.
- The cancel gate (worker.py:413: `partial_ai_chunks and record.abort_event.is_set() and record.abort_action != "rollback"`) correctly skips rollback cancellations and is wrapped so a persistence failure can't break the cancel path. Dropping partial (truncated-JSON) tool calls is the right call, since they're not resumable.
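The cancel gate and its failure isolation can be sketched on their own. `RunRecord`, `should_persist_partials`, and `persist_partials_on_cancel` are hypothetical names; only the boolean condition is taken from the review, and the `"interrupt"` abort-action value is an assumption.

```python
import logging
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class RunRecord:
    # Hypothetical subset of the worker's run record.
    abort_event: threading.Event = field(default_factory=threading.Event)
    abort_action: Optional[str] = None  # assumed values: "interrupt", "rollback"


def should_persist_partials(partials: Dict[str, object], record: RunRecord) -> bool:
    # Same condition as the quoted gate: buffered partials exist, the run was
    # aborted, and the abort is not a rollback (which discards the run).
    return bool(partials) and record.abort_event.is_set() and record.abort_action != "rollback"


def persist_partials_on_cancel(
    partials: Dict[str, object],
    record: RunRecord,
    persist: Callable[[List[object]], None],
) -> bool:
    """Return True if partials were handed to `persist`; never raises."""
    if not should_persist_partials(partials, record):
        return False
    try:
        persist(list(partials.values()))
    except Exception:
        # Best effort: a storage failure is logged and the cancel path carries on.
        logger.exception("failed to persist interrupted partial AI messages")
        return False
    return True


record = RunRecord(abort_action="interrupt")
record.abort_event.set()
written: List[object] = []
print(persist_partials_on_cancel({"m1": "Hel"}, record, written.extend))  # True
```

Catching broadly here is deliberate: the user asked to stop, so losing a partial is preferable to surfacing a storage error from the cancel itself.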
One non-blocking note inline about the hardcoded caller.
Heads-up — not a defect in this PR, but worth flagging: #3572 also targets "Fixes #3403" and changes the same journal.py + worker.py, so the two will conflict. The scope differs: this PR persists the partial to run_events only — which fixes the literal "answer vanishes on refresh" symptom — whereas #3572 also writes the partial into the LangGraph checkpoint (so it lands in the next-turn agent context) and synthesizes closure ToolMessages for interrupted tool calls. Whether #3403 wants the minimal run_events fix or the broader one is a scope call for the maintainer; flagging so they don't get merged on top of each other.
The core fix here is correct and merge-ready for the refresh symptom. (Reviewed at code level; I didn't run the suite locally.)
```python
event_type="llm.ai.response",
category="message",
content=message.model_dump(),
metadata={"caller": "lead_agent", "interrupted": True},
```
Non-blocking: on_llm_end derives the caller dynamically (caller = self._identify_caller(tags), e.g. journal.py:268), but here it's hardcoded caller="lead_agent" — also passed to _record_message_summary. If an interrupted partial can belong to a subagent LLM call (i.e. subagent AIMessageChunks reach the messages-mode accumulator), it'd be mis-attributed to the lead agent; if only lead-agent chunks ever reach it, this is fine. Either way worth resolving the divergence from on_llm_end (reuse _identify_caller, or a short comment on why lead_agent is always correct here).
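One way to resolve the divergence, sketched under loud assumptions: resolve the caller when each chunk is accumulated and carry it with the partial, so the interrupt path never has to guess. `identify_caller` is a hypothetical stand-in for `_identify_caller` with an invented `subagent:<name>` tag convention, and `PartialMessage`, `remember_partial`, and `interrupted_metadata` are likewise illustrative. Whether subagent chunks reach the accumulator at all is the open question above; the fallback keeps today's `lead_agent` behaviour either way.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

DEFAULT_CALLER = "lead_agent"


def identify_caller(tags: Optional[Iterable[str]]) -> str:
    # Hypothetical tag convention ("subagent:<name>"); the real
    # RunJournal._identify_caller may resolve callers differently.
    for tag in tags or ():
        if tag.startswith("subagent:"):
            return tag.split(":", 1)[1]
    return DEFAULT_CALLER


@dataclass
class PartialMessage:
    message_id: str
    text: str
    caller: str  # resolved while streaming instead of hardcoded at persist time


def remember_partial(
    partials: Dict[str, PartialMessage], message_id: str, text: str, tags: Optional[Iterable[str]]
) -> None:
    # The first chunk of a message fixes its caller; later chunks only append text.
    existing = partials.get(message_id)
    if existing is None:
        partials[message_id] = PartialMessage(message_id, text, identify_caller(tags))
    else:
        existing.text += text


def interrupted_metadata(partial: PartialMessage) -> dict:
    # Same keys as the hardcoded metadata, but the caller travels with the partial.
    return {"caller": partial.caller, "interrupted": True}


partials: Dict[str, PartialMessage] = {}
remember_partial(partials, "m1", "Draft", ["subagent:researcher"])
print(interrupted_metadata(partials["m1"]))  # {'caller': 'researcher', 'interrupted': True}
```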
The feature request is invalid.
Summary
Fixes #3403 — when a user cancels a streaming response midway, the partial AI answer shown in the UI disappears after a refresh.
Root cause: thread history is rebuilt from the `run_events` store, but `RunJournal` only writes an AI message on `on_llm_end`. A cancel stops streaming mid-generation, so `on_llm_end` never fires for the in-flight LLM call, and its partial text, already streamed to the client, is never persisted. (The `>` blockquote framing in the issue thread aside, this is the data-consistency gap the maintainer pointed at.)

What changes
- `run_agent` (`runtime/runs/worker.py`) accumulates streamed `messages`-mode AI chunks by message id during the stream (consecutive `AIMessageChunk`s for one response merge with `+`).
- On an interrupt cancel (not a `rollback`, which intentionally discards the run), the worker hands the not-yet-completed partials to the new `RunJournal.record_interrupted_ai_messages()`, right before flushing the journal in the `finally` block.
- `record_interrupted_ai_messages()` writes each partial in the same `llm.ai.response` / `category="message"` shape that `on_llm_end` produces (so history rebuilds it identically), flagged `interrupted: true`. It skips ids already completed via `on_llm_end` (tracked in a new `_completed_message_ids` set) and dedups its own writes, so nothing is duplicated. Incomplete tool calls are dropped (their args may be truncated mid-JSON); user-visible text and reasoning content are kept.

No new storage layer is introduced: the reporter's suggested "store run_events for unfinished turns" is exactly the existing `run_events` mechanism, and this PR just fills in the partial on interrupt. Normal (non-cancelled) runs are unaffected.

Testing
- `tests/test_run_journal.py::TestInterruptedPartialMessages`: persists a partial; skips a message already completed via `on_llm_end`; skips empty chunks; keeps reasoning-only chunks; dedups repeated emits.
- `tests/test_worker_partial_persist.py`: chunk accumulation by id, merge of same-id chunks, and ignoring of non-`messages` modes / non-AI chunks / id-less chunks.
- `ruff check` and `ruff format --check` clean on the changed files.
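The journal-side rules that `TestInterruptedPartialMessages` exercises can be sketched with an in-memory stand-in. `SketchJournal` and `Partial` are hypothetical; only the event shape, the `interrupted` flag, and the two id sets come from the PR and review. How the real method strips tool calls is not shown, so clearing `tool_calls` here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Partial:
    # Hypothetical stand-in for a merged AIMessageChunk.
    id: str
    content: str = ""
    reasoning: str = ""
    tool_calls: List[dict] = field(default_factory=list)

    def model_dump(self) -> Dict[str, Any]:
        return {"id": self.id, "content": self.content,
                "reasoning": self.reasoning, "tool_calls": list(self.tool_calls)}


class SketchJournal:
    """In-memory sketch of the interrupt-persistence rules, not the real RunJournal."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []
        self._completed_message_ids: Set[str] = set()
        self._partial_message_ids: Set[str] = set()

    def on_llm_end(self, message_id: str) -> None:
        # The real hook also writes the full message; this sketch only records the id.
        self._completed_message_ids.add(message_id)

    def record_interrupted_ai_messages(self, partials: List[Partial]) -> int:
        written = 0
        for msg in partials:
            if msg.id in self._completed_message_ids or msg.id in self._partial_message_ids:
                continue  # already persisted by on_llm_end or by an earlier call
            if not msg.content and not msg.reasoning:
                continue  # empty or tool-call-only chunk: nothing user-visible
            body = msg.model_dump()
            body["tool_calls"] = []  # args may be truncated mid-JSON, so drop them
            self.events.append({
                "event_type": "llm.ai.response",
                "category": "message",
                "content": body,
                "metadata": {"caller": "lead_agent", "interrupted": True},
            })
            self._partial_message_ids.add(msg.id)
            written += 1
        return written


journal = SketchJournal()
journal.on_llm_end("m0")
print(journal.record_interrupted_ai_messages([Partial("m0", "done"), Partial("m1", "Hal")]))  # 1
```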