fix(backend): persist partial answer when a run is interrupted mid-stream (#3403) by Eilen6316 · Pull Request #3571 · bytedance/deer-flow

Eilen6316 · 2026-06-14T08:23:44Z

Summary

Fixes #3403 — when a user cancels a streaming response midway, the partial AI answer shown in the UI disappears after a refresh.

Root cause: thread history is rebuilt from the run_events store, but RunJournal only writes an AI message on on_llm_end. A cancel stops streaming mid-generation, so on_llm_end never fires for the in-flight LLM call and its partial text — already streamed to the client — is never persisted. (The > blockquote framing in the issue thread aside, this is the data-consistency gap the maintainer pointed at.)

What changes

run_agent (runtime/runs/worker.py) accumulates streamed messages-mode AI chunks by message id during the stream (consecutive AIMessageChunks for one response merge with +).
On an interrupt cancel — but not a rollback, which intentionally discards the run — the worker hands the not-yet-completed partials to the new RunJournal.record_interrupted_ai_messages(), right before flushing the journal in the finally block.
record_interrupted_ai_messages() writes each partial in the same llm.ai.response / category="message" shape that on_llm_end produces (so history rebuilds it identically), flagged interrupted: true. It skips ids already completed via on_llm_end (tracked in a new _completed_message_ids set) and dedups its own writes, so nothing is duplicated. Incomplete tool calls are dropped (their args may be truncated mid-JSON); user-visible text and reasoning content are kept.
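
For reviewers who want the accumulation step in isolation, here is a minimal sketch of the by-id merge described above. It is not the code in worker.py: `FakeAIChunk` is a hypothetical stand-in for `AIMessageChunk`, `accumulate_partial_ai_chunk` only approximates the real helper, and the stream is simplified to `(mode, chunk)` pairs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeAIChunk:
    """Hypothetical stand-in for AIMessageChunk: a message id plus streamed text."""

    id: Optional[str]
    content: str = ""

    def __add__(self, other: "FakeAIChunk") -> "FakeAIChunk":
        # Mirrors the "consecutive chunks merge with +" behaviour.
        return FakeAIChunk(id=self.id or other.id, content=self.content + other.content)


def accumulate_partial_ai_chunk(partials: Dict[str, FakeAIChunk], mode: str, chunk: object) -> None:
    """Merge one streamed item into ``partials``, keyed by message id."""
    if mode != "messages":
        return  # only messages-mode items carry AI text chunks
    if not isinstance(chunk, FakeAIChunk) or not chunk.id:
        return  # ignore non-AI items and chunks without an id
    previous = partials.get(chunk.id)
    partials[chunk.id] = chunk if previous is None else previous + chunk


partials: Dict[str, FakeAIChunk] = {}
stream = [
    ("messages", FakeAIChunk("msg-1", "Hel")),
    ("values", {"state": "ignored"}),          # non-messages mode
    ("messages", FakeAIChunk("msg-1", "lo")),  # same id, merged with +
    ("messages", FakeAIChunk(None, "no id")),  # id-less chunk, ignored
    ("messages", FakeAIChunk("msg-2", "Hi")),  # separate id, separate entry
]
for mode, chunk in stream:
    accumulate_partial_ai_chunk(partials, mode, chunk)
```

After the loop, `partials` holds one merged entry per message id, which is what the worker would hand to the journal on an interrupt.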

No new storage layer is introduced — the reporter's suggested "store run_events for unfinished turns" is exactly the existing run_events mechanism; this just fills the partial in on interrupt. Normal (non-cancelled) runs are unaffected.
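
The journal-side rules (skip completed ids, dedup repeated writes, skip empty chunks, drop tool calls, flag `interrupted`) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the RunJournal implementation: `SketchJournal` is hypothetical, messages are plain dicts rather than `model_dump()` output, and the `reasoning` key is an assumed field name.

```python
from typing import Dict, List, Set


class SketchJournal:
    """Hypothetical in-memory stand-in for RunJournal."""

    def __init__(self) -> None:
        self.events: List[Dict] = []
        self._completed_message_ids: Set[str] = set()
        self._partial_message_ids: Set[str] = set()

    def on_llm_end(self, message: Dict) -> None:
        # A normally completed message: remember its id so an interrupt cannot rewrite it.
        self._completed_message_ids.add(message["id"])
        self.events.append(self._event(message, interrupted=False))

    def record_interrupted_ai_messages(self, partials: List[Dict]) -> int:
        written = 0
        for message in partials:
            msg_id = message.get("id")
            if not msg_id:
                continue
            if msg_id in self._completed_message_ids or msg_id in self._partial_message_ids:
                continue  # already persisted, either completed or as a partial
            if not message.get("content") and not message.get("reasoning"):
                continue  # nothing user-visible to keep
            # Tool-call args may be truncated mid-JSON, so they are dropped.
            cleaned = {**message, "tool_calls": []}
            self._partial_message_ids.add(msg_id)
            self.events.append(self._event(cleaned, interrupted=True))
            written += 1
        return written

    @staticmethod
    def _event(message: Dict, interrupted: bool) -> Dict:
        metadata = {"caller": "lead_agent"}
        if interrupted:
            metadata["interrupted"] = True
        return {
            "event_type": "llm.ai.response",
            "category": "message",
            "content": message,
            "metadata": metadata,
        }


journal = SketchJournal()
journal.on_llm_end({"id": "a", "content": "done"})
written = journal.record_interrupted_ai_messages(
    [
        {"id": "a", "content": "do"},  # completed via on_llm_end, skipped
        {"id": "b", "content": "partial", "tool_calls": [{"name": "search", "args": '{"q": "tr'}]},
        {"id": "c", "content": "", "reasoning": ""},  # empty, skipped
        {"id": "d", "content": "", "reasoning": "thinking"},  # reasoning-only, kept
    ]
)
repeat = journal.record_interrupted_ai_messages([{"id": "b", "content": "partial"}])
```

Because both the completed and the interrupted paths go through the same `_event` shape, a history rebuild that reads `llm.ai.response` events needs no special case for partials beyond the `interrupted` flag.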

Testing

tests/test_run_journal.py::TestInterruptedPartialMessages — persists a partial; skips a message already completed via on_llm_end; skips empty chunks; keeps reasoning-only chunks; dedups repeated emits.
tests/test_worker_partial_persist.py — chunk accumulation by id, merge of same-id chunks, and ignoring of non-messages modes / non-AI chunks / id-less chunks.
All 11 new tests pass; ruff check and ruff format --check clean on the changed files.

Note: a handful of pre-existing tests in test_run_journal.py / test_run_worker_rollback.py fail in my sandbox due to a langgraph version mismatch (No module named 'langgraph.runtime'); I confirmed they fail identically on main without this change, so they're unrelated to this PR.

…ream

When a user cancels a streaming response, the partial AI text already shown in the UI was lost on refresh: thread history is rebuilt from run_events, but RunJournal only writes an AI message on on_llm_end, which never fires for an LLM call cancelled mid-generation. So the half-finished answer was never persisted (issue bytedance#3403).

run_agent now accumulates streamed messages-mode AI chunks by id during the stream. On an interrupt cancel (not a rollback, which intentionally discards the run), it hands the not-yet-completed partials to the new RunJournal.record_interrupted_ai_messages(), which writes each in the same llm.ai.response / category=message shape on_llm_end uses (flagged interrupted), skipping ids already completed so nothing is duplicated. Incomplete tool calls are dropped (their args may be truncated mid-JSON); user-visible text and reasoning are kept.

Adds unit tests for the journal persistence and the worker chunk accumulation.

fancyboi999

Traced the full path on this — it's a clean, surgical fix for #3403 and the persistence shape is right:

_accumulate_partial_ai_chunk (worker.py:714) only touches messages-mode AIMessageChunks, keys by message id, and merges consecutive chunks with + — the correct way to rebuild the streamed partial. Good unit coverage in test_worker_partial_persist.py (by-id merge, separate ids, and the non-messages / non-AI / no-id / bare-chunk guards).
record_interrupted_ai_messages (journal.py:347) writes each partial in the exact shape on_llm_end uses — event_type="llm.ai.response", category="message", content=model_dump(), plus _record_message_summary — so history rebuilds it identically. Dedup via _completed_message_ids (populated in on_llm_end) and _partial_message_ids correctly stops an interrupt from re-writing a message that already completed, and the empty content/reasoning skip avoids persisting tool-call-only chunks.
The cancel gate (worker.py:413: partial_ai_chunks and record.abort_event.is_set() and record.abort_action != "rollback") correctly skips rollback cancellations and is wrapped so a persistence failure can't break the cancel path. Dropping partial (truncated-JSON) tool calls is the right call — they're not resumable.
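
To make the gate condition and the "wrapped so a persistence failure can't break the cancel path" point concrete, here is a rough sketch. The condition itself is taken from the line quoted above; everything else is an assumption for illustration: `SketchRunRecord`, `persist_partials_on_cancel`, the `"interrupt"` default action value, and the use of a plain callable in place of the journal method.

```python
import logging
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class SketchRunRecord:
    """Hypothetical stand-in for the run record the worker inspects."""

    abort_event: threading.Event = field(default_factory=threading.Event)
    abort_action: str = "interrupt"  # assumed value; only "rollback" is named in the PR


def should_persist_partials(partial_ai_chunks: Dict, record: SketchRunRecord) -> bool:
    # Same three-part condition as the gate quoted in the review.
    return bool(partial_ai_chunks) and record.abort_event.is_set() and record.abort_action != "rollback"


def persist_partials_on_cancel(
    partial_ai_chunks: Dict, record: SketchRunRecord, persist: Callable[[List], None]
) -> bool:
    """Return True if partials were persisted; never raise into the cancel path."""
    if not should_persist_partials(partial_ai_chunks, record):
        return False
    try:
        persist(list(partial_ai_chunks.values()))
    except Exception:
        # Persistence is best-effort: log and let the cancel finish normally.
        logger.exception("failed to persist interrupted partial messages")
        return False
    return True


def failing_persist(_messages: List) -> None:
    raise RuntimeError("store unavailable")


saved: List = []
interrupted = SketchRunRecord()
interrupted.abort_event.set()
rolled_back = SketchRunRecord(abort_action="rollback")
rolled_back.abort_event.set()

persisted = persist_partials_on_cancel({"msg-1": "Hel"}, interrupted, saved.extend)
skipped = persist_partials_on_cancel({"msg-1": "Hel"}, rolled_back, saved.extend)
not_cancelled = persist_partials_on_cancel({"msg-1": "Hel"}, SketchRunRecord(), saved.extend)
failed = persist_partials_on_cancel({"msg-1": "Hel"}, interrupted, failing_persist)
```

Only the interrupted, non-rollback case writes anything, and the failing store is logged rather than propagated.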

One non-blocking note inline about the hardcoded caller.

Heads-up — not a defect in this PR, but worth flagging: #3572 also targets "Fixes #3403" and changes the same journal.py + worker.py, so the two will conflict. The scope differs: this PR persists the partial to run_events only — which fixes the literal "answer vanishes on refresh" symptom — whereas #3572 also writes the partial into the LangGraph checkpoint (so it lands in the next-turn agent context) and synthesizes closure ToolMessages for interrupted tool calls. Whether #3403 wants the minimal run_events fix or the broader one is a scope call for the maintainer; flagging so they don't get merged on top of each other.

The core fix here is correct and merge-ready for the refresh symptom. (Reviewed at code level; I didn't run the suite locally.)

fancyboi999 · 2026-06-14T08:44:58Z

+                event_type="llm.ai.response",
+                category="message",
+                content=message.model_dump(),
+                metadata={"caller": "lead_agent", "interrupted": True},


Non-blocking: on_llm_end derives the caller dynamically (caller = self._identify_caller(tags), e.g. journal.py:268), but here it's hardcoded caller="lead_agent" — also passed to _record_message_summary. If an interrupted partial can belong to a subagent LLM call (i.e. subagent AIMessageChunks reach the messages-mode accumulator), it'd be mis-attributed to the lead agent; if only lead-agent chunks ever reach it, this is fine. Either way worth resolving the divergence from on_llm_end (reuse _identify_caller, or a short comment on why lead_agent is always correct here).
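
One possible way to close the divergence, sketched under assumptions: derive the caller from the tags that accompany the streamed chunk and fall back to `lead_agent` only when no tags are available. The tag scheme below (`subagent:<name>`) and both function names are hypothetical; the real `_identify_caller` may classify tags differently, and whether tags are available at the accumulation point would need checking.

```python
from typing import Callable, Dict, Optional, Sequence

DEFAULT_CALLER = "lead_agent"


def toy_identify_caller(tags: Sequence[str]) -> str:
    """Hypothetical tag scheme: a 'subagent:<name>' tag marks a subagent LLM call."""
    for tag in tags:
        if tag.startswith("subagent:"):
            return tag
    return DEFAULT_CALLER


def partial_metadata(
    tags: Optional[Sequence[str]],
    identify_caller: Optional[Callable[[Sequence[str]], str]] = None,
) -> Dict:
    """Build metadata for an interrupted partial, deriving the caller when possible."""
    if identify_caller is None or not tags:
        caller = DEFAULT_CALLER  # same value the PR currently hardcodes
    else:
        caller = identify_caller(tags)
    return {"caller": caller, "interrupted": True}


lead_meta = partial_metadata(["lead"], toy_identify_caller)
sub_meta = partial_metadata(["subagent:researcher"], toy_identify_caller)
fallback_meta = partial_metadata(None)
```

If only lead-agent chunks can ever reach the accumulator, the fallback branch is the only one exercised and a short comment in the PR would be enough.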

WillemJiang · 2026-06-21T08:19:03Z

The feature request is invalid.

github-actions bot added labels Jun 14, 2026: area:backend (Gateway / runtime / core backend under backend/), area:docs (Documentation and Markdown only), risk:medium (Medium risk: regular code changes), size/M (PR changes 100-300 lines)

Eilen6316 mentioned this pull request Jun 14, 2026

[feat] Support displaying cancelled, half-finished conversations #3403

Closed

1 task

fancyboi999 reviewed Jun 14, 2026


fancyboi999 mentioned this pull request Jun 14, 2026

feat(runtime): persist partial stream output on cancel #3572

Closed

8 tasks

WillemJiang closed this Jun 21, 2026

Eilen6316 wants to merge 1 commit into bytedance:main from Eilen6316:fix/3403-persist-interrupted-partial
