
fix(backend): persist partial answer when a run is interrupted mid-stream (#3403) #3571

Closed
Eilen6316 wants to merge 1 commit into bytedance:main from Eilen6316:fix/3403-persist-interrupted-partial

Conversation

@Eilen6316

Contributor

Summary

Fixes #3403 — when a user cancels a streaming response midway, the partial AI answer shown in the UI disappears after a refresh.

Root cause: thread history is rebuilt from the run_events store, but RunJournal only writes an AI message in on_llm_end. A cancel stops streaming mid-generation, so on_llm_end never fires for the in-flight LLM call, and its partial text, already streamed to the client, is never persisted. (Setting aside the > blockquote framing in the issue thread, this is the data-consistency gap the maintainer pointed at.)
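To make the gap concrete, here is a minimal sketch of the failure mode. `ToyJournal`, `on_token`, and `stream` are hypothetical stand-ins written for illustration, not the project's RunJournal or worker; the only behaviour borrowed from the description above is that persistence happens solely in on_llm_end.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical stand-in for RunJournal: it persists only in on_llm_end."""

    events: list = field(default_factory=list)

    def on_token(self, token: str) -> None:
        # Tokens reach the client, but nothing is written to the store here.
        pass

    def on_llm_end(self, text: str) -> None:
        self.events.append({"event_type": "llm.ai.response", "content": text})


def stream(journal: ToyJournal, tokens: list, cancel_after=None) -> str:
    """Stream tokens and return what the client saw; stop early on cancel."""
    shown = []
    for i, token in enumerate(tokens):
        if cancel_after is not None and i >= cancel_after:
            # Cancelled mid-generation: on_llm_end is never reached.
            return "".join(shown)
        journal.on_token(token)
        shown.append(token)
    journal.on_llm_end("".join(shown))
    return "".join(shown)


journal = ToyJournal()
seen = stream(journal, ["The ", "answer ", "is ", "42"], cancel_after=2)
print(repr(seen), journal.events)
```

The client has seen two tokens, yet the event list is empty, so a history rebuilt from those events has nothing to show after a refresh.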

What changes

  • run_agent (runtime/runs/worker.py) accumulates streamed messages-mode AI chunks by message id during the stream (consecutive AIMessageChunks for one response merge with +).
  • On an interrupt cancel — but not a rollback, which intentionally discards the run — the worker hands the not-yet-completed partials to the new RunJournal.record_interrupted_ai_messages(), right before flushing the journal in the finally block.
  • record_interrupted_ai_messages() writes each partial in the same llm.ai.response / category="message" shape that on_llm_end produces (so history rebuilds it identically), flagged interrupted: true. It skips ids already completed via on_llm_end (tracked in a new _completed_message_ids set) and dedups its own writes, so nothing is duplicated. Incomplete tool calls are dropped (their args may be truncated mid-JSON); user-visible text and reasoning content are kept.
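The three bullets above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `Chunk` is a stdlib stand-in for AIMessageChunk, `accumulate` and `ToyJournal` are hypothetical names, and the event dicts are reduced to the fields needed to show the skip and dedup rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    """Stand-in for AIMessageChunk: chunks of one response merge with +."""

    id: Optional[str]
    content: str = ""

    def __add__(self, other: "Chunk") -> "Chunk":
        return Chunk(self.id, self.content + other.content)


def accumulate(partials: dict, mode: str, chunk) -> None:
    """Merge a streamed chunk into partials, keyed by message id."""
    # Ignore non-messages modes, non-AI chunks, and chunks without an id.
    if mode != "messages" or not isinstance(chunk, Chunk) or not chunk.id:
        return
    prev = partials.get(chunk.id)
    partials[chunk.id] = chunk if prev is None else prev + chunk


class ToyJournal:
    """Hypothetical journal showing only the dedup bookkeeping."""

    def __init__(self) -> None:
        self.events = []
        self.completed_ids = set()  # ids written by on_llm_end
        self.partial_ids = set()    # ids already written as interrupted

    def on_llm_end(self, message: Chunk) -> None:
        self.completed_ids.add(message.id)
        self.events.append(
            {"id": message.id, "content": message.content, "interrupted": False}
        )

    def record_interrupted(self, partials: dict) -> None:
        for msg_id, msg in partials.items():
            if msg_id in self.completed_ids or msg_id in self.partial_ids:
                continue  # already persisted, do not duplicate
            if not msg.content:
                continue  # nothing user-visible to keep
            self.partial_ids.add(msg_id)
            self.events.append(
                {"id": msg_id, "content": msg.content, "interrupted": True}
            )


partials = {}
journal = ToyJournal()
journal.on_llm_end(Chunk("m1", "Done."))  # m1 finished normally
stream_items = [
    ("messages", Chunk("m1", "Done.")),
    ("messages", Chunk("m2", "Half ")),
    ("values", Chunk("m2", "ignored")),   # wrong mode
    ("messages", Chunk("m2", "an answer")),
    ("messages", Chunk(None, "no id")),   # id-less chunk
]
for mode, chunk in stream_items:
    accumulate(partials, mode, chunk)
journal.record_interrupted(partials)
journal.record_interrupted(partials)  # a repeated emit writes nothing new
```

After the two calls the journal holds exactly two events: the completed `m1` and one interrupted entry for `m2` with the merged text. The sketch leaves out reasoning content and the dropping of incomplete tool calls.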

No new storage layer is introduced — the reporter's suggested "store run_events for unfinished turns" is exactly the existing run_events mechanism; this just fills the partial in on interrupt. Normal (non-cancelled) runs are unaffected.

Testing

  • tests/test_run_journal.py::TestInterruptedPartialMessages — persists a partial; skips a message already completed via on_llm_end; skips empty chunks; keeps reasoning-only chunks; dedups repeated emits.
  • tests/test_worker_partial_persist.py — chunk accumulation by id, merge of same-id chunks, and ignoring of non-messages modes / non-AI chunks / id-less chunks.
  • All 11 new tests pass; ruff check and ruff format --check clean on the changed files.

Note: a handful of pre-existing tests in test_run_journal.py / test_run_worker_rollback.py fail in my sandbox due to a langgraph version mismatch (No module named 'langgraph.runtime'); I confirmed they fail identically on main without this change, so they're unrelated to this PR.

…ream

When a user cancels a streaming response, the partial AI text already shown
in the UI was lost on refresh: thread history is rebuilt from run_events, but
RunJournal only writes an AI message on on_llm_end — which never fires for an
LLM call cancelled mid-generation. So the half-finished answer was never
persisted (issue bytedance#3403).

run_agent now accumulates streamed messages-mode AI chunks by id during the
stream. On an interrupt cancel (not a rollback, which intentionally discards
the run), it hands the not-yet-completed partials to the new
RunJournal.record_interrupted_ai_messages(), which writes each in the same
llm.ai.response / category=message shape on_llm_end uses (flagged
interrupted), skipping ids already completed so nothing is duplicated.
Incomplete tool calls are dropped (their args may be truncated mid-JSON);
user-visible text and reasoning are kept.

Adds unit tests for the journal persistence and the worker chunk accumulation.
@github-actions bot added the labels area:backend, area:docs, risk:medium, and size/M on Jun 14, 2026

@fancyboi999 fancyboi999 left a comment

Collaborator

Traced the full path on this — it's a clean, surgical fix for #3403 and the persistence shape is right:

  • _accumulate_partial_ai_chunk (worker.py:714) only touches messages-mode AIMessageChunks, keys by message id, and merges consecutive chunks with + — the correct way to rebuild the streamed partial. Good unit coverage in test_worker_partial_persist.py (by-id merge, separate ids, and the non-messages / non-AI / no-id / bare-chunk guards).
  • record_interrupted_ai_messages (journal.py:347) writes each partial in the exact shape on_llm_end uses — event_type="llm.ai.response", category="message", content=model_dump(), plus _record_message_summary — so history rebuilds it identically. Dedup via _completed_message_ids (populated in on_llm_end) and _partial_message_ids correctly stops an interrupt from re-writing a message that already completed, and the empty content/reasoning skip avoids persisting tool-call-only chunks.
  • The cancel gate (worker.py:413: partial_ai_chunks and record.abort_event.is_set() and record.abort_action != "rollback") correctly skips rollback cancellations and is wrapped so a persistence failure can't break the cancel path. Dropping partial (truncated-JSON) tool calls is the right call — they're not resumable.
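For readers following along, the gate described in the last bullet can be sketched like this. Only the three-part condition is taken from the review; `RunRecord`, `maybe_persist_partials`, and the `persist` callback are hypothetical names used to keep the example self-contained.

```python
import logging
import threading
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class RunRecord:
    """Hypothetical stand-in holding only the two fields the gate reads."""

    abort_event: threading.Event = field(default_factory=threading.Event)
    abort_action: str = "interrupt"


def maybe_persist_partials(record: RunRecord, partial_ai_chunks: dict, persist) -> bool:
    """Persist partials only on an interrupt cancel; never raise."""
    should_persist = (
        bool(partial_ai_chunks)
        and record.abort_event.is_set()
        and record.abort_action != "rollback"
    )
    if not should_persist:
        return False
    try:
        persist(partial_ai_chunks)
    except Exception:
        # A persistence failure must not break the cancel path.
        logger.exception("failed to persist interrupted partial messages")
        return False
    return True
```

Returning a boolean is a choice made for this sketch so the outcome is easy to assert on; the review does not say what the real code returns.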

One non-blocking note inline about the hardcoded caller.

Heads-up — not a defect in this PR, but worth flagging: #3572 also targets "Fixes #3403" and changes the same journal.py + worker.py, so the two will conflict. The scope differs: this PR persists the partial to run_events only — which fixes the literal "answer vanishes on refresh" symptom — whereas #3572 also writes the partial into the LangGraph checkpoint (so it lands in the next-turn agent context) and synthesizes closure ToolMessages for interrupted tool calls. Whether #3403 wants the minimal run_events fix or the broader one is a scope call for the maintainer; flagging so they don't get merged on top of each other.

The core fix here is correct and merge-ready for the refresh symptom. (Reviewed at code level; I didn't run the suite locally.)

event_type="llm.ai.response",
category="message",
content=message.model_dump(),
metadata={"caller": "lead_agent", "interrupted": True},

Collaborator

Non-blocking: on_llm_end derives the caller dynamically (caller = self._identify_caller(tags), e.g. journal.py:268), but here it's hardcoded caller="lead_agent" — also passed to _record_message_summary. If an interrupted partial can belong to a subagent LLM call (i.e. subagent AIMessageChunks reach the messages-mode accumulator), it'd be mis-attributed to the lead agent; if only lead-agent chunks ever reach it, this is fine. Either way worth resolving the divergence from on_llm_end (reuse _identify_caller, or a short comment on why lead_agent is always correct here).
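One possible shape for the first option, as a rough sketch only. `identify_caller` below is a hypothetical stand-in for `_identify_caller`, whose real logic is not shown in this thread; the `subagent:` tag prefix is an assumption made purely for illustration.

```python
from typing import Iterable, Optional


def identify_caller(tags: Optional[Iterable[str]]) -> str:
    """Hypothetical stand-in for _identify_caller.

    Assumption: subagent LLM calls carry a tag starting with "subagent:".
    The real tag convention may differ.
    """
    for tag in tags or []:
        if tag.startswith("subagent:"):
            return tag
    return "lead_agent"


def interrupted_metadata(tags: Optional[Iterable[str]]) -> dict:
    """Build metadata for an interrupted partial without hardcoding the caller."""
    return {"caller": identify_caller(tags), "interrupted": True}


print(interrupted_metadata(None))
print(interrupted_metadata(["subagent:researcher"]))
```

This presumes the tags of the originating LLM call are available at the point where the partial is recorded, for example by storing them alongside each accumulated chunk. If they are not, the short-comment option is the cheaper fix.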

@WillemJiang

Collaborator

The feature request is invalid.


Development

Successfully merging this pull request may close these issues.

[feat] Support displaying cancelled, half-finished conversations

3 participants