
Join active runs when opening thread #3587

Open
Jholly2008 wants to merge 1 commit into bytedance:main from KkkKxj:kkk/join-active-run-on-open

@Jholly2008

Why

Opening an existing thread while a background run is already pending or running can leave the chat view looking idle until the page is refreshed. This happens when another client created the run: the current chat view loads the persisted history but never attaches to the active run's stream.

This fixes the user-facing gap where an existing chat can be actively running on the backend, but the frontend does not show the streaming/running state.

What changed

  • Existing chat views now inspect the thread's run list and join the latest pending or running run.
  • Auto-join is best effort, avoids duplicate joins for the same run, and defers to the SDK's existing reconnect metadata when present.
  • E2E mocks can now provide per-thread run lists.
  • Added regression coverage for joining a run created by another client.
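
As a rough illustration of the first bullet, the selection step might look like the sketch below. It is not the code from this PR: the `RunSummary` shape, the status union, and the name `findLatestActiveRunSketch` are assumptions made for the example, and the `updated_at ?? created_at` ordering follows the description given in the review on this PR.

```typescript
// Hypothetical run shape: only the fields the selection needs.
type RunStatus =
  | "pending"
  | "running"
  | "success"
  | "error"
  | "timeout"
  | "interrupted";

interface RunSummary {
  run_id: string;
  status: RunStatus;
  created_at: string; // ISO 8601 timestamp
  updated_at?: string | null;
}

const ACTIVE_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "pending",
  "running",
]);

// Returns the most recently touched pending/running run, or undefined
// when the thread has no active run.
function findLatestActiveRunSketch(
  runs: readonly RunSummary[],
): RunSummary | undefined {
  const touchedAt = (run: RunSummary): number =>
    Date.parse(run.updated_at ?? run.created_at);

  let latest: RunSummary | undefined;
  for (const run of runs) {
    if (!ACTIVE_STATUSES.has(run.status)) continue; // finished runs never win
    if (latest === undefined || touchedAt(run) > touchedAt(latest)) {
      latest = run;
    }
  }
  return latest;
}
```

Filtering before comparing timestamps is what lets an older running run win over a newer run that has already succeeded.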

Surface area

  • Frontend UI — page / component / setting / interaction under frontend/
  • Backend API — endpoint / SSE event / request-response shape under backend/app
  • Agents / LangGraph — agent node, graph wiring, langgraph.json, or prompt change
  • Sandbox - docker/ or sandboxed execution
  • Skills — change under skills/
  • Dependencies — new/upgraded entry in backend/pyproject.toml or frontend/package.json (say what it buys us)
  • Default behavior change — changes existing behavior without the user opting in (default model, default setting, data shape)
  • Docs / tests / CI only — no runtime behavior change

Screenshots / Recording

Not attached. This is a stream attachment behavior change rather than a visual layout change, and it is covered by the E2E regression test.

Bug fix verification

  • Test path that reproduces the bug: frontend/tests/e2e/thread-history.spec.ts (existing thread joins an active run created by another client).
  • Did it go red on main and green on this branch? Yes. Before the hook change, the new E2E timed out waiting for the active run stream to be joined; after the change it passes.
  • Unit helper tests also covered selecting the latest active run.

Validation

  • cd frontend && pnpm exec vitest run tests/unit/core/threads/message-merge.test.ts
  • cd frontend && pnpm exec prettier --check src/core/threads/hooks.ts tests/unit/core/threads/message-merge.test.ts tests/e2e/thread-history.spec.ts tests/e2e/utils/mock-api.ts
  • cd frontend && pnpm exec tsc --noEmit
  • cd frontend && env SKIP_ENV_VALIDATION=1 DEER_FLOW_AUTH_DISABLED=1 pnpm build
  • cd frontend && pnpm exec eslint src/core/threads/hooks.ts tests/e2e/thread-history.spec.ts tests/e2e/utils/mock-api.ts tests/unit/core/threads/message-merge.test.ts --ext .ts,.tsx
  • Production-start E2E smoke: cd frontend && DEER_FLOW_AUTH_DISABLED=1 SKIP_ENV_VALIDATION=1 pnpm start plus pnpm exec playwright test tests/e2e/thread-history.spec.ts --project=chromium --reporter=line --timeout=120000 --grep=joins

AI assistance

Tool(s) used: Codex

How you used it: Codex investigated the run creation/stream attachment path, implemented the frontend hook change, added regression tests, and ran the validation commands above.

  • I've read and understand every line of this change and take responsibility for it — it's not unreviewed AI output.

@CLAassistant

CLAassistant commented Jun 15, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions (bot) added the labels area:frontend, needs-validation, risk:medium, and size/M on Jun 15, 2026

@fancyboi999 (Collaborator) left a comment

Read through the join logic and it hangs together well:

  • findLatestActiveRun (hooks.ts:148) filters to pending/running and picks the newest by updated_at ?? created_at. Ran the new unit tests locally — 2 passed, including the case where a success run with a newer timestamp is correctly skipped in favor of an older running one.
  • The join effect (hooks.ts:761) is guarded three ways before it attaches: the permanent joinedActiveRunIdsRef, the in-flight activeRunJoinInFlightRef, and the lg:stream:${threadId} sessionStorage check. That last one is the nice part — it's the same reconnect key the SDK uses (reconnectOnMount: true at hooks.ts:574, and clearReconnectRun at api-client.ts:65 already manages it), so when the SDK is auto-reconnecting on mount this effect stands down instead of double-attaching. In the cross-client case the run was created elsewhere, so this client's key is unset and the manual join fires — exactly the gap you're closing.
  • The refs reset on threadId change (hooks.ts:843), so switching threads doesn't carry stale join state.

One non-blocking thought inline. Reads correct to me otherwise.
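
The three guards described above can be read as a single yes/no decision. The sketch below is an illustration under assumptions, not the effect in hooks.ts: `JoinGuardState`, `KeyValueReader`, and `shouldAutoJoin` are hypothetical names, and the in-flight guard is modelled as a plain boolean. Only the `lg:stream:${threadId}` key format is taken from the review.

```typescript
// Hypothetical snapshot of the two refs the review mentions.
interface JoinGuardState {
  joinedRunIds: ReadonlySet<string>; // runs this view has already handled
  joinInFlight: boolean; // a join attempt is currently running
}

// Minimal slice of the Web Storage API, so the sketch also runs
// outside a browser (sessionStorage satisfies this interface).
interface KeyValueReader {
  getItem(key: string): string | null;
}

// Reconnect key format quoted in the review.
const reconnectKey = (threadId: string): string => `lg:stream:${threadId}`;

// True only when none of the three guards objects to a manual join.
function shouldAutoJoin(
  threadId: string,
  runId: string,
  state: JoinGuardState,
  storage: KeyValueReader,
): boolean {
  if (state.joinedRunIds.has(runId)) return false; // already handled
  if (state.joinInFlight) return false; // another attempt is running
  // A stored reconnect key means the SDK will reattach on mount,
  // so the manual join stands down to avoid double-attaching.
  if (storage.getItem(reconnectKey(threadId)) !== null) return false;
  return true;
}
```

In the cross-client case the reconnect key is unset on this client, so the last check passes and the manual join proceeds.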

}
}

joinedActiveRunIdsRef.current.add(activeRunId);

Collaborator

activeRunId goes into joinedActiveRunIdsRef before the join, and the .catch below only logs. A non-inactive failure (a transient network blip, say) therefore permanently marks this run as joined, and the effect won't retry even when a runs update re-fires it, leaving the view idle-looking until a manual refresh. The inactive-run path is already handled cleanly upstream (the joinStream wrapper calls clearReconnectRun and returns), so those correctly shouldn't retry. For the genuinely transient case, would it be worth deleting activeRunId from the set in the .catch so a later runs update can have another go? Minor given it's best-effort, but right now a one-off blip is indistinguishable from a permanent give-up.

Author

Good point. Updated the catch path to remove the run id from joinedActiveRunIdsRef when joinStream throws, so later runs updates can retry transient failures. The inactive-run path still returns normally and keeps the run marked as handled.
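
The retry behaviour discussed in this exchange could be sketched roughly as follows. This is an assumption-laden illustration rather than the updated hook: `joinWithRetryableMark` is a hypothetical helper, and `joinStream` is passed in as a parameter standing in for the wrapper the review mentions, which is assumed to resolve normally for inactive runs and reject only on real failures.

```typescript
// Marks the run as joined up front, then releases the mark if the
// join rejects so that a later runs update can try again.
async function joinWithRetryableMark(
  runId: string,
  joinedRunIds: Set<string>,
  joinStream: (runId: string) => Promise<void>,
): Promise<boolean> {
  if (joinedRunIds.has(runId)) return false; // already handled or in progress

  // Mark first so an overlapping effect run does not start a second join.
  joinedRunIds.add(runId);
  try {
    // An inactive run is assumed to resolve here, which keeps it marked.
    await joinStream(runId);
    return true;
  } catch {
    // Transient failure: forget the run so the next attempt is allowed.
    joinedRunIds.delete(runId);
    return false;
  }
}
```

With this shape a one-off failure leaves the set unchanged, so the same run id is eligible again the next time the run list updates.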

@Jholly2008 Jholly2008 force-pushed the kkk/join-active-run-on-open branch from ccc2dc9 to 558499e Compare June 15, 2026 03:18
@Jholly2008 Jholly2008 force-pushed the kkk/join-active-run-on-open branch from 558499e to 465e6c8 Compare June 15, 2026 03:35
Labels

  • area:frontend (Next.js frontend under frontend/)
  • needs-validation (Touches front/back contract surface; needs real-path validation)
  • risk:medium (Medium risk: regular code changes)
  • size/M (PR changes 100-300 lines)

3 participants