Join active runs when opening thread #3587
fancyboi999 left a comment
Read through the join logic and it hangs together well:
- `findLatestActiveRun` (hooks.ts:148) filters to `pending`/`running` and picks the newest by `updated_at ?? created_at`. Ran the new unit tests locally: 2 passed, including the case where a `success` run with a newer timestamp is correctly skipped in favor of an older `running` one.
- The join effect (hooks.ts:761) is guarded three ways before it attaches: the permanent `joinedActiveRunIdsRef`, the in-flight `activeRunJoinInFlightRef`, and the `lg:stream:${threadId}` sessionStorage check. That last one is the nice part: it's the same reconnect key the SDK uses (`reconnectOnMount: true` at hooks.ts:574, and `clearReconnectRun` at api-client.ts:65 already manages it), so when the SDK is auto-reconnecting on mount, this effect stands down instead of double-attaching. In the cross-client case the run was created elsewhere, so this client's key is unset and the manual join fires, which is exactly the gap you're closing.
- The refs reset on `threadId` change (hooks.ts:843), so switching threads doesn't carry stale join state.
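For readers without hooks.ts open, here is a minimal TypeScript sketch of the selection rule described above. It is not the PR's code: the `RunSummary` shape (including `run_id`) and the choice to treat an unparseable timestamp as "oldest" rather than dropping the run are assumptions. Only the `pending`/`running` filter and the `updated_at ?? created_at` ordering come from the review.

```typescript
// Hypothetical run shape: only the fields this sketch needs.
interface RunSummary {
  run_id: string;
  status: string; // e.g. "pending", "running", "success"
  created_at: string; // ISO-8601 timestamp
  updated_at?: string | null;
}

// Statuses treated as "active" (from the review: pending/running).
const ACTIVE_STATUSES = new Set(["pending", "running"]);

// Keep only active runs and return the one whose `updated_at ?? created_at`
// is newest. Returns undefined when no run is active.
function findLatestActiveRun(runs: readonly RunSummary[]): RunSummary | undefined {
  let latest: RunSummary | undefined;
  let latestTime = -Infinity;
  for (const run of runs) {
    if (!ACTIVE_STATUSES.has(run.status)) continue;
    const parsed = Date.parse(run.updated_at ?? run.created_at);
    // Assumption: an unparseable timestamp sorts as oldest but stays eligible.
    const time = Number.isNaN(parsed) ? -Infinity : parsed;
    if (latest === undefined || time > latestTime) {
      latest = run;
      latestTime = time;
    }
  }
  return latest;
}
```

Given a newer `success` run and an older `running` run, this returns the `running` one, which mirrors the unit-test case mentioned in the review.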
One non-blocking thought inline. Reads correct to me otherwise.
```ts
  }
}

joinedActiveRunIdsRef.current.add(activeRunId);
```
`activeRunId` goes into `joinedActiveRunIdsRef` before the join, and the `.catch` below only logs, so any failure other than the inactive-run case (a transient network blip, say) permanently marks this run as joined. The effect then won't retry even when `runs` re-fires, leaving the view idle-looking until a manual refresh. The inactive-run path is already handled cleanly upstream (the `joinStream` wrapper calls `clearReconnectRun` and returns), so those runs correctly shouldn't retry. For the genuinely transient case, would it be worth deleting `activeRunId` from the set in the `.catch` so a later `runs` update can have another go? Minor given it's best-effort, but right now a one-off blip is indistinguishable from a permanent give-up.
Good point. Updated the catch path to remove the run id from `joinedActiveRunIdsRef` when `joinStream` throws, so later `runs` updates can retry transient failures. The inactive-run path still returns normally and keeps the run marked as handled.
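A rough sketch of the guarded join with this retry fix, using plain objects in place of the React refs and `sessionStorage`. `maybeJoinActiveRun`, `JoinState`, `StorageLike`, and the outcome strings are hypothetical names for illustration, and treating the in-flight guard as "any join in progress" is an assumption. The three guards, the `lg:stream:${threadId}` key, and delete-on-failure come from the discussion. A `joinStream` that resolves (the inactive-run path) leaves the id marked as handled, while one that throws removes it so the next `runs` update can try again.

```typescript
// Hypothetical stand-ins for sessionStorage and the hook's refs, so the
// sketch runs outside React and the browser.
interface StorageLike {
  getItem(key: string): string | null;
}

interface JoinState {
  joinedActiveRunIds: Set<string>; // runs already handled (joined, or resolved as inactive)
  inFlightRunId: string | null; // run currently being joined, if any
}

type JoinOutcome = "skipped" | "handled" | "failed";

async function maybeJoinActiveRun(
  threadId: string,
  activeRunId: string,
  state: JoinState,
  storage: StorageLike,
  joinStream: (runId: string) => Promise<void>,
): Promise<JoinOutcome> {
  // Guard 1: this run was already handled.
  if (state.joinedActiveRunIds.has(activeRunId)) return "skipped";
  // Guard 2: another join is still in flight.
  if (state.inFlightRunId !== null) return "skipped";
  // Guard 3: the SDK's reconnect key is set, so it will reattach on its own.
  if (storage.getItem(`lg:stream:${threadId}`) !== null) return "skipped";

  state.joinedActiveRunIds.add(activeRunId);
  state.inFlightRunId = activeRunId;
  try {
    // Assumed contract: resolves without throwing for runs that turn out inactive.
    await joinStream(activeRunId);
    return "handled";
  } catch (error) {
    // Transient failure: forget the run so a later `runs` update can retry.
    state.joinedActiveRunIds.delete(activeRunId);
    console.warn("Failed to join active run", activeRunId, error);
    return "failed";
  } finally {
    state.inFlightRunId = null;
  }
}
```

Adding the id before awaiting keeps a re-render during the join from attaching twice; removing it only in the `catch` is what separates "give up for good" from "try again later".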
ccc2dc9 to 558499e
558499e to 465e6c8
Why
Opening an existing thread while a background run is already pending or running can leave the chat view looking idle until the page is refreshed. This happens when another client creates the run, so the current chat view loads persisted history but never attaches to the active run stream.
This fixes the user-facing gap where an existing chat can be actively running on the backend, but the frontend does not show the streaming/running state.
What changed
`pending` or `running` run.
Surface area
- `frontend/`
- `backend/app`
- `langgraph.json`, or prompt change
- `docker/` or sandboxed execution
- `skills/`
- `backend/pyproject.toml` or `frontend/package.json` (say what it buys us)
Screenshots / Recording
Not attached. This is a stream attachment behavior change rather than a visual layout change, and it is covered by the E2E regression test.
Bug fix verification
- `frontend/tests/e2e/thread-history.spec.ts` (existing thread joins an active run created by another client).
- `main` and green on this branch? Yes. Before the hook change, the new E2E test timed out waiting for the active run stream to be joined; after the change it passes.
Validation
- `cd frontend && pnpm exec vitest run tests/unit/core/threads/message-merge.test.ts`
- `cd frontend && pnpm exec prettier --check src/core/threads/hooks.ts tests/unit/core/threads/message-merge.test.ts tests/e2e/thread-history.spec.ts tests/e2e/utils/mock-api.ts`
- `cd frontend && pnpm exec tsc --noEmit`
- `cd frontend && env SKIP_ENV_VALIDATION=1 DEER_FLOW_AUTH_DISABLED=1 pnpm build`
- `cd frontend && pnpm exec eslint src/core/threads/hooks.ts tests/e2e/thread-history.spec.ts tests/e2e/utils/mock-api.ts tests/unit/core/threads/message-merge.test.ts --ext .ts,.tsx`
- `cd frontend && DEER_FLOW_AUTH_DISABLED=1 SKIP_ENV_VALIDATION=1 pnpm start` plus `pnpm exec playwright test tests/e2e/thread-history.spec.ts --project=chromium --reporter=line --timeout=120000 --grep=joins`
AI assistance
Tool(s) used: Codex
How you used it: Codex investigated the run creation/stream attachment path, implemented the frontend hook change, added regression tests, and ran the validation commands above.