fix(daemon): don't silently re-execute mutating commands on fetch timeout#1916
Open
Chen17-sq wants to merge 1 commit into
Open
fix(daemon): don't silently re-execute mutating commands on fetch timeout#1916Chen17-sq wants to merge 1 commit into
Chen17-sq wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The daemon client's retry loop can silently re-execute the same mutating browser command (navigate / click / type /
set-file-input/insert-text/exec) twice when the 30s client-side request timeout fires while the daemon is still executing the command. For a non-idempotent command this double-dispatch can e.g. submit a form twice, navigate twice, or insert text twice.Root cause
src/browser/daemon-client.tssendCommandRaw:const id = generateId();(daemon-client.ts:182), inside the retryforloop.TypeErrorand a fetchAbortErroras one retryable "network error" and re-sent the command:These two failures are not equivalent:
TypeError→ the request never reached the daemon (e.g. connection refused), so re-sending is safe.AbortError→ our own 30s request timeout (requestDaemon(..., { timeout: 30000 })) aborted the fetch after the request was dispatched. The daemon may still be holding the command pending and executing it.The daemon dedups requests only by
body.id—if (pending.has(body.id))(src/daemon.ts:315, returnsDuplicate command id already pending; retry). Because the client regenerates the id per attempt, the retry arrives with a fresh id, bypasses the duplicate guard, and the daemon dispatches the same command a second time. There is no write/non-idempotent exclusion and no content- or extension-level dedup.Fix
Split the single
isNetworkErrorcondition into two branches (minimal, two-branch change):AbortError→ throw aBrowserCommandErrorwitherrorCode: 'command_result_unknown'and hint'Inspect state before retrying.', instead of retrying. This surfaces the ambiguous in-flight state to the caller rather than blindly re-running a possibly-already-executed mutation.TypeError→ unchanged: still retries at 500ms (the command never reached the daemon).All other paths are untouched:
maxRetries, the duplicate-command-idcontinue, and theclassifyBrowserErrortransient-retry path. The doc comment abovesendCommandRawis updated to match.This mirrors the maintainer's existing semantics: the daemon already returns
command_result_unknownwith the same'Inspect state before retrying.'hint for the post-dispatch-timeout case, and the client already special-cases that response (daemon-client.ts:201) and refuses to retry it (existing testsendCommand does not retry command_result_unknown even when the message looks transient). This change extends that same "don't re-run an in-flight mutation" semantic to the client-side fetch timeout, which the result-side handling didn't cover.Tests
src/browser/daemon-client.test.ts(vitest,fetchmocked):sendCommand does not silently retry a mutating command on fetch AbortError— mock fetch to reject with anAbortError; asserts fetch is called exactly once and the call rejects withBrowserCommandErrorcodecommand_result_unknown(no second dispatch / no fresh id re-sent).sendCommand still retries on TypeError (connection refused) since the command never reached the daemon— mock fetch to reject once with aTypeErrorthen resolve ok; asserts 2 calls, the two command ids differ, and it resolves. (Regression guard for the genuine connection-failure retry.)command_result_unknownno-retry test remain green.npx vitest run src/browser/daemon-client.test.ts→ 15 passed.npx tsc --noEmit→ clean.Scope / honesty
This only triggers when a command stays in-flight longer than the 30s client request timeout (the daemon's pending window is longer), so the practical blast radius is narrow — but for a non-idempotent command the double-execution is a real correctness hazard, and the failure is silent today. The fix is intentionally limited to the
AbortErrorbranch and preserves all daemon-hiccup resilience for genuine connection failures. No existing issue/PR found for this case.Note on the trade-off
A genuine pre-dispatch stall that somehow exceeds the 30s client timeout on localhost will now surface
command_result_unknowninstead of being retried. This is the safe direction (it cannot double-execute a mutating command) and is consistent with thecommand_result_unknownsemantics already used elsewhere in this client. Connection-refused (TypeError, request never reached the daemon) remains retryable.