Skip to content

fix(daemon): don't silently re-execute mutating commands on fetch timeout#1916

Open
Chen17-sq wants to merge 1 commit into
jackwener:mainfrom
Chen17-sq:fix/daemon-no-retry-nonidempotent-on-abort
Open

fix(daemon): don't silently re-execute mutating commands on fetch timeout#1916
Chen17-sq wants to merge 1 commit into
jackwener:mainfrom
Chen17-sq:fix/daemon-no-retry-nonidempotent-on-abort

Conversation

@Chen17-sq

Copy link
Copy Markdown
Contributor

Problem

The daemon client's retry loop can silently re-execute the same mutating browser command (navigate / click / type / set-file-input / insert-text / exec) twice when the 30s client-side request timeout fires while the daemon is still executing the command. For a non-idempotent command this double-dispatch can e.g. submit a form twice, navigate twice, or insert text twice.

Root cause

src/browser/daemon-client.ts sendCommandRaw:

  • The command id is regenerated on every attempt — const id = generateId(); (daemon-client.ts:182), inside the retry for loop.
  • The catch block (daemon-client.ts:218-224 on origin/main) classified both a TypeError and a fetch AbortError as one retryable "network error" and re-sent the command:
    const isNetworkError = err instanceof TypeError
      || (err instanceof Error && err.name === 'AbortError');
    if (isNetworkError && attempt < maxRetries) { await sleep(500); continue; }

These two failures are not equivalent:

  • TypeError → the request never reached the daemon (e.g. connection refused), so re-sending is safe.
  • AbortError → our own 30s request timeout (requestDaemon(..., { timeout: 30000 })) aborted the fetch after the request was dispatched. The daemon may still be holding the command pending and executing it.

The daemon dedups requests only by body.idif (pending.has(body.id)) (src/daemon.ts:315, returns Duplicate command id already pending; retry). Because the client regenerates the id per attempt, the retry arrives with a fresh id, bypasses the duplicate guard, and the daemon dispatches the same command a second time. There is no write/non-idempotent exclusion and no content- or extension-level dedup.

Fix

Split the single isNetworkError condition into two branches (minimal, two-branch change):

  • AbortError → throw a BrowserCommandError with errorCode: 'command_result_unknown' and hint 'Inspect state before retrying.', instead of retrying. This surfaces the ambiguous in-flight state to the caller rather than blindly re-running a possibly-already-executed mutation.
  • TypeError → unchanged: still retries at 500ms (the command never reached the daemon).

All other paths are untouched: maxRetries, the duplicate-command-id continue, and the classifyBrowserError transient-retry path. The doc comment above sendCommandRaw is updated to match.

This mirrors the maintainer's existing semantics: the daemon already returns command_result_unknown with the same 'Inspect state before retrying.' hint for the post-dispatch-timeout case, and the client already special-cases that response (daemon-client.ts:201) and refuses to retry it (existing test sendCommand does not retry command_result_unknown even when the message looks transient). This change extends that same "don't re-run an in-flight mutation" semantic to the client-side fetch timeout, which the result-side handling didn't cover.

Tests

src/browser/daemon-client.test.ts (vitest, fetch mocked):

  1. sendCommand does not silently retry a mutating command on fetch AbortError — mock fetch to reject with an AbortError; asserts fetch is called exactly once and the call rejects with BrowserCommandError code command_result_unknown (no second dispatch / no fresh id re-sent).
  2. sendCommand still retries on TypeError (connection refused) since the command never reached the daemon — mock fetch to reject once with a TypeError then resolve ok; asserts 2 calls, the two command ids differ, and it resolves. (Regression guard for the genuine connection-failure retry.)
  3. The existing duplicate-command-id retry test and the existing command_result_unknown no-retry test remain green.

npx vitest run src/browser/daemon-client.test.ts → 15 passed. npx tsc --noEmit → clean.

Scope / honesty

This only triggers when a command stays in-flight longer than the 30s client request timeout (the daemon's pending window is longer), so the practical blast radius is narrow — but for a non-idempotent command the double-execution is a real correctness hazard, and the failure is silent today. The fix is intentionally limited to the AbortError branch and preserves all daemon-hiccup resilience for genuine connection failures. No existing issue/PR found for this case.

Note on the trade-off

A genuine pre-dispatch stall that somehow exceeds the 30s client timeout on localhost will now surface command_result_unknown instead of being retried. This is the safe direction (it cannot double-execute a mutating command) and is consistent with the command_result_unknown semantics already used elsewhere in this client. Connection-refused (TypeError, request never reached the daemon) remains retryable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant