fix: bound y-websocket reconnect rate on transient post-open closes #955

Merged: kptdobe merged 3 commits into main from cor44 on May 21, 2026

@kptdobe kptdobe commented May 21, 2026

Summary

  • y-websocket resets its wsUnsuccessfulReconnects counter on every successful onopen, so any close that follows a brief successful handshake reschedules the reconnect at ~100 ms. The 4401/4403 guard from #943 (fix: block y-websocket auto-reconnect during async IMS refresh) does not cover non-auth paths (1011 from the initSession catch, 1005 from closeConn, 1006 from a socket reset), and as a result single users behind corporate Zscaler proxies have sustained 5k+ WS upgrades/sec/IP (see COR-43).
  • Add a rapid-reconnect guard to createConnection: track open-then-close intervals on the provider and apply manual exponential backoff (1s/2s/4s/… capped at SHORT_SESSION_MAX_MS = 30 s) when sessions shorter than MIN_HEALTHY_SESSION_MS = 5 s repeat. A healthy session (≥ 5 s lived) resets the counter so a routine reconnect after a long session still reconnects via y-websocket's own timer.
  • The auth path from #943 (fix: block y-websocket auto-reconnect during async IMS refresh) is unchanged, and the new guard never increments on 4401/4403 closes: the auth flow has its own loop guard via lastSentToken.
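
The backoff policy in the bullets above can be sketched as a small pure function. This is a minimal illustration, not the code from this PR: only MIN_HEALTHY_SESSION_MS and SHORT_SESSION_MAX_MS are names taken from the description, while SHORT_SESSION_BASE_MS, AUTH_CLOSE_CODES, onSessionClosed, and the state object are hypothetical. It also assumes the first short session already waits 1 s; the actual guard may only start backing off once short sessions repeat.

```javascript
// Constants named in the PR description.
const MIN_HEALTHY_SESSION_MS = 5000;
const SHORT_SESSION_MAX_MS = 30000;
// Hypothetical names introduced for this sketch only.
const SHORT_SESSION_BASE_MS = 1000;
const AUTH_CLOSE_CODES = new Set([4401, 4403]);

/**
 * Decide how to react to a close that followed a successful open.
 * Returns the manual reconnect delay in ms, or null when the guard
 * stays out of the way (auth closes and healthy sessions).
 */
function onSessionClosed(state, sessionMs, closeCode) {
  // Auth closes are handled by the existing flow and never counted here.
  if (AUTH_CLOSE_CODES.has(closeCode)) return null;

  // A healthy session resets the counter; the library's own timer reconnects.
  if (sessionMs >= MIN_HEALTHY_SESSION_MS) {
    state.shortSessions = 0;
    return null;
  }

  // Short session: 1 s, 2 s, 4 s, ... capped at SHORT_SESSION_MAX_MS.
  state.shortSessions += 1;
  const delay = SHORT_SESSION_BASE_MS * 2 ** (state.shortSessions - 1);
  return Math.min(delay, SHORT_SESSION_MAX_MS);
}

// Six consecutive 200 ms sessions closed with non-auth codes.
const state = { shortSessions: 0 };
const delays = [1011, 1005, 1006, 1006, 1006, 1006]
  .map((code) => onSessionClosed(state, 200, code));
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 30000]
```

Keeping the decision separate from timers and sockets makes the schedule easy to unit test; the cap is reached on the sixth consecutive short session because 32 s exceeds the 30 s limit.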

Implementation notes

  • The new constants live as module-private consts in blocks/edit/prose/index.js.
  • The guard sits BEFORE the existing token-refresh provider.protocols reassignment so a token refresh still happens for the next manual provider.connect() after the backoff.
  • y-websocket's internal 100 ms setTimeout(setupWS) still fires from onclose, but provider.disconnect() flips shouldConnect = false, so that timer's setupWS call no-ops. The manual setTimeout is what re-arms the connection.
  • No changes to da-y-wrapper or upstream y-websocket — the fix sits at the consuming-call site so it ships with da-live's normal release cadence.
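
The interaction between provider.disconnect(), the library's 100 ms retry, and the manual timer described in the notes above could look roughly like the following. It is a sketch under stated assumptions rather than the merged code: FakeProvider is a hypothetical stand-in that only borrows the names shouldConnect, connect(), and disconnect() from the description, internalRetry() models the library's setTimeout(setupWS) callback, and createRapidReconnectGuard with its injected now/schedule functions is invented here so the flow can run without real sockets or timers. The token-refresh step is omitted.

```javascript
const MIN_HEALTHY_SESSION_MS = 5000;
const SHORT_SESSION_MAX_MS = 30000;

// Hypothetical stand-in for the provider; not the real y-websocket class.
class FakeProvider {
  constructor() {
    this.shouldConnect = true;
    this.wsSetups = 0; // number of simulated socket setups
  }

  connect() {
    this.shouldConnect = true;
    this.wsSetups += 1;
  }

  disconnect() {
    this.shouldConnect = false;
  }

  // Models the library's ~100 ms retry: it does nothing once shouldConnect is false.
  internalRetry() {
    if (this.shouldConnect) this.wsSetups += 1;
  }
}

// Hypothetical guard factory; the clock and scheduler are injectable for testing.
function createRapidReconnectGuard(provider, { now = Date.now, schedule = setTimeout } = {}) {
  let openedAt = null;
  let shortSessions = 0;
  return {
    onOpen() {
      openedAt = now();
    },
    onClose(code) {
      if (openedAt === null) return; // never opened: not a post-open close
      const livedMs = now() - openedAt;
      openedAt = null;
      if (code === 4401 || code === 4403) return; // auth path is left untouched
      if (livedMs >= MIN_HEALTHY_SESSION_MS) {
        shortSessions = 0; // healthy session: let the library reconnect
        return;
      }
      shortSessions += 1;
      const delay = Math.min(1000 * 2 ** (shortSessions - 1), SHORT_SESSION_MAX_MS);
      provider.disconnect(); // parks the provider so the library's retry no-ops
      schedule(() => provider.connect(), delay); // the manual timer re-arms it
    },
  };
}

// One short session on a simulated clock.
let clock = 0;
const pending = [];
const provider = new FakeProvider();
const guard = createRapidReconnectGuard(provider, {
  now: () => clock,
  schedule: (fn, ms) => pending.push({ fn, ms }),
});
guard.onOpen();
clock += 200;
guard.onClose(1006); // parks the provider and queues a 1000 ms timer
provider.internalRetry(); // no-op while parked
pending[0].fn(); // manual timer fires and reconnects
```

Injecting the clock and scheduler is a design choice of this sketch; it lets a test assert on the queued delay and on the parked shouldConnect = false state without waiting for real time to pass.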

Test plan

  • npm run lint clean on the touched files
  • Focused unit tests pass: npx wtr "./test/unit/blocks/edit/prose/index.test.js" — 35/35 green, including the new prose/index createConnection rapid-reconnect guard (COR-44) suite (7 new scenarios) plus an updated existing test that now reflects the post-guard shouldConnect = false parking state.
  • After merge, query Coralogix and confirm the dominant da-collab URLs from the COR-1 daily review drop out of the >95% exception-ratio bucket and that the populated "Network connection lost." exception count returns to baseline (<10k/24h) within 48 h.

Test: https://cor44--da-live--adobe.aem.live/

Refs

🤖 Generated with Claude Code

y-websocket resets its backoff counter on every successful onopen, so
any close that follows a brief successful handshake reschedules at
~100ms. The 4401/4403 guard from #943 does not cover non-auth paths
(1011 from initSession catch, 1005/1006 from socket reset), and single
users behind corporate proxies can sustain 5k+ WS upgrades/sec/IP.

Add a rapid-reconnect guard to createConnection: track open/close
intervals on the provider and apply manual exponential backoff
(1s/2s/4s/... capped at 30s) when sessions shorter than 5s repeat. A
healthy session (>= MIN_HEALTHY_SESSION_MS) resets the counter. The
auth path from PR #943 is unchanged.

Refs: COR-44

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: kptdobe <acapt@adobe.com>

aem-code-sync Bot commented May 21, 2026

Hello, I'm the AEM Code Sync Bot and I will run some actions to deploy your branch.
In case there are problems, just click the checkbox below to rerun the respective action.

  • Re-sync branch
Commits

Comment thread blocks/edit/prose/index.js
…ssionMs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eset

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kptdobe kptdobe requested a review from chrischrischris May 21, 2026 12:53
@kptdobe kptdobe merged commit 5438c4c into main May 21, 2026
4 checks passed
@kptdobe kptdobe deleted the cor44 branch May 21, 2026 13:23