fix: bound y-websocket reconnect rate on transient post-open closes #954
Closed
kptdobe wants to merge 1 commit into
y-websocket resets its backoff counter on every successful onopen, so any close that follows a brief successful handshake reschedules at ~100ms. The 4401/4403 guard from #943 does not cover non-auth paths (1011 from initSession catch, 1005/1006 from socket reset), and single users behind corporate proxies can sustain 5k+ WS upgrades/sec/IP.

Add a rapid-reconnect guard to createConnection: track open/close intervals on the provider and apply manual exponential backoff (1s/2s/4s/... capped at 30s) when sessions shorter than 5s repeat. A healthy session (>= MIN_HEALTHY_SESSION_MS) resets the counter. The auth path from PR #943 is unchanged.

Refs: COR-44
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: kptdobe <acapt@adobe.com>
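The backoff rule described in the commit message can be sketched as a small, provider-independent tracker. This is only an illustration under stated assumptions, not the code in this PR: `MIN_HEALTHY_SESSION_MS` and `SHORT_SESSION_MAX_MS` are the constants named in the PR description, while `SHORT_SESSION_BASE_MS`, `shortSessionDelay`, and `RapidReconnectGuard` are hypothetical names. The sketch also assumes that backoff starts with the first short session; the PR may only start after a repeat.

```javascript
// Values taken from the PR description.
const MIN_HEALTHY_SESSION_MS = 5000;
const SHORT_SESSION_MAX_MS = 30000;
// Assumption: 1 s base, inferred from the 1s/2s/4s schedule.
const SHORT_SESSION_BASE_MS = 1000;

// Hypothetical helper: delay before the n-th consecutive short-session
// retry (n >= 1), doubling each time and capped at SHORT_SESSION_MAX_MS.
function shortSessionDelay(count) {
  return Math.min(SHORT_SESSION_BASE_MS * 2 ** (count - 1), SHORT_SESSION_MAX_MS);
}

// Hypothetical tracker for open-then-close intervals. The clock is
// injectable so the logic can be exercised without real timers.
class RapidReconnectGuard {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.openedAt = null;
    this.shortSessions = 0;
  }

  // Call when the socket reports a successful open.
  onOpen() {
    this.openedAt = this.now();
  }

  // Call when the socket closes. Returns the manual backoff delay in ms,
  // or null when the session was healthy (or never opened), meaning the
  // library's own reconnect timer can be left in charge.
  onClose() {
    if (this.openedAt === null) return null;
    const lived = this.now() - this.openedAt;
    this.openedAt = null;
    if (lived >= MIN_HEALTHY_SESSION_MS) {
      this.shortSessions = 0; // a healthy session resets the counter
      return null;
    }
    this.shortSessions += 1;
    return shortSessionDelay(this.shortSessions);
  }
}

// Usage with a fake clock: two 200 ms sessions, then a 6 s session.
let fakeNow = 0;
const guard = new RapidReconnectGuard(() => fakeNow);
const delays = [];
for (const sessionMs of [200, 200, 6000]) {
  guard.onOpen();
  fakeNow += sessionMs;
  delays.push(guard.onClose());
}
console.log(delays); // [ 1000, 2000, null ]
```

In an integration along the lines of the implementation notes, a non-null delay would be the point where the provider is parked with `provider.disconnect()` and a manual `setTimeout` later calls `provider.connect()`; that wiring is left out here because it depends on the surrounding `createConnection` code.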
Hello, I'm the AEM Code Sync Bot and I will run some actions to deploy your branch.
Replaced by #955: the branch name on this PR violated da-live CLAUDE.md (branches must be max 8 lowercase alphanumeric chars; this is an IMS constraint, and violating it breaks CI/preview). The new PR opens the same change from branch […]
Summary
- y-websocket resets its `wsUnsuccessfulReconnects` counter on every successful `onopen`, so any close that follows a brief successful handshake reschedules at ~100 ms. The 4401/4403 guard from #943 (fix: block y-websocket auto-reconnect during async IMS refresh) does not cover non-auth paths (1011 from the `initSession` catch, 1005 from `closeConn`, 1006 from a socket reset), and single users behind corporate Zscaler proxies have sustained 5k+ WS upgrades/sec/IP because of it (see COR-43).
- Add a rapid-reconnect guard to `createConnection`: track open-then-close intervals on the provider and apply manual exponential backoff (1s/2s/4s/… capped at `SHORT_SESSION_MAX_MS` = 30 s) when sessions shorter than `MIN_HEALTHY_SESSION_MS` = 5 s repeat. A healthy session (≥ 5 s lived) resets the counter, so a routine reconnect after a long session still reconnects via y-websocket's own timer.
- […] `lastSentToken`.

Implementation notes
- […] `blocks/edit/prose/index.js`.
- […] `provider.protocols` reassignment so a token refresh still happens for the next manual `provider.connect()` after the backoff.
- `setTimeout(setupWS)` still fires from `onclose`, but `provider.disconnect()` flips `shouldConnect = false`, so that timer's `setupWS` call no-ops. The manual `setTimeout` is what re-arms the connection.

Test plan
- `npm run lint` clean on the touched files
- `npx wtr "./test/unit/blocks/edit/prose/index.test.js"`: 35/35 green, including the new `prose/index createConnection rapid-reconnect guard (COR-44)` suite (7 new scenarios) plus an updated existing test that now reflects the post-guard `shouldConnect = false` parking state.
- […] `"Network connection lost."` exception count returns to baseline (<10k/24h) within 48 h.

Refs
🤖 Generated with Claude Code