fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989
fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989himanshuchawla009 wants to merge 4 commits intomainfrom
Conversation
b2e076c to
27d5ba4
Compare
|
@metamaskbot publish-preview |
|
Preview builds have been published. See these instructions for more information about preview builds. Expand for full list of packages and versions. |
…MaxKeyChainLength recovery Companion to MetaMask/core#7989. - Add `RefreshTokenHttpError` class to `AuthTokenHandler` that carries the HTTP `statusCode`, enabling callers to distinguish permanent (401) from transient failures. The core controller duck-types this error shape to map 401 responses to `InvalidRefreshToken` instead of the generic `FailedToRefreshJWTTokens`. - Wrap `SeedlessOnboardingController.refreshAuthTokens()` in a try/catch in the `MaxKeyChainLengthExceeded` recovery path. A stale or revoked refresh token should not abort rehydration — the error is logged and execution continues to `rehydrateSeedPhrase`.
| pendingToBeRevokedTokens?: { | ||
| refreshToken: string; | ||
| revokeToken: string; | ||
| queuedAt?: number; |
There was a problem hiding this comment.
Hi, we add this new property queuedAt so that the pendingToBeRevokedTokens will stay in the state for at most 15 seconds.
Since this state value is persisted so we will accidentally expose the revokeToken in the plain text?
There was a problem hiding this comment.
I've noticed that the way we use revokePendingRefreshTokens method in extension and mobile is abit different too.
We call this method immediately after renewRefreshToken is successful in the extension.
But in mobile, we add the 15s delay between calls.
There was a problem hiding this comment.
have reverted the queuedAt property. We will remove it from mobile as well.
…s and fix 401 log
- Add test: concurrent refreshAuthTokens rejection propagates to all callers
- Add test: #pendingRefreshPromise is cleared after failure (recovery path)
- Add test: live state.refreshToken is used in authenticate after concurrent token rotation
- Add test: old token not queued for revocation when vault creation fails
- Move coalescing test inside describe('refreshAuthTokens') (was at wrong nesting level)
- Log httpStatusCode alongside error in #doRefreshAuthTokens catch block so
401 misclassification is observable in production logs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d from token revocation The 15 s delay before revoking pending tokens was intended to cover in-flight refreshAuthTokens calls on the old token, but the mobile client plans to remove the corresponding delay on its side. Remove the PENDING_REVOKE_TOKEN_DELAY_MS constant, the queuedAt timestamp field, the filtering logic in revokePendingRefreshTokens, and the associated tests and changelog entry.
af61130 to
0e35a86
Compare
Explanation
The token refresh/rotation lifecycle has several race conditions that produce stale or already-revoked tokens being sent to the auth server. In production this manifests as mass 401 responses from
POST /api/v1/oauth/token, surfacing to users asSeedlessOnboardingController - Failed to refresh JWT tokensThree bugs are fixed:
1. Concurrent
refreshAuthTokensstormrefreshAuthTokenshad no guard against concurrent invocations. Multiple simultaneous callers (e.g.fetchMetadataAccessCreds,#executeWithTokenRefresh, a background reconnect) each issued an independent HTTP refresh request with the same token. The server honours the first one and may reject the rest.Fix: a
#pendingRefreshPromisefield coalesces concurrent calls — if a refresh is already in-flight, new callers receive the same promise rather than firing a duplicate request.2. Stale
refreshTokencapture bugInside
#doRefreshAuthTokens, after the HTTP call returns, the code was callingauthenticate(…, refreshToken: refreshToken)whererefreshTokenwas captured at the start of the method. IfrenewRefreshTokenran concurrently during the HTTP round-trip,state.refreshTokenwould already have been updated to the new token. Passing the captured (now-stale) value toauthenticatewould silently overwritestate.refreshTokenwith the old token, causing all subsequent refreshes to 401 once that old token is revoked 15 s later.Fix: use
this.state.refreshToken(the live value) rather than the captured variable when callingauthenticate.3. Opaque error taxonomy + premature revocation queuing
All HTTP failures from the refresh endpoint were wrapped in the same
FailedToRefreshJWTTokenserror, making it impossible to distinguish a permanently-revoked token (401) from a transient failure (503, network timeout). The controller also called#addRefreshTokenToRevokeListbefore#createNewVaultWithAuthData, meaning a token was queued for revocation even if vault creation subsequently failed.Fix: the catch block in
#doRefreshAuthTokenschecks whether the underlying error is aRefreshTokenHttpErrorwithstatusCode === 401and maps that toInvalidRefreshTokeninstead ofFailedToRefreshJWTTokens, giving callers a distinct permanent vs transient signal. The#addRefreshTokenToRevokeListcall is moved to after#createNewVaultWithAuthDatasucceeds.References
METAMASK-MOBILE-4FKN—SeedlessOnboardingController - Failed to refresh JWT tokens(metamask-mobile: try/catch aroundrefreshAuthTokensin theMaxKeyChainLengthExceededrecovery path +RefreshTokenHttpErrortyped error class inAuthTokenHandlerChecklist
Note
Medium Risk
Touches authentication/token refresh and rotation flows; while scoped and well-tested, regressions could impact login/session continuity and token revocation behavior.
Overview
Prevents refresh-token race conditions in
SeedlessOnboardingControllerby coalescing concurrentrefreshAuthTokenscalls behind a shared in-flight promise and splitting the implementation intorefreshAuthTokens+#doRefreshAuthTokens.Improves correctness and observability by using the live
state.refreshTokenwhen re-callingauthenticateafter an HTTP refresh, mapping 401RefreshTokenHttpErrortoInvalidRefreshToken(vs genericFailedToRefreshJWTTokens), and only queueing the old token for revocation after#createNewVaultWithAuthDatasucceeds duringrenewRefreshToken.Adds targeted unit tests for the new error behavior, concurrency coalescing/cleanup, and the vault-failure revocation-queueing fix, and documents the changes in the package changelog.
Written by Cursor Bugbot for commit 0e35a86. This will update automatically on new commits. Configure here.