Skip to content

fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989

Open
himanshuchawla009 wants to merge 4 commits intomainfrom
fix/refresh-token-concurrency
Open

fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989
himanshuchawla009 wants to merge 4 commits intomainfrom
fix/refresh-token-concurrency

Conversation

@himanshuchawla009
Copy link
Contributor

@himanshuchawla009 himanshuchawla009 commented Feb 19, 2026

Explanation

The token refresh/rotation lifecycle has several race conditions that produce stale or already-revoked tokens being sent to the auth server. In production this manifests as mass 401 responses from POST /api/v1/oauth/token, surfacing to users as SeedlessOnboardingController - Failed to refresh JWT tokens

Three bugs are fixed:

1. Concurrent refreshAuthTokens storm

refreshAuthTokens had no guard against concurrent invocations. Multiple simultaneous callers (e.g. fetchMetadataAccessCreds, #executeWithTokenRefresh, a background reconnect) each issued an independent HTTP refresh request with the same token. The server honours the first one and may reject the rest.

Fix: a #pendingRefreshPromise field coalesces concurrent calls — if a refresh is already in-flight, new callers receive the same promise rather than firing a duplicate request.

2. Stale refreshToken capture bug

Inside #doRefreshAuthTokens, after the HTTP call returns, the code was calling authenticate(…, refreshToken: refreshToken) where refreshToken was captured at the start of the method. If renewRefreshToken ran concurrently during the HTTP round-trip, state.refreshToken would already have been updated to the new token. Passing the captured (now-stale) value to authenticate would silently overwrite state.refreshToken with the old token, causing all subsequent refreshes to 401 once that old token is revoked 15 s later.

Fix: use this.state.refreshToken (the live value) rather than the captured variable when calling authenticate.

3. Opaque error taxonomy + premature revocation queuing

All HTTP failures from the refresh endpoint were wrapped in the same FailedToRefreshJWTTokens error, making it impossible to distinguish a permanently-revoked token (401) from a transient failure (503, network timeout). The controller also called #addRefreshTokenToRevokeList before #createNewVaultWithAuthData, meaning a token was queued for revocation even if vault creation subsequently failed.

Fix: the catch block in #doRefreshAuthTokens checks whether the underlying error is a RefreshTokenHttpError with statusCode === 401 and maps that to InvalidRefreshToken instead of FailedToRefreshJWTTokens, giving callers a distinct permanent vs transient signal. The #addRefreshTokenToRevokeList call is moved to after #createNewVaultWithAuthData succeeds.

References

  • Related to Sentry issue METAMASK-MOBILE-4FKNSeedlessOnboardingController - Failed to refresh JWT tokens (
  • Companion PR in metamask-mobile: try/catch around refreshAuthTokens in the MaxKeyChainLengthExceeded recovery path + RefreshTokenHttpError typed error class in AuthTokenHandler

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed
  • I've introduced breaking changes in this PR and have prepared draft pull requests for clients and consumer packages to resolve them

Note

Medium Risk
Touches authentication/token refresh and rotation flows; while scoped and well-tested, regressions could impact login/session continuity and token revocation behavior.

Overview
Prevents refresh-token race conditions in SeedlessOnboardingController by coalescing concurrent refreshAuthTokens calls behind a shared in-flight promise and splitting the implementation into refreshAuthTokens + #doRefreshAuthTokens.

Improves correctness and observability by using the live state.refreshToken when re-calling authenticate after an HTTP refresh, mapping 401 RefreshTokenHttpError to InvalidRefreshToken (vs generic FailedToRefreshJWTTokens), and only queueing the old token for revocation after #createNewVaultWithAuthData succeeds during renewRefreshToken.

Adds targeted unit tests for the new error behavior, concurrency coalescing/cleanup, and the vault-failure revocation-queueing fix, and documents the changes in the package changelog.

Written by Cursor Bugbot for commit 0e35a86. This will update automatically on new commits. Configure here.

@himanshuchawla009 himanshuchawla009 requested a review from a team as a code owner February 19, 2026 05:47
@himanshuchawla009 himanshuchawla009 changed the title fix(seedless-onboarding-controller): fix JWT token refresh race condi… fix(seedless-onboarding-controller): fix JWT token refresh race conditions. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 changed the title fix(seedless-onboarding-controller): fix JWT token refresh race conditions. fix: fix JWT token refresh race conditions in seedless-onboarding-controller. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 changed the title fix: fix JWT token refresh race conditions in seedless-onboarding-controller. fix: JWT token refresh race conditions in seedless-onboarding-controller. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 requested a review from a team as a code owner February 19, 2026 05:53
@himanshuchawla009 himanshuchawla009 force-pushed the fix/refresh-token-concurrency branch from b2e076c to 27d5ba4 Compare February 19, 2026 06:00
@lwin-kyaw
Copy link
Contributor

@metamaskbot publish-preview

@github-actions
Copy link
Contributor

Preview builds have been published. See these instructions for more information about preview builds.

Expand for full list of packages and versions.
{
  "@metamask-previews/account-tree-controller": "4.1.1-preview-511b467",
  "@metamask-previews/accounts-controller": "36.0.0-preview-511b467",
  "@metamask-previews/address-book-controller": "7.0.1-preview-511b467",
  "@metamask-previews/ai-controllers": "0.1.0-preview-511b467",
  "@metamask-previews/analytics-controller": "1.0.0-preview-511b467",
  "@metamask-previews/analytics-data-regulation-controller": "0.0.0-preview-511b467",
  "@metamask-previews/announcement-controller": "8.0.0-preview-511b467",
  "@metamask-previews/app-metadata-controller": "2.0.0-preview-511b467",
  "@metamask-previews/approval-controller": "8.0.0-preview-511b467",
  "@metamask-previews/assets-controller": "2.0.0-preview-511b467",
  "@metamask-previews/assets-controllers": "99.4.0-preview-511b467",
  "@metamask-previews/base-controller": "9.0.0-preview-511b467",
  "@metamask-previews/bridge-controller": "67.0.0-preview-511b467",
  "@metamask-previews/bridge-status-controller": "67.0.0-preview-511b467",
  "@metamask-previews/build-utils": "3.0.4-preview-511b467",
  "@metamask-previews/chain-agnostic-permission": "1.4.0-preview-511b467",
  "@metamask-previews/claims-controller": "0.4.2-preview-511b467",
  "@metamask-previews/client-controller": "1.0.0-preview-511b467",
  "@metamask-previews/compliance-controller": "0.0.0-preview-511b467",
  "@metamask-previews/composable-controller": "12.0.0-preview-511b467",
  "@metamask-previews/connectivity-controller": "0.1.0-preview-511b467",
  "@metamask-previews/controller-utils": "11.18.0-preview-511b467",
  "@metamask-previews/core-backend": "5.1.1-preview-511b467",
  "@metamask-previews/delegation-controller": "2.0.1-preview-511b467",
  "@metamask-previews/earn-controller": "11.1.0-preview-511b467",
  "@metamask-previews/eip-5792-middleware": "2.1.0-preview-511b467",
  "@metamask-previews/eip-7702-internal-rpc-middleware": "0.1.0-preview-511b467",
  "@metamask-previews/eip1193-permission-middleware": "1.0.3-preview-511b467",
  "@metamask-previews/ens-controller": "19.0.2-preview-511b467",
  "@metamask-previews/error-reporting-service": "3.0.1-preview-511b467",
  "@metamask-previews/eth-block-tracker": "15.0.1-preview-511b467",
  "@metamask-previews/eth-json-rpc-middleware": "23.1.0-preview-511b467",
  "@metamask-previews/eth-json-rpc-provider": "6.0.0-preview-511b467",
  "@metamask-previews/foundryup": "1.0.1-preview-511b467",
  "@metamask-previews/gas-fee-controller": "26.0.2-preview-511b467",
  "@metamask-previews/gator-permissions-controller": "2.0.0-preview-511b467",
  "@metamask-previews/json-rpc-engine": "10.2.2-preview-511b467",
  "@metamask-previews/json-rpc-middleware-stream": "8.0.8-preview-511b467",
  "@metamask-previews/keyring-controller": "25.1.0-preview-511b467",
  "@metamask-previews/logging-controller": "7.0.1-preview-511b467",
  "@metamask-previews/message-manager": "14.1.0-preview-511b467",
  "@metamask-previews/messenger": "0.3.0-preview-511b467",
  "@metamask-previews/multichain-account-service": "7.0.0-preview-511b467",
  "@metamask-previews/multichain-api-middleware": "1.2.6-preview-511b467",
  "@metamask-previews/multichain-network-controller": "3.0.3-preview-511b467",
  "@metamask-previews/multichain-transactions-controller": "7.0.1-preview-511b467",
  "@metamask-previews/name-controller": "9.0.0-preview-511b467",
  "@metamask-previews/network-controller": "29.0.0-preview-511b467",
  "@metamask-previews/network-enablement-controller": "4.1.1-preview-511b467",
  "@metamask-previews/notification-services-controller": "22.0.0-preview-511b467",
  "@metamask-previews/permission-controller": "12.2.0-preview-511b467",
  "@metamask-previews/permission-log-controller": "5.0.0-preview-511b467",
  "@metamask-previews/perps-controller": "0.0.0-preview-511b467",
  "@metamask-previews/phishing-controller": "16.3.0-preview-511b467",
  "@metamask-previews/polling-controller": "16.0.2-preview-511b467",
  "@metamask-previews/preferences-controller": "22.1.0-preview-511b467",
  "@metamask-previews/profile-metrics-controller": "3.0.1-preview-511b467",
  "@metamask-previews/profile-sync-controller": "27.1.0-preview-511b467",
  "@metamask-previews/ramps-controller": "8.1.0-preview-511b467",
  "@metamask-previews/rate-limit-controller": "7.0.0-preview-511b467",
  "@metamask-previews/remote-feature-flag-controller": "4.0.0-preview-511b467",
  "@metamask-previews/sample-controllers": "4.0.2-preview-511b467",
  "@metamask-previews/seedless-onboarding-controller": "8.0.0-preview-511b467",
  "@metamask-previews/selected-network-controller": "26.0.2-preview-511b467",
  "@metamask-previews/shield-controller": "5.0.1-preview-511b467",
  "@metamask-previews/signature-controller": "39.0.3-preview-511b467",
  "@metamask-previews/storage-service": "1.0.0-preview-511b467",
  "@metamask-previews/subscription-controller": "6.0.0-preview-511b467",
  "@metamask-previews/transaction-controller": "62.17.0-preview-511b467",
  "@metamask-previews/transaction-pay-controller": "15.1.0-preview-511b467",
  "@metamask-previews/user-operation-controller": "41.0.2-preview-511b467"
}

himanshuchawla009 added a commit to MetaMask/metamask-mobile that referenced this pull request Feb 19, 2026
…MaxKeyChainLength recovery

Companion to MetaMask/core#7989.

- Add `RefreshTokenHttpError` class to `AuthTokenHandler` that carries the
  HTTP `statusCode`, enabling callers to distinguish permanent (401) from
  transient failures. The core controller duck-types this error shape to map
  401 responses to `InvalidRefreshToken` instead of the generic
  `FailedToRefreshJWTTokens`.

- Wrap `SeedlessOnboardingController.refreshAuthTokens()` in a try/catch in
  the `MaxKeyChainLengthExceeded` recovery path. A stale or revoked refresh
  token should not abort rehydration — the error is logged and execution
  continues to `rehydrateSeedPhrase`.
pendingToBeRevokedTokens?: {
refreshToken: string;
revokeToken: string;
queuedAt?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we add this new property queuedAt so that the pendingToBeRevokedTokens will stay in the state for at most 15 seconds.
Since this state value is persisted so we will accidentally expose the revokeToken in the plain text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that the way we use revokePendingRefreshTokens method in extension and mobile is abit different too.

We call this method immediately after renewRefreshToken is successful in the extension.
But in mobile, we add the 15s delay between calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have reverted the queuedAt property. We will remove it from mobile as well.

himanshuchawla009 and others added 4 commits February 20, 2026 16:53
…s and fix 401 log

- Add test: concurrent refreshAuthTokens rejection propagates to all callers
- Add test: #pendingRefreshPromise is cleared after failure (recovery path)
- Add test: live state.refreshToken is used in authenticate after concurrent token rotation
- Add test: old token not queued for revocation when vault creation fails
- Move coalescing test inside describe('refreshAuthTokens') (was at wrong nesting level)
- Log httpStatusCode alongside error in #doRefreshAuthTokens catch block so
  401 misclassification is observable in production logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d from token revocation

The 15 s delay before revoking pending tokens was intended to cover in-flight
refreshAuthTokens calls on the old token, but the mobile client plans to remove
the corresponding delay on its side. Remove the PENDING_REVOKE_TOKEN_DELAY_MS
constant, the queuedAt timestamp field, the filtering logic in
revokePendingRefreshTokens, and the associated tests and changelog entry.
@himanshuchawla009 himanshuchawla009 force-pushed the fix/refresh-token-concurrency branch from af61130 to 0e35a86 Compare February 20, 2026 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments