Skip to content

fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989

Merged
himanshuchawla009 merged 4 commits intomainfrom
fix/refresh-token-concurrency
Feb 23, 2026
Merged

fix: JWT token refresh race conditions in seedless-onboarding-controller.#7989
himanshuchawla009 merged 4 commits intomainfrom
fix/refresh-token-concurrency

Conversation

@himanshuchawla009
Copy link
Contributor

@himanshuchawla009 himanshuchawla009 commented Feb 19, 2026

Explanation

The token refresh/rotation lifecycle has several race conditions that produce stale or already-revoked tokens being sent to the auth server. In production this manifests as mass 401 responses from POST /api/v1/oauth/token, surfacing to users as SeedlessOnboardingController - Failed to refresh JWT tokens

Three bugs are fixed:

1. Concurrent refreshAuthTokens storm

refreshAuthTokens had no guard against concurrent invocations. Multiple simultaneous callers (e.g. fetchMetadataAccessCreds, #executeWithTokenRefresh, a background reconnect) each issued an independent HTTP refresh request with the same token. The server honours the first one and may reject the rest.

Fix: a #pendingRefreshPromise field coalesces concurrent calls — if a refresh is already in-flight, new callers receive the same promise rather than firing a duplicate request.

2. Stale refreshToken capture bug

Inside #doRefreshAuthTokens, after the HTTP call returns, the code was calling authenticate(…, refreshToken: refreshToken) where refreshToken was captured at the start of the method. If renewRefreshToken ran concurrently during the HTTP round-trip, state.refreshToken would already have been updated to the new token. Passing the captured (now-stale) value to authenticate would silently overwrite state.refreshToken with the old token, causing all subsequent refreshes to 401 once that old token is revoked 15 s later.

Fix: use this.state.refreshToken (the live value) rather than the captured variable when calling authenticate.

3. Opaque error taxonomy + premature revocation queuing

All HTTP failures from the refresh endpoint were wrapped in the same FailedToRefreshJWTTokens error, making it impossible to distinguish a permanently-revoked token (401) from a transient failure (503, network timeout). The controller also called #addRefreshTokenToRevokeList before #createNewVaultWithAuthData, meaning a token was queued for revocation even if vault creation subsequently failed.

Fix: the catch block in #doRefreshAuthTokens checks whether the underlying error is a RefreshTokenHttpError with statusCode === 401 and maps that to InvalidRefreshToken instead of FailedToRefreshJWTTokens, giving callers a distinct permanent vs transient signal. The #addRefreshTokenToRevokeList call is moved to after #createNewVaultWithAuthData succeeds.

References

  • Related to Sentry issue METAMASK-MOBILE-4FKNSeedlessOnboardingController - Failed to refresh JWT tokens (
  • Companion PR in metamask-mobile: try/catch around refreshAuthTokens in the MaxKeyChainLengthExceeded recovery path + RefreshTokenHttpError typed error class in AuthTokenHandler

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed
  • I've introduced breaking changes in this PR and have prepared draft pull requests for clients and consumer packages to resolve them

Note

Medium Risk
Touches authentication/token refresh and rotation flows; while scoped and well-tested, regressions could impact login/session continuity and token revocation behavior.

Overview
Prevents refresh-token race conditions in SeedlessOnboardingController by coalescing concurrent refreshAuthTokens calls behind a shared in-flight promise and splitting the implementation into refreshAuthTokens + #doRefreshAuthTokens.

Improves correctness and observability by using the live state.refreshToken when re-calling authenticate after an HTTP refresh, mapping 401 RefreshTokenHttpError to InvalidRefreshToken (vs generic FailedToRefreshJWTTokens), and only queueing the old token for revocation after #createNewVaultWithAuthData succeeds during renewRefreshToken.

Adds targeted unit tests for the new error behavior, concurrency coalescing/cleanup, and the vault-failure revocation-queueing fix, and documents the changes in the package changelog.

Written by Cursor Bugbot for commit 0e35a86. This will update automatically on new commits. Configure here.

@himanshuchawla009 himanshuchawla009 requested a review from a team as a code owner February 19, 2026 05:47
@himanshuchawla009 himanshuchawla009 changed the title fix(seedless-onboarding-controller): fix JWT token refresh race condi… fix(seedless-onboarding-controller): fix JWT token refresh race conditions. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 changed the title fix(seedless-onboarding-controller): fix JWT token refresh race conditions. fix: fix JWT token refresh race conditions in seedless-onboarding-controller. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 changed the title fix: fix JWT token refresh race conditions in seedless-onboarding-controller. fix: JWT token refresh race conditions in seedless-onboarding-controller. Feb 19, 2026
@himanshuchawla009 himanshuchawla009 requested a review from a team as a code owner February 19, 2026 05:53
@himanshuchawla009 himanshuchawla009 force-pushed the fix/refresh-token-concurrency branch from b2e076c to 27d5ba4 Compare February 19, 2026 06:00
@lwin-kyaw
Copy link
Contributor

@metamaskbot publish-preview

@github-actions
Copy link
Contributor

Preview builds have been published. See these instructions for more information about preview builds.

Expand for full list of packages and versions.
{
  "@metamask-previews/account-tree-controller": "4.1.1-preview-511b467",
  "@metamask-previews/accounts-controller": "36.0.0-preview-511b467",
  "@metamask-previews/address-book-controller": "7.0.1-preview-511b467",
  "@metamask-previews/ai-controllers": "0.1.0-preview-511b467",
  "@metamask-previews/analytics-controller": "1.0.0-preview-511b467",
  "@metamask-previews/analytics-data-regulation-controller": "0.0.0-preview-511b467",
  "@metamask-previews/announcement-controller": "8.0.0-preview-511b467",
  "@metamask-previews/app-metadata-controller": "2.0.0-preview-511b467",
  "@metamask-previews/approval-controller": "8.0.0-preview-511b467",
  "@metamask-previews/assets-controller": "2.0.0-preview-511b467",
  "@metamask-previews/assets-controllers": "99.4.0-preview-511b467",
  "@metamask-previews/base-controller": "9.0.0-preview-511b467",
  "@metamask-previews/bridge-controller": "67.0.0-preview-511b467",
  "@metamask-previews/bridge-status-controller": "67.0.0-preview-511b467",
  "@metamask-previews/build-utils": "3.0.4-preview-511b467",
  "@metamask-previews/chain-agnostic-permission": "1.4.0-preview-511b467",
  "@metamask-previews/claims-controller": "0.4.2-preview-511b467",
  "@metamask-previews/client-controller": "1.0.0-preview-511b467",
  "@metamask-previews/compliance-controller": "0.0.0-preview-511b467",
  "@metamask-previews/composable-controller": "12.0.0-preview-511b467",
  "@metamask-previews/connectivity-controller": "0.1.0-preview-511b467",
  "@metamask-previews/controller-utils": "11.18.0-preview-511b467",
  "@metamask-previews/core-backend": "5.1.1-preview-511b467",
  "@metamask-previews/delegation-controller": "2.0.1-preview-511b467",
  "@metamask-previews/earn-controller": "11.1.0-preview-511b467",
  "@metamask-previews/eip-5792-middleware": "2.1.0-preview-511b467",
  "@metamask-previews/eip-7702-internal-rpc-middleware": "0.1.0-preview-511b467",
  "@metamask-previews/eip1193-permission-middleware": "1.0.3-preview-511b467",
  "@metamask-previews/ens-controller": "19.0.2-preview-511b467",
  "@metamask-previews/error-reporting-service": "3.0.1-preview-511b467",
  "@metamask-previews/eth-block-tracker": "15.0.1-preview-511b467",
  "@metamask-previews/eth-json-rpc-middleware": "23.1.0-preview-511b467",
  "@metamask-previews/eth-json-rpc-provider": "6.0.0-preview-511b467",
  "@metamask-previews/foundryup": "1.0.1-preview-511b467",
  "@metamask-previews/gas-fee-controller": "26.0.2-preview-511b467",
  "@metamask-previews/gator-permissions-controller": "2.0.0-preview-511b467",
  "@metamask-previews/json-rpc-engine": "10.2.2-preview-511b467",
  "@metamask-previews/json-rpc-middleware-stream": "8.0.8-preview-511b467",
  "@metamask-previews/keyring-controller": "25.1.0-preview-511b467",
  "@metamask-previews/logging-controller": "7.0.1-preview-511b467",
  "@metamask-previews/message-manager": "14.1.0-preview-511b467",
  "@metamask-previews/messenger": "0.3.0-preview-511b467",
  "@metamask-previews/multichain-account-service": "7.0.0-preview-511b467",
  "@metamask-previews/multichain-api-middleware": "1.2.6-preview-511b467",
  "@metamask-previews/multichain-network-controller": "3.0.3-preview-511b467",
  "@metamask-previews/multichain-transactions-controller": "7.0.1-preview-511b467",
  "@metamask-previews/name-controller": "9.0.0-preview-511b467",
  "@metamask-previews/network-controller": "29.0.0-preview-511b467",
  "@metamask-previews/network-enablement-controller": "4.1.1-preview-511b467",
  "@metamask-previews/notification-services-controller": "22.0.0-preview-511b467",
  "@metamask-previews/permission-controller": "12.2.0-preview-511b467",
  "@metamask-previews/permission-log-controller": "5.0.0-preview-511b467",
  "@metamask-previews/perps-controller": "0.0.0-preview-511b467",
  "@metamask-previews/phishing-controller": "16.3.0-preview-511b467",
  "@metamask-previews/polling-controller": "16.0.2-preview-511b467",
  "@metamask-previews/preferences-controller": "22.1.0-preview-511b467",
  "@metamask-previews/profile-metrics-controller": "3.0.1-preview-511b467",
  "@metamask-previews/profile-sync-controller": "27.1.0-preview-511b467",
  "@metamask-previews/ramps-controller": "8.1.0-preview-511b467",
  "@metamask-previews/rate-limit-controller": "7.0.0-preview-511b467",
  "@metamask-previews/remote-feature-flag-controller": "4.0.0-preview-511b467",
  "@metamask-previews/sample-controllers": "4.0.2-preview-511b467",
  "@metamask-previews/seedless-onboarding-controller": "8.0.0-preview-511b467",
  "@metamask-previews/selected-network-controller": "26.0.2-preview-511b467",
  "@metamask-previews/shield-controller": "5.0.1-preview-511b467",
  "@metamask-previews/signature-controller": "39.0.3-preview-511b467",
  "@metamask-previews/storage-service": "1.0.0-preview-511b467",
  "@metamask-previews/subscription-controller": "6.0.0-preview-511b467",
  "@metamask-previews/transaction-controller": "62.17.0-preview-511b467",
  "@metamask-previews/transaction-pay-controller": "15.1.0-preview-511b467",
  "@metamask-previews/user-operation-controller": "41.0.2-preview-511b467"
}

himanshuchawla009 added a commit to MetaMask/metamask-mobile that referenced this pull request Feb 19, 2026
…MaxKeyChainLength recovery

Companion to MetaMask/core#7989.

- Add `RefreshTokenHttpError` class to `AuthTokenHandler` that carries the
  HTTP `statusCode`, enabling callers to distinguish permanent (401) from
  transient failures. The core controller duck-types this error shape to map
  401 responses to `InvalidRefreshToken` instead of the generic
  `FailedToRefreshJWTTokens`.

- Wrap `SeedlessOnboardingController.refreshAuthTokens()` in a try/catch in
  the `MaxKeyChainLengthExceeded` recovery path. A stale or revoked refresh
  token should not abort rehydration — the error is logged and execution
  continues to `rehydrateSeedPhrase`.
pendingToBeRevokedTokens?: {
refreshToken: string;
revokeToken: string;
queuedAt?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we add this new property queuedAt so that the pendingToBeRevokedTokens will stay in the state for at most 15 seconds.
Since this state value is persisted so we will accidentally expose the revokeToken in the plain text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that the way we use revokePendingRefreshTokens method in extension and mobile is abit different too.

We call this method immediately after renewRefreshToken is successful in the extension.
But in mobile, we add the 15s delay between calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have reverted the queuedAt property. We will remove it from mobile as well.

himanshuchawla009 and others added 4 commits February 20, 2026 16:53
…s and fix 401 log

- Add test: concurrent refreshAuthTokens rejection propagates to all callers
- Add test: #pendingRefreshPromise is cleared after failure (recovery path)
- Add test: live state.refreshToken is used in authenticate after concurrent token rotation
- Add test: old token not queued for revocation when vault creation fails
- Move coalescing test inside describe('refreshAuthTokens') (was at wrong nesting level)
- Log httpStatusCode alongside error in #doRefreshAuthTokens catch block so
  401 misclassification is observable in production logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d from token revocation

The 15 s delay before revoking pending tokens was intended to cover in-flight
refreshAuthTokens calls on the old token, but the mobile client plans to remove
the corresponding delay on its side. Remove the PENDING_REVOKE_TOKEN_DELAY_MS
constant, the queuedAt timestamp field, the filtering logic in
revokePendingRefreshTokens, and the associated tests and changelog entry.
@himanshuchawla009 himanshuchawla009 force-pushed the fix/refresh-token-concurrency branch from af61130 to 0e35a86 Compare February 20, 2026 08:53
@himanshuchawla009 himanshuchawla009 added this pull request to the merge queue Feb 23, 2026
Merged via the queue into main with commit cd1306d Feb 23, 2026
310 checks passed
@himanshuchawla009 himanshuchawla009 deleted the fix/refresh-token-concurrency branch February 23, 2026 04:54
github-merge-queue bot pushed a commit to MetaMask/metamask-mobile that referenced this pull request Feb 25, 2026
…MaxKeyChainLength recovery (#26258)

## Explanation

Companion to MetaMask/core#7989.

### 1. `RefreshTokenHttpError` in `AuthTokenHandler`

The core `SeedlessOnboardingController` now distinguishes permanent
token failures (401) from transient ones to apply the right recovery
strategy. It does this by checking whether the error thrown by
`refreshJWTToken` is a `RefreshTokenHttpError` with `statusCode ===
401`.

Previously `AuthTokenHandler` threw a plain `Error('Failed to refresh
JWT token')` for all non-2xx responses, making this distinction
impossible. This PR adds a `RefreshTokenHttpError` class that carries
the HTTP `statusCode`, giving the controller the information it needs

### 2. try/catch around `refreshAuthTokens` in the
`MaxKeyChainLengthExceeded` recovery path

`refreshAuthTokens` is called fire-and-forget in the
`MaxKeyChainLengthExceeded` recovery path before `rehydrateSeedPhrase`.
If the refresh token is already stale or revoked (which is precisely the
scenario that triggers this path), the call would throw and abort
rehydration entirely — leaving the user stuck.

Wrapping in try/catch logs the error and lets rehydration proceed
regardless.

### 3. encryptorAdapter for SeedlessOnboardingController
KeyringController has a generic EncryptionResult type that accepts our
mobile Encryptor's cipher property. However,
SeedlessOnboardingController uses a fixed VaultEncryptor type (from
ExportableKeyEncryptor) that strictly requires data instead of cipher.

## **Changelog**

<!--
If this PR is not End-User-Facing and should not show up in the
CHANGELOG, you can choose to either:
1. Write `CHANGELOG entry: null`
2. Label with `no-changelog`

If this PR is End-User-Facing, please write a short User-Facing
description in the past tense like:
`CHANGELOG entry: Added a new tab for users to see their NFTs`
`CHANGELOG entry: Fixed a bug that was causing some NFTs to flicker`

(This helps the Release Engineer do their job more quickly and
accurately)
-->

CHANGELOG entry: null


## References

- Companion PR: MetaMask/core#7989 — fixes JWT token refresh race
conditions in `SeedlessOnboardingController`

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or
updated code as appropriate
- [ ] I've manually tested the PR (required)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches OAuth token refresh and seedless recovery flows; incorrect
error typing/handling could change recovery behavior for revoked or
transient refresh-token failures.
> 
> **Overview**
> Improves seedless OAuth token-refresh error handling so callers can
distinguish auth-server HTTP failures by status code.
> 
> `AuthTokenHandler.refreshJWTToken` now throws `RefreshTokenHttpError`
(includes `statusCode`) on non-2xx responses, and the
`MaxKeyChainLengthExceeded` recovery path in
`Authentication.syncPasswordAndUnlockWallet` wraps
`SeedlessOnboardingController.refreshAuthTokens()` in a `try/catch` to
log and continue with `rehydrateSeedPhrase` instead of aborting
recovery.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
413b327. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Gaurav Goel <grvgoel19@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants