feat(core-internal): Add configurable retry logic with exponential backoff #178
Open
codeW-Krish wants to merge 15 commits into
Closes #179
Summary
Adds configurable retry logic with exponential backoff and jitter to all HTTP transport layers in the GitHits CLI. Transient network failures (timeouts, connection resets, DNS failures), rate limiting (HTTP 429), and temporary server errors (HTTP 5xx) are now automatically retried without user intervention. AI assistant tool calls succeed more reliably in imperfect network conditions.
The retry system is idempotency-aware: only requests that produce the same result when retried (GET, GraphQL queries, search) are eligible for automatic retry. Non-idempotent requests (feedback submissions) are never retried.
Why this change
Previously the HTTP layer had zero retry capability:
- `fetch-timeout.ts`
- `pkgseer-graphql.ts`
- `PkgseerTransportError`
- `retryOptions`
- `githits-service.ts`
- `code-navigation-service.ts`
- `package-intelligence-service.ts`
- `retryable` flag existed but was unused
- `retryable` flag
- `retry.attempt`, `retry.delayMs`, `retry.error`
- `retryAttempts`, `retryAfter` fields

Every transient failure caused an immediate hard error. Users experienced broken tool calls in their AI assistants when network conditions were imperfect. After this PR, transient failures are invisible to users: the CLI retries automatically up to 3 times with exponential backoff.
What changed in this repo
New files
| File | Description |
| --- | --- |
| `packages/core-internal/src/shared/retry.ts` | `retryWithBackoff()`, `isRetryableError()`, `calculateDelay()`; transport-agnostic, zero HTTP dependencies |
| `packages/core-internal/src/shared/retry.test.ts` | |
| `docs/guidelines/RETRY_GUIDELINES.md` | |

Modified files
| File | Changes |
| --- | --- |
| `packages/core-internal/src/shared/fetch-timeout.ts` | `RetryFetchOptions` interface, `retryFetchWithTimeout()` function wrapping `fetchWithTimeout` with retry loop |
| `packages/core-internal/src/shared/fetch-timeout.test.ts` | `retryFetchWithTimeout`: first attempt success, retry on timeout, max retries exceeded, non-retryable errors, abort signal |
| `packages/core-internal/src/shared/pkgseer-graphql.ts` | `retryOptions` to `PkgseerGraphqlRequest` interface; `postPkgseerGraphql()` now delegates to `retryFetchWithTimeout` when retry options present |
| `packages/core-internal/src/shared/pkgseer-graphql.test.ts` | |
| `packages/core-internal/src/shared/telemetry.ts` | `recordRetryAttempt()` helper for tracking retry attempts in telemetry spans |
| `packages/core-internal/src/services/githits-service.ts` | `RetryConfig` constructor param; `search()`, `getLanguages()`, `searchLanguages()` use `retryFetchWithTimeout`; `submitFeedback()` intentionally NOT retried (non-idempotent) |
| `packages/core-internal/src/services/code-navigation-service.ts` | `retryConfig` constructor param; `postGraphqlWithTargetResolutionFallback()` passes `retryOptions` through to `postPkgseerGraphql` |
| `packages/core-internal/src/services/package-intelligence-service.ts` | `retryConfig` constructor param + `getRetryOptions()` helper; all 7 `postPkgseerGraphql` calls now pass `retryOptions` |
| `packages/core-internal/src/services/config.ts` | `RetryConfig` interface, `getRetryConfig()` function with env var overrides (`GITHITS_RETRY_MAX`, `GITHITS_RETRY_BASE_DELAY_MS`, `GITHITS_RETRY_MAX_DELAY_MS`, `GITHITS_RETRY_JITTER`) |
| `packages/mcp/src/shared/code-navigation-error-map.ts` | `retryAttempts` and `retryAfter` fields to `MappedError` interface |
| `packages/mcp/src/tools/shared.ts` | `retryAttempts` and `retryAfter` to `ToolErrorEnvelope` interface and `buildMcpErrorPayload()` |

New env vars (optional, all have defaults)
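| Variable | Default | Notes |
| --- | --- | --- |
| `GITHITS_RETRY_MAX` | `3` | |
| `GITHITS_RETRY_BASE_DELAY_MS` | `1000` | |
| `GITHITS_RETRY_MAX_DELAY_MS` | `30000` | |
| `GITHITS_RETRY_JITTER` | `true` | `"true"` / `"false"` |

Architecture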
Before — Single attempt, hard failure
    %%{init: { "theme": "base", "themeVariables": { "background": "#0b1020", "primaryColor": "#111827", "primaryBorderColor": "#334155", "primaryTextColor": "#f8fafc", "lineColor": "#64748b", "textColor": "#e2e8f0" } }}%%
    flowchart TD
        A[Tool Request] --> B[executeWithTokenRefresh]
        B --> C[fetchWithTimeout]
        C --> D[Send HTTP Request]
        D --> E{Request Successful?}
        E -->|Yes| F[Return Response ✅]
        E -->|No| G[Return Error ❌]
        N["User sees broken tool call<br/>No retry attempted"]
        G -.-> N
        style F fill:#14532d,stroke:#22c55e,color:#fff
        style G fill:#451a1a,stroke:#ef4444,color:#fff
        style N fill:#1e293b,stroke:#475569,color:#fff

After — Automatic retry with exponential backoff
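The retry flow in this section can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the name `retryWithBackoff` comes from the PR, but the `RetryOptions` shape, `delayFor`, `isRetryable`, and the injectable `sleep` are assumptions made for this example, and abort-signal handling is omitted.

```typescript
// Hypothetical options shape; the PR's real RetryFetchOptions may differ.
interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  delayFor: (attempt: number) => number; // ms to wait after failed attempt `attempt`
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-retryable errors, or once the retry budget is spent,
      // and surface the last error to the caller.
      if (!options.isRetryable(error) || attempt >= options.maxRetries) {
        throw error;
      }
      await sleep(options.delayFor(attempt));
    }
  }
}
```

Under these assumptions, `maxRetries: 3` means at most four calls to `operation`: the initial attempt plus three retries.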
    %%{init: { "theme": "base", "themeVariables": { "background": "#0b1020", "primaryColor": "#111827", "primaryBorderColor": "#334155", "primaryTextColor": "#f8fafc", "lineColor": "#64748b", "textColor": "#e2e8f0" } }}%%
    flowchart TD
        A[Tool Request] --> B[executeWithTokenRefresh]
        B --> C[retryFetchWithTimeout]
        C --> D[Send HTTP Request]
        D --> E{Request Successful?}
        E -->|Yes| F[Return Response ✅]
        E -->|No| G{Retryable Error?}
        G -->|No| H[Return Error ❌]
        G -->|Yes| I{Attempts Remaining?}
        I -->|No| H
        I -->|Yes| J[Calculate Exponential Backoff]
        J --> K[Wait Delay + Jitter]
        K --> D
        style F fill:#14532d,stroke:#22c55e,color:#fff
        style H fill:#451a1a,stroke:#ef4444,color:#fff
        style J fill:#172554,stroke:#3b82f6,color:#fff
        style K fill:#172554,stroke:#3b82f6,color:#fff

Error classification → retry decision
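As a rough sketch of the classification this section describes: timeouts, transport errors, HTTP 429 and HTTP 5xx are retried, other errors are not, and authentication errors go to token refresh instead. The error class names mirror the ones named in this PR, but their bodies here are stand-ins; `HttpStatusError`, `RetryDecision`, and `decide()` are hypothetical names introduced only for illustration.

```typescript
// Stand-in error classes; the real ones in the PR may carry more fields.
class FetchTimeoutError extends Error {}
class PkgseerTransportError extends Error {}
class AuthenticationError extends Error {}

// Hypothetical carrier for an HTTP status code.
class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

type RetryDecision = "retry" | "fail" | "refresh-token";

function isRetryableError(error: unknown): boolean {
  if (error instanceof FetchTimeoutError) return true;
  if (error instanceof PkgseerTransportError) return true;
  if (error instanceof HttpStatusError) {
    // 429 (rate limited) and 5xx are transient; every other status is final.
    return error.status === 429 || (error.status >= 500 && error.status <= 599);
  }
  return false;
}

function decide(error: unknown, attempt: number, maxRetries: number): RetryDecision {
  // Authentication failures are handled by token refresh, not by the retry loop.
  if (error instanceof AuthenticationError) return "refresh-token";
  if (!isRetryableError(error)) return "fail";
  return attempt < maxRetries ? "retry" : "fail";
}
```

For example, under these assumptions `decide(new HttpStatusError(404), 0, 3)` yields `"fail"`, while `decide(new HttpStatusError(429), 0, 3)` yields `"retry"`.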
    %%{init: { "theme": "base", "themeVariables": { "background": "#0b1020", "primaryColor": "#111827", "primaryBorderColor": "#334155", "primaryTextColor": "#f8fafc", "lineColor": "#64748b", "textColor": "#e2e8f0" } }}%%
    graph TD
        A[HTTP Request Fails]
        A --> B{Error Type}
        B --> C[Retryable Error]
        B --> D[Non-Retryable Error]
        B --> E[Authentication Error]
        C --> C1[FetchTimeoutError]
        C --> C2[PkgseerTransportError]
        C --> C3[HTTP 429]
        C --> C4[HTTP 5xx]
        D --> D1[HTTP 4xx except 429]
        D --> D2[ValidationError]
        D --> D3[AccessDenied]
        C1 --> F[RETRY ✅]
        C2 --> F
        C3 --> F
        C4 --> F
        D1 --> G[NO RETRY ❌]
        D2 --> G
        D3 --> G
        E --> H[Token Refresh 🔄]
        F --> I{Attempts < maxRetries?}
        I -->|Yes| J[Wait: baseDelay × 2^attempt × jitter]
        I -->|No| K[Throw Last Error]
        J --> A
        style F fill:#14532d,stroke:#22c55e,color:#fff
        style G fill:#451a1a,stroke:#ef4444,color:#fff
        style H fill:#172554,stroke:#3b82f6,color:#fff
        style K fill:#451a1a,stroke:#ef4444,color:#fff

Retry timing
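The delay rule from the diagram above (`baseDelay × 2^attempt × jitter`) might look something like the sketch below. The function name `calculateDelay` is taken from the PR, but its signature, the `BackoffOptions` shape, the zero-based `attempt`, the choice to cap before applying jitter, and the injectable `random` parameter are all assumptions made for this example.

```typescript
// Hypothetical options shape; field names are illustrative.
interface BackoffOptions {
  baseDelayMs: number; // e.g. 1000
  maxDelayMs: number; // e.g. 30000
  jitter: boolean;
}

function calculateDelay(
  attempt: number, // assumed zero-based: 0 for the first retry
  options: BackoffOptions,
  random: () => number = Math.random,
): number {
  // Exponential growth: base, 2x base, 4x base, ...
  const exponential = options.baseDelayMs * 2 ** attempt;
  // Never wait longer than the configured ceiling.
  const capped = Math.min(exponential, options.maxDelayMs);
  if (!options.jitter) return capped;
  // Scale by a random factor in [0.5, 1.0) so clients do not retry in lockstep.
  return capped * (0.5 + random() * 0.5);
}
```

With the default values and jitter disabled, this sketch waits 1000, 2000 and 4000 ms before retries 1 to 3; with jitter enabled each wait falls between half of that value and the value itself.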
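Jitter adds randomness: `delay × (0.5 + Math.random() × 0.5)` scales each delay by a random factor between 0.5 and 1.0, which prevents a thundering herd of clients retrying at the same instant.

Idempotency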
The `submitFeedback()` method in `githits-service.ts` is intentionally not retried.

Security properties
- `submitFeedback()` explicitly skips retry
- Retry count bounded by `maxRetries` (default 3)
- Delay capped at `maxDelayMs` (default 30s)
- `AuthenticationError` classified as non-retryable (handled by token refresh)

MCP error envelope
When a request fails after all retry attempts, the MCP error envelope includes retry metadata:
    {
      "error": "Code navigation request timed out.",
      "code": "TIMEOUT",
      "retryable": true,
      "retryAttempts": 3,
      "retryAfter": 5000
    }

- `retryAttempts`: Number of attempts made (0 if no retry)
- `retryAfter`: Suggested delay in milliseconds before retrying (from `Retry-After` header)

Configuration
Environment variables
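As an illustration of how the four environment variables and their defaults could be resolved into a config object, here is a minimal sketch. `RetryConfig` and `getRetryConfig` are names from this PR, but the field names, the explicit `env` parameter, the `parseNonNegativeInt` helper, and the fall-back-to-default behaviour on invalid input are assumptions, not the PR's actual implementation.

```typescript
// Assumed field names; the PR's RetryConfig interface may differ.
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitter: boolean;
}

// Defaults as listed in the PR description.
const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
  jitter: true,
};

type Env = Record<string, string | undefined>;

// Hypothetical helper: accept only non-negative integers, else use the fallback.
function parseNonNegativeInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

function getRetryConfig(env: Env): RetryConfig {
  const jitterRaw = env["GITHITS_RETRY_JITTER"];
  return {
    maxRetries: parseNonNegativeInt(env["GITHITS_RETRY_MAX"], DEFAULT_RETRY_CONFIG.maxRetries),
    baseDelayMs: parseNonNegativeInt(env["GITHITS_RETRY_BASE_DELAY_MS"], DEFAULT_RETRY_CONFIG.baseDelayMs),
    maxDelayMs: parseNonNegativeInt(env["GITHITS_RETRY_MAX_DELAY_MS"], DEFAULT_RETRY_CONFIG.maxDelayMs),
    // Only the literal strings "true" and "false" override the default.
    jitter:
      jitterRaw === "true" ? true : jitterRaw === "false" ? false : DEFAULT_RETRY_CONFIG.jitter,
  };
}
```

In a Node or Bun process this sketch would be called as `getRetryConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.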
Service-level override
Testing checklist
Unit tests (29 total, all passing)
| Test file | Coverage |
| --- | --- |
| `retry.test.ts` | `isRetryableError`, `calculateDelay`, `retryWithBackoff` (success, retry, max retries, non-retryable, abort, onRetry callback) |
| `fetch-timeout.test.ts` | `retryFetchWithTimeout` (first attempt, retry on error, max retries, non-retryable, abort signal) |

Type checking
    bun run typecheck # 0 errors

Backward compatibility
- `submitFeedback()` still does NOT retry