
feat(core-internal): Add configurable retry logic with exponential backoff#178

Open
codeW-Krish wants to merge 15 commits into
githits-com:mainfrom
codeW-Krish:feat/add-retry-logic


@codeW-Krish codeW-Krish commented Jun 20, 2026


Closes #179

Summary

Adds configurable retry logic with exponential backoff and jitter to all HTTP transport layers in the GitHits CLI. Transient network failures (timeouts, connection resets, DNS failures), rate limiting (HTTP 429), and temporary server errors (HTTP 5xx) are now automatically retried without user intervention. AI assistant tool calls succeed more reliably in imperfect network conditions.

The retry system is idempotent-aware — only requests that produce the same result when retried (GET, GraphQL queries, search) are eligible for automatic retry. Non-idempotent requests (feedback submissions) are never retried.


Why this change

Previously the HTTP layer had zero retry capability:

| Component | Before | After |
| --- | --- | --- |
| fetch-timeout.ts | Single attempt, throws on timeout | Retry with exponential backoff + jitter |
| pkgseer-graphql.ts | Single POST, throws PkgseerTransportError | Configurable retry via retryOptions |
| githits-service.ts | Single fetch per REST call | Retry for search/languages (idempotent) |
| code-navigation-service.ts | Single GraphQL call | Retry options passed through to transport |
| package-intelligence-service.ts | Single GraphQL call per operation (7 calls) | All 7 calls retry-aware |
| Error classification | retryable flag existed but was unused | Retry logic consumes retryable flag |
| Telemetry | No retry tracking | retry.attempt, retry.delayMs, retry.error |
| MCP error envelope | No retry metadata | retryAttempts, retryAfter fields |

Every transient failure caused an immediate hard error, so users saw broken tool calls in their AI assistants whenever network conditions were imperfect. After this PR, a transient failure that clears within the retry budget is invisible to users: the CLI retries automatically, up to 3 times by default, with exponential backoff.


What changed in this repo

New files

| File | Purpose |
| --- | --- |
| packages/core-internal/src/shared/retry.ts | Standalone retry utility: retryWithBackoff(), isRetryableError(), calculateDelay(); transport-agnostic, zero HTTP dependencies |
| packages/core-internal/src/shared/retry.test.ts | 21 tests covering retry loop, error classification, backoff calculation, jitter, max retries, abort signals, onRetry callback |
| docs/guidelines/RETRY_GUIDELINES.md | Developer + user documentation: retry behavior, configuration, troubleshooting, AI assistant guidance |
docs/guidelines/RETRY_GUIDELINES.md Developer + user documentation: retry behavior, configuration, troubleshooting, AI assistant guidance

Modified files

| File | Change |
| --- | --- |
| packages/core-internal/src/shared/fetch-timeout.ts | Added RetryFetchOptions interface, retryFetchWithTimeout() function wrapping fetchWithTimeout with retry loop |
| packages/core-internal/src/shared/fetch-timeout.test.ts | Added 8 tests for retryFetchWithTimeout: first attempt success, retry on timeout, max retries exceeded, non-retryable errors, abort signal |
| packages/core-internal/src/shared/pkgseer-graphql.ts | Added retryOptions to PkgseerGraphqlRequest interface; postPkgseerGraphql() now delegates to retryFetchWithTimeout when retry options present |
| packages/core-internal/src/shared/pkgseer-graphql.test.ts | Added retry tests for GraphQL transport layer |
| packages/core-internal/src/shared/telemetry.ts | Added recordRetryAttempt() helper for tracking retry attempts in telemetry spans |
| packages/core-internal/src/services/githits-service.ts | Added RetryConfig constructor param; search(), getLanguages(), searchLanguages() use retryFetchWithTimeout; submitFeedback() intentionally NOT retried (non-idempotent) |
| packages/core-internal/src/services/code-navigation-service.ts | Added retryConfig constructor param; postGraphqlWithTargetResolutionFallback() passes retryOptions through to postPkgseerGraphql |
| packages/core-internal/src/services/package-intelligence-service.ts | Added retryConfig constructor param + getRetryOptions() helper; all 7 postPkgseerGraphql calls now pass retryOptions |
| packages/core-internal/src/services/config.ts | Added RetryConfig interface, getRetryConfig() function with env var overrides (GITHITS_RETRY_MAX, GITHITS_RETRY_BASE_DELAY_MS, GITHITS_RETRY_MAX_DELAY_MS, GITHITS_RETRY_JITTER) |
| packages/mcp/src/shared/code-navigation-error-map.ts | Added retryAttempts and retryAfter fields to MappedError interface |
| packages/mcp/src/tools/shared.ts | Added retryAttempts and retryAfter to ToolErrorEnvelope interface and buildMcpErrorPayload() |

New env vars (optional, all have defaults)

| Variable | Default | Description |
| --- | --- | --- |
| GITHITS_RETRY_MAX | 3 | Maximum retry attempts |
| GITHITS_RETRY_BASE_DELAY_MS | 1000 | Base delay for exponential backoff |
| GITHITS_RETRY_MAX_DELAY_MS | 30000 | Maximum delay cap |
| GITHITS_RETRY_JITTER | true | Enable/disable jitter ("true" / "false") |
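
As a rough illustration of how these variables could be turned into a config object, here is a minimal sketch. It is not the PR's getRetryConfig(): the function name readRetryConfig, the helper parseNonNegativeInt, and the choice to fall back to the default on invalid input are all assumptions. The field names follow the service-level override example in this description.

```typescript
// Hypothetical shape; field names mirror the service-level override example.
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitter: boolean;
}

// Defaults taken from the env var table above.
const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
  jitter: true,
};

// Parse a non-negative integer; missing or invalid input falls back to the
// default (an assumption, the real parser may behave differently).
function parseNonNegativeInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

// The environment is passed in explicitly so the sketch is easy to test;
// a real caller would pass process.env.
function readRetryConfig(env: Record<string, string | undefined>): RetryConfig {
  const jitterRaw = env.GITHITS_RETRY_JITTER?.trim().toLowerCase();
  let jitter = DEFAULT_RETRY_CONFIG.jitter;
  if (jitterRaw === "true") jitter = true;
  if (jitterRaw === "false") jitter = false;
  return {
    maxRetries: parseNonNegativeInt(env.GITHITS_RETRY_MAX, DEFAULT_RETRY_CONFIG.maxRetries),
    baseDelayMs: parseNonNegativeInt(env.GITHITS_RETRY_BASE_DELAY_MS, DEFAULT_RETRY_CONFIG.baseDelayMs),
    maxDelayMs: parseNonNegativeInt(env.GITHITS_RETRY_MAX_DELAY_MS, DEFAULT_RETRY_CONFIG.maxDelayMs),
    jitter,
  };
}
```

Passing an empty object yields the defaults from the table, and GITHITS_RETRY_MAX=0 yields maxRetries 0, which corresponds to disabling retry.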

Architecture

Before — Single attempt, hard failure

%%{init: {
  "theme": "base",
  "themeVariables": {
    "background": "#0b1020",
    "primaryColor": "#111827",
    "primaryBorderColor": "#334155",
    "primaryTextColor": "#f8fafc",
    "lineColor": "#64748b",
    "textColor": "#e2e8f0"
  }
}}%%
flowchart TD

    A[Tool Request]
        --> B[executeWithTokenRefresh]

    B --> C[fetchWithTimeout]

    C --> D[Send HTTP Request]

    D --> E{Request Successful?}

    E -->|Yes| F[Return Response ✅]

    E -->|No| G[Return Error ❌]

    N["User sees broken tool call<br/>No retry attempted"]

    G -.-> N

    style F fill:#14532d,stroke:#22c55e,color:#fff
    style G fill:#451a1a,stroke:#ef4444,color:#fff
    style N fill:#1e293b,stroke:#475569,color:#fff

After — Automatic retry with exponential backoff

%%{init: {
  "theme": "base",
  "themeVariables": {
    "background": "#0b1020",
    "primaryColor": "#111827",
    "primaryBorderColor": "#334155",
    "primaryTextColor": "#f8fafc",
    "lineColor": "#64748b",
    "textColor": "#e2e8f0"
  }
}}%%
flowchart TD

    A[Tool Request] --> B[executeWithTokenRefresh]
    B --> C[retryFetchWithTimeout]

    C --> D[Send HTTP Request]

    D --> E{Request Successful?}

    E -->|Yes| F[Return Response ✅]

    E -->|No| G{Retryable Error?}

    G -->|No| H[Return Error ❌]

    G -->|Yes| I{Attempts Remaining?}

    I -->|No| H

    I -->|Yes| J[Calculate Exponential Backoff]

    J --> K[Wait Delay + Jitter]

    K --> D

    style F fill:#14532d,stroke:#22c55e,color:#fff
    style H fill:#451a1a,stroke:#ef4444,color:#fff
    style J fill:#172554,stroke:#3b82f6,color:#fff
    style K fill:#172554,stroke:#3b82f6,color:#fff
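
The flow in the "After" diagram can be sketched as a small loop. This is an illustrative sketch only, not the code in retry.ts: the names retryLoopSketch and RetryLoopOptions are hypothetical, and the delay and classification logic are injected as callbacks so the loop itself stays transport-agnostic.

```typescript
// Hypothetical options for illustration; the PR's real option names may differ.
interface RetryLoopOptions {
  maxRetries: number;
  // Delay in ms before retry number n (1-based).
  delayForRetry: (retryNumber: number) => number;
  isRetryable: (error: unknown) => boolean;
  // Injectable for tests; defaults to a timer-based sleep.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryLoopSketch<T>(
  operation: () => Promise<T>,
  options: RetryLoopOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      // "Send HTTP Request" then "Request Successful? Yes"
      return await operation();
    } catch (error) {
      // "Retryable Error?" and "Attempts Remaining?"
      const attemptsRemaining = attempt < options.maxRetries;
      if (!options.isRetryable(error) || !attemptsRemaining) throw error;
      // "Calculate Exponential Backoff" then "Wait Delay + Jitter"
      await sleep(options.delayForRetry(attempt + 1));
    }
  }
}
```

With maxRetries set to 3 this makes at most four calls in total: the initial attempt plus three retries. Setting maxRetries to 0 reduces it to a single attempt.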

Error classification → retry decision

%%{init: {
  "theme": "base",
  "themeVariables": {
    "background": "#0b1020",
    "primaryColor": "#111827",
    "primaryBorderColor": "#334155",
    "primaryTextColor": "#f8fafc",
    "lineColor": "#64748b",
    "textColor": "#e2e8f0"
  }
}}%%
graph TD
    A[HTTP Request Fails]

    A --> B{Error Type}

    B --> C[Retryable Error]
    B --> D[Non-Retryable Error]
    B --> E[Authentication Error]

    C --> C1[FetchTimeoutError]
    C --> C2[PkgseerTransportError]
    C --> C3[HTTP 429]
    C --> C4[HTTP 5xx]

    D --> D1[HTTP 4xx except 429]
    D --> D2[ValidationError]
    D --> D3[AccessDenied]

    C1 --> F[RETRY ✅]
    C2 --> F
    C3 --> F
    C4 --> F

    D1 --> G[NO RETRY ❌]
    D2 --> G
    D3 --> G

    E --> H[Token Refresh 🔄]

    F --> I{Attempts < maxRetries?}

    I -->|Yes| J[Wait: baseDelay × 2^attempt × jitter]
    I -->|No| K[Throw Last Error]

    J --> A

    style F fill:#14532d,stroke:#22c55e,color:#fff
    style G fill:#451a1a,stroke:#ef4444,color:#fff
    style H fill:#172554,stroke:#3b82f6,color:#fff
    style K fill:#451a1a,stroke:#ef4444,color:#fff
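
The decision tree above can be expressed as a small classifier. The sketch below is an assumption-laden illustration, not the PR's isRetryableError(): the ClassifiableError shape and the classifyError name are hypothetical, and matching on error.name is just one plausible way to recognise the error classes named in the diagram.

```typescript
// Hypothetical error shape for illustration; the real error classes may
// carry this information differently.
interface ClassifiableError {
  name: string;
  status?: number; // HTTP status, when the failure came from a response
}

type RetryDecision = "retry" | "no-retry" | "token-refresh";

function classifyError(error: ClassifiableError): RetryDecision {
  // Authentication failures go to the token refresh path, not the retry loop.
  if (error.name === "AuthenticationError" || error.status === 401) {
    return "token-refresh";
  }
  // Transport-level failures: timeouts and connection problems.
  if (error.name === "FetchTimeoutError" || error.name === "PkgseerTransportError") {
    return "retry";
  }
  // Rate limiting and temporary server errors.
  if (error.status === 429) return "retry";
  if (error.status !== undefined && error.status >= 500 && error.status <= 599) {
    return "retry";
  }
  // Everything else (other 4xx, ValidationError, AccessDenied) is not retried.
  return "no-retry";
}
```

Defaulting to "no-retry" for anything unrecognised keeps the loop conservative: an unknown error is surfaced immediately rather than repeated.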

Retry timing

Attempt 0: Immediate
Attempt 1: ~1000ms (base × 2^0 × jitter)
Attempt 2: ~2000ms (base × 2^1 × jitter)
Attempt 3: ~4000ms (base × 2^2 × jitter)
Max cap:   30000ms

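Jitter multiplies each delay by a random factor in [0.5, 1.0): delay × (0.5 + Math.random() × 0.5). With jitter enabled, the values above are therefore upper bounds, and the spread prevents a thundering herd.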
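
To make the arithmetic concrete, here is a minimal sketch of the delay calculation. It is not the PR's calculateDelay(): the name backoffDelayMs, the 1-based retryNumber parameter, the injectable random source, and the choice to apply the cap before jitter are all assumptions made for illustration.

```typescript
// Delay before retry number `retryNumber` (1-based, matching "Attempt 1..3").
function backoffDelayMs(
  retryNumber: number,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
  jitter = true,
  random: () => number = Math.random, // injectable so tests are deterministic
): number {
  // base × 2^(n-1): 1000, 2000, 4000, ...
  const exponential = baseDelayMs * 2 ** (retryNumber - 1);
  // Assumption: the cap is applied before jitter, so jitter never exceeds it.
  const capped = Math.min(exponential, maxDelayMs);
  // Jitter factor in [0.5, 1.0), as described above.
  return jitter ? capped * (0.5 + random() * 0.5) : capped;
}
```

With the defaults and jitter disabled, retries 1 to 3 wait 1000, 2000, and 4000 ms. With jitter enabled, each of those becomes a value between half the nominal delay and just under the full delay.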


Idempotency

| Request Type | Idempotent? | Retried? | Reason |
| --- | --- | --- | --- |
| GraphQL queries (search, pkg_info, etc.) | Yes | Yes | Same query = same result |
| REST GET (languages) | Yes | Yes | Read-only |
| REST POST /search | Yes | Yes | Search is idempotent |
| REST POST /feedbacks | No | No | Would create duplicate submissions |

The submitFeedback() method in githits-service.ts is intentionally not retried.
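
One way to picture the idempotency gate is as a lookup that zeroes out the retry budget for unsafe requests. This is a hypothetical sketch: the RequestKind labels and the maxRetriesFor helper do not appear in the PR, which instead has submitFeedback() skip the retry wrapper directly.

```typescript
// Hypothetical labels for the four request types in the table above.
type RequestKind = "graphql-query" | "rest-get" | "rest-search" | "rest-feedback";

// Whether repeating the request is safe.
const RETRY_SAFE: Record<RequestKind, boolean> = {
  "graphql-query": true, // same query, same result
  "rest-get": true, // read-only
  "rest-search": true, // POST, but search has no side effects
  "rest-feedback": false, // a retry could create a duplicate submission
};

// Effective retry budget: the configured value for safe requests, 0 otherwise.
function maxRetriesFor(kind: RequestKind, configuredMaxRetries: number): number {
  return RETRY_SAFE[kind] ? configuredMaxRetries : 0;
}
```

Keying the decision on the request type rather than the HTTP method matters here, because POST /search is retried while POST /feedbacks is not.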


Security properties

| Concern | How it's handled |
| --- | --- |
| Retry bomb on non-idempotent endpoints | submitFeedback() explicitly skips retry |
| Infinite retry loops | Hard cap at maxRetries (default 3) |
| Thundering herd on recovered servers | Jitter randomizes retry timing |
| Excessive wait times | Delay capped at maxDelayMs (default 30s) |
| Auth errors retried unnecessarily | AuthenticationError classified as non-retryable (handled by token refresh) |
| Client errors retried | HTTP 4xx (except 429) classified as non-retryable |
| Sensitive data in telemetry | Only error name logged, not message body |

MCP error envelope

When a request fails after all retry attempts, the MCP error envelope includes retry metadata:

{
  "error": "Code navigation request timed out.",
  "code": "TIMEOUT",
  "retryable": true,
  "retryAttempts": 3,
  "retryAfter": 5000
}
  • retryAttempts — Number of attempts made (0 if no retry)
  • retryAfter — Suggested delay in milliseconds before retrying (from Retry-After header)
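
Since the Retry-After header carries either a number of seconds or an HTTP date, while retryAfter is in milliseconds, some conversion is needed. The sketch below shows one way to do it; the function name retryAfterToMs is hypothetical and the PR does not show how it actually derives the field.

```typescript
// Convert a Retry-After header value to milliseconds.
// Returns undefined when the header is absent or unparseable.
function retryAfterToMs(
  headerValue: string | null,
  nowMs: number = Date.now(), // injectable so tests are deterministic
): number | undefined {
  if (headerValue === null) return undefined;
  const trimmed = headerValue.trim();
  // Form 1: delay in whole seconds, e.g. "5".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means the caller may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

A header value of "5" maps to 5000, matching the retryAfter value in the example envelope above.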

Configuration

Environment variables

# Increase retry attempts for unreliable networks
export GITHITS_RETRY_MAX=5

# Faster retries for low-latency requirements
export GITHITS_RETRY_BASE_DELAY_MS=500

# Disable jitter (not recommended)
export GITHITS_RETRY_JITTER=false

# Disable retry entirely
export GITHITS_RETRY_MAX=0

Service-level override

const service = new CodeNavigationServiceImpl(
  endpointUrl,
  tokenProvider,
  fetchFn,
  runtime,
  { maxRetries: 5, baseDelayMs: 2000, maxDelayMs: 60000, jitter: true },
);

Testing checklist

Unit tests (29 total, all passing)

# Run all retry-related tests
bun test packages/core-internal/src/shared/retry.test.ts
bun test packages/core-internal/src/shared/fetch-timeout.test.ts

| Test file | Tests | Coverage |
| --- | --- | --- |
| retry.test.ts | 21 | isRetryableError, calculateDelay, retryWithBackoff (success, retry, max retries, non-retryable, abort, onRetry callback) |
| fetch-timeout.test.ts | 8 | retryFetchWithTimeout (first attempt, retry on error, max retries, non-retryable, abort signal) |

Type checking

bun run typecheck  # 0 errors

Backward compatibility

  • All existing callers compile without changes: the retry parameters are optional constructor arguments. Default behavior does change, from a single attempt to up to 3 retries; set GITHITS_RETRY_MAX=0 to restore single-attempt behavior
  • submitFeedback() still does NOT retry
  • Token refresh still handles 401 separately (not affected by retry logic)
  • No breaking changes to public API surface

@codeW-Krish codeW-Krish changed the title Feat/add retry logic feat/add retry logic Jun 20, 2026
@codeW-Krish codeW-Krish changed the title feat/add retry logic feat(core-internal): Add configurable retry logic with exponential backoff Jun 20, 2026

Development

Successfully merging this pull request may close these issues.

No retry mechanism for transient HTTP failures
