Email retry has no exponential backoff or jitter — same job hammered repeatedly #927

@hman38705

Labels: reliability, email

Priority: Medium

Description

src/email/queue.rs retries failed email jobs with no delay between attempts. During a sustained SendGrid outage, the worker hammers the API as fast as it can, wasting resources and adding load to an already degraded service.

Acceptance Criteria

  • Implement exponential backoff: delay = min(initial_delay * 2^attempt, max_delay) + random jitter
  • Store next_attempt_at in the Redis job value and only dequeue jobs whose next_attempt_at <= now()
  • Add configuration variables: EMAIL_RETRY_INITIAL_DELAY_SECS (default 30), EMAIL_RETRY_MAX_DELAY_SECS (default 3600)
  • Add a test verifying retry delays increase exponentially across three attempts
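
The delay calculation and the dequeue check from the criteria above could look roughly like the sketch below. It is not taken from src/email/queue.rs. `RetryConfig`, `base_delay_secs`, `retry_delay`, and `is_due` are hypothetical names. The sketch assumes `attempt` is zero-based, that jitter is capped at 25% of the base delay, and that `next_attempt_at` is a Unix timestamp in seconds. The random value is passed in as `unit` so the function stays deterministic in tests; in the worker it would come from whatever RNG the project already uses.

```rust
use std::time::Duration;

/// Hypothetical config holder; the env variable names and defaults
/// are the ones proposed in the acceptance criteria.
#[derive(Debug, Clone, Copy)]
pub struct RetryConfig {
    pub initial_delay_secs: u64,
    pub max_delay_secs: u64,
}

impl RetryConfig {
    /// Reads the two env variables, falling back to 30 and 3600
    /// when a variable is unset or not a valid integer.
    pub fn from_env() -> Self {
        let read = |key: &str, default: u64| {
            std::env::var(key)
                .ok()
                .and_then(|v| v.parse::<u64>().ok())
                .unwrap_or(default)
        };
        Self {
            initial_delay_secs: read("EMAIL_RETRY_INITIAL_DELAY_SECS", 30),
            max_delay_secs: read("EMAIL_RETRY_MAX_DELAY_SECS", 3600),
        }
    }
}

/// min(initial_delay * 2^attempt, max_delay), with `attempt` zero-based
/// (assumption). Saturating arithmetic avoids overflow on large attempts.
pub fn base_delay_secs(cfg: &RetryConfig, attempt: u32) -> u64 {
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    cfg.initial_delay_secs
        .saturating_mul(factor)
        .min(cfg.max_delay_secs)
}

/// Base delay plus jitter. `unit` is a caller-supplied random value in
/// [0, 1]; the 25% jitter cap is an assumption, not part of the issue.
pub fn retry_delay(cfg: &RetryConfig, attempt: u32, unit: f64) -> Duration {
    let base = base_delay_secs(cfg, attempt);
    let jitter = (base as f64 * 0.25 * unit.clamp(0.0, 1.0)) as u64;
    Duration::from_secs(base.saturating_add(jitter))
}

/// A job is eligible for dequeue once next_attempt_at <= now
/// (both assumed to be Unix timestamps in seconds).
pub fn is_due(next_attempt_at: u64, now: u64) -> bool {
    next_attempt_at <= now
}
```

Because jitter is added after the `min`, a delay can exceed `max_delay` by up to the jitter amount, which matches the formula as written in the first criterion. A test for the fourth criterion can call `retry_delay` with `unit` set to `0.0` so the doubling across attempts is exact.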

Metadata

Labels

  • Stellar Wave: Issues in the Stellar wave program
  • email: Email service and queue
  • reliability: Resilience, recovery, and uptime
