RFC 0010: Add failure dispatch and metrics to flux-dispatch #3220
Updates RFC 0010 to support dispatching on reconciliation failures (triggering incident workflows, rollback automation) and clarifies that the service will expose Prometheus metrics directly rather than relying solely on Flux's built-in observability. Failure dispatch allows products to respond automatically to deployment errors through GitHub Actions, while integrated metrics provide visibility into dispatch success rates, deduplication hits, and GitHub API errors that Flux cannot observe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rk policies

Address review findings: correct eventSeverity semantics (info forwards all events, not just successes), add HMAC webhook verification from day one via the generic-hmac provider, add dispatch_repo format validation with an Altinn/ org restriction, add request body size limits and HTTP server timeouts, cap the dedup map at 10k entries, include dispatch_repo in the dedup key, document the repository_dispatch default-branch constraint, and add NetworkPolicies following existing operator patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
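To make two of the review findings above concrete, here is a minimal Go sketch of HMAC verification and dispatch_repo validation. It is not the actual flux-dispatch code. It assumes the signature arrives as a `sha256=<hex>` value (the format Flux's generic-hmac provider is documented to put in its `X-Signature` header, as far as I understand it), and the function names `signBody`, `verifySignature`, and `validDispatchRepo`, as well as the allowed repository-name characters, are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// repoPattern accepts only "Altinn/<name>". The allowed name characters
// and the 100-character limit are assumptions for this sketch.
var repoPattern = regexp.MustCompile(`^Altinn/[A-Za-z0-9._-]{1,100}$`)

// validDispatchRepo reports whether repo is an owner/name pair in the Altinn org.
func validDispatchRepo(repo string) bool {
	if !repoPattern.MatchString(repo) {
		return false
	}
	name := strings.TrimPrefix(repo, "Altinn/")
	return name != "." && name != ".."
}

// signBody returns the header value a sender would attach:
// "sha256=" followed by the hex-encoded HMAC-SHA256 of body.
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature recomputes the HMAC of body and compares it with the
// value in header using a constant-time comparison.
func verifySignature(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, "sha256=") {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, "sha256="))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"reason":"ReconciliationFailed"}`)
	header := signBody(secret, body)

	fmt.Println("signature ok:", verifySignature(secret, body, header))
	fmt.Println("repo ok:", validDispatchRepo("Altinn/altinn-studio"))
	fmt.Println("foreign repo ok:", validDispatchRepo("other-org/repo"))
}
```

In a real handler the body would be read through a size-limited reader before verification, so the signature check and the body size limit apply to the same bytes.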
Looks good in general, but I'm skeptical of just reusing …
For prior art: there's the #studio-deploy channel, which reuses the notification controller
Summary
Expands RFC 0010 (flux-reconcile-webhooks) to support failure-driven automation and clarifies observability strategy.
- […] `ReconciliationFailed`, `ValidationFailed`, `DependencyNotReady`, and `ArtifactFailed` reasons, allowing products to trigger incident workflows or rollback automation via GitHub Actions when deployments fail.
- […] `reason`, ensuring that a failure and a success for the same digest both trigger dispatch, while repeated failures are still deduplicated.

Products configure failure responses by adding a second Flux Alert with `eventSeverity: error` and a distinct `dispatch_event`, enabling flexible workflows for both success (e2e tests) and failure (incident response) scenarios.

🤖 Generated with Claude Code
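The deduplication behaviour described in the summary could look roughly like the Go sketch below. It is an illustration under stated assumptions, not the RFC's implementation: the `dedupKey` fields, the `dedupSet` type, and the oldest-first eviction policy are all hypothetical. The source only states that the key covers dispatch_repo and reason alongside the digest, and that the map is capped at 10k entries.

```go
package main

import "fmt"

// dedupKey identifies one dispatch. Including Reason means a failure and a
// success for the same digest are distinct keys. Field names are hypothetical.
type dedupKey struct {
	Repo   string
	Digest string
	Reason string
}

// dedupSet remembers at most max keys. Evicting the oldest key when full
// is an assumption; the RFC text only says the map is capped.
type dedupSet struct {
	max   int
	seen  map[dedupKey]struct{}
	order []dedupKey // insertion order, oldest first
}

func newDedupSet(max int) *dedupSet {
	if max < 1 {
		max = 1
	}
	return &dedupSet{max: max, seen: make(map[dedupKey]struct{})}
}

// shouldDispatch returns true the first time a key is seen and records it.
// Later calls with the same key return false until the key is evicted.
func (d *dedupSet) shouldDispatch(k dedupKey) bool {
	if _, ok := d.seen[k]; ok {
		return false
	}
	if len(d.order) >= d.max {
		oldest := d.order[0]
		d.order = d.order[1:]
		delete(d.seen, oldest)
	}
	d.seen[k] = struct{}{}
	d.order = append(d.order, k)
	return true
}

func main() {
	d := newDedupSet(10000)
	failure := dedupKey{Repo: "Altinn/example", Digest: "sha256:abc", Reason: "ReconciliationFailed"}
	success := dedupKey{Repo: "Altinn/example", Digest: "sha256:abc", Reason: "ReconciliationSucceeded"}

	fmt.Println("first failure:", d.shouldDispatch(failure))
	fmt.Println("repeated failure:", d.shouldDispatch(failure))
	fmt.Println("success, same digest:", d.shouldDispatch(success))
}
```

A time-based expiry may suit production better than pure size-based eviction, since an evicted key will dispatch again on its next event.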