RFC 0010: Add failure dispatch and metrics to flux-dispatch #3220

Open
arealmaas wants to merge 2 commits into main from arealmaas/flux-dispatch-rfc

@arealmaas
Contributor

Summary

Expands RFC 0010 (flux-reconcile-webhooks) to support failure-driven automation and clarifies the observability strategy.

  • Failure dispatch support: Service now handles ReconciliationFailed, ValidationFailed, DependencyNotReady, and ArtifactFailed reasons, allowing products to trigger incident workflows or rollback automation via GitHub Actions when deployments fail.
  • Prometheus metrics: Service exposes dedicated metrics on port 9090 (dispatch counts, error rates, dedup hits, auth failures). These are built into the service rather than provided by a separate sidecar, because Flux's built-in metrics have no visibility into outbound webhook delivery or deduplication.
  • Dedup refinement: Dedup key now includes the reconciliation reason, so a failure and a success for the same digest both trigger a dispatch, while repeated failures are still deduplicated.
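
To illustrate the dedup refinement, here is a minimal Python sketch, not the service's actual implementation. The class and method names (`DedupKey`, `Deduplicator`, `should_dispatch`) are hypothetical, and the success reason `ReconciliationSucceeded` is assumed for the example; only the four failure reasons come from the RFC summary.

```python
from dataclasses import dataclass

# Failure reasons listed in the RFC summary.
FAILURE_REASONS = frozenset(
    {"ReconciliationFailed", "ValidationFailed", "DependencyNotReady", "ArtifactFailed"}
)


@dataclass(frozen=True)
class DedupKey:
    """Hypothetical key shape: the artifact digest plus the reconciliation reason."""

    digest: str
    reason: str


class Deduplicator:
    """Remembers (digest, reason) pairs that have already been dispatched."""

    def __init__(self) -> None:
        self._seen: set[DedupKey] = set()

    def should_dispatch(self, digest: str, reason: str) -> bool:
        key = DedupKey(digest, reason)
        if key in self._seen:
            return False  # same digest and same reason: suppress the repeat
        self._seen.add(key)
        return True


dedup = Deduplicator()
first_failure = dedup.should_dispatch("sha256:abc", "ReconciliationFailed")
repeat_failure = dedup.should_dispatch("sha256:abc", "ReconciliationFailed")
later_success = dedup.should_dispatch("sha256:abc", "ReconciliationSucceeded")
```

Because the reason is part of the key, the later success for the same digest is not suppressed by the earlier failure, while the repeated failure is.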

Products configure failure responses by adding a second Flux Alert with eventSeverity: error and a distinct dispatch_event, enabling flexible workflows for both success (e2e tests) and failure (incident response) scenarios.
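
As a rough illustration of what a distinct `dispatch_event` could translate to on the GitHub side, the sketch below builds a request for GitHub's repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`, with `event_type` and `client_payload` in the body). The function name, the `client_payload` fields, and the example event and repository names are assumptions made for illustration; the RFC text here does not specify them.

```python
import json


def build_dispatch_request(
    dispatch_repo: str, dispatch_event: str, reason: str, digest: str
) -> tuple[str, str]:
    """Return the URL and JSON body for a repository_dispatch call.

    dispatch_repo is "owner/name"; the client_payload fields are hypothetical.
    """
    url = f"https://api.github.com/repos/{dispatch_repo}/dispatches"
    body = {
        "event_type": dispatch_event,
        "client_payload": {"reason": reason, "digest": digest},
    }
    return url, json.dumps(body, sort_keys=True)


# Hypothetical event names: one for the success Alert, one for the failure Alert.
success_url, success_body = build_dispatch_request(
    "Altinn/example-app", "deploy-succeeded", "ReconciliationSucceeded", "sha256:abc"
)
failure_url, failure_body = build_dispatch_request(
    "Altinn/example-app", "deploy-failed", "ReconciliationFailed", "sha256:abc"
)
```

A workflow in the target repository could then filter on `repository_dispatch` with `types: [deploy-failed]` to run incident handling separately from the e2e tests triggered on success.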

🤖 Generated with Claude Code

Updates RFC 0010 to support dispatching on reconciliation failures (triggering incident workflows, rollback automation) and clarifies that the service will expose Prometheus metrics directly rather than relying solely on Flux's built-in observability. Failure dispatch allows products to respond automatically to deployment errors through GitHub Actions, while integrated metrics provide visibility into dispatch success rates, deduplication hits, and GitHub API errors that Flux cannot observe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@arealmaas arealmaas requested a review from a team as a code owner March 6, 2026 10:33
@coderabbitai
Contributor

coderabbitai Bot commented Mar 6, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • rfcs/0010-flux-reconcile-webhooks.md is excluded by !rfcs/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21085ab0-16a2-426d-93b4-e69045fcb622

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…rk policies

Address review findings: correct eventSeverity semantics (info forwards
all events, not just successes), add HMAC webhook verification from day
one via generic-hmac provider, add dispatch_repo format validation with
Altinn/ org restriction, add request body size limits and HTTP server
timeouts, cap dedup map at 10k entries, include dispatch_repo in dedup
key, document repository_dispatch default-branch constraint, and add
NetworkPolicies following existing operator patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
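
Several of the hardening measures in this commit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's code: it assumes the generic-hmac provider sends the signature as `sha256=<hex digest>` of the request body, that the Altinn/ restriction is a simple pattern match on `owner/name`, and that the capped dedup map evicts its oldest entry first (the commit message only states the 10k cap). All function and class names are hypothetical.

```python
import hashlib
import hmac
import re
from collections import OrderedDict

MAX_DEDUP_ENTRIES = 10_000  # cap stated in the commit message
REPO_PATTERN = re.compile(r"Altinn/([A-Za-z0-9._-]+)")  # assumed validation rule


def verify_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Check a 'sha256=<hex>' signature header against the raw request body."""
    algo, sep, sent = header.partition("=")
    if algo != "sha256" or not sep:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sent.encode())


def is_valid_dispatch_repo(repo: str) -> bool:
    """Accept only 'Altinn/<name>' with a conservative character set."""
    match = REPO_PATTERN.fullmatch(repo)
    return match is not None and match.group(1) not in {".", ".."}


class BoundedDedup:
    """Dedup map keyed on (dispatch_repo, digest, reason) with a size cap."""

    def __init__(self, max_entries: int = MAX_DEDUP_ENTRIES) -> None:
        self._max = max_entries
        self._seen: OrderedDict[tuple[str, str, str], None] = OrderedDict()

    def should_dispatch(self, dispatch_repo: str, digest: str, reason: str) -> bool:
        key = (dispatch_repo, digest, reason)
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # assumed policy: drop the oldest entry
        return True
```

Including `dispatch_repo` in the key means two repositories that subscribe to the same digest each receive their own dispatch. The eviction policy matters for correctness only at the margin: an evicted key can be dispatched again, which trades a rare duplicate for bounded memory.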
@sduranc
Collaborator

sduranc commented Mar 16, 2026

Looks good in general, but I'm skeptical of just reusing notification.toolkit.fluxcd.io/v1beta3, mainly because it's still limited and we could expose a much more useful API for our purposes.

@sduranc
Collaborator

sduranc commented Mar 16, 2026

For prior art: there's the #studio-deploy channel, which reuses the notification controller.
