Skip to content

fix(worker): restrict Azure lease selectors#790

Merged
steipete merged 1 commit into
mainfrom
codex/issue785-azure-selector-auth
Jul 2, 2026
Merged

fix(worker): restrict Azure lease selectors#790
steipete merged 1 commit into
mainfrom
codex/issue785-azure-selector-auth

Conversation

@steipete

@steipete steipete commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • require admin authentication for brokered Azure image and OS-disk selectors
  • preserve non-admin Azure location selection for capacity routing
  • keep coordinator-managed defaults and direct-mode overrides unchanged
  • document the authorization boundary and add the Unreleased changelog entry

Closes #785

Verification

  • npm test --prefix worker -- test/fleet.test.ts (343 passed)
  • npm run format:check --prefix worker
  • npm run lint --prefix worker
  • npm run check --prefix worker
  • autoreview: clean, no accepted/actionable findings

The regression tests prove denial occurs before provider fetch, storage writes, or alarm scheduling; live Azure mutation cannot strengthen that preflight authorization proof.

@clawsweeper

clawsweeper Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed July 2, 2026, 4:46 PM ET / 20:46 UTC.

Summary
The PR adds an Azure provider restriction hook for brokered azureImage and azureOSDisk, extends fleet tests, and updates Azure provider docs plus the changelog.

Reproducibility: yes. Source inspection on current main shows non-admin broker requests can supply azureImage and azureOSDisk because Azure lacks the provider restriction hook; I did not run a live Azure broker request.

Review metrics: 2 noteworthy metrics.

  • Restricted Azure Selectors: 2 added. The authorization change is narrowly scoped to image and OS-disk selectors while leaving Azure location unchanged.
  • Changed Surface: 4 files affected. The PR touches Worker code, Worker tests, Azure docs, and one release-owned changelog file.

Root-cause cluster
Relationship: fixed_by_candidate
Canonical: #785
Summary: This PR is the candidate fix for the linked Azure selector authorization issue.

Members:

Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal or live Worker output showing non-admin azureImage and azureOSDisk requests return 403 admin_required while azureLocation still follows the expected non-admin path.
  • Remove the CHANGELOG.md edit and keep release-note context in the PR body or commit message.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body lists tests, lint, typecheck, and autoreview only; it needs redacted live or terminal output showing the after-fix broker response before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Existing non-admin broker clients that set azureImage or azureOSDisk will start receiving 403 admin_required; that is likely the right security posture, but it is an intentional compatibility break.
  • [P1] The PR proof is mock/test-only; this external PR still needs redacted terminal or live Worker output showing the after-fix broker response before merge.
  • [P1] The branch edits CHANGELOG.md, which is release-owned in this review workflow and should be moved to PR-body or commit-message release-note context.

Maintainer options:

  1. Accept The Fail-Closed Security Break (recommended)
    Maintainers can intentionally land the 403 admin_required behavior for non-admin Azure image and OS-disk overrides because those selectors cross the coordinator/provider authority boundary.
  2. Add An Upgrade Grace Path
    If existing non-admin Azure custom-image workflows must keep working temporarily, define an explicit migration or opt-in admin path before merge.
  3. Pause For Selector Policy
    If azureImage or azureOSDisk should remain user-selectable, pause this PR and reopen the authorization policy on the linked security issue.

Next step before merge

  • [P1] Needs human merge handling for a security-boundary compatibility break, missing real behavior proof, and the release-owned changelog edit; there is no safe automated repair path while proof is absent.

Security
Cleared: Cleared: the diff narrows the Azure broker authorization boundary and does not add dependencies, workflow code, secret handling, or supply-chain surface.

Review findings

  • [P3] Remove the release-owned changelog edit — CHANGELOG.md:5-7
Review details

Best possible solution:

Land the Azure provider-hook fix after maintainer approval of the fail-closed selector policy, remove the release-owned changelog edit, and add redacted real behavior proof for the broker response.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows non-admin broker requests can supply azureImage and azureOSDisk because Azure lacks the provider restriction hook; I did not run a live Azure broker request.

Is this the best way to solve the issue?

Yes for azureImage and azureOSDisk: adding the Azure hook is the narrow provider-adapter fix and the tests cover denial, admin bypass, defaults, and location preservation. The remaining merge decision is whether maintainers accept the compatibility break and require real proof before landing.

Full review comments:

  • [P3] Remove the release-owned changelog edit — CHANGELOG.md:5-7
    This PR adds an Unreleased entry, but CHANGELOG.md is release-owned for normal PRs in this review workflow. Please keep the release-note context in the PR body or commit message so release prep owns the changelog update.
    Confidence: 0.84

Overall correctness: patch is correct
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against ab1c29f3fd40.

Label changes

Label changes:

  • add P0: This PR addresses a reported non-admin authorization bypass over privileged Azure provider-resource selection.
  • add merge-risk: 🚨 compatibility: Merging intentionally changes existing non-admin broker requests with azureImage or azureOSDisk from accepted to 403 admin_required.
  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists tests, lint, typecheck, and autoreview only; it needs redacted live or terminal output showing the after-fix broker response before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P0: This PR addresses a reported non-admin authorization bypass over privileged Azure provider-resource selection.
  • merge-risk: 🚨 compatibility: Merging intentionally changes existing non-admin broker requests with azureImage or azureOSDisk from accepted to 403 admin_required.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists tests, lint, typecheck, and autoreview only; it needs redacted live or terminal output showing the after-fix broker response before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read fully; its provider-neutral boundary guidance applies because the PR keeps Azure-specific selector policy inside the Azure provider adapter instead of adding provider-name branching in core. (AGENTS.md:13, ab1c29f3fd40)
  • Current main non-admin selector gate: createLease calls the selected provider's optional restrictedLeaseRequestFields hook before provisioning and returns 403 admin_required when the hook reports caller-supplied restricted fields. (worker/src/fleet.ts:1705, ab1c29f3fd40)
  • Azure gap on current main: Current AzureProvider has no restriction hook, while nearby GCP and AWS providers implement the hook for protected broker selectors. (worker/src/fleet.ts:15825, ab1c29f3fd40)
  • Selector dataflow on current main: leaseConfig preserves input.azureImage and input.azureOSDisk, and Azure VM creation consumes them through imageForConfig, OS-disk validation, and VM-size candidate selection. (worker/src/config.ts:133, ab1c29f3fd40)
  • PR implementation and regression coverage: The PR adds AzureProvider.restrictedLeaseRequestFields for azureImage and azureOSDisk, adds those cases to the restricted selector table, preserves coordinator defaults, and covers non-admin azureLocation selection. (worker/src/fleet.ts:15840, 58368a77fefd)
  • Linked canonical security issue: The PR body closes [security] Azure broker lease selectors bypass non-admin cloud-selector restrictions #785, whose report documents the same Azure non-admin selector bypass and remains open until a fix merges.

Likely related people:

  • steipete: Authored the broker cloud-selector restriction hook and AWS/GCP selector tests/docs that this PR extends to Azure. (role: recent security gate contributor; confidence: high; commits: 1422d052da07, 8575bc26368f, ef898c27570a; files: worker/src/fleet.ts, worker/test/fleet.test.ts, docs/providers/aws.md)
  • jwmoss: Introduced Azure broker/provider support and later Azure OS-disk behavior that is directly affected by the restricted azureOSDisk selector. (role: Azure provider feature contributor; confidence: high; commits: 00725544c767, 562dba5ca4f6, 389c696cb578; files: worker/src/azure.ts, worker/src/config.ts, worker/src/fleet.ts)
  • vincentkoc: Recently changed Azure image defaults, Windows ARM64 image handling, and related worker/docs surfaces touched by this authorization boundary. (role: recent Azure area contributor; confidence: medium; commits: 2c89d8345933, 0301dbad4ae5; files: worker/src/config.ts, worker/src/fleet.ts, docs/providers/azure.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58368a77fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread worker/src/fleet.ts
}

restrictedLeaseRequestFields(input: LeaseRequest): string[] {
return [input.azureImage ? "azureImage" : "", input.azureOSDisk ? "azureOSDisk" : ""].filter(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Detect empty Azure OS-disk overrides

When the coordinator default is CRABBOX_AZURE_OS_DISK=ephemeral, a non-admin broker caller can send "azureOSDisk":""; this truthiness check reports no restricted fields, but leaseConfig still treats the present empty string as the selected value and normalizes it to managed, so the caller silently overrides the operator-managed OS-disk policy without admin auth. Check field presence (or trim/nullish semantics) rather than truthiness for azureOSDisk.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P0 Emergency: data loss, security bypass, crash loop, or unusable core runtime. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels Jul 2, 2026
@steipete steipete merged commit cf2ace0 into main Jul 2, 2026
11 checks passed
@steipete steipete deleted the codex/issue785-azure-selector-auth branch July 2, 2026 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P0 Emergency: data loss, security bypass, crash loop, or unusable core runtime. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[security] Azure broker lease selectors bypass non-admin cloud-selector restrictions

1 participant