feat(vast): add Vast.ai SSH lease provider #680
Add the first-party Vast provider configuration surface with env-only API-key sourcing, safe config output, provider flags, built-in registration, and offline placeholder backend contracts for later lifecycle plans. Add focused tests for Vast config precedence, credential destination provenance, provider discovery, flag registration, validation, and secret redaction.
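Here is a minimal Go sketch of env-only key sourcing with provenance and redacted output. The helper names are hypothetical. Checking `CRABBOX_VAST_API_KEY` before `VAST_API_KEY` is an assumed precedence; both variable names come from the smoke-test commands in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// vastKeyEnvVars lists the environment variables consulted, in an ASSUMED
// precedence order. No config-file or flag source is read for the key.
var vastKeyEnvVars = []string{"CRABBOX_VAST_API_KEY", "VAST_API_KEY"}

// resolveVastAPIKey (hypothetical) returns the first non-empty key plus the
// name of the variable it came from, so callers can report provenance.
func resolveVastAPIKey(lookup func(string) (string, bool)) (key, source string, err error) {
	for _, name := range vastKeyEnvVars {
		if v, ok := lookup(name); ok && v != "" {
			return v, name, nil
		}
	}
	return "", "", errors.New("vast: API key missing")
}

// redactSecret renders a secret for config output without revealing it.
func redactSecret(v string) string {
	if v == "" {
		return "<unset>"
	}
	return "<redacted>"
}

func main() {
	key, source, err := resolveVastAPIKey(os.LookupEnv)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("api_key=%s source=%s\n", redactSecret(key), source)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, lets tests exercise precedence without mutating the process environment.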
Add the provider-local Vast.ai HTTP client, request and response models, payload builders, redacted API error handling, and compact ownership label helpers for the upcoming lifecycle implementation. Cover the client contract with fake HTTP tests for auth, redirects, offer search, instance creation and management, SSH-key methods, response decoding, redaction, and label ownership safety.
Add the Vast SSH-lease backend for acquire, resolve, list, release, touch, doctor, and cleanup using generated per-lease keys and strict cbx1 ownership labels. Cleanup is guarded by local claims plus provider labels, release destroys by default, and lifecycle tests use a fake Vast API/readiness path without live credentials.
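The cleanup guard can be sketched in Go as requiring two independent signals: a strictly parsed ownership label and a matching local claim. The `cbx1:<owner>:<lease-id>` layout is an assumption for illustration only; the PR says only that labels are compact and prefixed `cbx1`.

```go
package main

import (
	"fmt"
	"strings"
)

// ownerLabel models a HYPOTHETICAL label layout "cbx1:<owner>:<lease-id>";
// the real encoding may differ.
type ownerLabel struct {
	Owner   string
	LeaseID string
}

// parseOwnerLabel is strict: anything that is not exactly three non-empty
// colon-separated fields with the "cbx1" prefix is treated as not ours.
func parseOwnerLabel(s string) (ownerLabel, bool) {
	parts := strings.Split(s, ":")
	if len(parts) != 3 || parts[0] != "cbx1" || parts[1] == "" || parts[2] == "" {
		return ownerLabel{}, false
	}
	return ownerLabel{Owner: parts[1], LeaseID: parts[2]}, true
}

// mayCleanup requires both a provider label owned by us and a local claim
// recorded for the same lease ID before an instance can be destroyed.
func mayCleanup(label, owner string, claims map[string]bool) bool {
	l, ok := parseOwnerLabel(label)
	return ok && l.Owner == owner && claims[l.LeaseID]
}

func main() {
	claims := map[string]bool{"lease-1": true}
	fmt.Println(mayCleanup("cbx1:alice:lease-1", "alice", claims))       // true
	fmt.Println(mayCleanup("cbx1:alice:lease-2", "alice", claims))       // false: no claim
	fmt.Println(mayCleanup("cbx1:alice:lease-1:extra", "alice", claims)) // false: malformed
}
```

Failing closed on malformed labels means a label collision or a future format change results in a skipped instance rather than a destroyed one.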
Use fileURLToPath for the Cloudflare Workers mock alias so Vitest resolves the local mock correctly from repository paths that contain spaces.
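The underlying issue is that Node's `URL.pathname` keeps spaces percent-encoded (`%20`), whereas `fileURLToPath` decodes them into a real filesystem path. The Go analogue below shows the same distinction using `net/url`. It is an illustration only: `fileURLToPathSketch` is a hypothetical helper and does not handle Windows drive letters the way Node's function does.

```go
package main

import (
	"fmt"
	"net/url"
)

// fileURLToPathSketch returns the decoded path of a file: URL. u.Path is
// percent-decoded ("%20" becomes a space); u.EscapedPath() is not.
func fileURLToPathSketch(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "file" {
		return "", fmt.Errorf("not a file URL: %q", raw)
	}
	return u.Path, nil
}

func main() {
	raw := "file:///home/dev/my%20repo/worker/mock.ts"
	u, _ := url.Parse(raw)
	fmt.Println(u.EscapedPath()) // still encoded, like URL.pathname
	p, _ := fileURLToPathSketch(raw)
	fmt.Println(p) // decoded, usable as a filesystem path
}
```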
Respect the generic SSH user override when Vast provider defaults are applied, so provider-specific defaults cannot replace an operator-supplied SSH user. Add regression coverage for both the Vast backend config and embedded direct SSH backend config.
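The override rule amounts to "fill only empty fields." The Go sketch below applies it. The `sshConfig` type and the `"root"` placeholder default are hypothetical, not the provider's real values.

```go
package main

import "fmt"

// sshConfig is a hypothetical subset of the SSH lease configuration.
type sshConfig struct {
	User     string // generic SSH user; empty means "not supplied"
	Provider string
}

// vastDefaultUser is an ASSUMED placeholder, not the provider's real default.
const vastDefaultUser = "root"

// applyVastDefaults fills provider defaults only where the operator left a
// field empty, so an explicit generic SSH user always wins.
func applyVastDefaults(c sshConfig) sshConfig {
	if c.User == "" {
		c.User = vastDefaultUser
	}
	return c
}

func main() {
	fmt.Println(applyVastDefaults(sshConfig{Provider: "vast"}).User)               // root
	fmt.Println(applyVastDefaults(sshConfig{Provider: "vast", User: "ubuntu"}).User) // ubuntu
}
```

Routing `config show` through the same defaulting function keeps the displayed effective config aligned with what the backend actually uses.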
Keep cleanup enabled for possible partial Vast warmup failures, but do not let a pre-acquire not-found stop response convert a known capacity blocker into cleanup_failed. Add regression coverage for the no-lease cleanup path.
Keep config show from replacing an explicitly supplied generic SSH user with the Vast provider default. This keeps the displayed effective config aligned with the backend defaulting path.
Report the release action captured on the lease target instead of the current config, so stopped or kept Vast instances are not reported as destroyed. Also route cleanup of owned-looking instances without matching local claims through the safe missing-claim refusal path.
Give local fake-SSH probe tests the same headroom as other terminal-bound checks so transient scheduling delays do not fail unrelated provider verification.
Preserve persisted Vast release and key metadata when normal resolve refreshes local claim endpoints, so stop/keep leases do not drift back to destroy. Also allow status-only resolution for owned instances that do not yet expose an SSH endpoint.
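This Go sketch shows both behaviors under an assumed claim shape (`leaseClaim` and `resolveClaim` are hypothetical). The refreshed claim starts as a copy of the stored one instead of being rebuilt from config defaults. A missing endpoint yields a status-only result instead of an error.

```go
package main

import "fmt"

// leaseClaim is a hypothetical local claim record.
type leaseClaim struct {
	LeaseID       string
	Host          string
	Port          int
	ReleaseAction string // "destroy", "stop", or "keep"
	SSHKeyID      int    // provider-side key to detach on release
}

// resolveClaim refreshes only the endpoint fields; copying the stored claim
// preserves ReleaseAction and SSHKeyID. With no SSH endpoint yet, it returns
// the claim unchanged and reachable=false (status-only resolution).
func resolveClaim(stored leaseClaim, host string, port int) (leaseClaim, bool) {
	if host == "" || port == 0 {
		return stored, false
	}
	updated := stored
	updated.Host, updated.Port = host, port
	return updated, true
}

func main() {
	stored := leaseClaim{LeaseID: "lease-1", ReleaseAction: "stop", SSHKeyID: 42}
	c, reachable := resolveClaim(stored, "203.0.113.7", 41022)
	fmt.Println(c.ReleaseAction, c.SSHKeyID, reachable) // stop 42 true
}
```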
Detach provider-owned Vast SSH keys before destroying the instance so normal release cleanup follows the same effective ordering as rollback. Add an order assertion to prevent the detach call from becoming a suppressed post-destroy no-op.
Let an explicitly supplied Vast release action override the persisted claim label during release and reporting. This keeps safety overrides such as --vast-release-action keep from being ignored on leases acquired with the default destroy policy.
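The resulting precedence can be written as a single Go function. The function name is hypothetical. The three action values and the destroy default come from this PR. The same result would be used both to perform the release and to report it.

```go
package main

import "fmt"

// effectiveReleaseAction applies precedence: an explicit flag value (such as
// --vast-release-action keep) beats the action persisted on the lease claim,
// which beats the config default; with nothing set, destroy is used.
func effectiveReleaseAction(flagValue, claimValue, configDefault string) (string, error) {
	for _, v := range []string{flagValue, claimValue, configDefault} {
		switch v {
		case "":
			continue // not set at this level; fall through to the next
		case "destroy", "stop", "keep":
			return v, nil
		default:
			return "", fmt.Errorf("invalid release action %q", v)
		}
	}
	return "destroy", nil
}

func main() {
	action, _ := effectiveReleaseAction("keep", "destroy", "destroy")
	fmt.Println(action) // keep
}
```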
Give controller subprocess tests more non-race scheduling headroom after repeated local failures in terminal-bound process lifecycle checks. This keeps provider verification from failing on unrelated launch-gate timing.
Encode Vast search and create requests using the documented API shapes, restore stored SSH keys on resolve, and keep stopped lease claims so retained instances can be reconciled or destroyed later. Also widen a race-instrumented Apple VZ helper test timeout encountered during the final verification gate.
Codex review: needs real behavior proof before merge. Reviewed June 25, 2026, 1:11 AM ET / 05:11 UTC.
Summary
Reproducibility: not applicable; this is a feature PR rather than a reproducible failure in existing behavior. The relevant proof path is a successful real Vast provider lifecycle, and the PR body and comments currently show only static checks plus environment-blocked smoke cases. Review metrics: 1 noteworthy metric.
Proposal only: this assessment does not dispatch repairs, suppress jobs, mutate sibling items, close, or merge anything.
Merge readiness: the overall rating follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Best possible solution: land only after maintainers accept Vast as a built-in provider and the PR includes redacted proof of a successful live Vast lifecycle; otherwise keep #363 as the product-decision home.
Do we have a high-confidence way to reproduce the issue? Not applicable; this is a feature PR, so the relevant proof is a successful real Vast provider lifecycle rather than a reproduction.
Is this the best way to solve the issue? Unclear for merge as-is: the adapter shape matches Crabbox's SSH-lease contract and static review found no blocking line-level defect, but the best landing path still needs maintainer acceptance of the built-in provider surface and successful live-provider proof.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against 0ec69d642764.
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Drop unused Vast helper functions left behind after the pagination and release-lifecycle fixes. CI runs deadcode with test reachability, and these helpers were no longer referenced by the provider implementation or tests.
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
Reject query strings and fragments in the Vast API URL so configured provider endpoints cannot smuggle secret-bearing URL components or break path construction. Document the endpoint contract and cover query and fragment inputs in provider validation tests.
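A minimal Go sketch of the endpoint check, assuming a base URL must be absolute http(s) with no query or fragment. The function name and the `vast.example` host are hypothetical. The raw-string check matters because `url.Parse` accepts a bare trailing `?` or `#` without setting `RawQuery` or `Fragment`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateVastAPIURL (hypothetical) accepts only an absolute http(s) base URL
// with no query string or fragment, so secrets cannot ride along in those
// components and path joining stays predictable.
func validateVastAPIURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("vast api url: %w", err)
	}
	if (u.Scheme != "https" && u.Scheme != "http") || u.Host == "" {
		return errors.New("vast api url: must be an absolute http(s) URL")
	}
	// Checking the raw string also catches an empty "?" or "#", which
	// url.Parse accepts without setting RawQuery or Fragment.
	if strings.ContainsAny(raw, "?#") {
		return errors.New("vast api url: query strings and fragments are not allowed")
	}
	return nil
}

func main() {
	fmt.Println(validateVastAPIURL("https://vast.example/api/v0"))
	fmt.Println(validateVastAPIURL("https://vast.example/api/v0?api_key=secret"))
}
```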
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
Summary
Closes #363
Verification
- `go test ./internal/providers/vast ./internal/providers/all ./internal/cli && go vet ./...`
- `go test -race ./...`
- `npm ci --prefix worker`
- `npm run format:check --prefix worker && npm test --prefix worker`
- `node scripts/generate-provider-matrix.mjs && node scripts/check-provider-matrix.mjs && scripts/check-docs.sh`
- `bash -n scripts/live-vast-smoke.sh`
- `CRABBOX_LIVE=0 CRABBOX_LIVE_PROVIDERS=vast scripts/live-vast-smoke.sh` -> `classification=environment_blocked reason=CRABBOX_LIVE_not_enabled`
- `CRABBOX_LIVE=1 CRABBOX_LIVE_PROVIDERS=vast CRABBOX_VAST_API_KEY= VAST_API_KEY= scripts/live-vast-smoke.sh` -> `classification=environment_blocked reason=VAST_API_KEY_missing`
- `node --test scripts/live-vast-smoke.test.js`
- `git diff --check`
- `~/.agents/skills/_openclaw/autoreview/scripts/autoreview --engine codex --model gpt-5.5 --thinking high --mode branch --base origin/main` -> clean, no accepted/actionable findings