feat(fal): add SSH lease provider #693
Add fal provider registration, env-only credential loading, non-secret configuration, and a schema-backed Compute API client skeleton. Keep fal lifecycle support out of the advertised run surface until the PLAN-02 backend wires acquire/list/release behavior.
Restore PLAN-01's advertised fal surface to an SSH lease provider with ssh, crabbox-sync, and cleanup capabilities. Keep lifecycle behavior deferred behind explicit PLAN-02 errors so discovery matches the plan without silently performing unsupported resource operations.
Implement fal Compute lease acquire, resolve, list, touch, release, and cleanup flows with local-claim ownership checks.
Add offline lifecycle tests for rollback, recovery claims, status-only resolve, persisted SSH endpoints, and destructive-operation safeguards.
Document the direct fal Compute SSH lease provider, add provider matrix metadata, and regenerate the provider category surfaces. Add an opt-in live smoke script with no-live defaults, credential gating, classified external blockers, redaction, cleanup attempts, and dispatcher coverage.
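The no-live default and credential gating can be sketched as a pure classification function. This is a Go illustration, not the bash in `scripts/live-fal-smoke.sh`. Only the `environment_blocked` / `CRABBOX_LIVE_not_enabled` pair comes from the PR's reported smoke output; `classifySmoke`, the comma-separated `CRABBOX_LIVE_PROVIDERS` format, and the other reason strings are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// classifySmoke decides whether a live fal smoke may run. It returns a
// blocking classification unless live mode is on, fal is selected, and a
// credential is present. Reasons other than CRABBOX_LIVE_not_enabled are
// hypothetical.
func classifySmoke(getenv func(string) string) (classification, reason string) {
	if getenv("CRABBOX_LIVE") != "1" {
		return "environment_blocked", "CRABBOX_LIVE_not_enabled"
	}
	selected := false
	for _, p := range strings.Split(getenv("CRABBOX_LIVE_PROVIDERS"), ",") {
		if strings.TrimSpace(p) == "fal" {
			selected = true
		}
	}
	if !selected {
		return "environment_blocked", "fal_not_selected"
	}
	if getenv("FAL_KEY") == "" && getenv("CRABBOX_FAL_KEY") == "" {
		return "environment_blocked", "fal_credentials_missing"
	}
	return "ready", ""
}

func main() {
	c, r := classifySmoke(os.Getenv)
	fmt.Printf("classification=%s reason=%s\n", c, r)
}
```

Checking the opt-in flag first means an unconfigured environment never reaches any code path that could create provider resources.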
Build the acquired fal lease target after the SSH readiness probe so fallback port discovery is reflected in the returned lease as well as the persisted claim. Add regression coverage for a configured SSH port corrected by the readiness probe.
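The ordering fix can be sketched as: probe first, then build the target from the port the probe actually reached, and persist that same value. `sshTarget`, `waitForSSH`, `buildTarget`, and the fallback-port list are hypothetical. A real probe would dial the host (for example with `net.DialTimeout`); this version injects the probe so it runs offline.

```go
package main

import "fmt"

// sshTarget is a hypothetical stand-in for the lease's SSH endpoint.
type sshTarget struct {
	Host string
	Port int
}

// waitForSSH tries the configured port first, then the fallbacks, and returns
// the first port the probe accepts.
func waitForSSH(host string, configured int, fallbacks []int, probe func(string, int) bool) (int, error) {
	candidates := append([]int{configured}, fallbacks...)
	for _, p := range candidates {
		if probe(host, p) {
			return p, nil
		}
	}
	return 0, fmt.Errorf("ssh not ready on %s (tried ports %v)", host, candidates)
}

// buildTarget constructs the target only after the probe, so the returned
// lease and the persisted claim carry the same, possibly corrected, port.
func buildTarget(host string, configured int, fallbacks []int,
	probe func(string, int) bool, persist func(sshTarget)) (sshTarget, error) {
	port, err := waitForSSH(host, configured, fallbacks, probe)
	if err != nil {
		return sshTarget{}, err
	}
	t := sshTarget{Host: host, Port: port}
	persist(t)
	return t, nil
}

func main() {
	onlyFallback := func(_ string, port int) bool { return port == 2222 }
	var saved sshTarget
	t, err := buildTarget("203.0.113.10", 22, []int{2222}, onlyFallback, func(s sshTarget) { saved = s })
	fmt.Println(t.Port, saved.Port, err) // 2222 2222 <nil>
}
```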
Retry ambiguous fal Compute creates with the same lease idempotency key before proceeding so a recoverable provider id is required for local claim ownership. Avoid persisting empty-provider-id recovery claims when idempotent reconciliation cannot return a fal instance id, and cover both retry success and retry failure paths.
Check claimed fal instances for provider absence before TTL eligibility so cleanup removes stale local claims for instances deleted outside Crabbox. Add regression coverage for a non-expired local claim whose fal instance is already gone.
ClawSweeper review: did not complete due to a Codex infrastructure failure. Reviewed June 25, 2026, 5:11 PM ET / 21:11 UTC.
Summary
Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.
Review metrics: none identified.
Merge readiness
This is a ClawSweeper/Codex infrastructure failure, not a PR readiness or patch-quality verdict.
Best possible solution: Retry the Codex review after fixing the execution failure.
Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path.
Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.
AGENTS.md: unclear because the file could not be read completely.
Codex review notes: model internal, reasoning high; reviewed against 184979d683b7.
Drop the unused fal inventoryDoctorResult wrapper so the CI deadcode gate passes.
Give the Go CI job enough wall-clock budget to finish the full vet, deadcode, race, all-modules, coverage, and build gauntlet. The previous 15-minute cap canceled the coverage step after the preceding checks had already passed.
Context on the Go CI timeout update in
The first red Go run was a real static-analysis issue, not runtime:
After that fix, the next Go run passed the earlier gates: formatting,
To verify the diagnosis before changing CI, I ran the same coverage command locally. It completed successfully with
Result: the replacement GitHub Actions run passed: https://github.com/openclaw/crabbox/actions/runs/28198853668. The Go job passed in 14m17s, including Deadcode, Test, Test all Go modules, Coverage, and Build: https://github.com/openclaw/crabbox/actions/runs/28198853668/job/83532759285.
Closes #694
Summary
Adds a direct fal Compute SSH lease provider with Crabbox-managed SSH and sync support, local-claim-owned cleanup, and the `fal-ai` alias.
This also adds fal provider docs, provider matrix metadata, benchmark category generation, and an opt-in guarded live smoke script that defaults to no live provider mutation unless `CRABBOX_LIVE=1` and fal is selected.
Lifecycle Safety
- `Idempotency-Key` header with the Crabbox lease ID for creates.
- `--keep` explicitly owns a failed-acquire recovery claim.
Verification
- `gofmt -w $(git ls-files '*.go') && git diff --check`
- `go test ./internal/providers/fal`
- `bash -n scripts/live-fal-smoke.sh`
- `node scripts/generate-provider-matrix.mjs --check`
- `scripts/check-docs.sh`
- `node --test scripts/live-fal-smoke.test.js scripts/live-smoke.test.js`
- `CRABBOX_LIVE= CRABBOX_LIVE_PROVIDERS= FAL_KEY= CRABBOX_FAL_KEY= scripts/live-fal-smoke.sh`
- `go build -trimpath -o bin/crabbox ./cmd/crabbox`
- `crabbox providers --json`, `doctor --provider fal --json`, `list --provider fal --json`, and `cleanup --provider fal --dry-run` with blank fal credentials
- `go vet ./...`
- `go run golang.org/x/tools/cmd/deadcode@v0.45.0 -test ./...`
- `scripts/check-go-coverage.sh 90.0`
- `go test -race ./...`
- `autoreview --mode branch --base origin/main`: clean, no accepted/actionable findings
The local no-live smoke returned `classification=environment_blocked reason=CRABBOX_LIVE_not_enabled`, so no live fal resources were created during local verification.