feat(fal): add SSH lease provider#693
Codex review: needs real behavior proof before merge. Reviewed July 4, 2026, 8:51 AM ET / 12:51 UTC.
Summary
Reproducibility: yes, from source, but not from a live fal account. The PR defaults fal SSH to
Review metrics: 2 noteworthy metrics.
Root-cause cluster Members:
Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review details
Best possible solution: Land the fal provider after defaulting or deriving the correct SSH user, removing release-owned changelog churn, and adding redacted authenticated proof for doctor, create, SSH/run, stop, and cleanup.
Do we have a high-confidence way to reproduce the issue? Yes, from source, but not from a live fal account. The PR defaults fal SSH to
Is this the best way to solve the issue? No. The direct SSH lease provider direction fits Crabbox, but the current PR needs the default-user fix, release-note cleanup, and authenticated live lifecycle proof before it is the best merge path.
Full review comments:
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against c5bc10cc058e.
Label changes
Label justifications:
Evidence reviewed
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
Review history (3 earlier review cycles)
Context on the Go CI timeout update in
The first red Go run was a real static-analysis issue, not runtime:
After that fix, the next Go run passed the earlier gates: formatting,
To verify the diagnosis before changing CI, I ran the same coverage command locally. It completed successfully with
Result: the replacement GitHub Actions run passed: https://github.com/openclaw/crabbox/actions/runs/28198853668. The Go job passed in 14m17s, including Deadcode, Test, Test all Go modules, Coverage, and Build: https://github.com/openclaw/crabbox/actions/runs/28198853668/job/83532759285.
605caed to d9e9bd3
bfd0e56 to a6d24f5
Add fal provider registration, env-only credential loading, non-secret configuration, and a schema-backed Compute API client skeleton. Keep fal lifecycle support out of the advertised run surface until the PLAN-02 backend wires acquire/list/release behavior.
Restore PLAN-01's advertised fal surface to an SSH lease provider with ssh, crabbox-sync, and cleanup capabilities. Keep lifecycle behavior deferred behind explicit PLAN-02 errors so discovery matches the plan without silently performing unsupported resource operations.
Implement fal Compute lease acquire, resolve, list, touch, release, and cleanup flows with local-claim ownership checks. Add offline lifecycle tests for rollback, recovery claims, status-only resolve, persisted SSH endpoints, and destructive-operation safeguards.
Document the direct fal Compute SSH lease provider, add provider matrix metadata, and regenerate the provider category surfaces. Add an opt-in live smoke script with no-live defaults, credential gating, classified external blockers, redaction, cleanup attempts, and dispatcher coverage.
Build the acquired fal lease target after the SSH readiness probe so fallback port discovery is reflected in the returned lease as well as the persisted claim. Add regression coverage for a configured SSH port corrected by the readiness probe.
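The ordering described in this commit can be illustrated with a minimal Go sketch. It is not the PR's code: `sshTarget`, `probeFunc`, `probeThenBuildTarget`, and `demoTarget` are hypothetical names, and the probe is a stand-in function rather than a real SSH dial. The point is only that the target is built from the port the probe confirmed, not from the configured one.

```go
package main

import (
	"errors"
	"fmt"
)

// sshTarget is a hypothetical, simplified lease target.
type sshTarget struct {
	Host string
	Port int
}

// probeFunc reports whether SSH answers on host:port. It is a hypothetical
// stand-in for a real readiness probe.
type probeFunc func(host string, port int) bool

// probeThenBuildTarget tries the configured port first, then the fallbacks,
// and builds the target only from the port that actually answered.
func probeThenBuildTarget(host string, configured int, fallbacks []int, probe probeFunc) (sshTarget, error) {
	candidates := append([]int{configured}, fallbacks...)
	for _, port := range candidates {
		if probe(host, port) {
			return sshTarget{Host: host, Port: port}, nil
		}
	}
	return sshTarget{}, errors.New("ssh not ready on any candidate port")
}

// demoTarget simulates a host that answers only on port 22 while 2222 is configured.
func demoTarget() (sshTarget, error) {
	onlyPort22 := func(_ string, port int) bool { return port == 22 }
	return probeThenBuildTarget("203.0.113.10", 2222, []int{22}, onlyPort22)
}

func main() {
	target, err := demoTarget()
	fmt.Println(target.Host, target.Port, err)
}
```

Because the same value would feed both the returned lease and the persisted claim, the two cannot disagree about the port in this sketch.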
Retry ambiguous fal Compute creates with the same lease idempotency key before proceeding so a recoverable provider id is required for local claim ownership. Avoid persisting empty-provider-id recovery claims when idempotent reconciliation cannot return a fal instance id, and cover both retry success and retry failure paths.
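A minimal Go sketch of the retry rule in this commit, under stated assumptions: `errAmbiguous`, `createFunc`, `createWithIdempotentRetry`, and `simulate` are hypothetical names, and the provider call is faked. It shows the two properties the message describes: every attempt reuses the lease ID as the idempotency key, and the function never reports success without a provider ID.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errAmbiguous marks a create whose outcome is unknown, for example a timeout
// after the request was sent. Hypothetical sentinel for this sketch.
var errAmbiguous = errors.New("ambiguous create outcome")

// createFunc is a hypothetical stand-in for the provider create call. It
// receives the idempotency key and returns the provider instance ID.
type createFunc func(ctx context.Context, idempotencyKey string) (string, error)

// createWithIdempotentRetry retries ambiguous creates with the same key and
// fails rather than returning an empty provider ID.
func createWithIdempotentRetry(ctx context.Context, create createFunc, leaseID string, maxAttempts int) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		id, err := create(ctx, leaseID) // same key on every attempt
		switch {
		case err == nil && id != "":
			return id, nil
		case err == nil:
			lastErr = errors.New("create returned no provider id")
		case errors.Is(err, errAmbiguous):
			lastErr = err
		default:
			return "", err // definite failure: do not retry
		}
	}
	return "", fmt.Errorf("create not reconciled after %d attempts: %w", maxAttempts, lastErr)
}

// simulate runs the retry loop against a fake provider that is ambiguous for
// the first ambiguousCalls calls and then returns finalID.
func simulate(ambiguousCalls int, finalID string, maxAttempts int) (string, int, error) {
	calls := 0
	fake := func(_ context.Context, _ string) (string, error) {
		calls++
		if calls <= ambiguousCalls {
			return "", errAmbiguous
		}
		return finalID, nil
	}
	id, err := createWithIdempotentRetry(context.Background(), fake, "lease-123", maxAttempts)
	return id, calls, err
}

func main() {
	id, calls, err := simulate(1, "fal-instance-1", 3)
	fmt.Println(id, calls, err)
}
```

In this sketch a caller would persist a local claim only when the returned error is nil, which keeps empty-provider-ID claims from being written.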
Drop the unused fal inventoryDoctorResult wrapper so the CI deadcode gate passes.
a6d24f5 to 5a030ef
Closes #694
Summary
Adds a direct fal Compute SSH lease provider with Crabbox-managed SSH and sync support, local-claim-owned cleanup, and the `fal-ai` alias.

This also adds fal provider docs, provider matrix metadata, benchmark category generation, and an opt-in guarded live smoke script that defaults to no live provider mutation unless `CRABBOX_LIVE=1` and fal is selected.

Lifecycle Safety
- `Idempotency-Key` header with the Crabbox lease ID for creates.
- `--keep` explicitly owns a failed-acquire recovery claim.

Verification
- `14dbee97e4b0c5ed215d3e13552a1de3ace4d48c`
- `gofmt -w internal/providers/fal/backend.go internal/providers/fal/backend_test.go && git diff --check`
- `go test -race ./internal/providers/fal ./internal/cli -run 'Fal|fal' -count=1`
- `bash -n scripts/live-fal-smoke.sh`
- `node scripts/generate-provider-matrix.mjs --check`
- `scripts/check-docs.sh`
- `node --test scripts/live-fal-smoke.test.js scripts/live-smoke.test.js`
- `CRABBOX_LIVE= CRABBOX_LIVE_PROVIDERS= FAL_KEY= CRABBOX_FAL_KEY= scripts/live-fal-smoke.sh`
- `go build -trimpath -o bin/crabbox ./cmd/crabbox`
- `crabbox providers --json`, `doctor --provider fal --json`, `list --provider fal --json`, and `cleanup --provider fal --dry-run` with blank fal credentials
- `go vet ./...`
- `go test -race -timeout 8m ./...`
- `10/10` after a single unrelated full-suite load miss
- `autoreview --mode branch --base origin/main`: clean, no accepted/actionable findings

The rebased candidate also fixes the default 1× H100 create path: it no longer sends the 8×-only `sector` field, even when stale configuration supplies one.

Live proof gate
Authenticated create/use/destroy proof is still required before merge. Targeted 1Password lookups found no fal credential record, and neither `CRABBOX_FAL_KEY` nor `FAL_KEY` is set, so the exact candidate could not make an authorized fal API call. No live fal resources were created.
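The gate described above can be sketched in Go as a pure function over environment lookups. This is an illustration, not the contents of `scripts/live-fal-smoke.sh`: `liveGate` and `envFrom` are hypothetical names, the reason strings are invented for the example, and treating `CRABBOX_LIVE_PROVIDERS` as a comma-separated list is an assumption. Only the variable names come from the PR description.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// liveGate decides whether a live fal run may proceed. getenv is injected so
// the decision can be checked without touching the real environment.
func liveGate(getenv func(string) string) (bool, string) {
	if getenv("CRABBOX_LIVE") != "1" {
		return false, "live mode not enabled"
	}
	selected := false
	// Assumption: CRABBOX_LIVE_PROVIDERS is a comma-separated list.
	for _, p := range strings.Split(getenv("CRABBOX_LIVE_PROVIDERS"), ",") {
		if strings.TrimSpace(p) == "fal" {
			selected = true
		}
	}
	if !selected {
		return false, "fal not selected"
	}
	// Either variable counts as a credential here; precedence is not modeled.
	if getenv("CRABBOX_FAL_KEY") == "" && getenv("FAL_KEY") == "" {
		return false, "no fal credential in environment"
	}
	return true, "ok"
}

// envFrom adapts a map to the getenv signature; missing keys read as empty.
func envFrom(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

func main() {
	ok, reason := liveGate(os.Getenv)
	fmt.Printf("live=%v reason=%s\n", ok, reason)
}
```

With every variable blank, as in the verification run listed above, this sketch returns `false` before any credential is read, which mirrors the no-live default.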