Summary
Add fal as a built-in Crabbox provider backed by fal.ai Compute. The first implementation should target fal Compute dedicated GPU instances as a direct SSH lease provider, not fal Model APIs or fal Serverless, because Crabbox providers need to run arbitrary repo commands and fal Compute exposes full SSH access to Linux GPU machines.
Why
Crabbox already supports direct cloud and GPU-oriented SSH lease providers such as Lambda Cloud, RunPod, Nebius, and NVIDIA Brev. fal.ai Compute fits the same Crabbox execution model:
- fal Compute provides dedicated GPU instances that stay running under the user's control.
- The documented flow provisions an instance, accepts an SSH public key, waits for the instance to become ready, and connects as ubuntu@<instance-ip>.
- fal exposes Platform APIs for compute instance create, get, list, and delete.
- Instance types documented today are 1xH100-SXM and 8xH100-SXM.
This would let users run:
FAL_KEY=... crabbox run --provider fal -- nvidia-smi
FAL_KEY=... crabbox warmup --provider fal --keep --slug fal-h100
FAL_KEY=... crabbox ssh --provider fal fal-h100
FAL_KEY=... crabbox stop --provider fal fal-h100
Provider shape
Implement fal Compute as an SSH lease provider:
- Provider name: fal
- Optional alias: fal-ai
- Kind: ProviderKindSSHLease
- Family: fal
- Targets: Linux only
- Initial features: ssh, crabbox-sync, cleanup
- Coordinator: CoordinatorNever
Do not route this through the Worker coordinator initially. Crabbox's coordinator provider registry currently only brokers aws, azure, gcp, and hetzner, and the existing direct GPU providers are CLI-owned.
Non-goals for the first PR
- Do not implement fal Model API inference as the Crabbox provider path. Model APIs run endpoint-specific JSON requests and do not provide an SSH host for arbitrary command execution.
- Do not implement fal Serverless deployment or fal run as a delegated-run backend in the first PR. Serverless owns the app container and endpoint contract, which is a different execution model from a Crabbox SSH lease.
- Do not add coordinator support until there is a separate design for storing fal credentials and lifecycle state in the broker.
- Do not expose FAL_KEY through CLI flags or persisted config output.
fal.ai docs evidence
Source analyzed: fal-docs.ai/2026-06-24_fal_ai_combined.md.
Relevant docs:
https://fal.ai/docs/documentation/compute
- fal Compute is dedicated GPU infrastructure with full SSH access.
- Compute instances are intended for training, fine-tuning, batch jobs, and workloads needing sustained hardware access.
- Documented instances include 1xH100-SXM and 8xH100-SXM.
https://fal.ai/docs/documentation/compute/quickstart
- Users paste an SSH public key at creation time.
- Ready state usually takes 2 to 3 minutes.
- SSH example uses ubuntu@<instance-ip>.
https://fal.ai/docs/api-reference/platform-apis
- Platform APIs list create, get, list, and delete operations for Compute instances.
https://fal.ai/docs/documentation/setting-up/authentication
- fal keys are read from FAL_KEY.
- Raw API examples use Authorization: Key $FAL_KEY.
- API keys are account or team scoped.
Additional docs were reviewed for Model APIs, queues, uploads, concurrency, retries, and webhooks. Those are useful context for future fal inference features, but not the best fit for this Crabbox provider because they do not expose a normal SSH lease.
Crabbox codebase analysis
Relevant Crabbox provider architecture:
internal/cli/provider_backend.go
- Defines Provider, ProviderSpec, ProviderKindSSHLease, FeatureSSH, FeatureCrabboxSync, FeatureCleanup, and SSHLeaseBackend.
- Core owns provider selection, sync, command execution, claims, status rendering, and release workflows.
docs/provider-backends.md and docs/features/provider-authoring.md
- New SSH providers should return a usable LeaseTarget.SSH and let core handle rsync, command execution, results, and release.
internal/providers/all/all.go
- Built-in providers register through side-effect imports.
internal/providers/lambda
- Good pattern for a direct GPU cloud SSH provider with an HTTP API client, per-lease SSH key handling, readiness polling, cleanup, and doctor checks.
internal/providers/runpod
- Good pattern for direct GPU provider lifecycle with public SSH coordinates and REST API polling.
docs/providers/provider-metadata.json and docs/providers/README.md
- Provider metadata and the generated provider matrix must be kept in sync.
scripts/generate-provider-matrix.mjs and scripts/check-provider-matrix.mjs
- Docs checks fail when provider registration, metadata, docs paths, or the generated matrix drift.
Proposed implementation
1. Add provider package
Create:
internal/providers/fal/
  provider.go
  backend.go
  client.go
  config.go
  flags.go
  doctor.go
  *_test.go
Provider spec:
func (Provider) Spec() core.ProviderSpec {
	return core.ProviderSpec{
		Name:        "fal",
		Family:      "fal",
		Kind:        core.ProviderKindSSHLease,
		Targets:     []core.TargetSpec{{OS: core.TargetLinux}},
		Features:    core.FeatureSet{core.FeatureSSH, core.FeatureCrabboxSync, core.FeatureCleanup},
		Coordinator: core.CoordinatorNever,
	}
}
Register it in internal/providers/all/all.go.
2. Add fal config and flags
Add config fields consistent with direct providers:
provider: fal
fal:
  instanceType: 1xH100-SXM
  sector: default
  user: ubuntu
  apiURL: ""
Suggested env vars:
- FAL_KEY, primary because fal docs and SDKs use it.
- CRABBOX_FAL_KEY, optional Crabbox-prefixed override for consistency with other providers.
- CRABBOX_FAL_INSTANCE_TYPE
- CRABBOX_FAL_SECTOR
- CRABBOX_FAL_API_URL, for tests and enterprise/private endpoints if needed.
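A minimal sketch of how the key lookup might work. It assumes that CRABBOX_FAL_KEY, being described as an override, wins over FAL_KEY when both are set; the function name resolveFalKey and the injected getenv parameter are illustrative, not existing Crabbox code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveFalKey returns the fal API key from an environment lookup function.
// Assumption: CRABBOX_FAL_KEY, when set, takes precedence over FAL_KEY.
func resolveFalKey(getenv func(string) string) (string, error) {
	for _, name := range []string{"CRABBOX_FAL_KEY", "FAL_KEY"} {
		if v := strings.TrimSpace(getenv(name)); v != "" {
			return v, nil
		}
	}
	return "", errors.New("fal: no API key found; set FAL_KEY or CRABBOX_FAL_KEY")
}

func main() {
	env := map[string]string{"FAL_KEY": "example-key"}
	key, err := resolveFalKey(func(name string) string { return env[name] })
	// Report only whether a key was found, never the key itself.
	fmt.Println(key != "", err)
}
```

Passing the lookup as a function keeps the resolver testable without mutating the process environment; production code would pass os.Getenv.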
Suggested flags:
- --fal-instance-type
- --fal-sector
- --fal-user
- --fal-api-url
Security requirements:
- Do not accept the API key via CLI flag.
- Do not print the API key in doctor, config show, errors, debug output, or tests.
- Redact fal credentials the same way LAMBDA_API_KEY and RUNPOD_API_KEY are handled.
3. Implement a small fal Compute API client
Use Go's net/http, following the direct provider client style. The client should:
- Read the key from config/env.
- Send Authorization: Key <token>.
- Set Accept: application/json.
- Use Content-Type: application/json for writes.
- Limit response bodies.
- Reject cross-origin redirects unless the configured API URL is a trusted loopback test URL.
- Decode structured error bodies when available and include HTTP status, fal request ID, and safe error type details.
Needed API operations:
- Create compute instance with:
  - instance type,
  - sector,
  - SSH public key,
  - Crabbox-owned name or labels if the API supports them.
- Get compute instance by ID.
- List compute instances.
- Delete compute instance by ID.
Important implementation note: verify exact REST paths and request/response schemas from fal's OpenAPI before coding. The combined docs list the operations, but not all operation payload schemas are present in the local bundle.
4. Implement SSH lease lifecycle
Acquire should:
- Generate a Crabbox lease ID and direct lease slug.
- Generate or prepare a Crabbox-managed SSH key for the lease, matching patterns from existing direct SSH providers.
- Create the fal Compute instance with the public key.
- Poll the fal instance until it has:
  - a terminal non-error lifecycle state,
  - a public SSH host or IP,
  - an SSH user, defaulting to ubuntu if the API does not return one.
- Wait for SSH readiness.
- Return LeaseTarget with Server, SSHTarget, and LeaseID.
- Claim the lease for the repo.
- If creation or readiness fails and --keep was not set, delete the partially created instance.
Resolve should:
- Resolve by Crabbox lease ID, slug, or provider instance ID when possible.
- Rebuild LeaseTarget.SSH from fal instance state.
- Preserve existing local claim behavior.
List should:
- List fal Compute instances owned or recognizable by Crabbox.
- Map them to normalized LeaseView records.
- Include provider instance ID, instance type, region/sector, state, public IP, and created time when available.
ReleaseLease should:
- Delete the fal Compute instance.
- Remove the local lease claim after successful delete.
- Treat already-deleted/not-found responses as successful cleanup when safe.
Touch should:
- Reuse Crabbox's direct lease label/touch conventions where possible.
Cleanup should:
- Provide dry-run and mutating cleanup for orphaned Crabbox-owned fal instances.
- Never delete instances that are not clearly owned by Crabbox.
- Include a conservative ownership rule if fal labels/tags are unavailable.
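If fal turns out not to offer labels or tags, a conservative ownership rule could require two independent signals before deleting anything. The sketch assumes a "crabbox-" name prefix and a local record of instance IDs that Crabbox created; both are hypothetical conventions, not existing Crabbox behavior.

```go
package main

import (
	"fmt"
	"strings"
)

const ownedPrefix = "crabbox-" // hypothetical naming convention

// Instance is a hypothetical view of a listed fal Compute instance.
type Instance struct{ ID, Name string }

// owned requires both the name prefix and a local record that Crabbox
// created the instance; either signal alone is not enough.
func owned(inst Instance, created map[string]bool) bool {
	return strings.HasPrefix(inst.Name, ownedPrefix) && created[inst.ID]
}

// cleanup returns the IDs it deleted, or would delete when dryRun is true.
func cleanup(all []Instance, created map[string]bool, dryRun bool, del func(id string) error) ([]string, error) {
	var touched []string
	for _, inst := range all {
		if !owned(inst, created) {
			continue // never touch instances Crabbox cannot prove it owns
		}
		if !dryRun {
			if err := del(inst.ID); err != nil {
				return touched, fmt.Errorf("delete %s: %w", inst.ID, err)
			}
		}
		touched = append(touched, inst.ID)
	}
	return touched, nil
}

func main() {
	all := []Instance{{"i-1", "crabbox-abc"}, {"i-2", "crabbox-manual"}, {"i-3", "training-box"}}
	created := map[string]bool{"i-1": true}
	ids, err := cleanup(all, created, true, func(string) error { return nil })
	fmt.Println(ids, err)
}
```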
Doctor should:
- Verify FAL_KEY or CRABBOX_FAL_KEY is present.
- Call a read-only endpoint such as list instances or account/billing readiness.
- Report auth, Compute access, inventory readability, and default config.
- Avoid creating any paid resource.
5. Add docs and provider matrix metadata
Add or update:
- docs/providers/fal.md
- docs/providers/provider-metadata.json
- docs/providers/README.md, generated by provider matrix script
- docs/source-map.md, if the provider list there is kept manually
- README.md, if the provider table is generated or expected to include the new provider
Suggested provider metadata:
"fal": {
  "status": "built-in",
  "category": "gpu-cloud",
  "substrate": "fal Compute H100 instance",
  "location": "cloud",
  "ssh": "crabbox-managed",
  "sync": "crabbox-sync",
  "gpu": "yes",
  "lifecycle": "Crabbox",
  "cleanup": "instance delete",
  "bestFit": "Dedicated H100 GPU workloads over SSH",
  "caveat": "Requires fal Compute access, FAL_KEY, quota, and available H100 capacity",
  "docs": "fal.md"
}
6. Add tests
Unit tests should use fake HTTP servers and no live fal credentials.
Suggested tests:
- Provider registration:
  - ProviderFor("fal") resolves.
  - Optional alias fal-ai resolves if added.
  - Spec is Linux-only SSH lease, direct only, with ssh, crabbox-sync, and cleanup.
  - Provider exposes DoctorProvider.
- Config:
  - FAL_KEY and CRABBOX_FAL_KEY loading.
  - config show does not leak token or token env names.
  - Invalid API URL is rejected.
  - HTTPS required except trusted loopback test URL.
- Client:
  - Sends Authorization: Key <token>.
  - Handles create/get/list/delete.
  - Handles 401/403 with clear auth errors.
  - Handles 404 delete idempotently where intended.
  - Limits response bodies.
  - Rejects cross-origin redirects.
- Backend:
  - Acquire creates instance, waits ready, waits SSH, claims lease.
  - Acquire rolls back on readiness failure when not kept.
  - Resolve reconstructs SSH target.
  - List maps fal instances to LeaseView.
  - Release deletes instance and removes claim.
  - Cleanup does not delete unowned instances.
- Docs:
  - Provider matrix check passes.
  - Provider docs path exists.
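The two API URL cases in the Config group (invalid URL rejected, HTTPS required except for a trusted loopback test URL) could be driven by a validator along these lines. The function name validateAPIURL and the choice to treat localhost and loopback IPs as "trusted" are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateAPIURL accepts absolute HTTPS URLs, and plain HTTP only when the
// host is localhost or a loopback IP (for fake servers in tests).
func validateAPIURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("fal: invalid API URL: %w", err)
	}
	if u.Host == "" {
		return errors.New("fal: API URL must be absolute")
	}
	switch u.Scheme {
	case "https":
		return nil
	case "http":
		host := u.Hostname()
		if host == "localhost" {
			return nil
		}
		if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
			return nil
		}
		return errors.New("fal: plain HTTP is only allowed for loopback URLs")
	default:
		return fmt.Errorf("fal: unsupported API URL scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"https://api.example.com",
		"http://127.0.0.1:8080",
		"http://api.example.com",
		"not a url",
	} {
		fmt.Printf("%-28s accepted=%t\n", raw, validateAPIURL(raw) == nil)
	}
}
```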
7. Optional live smoke
Add a guarded live smoke only if credentials and Compute access are available:
CRABBOX_LIVE=1 CRABBOX_LIVE_PROVIDERS=fal FAL_KEY=... scripts/live-fal-smoke.sh
The live smoke should:
- Run doctor --provider fal.
- Assert starting inventory is safe or isolated by Crabbox-owned names.
- warmup --provider fal --keep --slug fal-live-....
- status --provider fal --wait.
- run --provider fal --id <lease> --no-sync -- nvidia-smi.
- stop --provider fal <lease>.
- Run cleanup dry-run and final inventory check.
The script must skip cleanly without CRABBOX_LIVE=1, without fal selected in CRABBOX_LIVE_PROVIDERS, or without FAL_KEY.
Acceptance criteria
- crabbox providers lists fal as a built-in GPU cloud SSH lease provider.
- crabbox doctor --provider fal verifies credentials and Compute access without creating resources.
- crabbox warmup --provider fal --keep provisions a fal Compute instance and returns a reusable lease.
- crabbox run --provider fal -- nvidia-smi runs on the fal Compute instance over Crabbox-managed SSH.
- crabbox ssh --provider fal <id-or-slug> opens an SSH session.
- crabbox list --provider fal --json returns normalized lease data without leaking credentials.
- crabbox stop --provider fal <id-or-slug> deletes the fal Compute instance and removes the local claim.
- Failed acquire rolls back the fal instance unless --keep was requested.
- Cleanup only touches instances Crabbox can prove it owns.
- Docs, provider matrix, and generated provider listings are in sync.
- No fal credential appears in command output, config output, logs, tests, or docs.
Validation checklist
Before opening the PR:
gofmt -w $(git ls-files '*.go')
go vet ./...
go test -race ./...
scripts/check-docs.sh
If Worker files are touched for coordinator support, also run:
npm ci --prefix worker
npm run format:check --prefix worker
npm run lint --prefix worker
npm run check --prefix worker
npm test --prefix worker
npm run build --prefix worker
Suggested PR plan
- Create a local branch, for example feat/fal-provider.
- Land the provider package, config, and tests in small commits.
- Add docs and regenerate the provider matrix.
- Run the validation checklist.
- If live credentials are available, run the guarded live smoke and include the result in the PR.
- Submit this issue and the PR together, linking both by full GitHub URLs.
Open questions
- What are the exact fal Platform API base URL, paths, and request/response schemas for Compute instance lifecycle?
- Does fal Compute expose labels/tags or user metadata that Crabbox can use for safe cleanup ownership?
- Does create support caller-provided names, sectors, and SSH keys directly, or does any of that require prior setup?
- What states should be treated as provisioning, ready, terminal failure, and deleted?
- Are Compute API calls available with normal API scoped keys, or is ADMIN scope required?
- Are there per-team or enterprise constraints that should be surfaced in doctor?
- Should fal-ai be accepted as an alias, or should the provider follow newer Crabbox guidance and use only canonical fal?