fix: require exact claims for direct cloud cleanup#817
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed July 4, 2026, 2:13 PM ET / 18:13 UTC. Summary Reproducibility: yes. The linked security issue and current main source show weak non-canonical lease label acceptance in provider ownership predicates that feed destructive cleanup paths; live destructive cloud reproduction was not run in this read-only review. Review metrics: 4 noteworthy metrics.
Root-cause cluster Members:
Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Land the exact-claim hardening after current-head Vultr proof or waiver, explicit maintainer acceptance of the fail-closed reclaim policy, routine checks on the final head, and removal of release-owned changelog edits. Do we have a high-confidence way to reproduce the issue? Yes. The linked security issue and current main source show weak non-canonical lease label acceptance in provider ownership predicates that feed destructive cleanup paths; live destructive cloud reproduction was not run in this read-only review. Is this the best way to solve the issue? Yes, pending merge gates. Exact local provider/resource claims are the narrow maintainable safety boundary, but Vultr live proof or waiver and explicit upgrade-policy acceptance remain necessary before merge. Full review comments:
Overall correctness: patch is correct AGENTS.md: found and applied where relevant. Codex review notes: model internal, reasoning high; reviewed against e1732002dad3. Label changesLabel justifications:
Evidence reviewedAcceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
Review history (6 earlier review cycles)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a332ab6b8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| preserveDigitalOceanKeyIdentity(server.Labels, claim.Labels) | ||
| } else if req.ReleaseOnly { | ||
| return core.LeaseTarget{}, core.Exit(2, "digitalocean lease=%s has no exact local claim; refusing release", leaseID) | ||
| } else if !req.NoLocalStateMutations { |
There was a problem hiding this comment.
Allow status-only resolves to remain read-only
When crabbox status resolves a live but claimless DigitalOcean lease, it calls ResolveRequest{StatusOnly: true} from internal/cli/status.go without Repo, Reclaim, or NoLocalStateMutations; this new guard therefore returns the use --reclaim error even though status is a read-only inspection path and the status command has no --reclaim flag. That leaves canonical claimless resources visible in list but impossible to inspect with status; treat non-waiting StatusOnly resolves as read-only here or have the caller set NoLocalStateMutations.
Useful? React with 👍 / 👎.
a332ab6 to
ee49c68
Compare
|
Updated this PR to keep SSH-provider status resolves read-only on current head Validation now green on this head:
Remaining proof gate is still provider-live: Vultr current-head create/use/stop/empty-inventory proof is unavailable here because no @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
ee49c68 to
7d58d5c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d58d5c6d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if req.Repo.Root != "" && !req.NoLocalStateMutations { | ||
| if _, err := core.ClaimLeaseTargetForRepoConfigIfUnchanged(leaseID, server.Labels["slug"], b.Cfg, server, ssh, req.Repo.Root, b.Cfg.IdleTimeout, req.Reclaim, claim, claimExists); err != nil { |
There was a problem hiding this comment.
Reject cloudless Vultr claims before rebinding
When an exact Vultr claim exists but has an empty CloudID and no valid recovery marker, this non-read-only resolve path still falls through and rewrites the claim with the live instance's ID. That lets a legacy or corrupt local claim become resource-bound without --reclaim, so a later stop can delete the instance based only on matching remote tags; the other direct-cloud providers now reject cloudless claims unless they are explicit recovery states. Please add the same claim.CloudID == "" recovery-state validation before this write.
Useful? React with 👍 / 👎.
|
Updated the PR body with redacted exact-head proof for
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
Pushed a maintainer follow-up for the Vultr finding on current head What changed:
Local validation on this head: This removes the code blocker from the prior review. The remaining merge gate is still exact-head Vultr live proof or an explicit maintainer waiver; no Vultr token is present in this shell, so I am not claiming provider-live proof. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
25d50a9 to
5a2ba9d
Compare
5a2ba9d to
17c55cf
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17c55cfe49
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return claimed && exact && | ||
| claim.ProviderScope == providerScope && | ||
| resourceID != "" && claim.CloudID == resourceID, nil |
There was a problem hiding this comment.
Require provider identity before touching status leases
When crabbox status --wait resolves a direct-cloud lease it now uses NoLocalStateMutations, so the provider-side resolve intentionally skips the exact-claim account/project checks; this helper then decides whether to call Touch, but it only verifies provider scope and CloudID. An exact legacy or corrupt DigitalOcean/Linode/Vultr/Scaleway claim with the same resource ID but missing or mismatched account/project labels would still return true, causing status --wait to mutate remote TTL/last-touched tags even though normal resolve/stop paths would reject that claim as not account/project-bound. Include the provider identity labels in this predicate or avoid touching when the claim has not passed the same ownership checks.
Useful? React with 👍 / 👎.
17c55cf to
9d074c7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d074c707f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| } else { | ||
| switch claim.Labels["recovery"] { | ||
| case "ambiguous-create", "ambiguous-create-response", "ambiguous-key-create", "rollback-cleanup", "rollback-key-cleanup": |
There was a problem hiding this comment.
Reject key-only Scaleway recoveries for live servers
When a live Scaleway server is resolved, targetFromServer and deleteServer use this helper to decide whether an exact claim proves ownership. Allowing empty-CloudID ambiguous-key-create or rollback-key-cleanup claims here treats recovery states that were recorded before any server identity existed, or only for SSH-key cleanup, as proof of server ownership. If a tagged server with the same lease/slug/project exists, it can then be rebound or deleted without an exact server-bound claim; only server-create ambiguous states should be accepted when server.CloudID is non-empty.
Useful? React with 👍 / 👎.
Summary
--reclaimadoption; never synthesize destructive ownership from remote tagsCloses #811
Verification
go test -race ./internal/providers/digitalocean ./internal/providers/linode ./internal/providers/scaleway ./internal/providers/vultr -count=1— pass.go vet ./internal/cli ./internal/providers/digitalocean ./internal/providers/linode ./internal/providers/scaleway ./internal/providers/vultr— pass.scripts/check-docs.sh— pass; 53 command docs, 71 provider-matrix rows, 207 Markdown files, and the generated docs site verified.Exact-head live proof
Candidate:
5a2ba9dd6314b638198e6a2928b3914c18865dfdecho okover SSH; listed and stopped the exact claim; verified final inventory[]and no managed-key residue.echo okover SSH; listed and stopped the exact claim; verified final inventory[]and no local-key residue.echo okover SSH; listed and stopped the exact claim; verified final inventory[]and no managed-key residue.