Prep patches for rollout state machine #49

Merged
jlebon merged 5 commits into main from reboot-slot-prep on Jun 3, 2026

jlebon commented Jun 3, 2026

Collaborator

Move reboot slot annotations to BootcNode, split rollout code into its own file, and add reboot slot tracking with candidate selection. No transitions yet, just classifying and selecting.

See individual commits for more details.

jlebon added 4 commits June 2, 2026 17:00
Was investigating something else and came upon this old issue in the MCO
repo. It has a lot of interesting info about etcd and rollout ordering
which I think will be helpful once we tackle cross-pool ordering.

So link to it.
Add a new bootc.dev/in-reboot-slot annotation to persistently track
which nodes occupy a reboot slot. This is needed to make sure that slot
counting survives controller restarts. The annotation is set when a slot
is assigned and removed when freed.

Both in-reboot-slot and was-cordoned are placed on the BootcNode rather
than the K8s Node. The reconciler already has all owned BootcNodes in
hand during driveRollout. Keeping operator bookkeeping off the Node
object also avoids unnecessary churn on an object that other controllers
and tools may be watching.

Assisted-by: Pi (Claude Opus 4.6)
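
To illustrate the slot bookkeeping described in the commit above, here is a minimal sketch of counting occupied reboot slots from the annotation. Only the `bootc.dev/in-reboot-slot` key comes from the commit message. The `bootcNode` type and the `assignSlot`, `freeSlot`, and `occupiedSlots` helpers are hypothetical stand-ins, and treating mere presence of the key as "occupied" is an assumption rather than something the PR states.

```go
package main

import "fmt"

// Annotation key named in the commit message.
const inRebootSlotAnnotation = "bootc.dev/in-reboot-slot"

// bootcNode is a hypothetical, simplified stand-in for the BootcNode
// resource; only the fields needed for this illustration are modelled.
type bootcNode struct {
	Name        string
	Annotations map[string]string
}

// assignSlot marks the node as occupying a reboot slot.
// Assumption: presence of the key is what matters, not its value.
func assignSlot(bn *bootcNode) {
	if bn.Annotations == nil {
		bn.Annotations = map[string]string{}
	}
	bn.Annotations[inRebootSlotAnnotation] = ""
}

// freeSlot removes the annotation when the slot is released.
// Deleting from a nil map is a no-op in Go, so no nil check is needed.
func freeSlot(bn *bootcNode) {
	delete(bn.Annotations, inRebootSlotAnnotation)
}

// occupiedSlots recomputes the slot count purely from the annotations,
// which is why the count does not depend on in-memory controller state.
func occupiedSlots(nodes []bootcNode) int {
	count := 0
	for _, bn := range nodes {
		if _, ok := bn.Annotations[inRebootSlotAnnotation]; ok {
			count++
		}
	}
	return count
}

func main() {
	nodes := []bootcNode{{Name: "node-a"}, {Name: "node-b"}, {Name: "node-c"}}
	assignSlot(&nodes[0])
	assignSlot(&nodes[2])
	fmt.Println("occupied after assign:", occupiedSlots(nodes))
	freeSlot(&nodes[0])
	fmt.Println("occupied after free:", occupiedSlots(nodes))
}
```

Because the count is derived from the objects on every pass, a restarted controller would arrive at the same number without any extra recovery step.
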
This updates the code to match the architecture change in the previous
commit where we moved the was-cordoned annotation from the Node to the
BootcNode.

Assisted-by: Pi (Claude Opus 4.6)
Move driveRollout(), nodeState enum, classifyNode(), and
TestClassifyNode to dedicated rollout.go and rollout_test.go files.
The controller file is already ~500 lines and the upcoming rollout
state machine commits will add substantially more. Separating now
keeps the split clean: bootcnodepool_controller.go owns Reconcile(),
watches, and membership sync; rollout.go owns the rollout state
machine.

Pure code move, no behavioral changes.

Assisted-by: Pi (Claude Opus 4.6)
4 comment threads on internal/controller/rollout.go (2 outdated)
Add rolloutState struct to classify owned BootcNodes into state buckets
and count occupied reboot slots from the in-reboot-slot annotation.
Add resolveMaxUnavailable to compute effective maxUnavailable from the
pool spec (defaults to 1, rounds up for percentages, returns 0 when
paused). Add selectDrainCandidates to pick Staged nodes needing the
drain flow, always re-selecting already-slotted nodes regardless of
capacity. driveRollout now computes slots and candidates but does not
yet act on them.
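
As a rough sketch of the rules in the paragraph above (default of 1, percentages rounded up, 0 when paused, slotted nodes always re-selected), the two helpers might look like the following. The function names come from the commit message, but the signatures, the `poolSpec` and `stagedNode` types, and the string encoding of `MaxUnavailable` are assumptions made for illustration. The actual code may well use a different representation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// poolSpec is a hypothetical, simplified pool spec.
// MaxUnavailable is "" (unset), an integer such as "2", or a percentage such as "25%".
type poolSpec struct {
	Paused         bool
	MaxUnavailable string
}

// resolveMaxUnavailable computes the effective limit for a pool of `total` nodes.
func resolveMaxUnavailable(spec poolSpec, total int) (int, error) {
	if spec.Paused {
		return 0, nil // paused pools get no slots
	}
	if spec.MaxUnavailable == "" {
		return 1, nil // default
	}
	if strings.HasSuffix(spec.MaxUnavailable, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(spec.MaxUnavailable, "%"))
		if err != nil || pct < 0 {
			return 0, fmt.Errorf("invalid percentage %q", spec.MaxUnavailable)
		}
		return (pct*total + 99) / 100, nil // integer ceiling of pct% of total
	}
	n, err := strconv.Atoi(spec.MaxUnavailable)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid maxUnavailable %q", spec.MaxUnavailable)
	}
	return n, nil
}

// stagedNode is a hypothetical view of a Staged node; InSlot mirrors the
// presence of the in-reboot-slot annotation.
type stagedNode struct {
	Name   string
	InSlot bool
}

// selectDrainCandidates returns already-slotted nodes first (regardless of
// capacity), then fills any remaining free slots with unslotted nodes.
// `occupied` is the number of slots already in use across the pool.
func selectDrainCandidates(staged []stagedNode, maxUnavailable, occupied int) []stagedNode {
	var out []stagedNode
	for _, n := range staged {
		if n.InSlot {
			out = append(out, n)
		}
	}
	free := maxUnavailable - occupied
	for _, n := range staged {
		if free <= 0 {
			break
		}
		if !n.InSlot {
			out = append(out, n)
			free--
		}
	}
	return out
}

func main() {
	limit, err := resolveMaxUnavailable(poolSpec{MaxUnavailable: "25%"}, 10)
	if err != nil {
		panic(err)
	}
	staged := []stagedNode{{"node-a", false}, {"node-b", true}, {"node-c", false}, {"node-d", false}}
	fmt.Println("limit:", limit)
	fmt.Println("candidates:", selectDrainCandidates(staged, limit, 1))
}
```

Re-selecting slotted nodes unconditionally means a node that already started the drain flow is not dropped if the limit is lowered or the pool is paused mid-rollout. That is one plausible reading of "regardless of capacity".
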

Also adds testutil node builder options (WithBootedDigest,
WithNodeCondition, WithNodeAnnotation) used across rollout tests.

Assisted-by: Pi (Claude Opus 4.6)
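
The testutil builder options mentioned in the commit above follow the shape of Go's functional options pattern. The sketch below shows that pattern in isolation. The option names are taken from the commit message, but their parameters, the `node` fixture type, and the `newNode` constructor are hypothetical and not the actual testutil API.

```go
package main

import "fmt"

// node is a hypothetical test fixture type, not the real Kubernetes Node.
type node struct {
	Name         string
	BootedDigest string
	Annotations  map[string]string
	Conditions   map[string]bool // condition type -> status
}

// nodeOption mutates a fixture while it is being built.
type nodeOption func(*node)

// WithBootedDigest sets the digest the fixture reports as booted (assumed signature).
func WithBootedDigest(digest string) nodeOption {
	return func(n *node) { n.BootedDigest = digest }
}

// WithNodeCondition records a condition on the fixture (assumed signature).
func WithNodeCondition(condType string, status bool) nodeOption {
	return func(n *node) { n.Conditions[condType] = status }
}

// WithNodeAnnotation adds a single annotation (assumed signature).
func WithNodeAnnotation(key, value string) nodeOption {
	return func(n *node) { n.Annotations[key] = value }
}

// newNode builds a fixture with initialized maps, then applies options in order.
func newNode(name string, opts ...nodeOption) *node {
	n := &node{
		Name:        name,
		Annotations: map[string]string{},
		Conditions:  map[string]bool{},
	}
	for _, opt := range opts {
		opt(n)
	}
	return n
}

func main() {
	n := newNode("node-a",
		WithBootedDigest("sha256:abc"),
		WithNodeCondition("Ready", true),
		WithNodeAnnotation("bootc.dev/in-reboot-slot", ""),
	)
	fmt.Printf("%+v\n", *n)
}
```

Options like these keep each test case down to the fields it cares about, which helps when table-driven tests need many slightly different fixtures.
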
@jlebon jlebon force-pushed the reboot-slot-prep branch from 6d3c61b to fcbacf1 Compare June 3, 2026 16:22
jlebon commented Jun 3, 2026

Collaborator Author

Updated for comments! They're minor changes so will merge based on approval.

@jlebon jlebon enabled auto-merge (rebase) June 3, 2026 16:23
@jlebon jlebon merged commit f34740a into main Jun 3, 2026
6 checks passed