- No per-node mutations -- all OS changes flow through images
- Not tied to OpenShift -- runs on vanilla K8s
- Tight integration with bootc APIs (soft-reboot, staging, rollback)
- Minimize API server load -- daemon watches a single per-node object
- Clean CRD APIs that can be driven by a higher-level operator (e.g. MCO on OpenShift, or a multi-cluster management layer)
Two binaries: a controller (Deployment) and a daemon (DaemonSet).
┌──────────────────────────────────────────────────────────┐
│                      Control Plane                       │
│                                                          │
│  ┌─────────────────────┐     ┌────────────────────────┐  │
│  │ BootcNodePool       │     │ BootcNode (per node)   │  │
│  │ (user-created)      │     │ (operator-managed)     │  │
│  │                     │     │                        │  │
│  │ spec:               │     │ spec: ← controller     │  │
│  │   nodeSelector      │     │   desiredImage         │  │
│  │   image (tag/digest)│     │   desiredImageState    │  │
│  │   rollout config    │     │                        │  │
│  │   update policy     │     │ status: ← daemon       │  │
│  │                     │     │   booted image/digest  │  │
│  │ status:             │     │   staged image/digest  │  │
│  │   targetDigest      │     │   rollback image       │  │
│  │   node counts       │     │   conditions           │  │
│  │   conditions        │     │     (Idle)             │  │
│  └──────────┬──────────┘     └──────────┬─────────────┘  │
│             │ watches               r/w │                │
│  ┌──────────▼───────────────────────────▼─────────────┐  │
│  │ Controller (Deployment)                            │  │
│  │                                                    │  │
│  │ Pool Reconciler: resolves tags, selects nodes,     │  │
│  │   computes candidates, writes BootcNode.spec,      │  │
│  │   handles drain/cordon/uncordon,                   │  │
│  │   polls registries for tag updates                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│                        Each Node                         │
│                                                          │
│  ┌──────────────────────────────────────────────────┐    │
│  │ Daemon (DaemonSet pod)                           │    │
│  │                                                  │    │
│  │ Watches: its own BootcNode (single object)       │    │
│  │                                                  │    │
│  │ On spec change:                                  │    │
│  │   if desiredImage != booted → stage (locked)     │    │
│  │   if desiredImageState == Booted → reboot        │    │
│  │                                                  │    │
│  │ On bootc status change:                          │    │
│  │   (via fsnotify on /proc/1/root/ostree/bootc)    │    │
│  │   → update BootcNode.status                      │    │
│  │                                                  │    │
│  │ Runs: bootc switch, bootc upgrade, bootc         │    │
│  │   status, bootc rollback (via nsenter into       │    │
│  │   host mount namespace)                          │    │
│  └──────────────────────────────────────────────────┘    │
│                                                          │
└──────────────────────────────────────────────────────────┘
The DaemonSet only runs on nodes managed by a pool. The Pool Reconciler
labels managed nodes with bootc.dev/managed: "" when creating their
BootcNode, and removes the label when the node leaves all pools. The
DaemonSet uses a nodeSelector for this label, so daemon pods are
automatically created and deleted by the scheduler as nodes are
registered/unregistered.
To keep API server load minimal, each daemon pod watches exactly one object: its own BootcNode custom resource (field-selected by node name). This is the sole communication channel:
- Controller → Daemon: writes to BootcNode.spec (desired image, desired image state)
- Daemon → Controller: writes to BootcNode.status (bootc state, conditions)
Updates to the BootcNode should happen only on state transitions (not as periodic heartbeats).
Defines a group of nodes and their desired OS image state.
apiVersion: node.bootc.dev/v1alpha1
kind: BootcNodePool
metadata:
  name: workers
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  image:
    # User specifies tag, digest, or both
    ref: quay.io/example/myos:latest
  rollout:
    maxUnavailable: 1              # int or percentage string (e.g. "25%")
    paused: false                  # when true, no new rollouts start
  disruption:
    rebootPolicy: AllowSoftReboot  # RebootOnly (default), AllowSoftReboot
  pullSecretRef:
    name: my-pull-secret
    namespace: bootc-operator
status:
  deployedDigest: sha256:old789... # last digest fully rolled out
  targetDigest: sha256:abc123...   # what we're rolling out to
  updateAvailable: true            # targetDigest != deployedDigest
  nodeCount: 10
  updatedCount: 7
  updatingCount: 2
  degradedCount: 1
  conditions:
    - type: UpToDate
    - type: Degraded
Pool conditions and their reasons:
| Condition | Status | Reason | Meaning |
|---|---|---|---|
| UpToDate | True | AllUpdated | All nodes are running targetDigest |
| UpToDate | False | RolloutInProgress | Nodes are actively being updated |
| UpToDate | False | Paused | Updates pending but pool is paused |
| Degraded | True | NodeConflict | Node selector overlaps with another pool |
| Degraded | True | NodeDegraded | At least one node has errors or isn't converging |
| Degraded | True | InvalidSpec | Pool spec contains invalid values |
| Degraded | False | Healthy | No issues |
The UpToDate condition is determined by the controller by comparing each
BootcNode's spec.desiredImage against its status.booted.imageDigest across all nodes in the pool.
When the condition is False, the message field includes a breakdown of node
states (e.g. "5/10 updated; 2 staging, 2 staged, 1 rebooting") so the user can
see exactly what's happening without inspecting individual BootcNodes.
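The breakdown message could be assembled from per-state counts along these lines. This is only a minimal sketch: the poolCounts type and the rolloutMessage function are hypothetical names chosen for illustration, and the choice to omit zero-count states is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// poolCounts is a hypothetical per-state tally for one pool.
type poolCounts struct {
	Total, Updated, Staging, Staged, Rebooting int
}

// rolloutMessage renders a summary such as
// "5/10 updated; 2 staging, 2 staged, 1 rebooting".
// States with a zero count are left out of the breakdown (an assumption).
func rolloutMessage(c poolCounts) string {
	msg := fmt.Sprintf("%d/%d updated", c.Updated, c.Total)
	var parts []string
	add := func(n int, name string) {
		if n > 0 {
			parts = append(parts, fmt.Sprintf("%d %s", n, name))
		}
	}
	add(c.Staging, "staging")
	add(c.Staged, "staged")
	add(c.Rebooting, "rebooting")
	if len(parts) == 0 {
		return msg
	}
	return msg + "; " + strings.Join(parts, ", ")
}
```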
The NodeDegraded reason is set when one or more nodes have a
Degraded=True condition (daemon-reported errors).
Note that each node can belong to at most one BootcNodePool. If a node matches
multiple pool selectors, the controller sets Degraded with reason
NodeConflict on the conflicting pools rather than silently picking one.
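One way to picture the conflict check is a scan of pool selectors against a node's labels. The sketch below is an illustration under assumptions: the pool type and both functions are hypothetical, and only the matchLabels part of a selector is handled.

```go
package main

import "sort"

// pool is a hypothetical, reduced BootcNodePool: a name plus the
// matchLabels part of its nodeSelector.
type pool struct {
	Name        string
	MatchLabels map[string]string
}

// selects reports whether every matchLabels entry is present in labels.
func selects(matchLabels, labels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// conflictingPools returns the sorted names of all pools selecting a node
// when more than one does, and nil when there is no conflict.
func conflictingPools(pools []pool, nodeLabels map[string]string) []string {
	var names []string
	for _, p := range pools {
		if selects(p.MatchLabels, nodeLabels) {
			names = append(names, p.Name)
		}
	}
	if len(names) < 2 {
		return nil
	}
	sort.Strings(names)
	return names
}
```

The returned names are the kind of detail the NodeConflict message could carry.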
Per-node object auto-created by the controller. Named after the Node it
represents. The controller writes spec, the daemon writes status.
apiVersion: node.bootc.dev/v1alpha1
kind: BootcNode
metadata:
  name: worker-1                   # matches Node name
  ownerReferences:
    - kind: BootcNodePool          # owned by pool
spec:                              # ← written by controller
  desiredImage: quay.io/example/myos@sha256:abc123
  desiredImageState: Staged        # Staged or Booted
  pullSecretRef:
    name: my-pull-secret
    namespace: bootc-operator
  pullSecretHash: sha256:e3b0c4...
status:                            # ← written by daemon
  booted:
    image: quay.io/example/myos@sha256:old789
    imageDigest: sha256:old789
    version: "9.4"
    timestamp: "2026-03-20T12:00:00Z"
    architecture: amd64
    softRebootCapable: true
    incompatible: false            # true if node has local mutations bootc can't manage
  staged:
    image: quay.io/example/myos@sha256:abc123
    imageDigest: sha256:abc123
    softRebootCapable: true
  rollback:
    image: quay.io/example/myos@sha256:xyz000
    imageDigest: sha256:xyz000
  conditions:
    - type: Idle
      status: "False"
      reason: Staged
      message: "Image staged, awaiting desiredImageState: Booted"
    - type: Degraded
      status: "False"
      reason: OK
The Idle condition represents where the node is in the update
lifecycle. It does not claim whether the node is "up to date" --
that is determined by the controller by comparing spec.desiredImage
against status.booted.imageDigest.
| Status | Reason | Meaning |
|---|---|---|
| True | Idle | Daemon has no active update cycle |
| False | Staging | Pulling/staging the image |
| False | Staged | Image staged, waiting for desiredImageState: Booted |
| False | Rebooting | Reboot in progress |
The Degraded condition represents errors the daemon is hitting at
the current point in that lifecycle. For example, a daemon with
Idle=False/Staging and Degraded=True would mean that the daemon
is having trouble staging the update.
| Status | Reason | Meaning |
|---|---|---|
| True | Error | Daemon encountered an error (message has details) |
| False | Healthy | No errors |
The daemon is intentionally simple -- driven by two inputs: the BootcNode spec (from the controller) and local bootc status.
         desiredImage != booted
 Idle ────────────────────────────► Staging
(True)                    (bootc switch --download-only)
   ▲                                   │
   │                              ok ──┴── error
   │                               │         │
   │                               ▼         ▼
   │                            Staged   Degraded=True
   │                               │
   │                  desiredImageState == Booted
   │                  && staged == desiredImage
   │                               │
   │                               ▼
   │                           Rebooting
   │           (bootc switch --from-downloaded --apply)
   │                               │
   │                     ... node reboots ...
   │                        daemon restarts
   │                      reads bootc status
   └───────────────────────────────┘
The daemon sets Idle=True when desiredImage == booted (or on startup
if they match). It sets Idle=False with the appropriate reason when an
update cycle is in progress. Errors are reported via the Degraded
condition independently of Idle. The daemon never claims whether the
node is "up to date" -- that determination is made by the controller.
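The daemon's decision logic can be read as a pure function of the spec and the digests reported by bootc. The following is a rough sketch, not the daemon's actual code: daemonAction, its constants and nextAction are hypothetical names, and it assumes all images have already been reduced to bare digests for comparison.

```go
package main

// daemonAction is a hypothetical enum of what the daemon does next.
type daemonAction string

const (
	actionNone   daemonAction = "None"   // nothing to do; report Idle=True
	actionStage  daemonAction = "Stage"  // run bootc switch --download-only
	actionWait   daemonAction = "Wait"   // staged; report Idle=False reason=Staged
	actionReboot daemonAction = "Reboot" // run bootc switch --from-downloaded --apply
)

// nextAction decides the daemon's next step. All image arguments are bare
// digests such as "sha256:abc123" (an assumption of this sketch);
// stagedDigest is empty when nothing is staged.
func nextAction(desiredDigest, desiredState, bootedDigest, stagedDigest string) daemonAction {
	switch {
	case desiredDigest == bootedDigest:
		return actionNone
	case stagedDigest != desiredDigest:
		// Covers both the initial stage and a re-stage after
		// desiredImage changed underneath a staged image.
		return actionStage
	case desiredState == "Booted":
		return actionReboot
	default:
		return actionWait
	}
}
```

Because the staged digest is checked before desiredImageState, a reboot approval never applies to a stale staged image.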
On startup and on each fsnotify event (see Detecting bootc status
changes), the daemon reads bootc status --json and writes the result to BootcNode.status. This is
event-driven from bootc itself; polling serves only as a long-interval fallback.
The DaemonSet runs at the Privileged Pod Security level. The operator namespace needs a PSA exemption for this. The pod runs with:
- privileged: true -- grants all Linux capabilities
- hostPID: true -- shares the host PID namespace
No host filesystem mount is needed. The daemon accesses the host
through two mechanisms, both available via hostPID:
- nsenter -m/proc/1/ns/mnt -- enters PID 1's mount namespace for executing bootc commands and triggering reboots. The container environment variable is filtered out so bootc sees the real host state rather than detecting a container context.
- /proc/1/root/ -- resolves to PID 1's root filesystem for fsnotify watching and writing pull secrets to the host.
See the related Future Enhancements entry to improve this.
The Pool Reconciler is the sole controller in the operator. It translates the user's high-level intent (a pool of nodes running a specific image) into concrete per-node actions. It owns the entire update lifecycle.
Watches: BootcNodePool, Node, BootcNode, Secret (for pull secrets).
All watches map events to the owning BootcNodePool key, so there is a
single Reconcile() function. Each invocation runs the full loop below.
Every step is idempotent -- the reconciler is safe to re-run at any point
regardless of what triggered it.
Watch-to-pool mapping:
- BootcNodePool: Direct -- the event is the pool.
- BootcNode: Follow ownerReference to the pool.
- Node: EnqueueRequestsFromMapFunc that enqueues two sets of pools: (1) pools whose nodeSelector matches the node's current labels, and (2) if a BootcNode exists for this node, the pool that owns it (from ownerReference). The second set is needed to handle label removal: when a label changes such that a node no longer matches its current pool, only the owning pool can clean up the BootcNode and remove the bootc.dev/managed label. All lookups are from cache (no API calls).
- Secret: EnqueueRequestsFromMapFunc -- list pools whose pullSecretRef references the changed Secret.
The Node watch uses predicates to filter out high-frequency noise:
only label changes, Ready condition changes, and spec.unschedulable
changes trigger reconciliation. Kubelet heartbeats, resource capacity
updates, and other frequent Node updates are thus ignored.
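The filtering rule can be sketched as a comparison between the old and new Node. This is a simplified illustration: nodeView is a hypothetical reduced type standing in for the real Node object, and in a controller-runtime based operator the same check would presumably live in the update hook of a predicate.

```go
package main

// nodeView is a hypothetical, reduced view of a Kubernetes Node holding
// only the fields the filter inspects.
type nodeView struct {
	Labels        map[string]string
	Ready         bool // status of the Ready condition
	Unschedulable bool // spec.unschedulable
}

// nodeUpdateIsRelevant reports whether a Node update should trigger
// reconciliation: only label, Ready, or unschedulable changes count.
func nodeUpdateIsRelevant(oldNode, newNode nodeView) bool {
	if oldNode.Ready != newNode.Ready || oldNode.Unschedulable != newNode.Unschedulable {
		return true
	}
	return !sameLabels(oldNode.Labels, newNode.Labels)
}

// sameLabels compares two label maps for equality.
func sameLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}
```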
1. Resolve target digest
Determine the digest the pool should be running:
- Digest ref (e.g. myos@sha256:abc): Set status.targetDigest directly from the spec. No registry query needed.
- Tag ref (e.g. myos:latest): If enough time has passed since the last resolution (tracked in pool status), query the registry to resolve the tag. If the resolved digest differs from status.targetDigest, update it. Schedule the next resolution via RequeueAfter.
In both cases, set status.updateAvailable = (targetDigest != deployedDigest).
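The branching in this step might look roughly like the helpers below. They are a sketch under assumptions: the function names are hypothetical, the resolution interval is a made-up parameter, and the actual registry query is left out entirely.

```go
package main

import (
	"strings"
	"time"
)

// digestFromRef returns the digest of a reference pinned by digest
// (e.g. "quay.io/example/myos@sha256:abc") and false for a tag-only
// reference, which must be resolved against the registry instead.
func digestFromRef(ref string) (string, bool) {
	_, digest, found := strings.Cut(ref, "@")
	if !found || digest == "" {
		return "", false
	}
	return digest, true
}

// tagResolutionDue reports whether enough time has passed since the last
// resolution to query the registry again.
func tagResolutionDue(lastResolved, now time.Time, interval time.Duration) bool {
	return !now.Before(lastResolved.Add(interval))
}

// updateAvailable mirrors status.updateAvailable.
func updateAvailable(targetDigest, deployedDigest string) bool {
	return targetDigest != deployedDigest
}
```

A reference carrying both a tag and a digest takes the digest path here, since the digest already pins the content.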
2. Sync pool membership
Compute two sets:
- matching nodes: nodes whose labels match the pool's nodeSelector.
- owned BootcNodes: BootcNodes with an ownerReference to this pool.
For each matching node, look up the BootcNode with the same name from cache. Then reconcile:
- New match (matching node, no BootcNode exists): Create a BootcNode (owned by this pool) with spec.desiredImage set to the pool's targetDigest. Label the node with bootc.dev/managed (which triggers DaemonSet pod scheduling). If the node is already running the target image, the daemon will report Idle=True. If not, it will begin staging immediately -- staging is non-disruptive (image pull only), and the disruptive reboot step is still gated by maxUnavailable.
- Conflict (matching node, BootcNode exists but owned by a different pool): Do not create a BootcNode for the contested node. Continue reconciling uncontested nodes normally through all steps. After membership sync completes, set the pool's Degraded condition with reason NodeConflict and a message identifying the conflicting pool(s).
- No longer matching (owned BootcNode whose node doesn't match, or whose node was deleted): Delete the BootcNode. Remove the bootc.dev/managed label if the node still exists (which triggers DaemonSet pod removal). If the BootcNode has the bootc.dev/was-cordoned annotation, restore the prior cordon state on the K8s Node. If the BootcNode has the bootc.dev/in-reboot-slot annotation, remove it (freeing the slot).
Sync each BootcNode's spec fields from the pool: set desiredImage to
targetDigest, and copy pullSecretRef and pullSecretHash (if they
differ). When desiredImage changes, also reset desiredImageState to
Staged -- this revokes any pending reboot approval for the previous
image. When targetDigest changes, this causes all daemons to begin
staging in parallel. This is intentional -- staging is non-disruptive
(image pull only), and pre-staging everywhere means nodes are ready to
reboot as soon as maxUnavailable capacity allows.
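The spec sync, including the reset of the reboot approval, can be sketched as follows. The bootcNodeSpec type is a hypothetical reduction of BootcNode.spec and syncSpec is an illustrative name; the real types would carry more fields.

```go
package main

// bootcNodeSpec is a hypothetical, reduced form of BootcNode.spec.
type bootcNodeSpec struct {
	DesiredImage      string
	DesiredImageState string // "Staged" or "Booted"
	PullSecretHash    string
}

// syncSpec copies pool-derived fields into spec and reports whether
// anything changed. A change of DesiredImage resets DesiredImageState to
// "Staged", revoking any reboot approval given for the previous image.
func syncSpec(spec *bootcNodeSpec, targetImage, pullSecretHash string) (changed bool) {
	if spec.DesiredImage != targetImage {
		spec.DesiredImage = targetImage
		spec.DesiredImageState = "Staged"
		changed = true
	}
	if spec.PullSecretHash != pullSecretHash {
		spec.PullSecretHash = pullSecretHash
		changed = true
	}
	return changed
}
```

Returning a changed flag lets the caller skip the API write when nothing differs, which keeps the step idempotent.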
3. Drive per-node rollout state machine
For each BootcNode owned by this pool, read its status.conditions and
act based on the current state. If spec.rollout.paused is true, do not
set desiredImageState: Booted on any new nodes (but let in-progress
staging complete).
The effective state of a BootcNode is determined by four fields:
- spec.desiredImage -- always set. Reflects the image this node should be running. Kept in sync with the pool's targetDigest (updated on all BootcNodes when targetDigest changes). After a successful update, it already matches the booted image -- no clearing needed.
- spec.desiredImageState -- set by the controller. Staged means the daemon should stage the image but not reboot. Booted means the daemon should apply the staged image and reboot. Set to Booted after drain completes.
- status.conditions[Idle] -- set by the daemon to report whether it is actively working. The daemon sets Idle=True when it has no active update cycle, and Idle=False with a reason when it does.
- status.conditions[Degraded] -- set by the daemon to report errors. Independent of Idle: a daemon can be idle and degraded (tried, failed, stopped), or actively retrying and degraded.
The controller determines whether a node is up to date by comparing
spec.desiredImage against status.booted.imageDigest. It does not
rely on the daemon's Idle condition for this.
The controller classifies each node into one of six effective states.
The Degraded condition is checked first (takes priority over activity
state).
| Effective state | Determination | Reconciler action |
|---|---|---|
| Degraded | BootcNode Degraded=True | Mark pool degraded |
| UpToDate | desiredImage == booted | If in reboot slot: free slot only once node is Ready |
| Pending | No booted status, or desiredImage != booted with no actionable Idle reason (daemon hasn't reported/reacted) | Wait for daemon to report or react |
| Staging | desiredImage != booted, Idle=False reason=Staging | Wait (non-disruptive) |
| Staged | desiredImage != booted, Idle=False reason=Staged | If reboot slot available: assign slot; else wait |
| Rebooting | desiredImage != booted, Idle=False reason=Rebooting | Wait for node to come back |
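The classification table translates fairly directly into a function. The sketch below is illustrative only: apart from nodeStateDegraded, which this document mentions by name, the type, constants and the observation struct are hypothetical, and images are assumed to be compared as bare digests.

```go
package main

// nodeState is the controller's effective state for one node.
type nodeState string

const (
	nodeStateDegraded  nodeState = "Degraded"
	nodeStateUpToDate  nodeState = "UpToDate"
	nodeStatePending   nodeState = "Pending"
	nodeStateStaging   nodeState = "Staging"
	nodeStateStaged    nodeState = "Staged"
	nodeStateRebooting nodeState = "Rebooting"
)

// observation is a hypothetical digest of the BootcNode fields used for
// classification.
type observation struct {
	DesiredDigest string // digest part of spec.desiredImage
	BootedDigest  string // status.booted.imageDigest; empty until reported
	Degraded      bool   // Degraded condition has status True
	IdleReason    string // reason of the Idle condition when its status is False, else ""
}

// classify maps an observation to an effective state. Degraded is checked
// first, so it takes priority over any activity state.
func classify(o observation) nodeState {
	switch {
	case o.Degraded:
		return nodeStateDegraded
	case o.BootedDigest == "":
		return nodeStatePending
	case o.DesiredDigest == o.BootedDigest:
		return nodeStateUpToDate
	}
	// desiredImage != booted: use the daemon's in-progress reason if any.
	switch o.IdleReason {
	case "Staging":
		return nodeStateStaging
	case "Staged":
		return nodeStateStaged
	case "Rebooting":
		return nodeStateRebooting
	}
	return nodeStatePending
}
```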
The pool has maxUnavailable reboot slots. Reboot slots are
governed by three rules:
- A node can only take a reboot slot when healthy -- only Staged nodes (not Degraded) are candidates for reboot slots.
- A node can only release a reboot slot when healthy -- after reboot, the slot is held until the node is Idle (desiredImage == booted, not Degraded) and the K8s Node is Ready. A node that is Degraded or not Ready post-reboot holds its slot indefinitely.
- Two unhealthy nodes in reboot slots stop the rollout -- when two or more nodes occupying reboot slots are unhealthy (Degraded or not Ready), the controller stops assigning new slots. A single unhealthy node might be a hardware issue, but two suggest the image is bad. Note that Degraded is a BootcNode condition while Ready is a K8s Node condition -- these are independent checks.
A node enters a slot when the controller sets the
bootc.dev/in-reboot-slot annotation on the BootcNode, cordons the
K8s Node, and starts draining. The annotation is the persistent marker
for slot occupancy -- it survives controller restarts. Staging is
non-disruptive (image pull only) and does not occupy a slot. Staged
nodes waiting for a slot are still serving workloads normally.
Nodes reporting Degraded=True are flagged at the pool level via
Degraded/NodeDegraded. Unhealthy nodes that never
entered a reboot slot (e.g. staging failed) do not block the rollout
-- other nodes continue normally.
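The slot rules can be expressed as two small checks. This is a sketch that takes the rules above literally; slotHolder and the function names are hypothetical, and maxUnavailable is assumed to be already resolved to an integer.

```go
package main

// slotHolder is a hypothetical view of a node that currently carries the
// bootc.dev/in-reboot-slot annotation.
type slotHolder struct {
	UpToDate bool // desiredImage == booted
	Degraded bool // BootcNode Degraded=True
	Ready    bool // K8s Node Ready condition
}

// unhealthy applies the "Degraded or not Ready" check.
func (h slotHolder) unhealthy() bool { return h.Degraded || !h.Ready }

// canRelease reports whether a holder may give up its slot: it must be
// up to date, not Degraded, and Ready.
func (h slotHolder) canRelease() bool { return h.UpToDate && !h.unhealthy() }

// canAssign reports whether a candidate may take a slot: it must be
// Staged and not Degraded, fewer than two holders may be unhealthy, and
// a slot must be free.
func canAssign(candidateStaged, candidateDegraded bool, holders []slotHolder, maxUnavailable int) bool {
	if !candidateStaged || candidateDegraded {
		return false
	}
	unhealthy := 0
	for _, h := range holders {
		if h.unhealthy() {
			unhealthy++
		}
	}
	return unhealthy < 2 && len(holders) < maxUnavailable
}
```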
The controller and daemon each own specific transitions:
┌────────┐ controller updates  ┌───────────┐ daemon stages ┌──────────┐
│  Idle  ├────────────────────►│  Staging  ├──────────────►│  Staged  │
│        │ desiredImage        │           │ successfully  │          │
│        │                     │  (daemon  │◄──────────────┤ (waiting │
│        │                     │  pulling) │ staged !=     │ for slot)│
│        │                     │           │ desiredImage  │          │
└────▲───┘                     └───────────┘               └────┬─────┘
     │                                                          │
     │                                                     slot │
     │                                                 assigned │
     │                                                          │
     │                                                          │
     │                                                          │
     │                                                          │
     │                                                          │
     │                                       ┌───────────┐      │
     │               node reboots,           │ Rebooting │      │
     └───────────────daemon restarts─────────┤           │◄─────┘
                                             │  (daemon  │
                                             │  reboots) │
                                             └───────────┘
Transition details:
- Idle → Staging: The daemon detects that spec.desiredImage no longer matches the booted image. It then sets Idle=False reason=Staging, and runs bootc switch --download-only <desiredImage> to stage the image in locked mode (the staged image will not be applied on an unexpected reboot).
- Staging → Staged: The daemon finishes bootc switch --download-only successfully and sets Idle=False reason=Staged. The staged image is locked and safe from unexpected reboots. If desiredImage changed during staging, the mismatch is caught in the Staged state (see Staged → Staging below).
- Staged → Staging (re-stage): If staged.imageDigest != desiredImage (because desiredImage changed while staging or while waiting for a reboot slot), the daemon goes back to Staging. It sets Idle=False reason=Staging and re-runs bootc switch --download-only with the new desiredImage.
- Staged → Rebooting: The controller assigns the node a reboot slot if one is available. It sets the bootc.dev/in-reboot-slot annotation on the BootcNode, cordons the K8s Node, and records prior cordon state in the bootc.dev/was-cordoned annotation on the BootcNode. It starts an async drain goroutine using k8s.io/kubectl/pkg/drain. The goroutine blocks until drain completes (or is cancelled), then signals completion via a channel that re-enqueues the pool. On successful drain, the controller sets BootcNode.spec.desiredImageState = Booted. The daemon detects this and verifies staged.imageDigest == desiredImage before proceeding. If they match, it sets Idle=False reason=Rebooting and runs bootc switch --from-downloaded --apply to unlock the staged image and reboot into it. If they don't match (race with a desiredImage update), the daemon goes back to Staging instead.
- Rebooting → Idle: The node reboots into the new image. The daemon pod restarts, reads bootc status --json, and sets Idle=True. The controller detects that desiredImage == booted but keeps the reboot slot occupied until the node is Ready. Once Ready, it restores prior cordon state (uncordons only if the node was not already unschedulable before) and removes both annotations from the BootcNode. This frees the reboot slot for the next candidate.
- Any → Degraded: The daemon hits an error while trying either to stay in its current state or to transition to another one, for example when a bootc command keeps failing. The daemon sets Degraded=True reason=Error with a message describing the failure. The controller classifies this as nodeStateDegraded and flags it at the pool level. A degraded node in a reboot slot keeps its slot. Two degraded nodes cause the rollout to stop (see reboot slot rules above).
4. Aggregate pool status
Compute pool-level fields from the BootcNode statuses:
nodeCount, updatedCount, updatingCount, degradedCount.
Set pool conditions: UpToDate, Degraded.
If all nodes are up to date, set deployedDigest = targetDigest and
clear updateAvailable.
User sets BootcNodePool.spec.image.ref = quay.io/example/myos:v2
        │
        ▼
Pool Reconciler: resolves :v2 → sha256:abc123
Pool Reconciler: stores in pool status.targetDigest
Pool Reconciler: updates desiredImage on ALL BootcNodes to sha256:abc123
        │
        ▼
All daemons: detect spec change, begin staging in parallel
All daemons: set Idle=False Reason=Staged (as each finishes)
        │
        ▼
Pool Reconciler: assigns reboot slot to a Staged node
  (cordons, drains, sets desiredImageState: Booted)
        │
        ▼
Daemon: detects desiredImageState: Booted, reboots
        │
        ▼
Node reboots into new image
Daemon: restarts, reads bootc status, sets Idle=True
        │
        ▼
Pool Reconciler: detects desiredImage == booted
Pool Reconciler: waits for node Ready, then frees reboot slot
  (uncordons)
Pool Reconciler: assigns freed slot to next Staged node
- Pause: User sets pool.spec.rollout.paused = true. The Pool Reconciler stops picking new candidates and does not set desiredImageState: Booted on any new nodes. Nodes already mid-staging complete their staging. Tag resolution continues and status.targetDigest is kept current, so the user can see what's pending.
- Resume: User sets pool.spec.rollout.paused = false. The reconciler picks up where it left off: selects candidates, sets desiredImageState: Booted for already-staged nodes.
- Cancel + rollback: User changes pool.spec.image back to the previous digest. The reconciler updates targetDigest and sets desiredImage on all BootcNodes as usual. Nodes already running that image are Idle. Nodes that were updated to the new image go through the normal staging/reboot cycle.
All commands are run via nsenter -m/proc/1/ns/mnt to enter the host
mount namespace.
| Operator action | bootc invocation | Notes |
|---|---|---|
| Read status | bootc status --json --format-version=1 | Parse Host struct |
| Stage update | bootc switch --download-only <image> | Stages but locks -- won't apply on unexpected reboot (pending upstream) |
| Apply + reboot | bootc switch --from-downloaded --apply | Unlocks staged update and reboots |
| Apply + soft reboot | bootc switch --from-downloaded --apply --soft-reboot=auto | Unlocks and does userspace-only restart when possible |
| React to changes | fsnotify on /proc/1/root/ostree/bootc | See below |
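The invocations in the table could be assembled as plain argument lists before being handed to a process runner. The helpers below are a sketch with hypothetical names; the flags are the ones listed in the table, and the "--" separator between nsenter and bootc is an addition of this sketch so that bootc's flags are not parsed by nsenter.

```go
package main

import "strings"

// hostCommand builds the argv for running a bootc subcommand inside the
// host mount namespace via nsenter.
func hostCommand(bootcArgs ...string) []string {
	argv := []string{"nsenter", "-m/proc/1/ns/mnt", "--", "bootc"}
	return append(argv, bootcArgs...)
}

// stageArgs returns the bootc arguments for staging an image in locked mode.
func stageArgs(image string) []string {
	return []string{"switch", "--download-only", image}
}

// applyArgs returns the bootc arguments for applying the staged image,
// optionally allowing a soft reboot.
func applyArgs(allowSoftReboot bool) []string {
	args := []string{"switch", "--from-downloaded", "--apply"}
	if allowSoftReboot {
		args = append(args, "--soft-reboot=auto")
	}
	return args
}

// hostEnv drops the "container" variable so bootc does not detect a
// container context.
func hostEnv(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, "container=") {
			continue
		}
		out = append(out, kv)
	}
	return out
}
```

For example, hostCommand(stageArgs("quay.io/example/myos@sha256:abc123")...) yields the staging command, which could then be run with os/exec using hostEnv(os.Environ()) as its environment.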
On the host, the bootc-status-updated.path systemd path unit watches
/ostree/bootc (physically /sysroot/ostree/bootc) for mtime changes.
Whenever bootc performs an operation that changes status (switch, upgrade,
rollback, edit), it calls update_mtime() to touch this directory's
mtime, which triggers the path unit.
The daemon detects these changes using Go's fsnotify on
/proc/1/root/ostree/bootc. Since the DaemonSet has hostPID: true,
/proc/1/root/ resolves to PID 1's root filesystem, giving full visibility into
the host mount tree. The mtime touch appears as a CHMOD event.
On receiving an event, the daemon re-reads bootc status --json via
nsenter and updates BootcNode.status if the state has changed.
The daemon also polls bootc status --json on a long interval (e.g. 5 minutes)
as a fallback in case an fsnotify event is missed.
Note: bootc's path unit mechanism may not work with composefs (the
/ostree/bootc directory does not exist). This is a known upstream issue. The
polling fallback covers this case.
BootcNodePool references a pull secret (spec.pullSecretRef). The
controller copies this reference into BootcNode.spec.pullSecretRef along
with a hash of the Secret's content (spec.pullSecretHash).
The daemon reads the Secret via the K8s API (one-shot GET, not a watch)
and writes the .dockerconfigjson key to /run/ostree/auth.json on the
host filesystem via nsenter. This is the highest-priority path in bootc's
auth file search order (/run/ostree/ > /etc/ostree/ >
/usr/lib/ostree/), so it cleanly overrides any persistent or vendor
auth config without mutating /etc/.
Using a /run/ path means the file does not survive reboots, but this is
desirable: the daemon re-writes it on every startup and on every
BootcNode spec change, so it is always present before any bootc upgrade
runs. If the DaemonSet is removed, the credentials disappear on the next
reboot rather than lingering on disk.
When the Secret's content changes, the controller detects the change and bumps the hash in BootcNode.spec. This triggers the daemon's existing BootcNode watch, causing it to re-fetch the Secret and update the host file.
This requires the daemon ServiceAccount to have get permission on Secrets in
the operator namespace.
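The hash that drives this refresh could be computed as below. This is a sketch under assumptions: the function names are hypothetical, hashing only the .dockerconfigjson payload is a guess at what "the Secret's content" covers, and the "sha256:" prefix follows the pullSecretHash example shown earlier.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// pullSecretHash returns a content hash in "sha256:<hex>" form for the
// pull secret payload.
func pullSecretHash(dockerConfigJSON []byte) string {
	sum := sha256.Sum256(dockerConfigJSON)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// needsRefresh reports whether the daemon should re-fetch the Secret and
// rewrite the host auth file, given the hash it last applied.
func needsRefresh(appliedHash, specHash string) bool {
	return appliedHash != specHash
}
```

On startup the daemon has no applied hash yet, so an empty appliedHash always triggers a write, which matches the re-write-on-startup behavior described above.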
- Privilege separation: The daemon could fork a privileged helper early on and then drop privileges. The unprivileged main process (API server watch, state machine) would communicate with the helper via a Unix socket. Only the helper would execute nsenter operations, and it would only know how to execute specific commands.
- Health checks and automatic rollback: When enabled, monitor node health and automatically roll back if the node is unhealthy. The simplest signal is NotReady, but this could integrate with systemd's Automatic Boot Assessment (i.e. boot-complete.target) for more customization.
- Maintenance windows: Allow pools to specify time windows during which reboots are permitted (e.g. weekends, off-peak hours). Staging would still happen immediately, but the reconciler would only set desiredImageState: Booted when the current time falls within the window. Similar to kured/Zincati.
- Pre-staging while paused: A mode where pausing blocks reboots but allows staging to proceed on all target nodes. This way, when the user unpauses, nodes are already staged and can drain and reboot immediately without waiting for image pulls.
- Signature policy enforcement: Allow users to require signature verification of OS update payloads.
- Pull-through caching: When enabled, bootc is actually pointed at a pullspec we own, and we cache the layers ourselves. We need to make sure this doesn't conflict with the signature policy enforcement feature.
- Stuck node detection: The controller could track non-progressing nodes and escalate them (e.g. mark the pool as degraded) after a time threshold. This would cover cases where the daemon has not responded to a controller action: desiredImage != booted with Idle=True (daemon should be staging), desiredImage == booted without Idle=True (daemon should have settled), and desiredImageState == Booted with Idle=False reason=Staged (daemon should be rebooting).
- Custom drain implementation: The controller currently uses k8s.io/kubectl/pkg/drain for node draining, which pulls in heavy dependencies for a fairly thin orchestration layer on top of the Eviction API (list pods, filter DaemonSet/mirror pods, evict with PDB retry, poll until deleted). A custom implementation would drop k8s.io/kubectl and k8s.io/cli-runtime and use the Eviction API directly. This should allow dropping the logwriter adapter we have, and possibly not require drain goroutines.
- Cross-pool rollout ordering: Allow a pool to declare a dependency on another pool (e.g. dependsOn: workers). The reconciler would gate rollout on the dependency pool reaching UpToDate=True first. The primary use case is updating worker nodes before control plane nodes. Without this, users must manually sequence pool updates or rely on a higher-level operator to coordinate. This is intentionally deferred -- two independent pools cover the common case, and cross-pool ordering adds coordination complexity (handling degraded dependencies, cycles, multi-phase chains). See also related discussions in openshift/machine-config-operator#1897.