Add pool membership controller logic #21
The controller needs to watch/update Nodes (for labeling and cordon state) and manage BootcNodes (create/delete per pool membership, read status for rollout decisions). Assisted-by: Pi (Claude Opus 4.6)
Set up the controller to watch Nodes and BootcNodes objects. For Node objects, we need a mapper to map it back to the related pool(s). Use predicates for the Node watch to try to filter on only events we care about. Assisted-by: Pi (Claude Opus 4.6)
c9a51e2 to df4042b
I'm really not a big fan of reimplementing the test logic for polling and matching. It makes the code much harder to read. Also, as the number of tests grows, we will need more and more complex logic.
I would also implement some unit tests for the reconcile loop of the BootcNode pool; there are a lot of corner cases there.
Implement the core reconciliation loop that matches Nodes to BootcNodePools via nodeSelector and maintains the corresponding BootcNode objects. On each reconcile, the controller computes matching nodes vs owned BootcNodes and creates/deletes as needed. This pretty much follows the logic set out by the ARCHITECTURE doc. Assisted-by: Pi (Claude Opus 4.6)
First, tweak the guidelines in the ARCHITECTURE doc around pool conflicts slightly: we continue on to reconcile uncontested nodes, but otherwise leave contested nodes to their current owners (but still degrade the pool to surface this to admins). This just seems more in line with K8s' "graceful degradation" philosophy. Then, actually implement pool conflict detection. This is easy to detect simply by catching `IsAlreadyExists()` at `BootcNode` creation time. Assisted-by: Pi (Claude Opus 4.6)
Now that we have a real controller, this test doesn't make sense anymore. As soon as the pool is created, the controller will start managing it and update its status. The point was to sanity-check the CRD definition itself, but we'll implicitly test that in a much more meaningful way as we actually implement more of the controller logic and test various situations that exercise the full status schema.
Prep for testing the new controller. While we're here, enable logging as well.
Add tests covering the new controller pool membership logic. Assisted-by: Pi (Claude Opus 4.6)
This is pretty standard stuff. I'm sure there are hundreds of third-party packages out there which offer an API for this. But meh, it's simple code. In the process, we significantly improve error handling in those wait loops: before, we treated all errors indiscriminately, when we should really only handle `IsNotFound` differently (a pet peeve of mine).
This doesn't really add much value beyond what the envtests cover today, but it does exercise the controller in a real cluster. The test here is pretty basic, and I don't think it's worth testing all the same corner cases already covered by the envtests. These e2e tests will get much more meaningful and exciting once we actually have a daemon to mutate the node. Assisted-by: Pi (Claude Opus 4.6)
df4042b to 9c00f2d
Thanks for the review! Updated for comments. I tried to fold all the changes into their respective commits but kept the gomega switchover as a separate commit to avoid that pain.
Thanks, I wasn't deeply familiar with gomega and you made me take a closer look. I was under the impression that it was part of ginkgo, but no, it clearly can work without it. Switched over to that now!
Hmm, this patch series does add quite a few envtests here though. Is there anything obvious that's missing?
We are not testing the path where a node is deleted and the corresponding BootcNode should be deleted as well.
It makes a bunch of our test assertions easier to write, and it has various polling helpers already that allow us to get rid of our homegrown version.
That seems like a nice way to complete the story and verifies that we correctly clear a pool's Degraded condition due to conflicts.
Verify that a node being deleted causes its `BootcNode` to also be deleted.
9c00f2d to aa2478c
Updated for comments!
Added!
This covers milestone 3a of the implementation plan.
See individual commits!