Validate UpdateService deployment pods are ready before completing install in disconnected #542
Walkthrough
The playbook introduces a
Changes
UpdateService Replica Variable and Readiness Gate
Sequence Diagram
sequenceDiagram
participant Playbook
participant UpdateServiceCR as UpdateService CR
participant Deployment as Deployment openshift-update-service
Playbook->>Playbook: set_fact updateservice_replicas=3
Playbook->>UpdateServiceCR: create with spec.replicas={{ updateservice_replicas }}
Playbook->>Playbook: wait for RegistryCACertFound condition
Playbook->>Deployment: poll replicas/readyReplicas/updatedReplicas/availableReplicas == 3
Deployment-->>Playbook: all replica counts match
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
f7a1712 to 29a2cb0
29a2cb0 to 6e7ff91
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@operators/cincinnati-operator/tasks.yaml`:
- Around line 51-70: The "Wait for UpdateService deployment pods to be ready"
task has `failed_when: false` which disables failure detection and allows the
task to proceed even when the readiness conditions in the `until` clause
(checking replicas, readyReplicas, updatedReplicas, and availableReplicas match
updateservice_replicas) are never met. Remove the `failed_when: false` parameter
entirely so that the kubernetes.core.k8s_info task will properly fail if the
readiness gates are not satisfied after the configured retries, preventing the
installation from proceeding with non-ready UpdateService pods.
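As a rough illustration of the hardened gate this comment asks for, the task might look like the sketch below once `failed_when: false` is removed. The task name and the four status fields come from the comment above; the Deployment name `service`, the namespace, the registered variable name, and the `retries`/`delay` values are assumptions, not taken from the PR.

```yaml
# Sketch only: Deployment name, namespace, retries, and delay are assumed values.
- name: Wait for UpdateService deployment pods to be ready
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    namespace: openshift-update-service  # assumed namespace
    name: service                        # hypothetical Deployment name
  register: updateservice_deployment     # hypothetical variable name
  # All list entries must be true for the loop to stop retrying.
  until:
    - updateservice_deployment.resources | length > 0
    - (updateservice_deployment.resources[0].status.replicas | default(0)) == (updateservice_replicas | int)
    - (updateservice_deployment.resources[0].status.readyReplicas | default(0)) == (updateservice_replicas | int)
    - (updateservice_deployment.resources[0].status.updatedReplicas | default(0)) == (updateservice_replicas | int)
    - (updateservice_deployment.resources[0].status.availableReplicas | default(0)) == (updateservice_replicas | int)
  retries: 30  # assumed
  delay: 10    # assumed
  # No failed_when override here, so the task fails once the retries are exhausted.
```

The `default(0)` filters guard against status fields that the Deployment controller omits while no pods are ready yet.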
📒 Files selected for processing (1)
operators/cincinnati-operator/tasks.yaml
Adds validation that all UpdateService deployment pods are running and healthy before Enclave install completes. This ensures a functional UpdateService is handed over to partner overlays or day-2 operations.
Previously, Enclave only waited for the UpdateService CR to be created and the RegistryCACertFound condition, but did not verify the actual pods were running. This could allow install to complete with crash-looping UpdateService pods.
The new validation checks that the deployment has all replicas ready, updated, and available. The replica count is now parameterized via the `updateservice_replicas` fact for easier maintenance.
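A minimal sketch of the parameterization described above, assuming the CR is created with `kubernetes.core.k8s`. The value `3` matches the sequence diagram; the CR name, namespace, and the `updateservice_releases` and `updateservice_graph_data_image` variables are hypothetical placeholders rather than names from the PR.

```yaml
# Sketch only: CR name, namespace, and the releases/graph-data variables are hypothetical.
- name: Set the UpdateService replica count
  ansible.builtin.set_fact:
    updateservice_replicas: 3

- name: Create the UpdateService CR
  kubernetes.core.k8s:
    state: present
    # The definition is passed as a YAML string, so the templated replica
    # count is parsed as an integer when the module loads the manifest.
    definition: |
      apiVersion: updateservice.operator.openshift.io/v1
      kind: UpdateService
      metadata:
        name: service
        namespace: openshift-update-service
      spec:
        replicas: {{ updateservice_replicas }}
        releases: {{ updateservice_releases }}
        graphDataImage: {{ updateservice_graph_data_image }}
```

Keeping the count in one fact means the CR and any later readiness check compare against the same number.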
The validation is added with `failed_when: false`, intentionally as a soft readiness gate to observe behaviour without blocking installs or impacting customers, with the plan to harden it once confidence is gained.
This addresses scenarios where partner overlays apply certificate changes that affect UpdateService, ensuring baseline functionality before handoff.
Related: https://github.com/gori-project/GoRI/issues/924