
Validate UpdateService deployment pods are ready before completing install in disconnected#542

Merged: maorfr merged 1 commit into main from revert-532-revert-521-validate-updateservice-pods-ready-2 on Jun 25, 2026.



maorfr (Collaborator) commented Jun 23, 2026

Adds validation that all UpdateService deployment pods are running and healthy before Enclave install completes. This ensures a functional UpdateService is handed over to partner overlays or day-2 operations.

Previously, Enclave only waited for the UpdateService CR to be created and to report the RegistryCACertFound condition, but did not verify that the pods were actually running. This could let the install complete with crash-looping UpdateService pods.

The new validation checks that the Deployment has all replicas ready, updated, and available. The replica count is now parameterized via the updateservice_replicas fact for easier maintenance.
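The check described above can be sketched as a single polling task. This is a reconstruction from the PR description and the CodeRabbit walkthrough, not the actual diff. The walkthrough refers to an "openshift-update-service Deployment"; treating that as both the Deployment name and its namespace is an assumption here, as is the register name updateservice_deploy. The retry settings, the disconnected gate, and failed_when: false are taken from the walkthrough.

```yaml
# Sketch only: reconstructed from the PR description, not copied from the diff.
# Runs after the existing RegistryCACertFound wait (not shown).
- name: Wait for UpdateService deployment pods to be ready
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    name: openshift-update-service        # assumed Deployment name
    namespace: openshift-update-service   # assumed namespace
  register: updateservice_deploy          # hypothetical register name
  # All four replica counters in the Deployment status must equal the
  # requested count; counters not yet reported (e.g. no pod ready) default to 0.
  until: >-
    updateservice_deploy.resources | length > 0 and
    (updateservice_deploy.resources[0].status.replicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.readyReplicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.updatedReplicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.availableReplicas | default(0)) == updateservice_replicas
  retries: 60          # 60 retries ...
  delay: 10            # ... 10 seconds apart
  failed_when: false   # soft readiness gate, as this PR describes it
  when: disconnected | default(true) | bool
```

Comparing all four counters, rather than readyReplicas alone, means a rollout that still has old-revision pods around (updatedReplicas lagging) or pods that are ready but not yet available does not pass the check.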

The validation is intentionally added with failed_when: false, which makes it a soft readiness gate: we can observe its behaviour without blocking installs or impacting customers, and we plan to harden it once confidence is gained.
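Because the gate is soft, the observed state has to be surfaced somewhere for the observation to be useful. One hypothetical way, not part of this PR, is a debug task that logs the expected count next to the Deployment status from the last poll. It assumes the wait task registers its result as updateservice_deploy (a name chosen here for illustration) and that the play continues past that task, as the soft gate intends.

```yaml
# Hypothetical follow-up task; not part of this PR.
- name: Log UpdateService Deployment replica status
  ansible.builtin.debug:
    msg:
      expected_replicas: "{{ updateservice_replicas }}"
      # Status of the first Deployment returned by the last poll,
      # or an empty dict if no Deployment was found.
      observed_status: >-
        {{ (updateservice_deploy.resources | default([]) | first | default({})).status | default({}) }}
  when: disconnected | default(true) | bool
```

Logging the raw status, rather than only a pass/fail flag, shows which counter lagged in a given install (for example readyReplicas versus updatedReplicas), which is the kind of evidence needed before hardening the gate.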

This addresses scenarios where partner overlays apply certificate changes that affect UpdateService, ensuring baseline functionality before handoff.

Related: https://github.com/gori-project/GoRI/issues/924

Summary by CodeRabbit

  • New Features
    • UpdateService deployments now support configurable replica counts (instead of a fixed value), making it easier to tailor scaling to different environments.
    • Enhanced rollout validation: after the registry CA is detected, the playbook waits for the UpdateService Deployment's replica status (replicas, ready, updated, and available) to match the configured count. The check runs only in disconnected mode, which is assumed when disconnected is not set.
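The configurable replica count from the first bullet amounts to a fact plus a templated CR field. A minimal sketch, assuming the UpdateService CR is applied with kubernetes.core.k8s; the CR name and namespace are placeholders, and the remaining spec fields are deliberately left out rather than guessed.

```yaml
- name: Set the UpdateService replica count
  ansible.builtin.set_fact:
    updateservice_replicas: 3

- name: Create the UpdateService CR
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: updateservice.operator.openshift.io/v1
      kind: UpdateService
      metadata:
        name: update-service                 # placeholder name
        namespace: openshift-update-service  # placeholder namespace
      spec:
        # Previously hard-coded to 3; now taken from the fact above, so the
        # CR and the readiness wait always agree on the expected count.
        replicas: "{{ updateservice_replicas }}"
        # releases, graphDataImage, ... omitted from this sketch
```

Using one fact for both the CR and the wait condition is the point of the change: editing the value in one place cannot leave the readiness check waiting for a replica count the CR never requested.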

github-actions (bot) added the operators label (Operator installation/config) on Jun 23, 2026.
coderabbitai (bot) commented Jun 23, 2026

Walkthrough

The playbook introduces a set_fact task defining updateservice_replicas: 3, substitutes that variable into the UpdateService CR's spec.replicas field (replacing a hard-coded 3), and appends a kubernetes.core.k8s_info polling task that waits for the openshift-update-service Deployment's replica status fields to match the variable, gated by when: disconnected | default(true) | bool.

Changes

UpdateService Replica Variable and Readiness Gate

Layer: Replica variable, CR templating, and readiness wait
File(s): operators/cincinnati-operator/tasks.yaml
Summary: Sets updateservice_replicas to 3 via set_fact, templates the UpdateService spec.replicas with that variable, and inserts a polling task that waits until all four Deployment replica status fields (replicas, readyReplicas, updatedReplicas, availableReplicas) match the variable. The wait runs only in disconnected mode and uses 60 retries at 10-second intervals; failed_when: false makes a wait timeout non-fatal.
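These retry settings bound how long a rollout that never becomes ready can hold the install before the wait gives up. As a back-of-the-envelope estimate that ignores the duration of each k8s_info call:

$$
t_{\max} \approx \text{retries} \times \text{delay} = 60 \times 10\,\text{s} = 600\,\text{s} = 10\,\text{min}
$$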

Sequence Diagram

```mermaid
sequenceDiagram
  participant Playbook
  participant UpdateServiceCR as UpdateService CR
  participant Deployment as Deployment openshift-update-service

  Playbook->>Playbook: set_fact updateservice_replicas=3
  Playbook->>UpdateServiceCR: create with spec.replicas={{ updateservice_replicas }}
  Playbook->>Playbook: wait for RegistryCACertFound condition
  Playbook->>Deployment: poll replicas/readyReplicas/updatedReplicas/availableReplicas == 3
  Deployment-->>Playbook: all replica counts match
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • rh-ecosystem-edge/enclave#521: Directly modifies the same tasks.yaml file to parameterize updateservice_replicas and add the identical post-RegistryCACertFound Deployment replica wait block.
  • rh-ecosystem-edge/enclave#532: Modifies tasks.yaml around the same updateservice_replicas/spec.replicas logic and disconnected-mode replica readiness wait that this PR introduces.
  • rh-ecosystem-edge/enclave#539: Directly touches UpdateService spec.replicas in tasks.yaml, the same field this PR parameterizes.

Poem

🔒 Three replicas stand guard, no lone point of failure here,
A variable declared, the hard-code disappears.
We poll until all counts align, readyReplicas in sight,
Disconnected mode confirmed — the cluster sealed up tight.
No magic number buried deep; the fact is set up top! 🛡️

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: validating that UpdateService deployment pods are ready before completing installation in disconnected environments, which aligns with the playbook modifications that add pod readiness polling.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • No-Hardcoded-Secrets: ✅ Passed. No hardcoded secrets detected. File uses templating for all dynamic values and properly externalizes configuration.
  • No-Weak-Crypto: ✅ Passed. This PR contains no cryptographic code, weak ciphers, custom crypto implementations, or insecure token comparisons; the check is not applicable.
  • No-Injection-Vectors: ✅ Passed. No SQL concatenation, shell injection, eval/exec, pickle.loads, unsafe yaml.load, os.system, or dangerouslySetInnerHTML patterns detected. Variables used safely in Ansible/Kubernetes API contexts.
  • Container-Privileges: ✅ Passed. The modified file contains no container specifications, privilege escalation settings, or security-sensitive configurations; it only uses Ansible modules to manage K8s objects.
  • No-Sensitive-Data-In-Logs: ✅ Passed. No explicit logging of sensitive data found. The playbook contains no debug/log directives that would expose passwords, tokens, API keys, PII, or credentials. KUBECONFIG paths are used only as envi...
  • Ai-Attribution: ✅ Passed. No AI tool usage detected in this PR. The commit message contains no AI mentions, PR description doesn't reference AI tools, and no AI-related trailers are present in the commit metadata.
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.


maorfr changed the title from "Validate UpdateService deployment pods are ready before completing install in disconnected" to "Validate UpdateService deployment pods are ready before completing install" on Jun 23, 2026.
maorfr force-pushed the revert-532-revert-521-validate-updateservice-pods-ready-2 branch from f7a1712 to 29a2cb0 on June 23, 2026 16:41.
maorfr changed the title back to "Validate UpdateService deployment pods are ready before completing install in disconnected" on Jun 23, 2026.
maorfr force-pushed the revert-532-revert-521-validate-updateservice-pods-ready-2 branch from 29a2cb0 to 6e7ff91 on June 23, 2026 18:54.

coderabbitai (bot) left a review

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@operators/cincinnati-operator/tasks.yaml`:
- Around line 51-70: The "Wait for UpdateService deployment pods to be ready"
task has `failed_when: false` which disables failure detection and allows the
task to proceed even when the readiness conditions in the `until` clause
(checking replicas, readyReplicas, updatedReplicas, and availableReplicas match
updateservice_replicas) are never met. Remove the `failed_when: false` parameter
entirely so that the kubernetes.core.k8s_info task will properly fail if the
readiness gates are not satisfied after the configured retries, preventing the
installation from proceeding with non-ready UpdateService pods.
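For comparison, a sketch of the hardened variant this comment asks for. It reuses the assumed Deployment name and namespace and the hypothetical register name from a reconstruction of the wait task, not from the diff itself. The only change from the soft gate is that failed_when is gone, so if the until condition is still false after the last retry the task fails, and, absent any rescue or ignore_errors around it, the install stops there.

```yaml
- name: Wait for UpdateService deployment pods to be ready
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    name: openshift-update-service        # assumed Deployment name
    namespace: openshift-update-service   # assumed namespace
  register: updateservice_deploy          # hypothetical register name
  until: >-
    updateservice_deploy.resources | length > 0 and
    (updateservice_deploy.resources[0].status.replicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.readyReplicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.updatedReplicas | default(0)) == updateservice_replicas and
    (updateservice_deploy.resources[0].status.availableReplicas | default(0)) == updateservice_replicas
  retries: 60
  delay: 10
  # No failed_when here: exhausting the retries marks the task as failed.
  when: disconnected | default(true) | bool
```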

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 26c76790-8554-4403-8301-e390371af310

📥 Commits

Reviewing files that changed from the base of the PR and between 29a2cb0 and 6e7ff91.

📒 Files selected for processing (1)
  • operators/cincinnati-operator/tasks.yaml

maorfr merged commit b3a2ed9 into main on Jun 25, 2026 (21 checks passed).
maorfr deleted the revert-532-revert-521-validate-updateservice-pods-ready-2 branch on June 25, 2026 07:20.

Labels

operators Operator installation/config

2 participants