Approve pending upgrade InstallPlans during MCH wait #467
Walkthrough
The PR modifies the advanced cluster management task sequence to integrate operator version reconciliation into the MultiClusterHub readiness check. The task now derives the MCE operator reference, runs reconciliation for both operators, then polls the MultiClusterHub phase via shell commands until it reaches "Running" status.
Changes
RHACM MultiClusterHub Readiness and Operator Reconciliation
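The readiness flow in the walkthrough (reconcile both operators, then re-check the MultiClusterHub phase until it is "Running") is an Ansible `until`/`retries` loop. A minimal shell sketch of that control flow, where `reconcile_operators` and `get_mch_phase` are hypothetical placeholders for the `enclave reconcile operator-versions` and `oc get multiclusterhub` calls, and the retry count and delay are caller-supplied rather than the task's real values:

```shell
#!/usr/bin/env bash
# Sketch of an until/retries/delay loop in plain shell.
# reconcile_operators and get_mch_phase are HYPOTHETICAL placeholders for the
# `enclave reconcile operator-versions ...` and `oc get multiclusterhub ...` calls.

reconcile_operators() {
  : # placeholder: the real task reconciles MCE, then ACM
}

get_mch_phase() {
  echo "Installing" # placeholder: the real task prints .status.phase
}

# wait_for_mch_running RETRIES DELAY_SECONDS
# Returns 0 once the phase is "Running", 1 after RETRIES unsuccessful attempts.
wait_for_mch_running() {
  local retries="$1" delay="$2" attempt=1 phase
  while [ "$attempt" -le "$retries" ]; do
    # Best effort, mirroring `|| true`: a failed reconcile does not end the wait.
    reconcile_operators || true
    phase="$(get_mch_phase)" || phase=""
    if [ "$phase" = "Running" ]; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_for_mch_running 60 10` would make up to 60 attempts spaced 10 seconds apart (illustrative numbers); as with `|| true` in the task, a failed reconcile is ignored and only the phase decides when the loop stops.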
Estimated Code Review Effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Risk Assessment: The shift from native Kubernetes API polling to shell-based phase queries introduces operational risk. The
🚥 Pre-merge checks | ✅ 10 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@operators/advanced-cluster-management/tasks.yaml`:
- Around lines 16-18: After setting mce_operator via the set_fact that selects
the 'multicluster-engine' entry, add an ansible.builtin.assert task immediately
after to fail fast if mce_operator is undefined or missing required keys; assert
that mce_operator is defined and that mce_operator.name, mce_operator.version
and mce_operator.namespace are present (with a clear message like
"multicluster-engine operator entry missing or incomplete") so the later shell
task that dereferences mce_operator.name/.version/.namespace cannot run with
nil/invalid values.
- Around lines 21-39: The shell task registering mch_phase_result can hang
because the two commands (enclave reconcile operator-versions and oc get
multiclusterhub ...) have no per-attempt timeout; update the
ansible.builtin.shell invocation to wrap each of those commands with a
per-command timeout (e.g., via the GNU timeout utility such as timeout 300s
enclave reconcile ... and timeout 30s oc get ...) so each retry attempt will
fail fast instead of hanging indefinitely; keep set -euo pipefail and the same
register variable (mch_phase_result) unchanged.
- Around lines 24-35: The task pollutes stdout by redirecting stderr into it;
remove the "2>&1" (or otherwise stop merging stderr into stdout) from both
enclave reconcile invocations so that mch_phase_result.stdout can match
"Running" reliably (i.e., change the two lines that include "enclave reconcile
operator-versions ... --no-dry-run 2>&1 || true" for the mce_operator and
operator blocks so they no longer redirect stderr into stdout, keeping "|| true" only if
you still want reconcile failures to be ignored).
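Dropping `2>&1` keeps stderr out of `mch_phase_result.stdout`, but whatever the reconcile command prints on stdout still shares that stream with the phase. The sketch below uses a hypothetical `noisy_reconcile` stand-in (the real command's output streams are not shown in this PR) to compare three shapes: the current `2>&1`, the suggested removal, and an alternative that sends all reconcile output to stderr with `>&2` so stdout carries only the phase:

```shell
#!/usr/bin/env bash
# noisy_reconcile is a HYPOTHETICAL stand-in for `enclave reconcile operator-versions`;
# it writes one line to stdout and one to stderr to show where each stream ends up.
noisy_reconcile() {
  echo "reconcile: checked InstallPlans"   # stdout
  echo "reconcile: warning" >&2            # stderr
}

fake_phase() {
  echo "Running" # stand-in for: oc get multiclusterhub ... -o jsonpath='{.status.phase}'
}

# Current shape: `2>&1` folds stderr into stdout, so stdout holds three lines.
task_with_merge() {
  noisy_reconcile 2>&1 || true
  fake_phase
}

# Suggested fix: stderr leaves stdout, but reconcile's own stdout line remains.
task_without_merge() {
  noisy_reconcile || true
  fake_phase
}

# Alternative: move all reconcile output to stderr so stdout holds only the phase.
task_phase_only() {
  noisy_reconcile >&2 || true
  fake_phase
}
```

Which shape the `until` condition needs depends on how it compares the output (an exact match versus a substring test), and that condition is not shown in this review.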
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: adffd63c-612d-465a-90a3-e6ca2acacd81
📒 Files selected for processing (1)
operators/advanced-cluster-management/tasks.yaml
- name: Set MCE operator reference for upgrade approval during MCH wait
  ansible.builtin.set_fact:
    mce_operator: "{{ operators | selectattr('name', 'equalto', 'multicluster-engine') | first }}"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and inspect the referenced tasks file and the usages of mce_operator.
FILE="operators/advanced-cluster-management/tasks.yaml"
echo "=== File: $FILE (lines 1-120) ==="
nl -ba "$FILE" | sed -n '1,120p'
echo
echo "=== Search for mce_operator usage in operators/advanced-cluster-management ==="
rg -n "mce_operator" operators/advanced-cluster-management || true
echo
echo "=== Search for operators variable definition/shape (operators list) in this area ==="
rg -n "operators" operators/advanced-cluster-management || true
Repository: rh-ecosystem-edge/enclave
Length of output: 207
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List files under operators/advanced-cluster-management to understand context.
find operators/advanced-cluster-management -maxdepth 3 -type f -print
echo
echo "=== Inspect tasks.yaml around the reported lines ==="
FILE="operators/advanced-cluster-management/tasks.yaml"
nl -ba "$FILE" | sed -n '1,80p'
nl -ba "$FILE" | sed -n '80,160p' || true
echo
echo "=== Find mce_operator usage anywhere in repo ==="
rg -n "mce_operator" . || true
Repository: rh-ecosystem-edge/enclave
Length of output: 342
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="operators/advanced-cluster-management/tasks.yaml"
echo "=== File: $FILE (head 120) ==="
nl -ba "$FILE" | sed -n '1,120p'
echo
echo "=== File: $FILE (tail 80) ==="
nl -ba "$FILE" | tail -n 80 || true
echo
echo "=== mce_operator usage in $FILE ==="
rg -n "mce_operator" "$FILE" || true
echo
echo "=== mce_operator usage in repo ==="
rg -n "mce_operator" . || true
echo
echo "=== operators variable usage in $FILE ==="
rg -n "operators" "$FILE" || true
Repository: rh-ecosystem-edge/enclave
Length of output: 204
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="operators/advanced-cluster-management/tasks.yaml"
echo "=== Print $FILE with line numbers (1-120) ==="
python3 - <<'PY'
import itertools
file_path="operators/advanced-cluster-management/tasks.yaml"
start, end = 1, 120
with open(file_path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        if i < start:
            continue
        if i > end:
            break
        print(f"{i:4d}\t{line.rstrip()}")
PY
echo
echo "=== Print $FILE with line numbers (120-end, up to 260) ==="
python3 - <<'PY'
file_path="operators/advanced-cluster-management/tasks.yaml"
start, end = 121, 260
with open(file_path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        if i < start:
            continue
        if i > end:
            break
        print(f"{i:4d}\t{line.rstrip()}")
PY
echo
echo "=== rg: mce_operator usage ==="
rg -n "mce_operator" operators/advanced-cluster-management || true
echo
echo "=== rg: operators usage near tasks.yaml ==="
rg -n "operators" operators/advanced-cluster-management/tasks.yaml || true
Repository: rh-ecosystem-edge/enclave
Length of output: 4866
Major: Assert mce_operator is present before dereferencing .name/.version/.namespace.
In operators/advanced-cluster-management/tasks.yaml, mce_operator is set to operators | selectattr('name','equalto','multicluster-engine') | first (line 18). If operators lacks a multicluster-engine entry, the subsequent shell task dereferences mce_operator.name, .version, and .namespace (lines 25-27), leading to a broken reconciliation run (template failure or empty/invalid CLI args). Add an ansible.builtin.assert right after the set_fact to fail fast with a clear message when the required operator entry/fields are missing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@operators/advanced-cluster-management/tasks.yaml` around lines 16 - 18, After
setting mce_operator via the set_fact that selects the 'multicluster-engine'
entry, add an ansible.builtin.assert task immediately after to fail fast if
mce_operator is undefined or missing required keys; assert that mce_operator is
defined and that mce_operator.name, mce_operator.version and
mce_operator.namespace are present (with a clear message like
"multicluster-engine operator entry missing or incomplete") so the later shell
task that dereferences mce_operator.name/.version/.namespace cannot run with
nil/invalid values.
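The review asks for an `ansible.builtin.assert` task. For the shell side of the same fail-fast idea (a complementary technique, not what the review proposes), POSIX `${parameter:?word}` expansion makes a non-interactive shell exit with an error when a value is unset or empty. A minimal sketch, assuming the three fields were passed to the script as hypothetical variables `MCE_NAME`, `MCE_VERSION`, and `MCE_NAMESPACE` (the real task templates them inline):

```shell
#!/usr/bin/env bash
# MCE_NAME, MCE_VERSION and MCE_NAMESPACE are HYPOTHETICAL variables; the real
# task templates mce_operator.name/.version/.namespace directly into the command.

# check_mce_fields: exit the (sub)shell with an error message on stderr if any
# required field is unset or empty, before any command uses it.
check_mce_fields() {
  : "${MCE_NAME:?multicluster-engine operator entry missing or incomplete: name}"
  : "${MCE_VERSION:?multicluster-engine operator entry missing or incomplete: version}"
  : "${MCE_NAMESPACE:?multicluster-engine operator entry missing or incomplete: namespace}"
}
```

Because `:?` exits the shell rather than returning from the function, run the check at the top of the script or in a subshell, e.g. `( check_mce_fields ) || exit 1`.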
  ansible.builtin.shell: |
    set -euo pipefail
    # Approve any pending upgrade InstallPlans for MCE and ACM
    enclave reconcile operator-versions \
      --name "{{ mce_operator.name }}" \
      --version "{{ mce_operator.version }}" \
      --namespace "{{ mce_operator.namespace }}" \
      --csv-name "{{ mce_operator.name }}" \
      --no-dry-run 2>&1 || true
    enclave reconcile operator-versions \
      --name "{{ operator.name }}" \
      --version "{{ operator.version }}" \
      --namespace "{{ operator.namespace }}" \
      --csv-name "{{ operator.name }}" \
      --no-dry-run 2>&1 || true
    # Output MCH phase for the until condition
    oc get multiclusterhub multiclusterhub -n open-cluster-management \
      -o jsonpath='{.status.phase}'
  register: mch_phase_result
Major: Add per-attempt timeouts to the retrying ansible.builtin.shell in operators/advanced-cluster-management/tasks.yaml (lines 21-39).
The task retries on mch_phase_result, but the enclave reconcile operator-versions ... and oc get multiclusterhub ... commands have no per-attempt timeout and can hang indefinitely—blocking retries and stalling the upgrade-approval flow (availability risk).
Suggested fix
- enclave reconcile operator-versions \
+ timeout 60s enclave reconcile operator-versions \
    --name "{{ mce_operator.name }}" \
    --version "{{ mce_operator.version }}" \
    --namespace "{{ mce_operator.namespace }}" \
    --csv-name "{{ mce_operator.name }}" \
    --no-dry-run 2>&1 || true
- enclave reconcile operator-versions \
+ timeout 60s enclave reconcile operator-versions \
    --name "{{ operator.name }}" \
    --version "{{ operator.version }}" \
    --namespace "{{ operator.namespace }}" \
    --csv-name "{{ operator.name }}" \
    --no-dry-run 2>&1 || true
  # Output MCH phase for the until condition
- oc get multiclusterhub multiclusterhub -n open-cluster-management \
+ timeout 60s oc get multiclusterhub multiclusterhub -n open-cluster-management \
    -o jsonpath='{.status.phase}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@operators/advanced-cluster-management/tasks.yaml` around lines 21 - 39, The
shell task registering mch_phase_result can hang because the two commands
(enclave reconcile operator-versions and oc get multiclusterhub ...) have no
per-attempt timeout; update the ansible.builtin.shell invocation to wrap each of
those commands with a per-command timeout (e.g., via the GNU timeout utility
such as timeout 300s enclave reconcile ... and timeout 30s oc get ...) so each
retry attempt will fail fast instead of hanging indefinitely; keep set -euo
pipefail and the same register variable (mch_phase_result) unchanged.
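The effect of the suggested `timeout` wrapper can be checked in isolation. GNU coreutils `timeout` terminates the command when the limit expires and exits with status 124. A timed-out `enclave reconcile` line is still absorbed by `|| true`, while a hung `oc get`, the last command in the script, makes that attempt exit with 124 instead of blocking. A minimal sketch, assuming GNU coreutils, with `sleep 5` standing in for a hung command and limits shortened from the suggested 60s for illustration:

```shell
#!/usr/bin/env bash
# Checks GNU coreutils `timeout` behaviour. `sleep 5` stands in for a hung
# `enclave reconcile operator-versions` or `oc get multiclusterhub` call.

# run_with_limit SECONDS CMD...: run CMD under `timeout` and print its exit status.
run_with_limit() {
  local limit="$1" status=0
  shift
  timeout "$limit" "$@" || status=$?
  echo "$status"
}

# Mirrors the reconcile lines: a timed-out command is swallowed by `|| true`,
# so the script continues on to the phase query.
reconcile_like_step() {
  timeout 1 sleep 5 || true
  echo "continued after reconcile"
}

run_with_limit 1 sleep 5   # prints 124: the command was killed at the limit
run_with_limit 5 true      # prints 0: the command finished within the limit
```

The 300s/30s split in the prompt and the uniform 60s in the suggested diff are both just limits on this mechanism; the right values depend on how long a healthy reconcile takes.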
When MCE is installed with Manual approval, a newer version can appear in the catalog while MCH is still reconciling. This creates a second InstallPlan in RequiresApproval phase that nobody approves, causing MCH to get stuck in phase: Installing forever. Replace the simple k8s_info MCH wait with a shell task that directly approves any unapproved InstallPlans in the MCE and ACM namespaces on every retry iteration before checking MCH phase.
f31504d to 6766d0e
This PR effectively cancels version pinning; not sure we can merge it?
✨ Claude Code: I've created an alternative proposal in PR #471 that introduces a more flexible property for operators. Instead of hardcoding logic for multicluster-engine, the new approach:
This is proposed as a workaround. We should still pursue a bug report with the ACM team for a proper fix, but this provides a configurable solution in the meantime. See: #471
Summary
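- When MCE is installed with Manual approval, a newer version can appear in the catalog while MCH is still reconciling. This creates a second InstallPlan in the RequiresApproval phase that nobody approves, causing MCH to get stuck in phase: Installing forever.
- Replace the simple k8s_info MCH wait with a shell task that directly approves any unapproved InstallPlans in the MCE and ACM namespaces on every retry iteration before checking MCH phase. This bypasses version gating, since the version-gated reconcile tool would skip InstallPlans for versions newer than the pinned one.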
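The approval step in the second bullet can be sketched with plain `oc` commands. This is a hedged illustration, not the PR's actual shell task: it lists InstallPlans whose `spec.approved` is `false` with a JSONPath filter and approves each one with a merge patch setting `spec.approved` to `true`. As the discussion above points out, approving every pending plan this way ignores any version pin.

```shell
#!/usr/bin/env bash
# Sketch only: approve every InstallPlan in a namespace that is still waiting for
# manual approval. Requires `oc` logged in to the cluster. This approves plans
# regardless of the CSV version they install (no version pinning).

# approve_pending_installplans NAMESPACE
approve_pending_installplans() {
  local ns="$1" plans plan
  # Names of InstallPlans with spec.approved == false, one per line.
  plans="$(oc get installplan -n "$ns" \
    -o jsonpath='{range .items[?(@.spec.approved==false)]}{.metadata.name}{"\n"}{end}')" || return 1
  for plan in $plans; do
    oc patch installplan "$plan" -n "$ns" --type merge \
      -p '{"spec":{"approved":true}}' || return 1
  done
}
```

Calling it once for the MCE namespace and once for the ACM namespace on each retry, before the phase query, would follow the order the Summary describes; the namespace values are left to the caller here.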