[WIP] Add pre-bootstrap leftover check via enclave environment subcommand#496
[WIP] Add pre-bootstrap leftover check via enclave environment subcommand#496rporres wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughA new pre-bootstrap leftover-resource detection and cleanup capability is introduced to scan and remove stale Metal3/Quay infrastructure before bootstrapping. A Python module ( ChangesLeftover Resource Detection and Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Security ConsiderationsRisk Level: Moderate
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (2 errors, 1 warning)
✅ Passed checks (8 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bootstrap.sh`:
- Around line 172-179: The step_check_leftovers function incorrectly assumes any
non-zero exit code means a clean environment, but the CLI contract specifies
exit code 0 means cleanup needed and exit code 1 means clean, with any other
non-zero code indicating an actual failure. Capture the exit code from the
enclave environment check-leftovers command (run it separately and store the
exit code in a variable) and then explicitly branch on that code: if it equals
0, show the cleanup error message and exit; if it equals 1, continue with the
"Environment is clean" message; if it equals anything else, log an error about
the failed precheck and exit. This prevents masking real command failures that
would otherwise be silently treated as a clean environment.
In `@src/enclave/environment/check_leftovers.py`:
- Around line 16-18: The _run function silently ignores command failures by
using check=False without validating the result, causing failed
podman/systemctl/sudo commands to be treated as successful "no leftovers"
results. Check the return code of the subprocess result and raise an exception
if the command exits with a non-zero status code before returning the stdout
lines. This ensures that actual command failures and timeouts are properly
surfaced rather than masking dirty environment states as clean.
In `@src/enclave/environment/cleanup.sh`:
- Around line 61-63: The rm command at line 62 that wipes the working directory
is unsafe because it misses dotfiles (files starting with a dot) and lacks a
guard against dangerous paths like root. Add a safety check before the rm
command to verify that the workingDir variable is not empty and is not a
critical system path like / or /etc, then modify the rm command to properly
delete both regular files and dotfiles by using a pattern that explicitly
includes dotfiles or by enabling the dotglob shell option before executing the
removal.
- Line 7: The cleanup.sh script at line 7 only includes set -u which catches
unset variables but allows command failures to be silently ignored, causing the
CLI to trust a false-success exit code. Add set -e (or set -o pipefail) after
set -u to make the script fail on any command error. Then wrap only the
explicitly-idempotent removal commands with a run_ignore wrapper function so
those specific operations can fail gracefully without terminating the script,
while ensuring all other critical cleanup commands will properly fail the script
if they encounter errors.
In `@src/enclave/environment/cli.py`:
- Around line 54-55: The subprocess.run call for the bash script execution in
the cleanup command lacks a timeout parameter, which can cause the CLI/CI step
to hang indefinitely if the subprocess blocks (such as waiting for interactive
input). Add a timeout parameter to the subprocess.run function call to ensure
the process terminates within a reasonable timeframe if it becomes unresponsive,
and handle the resulting TimeoutExpired exception appropriately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2b00ed72-d137-4868-966c-c0d426e8c3c7
📒 Files selected for processing (12)
.github/workflows/disconnected-dry-run.yml.github/workflows/e2e-deployment.ymlMakefile.cibootstrap.shscripts/deployment/deploy_bootstrap_step.shscripts/verification/validate.shsrc/enclave/cli.pysrc/enclave/environment/__init__.pysrc/enclave/environment/check_leftovers.pysrc/enclave/environment/cleanup.shsrc/enclave/environment/cli.pysrc/tests/test_check_leftovers.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/enclave/environment/cli.py (1)
76-80:⚠️ Potential issue | 🔴 CriticalCritical: Type safety violation in
_read_working_dir_from_config()causes CLI crash on malformed YAML.Line 77-79:
yaml.safe_load(f)can return a list, scalar, orNone—not always a dict. The code assumes a dict and calls.get(), which raisesAttributeErrorif the YAML root is a list or scalar. The exception handler at line 80 only catchesOSErrorandyaml.YAMLError, notAttributeError, so the CLI crashes unexpectedly.Risk: If
config/global.yamlcontains valid YAML that isn't a mapping (e.g.,[1, 2, 3]or a string), the cleanup command terminates with an unhandled exception instead of gracefully returningNone.Suggested fix
try: with config_path.open(encoding="utf-8") as f: - config: dict[str, object] = yaml.safe_load(f) or {} - value = config.get("workingDir") + loaded = yaml.safe_load(f) + if not isinstance(loaded, dict): + return None + value = loaded.get("workingDir") return str(value) if value is not None else None except (OSError, yaml.YAMLError): return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/cli.py` around lines 76 - 80, The _read_working_dir_from_config() function assumes yaml.safe_load() always returns a dict, but it can return a list, scalar, or None. Calling .get() on a non-dict raises AttributeError, which is not caught by the current exception handler (which only catches OSError and yaml.YAMLError). Fix this by adding a type check to verify that the loaded config is a dict instance before calling .get() on it, returning None if it is not a dict. Additionally, add AttributeError to the except clause at line 80 to catch any attribute-related errors as a defensive measure.
♻️ Duplicate comments (3)
src/enclave/environment/check_leftovers.py (1)
16-21:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMajor:
_run()can hang indefinitely on blocked host commands.At Line 17,
subprocess.run(...)has notimeout. A stucksystemctl/podmancall can stall CI/bootstrap and block safe failure handling.Suggested fix
def _run(cmd: list[str]) -> list[str]: - result = subprocess.run(cmd, capture_output=True, text=True, check=False) + result = subprocess.run( + cmd, + capture_output=True, + text=True, + check=False, + timeout=30, + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/check_leftovers.py` around lines 16 - 21, The _run() function in src/enclave/environment/check_leftovers.py lacks a timeout parameter when calling subprocess.run() at line 17, which can cause the function to hang indefinitely if a host command (like systemctl or podman) gets stuck. Add a timeout parameter to the subprocess.run() call to ensure commands are terminated if they exceed a reasonable execution time, preventing indefinite hangs that could block CI/bootstrap.src/enclave/environment/cleanup.sh (2)
62-69:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical: critical-path guard is bypassable with trailing slashes.
At Lines 62-69, the guard matches exact literals (
/tmp,/etc, etc.), but values like/tmp/or/etc/bypass thecaseand proceed torm -fr .../*, creating catastrophic deletion risk.Suggested fix
-case "${workingDir}" in +safe_working_dir="${workingDir%/}" +[ -z "${safe_working_dir}" ] && safe_working_dir="/" + +case "${safe_working_dir}" in + .|..) + error "Refusing to wipe relative path: ${workingDir}" + exit 1 + ;; /|/etc|/usr|/bin|/sbin|/lib|/lib64|/boot|/proc|/sys|/dev|/var|/home|/root|/tmp) error "Refusing to wipe critical system path: ${workingDir}" exit 1 ;; esac shopt -s dotglob -rm -fr "${workingDir:?}"/* +rm -fr -- "${safe_working_dir:?}/"* shopt -u dotglob🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/cleanup.sh` around lines 62 - 69, The critical-path guard in the case statement can be bypassed by appending trailing slashes to the protected paths (e.g., `/tmp/` instead of `/tmp`). Before the case statement that evaluates `workingDir`, normalize the variable by removing any trailing slashes using bash parameter expansion (e.g., `${workingDir%/}`) so that paths like `/tmp/`, `/etc/`, etc. are properly matched against the guard patterns and rejected. This ensures the safety check cannot be circumvented through path manipulation.
7-7:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMajor: cleanup can report success after partial failure.
At Line 7,
set -ualone does not fail on command errors. This can leave leftovers while the script still exits “success,” which is risky before bootstrap.Suggested fix pattern
-set -u +set -euo pipefail + +run_ignore() { + "$@" || warning "Ignoring expected cleanup failure: $*" +}-sudo podman pod rm -f metal3-ironic +run_ignore sudo podman pod rm -f metal3-ironicAs per coding guidelines, shell scripts should use fail-fast settings (
set -euo pipefail) to avoid silent failure paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/cleanup.sh` at line 7, The cleanup.sh script uses only `set -u` for error handling, which does not cause the script to fail on command errors and can result in partial cleanup while still reporting success. Replace the `set -u` directive with `set -euo pipefail` to enable fail-fast behavior that exits immediately on any command error, unset variable access, or pipe failure. This ensures the cleanup script fails visibly and completely rather than leaving leftovers behind.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/enclave/environment/check_leftovers.py`:
- Around line 18-21: Define a custom LeftoverCheckError exception class in
check_leftovers.py to replace the raw subprocess.CalledProcessError being raised
in the _run() helper function. Modify the _run() function to catch
subprocess.CalledProcessError internally and raise LeftoverCheckError instead,
providing appropriate error context. Update the exception handler in cli.py that
currently catches subprocess.CalledProcessError to catch the new
LeftoverCheckError type instead. Additionally, update all test assertions that
currently expect subprocess.CalledProcessError to expect LeftoverCheckError
instead.
---
Outside diff comments:
In `@src/enclave/environment/cli.py`:
- Around line 76-80: The _read_working_dir_from_config() function assumes
yaml.safe_load() always returns a dict, but it can return a list, scalar, or
None. Calling .get() on a non-dict raises AttributeError, which is not caught by
the current exception handler (which only catches OSError and yaml.YAMLError).
Fix this by adding a type check to verify that the loaded config is a dict
instance before calling .get() on it, returning None if it is not a dict.
Additionally, add AttributeError to the except clause at line 80 to catch any
attribute-related errors as a defensive measure.
---
Duplicate comments:
In `@src/enclave/environment/check_leftovers.py`:
- Around line 16-21: The _run() function in
src/enclave/environment/check_leftovers.py lacks a timeout parameter when
calling subprocess.run() at line 17, which can cause the function to hang
indefinitely if a host command (like systemctl or podman) gets stuck. Add a
timeout parameter to the subprocess.run() call to ensure commands are terminated
if they exceed a reasonable execution time, preventing indefinite hangs that
could block CI/bootstrap.
In `@src/enclave/environment/cleanup.sh`:
- Around line 62-69: The critical-path guard in the case statement can be
bypassed by appending trailing slashes to the protected paths (e.g., `/tmp/`
instead of `/tmp`). Before the case statement that evaluates `workingDir`,
normalize the variable by removing any trailing slashes using bash parameter
expansion (e.g., `${workingDir%/}`) so that paths like `/tmp/`, `/etc/`, etc.
are properly matched against the guard patterns and rejected. This ensures the
safety check cannot be circumvented through path manipulation.
- Line 7: The cleanup.sh script uses only `set -u` for error handling, which
does not cause the script to fail on command errors and can result in partial
cleanup while still reporting success. Replace the `set -u` directive with `set
-euo pipefail` to enable fail-fast behavior that exits immediately on any
command error, unset variable access, or pipe failure. This ensures the cleanup
script fails visibly and completely rather than leaving leftovers behind.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fe4add4e-dc64-4fa8-b114-90f8996be044
📒 Files selected for processing (5)
bootstrap.shsrc/enclave/environment/check_leftovers.pysrc/enclave/environment/cleanup.shsrc/enclave/environment/cli.pysrc/tests/test_check_leftovers.py
| if result.returncode != 0: | ||
| raise subprocess.CalledProcessError( | ||
| result.returncode, cmd, result.stdout, result.stderr | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and examine its structure
fd -t f check_leftovers.pyRepository: rh-ecosystem-edge/enclave
Length of output: 146
🏁 Script executed:
# Look for the cli.py file and understand exception handling
fd -t f cli.py | grep environmentRepository: rh-ecosystem-edge/enclave
Length of output: 100
🏁 Script executed:
# Search for existing custom exception classes in the environment module
find . -path "*/environment/*.py" -type f | head -20Repository: rh-ecosystem-edge/enclave
Length of output: 185
🏁 Script executed:
cat -n src/enclave/environment/check_leftovers.pyRepository: rh-ecosystem-edge/enclave
Length of output: 3144
🏁 Script executed:
cat -n src/enclave/environment/cli.pyRepository: rh-ecosystem-edge/enclave
Length of output: 3161
🏁 Script executed:
cat -n src/enclave/environment/__init__.pyRepository: rh-ecosystem-edge/enclave
Length of output: 51
🏁 Script executed:
# Check test file for exception handling
cat -n src/tests/test_check_leftovers.pyRepository: rh-ecosystem-edge/enclave
Length of output: 7394
🏁 Script executed:
# Search for other usages of CalledProcessError in relation to check_leftovers
rg "CalledProcessError" --type pyRepository: rh-ecosystem-edge/enclave
Length of output: 326
🏁 Script executed:
# Check if there are any other imports of check_leftovers in the codebase
rg "from enclave.environment.check_leftovers|import.*check_leftovers" --type pyRepository: rh-ecosystem-edge/enclave
Length of output: 261
Define custom exception for leftover check failures.
The _run() helper at lines 18-21 raises raw subprocess.CalledProcessError, leaking implementation details into the module's public contract. This couples callers (cli.py, tests) to subprocess internals and reduces clarity around failure modes.
Define a custom LeftoverCheckError exception (and specific subclasses if needed) in check_leftovers.py and raise that from _run(). Update the exception handler in cli.py to catch the domain-specific type instead. This aligns with the project's custom exception hierarchy requirement for Python code.
Note: Tests currently assert subprocess.CalledProcessError; they'll need updating to expect the new exception type instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/enclave/environment/check_leftovers.py` around lines 18 - 21, Define a
custom LeftoverCheckError exception class in check_leftovers.py to replace the
raw subprocess.CalledProcessError being raised in the _run() helper function.
Modify the _run() function to catch subprocess.CalledProcessError internally and
raise LeftoverCheckError instead, providing appropriate error context. Update
the exception handler in cli.py that currently catches
subprocess.CalledProcessError to catch the new LeftoverCheckError type instead.
Additionally, update all test assertions that currently expect
subprocess.CalledProcessError to expect LeftoverCheckError instead.
Source: Coding guidelines
f3fbf4c to
a57296e
Compare
Introduces `enclave environment check-leftovers` and `enclave environment cleanup` commands so customers can detect and remove Metal3/Quay residue before re-running bootstrap. The logic lives in the Python package (included in the customer tarball) rather than scripts/ (which is excluded). - Add src/enclave/environment/ with check_leftovers.py, cli.py, cleanup.sh - Register `enclave environment` subcommand in src/enclave/cli.py - Add step_check_leftovers() to bootstrap.sh (advise-only, does not auto-clean) - Extend shellcheck coverage to src/ in scripts/verification/validate.sh - Add deploy-cluster-check-leftovers Makefile.ci target - Run leftover check in e2e-deployment and disconnected-dry-run GH Actions workflows after installing Enclave Lab Signed-off-by: Rafa Porres Molina <rporresm@redhat.com> Assisted-by: Claude Code <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/disconnected-dry-run.yml:
- Around line 113-118: The "Check for leftover resources on Landing Zone" step
is missing a step ID, which causes failure attribution to be unclear when the
step fails. Add an id field to this step (for example, id: check-leftovers or
similar) so that downstream failure detection and reporting can properly
identify which phase failed, allowing better root cause analysis in Slack
notifications and summaries.
In @.github/workflows/e2e-deployment.yml:
- Around line 257-262: Add an `id` attribute to the "Check for leftover
resources on Landing Zone" step (currently missing an id) and also to the
corresponding step mentioned at lines 666-671. Use `check_leftovers` as the id
value for both steps. Then add `check_leftovers` to the respective job's
`step_names` and `step_order` lists to ensure the existing failure reporting
logic in the "Determine failed step" step can properly capture and surface any
failures from these leftover-check steps.
In `@src/enclave/environment/cleanup.sh`:
- Around line 62-69: The case statement guard at line 62 compares raw string
values of workingDir, which can be bypassed using path traversal tricks like
/tmp/.. or symlinks, allowing line 69 to delete unintended directories. Before
the case statement, canonicalize workingDir using realpath or readlink to
resolve all symlinks and relative path components, then also add an explicit
check to reject any paths containing .. sequences. Update the case statement
comparison to validate against the canonicalized path instead of the raw input
to ensure the deletion guard cannot be bypassed through path manipulation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 35d0d940-c6c9-447b-93ee-33aeb05fbd72
📒 Files selected for processing (12)
.github/workflows/disconnected-dry-run.yml.github/workflows/e2e-deployment.ymlMakefile.cibootstrap.shscripts/deployment/deploy_bootstrap_step.shscripts/verification/validate.shsrc/enclave/cli.pysrc/enclave/environment/__init__.pysrc/enclave/environment/check_leftovers.pysrc/enclave/environment/cleanup.shsrc/enclave/environment/cli.pysrc/tests/test_check_leftovers.py
Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
a57296e to
0d9b8f4
Compare
Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/enclave/environment/check_leftovers.py (2)
26-36: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winEssential refactor: use a domain exception instead of raw
CalledProcessError.Line 26 and Line 35 expose subprocess internals as part of this module contract. Introduce a module-specific exception (for example,
LeftoverCheckError) and catch that in CLI/tests.As per coding guidelines, "Implement custom exception hierarchy with descriptive error messages in Python code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/check_leftovers.py` around lines 26 - 36, Define a custom module-specific exception class (such as LeftoverCheckError) at the module level in check_leftovers.py. In the _run function, wrap the subprocess.CalledProcessError that is raised on line 26 by catching it and raising the custom exception instead, preserving the error details. Update the exception handling in the check_systemd function to catch the new custom exception instead of subprocess.CalledProcessError, and ensure all other locations in the codebase that interact with this module (CLI and tests) are updated to handle the new exception type.Source: Coding guidelines
16-17:⚠️ Potential issue | 🟠 MajorMajor: add a timeout to
_run()to prevent bootstrap/preflight hangs.Line 17 runs external commands (
systemctl,podman) without timeout. If these commands block due to system issues,check-leftoverscan stall the entire bootstrap or CI pipeline indefinitely. Established pattern in the codebase (seerun_oc_command(),cleanup_cmd()) uses timeouts for subprocess calls.Suggested patch
def _run(cmd: list[str]) -> list[str]: - result = subprocess.run(cmd, capture_output=True, text=True, check=False) + try: + result = subprocess.run( + cmd, + capture_output=True, + text=True, + check=False, + timeout=30, + ) + except subprocess.TimeoutExpired as exc: + raise subprocess.CalledProcessError( + 124, cmd, exc.stdout or "", exc.stderr or "" + ) from exc🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/enclave/environment/check_leftovers.py` around lines 16 - 17, Add a timeout parameter to the subprocess.run() call in the _run() function to prevent external commands from blocking indefinitely. Review the established timeout patterns used in similar functions like run_oc_command() and cleanup_cmd() elsewhere in the codebase, then apply a reasonable timeout value to the subprocess.run() call to ensure check-leftovers does not stall the bootstrap or CI pipeline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/enclave/environment/cleanup.sh`:
- Around line 63-66: The cleanup script currently treats a missing or invalid
working directory as a fatal error by calling error and exiting with status 1 in
the resolvedWorkingDir validation block. This should instead be treated as an
already-clean state since there is nothing to clean up. Modify the condition
check to exit successfully with status 0 (or remove the exit entirely and let
execution continue) when resolvedWorkingDir is empty or the directory does not
exist, so that repeat cleanup runs do not fail on absent directories.
---
Duplicate comments:
In `@src/enclave/environment/check_leftovers.py`:
- Around line 26-36: Define a custom module-specific exception class (such as
LeftoverCheckError) at the module level in check_leftovers.py. In the _run
function, wrap the subprocess.CalledProcessError that is raised on line 26 by
catching it and raising the custom exception instead, preserving the error
details. Update the exception handling in the check_systemd function to catch
the new custom exception instead of subprocess.CalledProcessError, and ensure
all other locations in the codebase that interact with this module (CLI and
tests) are updated to handle the new exception type.
- Around line 16-17: Add a timeout parameter to the subprocess.run() call in the
_run() function to prevent external commands from blocking indefinitely. Review
the established timeout patterns used in similar functions like run_oc_command()
and cleanup_cmd() elsewhere in the codebase, then apply a reasonable timeout
value to the subprocess.run() call to ensure check-leftovers does not stall the
bootstrap or CI pipeline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cab94113-c19c-4eae-8c36-192e754fbebd
📒 Files selected for processing (12)
.github/workflows/disconnected-dry-run.yml.github/workflows/e2e-deployment.ymlMakefile.cibootstrap.shscripts/deployment/deploy_bootstrap_step.shscripts/verification/validate.shsrc/enclave/cli.pysrc/enclave/environment/__init__.pysrc/enclave/environment/check_leftovers.pysrc/enclave/environment/cleanup.shsrc/enclave/environment/cli.pysrc/tests/test_check_leftovers.py
| if [ -z "${resolvedWorkingDir}" ] || [ ! -d "${resolvedWorkingDir}" ]; then | ||
| error "Invalid WORKING_DIR: ${workingDir}" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Major: missing working directory should be treated as already-clean, not fatal.
Line 63-Line 66 exits with failure when the directory is absent. That turns repeat cleanup runs into false failures and can break automation even after resources were removed.
Suggested patch
if [ -z "${resolvedWorkingDir}" ] || [ ! -d "${resolvedWorkingDir}" ]; then
- error "Invalid WORKING_DIR: ${workingDir}"
- exit 1
+ if [ -e "${resolvedWorkingDir}" ]; then
+ error "Invalid WORKING_DIR (not a directory): ${workingDir}"
+ exit 1
+ fi
+ warning "Working directory does not exist, skipping wipe: ${workingDir}"
+ info "Pre-bootstrap cleanup complete"
+ exit 0
fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/enclave/environment/cleanup.sh` around lines 63 - 66, The cleanup script
currently treats a missing or invalid working directory as a fatal error by
calling error and exiting with status 1 in the resolvedWorkingDir validation
block. This should instead be treated as an already-clean state since there is
nothing to clean up. Modify the condition check to exit successfully with status
0 (or remove the exit entirely and let execution continue) when
resolvedWorkingDir is empty or the directory does not exist, so that repeat
cleanup runs do not fail on absent directories.
Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/infrastructure/generate_enclave_vars.sh`:
- Line 105: The CLUSTER_NAME variable (sourced from the ENCLAVE_CLUSTER_NAME
environment variable at line 26) is used directly in the workingDir path at line
105 without any validation, creating a path traversal vulnerability. Add
validation logic after the CLUSTER_NAME variable is set to reject any values
containing "../" sequences or absolute paths (starting with "/"), and fail
gracefully if invalid input is detected. This ensures the working directory
cannot be redirected outside the intended /home/cloud-user/sessions/ directory
tree.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ace271ff-03e9-4cb4-91b5-eba3a3cf880f
📒 Files selected for processing (3)
scripts/infrastructure/generate_enclave_vars.shsrc/enclave/environment/check_leftovers.pysrc/tests/test_check_leftovers.py
| # Base Configuration | ||
| # ============================================================================ | ||
| workingDir: "/home/cloud-user" | ||
| workingDir: "/home/cloud-user/sessions/${CLUSTER_NAME}" |
There was a problem hiding this comment.
Major: Path traversal risk in CLUSTER_NAME without validation.
CLUSTER_NAME (sourced from environment variable ENCLAVE_CLUSTER_NAME at line 26) is interpolated into the workingDir path without canonicalization or validation. An attacker-controlled value containing ../ sequences or an absolute path could redirect the working directory to an unintended filesystem location, potentially overwriting sensitive files or leaking configuration data outside the intended directory tree.
Security impact: Violates the coding guideline requirement to "canonicalize paths, reject ../".
🛡️ Proposed validation fix
Add validation after line 33 to reject path traversal attempts:
CLUSTER_NAME="${ENCLAVE_CLUSTER_NAME:-enclave-test}"
+
+# Validate CLUSTER_NAME to prevent path traversal
+if [[ "$CLUSTER_NAME" == *".."* ]] || [[ "$CLUSTER_NAME" == /* ]]; then
+ error "CLUSTER_NAME must not contain '..' or absolute paths: $CLUSTER_NAME"
+ exit 1
+fi
+
+# Further restrict to safe characters (alphanumeric, dash, underscore)
+if [[ ! "$CLUSTER_NAME" =~ ^[a-zA-Z0-9_-]+$ ]]; then
+ error "CLUSTER_NAME contains invalid characters: $CLUSTER_NAME"
+ exit 1
+fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/infrastructure/generate_enclave_vars.sh` at line 105, The
CLUSTER_NAME variable (sourced from the ENCLAVE_CLUSTER_NAME environment
variable at line 26) is used directly in the workingDir path at line 105 without
any validation, creating a path traversal vulnerability. Add validation logic
after the CLUSTER_NAME variable is set to reject any values containing "../"
sequences or absolute paths (starting with "/"), and fail gracefully if invalid
input is detected. This ensures the working directory cannot be redirected
outside the intended /home/cloud-user/sessions/ directory tree.
Source: Coding guidelines
Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
Introduces
enclave environment check-leftoversandenclave environment cleanupcommands so customers can detect and remove Metal3/Quay residue before re-running bootstrap. The logic lives in the Python package (included in the customer tarball) rather than scripts/ (which is excluded).enclave environmentsubcommand in src/enclave/cli.pySummary by CodeRabbit
enclave environmentcommand group withcheck-leftovers(detect stale Metal3/Quay resources) andcleanup(remediate and reset the environment).src/.