
Conversation

@mpryc
Contributor

@mpryc mpryc commented Dec 17, 2025

Prevents tests from hanging indefinitely when streaming logs from object storage. Uses 5-minute timeout with 3 retries.

@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Walkthrough

Adds timeout-and-retry support to E2E CLI helpers: new execution primitives in cli_common.go plus options-enabled describe/log helpers for backup and restore CLIs to use timeout and retry behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core CLI timeout & retry infra: tests/e2e/lib/cli_common.go | Adds DefaultCLITimeout, DefaultCLIRetries, DefaultCLIRetryDelay; adds Timeout time.Duration field to CLICommand; implements ExecuteWithTimeout, ExecuteOutputWithTimeout, ExecuteWithTimeoutAndRetry, ExecuteOutputWithTimeoutAndRetry using context.WithTimeout and retry logic. |
| Backup CLI variants: tests/e2e/lib/backup_cli.go | Adds DescribeBackupViaCLIWithOptions(name string, timeout time.Duration, maxRetries int, retryDelay time.Duration) string; adds BackupLogsViaCLIWithTimeout(name string, timeout time.Duration) (string, error) and BackupLogsViaCLIWithOptions(name string, timeout time.Duration, maxRetries int, retryDelay time.Duration) (string, error); DescribeBackupViaCLI and BackupLogsViaCLI now delegate to options variants; imports time. |
| Restore CLI variants: tests/e2e/lib/restore_cli.go | Adds DescribeRestoreViaCLIWithOptions(name string, timeout time.Duration, maxRetries int, retryDelay time.Duration) string; adds RestoreLogsViaCLIWithTimeout(name string, timeout time.Duration) (string, error) and RestoreLogsViaCLIWithOptions(name string, timeout time.Duration, maxRetries int, retryDelay time.Duration) (string, error); DescribeRestoreViaCLI and RestoreLogsViaCLI now delegate to options variants; imports time and propagates errors. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review context and cancellation usage in ExecuteWithTimeout / ExecuteWithTimeoutAndRetry.
  • Verify retry loop correctness and that last error/output is returned after retries.
  • Check delegation correctness in backup_cli.go and restore_cli.go (default values used, signatures preserved).
  • Confirm constants' default values are appropriate for test timing.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between f95c9cc and 51683de.

📒 Files selected for processing (3)
  • tests/e2e/lib/backup_cli.go (2 hunks)
  • tests/e2e/lib/cli_common.go (2 hunks)
  • tests/e2e/lib/restore_cli.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/lib/cli_common.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • tests/e2e/lib/restore_cli.go
  • tests/e2e/lib/backup_cli.go
🧬 Code graph analysis (2)
tests/e2e/lib/restore_cli.go (1)
tests/e2e/lib/cli_common.go (4)
  • DefaultCLITimeout (15-15)
  • DefaultCLIRetries (19-19)
  • DefaultCLIRetryDelay (20-20)
  • CLICommand (23-29)
tests/e2e/lib/backup_cli.go (1)
tests/e2e/lib/cli_common.go (4)
  • DefaultCLITimeout (15-15)
  • DefaultCLIRetries (19-19)
  • DefaultCLIRetryDelay (20-20)
  • CLICommand (23-29)
🔇 Additional comments (8)
tests/e2e/lib/backup_cli.go (4)

195-199: LGTM: Backward-compatible timeout/retry support.

The delegation pattern maintains backward compatibility while adding timeout/retry protection against hanging operations.


201-225: LGTM: Clear timeout/retry implementation.

The branching logic correctly routes to timeout-only execution when maxRetries ≤ 1 and to retry-enabled execution otherwise, aligning with the function's flexible options design.


227-237: LGTM: Well-designed API variants.

The three-tier API (default, timeout-only, full-options) provides appropriate flexibility for different use cases while maintaining clear intent.


239-267: LGTM: Consistent implementation.

The implementation matches the describe operation's pattern, maintaining consistency across the codebase. The comments clearly explain the rationale for timeout/retry protection.

tests/e2e/lib/restore_cli.go (4)

170-174: LGTM: Consistent with backup implementation.

The restore operations mirror the backup implementation's approach, ensuring consistency across the codebase.


176-200: LGTM: Parallel implementation.

The describe implementation correctly parallels the backup variant, maintaining architectural consistency.


202-212: LGTM: Consistent API design.

The three-tier API design (default, timeout-only, full-options) is properly applied to restore operations, matching the backup implementation.


214-242: LGTM: Consistent and complete implementation.

The implementation correctly mirrors the backup logs pattern, ensuring uniform timeout/retry behavior across all CLI operations that may hang on object storage access.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from eemcmullan and sseago December 17, 2025 12:29
@openshift-ci

openshift-ci bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 17, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/e2e/lib/cli_common.go (2)

103-150: LGTM: Retry logic is correctly implemented.

The retry loops properly handle failures with appropriate delays between attempts and informative logging.

Note: The parameter name maxRetries represents the total number of attempts (including the initial attempt), not the number of retries. For example, maxRetries=3 results in 3 total attempts (1 initial + 2 retries). This is consistent throughout the implementation but could be clearer if renamed to maxAttempts.


28-28: Remove or utilize the unused Timeout field.

The Timeout field in the CLICommand struct is declared but never referenced. All timeout-enabled methods accept timeout as an explicit parameter instead.

Consider either removing the field if unused, using it as a default when methods are called without an explicit timeout, or documenting if it's reserved for future use.

tests/e2e/lib/restore_cli.go (1)

202-242: LGTM: Restore logs methods are correctly implemented.

The restore operations follow the same robust timeout/retry pattern as backup operations, ensuring consistent behavior across the test suite.

Note: There's significant code duplication between the backup and restore CLI methods. Consider extracting a generic helper function to reduce duplication:

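// executeWithOptionsHelper runs cmd with a per-attempt timeout and retries
// only when maxRetries > 1. Note that maxRetries counts total attempts,
// including the first one. useOutput selects the ExecuteOutput* variants,
// which capture stdout, over the plain Execute* variants.
func executeWithOptionsHelper(cmd *CLICommand, timeout time.Duration, maxRetries int, retryDelay time.Duration, useOutput bool) ([]byte, error) {
    // Multi-attempt path: timeout plus retry.
    if maxRetries > 1 {
        if useOutput {
            return cmd.ExecuteOutputWithTimeoutAndRetry(timeout, maxRetries, retryDelay)
        }
        return cmd.ExecuteWithTimeoutAndRetry(timeout, maxRetries, retryDelay)
    }
    // Single-attempt path: timeout only.
    if useOutput {
        return cmd.ExecuteOutputWithTimeout(timeout)
    }
    return cmd.ExecuteWithTimeout(timeout)
}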

This would reduce duplication across both backup and restore operations, though it's acceptable to defer this refactor given this is test code.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between f75a38d and f95c9cc.

📒 Files selected for processing (3)
  • tests/e2e/lib/backup_cli.go (2 hunks)
  • tests/e2e/lib/cli_common.go (2 hunks)
  • tests/e2e/lib/restore_cli.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • tests/e2e/lib/backup_cli.go
  • tests/e2e/lib/restore_cli.go
  • tests/e2e/lib/cli_common.go
🧬 Code graph analysis (3)
tests/e2e/lib/backup_cli.go (1)
tests/e2e/lib/cli_common.go (4)
  • DefaultCLITimeout (15-15)
  • DefaultCLIRetries (19-19)
  • DefaultCLIRetryDelay (20-20)
  • CLICommand (23-29)
tests/e2e/lib/restore_cli.go (1)
tests/e2e/lib/cli_common.go (4)
  • DefaultCLITimeout (15-15)
  • DefaultCLIRetries (19-19)
  • DefaultCLIRetryDelay (20-20)
  • CLICommand (23-29)
tests/e2e/lib/cli_common.go (1)
must-gather/pkg/cli.go (1)
  • Timeout (39-39)
🪛 golangci-lint (2.5.0)
tests/e2e/lib/cli_common.go

[error] 19-19: File is not properly formatted

(gci)

🔇 Additional comments (5)
tests/e2e/lib/cli_common.go (2)

14-21: LGTM: Timeout and retry defaults are appropriate.

The 5-minute timeout and 3-retry configuration with 10-second delays are reasonable defaults for operations that stream from object storage and may experience transient network issues.


55-101: LGTM: Timeout implementation is correct.

Both ExecuteWithTimeout and ExecuteOutputWithTimeout properly use context.WithTimeout and exec.CommandContext to enable command cancellation. The explicit check for context.DeadlineExceeded provides clear timeout error messages.

tests/e2e/lib/backup_cli.go (2)

195-225: LGTM: Describe methods properly implement timeout and retry.

The wrapper pattern preserves backward compatibility while DescribeBackupViaCLIWithOptions provides configurable timeout/retry behavior. The conditional logic correctly distinguishes between single-attempt (timeout-only) and multi-attempt (timeout + retry) scenarios.


227-267: LGTM: Logs methods follow consistent pattern.

The three-tier API design (simple wrapper, timeout-only variant, full options) provides good flexibility while maintaining backward compatibility. The implementation correctly uses ExecuteOutputWithTimeout variants to capture stdout.

tests/e2e/lib/restore_cli.go (1)

170-200: LGTM: Restore describe methods mirror backup implementation.

The implementation is consistent with the backup operations and correctly applies the timeout/retry pattern.

Prevents tests from hanging indefinitely when streaming logs from
object storage. Uses 5-minute timeout with 3 retries.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Michal Pryc <[email protected]>
@mpryc mpryc force-pushed the cli_timeout_retry branch from f95c9cc to 51683de Compare December 17, 2025 13:22
@openshift-ci

openshift-ci bot commented Dec 17, 2025

@mpryc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-e2e-test-cli-aws | 51683de | link | false | /test 4.20-e2e-test-cli-aws |
| ci/prow/4.20-e2e-test-aws | 51683de | link | true | /test 4.20-e2e-test-aws |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kaovilai
Member

If it's 5 minutes here, are you suggesting the CLI itself should use an even lower timeout? Or is this a catch-all in case the CLI does not implement a timeout?

@mpryc
Contributor Author

mpryc commented Dec 17, 2025

> If it's 5 minutes here, are you suggesting the CLI itself should use an even lower timeout? Or is this a catch-all in case the CLI does not implement a timeout?

For CI I think 5 minutes is fine, since we know the download speed is pretty fast, but I would not assume that on the CLI side. The timeout here is separate from the CLI's own, and we do not look at the CLI here. Another option would be to use the env from the CLI here, but then we would need the CLI change merged and a new CLI available.

@kaovilai
Member

So should the CLI have a default timeout higher than 5 minutes?

@mpryc
Contributor Author

mpryc commented Dec 18, 2025


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
