Fix address-reviews duplicate detection and add real-data evals by bryan-cox · Pull Request #558 · openshift-eng/ai-helpers

bryan-cox · 2026-06-18T18:52:48Z

Summary

- Add github-actions and github-actions[bot] to BOT_SIGNATURES in check_replied.py so the skill's dedup check recognizes replies posted under the GHA identity
- Add CI override instruction in address-reviews.md so the skill respects "Do NOT push" directives from the periodic job harness instead of attempting to push (which conflicts with the harness's separate push phase)
- Add 14 eval test cases using real PR data from openshift/hypershift PR #8535

Context

The periodic review agent was producing 102+ duplicate replies because:

- github-actions wasn't recognized as a bot in check_replied.py, so the skill-level dedup check didn't catch existing replies posted under that identity
- The skill's push step conflicted with the periodic job's "Do NOT push" instruction, leading to "changed but didn't push" reports
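The first cause can be illustrated with a minimal sketch. Only the names `BOT_SIGNATURES` and `is_bot_reply()` and the two added account names come from this PR; the function body, the case-insensitive matching, and the `thread_already_replied` helper are assumptions made for illustration, since the actual contents of check_replied.py are not shown on this page.

```python
# Hypothetical sketch; the real check_replied.py may be structured differently.
BOT_SIGNATURES = {
    "github-actions",       # added by this PR
    "github-actions[bot]",  # added by this PR
}


def is_bot_reply(author: str) -> bool:
    """Return True when the reply author matches a known bot identity."""
    # Case-insensitive comparison is an assumption of this sketch.
    return author.strip().lower() in BOT_SIGNATURES


def thread_already_replied(reply_authors: list[str]) -> bool:
    """Hypothetical helper: treat a thread as handled if any reply is from a known bot."""
    return any(is_bot_reply(author) for author in reply_authors)
```

With the two new entries, a thread whose replies were posted under the GitHub Actions identity is reported as already handled, so the skill would skip posting another reply.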

The eval suite validates these fixes using actual review thread data from the duplicate reply storm on PR #8535, covering duplicate prevention, comment categorization, prioritization, bot filtering, reply format, CI push override, and response rules. All 14 tests pass at 100%.

Companion PR: openshift/release for the periodic job's BOT_ACCOUNTS fix.

Test plan

- `make eval-plugins EVAL_PLUGIN=utils`: 14/14 passing (100%)
- `make lint`: passes
- Monitor periodic job runs after both PRs merge to confirm duplicate replies stop

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a CI override rule for address-reviews that avoids manual push and push verification when pushing is managed automatically.
Bug Fixes
- Improved detection of already-processed bot replies by recognizing GitHub Actions-authored comments.
Tests
- Added/updated extensive address-reviews evaluation cases and tightened expected reply/decision behavior checks.
Chores
- Bumped the utils plugin version to 0.0.12.
- Added utils budget tracking for evaluations.

coderabbitai · 2026-06-18T18:53:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Extends the address-reviews skill with a CI push-override rule that skips manual pushing when the prompt disables it, and expands bot detection to include GitHub Actions accounts. Introduces a comprehensive 14-case eval suite with Python judges validating categorization, reply format, and decision flags, plus an LLM quality rubric. Increments the plugin version to 0.0.12 and adds a 14.00 eval budget entry.
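As a rough illustration of the CI push-override rule, the decision can be sketched as a small predicate. The skill itself is a prompt in address-reviews.md, not Python, so everything here is hypothetical: the function name `should_push`, the `CI_OVERRIDE_PHRASES` tuple, and the case-insensitive substring matching are assumptions. The two phrases are the ones quoted in the rule text that appears in the review of this PR.

```python
# Hypothetical sketch of the CI override rule; not the skill's actual implementation.
CI_OVERRIDE_PHRASES = (
    "do not push",
    "pushing will be handled automatically",
)


def should_push(prompt: str) -> bool:
    """Return False when the prompt tells the skill that pushing is handled elsewhere."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in CI_OVERRIDE_PHRASES)
```

Under this sketch, a harness prompt containing "Do NOT push" yields `False`, which corresponds to the `expected_should_push: false` annotation used by the CI push-override eval case.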

Changes

address-reviews skill behavior and evaluation suite

| Layer / File(s) | Summary |
| --- | --- |
| Skill behavior: CI override and bot detection `plugins/utils/commands/address-reviews.md`, `plugins/utils/scripts/check_replied.py` | `address-reviews.md` adds a CI Override rule at Step 4 that skips push and push-verification when the prompt says "Do NOT push" or indicates CI handles pushing automatically. `check_replied.py` adds `"github-actions"` and `"github-actions[bot]"` to `BOT_SIGNATURES` so `is_bot_reply()` treats those accounts as already-replied bots. |
| Eval framework: configuration, wiring, and judges `plugins/utils/evals/eval-address-reviews.yaml` | New 280-line eval config wires the `utils:address-reviews` skill in case mode with Claude Code runner and `claude-opus-4-6` judge. Defines dataset/annotation schemas, `analysis.json` output contract, and Python judges for `category`, `should_reply`, `should_change_code`, `should_push` prediction validation; adds `reply_format_correct` judge for prefix/footer/length constraints; includes `analysis_quality` LLM rubric (1–5) with per-judge pass thresholds. |
| Eval cases 001–002: duplicate bot reply detection `plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/...`, `case-002-duplicate-extreme/...` | Two cases verifying no additional reply when bots have already replied. Case 001 covers normal duplication (GitHub Actions + bot replies on outdated thread); case 002 covers extreme (28 bot replies on resolved/outdated thread). Each case includes input prompt, annotations with expected flags, and reference analysis. |
| Eval cases 003–008: comment categorization `plugins/utils/evals/cases/address-reviews/case-003-/...` through `case-008-/...` | Six cases covering all primary categorization paths: QUESTION (×2 different phrasings), CHANGE_REQUEST (×2 different contexts), SUGGESTION, and ACTION_INSTRUCTION. Each case specifies input prompt, expected/acceptable categories, evaluation expectations, and grounded reference analysis. |
| Eval cases 009–014: prioritization, filtering, format, and response types `plugins/utils/evals/cases/address-reviews/case-009-/...` through `case-014-/...` | Six behavioral cases: mixed-comment A→B→C priority ordering, Coderabbit AI comment kept (actionable bug), reply format enforcement (Done. prefix + footer + 3-sentence max), CI push-override validation (`expected_should_push: false`), question-only response (no code change), and imperative-instruction response (code change required). |

Plugin version bump and eval budget

| Layer / File(s) | Summary |
| --- | --- |
| Version 0.0.12 and utils eval budget `plugins/utils/.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json`, `docs/index.html`, `evals/budget.yaml` | Plugin version incremented from `0.0.11` to `0.0.12` in manifest and marketplace files. `evals/budget.yaml` adds `budgets.utils` section with `allowed: 14.00` and `current: 14.00` budget limits. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

openshift-eng/ai-helpers#301: Introduces bot-detection and duplicate-prevention logic in check_replied.py that this PR extends by adding GitHub Actions account recognition.
openshift-eng/ai-helpers#354: Modifies the address-reviews push and verification control flow at Step 4, which this PR augments with the CI override rule.
openshift-eng/ai-helpers#464: Establishes the eval budgeting model in evals/budget.yaml that this PR extends by adding the new budgets.utils entry.

Suggested labels

lgtm

Suggested reviewers

bentito
stleerh

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 errors)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| No Assumed Git Remote Names | ❌ Error | The PR's address-reviews.md contains 4 hardcoded 'origin' git remote names in commands (lines 170, 172, 182, 269) without using the discovered BASE_REMOTE variable, violating the no-assumed-remote-... | Replace hardcoded 'origin' with '$BASE_REMOTE' variable: update lines 170, 172, 182, 269 to use the already-discovered BASE_REMOTE variable instead of hardcoding 'origin'. |
| Git Push Safety Rules | ❌ Error | PR violates Git Push Safety Rules: address-reviews.md line 264 uses prohibited force push variant `git push --force-with-lease`, and performs autonomous push without asking user permission first (o... | Replace `git push --force-with-lease` with standard `git push` and add explicit user confirmation step before pushing, per safety rule requiring user permission before any push. |

✅ Passed checks (8 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly reflects the two main changes: fixing duplicate detection in address-reviews and adding evaluation cases with real data. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| No Real People Names In Style References | ✅ Passed | No real people names are used as style references in the PR. Author fields in eval test cases are metadata from actual PR `#8535` comments being tested, not style guides. |
| No Untrusted Mcp Servers | ✅ Passed | PR contains no MCP server installations or untrusted package sources. Changes add GitHub-standard account identifiers to bot detection and 14 YAML test cases only. |
| Ai-Helpers Overlap Detection | ✅ Passed | PR modifies two ai-helpers files (address-reviews.md and check-release-health.md). Semantic analysis shows address-reviews is distinct from code-review and pre-commit-review commands (respond to co... |

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/utils/commands/address-reviews.md (1)
259-269: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Push flow still lacks required safety gates.

Line 259 adds CI opt-out, but the default path still allows unconditional force-push behavior and origin-assumed verification. This command must require explicit user approval before any push, reject main/master targets, reject force variants, and discover remotes before push/verification.
<details>
<summary>Suggested patch</summary>

````diff
-**CI Override**: If the user prompt explicitly says "Do NOT push" or "pushing will be handled automatically", skip Steps 4b and 4c entirely. The CI harness manages pushing separately.
+**CI Override**: If the user prompt explicitly says "Do NOT push" or "pushing will be handled automatically", skip Steps 4b and 4c entirely. The CI harness manages pushing separately.
+
+**Mandatory Push Gate**: If CI override is not active, ask for explicit user approval before pushing:
+`Replies are posted and commits are ready. Push now? (yes/no)`
+If the user does not explicitly approve, skip Steps 4b and 4c.
 
 After all replies are posted, push all committed changes in a single push:
 
 ```bash
-git push --force-with-lease
+CURRENT_BRANCH=$(git branch --show-current)
+case "$CURRENT_BRANCH" in
+  main|master|*/main|*/master)
+    echo "Refusing push to protected branch: $CURRENT_BRANCH" >&2
+    exit 1
+    ;;
+esac
+TARGET_REMOTE=$(git branch -vv | awk '/^\*/ {print $3}' | cut -d/ -f1)
+if [ -z "$TARGET_REMOTE" ]; then
+  TARGET_REMOTE=$(git remote -v | awk 'NR==1 {print $1}')
+fi
+git push "$TARGET_REMOTE" "$CURRENT_BRANCH"
 ```
 
 4c. Verify push
 
-- Run git log -1 --format='%H' locally and git ls-remote origin <branch> to confirm the remote has your commit
+- Run git log -1 --format='%H' locally and git ls-remote "$TARGET_REMOTE" "$CURRENT_BRANCH" to confirm the remote has your commit
````

</details>

As per coding guidelines: commands/skills must ask before pushing, must never force-push, must never push main/master, and must discover remotes instead of assuming names.

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/utils/commands/address-reviews.md around lines 259 - 269, The push
flow in the address-reviews.md file (specifically in section 4b where git push
--force-with-lease is used) lacks required safety gates. Replace the force-push
command with a safer implementation that: first rejects pushes to protected
branches (main, master, and their variants) by checking the current branch name
and exiting if it matches these patterns, dynamically discovers the target
remote by querying the branch tracking info or falling back to the first remote
from git remote output instead of assuming "origin", performs a regular push to
the discovered remote and current branch without force flags, and updates the
verification step in section 4c to use the dynamically discovered TARGET_REMOTE
and CURRENT_BRANCH variables instead of hardcoded "origin" and branch name in
both the git log and git ls-remote commands.
</details>



_Source: Coding guidelines_

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/openshift/commands/api-review.md`:
- Line 10: Three fenced code blocks in the file lack language specification: the
code block containing `/openshift:api-review [pr_url]` at line 10 (Step 1
prefix), the code block containing the example URL at line 212, and the code
block at line 218. Add `bash` language specification to each of these three code
blocks by changing the opening triple backticks from ``` to ```bash to comply
with the markdownlint enforcement in the repository.
- Around line 80-84: The code block that adds a git remote blindly assumes the
"upstream" name is available without checking if it already exists. Before
executing the git remote add upstream command, check whether a remote named
"upstream" already exists using git remote. If it already exists, either use a
different remote name, prompt the user to specify an alternative, or dynamically
discover an available remote name. Only proceed with adding the remote if the
name is confirmed to be available or use an alternative that does not conflict
with existing remotes.

---

Outside diff comments:
In `@plugins/utils/commands/address-reviews.md`:
- Around line 259-269: The push flow in the address-reviews.md file
(specifically in section 4b where git push --force-with-lease is used) lacks
required safety gates. Replace the force-push command with a safer
implementation that: first rejects pushes to protected branches (main, master,
and their variants) by checking the current branch name and exiting if it
matches these patterns, dynamically discovers the target remote by querying the
branch tracking info or falling back to the first remote from git remote output
instead of assuming "origin", performs a regular push to the discovered remote
and current branch without force flags, and updates the verification step in
section 4c to use the dynamically discovered TARGET_REMOTE and CURRENT_BRANCH
variables instead of hardcoded "origin" and branch name in both the git log and
git ls-remote commands.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a91e115-2a60-4552-a5de-af2613f50319

📥 Commits

Reviewing files that changed from the base of the PR and between 1015dd8 and 1e928be.

📒 Files selected for processing (4)

plugins/openshift/commands/api-review.md

plugins/utils/commands/address-reviews.md

plugins/utils/evals/address-reviews.yaml

plugins/utils/scripts/check_replied.py

bryan-cox · 2026-06-18T23:45:35Z

/test ?

bryan-cox · 2026-06-18T23:45:50Z

/test eval-address-reviews

enxebre · 2026-06-22T12:25:59Z

+
+        Author: JoelSpeed
+        Comment: "We are moving away from crd-schema-checker towards the upstream crdify project, and in fact, in o/api we have disabled the majority of checks in crd-schema-checker because they are implemented in crdify already. Could you weigh in on whether to switch this PR to crdify?"
+        Type: issue_comment (general PR comment)


Are you intentionally adding promptfoo coverage here vs. the eval harness?

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/utils/evals/eval-address-reviews.yaml`:
- Around line 155-201: The review points out that schema fields
expected_should_filter and expected_priority_order are defined but lack
corresponding evaluation checks, unlike should_change_code_correct and
should_push_correct. Create two new evaluation checks following the same pattern
as the existing ones: add a check named should_filter_correct that compares the
predicted should_filter value from the analysis.json file against the
expected_should_filter annotation, and add a check named priority_order_correct
that compares the predicted priority_order value from the analysis.json file
against the expected_priority_order annotation. Both checks should reuse the
same JSON parsing and file retrieval logic as the existing checks and return
appropriate pass/fail messages with the compared values.
- Around line 99-103: The valid_output_json function's required fields list is
incomplete and does not match the full output contract defined in the file.
Update the required list to include all required fields specified in the output
contract definition (lines 19-26 and 68-75), which should include: category,
should_reply, should_change_code, rationale, should_push, should_filter,
priority_order, and reply_draft. This ensures the validation properly enforces
the complete output schema and prevents malformed analysis.json files from
passing validation.
- Around line 203-233: The reply_format_correct check is using incorrect
annotation key names that don't match the actual annotations used. Instead of
checking for expected_starts_with and expected_contains_footer, update the check
to look for expected_reply_contains and expected_reply_max_sentences keys.
Additionally, add validation logic to count the number of sentences in the
reply_draft and compare it against the expected_reply_max_sentences constraint,
returning an error if the sentence count exceeds the maximum allowed, and
include this validation result in the final error messages along with the other
format checks.
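Two of the findings above (the incomplete required-field list and the missing sentence-count check) can be sketched as follows. The field names and the `expected_reply_max_sentences` annotation key are taken from the review comments; the function names, the regex-based sentence splitter, and the overall structure are assumptions for illustration and do not reproduce the judges in eval-address-reviews.yaml.

```python
import json
import re

# Field names as listed in the review comment on the output contract.
REQUIRED_FIELDS = [
    "category",
    "should_reply",
    "should_change_code",
    "rationale",
    "should_push",
    "should_filter",
    "priority_order",
    "reply_draft",
]


def missing_fields(analysis_json: str) -> list[str]:
    """Return the required fields absent from an analysis.json payload."""
    data = json.loads(analysis_json)
    return [field for field in REQUIRED_FIELDS if field not in data]


def count_sentences(text: str) -> int:
    """Naive sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def reply_within_limit(reply_draft: str, expected_reply_max_sentences: int) -> bool:
    """Hypothetical check for the expected_reply_max_sentences annotation."""
    return count_sentences(reply_draft) <= expected_reply_max_sentences
```

The splitter is deliberately simple and will miscount abbreviations or file names such as `check_replied.py` when they are followed by a space; a real judge would likely need a more careful rule.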

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 37039e58-4cea-4adb-9e7b-d699c1d4917e

📥 Commits

Reviewing files that changed from the base of the PR and between 18c0496 and 9738f1b.

📒 Files selected for processing (35)

.claude-plugin/marketplace.json
docs/index.html
evals/budget.yaml
plugins/utils/.claude-plugin/plugin.json
plugins/utils/commands/address-reviews.md
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/input.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/input.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/input.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/input.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/input.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/input.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/input.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/input.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/input.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/input.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/input.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/input.yaml
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/input.yaml
plugins/utils/evals/eval-address-reviews.yaml
plugins/utils/scripts/check_replied.py

✅ Files skipped from review due to trivial changes (25)

plugins/utils/evals/cases/address-reviews/case-003-categorize-question/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/input.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/input.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/input.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/input.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/input.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/input.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/input.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/input.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/input.yaml
plugins/utils/.claude-plugin/plugin.json
plugins/utils/commands/address-reviews.md
.claude-plugin/marketplace.json
docs/index.html

🚧 Files skipped from review as they are similar to previous changes (2)

evals/budget.yaml
plugins/utils/scripts/check_replied.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/reference-analysis.json`:
- Line 2: The category field in reference-analysis.json is incorrectly set to
"CHANGE_REQUEST" when it should be "SUGGESTION" to align with the
case-007-categorize-suggestion test objective. Update the "category" value on
line 2 and line 9 from "CHANGE_REQUEST" to "SUGGESTION" so the test case
properly validates suggestion categorization instead of conflicting with a
different category type.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c1d37ae8-fd3e-4ac5-a31e-320c84edfe43

📥 Commits

Reviewing files that changed from the base of the PR and between 9738f1b and 7214004.

📒 Files selected for processing (49)

.claude-plugin/marketplace.json
docs/index.html
evals/budget.yaml
plugins/utils/.claude-plugin/plugin.json
plugins/utils/commands/address-reviews.md
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/input.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/input.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/input.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/input.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/input.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/input.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/input.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/input.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/input.yaml
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-011-reply-format/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/input.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/input.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/input.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/input.yaml
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/reference-analysis.json
plugins/utils/evals/eval-address-reviews.yaml
plugins/utils/scripts/check_replied.py

✅ Files skipped from review due to trivial changes (37)

plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/input.yaml
plugins/utils/.claude-plugin/plugin.json
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/input.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/input.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-011-reply-format/input.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/input.yaml
plugins/utils/evals/cases/address-reviews/case-012-ci-push-override/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-014-response-imperative-change/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/input.yaml
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/reference-analysis.json
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-013-response-question-no-change/input.yaml
plugins/utils/commands/address-reviews.md
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/input.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/input.yaml
plugins/utils/evals/cases/address-reviews/case-010-filter-coderabbit-kept/annotations.yaml
docs/index.html
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/input.yaml
.claude-plugin/marketplace.json
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/annotations.yaml

🚧 Files skipped from review as they are similar to previous changes (8)

plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/annotations.yaml
evals/budget.yaml
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/input.yaml
plugins/utils/evals/cases/address-reviews/case-011-reply-format/annotations.yaml
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/input.yaml
plugins/utils/scripts/check_replied.py
plugins/utils/evals/eval-address-reviews.yaml

bryan-cox · 2026-06-22T14:14:05Z

/test eval-address-reviews

bryan-cox · 2026-06-22T14:56:08Z

/test eval-address-reviews


openshift-ci · 2026-06-23T09:59:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Details

Needs approval from an approver in each of these files:

~~evals/OWNERS~~ [bryan-cox]
~~plugins/openshift-developer/OWNERS~~ [bryan-cox]
~~plugins/utils/OWNERS~~ [bryan-cox]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


bryan-cox · 2026-06-23T10:05:36Z

/test eval-address-reviews

openshift-ci · 2026-06-24T09:53:35Z

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci Bot requested review from brandisher and stleerh June 18, 2026 18:52

openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 18, 2026

coderabbitai Bot reviewed Jun 18, 2026


Comment thread plugins/openshift/commands/api-review.md Outdated

Comment thread plugins/openshift/commands/api-review.md Outdated

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch 3 times, most recently from c6ea632 to 18c0496, June 18, 2026 19:07

bryan-cox mentioned this pull request Jun 18, 2026

Add eval-address-reviews CI job for ai-helpers openshift/release#80761

Merged

2 tasks

enxebre reviewed Jun 22, 2026


bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch from 18c0496 to 9738f1b, June 22, 2026 12:55

coderabbitai Bot reviewed Jun 22, 2026


Comment thread plugins/utils/evals/eval-address-reviews.yaml Outdated

Comment thread plugins/utils/evals/eval-address-reviews.yaml

Comment thread plugins/utils/evals/eval-address-reviews.yaml

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch from 9738f1b to 7214004, June 22, 2026 13:30

coderabbitai Bot reviewed Jun 22, 2026


Comment thread ...ins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/reference-analysis.json

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch 8 times, most recently from 141add0 to b626d33, June 22, 2026 14:06

bryan-cox mentioned this pull request Jun 22, 2026

fix: update eval-address-reviews config path for openshift-eng/ai-helpers openshift/release#80842

Merged

openshift-ci Bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 23, 2026

fix: address-reviews duplicate detection and add real-data evals

0f60617

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch from b626d33 to c750e30, June 23, 2026 09:58

openshift-ci Bot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 23, 2026

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch from c750e30 to e90f2bb, June 23, 2026 10:01

docs: rewrite evals AGENTS.md for eval harness format

d26010c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bryan-cox force-pushed the fix/address-reviews-dedup-and-evals branch from e90f2bb to d26010c, June 23, 2026 10:03

openshift-ci Bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 24, 2026
