Fix address-reviews duplicate detection and add real-data evals #558
Open
bryan-cox wants to merge 2 commits into openshift-eng:main from bryan-cox:fix/address-reviews-dedup-and-evals
6 changes: 6 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| expected_should_reply: false | ||
| expected_should_change_code: false | ||
| notes: > | ||
| Both github-actions and hypershift-jira-solve-ci are bot accounts that already | ||
| replied. The thread does NOT need another reply since the suggestion has already | ||
| been addressed by two separate bot replies. |
18 changes: 18 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| prompt: | | ||
| Using the address-reviews skill, analyze this review thread from PR #8535 and determine if it needs a reply. The thread has these comments in order: | ||
| | ||
| Thread ID: PRRT_kwDOE7ekcc6GaP_v | ||
| Resolved: false, Outdated: true | ||
| | ||
| Comment 1 (2026-06-02T11:36:47Z) by bryan-cox: | ||
| "[suggestion] Recursive scan fragility. The recursive descent into subdirectories relies on isCRDYAML() content-based filtering to skip non-CRD files." | ||
| | ||
| Comment 2 (2026-06-02T11:50:11Z) by github-actions: | ||
| "Done. Added a doc comment on checkCRDsInDir documenting that non-CRD YAML files are filtered out via isCRDYAML content check. | ||
| --- | ||
| *AI-assisted response via Claude Code*" | ||
| | ||
| Comment 3 (2026-06-02T14:14:15Z) by hypershift-jira-solve-ci: | ||
| "Done. Added a doc comment on checkCRDsInDir documenting that non-CRD YAML files are filtered via isCRDYAML content-based detection." | ||
| | ||
| Should you post another reply to this thread? Explain your reasoning about each commenter's identity. |
10 changes: 10 additions & 0 deletions
...ns/utils/evals/cases/address-reviews/case-001-duplicate-bot-reply/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "SUGGESTION", | ||
| "should_reply": false, | ||
| "should_change_code": false, | ||
| "should_push": false, | ||
| "should_filter": true, | ||
| "priority_order": null, | ||
| "reply_draft": null, | ||
| "rationale": "No reply needed. The thread already has two responses acknowledging the suggestion. Comment 1 is by bryan-cox (the human reviewer who raised the suggestion). Comment 2 is by github-actions, a bot account that posted an AI-assisted 'Done' reply. Comment 3 is by hypershift-jira-solve-ci, another bot/CI account that posted a near-identical 'Done' reply. Both bot responses already confirmed the suggestion was addressed (doc comment added to checkCRDsInDir). Posting a third reply would be redundant and spammy. The thread is also marked as outdated, meaning the underlying code has changed since the suggestion was made. There is nothing unresolved that requires further human engagement." | ||
| } |
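The decision this case expects can be sketched as a small check. This is a minimal illustration, not the skill's actual implementation: `Comment`, `BOT_LOGINS`, and `needs_reply` are hypothetical names, and treating a thread as answered whenever its latest comment comes from a known bot account is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical allow-list; the two logins are the bot accounts named in this case.
BOT_LOGINS = frozenset({"github-actions", "hypershift-jira-solve-ci"})


@dataclass
class Comment:
    author: str
    body: str


def needs_reply(comments, resolved=False, bot_logins=BOT_LOGINS):
    """Return True if the thread still awaits a reply.

    Assumption: a thread counts as answered when it is resolved or when
    its most recent comment was posted by a known bot account.
    """
    if resolved or not comments:
        return False
    return comments[-1].author not in bot_logins


thread = [
    Comment("bryan-cox", "[suggestion] Recursive scan fragility."),
    Comment("github-actions", "Done. Added a doc comment on checkCRDsInDir."),
    Comment("hypershift-jira-solve-ci", "Done. Added a doc comment on checkCRDsInDir."),
]
print(needs_reply(thread))  # False: a bot already answered
```

A human follow-up posted after the bot replies makes the last author a non-bot again, so the same check would reopen the thread for a reply.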
6 changes: 6 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| expected_should_reply: false | ||
| expected_should_change_code: false | ||
| notes: > | ||
| Severe duplication problem — the thread was already addressed by the first | ||
| reply but kept getting re-processed. 27 bot replies (28 comments in total) on a single suggestion. | ||
| Thread is resolved. No further reply needed. |
14 changes: 14 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| prompt: | | ||
| Using the address-reviews skill, analyze this review thread timeline from PR #8535. The first 5 of 28 comments are shown: | ||
| | ||
| Thread ID: PRRT_kwDOE7ekcc6GaQAO | ||
| Resolved: true, Outdated: true | ||
| | ||
| [2026-06-02T11:36:47Z] bryan-cox: "[suggestion] Recursive scan fragility..." | ||
| [2026-06-02T11:50:11Z] github-actions: "Done. Added a doc comment on checkCRDsInDir..." | ||
| [2026-06-02T14:14:15Z] hypershift-jira-solve-ci: "Done. Added a doc comment on checkCRDsInDir..." | ||
| [2026-06-02T17:07:22Z] hypershift-jira-solve-ci: "Done. Added a doc comment on checkCRDsInDir..." | ||
| [2026-06-02T20:07:16Z] hypershift-jira-solve-ci: "Done. Documented the assumption in the checkCRDsInDir function comment..." | ||
| ... (23 more identical bot replies follow, posted every ~3 hours through June 18) | ||
| | ||
| This thread has been resolved. Should you reply? What went wrong here? |
10 changes: 10 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-002-duplicate-extreme/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "SUGGESTION", | ||
| "should_reply": false, | ||
| "should_change_code": false, | ||
| "should_push": false, | ||
| "should_filter": true, | ||
| "priority_order": null, | ||
| "reply_draft": null, | ||
| "rationale": "Thread is resolved and outdated. The original suggestion from bryan-cox was addressed by the first bot reply, but hypershift-jira-solve-ci then entered a reply loop, posting 23+ duplicate responses every ~3 hours from June 2 through June 18. This is a bot misconfiguration — likely re-triggering on its own replies or on webhook events without deduplication. No human action is needed: the thread is already resolved, the suggestion was already addressed, and adding another reply would only contribute more noise to an already spammed thread. This thread should be filtered out entirely." | ||
| } |
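One way to guard against the reply loop described in this case is to compare a draft against replies already on the thread before posting. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name `is_near_duplicate` and the 0.8 similarity threshold are illustrative assumptions, not values taken from this PR.

```python
from difflib import SequenceMatcher


def is_near_duplicate(draft, prior_replies, threshold=0.8):
    """Return True if the draft closely matches any earlier reply.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    return any(
        SequenceMatcher(None, draft.lower(), prior.lower()).ratio() >= threshold
        for prior in prior_replies
    )


# Reply text taken from the case-001 thread shown earlier in this diff.
posted = [
    "Done. Added a doc comment on checkCRDsInDir documenting that non-CRD "
    "YAML files are filtered out via isCRDYAML content check.",
]
draft = (
    "Done. Added a doc comment on checkCRDsInDir documenting that non-CRD "
    "YAML files are filtered via isCRDYAML content-based detection."
)
if is_near_duplicate(draft, posted):
    print("skip: an equivalent reply is already on the thread")
```

A text comparison like this only catches repeated wording. Skipping resolved threads up front, as the annotations expect, would be a separate and cheaper check.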
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| expected_category: QUESTION | ||
| expected_should_reply: true | ||
| expected_should_change_code: false | ||
| acceptable_categories: [QUESTION] | ||
| notes: > | ||
| Interrogative phrasing — "why do we need" and "is this how" are questions, | ||
| not imperative requests. Should reply with explanation, not code changes. |
6 changes: 6 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-003-categorize-question/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: bryan-cox | ||
| Comment: "If there is already a library at openshift/crd-schema-checker, why do we need the go files? Is this how that repo says to set it up?" | ||
| Type: issue_comment (general PR comment) |
10 changes: 10 additions & 0 deletions
...ns/utils/evals/cases/address-reviews/case-003-categorize-question/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "QUESTION", | ||
| "should_reply": true, | ||
| "should_change_code": false, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": "Good question — the Go files are typically needed because openshift/crd-schema-checker is consumed as a Go module dependency, and the Go files provide the integration point (e.g., test harness or CLI invocation) that wires the checker into this repo's CI or validation workflow. The library provides the checking logic, but you still need local Go code to call it with the correct schema paths and configuration for this project. That said, I'll double-check the crd-schema-checker repo's recommended setup instructions to confirm this follows their documented integration pattern.\n\n---\n*AI-assisted response via Claude Code*", | ||
| "rationale": "The reviewer is asking two clarifying questions: (1) why Go files are needed alongside an existing library, and (2) whether this follows the upstream repo's recommended setup. This is a QUESTION, not a direct change request — the reviewer wants to understand the rationale before deciding if changes are needed. A reply explaining the integration pattern is appropriate. No code changes or pushes are warranted until the reviewer confirms whether the explanation is satisfactory or requests a different approach." | ||
| } |
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| expected_category: QUESTION | ||
| expected_should_reply: true | ||
| expected_should_change_code: false | ||
| acceptable_categories: [QUESTION] | ||
| notes: > | ||
| Design question about architectural choice. "Why do we need" is interrogative, | ||
| not imperative. Should reply with explanation, not make code changes. |
6 changes: 6 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-004-categorize-question-2/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: bryan-cox | ||
| Comment: "Why do we need our own main.go. Why are we not just calling the cli from https://github.com/kubernetes-sigs/crdify?" | ||
| Type: issue_comment (general PR comment) |
10 changes: 10 additions & 0 deletions
.../utils/evals/cases/address-reviews/case-004-categorize-question-2/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "QUESTION", | ||
| "should_reply": true, | ||
| "should_change_code": false, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": null, | ||
| "rationale": "The comment is phrased as two questions asking for justification of a design decision: why a custom main.go exists instead of calling the CLI from the upstream kubernetes-sigs/crdify package. While there is an implicit suggestion that the upstream CLI should be used instead, the primary form is a QUESTION seeking rationale. No reply draft is provided because answering requires knowledge of the specific technical reasons the PR author chose to create a custom main.go (e.g., custom flags, additional logic, embedding in a larger tool), which is not available from this comment alone. The PR author or someone with context on the implementation should respond." | ||
| } |
8 changes: 8 additions & 0 deletions
...ins/utils/evals/cases/address-reviews/case-005-categorize-change-request/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| expected_category: CHANGE_REQUEST | ||
| expected_should_reply: true | ||
| expected_should_change_code: true | ||
| acceptable_categories: [CHANGE_REQUEST, BLOCKING, SUGGESTION] | ||
| notes: > | ||
| Reviewer indicates a policy that may require updating code behavior. | ||
| The reviewer is saying a check should be disabled, implying a code change. | ||
| CHANGE_REQUEST, BLOCKING, or SUGGESTION are all acceptable. |
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-005-categorize-change-request/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real review comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: JoelSpeed | ||
| Comment: "We disabled this check in o/api as while it is technically breaking, we allow it in most instances" | ||
| Type: review_thread (inline code comment) | ||
| File: hack/tools/crd-schema-check/main.go |
10 changes: 10 additions & 0 deletions
...ls/evals/cases/address-reviews/case-005-categorize-change-request/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "CHANGE_REQUEST", | ||
| "should_reply": true, | ||
| "should_change_code": true, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": "Done. Disabled this check to align with o/api. Since it's technically breaking but allowed in most instances, keeping it enabled here would create unnecessary friction.\n\n---\n*AI-assisted response via Claude Code*", | ||
| "rationale": "The reviewer (JoelSpeed) is pointing out that this same check was intentionally disabled in openshift/api because, while technically a breaking change, it is accepted in most cases. This is an implicit CHANGE_REQUEST — the reviewer is signaling that this check should be disabled here as well to stay consistent with the upstream policy. It requires a code change (disabling the check) and a reply acknowledging alignment with the o/api decision." | ||
| } |
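The annotation fields in these cases suggest how an analysis might be graded against them. The following is a hedged sketch of such a check, not the eval harness used by this repository: `grade` is a hypothetical function, the inputs are plain dicts mirroring the case-005 files above, and the fallback from `acceptable_categories` to `expected_category` is an assumption.

```python
def grade(annotations, analysis):
    """Compare one analysis against its annotations; return failure messages."""
    failures = []

    # Assumption: fall back to the single expected category when no list is given.
    acceptable = annotations.get("acceptable_categories")
    if acceptable is None and "expected_category" in annotations:
        acceptable = [annotations["expected_category"]]
    if acceptable is not None and analysis.get("category") not in acceptable:
        failures.append(f"category {analysis.get('category')!r} not in {acceptable}")

    # Boolean expectations are compared only when the annotation defines them.
    for field in ("should_reply", "should_change_code"):
        expected = annotations.get(f"expected_{field}")
        if expected is not None and analysis.get(field) != expected:
            failures.append(f"{field}: expected {expected}, got {analysis.get(field)}")
    return failures


annotations = {
    "expected_category": "CHANGE_REQUEST",
    "expected_should_reply": True,
    "expected_should_change_code": True,
    "acceptable_categories": ["CHANGE_REQUEST", "BLOCKING", "SUGGESTION"],
}
analysis = {"category": "CHANGE_REQUEST", "should_reply": True, "should_change_code": True}
print(grade(annotations, analysis))  # [] means every check passed
```

Returning a list of messages rather than a single boolean keeps each mismatch visible when a case fails on more than one field.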
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| expected_category: CHANGE_REQUEST | ||
| expected_should_reply: true | ||
| expected_should_change_code: true | ||
| acceptable_categories: [CHANGE_REQUEST, BLOCKING] | ||
| notes: > | ||
| Dead code where tests validate a path that never runs is a correctness | ||
| issue requiring a code change. Either CHANGE_REQUEST or BLOCKING is acceptable. |
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-006-categorize-dead-code/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real review comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: jparrill | ||
| Comment: "compareCRDs() exists at the bottom of this file as a testable wrapper around crdifyRunner.Run(), but this line calls the runner directly — so compareCRDs() is never executed in production. The tests for compareCRDs validate a code path that never runs. Either call compareCRDs(oldCRD, newCRD, crdifyRunner) here instead, or remove compareCRDs() and adjust the tests to call crdifyRunner.Run() directly." | ||
| Type: review_thread (inline code comment) | ||
| File: hack/tools/crd-schema-check/main.go |
10 changes: 10 additions & 0 deletions
...s/utils/evals/cases/address-reviews/case-006-categorize-dead-code/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "CHANGE_REQUEST", | ||
| "should_reply": true, | ||
| "should_change_code": true, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": "Done. Updated to call `compareCRDs(oldCRD, newCRD, crdifyRunner)` instead of calling `crdifyRunner.Run()` directly. This ensures the production code path matches what the tests validate, eliminating the dead-code gap.\n\n---\n*AI-assisted response via Claude Code*", | ||
| "rationale": "This is a CHANGE_REQUEST: the reviewer identified a concrete bug where compareCRDs() is defined and tested but never called in production — the main code calls crdifyRunner.Run() directly, making the tests validate a code path that never executes. The reviewer gives two explicit fix options (call compareCRDs or remove it). This requires a code change and a reply acknowledging the fix." | ||
| } |
8 changes: 8 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| expected_category: SUGGESTION | ||
| expected_should_reply: true | ||
| expected_should_change_code: true | ||
| acceptable_categories: [CHANGE_REQUEST, SUGGESTION] | ||
| notes: > | ||
| Concrete code improvement request with a proposed fix. Either | ||
| CHANGE_REQUEST or SUGGESTION is acceptable since it includes | ||
| both the problem and the specific fix. |
7 changes: 7 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-007-categorize-suggestion/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real review comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: coderabbitai | ||
| Comment: "Move process success output off stdout. Line 104 prints to stdout from main.go; this should not emit non-JSON process output. Proposed fix: fmt.Fprintf(os.Stderr, ...) instead of fmt.Printf(...)" | ||
| Type: review_thread (inline code comment) | ||
| File: hack/tools/crd-schema-check/main.go |
10 changes: 10 additions & 0 deletions
.../utils/evals/cases/address-reviews/case-007-categorize-suggestion/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "CHANGE_REQUEST", | ||
| "should_reply": true, | ||
| "should_change_code": true, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": "Done. Switched `fmt.Printf` to `fmt.Fprintf(os.Stderr, ...)` at line 104 in `hack/tools/crd-schema-check/main.go`. Process success output should go to stderr to keep stdout clean for JSON output.\n\n---\n*AI-assisted response via Claude Code*", | ||
| "rationale": "This is a CHANGE_REQUEST: coderabbitai identifies a concrete problem (non-JSON output on stdout) and proposes a specific code fix (use fmt.Fprintf(os.Stderr, ...) instead of fmt.Printf). It is actionable, has a clear proposed fix, and represents a legitimate correctness concern about stdout cleanliness. Not filtered because the review raises a valid issue worth addressing." | ||
| } | ||
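The concern behind this review comment is that a tool whose stdout is parsed as JSON must not print anything else there. The reviewed file is Go (`fmt.Fprintf(os.Stderr, ...)` in place of `fmt.Printf(...)`); the snippet below is only a Python analogue of the same idea, and the `report` function and the `breaking_changes` key are hypothetical.

```python
import json
import sys


def report(result):
    """Write the machine-readable result to stdout and the status line to stderr."""
    print("schema check passed", file=sys.stderr)  # human-facing, kept off stdout
    print(json.dumps(result))  # stdout carries JSON only


report({"breaking_changes": 0})
```

With this split, a caller can pipe stdout straight into a JSON parser while the status message still appears in the terminal or CI log.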
7 changes: 7 additions & 0 deletions
...utils/evals/cases/address-reviews/case-008-categorize-action-instruction/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| expected_category: ACTION_INSTRUCTION | ||
| expected_should_reply: true | ||
| expected_should_change_code: false | ||
| acceptable_categories: [ACTION_INSTRUCTION] | ||
| notes: > | ||
| "The PR description needs updated" is a repo-level operation on the | ||
| PR itself, not a code change. ACTION_INSTRUCTION is the only correct category. |
6 changes: 6 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-008-categorize-action-instruction/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| prompt: | | ||
| Using the address-reviews skill categories (ACTION_INSTRUCTION, BLOCKING, CHANGE_REQUEST, QUESTION, SUGGESTION), categorize this real comment from PR #8535. State the category name explicitly: | ||
| | ||
| Author: bryan-cox | ||
| Comment: "The PR description needs updated" | ||
| Type: issue_comment (general PR comment) |
10 changes: 10 additions & 0 deletions
...vals/cases/address-reviews/case-008-categorize-action-instruction/reference-analysis.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "category": "ACTION_INSTRUCTION", | ||
| "should_reply": true, | ||
| "should_change_code": false, | ||
| "should_push": false, | ||
| "should_filter": false, | ||
| "priority_order": null, | ||
| "reply_draft": "Done. Updated the PR description. This was a direct request from the reviewer to improve the PR metadata.\n\n---\n*AI-assisted response via Claude Code*", | ||
| "rationale": "The comment is a direct instruction to update the PR description text, not a code change request. Categorized as ACTION_INSTRUCTION because the reviewer is telling the author what to do ('needs updated'). No code changes or pushes are needed since this concerns PR metadata only. A reply is appropriate to acknowledge the instruction." | ||
| } |
9 changes: 9 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/annotations.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| expected_priority_order: [A, B, C] | ||
| expected_category_a: ACTION_INSTRUCTION | ||
| expected_category_b: CHANGE_REQUEST | ||
| expected_category_c: QUESTION | ||
| notes: > | ||
| Comment A (update PR description) is prioritized first as ACTION_INSTRUCTION. | ||
| Comment B (dead code fix) is prioritized second as CHANGE_REQUEST or BLOCKING. | ||
| Comment C (architecture question) is prioritized last as QUESTION. | ||
| Key check is the ordering: A first, B second, C last. |
11 changes: 11 additions & 0 deletions
plugins/utils/evals/cases/address-reviews/case-009-prioritize-mixed/input.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| prompt: | | ||
| Using the address-reviews skill priority order (ACTION_INSTRUCTION → BLOCKING → CHANGE_REQUEST → QUESTION → SUGGESTION), prioritize these real comments from PR #8535. List them in the order you would address them: | ||
| | ||
| Comment A (bryan-cox, issue_comment): | ||
| "The PR description needs updated" | ||
| | ||
| Comment B (jparrill, review_thread on main.go): | ||
| "compareCRDs() exists at the bottom of this file as a testable wrapper around crdifyRunner.Run(), but this line calls the runner directly — so compareCRDs() is never executed in production." | ||
| | ||
| Comment C (bryan-cox, issue_comment): | ||
| "Why do we need our own main.go. Why are we not just calling the cli from https://github.com/kubernetes-sigs/crdify?" |
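The priority order named in this prompt maps naturally onto a sort key. The sketch below assumes each comment has already been categorized and is represented as a `(label, category)` pair; `PRIORITY`, `RANK`, and `prioritize` are hypothetical names used only for illustration.

```python
# Priority order as stated in the prompt above, highest priority first.
PRIORITY = ["ACTION_INSTRUCTION", "BLOCKING", "CHANGE_REQUEST", "QUESTION", "SUGGESTION"]
RANK = {category: position for position, category in enumerate(PRIORITY)}


def prioritize(comments):
    """Sort (label, category) pairs by category priority.

    sorted() is stable, so comments in the same category keep their input order.
    """
    return sorted(comments, key=lambda item: RANK[item[1]])


comments = [
    ("C", "QUESTION"),
    ("A", "ACTION_INSTRUCTION"),
    ("B", "CHANGE_REQUEST"),
]
print([label for label, _ in prioritize(comments)])  # ['A', 'B', 'C']
```

If comment B were categorized as BLOCKING instead of CHANGE_REQUEST, which the annotations also allow, the resulting order would still be A, B, C.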