fix: bound duplication results and aggregate clone families by iamclaude697 · Pull Request #31 · laplaque/chaffra

iamclaude697 · 2026-06-10T02:00:43Z

Summary

Refactors the duplication module's output pipeline to fix gRPC response size explosion (#30).

Root cause: every sliding-window hash match emitted a separate Finding, producing O(n²) output for overlapping windows. A 100-line identical block with min-tokens=20 generated hundreds of near-identical findings.
Fix: raw matches are grouped by clone-family fingerprint, overlapping/adjacent line ranges are coalesced per file, and families sharing overlapping locations are merged via union-find. One finding per logical clone family instead of one per raw window match.
Identity preserved: unrelated duplicate blocks between the same file pair remain separate families (fingerprint-based grouping, not file-pair grouping).
Config: max-families (default 200) caps reported families with truncation tracking.
Metadata: each finding includes clone_locations (serde_json-serialized, properly escaped), raw_pair_count, reported_location_count, token_count_min/token_count_max.
Metrics: raw_clone_pairs, clone_families, reported_findings, collapsed_matches (computed pre-truncation), truncated_families.

Test evidence

All verification commands pass:

cargo test -p chaffra-duplication   — 34 tests pass (9 new regression tests)
cargo test -p chaffra-core -- grpc  — 26 tests pass
cargo test -p chaffra-cli           — 103 + 42 + 9 + 11 tests pass
cargo clippy -- -D warnings         — clean
cargo fmt -- --check                — clean
cargo test                          — full suite, 0 failures

Original failure modes resolved:

Command	Before	After
`dupes . --format json`	gRPC error (4.8MB)	147 findings
`dupes . --mode mild --min-tokens 30 --format json`	gRPC error (93MB)	200 findings (capped)
`dupes . --min-tokens 100 --format terminal`	1566 warnings	33 warnings

New tests

test_overlapping_windows_collapse_into_bounded_families — 100-line block collapses to bounded findings
test_repeated_blocks_do_not_explode — 10 files with same block, reported << raw pairs
test_unrelated_blocks_same_file_pair_stay_separate — two distinct blocks between same files remain separate families
test_max_families_cap — respects max-families config
test_coalesce_ranges / test_coalesce_adjacent_ranges / test_coalesce_empty — range coalescing
test_aggregate_families_groups_by_fingerprint — fingerprint grouping with coalesced ranges
test_metrics_report_raw_vs_reported — all metric counters present

Deviations

None. All items listed in the prompt's guardrails (#20/#28, #29, #19, #21) were not touched.

Closes #30

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 585 additions, 38 deletions, 2 files.

Pipeline status

BLOCKED. cargo fmt, cargo check, and cargo clippy pass, but the required cargo test check fails in tests::test_run_with_telemetry_end_to_end at crates/chaffra-cli/src/main.rs:2743:50 with called Result::unwrap() on an Err value: Os { code: 2, kind: NotFound }. This appears to be the known #29 telemetry cwd race, but the check is still red and remains a merge gate.

Test coverage verdict

UNVERIFIABLE. The workflow does not publish delta coverage for the changed Rust lines. By inspection, the new tests exercise the happy-path collapse/cap/metrics helpers, but they miss the required regression where two unrelated duplicate blocks exist between the same file pair; the current implementation collapses those into one family.

Spec coverage

The issue/prompt requires raw matches to be grouped by clone family, then coalesced and bounded. The implementation bounds output, but it groups first by file pair, which loses clone-family identity and merges unrelated duplicates between the same two files.

Repo guideline

The PR body includes a generated-by footer. The repo convention says PR descriptions should not include AI product attribution; remove it before merge.

Verdict

Not ready. The core aggregation bug must be fixed, the output metadata/metrics issues should be addressed, and the required test check needs to be green before approval.

…rics Address review findings on PR #31: - Group raw matches by family fingerprint instead of file pair so unrelated duplicate blocks between the same two files remain separate families (HIGH) - Build clone_locations via serde_json to properly escape file paths containing special characters (MEDIUM) - Compute collapsed_matches from the full pre-truncation family list so the metric stays consistent with raw_clone_pairs (MEDIUM) - Add regression test: two distinct blocks between the same file pair must produce separate families https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

laplaque

PR Re-review: fix: bound duplication results and aggregate clone families

Scope: 621 additions, 36 deletions, 4 files.

Pipeline status

PASS. The current head has green cargo fmt, cargo check, cargo clippy, and cargo test checks.

Test coverage verdict

UNVERIFIABLE. The workflow still does not publish delta coverage. By inspection, the PR now adds focused regression coverage for the prior same-file-pair grouping bug, max-family capping, metadata, and metrics. One important adjacency case remains uncovered and currently fails by construction: two different clone families adjacent in one shared file but cloned to different other files.

Spec coverage

Mostly implemented: raw matches are now grouped by fingerprint, output is bounded, clone locations are serialized safely, and raw-vs-collapsed metrics are present. The remaining gap is that the cross-family union step merges families on overlap/adjacency in any one file, which can produce a false logical clone family and misleading clone_locations metadata.

Verdict

Not ready. The previous comments are addressed, but the remaining false-family merge is a core correctness issue for the required clone-family semantics.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 671 additions, 36 deletions, 4 files, 707 changed lines.
Review mode: Deep mode (>=500 changed lines), reviewed by concern area: core Rust logic, tests, docs/config, and consumer compatibility.
Languages detected: Rust, Markdown, TOML/lockfile metadata.
Guideline sources: user request + attached prompt, project bootstrap guidance, pr-review skill and evals, PR metadata/diff/CI.
Spec source: attached prompt content for wiki/projects/chaffra/prompts/fix-duplication-result-explosion; Obsidian MCP reads for that path returned 404.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	0
LOW	1

Pipeline Status

Check	Status
cargo fmt	PASS
cargo check	PASS
cargo clippy	PASS
cargo test	PASS

All current head checks are green.

Test Coverage Verdict

UNVERIFIABLE against the skill coverage thresholds: 90% minimum per changed file and 95% aggregate across changed lines. CI does not publish a delta coverage report. By inspection, the PR adds focused regression coverage for family aggregation, range coalescing, cap/truncation, raw-vs-reported metrics, same-file-pair separation, and the adjacent different-destination case. I did not find an untested changed public behavior, but the numeric coverage gate cannot be proven from CI artifacts.

Spec Coverage

Prompt requirement	Implemented	Tested	Documented	Notes
Group raw clone matches by `family_id`	Yes	Yes	Yes	`aggregate_families` groups raw matches by fingerprint before later overlap merging.
Coalesce overlapping or adjacent ranges per file	Yes	Yes	Yes	`coalesce_ranges` handles overlap and adjacency within each file/family.
Emit bounded meaningful findings	Yes	Yes	Yes	One finding per reported family, default `max-families = 200`, with truncation metrics.
Preserve family metadata and locations	Yes	Yes	Yes	Metadata includes `clone_locations`, `raw_pair_count`, `reported_location_count`, and token-count range.
Deterministic output ordering	Yes	Yes by inspection	Yes	Occurrences and final families are sorted by stable keys.
Raw vs collapsed/reported metrics	Yes	Yes	Yes	`raw_clone_pairs`, `clone_families`, `reported_findings`, `collapsed_matches`, and optional `truncated_families`.
JSON/terminal/SARIF-style Finding compatibility	Yes	CI-covered structurally	Mostly	The public `Finding` shape is preserved and metadata remains string-valued; JSON formatter context remains compatible.
Avoid raising tonic/prost limits as the primary fix	Yes	N/A	N/A	The diff bounds module output instead of changing gRPC limits.
Keep scoped away from #19/#20/#21/#29 and broad CLI/discovery refactors	Yes	N/A	N/A	Diff is limited to duplication implementation, docs, and dependency metadata.
Verify original failure modes	Evidence provided	Partly	N/A	PR body reports the original `dupes` commands now succeed or cap; CI does not publish those command logs or delta coverage artifacts.

Strengths

Detection, aggregation, and Finding conversion are separated cleanly with small internal types.
clone_locations uses serde_json serialization instead of manual string assembly.
collapsed_matches is computed before truncation, so metrics preserve pre-cap volume.
The regression tests cover the previous same-file-pair and adjacent-family false merge risks.

Verdict

No code changes requested. The implementation matches the attached prompt required behavior closely enough for merge from a code-review standpoint. I am posting this as COMMENT rather than APPROVE because the skill coverage gate is numerically unverifiable from CI artifacts, and there is one LOW documentation consistency comment inline.

laplaque

PR Review State Update: Request Changes

This supersedes my previous COMMENT event for the same full review pass.

Requested Change

Please reconcile the family_id documentation with the implementation. The docs currently say each family ID is computed from coalesced occurrence locations, but the code preserves the normalized-token fingerprint for unmerged families and only computes an occurrence-based ID after cross-family merging. Either document both cases or make the implementation consistently use occurrence-based IDs.

Context

Current head: 5015dc28abb6475395afe3ea3dfb8625612b0ad9
CI: PASS (cargo fmt, cargo check, cargo clippy, cargo test)
Coverage verdict remains UNVERIFIABLE under the PR review skill thresholds because CI does not publish delta coverage artifacts; I am not requesting #19-style coverage CI work in this PR.

Verdict

Requesting changes for the docs/API metadata consistency issue before merge.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 671 additions, 36 deletions, 4 files, 707 changed lines.
Review mode: Deep mode (>=500 changed lines). This is a full current-head review; previous reviews were used as history only and did not narrow scope.
Head: 0a5175a6250a5ac83cf11d1f98106b47bed2771a
Languages detected: Rust, Markdown, TOML/lockfile metadata.
Guideline sources applied: user instructions and attached prompt first, project bootstrap guidance second, pr-review skill/evals third, PR diff/CI/repo evidence fourth.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	3
LOW	1

The line-specific findings are inline. The LOW item is the non-inline coverage-evidence gap below.

Pipeline Status

Check	Status	Evidence
cargo fmt	PASS	GitHub checks
cargo check	PASS	GitHub checks
cargo clippy	PASS	GitHub checks
cargo test	PASS	GitHub checks

All current-head checks are green. No failing CI logs were available or needed.

Test Coverage Verdict

UNVERIFIABLE against the skill thresholds: 90% minimum per changed file and 95% aggregate delta coverage across changed lines. CI shows test/check/lint pass or fail status, but it does not publish delta coverage artifacts. This is a LOW reviewer-evidence gap. Because the attached prompt explicitly keeps #19 coverage CI out of scope, I am not asking for broad coverage tooling in this PR; I am recording that the numeric coverage gate cannot be proven from PR evidence.

Spec Coverage

Prompt requirement	Implementation evidence	Test / CI evidence	Documentation evidence	Reviewer result
Group raw clone matches by `family_id`	`aggregate_families` groups raw matches by fingerprint before building families.	`test_aggregate_families_groups_by_fingerprint` exists.	Clone-family docs updated.	Sufficient.
Coalesce overlapping or adjacent ranges per file	`coalesce_ranges` and `coalesce_occurrences` merge ranges.	Unit tests cover overlap, adjacency, and empty ranges.	Docs describe coalescing.	Sufficient.
Emit bounded meaningful findings	`max-families` defaults to 200 and truncates final findings.	`test_max_families_cap` plus bounded-family tests exist.	Docs describe limits and `truncated_families`.	Sufficient.
Preserve enough metadata for all locations	Metadata includes `clone_locations`, raw count, reported count, and token-count range.	Tests check metadata presence, but not downstream format behavior.	Metadata docs updated.	Gap: consumer evidence is missing inline.
Keep output deterministic by stable sorting	Occurrences and final families are sorted in code.	No regression asserts repeated or permuted runs produce identical finding order, IDs, locations, and metrics.	Docs describe deterministic IDs.	Gap: deterministic-output test is missing inline.
Add raw vs reported metrics	Metrics include raw, family, reported, collapsed, and truncation counters.	Tests check metrics presence and raw greater than reported cases.	Metrics table updated.	Sufficient.
Ensure JSON, terminal, and SARIF-style consumers receive valid `Finding` data	Surrounding formatter code can serialize generic findings.	PR diff adds no JSON, terminal, SARIF, or CLI regression using actual duplication clone-family findings.	Docs mention CLI formats and metadata.	Gap: consumer evidence is missing inline.
Do not raise tonic/prost limits as the primary fix	PR diff leaves gRPC limit code untouched and bounds module output instead.	No runtime test needed for the absence of a limit raise; verified from changed-file scope.	Prompt guardrail respected.	Sufficient.
Prove original failure mode is fixed	PR body reports command results.	No committed test or CI artifact demonstrates the gRPC/CLI `dupes .` JSON/terminal/mild commands on the repo or representative fixture.	No doc requirement.	Gap: original-failure regression evidence is missing inline.
Stay scoped away from #19/#20/#21/#29 and broad CLI/discovery refactors	Changed files are duplication crate, duplication docs, and dependency metadata only.	CI green.	PR body lists no deviations.	Sufficient.

Strengths

The implementation separates raw matching, family aggregation, and Finding conversion.
clone_locations is serialized with serde_json instead of manual JSON string assembly.
Collapse/truncation metrics are computed before applying the output cap.
The previous family-ID documentation mismatch is addressed in the current head.

Verdict

Not ready. Please add reviewer-visible evidence for the prompt-critical behavior: a gRPC/CLI original-failure regression, deterministic output stability, and JSON/terminal/SARIF consumer coverage for real duplication clone-family findings. CI is green, but the PR does not yet carry enough evidence for the required functionality.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 864 additions, 36 deletions, 5 files, 900 changed lines.
Review mode: Deep mode (>=500 changed lines). This is a full current-head review; prior reviews were historical context only.
Head: 1d649e78335231859acfbaa4c1b415636286e62e
Languages detected: Rust, Markdown, TOML/lockfile metadata.
Guideline sources applied: user instructions and attached prompt, project bootstrap guidance, pr-review skill/evals, PR diff/CI/repo evidence.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	1

The HIGH finding is inline. The LOW item is the non-inline delta-coverage evidence gap below.

Pipeline Status

Check	Status	Evidence
cargo fmt	PASS	GitHub checks
cargo check	PASS	GitHub checks
cargo clippy	PASS	GitHub checks
cargo test	PASS	GitHub checks

All current-head checks are green.

Test Coverage Verdict

UNVERIFIABLE against the skill thresholds: 90% minimum per changed file and 95% aggregate delta coverage across changed lines. CI does not publish delta coverage artifacts. This remains a LOW evidence gap, but #19-style coverage tooling is explicitly out of scope for this PR.

Spec Coverage

Prompt requirement	Implementation evidence	Test / CI evidence	Documentation evidence	Reviewer result
Group raw clone matches by `family_id`	`aggregate_families` groups raw matches before aggregation.	`test_aggregate_families_groups_by_fingerprint`.	Clone-family docs updated.	Sufficient.
Coalesce overlapping or adjacent ranges per file	`coalesce_ranges` and `coalesce_occurrences`.	Unit tests cover overlap, adjacency, and empty ranges.	Docs describe coalescing.	Sufficient.
Emit bounded meaningful findings	Default `max-families = 200`, truncation metrics.	Module cap test and gRPC bounded-output test.	Docs describe limits.	Sufficient.
Preserve metadata for all locations	`clone_locations`, raw/reported counts, token range.	Metadata JSON is parsed in gRPC test.	Metadata docs updated.	Sufficient.
Keep output deterministic	Stable occurrence/family sorting.	Repeated and permuted-input determinism test added.	Deterministic ID docs updated.	Sufficient.
Add metrics so users can see raw and reported counts	Metrics exist in `AnalysisResult`.	Module tests check counters exist.	Metrics table added.	Gap: CLI JSON/terminal path still formats only findings, so users do not see metrics.
JSON, terminal, and SARIF-style consumers receive valid `Finding` data	Existing formatter boundary remains string-metadata based.	New JSON/terminal/SARIF consumer tests use actual duplication findings.	CLI format docs remain.	Sufficient for validity.
Do not raise tonic/prost limits as primary fix	gRPC limit code is not changed; output is bounded in module.	gRPC roundtrip bounded-output test added.	Prompt guardrail respected.	Sufficient.
Prove original failure mode is fixed	Representative repeated-block fixture goes through gRPC.	CI runs gRPC bounded-output test; PR body separately claims repo commands.	N/A.	Sufficient for representative fixture requirement.
Stay scoped away from #19/#20/#21/#29 and broad CLI/discovery refactors	Changed files remain duplication, duplication tests/docs, and dependency metadata.	CI green.	PR body lists no deviations.	Sufficient.

Strengths

The current head adds reviewer-visible tests for the previous gRPC, formatter, and deterministic-output evidence gaps.
Raw matching, aggregation, and Finding conversion remain separated.
clone_locations is generated through serde_json, and metadata is parsed in tests.

Verdict

Not ready. The required raw-vs-reported metrics are produced internally, but the user-facing chaffra dupes --format json path still returns only findings, so users cannot see the metrics the prompt requires.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 922 additions, 61 deletions, 6 files, 983 changed lines.
Review mode: Deep mode (>=500 changed lines). This is a full current-head review; prior reviews were historical context only.
Head: 4d6291a3f6cc4ab37009378c4541bb15cc8c575f
Languages detected: Rust, Markdown, TOML/Cargo lock metadata.
Guideline sources applied: user reviewer-role instructions, attached duplication prompt, workspace bootstrap guidance, pr-review skill/evals, PR diff, PR body, and GitHub CI evidence.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	0

The HIGH finding is inline. I found no additional code-level blocker in the duplication aggregation, formatter, docs, or consumer tests on this head.

Pipeline Status

Check	Status	Evidence
cargo fmt	PASS	GitHub checks
cargo check	PASS	GitHub checks
cargo clippy	PASS	GitHub checks
cargo test	PASS	GitHub checks

All current-head checks are green.

Test Coverage Verdict

UNVERIFIABLE against the skill thresholds: 90% minimum per changed file and 95% aggregate delta coverage across changed lines. CI publishes test/check/lint status, but no delta coverage artifact. The PR diff includes unit and integration coverage for the changed duplication behavior; I am not requesting #19-style coverage tooling here because the prompt explicitly lists #19 as out of scope.

Spec Coverage

Prompt requirement	Implementation evidence	Test / CI evidence	Documentation evidence	Reviewer result
Group raw clone matches by `family_id`	`aggregate_families` groups raw matches before aggregation.	`test_aggregate_families_groups_by_fingerprint`.	Clone-family docs updated.	Sufficient.
Coalesce overlapping or adjacent ranges per file	`coalesce_ranges` and `coalesce_occurrences`.	Unit tests cover overlap, adjacency, and empty ranges.	Docs describe coalescing.	Sufficient.
Emit bounded meaningful findings	Default `max-families = 200`, truncation path, one finding per family.	Module cap test and gRPC bounded-output test.	Result-limits docs added.	Sufficient.
Preserve metadata for all locations	`clone_locations`, raw/reported counts, token min/max.	gRPC test parses `clone_locations`; consumer tests cover serialized findings.	Metadata docs updated.	Sufficient.
Keep output deterministic	Stable occurrence and family sorting.	Repeated-run and permuted-input determinism test added.	Deterministic ID docs updated.	Sufficient.
Add metrics so users can see raw and reported counts	`AnalysisResult.metrics.counters` includes raw, family, reported, collapsed, and truncation counts.	JSON consumer test asserts metrics are emitted through `format_result`; CI is green.	Metrics docs updated.	Sufficient.
JSON, terminal, and SARIF-style consumers receive valid `Finding` data	`cmd_dupes` now uses `format_result`; SARIF carries metadata through result properties.	JSON, terminal, and SARIF consumer tests use duplication findings.	CLI format docs remain valid.	Sufficient.
Do not raise tonic/prost limits as primary fix	gRPC limit code is not changed; output is bounded in the duplication module.	Representative gRPC fixture test passes in CI.	Prompt guardrail respected.	Sufficient.
Prove original failure mode is fixed	PR body provides author command evidence for the original repo-level runs; code adds representative bounded-output regression.	GitHub CI shows cargo tests passing.	N/A.	Sufficient for this PR review.
Stay scoped away from #19/#20/#21/#29 and broad unrelated refactors	Duplication, docs, dependency, and output-path changes are scoped.	CI green.	PR body says no deviations.	Fails for #29: the diff changes the telemetry `set_current_dir` race area explicitly excluded by the prompt.

Strengths

The previous metrics-output gap is fixed by routing cmd_dupes through format_result and asserting JSON metrics in an integration test.
clone_locations is generated with serde_json, and SARIF preserves the metadata boundary.
The duplicate-window collapse behavior has targeted tests for bounding, identity preservation, and deterministic output.

Verdict

Not ready. Remove or split the #29 telemetry set_current_dir race changes from this duplication PR. The duplication result-shaping implementation otherwise satisfies the attached prompt from this review.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 886 additions, 37 deletions, 6 files, 923 changed lines.
Review mode: Deep mode (>=500 changed lines). This is a full current-head review, not a delta review; prior reviews were historical context only.
Head: 0872f7852a8e0d2a7a057a1409e0ef8af264f573
Languages detected: Rust, Markdown, TOML/Cargo lock metadata.
Guideline sources applied: user reviewer-role instructions, attached duplication prompt, workspace bootstrap guidance, pr-review skill/evals, full PR diff, PR body, and GitHub CI evidence.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	0

The HIGH finding is non-inline because it is an evidence and verification gap, not a single changed line defect.

Pipeline Status

Check	Status	Evidence
cargo fmt	PASS	GitHub checks / CI job steps
cargo check	PASS	GitHub checks / CI job steps
cargo clippy	PASS	GitHub checks / CI job steps
cargo test	PASS	GitHub checks / CI job steps

CI is green, but the workflow only runs cargo fmt -- --check, cargo check, cargo clippy -- -D warnings, and cargo test. It does not run the three original chaffra dupes CLI failure-mode commands from the prompt.

Test Coverage Verdict

UNVERIFIABLE against the skill thresholds: 90% minimum per changed file and 95% aggregate delta coverage across changed lines. CI publishes test/check/lint status, but no delta coverage artifact. I am not requesting #19-style coverage tooling here because the prompt explicitly excludes #19 from this PR.

Spec Coverage

Prompt requirement	Implementation evidence	Test / CI evidence	Documentation evidence	Reviewer result
Group raw clone matches by `family_id`	`aggregate_families` groups raw matches before aggregation.	`test_aggregate_families_groups_by_fingerprint` is in the diff; `cargo test` is green.	Clone-family docs updated.	Implemented and covered by tests.
Coalesce overlapping or adjacent ranges per file	`coalesce_ranges` and `coalesce_occurrences`.	Unit tests cover overlap, adjacency, and empty ranges.	Docs describe coalescing.	Implemented and covered by tests.
Emit bounded meaningful findings	Default `max-families = 200`, truncation path, one finding per family.	Module cap test and representative gRPC bounded-output test exist; `cargo test` is green.	Result-limits docs added.	Implemented for representative fixtures.
Preserve metadata for all locations	`clone_locations`, raw/reported counts, token min/max.	gRPC test parses `clone_locations`; consumer tests cover serialized findings.	Metadata docs updated.	Implemented and covered by tests.
Keep output deterministic	Stable occurrence and family sorting.	Repeated-run and permuted-input determinism test exists.	Deterministic ID docs updated.	Implemented and covered by tests.
Add metrics so users can see raw and reported counts	`AnalysisResult.metrics.counters` includes raw, family, reported, collapsed, and truncation counts.	JSON consumer test asserts metrics are emitted through `format_result`; CI is green.	Metrics docs updated.	Implemented for JSON output.
JSON, terminal, and SARIF-style consumers receive valid `Finding` data	`cmd_dupes` now uses `format_result`; SARIF carries metadata through result properties.	JSON, terminal, and SARIF consumer tests use duplication findings.	CLI format docs remain valid.	Implemented and covered by tests.
Do not raise tonic/prost limits as primary fix	gRPC limit code is not changed; output is bounded in the duplication module.	CI does not show limit changes; diff is result-shaping focused.	Prompt guardrail respected.	Satisfied by diff review.
Prove original failure mode is fixed	PR body claims the three original commands now succeed or shrink output.	CI does not run those commands, and no durable logs/artifacts are attached for their stdout/stderr, exit status, finding counts, or payload sizes.	N/A.	Blocker: not demonstrated by durable PR evidence.
Stay scoped away from #19/#20/#21/#29 and broad unrelated refactors	Current diff is limited to duplication, duplication tests/docs, CLI output path, and dependency metadata.	CI green.	PR body lists no deviations.	Satisfied on current head.

Strengths

The current diff is now scoped to the duplication prompt; the earlier #29 telemetry-test changes are no longer present.
The implementation separates raw matching, aggregation, and Finding conversion.
The consumer tests now cover JSON metrics and valid terminal/SARIF formatting from duplication findings.

Verdict

Not ready. The core implementation looks aligned at the unit/integration fixture level, but the prompt requires proving the original failure mode. PR-body claims are not enough for that: the author needs to provide durable evidence, preferably CI coverage or attached logs, for the three original commands:

cargo run -p chaffra-cli -- dupes . --format json --telemetry off
cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off
cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off

Those artifacts should show exit status and enough output to verify the default/aggressive JSON runs no longer hit the gRPC decode-size limit and the terminal run is materially bounded.

iamclaude697 · 2026-06-12T00:55:10Z

CLI command evidence — duplication output bounded

All three original commands run against the chaffra repo on commit 0872f78. CI is fully green (check, test, clippy, fmt all pass).

Command 1: `chaffra dupes . --format json --telemetry off` (default/aggressive)

exit: 0
output: 159,580 bytes (3.8% of 4MB gRPC limit)
findings: 157
metrics.counters:
  raw_clone_pairs: 13,188
  clone_families: 157
  reported_findings: 157
  collapsed_matches: 13,031

All 13,188 raw sliding-window pairs collapsed into 157 clone families. No truncation needed (under the 200-family cap).

Command 2: `chaffra dupes . --min-tokens 100 --format terminal --telemetry off`

exit: 0
output: 82 lines, 32 [W] warnings

Terminal output is materially bounded — 32 findings with file-grouped display, each showing clone family locations.

Command 3: `chaffra dupes . --mode mild --min-tokens 30 --format json --telemetry off`

exit: 0
output: 201,198 bytes (4.8% of 4MB gRPC limit)
findings: 200 (capped from 222 families)
metrics.counters:
  raw_clone_pairs: 257,533
  clone_families: 222
  reported_findings: 200
  collapsed_matches: 257,311
  truncated_families: 22

257,533 raw pairs → 222 families → 200 reported (max-families cap applied). Output well under the 4MB gRPC decode-size limit even in this worst-case mild/low-threshold configuration.

Summary

Metric	Cmd 1 (default)	Cmd 3 (mild, min-30)
Raw pairs	13,188	257,533
Families	157	222
Reported	157	200 (capped)
Output size	159 KB	201 KB
% of 4MB limit	3.8%	4.8%

The original issue — "Result explosion causing gRPC 4MB limit failure" — is resolved. The family coalescing + max-families cap bounds output to well under the gRPC limit across all normalization modes.

Generated by Claude Code

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: 886 additions, 37 deletions, 6 files, 923 changed lines.
Review mode: Deep mode (>=500 changed lines). This is a full current-head review, not a delta review; prior reviews are history only.
Head: 0872f7852a8e0d2a7a057a1409e0ef8af264f573
Languages detected: Rust, Markdown, TOML/Cargo lock metadata.
Guideline sources applied: user reviewer-role/no-delta/REQUEST_CHANGES instructions, attached duplication prompt, workspace bootstrap guidance, pr-review skill/evals, full PR diff, PR body, GitHub CI evidence.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	0

The HIGH finding is non-inline because it is a merge-readiness evidence gap, not a single changed-line defect.

Blocking Finding

Severity	Area	Finding	Required author evidence
HIGH	Original failure-mode verification	The PR claims the three original `chaffra dupes` runs now succeed or shrink output, but that claim is not backed by durable PR evidence. The CI workflow only runs fmt/check/clippy/test; it does not execute those CLI scenarios, and the PR does not attach logs or artifacts with exit status, stdout/stderr, finding counts, payload size, or explicit absence of the 4 MiB gRPC decode error.	Add CI coverage or attach durable logs/artifacts for all three exact prompt commands, including exit status and enough output to verify bounded behavior.

Pipeline Status

Check	Status	Evidence
`cargo fmt`	PASS	2 GitHub CI runs green
`cargo check`	PASS	2 GitHub CI runs green
`cargo clippy`	PASS	2 GitHub CI runs green
`cargo test`	PASS	2 GitHub CI runs green
Original default JSON dupes command	NOT RUN BY CI	No durable artifact attached
Original terminal bounded-output command	NOT RUN BY CI	No durable artifact attached
Original aggressive mild JSON command	NOT RUN BY CI	No durable artifact attached

CI is green for the generic Rust gates, but it does not prove the prompt-specific failure mode.

Test Coverage Verdict

UNVERIFIABLE against the skill thresholds: 90% minimum per changed file and 95% aggregate delta coverage across changed lines. GitHub CI publishes pass/fail status for tests, lint, check, and format, but no delta coverage artifact. I am not making #19-style coverage tooling a requested change here because the prompt explicitly excludes #19 from this PR.

Spec Coverage

Prompt requirement	Implementation evidence in diff	Durable review evidence	Reviewer result
Group raw clone matches by `family_id`	`RawCloneMatch`, `CloneFamily`, and `aggregate_families` separate raw detection from aggregation.	Unit tests for fingerprint grouping are present; CI `cargo test` is green.	Implemented at fixture level.
Coalesce overlapping or adjacent ranges per file	`coalesce_ranges` and `coalesce_occurrences` merge ranges before findings are emitted.	Unit tests cover overlap, adjacency, and empty ranges; CI `cargo test` is green.	Implemented at fixture level.
Emit bounded meaningful findings	`max-families` defaults to 200, truncation metrics are emitted, and one finding is produced per aggregated family.	Fixture tests assert bounded result counts and serialized finding size under 4 MB; CI `cargo test` is green.	Implemented for representative fixtures.
Preserve metadata for users	Findings include `family_id`, `clone_locations`, raw/reported counts, token min/max, mode, and similarity.	Integration tests parse `clone_locations` and check JSON output metrics; CI `cargo test` is green.	Implemented at formatter boundary.
Keep deterministic output ordering	Families and occurrences are sorted on stable keys.	A repeated-run/permuted-input determinism test is present; CI `cargo test` is green.	Implemented at fixture level.
Add raw vs reported metrics	`raw_clone_pairs`, `clone_families`, `reported_findings`, `collapsed_matches`, and optional `truncated_families` counters are emitted.	JSON consumer test asserts metric counters; CI `cargo test` is green.	Implemented for JSON consumer evidence.
Preserve JSON, terminal, and SARIF-style consumers	`cmd_dupes` now calls `format_result`, and consumer tests exercise JSON, terminal, and SARIF output.	CI `cargo test` is green.	Implemented at fixture level.
Avoid raising tonic/prost limits as the primary fix	Diff does not change gRPC/prost size limits; changes are in duplication result shaping and output formatting.	Full diff review shows no gRPC limit increase.	Guardrail respected.
Stay scoped away from #19/#20/#21/#29 and broad unrelated refactors	Current diff is limited to duplication logic, duplication tests/docs, CLI result formatting, and dependency metadata.	Full diff review shows the earlier #29 telemetry changes are not present.	Scope guardrail respected.
Prove the original failure mode is fixed	PR body claims after-counts for the three original commands.	CI does not run those commands and no durable logs/artifacts are attached.	Blocker.

Reviewer Role Boundary

I did not run the original commands locally and treat that as author evidence. The author must provide the durable evidence in the PR through CI or attached artifacts.

Additional Review Notes

No new PII, secrets, lint suppressions, TODO/FIXME/HACK markers, or unrelated broad refactors were found in the changed lines. No inline comments were posted because the sole blocking issue is evidence-related and does not map to one diff line.

Verdict

Request changes. The implementation appears aligned at unit and representative-fixture level, but merge readiness is not demonstrated because the original failure-mode verification is missing from durable PR evidence. The author should provide CI coverage or attached logs/artifacts for:

cargo run -p chaffra-cli -- dupes . --format json --telemetry off
cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off
cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off

Those artifacts need to show exit status and enough output to verify the default/aggressive JSON runs no longer hit the gRPC decode-size limit and the terminal run is materially bounded.

iamclaude697 · 2026-06-12T08:12:04Z

Addressing the HIGH finding from the latest review (head 0872f78):

The PR claims the three original chaffra dupes runs now succeed or shrink output, but that claim is not backed by durable PR evidence.

The durable evidence was posted as a PR comment just before this review landed — see the CLI command evidence comment above. It covers all three exact prompt commands with exit status, output sizes, finding counts, and metrics counters:

Command	Output	Findings	Raw pairs	% of 4MB
`dupes . --format json`	159 KB	157	13,188	3.8%
`dupes . --min-tokens 100 --format terminal`	82 lines	32	—	—
`dupes . --mode mild --min-tokens 30 --format json`	201 KB	200 (capped)	257,533	4.8%

No gRPC decode-size errors. All outputs are materially bounded well under 4MB.

Generated by Claude Code

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: Full current-head rereview of commit 0872f7852a8e0d2a7a057a1409e0ef8af264f573; 886 additions, 37 deletions, 6 files, 923 changed lines.

Mode: Deep review by the skill threshold (500+ changed lines), organized by concern area: duplication core logic, CLI/output consumers, tests, docs/config, CI, and prompt compliance.

Guidelines applied: User instructions for this review, the provided fix prompt, pr-review skill and eval expectations, repository CI/config, and Rust-specific review checks. This is a full current-head review, not a delta review; prior reviews were treated as history only.

Artifacts reviewed: PR metadata/body, full PR file list and patches, current-head source for changed files, current-head source for the prompt-named root-cause files (crates/chaffra-duplication/src/lib.rs, tokenizer.rs, crates/chaffra-core/src/grpc.rs, crates/chaffra-output/src/json.rs), docs, CI workflow, and current PR check results.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	0

Blocking Findings

Severity	Area	Finding	Required author action
HIGH	Original failure-mode verification	The implementation adds clone-family aggregation, coalesced ranges, metrics, bounds, docs, and focused tests. However, the acceptance prompt requires proving the original repository-level failure modes are fixed. The PR body claims the three original commands now succeed/shrink, but current CI for this commit only runs generic `cargo check`, `cargo test`, `cargo clippy`, and `cargo fmt`. There is no durable PR evidence or CI artifact showing the exact original commands exited successfully, avoided the 4 MiB gRPC decode error, and produced materially smaller grouped terminal output. Reviewer-side local execution would not satisfy this; the author needs to provide durable evidence in the PR.	Add durable evidence to the PR, preferably a CI job or linked artifact/log, for all three original commands with exact invocation, exit status, and relevant output/counts/payload-size evidence: `cargo run -p chaffra-cli -- dupes . --format json --telemetry off`; `cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off`; `cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off`. The evidence must show no `decoded message length too large` failure and must show bounded/grouped output.

Pipeline Status

Check	Status
`cargo check`	PASS, two reported jobs
`cargo test`	PASS, two reported jobs
`cargo clippy`	PASS, two reported jobs
`cargo fmt`	PASS, two reported jobs
Default JSON original failure command	Missing from CI/artifacts
Terminal-size original failure command	Missing from CI/artifacts
Aggressive mild JSON original failure command	Missing from CI/artifacts

Test Coverage Verdict

Verdict	Basis
UNVERIFIABLE against the skill numeric 90% per changed file / 95% aggregate delta thresholds	The pipeline does not publish delta coverage. Manual inspection found targeted unit and integration tests for coalescing, family aggregation, max family cap, deterministic output, metrics, gRPC round-trip fixture behavior, and JSON/terminal/SARIF consumers. That supports functional intent, but it is not numeric delta coverage evidence. Because the originating prompt explicitly excludes #19 coverage-tooling work, this review is not asking for cargo-llvm-cov tooling in this PR.

Spec Coverage

Prompt requirement	Review status
Group raw clone matches by `family_id`	Implemented in `aggregate_families`; covered by unit tests.
Coalesce overlapping or adjacent ranges per file	Implemented via range/occurrence coalescing; covered by unit tests.
Emit bounded meaningful findings instead of every sliding-window pair	Implemented with one finding per family plus `max-families`; covered by representative fixture tests.
Preserve metadata for all family locations	Implemented with `clone_locations`, raw counts, token min/max, similarity, and reported location count; consumer tests parse JSON metadata.
Deterministic ordering	Implemented via sorted occurrences/family IDs and covered by repeated/permuted input test.
Metrics for raw vs reported counts	Implemented with `raw_clone_pairs`, `clone_families`, `reported_findings`, `collapsed_matches`, and `truncated_families`; tested.
JSON, terminal, and SARIF consumers receive valid `Finding` data	CLI now uses `format_result`; integration tests cover JSON, terminal, and SARIF formatters.
Do not solve by raising gRPC limits	No gRPC message-limit increase found.
Respect guardrails for #20/#28, #29, #19, #21	No unrelated guardrail work found in changed files.
Prove original repo-level failure mode is fixed	BLOCKED: PR claims are present, but CI/artifact evidence for the exact original commands is missing.

Additional Review Checks

Check	Result
Secrets/PII in changed hunks	No actionable secrets or PII found.
Disqualifying hack markers	No actionable `TODO`, `FIXME`, `HACK`, `WORKAROUND`, `TEMP`, `XXX`, or `KLUDGE` markers found in changed hunks.
Lint/type suppression comments	No new actionable suppression comments found in changed hunks.
Rust safety concerns	No new `unsafe` blocks found. Production `unwrap()` uses observed in reviewed code are invariant-guarded; no line-local blocker found.
Inline comments	None. The blocker is not attributable to a changed source line; it belongs in the review summary.

Verdict

REQUEST_CHANGES. The code-level approach appears aligned with the prompt, but the PR is not mergeable until the author provides durable evidence for the original repository-level failure-mode commands. The core requirement was not only to add representative tests; it was to prove chaffra dupes can analyze this repository without the prior gRPC-size failure and without the terminal-output explosion.

iamclaude697 · 2026-06-14T14:05:49Z

Pushed 9a2f728 — added a duplication-regression CI job that runs all three original commands and verifies:

dupes . --format json --telemetry off — exit 0, output < 4MB, valid JSON with metrics, findings > 0
dupes . --min-tokens 100 --format terminal --telemetry off — exit 0, output < 10,000 lines
dupes . --mode mild --min-tokens 30 --format json --telemetry off — exit 0, output < 4MB, findings ≤ 200 (max-families cap), valid metrics

This addresses the HIGH finding: the original failure-mode verification is now durable CI evidence, not just PR comments.

Generated by Claude Code

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: Full current-head rereview of commit 9a2f7283b10b83d332efe8d0740157ae10be1382; 933 additions, 37 deletions, 7 files, 970 changed lines.

Mode: Deep review by the skill threshold (500+ changed lines), reviewed by concern area: duplication core logic, CLI/output consumers, tests, docs/config, CI workflow, and prompt compliance.

Review basis: Prior reviews were treated as history only. I re-read the pr-review skill and evals, the provided prompt, current PR metadata/body/comments/reviews, current file patches, current-head raw contents for changed files, the prompt-named root-cause files, and current CI checks/logs.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	1
MEDIUM	0
LOW	0

The HIGH finding is inline on .github/workflows/ci.yml. It concerns the new terminal-output regression guard.

Pipeline Status

Check	Status	Evidence
`cargo check`	PASS	GitHub checks
`cargo test`	PASS	GitHub checks
`cargo clippy`	PASS	GitHub checks
`cargo fmt`	PASS	GitHub checks
`duplication bounded output`	PASS	GitHub checks and job log

The new duplication CI job now runs all three original prompt commands. Current log evidence from run 27501297057, job 81284542598:

Command	Current durable evidence
`cargo run -p chaffra-cli -- dupes . --format json --telemetry off`	Exit 0 via green step; output `159580` bytes; `findings=157 raw=13188 families=157 reported=157 collapsed=13031`.
`cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off`	Exit 0 via green step; output `82` lines; log shows grouped clone-family terminal output.
`cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off`	Exit 0 via green step; output `235680` bytes; `findings=200 raw=257533 families=222 reported=200 collapsed=257311`.

Test Coverage Verdict

Verdict	Basis
UNVERIFIABLE against the skill numeric 90% per changed file / 95% aggregate delta thresholds	The pipeline still does not publish delta coverage. By inspection, the PR adds focused unit and integration coverage for coalescing, family aggregation, max-family cap, deterministic output, metrics, gRPC round-trip fixture behavior, JSON/terminal/SARIF consumers, and now the repository-level duplication CLI commands in CI. I am not requesting #19-style coverage tooling here because the prompt explicitly excludes #19 from this PR.

Spec Coverage

Prompt requirement	Current review result
Group raw clone matches by `family_id`	Implemented in `aggregate_families`; covered by unit tests.
Coalesce overlapping or adjacent ranges per file	Implemented via range and occurrence coalescing; covered by tests.
Emit bounded meaningful findings	Implemented with one finding per family and `max-families`; covered by fixture tests and JSON CI runs.
Preserve metadata for all family locations	Implemented with `clone_locations`, raw/reported counts, token min/max, similarity, and location count.
Deterministic ordering	Implemented with stable occurrence/family sorting; covered by repeated/permuted input test.
Add raw vs reported metrics	Implemented and surfaced through `format_result`; CI logs show metrics for default and mild repository runs.
JSON, terminal, and SARIF consumers receive valid `Finding` data	Covered by integration tests; CLI now uses `format_result` so metrics reach JSON output.
Do not raise tonic/prost limits as primary fix	No gRPC limit increase found.
Respect guardrails for #20/#28, #29, #19, #21 and broad refactors	Current diff is scoped to duplication logic, duplication tests/docs, CLI result formatting, dependency metadata, and the prompt-specific CI job.
Verify original failure mode on the repository	Now present in CI logs for current head, but the terminal-output assertion itself is too loose to serve as a reliable regression guard; see inline HIGH finding.

Additional Checks

No actionable secrets, PII, lint suppressions, TODO/FIXME/HACK/WORKAROUND markers, new unsafe, or unrelated broad refactors were found in the changed hunks. Test-only unwrap()/expect() uses are consistent with the existing Rust test style.

Verdict

REQUEST_CHANGES. The prior missing-evidence blocker is substantially addressed by the new CI job and its current logs. The remaining blocker is that the terminal regression check does not actually enforce the prompt's bounded-output requirement tightly enough: it would accept many thousands of lines. Tighten that assertion so the CI guard would fail on a recurrence of the terminal-output explosion, not only record today's good output.

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: Full current-head review of commit e8e45a87a943f54f8bfd391aa397662d3eaaab7f; 937 additions, 37 deletions, 7 files, 974 changed lines.

Mode: Deep review by the skill threshold (500+ changed lines), reviewed by concern area: duplication core logic, CLI/output consumers, tests, docs/config, CI workflow, and prompt compliance. This is not a delta review; prior reviews and comments were treated as history only.

Review basis: User instructions for full/no-delta reviewer-role review, the provided duplication prompt, pr-review skill and eval expectations, PR metadata/body/comments/reviews, full current PR patches, current-head raw contents for changed files, prompt-named root-cause files, current PR checks, and latest duplication bounded output job logs.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	0
LOW	0

No line-specific findings were found in the current head.

Pipeline Status

Check	Status	Evidence
`cargo check`	PASS	Current PR checks show two green jobs
`cargo test`	PASS	Current PR checks show two green jobs
`cargo clippy`	PASS	Current PR checks show two green jobs
`cargo fmt`	PASS	Current PR checks show two green jobs
`duplication bounded output`	PASS	Current PR checks show two green jobs; latest logs reviewed

Prompt Regression Evidence In CI

Prompt command	Current durable CI evidence
`cargo run -p chaffra-cli -- dupes . --format json --telemetry off`	Exit 0 via green job; output `159580` bytes; `findings=157 raw=13188 families=157 reported=157 collapsed=13031`.
`cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off`	Exit 0 via green job; `82` lines, `32` `[W]` warnings, `32` clone-family mentions; workflow fails at `>=500` lines, `>=200` warnings, or missing grouped output marker.
`cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off`	Exit 0 via green job; output `235680` bytes; `findings=200 raw=257533 families=222 reported=200 collapsed=257311`.

The current workflow now provides durable author-owned evidence for the original default JSON, terminal, and aggressive mild JSON failure modes. The terminal guard is materially below the old 1566-warning baseline and checks for grouped clone-family output, so the previous CI-threshold concern is addressed.

Test Coverage Verdict

Verdict	Basis
UNVERIFIABLE against the skill numeric 90% per changed file / 95% aggregate delta thresholds	CI does not publish a delta coverage report. Manual review found targeted unit and integration tests for coalescing, family aggregation, max-family capping/truncation, raw-vs-reported metrics, deterministic ordering, same-file-pair separation, adjacent different-destination separation, gRPC round-trip bounded output, and JSON/terminal/SARIF consumers. Because the prompt explicitly excludes #19 coverage tooling, I am not requesting coverage infrastructure in this PR.

Spec Coverage

Prompt requirement	Current review result
Group raw clone matches by `family_id`	Implemented in `aggregate_families`, starting from raw fingerprint groups before coalescing/merge.
Coalesce overlapping or adjacent ranges per file	Implemented via range and occurrence coalescing; unit tests cover overlap, adjacency, and empty input.
Emit bounded meaningful findings	Implemented with one finding per reported family plus `max-families` default 200; unit tests and CI repository runs cover bounded behavior.
Preserve metadata for users	Findings include `family_id`, `clone_locations`, raw/reported counts, token min/max, mode, and similarity; consumer tests parse the JSON metadata.
Keep output deterministic	Families and occurrences are sorted by stable keys; repeated/permuted input test covers stable output and metrics.
Add raw vs reported metrics	Implemented with `raw_clone_pairs`, `clone_families`, `reported_findings`, `collapsed_matches`, and `truncated_families` when capped; current CI logs show these counters on repository runs.
Ensure JSON, terminal, and SARIF-style consumers receive valid `Finding` data	`cmd_dupes` now uses `format_result`, preserving metrics for JSON output; integration tests cover JSON, terminal, and SARIF consumers.
Do not solve by raising tonic/prost limits	No gRPC/prost message-limit increase found; the fix bounds duplication output instead.
Stay scoped away from #19/#20/#21/#28/#29 and broad unrelated refactors	Current diff is scoped to duplication logic, duplication tests/docs, CLI result formatting, dependency metadata, and the prompt-specific CI job.
Prove original repository-level failure modes are fixed	Implemented through the `duplication bounded output` CI job and current green job logs.

Additional Review Checks

Check	Result
Secrets/PII in changed hunks	No actionable secrets or PII found.
Disqualifying hack markers	No actionable `TODO`, `FIXME`, `HACK`, `WORKAROUND`, `TEMP`, `XXX`, or `KLUDGE` markers found.
Lint/type suppressions	No new actionable suppression comments found.
Rust safety/error handling	No new `unsafe` blocks found. Production `unwrap()` uses observed in changed code are invariant-guarded; test `unwrap()`/`expect()` use is consistent with existing Rust test style.
Documentation/API consistency	Current docs describe clone families, limits, metadata, and metrics consistently with the implementation.

Verdict

No code changes requested. The current head satisfies the provided duplication-result-explosion prompt from a code-review and durable-CI-evidence standpoint. I am posting this as COMMENT rather than APPROVE because the skill's numeric delta-coverage gate remains unverifiable from CI artifacts, while #19-style coverage tooling is explicitly out of scope for this PR.

iamclaude697 · 2026-06-15T09:40:38Z

Delta coverage report — changed lines only

Generated locally via cargo-llvm-cov on commit e8e45a8, RUST_TEST_THREADS=1 (to avoid the #29 set_current_dir race).

Per-file delta coverage

File	Changed lines	Instrumented	Covered	Delta %	Gate (≥90%)
`crates/chaffra-duplication/src/lib.rs`	698	533	527	98.9%	PASS
`crates/chaffra-cli/src/main.rs`	1	1	0	0%¹	—

Aggregate delta coverage

Metric	Value	Gate (≥95%)
Instrumented changed lines	534	—
Covered	527	—
Delta coverage	98.7%	PASS

¹ The single uncovered line in main.rs is line 431: Ok(formatter.format_result(&result, None)) — the cmd_dupes output path. This line is exercised by the duplication bounded output CI job (all three commands exit 0 through this path), but cargo-llvm-cov unit-test instrumentation doesn't capture CLI binary entry-point coverage. Excluding this 1 line, aggregate is 527/533 = 98.9%.

Uncovered lines in `chaffra-duplication/src/lib.rs` (6 lines)

Line	Code	Reason
144	`return occs;`	Defensive early return: `coalesce_occurrences` with empty input
162	`return;`	Defensive early return: `merge_overlapping_families` with ≤1 family
214	`return;`	Defensive early return: union-find produced no merges
970, 984, 1107	`result.findings.len()` / assertion messages	Test assertion failure messages (only execute when tests fail)

All uncovered lines are defensive guards or unreachable test-failure branches — no untested business logic.

Note

crates/chaffra-cli/tests/integration_test.rs (test file) is excluded from coverage measurement as expected — test code is not a coverage target.

Generated by Claude Code

laplaque

PR Review: fix: bound duplication results and aggregate clone families

Scope: Full current-head review of commit e8e45a87a943f54f8bfd391aa397662d3eaaab7f; 937 additions, 37 deletions, 7 files, 974 changed lines.

Mode: Deep review by the skill threshold (500+ changed lines), reviewed by concern area: duplication core logic, CLI/output consumers, tests, docs/config, CI workflow, coverage evidence, and prompt compliance. This is not a delta review; prior reviews and comments were treated as historic records only.

Review basis: User instructions for full/no-delta reviewer-role review, the provided duplication prompt, pr-review skill and eval expectations, current PR metadata/body/comments/reviews, full current PR patches, current-head raw contents for changed files, prompt-named root-cause files, current PR checks, latest duplication bounded output logs, and the author-posted changed-line coverage report for this head.

Findings Summary

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	0
LOW	0

No line-specific findings were found in the current head.

Pipeline Status

Check	Status	Evidence
`cargo check`	PASS	Current PR checks show two green jobs
`cargo test`	PASS	Current PR checks show two green jobs
`cargo clippy`	PASS	Current PR checks show two green jobs
`cargo fmt`	PASS	Current PR checks show two green jobs
`duplication bounded output`	PASS	Current PR checks show two green jobs; latest logs reviewed

Prompt Regression Evidence In CI

Prompt command	Current durable CI evidence
`cargo run -p chaffra-cli -- dupes . --format json --telemetry off`	Exit 0 via green job; output `159580` bytes; `findings=157 raw=13188 families=157 reported=157 collapsed=13031`.
`cargo run -p chaffra-cli -- dupes . --min-tokens 100 --format terminal --telemetry off`	Exit 0 via green job; `82` lines, `32` `[W]` warnings, `32` clone-family mentions; workflow fails at `>=500` lines, `>=200` warnings, or missing grouped output marker.
`cargo run -p chaffra-cli -- dupes . --mode mild --min-tokens 30 --format json --telemetry off`	Exit 0 via green job; output `235680` bytes; `findings=200 raw=257533 families=222 reported=200 collapsed=257311`.

Test Coverage Verdict

Verdict	Evidence
PASS	The author-posted changed-line coverage report is tied to commit `e8e45a8`. It reports `crates/chaffra-duplication/src/lib.rs` at 527/533 instrumented changed lines covered, `98.9%`, meeting the 90% per-file gate. Aggregate instrumented changed-line coverage is 527/534, `98.7%`, meeting the 95% aggregate gate. The only changed production line not captured by `cargo-llvm-cov` instrumentation is `crates/chaffra-cli/src/main.rs:431`; the current `duplication bounded output` CI job exercises that exact `cmd_dupes` output path through all three prompt commands and parses/emits `format_result` output, so it is covered by durable CI execution evidence rather than waived.

Spec Coverage

Prompt requirement	Current review result
Group raw clone matches by `family_id`	Implemented in `aggregate_families`, starting from raw fingerprint groups before coalescing/merge.
Coalesce overlapping or adjacent ranges per file	Implemented via range and occurrence coalescing; unit tests cover overlap, adjacency, and empty input.
Emit bounded meaningful findings	Implemented with one finding per reported family plus `max-families` default 200; unit tests and CI repository runs cover bounded behavior.
Preserve metadata for users	Findings include `family_id`, `clone_locations`, raw/reported counts, token min/max, mode, and similarity; consumer tests parse the JSON metadata.
Keep output deterministic	Families and occurrences are sorted by stable keys; repeated/permuted input test covers stable output and metrics.
Add raw vs reported metrics	Implemented with `raw_clone_pairs`, `clone_families`, `reported_findings`, `collapsed_matches`, and `truncated_families` when capped; current CI logs show these counters on repository runs.
Ensure JSON, terminal, and SARIF-style consumers receive valid `Finding` data	`cmd_dupes` now uses `format_result`, preserving metrics for JSON output; integration tests cover JSON, terminal, and SARIF consumers.
Do not solve by raising tonic/prost limits	No gRPC/prost message-limit increase found; the fix bounds duplication output instead.
Stay scoped away from #19/#20/#21/#28/#29 and broad unrelated refactors	Current diff is scoped to duplication logic, duplication tests/docs, CLI result formatting, dependency metadata, and the prompt-specific CI job.
Prove original repository-level failure modes are fixed	Implemented through the `duplication bounded output` CI job and current green job logs.

Additional Review Checks

Check	Result
Secrets/PII in changed hunks	No actionable secrets or PII found.
Disqualifying hack markers	No actionable `TODO`, `FIXME`, `HACK`, `WORKAROUND`, `TEMP`, `XXX`, or `KLUDGE` markers found.
Lint/type suppressions	No new actionable suppression comments found.
Rust safety/error handling	No new `unsafe` blocks found. Production `unwrap()` uses observed in changed code are invariant-guarded; test `unwrap()`/`expect()` use is consistent with existing Rust test style.
Documentation/API consistency	Current docs describe clone families, limits, metadata, and metrics consistently with the implementation.

Verdict

APPROVE. The current head satisfies the provided duplication-result-explosion prompt, the pipeline is green, prompt-specific regression evidence is durable, and the changed-line coverage gate is now satisfied without adding out-of-scope #19 coverage infrastructure.

Refactor the duplication module's output pipeline to aggregate raw sliding-window matches into clone families with coalesced ranges. - Group raw matches by file pair, coalesce overlapping/adjacent line ranges per side, then merge families sharing overlapping locations across file pairs using union-find - Emit one finding per clone family instead of one per raw window match - Add max-families config (default 200) with truncation tracking - Report raw_clone_pairs, clone_families, reported_findings, collapsed_matches, and truncated_families metrics - Include clone_locations JSON, raw_pair_count, reported_location_count, token_count_min/max in finding metadata Before: `chaffra dupes .` on this repo crashed with gRPC decode error (4.8MB response, 93MB in aggressive mode, 1566 findings at min-tokens=100). After: all three modes succeed with bounded output (27-33 findings). Closes #30 https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

…rics Address review findings on PR #31: - Group raw matches by family fingerprint instead of file pair so unrelated duplicate blocks between the same two files remain separate families (HIGH) - Build clone_locations via serde_json to properly escape file paths containing special characters (MEDIUM) - Compute collapsed_matches from the full pre-truncation family list so the metric stays consistent with raw_clone_pairs (MEDIUM) - Add regression test: two distinct blocks between the same file pair must produce separate families https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

Adjacent-but-not-overlapping families in the same file were being falsely merged by the union-find step. Changed the condition from adjacency (start_j > end_i + 1) to strict overlap (start_j > end_i) so that unrelated clone families sharing a file stay separate. Added regression test for the exact scenario: block X in {a.go, b.go} adjacent to block Y in {a.go, c.go} must not merge into one family. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

All families now compute family_id via family_id_from_occurrences after coalescing, not just merged families. This makes the docs accurate and the metadata deterministic based on final occurrence locations. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

…tests - Deterministic: repeated and permuted-input runs produce identical findings, family_ids, clone_locations, and metrics. - gRPC regression: duplication output round-trips through GrpcModuleHost, stays under 4MB, and all clone_locations metadata is valid JSON. - Consumer formats: JSON, terminal, and SARIF formatters produce valid output from actual duplication clone-family findings. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

cmd_dupes now calls format_result instead of format_findings so that raw_clone_pairs, clone_families, reported_findings, and collapsed_matches counters are visible to users in JSON/terminal output. Consumer tests updated to use format_result and assert metric counters are present in JSON output. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

The test_run_with_telemetry_end_to_end, test_failed_run_preserves_prior _churn_state, and test_failed_run_flushes_error_telemetry tests all call set_current_dir which is process-global. When run in parallel, one test's TempDir cleanup could invalidate another test's saved CWD, causing current_dir().unwrap() to panic with ENOENT. A static CWD_LOCK mutex serializes the critical sections. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

This reverts commit 4d6291a.

Runs the three original failure-mode commands in CI and verifies exit status, output under 4MB gRPC limit, and valid metrics. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

…mily assertion Fail if [W] count >= 200 (old baseline was 1566), line count >= 500, or grouped 'clone family' output is missing. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

iamclaude697 requested a review from laplaque June 10, 2026 02:01

laplaque requested changes Jun 10, 2026

View reviewed changes

Comment thread crates/chaffra-duplication/src/lib.rs Outdated

Comment thread crates/chaffra-duplication/src/lib.rs Outdated

Comment thread crates/chaffra-duplication/src/lib.rs Outdated

laplaque requested changes Jun 10, 2026

View reviewed changes

Comment thread crates/chaffra-duplication/src/lib.rs Outdated

laplaque reviewed Jun 10, 2026

View reviewed changes

Comment thread docs/api/modules/duplication.md

laplaque requested changes Jun 10, 2026

View reviewed changes

Comment thread crates/chaffra-duplication/src/lib.rs

Comment thread crates/chaffra-duplication/src/lib.rs

Comment thread crates/chaffra-duplication/src/lib.rs

laplaque requested changes Jun 10, 2026

View reviewed changes

Comment thread crates/chaffra-cli/tests/integration_test.rs

laplaque requested changes Jun 11, 2026

View reviewed changes

Comment thread crates/chaffra-cli/src/main.rs Outdated

laplaque requested changes Jun 12, 2026

View reviewed changes

laplaque requested changes Jun 14, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml Outdated

laplaque reviewed Jun 15, 2026

View reviewed changes

laplaque approved these changes Jun 15, 2026

View reviewed changes

claude added 11 commits June 15, 2026 11:54

chore: update Cargo.lock for serde_json dependency

4ee3282

https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

Revert "fix: serialize set_current_dir tests with mutex to prevent race"

9709162

This reverts commit 4d6291a.

ci: add duplication bounded-output regression job

4f6358b

Runs the three original failure-mode commands in CI and verifies exit status, output under 4MB gRPC limit, and valid metrics. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

ci: tighten terminal regression guard with warning count and clone-fa…

8314f94

…mily assertion Fail if [W] count >= 200 (old baseline was 1566), line count >= 500, or grouped 'clone family' output is missing. https://claude.ai/code/session_019GCEWbYU1bEgHg3rtWZ11g

laplaque force-pushed the fix/duplication-result-explosion branch from e8e45a8 to 8314f94 Compare June 15, 2026 09:54

laplaque merged commit 849e3c6 into main Jun 15, 2026
10 checks passed

Conversation

iamclaude697 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test evidence

New tests

Deviations

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review: fix: bound duplication results and aggregate clone families

Pipeline status

Test coverage verdict

Spec coverage

Repo guideline

Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Re-review: fix: bound duplication results and aggregate clone families

Pipeline status

Test coverage verdict

Spec coverage

Verdict

Uh oh!

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review: fix: bound duplication results and aggregate clone families

Findings Summary

Pipeline Status

Test Coverage Verdict

Spec Coverage

Strengths

Verdict

Uh oh!

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review State Update: Request Changes

Requested Change

Context

Verdict

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review: fix: bound duplication results and aggregate clone families

Findings Summary

Pipeline Status

Test Coverage Verdict

Spec Coverage

Strengths

Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review: fix: bound duplication results and aggregate clone families

Findings Summary

Pipeline Status

Test Coverage Verdict

Spec Coverage

Strengths

Verdict

Uh oh!

Uh oh!

laplaque left a comment

Choose a reason for hiding this comment

PR Review: fix: bound duplication results and aggregate clone families

Findings Summary

Pipeline Status

Test Coverage Verdict

Spec Coverage

Strengths

Verdict

iamclaude697 commented Jun 10, 2026 •

edited

Loading

Command 1: `chaffra dupes . --format json --telemetry off` (default/aggressive)

Command 2: `chaffra dupes . --min-tokens 100 --format terminal --telemetry off`

Command 3: `chaffra dupes . --mode mild --min-tokens 30 --format json --telemetry off`