fix(stage5): dedupe merged arrows by (src, op, dst) tuple #65
Merged
Stage 5's per-package merger deduplicated relationship arrows by full line
string (`set[str]`). Two arrows with the same (source, op, target) but
different labels survived as distinct entries, producing visible
duplicates in the package overview.
Live example from agent-brain `packages/models/README.md`:
    BaseModel <|-- GraphQueryContext : extends
    BaseModel <|-- GraphQueryContext : inherits
Five duplicate-edge pairs visible across that one package (GraphQueryContext,
GraphTriple, CodeChunkStrategy, QueryMode, JobRecord *-- JobProgress).
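A minimal reproduction of the old behaviour, assuming nothing beyond a full-line `set[str]` check (the variable names here are illustrative, not the ones in `s5_mermaid.py`):

```python
# The two arrows from the live example: same edge, different label.
lines = [
    "BaseModel <|-- GraphQueryContext : extends",
    "BaseModel <|-- GraphQueryContext : inherits",
]

# Full-line dedupe: the strings differ in their label, so neither is dropped.
seen: set[str] = set()
kept: list[str] = []
for line in lines:
    if line not in seen:
        seen.add(line)
        kept.append(line)

print(len(kept))  # 2
```

Both lines survive because set membership compares the whole string, label included.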
Replace the set with a (src, op, dst) tuple key. First label seen wins —
deterministic given the stable input order (sorted class_docs in the
caller). Genuinely distinct edges (different op, different target)
still survive.
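The keyed dedupe can be sketched as follows. This is an illustration under assumptions, not the code in the PR: `dedupe_arrows` and the `Arrow` alias are hypothetical names, and the input is taken to be already parsed into `(src, op, dst, label)` tuples.

```python
from typing import Iterable

# Hypothetical alias for a parsed arrow: (src, op, dst, label).
Arrow = tuple[str, str, str, str]


def dedupe_arrows(arrows: Iterable[Arrow]) -> list[Arrow]:
    """Keep the first arrow seen for each (src, op, dst) key, in input order."""
    seen: set[tuple[str, str, str]] = set()
    kept: list[Arrow] = []
    for src, op, dst, label in arrows:
        key = (src, op, dst)  # the label is deliberately left out of the key
        if key in seen:
            continue  # same edge with a different label: drop the later one
        seen.add(key)
        kept.append((src, op, dst, label))
    return kept


arrows = [
    ("BaseModel", "<|--", "GraphQueryContext", "extends"),
    ("BaseModel", "<|--", "GraphQueryContext", "inherits"),  # duplicate edge
    ("BaseModel", "*--", "GraphQueryContext", "owns"),       # different op
    ("BaseModel", "<|--", "GraphTriple", "extends"),         # different target
]
result = dedupe_arrows(arrows)
```

Here `result` holds three arrows: the `inherits` variant is dropped, while the different-op and different-target edges are kept. Pairing the set with an ordered list preserves input order, which is what makes "first label wins" deterministic.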
The new _parse_arrow helper sorts _ARROW_OPS longest-first as a defensive
measure even though no current op is a substring of another — this
protects future _ARROW_OPS additions from substring-matching surprises.
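To illustrate the substring hazard that longest-first ordering guards against, here is a hedged sketch. The op list is hypothetical: `--` is included on purpose because it is a substring of `-->`, which the commit message says is not the case for the real `_ARROW_OPS`. The `parse_arrow` function is also an assumption about the general shape, not the actual helper.

```python
from typing import Optional, Sequence

# Hypothetical ops; "--" is included on purpose as a substring of "-->".
ARROW_OPS = ("--", "-->", "<|--", "*--")


def parse_arrow(
    line: str, ops: Sequence[str]
) -> Optional[tuple[str, str, str, str]]:
    """Split 'Src OP Dst : label' on the first op in `ops` found in the line."""
    body, _, label = line.partition(" : ")
    for op in ops:
        src, found, dst = body.partition(op)
        if found and src.strip() and dst.strip():
            return src.strip(), op, dst.strip(), label.strip()
    return None  # no known op in the line


# sorted() is stable, so ops of equal length keep their relative order.
longest_first = sorted(ARROW_OPS, key=len, reverse=True)

naive = parse_arrow("A --> B : uses", ARROW_OPS)     # tries "--" first
safe = parse_arrow("A --> B : uses", longest_first)  # tries "-->" before "--"
```

With the unsorted list, `naive` comes back as `("A", "--", "> B", "uses")`, a mis-split target. With the longest-first list, `safe` is `("A", "-->", "B", "uses")`.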
Existing tests for identical-arrow dedupe and arrow-preservation continue
to pass. New tests:
* _parse_arrow on all 10 relationship ops
* dedupe collapses labelled variants to one arrow (first wins)
* distinct-op and distinct-target edges are preserved
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replace Stage 5's full-line-string arrow dedupe with a `(src, op, dst)` tuple key so labelled variants like `A --> B : foo` and `A --> B : bar` collapse to one canonical edge.
Why
Discovered during the agent-brain trust-eval pass (`docs/gen/agent-brain/EVAL.md`). The `_merge_class_diagrams` helper used a `set[str]` keyed on the full line, so two arrows with the same `(source, op, target)` but different labels survived as distinct entries. This is live in `docs/gen/agent-brain/packages/models/README.md`: five duplicate-edge pairs across the `models` overview alone (`GraphQueryContext`, `GraphTriple`, `CodeChunkStrategy`, `QueryMode`, `JobRecord *-- JobProgress`).
The diagram still renders, but the noise undermines the overview's job of being a clean bird's-eye view. This is the same class of issue I papered over in #60 with the post-strip case in `config`; the eval caught that it also fires without label stripping in `models`.
Closes #62.
Changes
`src/designdoc/stages/s5_mermaid.py`
* `_parse_arrow(line)`: returns `(src, op, dst, label)` or `None`. Sorts `_ARROW_OPS` longest-first as defence against future ops that might become substrings of one another (no current op is, but it's free insurance).
* `_merge_class_diagrams` replaces `arrows: set[str]` with `arrow_keys: set[tuple[str, str, str]]` + `arrow_lines: list[str]`. First label seen wins, which is deterministic given the caller's stable input order (sorted class_docs).
`tests/unit/test_stage5_package_diagrams.py`
* … `None` return.
Invariants
Verification
* `task ci` green locally (107 passed in 56.76s)
* Tests in `test_stage5_package_diagrams.py` pass (12 prior + 7 new)
* … (`ImportError` → implementation → GREEN)
* `packages/models/README.md` loses the 5 duplicate-edge pairs (not part of merge gate; the unit tests prove the behaviour)
Related
* `docs/gen/agent-brain/EVAL.md`
* #60 (the post-strip case in `config` was a symptom of this same dedupe weakness; this PR fixes the root cause)