Fix false-positive archetype classification and verification run count #95
Draft
cursor[bot] wants to merge 1 commit into feat/session-workflow-fingerprints
- Use word-boundary regex for doc/migration text hint matching instead of bare substring checks. This prevents 'doc' from matching 'docker' and 'port' from matching 'import'/'export'/'support'/etc.
- Fix verification_run_count to iterate over the full execution_evidence list instead of the deduplicated execution_types set, so it counts actual verification runs rather than distinct verification type categories.
Summary
Fixes three bugs in `workflow_profile_service.py`:

1. `"doc"` substring falsely matches "docker" (high severity)

`_DOC_TEXT_HINTS` contained `"doc"`, which was matched via a bare `in` substring check against lowercased text. This caused any session mentioning "docker", "docstring", etc. to be misclassified as the `"docs"` archetype.

Fix: Replaced the hint tuple + substring matching with a compiled regex using word boundaries (`\b`). `\bdocs?\b` matches "doc" and "docs" as standalone words but not "docker" or "docstring".

2. `"port"` substring falsely matches "import" (medium severity)

`_MIGRATION_TEXT_HINTS` contained `"port"`, which substring-matched "import", "export", "support", "report", etc. Combined with the weak file-count guard, many non-migration sessions would be incorrectly classified.

Fix: Same approach, a compiled regex with word boundaries. `\bport\b` matches "port" as a standalone word but not when it appears inside other words like "import" or "export". Other migration stems use a `\w*` suffix to preserve prefix-matching behavior (e.g., `migrat\w*` matches "migrate", "migration", "migrating").

3. Verification run count tallies types, not actual runs (medium severity)

`verification_run_count` iterated over `execution_types` (a set of distinct type strings) instead of the full `execution_evidence` list. A session with 5 test runs would report a count of 1 because the set collapses all "test" entries.

Fix: Changed the iteration to loop over `execution_evidence` and extract each record's `evidence_type`, counting all matching records rather than just unique types.
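For reviewers, the difference between the old substring checks and the word-boundary matching in fixes 1 and 2 can be sketched as follows. This is a minimal illustration, not the code from the diff: the names `DOC_HINT_RE`, `MIGRATION_HINT_RE`, and the helper functions are hypothetical, and the real patterns in `workflow_profile_service.py` may cover more stems.

```python
import re

# Hypothetical patterns for illustration; the PR's actual regexes may differ.
DOC_HINT_RE = re.compile(r"\bdocs?\b")
MIGRATION_HINT_RE = re.compile(r"\b(?:port|migrat\w*)\b")


def naive_doc_hint(text: str) -> bool:
    """Old behavior: a bare substring check, which also fires on 'docker'."""
    return "doc" in text.lower()


def has_doc_hint(text: str) -> bool:
    """New behavior: 'doc' or 'docs' must appear as a standalone word."""
    return DOC_HINT_RE.search(text.lower()) is not None


def has_migration_hint(text: str) -> bool:
    """'port' must be a whole word; 'migrat' still matches as a prefix."""
    return MIGRATION_HINT_RE.search(text.lower()) is not None


if __name__ == "__main__":
    for sample in ("rebuild the docker image", "update the docs for v2"):
        print(sample, "->", naive_doc_hint(sample), has_doc_hint(sample))
    for sample in ("fix import and export support", "database migration script"):
        print(sample, "->", has_migration_hint(sample))
```

Note that `\b` treats underscores as word characters, so an identifier such as `doc_string` does not match `\bdocs?\b`, while a path such as `docs/README.md` does.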
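The counting bug in fix 3 can be reproduced with a small sketch. It assumes each evidence record is a mapping with an `evidence_type` key, and that `VERIFICATION_TYPES` is the set of types treated as verification; both the record shape and the set contents are assumptions made for illustration, not taken from the diff.

```python
from typing import Iterable, Mapping

# Hypothetical set of evidence types that count as verification.
VERIFICATION_TYPES = frozenset({"test", "lint", "typecheck"})


def count_distinct_types(execution_evidence: Iterable[Mapping[str, str]]) -> int:
    """Old behavior: deduplicate into a set first, so repeated runs collapse."""
    execution_types = {record.get("evidence_type") for record in execution_evidence}
    return sum(1 for kind in execution_types if kind in VERIFICATION_TYPES)


def count_verification_runs(execution_evidence: Iterable[Mapping[str, str]]) -> int:
    """New behavior: count every record whose type is a verification type."""
    return sum(
        1
        for record in execution_evidence
        if record.get("evidence_type") in VERIFICATION_TYPES
    )


if __name__ == "__main__":
    # Five test runs plus one non-verification record.
    evidence = [{"evidence_type": "test"}] * 5 + [{"evidence_type": "build"}]
    print(count_distinct_types(evidence))     # counts categories
    print(count_verification_runs(evidence))  # counts runs
```

With five "test" records, the set-based version sees a single distinct type, while the list-based version counts each run.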