Fix migration regex matching 'port' and docs heuristic text-only matching #98
Draft
cursor[bot] wants to merge 1 commit into `feat/session-workflow-fingerprints` from
…hing
- Change migration regex from `\bport\b` to `\bport(?:ing|ed)\b` to avoid matching the common networking term 'port' (e.g., 'port 3000')
- Require at least one documentation file in `named_files` before a text-only regex match can classify a session as the docs archetype
Summary
Fixes two logic bugs in the `workflow_profile_service.py` archetype heuristics:

1. Migration regex matches common "port" network term
`_MIGRATION_TEXT_RE` included `\bport\b`, which matched the standalone word "port", an extremely common term for network ports (e.g., "listen on port 3000"). Combined with the weak secondary check, which requires only `files_touched_count >= 2`, this caused sessions about server configuration or networking to be falsely classified as the migration archetype.

Fix: Changed the regex from `\bport\b` to `\bport(?:ing|ed)\b` so that it matches only the verb forms relevant to code migration ("porting", "ported") and no longer matches the noun.

2. Docs heuristic too aggressive on text-only matching
`_looks_like_docs` returned `True` immediately whenever `_DOC_TEXT_RE` matched the prompt/summary text, bypassing the file-based check entirely. Words like "guide", "doc", "readme", and "changelog" commonly appear in prompts that aren't about documentation work.

Fix: Removed the text-only early return. A text regex match now classifies a session as docs only if `named_files` contains at least one documentation file. Without a text match, the existing threshold (half of the files must be documentation files) still applies.
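The following is a minimal Python sketch of both fixed heuristics that makes the new control flow concrete. It is not the code from `workflow_profile_service.py`. `OLD_PORT_RE`, `NEW_PORT_RE`, `DOC_TEXT_RE`, `DOC_SUFFIXES`, `looks_like_migration`, `is_doc_file`, and `looks_like_docs` are hypothetical stand-ins. The sketch also makes several assumptions:

- The keyword patterns contain only the terms named above.
- Matching is case-insensitive.
- A "documentation file" is identified by its suffix.
- "Half of the files" means "at least half".

```python
import re

# Hypothetical stand-ins: only the "port" alternative of the migration regex
# and the doc keywords named in the PR are modeled; the real patterns may
# contain more terms. Case-insensitive matching is an assumption.
OLD_PORT_RE = re.compile(r"\bport\b", re.IGNORECASE)
NEW_PORT_RE = re.compile(r"\bport(?:ing|ed)\b", re.IGNORECASE)
DOC_TEXT_RE = re.compile(r"\b(?:guide|docs?|readme|changelog)\b", re.IGNORECASE)

# Assumed definition of a documentation file (suffix-based).
DOC_SUFFIXES = (".md", ".rst", ".txt")


def looks_like_migration(text: str, files_touched_count: int) -> bool:
    # Text match plus the weak secondary check described in the PR.
    return bool(NEW_PORT_RE.search(text)) and files_touched_count >= 2


def is_doc_file(path: str) -> bool:
    return path.lower().endswith(DOC_SUFFIXES)


def looks_like_docs(text: str, named_files: list[str]) -> bool:
    doc_count = sum(1 for f in named_files if is_doc_file(f))
    if DOC_TEXT_RE.search(text):
        # Fixed behavior: a text match alone no longer suffices;
        # at least one named documentation file is required.
        return doc_count >= 1
    # No text match: fall back to the file-ratio threshold
    # (assumed here to mean "at least half").
    return bool(named_files) and doc_count * 2 >= len(named_files)
```

With this sketch, "expose port 3000" no longer trips the migration check, and "follow the guide to fix the bug" with only `src/app.py` named is no longer classified as docs. The narrower regex has one tradeoff: base-form phrasing such as "port this module to Rust" and the plural "ports" also stop matching. Such sessions must rely on whatever other terms the migration regex contains.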