test(ways): expand activation test for taxonomy restructure

aaronsb · aaronsb · commit ba89b389f44c · 2026-02-17T08:55:16.000-06:00
Add steps 4 (newly-semantic way via BM25) and 5 (co-activation of
related ways) to the live activation test. Update test README with
18-way corpus baseline and new coverage scenarios.

8 scored test steps covering: session baseline, regex, established BM25,
new BM25, co-activation, negative control, subagent injection, subagent
negative. Step 9 compiles the summary table.
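
For readers unfamiliar with the regex trigger listed above, here is a minimal sketch of how it could be checked. The pattern is the one quoted in the activation test for `delivery/commits`. Everything else is an assumption: the hook scripts' regex dialect and case handling are not shown in this commit, so Python's `re` stands in, and the positive prompt is hypothetical because the step 2 prompt is not part of this diff.

```python
import re

# Pattern quoted in tests/way-activation-test.md for the delivery/commits way.
COMMITS_PATTERN = re.compile(r"commit|push.*(remote|origin|upstream)")


def commits_way_fires(prompt: str) -> bool:
    """Return True if the pattern matches anywhere in the prompt (unanchored search)."""
    return COMMITS_PATTERN.search(prompt) is not None


# Hypothetical positive prompt (not taken from the test file).
print(commits_way_fires("push this branch to origin"))
# Negative-control prompt from step 6 of the test file.
print(commits_way_fires("what's the weather like today?"))
```

Because the search is unanchored, any prompt containing the substring `commit` matches, which is one reason the protocol pairs the regex step with a negative control.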
diff --git a/tests/README.md b/tests/README.md
@@ -13,7 +13,7 @@ Three layers, from fast/automated to slow/interactive. See [way-match/results.md
 
 ### 1. Fixture Tests (BM25 vs NCD scorer comparison)
 
-Runs 32 test prompts against a fixed 7-way corpus (testing, api, debugging, security, design, config, adr-context). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
+Runs 54 test prompts against a fixed 18-way corpus (all softwaredev ways with BM25 semantic matching). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
 
 ```bash
 tests/way-match/run-tests.sh fixture --verbose
@@ -25,7 +25,7 @@ Options: `--bm25-only`, `--ncd-only`, `--verbose`
 
 **What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, and negative controls.
 
-**Current baseline**: BM25 26/32, NCD 24/32, 0 FP for both.
+**Current baseline**: BM25 48/54, 0 FP.
 
 ### 2. Integration Tests (real way files)
 
@@ -43,27 +43,29 @@ bash tools/way-match/test-integration.sh
 
 ### 3. Activation Test (live agent + subagent)
 
-Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching, negative controls, and subagent injection.
+Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching (established and newly-added vocabularies), co-activation of related ways, negative controls, and subagent injection.
 
 **To run**: Start a fresh session from `~/.claude/` and type:
 
 ```
 read and run the activation test at tests/way-activation-test.md
 ```
 
-Claude reads the test file (avoiding prompt-hook contamination), then walks you through 7 steps:
+Claude reads the test file (avoiding prompt-hook contamination), then walks you through 9 steps:
 
 | Step | Who | Tests |
 |------|-----|-------|
 | 1 | Claude | Session baseline (no premature domain activation) |
-| 2 | User types prompt | Regex pattern matching (commits way) |
-| 3 | User types prompt | BM25 semantic matching (security way) |
-| 4 | User types prompt | Negative control (no false positives) |
-| 5 | Claude | Subagent injection (Testing Way via SubagentStart) |
-| 6 | Claude | Subagent negative (no injection on irrelevant prompt) |
-| 7 | Claude | Summary table |
+| 2 | User types prompt | Regex pattern matching (delivery/commits) |
+| 3 | User types prompt | BM25 semantic matching, established way (code/security) |
+| 4 | User types prompt | BM25 semantic matching, newly-semantic way (code/performance) |
+| 5 | User types prompt | Co-activation of multiple related ways (delivery/migrations + others) |
+| 6 | User types prompt | Negative control (no false positives) |
+| 7 | Claude | Subagent injection (Testing Way via SubagentStart) |
+| 8 | Claude | Subagent negative (no injection on irrelevant prompt) |
+| 9 | Claude | Summary table |
 
-Takes about 3 minutes. **Current baseline**: 6/6 PASS.
+Takes about 5 minutes. **Current baseline**: 8/8 PASS (steps 1-8).
 
 ### Ad-Hoc Vocabulary Testing
 
@@ -108,6 +110,8 @@ bash governance/governance.sh --lint         # full governance lint
 | Changed a way's vocabulary or threshold | Integration tests + `/test-way` |
 | Changed hook scripts (check-*.sh, inject-*.sh, match-way.sh) | Activation test |
 | Added a new way | Integration tests + `/test-way` + activation test |
+| Restructured way directories | All three test layers + symlink/path verification |
+| Added semantic matching to a way | Fixture tests + integration tests + activation test (step 4) |
 | Renamed or moved documentation files | Doc-graph |
 | Changed provenance metadata in way frontmatter | Governance verification |
 | Changed policy source documents | Governance verification |
diff --git a/tests/way-activation-test.md b/tests/way-activation-test.md
@@ -35,21 +35,41 @@ After reading this file, begin with Step 1.
 
 > **CLAUDE**: After the user sends that message, check if you received new domain-specific content in a system-reminder. Look for guidance about message conventions, branch naming, or attribution rules. Report what fired.
 
-**Expected**: The commits way should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
+**Expected**: The commits way (`delivery/commits`) should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
 
 ---
 
-### Step 3 — Semantic trigger (BM25)
+### Step 3 — Semantic trigger (BM25, established way)
 
 > **USER**: Type exactly: `how should I hash passwords with bcrypt for our login system?`
 
 > **CLAUDE**: Check if you received new domain-specific content. Look for guidance about vulnerability categories, credential handling, input validation, or defensive defaults. Report what fired.
 
-**Expected**: The security way should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
+**Expected**: The security way (`code/security`) should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
 
 ---
 
-### Step 4 — Negative test (no false positive)
+### Step 4 — Semantic trigger (BM25, newly-semantic way)
+
+> **USER**: Type exactly: `profile the rendering loop to find the bottleneck and reduce latency`
+
+> **CLAUDE**: Check if you received new domain-specific content. Look for guidance about profiling tools, algorithmic analysis, benchmarking, or measurement approaches. Report what fired.
+
+**Expected**: The performance way (`code/performance`) should fire via BM25 semantic matching. This way previously had only regex triggers; its vocabulary (optimize, profile, benchmark, latency, bottleneck, etc.) was added during the taxonomy restructure. You should see guidance about static analysis for algorithmic issues and generating before/after measurements.
+
+---
+
+### Step 5 — Co-activation test (multiple related ways)
+
+> **USER**: Type exactly: `create a migration to alter the users table and add an index on the email column`
+
+> **CLAUDE**: Check how many domain-specific ways were injected. List each one by name/heading. Report which ways fired and whether they provide complementary guidance.
+
+**Expected**: The migrations way (`delivery/migrations`) should fire — the prompt contains vocabulary terms (migration, alter, table, column, index). Other ways MAY also co-activate if they share relevant terms (e.g., design via "schema" concepts). Co-activation of related ways is expected and correct — each adds a different lens. Report all ways that fired.
+
+---
+
+### Step 6 — Negative test (no false positive)
 
 > **USER**: Type exactly: `what's the weather like today?`
 
@@ -59,7 +79,7 @@ After reading this file, begin with Step 1.
 
 ---
 
-### Step 5 — Subagent injection (the critical path)
+### Step 7 — Subagent injection (the critical path)
 
 > **CLAUDE**: Spawn a diagnostic subagent with this exact configuration:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -78,7 +98,7 @@ If the subagent sees NO injected content beyond the base configuration, the inje
 
 ---
 
-### Step 6 — Subagent negative test
+### Step 8 — Subagent negative test
 
 > **CLAUDE**: Spawn another diagnostic subagent:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -91,17 +111,22 @@ If the subagent sees NO injected content beyond the base configuration, the inje
 
 ---
 
-### Step 7 — Summary
+### Step 9 — Summary
 
 > **CLAUDE**: Compile a summary table:
 >
-> | Step | Test | Expected | Result |
-> |------|------|----------|--------|
-> | 1 | Session baseline | No domain-specific hooks | ? |
-> | 2 | Regex keyword match | Commits way fires | ? |
-> | 3 | BM25 semantic match | Security way fires | ? |
-> | 4 | Negative (no match) | Nothing fires | ? |
-> | 5 | Subagent injection | Testing Way received | ? |
-> | 6 | Subagent negative | No domain content received | ? |
+> | Step | Test | Cluster | Expected | Result |
+> |------|------|---------|----------|--------|
+> | 1 | Session baseline | — | No domain-specific hooks | ? |
+> | 2 | Regex keyword match | delivery | Commits way fires | ? |
+> | 3 | BM25 semantic (established) | code | Security way fires | ? |
+> | 4 | BM25 semantic (new vocabulary) | code | Performance way fires | ? |
+> | 5 | Co-activation | delivery+architecture | Migrations fires, others may join | ? |
+> | 6 | Negative (no match) | — | Nothing fires | ? |
+> | 7 | Subagent injection | code | Testing Way received | ? |
+> | 8 | Subagent negative | — | No domain content received | ? |
 >
-> Report the final pass/fail count and any observations.
+> Report the final pass/fail count and any observations about:
+> - Whether the taxonomy restructure affected hook delivery
+> - Whether newly-semantic ways activate correctly
+> - Whether co-activation produced useful complementary context
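
The vocabulary-driven steps (3, 4, 5) and the negative control (6) can be illustrated with a toy BM25 scorer. This is a sketch under stated assumptions, not the project's `way-match` binary: the vocabularies are only the sample terms quoted in the test file, and the tokenizer, the `K1`/`B` parameters, and the "score above zero" firing rule are placeholders for whatever tokenization and per-way thresholds the real scorer uses.

```python
import math
import re

# Sample vocabulary terms quoted in the test file; the real ways list more.
WAY_VOCAB = {
    "code/security": ["bcrypt", "hash", "password", "authentication", "login"],
    "code/performance": ["optimize", "profile", "benchmark", "latency", "bottleneck"],
    "delivery/migrations": ["migration", "alter", "table", "column", "index"],
}

K1, B = 1.2, 0.75  # common BM25 defaults (assumed, not from the source)


def tokenize(text):
    """Lowercase and split on non-alphanumerics; no stemming (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(prompt):
    """Score each way's vocabulary (treated as a document) against the prompt."""
    n_docs = len(WAY_VOCAB)
    avg_len = sum(len(v) for v in WAY_VOCAB.values()) / n_docs
    scores = {}
    for way, vocab in WAY_VOCAB.items():
        score = 0.0
        for term in set(tokenize(prompt)):
            tf = vocab.count(term)
            if tf == 0:
                continue
            df = sum(1 for v in WAY_VOCAB.values() if term in v)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf + K1 * (1 - B + B * len(vocab) / avg_len)
            score += idf * tf * (K1 + 1) / norm
        scores[way] = score
    return scores


def fired_ways(prompt):
    """Ways with a non-zero score; a real scorer would apply a threshold."""
    return sorted(w for w, s in bm25_scores(prompt).items() if s > 0)


print(fired_ways("how should I hash passwords with bcrypt for our login system?"))
print(fired_ways("profile the rendering loop to find the bottleneck and reduce latency"))
print(fired_ways("what's the weather like today?"))
```

With only three toy vocabularies, each prompt matches a single way, so this sketch does not reproduce co-activation; that depends on overlapping terms across the full 18-way corpus.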