test(ways): expand activation test for taxonomy restructure

Add steps 4 (newly-semantic way via BM25) and 5 (co-activation of
related ways) to the live activation test. Update test README with
18-way corpus baseline and new coverage scenarios.

8 test steps covering: regex, established BM25, new BM25, co-activation,
negative control, subagent injection, subagent negative.
tests/README.md: 15 additions & 11 deletions (+15 -11)
@@ -13,7 +13,7 @@ Three layers, from fast/automated to slow/interactive. See [way-match/results.md
 
 ### 1. Fixture Tests (BM25 vs NCD scorer comparison)
 
-Runs 32 test prompts against a fixed 7-way corpus (testing, api, debugging, security, design, config, adr-context). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
+Runs 54 test prompts against a fixed 18-way corpus (all softwaredev ways with BM25 semantic matching). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
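The fixture layer described in this hunk compares two scorers and tallies TP/FP/TN/FN for each. A minimal Python sketch of that idea follows, assuming a gzip-based normalized compression distance and a plain boolean tally. The function names are hypothetical, and the repository's actual scorers are not shown in this diff.

```python
import gzip


def compressed_size(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: lower means more similar."""
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + " " + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def tally(predicted: list[bool], expected: list[bool]) -> dict[str, int]:
    """Count TP/FP/TN/FN for one scorer's fire/no-fire decisions."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for fired, should_fire in zip(predicted, expected):
        if fired and should_fire:
            counts["TP"] += 1
        elif fired:
            counts["FP"] += 1
        elif should_fire:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts
```

Running `tally` once per scorer over the same prompts gives directly comparable counts. The threshold that turns an NCD value into a fire/no-fire decision is left out because the diff does not state one.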
-Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching, negative controls, and subagent injection.
+Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching (established and newly-added vocabularies), co-activation of related ways, negative controls, and subagent injection.
 
 **To run**: Start a fresh session from `~/.claude/` and type:
 
 ```
 read and run the activation test at tests/way-activation-test.md
 ```
 
-Claude reads the test file (avoiding prompt-hook contamination), then walks you through 7 steps:
+Claude reads the test file (avoiding prompt-hook contamination), then walks you through 9 steps:
 
 | Step | Who | Tests |
 |------|-----|-------|
 | 1 | Claude | Session baseline (no premature domain activation) |
tests/way-activation-test.md: 41 additions & 16 deletions (+41 -16)
@@ -35,21 +35,41 @@ After reading this file, begin with Step 1.
 
 > **CLAUDE**: After the user sends that message, check if you received new domain-specific content in a system-reminder. Look for guidance about message conventions, branch naming, or attribution rules. Report what fired.
 
-**Expected**: The commits way should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
+**Expected**: The commits way (`delivery/commits`) should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
 
 ---
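The expected result for the regex step depends on the pattern quoted in the changed line. The Python sketch below shows how that pattern behaves on a few prompts, assuming a case-insensitive search anywhere in the prompt. The matching mode, the helper name `regex_fires`, and the sample prompts are assumptions, not taken from the hook implementation.

```python
import re

# Pattern copied from the diff; the IGNORECASE flag is an assumption.
COMMITS_PATTERN = re.compile(r"commit|push.*(remote|origin|upstream)", re.IGNORECASE)


def regex_fires(prompt: str) -> bool:
    """Return True if the commits-way pattern matches anywhere in the prompt."""
    return COMMITS_PATTERN.search(prompt) is not None


for prompt in (
    "let's commit these changes",
    "push this branch to origin",
    "push the button",
    "what's the weather like today?",
):
    print(f"{prompt!r}: {regex_fires(prompt)}")
```

Because the `commit` alternative is unanchored, it also matches inside longer words such as "committee", which is worth keeping in mind when reading a false positive in this step.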
 
-### Step 3 — Semantic trigger (BM25)
+### Step 3 — Semantic trigger (BM25, established way)
 
 > **USER**: Type exactly: `how should I hash passwords with bcrypt for our login system?`
 
 > **CLAUDE**: Check if you received new domain-specific content. Look for guidance about vulnerability categories, credential handling, input validation, or defensive defaults. Report what fired.
 
-**Expected**: The security way should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
+**Expected**: The security way (`code/security`) should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
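To illustrate why the bcrypt prompt should land on the security way, here is a rough Python sketch of Okapi BM25 scoring over per-way vocabularies. It is not the project's BM25 binary: the vocabularies are only the terms quoted in this diff, the `k1` and `b` values are common defaults assumed here, and the tokenizer does no stemming (so "passwords" does not match "password").

```python
import math
import re
from collections import Counter

# Vocabulary terms quoted in the test file; the real vocabularies are larger.
WAYS = {
    "code/security": "bcrypt hash password authentication login",
    "code/performance": "optimize profile benchmark latency bottleneck",
    "delivery/migrations": "migration alter table column index",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (no stemming: a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(prompt: str, ways: dict[str, str], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score each way's vocabulary against the prompt with Okapi BM25."""
    docs = {name: tokenize(vocab) for name, vocab in ways.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    query_terms = set(tokenize(prompt))

    scores = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


scores = bm25_scores("how should I hash passwords with bcrypt for our login system?", WAYS)
print(max(scores, key=scores.get))
```

With these toy vocabularies, only `code/security` gets a non-zero score for the bcrypt prompt, through the terms "hash", "bcrypt", and "login".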
+> **USER**: Type exactly: `profile the rendering loop to find the bottleneck and reduce latency`
+
+> **CLAUDE**: Check if you received new domain-specific content. Look for guidance about profiling tools, algorithmic analysis, benchmarking, or measurement approaches. Report what fired.
+
+**Expected**: The performance way (`code/performance`) should fire via BM25 semantic matching. This way previously only had regex triggers — the vocabulary (optimize, profile, benchmark, latency, bottleneck, etc.) was added during the taxonomy restructure. You should see guidance about static analysis for algorithmic issues and generating before/after measurements.
+
+---
+
+### Step 5 — Co-activation test (multiple related ways)
+
+> **USER**: Type exactly: `create a migration to alter the users table and add an index on the email column`
+
+> **CLAUDE**: Check how many domain-specific ways were injected. List each one by name/heading. Report which ways fired and whether they provide complementary guidance.
+
+**Expected**: The migrations way (`delivery/migrations`) should fire — the prompt contains vocabulary terms (migration, alter, table, column, index). Other ways MAY also co-activate if they share relevant terms (e.g., design via "schema" concepts). Co-activation of related ways is expected and correct — each adds a different lens. Report all ways that fired.
+
+---
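Co-activation in the step above means that more than one way can clear the matching threshold for the same prompt. The Python sketch below uses a plain term-overlap count as a stand-in for BM25 thresholding. The `design` vocabulary, the `min_hits` parameter, and the function name are invented for illustration and are not taken from the repository.

```python
import re

# delivery/migrations and code/security terms are quoted in the test file;
# the "design" vocabulary is hypothetical, chosen only to show overlap.
WAY_VOCABULARIES = {
    "delivery/migrations": {"migration", "alter", "table", "column", "index"},
    "design": {"schema", "table", "index", "model"},
    "code/security": {"bcrypt", "hash", "password", "authentication", "login"},
}


def fired_ways(prompt: str, min_hits: int = 2) -> list[str]:
    """Return every way whose vocabulary shares at least min_hits terms with the prompt."""
    tokens = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return sorted(
        name
        for name, vocabulary in WAY_VOCABULARIES.items()
        if len(tokens & vocabulary) >= min_hits
    )


print(fired_ways("create a migration to alter the users table and add an index on the email column"))
```

Under these assumed vocabularies the migration prompt fires both `delivery/migrations` and `design`, while the security way stays silent. That matches the pattern the step asks the tester to report.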
+
+### Step 6 — Negative test (no false positive)
 
 > **USER**: Type exactly: `what's the weather like today?`
 
@@ -59,7 +79,7 @@ After reading this file, begin with Step 1.
 
 ---
 
-### Step 5 — Subagent injection (the critical path)
+### Step 7 — Subagent injection (the critical path)
 
 > **CLAUDE**: Spawn a diagnostic subagent with this exact configuration:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -78,7 +98,7 @@ If the subagent sees NO injected content beyond the base configuration, the inje
 
 ---
 
-### Step 6 — Subagent negative test
+### Step 8 — Subagent negative test
 
 > **CLAUDE**: Spawn another diagnostic subagent:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -91,17 +111,22 @@ If the subagent sees NO injected content beyond the base configuration, the inje