
Commit ba89b38

test(ways): expand activation test for taxonomy restructure
Add step 4 (BM25 match on a newly-semantic way) and step 5 (co-activation of related ways) to the live activation test, renumbering the later steps. Update the test README with the 18-way corpus baseline and the new coverage scenarios. The activation test now has 8 scored steps covering: session baseline, regex, established BM25, new BM25, co-activation, negative control, subagent injection, and subagent negative, plus a final summary step.
1 parent 5396834 commit ba89b38

2 files changed

Lines changed: 56 additions & 27 deletions


tests/README.md

Lines changed: 15 additions & 11 deletions
@@ -13,7 +13,7 @@ Three layers, from fast/automated to slow/interactive. See [way-match/results.md
 
 ### 1. Fixture Tests (BM25 vs NCD scorer comparison)
 
-Runs 32 test prompts against a fixed 7-way corpus (testing, api, debugging, security, design, config, adr-context). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
+Runs 54 test prompts against a fixed 18-way corpus (all softwaredev ways with BM25 semantic matching). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer.
 
 ```bash
 tests/way-match/run-tests.sh fixture --verbose
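For context on the gzip NCD baseline that the fixture tests compare against: normalized compression distance scores two strings by how much better they compress together than apart, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length in bytes. The sketch below computes it with the `gzip` CLI. The `csize` and `ncd` helpers are hypothetical and not taken from `run-tests.sh`, and how the real harness turns a distance into a match decision is not shown.

```bash
#!/usr/bin/env bash
# Sketch of a gzip-based Normalized Compression Distance (NCD) scorer.
# Hypothetical helper names; not the repo's implementation.
set -euo pipefail

# Compressed size, in bytes, of a string.
csize() { printf '%s' "$1" | gzip -c -9 | wc -c; }

# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); lower means more similar.
ncd() {
  local cx cy cxy
  cx=$(csize "$1")
  cy=$(csize "$2")
  cxy=$(csize "$1$2")
  awk -v x="$cx" -v y="$cy" -v xy="$cxy" 'BEGIN {
    x += 0; y += 0; xy += 0    # force numeric values (some wc builds pad with spaces)
    lo = (x < y) ? x : y
    hi = (x > y) ? x : y
    printf "%.3f\n", (xy - lo) / hi
  }'
}
```

Example: `ncd "how should I hash passwords with bcrypt for our login system?" "bcrypt hash password authentication login"`. Every gzip stream carries a fixed 10-byte header and 8-byte trailer, so for prompt-length strings that overhead is a large share of C(x); that is one plausible reason a token-based scorer such as BM25 separates matches more cleanly.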
@@ -25,7 +25,7 @@ Options: `--bm25-only`, `--ncd-only`, `--verbose`
 
 **What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, and negative controls.
 
-**Current baseline**: BM25 26/32, NCD 24/32, 0 FP for both.
+**Current baseline**: BM25 48/54, 0 FP.
 
 ### 2. Integration Tests (real way files)
 
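The TP/FP/TN/FN counts compare each prompt's expected outcome with the scorer's decision. A sketch of that bookkeeping in bash/awk, assuming a hypothetical results format of one `expected predicted` pair per line (1 = match, 0 = no match), and assuming a headline figure such as `48/54` counts correct decisions (TP + TN) over all prompts:

```bash
#!/usr/bin/env bash
# Tally TP/FP/TN/FN from lines of "expected predicted" (1 = match, 0 = no match).
# The input format is hypothetical; run-tests.sh may record results differently.
set -euo pipefail

tally() {
  awk '
    $1 == 1 && $2 == 1 { tp++ }   # should match, did match
    $1 == 0 && $2 == 1 { fp++ }   # should not match, did match
    $1 == 0 && $2 == 0 { tn++ }   # should not match, did not match
    $1 == 1 && $2 == 0 { fn++ }   # should match, did not match
    END {
      printf "TP=%d FP=%d TN=%d FN=%d correct=%d/%d\n", tp, fp, tn, fn, tp + tn, NR
    }'
}
```

For example, `printf '1 1\n1 0\n0 0\n0 0\n' | tally` prints `TP=1 FP=0 TN=2 FN=1 correct=3/4`.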
@@ -43,27 +43,29 @@ bash tools/way-match/test-integration.sh
 
 ### 3. Activation Test (live agent + subagent)
 
-Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching, negative controls, and subagent injection.
+Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching (established and newly-added vocabularies), co-activation of related ways, negative controls, and subagent injection.
 
 **To run**: Start a fresh session from `~/.claude/` and type:
 
 ```
 read and run the activation test at tests/way-activation-test.md
 ```
 
-Claude reads the test file (avoiding prompt-hook contamination), then walks you through 7 steps:
+Claude reads the test file (avoiding prompt-hook contamination), then walks you through 9 steps:
 
 | Step | Who | Tests |
 |------|-----|-------|
 | 1 | Claude | Session baseline (no premature domain activation) |
-| 2 | User types prompt | Regex pattern matching (commits way) |
-| 3 | User types prompt | BM25 semantic matching (security way) |
-| 4 | User types prompt | Negative control (no false positives) |
-| 5 | Claude | Subagent injection (Testing Way via SubagentStart) |
-| 6 | Claude | Subagent negative (no injection on irrelevant prompt) |
-| 7 | Claude | Summary table |
+| 2 | User types prompt | Regex pattern matching (delivery/commits) |
+| 3 | User types prompt | BM25 semantic matching, established way (code/security) |
+| 4 | User types prompt | BM25 semantic matching, newly-semantic way (code/performance) |
+| 5 | User types prompt | Co-activation of multiple related ways (delivery/migrations + others) |
+| 6 | User types prompt | Negative control (no false positives) |
+| 7 | Claude | Subagent injection (Testing Way via SubagentStart) |
+| 8 | Claude | Subagent negative (no injection on irrelevant prompt) |
+| 9 | Claude | Summary table |
 
-Takes about 3 minutes. **Current baseline**: 6/6 PASS.
+Takes about 5 minutes. **Current baseline**: 8/8 PASS (steps 1-8).
 
 ### Ad-Hoc Vocabulary Testing
 
@@ -108,6 +110,8 @@ bash governance/governance.sh --lint # full governance lint
 | Changed a way's vocabulary or threshold | Integration tests + `/test-way` |
 | Changed hook scripts (check-*.sh, inject-*.sh, match-way.sh) | Activation test |
 | Added a new way | Integration tests + `/test-way` + activation test |
+| Restructured way directories | All three test layers + symlink/path verification |
+| Added semantic matching to a way | Fixture tests + integration tests + activation test (step 4) |
 | Renamed or moved documentation files | Doc-graph |
 | Changed provenance metadata in way frontmatter | Governance verification |
 | Changed policy source documents | Governance verification |

tests/way-activation-test.md

Lines changed: 41 additions & 16 deletions
@@ -35,21 +35,41 @@ After reading this file, begin with Step 1.
 
 > **CLAUDE**: After the user sends that message, check if you received new domain-specific content in a system-reminder. Look for guidance about message conventions, branch naming, or attribution rules. Report what fired.
 
-**Expected**: The commits way should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
+**Expected**: The commits way (`delivery/commits`) should fire (regex pattern: `commit|push.*(remote|origin|upstream)`). You should see guidance about conventional commit format and branch naming.
 
 ---
 
-### Step 3 — Semantic trigger (BM25)
+### Step 3 — Semantic trigger (BM25, established way)
 
 > **USER**: Type exactly: `how should I hash passwords with bcrypt for our login system?`
 
 > **CLAUDE**: Check if you received new domain-specific content. Look for guidance about vulnerability categories, credential handling, input validation, or defensive defaults. Report what fired.
 
-**Expected**: The security way should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
+**Expected**: The security way (`code/security`) should fire via BM25 semantic matching (vocabulary includes bcrypt, hash, password, authentication, login). You should see detection rules and security defaults.
 
 ---
 
-### Step 4 — Negative test (no false positive)
+### Step 4 — Semantic trigger (BM25, newly-semantic way)
+
+> **USER**: Type exactly: `profile the rendering loop to find the bottleneck and reduce latency`
+
+> **CLAUDE**: Check if you received new domain-specific content. Look for guidance about profiling tools, algorithmic analysis, benchmarking, or measurement approaches. Report what fired.
+
+**Expected**: The performance way (`code/performance`) should fire via BM25 semantic matching. This way previously only had regex triggers — the vocabulary (optimize, profile, benchmark, latency, bottleneck, etc.) was added during the taxonomy restructure. You should see guidance about static analysis for algorithmic issues and generating before/after measurements.
+
+---
+
+### Step 5 — Co-activation test (multiple related ways)
+
+> **USER**: Type exactly: `create a migration to alter the users table and add an index on the email column`
+
+> **CLAUDE**: Check how many domain-specific ways were injected. List each one by name/heading. Report which ways fired and whether they provide complementary guidance.
+
+**Expected**: The migrations way (`delivery/migrations`) should fire — the prompt contains vocabulary terms (migration, alter, table, column, index). Other ways MAY also co-activate if they share relevant terms (e.g., design via "schema" concepts). Co-activation of related ways is expected and correct — each adds a different lens. Report all ways that fired.
+
+---
+
+### Step 6 — Negative test (no false positive)
 
 > **USER**: Type exactly: `what's the weather like today?`
 
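Step 2 relies on the commits-way regex `commit|push.*(remote|origin|upstream)`. Before a live session, the pattern can be exercised offline with `grep -E`. This sketch assumes the hook matches case-insensitively; the `matches_commits` helper and the sample prompts are hypothetical:

```bash
#!/usr/bin/env bash
# Check sample prompts against the commits-way regex quoted in Step 2.
# Case-insensitive matching (-i) is an assumption about the real hook.
set -euo pipefail

COMMITS_RE='commit|push.*(remote|origin|upstream)'

# Succeeds (exit 0) if the prompt matches the regex anywhere.
matches_commits() {
  grep -Eiq -- "$COMMITS_RE" <<< "$1"
}

for p in \
  "let's commit these changes" \
  "push the branch to origin" \
  "push harder on the design" \
  "what's the weather like today?"; do
  if matches_commits "$p"; then echo "fires: $p"; else echo "quiet: $p"; fi
done
```

The first two prompts fire and the last two stay quiet. As written, the pattern has no word boundaries, so any word containing `commit` (for example `committee`) also fires, which is worth keeping in mind when choosing negative-control prompts.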
@@ -59,7 +79,7 @@ After reading this file, begin with Step 1.
 
 ---
 
-### Step 5 — Subagent injection (the critical path)
+### Step 7 — Subagent injection (the critical path)
 
 > **CLAUDE**: Spawn a diagnostic subagent with this exact configuration:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -78,7 +98,7 @@ If the subagent sees NO injected content beyond the base configuration, the inje
 
 ---
 
-### Step 6 — Subagent negative test
+### Step 8 — Subagent negative test
 
 > **CLAUDE**: Spawn another diagnostic subagent:
 > - Use the Task tool with subagent_type: `general-purpose`
@@ -91,17 +111,22 @@ If the subagent sees NO injected content beyond the base configuration, the inje
 
 ---
 
-### Step 7 — Summary
+### Step 9 — Summary
 
 > **CLAUDE**: Compile a summary table:
 >
-> | Step | Test | Expected | Result |
-> |------|------|----------|--------|
-> | 1 | Session baseline | No domain-specific hooks | ? |
-> | 2 | Regex keyword match | Commits way fires | ? |
-> | 3 | BM25 semantic match | Security way fires | ? |
-> | 4 | Negative (no match) | Nothing fires | ? |
-> | 5 | Subagent injection | Testing Way received | ? |
-> | 6 | Subagent negative | No domain content received | ? |
+> | Step | Test | Cluster | Expected | Result |
+> |------|------|---------|----------|--------|
+> | 1 | Session baseline || No domain-specific hooks | ? |
+> | 2 | Regex keyword match | delivery | Commits way fires | ? |
+> | 3 | BM25 semantic (established) | code | Security way fires | ? |
+> | 4 | BM25 semantic (new vocabulary) | code | Performance way fires | ? |
+> | 5 | Co-activation | delivery+architecture | Migrations fires, others may join | ? |
+> | 6 | Negative (no match) || Nothing fires | ? |
+> | 7 | Subagent injection | code | Testing Way received | ? |
+> | 8 | Subagent negative || No domain content received | ? |
 >
-> Report the final pass/fail count and any observations.
+> Report the final pass/fail count and any observations about:
+> - Whether the taxonomy restructure affected hook delivery
+> - Whether newly-semantic ways activate correctly
+> - Whether co-activation produced useful complementary context
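Steps 3 to 5 depend on each way being scored independently against the prompt, so every way whose score clears the threshold is injected; that is what makes co-activation possible. A minimal BM25 sketch in bash/awk illustrates the mechanism. The security, performance, and migrations terms are the ones quoted in the steps above. The `architecture/design` vocabulary, the `k1`/`b` values (common BM25 defaults), the threshold of 1.0, and the `bm25_fire` helper are all assumptions, and this is not the repo's BM25 binary.

```bash
#!/usr/bin/env bash
# Minimal BM25 sketch of per-way scoring with a firing threshold.
# Corpus, parameters, and threshold are illustrative; not the repo's scorer.
set -euo pipefail

# One way per line: "<way-path> <term> <term> ...".
# architecture/design and its terms are hypothetical.
WAYS='code/security bcrypt hash password authentication login
code/performance optimize profile benchmark latency bottleneck
delivery/migrations migration alter table column index
architecture/design schema table column interface'

K1=1.2          # common BM25 default
B=0.75          # common BM25 default
THRESHOLD=1.0   # hypothetical firing threshold

# Print every way whose BM25 score against the prompt is >= THRESHOLD.
bm25_fire() {
  printf '%s\n' "$WAYS" | awk -v q="$1" -v k1="$K1" -v b="$B" -v thr="$THRESHOLD" '
    {
      name[NR] = $1
      len[NR] = NF - 1
      total += NF - 1
      for (i = 2; i <= NF; i++) {
        tf[NR, $i]++
        if (tf[NR, $i] == 1) df[$i]++   # document frequency: ways containing the term
      }
    }
    END {
      n = NR
      avgdl = total / n
      # Lowercase the prompt and split on runs of non-alphanumeric characters.
      nq = split(tolower(q), terms, /[^a-z0-9]+/)
      for (d = 1; d <= n; d++) {
        score = 0
        for (j = 1; j <= nq; j++) {
          t = terms[j]
          if (t == "" || !((d, t) in tf)) continue
          idf = log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
          f = tf[d, t]
          score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len[d] / avgdl))
        }
        if (score >= thr) printf "%s %.2f\n", name[d], score
      }
    }'
}
```

With this toy corpus, the Step 5 prompt fires two ways, because `table` and `column` also appear in the hypothetical design vocabulary, while the Step 6 prompt fires none:

```bash
bm25_fire 'create a migration to alter the users table and add an index on the email column'
# delivery/migrations 4.89
# architecture/design 1.48
bm25_fire "what's the weather like today?"   # prints nothing
```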
