statistical tier: replace the 5% count-delta bar with the Wilson test (v0.1.1) #1

Pre-registered from the v0.1.0 cycle (advisor review). On a non-receipt platform with unpinned stim 1.16, the plain-BP statistical-tier check flaked once: the count-delta came out at 5.7%, above the 5% bar, while the Wilson CIs still overlapped. In other words, the actual statistical test passed and the count-delta heuristic failed. The count-delta bar is the wrong test for cross-platform statistical tiers.

Plan (v0.1.1, not v0.1.0, because changing the bar inside the release cycle would look like moving goalposts):
- statistical tier on platform-local shots: use the Wilson-CI overlap (already implemented in `tridec.validation`) or paired bootstrap as the binding test;
- keep the 5% count-delta as a receipt-environment drift sentinel only (different threat model: catching tridec regressions under the exact pins).
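
To make the difference between the two checks concrete, here is a minimal standalone sketch. It does not use or reproduce the `tridec.validation` API. Every function name below is hypothetical, the count-delta is assumed to be the relative difference in logical-error counts at equal shot counts, and the example counts are illustrative rather than taken from the flaky run.

```python
import math


def wilson_interval(errors, shots, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    p = errors / shots
    denom = 1.0 + z * z / shots
    center = (p + z * z / (2.0 * shots)) / denom
    half = z * math.sqrt(p * (1.0 - p) / shots + z * z / (4.0 * shots * shots)) / denom
    return center - half, center + half


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_delta(ref_errors, obs_errors):
    """Relative difference in error counts (assumed definition of the 5% bar)."""
    return abs(obs_errors - ref_errors) / ref_errors


def statistical_tier_passes(ref_errors, obs_errors, shots):
    """Hypothetical binding test: Wilson CIs of the two runs must overlap."""
    return intervals_overlap(
        wilson_interval(ref_errors, shots),
        wilson_interval(obs_errors, shots),
    )


def drift_sentinel_trips(ref_errors, obs_errors, bar=0.05):
    """Hypothetical receipt-environment sentinel: count-delta above the bar."""
    return count_delta(ref_errors, obs_errors) > bar


if __name__ == "__main__":
    # Illustrative counts only: 1000 vs 1057 logical errors in 100000 shots each.
    ref, obs, shots = 1000, 1057, 100_000
    print("count-delta:", count_delta(ref, obs))
    print("sentinel trips:", drift_sentinel_trips(ref, obs))
    print("Wilson CIs overlap:", statistical_tier_passes(ref, obs, shots))
```

With these illustrative counts the count-delta exceeds the 5% bar while the two Wilson intervals still overlap, which is the same shape of disagreement described above. Note that CI overlap is a conservative comparison; the paired bootstrap mentioned in the plan would be a separate implementation.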

Filed against v0.1.0; documented in docs/benchmark.md §5.
