statistical tier: replace the 5% count-delta bar with the Wilson test (v0.1.1) #1

@bledden

Pre-registered from the v0.1.0 cycle (advisor review). On a non-receipt platform with unpinned stim 1.16, the plain-BP statistical-tier check flaked once at a 5.7% count-delta, above the 5% bar, while the Wilson CIs still overlapped. In other words, the actual statistical test passed and the count-delta heuristic failed. The count-delta bar is the wrong test for cross-platform statistical tiers.

Plan (v0.1.1, not v0.1.0 — changing the bar inside the release cycle would look like moving goalposts):

  • statistical tier on platform-local shots: use the Wilson-CI overlap (already implemented in tridec.validation) or paired bootstrap as the binding test;
  • keep the 5% count-delta as a receipt-environment drift sentinel only (different threat model: catching tridec regressions under the exact pins).
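
For illustration, here is a minimal sketch of how the two checks can disagree. It is not the tridec.validation implementation: the function names are hypothetical, z = 1.96 (95% intervals) is an assumption, and the counts are made-up numbers chosen only to reproduce a 5.7% count-delta.

```python
import math


def wilson_interval(failures, shots, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is an assumed 95% level)."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    p = failures / shots
    denom = 1.0 + z * z / shots
    center = (p + z * z / (2.0 * shots)) / denom
    half = z * math.sqrt(p * (1.0 - p) / shots + z * z / (4.0 * shots * shots)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def relative_count_delta(reference_failures, observed_failures):
    """Relative difference in failure counts, the quantity the 5% bar is applied to."""
    return abs(observed_failures - reference_failures) / reference_failures


# Hypothetical counts, not taken from the actual run.
shots = 100_000
reference_failures = 1_000
observed_failures = 1_057

delta = relative_count_delta(reference_failures, observed_failures)
overlap = intervals_overlap(
    wilson_interval(reference_failures, shots),
    wilson_interval(observed_failures, shots),
)

# Sentinel: the count-delta exceeds the 5% bar.
print(f"count-delta = {delta:.3f}, exceeds 5% bar: {delta > 0.05}")
# Binding test: the Wilson intervals still overlap.
print(f"Wilson CIs overlap: {overlap}")
```

With these illustrative counts the count-delta bar trips while the Wilson intervals still overlap, which is the disagreement described above.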

Filed against v0.1.0; documented in docs/benchmark.md §5.
