Status: deferred — opened to track the Tier-3 ASR sanity policy decision in #87 so we revisit it when conditions change.
Background
PR #88 (closes #87) introduced the effective-prosody cap in synthbanshee/tts/renderer.py and a paired qa-report --asr Whisper sanity check. The check detects clips that synthesize fine to a listener but trip Whisper-large-v3's silence-detection / segmentation heuristic — the failure mode on sp_it_a_0001 that produced the residual #87 regression.
Per CLAUDE.md "ASR sanity check policy", this check is currently local-only:
- Tier 1 (cap unit tests, no Whisper) → CI on every PR.
- Tier 2 (Whisper-on-fixture) → not implemented; deferred.
- Tier 3 (qa-report --asr real-render) → run locally once per audio-touching PR, after all fixes.
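For orientation, the Tier-3 check comes down to comparing Whisper's transcript with the scripted text. The CI sketch later in this issue quotes two thresholds, length_ratio ≥ 0.85 and WER ≤ 0.10. Below is a minimal sketch of how such a pass/fail decision could be computed. It is not the qa-report implementation: the function names are hypothetical, length_ratio is assumed to mean hypothesis word count divided by reference word count, and tokenization is plain lowercased whitespace splitting, which the real check may well do differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def length_ratio(reference: str, hypothesis: str) -> float:
    """Assumed definition: hypothesis word count / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return len(hypothesis.split()) / len(ref_words)


def passes_asr_sanity(
    reference: str,
    hypothesis: str,
    min_length_ratio: float = 0.85,  # threshold quoted in this issue
    max_wer: float = 0.10,           # threshold quoted in this issue
) -> bool:
    """True when the transcript is long enough and accurate enough."""
    return (
        length_ratio(reference, hypothesis) >= min_length_ratio
        and word_error_rate(reference, hypothesis) <= max_wer
    )


if __name__ == "__main__":
    script = "one two three four five six seven eight nine ten"
    truncated = "one two three four five"  # transcript cut short by segmentation
    print(passes_asr_sanity(script, script))
    print(passes_asr_sanity(script, truncated))
```

A transcript that drops the second half of a clip fails both thresholds at once, which is the kind of segmentation failure the check is meant to surface.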
Why deferred (2026-05-06 decision)
- Cost. Each real Tier-3 invocation pays ~$1–$2 of Azure rendering for one Tier-A scene plus Whisper download / inference. At current scene volume (single-digit Tier-A clips per PR review cycle), local manual runs are perfectly tractable.
- Author-triggered labelling does not add value yet. The only "audio-touching PR" author is currently Claude/agent-driven; labelling is a self-decision, so a tts-full-eval label gate would just shift the same decision into a different surface.
- Repo secrets churn. Adding AZURE_TTS_KEY to repo secrets is non-trivial (rotation, scoping, observability) and we'd want a proper Azure spend cap before letting CI render.
Re-evaluation triggers
Reopen for active design when any of these hold:
What a CI implementation would look like (rough sketch, not committed)
- A workflow, .github/workflows/asr-sanity-nightly.yml, that runs nightly on main.
- Installs the eval-asr extra (uv pip install -e ".[eval-asr]").
- Renders one scene (sp_it_a_0001, the #87 evidence-base scene) end-to-end through Azure, runs Whisper, and asserts length_ratio ≥ 0.85 and WER ≤ 0.10.
Alternative: a tts-full-eval label-gated pull_request workflow, fired only when an author labels a PR. Same content; different trigger.
What is already in place (so the deferral doesn't lose institutional knowledge)
- qa-report --asr is a one-line invocation that runs the full check locally.
- The cap activations are recorded in clip metadata (generation_metadata.effective_prosody_caps) and surfaced by qa-report without requiring --asr, so a static "did the cap fire" signal is available cheaply on every QA pass.
- The cap thresholds and the rationale for the policy are documented in CLAUDE.md ("ASR sanity check policy").
- This issue is the canonical reminder; do not delete it on merge.
References
- CLAUDE.md "ASR sanity check policy" section.