infra(ci): consider adding "qa-report --asr" Whisper sanity check to CI (deferred — Tier-3 from #87) #88

@shaypal5

Description

Status: deferred — opened to track the Tier-3 ASR sanity policy decision in #87 so we revisit it when conditions change.

Background

PR #88 (closes #87) introduced the effective-prosody cap in synthbanshee/tts/renderer.py and a paired qa-report --asr Whisper sanity check. The check detects clips that sound fine to a listener but trip Whisper-large-v3's silence-detection / segmentation heuristic. That is the failure mode on sp_it_a_0001 that produced the residual #87 regression.

Per CLAUDE.md "ASR sanity check policy", this check is currently local-only:

  • Tier 1 (cap unit tests, no Whisper) → CI on every PR.
  • Tier 2 (Whisper-on-fixture) → not implemented; deferred.
  • Tier 3 (qa-report --asr real-render) → run locally once per audio-touching PR, after all fixes.

Why deferred (2026-05-06 decision)

  1. Cost. Each real Tier-3 invocation costs roughly $1–$2 in Azure rendering for one Tier-A scene, plus the Whisper download and inference. At current scene volume (single-digit Tier-A clips per PR review cycle), manual local runs are tractable.
  2. Author-triggered labelling does not add value yet. Audio-touching PRs are currently authored only by Claude (agent-driven), so labelling is a self-decision; a tts-full-eval label gate would just move the same decision to a different surface.
  3. Repo secrets churn. Adding AZURE_TTS_KEY to repo secrets is non-trivial (rotation, scoping, observability) and we'd want a proper Azure spend cap before letting CI render.

Re-evaluation triggers

Reopen for active design when any of these hold:

What a CI implementation would look like (rough sketch — not committed)

Alternative: a tts-full-eval label-gated pull_request workflow, fired only when an author labels a PR. Same content; different trigger.
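
The label-gated variant comes down to one predicate: run Tier 3 only when the PR carries the tts-full-eval label. Below is a minimal Python sketch of that check, assuming the standard pull_request event payload, where labels arrive as objects with a "name" field. The function name should_run_tier3 is hypothetical, and in an actual workflow this would more likely be an `if:` condition than a script.

```python
TIER3_LABEL = "tts-full-eval"  # label name taken from this issue


def should_run_tier3(event: dict, label: str = TIER3_LABEL) -> bool:
    """Return True when the pull_request event payload carries the gating label.

    Hypothetical helper: assumes `event` is the parsed pull_request webhook
    payload, with labels under event["pull_request"]["labels"].
    """
    pull_request = event.get("pull_request") or {}
    labels = pull_request.get("labels") or []
    return any(item.get("name") == label for item in labels)


if __name__ == "__main__":
    sample = {
        "pull_request": {
            "labels": [{"name": "comp: tts"}, {"name": "tts-full-eval"}]
        }
    }
    print(should_run_tier3(sample))
```

A missing or empty label list returns False, so an unlabelled PR never triggers a paid render by accident.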

What is already in place (so the deferral doesn't lose institutional knowledge)

  • qa-report --asr is a one-line invocation that runs the full check locally.
  • The cap activations are recorded in clip metadata (generation_metadata.effective_prosody_caps) and surfaced by qa-report without requiring --asr, so a static "did the cap fire" signal is available cheaply on every QA pass.
  • The cap thresholds and the rationale for the policy are documented in CLAUDE.md ("ASR sanity check policy").
  • This issue is the canonical reminder; do not delete it on merge.
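
The static "did the cap fire" signal mentioned above can be read directly from clip metadata. Below is a minimal sketch, assuming the metadata is available as parsed JSON. The field path generation_metadata.effective_prosody_caps comes from this issue, but its internal shape is not specified here, so the sketch treats any non-empty value as "cap fired". The function name cap_fired and the sample payload are hypothetical.

```python
import json
from typing import Any


def cap_fired(clip_metadata: dict[str, Any]) -> bool:
    """Return True when effective_prosody_caps is present and non-empty.

    Assumption: an absent, null, or empty entry means no cap was applied.
    """
    generation = clip_metadata.get("generation_metadata") or {}
    return bool(generation.get("effective_prosody_caps"))


if __name__ == "__main__":
    # Hypothetical payload; the real structure of a cap entry may differ.
    raw = '{"generation_metadata": {"effective_prosody_caps": [{"param": "rate"}]}}'
    print(cap_fired(json.loads(raw)))
```

This only tells you the cap activated; it does not say whether Whisper would still mis-segment the clip, which is what qa-report --asr checks.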

References

Labels: comp: tts (TTS rendering, SSML, Azure/Google providers); planning
