bug(tts): Hebrew word merging makes speech unintelligible #62

## Problem

Azure and Google TTS engines merge adjacent Hebrew words into single non-existent words, making most generated speech unintelligible to native speakers.

**Source:** Listening test, 2026-05-03 (IT + NEG clips).

## Examples (from sp_it_a_0001_00)

| Expected | Rendered | Meaning lost |
|----------|----------|-------------|
| "היי, חשבתי" → "hey, cha-shav-ti" | "ha-ya-cha-shav-ti" (one long word) | "hi" merged with "I thought" |
| "אגב ניסיתי" → "ah-gav nee-see-ti" | "ag-va-nee-see-tee" | "by the way" merged with "I tried" |
| "הייתי בדרך הביתה" → "ha-yee-ti ba-de-rech ha-bye-tah" | "kee-nee-tee ba-de-rech ha-bee-tah-leh" | Multiple words merged + mispronounced |

The merging occurs throughout the generated clips, not just in isolated cases.

## Likely causes

1. **Missing or insufficient whitespace/pause hints in SSML.** Azure he-IL may need explicit `<break>` tags between words to keep adjacent words from being run together.
2. **Missing nikud (vowel diacritics).** Without nikud, the TTS engine has to guess word boundaries and vowelization, and often guesses incorrectly.
3. **Script generation producing unnormalized text.** Stage 1b gender disambiguation may not be inserting enough structural cues.
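
To gauge how much of a script reaches the engine without nikud (cause 2), a quick check along these lines could help. This is a minimal sketch, not part of the existing pipeline: the function name `unpointed_words` is hypothetical, and the set of code points treated as nikud (U+05B0 to U+05BC plus the shin/sin dots and qamats qatan) is an assumption that may need widening.

```python
import re

# Assumed nikud set: U+05B0..U+05BC (sheva through dagesh),
# U+05C1/U+05C2 (shin and sin dots), U+05C7 (qamats qatan).
NIKUD = re.compile("[\u05B0-\u05BC\u05C1\u05C2\u05C7]")
# Hebrew letters alef..tav.
HEBREW_LETTER = re.compile("[\u05D0-\u05EA]")


def unpointed_words(text):
    """Return the Hebrew words in `text` that carry no nikud marks."""
    result = []
    for raw in text.split():
        word = raw.strip(".,!?:;\"'")  # drop surrounding punctuation
        if HEBREW_LETTER.search(word) and not NIKUD.search(word):
            result.append(word)
    return result


# Both words in the first example clip are unpointed.
print(unpointed_words("היי, חשבתי"))
```

Running this over the Stage 1b output would show whether the problem words are unpointed at the time they are sent to the renderer.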

## Possible mitigations

- Insert `<break time="50ms"/>` between words in the SSML (at the renderer level)
- Add nikud to high-error words via the normalization lexicon (Stage 1b)
- Investigate whether Google Chirp 3 HD handles word boundaries better than Azure
- Test explicit phoneme tags (`<phoneme alphabet="ipa" ph="…">`) for problematic words
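
The first mitigation could be prototyped at the renderer level roughly as follows. This is a sketch under assumptions: `insert_word_breaks` is a hypothetical helper, it only builds the inner SSML fragment (the caller still wraps it in the `<speak>` and `<voice>` elements the target engine requires), and whether a 50 ms break actually stops the merging on either engine is untested.

```python
from xml.sax.saxutils import escape


def insert_word_breaks(text, break_ms=50):
    """Join whitespace-separated words with an SSML <break> element.

    Returns an SSML fragment, not a full document. Each word is
    XML-escaped so characters such as '&' stay valid inside SSML.
    """
    pause = f'<break time="{break_ms}ms"/>'
    words = [escape(word) for word in text.split()]
    return f" {pause} ".join(words)


fragment = insert_word_breaks("אגב ניסיתי")
print(fragment)
```

Keeping `break_ms` as a parameter makes it easy to run a listening test over a few durations before settling on one.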

## Impact

**P0 — Blocker.** If the speech is unintelligible, no downstream processing (labels, augmentation, training) has value.
