feat: streaming TTS generation and persistent voice library #166

Open
ice-product96 wants to merge 3 commits into k2-fsa:master from ice-product96:feature/streaming-voice-library

@ice-product96

  • Add VoiceLibrary class (omnivoice/utils/voice_library.py): pre-clone a voice once, save it by name to disk (~/.omnivoice/voices/), and reuse instantly without re-uploading reference audio.

  • Add OmniVoice.generate_streaming() method: generator that yields (audio_1d, status) tuples as each chunk is decoded, enabling progressive playback in the UI for long texts. Short texts still complete in a single pass.

  • Rewrite Gradio demo (omnivoice/cli/demo.py) with three tabs:

    • Voice Clone — saved-voice dropdown, streaming output, quick-save panel.
    • Voice Design — unchanged.
    • Voice Library — clone & name any audio, manage (list/delete) saved voices; changes propagate live to all dropdowns.
  • Export VoiceClonePrompt and VoiceLibrary from omnivoice/__init__.py.
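
To illustrate the "clone once, save by name, reuse" idea behind the voice library, here is a minimal sketch of a name-keyed store on disk. It is not the PR's `VoiceLibrary` implementation: the class name `SimpleVoiceStore`, its methods, and the assumption that a cloned voice can be represented as a JSON-serializable dict are all hypothetical. Only the default directory `~/.omnivoice/voices/` comes from the description above.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class SimpleVoiceStore:
    """Hypothetical stand-in for a persistent voice library (not the PR's class)."""

    def __init__(self, root: Optional[Path] = None):
        # Default location taken from the PR description; everything else is assumed.
        self.root = Path(root) if root is not None else Path.home() / ".omnivoice" / "voices"
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Reject names that could escape the library directory.
        if not name or name in {".", ".."} or any(c in name for c in "/\\"):
            raise ValueError(f"invalid voice name: {name!r}")
        return self.root / f"{name}.json"

    def save(self, name: str, prompt: Dict) -> None:
        # Assumption: the pre-computed clone prompt is JSON-serializable.
        self._path(name).write_text(json.dumps(prompt), encoding="utf-8")

    def load(self, name: str) -> Dict:
        return json.loads(self._path(name).read_text(encoding="utf-8"))

    def names(self) -> List[str]:
        return sorted(p.stem for p in self.root.glob("*.json"))

    def delete(self, name: str) -> bool:
        path = self._path(name)
        if path.exists():
            path.unlink()
            return True
        return False
```

A real clone prompt most likely holds tensors rather than plain JSON values, so an actual implementation would need a different serialization format. The sketch only shows the save, list, load, and delete lifecycle that the demo's dropdowns rely on.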

ice-product96 and others added 3 commits May 23, 2026 14:39
- Add `VoiceLibrary` class (omnivoice/utils/voice_library.py):
  pre-clone a voice once, save it by name to disk (~/.omnivoice/voices/),
  and reuse instantly without re-uploading reference audio.

- Add `OmniVoice.generate_streaming()` method:
  generator that yields (audio_1d, status) tuples as each chunk is
  decoded, enabling progressive playback in the UI for long texts.
  Short texts still complete in a single pass.

- Rewrite Gradio demo (omnivoice/cli/demo.py) with three tabs:
  * Voice Clone — saved-voice dropdown, streaming output, quick-save panel.
  * Voice Design — unchanged.
  * Voice Library — clone & name any audio, manage (list/delete) saved
    voices; changes propagate live to all dropdowns.

- Export `VoiceClonePrompt` and `VoiceLibrary` from omnivoice/__init__.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
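
The streaming behaviour described in this commit (yield an `(audio_1d, status)` tuple per decoded chunk, with short texts finishing in one pass) can be sketched as a plain Python generator. This is an illustration under assumptions, not the code of `OmniVoice.generate_streaming()`: the names `split_into_chunks` and `stream_chunks`, the sentence-based splitting rule, the `max_chars` parameter, and the `synthesize` callback are all hypothetical.

```python
import re
from typing import Callable, Iterator, List, Sequence, Tuple


def split_into_chunks(text: str, max_chars: int = 80) -> List[str]:
    """Greedily pack sentences into chunks of roughly max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars is kept whole.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def stream_chunks(
    text: str,
    synthesize: Callable[[str], Sequence[float]],
    max_chars: int = 80,
) -> Iterator[Tuple[Sequence[float], str]]:
    """Yield (audio_1d, status) as soon as each chunk has been synthesized."""
    chunks = split_into_chunks(text, max_chars)
    for index, chunk in enumerate(chunks, start=1):
        audio_1d = synthesize(chunk)  # assumed to return 1-D samples for this chunk
        yield audio_1d, f"chunk {index}/{len(chunks)}"
```

Because the function is a generator, a caller such as a UI callback can start playing the first chunk while later chunks are still being produced. A text that fits in one chunk yields exactly once, which matches the "single pass" behaviour for short inputs.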
New package omnivoice/api/server.py:
  - GET  /health, /info             — system status & model info
  - GET  /voices                    — list all saved voice clones
  - GET  /voices/{name}             — get single voice metadata
  - POST /voices/clone              — clone voice from uploaded audio
  - DELETE /voices/{name}           — delete a voice
  - PATCH /voices/{name}            — rename a voice
  - POST /generate                  — sync TTS → complete WAV file
  - POST /generate/stream           — chunked streaming (PCM frames)
  - WS   /ws/generate               — WebSocket real-time streaming
  - Interactive docs at /docs (Swagger UI)
  - Optional X-Api-Key auth, CORS, single GPU lock

Add omnivoice-api CLI entry point and [api] optional deps group.
Add API_INTEGRATION.md with full usage guide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers all endpoints with parameters, response formats,
binary framing protocol, JS/Python client examples and Docker setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
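
The binary framing protocol itself is documented in API_INTEGRATION.md and is not reproduced in this conversation. As a rough illustration of what streaming PCM frames can involve, the sketch below converts float samples to 16-bit little-endian PCM and wraps them in a length-prefixed frame. The 4-byte length prefix and all function names are hypothetical and may differ from the actual protocol in this PR.

```python
import struct
from typing import List, Sequence, Tuple


def floats_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_frame(pcm: bytes) -> bytes:
    """Hypothetical framing: 4-byte little-endian payload length, then the payload."""
    return struct.pack("<I", len(pcm)) + pcm


def decode_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a receive buffer into complete frames plus any leftover partial bytes."""
    frames: List[bytes] = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("<I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # the payload has not fully arrived yet
        frames.append(buffer[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return frames, buffer[offset:]
```

A client would append each received network chunk to the leftover bytes and call `decode_frames` again, so frames that straddle chunk boundaries are reassembled correctly.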
