feat: Robust Audio I/O, JSON-based i18n & Windows Portability Enhancements #68

Open
RafaelGodoyEbert wants to merge 1 commit into k2-fsa:master from RafaelGodoyEbert:master

@RafaelGodoyEbert

This PR addresses the RuntimeError: Could not load libtorchcodec encountered by Windows users, fully implements internationalization (i18n) for the UI/CLI, and adds a standalone Windows batch launcher to ensure dependency isolation and smooth GPU deployment. It is fully integrated with the latest upstream changes (--no-asr, Voice Clone Instruct, and torchaudio.load fixes).

Part 1: Robust Audio I/O (TorchCodec Fallback)

The Problem

On Windows, calls such as torchaudio.save() or torchaudio.load() often raise a RuntimeError because the torchcodec / FFmpeg dynamic libraries are missing or misconfigured.

The Solution

  • Added save_audio in omnivoice/utils/audio.py. It writes through torchaudio and falls back to soundfile when torchaudio fails with an environment error.
  • Conflict-free Integration: Kept your recent backend="soundfile" fix in load_audio() and widened the exception handling to also catch ImportError, so loading falls back to PyDub+FFmpeg if every PyTorch audio backend fails.
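
For illustration, the save-side fallback could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: the names save_with_fallback, torchaudio_writer, soundfile_writer, and DEFAULT_WRITERS are hypothetical, and samples are assumed to be a CPU array or tensor laid out as (channels, frames).

```python
from typing import Callable, Sequence, Tuple

# A writer takes (path, samples, sample_rate) and raises on failure.
Writer = Callable[[str, object, int], None]


def save_with_fallback(path: str, samples, sample_rate: int,
                       writers: Sequence[Tuple[str, Writer]]) -> str:
    """Try each named writer in order and return the name of the first that works."""
    errors = []
    for name, writer in writers:
        try:
            writer(path, samples, sample_rate)
            return name
        except (RuntimeError, ImportError, OSError) as exc:
            # A missing torchcodec/FFmpeg library typically surfaces as one of these.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all audio backends failed: " + "; ".join(errors))


def torchaudio_writer(path, samples, sample_rate):
    # Imported lazily so a broken install only fails when this backend is tried.
    import torch
    import torchaudio
    torchaudio.save(path, torch.as_tensor(samples), sample_rate)


def soundfile_writer(path, samples, sample_rate):
    import numpy as np
    import soundfile as sf
    # soundfile expects (frames, channels), hence the transpose.
    sf.write(path, np.asarray(samples).T, sample_rate)


# Hypothetical default order: torchaudio first, soundfile as the fallback.
DEFAULT_WRITERS = [("torchaudio", torchaudio_writer),
                   ("soundfile", soundfile_writer)]
```

Returning the backend name makes it easy to log which path was taken, which helps when diagnosing a user's environment.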

Part 2: JSON-based Internationalization (i18n)

Full interface localization using a lightweight, native .json architecture under omnivoice/locales/.

Key Features

  1. Simple JSON Management: Translations are stored transparently in JSON (e.g. pt_BR.json, en.json, zh.json). Anyone can expand support by adding a single file, eliminating the need for complex .po toolchains.
  2. Logical Masking: Dropdowns and toggles are displayed in the user's language (e.g., "Sotaque Português"), but the backend still receives the raw English identifier ("portuguese accent"), so core generation behavior is unchanged.
  3. Deep UI Integration: Added a --lang argument to demo.py, infer.py, and infer_batch.py. All user-facing strings, including console logs and argument help text, go through the _() lookup function.
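
As a rough illustration of points 1 and 2, a JSON-backed lookup with label/value masking could look like the sketch below. The function names (load_locale, make_translator, localized_choices), the use of the English identifier as the JSON key, and the fallback to en.json are assumptions made for this example; the actual loader in this PR may differ.

```python
import json
from pathlib import Path


def load_locale(locales_dir, lang, default="en"):
    """Load <lang>.json, falling back to <default>.json, then to an empty table."""
    for name in (lang, default):
        path = Path(locales_dir) / f"{name}.json"
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                return json.load(fh)
    return {}


def make_translator(table):
    """Return a _() function that maps a key to its translation, or to itself."""
    def _(key):
        return table.get(key, key)
    return _


def localized_choices(raw_values, _):
    """Pair each translated label with the raw English value the backend expects."""
    return [(_(value), value) for value in raw_values]
```

With a pt_BR.json containing {"portuguese accent": "Sotaque Português"}, localized_choices(["portuguese accent"], _) yields a pair whose first element is shown to the user and whose second element is what gets passed to generation. Untranslated keys fall through unchanged, so a partial locale file still produces a usable interface.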

Part 3: Windows Portability (OmniVoice.bat)

A quality-of-life feature for Windows users who want to test the model.

  • Added OmniVoice.bat, an automated startup script for Windows.
  • Dependency Isolation: Keeps HuggingFace and uv cache downloads inside project subfolders instead of scattering them across the user's C: drive.
  • Out-of-the-box Verification: Checks for a GPU and prints VRAM and PyTorch fallback status before launching Gradio, preventing "silent CPU" generation stalls.
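
The same two ideas can be expressed in Python for readers who are not on Windows. This is only a sketch of the concept, not a translation of OmniVoice.bat: the .cache subfolder layout and the function names are hypothetical, and it assumes the standard HF_HOME and UV_CACHE_DIR environment variables are what redirect the Hugging Face and uv caches.

```python
import os
from pathlib import Path


def cache_env(project_root):
    """Environment variables that keep caches under the project folder.

    The ".cache" layout is a hypothetical choice for this example.
    """
    root = Path(project_root)
    return {
        "HF_HOME": str(root / ".cache" / "huggingface"),
        "UV_CACHE_DIR": str(root / ".cache" / "uv"),
    }


def describe_device():
    """Return a one-line summary of the GPU state, or of why it is unknown."""
    try:
        import torch  # imported lazily so the check still reports without PyTorch
    except (ImportError, OSError) as exc:
        return f"PyTorch unavailable ({exc}); cannot check for a GPU"
    if not torch.cuda.is_available():
        return "No CUDA GPU detected; generation would fall back to CPU"
    props = torch.cuda.get_device_properties(0)
    return f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB"


def preflight(project_root):
    """Apply the cache variables, then print the device summary."""
    os.environ.update(cache_env(project_root))
    print(describe_device())
```

The cache variables need to be set before any library that reads them is imported, which is why preflight applies them first.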

Verification

  • Tested against the latest upstream changes, including the --no-asr parameter and the newly reintroduced "Instruct" textboxes in Voice Clone.
  • Zero-dependency JSON loaders tested extensively with uv run omnivoice/cli/demo.py --lang pt_BR.

@zhu-han

zhu-han commented Apr 13, 2026

Collaborator

Thanks for this PR. In the recent update, I switched the audio I/O from torchaudio to soundfile+librosa, which is similar to the Robust Audio I/O you proposed. The demo page is only intended as a simple demonstration, and I think Chinese + English support will satisfy most users. Regarding Windows portability, I noticed this script contains many hardcoded arguments and assumptions, so it may not be suitable for merging into the master branch. However, if you wish to implement and maintain a Windows version of OmniVoice, you can create your own repository and list it under the community projects section.
