feat: Robust Audio I/O, JSON-based i18n & Windows Portability Enhancements by RafaelGodoyEbert · Pull Request #68 · k2-fsa/OmniVoice

RafaelGodoyEbert · 2026-04-08T16:57:52Z

This PR addresses the RuntimeError: Could not load libtorchcodec encountered by Windows users, fully implements internationalization (i18n) for the UI/CLI, and adds a standalone Windows batch launcher to ensure dependency isolation and smooth GPU deployment. It is fully integrated with the latest upstream changes (--no-asr, Voice Clone Instruct, and torchaudio.load fixes).

Part 1: Robust Audio I/O (TorchCodec Fallback)

The Problem

On Windows, operations like torchaudio.save() or torchaudio.load() often crash with RuntimeError due to missing or misconfigured torchcodec / FFmpeg dynamic libraries.

The Solution

Implemented save_audio in omnivoice/utils/audio.py. It uses torchaudio first and falls back to soundfile when the environment raises an error.
Conflict-free Integration: Kept your recent backend="soundfile" fix in load_audio(), and broadened the exception handling so that it also catches ImportError and falls back to PyDub + FFmpeg if every PyTorch audio backend fails.
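
The fallback pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the name save_audio_with_fallback, the helper functions, and the savers parameter are hypothetical, and it assumes the waveform is a torch tensor shaped (channels, frames). Backends are imported lazily so that a missing or broken library surfaces as a catchable exception at call time instead of at module import.

```python
from typing import Any, Callable, Sequence

# A saver takes (path, waveform, sample_rate) and writes the file.
Saver = Callable[[str, Any, int], None]


def _save_torchaudio(path: str, waveform: Any, sample_rate: int) -> None:
    import torchaudio  # lazy: may fail if torchcodec / FFmpeg libraries are missing

    torchaudio.save(path, waveform, sample_rate)


def _save_soundfile(path: str, waveform: Any, sample_rate: int) -> None:
    import soundfile as sf  # lazy: only needed when the first backend fails

    # torchaudio uses (channels, frames); soundfile expects (frames, channels).
    sf.write(path, waveform.detach().cpu().numpy().T, sample_rate)


def save_audio_with_fallback(
    path: str,
    waveform: Any,
    sample_rate: int,
    savers: Sequence[Saver] = (_save_torchaudio, _save_soundfile),
) -> str:
    """Try each saver in order and return the name of the one that worked."""
    errors = []
    for saver in savers:
        try:
            saver(path, waveform, sample_rate)
            return saver.__name__
        except (ImportError, RuntimeError, OSError) as exc:
            # Typical environment failures: missing package, missing DLL,
            # or "Could not load libtorchcodec".
            errors.append(f"{saver.__name__}: {exc}")
    raise RuntimeError("all audio backends failed: " + "; ".join(errors))
```

Passing the backends as a sequence keeps the order explicit and lets the fallback path be exercised with stub savers, without needing a broken torchaudio install.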

Part 2: JSON-based Internationalization (i18n)

Full interface localization using a lightweight, native .json architecture under omnivoice/locales/.

Key Features

Simple JSON Management: Translations are stored transparently in JSON (e.g. pt_BR.json, en.json, zh.json). Anyone can expand support by adding a single file, eliminating the need for complex .po toolchains.
Logical Masking: Dropdowns and toggles render elegantly in the user's local language (e.g., "Sotaque Português") but the backend still natively extracts the raw English identifier ("portuguese accent") to maintain core generation stability.
Deep UI Integration: Added a --lang argument across demo.py, infer.py, and infer_batch.py. Everything (including console logs and argument help blocks) wraps seamlessly inside the _() dictionary handler.
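
A minimal sketch of how a JSON-backed _() handler and the label masking could fit together is shown below. It is an assumption-laden illustration rather than the PR's implementation: the Translator class, the localized_choices helper, and the flat key-to-string JSON layout are hypothetical, and the English identifier doubles as the lookup key.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


class Translator:
    """Looks up UI strings in <locales_dir>/<lang>.json (flat key -> text)."""

    def __init__(self, locales_dir: str, lang: str, fallback: str = "en") -> None:
        self.table: Dict[str, str] = {}
        # Load the fallback first so the requested language overrides it.
        for code in (fallback, lang):
            path = Path(locales_dir) / f"{code}.json"
            if path.is_file():
                self.table.update(json.loads(path.read_text(encoding="utf-8")))

    def __call__(self, key: str) -> str:
        # Untranslated keys are returned unchanged, so a missing entry
        # degrades to English instead of raising.
        return self.table.get(key, key)


def localized_choices(raw_ids: Iterable[str], translate: Translator) -> Dict[str, str]:
    """Map the label shown in the UI back to the raw English identifier."""
    return {translate(raw): raw for raw in raw_ids}
```

With a pt_BR.json containing {"portuguese accent": "Sotaque Português"}, the dropdown would display "Sotaque Português" while localized_choices(...)["Sotaque Português"] returns "portuguese accent" for the backend.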

Part 3: Windows Portability (OmniVoice.bat)

A massive quality-of-life feature for Windows users trying to test the model seamlessly.

Packaged an OmniVoice.bat automated Windows startup script.
Dependency Isolation: Keeps HuggingFace and uv cache downloads inside the project subfolders, so nothing is scattered across the user's global C: drive.
Out-of-the-box Verification: Scans for GPU presence and prints VRAM / PyTorch fallback statuses before launching Gradio, preventing "silent CPU" generation stalls.
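
The two checks above are done in the batch script; an equivalent Python sketch is given here for illustration only. HF_HOME and UV_CACHE_DIR are the standard cache variables for HuggingFace and uv, but the cache/ subfolder layout and the function names are hypothetical and not taken from OmniVoice.bat.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def isolated_cache_env(
    project_root: str, base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with caches pinned inside the project."""
    root = Path(project_root).resolve()
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HOME"] = str(root / "cache" / "huggingface")  # hypothetical layout
    env["UV_CACHE_DIR"] = str(root / "cache" / "uv")  # hypothetical layout
    return env


def describe_device() -> str:
    """Report GPU / VRAM status so a CPU fallback is never silent."""
    try:
        import torch  # lazy: the check should still run if PyTorch is broken
    except (ImportError, OSError) as exc:
        return f"PyTorch could not be loaded ({exc}); cannot check for a GPU"
    if not torch.cuda.is_available():
        return "No CUDA GPU detected; generation will fall back to CPU"
    props = torch.cuda.get_device_properties(0)
    return f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM"
```

The returned environment can be passed to subprocess.run(..., env=env) when launching the demo, and the device summary printed before Gradio starts.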

Verification

Tested against the latest --no-asr parameters and the newly reintroduced "Instruct" textboxes in Voice Clone.
Zero-dependency JSON loaders tested extensively on uv run omnivoice/cli/demo.py --lang pt_BR.

zhu-han · 2026-04-13T08:19:57Z

Thanks for this PR. In the recent update, I switched the audio I/O from torchaudio to soundfile+librosa, which is similar to the Robust Audio I/O you proposed. The demo page is only intended as a simple demonstration, and I think Chinese + English support will satisfy most users. Regarding Windows portability, I noticed this script contains many hardcoded arguments and assumptions, so it may not be suitable for merging into the master branch. However, if you wish to implement and maintain a Windows version of OmniVoice, you can create your own repository and list it under the community projects section.

Commit 10fbb49: feat: Windows Portability & Robust torchaudio I/O + JSON Multi-language UI Support

RafaelGodoyEbert wants to merge 1 commit into k2-fsa:master from RafaelGodoyEbert:master
