Enhance demo.py with audio saving and settings loading by Redtash1 · Pull Request #89 · k2-fsa/OmniVoice

Redtash1 · 2026-04-12T04:14:31Z

@zhu-han Changes in this PR:
Automatically transcribes a custom voice added in "Voice Clone" and puts the words into the "Reference Text Box". If the "--no-asr" argument is used, you have to enter the words spoken in your custom voice file manually, as before.
Downloads models into a "ckpts" folder in the OmniVoice root.
Adds an optional "--inbrowser" argument so Gradio opens in the browser automatically after a successful launch.
Auto-saves generated audio into an "outputs" folder in the root.
Adds an "omnivoice_settings.JSON" file that automatically saves and loads the last used settings.
Adds an option to save as a .wav or .mp3 file.
Moves the Generate button to below the Status box, so you don't have to scroll all the way to the bottom to click it.
Adds a "saved to outputs/name & time" message with the total time the generation took.
Updates argument handling and UI components for better usability.
Thank you.

This is in my portable version I mentioned in "Torch 2.8 is known to have memory leak problems" #9

zhu-han · 2026-04-13T08:13:00Z

Thanks. This is a very large PR. Many of its features are very useful, such as:
Moving the Generate button to below the Status box
The logic for handling the "--no-asr" case
The "--inbrowser" argument
The audio save function. I would recommend making it optional: only save the audio when an output directory is passed to the script. The audio format should also be passed to the script instead of being selected in the demo.
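
A minimal sketch of how these options could be exposed on the command line. The flags "--no-asr" and "--inbrowser" are named in this thread; "--output-dir" and "--output-format" are hypothetical names chosen for illustration, since the thread only says that the output directory and format should be passed to the script.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="OmniVoice demo (sketch)")
    # Flags named in this thread.
    parser.add_argument("--no-asr", action="store_true",
                        help="Disable automatic transcription of reference audio.")
    parser.add_argument("--inbrowser", action="store_true",
                        help="Open the Gradio page in a browser after launch.")
    # Hypothetical flag names: saving is opt-in, and the format is chosen here
    # rather than in the demo UI.
    parser.add_argument("--output-dir", type=Path, default=None,
                        help="If set, save each generated clip into this folder.")
    parser.add_argument("--output-format", choices=["wav", "mp3"], default="wav",
                        help="File format for saved clips.")
    return parser


def should_save(args: argparse.Namespace) -> bool:
    # Nothing is written to disk unless an output directory was passed.
    return args.output_dir is not None
```

With Gradio, the parsed flag can be forwarded as `demo.launch(inbrowser=args.inbrowser)`, where `demo` stands for whatever Blocks object the script builds.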

However, some of these changes may not be very suitable for the project's default demo interface:

Downloading models into the "ckpts" folder in the OmniVoice root: I think some users will simply run pip install omnivoice and then launch the demo. In this case, using the default Hugging Face download method would be better. It also allows sharing the downloaded models across the demo, Python API, and CLI interfaces.
Automatically transcribing the voice into the "Reference Text Box": I considered this approach before. The reason I didn't implement it is that we have internal logic to handle overly long reference audio. Specifically, if the reference audio is too long, we automatically trim it to a shorter length and then run Whisper ASR on the trimmed audio to generate the transcript. This behavior is only triggered when no reference text is provided. With automatic transcription immediately after upload, the reference text is always filled in, so the trimming would never run.
JSON config: This demo script is intended as a simple demonstration. I do not expect users to use this page for production purposes. So I think this config is an over-complicated design for most users.
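
The interaction between trimming and transcription described above can be sketched as follows. This is not the project's internal code: the function name, the `max_seconds` limit, and the `transcribe` callback are all hypothetical and only illustrate why pre-filling the reference text bypasses the trimming path.

```python
from typing import Callable, Optional, Sequence, Tuple


def prepare_reference(
    audio: Sequence[float],
    ref_text: Optional[str],
    sample_rate: int,
    max_seconds: float,                      # hypothetical length limit
    transcribe: Callable[[Sequence[float]], str],  # stand-in for the ASR call
) -> Tuple[Sequence[float], str]:
    # If the caller already supplied text, the audio is used as-is:
    # no trimming and no ASR.
    if ref_text:
        return audio, ref_text
    # Otherwise trim overly long audio first, then transcribe the trimmed part.
    max_samples = int(max_seconds * sample_rate)
    trimmed = audio[:max_samples]
    return trimmed, transcribe(trimmed)
```

If the UI fills `ref_text` right after upload, the first branch is always taken, so long reference audio is never shortened.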

If you have time to make the changes according to the above suggestions, I would be happy to merge this PR. By the way, in the latest code, the generate function will return a list of np.ndarray with shape (T,) at 24 kHz. This is incompatible with the current code.
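
As a rough illustration of working with that return type, the sketch below writes a list of 1-D arrays to 16-bit mono WAV files at 24 kHz using only NumPy and the standard library. The function name, file naming scheme, and the assumption that samples are floats in [-1, 1] are mine, not taken from the OmniVoice code.

```python
import wave
from pathlib import Path

import numpy as np

SAMPLE_RATE = 24_000  # sample rate stated in this thread


def save_clips(clips, out_dir, prefix="omnivoice"):
    """Write each waveform of shape (T,) to its own 16-bit mono PCM WAV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, clip in enumerate(clips):
        clip = np.asarray(clip, dtype=np.float32)
        if clip.ndim != 1:
            raise ValueError(f"expected shape (T,), got {clip.shape}")
        # Assumption: float samples in [-1, 1]; clip to avoid integer overflow.
        pcm = (np.clip(clip, -1.0, 1.0) * 32767.0).astype("<i2")
        path = out_dir / f"{prefix}_{i:03d}.wav"
        with wave.open(str(path), "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm.tobytes())
        paths.append(path)
    return paths
```

Saving as .mp3 would need an external encoder and is left out of this sketch.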


Redtash1 · 2026-05-05T01:36:38Z

I think I made the corrections that you asked for. I also added stereo output to the code: when I tried to use a mono output in Wan2GP to make a video, I got an error and had to use Audacity to convert it to a stereo track to make it work. I hope this is what you were asking for. Thank you.
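
For reference, turning a mono waveform into a two-channel one only requires duplicating the samples. The sketch below shows one way to do that with NumPy; the function name is hypothetical and this is not necessarily how the PR implements it.

```python
import numpy as np


def mono_to_stereo(mono):
    """Duplicate a mono waveform of shape (T,) into a stereo array of shape (T, 2)."""
    mono = np.asarray(mono)
    if mono.ndim != 1:
        raise ValueError(f"expected shape (T,), got {mono.shape}")
    # Column 0 is the left channel, column 1 the right; both carry the same samples.
    return np.stack([mono, mono], axis=-1)
```

Flattening the result in row-major order gives interleaved left/right samples, which is the frame layout a two-channel PCM WAV file expects.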

Redtash1 closed this May 5, 2026

Redtash1 force-pushed the demo.py-enhancements branch from da92c05 to 9e21256 Compare May 5, 2026 01:25

Enhance CLI with output directory and format options

feea19d

Added options for output directory and format in CLI.

Redtash1 reopened this May 5, 2026

Redtash1 wants to merge 1 commit into k2-fsa:master from Redtash1:demo.py-enhancements
