Enhance demo.py with audio saving and settings loading#89

Open
Redtash1 wants to merge 1 commit into k2-fsa:master from Redtash1:demo.py-enhancements
Redtash1 commented Apr 12, 2026

@zhu-han This PR makes the following changes to demo.py:

  1. Automatically transcribes a custom voice file added in "Voice Clone" and puts the words into the "Reference Text Box". If the "--no-asr" argument is used, you have to enter the words spoken in your custom voice file manually, as before.
  2. Downloads models into a "ckpts" folder in the OmniVoice root.
  3. Adds an optional "--inbrowser" argument that lets Gradio open in the browser automatically after a successful launch.
  4. Auto-saves generated audio into an "outputs" folder in the root.
  5. Adds an "omnivoice_settings.JSON" file that automatically saves and loads the last used settings.
  6. Adds an option to save as a .wav or .mp3 file.
  7. Moves the generate button to below the Status box, so you don't have to scroll all the way to the bottom to click generate.
  8. Shows "saved to outputs/name & time" together with the total time the generation took.
  9. Updates argument handling and UI components for better usability.

Thank you.


Screenshot 2026-04-11 215324
Screenshot 2026-04-12 003353
Screenshot 2026-04-11 215346

This is from the portable version I mentioned in "Torch 2.8 is known to have memory leak problems" #9.

Screenshot 2026-04-11 215449

zhu-han commented Apr 13, 2026

Collaborator

Thanks. This is a very large PR. Many of its features are very useful, such as:

  1. Moving the generate button to below the Status box
  2. The logic for handling the "--no-asr" case
  3. The "--inbrowser" argument
  4. The audio save function. I would recommend making it optional: only save the audio when an output directory is passed to the script. The audio format should also be passed to the script instead of being selected in the demo.
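
The recommendation about optional saving can be sketched with `argparse`. This is only an illustration, not the PR's code: `--no-asr` and `--inbrowser` are named in this thread, while `--output-dir`, `--audio-format`, and the helper names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Demo launcher (illustrative sketch)")
    # Flags named in this thread.
    parser.add_argument("--no-asr", action="store_true",
                        help="Do not run ASR; reference text is entered manually.")
    parser.add_argument("--inbrowser", action="store_true",
                        help="Open the demo in a browser after launch.")
    # Hypothetical flag names for the reviewer's suggestion.
    parser.add_argument("--output-dir", default=None,
                        help="If given, generated audio is saved to this directory.")
    parser.add_argument("--audio-format", choices=["wav", "mp3"], default="wav",
                        help="File format used when saving is enabled.")
    return parser


def should_save(args: argparse.Namespace) -> bool:
    # Saving is opt-in: nothing is written unless an output directory was passed.
    return args.output_dir is not None
```

With no arguments, `should_save` returns `False`, so a user who only runs the demo never gets files written to disk.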

However, some of these changes may not be very suitable for the project's default demo interface:

  1. Downloading models into the "ckpts" folder in the OmniVoice root: I think some users will simply run pip install omnivoice and then launch the demo. In this case, using the default Hugging Face download method would be better. It also allows sharing the downloaded models across the demo, Python API, and CLI interfaces.
  2. Automatically transcribing the voice into the "Reference Text Box": I considered this approach before. The reason I didn't implement it is that we have internal logic to handle overly long reference audio. Specifically, if the reference audio is too long, we automatically trim it to a shorter length and then run Whisper ASR to generate the transcript. This behavior is only triggered when no reference text is provided. With automatic transcription immediately after upload, the reference text is always filled in, so this behavior would be permanently disabled.
  3. JSON config: This demo script is intended as a simple demonstration. I do not expect users to use this page for production purposes. So I think this config is an over-complicated design for most users.
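
The interaction described in item 2 can be sketched as follows. This is a rough illustration under assumptions, not the project's internal logic: `resolve_reference`, `MAX_REF_SECONDS`, and the `transcribe` callback are hypothetical names, and the real length limit is not stated in this thread.

```python
from typing import Callable, Optional, Tuple

import numpy as np

MAX_REF_SECONDS = 10.0  # hypothetical limit; the real value is not given here


def resolve_reference(
    audio: np.ndarray,
    sample_rate: int,
    ref_text: Optional[str],
    transcribe: Callable[[np.ndarray, int], str],
) -> Tuple[np.ndarray, str]:
    """Return the (audio, text) pair to use as the voice-cloning reference."""
    if ref_text and ref_text.strip():
        # Text was provided: the audio is used as-is and no trimming happens.
        return audio, ref_text.strip()
    # No text: trim overly long audio first, then transcribe the trimmed clip.
    max_samples = int(MAX_REF_SECONDS * sample_rate)
    trimmed = audio[:max_samples]
    return trimmed, transcribe(trimmed, sample_rate)
```

If the UI fills the text box right after upload, `ref_text` is never empty, so the first branch always runs and the trimming path is never reached.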

If you have time to make the changes according to the above suggestions, I would be happy to merge this PR. By the way, in the latest code, the generate function will return a list of np.ndarray with shape (T,) at 24 kHz. This is incompatible with the current code.
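
For the return type mentioned above, a saving step might look like the sketch below. It assumes each array holds float samples in [-1, 1], which this thread does not confirm; `save_wavs` and the file naming are hypothetical, and only the standard-library `wave` module is used for writing.

```python
import wave
from pathlib import Path
from typing import List

import numpy as np

SAMPLE_RATE = 24_000  # 24 kHz, as stated in the thread


def save_wavs(clips: List[np.ndarray], out_dir: str, stem: str = "clip") -> List[Path]:
    """Write each mono clip of shape (T,) to its own 16-bit PCM WAV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, clip in enumerate(clips):
        # Assumption: float samples in [-1, 1]; clip and scale to int16.
        pcm = (np.clip(clip, -1.0, 1.0) * 32767).astype("<i2")
        path = out / f"{stem}_{i}.wav"
        with wave.open(str(path), "wb") as wf:
            wf.setnchannels(1)   # mono
            wf.setsampwidth(2)   # 2 bytes per sample
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm.tobytes())
        paths.append(path)
    return paths
```

Code written against a single-array return value would need a loop like this one, or would have to pick the first element of the list.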

Redtash1 closed this May 5, 2026
Redtash1 force-pushed the demo.py-enhancements branch from da92c05 to 9e21256 May 5, 2026 01:25
Added options for output directory and format in CLI.
Redtash1 reopened this May 5, 2026

Redtash1 commented May 5, 2026

Author

I think I have made the corrections you asked for. I also added stereo output to the code: when I tried to use a mono output to make a video in Wan2GP, I got an error, and I had to convert it to a stereo track in Audacity to make it work. I hope this is what you were asking for. Thank you.
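
A mono-to-stereo conversion of this kind can be sketched with NumPy by duplicating the channel. This is not the code from the PR; `mono_to_stereo` is a hypothetical helper, and the `(T, 2)` layout is an assumption, since some audio writers expect `(2, T)` instead.

```python
import numpy as np


def mono_to_stereo(mono: np.ndarray) -> np.ndarray:
    """Duplicate a mono signal of shape (T,) into two identical channels, shape (T, 2)."""
    if mono.ndim != 1:
        raise ValueError("expected a mono array of shape (T,)")
    # Left and right channels carry the same samples.
    return np.stack([mono, mono], axis=-1)
```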
