Zero Shot RVC #1195
Replies: 5 comments
cc @Jerrister
Hi @rasenganai, I do have some ideas about using the F5 architecture for zero-shot VC, but I do not quite understand how F5 could help RVC. Can you explain your idea more clearly? I guess work like Seed-VC or vec2wav 2.0 could be relevant to your idea?
Hi @Jerrister, current F5 takes: for voice conversion: So we condition it on semantic information (continuous features or k-means units) instead of text. I will try Seed-VC. Please let me know your thoughts.
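To make the "continuous or k-means" choice concrete, here is a minimal sketch of how frame-level features could be turned into discrete unit IDs with plain k-means. It is only an illustration: random vectors stand in for real HuBERT/wav2vec2 outputs, and the function names `fit_kmeans` and `assign_units` as well as the sizes are hypothetical, not part of F5 or RVC.

```python
import numpy as np


def assign_units(features, centroids):
    """Map each frame to the index of its nearest centroid (its discrete unit)."""
    # Squared Euclidean distance between every frame and every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def fit_kmeans(features, n_units, n_iters=20, seed=0):
    """Plain Lloyd's k-means over frame-level features of shape (frames, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen frames.
    picks = rng.choice(len(features), size=n_units, replace=False)
    centroids = features[picks].copy()
    for _ in range(n_iters):
        units = assign_units(features, centroids)
        for k in range(n_units):
            members = features[units == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


# Assumption: random vectors stand in for real semantic features.
frames = np.random.default_rng(1).normal(size=(200, 16)).astype(np.float32)
centroids = fit_kmeans(frames, n_units=8)
units = assign_units(frames, centroids)  # one integer unit ID per frame
```

The continuous variant would feed `frames` to the model directly, while the discrete variant would feed `units` through an embedding table, much like text tokens.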
I get your point now. I think the key is to make sure that your semantic features do not contain speaker information. Otherwise, synthetic parallel data is needed, as Seed-VC did.
Yes, that makes sense.
Hi team, the F5 models work really well at copying both tonality and speaker identity.
Thanks for open-sourcing this amazing repo.
I'm exploring voice cloning models such as RVC, which convert input speech into HuBERT embeddings and train a GAN model on top of them to get the waveform back. At inference, the input speech can come from any speaker, and the trained model superimposes the target speaker's identity while keeping the semantics the same.
What do you think about releasing a better zero-shot RVC alternative that follows the same training recipe as F5 but takes wav2vec2/HuBERT features as input instead?
May I suggest using seamless wav2vec2-xls-r-1b to extract these features and train the model on them?
I think the community would really like a zero-shot RVC alternative. I don't have the compute to run this myself, but I would really like to hear your thoughts.
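One practical detail of this proposal is that frame-level speech features and mel frames usually run at different rates, so the features would need to be aligned to the mel timeline before they can condition the model. A minimal sketch using linear interpolation follows. The rates are assumptions (a 20 ms feature stride, and 24 kHz audio with a hop of 256 samples for the mel spectrogram), and `align_to_mel_frames` is a hypothetical helper, not part of the F5 codebase.

```python
import numpy as np


def align_to_mel_frames(features, n_mel_frames):
    """Linearly interpolate (frames, dim) features along time to n_mel_frames rows."""
    n_src, dim = features.shape
    if n_src == 1:
        # A single frame cannot be interpolated, so repeat it.
        return np.repeat(features, n_mel_frames, axis=0)
    src_pos = np.linspace(0.0, 1.0, n_src)         # normalised source times
    dst_pos = np.linspace(0.0, 1.0, n_mel_frames)  # normalised target times
    columns = [np.interp(dst_pos, src_pos, features[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)


FEATURE_RATE_HZ = 50.0     # assumption: one feature frame every 20 ms
MEL_RATE_HZ = 24000 / 256  # assumption: 24 kHz audio, hop length 256

seconds = 2.0
# Assumption: random vectors stand in for real wav2vec2/HuBERT features.
feats = np.random.default_rng(0).normal(size=(int(seconds * FEATURE_RATE_HZ), 32))
n_mel = int(seconds * MEL_RATE_HZ)
cond = align_to_mel_frames(feats, n_mel)  # one conditioning vector per mel frame
```

Nearest-neighbour repetition would be the natural alternative for discrete k-means units, since unit IDs cannot be averaged.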