Hi Team,
First of all, thank you for releasing the MAGI-1 model and the accompanying resources. The work is really impressive.
I have a question (and possibly a feature request) about audio-driven video generation.
On the magi.sand.ai website, there is an option to generate videos that are lip-synced to an input audio file. However, in the open-source implementation here, I don't see any documented way to:
- provide an audio file as input,
- generate a lip-synced video from both a reference video and an audio track, or
- align the generated frames to the spoken audio the way the website demonstrates.
Could you please clarify:
- Is audio-conditioned generation (lip-sync) supported in the open-source release?
- If so, could you point me to the relevant script, parameters, or an example usage?
- If not, is this feature planned for a future release?
This capability would be extremely valuable for creating realistic talking videos directly from audio and reference visuals, as the website already does.
Thanks in advance!