
UniVideo: Unified Understanding, Generation, and Editing for Videos

Cong Wei*,1,2  Quande Liu†,2  Zixuan Ye2  Qiulin Wang2  Xintao Wang2
Pengfei Wan2  Kun Gai2  Wenhu Chen†,1

1University of Waterloo    2Kling Team, Kuaishou Technology
*Work done during an internship at Kling Team, Kuaishou Technology    †Corresponding author

🔔 News

How to use

1. Installation

conda env create -f environment.yml
conda activate univideo

This environment has been tested with the following versions (a quick check follows the list):

  • Python 3.11
  • PyTorch 2.4.1 + CUDA 12.1
  • diffusers 0.34.0
  • transformers 4.51.3
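
To confirm what your environment actually resolved to, you can run a quick version check; this snippet is not part of the repo, it only reads the version strings of the packages listed above:

import sys
import torch
import diffusers
import transformers

print("python:", sys.version.split()[0])
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("gpu available:", torch.cuda.is_available())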

2. Download Checkpoint

Download the UniVideo checkpoints to a local path, for example ckpts/:

python download_ckpt.py
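
If the helper script is unavailable, and assuming the weights are hosted on the Hugging Face Hub, a manual fetch might look like the sketch below; the repo ID is a placeholder not confirmed by this README, so check download_ckpt.py for the actual source:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="KlingTeam/UniVideo",  # hypothetical repo ID; verify against download_ckpt.py
    local_dir="ckpts/",            # the local path suggested above
)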

We provide two UniVideo checkpoint variants, as described in Section 3.2 of the arXiv preprint (a conceptual sketch follows the list):

  • Variant 1 (img, video, txt -> mllm -> last layer hidden -> mmdit)
    Image, video, and text inputs are processed by the MLLM, and the final hidden states are fed into the MMDiT backbone.

  • Variant 2 (img, video, txt, queries -> mllm -> txt + queries last layer hidden -> mmdit)
    Image, video, text, and a set of learnable query tokens are processed by the MLLM, and the final hidden states of the text and query tokens are fed into the MMDiT backbone.
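
To make the two paths concrete, here is a toy sketch of the conditioning flow. The encoder below is a stand-in for Qwen2.5-VL, all dimensions and the text/query split are invented for illustration, and none of this is the repo's actual code:

import torch
import torch.nn as nn

# Toy stand-in for the MLLM (the real model is Qwen2.5-VL); dims are made up.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
mllm = nn.TransformerEncoder(layer, num_layers=2)
proj = nn.Linear(512, 3072)          # bridges MLLM width to the MMDiT width

tokens = torch.randn(1, 128, 512)    # fused img/video/txt token embeddings

# Variant 1: every token's last-layer hidden state conditions the MMDiT.
cond_v1 = proj(mllm(tokens))         # (1, 128, 3072)

# Variant 2: learnable queries are appended (MetaQueries-style), and only
# the text and query positions are forwarded to the MMDiT.
num_text, num_queries = 32, 64                # assume first 32 tokens are text
queries = torch.randn(1, num_queries, 512)    # would be nn.Parameter in training
hidden = mllm(torch.cat([tokens, queries], dim=1))
keep = torch.cat([hidden[:, :num_text], hidden[:, -num_queries:]], dim=1)
cond_v2 = proj(keep)                 # (1, 96, 3072)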

3. Inference

We provide inference scripts for running UniVideo on demo inputs for each task:

UniVideo variant 1

cd univideo
python univideo_inference.py --task understanding --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task multiid       --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task t2v           --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task t2i           --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task i2i_edit      --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task i2v           --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task v2v_edit      --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml
python univideo_inference.py --task i+v2v_edit    --config configs/univideo_qwen2p5vl7b_hidden_hunyuanvideo.yaml

UniVideo variant 2

cd univideo
python univideo_inference.py --task understanding --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task multiid       --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task t2v           --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task t2i           --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task i2i_edit      --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task i2v           --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task v2v_edit      --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
python univideo_inference.py --task i+v2v_edit    --config configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml
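
Either task list can also be driven from a small Python wrapper around the same CLI; the task names and config path below are taken verbatim from the commands above (swap in the hidden config for variant 1):

import subprocess

TASKS = ["understanding", "multiid", "t2v", "t2i",
         "i2i_edit", "i2v", "v2v_edit", "i+v2v_edit"]
CONFIG = "configs/univideo_qwen2p5vl7b_queries_hunyuanvideo.yaml"  # variant 2

for task in TASKS:
    # Run one demo task at a time; check=True stops on the first failure.
    subprocess.run(
        ["python", "univideo_inference.py", "--task", task, "--config", CONFIG],
        check=True,
    )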

Acknowledgements

  • HunyuanVideo: the base video generation model used in this work. Thanks to the authors for their excellent contribution.
  • Qwen2.5-VL: the base vision-language model used in this work. Thanks to the authors for their excellent contribution.
  • MetaQueries: we adopt their query implementation. Thanks to the authors for their excellent contribution.

🌟 Citation

If you find UniVideo useful for your research and applications, please cite using this BibTeX:

@article{wei2025univideo,
  title={UniVideo: Unified Understanding, Generation, and Editing for Videos},
  author={Wei, Cong and Liu, Quande and Ye, Zixuan and Wang, Qiulin and Wang, Xintao and Wan, Pengfei and Gai, Kun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2510.08377},
  year={2025}
}
