2 changes: 2 additions & 0 deletions README.md
Expand Up @@ -16,6 +16,8 @@

--------------------------------------------------------------------------------

> This repository is a fork of **LightX2V** with [MagiCompiler](https://github.com/SandAI-org/MagiCompiler) integrated. Try it out and check the [MagiCompiler Documentation](README_MagiCompiler.md) for details!

**LightX2V** is an advanced lightweight image/video generation inference framework engineered to deliver efficient, high-performance image/video synthesis solutions. This unified platform integrates multiple state-of-the-art image/video generation techniques, supporting diverse generation tasks including text-to-video (T2V), image-to-video (I2V), text-to-image (T2I), image-editing (I2I). **X2V represents the transformation of different input modalities (X, such as text or images) into vision output (Vision)**.

> 🌐 **Try it online now!** Experience LightX2V without installation: **[LightX2V Online Service](https://x2v.light-ai.top/login)** - Free, lightweight, and fast AI digital human video generation platform.
Expand Down
123 changes: 123 additions & 0 deletions README_MagiCompiler.md
@@ -0,0 +1,123 @@
<div align="center">

# LightX2V-MagiCompiler

</div>


[MagiCompiler](https://github.com/SandAI-org/MagiCompiler.git) is an advanced compiler and runtime augmentation framework built on top of `torch.compile`. Designed specifically for large-scale Transformer-like architectures, it addresses the critical bottlenecks of memory walls and operator overheads.

By stepping beyond traditional local operator optimization, MagiCompiler introduces system-level optimizations, seamlessly accelerating both training and multi-modality inference workloads with minimal code intrusion.

### 🚀 Using MagiCompiler in LightX2V

To accelerate LightX2V with MagiCompiler, two small code changes are enough: register the custom attention operators and decorate the main inference function.

**1. Register Custom Attention Operators**
Use `@magi_register_custom_op` to register attention functions (like FlashAttention or SageAttention) so they can be recognized and optimized by MagiCompiler.

```python
import torch
from magi_compiler import magi_register_custom_op

# Example: Registering Flash Attention
@magi_register_custom_op("magi_compiler::flash_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
def flash_attn(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Your attention implementation (e.g., using flash_attn_interface)
    pass

```

**2. Decorate the Inference Function**
Use the `@magi_compile` decorator on the core transformer loop (usually the `infer_without_offload` method) and specify dynamic shape dimensions to enable TorchDynamo tracing and graph optimization.

```python
from magi_compiler import magi_compile

class TransformerInfer(BaseTransformerInfer):
    # Specify dynamic dimensions for tensors that change shape (e.g., sequence length)
    @magi_compile(dynamic_arg_dims={
        "x": 0,
        "pre_infer_out.embed": 0,
        "pre_infer_out.x": 0,
        "pre_infer_out.cos_sin": 0
    })
    def infer_without_offload(self, blocks, x, pre_infer_out):
        for block_idx in range(len(blocks)):
            x = self.infer_block(blocks[block_idx], x, pre_infer_out)
        return x
```
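The keys in `dynamic_arg_dims` mix plain argument names (`"x"`) with dotted paths into an argument's attributes (`"pre_infer_out.embed"`). The sketch below shows one way such dotted keys could be resolved against the call arguments. It is an assumption for illustration, not MagiCompiler's implementation: `PreInferOut`, `resolve_dotted`, and the sample values are hypothetical, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict


@dataclass
class PreInferOut:  # hypothetical stand-in for the real pre_infer_out object
    embed: list
    x: list


def resolve_dotted(bound_args: Dict[str, Any], key: str) -> Any:
    """Resolve 'arg.attr.subattr' against a mapping of argument names to values."""
    root, *attrs = key.split(".")
    # Walk the attribute chain starting from the named argument.
    return reduce(getattr, attrs, bound_args[root])


bound = {"x": [1, 2, 3], "pre_infer_out": PreInferOut(embed=[0.1, 0.2], x=[7, 8, 9])}
dynamic_arg_dims = {"x": 0, "pre_infer_out.embed": 0}

for key, dim in dynamic_arg_dims.items():
    value = resolve_dotted(bound, key)
    print(f"{key}: dim {dim} would be marked dynamic (current length {len(value)})")
```

Under this reading, every key has to name either a parameter of the decorated method or an attribute reachable from one, so a renamed field on `pre_infer_out` would also require updating the dictionary.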

With just these simple decorators, MagiCompiler can perform graph-level optimizations, fuse operators, and significantly improve the inference speed of LightX2V models like HunyuanVideo and Wan2.2.

## 💡 Quick Start

### Option 1: Installation via Docker (Recommended)
Docker is the simplest and fastest way to set up the environment because it avoids manual dependency configuration.

```bash
# 1. Pull the latest MagiCompiler Docker image
docker pull sandai/magi-compiler:latest

# 2. Run and enter the container
# (Please replace /path/to/models with your local models directory)
docker run -it --gpus all -v /path/to/models:/models sandai/magi-compiler:latest bash

# 3. Clone and install MagiCompiler
git clone https://github.com/SandAI-org/MagiCompiler.git
cd MagiCompiler
pip install -r requirements.txt
pip install .
# pip install -e . --no-build-isolation --config-settings editable_mode=compat # Developer / editable
cd ..

# 4. Clone and install LightX2V-MagiCompiler
git clone https://github.com/SandAI-org/LightX2V-MagiCompiler.git
cd LightX2V-MagiCompiler
pip install -v -e .
```

### Option 2: Installation via Conda
If you prefer a local environment, create an isolated Conda environment and install from source.

```bash
# 1. Create and activate a Conda environment (Python 3.12 or higher is recommended)
conda create -n lightx2v python=3.12
conda activate lightx2v

# 2. Install PyTorch
pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0

# 3. Install FlashAttention-3 for Hopper GPUs
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
python setup.py install
cd ../..

# 4. Install MagiCompiler
git clone https://github.com/SandAI-org/MagiCompiler.git
cd MagiCompiler
pip install -r requirements.txt
pip install .
# pip install -e . --no-build-isolation --config-settings editable_mode=compat # Developer / editable
cd ..

# 5. Clone the source code and install project dependencies
git clone https://github.com/SandAI-org/LightX2V-MagiCompiler.git
cd LightX2V-MagiCompiler
pip install -r requirements.txt
pip install -v -e .
```


## 🚀 Run LightX2V-MagiCompiler Examples

**Run Wan2.2TI2V-5B**
```bash
bash ./magi_scripts/run_wan.sh
```

**Run Hunyuan1.5 480p_t2v_distilled**
```bash
bash ./magi_scripts/run_hunyuan.sh
```
46 changes: 25 additions & 21 deletions examples/hunyuan_video/hunyuan_t2v_distill.py
Expand Up @@ -3,48 +3,52 @@
This example demonstrates how to use LightX2V with HunyuanVideo-1.5 4-step distilled model for T2V generation.
"""

import os
from datetime import datetime

from lightx2v import LightX2VPipeline

CP_SIZE = int(os.environ.get("CP_SIZE", 1))
CPU_OFFLOAD = os.environ.get("CPU_OFFLOAD", "false")

# Initialize pipeline for HunyuanVideo-1.5
pipe = LightX2VPipeline(
model_path="/path/to/ckpts/hunyuanvideo-1.5/",
model_path="path/to/HunyuanVideo-1.5/",
model_cls="hunyuan_video_1.5",
transformer_model_name="480p_t2v",
task="t2v",
# 4-step distilled model ckpt
dit_original_ckpt="/path/to/hy1.5_t2v_480p_lightx2v_4step.safetensors",
dit_original_ckpt="path/to/HunyuanVideo-1.5/transformer/480p_t2v_distilled/diffusion_pytorch_model.safetensors",
)

pipe.enable_parallel(
seq_p_size=CP_SIZE, # Sequence parallel size
seq_p_attn_type="ulysses", # Sequence parallel attention type
)

# Alternative: create generator from config JSON file
# pipe.create_generator(config_json="../configs/hunyuan_video_15/hunyuan_video_t2v_720p.json")

# Enable offloading to significantly reduce VRAM usage with minimal speed impact
# Suitable for RTX 30/40/50 consumer GPUs
pipe.enable_offload(
cpu_offload=True,
offload_granularity="block", # For HunyuanVideo-1.5, only "block" is supported
text_encoder_offload=True,
image_encoder_offload=False,
vae_offload=False,
)

# Use lighttae
pipe.enable_lightvae(
use_tae=True,
tae_path="/path/to/lighttaehy1_5.safetensors",
use_lightvae=False,
vae_path=None,
)
if CPU_OFFLOAD == "true":
    pipe.enable_offload(
        cpu_offload=True,
        offload_granularity="block",  # For HunyuanVideo-1.5, only "block" is supported
        text_encoder_offload=True,
        image_encoder_offload=True,
        vae_offload=True,
    )

# Create generator with specified parameters
pipe.create_generator(attn_mode="sage_attn2", infer_steps=4, num_frames=81, guidance_scale=1, sample_shift=9.0, aspect_ratio="16:9", fps=16, denoising_step_list=[1000, 750, 500, 250])

pipe.create_generator(attn_mode="flash_attn3", infer_steps=10, num_frames=121, guidance_scale=1, sample_shift=5.0, aspect_ratio="16:9", fps=24, denoising_step_list=[1000, 750, 500, 250])
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"

# Generation parameters
seed = 123
prompt = "A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
negative_prompt = ""
save_result_path = "/data/nvme0/gushiqiao/LightX2V/save_results/output.mp4"
suffix = datetime.now().strftime("%Y%m%d_%H%M%S")
save_result_path = f"./output_hunyuan_t2v_distill_{suffix}.mp4"

# Generate video
pipe.generate(
Expand Down
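Review note on the `CPU_OFFLOAD` switch: this script enables offloading only when the variable equals the exact string `"true"`, so `CPU_OFFLOAD=True` or `CPU_OFFLOAD=1` silently leaves it disabled, whereas `examples/wan/wan_ti2v.py` in this same PR uses a more tolerant `env_is_true` helper. The sketch below contrasts the two behaviours. `strict_flag` and `tolerant_flag` are hypothetical names, a plain mapping stands in for `os.environ`, and the `.strip()` call is an addition not present in the PR.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes", "y", "on", "enabled"}


def strict_flag(env: Mapping[str, str], name: str) -> bool:
    """Mirrors the comparison in this script: only the exact string "true" enables the flag."""
    return env.get(name, "false") == "true"


def tolerant_flag(env: Mapping[str, str], name: str) -> bool:
    """Case-insensitive parsing that accepts several common truthy spellings."""
    return env.get(name, "0").strip().lower() in TRUTHY


for raw in ["true", "True", "1", "false"]:
    env = {"CPU_OFFLOAD": raw}
    print(raw, strict_flag(env, "CPU_OFFLOAD"), tolerant_flag(env, "CPU_OFFLOAD"))
```

Sharing one helper between the two example scripts would keep the accepted spellings consistent.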
70 changes: 70 additions & 0 deletions examples/wan/wan_ti2v.py
@@ -0,0 +1,70 @@
"""
Wan2.2 image-to-video generation example.
This example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.
"""

import os
from datetime import datetime

from lightx2v import LightX2VPipeline

CP_SIZE = int(os.environ.get("CP_SIZE", 1))


def env_is_true(env_name: str) -> bool:
    return str(os.environ.get(env_name, "0")).lower() in {"1", "true", "yes", "y", "on", "enabled"}


CPU_OFFLOAD = env_is_true("CPU_OFFLOAD")


# Generate video

pipe = LightX2VPipeline(
model_path="path/to/Wan2.2-TI2V-5B",
model_cls="wan2.2",
task="i2v",
)
pipe.enable_parallel(
seq_p_size=CP_SIZE, # Sequence parallel size
seq_p_attn_type="ulysses", # Sequence parallel attention type
)

if CPU_OFFLOAD:
    print("Enabling CPU offload")
    pipe.enable_offload(
        cpu_offload=True,
        offload_granularity="block",  # For HunyuanVideo-1.5, only "block" is supported
        text_encoder_offload=True,
        image_encoder_offload=True,
        vae_offload=True,
    )

pipe.create_generator(
attn_mode="flash_attn3",
infer_steps=10,
height=704, # Can be set to 720 for higher resolution
width=1280, # Can be set to 1280 for higher resolution
num_frames=121,
fps=24,
guidance_scale=5.0, # For wan2.1, guidance_scale is a scalar (e.g., 5.0)
sample_shift=5.0,
rope_type="torch",
# config_json="../../configs/wan22/wan_ti2v_i2v.json"
)


seed = 42
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
image_path = "path/to/i2v_input.JPG"
suffix = datetime.now().strftime("%Y%m%d_%H%M%S")
save_result_path = f"./output_wan_ti2v_{suffix}.mp4"

pipe.generate(
seed=seed,
image_path=image_path,
prompt=prompt,
negative_prompt=negative_prompt,
save_result_path=save_result_path,
)
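Review note on the output path: both example scripts now append a `%Y%m%d_%H%M%S` timestamp to the output filename, so repeated runs do not overwrite earlier results. A minimal sketch of that naming scheme follows. `timestamped_output` is a hypothetical helper, and the fixed `datetime` is used only to make the example deterministic.

```python
from datetime import datetime
from pathlib import Path


def timestamped_output(prefix: str, now: datetime, directory: str = ".") -> Path:
    """Build '<directory>/<prefix>_<YYYYmmdd_HHMMSS>.mp4' for the given moment."""
    suffix = now.strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{prefix}_{suffix}.mp4"


fixed = datetime(2025, 1, 2, 3, 4, 5)
print(timestamped_output("output_wan_ti2v", fixed))
```

The suffix has one-second resolution, so two runs started within the same second would still map to the same file name.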
36 changes: 22 additions & 14 deletions lightx2v/common/ops/attn/flash_attn.py
@@ -1,23 +1,31 @@
from loguru import logger

try:
    import flash_attn  # noqa: F401
    from flash_attn.flash_attn_interface import flash_attn_varlen_func
except ImportError:
    logger.info("flash_attn_varlen_func not found, please install flash_attn2 first")
    flash_attn_varlen_func = None

try:
    from flash_attn_interface import flash_attn_varlen_func as flash_attn_varlen_func_v3
except ImportError:
    logger.info("flash_attn_varlen_func_v3 not found, please install flash_attn3 first")
    flash_attn_varlen_func_v3 = None
import torch
from magi_compiler import magi_register_custom_op

from lightx2v.utils.registry_factory import ATTN_WEIGHT_REGISTER

from .template import AttnWeightTemplate


@magi_register_custom_op("magi_compiler::flash_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
def flash_attn_varlen_func(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, cu_seqlens_q: torch.Tensor, cu_seqlens_kv: torch.Tensor, max_seqlen_q: int, max_seqlen_kv: int) -> torch.Tensor:
    try:
        from flash_attn.flash_attn_interface import flash_attn_varlen_func as fa2

        return fa2(q, k, v, cu_seqlens_q, cu_seqlens_kv, max_seqlen_q, max_seqlen_kv)
    except ImportError:
        raise ImportError("flash_attn_varlen_func not found, please install flash_attn2 first")


@magi_register_custom_op("magi_compiler::flash_attn_v3", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
def flash_attn_varlen_func_v3(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, cu_seqlens_q: torch.Tensor, cu_seqlens_kv: torch.Tensor, max_seqlen_q: int, max_seqlen_kv: int) -> torch.Tensor:
    try:
        from flash_attn_interface import flash_attn_varlen_func as fa3

        return fa3(q, k, v, cu_seqlens_q, cu_seqlens_kv, max_seqlen_q, max_seqlen_kv)
    except ImportError:
        raise ImportError("flash_attn_varlen_func_v3 not found, please install flash_attn3 first")


@ATTN_WEIGHT_REGISTER("flash_attn2")
class FlashAttn2Weight(AttnWeightTemplate):
    def __init__(self):
Expand Down
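Review note on the import change: the module-level `try/except ImportError` blocks are replaced by imports inside the registered functions, so a missing backend now fails at call time with an `ImportError` instead of being logged at import time and leaving the name set to `None`. The sketch below illustrates that deferred-import pattern with the standard library only. `lazy_backend` and `hypothetical_missing_backend` are invented for this example. Unlike the PR code, the sketch guards only the import itself, so an `ImportError` raised from inside the backend call would not be relabeled as a missing installation.

```python
import importlib
from typing import Any, Callable


def lazy_backend(module_name: str, attr: str, install_hint: str) -> Callable[..., Any]:
    """Return a wrapper that imports `module_name.attr` at call time, not at definition time."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            # Only the import is guarded; errors from the backend call pass through unchanged.
            raise ImportError(f"{module_name} not found, {install_hint}") from exc
        return getattr(module, attr)(*args, **kwargs)

    return wrapper


# Defining the wrappers never fails, even when the backend is absent.
sqrt = lazy_backend("math", "sqrt", "it ships with Python")
missing = lazy_backend("hypothetical_missing_backend", "run", "please install it first")

print(sqrt(9.0))
try:
    missing()
except ImportError as exc:
    print("deferred failure:", exc)
```

One consequence of deferring the import is that code which previously checked `flash_attn_varlen_func is None` to detect a missing backend can no longer do so, because the name is always bound to the registered function.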
31 changes: 19 additions & 12 deletions lightx2v/common/ops/attn/sage_attn.py
@@ -1,23 +1,12 @@
import torch
from loguru import logger
from magi_compiler import magi_register_custom_op

from lightx2v.utils.registry_factory import ATTN_WEIGHT_REGISTER

from .template import AttnWeightTemplate

capability = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
if capability in [(8, 9), (12, 0)]:
    try:
        from sageattention import sageattn_qk_int8_pv_fp16_triton as sageattn
    except ImportError:
        logger.info("sageattn not found, please install sageattention first")
        sageattn = None
else:
    try:
        from sageattention import sageattn
    except ImportError:
        logger.info("sageattn not found, please install sageattention first")
        sageattn = None

try:
    from sageattn3 import sageattn3_blackwell
Expand All @@ -32,6 +21,24 @@
    sageattn3_sparse_blackwell = None


@magi_register_custom_op("magi_compiler::sage_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
def sageattn(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, tensor_layout: str = "NHD") -> torch.Tensor:
    if capability in [(8, 9), (12, 0)]:
        try:
            from sageattention import sageattn_qk_int8_pv_fp16_triton

            return sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout=tensor_layout)
        except ImportError:
            raise ImportError("sageattn_qk_int8_pv_fp16_triton not found, please install sageattention first")
    else:
        try:
            from sageattention import sageattn as sageattn2

            return sageattn2(q, k, v, tensor_layout=tensor_layout)
        except ImportError:
            raise ImportError("sageattn not found, please install sageattention first")


@ATTN_WEIGHT_REGISTER("sage_attn2")
class SageAttn2Weight(AttnWeightTemplate):
    def __init__(self):
Expand Down
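Review note on kernel selection: the registered `sageattn` wrapper keeps the previous dispatch rule, routing compute capabilities `(8, 9)` and `(12, 0)` to `sageattn_qk_int8_pv_fp16_triton` and everything else (including the no-CUDA case, where `capability` is `None`) to the default `sageattn`. The sketch below isolates that decision as a pure function so it can be checked without a GPU. `select_sage_kernel` and `TRITON_CAPABILITIES` are hypothetical names, and the function returns only the entry-point name rather than calling any kernel.

```python
from typing import Optional, Tuple

# Compute capabilities that the diff routes to the Triton INT8/FP16 kernel.
TRITON_CAPABILITIES = {(8, 9), (12, 0)}


def select_sage_kernel(capability: Optional[Tuple[int, int]]) -> str:
    """Return the name of the sageattention entry point chosen for a device capability."""
    if capability in TRITON_CAPABILITIES:
        return "sageattn_qk_int8_pv_fp16_triton"
    # Default path, also taken when CUDA is unavailable (capability is None).
    return "sageattn"


for cap in [(8, 9), (12, 0), (9, 0), None]:
    print(cap, "->", select_sage_kernel(cap))
```

Because `capability` is still read once at module import from device 0, the choice is fixed per process in both the old and the new code.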