SandAI-org · jiahy0825 · Mar 25, 2026 · Mar 14, 2026 · Mar 25, 2026 · Mar 25, 2026
diff --git a/README.md b/README.md
@@ -16,6 +16,8 @@
 
 --------------------------------------------------------------------------------
 
+> This repository is a fork of **LightX2V** with [MagiCompiler](https://github.com/SandAI-org/MagiCompiler) integrated. Try it out and check the [MagiCompiler Documentation](README_MagiCompiler.md) for details!
+
 **LightX2V** is an advanced lightweight image/video generation inference framework engineered to deliver efficient, high-performance image/video synthesis solutions. This unified platform integrates multiple state-of-the-art image/video generation techniques, supporting diverse generation tasks including text-to-video (T2V), image-to-video (I2V), text-to-image (T2I), image-editing (I2I). **X2V represents the transformation of different input modalities (X, such as text or images) into vision output (Vision)**.
 
 > 🌐 **Try it online now!** Experience LightX2V without installation: **[LightX2V Online Service](https://x2v.light-ai.top/login)** - Free, lightweight, and fast AI digital human video generation platform.

diff --git a/README_MagiCompiler.md b/README_MagiCompiler.md
@@ -0,0 +1,123 @@
+<div align="center">
+
+# LightX2V-MagiCompiler
+
+</div>
+
+
+[MagiCompiler](https://github.com/SandAI-org/MagiCompiler.git) is a compiler and runtime augmentation framework built on top of `torch.compile`. It is designed for large-scale Transformer-like architectures and targets two bottlenecks: the memory wall and operator overhead.
+
+Rather than stopping at local operator optimization, MagiCompiler applies system-level optimizations that accelerate both training and multi-modality inference workloads with minimal code changes.
+
+### 🚀 Using MagiCompiler in LightX2V
+
+Accelerating LightX2V with MagiCompiler takes two code changes: register the custom attention operators, then decorate the main inference function.
+
+**1. Register Custom Attention Operators**
+Use `@magi_register_custom_op` to register attention functions (like FlashAttention or SageAttention) so they can be recognized and optimized by MagiCompiler.
+
+```python
+import torch
+from magi_compiler import magi_register_custom_op
+
+# Example: Registering Flash Attention
+@magi_register_custom_op("magi_compiler::flash_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
+def flash_attn(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
+    # Your attention implementation (e.g., using flash_attn_interface)
+    pass
+
+```
+
+**2. Decorate the Inference Function**
+Use the `@magi_compile` decorator on the core transformer loop (usually the `infer_without_offload` method) and specify dynamic shape dimensions to enable TorchDynamo tracing and graph optimization.
+
+```python
+from magi_compiler import magi_compile
+
+class TransformerInfer(BaseTransformerInfer):
+    # Specify dynamic dimensions for tensors that change shape (e.g., sequence length)
+    @magi_compile(dynamic_arg_dims={
+        "x": 0,
+        "pre_infer_out.embed": 0,
+        "pre_infer_out.x": 0,
+        "pre_infer_out.cos_sin": 0
+    })
+    def infer_without_offload(self, blocks, x, pre_infer_out):
+        for block_idx in range(len(blocks)):
+            x = self.infer_block(blocks[block_idx], x, pre_infer_out)
+        return x
+```
+
+With these two decorators in place, MagiCompiler can apply graph-level optimizations and operator fusion to improve the inference speed of LightX2V models such as HunyuanVideo and Wan2.2.
+
+## 💡 Quick Start
+
+### Option 1: Installation via Docker (Recommended)
+Docker is the simplest and fastest way to set up the environment because it avoids manual dependency configuration.
+
+```bash
+# 1. Pull the latest MagiCompiler Docker image
+docker pull sandai/magi-compiler:latest
+
+# 2. Run and enter the container
+# (Please replace /path/to/models with your local models directory)
+docker run -it --gpus all -v /path/to/models:/models sandai/magi-compiler:latest bash
+
+# 3. Clone and install MagiCompiler
+git clone https://github.com/SandAI-org/MagiCompiler.git
+cd MagiCompiler
+pip install -r requirements.txt
+pip install .
+# pip install -e . --no-build-isolation --config-settings editable_mode=compat  # Developer / editable
+cd ..
+
+# 4. Clone and install LightX2V-MagiCompiler
+git clone https://github.com/SandAI-org/LightX2V-MagiCompiler.git
+cd LightX2V-MagiCompiler
+pip install -v -e .
+```
+
+### Option 2: Installation via Conda
+If you prefer a local environment, you can create an isolated virtual environment using Conda for source installation.
+
+```bash
+# 1. Create and activate a Conda environment (Python 3.12 or higher is recommended)
+conda create -n lightx2v python=3.12
+conda activate lightx2v
+
+# 2. Install PyTorch
+pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0
+
+# 3. Install FlashAttention from its Hopper build directory
+git clone https://github.com/Dao-AILab/flash-attention
+cd flash-attention/hopper
+python setup.py install
+cd ../..
+
+# 4. Install MagiCompiler
+git clone https://github.com/SandAI-org/MagiCompiler.git
+cd MagiCompiler
+pip install -r requirements.txt
+pip install .
+# pip install -e . --no-build-isolation --config-settings editable_mode=compat  # Developer / editable
+cd ..
+
+# 5. Clone the source code and install project dependencies
+git clone https://github.com/SandAI-org/LightX2V-MagiCompiler.git
+cd LightX2V-MagiCompiler
+pip install -r requirements.txt
+pip install -v -e .
+```
+
+
+## 🚀 Run LightX2V-MagiCompiler Examples
+
+**Run Wan2.2-TI2V-5B**
+```bash
+bash ./magi_scripts/run_wan.sh
+```
+
+**Run HunyuanVideo-1.5 480p_t2v_distilled**
+```bash
+bash ./magi_scripts/run_hunyuan.sh
+```
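Reviewer note on the `dynamic_arg_dims` keys in the README above: keys such as `"pre_infer_out.embed"` read like dotted paths into the arguments of `infer_without_offload`. The sketch below illustrates that reading only. It is an assumption about the naming convention, not MagiCompiler's actual implementation; `resolve_dotted` and the stand-in inputs are hypothetical, and plain lists replace tensors so the example runs without a GPU.

```python
from types import SimpleNamespace


def resolve_dotted(bound_args: dict, dotted_name: str):
    """Follow a dotted key such as "pre_infer_out.embed" into nested attributes."""
    root, *attrs = dotted_name.split(".")
    value = bound_args[root]  # first segment names a function argument
    for attr in attrs:  # remaining segments name attributes on that argument
        value = getattr(value, attr)
    return value


# Hypothetical stand-ins for the real inputs (lists instead of tensors).
pre_infer_out = SimpleNamespace(embed=[0.0] * 4, x=[0.0] * 8, cos_sin=[0.0] * 8)
bound_args = {"x": [0.0] * 8, "pre_infer_out": pre_infer_out}

# Same keys as in the README example; the values name the dynamic dimension.
dynamic_arg_dims = {
    "x": 0,
    "pre_infer_out.embed": 0,
    "pre_infer_out.x": 0,
    "pre_infer_out.cos_sin": 0,
}

resolved = {name: resolve_dotted(bound_args, name) for name in dynamic_arg_dims}
```

Under this reading, every key must resolve to an object that has the listed dimension, so a typo in a dotted name would surface as a `KeyError` or `AttributeError` at resolution time.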
diff --git a/examples/hunyuan_video/hunyuan_t2v_distill.py b/examples/hunyuan_video/hunyuan_t2v_distill.py
@@ -3,48 +3,52 @@
 This example demonstrates how to use LightX2V with HunyuanVideo-1.5 4-step distilled model for T2V generation.
 """
 
+import os
+from datetime import datetime
+
 from lightx2v import LightX2VPipeline
 
+CP_SIZE = int(os.environ.get("CP_SIZE", 1))
+CPU_OFFLOAD = os.environ.get("CPU_OFFLOAD", "false")
+
 # Initialize pipeline for HunyuanVideo-1.5
 pipe = LightX2VPipeline(
-    model_path="/path/to/ckpts/hunyuanvideo-1.5/",
+    model_path="path/to/HunyuanVideo-1.5/",
     model_cls="hunyuan_video_1.5",
     transformer_model_name="480p_t2v",
     task="t2v",
     # 4-step distilled model ckpt
-    dit_original_ckpt="/path/to/hy1.5_t2v_480p_lightx2v_4step.safetensors",
+    dit_original_ckpt="path/to/HunyuanVideo-1.5/transformer/480p_t2v_distilled/diffusion_pytorch_model.safetensors",
+)
+
+pipe.enable_parallel(
+    seq_p_size=CP_SIZE,  # Sequence parallel size
+    seq_p_attn_type="ulysses",  # Sequence parallel attention type
 )
 
 # Alternative: create generator from config JSON file
 # pipe.create_generator(config_json="../configs/hunyuan_video_15/hunyuan_video_t2v_720p.json")
 
 # Enable offloading to significantly reduce VRAM usage with minimal speed impact
 # Suitable for RTX 30/40/50 consumer GPUs
-pipe.enable_offload(
-    cpu_offload=True,
-    offload_granularity="block",  # For HunyuanVideo-1.5, only "block" is supported
-    text_encoder_offload=True,
-    image_encoder_offload=False,
-    vae_offload=False,
-)
-
-# Use lighttae
-pipe.enable_lightvae(
-    use_tae=True,
-    tae_path="/path/to/lighttaehy1_5.safetensors",
-    use_lightvae=False,
-    vae_path=None,
-)
+if CPU_OFFLOAD == "true":
+    pipe.enable_offload(
+        cpu_offload=True,
+        offload_granularity="block",  # For HunyuanVideo-1.5, only "block" is supported
+        text_encoder_offload=True,
+        image_encoder_offload=True,
+        vae_offload=True,
+    )
 
 # Create generator with specified parameters
-pipe.create_generator(attn_mode="sage_attn2", infer_steps=4, num_frames=81, guidance_scale=1, sample_shift=9.0, aspect_ratio="16:9", fps=16, denoising_step_list=[1000, 750, 500, 250])
-
+pipe.create_generator(attn_mode="flash_attn3", infer_steps=10, num_frames=121, guidance_scale=1, sample_shift=5.0, aspect_ratio="16:9", fps=24, denoising_step_list=[1000, 750, 500, 250])
+negative_prompt = "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
 
 # Generation parameters
 seed = 123
 prompt = "A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
-negative_prompt = ""
-save_result_path = "/data/nvme0/gushiqiao/LightX2V/save_results/output.mp4"
+suffix = datetime.now().strftime("%Y%m%d_%H%M%S")
+save_result_path = f"./output_hunyuan_t2v_distill_{suffix}.mp4"
 
 # Generate video
 pipe.generate(

diff --git a/examples/wan/wan_ti2v.py b/examples/wan/wan_ti2v.py
@@ -0,0 +1,70 @@
+"""
+Wan2.2 image-to-video (I2V) generation example.
+This example demonstrates how to use LightX2V with the Wan2.2-TI2V-5B model for I2V generation.
+"""
+
+import os
+from datetime import datetime
+
+from lightx2v import LightX2VPipeline
+
+CP_SIZE = int(os.environ.get("CP_SIZE", 1))
+
+
+def env_is_true(env_name: str) -> bool:
+    return str(os.environ.get(env_name, "0")).lower() in {"1", "true", "yes", "y", "on", "enabled"}
+
+
+CPU_OFFLOAD = env_is_true("CPU_OFFLOAD")
+
+
+# Initialize pipeline for Wan2.2-TI2V-5B
+
+pipe = LightX2VPipeline(
+    model_path="path/to/Wan2.2-TI2V-5B",
+    model_cls="wan2.2",
+    task="i2v",
+)
+pipe.enable_parallel(
+    seq_p_size=CP_SIZE,  # Sequence parallel size
+    seq_p_attn_type="ulysses",  # Sequence parallel attention type
+)
+
+if CPU_OFFLOAD:
+    print("Enabling CPU offload")
+    pipe.enable_offload(
+        cpu_offload=True,
+        offload_granularity="block",  # Offload at transformer-block granularity
+        text_encoder_offload=True,
+        image_encoder_offload=True,
+        vae_offload=True,
+    )
+
+pipe.create_generator(
+    attn_mode="flash_attn3",
+    infer_steps=10,
+    height=704,  # Can be set to 720 for higher resolution
+    width=1280,  # Output width in pixels
+    num_frames=121,
+    fps=24,
+    guidance_scale=5.0,  # For wan2.1, guidance_scale is a scalar (e.g., 5.0)
+    sample_shift=5.0,
+    rope_type="torch",
+    # config_json="../../configs/wan22/wan_ti2v_i2v.json"
+)
+
+
+seed = 42
+prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
+negative_prompt = "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
+image_path = "path/to/i2v_input.JPG"
+suffix = datetime.now().strftime("%Y%m%d_%H%M%S")
+save_result_path = f"./output_wan_ti2v_{suffix}.mp4"
+
+pipe.generate(
+    seed=seed,
+    image_path=image_path,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    save_result_path=save_result_path,
+)
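Reviewer note on `CPU_OFFLOAD` parsing: the Wan example accepts several truthy spellings through `env_is_true`, while the HunyuanVideo example compares the raw string against `"true"`, so `CPU_OFFLOAD=True` or `CPU_OFFLOAD=1` enables offload in one script but not the other. A minimal sketch of one shared helper follows. The `default` parameter and the `.strip()` call are additions for illustration and are not part of the diff.

```python
import os

# Same truthy spellings as env_is_true in examples/wan/wan_ti2v.py.
_TRUTHY = frozenset({"1", "true", "yes", "y", "on", "enabled"})


def env_is_true(env_name: str, default: str = "0") -> bool:
    """Return True when the environment variable holds a truthy spelling."""
    return os.environ.get(env_name, default).strip().lower() in _TRUTHY


os.environ["CPU_OFFLOAD"] = "True"

# Exact comparison, as in the HunyuanVideo example: case-sensitive.
strict = os.environ.get("CPU_OFFLOAD", "false") == "true"

# Normalized comparison, as in the Wan example.
lenient = env_is_true("CPU_OFFLOAD")
```

With the value `"True"`, `strict` is `False` and `lenient` is `True`, which is the mismatch a shared helper would remove.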
diff --git a/lightx2v/common/ops/attn/flash_attn.py b/lightx2v/common/ops/attn/flash_attn.py
@@ -1,23 +1,31 @@
-from loguru import logger
-
-try:
-    import flash_attn  # noqa: F401
-    from flash_attn.flash_attn_interface import flash_attn_varlen_func
-except ImportError:
-    logger.info("flash_attn_varlen_func not found, please install flash_attn2 first")
-    flash_attn_varlen_func = None
-
-try:
-    from flash_attn_interface import flash_attn_varlen_func as flash_attn_varlen_func_v3
-except ImportError:
-    logger.info("flash_attn_varlen_func_v3 not found, please install flash_attn3 first")
-    flash_attn_varlen_func_v3 = None
+import torch
+from magi_compiler import magi_register_custom_op
 
 from lightx2v.utils.registry_factory import ATTN_WEIGHT_REGISTER
 
 from .template import AttnWeightTemplate
 
 
+@magi_register_custom_op("magi_compiler::flash_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
+def flash_attn_varlen_func(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, cu_seqlens_q: torch.Tensor, cu_seqlens_kv: torch.Tensor, max_seqlen_q: int, max_seqlen_kv: int) -> torch.Tensor:
+    try:
+        from flash_attn.flash_attn_interface import flash_attn_varlen_func as fa2
+    except ImportError as e:
+        raise ImportError("flash_attn_varlen_func not found, please install flash_attn2 first") from e
+
+    return fa2(q, k, v, cu_seqlens_q, cu_seqlens_kv, max_seqlen_q, max_seqlen_kv)
+
+
+@magi_register_custom_op("magi_compiler::flash_attn_v3", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
+def flash_attn_varlen_func_v3(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, cu_seqlens_q: torch.Tensor, cu_seqlens_kv: torch.Tensor, max_seqlen_q: int, max_seqlen_kv: int) -> torch.Tensor:
+    try:
+        from flash_attn_interface import flash_attn_varlen_func as fa3
+    except ImportError as e:
+        raise ImportError("flash_attn_varlen_func_v3 not found, please install flash_attn3 first") from e
+
+    return fa3(q, k, v, cu_seqlens_q, cu_seqlens_kv, max_seqlen_q, max_seqlen_kv)
+
+
 @ATTN_WEIGHT_REGISTER("flash_attn2")
 class FlashAttn2Weight(AttnWeightTemplate):
     def __init__(self):
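Reviewer note on the lazy imports in `flash_attn.py`: both wrappers import their backend inside the function body, so a missing package now fails at first call instead of being logged at module import. The pattern can be sketched with the standard library alone. `load_optional` is a hypothetical helper, not something in this PR, and `math.sqrt` stands in for an attention kernel so the example runs anywhere.

```python
import importlib
from typing import Any, Callable


def load_optional(module_name: str, attr: str, install_hint: str) -> Callable[..., Any]:
    """Import `attr` from `module_name` on first use, with a chained, actionable error."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # `from exc` keeps the original traceback attached as __cause__.
        raise ImportError(f"{module_name}.{attr} not found, {install_hint}") from exc
    return getattr(module, attr)


# Any installed module works; math.sqrt is only a stand-in for a kernel.
sqrt = load_optional("math", "sqrt", "please install math")

# A missing backend raises ImportError with the hint and the original cause.
try:
    load_optional("hypothetical_missing_backend", "kernel", "please install it first")
except ImportError as err:
    message, cause = str(err), err.__cause__
```

Keeping only the import inside the `try` block means an `ImportError` raised by the kernel call itself would not be mislabeled as a missing installation.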

diff --git a/lightx2v/common/ops/attn/sage_attn.py b/lightx2v/common/ops/attn/sage_attn.py
@@ -1,23 +1,12 @@
 import torch
 from loguru import logger
+from magi_compiler import magi_register_custom_op
 
 from lightx2v.utils.registry_factory import ATTN_WEIGHT_REGISTER
 
 from .template import AttnWeightTemplate
 
 capability = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
-if capability in [(8, 9), (12, 0)]:
-    try:
-        from sageattention import sageattn_qk_int8_pv_fp16_triton as sageattn
-    except ImportError:
-        logger.info("sageattn not found, please install sageattention first")
-        sageattn = None
-else:
-    try:
-        from sageattention import sageattn
-    except ImportError:
-        logger.info("sageattn not found, please install sageattention first")
-        sageattn = None
 
 try:
     from sageattn3 import sageattn3_blackwell
@@ -32,6 +21,24 @@
     sageattn3_sparse_blackwell = None
 
 
+@magi_register_custom_op("magi_compiler::sage_attn", infer_output_meta_fn=["q"], is_subgraph_boundary=True)
+def sageattn(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, tensor_layout: str = "NHD") -> torch.Tensor:
+    if capability in [(8, 9), (12, 0)]:
+        try:
+            from sageattention import sageattn_qk_int8_pv_fp16_triton
+        except ImportError as e:
+            raise ImportError("sageattn_qk_int8_pv_fp16_triton not found, please install sageattention first") from e
+
+        return sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout=tensor_layout)
+    else:
+        try:
+            from sageattention import sageattn as sageattn2
+        except ImportError as e:
+            raise ImportError("sageattn not found, please install sageattention first") from e
+
+        return sageattn2(q, k, v, tensor_layout=tensor_layout)
+
+
 @ATTN_WEIGHT_REGISTER("sage_attn2")
 class SageAttn2Weight(AttnWeightTemplate):
     def __init__(self):
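Reviewer note on the kernel selection in `sageattn`: the wrapper picks a SageAttention entry point from the CUDA compute capability queried at module import. The selection rule can be isolated into a small pure function that is easy to unit-test without a GPU. `pick_sage_kernel` and `TRITON_CAPABILITIES` are hypothetical names for this sketch; the capability values and kernel names are taken from the diff.

```python
from typing import Optional, Tuple

# Capabilities that the diff routes to the Triton INT8/FP16 kernel.
TRITON_CAPABILITIES = {(8, 9), (12, 0)}


def pick_sage_kernel(capability: Optional[Tuple[int, int]]) -> str:
    """Return the name of the sageattention function the wrapper would call."""
    if capability in TRITON_CAPABILITIES:
        return "sageattn_qk_int8_pv_fp16_triton"
    # Every other capability, including None (no CUDA device), uses the default.
    return "sageattn"
```

Because `capability` is `None` when CUDA is unavailable, the default branch is also the one taken on CPU-only machines, where the subsequent import or call would then fail.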