Using enable_sequential_cpu_offload() together with the FP8 optimization wrapper leads to a RecursionError during inference.
This occurs in videox_fun/utils/fp8_optimization.py.
Reproduction
in notebook.ipynb:
pipe.enable_sequential_cpu_offload()
with torch.no_grad():
    pipe(...)
ERROR :
RecursionError: maximum recursion depth exceeded
ATTACHMENTS :
PROBABLE CAUSE :
The FP8 wrapper (in videox_fun/utils/fp8_optimization.py) overrides forward and performs .to() inside the forward pass.
When combined with accelerate hooks used in sequential CPU offloading:
- .to() triggers recursive module traversal
- wrapped forward functions are re-entered
- leading to infinite recursion
Additionally, original_forward may capture an already-wrapped forward function, further contributing to the recursion.
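
To illustrate the second point, here is a minimal sketch in plain Python, not taken from videox_fun/utils/fp8_optimization.py. `TinyModule`, `wrap_naive`, `wrap_guarded`, and the `_fp8_wrapped` flag are hypothetical names used only for illustration. It assumes the wrapper stores the previous forward on the module and looks it up at call time; under that assumption, wrapping the same module twice makes the wrapper call itself.

```python
class TinyModule:
    """Hypothetical stand-in for a torch.nn.Module (illustration only)."""

    def forward(self, x):
        return x + 1


def wrap_naive(module):
    # Assumed pattern: stash the current forward on the module and
    # resolve it again at call time via the attribute.
    module.original_forward = module.forward

    def wrapped(x):
        # If original_forward already points at a wrapper, this re-enters it.
        return module.original_forward(x)

    module.forward = wrapped


def wrap_guarded(module):
    # Possible mitigation (an assumption, not the project's fix):
    # skip modules that are already wrapped, and capture the original
    # forward in a closure instead of re-reading it from the module.
    if getattr(module, "_fp8_wrapped", False):
        return
    original_forward = module.forward

    def wrapped(x):
        return original_forward(x)

    module.forward = wrapped
    module._fp8_wrapped = True


if __name__ == "__main__":
    m = TinyModule()
    wrap_naive(m)
    wrap_naive(m)  # second wrap overwrites original_forward with a wrapper
    try:
        m.forward(1)
    except RecursionError as exc:
        print("naive double wrap:", type(exc).__name__)

    g = TinyModule()
    wrap_guarded(g)
    wrap_guarded(g)  # no-op thanks to the guard
    print("guarded double wrap:", g.forward(1))
```

Whether the real wrapper is applied more than once in this setup has not been confirmed; the sketch only shows that such a double wrap would be sufficient to produce the reported error.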
RUN ENVIRONMENT :
- Platform: Linux (Ubuntu)
- PyTorch: provided version [req.txt]
- accelerate: provided version [req.txt]