
Conversation

@macpaul commented Jan 8, 2026

Issue: #11626

The main change is feat(mps): implement native-like Float8 support via LUT dequantization.
However, it ran into merge conflicts once the master branch was updated to 0.8.0 and later,
so I've added a number of fixes for the errors I hit along the way. Please check whether they are adaptable to the master branch.

Signed-off-by: Macpaul Lin <[email protected]>

Add a new MPS-specific operations module to handle Float8 tensor support
on Apple Silicon. Since MPS does not natively support Float8 dtypes, this
implementation uses a uint8 storage strategy combined with a GPU-accelerated
Lookup Table (LUT) for efficient dequantization, keeping data on the GPU.
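
The core trick is small enough to sketch. Below is a minimal, hedged illustration of the LUT approach, assuming PyTorch's `torch.float8_e4m3fn` / `torch.float8_e5m2` dtypes; the names `_LUT_CACHE`, `_get_dequant_lut`, and `dequantize_fp8` are illustrative, not the exact API of comfy/mps_ops.py:

```python
import torch

# Illustrative cache; the real module may key or scope this differently.
_LUT_CACHE = {}

def _get_dequant_lut(fp8_dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    # Build (and cache) a 256-entry table mapping every uint8 bit pattern
    # to the float value it encodes in the given Float8 format.
    key = (fp8_dtype, device)
    if key not in _LUT_CACHE:
        bits = torch.arange(256, dtype=torch.uint8)
        # Reinterpret the byte patterns as Float8 on the CPU (where the
        # dtype is supported), upcast to float32, then ship the table to MPS.
        _LUT_CACHE[key] = bits.view(fp8_dtype).to(torch.float32).to(device)
    return _LUT_CACHE[key]

def dequantize_fp8(stored_uint8: torch.Tensor, fp8_dtype: torch.dtype) -> torch.Tensor:
    # A single gather on the GPU replaces the Float8 -> float cast that
    # MPS cannot perform natively; the data never round-trips to the CPU.
    lut = _get_dequant_lut(fp8_dtype, stored_uint8.device)
    return lut[stored_uint8.to(torch.long)]
```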

- Add comfy/mps_ops.py: Implement cached LUT generation and index-based
  dequantization for MPS (see the LUT sketch above).
- Modify comfy/quant_ops.py: Add logic to view Float8 tensors as uint8
  when moving them to MPS, and route dequantization to mps_ops (see the
  transfer sketch after this list).
- Modify comfy/float.py: Add CPU staging for stochastic rounding to
  prevent MPS casting errors during quantization (see the staging sketch
  after this list).
- Modify comfy/quant_ops.py: Add a fallback for fp8_linear.
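
For the quant_ops.py transfer change, the shape of the logic is roughly as follows; `to_mps_safe` is a hypothetical helper name, not the actual function in the diff:

```python
import torch

FP8_DTYPES = (torch.float8_e4m3fn, torch.float8_e5m2)

def to_mps_safe(tensor: torch.Tensor, device: torch.device):
    # Hypothetical helper: MPS rejects Float8 storage outright, so we
    # reinterpret the raw bytes as uint8 before the transfer and keep
    # the original dtype around for the LUT dequantization sketched above.
    if device.type == "mps" and tensor.dtype in FP8_DTYPES:
        return tensor.view(torch.uint8).to(device), tensor.dtype
    return tensor.to(device), tensor.dtype
```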
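
And for the float.py change, the CPU-staging pattern is roughly this. Again a hypothetical sketch: the real stochastic rounding in comfy/float.py is more involved than the plain cast used as a placeholder here.

```python
import torch

def quantize_with_cpu_staging(tensor: torch.Tensor,
                              fp8_dtype: torch.dtype = torch.float8_e4m3fn) -> torch.Tensor:
    # MPS cannot cast to Float8, so stage the work on the CPU: round there,
    # then return the result to the original device as raw uint8 bytes.
    staged = tensor.to("cpu", torch.float32)
    rounded = staged.to(fp8_dtype)  # placeholder for the real stochastic rounding
    return rounded.view(torch.uint8).to(tensor.device)
```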

Signed-off-by: Macpaul Lin <[email protected]>
…pe to prevent precision mismatch RuntimeErrors

Signed-off-by: Macpaul Lin <[email protected]>
…ike for mock QuantizedTensor

Signed-off-by: Macpaul Lin <[email protected]>