
Conversation

@Oleg-Goncharov
Collaborator

Description

This PR disables the previously introduced tuned NVFP4 kernels (PR #2412, [Common] Tuned NVFP4 cast kernel) because they produce small numeric mismatches. They will be re-enabled once the issue is fixed.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Disabled the tuned NVFP4 kernels

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
@Oleg-Goncharov
Collaborator Author

/te-ci

Oleg-Goncharov added the bug (Something isn't working) and 2.12.0 labels on Jan 22, 2026
@greptile-apps
Contributor

greptile-apps bot commented Jan 22, 2026

Greptile Summary

This PR temporarily disables the tuned NVFP4 quantization kernel that was introduced in PR #2412 due to numeric mismatches. The changes are surgical and effective:

  • Commented out the conditional call to quantize_transpose_tuned_1D() in quantize_transpose_nvfp4.cuh:1171-1174, forcing the code to fall through to the standard kernel path
  • Removed the use_fast_nvfp4_scaling_vec test parameter and hardcoded use_fast_math to false in the test suite, eliminating test coverage for the now-disabled tuned kernel path

The disable mechanism is clean and reversible - the tuned kernel code remains in the codebase (in specialized/quantize_transpose_nvfp4_tuned_1D.cuh) and can be re-enabled by uncommenting the conditional block once the numeric issues are resolved. The standard NVFP4 kernel will now be used for all cases where the tuned kernel would have been selected.
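For context, here is a minimal sketch of the disabled dispatch described above, assuming the condition and call shape implied by this summary; the actual signature and arguments in quantize_transpose_nvfp4.cuh may differ:

```cpp
// Hypothetical sketch only -- not the actual Transformer Engine code.
// The identifiers use_2d_quantization, use_fast_math, kBFloat16, and
// quantize_transpose_tuned_1D come from the summary above; everything else
// (argument list, enum names) is illustrative.

// Tuned 1D path (introduced in PR #2412), now commented out:
// if (!use_2d_quantization && input.dtype() == DType::kBFloat16 && use_fast_math) {
//   quantize_transpose_tuned_1D(input, output, stream);  // produced small numeric mismatches
//   return;
// }

// With the block above disabled, control always falls through to the
// standard NVFP4 quantize/transpose kernel below.
```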

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are minimal, focused, and reversible. Because the problematic kernel path is commented out rather than deleted, it can easily be re-enabled once fixed. The test changes appropriately remove coverage for the disabled functionality. No breaking changes or side effects expected.
  • No files require special attention

Important Files Changed

Filename Overview
transformer_engine/common/cast/nvfp4/quantize_transpose_nvfp4.cuh Commented out the conditional call to quantize_transpose_tuned_1D to disable the tuned NVFP4 kernel path
tests/cpp/operator/test_cast_nvfp4_transpose.cu Removed use_fast_nvfp4_scaling_vec test parameter and hardcoded use_fast_math to false in test instantiation
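
A minimal, self-contained illustration of the test-side change, using GoogleTest parameter generators; the suite name, shapes, and parameter order here are assumptions, not the real contents of test_cast_nvfp4_transpose.cu:

```cpp
#include <cstddef>
#include <tuple>
#include <gtest/gtest.h>

// Parameters: (rows, cols, use_fast_math). The former use_fast_nvfp4_scaling_vec
// parameter no longer appears.
using NVFP4CastParams = std::tuple<size_t, size_t, bool>;

class CastNVFP4TransposeTest : public ::testing::TestWithParam<NVFP4CastParams> {};

TEST_P(CastNVFP4TransposeTest, Placeholder) {
  const bool use_fast_math = std::get<2>(GetParam());
  // With the tuned kernel disabled, only the standard path (use_fast_math == false)
  // is exercised.
  EXPECT_FALSE(use_fast_math);
}

INSTANTIATE_TEST_SUITE_P(
    OperatorTest, CastNVFP4TransposeTest,
    ::testing::Combine(::testing::Values(size_t{128}, size_t{256}),  // placeholder shapes
                       ::testing::Values(size_t{64}),
                       ::testing::Values(false)));  // use_fast_math hardcoded to false
```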

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant QT as quantize_transpose()
    participant Tuned as quantize_transpose_tuned_1D()<br/>(DISABLED)
    participant Standard as Standard NVFP4 Kernel
    
    Note over Test: use_fast_math = false (hardcoded)
    Test->>QT: Call with use_fast_math=false
    Note over QT: Check conditions:<br/>!use_2d_quantization &&<br/>input.dtype() == kBFloat16
    Note over QT,Tuned: Conditional block COMMENTED OUT
    rect rgb(220, 220, 220)
        Note over Tuned: Would call tuned kernel<br/>(NOW DISABLED)
    end
    QT->>Standard: Falls through to standard kernel
    Standard-->>QT: Returns quantized output
    QT-->>Test: Returns result
    
    Note over Test,Standard: Tuned kernel disabled to avoid<br/>numeric mismatches until fixed

ptrendx previously approved these changes on Jan 23, 2026
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
@Oleg-Goncharov
Collaborator Author

/te-ci
