[Common] Disabled the tuned NVFP4 kernels #2615
Conversation
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
/te-ci
Greptile Summary
This PR temporarily disables the tuned NVFP4 quantization kernel introduced in PR #2412 due to numeric mismatches. The changes are surgical and effective: the disable mechanism is clean and reversible, since the tuned kernel code remains in the codebase.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant QT as quantize_transpose()
    participant Tuned as quantize_transpose_tuned_1D()<br/>(DISABLED)
    participant Standard as Standard NVFP4 Kernel
    Note over Test: use_fast_math = false (hardcoded)
    Test->>QT: Call with use_fast_math=false
    Note over QT: Check conditions:<br/>!use_2d_quantization &&<br/>input.dtype() == kBFloat16
    Note over QT,Tuned: Conditional block COMMENTED OUT
    rect rgb(220, 220, 220)
        Note over Tuned: Would call tuned kernel<br/>(NOW DISABLED)
    end
    QT->>Standard: Falls through to standard kernel
    Standard-->>QT: Returns quantized output
    QT-->>Test: Returns result
    Note over Test,Standard: Tuned kernel disabled to avoid<br/>numeric mismatches until fixed
```
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
/te-ci
Description
This PR disables the previously introduced tuned NVFP4 kernels (PR #2412, [Common] Tuned NVFP4 cast kernel) because they produce small numeric mismatches. They will be re-enabled once the issue is fixed.
Type of change
Changes
Checklist: