[Common] Disabled the tuned NVFP4 kernels #2615
Conversation
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
/te-ci
Greptile Summary
This PR temporarily disables the tuned NVFP4 quantization kernel introduced in PR #2412 due to numeric mismatches. The changes are surgical and effective: the disable mechanism is clean and reversible, since the tuned kernel code remains in the codebase.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant QT as quantize_transpose()
    participant Tuned as quantize_transpose_tuned_1D()<br/>(DISABLED)
    participant Standard as Standard NVFP4 Kernel
    Note over Test: use_fast_math = false (hardcoded)
    Test->>QT: Call with use_fast_math=false
    Note over QT: Check conditions:<br/>!use_2d_quantization &&<br/>input.dtype() == kBFloat16
    Note over QT,Tuned: Conditional block COMMENTED OUT
    rect rgb(220, 220, 220)
        Note over Tuned: Would call tuned kernel<br/>(NOW DISABLED)
    end
    QT->>Standard: Falls through to standard kernel
    Standard-->>QT: Returns quantized output
    QT-->>Test: Returns result
    Note over Test,Standard: Tuned kernel disabled to avoid<br/>numeric mismatches until fixed
```
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
/te-ci
Description
This PR disables the previously introduced tuned NVFP4 kernels (PR #2412, [Common] Tuned NVFP4 cast kernel) because they produce small numeric mismatches. They will be re-enabled once the issue is fixed.
Type of change
Changes
Checklist: