[Feature] Add flag to disable FlashInfer autotune #31875
Conversation
Signed-off-by: Mohammad Miadh Angkad <[email protected]>
Code Review
This pull request introduces a feature flag to disable FlashInfer autotuning, which is a valuable addition for users prioritizing faster startup times. The implementation is clean and correctly propagates the new configuration from the command-line interface to the core logic. I have one suggestion regarding the robustness of accessing this new configuration parameter in vllm/model_executor/warmup/kernel_warmup.py to prevent potential silent failures in the future.
```python
disable_flashinfer_autotune = getattr(
    worker.vllm_config, "disable_flashinfer_autotune", False
)
```
Since disable_flashinfer_autotune is now a defined attribute in the VllmConfig dataclass, it will always be present. Using getattr with a default value can mask potential configuration issues. For example, if the attribute name is misspelled or refactored in the future, this code would silently default to False instead of raising an AttributeError, making debugging more difficult. Direct attribute access is safer and clearer in this case.
```python
disable_flashinfer_autotune = worker.vllm_config.disable_flashinfer_autotune
```
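To illustrate the failure mode described above, here is a minimal, self-contained sketch (using a stand-in `VllmConfigStub` dataclass, not the real `VllmConfig`) showing how `getattr` with a default silently swallows a misspelled attribute name, while direct access raises immediately:

```python
from dataclasses import dataclass


@dataclass
class VllmConfigStub:
    # Stand-in for the real VllmConfig field added by this PR.
    disable_flashinfer_autotune: bool = True


cfg = VllmConfigStub()

# Note the missing final "e" in the attribute name: getattr hides the
# typo and quietly returns the default instead of the configured value.
value = getattr(cfg, "disable_flashinfer_autotun", False)
print(value)  # False, even though the real field is True

# Direct attribute access surfaces the same mistake as an error.
try:
    cfg.disable_flashinfer_autotun
except AttributeError as exc:
    print("caught:", exc)
```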
Purpose
FlashInfer autotuning can take a long time to complete during engine initialization. This PR introduces a flag to disable it, letting users trade the autotuning optimization for faster startup.
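The gating logic can be sketched as follows. This is a simplified, runnable illustration of the behavior described in the PR, not the actual code in `vllm/model_executor/warmup/kernel_warmup.py`; the `Worker` and `VllmConfig` classes here are minimal stand-ins, and the field name `disable_flashinfer_autotune` is taken from the diff:

```python
from dataclasses import dataclass


@dataclass
class VllmConfig:
    # Minimal stand-in: only the field relevant to this PR.
    disable_flashinfer_autotune: bool = False


@dataclass
class Worker:
    vllm_config: VllmConfig


def flashinfer_autotune_warmup(worker: Worker) -> str:
    # Skip the (potentially slow) autotune pass when the flag is set,
    # accepting unoptimized kernels in exchange for faster startup.
    if worker.vllm_config.disable_flashinfer_autotune:
        return "skipped"
    # ... FlashInfer autotuning would run here ...
    return "autotuned"


print(flashinfer_autotune_warmup(Worker(VllmConfig(disable_flashinfer_autotune=True))))
print(flashinfer_autotune_warmup(Worker(VllmConfig())))
```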
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.