@mmangkad mmangkad commented Jan 7, 2026

Purpose

FlashInfer autotuning can take a long time to complete during initialization. This PR adds a flag to disable it, so users who are willing to forgo the optimization can skip this step and start up faster.

Test Plan

Test Result



Signed-off-by: Mohammad Miadh Angkad <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a feature flag to disable FlashInfer autotuning, which is a valuable addition for users prioritizing faster startup times. The implementation is clean and correctly propagates the new configuration from the command-line interface to the core logic. I have one suggestion regarding the robustness of accessing this new configuration parameter in vllm/model_executor/warmup/kernel_warmup.py to prevent potential silent failures in the future.

Comment on lines +39 to +41
    disable_flashinfer_autotune = getattr(
        worker.vllm_config, "disable_flashinfer_autotune", False
    )

high

Since disable_flashinfer_autotune is now a defined attribute in the VllmConfig dataclass, it will always be present. Using getattr with a default value can mask potential configuration issues. For example, if the attribute name is misspelled or refactored in the future, this code would silently default to False instead of raising an AttributeError, making debugging more difficult. Direct attribute access is safer and clearer in this case.

    disable_flashinfer_autotune = worker.vllm_config.disable_flashinfer_autotune
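To illustrate the concern, here is a minimal, self-contained sketch of how a `getattr` default can hide a misspelled attribute name while direct access surfaces it. `DemoConfig` and the two helper functions are hypothetical stand-ins written for this example; they are not the actual `VllmConfig` class or any vLLM code.

```python
from dataclasses import dataclass


@dataclass
class DemoConfig:
    # Hypothetical stand-in for the real config class; only this field matters here.
    disable_flashinfer_autotune: bool = False


def read_with_getattr(config, name):
    # With a default, a wrong attribute name does not fail: it just yields False.
    return getattr(config, name, False)


def read_directly(config):
    # Direct access raises AttributeError if the field is renamed or removed.
    return config.disable_flashinfer_autotune


config = DemoConfig(disable_flashinfer_autotune=True)

# Misspelled name ("autotunee"): the configured True is silently replaced by False.
masked = read_with_getattr(config, "disable_flashinfer_autotunee")

# Direct access with the correct name returns the configured value.
direct = read_directly(config)
```

In this sketch the user asked for autotuning to be disabled, but the misspelled lookup reports `False`, so the request would be ignored without any error. Accessing the misspelled name directly would raise `AttributeError` instead.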


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
