
Conversation

@R3hankhan123 (Contributor) commented on Jan 7, 2026

Purpose

Extend the existing VLLMValidationError implementation to populate the param field for additional validation scenarios.

Changes:

  • Add _ValueError in sampling_params.py to avoid circular imports with protocol.py (a minimal sketch of this pattern is shown below)
  • Update sampling parameter validations to use _ValueError with parameter metadata: temperature, top_p, logprobs, prompt_logprobs, truncate_prompt_tokens, bad_words
  • Update V1 engine input_processor.py validations: logprobs, prompt_logprobs, logit_bias
  • Extend error handling in serving_engine.py to use duck-typing for _ValueError
  • Pass exception objects (not strings) to create_error_response to preserve metadata

New validations now populate the param field:

  • temperature, top_p, logprobs, prompt_logprobs, logit_bias, truncate_prompt_tokens
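
For illustration, here is a minimal sketch of the pattern described above: a ValueError subclass defined locally in sampling_params.py that records which parameter failed and with what value. The names and fields follow this description, not necessarily the merged code:

    # Illustrative sketch only; attribute names are assumptions based on the
    # PR description, not necessarily the merged vLLM code.
    class _ValueError(ValueError):
        """ValueError that records which sampling parameter failed validation."""

        def __init__(self, message: str, *, param: str | None = None,
                     value: object = None) -> None:
            super().__init__(message)
            self.param = param
            self.value = value

    def _check_temperature(temperature: float) -> None:
        # Raising with metadata lets the serving layer populate `error.param`.
        if temperature < 0.0:
            raise _ValueError(
                f"temperature must be non-negative, got {temperature}.",
                param="temperature",
                value=temperature,
            )

Because such a class only subclasses the built-in ValueError, sampling_params.py does not need to import anything from protocol.py, which is how the circular import is avoided.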

Test Plan

  1. Build the vLLM image and run the server
  2. Send curl requests to the server using a script (a Python sketch of the script is shown below)
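
The test script itself is not part of the PR; a minimal Python equivalent of the curl requests used in the results below might look like this (it assumes the server from step 1 is listening on localhost:8000 and that the requests package is available):

    # Hypothetical test script; payloads mirror the curl requests shown in the results.
    import json
    import requests

    BASE_URL = "http://localhost:8000/v1/completions"

    TEST_CASES = [
        ("Invalid logprobs", {"model": "gpt2", "prompt": "Hello", "logprobs": 25}),
        ("Invalid temperature", {"model": "gpt2", "prompt": "Hello", "temperature": -1.0}),
        ("Invalid top_p", {"model": "gpt2", "prompt": "Hello", "top_p": 2.0}),
        ("Invalid prompt_logprobs", {"model": "gpt2", "prompt": "Hello", "prompt_logprobs": -3}),
        ("Invalid logit_bias", {"model": "gpt2", "prompt": "Hello", "logit_bias": {"999999": 5}}),
    ]

    for name, payload in TEST_CASES:
        resp = requests.post(BASE_URL, json=payload, timeout=30)
        print(f"[{name}] HTTP {resp.status_code}")
        print(json.dumps(resp.json(), indent=4))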

Test Result

  1. Server startup
rehankhan@Rehans-MacBook-Pro vllm % docker run --rm --env VLLM_USE_V1=1 -p 8000:8000 local  --model=gpt2 --port=8000
INFO 01-06 13:26:11 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 01-06 13:26:14 [argparse_utils.py:195] With `vllm serve`, you should provide the model as a positional argument or in a config file instead of via the `--model` option. The `--model` option will be removed in v0.13.
(APIServer pid=1) INFO 01-06 13:26:14 [api_server.py:1278] vLLM API server version 0.14.0rc1.dev284+g6ebb66cce.d20260106
(APIServer pid=1) INFO 01-06 13:26:14 [utils.py:253] non-default args: {'model_tag': 'gpt2', 'model': 'gpt2'}
(APIServer pid=1) INFO 01-06 13:26:19 [model.py:522] Resolved architecture: GPT2LMHeadModel
(APIServer pid=1) INFO 01-06 13:26:20 [model.py:1816] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=1) INFO 01-06 13:26:20 [model.py:1508] Using max model len 1024
(APIServer pid=1) WARNING 01-06 13:26:20 [cpu.py:157] VLLM_CPU_KVCACHE_SPACE not set. Using 11.72 GiB for KV cache.
(APIServer pid=1) INFO 01-06 13:26:20 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=1) INFO 01-06 13:26:20 [vllm.py:635] Disabling NCCL for DP synchronization when using async scheduling.
(APIServer pid=1) INFO 01-06 13:26:20 [vllm.py:640] Asynchronous scheduling is enabled.
INFO 01-06 13:26:27 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_DP0 pid=60) INFO 01-06 13:26:28 [core.py:96] Initializing a V1 LLM engine (v0.14.0rc1.dev284+g6ebb66cce.d20260106) with config: model='gpt2', speculative_config=None, tokenizer='gpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cpu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False), seed=0, served_model_name=gpt2, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.DYNAMO_TRACE_ONCE: 2>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'dce': True, 'size_asserts': False, 'nan_asserts': False, 'epilogue_fusion': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False}, 'local_cache_dir': None}
(EngineCore_DP0 pid=60) INFO 01-06 13:26:29 [parallel_state.py:1214] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.17.0.2:47969 backend=gloo
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=60) INFO 01-06 13:26:29 [parallel_state.py:1425] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
(EngineCore_DP0 pid=60) INFO 01-06 13:26:29 [cpu_model_runner.py:55] Starting to load model gpt2...
(EngineCore_DP0 pid=60) INFO 01-06 13:26:52 [weight_utils.py:510] Time spent downloading weights for gpt2: 20.850118 seconds
(EngineCore_DP0 pid=60) INFO 01-06 13:26:52 [weight_utils.py:550] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.07it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.07it/s]
(EngineCore_DP0 pid=60) 
(EngineCore_DP0 pid=60) INFO 01-06 13:26:52 [default_loader.py:308] Loading weights took 0.20 seconds
(EngineCore_DP0 pid=60) INFO 01-06 13:26:52 [kv_cache_utils.py:1305] GPU KV cache size: 341,248 tokens
(EngineCore_DP0 pid=60) INFO 01-06 13:26:52 [kv_cache_utils.py:1310] Maximum concurrency for 1,024 tokens per request: 333.25x
(EngineCore_DP0 pid=60) INFO 01-06 13:26:55 [cpu_model_runner.py:65] Warming up model for the compilation...
(EngineCore_DP0 pid=60) INFO 01-06 13:27:05 [cpu_model_runner.py:75] Warming up done.
(EngineCore_DP0 pid=60) INFO 01-06 13:27:05 [core.py:273] init engine (profile, create kv cache, warmup model) took 12.64 seconds
(EngineCore_DP0 pid=60) INFO 01-06 13:27:07 [vllm.py:640] Asynchronous scheduling is disabled.
(EngineCore_DP0 pid=60) WARNING 01-06 13:27:07 [vllm.py:671] Inductor compilation was disabled by user settings,Optimizations settings that are only active duringInductor compilation will be ignored.
(EngineCore_DP0 pid=60) WARNING 01-06 13:27:07 [cpu.py:157] VLLM_CPU_KVCACHE_SPACE not set. Using 11.72 GiB for KV cache.
(APIServer pid=1) INFO 01-06 13:27:07 [api_server.py:1020] Supported tasks: ['generate']
(APIServer pid=1) INFO 01-06 13:27:08 [serving_chat.py:180] Warming up chat template processing...
(APIServer pid=1) INFO 01-06 13:27:11 [chat_utils.py:599] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220] Chat template warmup failed
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220] Traceback (most recent call last):
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 199, in warmup
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]     await self._preprocess_chat(
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1233, in _preprocess_chat
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]     request_prompt = apply_hf_chat_template(
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]                      ^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]   File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/chat_utils.py", line 1826, in apply_hf_chat_template
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220]     raise ChatTemplateResolutionError(
(APIServer pid=1) ERROR 01-06 13:27:11 [serving_chat.py:220] vllm.entrypoints.chat_utils.ChatTemplateResolutionError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
(APIServer pid=1) /opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py:220: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
(APIServer pid=1)   logger.exception("Chat template warmup failed")
(APIServer pid=1) RuntimeWarning: Enable tracemalloc to get the object allocation traceback
(APIServer pid=1) INFO 01-06 13:27:12 [api_server.py:1352] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:38] Available routes are:
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 01-06 13:27:12 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
  2. Curl requests
[Test 1] Invalid logprobs - should have 'param': 'logprobs'
{
    "error": {
        "message": "Requested sample logprobs of 25, which is greater than max allowed: 20 (parameter=logprobs, value=25)",
        "type": "BadRequestError",
        "param": "logprobs",
        "code": 400
    }
}

[Test 2] Invalid temperature - should have 'param': 'temperature'
{
    "error": {
        "message": "temperature must be non-negative, got -1.0.",
        "type": "BadRequestError",
        "param": "temperature",
        "code": 400
    }
}

[Test 3] Stream options without stream - should have 'param': 'stream_options'
{
    "error": {
        "message": "1 validation error:\n  {'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, Stream options can only be defined when `stream=True`. (parameter=stream_options)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'stream': False, 'stream_options': {'include_usage': True}}, 'ctx': {'error': VLLMValidationError('Stream options can only be defined when `stream=True`.')}}\n\n  File \"/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/utils.py\", line 517, in create_completion\n    POST /v1/completions [{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, Stream options can only be defined when `stream=True`. (parameter=stream_options)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'stream': False, 'stream_options': {'include_usage': True}}, 'ctx': {'error': VLLMValidationError('Stream options can only be defined when `stream=True`.')}}]",
        "type": "Bad Request",
        "param": "stream_options",
        "code": 400
    }
}

[Test 4] Too many input tokens - should have 'param': 'input_tokens'
{
    "error": {
        "message": "This model's maximum context length is 1008 tokens. However, your request has 10002 input tokens. Please reduce the length of the input messages. (parameter=input_tokens, value=10002)",
        "type": "BadRequestError",
        "param": "input_tokens",
        "code": 400
    }
}

[Test 5] Invalid top_p - should have 'param': 'top_p'
{
    "error": {
        "message": "top_p must be in (0, 1], got 2.0.",
        "type": "BadRequestError",
        "param": "top_p",
        "code": 400
    }
}

[Test 6] Invalid logprobs (negative) - should have 'param': 'logprobs'
{
    "error": {
        "message": "1 validation error:\n  {'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, `logprobs` must be a positive value. (parameter=logprobs, value=-5)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'logprobs': -5}, 'ctx': {'error': VLLMValidationError('`logprobs` must be a positive value.')}}\n\n  File \"/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/utils.py\", line 517, in create_completion\n    POST /v1/completions [{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, `logprobs` must be a positive value. (parameter=logprobs, value=-5)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'logprobs': -5}, 'ctx': {'error': VLLMValidationError('`logprobs` must be a positive value.')}}]",
        "type": "Bad Request",
        "param": "logprobs",
        "code": 400
    }
}

[Test 7] Invalid prompt_logprobs - should have 'param': 'prompt_logprobs'
{
    "error": {
        "message": "1 validation error:\n  {'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, `prompt_logprobs` must be a positive value or -1. (parameter=prompt_logprobs, value=-3)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'prompt_logprobs': -3}, 'ctx': {'error': VLLMValidationError('`prompt_logprobs` must be a positive value or -1.')}}\n\n  File \"/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/utils.py\", line 517, in create_completion\n    POST /v1/completions [{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, `prompt_logprobs` must be a positive value or -1. (parameter=prompt_logprobs, value=-3)', 'input': {'model': 'gpt2', 'prompt': 'Hello', 'prompt_logprobs': -3}, 'ctx': {'error': VLLMValidationError('`prompt_logprobs` must be a positive value or -1.')}}]",
        "type": "Bad Request",
        "param": "prompt_logprobs",
        "code": 400
    }
}

[Test 8] Invalid logit_bias - should have 'param': 'logit_bias'
{
    "error": {
        "message": "token_id(s) [999999] in logit_bias contain out-of-vocab token ids. Vocabulary size: 50257 (parameter=logit_bias, value=[999999])",
        "type": "BadRequestError",
        "param": "logit_bias",
        "code": 400
    }
}
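
With the param field populated, a client can branch on the offending parameter instead of parsing the message text. A hypothetical example against the responses above (assumes the requests package):

    # Hypothetical client-side handling of the structured error payload shown above.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "gpt2", "prompt": "Hello", "temperature": -1.0},
        timeout=30,
    )
    if resp.status_code == 400:
        err = resp.json()["error"]
        # err["param"] names the rejected field, e.g. "temperature".
        print(f"Rejected parameter {err['param']}: {err['message']}")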

@gemini-code-assist (bot) left a comment

Code Review

This pull request extends the VLLMValidationError to provide more detailed error messages for several validation parameters, improving the API's error reporting. The changes are well-structured, introducing a local _ValueError to prevent circular imports and updating validation logic across sampling_params.py and input_processor.py. The error handling in serving_engine.py is also updated to correctly process these new exception types. Overall, this is a good improvement. I've found one issue in serving_engine.py related to a redundant logical condition that should be simplified for clarity and maintainability.
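
For reference, the duck-typing mentioned in the PR description and in this review amounts to reading optional attributes off the raised exception instead of importing its class; a rough sketch, not the actual serving_engine.py code:

    # Rough sketch of duck-typed metadata extraction; the real code may differ.
    def _extract_param_metadata(exc: Exception) -> tuple[str | None, object]:
        # Any exception exposing `param`/`value` attributes is treated as a
        # validation error, so no import of _ValueError is needed here.
        return getattr(exc, "param", None), getattr(exc, "value", None)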

@R3hankhan123 force-pushed the enhance-logging-v2 branch 2 times, most recently from a0d15af to 6f3890b on January 7, 2026 at 10:36
@R3hankhan123 force-pushed the enhance-logging-v2 branch 2 times, most recently from baf6058 to 67a77b9 on January 7, 2026 at 10:42

@DarkLight1337 (Member) left a comment

Thanks, LGTM now

@DarkLight1337 enabled auto-merge (squash) on January 7, 2026 at 10:43
@github-actions (bot) added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jan 7, 2026

Extend existing VLLMValidationError implementation to populate the param field for additional validation scenarios.

Changes:
- Create vllm/exceptions.py module to hold VLLMValidationError and avoid
  circular import between protocol.py and sampling_params.py
- Move VLLMValidationError from protocol.py to the new exceptions module
- Update sampling_params.py to import and use VLLMValidationError for
  7 validation cases: temperature, top_p, max_tokens, logprobs,
  prompt_logprobs, truncate_prompt_tokens, and bad_words
- Update v1/engine/input_processor.py to use VLLMValidationError for
  logprobs, prompt_logprobs, and logit_bias validations
- Simplify error handling in serving_engine.py by removing duck-typing
  checks that are no longer needed
- Update inline imports in serving_engine.py and api_server.py to use
  the new exceptions module

New validations now populate the `param` field:
- temperature, top_p, logprobs, prompt_logprobs, logit_bias, truncate_prompt_tokens, bad_words

Signed-off-by: Rehan Khan <[email protected]>
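
For reference, with VLLMValidationError importable from a shared module, the serving layer can map it onto the JSON error bodies shown in the test results with a plain isinstance check. A self-contained sketch follows (stand-in class included; not the actual vllm/exceptions.py or serving_engine.py code):

    # Self-contained illustration; the real vllm/exceptions.py and
    # serving_engine.py may differ in details.
    from http import HTTPStatus

    class VLLMValidationError(ValueError):
        """Stand-in mirroring the class described in the commit message."""

        def __init__(self, message: str, *, param: str | None = None,
                     value: object = None) -> None:
            super().__init__(message)
            self.param = param
            self.value = value

    def to_error_response(exc: Exception) -> dict:
        # A direct isinstance check replaces the earlier duck-typing.
        param = exc.param if isinstance(exc, VLLMValidationError) else None
        return {
            "error": {
                "message": str(exc),
                "type": "BadRequestError",
                "param": param,
                "code": HTTPStatus.BAD_REQUEST.value,
            }
        }

    # Example:
    # to_error_response(VLLMValidationError("top_p must be in (0, 1], got 2.0.",
    #                                       param="top_p", value=2.0))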
auto-merge was automatically disabled January 7, 2026 12:56

Head branch was pushed to by a user without write access

@DarkLight1337 enabled auto-merge (squash) on January 7, 2026 at 13:23
@DarkLight1337 merged commit 1ab055e into vllm-project:main on Jan 7, 2026
49 checks passed
@R3hankhan123 deleted the enhance-logging-v2 branch on January 7, 2026 at 16:10
dangoldbj pushed a commit to dangoldbj/vllm that referenced this pull request Jan 7, 2026