CVE-2026-44223
vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
CVSS Vector Breakdown
AV:NAttack VectorAC:LAttack ComplexityPR:LPrivileges RequiredUI:NUser InteractionS:UScopeC:NConfidentialityI:NIntegrityA:HAvailabilityWeaknesses
Affected Products
Exploitability
Attack Graph
Click technique nodes to view MITRE ATT&CK details. Scroll to zoom, drag to pan.
MITRE ATT&CK
1 techniqueReferences
Timeline
Unlock Complete Vulnerability Intelligence
Get the full picture for CVE-2026-44223 and every CVE in our database. Create a free account — no credit card required.
Create Free Account