CVE-2026-53923
vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
CVSS Vector Breakdown
AV:NAttack VectorAC:LAttack ComplexityPR:NPrivileges RequiredUI:NUser InteractionS:UScopeC:HConfidentialityI:NIntegrityA:NAvailabilityWeaknesses
Affected Products
Exploitability
Attack Graph
Click technique nodes to view MITRE ATT&CK details. Scroll to zoom, drag to pan.
MITRE ATT&CK
1 techniqueReferences
Timeline
Unlock Complete Vulnerability Intelligence
Get the full picture for CVE-2026-53923 and every CVE in our database. Create a free account — no credit card required.
Create Free Account