CVE-2026-53923: vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
vLLM’s GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) truncate tensor dimensions to int, so only part of the tensor is processed. The output tensor is allocated at full size via torch::empty, which does not initialize memory, but the dequantize CUDA kernel writes only the truncated number of elements. The unwritten portion of the output tensor retains whatever was previously in that GPU memory. In multi-tenant inference deployments, this residual memory may contain tensor data from other users’ inference requests, which makes the bug an information disclosure vulnerability.
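The arithmetic behind the truncation can be modeled in a few lines of Python. This is only a sketch: it does not reproduce the code in gguf_kernel.cu, and the helper names (wrap_signed, unfilled_elements, simulate) are hypothetical. It assumes the element count is narrowed to a signed 32-bit integer and that a negative result means no elements are processed.

```python
def wrap_signed(n, bits=32):
    """Model a narrowing conversion to a signed two's-complement integer."""
    half = 1 << (bits - 1)
    return ((n + half) % (1 << bits)) - half


def unfilled_elements(rows, cols, bits=32):
    """Elements left unwritten if the element count is narrowed to `bits` bits.

    Assumption: a negative truncated count means nothing is processed.
    """
    total = rows * cols
    processed = max(0, min(total, wrap_signed(total, bits)))
    return total - processed


def simulate(rows, cols, bits, stale=0xAA):
    """Tiny stand-in for an uninitialized allocation plus a truncated kernel."""
    total = rows * cols
    out = [stale] * total          # "uninitialized": still holds earlier data
    processed = max(0, min(total, wrap_signed(total, bits)))
    for i in range(processed):     # only the truncated range is overwritten
        out[i] = 0
    return out


if __name__ == "__main__":
    # 65536 * 65537 = 2**32 + 65536 elements, which narrows to 65536 as int32.
    print(unfilled_elements(65536, 65537))
    # Scaled-down analogue with an 8-bit counter: 260 elements narrow to 4.
    print(simulate(26, 10, bits=8).count(0xAA))
```

In the scaled-down run, every element past the truncated count still holds the stale marker value, which corresponds to the residual GPU memory described above.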