CVE-2026-53923: vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving

June 17, 2026

vLLM’s GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) truncate tensor dimensions during an integer conversion, so only part of the tensor is processed. The output tensor is allocated at full size with torch::empty, which leaves its memory uninitialized, but the dequantize CUDA kernel writes only the truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users’ inference requests, which constitutes information disclosure.
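The mechanism described above can be illustrated with a small simulation. This is only a sketch, not the vLLM code: it assumes the truncation behaves like a C-style narrowing to a signed integer, and the names truncate_to_signed and simulate_dequantize, along with the 8-bit width used to keep the demo small, are hypothetical. A plain list pre-filled with "stale" values stands in for memory returned by torch::empty.

```python
def truncate_to_signed(value: int, bits: int = 32) -> int:
    """Mimic a C-style narrowing conversion to a signed integer of `bits` width."""
    mask = (1 << bits) - 1
    low = value & mask
    # Reinterpret the remaining low bits as a two's complement number.
    return low - (1 << bits) if low >= (1 << (bits - 1)) else low


def simulate_dequantize(stale, num_elements, bits):
    """Write only the truncated element count; the rest keeps its old contents.

    `stale` stands in for an uninitialized allocation that still holds
    whatever data was previously stored in that memory (assumption for
    illustration only).
    """
    out = list(stale)
    processed = max(truncate_to_signed(num_elements, bits), 0)
    for i in range(min(processed, len(out))):
        out[i] = 0.0  # placeholder for a real dequantized value
    return out, processed


# A 64-bit element count just above 2**32 collapses to a tiny 32-bit value.
wrapped = truncate_to_signed(2**32 + 5, bits=32)

# Small-scale demo with a hypothetical 8-bit counter: 260 elements wrap to 4.
stale_buffer = [9.9] * 260
output, processed = simulate_dequantize(stale_buffer, 260, bits=8)
leaked = output[processed:]  # elements that were never overwritten
```

In this sketch, everything in `leaked` is data the caller never wrote, which mirrors how the unfilled part of the output tensor can expose earlier GPU memory contents.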

References

  • github.com/advisories/GHSA-5jv2-g5wq-cmr4
  • github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e
  • github.com/vllm-project/vllm/pull/44971
  • github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
  • nvd.nist.gov/vuln/detail/CVE-2026-53923

Affected versions

All versions starting from 0.5.5 up to 0.23.0
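A quick way to check an installed version against this range is sketched below. The advisory text does not say whether the upper bound 0.23.0 is inclusive, so the sketch conservatively assumes it is. The helper names parse_version and is_affected are hypothetical, and only plain numeric versions such as 0.6.3 are handled (no pre-release or local suffixes).

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '0.10.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Bounds taken from the advisory; inclusivity of the upper bound is an assumption.
AFFECTED_FROM = parse_version("0.5.5")
AFFECTED_TO = parse_version("0.23.0")


def is_affected(installed: str) -> bool:
    """Return True if `installed` falls inside the assumed affected range."""
    return AFFECTED_FROM <= parse_version(installed) <= AFFECTED_TO
```

The installed version string could be obtained with importlib.metadata.version("vllm") and passed to is_affected. Comparing integer tuples rather than raw strings matters here, because "0.10.2" sorts before "0.5.5" as text but is the later release.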

Solution

Unfortunately, there is no solution available yet.

Impact

5.4 MEDIUM

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:L/A:N
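The 5.4 score can be reproduced from the vector string with the CVSS v3.1 base score equations. The sketch below covers only the Scope: Unchanged case that applies to this vector. The function names are hypothetical, and the metric weights are those from the CVSS v3.1 specification as best understood here, so treat the result as a cross-check rather than an authoritative calculator.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = vector.split("/")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_unchanged("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:L/A:N")
```

For this vector the impact sub-score is about 2.51 and the exploitability sub-score about 2.84, which sum to roughly 5.35 and round up to 5.4, within the MEDIUM band (4.0 to 6.9).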

Weakness

  • CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
  • CWE-681: Incorrect Conversion between Numeric Types

Source file

pypi/vllm/CVE-2026-53923.yml

Page generated Thu, 18 Jun 2026 12:20:08 +0000.