CVE-2026-44223: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
The extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty).
A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. The crash is deterministic and immediate — no concurrency, race condition, or special workload is required.
References
Code Behaviors & Features
Detect and mitigate CVE-2026-44223 with GitLab Dependency Scanning
Secure your software supply chain by verifying that all open source dependencies used in your projects contain no disclosed vulnerabilities. Learn more about Dependency Scanning →