Advisories for Pypi/Docling package

2026

Docling: Unsafe Zip Extraction in EasyOCR Model Download

In versions < 2.91.0, the EasyOCR model download functionality extracted ZIP archives without validating member paths, enabling Zip Slip attacks. If an attacker could compromise the model download source (via supply chain attack, DNS spoofing, or MITM), they could write arbitrary files to any location writable by the process, potentially achieving: remote code execution by overwriting Python files or system binaries; persistent backdoors by modifying startup scripts or SSH keys; …
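As an illustration of the kind of member-path check described above, the following sketch validates every ZIP entry against the destination directory before writing anything. It is not Docling's actual fix: the function name safe_extract_zip and the policy of rejecting the whole archive on the first bad entry are assumptions made for this example.

```python
import os
import zipfile


def safe_extract_zip(archive_path: str, dest_dir: str) -> list[str]:
    """Extract archive_path into dest_dir, refusing members that escape it.

    Hypothetical helper for illustration only.
    """
    dest_root = os.path.realpath(dest_dir)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        # First pass: validate every member before anything touches the disk.
        for info in members:
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # The resolved target must stay inside the destination root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe member path: {info.filename!r}")
        # Second pass: extract only after the whole archive passed the check.
        for info in members:
            extracted.append(zf.extract(info, dest_root))
    return extracted
```

Rejecting the archive outright, rather than silently rewriting suspicious names, makes a tampered download visible to the caller instead of partially installing it.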

Docling: Unsafe XML Entity Expansion in USPTO Patent Backend

The USPTO patent XML parser used the standard xml.sax.parseString() without protection against XML External Entity (XXE) attacks. An attacker could craft malicious USPTO patent XML files with external entity references in order to: read arbitrary files from the server filesystem; perform Server-Side Request Forgery (SSRF) attacks; or cause denial of service through entity expansion (Billion Laughs attack). The vulnerability affects three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x.
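One conservative way to close off both external entities and entity expansion is to refuse any document that declares a DTD before passing it to the real parser. The sketch below does this with the standard library's expat bindings. The names ForbiddenDTDError and assert_no_doctype are hypothetical, and this is an illustrative pre-check, not the change Docling shipped.

```python
import xml.parsers.expat


class ForbiddenDTDError(ValueError):
    """Raised when an XML document contains a DOCTYPE declaration."""


def assert_no_doctype(data: bytes) -> None:
    """Scan data with expat and raise if a DOCTYPE is declared.

    Entities (internal or external) can only be defined inside a DTD,
    so rejecting the DOCTYPE blocks XXE and Billion Laughs payloads.
    """
    parser = xml.parsers.expat.ParserCreate()

    def _reject(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTDError(f"DOCTYPE {name!r} is not allowed")

    # expat calls this handler as soon as it sees "<!DOCTYPE ...".
    parser.StartDoctypeDeclHandler = _reject
    parser.Parse(data, True)
```

A caller would run assert_no_doctype(data) first and only then hand the same bytes to xml.sax.parseString(). This trades compatibility for safety: legitimate documents that carry a DOCTYPE would also be refused.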

Docling: Unsafe URI and Path Handling in HTML Backend

The HTML backend did not sufficiently validate the resources it was asked to load: it accepted file:// URIs, enabling local file system access when enable_local_fetch=True; its path resolution allowed traversal outside the intended directories via ../ sequences and absolute paths; it did not block internal network resources when enable_remote_fetch=True; it did not validate HTTP redirects, which could lead to unintended schemes; and it imposed no resource limits on remote image downloads or data: URIs.
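To make the scheme and internal-network checks concrete, here is a minimal sketch of a URL filter that could run before each remote fetch and again on every redirect target. The function name is_fetch_allowed and the exact block list are assumptions for illustration. The sketch inspects only the URL text; a production check would also have to verify the addresses a hostname resolves to at connection time.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}   # assumption: no file:, ftp:, etc.
BLOCKED_HOSTNAMES = {"localhost"}     # assumption: minimal name block list


def is_fetch_allowed(url: str) -> bool:
    """Return True only for http(s) URLs that do not target internal hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to the hostname block list.
        # DNS results are NOT checked here (see the note above).
        return host.lower() not in BLOCKED_HOSTNAMES
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

Applying the same function to each redirect location, instead of letting the HTTP client follow redirects automatically, addresses the redirect-to-unintended-scheme case listed above.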

Docling: Unsafe Playwright-based HTML Rendering

In versions >= 2.82.0, < 2.91.0, if the HTML backend was explicitly configured for rendering (the rendering option is disabled by default), the Playwright-based rendering feature could allow JavaScript execution and unrestricted network access when processing untrusted HTML documents. An attacker could craft malicious HTML that executes arbitrary JavaScript in the rendering context or makes unauthorized network requests to internal services, potentially leading to SSRF attacks, data exfiltration, or remote code …

Docling: Unsafe Archive Extraction and XML Parsing in METS-GBS Backend

The METS-GBS backend's XML parsing and the input document format detection lacked security controls, enabling: XML External Entity (XXE) attacks to read local files or cause denial of service; decompression bombs (zip bombs) to exhaust memory and disk space; and unbounded archive extraction consuming system resources. An attacker could craft malicious METS-GBS archives that, when processed, could read sensitive files, exhaust system resources, or cause application crashes.
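The resource-exhaustion side of this advisory can be illustrated with a pre-flight check that walks a tar archive's headers and enforces caps on member count and total uncompressed size before anything is extracted. The function name check_tar_limits and both default limits are assumptions chosen for this sketch, not values taken from Docling.

```python
import tarfile

MAX_MEMBERS = 10_000                  # hypothetical cap on entries
MAX_TOTAL_BYTES = 500 * 1024 * 1024   # hypothetical cap: 500 MiB unpacked


def check_tar_limits(
    path: str,
    max_members: int = MAX_MEMBERS,
    max_total_bytes: int = MAX_TOTAL_BYTES,
) -> tuple[int, int]:
    """Return (member_count, total_bytes) or raise ValueError on a violation."""
    count = 0
    total = 0
    # "r:*" lets tarfile detect the compression (gz, bz2, xz) transparently.
    with tarfile.open(path, "r:*") as tf:
        for member in tf:
            count += 1
            if count > max_members:
                raise ValueError("archive has too many members")
            # Only plain files and directories are expected in this format;
            # links and device nodes are refused outright.
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type: {member.name!r}")
            total += member.size
            if total > max_total_bytes:
                raise ValueError("archive expands beyond the size limit")
    return count, total
```

Because the loop stops at the first violation, an oversized archive is abandoned as soon as the running total crosses the cap rather than after the whole stream has been decompressed.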

Docling: Potential Path Traversal via LaTeX \includegraphics and \input Commands

The LaTeX backend's handling of \includegraphics, \input, and \include commands lacked path containment validation. Attackers could craft malicious LaTeX documents with path traversal sequences (e.g., ../../../etc/passwd) to: read arbitrary files from the file system accessible to the process; include sensitive files in the converted document output; and potentially access configuration files, credentials, or other sensitive data.
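Path containment for included files can be sketched as follows: resolve the requested path against the document's base directory and refuse anything that lands outside it. The helper name resolve_include is hypothetical, and the example assumes Python 3.9 or later for Path.is_relative_to. It shows the general technique rather than the backend's actual implementation.

```python
from pathlib import Path


def resolve_include(base_dir: str, requested: str) -> Path:
    """Resolve a path named in \\input or \\includegraphics inside base_dir.

    Raises ValueError if the resolved path escapes base_dir.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards base, and resolve() collapses
    # any ../ segments and follows symlinks, so both tricks are caught below.
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"include path escapes base directory: {requested!r}")
    return candidate
```

For example, resolve_include(doc_dir, "figs/plot.png") returns a path under doc_dir, while resolve_include(doc_dir, "../../../etc/passwd") raises ValueError.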

Docling's METS GBS backend is vulnerable to XML Entity Expansion (XXE) attacks

Docling's METS GBS backend is vulnerable to XML Entity Expansion (XXE) attacks through version 2.61.0. The backend extracts and validates XML files from .tar.gz archives using etree.fromstring() without disabling entity resolution. An attacker can craft a malicious XML file with nested entity definitions (XML Bomb) and package it into a .tar.gz archive. When processed by Docling, the exponential expansion of entities during XML parsing leads to excessive resource consumption, resulting in …

Docling's JATS XML backend is vulnerable to XML Entity Expansion (XXE) attacks

Docling's JATS XML backend is vulnerable to XML Entity Expansion (XXE) attacks through version 2.61.0. The backend uses etree.parse() to parse XML files without disabling entity resolution. An attacker can craft a malicious XML file containing a nested entity expansion payload (XML Bomb). When processed by Docling, the exponential expansion of entities leads to excessive resource consumption, resulting in a denial of service (DoS) condition on the system running the Docling …