In versions < 2.91.0, The EasyOCR model download functionality extracted ZIP archives without validating member paths, enabling Zip Slip attacks. If an attacker could compromise the model download source (via supply chain attack, DNS spoofing, or MITM), they could write arbitrary files to any location writable by the process, potentially achieving: Remote code execution by overwriting Python files or system binaries Persistent backdoors by modifying startup scripts or SSH keys …
The USPTO patent XML parser used the standard xml.sax.parseString() without protection against XML External Entity (XXE) attacks. An attacker could craft malicious USPTO patent XML files with external entity references that could: Read arbitrary files from the server filesystem Perform Server-Side Request Forgery (SSRF) attacks Cause denial of service through entity expansion (Billion Laughs attack) The vulnerability affects three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x.
The HTML backend did not perform sufficient validation during resource handling: Accepted file:// URIs enabling local file system access when enable_local_fetch=True Path resolution allowed traversal outside intended directories via ../ sequences and absolute paths Did not block internal network resources under enable_remote_fetch=True HTTP redirects were not validated, potentially redirecting to unintended schemes No resource limits for remote image downloads and data: URIs
In versions >= 2.82.0, < 2.91.0, if the HTML backend was explicitly configured for rendering (rendering option by default deactivated), then the Playwright-based rendering feature could allow JavaScript execution and unrestricted network access when processing untrusted HTML documents. An attacker could craft malicious HTML that executes arbitrary JavaScript in the rendering context or makes unauthorized network requests to internal services, potentially leading to SSRF attacks, data exfiltration, or remote code …
The METS-GBS backend's XML parsing and the input document format detection lacked security controls, enabling: XML External Entity (XXE) attacks to read local files or cause denial of service Decompression bombs (zip bombs) to exhaust memory and disk space Unbounded archive extraction consuming system resources An attacker could craft malicious METS-GBS archives that, when processed, could read sensitive files, exhaust system resources, or cause application crashes.
The LaTeX backend's handling of \includegraphics, \input, and \include commands lacked path containment validation. Attackers could craft malicious LaTeX documents with path traversal sequences (e.g., ../../../etc/passwd) to: Read arbitrary files from the file system accessible to the process Include sensitive files in the converted document output Potentially access configuration files, credentials, or other sensitive data
Docling's METS GBS backend is vulnerable to XML Entity Expansion (XXE) attacks thru 2.61.0. The backend extracts and validates XML files from .tar.gz archives using etree.fromstring() without disabling entity resolution. An attacker can craft a malicious XML file with nested entity definitions (XML Bomb) and package it into a .tar.gz archive. When processed by Docling, the exponential expansion of entities during XML parsing leads to excessive resource consumption, resulting in …
Docling's JATS XML backend is vulnerable to XML Entity Expansion (XXE) attacks thru 2.61.0. The backend uses etree.parse() to parse XML files without disabling entity resolution. An attacker can craft a malicious XML file containing a nested entity expansion payload (XML Bomb). When processed by Docling, the exponential expansion of entities leads to excessive resource consumption, resulting in a denial of service (DoS) condition on the system running the Docling …