CVE-2021-43854: Inefficient Regular Expression Complexity in nltk (word_tokenize, sent_tokenize)

January 6, 2022 (updated September 26, 2024)

The vulnerability affects PunktSentenceTokenizer, sent_tokenize and word_tokenize. Anyone using this class or either of these functions is exposed to a Regular Expression Denial of Service (ReDoS) attack: a specially crafted long input to any of them causes a significant increase in execution time. The following example demonstrates the effect:

from nltk.tokenize import word_tokenize

n = 8
for length in [10**i for i in range(2, n)]:
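
One way to observe this kind of slowdown is a small timing harness that feeds a tokenizer progressively longer inputs. The sketch below is an assumption rather than the advisory's own code: the helper name time_tokenizer and the payload (a single character repeated many times) are hypothetical choices made for illustration.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_tokenizer(
    tokenize: Callable[[str], object],
    lengths: Iterable[int],
    char: str = "a",
) -> List[Tuple[int, float]]:
    """Time `tokenize` on inputs made of `char` repeated `length` times."""
    timings = []
    for length in lengths:
        text = char * length  # hypothetical payload: one repeated character
        start = time.perf_counter()
        tokenize(text)
        timings.append((length, time.perf_counter() - start))
    return timings


# Usage against nltk (needs nltk and its Punkt data installed; on an
# affected version the larger inputs may run for a very long time):
#
#     from nltk.tokenize import word_tokenize
#     lengths = [10**i for i in range(2, 8)]
#     for length, seconds in time_tokenizer(word_tokenize, lengths):
#         print(f"length {length}: {seconds:.4f}s")
```

Passing the tokenizer in as a parameter keeps the harness independent of nltk, so the same measurement can be repeated before and after an upgrade.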

References

  • github.com/advisories/GHSA-f8m6-h2c7-8h9x
  • github.com/nltk/nltk
  • github.com/nltk/nltk/commit/1405aad979c6b8080dbbc8e0858f89b2e3690341
  • github.com/nltk/nltk/issues/2866
  • github.com/nltk/nltk/pull/2869
  • github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x
  • github.com/pypa/advisory-database/tree/main/vulns/nltk/PYSEC-2021-859.yaml
  • nvd.nist.gov/vuln/detail/CVE-2021-43854

Affected versions

All versions before 3.6.6

Fixed versions

  • 3.6.6

Solution

Upgrade to version 3.6.6 or above.
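
To check whether an environment still needs the upgrade, the installed nltk version can be compared against 3.6.6. The following is a minimal sketch using only the standard library. The helper names (release_tuple, is_affected, installed_nltk_version) are hypothetical, and the parser only reads the leading numeric components, so pre-release suffixes such as "rc1" are ignored. A dedicated version parser would be more robust.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_VERSION = (3, 6, 6)  # first fixed release, per this advisory


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '3.6.5' -> (3, 6, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """Return True if `version` is older than the fixed release."""
    release = release_tuple(version)
    # Pad short versions so that (3, 7) compares as (3, 7, 0).
    padded = release + (0,) * (len(FIXED_VERSION) - len(release))
    return padded < FIXED_VERSION


def installed_nltk_version() -> Optional[str]:
    """Return the installed nltk version string, or None if nltk is absent."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None
```

For example, is_affected("3.6.5") is True while is_affected("3.6.6") and is_affected("3.7") are False. The result of installed_nltk_version() can be passed to is_affected once it is known not to be None.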

Impact

7.5 HIGH

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
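
The 7.5 score can be reproduced from this vector with the CVSS v3.1 base score equations for an unchanged scope. The sketch below assumes the metric weights published in the v3.1 specification. The function and constant names are illustrative, and only the metric values that appear in this vector are covered.

```python
import math

# Metric weights from the CVSS v3.1 specification for the values in this vector.
AV_NETWORK = 0.85  # AV:N
AC_LOW = 0.77      # AC:L
PR_NONE = 0.85     # PR:N
UI_NONE = 0.85     # UI:N
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}  # confidentiality, integrity, availability


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(c: str, i: str, a: str) -> float:
    """Base score for S:U with AV:N/AC:L/PR:N/UI:N and the given C/I/A values."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV_NETWORK * AC_LOW * PR_NONE * UI_NONE
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(c="N", i="N", a="H")
print(score)  # 7.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 3.89. Their sum, about 7.48, rounds up to 7.5, which falls in the High band (7.0 to 8.9).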

Weakness

  • CWE-400: Uncontrolled Resource Consumption

Source file

pypi/nltk/CVE-2021-43854.yml

Page generated Wed, 14 May 2025 12:14:55 +0000.