GHSA-wvrh-2f4m-924v: ChatterBot: Symlink-Following Arbitrary Write via UbuntuCorpusTrainer
ChatterBot’s UbuntuCorpusTrainer.extract() writes to a predictable, home-rooted output directory (~/ubuntu_data/ubuntu_dialogs). It prepares that directory with a check-then-create pattern (if not os.path.exists: os.makedirs) and then calls tar.extractall(path=self.data_path). If a local attacker pre-plants a symlink to an existing directory at the predictable path, os.path.exists() follows the symlink and returns True, so makedirs is skipped. The subsequent extractall then writes the archive contents through the symlink into the attacker-chosen directory.
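The sequence above can be reproduced harmlessly inside a temporary directory. The following is a minimal sketch, not ChatterBot's actual code: naive_prepare and demo are hypothetical names, the directory layout only mirrors the path from the advisory, and a plain file write stands in for tar.extractall. It assumes a platform where unprivileged symlink creation is allowed (for example Linux or macOS).

```python
import os
import tempfile


def naive_prepare(data_path):
    """Hypothetical stand-in for the check-then-create pattern in the advisory."""
    # os.path.exists() follows symlinks, so a planted link to an existing
    # directory makes this check succeed and makedirs is never called.
    if not os.path.exists(data_path):
        os.makedirs(data_path)
    return data_path


def demo():
    """Return (file_landed_in_attacker_dir, data_path_is_symlink)."""
    with tempfile.TemporaryDirectory() as root:
        # Directory the attacker wants the victim to write into.
        attacker_dir = os.path.join(root, "attacker_target")
        os.makedirs(attacker_dir)

        # Simulated ~/ubuntu_data with a pre-planted symlink inside it.
        parent = os.path.join(root, "ubuntu_data")
        os.makedirs(parent)
        data_path = os.path.join(parent, "ubuntu_dialogs")
        os.symlink(attacker_dir, data_path)

        naive_prepare(data_path)

        # Stand-in for tar.extractall(path=data_path): write one file.
        with open(os.path.join(data_path, "payload.txt"), "w") as handle:
            handle.write("extracted content")

        landed = os.path.exists(os.path.join(attacker_dir, "payload.txt"))
        return landed, os.path.islink(data_path)


if __name__ == "__main__":
    print(demo())
```

The write succeeds without any error or warning, which is why the check-then-create step gives no protection once the path is attacker-controlled.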
The existing safe_extract function validates tar member names (the zip-slip defense) but does not validate the output directory itself, so it cannot detect that self.data_path is a symlink. This is the defining distinction between the archive_extraction (zip-slip) and insecure_fs_create_toctou families.
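One way to validate the output directory itself is sketched below. This is an illustrative assumption, not the upstream fix: prepare_output_dir is a hypothetical helper. It relies on os.mkdir failing when anything, including a symlink, already occupies the path, and on os.lstat inspecting the path without following a final symlink. It does not protect against symlinks in parent components or against a directory being swapped after the check.

```python
import os
import stat


def prepare_output_dir(data_path):
    """Hypothetical helper: create data_path, or verify an existing one is safe.

    Raises RuntimeError if the path is a symlink, is not a directory,
    or (on POSIX) is owned by a different user.
    """
    parent = os.path.dirname(os.path.abspath(data_path))
    os.makedirs(parent, exist_ok=True)

    try:
        # mkdir does not follow a symlink at the final component; it raises
        # FileExistsError if anything already exists at data_path.
        os.mkdir(data_path, mode=0o700)
        return data_path
    except FileExistsError:
        pass

    # lstat examines the path itself rather than a symlink's target.
    info = os.lstat(data_path)
    if stat.S_ISLNK(info.st_mode):
        raise RuntimeError(f"refusing to extract: {data_path} is a symlink")
    if not stat.S_ISDIR(info.st_mode):
        raise RuntimeError(f"refusing to extract: {data_path} is not a directory")
    if hasattr(os, "getuid") and info.st_uid != os.getuid():
        raise RuntimeError(f"refusing to extract: {data_path} has another owner")
    return data_path
```

A caller would run prepare_output_dir(self.data_path) before extraction and keep the member-name validation in place, since the two checks cover different attack paths.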