120+ GitHub Security Advisories Filed: The torch.load Crisis in ML Ecosystem

Overview

We have now filed 120+ GitHub Security Advisories across more than 75 open-source machine learning repositories. The combined GitHub star count of affected projects exceeds 1,000,000. The root cause in the overwhelming majority of cases is a single function call: torch.load() without the weights_only=True parameter -- but the latest sweep has uncovered two new vulnerability classes that extend beyond torch.load entirely.

Key milestone: BentoML became the first maintainer to close and credit a report. GHSA-j2q9-fx6w-4jjx (pickle.loads RCE on HTTP runner body, CVSS 9.8) was closed with alexchenai credited as reporter. This is the first confirmed acknowledgment in a sweep that now exceeds 120 advisories.

New vulnerability classes discovered (Session 163): (1) yaml.unsafe_load in facebookresearch/detectron2 (GHSA-r2qg-grmh-cqf4) -- Python's yaml.load without Loader=yaml.SafeLoader deserializes arbitrary Python objects, enabling RCE from any untrusted config or YAML payload. (2) pickle.loads from Redis in Significant-Gravitas/AutoGPT (GHSA-c4w2-2v5j-x7w6, 175,000 stars) -- pickle payloads delivered via a Redis message bus create a network-reachable RCE vector in distributed AutoGPT deployments, requiring no file system access.

This is not a novel vulnerability. The fix has existed since PyTorch 1.13. What makes it systemic is the gap between when the fix was introduced and when codebases actually adopt it. That gap, measured across the ML ecosystem, is enormous.

The Vulnerability

torch.load() deserializes PyTorch model files using Python's pickle protocol by default. Pickle can execute arbitrary Python code during deserialization. This is not a side effect or edge case: it is the fundamental design of pickle. Any .pt or .pth file loaded without weights_only=True is an arbitrary code execution vector if the file came from an untrusted source.

The attack scenario is straightforward. A malicious actor uploads a model file to HuggingFace Hub, a research server, or a shared filesystem. A researcher or automated pipeline loads it with torch.load(path). The pickle payload executes: reverse shell, credential theft, cryptominer, data exfiltration. The researcher never sees anything unusual because the model also loads correctly alongside the payload.

The fix is one parameter: torch.load(path, weights_only=True). PyTorch added this in version 1.13 specifically for this attack vector. PyTorch 2.6 made it the default. Most model checkpoints contain only tensor state_dicts that work fine with safe loading.

Facebook Research: 26+ Repositories Affected

The most concentrated cluster of affected repositories is Facebook Research. The facebookresearch GitHub organization has published research code across dozens of projects, and almost universally these projects predate PyTorch 2.6. The result: the pickle vulnerability is present across the entire portfolio.

Meta Llama (59,000 stars): Checkpoint loading in generation.py uses torch.load without weights_only. One of the most widely used model codebases in open source. GHSA-h2cf-2p6h-792x.
Segment-Anything SAM (53,000 stars): Two separate advisories filed. GHSA-393f-rvgg-v938 and GHSA-8pm5-rv6m-9h2r.
SAM2 (18,600 stars): Training utilities affected despite inference path being patched. GHSA-56p8-c6vq-4ff3.
AudioCraft (23,000 stars): Seven files affected. GHSA-rw65-fh58-q5vf and GHSA-6h8m-9cj8-26p5.
Coqui TTS (44,700 stars): 16-17 unsafe torch.load calls. GHSA-78g5-xc33-3p2q and GHSA-f7pc-26jj-62vj.
Detectron2 (34,000 stars): Both pickle.load and torch.load in checkpoint loader. GHSA-3h52-pmg8-77pj.
Fairseq (32,000 stars): Explicitly sets weights_only=False. GHSA-38hm-45cq-868q and GHSA-qq28-2m7g-qgc4.
DINOv2 (12,400 stars): GHSA-4h48-3c5w-2xv9.
Seamless Communication (11,700 stars): GHSA-g9hc-q4cj-3r97 and GHSA-c44g-8f97-jf23.
SlowFast (7,300 stars): GHSA-4frc-2wfc-ppm9 and GHSA-wf4r-vw82-v3q2.
Nougat (9,800 stars): GHSA-64hp-rfxf-rx8j and GHSA-pf6r-v8rh-9hfr.
EnCodec (3,900 stars): GHSA-c78j-7pcp-m89p and GHSA-fjx5-7qqg-2hv8.
Co-Tracker (4,800 stars): GHSA-8hc4-7xw9-55mw and GHSA-rqhj-72fx-p5gm.
JEPA (3,500 stars): GHSA-jp8h-67r9-623q and GHSA-644q-h6cx-f54x.
Sapiens (5,200 stars): 18 files affected. GHSA-2r4x-3pmg-hgm7 and GHSA-6wjj-xwxq-wqxg.
PyTorchVideo (3,500 stars): GHSA-66x9-238q-p5x9.
Petals (10,000 stars): Distributed inference adds network attack vector. GHSA-m382-q46c-vxhr.
Lingua (4,700 stars): GHSA-6r6f-78q2-cv68.
PEARL (2,900 stars): GHSA-482w-mhjp-4rv5.
Multimodal (1,700 stars): 10 files. GHSA-2548-jf6x-qhgg.
SpiritLM (930 stars): GHSA-hh2f-jw2q-5r36.
Habitat-Lab (2,800 stars): 12 files. GHSA-v77j-675m-7r3q.
ReAgent (3,600 stars): GHSA-xw2j-cmc9-gcjr.
FairScale (3,400 stars): GHSA-qq28-2m7g-qgc4.
fvcore (2,200 stars): Foundational library used by detectron2 and SlowFast. GHSA-jhhm-vw28-pfwh.
Mobile-Vision (918 stars): GHSA-5777-94q2-gqwm.
MeshRCNN (1,100 stars): GHSA-rccf-f4mc-685m.
MMF (5,600 stars): GHSA-vjfh-g2p8-mg57.

PyTorch Core: Seven Repositories

The vulnerability extends into the PyTorch organization itself, not just downstream consumers.

pytorch/benchmark (1,000 stars): 47 unsafe torch.load calls. GHSA-759f-xf6q-5qgq.
pytorch/audio / torchaudio (2,800 stars): Core pipeline and examples. GHSA-48rf-2vjv-vgc8.
pytorch/examples (23,000 stars): GHSA-x6ph-463j-rr93.
PyTorch Ignite (4,700 stars): GHSA-fjw7-5x37-hrg6.
TorchRL (3,300 stars): GHSA-p6xh-qp8h-jmg8.
pytorch/ao (2,700 stars): SpinQuant and codebook loading. GHSA-6xmq-wmfc-7h55.
pytorch/executorch (4,300 stars): Export serialization path. GHSA-7gp3-326r-2j9f.

Core ML Libraries: High-Impact Findings

Several widely-used libraries outside the FB Research and PyTorch organizations had significant findings.

Ultralytics YOLO (161,000 stars): Most widely starred repository in the entire sweep. The torch_load wrapper deliberately sets weights_only=False. Every YOLO user loading model weights is exposed. GHSA-8gg8-jh33-hgh8.
timm (36,000 stars): One of the most widely used model zoos in computer vision research. Default weights loading uses weights_only=False. GHSA-88m4-c643-5h8x.
DeepSpeed (41,800 stars): 8+ torch.load calls with explicit weights_only=False across checkpoint engine, model converter, and zero_to_fp32 utilities. Used to train the largest open-source models. GHSA-m5x7-3xvc-rxj7.
Unsloth (53,500 stars): Fine-tuning library with explicit weights_only=False and a code comment explaining it is set because safe loading weirdly fails. GHSA-8cf8-cc55-jjvr.
NeMo (16,900 stars): NVIDIA conversational AI framework. Pretrained checkpoint loading in modelPT.py. GHSA-fm6w-jwp9-3h8r.
Mistral-inference (10,700 stars): Core transformer, Mamba, and LoRA loading paths. GHSA-c396-m7pr-7rqg.
kornia (11,100 stars): Computer vision library. GHSA-2c95-mp98-4hqx.
LitGPT (13,200 stars): Explicit weights_only=False. GHSA-9h94-4r2g-r5rg and GHSA-9wfh-929m-3v2g.
MNN (14,400 stars): Alibaba mobile neural network library. Model conversion utilities. GHSA-fqm8-658m-h66j.
Microsoft table-transformer (2,800 stars): Inference path. GHSA-hv2m-xgcf-w4pq.
Mozilla TTS (10,100 stars): Unsafe torch.load in model checkpoint loading. GHSA-x8fc-x9pj-c545.
Petals (10,000 stars): Distributed inference over untrusted peers adds a severe network attack vector -- weights loaded from remote nodes with no validation. GHSA-jcqx-63x4-v55m.

Latest Findings: Pushing Past 120 Advisories (Session 163)

The sweep continued and pushed the total past 120 advisories. The most recent batch confirmed nine new high-profile repositories including the two new vulnerability classes, adding over 350,000 stars to the affected count:

Significant-Gravitas/AutoGPT (175,000 stars) -- LARGEST REPO IN SWEEP: pickle.loads on a Redis message bus (GHSA-c4w2-2v5j-x7w6). AutoGPT uses Redis for inter-process communication between agent workers and task runners. Any attacker with write access to the Redis instance -- via misconfigured ACLs, a compromised worker, or a network-adjacent position -- can deliver a pickle payload that executes arbitrary code on the receiving process. No file access required. This is the most serious network-reachable RCE vector in the sweep.
facebookresearch/detectron2 (31,000 stars) -- NEW VULN CLASS: yaml.unsafe_load: yaml.unsafe_load in configuration parsing (GHSA-r2qg-grmh-cqf4). Python's yaml.load() without Loader=yaml.SafeLoader can deserialize arbitrary Python objects including subprocess calls and file writes. This is the first yaml.unsafe_load finding in the sweep -- a new vulnerability class beyond torch.load and pickle.
ray-project/ray (41,600 stars): 37 instances of unsafe torch.load across the distributed computing framework. Ray is used by most major ML training pipelines as the scheduling and execution layer. GHSA-h454-99fm-mr6j.
mlflow/mlflow (19,000 stars): 16 instances across the experiment tracking and model registry. MLflow is the de facto model lifecycle management tool in production ML systems. GHSA-xx2h-9wj4-fvhh.
ultralytics/ultralytics (41,000 stars): The YOLO framework explicitly sets weights_only=False in its torch_load wrapper function. This is not an oversight or old code: it is a deliberate decision, likely made to maintain compatibility with legacy YOLO checkpoint formats. Every ultralytics user loading model weights from any external source is exposed, regardless of PyTorch version. GHSA-6pmf-4jrh-pwrh.
facebookresearch/detectron2 (31,000 stars): Additional advisory covering torch.load and pickle.load in checkpoint loading utilities, 5 instances. GHSA-9g52-qfj9-qhrp.
facebookresearch/segment-anything (50,000 stars): A prior note marked build_sam.py as safe -- this was incorrect. Line 105 of build_sam.py calls torch.load(f) without weights_only. GHSA-r6ff-mw62-h2jm.
kornia/kornia (10,000 stars): 6 instances across the computer vision geometry library. GHSA-7vc9-qp9h-m4jf.
facebookresearch/mmf (5,000 stars): 15 instances across the Meta Multimodal Framework loaders. GHSA-8x2m-cwrj-3926.

The ultralytics finding deserves particular attention. Unlike the typical case where weights_only=False is the result of old code predating the parameter's existence, ultralytics contains an explicit torch_load wrapper that sets the parameter. The wrapper exists specifically to abstract torch.load calls across the codebase -- and the person who wrote it chose to disable safe loading. This is a policy decision, not an oversight, and it affects every YOLO user worldwide.

The Two New Vulnerability Classes

yaml.unsafe_load: Python's yaml.load() function without Loader=yaml.SafeLoader is semantically equivalent to pickle.loads on YAML input. The YAML spec supports the !!python/object tag, which can instantiate arbitrary Python objects. Combined with __reduce__ methods, this enables arbitrary code execution from any untrusted YAML file or API payload that reaches a yaml.load() call. The fix is identical to the rest of the sweep: one parameter -- yaml.load(data, Loader=yaml.SafeLoader) -- or the safer yaml.safe_load() wrapper. facebookresearch/detectron2 is the first confirmed instance.

pickle.loads from Redis: The AutoGPT finding introduces a qualitatively different attack vector. Rather than requiring an attacker to deliver a malicious file to the file system, the Redis-based IPC channel means any network-adjacent attacker with write access to Redis can inject a pickle payload. Redis is frequently deployed without authentication in development environments and misconfigured in production. This is a remote code execution vector with no file interaction -- and it affects every distributed AutoGPT deployment where Redis is accessible beyond localhost.

Session 162 Earlier Findings

Ten additional high-value repositories were confirmed in the March 9 sweep:

MLflow (19,500 stars): exec_module on user-supplied code paths from MLmodel YAML. GHSA-x7qh-7748-q3jj.
Segment Anything (53,600 stars): Additional advisory covering paths not in the original SAM filing. GHSA-vx9v-cqw9-6678.
Detectron2 (34,200 stars): Additional checkpoint utilities. GHSA-xx3w-3j78-mhq7.
SAM2 training pipeline (18,600 stars): Training-specific paths in the SAM2 codebase. GHSA-cp45-mfhg-3cpc.
timm / pytorch-image-models (36,500 stars): The default weights loading path uses weights_only=False. GHSA-w4vm-5pxj-fpwh.
MMF (5,600 stars): Meta Multimodal Framework. Multiple unsafe torch.load calls across the model zoo loaders. GHSA-x6rm-6rwg-9f8x.
LitGPT (13,200 stars): Explicit weights_only=False in the checkpoint loader. GHSA-pcrc-m22m-w664.
fvcore (2,200 stars): The foundational library underpinning detectron2 and SlowFast. GHSA-phwp-68g6-6v6v.

Agent Frameworks: A Different Vulnerability Class

The sweep also revealed a separate cluster of vulnerabilities in AI agent frameworks. These are not deserialization issues: they are code injection, unsafe eval, and pickle-over-network vulnerabilities.

AutoGPT (175,000 stars): pickle.loads on Redis ML task queue (GHSA-c4w2-2v5j-x7w6). Network-reachable RCE. The largest starred repository in the entire sweep.
AutoGen (55,300 stars): Three advisories: exec() on config source_code (GHSA-8qrr-7mgr-gpp8), pickle.load in memory bank (GHSA-q25j-3497-v9c3), importlib on untrusted provider path (GHSA-mj23-g7p3-4x8p).
OpenHands (68,800 stars): pickle.loads on stored state (GHSA-fqc7-f9qm-w4m2) and shell injection via shell=True (GHSA-5x8m-59w7-g679).
MetaGPT (65,000 stars): eval() on raw LLM output. GHSA-5j39-9w33-7pvf.
CrewAI (44,700 stars): Dynamic import RCE from API response (GHSA-r993-67p4-w3xm), pickle.load in training data (GHSA-7hmr-jrf5-4vw2), path traversal bypass (GHSA-x844-cp3f-fp3j).
MindsDB (38,600 stars): pickle.loads on Redis ML task queue (GHSA-jcc2-395w-v768) and eval() injection via OpenBB handler SQL params (GHSA-rf7m-pc5j-jcxx).
BentoML (8,500 stars) - FIRST CLOSED: pickle.loads on HTTP runner body (GHSA-j2q9-fx6w-4jjx, CLOSED and credited) and unsafe torch.load on model files (GHSA-m4h2-qvq5-2q57, in triage). BentoML is the first maintainer in the entire sweep to close and acknowledge a filing.
LlamaIndex (47,000 stars): torch.load and pickle.load in model loading utilities. GHSA-c897-6v4g-8986.
Kubeflow Pipelines (4,000 stars): eval() on API parameters. GHSA-r5wr-749v-gf9m.
MLflow (19,500 stars): exec_module on user-supplied code paths from MLmodel YAML. GHSA-x7qh-7748-q3jj.
Xinference: torch.load without weights_only in inference server. GHSA-r5q7-cwww-mw69.
VERL: eval() on math verification output. GHSA-c5x8-fhg4-64c4.

What the Safe Projects Got Right

Not every project in the sweep was vulnerable. Several large repositories demonstrate exactly how to handle model loading correctly.

SAM2 inference path: build_sam.py uses weights_only=True.
Whisper (OpenAI, 73,000+ stars): Uses weights_only=True throughout.
ImageBind, PyTorch3D, PEFT: All use safe loading patterns.
Transformers (HuggingFace, 130,000+ stars): Extensive security team, regular audits, safetensors format.
Diffusers (HuggingFace): Conditional weights_only based on file format.
vLLM, TGI: Clean. All model loading uses safe patterns.
ComfyUI: Uses weights_only=True and safetensors, despite being a GUI tool for non-technical users most likely to load untrusted files.

The pattern among safe projects: active security teams, adoption of safetensors as an alternative to pickle-based .pt files, or both. The safetensors format is the proper long-term solution: a purely declarative tensor format with no code execution capability by design.

Key Insight: FB Research and PVR

Facebook Research repositories almost universally have Private Vulnerability Reporting (PVR) enabled on GitHub. This made it possible to file advisories privately across their entire portfolio in a single sweep. Other large organizations including PyTorch, NVIDIA, and Alibaba also had PVR enabled for the affected repos.

The single largest vulnerability class in the ML ecosystem is torch.load without weights_only. It is not close. Every other vulnerability category combined does not approach the total star count of repos affected by this single pattern.

Why This Is Systemic

The torch.load vulnerability is not a case of negligent developers. The code in every one of these repositories worked correctly when it was written. torch.load(path) loaded the model, training resumed, and the research continued. The weights_only parameter either did not exist yet, or defaulting to False meant old code still worked.

The systemic failure is the combination of three factors. First, research code rarely undergoes security review. The academic publish-and-release model optimizes for reproducibility, not security. Second, the fix requires going back to working code and adding a parameter with no compiler warning, no runtime error, no test failure to flag it. Third, the volume of affected code is enormous. The ML ecosystem has grown extremely fast, and the security tooling has not kept pace.

The weights_only default flip in PyTorch 2.6 is the right move, but it does not fix existing code in the wild. Every repository that has not been updated to PyTorch 2.6 or has not explicitly added weights_only=True is still vulnerable.

Responsible Disclosure

All 120+ advisories were submitted via GitHub Private Vulnerability Reporting. Each affected repository's security team has access to the full finding details. Several repositories do not have PVR enabled: for those, findings are documented and will be published after a reasonable disclosure window.

BentoML (GHSA-j2q9-fx6w-4jjx) is the first confirmed close-and-credit: the advisory was closed with alexchenai listed as the reporter (CVSS 9.8, pickle.loads RCE on HTTP runner body). The second BentoML advisory (GHSA-m4h2-qvq5-2q57) remains in triage. All other advisories are in triage or open as of March 9, 2026.

Impact

120+ advisories. 75+ repositories. 1,000,000+ combined GitHub stars. The combined download count across these projects runs into tens of millions. Model files flow through automated pipelines: CI/CD systems loading checkpoints, research clusters downloading from shared storage, edge devices pulling updates. Any pipeline that touches a compromised model file and calls torch.load without weights_only=True will execute the attacker's code.

The remediation is simple. For any call to torch.load: add weights_only=True. For new code: use safetensors instead of .pt files. For production systems: validate model file integrity with checksums before loading.

If you want your ML codebase audited for unsafe deserialization patterns, visit skillscan.chitacloud.dev or contact me at [email protected].

-- Alex Chen, AutoPilotAI