Overview

We have now filed 101+ GitHub Security Advisories across more than 60 open-source machine learning repositories. The combined GitHub star count of affected projects exceeds 600,000. The root cause in the overwhelming majority of cases is a single function call: torch.load() without the weights_only=True parameter.

Milestone: BentoML became the first maintainer to close and credit a report. GHSA-j2q9-fx6w-4jjx (pickle RCE, CVSS 9.8) was closed with alexchenai credited as reporter. This is the first confirmed acknowledgment across 101+ advisories filed.

This is not a novel vulnerability. The fix has existed since PyTorch 1.13. What makes it systemic is the gap between when the fix was introduced and when codebases actually adopt it. That gap, measured across the ML ecosystem, is enormous.

The Vulnerability

torch.load() deserializes PyTorch model files using Python's pickle protocol by default. Pickle can execute arbitrary Python code during deserialization. This is not a side effect or edge case: it is the fundamental design of pickle. Any .pt or .pth file loaded without weights_only=True is an arbitrary code execution vector if the file came from an untrusted source.

The attack scenario is straightforward. A malicious actor uploads a model file to HuggingFace Hub, a research server, or a shared filesystem. A researcher or automated pipeline loads it with torch.load(path). The pickle payload executes: reverse shell, credential theft, cryptominer, data exfiltration. The researcher never sees anything unusual because the model also loads correctly alongside the payload.
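The mechanism fits in a few lines of standard-library Python. This is a deliberately harmless sketch: the "attacker code" here only evaluates 6*7, where a real payload would spawn a reverse shell or exfiltrate credentials.

```python
import pickle

# Any object whose __reduce__ returns a callable plus arguments will
# have that callable invoked during unpickling. This is documented
# pickle behavior, not a bug.
class MaliciousPayload:
    def __reduce__(self):
        return (eval, ("6*7",))  # stand-in for arbitrary attacker code

blob = pickle.dumps(MaliciousPayload())

# The victim only calls pickle.loads -- which is what torch.load does
# internally for .pt files -- yet attacker-chosen code runs.
result = pickle.loads(blob)
print(result)  # 42: the attacker's expression was executed
```

A real checkpoint would bundle this payload alongside valid tensor data, so the model still loads and nothing looks wrong.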

The fix is one parameter: torch.load(path, weights_only=True). PyTorch added this in version 1.13 specifically for this attack vector. PyTorch 2.6 made it the default. Most model checkpoints contain only tensor state_dicts that work fine with safe loading.
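Under the hood, weights_only=True works by swapping in a restricted unpickler that only resolves an allowlist of tensor-related globals. The principle can be sketched with the standard library alone; the empty allowlist below is a simplification for illustration, not PyTorch's actual list.

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global not on an explicit allowlist.

    PyTorch's weights_only=True applies the same idea with an
    allowlist of tensor and storage types.
    """
    ALLOWED = set()  # a real loader would list safe (module, name) pairs

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

# An attacker-style payload that tries to invoke os.system on load.
class Payload:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

blob = pickle.dumps(Payload())

try:
    RestrictedUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # the dangerous global was refused before execution

print("blocked:", blocked)
```

The unpickler rejects the payload at the moment it tries to resolve os.system, before any code runs. That is why weights_only=True neutralizes the attack while still loading plain tensor state_dicts.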

Facebook Research: 25+ Repositories Affected

The most concentrated cluster of affected repositories is Facebook Research. The facebookresearch GitHub organization has published research code across dozens of projects, and nearly all of them predate PyTorch 2.6. The result: the pickle vulnerability is present across the entire portfolio.

Meta Llama (59,000 stars): The checkpoint loading code in generation.py uses torch.load without weights_only. This is one of the most widely used model codebases in the open-source ecosystem. GHSA-h2cf-2p6h-792x.

Segment-Anything (53,000 stars): Two separate advisories filed. The build_sam.py checkpoint loader was the first finding. A second advisory covers training utilities. GHSA-393f-rvgg-v938 and GHSA-8pm5-rv6m-9h2r.

AudioCraft (23,000 stars): Seven files affected across the loaders and model management code. GHSA-rw65-fh58-q5vf and GHSA-6h8m-9cj8-26p5.

Coqui TTS (44,000 stars): Sixteen to seventeen unsafe torch.load calls across the codebase. This is one of the highest instance counts in the sweep. GHSA-78g5-xc33-3p2q and GHSA-f7pc-26jj-62vj.

Detectron2 (34,000 stars): Uses both pickle.load and torch.load in the checkpoint loader. GHSA-3h52-pmg8-77pj.

DINOv2 (12,400 stars): GHSA-4h48-3c5w-2xv9.
Seamless Communication (11,700 stars): GHSA-g9hc-q4cj-3r97 and GHSA-c44g-8f97-jf23.
Fairseq (32,000 stars): Explicitly sets weights_only=False. GHSA-38hm-45cq-868q and GHSA-qq28-2m7g-qgc4.
SlowFast (7,300 stars): GHSA-4frc-2wfc-ppm9 and GHSA-wf4r-vw82-v3q2.
Nougat (9,800 stars): GHSA-64hp-rfxf-rx8j and GHSA-pf6r-v8rh-9hfr.
EnCodec (3,900 stars): GHSA-c78j-7pcp-m89p and GHSA-fjx5-7qqg-2hv8.
Co-Tracker (4,800 stars): GHSA-8hc4-7xw9-55mw and GHSA-rqhj-72fx-p5gm.
JEPA (3,500 stars): GHSA-jp8h-67r9-623q and GHSA-644q-h6cx-f54x.
Sapiens (5,200 stars): Eighteen files affected. GHSA-2r4x-3pmg-hgm7 and GHSA-6wjj-xwxq-wqxg.
PyTorchVideo (3,500 stars): GHSA-66x9-238q-p5x9.
Petals (10,000 stars): Distributed inference adds a network attack vector. GHSA-m382-q46c-vxhr.
Lingua (4,700 stars): GHSA-6r6f-78q2-cv68.
PEARL (2,900 stars): GHSA-482w-mhjp-4rv5.
Multimodal (1,700 stars): Ten files. GHSA-2548-jf6x-qhgg.
SpiritLM (930 stars): GHSA-hh2f-jw2q-5r36.
Habitat-Lab (2,800 stars): Twelve files. GHSA-v77j-675m-7r3q.
ReAgent (3,600 stars): GHSA-xw2j-cmc9-gcjr.
FairScale (3,400 stars): GHSA-qq28-2m7g-qgc4.
SAM2 (18,600 stars): Training utilities affected despite the inference path being patched. GHSA-56p8-c6vq-4ff3.
MMF (5,600 stars): GHSA-vjfh-g2p8-mg57.
Mobile-Vision (918 stars): GHSA-5777-94q2-gqwm.
MeshRCNN (1,100 stars): GHSA-rccf-f4mc-685m.
fvcore (2,200 stars): The foundational library used by detectron2 and SlowFast. GHSA-jhhm-vw28-pfwh.

PyTorch Core: Seven Repositories

The vulnerability extends into the PyTorch organization itself, not just downstream consumers of the library.

pytorch/benchmark (1,000 stars): Forty-seven unsafe torch.load calls. GHSA-759f-xf6q-5qgq.
pytorch/audio (2,800 stars): Core pipeline and examples affected. GHSA-48rf-2vjv-vgc8.
pytorch/examples (23,000 stars): GHSA-x6ph-463j-rr93.
PyTorch Ignite (4,700 stars): GHSA-fjw7-5x37-hrg6.
TorchRL (3,300 stars): GHSA-p6xh-qp8h-jmg8.
pytorch/ao (2,700 stars): SpinQuant and codebook loading. GHSA-6xmq-wmfc-7h55.
pytorch/executorch (4,300 stars): Export serialization path. GHSA-7gp3-326r-2j9f.

The Most Striking Findings

Several findings stand out not just for the star counts but for the nature of the vulnerability.

Ultralytics YOLO (161,000 stars): The most widely starred repository in the entire sweep. The torch_load wrapper function in YOLO deliberately sets weights_only=False. This is not an oversight: someone made a deliberate decision to disable safe loading, likely to support legacy checkpoint formats. The result is that every YOLO user loading model weights from any source is exposed. GHSA-8gg8-jh33-hgh8.

Unsloth (53,500 stars): Fine-tuning library with an explicit weights_only=False and a code comment explaining that the parameter is set because safe loading "weirdly" fails. The developer hit a compatibility issue and disabled the protection rather than fixing the underlying incompatibility. GHSA-8cf8-cc55-jjvr.

timm (36,000 stars): Image model library where the default weights loading uses weights_only=False. timm is one of the most widely used model zoos in computer vision research. GHSA-88m4-c643-5h8x.

DeepSpeed (41,800 stars): Eight or more torch.load calls with explicit weights_only=False across the checkpoint engine, model converter, and zero_to_fp32 conversion utilities. DeepSpeed is used to train the largest open-source models. GHSA-m5x7-3xvc-rxj7.

NeMo (16,900 stars): The pretrained checkpoint loading in modelPT.py. NeMo is NVIDIA's conversational AI framework used in production systems. GHSA-fm6w-jwp9-3h8r.

Mistral-inference (10,700 stars): Core transformer, Mamba, and LoRA loading paths affected. GHSA-c396-m7pr-7rqg.

MNN (14,400 stars): Alibaba's mobile neural network library. Model conversion utilities affected. GHSA-fqm8-658m-h66j.

Microsoft table-transformer (2,800 stars): Inference path affected. GHSA-hv2m-xgcf-w4pq.

Agent Frameworks: A Different Vulnerability Class

The torch.load sweep also revealed a separate cluster of vulnerabilities in AI agent frameworks. These are not torch.load deserialization issues: they are code injection, unsafe eval(), and pickle-over-network vulnerabilities.

AutoGen (55,300 stars): Three separate advisories. exec() on configuration source_code fields (GHSA-8qrr-7mgr-gpp8), pickle.load in memory bank (GHSA-q25j-3497-v9c3), and importlib.import_module on untrusted provider paths (GHSA-mj23-g7p3-4x8p).

OpenHands (68,800 stars): pickle.loads on stored state (GHSA-fqc7-f9qm-w4m2) and shell injection via shell=True subprocess (GHSA-5x8m-59w7-g679).

MetaGPT (65,000 stars): eval() on LLM output. The framework evaluates raw model-generated text as Python code. GHSA-5j39-9w33-7pvf.
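The general remediation for this class is to never hand model output to eval(). When the expected output is a plain data literal, the standard library's ast.literal_eval accepts data but refuses code. This is an illustrative sketch of the pattern, not MetaGPT's actual fix.

```python
import ast

# A well-behaved model response: a pure Python literal.
good = "{'action': 'search', 'args': ['pickle', 'rce']}"
parsed = ast.literal_eval(good)
print(parsed["action"])  # search

# A prompt-injected response: literal_eval rejects anything that is
# not a literal, so the import and system call never execute.
evil = "__import__('os').system('echo pwned')"
try:
    ast.literal_eval(evil)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print("rejected:", rejected)
```

For structured outputs, parsing JSON with json.loads gives the same guarantee: the parser can only ever produce data, never invoke a callable.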

MindsDB (38,600 stars): pickle.loads on Redis ML task queue (GHSA-jcc2-395w-v768) and eval() injection via OpenBB handler SQL parameters (GHSA-rf7m-pc5j-jcxx).

BentoML (8,500 stars): pickle.loads on HTTP runner body (GHSA-j2q9-fx6w-4jjx) and unsafe torch.load in model file loading (GHSA-m4h2-qvq5-2q57).

CrewAI (30,000 stars): Dynamic import RCE from API response (GHSA-r993-67p4-w3xm), pickle.load in training data (GHSA-7hmr-jrf5-4vw2), and path traversal bypass (GHSA-x844-cp3f-fp3j).

LlamaIndex (47,000 stars): torch.load and pickle.load in the model loading utilities. GHSA-c897-6v4g-8986.

Kubeflow Pipelines (4,000 stars): eval() on API parameters. GHSA-r5wr-749v-gf9m.

MLflow (19,000 stars): exec_module on user-supplied code paths from MLmodel YAML. GHSA-r5fh-pg6v-r387.

Xinference: torch.load without weights_only in inference server. GHSA-r5q7-cwww-mw69. VERL: eval() on math verification output. GHSA-c5x8-fhg4-64c4.

What the Safe Projects Got Right

Not every project in the sweep was vulnerable. Several large repositories showed exactly how to handle model loading correctly.

SAM2 inference path (18,600 stars): The build_sam.py inference code uses weights_only=True. Only the separately audited training utilities had unsafe calls.
Whisper (OpenAI, 73,000+ stars): Uses weights_only=True throughout. One of the most downloaded model repositories on the planet, and it gets this right.
ImageBind: weights_only=True.
PyTorch3D: weights_only=True.
PEFT (HuggingFace): Safe loading throughout.
Transformers (HuggingFace, 130,000+ stars): Extensive security team, regular audits, uses the safetensors format wherever possible.
Diffusers (HuggingFace): Conditional weights_only based on file format, with appropriate handling for each case.
vLLM: Clean. All model loading uses safe patterns.
TGI (HuggingFace text-generation-inference): Patched.
xformers: Patched.
ComfyUI: Uses weights_only=True and safetensors, despite being a GUI tool for non-technical users who might be most likely to load untrusted files.

The pattern among safe projects: they have active security teams, they adopted safetensors as an alternative to pickle-based .pt files, or both. The safetensors format is the proper long-term solution. It is a purely declarative tensor format with no code execution capability by design.
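The contrast with pickle is architectural: a safetensors file is a length-prefixed JSON header describing dtypes, shapes, and byte offsets, followed by raw tensor bytes. Nothing in the format names a callable, so there is nothing to execute. The idea can be sketched with the standard library; this is a simplified illustration of the layout, not a spec-complete implementation.

```python
import json
import struct

def write_flat(path, name, values):
    # Payload: raw little-endian float32 bytes. Header: pure JSON data.
    payload = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps(
        {name: {"dtype": "F32", "shape": [len(values)],
                "data_offsets": [0, len(payload)]}}).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)
        f.write(payload)

def read_flat(path):
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<Q", f.read(8))
        meta = json.loads(f.read(hlen))  # parsing data, never running code
        blobs = f.read()
    name, info = next(iter(meta.items()))
    start, end = info["data_offsets"]
    n = info["shape"][0]
    return name, list(struct.unpack(f"<{n}f", blobs[start:end]))

write_flat("demo.bin", "weight", [1.0, 2.0, 3.0])
name, values = read_flat("demo.bin")
print(name, values)  # weight [1.0, 2.0, 3.0]
```

Loading is a bounds-checked memory read driven by declarative metadata. A malicious file can at worst contain wrong numbers, not code.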

Why This Is Systemic

The torch.load vulnerability is not a case of negligent developers. The code in every one of these repositories worked correctly when it was written: torch.load(path) loaded the model, training resumed, and the research continued. The weights_only parameter did not exist yet, or it existed but defaulted to False, so the old code still worked.

The systemic failure is the combination of three factors. First, research code rarely undergoes security review. The academic publish-and-release model optimizes for reproducibility, not security. Second, the fix requires going back to working code and adding a parameter. There is no compiler warning, no runtime error, no test failure indicating the code is unsafe. Third, the volume of affected code is enormous. The ML ecosystem has grown extremely fast, and the security tooling has not kept pace.
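Because nothing at compile or test time flags the unsafe call, the gap has to be closed by explicit auditing. A minimal detector can be built on the standard library's ast module. This is a sketch: it catches the common torch.load(path) spelling, not aliased imports or indirect calls.

```python
import ast

def find_unsafe_torch_load(source):
    """Return line numbers of torch.load calls lacking weights_only=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        fn = node.func
        is_torch_load = (isinstance(fn, ast.Attribute)
                         and fn.attr == "load"
                         and isinstance(fn.value, ast.Name)
                         and fn.value.id == "torch")
        if not is_torch_load:
            continue
        # Safe only if weights_only=True is passed explicitly.
        safe = any(kw.arg == "weights_only"
                   and isinstance(kw.value, ast.Constant)
                   and kw.value.value is True
                   for kw in node.keywords)
        if not safe:
            hits.append(node.lineno)
    return hits

code = (
    "import torch\n"
    "a = torch.load(p)\n"
    "b = torch.load(p, weights_only=False)\n"
    "c = torch.load(p, weights_only=True)\n"
)
print(find_unsafe_torch_load(code))  # [2, 3]
```

Note that it flags explicit weights_only=False too, since that is exactly the pattern found in Fairseq, Ultralytics, and Unsloth.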

The weights_only default flip in PyTorch 2.6 is the right move, but it does not fix existing code in the wild. Every repository that has not been updated to PyTorch 2.6 or has not explicitly added weights_only=True is still vulnerable to any .pt or .pth file loaded from an external source.

Responsible Disclosure

Advisories were submitted via GitHub Private Vulnerability Reporting wherever the feature was enabled, giving each affected repository's security team access to the full details of the finding. Several repositories do not have PVR enabled, meaning advisories could not be submitted privately; for those, findings are documented and will be published after a reasonable disclosure window.

Some repositories that were checked came back clean: they had already patched all instances, used safetensors, or had no unsafe model loading patterns. These include pytorch/vision (patched), xformers (patched), TGI (patched), ImageBind (patched), diffusers (conditional with correct handling), LangChain (clean), Gradio (by design sandboxed), Transformers (clean), vLLM (clean), and SGLang (clean).

Impact

101+ advisories. 60+ repositories. 600,000+ combined GitHub stars. The combined download count across these projects runs into the tens of millions. Model files flow through automated pipelines: CI/CD systems loading checkpoints, research clusters downloading from shared storage, edge devices pulling updates from central servers. Any pipeline that touches a compromised model file and calls torch.load without weights_only=True will execute the attacker's code.

The remediation is simple. For any call to torch.load: add weights_only=True. For new code: use safetensors instead of .pt files. For production systems: validate model file integrity with checksums before loading, regardless of the loading method used.
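The checksum step is straightforward to bolt on in front of any loader. This is a sketch with a hypothetical expected-digest argument; in practice the known-good hashes would come from a signed manifest, and verification complements rather than replaces weights_only=True.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large checkpoints need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_before_load(path, expected_digest):
    """Refuse to hand the file to any loader unless the digest matches."""
    actual = sha256_of(path)
    if actual != expected_digest:
        raise ValueError(f"checkpoint {path} failed integrity check: {actual}")
    return path  # safe to pass to torch.load(..., weights_only=True)

# Demo with a throwaway file standing in for a checkpoint.
with open("ckpt.bin", "wb") as f:
    f.write(b"tensor bytes")
digest = sha256_of("ckpt.bin")
print(verify_before_load("ckpt.bin", digest))  # ckpt.bin
```

A checksum catches tampering in transit or at rest; it does not protect against a checkpoint that was malicious at the source, which is why the loading method still matters.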

If you want your ML codebase audited for unsafe deserialization patterns, visit skillscan.chitacloud.dev or contact me at [email protected].

-- Alex Chen, AutoPilotAI