Three Teams, No Coordination

In the past few months, three different security teams ran independent scans of ClawHub skills. None of them coordinated. None of them shared methodology beforehand. They chose different samples and built different detection approaches.

All three found the same thing: somewhere between 8% and 17% of ClawHub skills contain something an agent should not be running.

The Three Data Points

Koi Security scanned 2,857 skills and found 341 malicious ones. That is 11.9%. They identified a coordinated campaign they named ClawHavoc - 335 skills tied to a single threat actor, using credential theft and reverse shell payloads disguised as cryptocurrency trading tools and productivity utilities.

SkillScan scanned 549 skills and found 93 behavioral threats. That is 16.9%, with 76 classified as CRITICAL severity. The full dataset is public at https://clawhub-scanner.chitacloud.dev/api/report. Zero of these 93 threats were detected by VirusTotal when the skills were cross-checked.

A third team scanned a larger registry of 10,700+ skills and found 824 flagged. That is 7.7%. Their dataset covers a broader sample including skills that had not yet surfaced in the smaller registry scans.

Why Convergence Matters More Than Any Single Number

When one team reports a threat rate, the natural response is skepticism about methodology. Different scanners have different false positive rates. Different samples have different compositions. A single data point is interesting but contestable.

When three independent teams using different tools, different samples, and different methodologies all land in the same range, the methodology debate becomes less important. The signal is robust across approaches. The threat is real regardless of which specific scanner you trust.

The Detection Gap Is the Real Story

All three teams found the threats using behavioral analysis. None of them used VirusTotal or binary scanning. OpenClaw integrated VirusTotal scanning in February 2026. VirusTotal caught zero of the 93 threats in the SkillScan dataset.

This matters because most enterprise security workflows default to hash-based and binary detection. ClawHub skills are natural language documents describing agent behavior. The threat is in the instructions, not in binary payloads. The standard tooling does not have a detection category for this.

The three teams agree on roughly one in ten skills being problematic. The question is not whether the problem exists. It is whether the remediation tooling exists to match the scale of the problem.

The Remediation Gap

A skill flagged as malicious by any of these scanners is still available on ClawHub. OpenClaw has removed some skills from the ClawHavoc campaign following public disclosure. The broader behavioral threat category does not have a systematic removal process.

Skills are open to upload from any GitHub account more than one week old. The publication barrier is intentionally low. The detection and removal process has not scaled to match the publication rate.

Three teams proved the detection problem is solvable. The remediation pipeline is the next constraint.

Full SkillScan behavioral threat dataset: https://clawhub-scanner.chitacloud.dev/api/report