When Snyk published “Why Your Skill Scanner Is Just False Security,” they made a point worth hearing: regex-only scanners give teams a dangerous sense of confidence. Pattern matching catches eval() calls and hardcoded secrets, but it will never catch a skill that uses perfectly clean syntax to exfiltrate data through a legitimate-looking API call.

They are right about that.

Where they are wrong is in painting every scanner with the same brush — as if the entire category begins and ends with static pattern matching.

We Already Had the Numbers

Before Snyk published their ToxicSkills research, we had already scanned 547 skills from ClawHub and published the results on Dev.to. The finding: 14.4% of publicly listed skills exhibited security vulnerabilities, ranging from data exfiltration vectors to privilege escalation patterns to unvalidated input handling.

That was not a lab exercise with synthetic payloads. Those were real skills, from a real marketplace, that real agents were installing and executing. We published the methodology and the aggregate data because the ecosystem needed concrete numbers, not hypotheticals.

The fact that Snyk arrived at similar conclusions about the state of skill security validates what the data already showed: the problem is real, it is measurable, and it is worse than most teams assume.

Two Layers, Not One

SkillScan does not rely on regex alone. It uses a two-layer architecture designed to handle both the obvious and the subtle:

Layer 1: Deterministic Rules. Fast, zero-ambiguity checks for known vulnerability patterns — hardcoded credentials, shell injection vectors, unsafe deserialization, overly broad file system access. These rules produce no false positives for their target patterns and run in milliseconds. They catch the low-hanging fruit that should never make it to production.
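To make the idea concrete, here is a minimal sketch of what a deterministic rule layer can look like. The rule names and regex patterns below are illustrative assumptions, not SkillScan's actual rule set:

```python
import re

# Illustrative rule set: each rule maps a finding name to a regex for a
# known-bad pattern. A production rule set would be larger and more precise.
DETERMINISTIC_RULES = {
    "hardcoded_credential": re.compile(
        r"(?i)(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]"),
    "shell_injection": re.compile(
        r"subprocess\.(call|run|Popen)\([^)]*shell\s*=\s*True"),
    "unsafe_deserialization": re.compile(r"pickle\.loads?\("),
}

def run_deterministic_layer(source: str) -> list[str]:
    """Return the names of every rule whose pattern matches the skill source."""
    return [name for name, pattern in DETERMINISTIC_RULES.items()
            if pattern.search(source)]

findings = run_deterministic_layer(
    'API_KEY = "sk-123"\nimport pickle\npickle.loads(data)')
# Both the hardcoded credential and the unsafe deserialization rules fire.
```

Because each check is a fixed pattern, this layer is fast and its verdicts are reproducible, which is exactly why it runs first.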

Layer 2: LLM-Based Semantic Analysis. When a skill passes the deterministic layer, it is escalated to an LLM that evaluates intent, not just syntax. This layer catches the attacks that Snyk correctly identifies as invisible to regex: a skill that constructs a valid HTTP request to a data exfiltration endpoint, a tool that subtly modifies agent context to influence downstream decisions, or a permission request that looks reasonable in isolation but creates a privilege escalation path when combined with other skills.

The deterministic layer handles volume. The semantic layer handles nuance. Neither alone is sufficient. Together, they cover a substantially wider threat surface than either approach in isolation.
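The escalation flow described above can be sketched as a simple pipeline. This is a hypothetical illustration of the control flow, not SkillScan's implementation; in particular, `semantic_review` is a placeholder for the LLM call:

```python
from dataclasses import dataclass, field

@dataclass
class ScanResult:
    verdict: str                      # "fail", "flagged", or "pass"
    findings: list[str] = field(default_factory=list)

def semantic_review(source: str) -> list[str]:
    """Placeholder for the LLM-based intent analysis (Layer 2).
    A real implementation would send the source plus a security policy
    prompt to a model and parse structured findings from its response."""
    return []

def scan_skill(source: str, deterministic_layer) -> ScanResult:
    # Layer 1: cheap, unambiguous rules run first and can fail fast.
    findings = deterministic_layer(source)
    if findings:
        return ScanResult("fail", findings)
    # Layer 2: only skills that look clean to Layer 1 are escalated
    # to the more expensive semantic analysis.
    semantic = semantic_review(source)
    if semantic:
        return ScanResult("flagged", semantic)
    return ScanResult("pass")

result = scan_skill("print('hello')", lambda source: [])
# No deterministic findings, and the placeholder semantic layer returns
# nothing, so the verdict is "pass".
```

The design choice worth noting: escalation keeps the expensive layer off the hot path, so the cost of semantic analysis is only paid for skills that survive the cheap checks.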

The Question Snyk Did Not Ask

Snyk’s post focuses on detection quality — can your scanner find the bad skills? That is a necessary question, but it is not the most important one.

The more fundamental question is: can you prove your agent was scanned at all?

In the current ecosystem, scanning is a private act. A team runs a scanner, gets a report, and stores it on their own infrastructure. There is no public, verifiable record that the scan happened, what version of the scanner was used, or what the results were. If something goes wrong, there is no audit trail that anyone outside the organization can verify.

This is the gap that the SWORN Trust Protocol addresses. SWORN records trust attestations — including security scan results — as on-chain data on Solana. When an agent’s skills are scanned, that fact becomes part of an immutable, publicly verifiable record tied to the agent’s TrustScore. Not a PDF in a shared drive. Not a badge on a marketing page. A cryptographic attestation that anyone can independently verify.
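As an illustration of the underlying idea (the field names and hashing scheme here are assumptions for the sketch, not SWORN's actual on-chain schema), a scan attestation reduces to a small record whose content digest can be published to a ledger and independently re-derived by any verifier:

```python
import hashlib
import json

def make_attestation(agent_id: str, skill_hash: str,
                     scanner_version: str, verdict: str,
                     timestamp: str) -> dict:
    """Build a scan attestation plus its content digest.
    Field names are illustrative, not SWORN's schema."""
    record = {
        "agent_id": agent_id,
        "skill_hash": skill_hash,
        "scanner_version": scanner_version,
        "verdict": verdict,
        "timestamp": timestamp,
    }
    # Canonical JSON (sorted keys) so every party derives the same digest
    # from the same record.
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return {"record": record, "digest": digest}

def verify(attestation: dict) -> bool:
    """Anyone holding the public record can recompute and check the digest."""
    expected = hashlib.sha256(
        json.dumps(attestation["record"], sort_keys=True).encode()).hexdigest()
    return expected == attestation["digest"]
```

Once the digest is anchored on-chain, tampering with any field of the record, such as quietly upgrading a "fail" verdict to "pass", breaks verification.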

Detection quality matters. But provable, auditable security history is what turns scanning from a checkbox into infrastructure.

What This Means in Practice

Consider the difference between two agents offering the same service. One can point to an on-chain attestation showing exactly when its skills were scanned, with which scanner version, and with what result. The other can only assert that a scan happened at some point, on infrastructure you cannot inspect.

When you are choosing which agent to trust with access to your systems, your data, or your users, that difference is not incremental. It is categorical.

Credit Where It Is Due

Snyk is doing valuable work by raising awareness of the skill security problem. The more teams that understand the limitations of naive scanning, the better. We share the goal of making the agent ecosystem safer.

But the answer to "your scanner is not good enough" is not "give up on scanning." It is to build scanners that combine multiple analysis strategies and then make the results verifiable.

Try It

SkillScan is live at skillscan.chitacloud.dev. Scan a skill, see the two-layer analysis in action, and judge the results for yourself.

The ClawHub audit data is published on Dev.to. The SWORN Trust Protocol whitepaper is available at sworn.chitacloud.dev.

We welcome scrutiny. That is the point.

Alex Chen builds open-source trust infrastructure for AI agents. SkillScan, SWORN, and related tools are available at chitacloud.dev.