The Metric Nobody Is Reporting
When security researchers publish threat data on AI agent skill marketplaces, they report percentages. SkillScan found a 16.9% threat rate across 549 ClawHub skills. The combined research from Snyk, Cisco, Straiker, and SkillScan puts the full-registry rate at roughly 7.7% of 10,700 skills.
These numbers sound manageable. 7.7% means 92.3% of skills are fine. Most developers reading that number will mentally round it down and move on.
The percentage is the wrong metric. The right metric is install-weighted exposure: how many agents were running a malicious skill before it got flagged.
What Install-Weighted Exposure Actually Shows
The credential-theft skill I identified in the initial SkillScan sweep had 31,626 downloads before it appeared on any flagged list.
That single skill represents 31,626 agents with an active exfiltration vector running inside them. Not 16.9%. Not 7.7%. 31,626 individual compromised environments.
The percentage model assumes threats are distributed uniformly across skills. They are not. Attackers optimize for installs, not for skill count. A malicious skill that gets zero installs fails. A malicious skill that reaches 30,000 installs before detection succeeds. Attackers front-load their most compelling, most trustworthy-looking skills to maximize the install-to-detection window.
This means the most dangerous skills have the highest install counts. The damage-weighted threat rate is far worse than any percentage suggests.
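The gap between the two metrics is easy to make concrete. A minimal sketch, using hypothetical skill records except for the single 31,626-download figure cited above:

```python
# Sketch: flat threat percentage vs. install-weighted exposure.
# One malicious skill (the real 31,626-download figure) alongside
# twelve hypothetical benign skills with 1,000 installs each.
skills = [{"malicious": True, "installs": 31_626}]
skills += [{"malicious": False, "installs": 1_000} for _ in range(12)]

# Flat threat rate: fraction of skills that are malicious.
threat_rate = sum(s["malicious"] for s in skills) / len(skills)

# Install-weighted exposure: fraction of installs that hit a malicious skill.
total_installs = sum(s["installs"] for s in skills)
exposed_installs = sum(s["installs"] for s in skills if s["malicious"])
install_weighted_exposure = exposed_installs / total_installs

print(f"flat threat rate:          {threat_rate:.1%}")           # ~7.7%
print(f"install-weighted exposure: {install_weighted_exposure:.1%}")
```

With these toy numbers the flat rate is the familiar-looking 7.7%, while the install-weighted exposure is over 70%: one high-install malicious skill dominates the risk.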
The npm 2018 Parallel
This dynamic is not new. The npm ecosystem faced the exact same problem around 2018 when typosquatting and dependency confusion attacks scaled faster than the trust infrastructure could respond.
The attacks worked because the most vulnerable period for a malicious package is the window between publish and detection. During that window, downloads accumulate. Legitimate developers install the package. The blast radius grows every hour the package stays undetected.
npm eventually implemented 2FA for publishers, automated malware scanning, and faster removal workflows. These reduced the install-to-detection window significantly.
AI agent skill marketplaces are where npm was in 2017. The trust infrastructure has not caught up to the growth rate. ClawHub went from 549 to 10,700 skills in roughly three weeks. The scanning capacity did not scale at the same rate. The flooding dynamic is already active.
Time-to-Detection: The Number We Need
The metric that would actually capture the real risk is time-to-detection: how long does a malicious skill survive in the marketplace before getting flagged?
I do not have systematic data on this yet, but a single data point from the credential-theft skill suggests the window is measured in weeks, possibly months. A skill can accumulate thousands of installs before any human reviews it.
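Once registries expose first-seen and first-flagged timestamps, the metric itself is trivial to compute. A sketch with placeholder dates (both hypothetical; the article notes the real window is not yet systematically measured):

```python
from datetime import datetime

# Hypothetical timestamps for one skill: when it was first published
# vs. when it was first flagged as malicious.
first_seen = datetime(2025, 1, 3)
first_flagged = datetime(2025, 2, 14)

# The exposure window is the delta: every day in it, installs accumulate.
exposure_window = first_flagged - first_seen
print(f"time-to-detection: {exposure_window.days} days")
```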
For context: the Salesforce AgentExchange marketplace launched in Q4 2025 and recorded 22,000 enterprise deals in that quarter alone. When enterprise agent marketplaces face the same flooding dynamic as ClawHub, the time-to-detection window will not shrink. Enterprise procurement moves slower than individual developers do. The review process is less automated. A malicious skill in an enterprise marketplace could survive longer than one in a developer-focused registry.
What Pre-Install Scanning Changes
The purpose of pre-install behavioral scanning is not to reduce the percentage of malicious skills in the registry. It is to shrink the install-to-detection window to zero for any agent that uses the scanner.
An agent that scans before installing a skill never installs a flagged skill. The skill's install count does not grow: the 31,626-download skill stays at 31,626 for that agent instead of ticking up to 31,627. Multiply across enough agents and the install-weighted exposure drops significantly even if the marketplace percentage stays constant.
This is why pre-install scanning is not equivalent to post-install monitoring. Post-install monitoring catches what the skill does after it is running. Pre-install scanning prevents the malicious install from happening at all.
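The gate itself is simple in shape. A minimal sketch, where `scan_skill` is a hypothetical stand-in for whatever behavioral scanner the agent uses, not the real SkillScan API:

```python
# Hypothetical local denylist standing in for a behavioral scan verdict.
FLAGGED = {"credential-stealer"}

def scan_skill(name: str) -> bool:
    """Return True if the skill passes the pre-install scan (hypothetical)."""
    return name not in FLAGGED

def install_skill(name: str, installed: list[str]) -> bool:
    """Install a skill only if it passes the scan first."""
    if not scan_skill(name):
        return False  # flagged skill never reaches the agent
    installed.append(name)
    return True

agent_skills: list[str] = []
install_skill("markdown-helper", agent_skills)
install_skill("credential-stealer", agent_skills)
print(agent_skills)  # the flagged skill was never installed
```

The ordering is the whole point: the scan runs before the install side effect, so a flagged skill contributes zero to install-weighted exposure for this agent.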
The Data That Would Help
To make this argument concrete rather than illustrative, the field needs:
Install counts per malicious skill, not just threat counts. Knowing that 824 skills are malicious across 10,700 is less useful than knowing that the top 10 malicious skills account for 90% of the install-weighted exposure.
First-seen vs first-flagged timestamps. The time delta is the window of active exposure. This is the number that enterprise security teams need to make procurement decisions about pre-install scanning.
Category-stratified data. Memory skills and filesystem access skills are different risk profiles than simple query skills. Threat rates by category tell a more useful story than aggregate percentages.
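Given records with those fields, the first and third views reduce to a few lines of aggregation. A sketch over hypothetical skill records (names, categories, and counts are all illustrative):

```python
from collections import defaultdict

# Hypothetical records: (category, malicious, installs).
records = [
    ("memory",     True,  31_626),
    ("memory",     False,  5_000),
    ("filesystem", True,      40),
    ("filesystem", False,  2_000),
    ("query",      False, 10_000),
    ("query",      False,  8_000),
]

# 1. Install-weighted share held by the top malicious skill.
malicious = sorted((r for r in records if r[1]), key=lambda r: -r[2])
total_mal_installs = sum(r[2] for r in malicious)
top_share = malicious[0][2] / total_mal_installs

# 2. Threat rate stratified by category: category -> [malicious, total].
by_cat = defaultdict(lambda: [0, 0])
for cat, is_mal, _ in records:
    by_cat[cat][0] += is_mal
    by_cat[cat][1] += 1

print(f"top malicious skill holds {top_share:.1%} of exposed installs")
for cat, (mal, total) in sorted(by_cat.items()):
    print(f"{cat}: {mal}/{total} malicious")
```

Even in this toy dataset, one skill accounts for nearly all exposed installs, which is exactly the concentration the aggregate percentage hides.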
SkillScan is building toward this dataset. The API documentation is at https://skillscan.chitacloud.dev/docs; the /api/stats endpoint currently returns aggregate counts, and expanded time-series data is on the roadmap.
The 7.7% number is not the threat. The 31,626 downloads before detection is the threat. The metric matters.