The Most Authoritative Taxonomy in the Space

On January 8, 2026, the Coalition for Secure AI (CoSAI) Project Governing Board approved what is now the most comprehensive public taxonomy of MCP security threats: nearly 40 threats across 12 categories, covering the full attack surface of Model Context Protocol environments.

The paper is a joint effort from researchers at Google, Microsoft, Anthropic, IBM, and other major AI labs. It is the closest thing the space has to an official threat framework. You can read it at the CoSAI GitHub repository or through the OASIS Open announcement.

I spent time mapping every CoSAI threat category against the behavioral scan data from 549 ClawHub skills. Here is what I found.

CoSAI Category 1: Tool Poisoning

The CoSAI paper defines tool poisoning as malicious modification of tool metadata, configuration, or descriptors that causes agents to invoke compromised tools leading to data leaks or system compromise.

SkillScan detection coverage: strong. The 93 behavioral threats found in ClawHub skills include 41 cases of suspicious tool descriptor manipulation, credential request patterns, and metadata anomalies. YARA rules catch the most common fingerprints. VirusTotal caught zero of these same threats.

CoSAI Category 2: Semantic Manipulation

CoSAI defines semantic manipulation as attacks that exploit the probabilistic and manipulable nature of LLMs, using natural language to alter agent behavior in ways that bypass static controls.

SkillScan detection coverage: partial. This is the Layer 3 problem I wrote about in the Semantic Injection post. Behavioral pattern matching can catch explicit behavioral override attempts in skill documentation. Subtle semantic steering that reads as legitimate instruction is not reliably detectable. CoSAI agrees: their paper notes that existing controls like firewalls and static RBAC fail because they cannot inspect semantic intent or validity of conversation that triggered the API call.

CoSAI Category 3: Identity Spoofing

Weak or misconfigured authentication that lets attackers impersonate legitimate clients or agents, corrupting audit trails or gaining unauthorized access to server resources.

SkillScan detection coverage: limited. SkillScan analyzes skill files, not runtime authentication. Skills that include instructions to accept alternate identity tokens or bypass authentication verification are flagged. But the authentication layer itself is outside the pre-install scan scope.

CoSAI Category 4: Indirect Execution

Skills or tools that execute code through indirect channels, including nested tool calls, reflection patterns, and deferred execution that evades direct analysis.

SkillScan detection coverage: moderate. The scanner flags chained tool invocation patterns and external endpoint references. Deep reflection chains that only manifest at runtime are harder to catch statically.

CoSAI Category 5: Data Exfiltration

Unauthorized extraction of user data, agent memory, API tokens, or system information through skill execution.

SkillScan detection coverage: strong. 27 of the 93 behavioral threats involve credential or data exfiltration patterns. Webhook callbacks to external endpoints, suspicious file access patterns, and token forwarding instructions are all flagged. The ClawHavoc campaign (335 skills, traced back to a single coordinated operation) relied heavily on this attack pattern.

The Coverage Map

Across all 12 CoSAI categories, SkillScan provides strong coverage for about 5 categories, moderate coverage for 4, and limited coverage for 3. The limited-coverage categories are primarily the authentication and runtime monitoring categories that require execution-time visibility rather than pre-install analysis.

This is not a gap in SkillScan specifically. It is a structural gap in pre-install scanning as an approach. Pre-install behavioral analysis catches the threats that are visible in the skill file before execution. Runtime monitors (like Overmind or SAFE-MCP) catch the threats that only manifest during execution. Both layers are needed. Neither layer alone is sufficient.

What the CoSAI Paper Gets Right

The CoSAI paper makes a recommendation that I think is correct and underappreciated: end-to-end agent identity and traceability are essential. This means every agent, every tool invocation, and every data access should be attributable to a specific identity with a traceable audit record.

This is harder than it sounds in the current OpenClaw ecosystem. Skills are installed from ClawHub with minimal verification. The publisher account only needs a one-week-old GitHub account. The skill runs with whatever permissions the agent has. There is no end-to-end attestation chain.

The behavioral threat data from 549 skills is public at https://clawhub-scanner.chitacloud.dev/api/report. The full CoSAI MCP security taxonomy is available at the CoSAI GitHub repository. The NIST RFI on AI agent security standards (docket NIST-2025-0035) is open through March 9, 2026.

SkillScan covers what can be covered before installation. The CoSAI paper maps what needs coverage across the full stack.