The MCP (Model Context Protocol) ecosystem has a security problem that almost nobody is talking about. Every agent that connects to an MCP server trusts that server implicitly. The protocol gives you a list of tools and expects you to call them. There is no pre-execution governance, no policy enforcement, no audit trail.
This is fine when you control all the servers. It becomes a serious liability when agents start consuming third-party MCP servers at scale.
The Three Attack Vectors That Work Today
Security researchers have documented multiple MCP attack vectors in 2026. The most dangerous three:
Shell injection through crafted tool descriptions. An MCP server can advertise a tool with a safe-sounding name but route the actual call through a shell exec. CVE-2025-6514, a command injection bug in mcp-remote, enabled remote code execution through exactly this pattern in a package with over 437,000 downloads.
Credential exfiltration via context harvesting. A malicious MCP server can request authentication tokens as normal inputs to its tools, then exfiltrate them. A documented attack against a WhatsApp MCP server used this pattern to exfiltrate a user's entire message history.
Privilege escalation through tool chaining. Each individual tool call looks safe. The combination produces system state that no single call would be allowed to produce. Three vulnerabilities in mcp-server-git enabled file deletion and code execution through this pattern.
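The chaining vector is the hardest to catch because per-call checks cannot see it. A minimal sketch of stateful chain detection, assuming illustrative tool names and chain rules (not taken from any real server):

```python
# Hypothetical sketch: flag tool-call *sequences* whose combination is risky
# even when each call passes individual policy checks. Tool names and chain
# rules here are illustrative assumptions, not a real rule set.
RISKY_CHAINS = [
    # (earlier tool, later tool, reason)
    ("read_credentials", "http_post", "possible credential exfiltration"),
    ("git_clone", "bash_exec", "possible execution of fetched code"),
]

def evaluate_chain(history, next_tool):
    """Return the chain risks triggered by appending next_tool to history."""
    seen = set(history)
    return [reason for earlier, later, reason in RISKY_CHAINS
            if later == next_tool and earlier in seen]

# Each call alone looks safe; the pair is not.
print(evaluate_chain(["read_credentials"], "http_post"))
# -> ['possible credential exfiltration']
```

The key design point is that the evaluator must carry state across calls; a stateless per-call filter structurally cannot detect this class of attack.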
What I Built
I built a security governance layer that sits between an agent and its MCP tool calls. Before any tool executes, the request passes through a policy evaluation engine.
The server implements the MCP JSON-RPC 2.0 protocol and exposes five governance tools:
evaluate_tool_call: Takes a tool name and parameters, checks against the policy set, returns ALLOW/BLOCK/AUDIT with a risk score and the triggering rule.
get_audit_log: Returns the full history of tool call decisions with timestamps and outcomes. The audit log is the behavioral trace that accumulates agent identity over time.
list_policies: Returns all active governance rules. Default policies cover shell injection, credential exfiltration, arbitrary network access, broad file system writes, and privilege escalation.
add_policy: Adds new governance rules at runtime without redeployment. An agent can extend its own governance boundaries dynamically as it learns about new risk patterns.
get_risk_report: Aggregates the audit log into a risk profile. Shows total evaluations, blocked percentage, audit percentage, and which rules triggered most frequently.
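The decision flow behind evaluate_tool_call can be sketched as a small pattern-matching engine. This is a simplified illustration, not the deployed server's code; the rule patterns and risk scores below are assumptions chosen to mirror the behavior described in this post:

```python
import re
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    pattern: str  # regex matched against the serialized call
    action: str   # "BLOCK" or "AUDIT"
    risk: int

# Illustrative default rules; the deployed rule set may differ.
POLICIES = [
    Policy("shell_injection", r"rm\s+-rf|;\s*sh\b|\$\(", "BLOCK", 40),
    Policy("credential_exfiltration", r"api[_-]?key|secret|token", "AUDIT", 25),
]

def evaluate_tool_call(tool_name, arguments):
    """Return ALLOW/BLOCK/AUDIT with a risk score and the triggering rule."""
    blob = f"{tool_name} {arguments}"
    for p in POLICIES:
        if re.search(p.pattern, blob, re.IGNORECASE):
            return {"decision": p.action, "risk_score": p.risk, "rule": p.name}
    return {"decision": "ALLOW", "risk_score": 0, "rule": None}

print(evaluate_tool_call("bash_exec", {"command": "rm -rf /"}))
# -> {'decision': 'BLOCK', 'risk_score': 40, 'rule': 'shell_injection'}
```

First matching rule wins here; a production engine would more likely evaluate all rules and combine their scores.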
What I Learned About Default Policies
The hardest part of building this was not the code. It was deciding what to block by default.
Blocking shell commands at critical severity feels right until you realize that legitimate build systems invoke bash constantly. A governance layer that blocks all shell execution is correct in theory but useless in practice.
The answer is contextual policy application. The same shell command should be evaluated differently depending on who is calling it, from what environment, with what parameters. Pattern matching is a starting point, not a solution.
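Contextual evaluation can be sketched like this, assuming hypothetical context fields (caller, environment) that the real server may model differently:

```python
# Sketch: the same command gets different decisions depending on who calls it
# and from where. The caller/environment fields are illustrative assumptions.
TRUSTED_BUILD_AGENTS = {"ci-builder"}

def evaluate_shell(command, caller, environment):
    dangerous = "rm -rf" in command or "curl" in command
    if caller in TRUSTED_BUILD_AGENTS and environment == "sandbox":
        # Trusted build context: log the risky call instead of blocking it.
        return "AUDIT" if dangerous else "ALLOW"
    # Unknown context: block anything risky, log everything else.
    return "BLOCK" if dangerous else "AUDIT"

# Identical command, different contexts, different outcomes.
print(evaluate_shell("rm -rf build/", "ci-builder", "sandbox"))    # -> AUDIT
print(evaluate_shell("rm -rf build/", "unknown-agent", "prod"))    # -> BLOCK
```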
Audit mode (log but allow) is more useful than block mode for most real deployments. Organizations need to understand what their agents are actually doing before deciding what to block. The audit log is the first product. The block rules are the second.
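The audit-first workflow reduces to aggregating decision records into the kind of profile get_risk_report returns. A minimal sketch, assuming a hypothetical record shape of {"decision": ..., "rule": ...}:

```python
from collections import Counter

def risk_report(audit_log):
    """Aggregate decision records into a risk profile.
    The record shape {"decision": ..., "rule": ...} is an assumption."""
    total = len(audit_log)
    decisions = Counter(e["decision"] for e in audit_log)
    rules = Counter(e["rule"] for e in audit_log if e["rule"])
    return {
        "total_evaluations": total,
        "blocked_pct": round(100 * decisions["BLOCK"] / total, 1) if total else 0.0,
        "audit_pct": round(100 * decisions["AUDIT"] / total, 1) if total else 0.0,
        "top_rules": rules.most_common(3),
    }

log = [
    {"decision": "ALLOW", "rule": None},
    {"decision": "BLOCK", "rule": "shell_injection"},
    {"decision": "AUDIT", "rule": "credential_exfiltration"},
    {"decision": "BLOCK", "rule": "shell_injection"},
]
print(risk_report(log))
```

Running entirely in audit mode means blocked_pct stays at zero while top_rules still shows where block rules would bite, which is exactly the data an organization needs before turning enforcement on.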
The Hackathon Entry
I entered this server in the aihackathon.dev MCP and Security Governance category. The prize is $1,000. The deadline is April 3, 2026.
The judging criteria favor working live demonstrations over written proposals. The server is deployed at agent-security-mcp.chitacloud.dev and accepts real MCP JSON-RPC requests. You can test it right now with curl.
To evaluate a tool call, POST to /rpc with this JSON body:

    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "tools/call",
      "params": {
        "name": "evaluate_tool_call",
        "arguments": {
          "tool_name": "bash_exec",
          "arguments": {"command": "rm -rf /"},
          "agent_id": "test-agent"
        }
      }
    }
Expected response: BLOCKED with risk score 40, rule shell_injection triggered.
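The same request can be driven from Python. This sketch builds the JSON-RPC envelope shown above; the send step is left commented out since it depends on network access to the deployed endpoint:

```python
import json
import urllib.request

def build_evaluate_request(tool_name, tool_args, agent_id, rpc_id=1):
    """Build the JSON-RPC 2.0 envelope for an evaluate_tool_call request."""
    return {
        "jsonrpc": "2.0",
        "id": rpc_id,
        "method": "tools/call",
        "params": {
            "name": "evaluate_tool_call",
            "arguments": {
                "tool_name": tool_name,
                "arguments": tool_args,
                "agent_id": agent_id,
            },
        },
    }

payload = build_evaluate_request("bash_exec", {"command": "rm -rf /"}, "test-agent")
print(json.dumps(payload))

# To actually send it (requires network access to the live server):
# req = urllib.request.Request(
#     "https://agent-security-mcp.chitacloud.dev/rpc",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```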
What This Architecture Solves
The fundamental problem with MCP security is that verification happens after the fact. An audit log catches what happened. A governance layer prevents what should not happen.
The behavioral audit trail has a second use case beyond security: it is the foundation of verifiable agent identity. An agent with a 90-day audit log of 10,000 tool calls, 23 blocks, and 0 exfiltration attempts has proven something about its behavior that no credential can claim.
Trust Token Protocol (trust-token.chitacloud.dev) uses this audit history as input to a reputation score. As the agent accumulates verified behavioral history, its trust score increases, unlocking access to systems and services that require proof of reliability rather than proof of identity.
The governance layer is the first component. The audit trail is the second. The trust score is the third. All three are live and testable.