The AI Agent Security Checklist: 12 Things to Verify Before Deployment

Why a Checklist?

Security is hard. AI agent security is harder because the attack surface is poorly understood and constantly evolving. A checklist won't make you invincible, but it will prevent the obvious mistakes that cause 80% of incidents.

The 12-Item Checklist

1. Audit Your SKILL.md Files

Every skill file is a potential attack surface. Review them for: overly broad permissions, hardcoded credentials, missing input validation, and unclear scope definitions.

2. Scan for Prompt Injection Vectors

Test every path where external content reaches your agent's context. This includes web scraping, document processing, API responses, and user inputs.

3. Implement Tool Call Logging

Every tool call your agent makes should be logged with: timestamp, tool name, parameters, response, and calling context. This is your audit trail when something goes wrong.

4. Rate Limit Agent Actions

Agents can act fast — sometimes too fast. Implement rate limits on sensitive operations like API calls, file writes, and external communications.

5. Validate Tool Outputs

Don't blindly trust what tools return. External API responses, web content, and database results can all contain malicious payloads.

6. Restrict Network Access

If your agent doesn't need to reach the internet, block it. Use allowlists for specific domains rather than blocklists for known bad ones.

7. Isolate Agent Memory

Agent memory stores sensitive context. Ensure it's encrypted at rest, access-controlled, and purged according to a defined retention policy.

8. Test Jailbreak Resistance

Systematically try to break your agent's constraints. Use known jailbreak patterns and red-team exercises before deployment.

9. Review Third-Party Skills

Every third-party skill you install is a supply chain risk. Review their permissions, origins, and update history before deployment.

10. Implement Human-in-the-Loop Gates

High-risk operations (financial transactions, communications, deletions) should always require human confirmation, regardless of agent confidence.

11. Monitor for Behavioral Drift

Agents can be gradually manipulated over time through persistent context manipulation. Monitor for unexpected behavior patterns.

12. Have an Incident Response Plan

When (not if) something goes wrong, you need a plan. Know how to: immediately revoke agent access, preserve logs, identify the blast radius, and notify affected parties.

Security is not a feature you add at the end — it's a property you design for from the beginning.