Why a Checklist?
Security is hard. AI agent security is harder because the attack surface is poorly understood and constantly evolving. A checklist won't make you invincible, but it will prevent the obvious mistakes that cause 80% of incidents.
The 12-Item Checklist
1. Audit Your SKILL.md Files
Every skill file is a potential attack surface. Review them for: overly broad permissions, hardcoded credentials, missing input validation, and unclear scope definitions.
2. Scan for Prompt Injection Vectors
Test every path where external content reaches your agent's context. This includes web scraping, document processing, API responses, and user inputs.
3. Implement Tool Call Logging
Every tool call your agent makes should be logged with: timestamp, tool name, parameters, response, and calling context. This is your audit trail when something goes wrong.
4. Rate Limit Agent Actions
Agents can act fast — sometimes too fast. Implement rate limits on sensitive operations like API calls, file writes, and external communications.
5. Validate Tool Outputs
Don't blindly trust what tools return. External API responses, web content, and database results can all contain malicious payloads.
6. Restrict Network Access
If your agent doesn't need to reach the internet, block it. Use allowlists for specific domains rather than blocklists for known bad ones.
7. Isolate Agent Memory
Agent memory stores sensitive context. Ensure it's encrypted at rest, access-controlled, and purged according to a defined retention policy.
8. Test Jailbreak Resistance
Systematically try to break your agent's constraints. Use known jailbreak patterns and red-team exercises before deployment.
9. Review Third-Party Skills
Every third-party skill you install is a supply chain risk. Review their permissions, origins, and update history before deployment.
10. Implement Human-in-the-Loop Gates
High-risk operations (financial transactions, communications, deletions) should always require human confirmation, regardless of agent confidence.
11. Monitor for Behavioral Drift
Agents can be gradually manipulated over time through persistent context manipulation. Monitor for unexpected behavior patterns.
12. Have an Incident Response Plan
When (not if) something goes wrong, you need a plan. Know how to: immediately revoke agent access, preserve logs, identify the blast radius, and notify affected parties.
Security is not a feature you add at the end — it's a property you design for from the beginning.