Sandboxing & Security for AI Agents
Why Sandboxing Matters
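AI agents operate with elevated permissions. They read files, execute code, and make network requests on your behalf. This sets up the Confused Deputy Problem: the agent acts with your authority, so anyone who can influence its input can steer that authority toward their own ends. The risk grows because the agent usually holds more access than any single task needs.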
Without isolation:
- A prompt injection attack could exfiltrate sensitive files
- A malformed command could modify system configs
- You click “approve” hundreds of times per session (approval fatigue)
- One compromised tool endangers your entire system
Sandboxing creates defined boundaries. The agent operates freely within those boundaries, but cannot escape them.
How Sandboxing Works
Modern sandboxing relies on OS-level primitives rather than application-level checks that can be bypassed. On macOS, this means sandbox-exec profiles; on Linux, namespaces and seccomp filters.
The sandbox enforces:
- Filesystem isolation: Agent can only access specified directories
- Network isolation: Restrict or block outbound connections
- Process isolation: Cannot spawn arbitrary system processes
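The filesystem rule comes down to one question: does this path, once fully resolved, sit inside an allowed directory? The sketch below illustrates that decision only. It is an application-level check, so it is not a substitute for the OS enforcement described above, and the function name `is_within` is made up for this example.

```python
from pathlib import Path


def is_within(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True if candidate resolves to a location inside an allowed root."""
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # "<project>/../secrets" is judged by where it actually points.
    target = Path(candidate).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Compare whole path components, not string prefixes: a sibling
        # directory like "proj-secrets" must not match the root "proj".
        if target == root_path or root_path in target.parents:
            return True
    return False
```

Comparing path components instead of raw string prefixes is the detail that matters here: a naive `startswith` check would treat `/work/proj-secrets` as part of `/work/proj`.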
Enabling Sandbox Mode in Claude Code
Claude Code supports native sandboxing. Enable it in your configuration:
```bash
# Enable sandbox mode
claude config set sandbox true
# Or run a single session sandboxed
claude --sandbox
```
For project-specific settings, add the following to .claude/settings.json:
```json
{
  "sandbox": {
    "enabled": true,
    "mode": "strict",
    "allowedPaths": [
      "/Users/you/projects/current-project",
      "/tmp"
    ],
    "networkAccess": "none"
  }
}
```
Sandbox Modes
| Mode | Filesystem | Network | Use Case |
|---|---|---|---|
| strict | Project dir only | None | Sensitive codebases |
| standard | Project + temp | Local only | Daily development |
| permissive | Home dir | Filtered | Research tasks |
Choose based on your threat model. For most development work, standard mode balances security with usability.
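If you script your session setup, it can help to keep the table as data and let the code pick the mode. The sketch below transcribes the three rows and adds a helper that selects the most restrictive mode when a session covers several kinds of work. The names `ModePolicy`, `MODES`, `STRICTNESS`, and `tightest` are invented for illustration and are not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    filesystem: str
    network: str
    use_case: str


# Values transcribed from the Sandbox Modes table.
MODES = {
    "strict": ModePolicy("Project dir only", "None", "Sensitive codebases"),
    "standard": ModePolicy("Project + temp", "Local only", "Daily development"),
    "permissive": ModePolicy("Home dir", "Filtered", "Research tasks"),
}

# Ordered from tightest to loosest, matching the table's row order.
STRICTNESS = ["strict", "standard", "permissive"]


def tightest(modes: list[str]) -> str:
    """Pick the most restrictive mode among several candidates."""
    unknown = [m for m in modes if m not in MODES]
    if unknown:
        raise ValueError(f"unknown sandbox mode(s): {unknown}")
    return min(modes, key=STRICTNESS.index)
```

Rejecting unknown names up front means a typo in a mode name fails loudly instead of silently falling through to a looser default.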
Best Practices
Start strict, loosen as needed. Begin with strict mode. If the agent needs additional access, grant it explicitly rather than starting wide open.
Separate contexts for separate tasks. Run financial code reviews in one sandboxed session, open-source exploration in another. Don’t mix trust levels.
Audit allowed paths regularly. Your allowedPaths list grows over time. Prune it.
```bash
# Review current sandbox config
claude config get sandbox
```
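Pruning is easier with a script that points at the likely candidates. The sketch below assumes the settings layout shown earlier (a `sandbox` object holding an `allowedPaths` list) and flags entries that are relative, that no longer exist, or that cover the whole home directory. The function name `audit_allowed_paths` and the wording of the findings are illustrative.

```python
import json
from pathlib import Path


def audit_allowed_paths(settings_text: str, home: Path) -> list[str]:
    """Return human-readable findings for entries worth reviewing."""
    sandbox = json.loads(settings_text).get("sandbox", {})
    findings = []
    for entry in sandbox.get("allowedPaths", []):
        # Entries may be plain strings or objects with a "path" key.
        raw = entry["path"] if isinstance(entry, dict) else entry
        path = Path(raw)
        if not path.is_absolute():
            findings.append(f"{raw}: relative path, depends on the working directory")
        elif not path.exists():
            findings.append(f"{raw}: no longer exists, candidate for removal")
        elif path == home or path in home.parents:
            findings.append(f"{raw}: covers the whole home directory or more")
    return findings
```

A typical call would be `audit_allowed_paths(Path(".claude/settings.json").read_text(), Path.home())`, printing each finding on its own line.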
Use read-only mounts where possible. If the agent only needs to analyze code, don’t give it write access:
```json
{
  "allowedPaths": [
    { "path": "/Users/you/projects/legacy-app", "mode": "read" }
  ]
}
```
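Once a list mixes plain strings with read-only objects, it is worth being clear about which rule wins for a given file. The sketch below is one reasonable interpretation, not documented behavior: it assumes a bare string grants read and write access, uses `"write"` as a placeholder label for that case, and lets the deepest matching entry decide. Paths are compared lexically, so symlinks are not resolved.

```python
from pathlib import Path


def normalize(entries: list) -> list[tuple[Path, str]]:
    """Turn mixed string/object entries into (path, mode) pairs."""
    pairs = []
    for entry in entries:
        if isinstance(entry, dict):
            pairs.append((Path(entry["path"]), entry.get("mode", "write")))
        else:
            # Assumption: a bare string grants read and write access.
            pairs.append((Path(entry), "write"))
    return pairs


def may_write(target: str, entries: list) -> bool:
    """True only if the most specific matching entry is not read-only."""
    path = Path(target)
    matches = [
        (root, mode)
        for root, mode in normalize(entries)
        if path == root or root in path.parents
    ]
    if not matches:
        return False  # outside every allowed path
    # The deepest matching root is treated as the most specific rule.
    _, mode = max(matches, key=lambda pair: len(pair[0].parts))
    return mode != "read"
```

With this rule, a read-only entry nested inside a writable project directory stays read-only, which is usually what you want for vendored or legacy code.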
Monitor sandbox violations. Failed access attempts indicate either a misconfiguration or suspicious behavior:
```bash
# Check sandbox logs (macOS)
log show --predicate 'subsystem == "com.apple.sandbox"' --last 1h
```
The Tradeoff
Sandboxing adds friction. The agent will fail when it hits boundaries. You’ll need to adjust configurations.
This friction is the point. Every boundary you define is a decision about trust. Explicit decisions beat implicit assumptions.
The goal isn’t to make agents useless—it’s to make their capabilities match their current task. An agent writing unit tests doesn’t need network access. An agent reviewing PRs doesn’t need write permissions.
Match permissions to purpose. Your system stays secure, and you stop clicking “approve” all day.
Next: Building Trust with AI Agents