Sandboxing & Security for AI Agents


Why Sandboxing Matters

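AI agents operate with elevated permissions. They read files, execute code, and make network requests on your behalf. This sets up the confused deputy problem: a program with legitimate authority is tricked into misusing it on someone else's behalf. The more access the agent holds beyond what a single task needs, the more damage such a trick can do.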

Without isolation:

A prompt injection attack could exfiltrate sensitive files
A malformed command could modify system configs
You click “approve” hundreds of times per session (approval fatigue)
One compromised tool endangers your entire system

Sandboxing creates defined boundaries. The agent operates freely within those boundaries, but cannot escape them.

How Sandboxing Works

Modern sandboxing uses OS-level primitives rather than application-level checks that can be bypassed. On macOS, this means sandbox-exec profiles; on Linux, namespaces and seccomp filters.

The sandbox enforces:

Filesystem isolation: Agent can only access specified directories
Network isolation: Restrict or block outbound connections
Process isolation: Cannot spawn arbitrary system processes
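
The three boundaries above can be written down as a single policy. As a rough sketch, the Python below renders a macOS profile in the Scheme-like language that sandbox-exec accepts. The `build_profile` helper and its parameters are hypothetical, and the profile is deliberately minimal: real programs usually need further allowances (system libraries, Mach services) before they will start under it.

```python
def build_profile(read_write_dirs, allow_network=False):
    """Render a minimal, deny-by-default sandbox-exec profile (illustrative only)."""
    lines = ["(version 1)", "(deny default)"]
    for directory in read_write_dirs:
        # subpath matches the directory itself and everything beneath it.
        # Assumption: the path contains no double quotes that would need escaping.
        lines.append(f'(allow file-read* file-write* (subpath "{directory}"))')
    if allow_network:
        lines.append("(allow network-outbound)")
    # Nothing here allows process-fork or process-exec, so the default deny
    # rule keeps the agent from spawning arbitrary processes.
    return "\n".join(lines)


profile = build_profile(["/Users/you/projects/current-project"])
print(profile)
```

Saved to a file, a profile like this can be passed to `sandbox-exec -f profile.sb <command>`. Starting from `(deny default)` and adding allowances one at a time mirrors the "defined boundaries" idea: anything not listed is refused by the kernel, not by the agent.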

Enabling Sandbox Mode in Claude Code

Claude Code supports native sandboxing. Enable it in your configuration:

# Enable sandbox mode
claude config set sandbox true

# Or run a single session sandboxed
claude --sandbox

For project-specific settings, add to .claude/settings.json:

{
  "sandbox": {
    "enabled": true,
    "mode": "strict",
    "allowedPaths": [
      "/Users/you/projects/current-project",
      "/tmp"
    ],
    "networkAccess": "none"
  }
}
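
To make the effect of allowedPaths concrete, here is a minimal Python sketch of the containment check such a setting implies. It is a model for reasoning about the config, not a description of how Claude Code enforces it: enforcement happens in the OS, and this purely lexical check ignores symlinks. `is_path_allowed` is a hypothetical helper; the keys are the ones from the example above.

```python
import json
import posixpath

SETTINGS = """
{
  "sandbox": {
    "enabled": true,
    "allowedPaths": ["/Users/you/projects/current-project", "/tmp"],
    "networkAccess": "none"
  }
}
"""


def is_path_allowed(path, allowed_paths):
    """Return True if an absolute path lies inside one of allowed_paths."""
    if not posixpath.isabs(path):
        return False  # relative paths are ambiguous, so reject them
    target = posixpath.normpath(path)  # collapses "..", so traversal cannot sneak out
    for root in allowed_paths:
        root = posixpath.normpath(root)  # assumption: roots are absolute paths
        if posixpath.commonpath([target, root]) == root:
            return True
    return False


allowed = json.loads(SETTINGS)["sandbox"]["allowedPaths"]
print(is_path_allowed("/Users/you/projects/current-project/src/app.py", allowed))     # True
print(is_path_allowed("/Users/you/projects/current-project/../other/.env", allowed))  # False
print(is_path_allowed("/tmpfiles/x", allowed))                                        # False
```

Comparing whole path components (via `commonpath`) rather than string prefixes is what keeps `/tmpfiles` from matching `/tmp`.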

Sandbox Modes

Mode	Filesystem	Network	Use Case
`strict`	Project dir only	None	Sensitive codebases
`standard`	Project + temp	Local only	Daily development
`permissive`	Home dir	Filtered	Research tasks

Choose based on your threat model. For most development work, standard mode balances security with usability.
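
One way to read the table is as an ordering from least to most access. The sketch below encodes the three rows and picks the most restrictive mode that still covers what a task needs. The level labels and `pick_mode` are hypothetical names for the table cells, not configuration values, and ranking "filtered" above "local only" is an assumption.

```python
# Access levels, ordered from least to most permissive (labels mirror the table).
FS_LEVELS = ["project", "project+temp", "home"]
NET_LEVELS = ["none", "local", "filtered"]

# Modes listed strictest first: (name, filesystem level, network level).
MODES = [
    ("strict", "project", "none"),
    ("standard", "project+temp", "local"),
    ("permissive", "home", "filtered"),
]


def pick_mode(fs_needed, net_needed):
    """Return the strictest mode that grants at least the requested access."""
    fs_rank = FS_LEVELS.index(fs_needed)    # raises ValueError on unknown labels
    net_rank = NET_LEVELS.index(net_needed)
    for name, fs, net in MODES:
        if FS_LEVELS.index(fs) >= fs_rank and NET_LEVELS.index(net) >= net_rank:
            return name
    raise ValueError("no mode grants the requested access")


print(pick_mode("project", "none"))   # strict
print(pick_mode("project", "local"))  # standard
print(pick_mode("home", "none"))      # permissive
```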

Best Practices

Start strict, loosen as needed. Begin with strict mode. If the agent needs additional access, grant it explicitly rather than starting wide open.

Separate contexts for separate tasks. Run financial code reviews in one sandboxed session, open-source exploration in another. Don’t mix trust levels.

Audit allowed paths regularly. Your allowedPaths list grows over time. Prune it.

# Review current sandbox config
claude config get sandbox
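
Part of this audit can be automated. As a sketch, the hypothetical `audit_allowed_paths` below flags entries that no longer exist on disk or that are broad enough to defeat the purpose (the filesystem root or the home directory itself). What counts as "too broad" is an assumption; adjust it to your own threat model.

```python
import os


def audit_allowed_paths(paths, home=None, exists=os.path.exists):
    """Return (path, reason) pairs for entries worth pruning or narrowing."""
    home = os.path.normpath(home or os.path.expanduser("~"))
    findings = []
    for raw in paths:
        path = os.path.normpath(raw)
        if path in (os.sep, home):
            findings.append((raw, "too broad"))
        elif not exists(path):
            findings.append((raw, "no longer exists"))
    return findings


# Example run; results for the first entry depend on your filesystem.
for entry, reason in audit_allowed_paths(
    ["/Users/you/projects/current-project", "/Users/you", "/"],
    home="/Users/you",
):
    print(f"{entry}: {reason}")
```

The `exists` parameter is injectable so the check can be exercised without touching the real filesystem.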

Use read-only mounts where possible. If the agent only needs to analyze code, don’t give it write access:

{
  "allowedPaths": [
    { "path": "/Users/you/projects/legacy-app", "mode": "read" }
  ]
}
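
The config examples in this article use two entry shapes for allowedPaths: a bare string and an object with a mode. A sketch of how a tool might normalize both follows. `normalize_entries` and `writable_paths` are hypothetical, and treating a bare string as read-write is an assumption, since nothing above states the default.

```python
def normalize_entries(entries, default_mode="read-write"):
    """Convert mixed string/object entries into uniform (path, mode) tuples."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Assumption: a bare path gets the default mode.
            normalized.append((entry, default_mode))
        else:
            normalized.append((entry["path"], entry.get("mode", default_mode)))
    return normalized


def writable_paths(entries):
    """List the paths the agent could modify; all others are read-only."""
    return [path for path, mode in normalize_entries(entries) if mode != "read"]


config = [
    "/Users/you/projects/current-project",
    {"path": "/Users/you/projects/legacy-app", "mode": "read"},
]
print(writable_paths(config))  # ['/Users/you/projects/current-project']
```

Listing the writable paths explicitly makes it easy to spot an entry that should have been read-only.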

Monitor sandbox violations. Failed access attempts indicate either a misconfiguration or suspicious behavior:

# Check sandbox logs (macOS)
log show --predicate 'subsystem == "com.apple.sandbox"' --last 1h
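
Raw log output is noisy, so it helps to summarize denials by process and operation. The sketch below assumes kernel sandbox messages of the form `Sandbox: <process>(<pid>) deny(1) <operation> <target>`. That format and the sample lines are assumptions for illustration; check them against what your system actually emits.

```python
import re
from collections import Counter

# Assumed message shape: "Sandbox: node(4242) deny(1) file-read-data /path"
DENY_RE = re.compile(
    r"Sandbox: (?P<process>[^()]+)\((?P<pid>\d+)\) deny\(\d+\) "
    r"(?P<op>\S+)(?: (?P<target>.*))?"
)


def summarize_denials(lines):
    """Count denied operations per (process, operation) pair."""
    counts = Counter()
    for line in lines:
        match = DENY_RE.search(line)
        if match:
            counts[(match["process"], match["op"])] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "kernel: (Sandbox) Sandbox: node(4242) deny(1) file-read-data /Users/you/.ssh/id_rsa",
    "kernel: (Sandbox) Sandbox: node(4242) deny(1) network-outbound /private/var/run/mDNSResponder",
    "kernel: (Sandbox) Sandbox: node(4242) deny(1) file-read-data /Users/you/.aws/credentials",
    "unrelated log line",
]
for (process, op), n in summarize_denials(sample).most_common():
    print(f"{process} {op}: {n}")
```

A burst of denied reads against credential files, as in the sample, is the kind of pattern worth investigating; a single denied write to a cache directory more likely points to a missing entry in the config.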

The Tradeoff

Sandboxing adds friction. The agent will fail when it hits boundaries. You’ll need to adjust configurations.

This friction is the point. Every boundary you define is a decision about trust. Explicit decisions beat implicit assumptions.

The goal isn’t to make agents useless; it’s to make their capabilities match their current task. An agent writing unit tests doesn’t need network access. An agent reviewing PRs doesn’t need write permissions.

Match permissions to purpose. Your system stays secure, and you stop clicking “approve” all day.
