By the end of this chapter you can harden a workflow so that even a fully compromised prompt can do very little damage. You'll layer three defenses on top of the safe-outputs boundary from Chapter 6: least-privilege read scopes in the permissions: block; an egress firewall in the network: block; and strict mode. You'll also learn the threat model these defenses answer to, and how much lock-down is enough.
Everything targets gh aw v0.81.6. We take the Repo Assistant and give it a genuinely paranoid security posture — the version you'd be comfortable running on a public repo.
Chapter 6 removed the agent's write access, but writes aren't the only way to cause harm. Security researchers describe a widely cited danger called the “lethal trifecta”: an AI agent becomes genuinely dangerous when it has all three of (1) exposure to untrusted content, (2) access to private data, and (3) the ability to communicate externally. Any one alone is manageable. With all three, a prompt injection hidden in the untrusted content can read your private data, secrets included, and smuggle it out.
An agentic workflow naturally trends toward all three: it reads issues (untrusted), checks out your repo (private data), and can reach the network (exfiltration channel). So the strategy isn't to find the “one fix” — it's to break the trifecta from several directions at once, so no single failure is catastrophic. That is defense in depth, and it's exactly how gh-aw is built: it “implements a defense-in-depth security architecture that protects against untrusted MCP servers and compromised agents” (Security Architecture).
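To make the three legs concrete, here is a minimal, hypothetical frontmatter fragment annotated with the leg each line opens. It is an illustration, not a recommended configuration, and it assumes the workflow checks out the repository as described above.

```yaml
# Hypothetical, unhardened fragment: each annotated line opens one leg of the trifecta
on:
  issues:
    types: [opened]     # leg 1: issue text is untrusted content
permissions:
  contents: read        # leg 2: the agent can read the private repository
network: defaults       # leg 3: whatever egress the firewall permits is a potential channel out
```

The rest of this chapter narrows each of these lines rather than relying on any single one of them.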
Three layers of trust
gh-aw organizes its defenses into three layers, “each enforc[ing] distinct security properties… and constrain[ing] the impact of failures above it” (Security Architecture):
Substrate — VM, kernel, container runtime, and the network firewall: isolation that holds “even if an untrusted user-level component is fully compromised.”
Configuration — schema validation, SHA-pinned actions, security scanners, and role/permission checks applied at compile time.
Plan — staged execution: content sanitization, threat detection, secret redaction, and the SafeOutputs permission separation you already met.
You control several of these layers directly from frontmatter. Three levers matter most day to day.
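As a rough sketch, assuming the layer assignments in the list above, here is how the frontmatter keys used in this chapter line up with the three layers. The values are placeholders; what matters is which layer each key feeds.

```yaml
# Substrate layer: the network firewall around the agent
network:
  allowed: [defaults]
# Configuration layer: schema, strict-mode, and permission checks at compile time
strict: true
permissions:
  issues: read
# Plan layer: writes deferred to separate, permission-scoped SafeOutputs jobs
safe-outputs:
  add-comment:
    max: 1
```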
1. Least-privilege permissions:
The permissions: block grants read scopes to the agent, which “runs with minimal read-only permissions, while write operations are deferred to separate jobs” (Security Architecture). Grant only what the task reads — a triager needs issues: read, not contents: write. If you omit permissions:, gh-aw defaults to read-only.
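For example, an issue triager and a hypothetical pull-request reviewer need different read scopes. The two fragments below are alternatives, shown side by side; the scope names are standard GitHub Actions permission scopes.

```yaml
# Issue triager: reads issues, nothing else
permissions:
  issues: read
```

```yaml
# Hypothetical PR reviewer: reads the pull request and the code it touches
permissions:
  contents: read
  pull-requests: read
```

Neither grants a write scope; anything the agent wants to write still goes through safe outputs.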
2. The network firewall (network:)
This is the trifecta's third leg — the exfiltration channel — and gh-aw lets you cut it. The Agent Workflow Firewall (AWF) “controls the agent's egress traffic via a configurable domain allowlist to prevent data exfiltration” (Security Architecture). Three postures, following least privilege (Network Permissions):
Three network postures, tightest to most open
```yaml
# Pick one of the three postures:
network: {}          # no network at all (the tightest)
network: defaults    # basic infrastructure only (the default)
network:             # an explicit allowlist
  allowed: [defaults, github, python]   # ecosystem identifiers + domains
```
Prefer ecosystem identifiers (python, node, github…) over raw domains; strict mode nudges you toward them. If you also block entries, remember that “blocked entries take precedence over allowed ones.” A workflow that only reads issues needs no egress at all.
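Putting that last sentence into practice, a workflow that only reads issue text can drop egress entirely. This sketch assumes nothing else in the workflow (no package installs, no extra tools) needs the network.

```yaml
on:
  issues:
    types: [opened]
permissions:
  issues: read
network: {}   # no egress: even a hijacked agent has no network channel out
```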
3. Strict mode (the default)
You've been relying on this since Chapter 2. Strict mode is on by default, and it enforces the configuration layer at compile time: no top-level write permissions, explicit network config, no wildcard domains, no deprecated fields, SHA-pinned actions, and security scanners (Security Architecture). Turning it off is a cliff: “Workflows compiled with strict: false cannot run on public repositories” (Frontmatter).
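To see what strict mode guards against, here is a deliberately bad fragment in which each annotated line corresponds to one of the checks listed above. Under the default strict mode, expect the compiler to refuse it. The exact error messages are not reproduced here, and the wildcard pattern syntax is an assumption for illustration.

```yaml
# Deliberately unsafe: do not ship. Each annotated line should trip a strict-mode check.
permissions:
  contents: write       # top-level write permission for the agent
network:
  allowed:
    - "*.example.com"   # wildcard domain (hypothetical pattern)
```

The fix is to drop the write scope, routing writes through safe-outputs: instead, and to replace the wildcard with the specific domains or ecosystem identifiers the task needs.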
The layers you get for free
Beyond what you configure, the Plan layer runs automatically: incoming issue/PR text is sanitized (mentions neutralized, non-HTTPS and untrusted URLs redacted); a separate threat-detection job uses AI to scan the agent's buffered output for “secret leakage, malicious code patterns, and policy violations” and “must complete successfully and emit a ‘safe’ verdict before any safe output jobs execute”; and secret redaction scrubs artifacts “with if: always()” (Security Architecture).
So how much lock-down is enough? The honest answer: the defaults are already strong, and many workflows need little beyond them. The skill is matching the lock-down to the trifecta legs your workflow actually has.
| If your workflow… | Then… |
| --- | --- |
| only reads issues/PRs and comments | keep read-only permissions; consider network: {}, since it needs no egress |
| installs packages (tests, builds) | add just the ecosystem it needs: network: { allowed: [defaults, node] } |
| runs on a public repo | never set strict: false; lean on the auto-applied min-integrity: approved |
| can be triggered by outsiders | tighten the roles: gate and fork policy from Chapter 4 |
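For the second row, here is a sketch of the frontmatter for a hypothetical workflow that runs a Node.js test suite. It assumes the node ecosystem identifier covers the package registry the install step uses.

```yaml
permissions:
  contents: read              # needs the source to test, nothing more
network:
  allowed: [defaults, node]   # basic infrastructure plus the Node ecosystem; nothing else
```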
When not to
Don't disable strict mode to “make it work.” A strict-mode error is a real risk being flagged. Fix the cause — it's the compiler doing its job, and strict: false won't even run on public repos.
Don't open the firewall wide. A broad network: { allowed: [...] } list, or a disabled firewall, hands a compromised agent an exfiltration channel. Add domains one at a time, guided by gh aw audit.
Don't over-grant read scopes either. Read access is still access to private data (trifecta leg two). Only request the scopes the task reads.
Don't treat any single layer as sufficient. Safe outputs, the firewall, strict mode, and threat detection are complementary. The point is that they overlap.
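Growing an allowlist one entry at a time might look like this. The sketch assumes a gh aw audit run pointed to a single blocked request to a documentation host; docs.example.com is a placeholder, not a real requirement.

```yaml
network:
  allowed:
    - defaults
    - python
    - docs.example.com   # hypothetical: added only after audit showed this one host was needed
```

Each new line is a deliberate, reviewable decision, which is exactly what a broad list or a disabled firewall skips.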
Here is the Repo Assistant with every lever pulled toward safety — the version you'd happily run on a public repo. It still compiles cleanly under strict mode with no secrets.
examples/ch07/repo-assistant-hardened.md — defense in depth in one frontmatter (compiles: 0 errors, 0 warnings)
```yaml
on:
  issues:
    types: [opened]
  roles: [admin, maintainer, write]   # who may trigger (Configuration layer)
permissions:
  contents: read     # least-privilege reads only
  issues: read
engine: copilot
strict: true         # enforce the Configuration layer
network:
  allowed:
    - defaults       # cut the exfiltration leg to essentials
    - github
timeout-minutes: 10  # bound blast radius in time
safe-outputs:
  add-comment:
    max: 1
  add-labels:
    allowed: [bug, enhancement, question, documentation]
    max: 1
```
Count the independent controls, each attacking a different leg of the trifecta or bounding the blast radius:
Least privilege — the agent gets only contents: read and issues: read. No write token exists to steal.
Trigger gate — roles: means a stranger's issue can't even start the agent.
Egress firewall — a tight network: allowlist closes the exfiltration channel; a leaked secret has nowhere to go.
Strict mode — the compiler refuses unsafe choices before this ever ships.
Time cap + safe outputs — timeout-minutes bounds a runaway run, and writes still flow through the sanitized, permission-scoped boundary.
You can now harden a workflow so that a compromised prompt can do very little damage:
The threat is the lethal trifecta — untrusted content + private data + external communication. The defense is to break it from several directions: defense in depth.
gh-aw layers trust across substrate, configuration, and plan, so a failure in one layer is caught by another.
You directly control three levers: least-privilege permissions:, the network: egress firewall, and strict mode (on by default — don't turn it off).
You get content sanitization, threat detection, and secret redaction for free. Match the lock-down to the trifecta legs your workflow actually has — the defaults are already strong.
What's next. A hardened, read-only agent is safe — but also limited to what it can read. To do real work it often needs capabilities: querying a database, browsing docs, calling an API. In Chapter 8: Tools & MCP, we grant those capabilities through the tools: block and MCP servers — without reopening the doors we just closed.