By the time you're reviewing the diff, the destructive call already ran.

The agent sends the email while you’re still scrolling up to read what it planned to do.

Or it drops the table. Or it issues the refund. Or it force-pushes over the branch your teammate spent the afternoon on. You were reading the transcript — diligently, in good faith — and the destructive call landed two lines above where your eyes were. The diff review you were doing arrived one tool call too late, because the thing you needed to approve wasn’t a diff. It was an action, and the action had already happened.

This is the failure mode nobody markets. The pitch is “fully autonomous agent.” The reality is that full autonomy is the wrong goal — not because autonomy is bad, but because the right amount of autonomy is task-dependent. Reading files, grepping, running a dry-run, querying a read replica: let it rip, full speed, no prompts. Dropping a production table: that should never happen without a human saying yes first, in that exact moment, before the call goes out.

The line between those two worlds is not “important” versus “trivial.” It’s reversible versus irreversible.

Reversibility is the only line that matters

Most permission setups draw the line in the wrong place. They gate on scariness — “ask before running shell commands” — which trains you to rubber-stamp, because the overwhelming majority of shell commands are ls and cat and you learn to hit y without reading. A prompt that fires on everything is a prompt that protects nothing.

Draw the line at reversibility instead. The question for any tool the agent can call is simple: if it does this and it’s wrong, can I undo it?

Reading a file, listing a directory, running tests, querying a read replica, a dry-run migration — reversible. Worst case, the agent wasted some tokens. Let it run unattended.
git commit on a feature branch, writing a new file, editing source — reversible enough. A bad commit is a git reset. Let it run, review later.
DROP TABLE, DELETE FROM without a WHERE, sending an email, charging a card, terraform apply against prod, git push --force — irreversible. There is no undo. These get a hard human gate.

The corporate version of this is sharper still. If you live in Copilot or Cursor today and you’ve started letting an agent run terminal commands, you already have a tool that can reach production. The agent is broad — it has read more terraform and more raw SQL than you have. But it doesn’t know that payments_ledger is append-only by policy, or that the staging database alias points at prod on Fridays during the migration window. That’s the gap: the agent’s competence is general, your knowledge is specific and load-bearing, and the place that gap hurts most is the one call you can’t take back.

Closing it takes three primitives, layered. None of them is enough alone.

Step 1 — Split the toolset by reversibility in permissions

Permissions is where you encode the reversibility line so the agent obeys it without you watching. The shape is an allowlist with three buckets — allow, ask, deny — and the whole trick is putting each tool in the right one.

Here is the split as a Claude Code settings.json. opencode keys the same three actions — allow, ask, deny — directly off each tool (edit, bash, webfetch), with per-command patterns underneath. Codex draws the line a layer down, through its sandbox mode plus an approval policy (untrusted, on-request, never, or a granular policy that keeps some prompt categories interactive while auto-rejecting others). The mechanisms differ; the reversibility logic you encode into them is identical.

{
  "permissions": {
    "allow": [
      "Read",
      "Grep",
      "Glob",
      "Bash(ls:*)",
      "Bash(cat:*)",
      "Bash(git status:*)",
      "Bash(git diff:*)",
      "Bash(npm test:*)",
      "Bash(psql --variable=default_transaction_read_only=on:*)"
    ],
    "ask": [
      "Bash(git push:*)",
      "Bash(npm run deploy:*)",
      "Edit"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)",
      "Bash(terraform apply:*)"
    ]
  }
}

One thing the matcher won’t do for you: catch a DROP buried in the middle of a psql -c "..." string. Claude Code’s permission patterns match on the command prefix, not arbitrary substrings — Bash(psql:*) matches any psql invocation, but there’s no pattern that reliably fires only when the SQL payload contains DROP TABLE. Don’t fake it with a glob that looks like it works. Mid-string content matching is exactly the job for the hook in Step 3, where a real regex runs against the full command.

Read the buckets as a policy statement. allow is everything reversible — the agent does it silently, full autonomy, no interruption. ask is the reversible-but-consequential middle — it pauses for a yes. deny is the irreversible class that should never run from inside an agent loop at all; a human goes and does it by hand, or it runs through a gated path you control.

This already buys you most of the safety. But deny is blunt — it bans the tool outright, which means the agent can’t even propose a terraform apply as the obvious last step of a plan. That’s fine for rm -rf. It’s too blunt for the legitimate destructive action you actually do want to happen, once, after you’ve seen the plan. For that, you need the other two primitives.

Step 2 — Force a plan so the action is visible before it runs

The diff-review problem is a timing problem. You’re reviewing the consequence after the cause. Plan mode fixes the timing by making the agent produce its full intended action list before it touches anything — and wait.

In plan mode the agent reads, reasons, and emits something like this, then stops:

PLAN
1. Read migrations/ to find the column rename. (read-only)
2. Write a new migration that backfills `users.full_name`. (reversible)
3. Run the migration against the staging replica as a dry-run. (reversible)
4. Apply the migration to production. (IRREVERSIBLE — requires approval)

Awaiting approval before step 4.

Now the irreversible step is a line item you read with your eyes, in advance, not a tool call you catch mid-scroll. The plan is the artifact that makes the reversibility line reviewable. This is the move people miss: plan mode’s value isn’t that the agent “thinks first.” It’s that the agent’s intent becomes a document a human SME can veto — the one document where your specific knowledge (“step 4 hits prod, and it’s Friday, so don’t”) can intercept the agent’s general competence before it acts.

One thing makes the difference between a gate you use and one you abandon: approving has to be cheap. If hitting “yes” means the agent restarts and re-derives the whole plan from scratch — re-running the forty searches it already did — the gate taxes you every time you use it, and you’ll stop. Persist the planning state to a resumable checkpoint and have approval resume from it: agent plan ... --checkpoint plan.json, then agent execute --resume plan.json. The agent picks up holding everything it learned, minus the one decision you struck. The plan becomes the savepoint, not the starting gun.

But a plan is just text. The agent can write “Awaiting approval before step 4” and then, in a long enough session or after a compaction, forget it said that and call the tool anyway. Plans are intentions, and intentions drift. You need something that enforces the gate deterministically. That’s the third primitive.

Step 3 — Gate the destructive step behind a hook that interrupts

Hooks are deterministic code that runs at fixed points in the agent loop — including before a tool call executes. A hook doesn’t ask the model nicely. It runs your script, and if your script says no, the tool call dies. That’s the hard stop the plan only promised.

Here’s a PreToolUse hook that intercepts the irreversible class and stops the call. The hook receives the tool call as JSON on stdin — there is no magic environment variable — so you pipe stdin into jq to pull the command out:

#!/usr/bin/env bash
# Fires before every Bash tool call. Reads the tool call as JSON on stdin.
# Exit 2 = block the call and return the message to the agent.

cmd=$(jq -r '.tool_input.command')   # JSON arrives on stdin

irreversible='(terraform apply|DROP TABLE|DELETE FROM[^;]*;|git push --force|send_email|charge_card)'

if [[ "$cmd" =~ $irreversible ]]; then
  echo "BLOCKED: irreversible action detected:" >&2
  echo "  $cmd" >&2
  echo "This class of command is gated. A human must run it by hand," >&2
  echo "or approve it through the harness before it can proceed." >&2
  exit 2   # non-zero (2) blocks the call; stderr is returned into the agent's context
fi
exit 0

The block mechanism is the contract the hooks chapter documents: exit 2, with your message on stderr. The non-zero exit kills the tool call before it runs; the stderr text is handed back into the agent’s context, so the model sees why it was stopped and re-plans around it. Notice what the script does not do: it doesn’t approve anything. Approval happens where it belongs — the harness’s permission flow, a human deciding in that moment. Want the gate to prompt instead of hard-block? One change: exit 0 and print a hookSpecificOutput block with permissionDecision: "ask", and the call routes straight into the approval prompt rather than dying. The same channel takes "deny" plus a permissionDecisionReason when you want a structured refusal. Either way the script blocks or defers — it never quietly waves an irreversible call through.

That last point matters because of where these hooks run. A PreToolUse hook fires inside the agent’s process, which frequently has no controlling terminal — there is nothing to type into. If you reach for an interactive prompt (read -r answer < /dev/tty) you’ve built a gate that works only when a human is sitting at an attached TTY, and silently fails or hangs the moment the same agent runs headless in CI. Treat a TTY prompt as an interactive-only variant with that caveat attached; the portable, documented gate is exit 2 plus stderr, which behaves identically whether a human is watching or not.

Wire it into the loop:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/gate-irreversible.sh" }
        ]
      }
    ]
  }
}

Two things make this load-bearing. It is deterministic — it fires every time the pattern matches, regardless of what the model believes about its own plan. And it moves the decision to proceed out of the agent loop entirely: proceeding becomes a deliberate human act through the harness, in that moment — not a reflexive y the agent has trained you to autopilot through.

Codex and opencode expose the same pre-execution hook point under their own names; the contract is identical — your code runs before the call, and a non-zero exit kills it. The reversibility list in the regex is the part you own. The agent will never infer that payments_ledger is sacred. You write it down once, here, and the gate enforces it forever.

The one way this primitive turns on you is over-gating. A regex tuned too wide re-creates the exact disease it was meant to cure. Make the pattern DELETE FROM with no qualifier and you’ll block the agent’s perfectly reversible DELETE FROM scratch_tmp inside a transaction it can roll back — and once the gate fires on routine work, you start reflexively waving it through, which is the rubber-stamp reflex you built the gate to kill, now wearing a different hat. The fix is to keep the list narrow and truly irreversible: a DELETE without a WHERE, an apply against a prod workspace, a force-push to a shared branch — not the whole verb. When you’re unsure whether something belongs in the regex, the honest test is the same one from the top: if it runs and it’s wrong, can you undo it? If yes, it does not go in the hook. The gate earns its authority by firing rarely and being right every time.

The three layers do different jobs — that’s why you need all three

It’s tempting to pick one. Don’t.

Permissions set the default posture per tool — silent on reversible, banned on the worst. They run with zero human attention, which is exactly what you want for the overwhelming majority of calls.
Plan mode makes the intended irreversible action visible and reviewable in advance, turning a diff you’d catch too late into a line item you approve on purpose.
The hook is the deterministic backstop that the model cannot forget, drift past, or talk itself out of. It’s the only one of the three that can’t be overridden by a sufficiently confident agent.

Permissions without the hook trust the model to respect a deny it can sometimes route around. Plan mode without the hook trusts the model to honor its own “awaiting approval” line. The hook without plan mode gives you a hard stop with no context for why the agent wanted to act. Stacked, they give you the thing the autonomy pitch never does: full speed on everything reversible, and a wall on everything you can’t take back.

That’s the actual goal. Not an autonomous agent, and not a human babysitting every keystroke. An agent that knows where your line is — because you drew it, in the one place that can’t drift — and moves freely right up to it.

The diff was never the thing to approve. The plan was. Approve the plan, gate the call, and let the agent run.

Reversibility is the only line that matters

Step 1 — Split the toolset by reversibility in permissions

Step 2 — Force a plan so the action is visible before it runs

Step 3 — Gate the destructive step behind a hook that interrupts

The three layers do different jobs — that’s why you need all three

Stay ahead of the curve