“Build the payment flow” feels like the most productive prompt you’ll write all week. The agent comes back twenty minutes later with a Stripe integration, a webhook handler, a charges table migration, retry logic, and a tidy summary. It looks done. It compiles. The demo works.
It’s also the single riskiest thing you did all sprint, and you won’t find out for three weeks — when a webhook replays and a customer gets billed twice.
Here’s the uncomfortable part. The agent that wrote your billing code is the same agent that wrote your settings page. Same model, same speed, same confidence. You reviewed both the same way: a glance at the diff, a thumbs-up. But those two diffs do not carry the same downside. One produces a support ticket. The other produces a chargeback, a furious customer, and a reconciliation spreadsheet you’ll be staring at on a Saturday.
Your codebase is not one risk surface — it’s two
The mainstream advice is sound as far as it goes: review everything the agent writes, run the tests, keep a human in the loop. The trouble is that “review everything” treats a CSS tweak and a DELETE FROM subscriptions with the same budget of attention — and your attention is the scarce resource. Uniform scrutiny means you under-review the dangerous code and over-review the harmless code. You burn out and miss the bug.
The fix is to stop thinking of your repo as one thing the agent operates on. Split it by blast radius:
- Low-stakes zones — UI, internal tooling, copy, docs, the marketing site. A mistake here costs a complaint and a follow-up commit. Let the agent run loose. This is exactly where its breadth and speed pay off.
- High-stakes zones — billing, auth, schema migrations, anything that touches money, identity, or data you can’t reconstruct. A mistake here is silent and expensive. The agent ships one small slice at a time, behind a flag, under explicit guardrails.
The question this raises — and I’ll come back to it — is how the agent is supposed to know which zone it’s standing in. Because if you’re the only one tracking that boundary, you’ve just become the bottleneck you were trying to automate away.
Encode the boundary so the agent enforces it, not you
Start with permissions. This is the layer that turns “never apply a migration to production unsupervised” into a mechanical fact rather than a hope. Block the commands that change a live system outright, and require explicit approval before the agent edits anything under your money-or-identity paths:
```json
{
  "permissions": {
    "deny": [
      "Bash(npx prisma migrate deploy:*)",
      "Bash(stripe:*)"
    ],
    "ask": [
      "Bash(npx prisma migrate dev:*)",
      "Edit(src/billing/**)",
      "Edit(src/auth/**)"
    ]
  }
}
```

Claude Code evaluates these in order: deny, then ask, then allow. It matches `Bash` rules against the start of the command, not against arbitrary substrings buried inside it. It is smarter than a naive prefix check in one important way: it splits on shell operators (`&&`, `||`, `;`, `|`) and matches each subcommand independently, so `Bash(stripe:*)` still fires on `echo hi && stripe charges create`. But the anchoring cuts the other way too. You can’t reliably block a phrase like `DROP TABLE` or `--live`, because a flag buried mid-command isn’t a prefix, and the agent can reach the same outcome a dozen ways the pattern never sees. The deny list is a backstop. The durable guarantee is that the agent’s shell only ever holds scratch-database and test-mode credentials in the first place; string-matching is the second lock, not the first.
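To make the matching behavior above concrete, here is a simplified TypeScript model of it: split a command on `&&`, `||`, `;`, and `|`, treat a trailing `:*` as a prefix match, and check deny, then ask, then allow. It is a sketch for reasoning about your own rules, not Claude Code’s implementation. The naive split ignores quoting and subshells, only `Bash(...)` rules are modeled, and the names `decide`, `PermissionRules`, and the `"default"` outcome (no rule matched) are made up for illustration.

```typescript
// Simplified model of prefix-anchored Bash permission rules. Hypothetical:
// not Claude Code's real parser (no quoting, subshells, or Edit rules).
type Decision = "deny" | "ask" | "allow" | "default";

interface PermissionRules {
  deny: string[];
  ask: string[];
  allow: string[];
}

// "Bash(stripe:*)" matches "stripe" or anything starting with "stripe ";
// "Bash(npm test)" matches only that exact command. Non-Bash rules return null.
function bashMatcher(rule: string): ((cmd: string) => boolean) | null {
  const m = /^Bash\((.*)\)$/.exec(rule);
  if (m === null) return null;
  const body = m[1];
  if (body.endsWith(":*")) {
    const prefix = body.slice(0, -2);
    return (cmd) => cmd === prefix || cmd.startsWith(prefix + " ");
  }
  return (cmd) => cmd === body;
}

// Naive split on shell operators; "||" is listed before "|" so it wins.
function subcommands(command: string): string[] {
  return command
    .split(/&&|\|\||;|\|/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function matchesAny(rules: string[], part: string): boolean {
  return rules.some((rule) => {
    const match = bashMatcher(rule);
    return match !== null && match(part);
  });
}

function decide(command: string, rules: PermissionRules): Decision {
  const parts = subcommands(command);
  // Deny and ask fire if ANY subcommand matches.
  if (parts.some((p) => matchesAny(rules.deny, p))) return "deny";
  if (parts.some((p) => matchesAny(rules.ask, p))) return "ask";
  // Allow needs EVERY subcommand covered, so an allowed prefix can't smuggle in a second command.
  if (parts.length > 0 && parts.every((p) => matchesAny(rules.allow, p))) return "allow";
  return "default"; // no rule matched: fall back to the normal prompt
}

const rules: PermissionRules = {
  deny: ["Bash(npx prisma migrate deploy:*)", "Bash(stripe:*)"],
  ask: ["Bash(npx prisma migrate dev:*)", "Edit(src/billing/**)"],
  allow: ["Bash(npm test:*)", "Bash(echo:*)"],
};

console.log(decide("echo hi && stripe charges create", rules)); // deny
console.log(decide("npx prisma migrate dev --name add_key", rules)); // ask
console.log(decide("npm test && curl https://example.com", rules)); // default
console.log(decide("sh -c 'stripe charges create'", rules)); // default: the backstop misses it
```

The last case is the one that matters: `sh -c 'stripe charges create'` starts with `sh`, so no prefix rule sees the `stripe` inside it. That gap is why the credentials, not the patterns, are the first lock.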
The `ask` rules are the leash. The agent can propose an edit to `src/billing/`, but it stops and waits for you before applying it. In the low-stakes directories, you grant blanket write access with `allow` rules (for example, `Edit(src/ui/**)`) and let it churn. Same agent, different leash length, set by the path it’s editing rather than by your vigilance in the moment.
In the dangerous zone, make the agent show its work before it does any
Permissions gate the action. Plan mode gates the intent — and that’s where the real safety is, because the catastrophic billing bugs are almost never typos. They’re design errors: a webhook that isn’t idempotent, a retry that doesn’t check whether the charge already succeeded, a migration that drops a column the old code still reads.
When the agent is in a high-stakes zone, force it to produce a plan before it writes a line:
```text
> Add a one-time charge when a user upgrades to Pro.

[plan mode]
1. Add `idempotency_key` column to charges (additive migration, nullable).
2. Generate a fresh idempotency key for each charge request; send it as Stripe's `Idempotency-Key` header.
3. Webhook handler: dedupe on Stripe's event ID (unique constraint), skip events already processed.
4. Backfill is unnecessary — column is nullable, old rows unaffected.
5. Ship behind flag `billing.one_time_charge`, default off.
```

Now you’re reviewing the reasoning at the moment it’s cheap to change, not the implementation after the agent has already committed to a flawed shape. You catch it in the plan, not in production: step 2 mints a fresh key on every request, so a user who retries after a dropped connection sends a key Stripe has never seen, and Stripe charges them again. The key has to be minted once per upgrade and reused on every retry of that same upgrade. That’s not a stylistic nit. Stripe stores the result of the first request made with a given idempotency key and replays it for any repeat, and it keeps the key for at least 24 hours before it may prune it. Mint a new key and you’ve stepped outside the only protection the API offers you. A bad plan costs a sentence to fix; a bad merge costs a weekend.
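A minimal TypeScript sketch of the fix for step 2: mint the key once per upgrade and hand the same key to every retry. Everything here is hypothetical. `UpgradeKeys` stands in for wherever you persist the key (in the plan, the `idempotency_key` column), `send` stands in for the Stripe call that carries the `Idempotency-Key` header, and `fakeProvider` only imitates the replay behavior described above; it is not Stripe’s SDK.

```typescript
type ChargeResult = { chargeId: string };

// Hypothetical key store. In production this lives in the database so the key
// survives a crash or redeploy between attempts; a Map is only for the sketch.
class UpgradeKeys {
  private readonly keys = new Map<string, string>();
  private readonly mint: () => string;

  constructor(mint: () => string) {
    this.mint = mint;
  }

  // Mint once per upgrade; every later call for the same upgrade reuses it.
  keyFor(upgradeId: string): string {
    const existing = this.keys.get(upgradeId);
    if (existing !== undefined) return existing;
    const key = this.mint();
    this.keys.set(upgradeId, key);
    return key;
  }
}

// Retry loop: the key is looked up on each attempt, but it is the same key every time.
function chargeWithRetries(
  upgradeId: string,
  keys: UpgradeKeys,
  send: (idempotencyKey: string) => ChargeResult,
  maxAttempts = 3,
): ChargeResult {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return send(keys.keyFor(upgradeId));
    } catch (err) {
      lastError = err; // e.g. the connection dropped after the charge went through
    }
  }
  throw lastError;
}

// Stand-in provider: the first request with a key creates a charge; repeats with
// that key get the saved result back. Optionally "drops" the first response.
function fakeProvider(dropFirstResponse: boolean) {
  const saved = new Map<string, ChargeResult>();
  let created = 0;
  let dropNext = dropFirstResponse;
  const send = (key: string): ChargeResult => {
    let result = saved.get(key);
    if (result === undefined) {
      created += 1;
      result = { chargeId: `ch_${created}` };
      saved.set(key, result);
    }
    if (dropNext) {
      dropNext = false;
      throw new Error("connection dropped after the charge was created");
    }
    return result;
  };
  return { send, chargesCreated: () => created };
}

let counter = 0;
const keys = new UpgradeKeys(() => `key_${++counter}`);
const provider = fakeProvider(true);
const result = chargeWithRetries("upgrade_42", keys, provider.send);
console.log(result.chargeId, provider.chargesCreated()); // ch_1 1
```

Swap `keys.keyFor(upgradeId)` for a freshly minted key on each attempt and the same run creates two charges, which is exactly the double-billing bug the plan review catches.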
Step 3 hides a subtler version of the same trap. Stripe may deliver the same event more than once, and in live mode it retries a failed webhook with exponential backoff for up to 72 hours, so duplicates arrive by design, not as a rare glitch. Deduping on the event ID is correct, but the dedupe record has to outlive the retry window. Garbage-collect that table after a day and a webhook that finally lands at hour 50 looks brand new again: you process it twice, and the customer’s account moves twice. The idempotency window is 24 hours; the retry window is 72. The bug lives in that 48-hour gap, and you only see it if you read the plan before the code exists.
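A minimal sketch of why the dedupe record’s retention has to exceed the retry window. The assumptions are stated up front: `ProcessedEvents` is a hypothetical in-memory stand-in for a table with a unique constraint on the event ID, time is passed in explicitly so the late retry can be simulated, and the 72-hour constant is the retry window from the paragraph above.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const RETRY_WINDOW_MS = 72 * HOUR_MS; // webhook retry window from the text

// Hypothetical stand-in for a processed_events table keyed by event ID.
class ProcessedEvents {
  private readonly seen = new Map<string, number>(); // event ID -> processed at (ms)
  private readonly retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  // Applies the event once; returns false for a duplicate delivery. The record
  // is written only after apply() succeeds, so a failed attempt can be retried.
  // In a database, do both inside one transaction.
  handleOnce(eventId: string, nowMs: number, apply: () => void): boolean {
    if (this.seen.has(eventId)) return false;
    apply();
    this.seen.set(eventId, nowMs);
    return true;
  }

  // Cleanup job: forget events older than the retention period.
  gc(nowMs: number): void {
    this.seen.forEach((processedAt, eventId) => {
      if (nowMs - processedAt > this.retentionMs) this.seen.delete(eventId);
    });
  }
}

// Deliver evt_1 at hour 0, run cleanup at hour 25, then the retry lands at hour 50.
function timesApplied(retentionMs: number): number {
  const events = new ProcessedEvents(retentionMs);
  let applied = 0;
  events.handleOnce("evt_1", 0, () => { applied += 1; });
  events.gc(25 * HOUR_MS);
  events.handleOnce("evt_1", 50 * HOUR_MS, () => { applied += 1; });
  return applied;
}

console.log(timesApplied(24 * HOUR_MS)); // 2: the late retry looks brand new
console.log(timesApplied(RETRY_WINDOW_MS + 24 * HOUR_MS)); // 1
```

The retention period, not the dedupe logic, is the bug: identical code is correct or double-applies depending only on how long the record lives.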
A different shape of expensive: the data you can’t get back
Billing is the loud failure — someone notices when they’re charged twice. The quiet one is worse, and it’s where the second high-stakes surface lives: schema changes that touch data you cannot reconstruct.
Ask an agent to “rename the users.name column to full_name” and the obvious, compiles-clean answer is a single migration that renames the column and updates the model. It works in dev. It passes the tests. Then it deploys, and for the ninety seconds between the migration landing and the new code rolling out, every running instance of the old code queries a column that no longer exists, and every request that reads users fails with a 500. It gets worse if the migration implements the rename as a drop-and-add, which is what Prisma Migrate generates by default when you rename a field in the schema: the old column’s data is deleted with it, a rollback can’t bring it back, and you’re restoring from a backup on a Saturday.
The plan-mode version of this is the expand/contract pattern, and the agent will produce it correctly if you make it plan first:
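```text
> Rename users.name to full_name.

[plan mode]
1. EXPAND: add full_name column (nullable). Old code untouched.
2. Dual-write: app writes both columns. Ship, then backfill full_name from name for existing rows. Bake for a release.
3. Switch reads to full_name, still writing both columns. Ship.
4. CONTRACT: stop writing name. Ship.
5. Only after a clean release: drop name in a separate, later migration.
```

Five shipped changes instead of one, each reversible until the final drop, and never a moment where live code reads a column that isn’t there. The order is load-bearing. The backfill runs only once dual-writes are live, so no row written in between is left with an empty full_name, and reads switch a release before writes stop, so an old instance still running mid-deploy never reads a name the new code has stopped filling in. That is slower, and that is the entire point. The single-migration version felt like one productive prompt and was a latent outage. The boundary isn’t “billing is special.” It’s “anything where being wrong is silent and unrecoverable gets the short leash,” and unreconstructable data qualifies just as much as money does.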
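One way to check a rename plan before anything ships is to model each release and test the property described above. This TypeScript sketch is hypothetical and deliberately small: a `Release` records the schema after its migration plus the columns the app writes and reads, and during each rolling deploy the outgoing and incoming releases run side by side against the new schema, so every column either one reads must exist and be written by both. It does not model the backfill, which has to finish before reads switch.

```typescript
type Column = "name" | "full_name";

// One deployable state: the schema after its migration, plus what the app code does.
interface Release {
  label: string;
  schema: Column[];
  writes: Column[];
  reads: Column;
}

// During the rollout from prev to next, both code versions run against
// next.schema, and rows written by either version are read by both.
function transitionProblems(prev: Release, next: Release): string[] {
  const problems: string[] = [];
  for (const version of [prev, next]) {
    if (!next.schema.includes(version.reads)) {
      problems.push(`${version.label} reads missing column ${version.reads}`);
    }
    for (const col of version.writes) {
      if (!next.schema.includes(col)) {
        problems.push(`${version.label} writes missing column ${col}`);
      }
    }
    for (const writer of [prev, next]) {
      if (!writer.writes.includes(version.reads)) {
        problems.push(`${version.label} reads ${version.reads}, which ${writer.label} never writes`);
      }
    }
  }
  return problems;
}

function checkPlan(plan: Release[]): string[] {
  const problems: string[] = [];
  for (let i = 1; i < plan.length; i++) {
    problems.push(...transitionProblems(plan[i - 1], plan[i]));
  }
  return problems;
}

const both: Column[] = ["name", "full_name"];
const old: Release = { label: "old", schema: ["name"], writes: ["name"], reads: "name" };

const oneShotRename: Release[] = [
  old,
  { label: "renamed", schema: ["full_name"], writes: ["full_name"], reads: "full_name" },
];

const expandContract: Release[] = [
  old,
  { label: "expand", schema: both, writes: ["name"], reads: "name" },
  { label: "dual-write", schema: both, writes: both, reads: "name" },
  { label: "switch reads", schema: both, writes: both, reads: "full_name" },
  { label: "stop writing name", schema: both, writes: ["full_name"], reads: "full_name" },
  { label: "drop name", schema: ["full_name"], writes: ["full_name"], reads: "full_name" },
];

console.log(checkPlan(oneShotRename)); // several problems, starting with "old reads missing column name"
console.log(checkPlan(expandContract)); // []
```

Collapsing “switch reads” and “stop writing name” into a single release makes the check fail: the outgoing release still reads `name` while the incoming one has stopped writing it.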
Write the zone boundary down once, so every session inherits it
This is the payoff to the open question: the agent knows which zone it’s in because you told it — once — in your rules file. Permissions are the hard gate; rules are the soft context that makes the agent behave correctly before it ever hits the gate.
```markdown
## Risk zones

- `src/ui/`, `src/components/`, `docs/` — low stakes. Move fast, one PR per change, no plan needed.
- `src/billing/`, `src/auth/`, `prisma/migrations/` — HIGH stakes.
  - Never edit more than one file per change.
  - All DB changes must be additive and reversible. No dropping columns in the same change that stops reading them.
  - Everything ships behind a feature flag, default off.
  - Propose a plan before writing. Do not run migrations against any database but `dev_scratch`.
```

The flag discipline is doing quiet, heavy lifting here. Shipping the risky slice behind a flag that defaults off means the agent’s code can be wrong and merged and deployed and still harm no one, because nothing is calling it yet. You flip it on for yourself, then a test account, then 1% of traffic. The blast radius is something you turn up by hand, not something the agent decides by shipping.
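A minimal TypeScript sketch of that ramp, assuming a hypothetical in-process flag table rather than any particular flag service: the flag is off unless enabled, an allowlist covers you and the test account, and everyone else is bucketed by a stable hash of the user ID so the same user gets the same answer on every request. FNV-1a is used only because it is short; any stable hash would do.

```typescript
interface FlagConfig {
  enabled: boolean;       // master switch; false means nobody gets the code path
  allowUsers: string[];   // you first, then a test account
  rolloutPercent: number; // 0-100 share of all other users
}

// Hypothetical flag table. New risky slices start fully off.
const flags: Record<string, FlagConfig> = {
  "billing.one_time_charge": { enabled: false, allowUsers: [], rolloutPercent: 0 },
};

// 32-bit FNV-1a over the flag name and user ID, mapped to a bucket 0-99.
// Hashing both means each flag ramps a different slice of users.
function bucket(flag: string, userId: string): number {
  const input = `${flag}:${userId}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 100;
}

function isOn(flag: string, userId: string): boolean {
  const cfg = flags[flag];
  if (cfg === undefined || !cfg.enabled) return false; // unknown or disabled: off
  if (cfg.allowUsers.includes(userId)) return true;
  return bucket(flag, userId) < cfg.rolloutPercent;
}

// Ramp by hand: yourself, then a test account, then 1% of users.
const flag = flags["billing.one_time_charge"];
console.log(isOn("billing.one_time_charge", "me")); // false: default off
flag.enabled = true;
flag.allowUsers = ["me", "test_account"];
console.log(isOn("billing.one_time_charge", "me")); // true
flag.rolloutPercent = 1;
```

Bucketing by user rather than by request keeps a customer from seeing the new charge flow on one page load and the old one on the next.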
Commit all of this so it travels with the repo: the permissions block in the project’s `.claude/settings.json` and the risk zones in your rules file. The next engineer who runs the agent against `src/billing/` inherits the same leash without knowing it’s there. That’s the whole point: the boundary is a property of the codebase, not of whoever happens to be driving.
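If you also want the boundary checked outside the agent, say in CI, the rules file can be mirrored as data. This is a hypothetical TypeScript sketch, not a feature of any tool: the prefix lists copy the risk zones from the rules file, `checkChange` takes the paths one change touches, and it reports two things: a high-stakes change that touches more than one file, and any path that belongs to neither zone, so someone decides which list it joins.

```typescript
type Zone = "low" | "high" | "unclassified";

// Mirrors the risk zones in the rules file (hypothetical; keep both in sync).
const HIGH_STAKES = ["src/billing/", "src/auth/", "prisma/migrations/"];
const LOW_STAKES = ["src/ui/", "src/components/", "docs/"];

function zoneOf(path: string): Zone {
  if (HIGH_STAKES.some((prefix) => path.startsWith(prefix))) return "high";
  if (LOW_STAKES.some((prefix) => path.startsWith(prefix))) return "low";
  return "unclassified";
}

// Checks one change (the list of paths it touches) against the zone rules.
function checkChange(paths: string[]): string[] {
  const problems: string[] = [];
  const high = paths.filter((p) => zoneOf(p) === "high");
  if (high.length > 1) {
    problems.push(`high-stakes change touches ${high.length} files: ${high.join(", ")}`);
  }
  for (const p of paths) {
    if (zoneOf(p) === "unclassified") problems.push(`unclassified path: ${p}`);
  }
  return problems;
}

console.log(checkChange(["src/ui/Button.tsx", "docs/billing.md"])); // []
console.log(checkChange(["src/billing/charge.ts", "src/billing/webhook.ts"])); // one problem: two high-stakes files
```

Reporting unclassified paths instead of silently treating them as low stakes matches the asymmetry argued below: a new surface gets a deliberate decision rather than a default.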
When the leash is just friction
There’s a real version of this you can overdo. If you’re three days into a prototype with no users, no money moving, and a database you’d happily drop and reseed, then drawing risk zones is ceremony — you’ve added a plan-mode tax to code whose blast radius is “I redo an afternoon.” The whole framework is a response to irreversibility, and a throwaway has none. Let the agent run loose everywhere; that’s correct.
The zones earn their keep the moment a mistake stops being recoverable by you alone — the first real payment, the first customer record you can’t regenerate, the first migration against a database someone else depends on. The signal isn’t “this code is important.” It’s “if this is wrong, can I quietly fix it before anyone is harmed?” When the answer turns to no, that path joins the high-stakes list, and the leash goes on. Drawing the line too early costs you some speed on code that didn’t need protecting. Drawing it too late costs a customer. Given that asymmetry, err toward marking a surface high-stakes one feature before it carries real weight — moving a path off the list later is a one-line diff.
The leash is the feature, not the friction
The instinct that “build the whole payment system” is your most productive prompt is exactly backwards. One-shotting the high-stakes surface feels like leverage and is the most efficient way to ship a silent, expensive bug. Real leverage is letting the agent sprint across the 80% of your codebase where being wrong is cheap, and walking it on a short leash across the 20% where being wrong costs real money.
Don’t review your agent harder. Review it unevenly. Spend your scrutiny where the mistake double-charges a customer — and hand the rest to the machine.
For the per-tool mechanics, see Permissions for the hard gates, Plan mode for reviewing intent before action, and Rules for encoding the zone boundary so every session inherits it.