#hooks

DISCOVERIES • May 31, 2026

Headless mode isn't a flag you set. It's a privilege you earn.

Taking the human out of the loop is the real unlock — but headless plus loose permissions plus production access is how you wake up to a wiped repo. Automate only what's tested, sandboxed, and scoped.

DISCOVERIES • May 31, 2026

You can't prompt your way out of a hallucinated output. You can validate your way out in three days.

A deterministic check on the agent's output, with one retry, kills a class of wrong answers that no prompt rewrite ever reaches — and gives you a number instead of a vibe.

DISCOVERIES • May 31, 2026

The friction that makes humans hate pre-commit hooks is what makes them perfect for agents.

A git pre-commit hook turns 'the agent should run the tests' into 'the agent cannot commit broken code' — a deterministic gate that self-corrects from its own error output and ships with every clone.

DISCOVERIES • May 31, 2026

An agent will gate three routes and forget the fourth — unless the gate is a function it has to call.

Scatter feature-gating logic inline and consistency depends on luck. Collapse the policy into one named helper and make 'call this gate' a rule, and uniform enforcement becomes the default across every endpoint the agent writes.

DISCOVERIES • May 31, 2026

A legitimate MCP server with two trust levels is the exploit

The dangerous MCP server isn't the malicious one — it's the convenient all-in-one that reads untrusted data and holds privileged access to a second system. Scope it, gate the writes, and you close the confused-deputy attack.

DISCOVERIES • May 31, 2026

The best prompt you'll ever write is the one you delete.

An agent that waits for you to remember to ask it is a toy. Wire a deterministic event to a headless run that reads your rules, and the routine context work fires on its own — with your conventions already baked in.

DISCOVERIES • May 31, 2026

Run your whole eval suite on every commit and you'll quietly stop running it.

The fix for slow, flaky agent evals isn't a faster suite — it's two loops. A cheap deterministic gate on every PR, and a slow judge suite headless on a schedule. One blocks regressions; the other discovers them.

DISCOVERIES • May 31, 2026

You already have labeled data. It's the transcripts you throw away.

Every failed agent session is a labeled example of where your context falls short. Capture them, cluster them, and fix the top category once in your rules file instead of re-correcting it forever.

DISCOVERIES • May 31, 2026

An Agent That Can't Forget Is Worse Than One With No Memory

A rules file is write-once and rots; auto-memory saves corrections but on its own judgment, in its own store. Add the human-gated curation layer the vendors leave open: a command that counts corrections and proposes durable edits, a hook that resolves conflicts, so corrections compound instead of contradicting.