#context-engineering

DISCOVERIES • Jun 20, 2026

Everyone's Designing Loops. Almost No One Asks What the Agent Knows at Each Tick.

Loop engineering's six building blocks are all context primitives wearing new names. The scheduling is cron. The part that decides whether the loop works is the context delivered on every iteration.

DISCOVERIES • May 31, 2026

Your agent re-litigates the same decision every week because you delete the answer.

Stored session transcripts are an external memory most people throw away after 30 days. A skill that forks a cheap model to grep, expand, and re-rank them turns 'what did we decide about X' into ranked snippets — so the agent stops re-making old mistakes.

DISCOVERIES • May 31, 2026

One Agent Drowns in a 1000-Line Spec. A Hundred Disposable Ones Finish It by Morning.

Decompose a reviewed spec into discrete tasks and drive it with a loop that spawns a fresh headless agent per task. Each unit of work gets a clean context window and a pass/fail gate instead of one agent compacting itself into confusion.

DISCOVERIES • May 31, 2026

A green build is the weakest definition of done — and the one your agent reaches for first.

Write the exact verification step into the feature spec your agent loads, so 'done' means the test passed and the result is on the table — not that the code compiled.

DISCOVERIES • May 31, 2026

The fastest way to corrupt money is to let an agent build the whole payment system.

Not all code is equally expensive to get wrong. Stop reviewing your agent's output uniformly and start spending your scrutiny where a mistake double-charges a customer.

DISCOVERIES • May 31, 2026

When you know where the context lives, stop letting the agent guess.

Agent-driven file discovery burns turns and fills the window with wrong guesses. A parameterized priming command loads exactly the right files in one deterministic shot — reserve probabilistic search for when you genuinely don't know where to look.

DISCOVERIES • May 31, 2026

Headless mode isn't a flag you set. It's a privilege you earn.

Taking the human out of the loop is the real unlock — but headless plus loose permissions plus production access is how you wake up to a wiped repo. Automate only what's tested, sandboxed, and scoped.

DISCOVERIES • May 31, 2026

Put the confirmation gate in the tool, not the UI.

A destructive-tool annotation plus a mid-call elicitation request gives you a deterministic human checkpoint that travels with the capability — so a confused or injected instruction can't quietly delete your data, no matter which client is driving.

DISCOVERIES • May 31, 2026

Your 600-line rules file is teaching the agent to ignore you.

A single giant root rules file dilutes the agent's attention. Split persistent context into a broad root plus directory-scoped files the agent loads by what it's touching — and push task-specific instructions into commands.

DISCOVERIES • May 31, 2026

Stop writing rules for failures your agent never commits.

Most AGENTS.md files defend against imagined mistakes. Mine your own transcripts, count what the agent actually gets wrong, and let the failure distribution decide what to write — and in what order.

DISCOVERIES • May 31, 2026

You can't prompt your way out of a hallucinated output. You can validate your way out in three days.

A deterministic check on the agent's output, with one retry, kills a class of wrong answers that no prompt rewrite ever reaches — and gives you a number instead of a vibe.

DISCOVERIES • May 31, 2026

A missing parameter isn't an error — it's the agent telling you what only you know.

When a tool call lacks a parameter, the agent's two instincts — fail or guess — are both wrong. Have the server elicit the missing field against a JSON schema and turn a half-specified request into a clean, validated dialogue.

DISCOVERIES • May 31, 2026

Three control axes are sitting in the MCP spec. Most teams use one.

An MCP server exposes resources, tools, and prompts — context the app pulls, actions the model takes, workflows the user invokes. Wiring them to the right controller is how the agent reads your conventions instead of guessing them.

DISCOVERIES • May 31, 2026

The friction that makes humans hate pre-commit hooks is what makes them perfect for agents.

A git pre-commit hook turns 'the agent should run the tests' into 'the agent cannot commit broken code' — a deterministic gate that self-corrects from its own error output and ships with every clone.

DISCOVERIES • May 31, 2026

An agent will gate three routes and forget the fourth — unless the gate is a function it has to call.

Scatter feature-gating logic inline and consistency depends on luck. Collapse the policy into one named helper and make 'call this gate' a rule, and uniform enforcement becomes the default across every endpoint the agent writes.

DISCOVERIES • May 31, 2026

Stop describing your work to the agent. Hand it the diff.

Asking an agent to review what you say you did grades your spin, not your code. A small MCP server that reads the live diff and carries its own rubric grades reality instead.

DISCOVERIES • May 31, 2026

Half your agent's 'hallucinations' never reached the prompt

Before you write one more anti-hallucination instruction, split each wrong answer into a retrieval failure and a generation failure. The two need opposite fixes, and no prompt rescues context that was never pulled into scope.

DISCOVERIES • May 31, 2026

If you can describe your workflow, you don't have a workflow — you have a habit.

The correction you keep re-typing into the chat is a missing command. Fold the recurring ritual into one parameterized verb backed by a rules file, and the workflow stops varying with your mood.

DISCOVERIES • May 31, 2026

If you can't reproduce the agent's failure, your fix is a guess.

Turn each reported agent failure into a committed synthetic scenario, then replay it headlessly on every change. The bug becomes a permanent guardrail instead of a one-time patch.

DISCOVERIES • May 31, 2026

Your Reasoning Model Should Never Read Grep Output

Subagents are sold as parallelism. The real win is context hygiene plus model arbitrage: delegate read-heavy investigation to cheap explorers so your expensive model only ever sees distilled findings.

DISCOVERIES • May 31, 2026

Bootstrap a service you don't understand and you've shipped a black box you can't debug.

Wiring up Stripe or a database you've never touched by vibe-coding traps you at the first failure. Point the agent at the service's docs over MCP and force it to teach you back — close its codebase gap and your knowledge gap in the same pass.

DISCOVERIES • May 31, 2026

An autonomous agent will run forever or quit one bug short. You need two brakes, not one.

An unattended agent loop has no instinct for 'done.' Give it a dumb iteration ceiling it can't argue past and a completion sentinel it can raise — and it terminates correctly and cheaply.

DISCOVERIES • May 31, 2026

The smartest worker on your team is the one that forgets every job the moment it ships.

Pairing an isolated subagent with the right tools and a thin slice of your conventions rebuilds the proven agent loop — read, pick a tool, act, report — without dragging dozens of tool-call logs back into your main thread.

DISCOVERIES • May 31, 2026

A legitimate MCP server with two trust levels is the exploit

The dangerous MCP server isn't the malicious one — it's the convenient all-in-one that reads untrusted data and holds privileged access to a second system. Scope it, gate the writes, and you close the confused-deputy attack.

DISCOVERIES • May 31, 2026

An Agent Watching Its Own Context Fill Up Will Rush the Ending. Take the Wheel.

When the window fills, agents cut corners and auto-compaction throws away the wrong things. Author a handoff file, eyeball it, and restart in a clean session you control.

DISCOVERIES • May 31, 2026

Stop handing your agent the API key. Hand it a phone that only dials three numbers.

An MCP server is a trust boundary, not a passthrough. Hold every secret server-side and expose a curated verb set, so a prompt-injected agent is bounded by the tools you chose — not your full API surface.

DISCOVERIES • May 31, 2026

The best prompt you'll ever write is the one you delete.

An agent that waits for you to remember to ask it is a toy. Wire a deterministic event to a headless run that reads your rules, and the routine context work fires on its own — with your conventions already baked in.

DISCOVERIES • May 31, 2026

An Agent Gets Worse Long Before Its Context Window Fills Up

Stop fighting the context limit with a bigger window. Write an exhaustive plan to a file, then run a relay of fresh sessions that each tick off items — so a migration too big for any single context still ships coherently.

DISCOVERIES • May 31, 2026

Run your whole eval suite on every commit and you'll quietly stop running it.

The fix for slow, flaky agent evals isn't a faster suite — it's two loops. A cheap deterministic gate on every PR, and a slow judge suite headless on a schedule. One blocks regressions; the other discovers them.

DISCOVERIES • May 31, 2026

Your agent says the delete worked. The only proof is a sentence it wrote itself.

Give every consequential tool a declared output schema and validate the result against it at the boundary, so the UI gates on a real success boolean instead of trusting the agent's prose — and a malformed result fails loud instead of rendering a lie.

DISCOVERIES • May 31, 2026

Your local review and your CI gate are the same check. Stop maintaining two.

Build your review as a headless command with an explicit tool allowlist, and the exact check you run by hand becomes the exact gate that runs on every pull request. One definition of 'acceptable,' two surfaces.

DISCOVERIES • May 31, 2026

The best prompt your team will ever write is the one nobody has to write twice.

Ship your team's recurring workflows as server-side MCP prompts that pre-load their own data as embedded resources. A teammate types a slash command, fills one argument, and gets an expert prompt with the context already attached.

DISCOVERIES • May 31, 2026

You already have labeled data. It's the transcripts you throw away.

Every failed agent session is a labeled example of where your context falls short. Capture them, cluster them, and fix the top category once in your rules file instead of re-correcting it forever.

DISCOVERIES • May 31, 2026

An Agent That Can't Forget Is Worse Than One With No Memory

A rules file is write-once and rots; auto-memory saves corrections but on its own judgment, in its own store. Add the human-gated curation layer the vendors leave open: a command that counts corrections and proposes durable edits, a hook that resolves conflicts, so corrections compound instead of contradicting.

DISCOVERIES • May 29, 2026

A Second Opinion It Didn't Write

An agent reviewing its own plan shares its own blind spots. Wire a different vendor's model in as a deterministic gate, and a fail verdict re-engages planning before a line of code exists.

DISCOVERIES • May 29, 2026

A Skill That Writes Your Skills

Your debugging ability isn't intelligence, it's a procedure — and procedures are exactly what you can write down once and hand to an agent. Bootstrap the whole library with a meta-skill.

DISCOVERIES • May 29, 2026

By the time you're reviewing the diff, the destructive call already ran.

Reviewing irreversible actions after the fact is too late. Split tools by reversibility in permissions, force a plan, and gate the destructive step behind a human interrupt — autonomy on reads, a hard stop on anything you can't undo.

DISCOVERIES • May 29, 2026

Build the Glossary the Agent Wrote

Vague prompting isn't a skill gap you fix by writing longer prompts. It's a missing shared vocabulary — and the fix is a committed glossary, not a personal habit.

DISCOVERIES • May 29, 2026

Cache the Explore

The codebase is the strongest steering signal you have — stronger than your prompt or your rules. So pay the exploration cost once, freeze it into a file, and feed it to every run.

DISCOVERIES • May 29, 2026

Commission the Skill, Don't Write It

You don't author a skill from a blank page. You send a subagent to research the problem once, and the research becomes the skill — a build artifact, not a document.

DISCOVERIES • May 29, 2026

Design the Codebase Your Agent Can Read

The highest-leverage thing you can do for your agent isn't a better prompt — it's a better interface. Deep modules are persistent context the agent reads at a glance.

DISCOVERIES • May 29, 2026

Destination and Journey

Hand the agent two committed documents — a requirements doc that fixes the destination and a phased plan that fixes the journey — and a feature too big for one window ships correctly anyway.

DISCOVERIES • May 29, 2026

Full Autonomy Is Just a Small Blast Radius

The blocker to running an agent overnight is never capability. It's trust. And you don't earn trust with a smarter model — you earn it with a smaller blast radius.

DISCOVERIES • May 29, 2026

Flat agent swarms demo beautifully and ship terribly.

Let peer agents talk freely and a query hot-potatoes between them, each disclaiming the task, while your bill climbs. Route everything through one orchestrator with explicit handoff rules and a turn cap, and the chaos becomes a system you can test.

DISCOVERIES • May 29, 2026

Inject the Reminder When the Agent Forgets

Your rules file is necessary but not sufficient. Persistent context degrades as the window fills — so move the rule that keeps getting ignored into a hook that fires at the exact moment it matters.

DISCOVERIES • May 29, 2026

The moment your agent reads a webpage, that webpage can give it orders.

Prompt injection isn't an unsolvable model problem — it's a context-engineering one. Fence untrusted tool output as data in your rules, run a parallel guardrail hook that cancels on a hit, and cap the blast radius with a permissions allowlist.

DISCOVERIES • May 29, 2026

The Value of the Planning Step Isn't the Document. It's the Interrogation.

Your judgment — which tradeoff, which edge cases, which empty states — lives only in your head until something extracts it. A saved procedure that explores the repo, then grills you one branch at a time, turns a vague idea into a codebase-aware spec the agent can build against.

Adding a system to your agent should be a config line, not a codebase.

DISCOVERIES • May 29, 2026