#context-engineering

56 posts tagged #context-engineering.

When you know where the context lives, stop letting the agent guess.

Agent-driven file discovery burns turns and fills the window with wrong guesses. A parameterized priming command loads exactly the right files in one deterministic shot — reserve probabilistic search for when you genuinely don't know where to look.

Headless mode isn't a flag you set. It's a privilege you earn.

Taking the human out of the loop is the real unlock — but headless plus loose permissions plus production access is how you wake up to a wiped repo. Automate only what's tested, sandboxed, and scoped.

Put the confirmation gate in the tool, not the UI.

A destructive-tool annotation plus a mid-call elicitation request gives you a deterministic human checkpoint that travels with the capability — so a confused or injected instruction can't quietly delete your data, no matter which client is driving.

Your 600-line rules file is teaching the agent to ignore you.

A single giant root rules file dilutes the agent's attention. Split persistent context into a broad root plus directory-scoped files the agent loads by what it's touching — and push task-specific instructions into commands.

Stop writing rules for failures your agent never commits.

Most AGENTS.md files defend against imagined mistakes. Mine your own transcripts, count what the agent actually gets wrong, and let the failure distribution decide what to write — and in what order.

Three control axes are sitting in the MCP spec. Most teams use one.

An MCP server exposes resources, tools, and prompts — context the app pulls, actions the model takes, workflows the user invokes. Wiring them to the right controller is how the agent reads your conventions instead of guessing them.

Stop describing your work to the agent. Hand it the diff.

Asking an agent to review what you say you did grades your spin, not your code. A small MCP server that reads the live diff and carries its own rubric grades reality instead.

Half your agent's 'hallucinations' never reached the prompt

Before you write one more anti-hallucination instruction, split each wrong answer into a retrieval failure and a generation failure. The two need opposite fixes, and no prompt rescues context that was never pulled into scope.

Your Reasoning Model Should Never Read Grep Output

Subagents are sold as parallelism. The real win is context hygiene plus model arbitrage: delegate read-heavy investigation to cheap explorers so your expensive model only ever sees distilled findings.

A legitimate MCP server with two trust levels is the exploit

The dangerous MCP server isn't the malicious one — it's the convenient all-in-one that reads untrusted data and holds privileged access to a second system. Scope it, gate the writes, and you close the confused-deputy attack.

The best prompt you'll ever write is the one you delete.

An agent that waits for you to remember to ask it is a toy. Wire a deterministic event to a headless run that reads your rules, and the routine context work fires on its own — with your conventions already baked in.

An Agent Gets Worse Long Before Its Context Window Fills Up

Stop fighting the context limit with a bigger window. Write an exhaustive plan to a file, then run a relay of fresh sessions that each tick off items — so a migration too big for any single context still ships coherently.

An Agent That Can't Forget Is Worse Than One With No Memory

A rules file is write-once and rots; auto-memory saves corrections but on its own judgment, in its own store. Add the human-gated curation layer the vendors leave open: a command that counts corrections and proposes durable edits, a hook that resolves conflicts, so corrections compound instead of contradicting.

A Second Opinion It Didn't Write

An agent reviewing its own plan shares its own blind spots. Wire a different vendor's model in as a deterministic gate, and a fail verdict re-engages planning before a line of code exists.

A Skill That Writes Your Skills

Your debugging ability isn't intelligence, it's a procedure — and procedures are exactly what you can write down once and hand to an agent. Bootstrap the whole library with a meta-skill.

By the time you're reviewing the diff, the destructive call already ran.

Reviewing irreversible actions after the fact is too late. Split tools by reversibility in permissions, force a plan, and gate the destructive step behind a human interrupt — autonomy on reads, a hard stop on anything you can't undo.

Build the Glossary the Agent Wrote

Vague prompting isn't a skill gap you fix by writing longer prompts. It's a missing shared vocabulary — and the fix is a committed glossary, not a personal habit.

Cache the Explore

The codebase is the strongest steering signal you have — stronger than your prompt or your rules. So pay the exploration cost once, freeze it into a file, and feed it to every run.

Commission the Skill, Don't Write It

You don't author a skill from a blank page. You send a subagent to research the problem once, and the research becomes the skill — a build artifact, not a document.

Design the Codebase Your Agent Can Read

The highest-leverage thing you can do for your agent isn't a better prompt — it's a better interface. Deep modules are persistent context the agent reads at a glance.

Destination and Journey

Hand the agent two committed documents — a requirements doc that fixes the destination and a phased plan that fixes the journey — and a feature too big for one window ships correctly anyway.

Full Autonomy Is Just a Small Blast Radius

The blocker to running an agent overnight is never capability. It's trust. And you don't earn trust with a smarter model — you earn it with a smaller blast radius.

Flat agent swarms demo beautifully and ship terribly.

Let peer agents talk freely and a query hot-potatoes between them, each disclaiming the task, while your bill climbs. Route everything through one orchestrator with explicit handoff rules and a turn cap, and the chaos becomes a system you can test.

Inject the Reminder When the Agent Forgets

Your rules file is necessary but not sufficient. Persistent context degrades as the window fills — so move the rule that keeps getting ignored into a hook that fires at the exact moment it matters.

The moment your agent reads a webpage, that webpage can give it orders.

Prompt injection isn't an unsolvable model problem — it's a context-engineering one. Fence untrusted tool output as data in your rules, run a parallel guardrail hook that cancels on a hit, and cap the blast radius with a permissions allowlist.

The Value of the Planning Step Isn't the Document. It's the Interrogation.

Your judgment — which tradeoff, which edge cases, which empty states — lives only in your head until something extracts it. A saved procedure that explores the repo, then grills you one branch at a time, turns a vague idea into a codebase-aware spec the agent can build against.

Adding a system to your agent should be a config line, not a codebase.

Hand-writing an adapter for every external system is the contextless agent's biggest tax. An MCP server collapses the M×N integration blowup into one standard connection the agent discovers at runtime — scoped by permissions, configured once.

Package the Workflow, Not the Prompt

A plugin isn't a convenience wrapper. It's how a workflow survives leaving your laptop — the difference between a personal habit and a team standard.

'It worked when I tried it' is not a test for a non-deterministic system.

Treat prompts and rules like code: a golden dataset of inputs with known-good outputs, run headless on every change, gated by a hook that fails the build below baseline. The eval that blocks the merge is the one that prevents regressions.

The Loop That Re-Reads Its Diary

An autonomous agent isn't a bigger context window. It's a tiny window run many times, where the git log — not the chat history — carries the decisions between passes.

The Relay, Not the Window

A million-token window won't fix a long build. Externalize state to a plan file and run a relay across fresh sessions, because quality dies long before the window fills.

Your Backlog Is the Prompt

Hand-passing one plan per run keeps you inside the loop doing task selection for the agent. Pipe your whole issue tracker in and let it pick the next ticket itself.

Your AI agent can't read your Jira. MCP is how you fix that.

MCP is the standard protocol for plugging external systems — issue trackers, databases, design tools — into AI agents. Every major tool added support for it this year. Here's what changes when you use it.