You set the loop running before bed. Triage the open issues, draft fixes, open PRs, stop when CI is green. By morning it has opened nine. Three are good. The other six are confidently wrong in ways you’d have caught in ten seconds from the chair — except the whole point was that you weren’t in the chair. The loop didn’t fail because the model is weak. It failed because at 2am, on iteration forty, it didn’t know something you know.
That’s the part the pitch skips. The advice — stop prompting your agents, build the system that prompts them — is right, and it ships with a tidy parts list: automations, worktrees, skills, connectors, sub-agents, a memory file. Every version of it converges on roughly that list, then closes with the same noble line about staying the engineer. The list is correct. But it’s organized around the wrong question. It tells you what to assemble. It never asks the one that decides whether the assembled thing works: on each tick, what does the agent actually know?
Because that’s the part the parts list can’t see. A loop is two halves bolted together. One half is the timer — the thing that fires every fifteen minutes, on a PR comment, on a CI failure. That half is genuinely cron, and cron has shipped with Unix since the 1970s. The other half is a model that wakes up with no memory of the last 4,000 ticks, looks at the world, and decides what to do next. The timer is solved. The decision is not. And the quality of that decision is set entirely by what context arrives in the window before the model is asked to think.
So loop engineering isn’t a new discipline that replaces prompting. It’s the most demanding form of context engineering there is — the discipline this whole site is about: closing the gap between an agent that’s capable but knows nothing about your project, and you, who know everything but can’t be in the chair at 3am. When you were in the chair, you closed that gap live. You saw the agent reach for the wrong file and you redirected it. Remove yourself from the loop and every gap you used to patch in real time has to be pre-loaded into the tick. That’s the whole job. The six building blocks are just the six places context gets loaded.
The six blocks are six context moves
Walk the canonical list again, but ask of each one: what context does this deliver, and to whom? The list stops being a parts bin and starts being a map.
Skills are persistent context, written once. A skill is the project knowledge the agent would otherwise guess at — the conventions, the build steps, the “we don’t do it this way because of that one incident.” In an interactive session you’d just tell the agent that when it went wrong. In a loop there’s no one to tell it, so the telling has to live on disk where every tick re-reads it cold. Same with a rules file. Without these, the loop re-derives your entire project from zero every single cycle and fills every hole with a confident wrong guess. With them, intent compounds instead of evaporating.
The state file is context across runs. The agent forgets everything between ticks; the repo does not. A markdown file, a Linear board, the git log — anything outside the single conversation that records what got tried, what passed, what’s still open. This is the least glamorous block and the most load-bearing, because it’s the only thing that turns 200 independent ticks into one coherent project instead of 200 fresh starts. (We’ve built a whole loop around exactly this — the git log as the agent’s only memory.)
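A minimal sketch of that idea, assuming a plain-text ./STATE.md like the one used later in this article. The helper names `record_tick` and `recent_state` and the one-line-per-tick format are illustrative assumptions, not an established convention.

```shell
# Hypothetical helpers: the names and the line format are assumptions.
STATE_FILE="${STATE_FILE:-./STATE.md}"

# Append one line per tick: tick number, outcome, short note.
record_tick() {
  # $1 = tick number, $2 = outcome (passed|failed), $3 = free-text note
  printf 'tick %s [%s] %s\n' "$1" "$2" "$3" >> "$STATE_FILE"
}

# Print the last N entries (default 30) for inclusion in the next prompt.
recent_state() {
  [ -f "$STATE_FILE" ] || return 0   # first tick: no history yet
  tail -n "${1:-30}" "$STATE_FILE"
}
```

Recording passes as well as failures is a design choice here: the next tick then sees what was tried and what held, not only what broke.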
Sub-agents are context isolation. This is the block people most often mistake for an orchestration trick when it’s really a context trick. The reason you split the agent that writes from the agent that checks isn’t org-chart theater — it’s that a verifier is only honest if it has a context the maker can’t see. Self-critique fails for a precise, mechanical reason: the same window that holds the flawed reasoning also holds the grade, and the model is far too generous about its own work. Spawn a sub-agent with a clean window and a different instruction and you’ve created a context boundary the first agent can’t reason its way across. That boundary is the verifier’s entire value.
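One way to picture that boundary is a sketch of the prompt a fresh verifier might receive. The function name `verifier_prompt`, the PASS/FAIL reply format, and the two input files are assumptions made for illustration; the point is only what the prompt leaves out.

```shell
# Hypothetical sketch: build the prompt for a fresh verifier agent.
# It receives only the acceptance criteria and the diff, never the
# maker's transcript, so it cannot inherit the maker's reasoning.
verifier_prompt() {
  # $1 = file with acceptance criteria, $2 = file containing the diff
  printf '%s\n' "You are reviewing a change you did not write."
  printf '%s\n' "Reply PASS or FAIL on the first line, then your reasons."
  printf '\n## Acceptance criteria\n'
  cat "$1"
  printf '\n## Diff under review\n'
  cat "$2"
}
```

In a loop, the output would be handed to a separate agent invocation with its own empty window. Nothing the maker said while producing the diff appears anywhere in it.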
Connectors are external context reach. A loop that can only see the filesystem is a tiny loop. MCP servers let it read the issue tracker, query the database, hit staging, post to Slack — which is the difference between an agent that says “here’s the fix” and a loop that opens the PR, links the ticket, and pings the channel once CI is green. But notice it’s still context: the connector’s job is to pull the state of your real systems into the window so the next decision is made against reality instead of against a stale guess.
Worktrees keep parallel contexts from colliding. When two agents edit the same file they corrupt each other the way two engineers committing to the same lines without talking do. A git worktree gives each its own isolated checkout. It’s context isolation again, one floor down — at the filesystem instead of the window.
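As a rough sketch, giving each agent its own checkout can be a single `git worktree add` call. The sibling-directory layout (`../wt-NAME`), the `agent/NAME` branch naming, and the function name are assumptions chosen for the example.

```shell
# Hypothetical sketch: one worktree and one branch per parallel agent.
# The directory layout and branch names are assumptions.
make_agent_worktree() {
  # $1 = path to the main repository, $2 = agent name
  repo="$1"; name="$2"
  # -C makes the relative path resolve from inside the repository,
  # so the worktree lands next to it as ../wt-NAME.
  git -C "$repo" worktree add -b "agent/$name" "../wt-$name"
}
```

Each agent then runs with its working directory set to its own worktree. Edits only meet again at merge time, where conflicts are visible instead of silent.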
Five of the six “building blocks” are about what’s in the window and what’s walled off from it. The sixth, the automation, is the cron half. That ratio is the tell.
Every failure mode is a context failure
Watch what the loop discourse warns you about, and it’s the same point from the other side.
The Ralph loop that exits on a half-done job — that’s a loop with no context of what “done” means, so it accepts the agent’s own say-so. The fix is a gate: a test, a type check, a build. Not a second agent with an opinion — an objective signal that lives outside any agent’s context and can fail the work without a human in the room.
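A gate of that kind can be very small. The sketch below assumes nothing about the project: the caller passes in whatever checks apply, and the function name `run_gates` is hypothetical.

```shell
# Hypothetical gate runner: every check must exit 0 or the tick fails.
# The checks themselves are passed in by the caller; none are assumed.
run_gates() {
  for check in "$@"; do
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "gate failed: $check"
      return 1
    fi
  done
  echo "all gates passed"
}
```

For a Node project the call might look like `run_gates "npm test" "npx tsc --noEmit" "npm run build"`, though those three commands are an assumption about the project, not a requirement. What matters is that the verdict comes from exit codes, not from an agent describing its own work.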
Goal drift at turn 47 — the constraint you set at the start (“don’t touch the billing module”) vanishes because each summarization step is lossy and nobody re-injected it. That’s a context-maintenance failure, fixed by a standing file the loop re-reads every pass rather than trusting the window to retain.
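A sketch of that standing file, assuming the constraints live in a hypothetical ./CONSTRAINTS.md and that a helper named `tick_prompt` assembles each prompt. Both names are illustrative.

```shell
# Hypothetical sketch: constraints live in a file and are prepended
# verbatim to every tick's prompt, so they never pass through a summary.
CONSTRAINTS_FILE="${CONSTRAINTS_FILE:-./CONSTRAINTS.md}"

tick_prompt() {
  # $1 = the task text for this tick
  printf '## Standing constraints (re-read every tick)\n'
  cat "$CONSTRAINTS_FILE"
  printf '\n## Your job this tick\n%s\n' "$1"
}
```

Because the file is read fresh on every call, tick 47 gets the constraint in exactly the words tick 1 did.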
Comprehension debt — the gap between what the repo now contains and what any human has actually read — is a context failure on your side of the loop. The faster it ships code you didn’t write, the wider that gap, and the bill that eventually hurts isn’t the token bill, it’s the day you have to debug a system no one understands. The mitigation isn’t technical. It’s reading the diffs. Staying in the context you delegated.
And the runaway spend everyone’s scared of is what happens when the only context the loop lacks is a stop condition — no iteration cap, no token ceiling, no no-progress detector. A decision-maker with no sense of when to stop deciding. The bills are real. One documented fleet ran sixteen agents in parallel for two weeks to build a single C compiler — just under twenty thousand dollars in tokens. A large engineering org handed its developers Claude Code in December, watched them burn the entire year’s AI budget by April, then capped every engineer at fifteen hundred dollars per tool per month. Once the model writes the code for almost nothing, the loop running it becomes the expensive part.
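The three missing stop conditions named above fit in one small guard. This is a sketch under stated assumptions: the limits are arbitrary placeholders, `should_continue` is a hypothetical name, and the per-tick token count is supplied by the caller because how it is obtained depends on the agent tooling.

```shell
# Hypothetical budget guard. The caps and the way tokens are counted are
# assumptions; the caller supplies the token count for each tick.
MAX_TICKS=20        # iteration cap
MAX_TOKENS=500000   # token ceiling for the whole run
MAX_STALLED=3       # consecutive ticks allowed without progress

ticks=0; tokens=0; stalled=0

# Call once per tick. $1 = tokens used this tick, $2 = 1 if the tick
# made progress (for example, the gate passed), 0 otherwise.
# Returns 0 to continue, 1 to stop, and prints the reason on stop.
should_continue() {
  ticks=$((ticks + 1))
  tokens=$((tokens + $1))
  if [ "$2" -eq 1 ]; then stalled=0; else stalled=$((stalled + 1)); fi

  if [ "$ticks" -ge "$MAX_TICKS" ]; then echo "stop: iteration cap"; return 1; fi
  if [ "$tokens" -ge "$MAX_TOKENS" ]; then echo "stop: token ceiling"; return 1; fi
  if [ "$stalled" -ge "$MAX_STALLED" ]; then echo "stop: no progress"; return 1; fi
  return 0
}
```

Call it directly rather than inside `$(...)`, since a command substitution runs in a subshell and would discard the updated counters. A loop body would end with something like `should_continue "$used" "$ok" || break`, where `$used` and `$ok` are whatever the surrounding script measured for that tick.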
None of these are scheduling bugs. You cannot fix a single one by tuning the cron expression. They’re all the same failure in different costumes: the agent, at some tick, didn’t know something it needed to know — and in a loop, “didn’t know” is the only failure mode, because the loop is nothing but a machine for delivering knowledge to a forgetful model on a timer.
What one tick actually carries
Strip the romance and a working loop is short. Here’s the whole thing, a bounded loop that works a plan one item at a time, with every line that matters labeled by the context it delivers, not the component it is:
```bash
#!/usr/bin/env bash
MAX=20   # STOP: a hard cap, so a stuck tick can't run forever

for ((i=1; i<=MAX; i++)); do
  # SKILL: how we do things here, written once (loop-skill.md)
  # STATE: context across runs; the agent forgot, the file didn't (STATE.md)
  # These labels sit outside the heredoc so they are not sent as prompt text.
  prompt="$(cat <<EOF
$(cat ./loop-skill.md)
## What past ticks did: your only memory, read it
$(tail -n 30 ./STATE.md)
## Your job this tick
Work the first unchecked item in ./PLAN.md. One thing, then stop.
EOF
)"

  agent -p "$prompt" --allowedTools "Edit,Bash(npm test *)"   # SCOPE: the only tools reachable this tick

  if ./verify.sh; then   # VERIFY: an external gate, not the maker's opinion
    git commit -am "tick $i"
  else
    echo "tick $i $(date): failed" >> ./STATE.md
  fi

  grep -q '^- \[ \]' ./PLAN.md || break   # STOP: an objective done-condition, checked each tick
done   # the for-loop itself: this part really is just cron
```

Five of the six building blocks are right there in the comments, and not one of them is about scheduling. They’re about what reaches the window before the model is asked to think: the skill that says how, the state that says what already happened, the scope that says what’s reachable, the gate that says whether it worked. The for loop, the part skeptics dismiss as “just a cron job,” is the one line that genuinely is. Everything else is context delivery. Sub-agents are the sixth block; you reach for them the moment verify.sh needs to be a fresh model grading the work instead of a script, because a checker that shares the maker’s window shares its blind spots.
The leverage moved up a floor. The work didn’t change kind.
This is the honest version of the story, and it’s less triumphant than the six-word slogan. Loop engineering doesn’t retire context engineering. It removes the one actor who was quietly doing context engineering by hand the whole time: you, in the chair, patching gaps live. Take yourself out and the patching doesn’t disappear. It moves upstream, into the skill files and state files and verifier boundaries you build once, in advance, for ticks you’ll never watch.
That’s why two people build the identical loop and get opposite results. One uses it to move faster on work they understand deeply, pre-loading the context they already carry in their head. The other uses it to avoid understanding the work — and the loop faithfully amplifies the void, atomically committed, with green checkmarks on tasks that satisfy the letter of an underspecified goal and none of its intent. Garbage in, garbage in production by morning. The loop doesn’t know the difference. The context you fed it does.
So yes — design the loop. Take yourself out of the chair. But don’t mistake the timer for the achievement. A loop is only as good as the context it carries each tick, and assembling six blocks is the easy part. The hard part is the same hard part it’s always been: handing a capable, contextless machine everything it can’t infer about your project — except now you have to hand it everything up front, because you won’t be there at 3am to fill in what you forgot.
Build the loop. But build it like someone who knows the loop is just context engineering that finally has to stand on its own.