
Match the dial to the task's value without ever reading a price

The CRUD endpoint shipped on a light gear, the rules-engine design earned a heavy one, and if you scroll back through the afternoon you can watch the dials moving in opposite directions across a single day. That’s not two settings you stumbled into — it’s the actual skill this chapter teaches: treating model and effort as a per-task spend, not a default you set once and forget. This last lesson is about making that a reflex, and about reasoning over cost honestly even though you’ll never see a price in this course.

The cost rule you can apply without a price tag


You don’t need a number to make the right call, because the relationship is fixed regardless of what the number is:

More reasoning effort means more tokens generated and more latency before the answer. Always. A bigger model compounds it.

That’s true at every price point, so the discipline depends on knowing the ratio, not the price. The question is never “what does high effort cost?” in absolute terms. It’s “does this task’s value justify the extra tokens and wait that high effort spends?” On the rules-engine design, where a wrong call means a rewrite and miscategorised money, yes, easily. On a CRUD endpoint with an existing pattern, no: the extra reasoning produces the same four routes, more slowly and for more tokens. Same dial, opposite verdict, and you reached the verdict without a single dollar figure.

The chapter has been turning two independent knobs. Set them together, per task, and the costly mistakes price themselves:

A lumpy day: four kinds of task, and two dials on each — which model answers, and how much effort it spends thinking. Everything starts where most people leave it: pinned to the expensive corner. Re-dial each task and watch what the day costs.

  • copy tweaks (mechanical · five today)

    Fix the onboarding typo, reword two error strings, update the footer year — one obvious answer each.

    model · effort: top dollar, one answer

    The most expensive corner on the board, spent on work the cheap corner ships identically. This is where a pinned dial leaks.

  • pattern-following endpoints (mechanical · three today)

    Add list/create endpoints that mirror the handler in the next file over.

    model · effort: premium for boilerplate

    Nothing about list/create forks or surprises; the expensive reasoning has nothing to grip. Reserve this corner.

  • the intermittent failure (hard · once, thankfully)

    A test that fails one run in five, timing-dependent, with the cause three files from the symptom.

    model · effort: matched, spend it here

    A genuine reasoning problem: timing-dependent, cause far from symptom, no pattern to copy. This corner exists for exactly this task.

  • the design fork (hard · once)

    Choose how rule conflicts resolve in the categorisation engine — several defensible designs, one gets built on.

    model · effort: matched, the wrong call costs a rewrite

    Several reasonable designs, and whichever wins gets built on. The delta between corners here is noise next to the cost of unpicking a bad choice later.

first runs: 495 units · redo tax: 0 units · vs the matched day: 1.9×

Everything ships — no failures, no redo tax — and the day still costs 1.9× what it should. That’s the quiet leak of a pinned dial: the mechanical work bills like hard work, five and three times over. Dial it down to the cheapest corner that ships it; the hard problems keep their budget.

Units are illustrative — one unit is roughly the light model running a small task at low effort. A capable model bills ~5× per token; high effort generates ~3× the tokens; the ratios are the point, not the prices. The redo tax counts an underpowered task’s failed attempts plus the escalation you do anyway — not the hour you spend reading confident wrong answers, which is the real bill.
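The arithmetic behind those units can be sketched in a few lines. This is a minimal model, not the page's actual calculation: the ~5× and ~3× ratios come from the note above, while the per-task sizes, the `Task` type, and the `redo_tax` reading are assumptions chosen so the totals land near the figures quoted.

```python
from dataclasses import dataclass

# Ratios from the note above; every other number here is an assumption.
MODEL_RATE = {"light": 1, "capable": 5}   # per-token billing multiplier
EFFORT_TOKENS = {"low": 1, "high": 3}     # token-volume multiplier


@dataclass(frozen=True)
class Task:
    name: str
    size: int    # hypothetical size, in units at the cheapest corner
    count: int   # how many of these land in the day
    hard: bool   # does it genuinely need the expensive corner?


def run_cost(task: Task, model: str, effort: str) -> int:
    """Units for one attempt at one task on the given corner."""
    return task.size * MODEL_RATE[model] * EFFORT_TOKENS[effort]


# Hypothetical sizes, picked so the totals land near the quoted figures.
DAY = [
    Task("copy tweaks", size=1, count=5, hard=False),
    Task("pattern-following endpoints", size=4, count=3, hard=False),
    Task("intermittent failure", size=8, count=1, hard=True),
    Task("design fork", size=8, count=1, hard=True),
]


def pinned_day(day: list[Task]) -> int:
    """Every task runs on the capable model at high effort."""
    return sum(t.count * run_cost(t, "capable", "high") for t in day)


def matched_day(day: list[Task]) -> int:
    """Mechanical work runs light/low; hard work keeps capable/high."""
    total = 0
    for t in day:
        corner = ("capable", "high") if t.hard else ("light", "low")
        total += t.count * run_cost(t, *corner)
    return total


def redo_tax(task: Task, failed_attempts: int) -> int:
    """One reading of the redo tax: failed cheap attempts plus the escalation run."""
    wasted = failed_attempts * run_cost(task, "light", "low")
    return task.count * (wasted + run_cost(task, "capable", "high"))


pinned = pinned_day(DAY)
matched = matched_day(DAY)
print(pinned, matched, round(pinned / matched, 1))  # 495 257 1.9
```

Under these assumed sizes the hard tasks account for 240 of the 257 matched units, so nearly all of the pinned day's overspend comes from the eight mechanical tasks.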

The off-diagonal corners are where the tokens leak. Running a capable model at high effort on the accounts endpoint is top-dollar reasoning for a task with one answer. Running high effort on a light model is buying thinking the smaller brain can’t fully cash. You want the matched corners — cheap-and-shallow for the mechanical work, expensive-and-deep for the genuinely hard problem — and you want to move between them as the work changes, which on a lumpy day is often. A profile per corner is what makes that movement one flag instead of four.

Strip the commands away and every choice in this chapter answers one question about the task in front of you:

Would you hand this to a junior without a second thought, or would you want your most careful engineer reasoning it through?

Junior-obvious work — the CRUD endpoint, a rename, a format pass — gets the low gear and the light model. The problems that genuinely fork, where a confident wrong answer costs you hours of unpicking against your own financial data, get the deep gear and the capable model. That’s the rules engine, and it’s why the effort dial existed in the first place.
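As a rough illustration, the junior-or-careful-engineer question can be written down as a two-signal check that picks one profile per task. The profile names, the `TaskSignals` fields, and the rule itself are hypothetical; this is one way to encode the heuristic, not a setting from any real tool.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Profile(NamedTuple):
    model: str
    effort: str


# Hypothetical profile names and settings, one per matched corner.
PROFILES = {
    "mechanical": Profile(model="light", effort="low"),
    "deep": Profile(model="capable", effort="high"),
}


@dataclass(frozen=True)
class TaskSignals:
    name: str
    has_pattern_to_copy: bool      # is there an existing example to mirror?
    wrong_call_is_expensive: bool  # would a confident wrong answer cost hours?


def pick_profile(task: TaskSignals) -> str:
    """Junior-obvious work gets the light corner; everything else gets the deep one."""
    if task.has_pattern_to_copy and not task.wrong_call_is_expensive:
        return "mechanical"
    return "deep"


crud = TaskSignals("CRUD endpoint", has_pattern_to_copy=True, wrong_call_is_expensive=False)
rules = TaskSignals("rules-engine design", has_pattern_to_copy=False, wrong_call_is_expensive=True)

for task in (crud, rules):
    name = pick_profile(task)
    print(f"{task.name}: {name} -> {PROFILES[name]}")
```

The rule defaults to the deep profile whenever either signal is in doubt, on the assumption that an over-powered easy task costs only tokens while an under-powered hard one costs a redo.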

This whole chapter is a single context-engineering move. You learned to spend the context window on purpose two chapters back — watch it, fill it with signal, reclaim it. Model and effort are the same instinct aimed at the agent’s reasoning: spend the heavy gear where the context is genuinely hard, and pull it back the moment the work goes mechanical. The gap this site is about is the agent’s missing context; the cost lever is making sure you pay for closing that gap on the hard problems, not for over-powering the easy ones. Do that by reflex and a lumpy day costs what it should — almost nothing on the endpoint, real budget on the design, and nothing wasted on the corners between them.

Everything so far has been one agent, on one model, at one effort, working a task in front of you. That’s the base case — and it has a ceiling. When the job is genuinely big — recategorising three years of budgetcli history, or refactoring money-handling across dozens of files — the next move isn’t a bigger gear on a single agent. One agent doing all of that serially floods the very context window you learned to guard, and you wait on it the whole time. The answer is more agents: pushing the heavy, parallelisable work off the main thread into isolated contexts that each carry their own window and report back.

That’s the whole idea of Subagents — and it’s where the course goes next.