Match the dial to the task's value without ever reading a price

The CRUD endpoint shipped on a light gear, the rules-engine design earned a heavy one, and if you scroll back through the afternoon you can watch the dials move in opposite directions across a single day. That’s not two settings you stumbled into. It’s the actual skill this chapter teaches: treating model and effort as a per-task spend, not a default you set once and forget. This last lesson is about making that a reflex, and about reasoning honestly about cost even though you’ll never see a price in this course.

The cost rule you can apply without a price tag

You don’t need a number to make the right call, because the relationship is fixed regardless of what the number is:

More reasoning effort means more tokens generated and more latency before the answer. Always. A bigger model compounds it, because it charges more for every one of those tokens.

That’s true at every price point, so the discipline doesn’t depend on knowing the price; it depends on knowing the ratio. The question is never “what does high cost?” in absolute terms. It’s “does this task’s value justify the extra tokens and the extra wait that high effort brings?” On the rules-engine design, where a wrong call means a rewrite and miscategorised money, the answer is yes, easily. On a CRUD endpoint with an existing pattern, no: the extra reasoning produces the same four routes more slowly and for more tokens. Same dial, opposite verdict, and you reached the verdict without a single dollar figure.
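
The rule can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the lesson’s illustrative ratios (a capable model bills roughly 5× per token, high effort generates roughly 3× the tokens); the names `MODEL_PER_TOKEN`, `EFFORT_TOKENS`, `relative_cost` and `price_independent_ratio` are hypothetical. Multiply both settings by any per-unit price you like and the ratio between them does not move.

```python
# Illustrative ratios only (assumed, not real prices): relative price per token
# by model, and relative tokens generated by effort level.
MODEL_PER_TOKEN = {"light": 1.0, "capable": 5.0}
EFFORT_TOKENS = {"low": 1.0, "high": 3.0}


def relative_cost(model: str, effort: str, size: float = 1.0) -> float:
    """Cost in illustrative units; 1 unit = light model, low effort, small task."""
    # Cost = tokens generated x price per token. Effort scales the first factor,
    # model choice scales the second, so the two dials multiply.
    return size * EFFORT_TOKENS[effort] * MODEL_PER_TOKEN[model]


def price_independent_ratio(price_per_unit: float) -> float:
    """Heavy-versus-light cost ratio at an arbitrary (hypothetical) unit price."""
    heavy = relative_cost("capable", "high") * price_per_unit
    light = relative_cost("light", "low") * price_per_unit
    return heavy / light


for price in (0.25, 1.0, 40.0):
    # The unit price cancels out, so the ratio stays 15 whatever the price is.
    print(f"price {price}: heavy costs {price_independent_ratio(price)}x light")
```

The multiplication is the point: a bigger model compounds high effort rather than adding to it.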

The two dials, and the corners to avoid

The chapter has been turning two independent knobs. Set them together, per task, and the costly mistakes become visible:

Figure: a lumpy day of four kinds of task, each with two dials, which model answers and how much effort it spends thinking. Every task starts where most people leave it, pinned to the expensive corner. In that state the first runs cost 495 units, the redo tax is 0 units, and the day costs 1.9× the matched day.

Everything ships, with no failures and no redo tax, and the day still costs 1.9× what it should. That’s the quiet leak of a pinned dial: the mechanical work bills like hard work, five times over for the model and three times over for the effort. The fix is to dial each task down to the cheapest corner that still ships it; the hard problems keep their budget.

Units are illustrative: one unit is roughly what the light model costs to run a small task at low effort. A capable model bills ~5× per token, and high effort generates ~3× the tokens; the ratios are the point, not the prices. The redo tax counts an underpowered task’s failed attempts plus the escalation you end up doing anyway. It does not count the hour you spend reading confident wrong answers, which is the real bill.
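
The day’s arithmetic, redo tax included, fits in one short function. This is a minimal sketch under stated assumptions: the ~5× and ~3× ratios above, a hypothetical four-task day with made-up sizes, and a hypothetical failure model in which a hard task run below the capable, high-effort corner fails, fails once more on retry, and is then escalated anyway. None of these numbers reproduce the figure’s; only the shape of the result carries over.

```python
from dataclasses import dataclass

# Illustrative ratios (assumed): capable ~5x per token, high effort ~3x tokens.
MODEL_PER_TOKEN = {"light": 1, "capable": 5}
EFFORT_TOKENS = {"low": 1, "high": 3}
DEEP = ("capable", "high")


def run_cost(model: str, effort: str, size: int) -> int:
    return size * MODEL_PER_TOKEN[model] * EFFORT_TOKENS[effort]


@dataclass
class Task:
    name: str
    size: int   # hypothetical size, in light/low units
    hard: bool  # True if the task genuinely forks and needs the deep corner


def day_cost(tasks, dials, retries=1):
    """Return (first_runs, redo_tax) for one day under per-task dials.

    Hypothetical failure model: a hard task below the deep corner fails its
    first run, fails `retries` more times, then is escalated to DEEP anyway.
    """
    first_runs = 0
    redo_tax = 0
    for task in tasks:
        model, effort = dials[task.name]
        first_runs += run_cost(model, effort, task.size)
        if task.hard and (model, effort) != DEEP:
            redo_tax += retries * run_cost(model, effort, task.size)
            redo_tax += run_cost(*DEEP, task.size)
    return first_runs, redo_tax


# Hypothetical day; names echo the lesson, sizes are invented.
day = [
    Task("accounts endpoint", 3, hard=False),
    Task("rename", 1, hard=False),
    Task("format pass", 1, hard=False),
    Task("rules engine", 4, hard=True),
]
pinned = {t.name: DEEP for t in day}
matched = {t.name: (DEEP if t.hard else ("light", "low")) for t in day}
starved = {t.name: ("light", "low") for t in day}

for label, dials in (("pinned", pinned), ("matched", matched), ("starved", starved)):
    first, redo = day_cost(day, dials)
    print(f"{label}: first runs {first}, redo tax {redo}, total {first + redo}")
```

On these invented sizes the pinned day costs about twice the matched one with no redo tax at all, and starving the hard task still costs more than matching it, before counting any time spent reading wrong answers.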

The mismatched corners are where the tokens leak. Running a capable model at high effort on the accounts endpoint is top-dollar reasoning for a task with one answer. Running high effort on a light model is buying thinking the smaller model can’t fully cash in. You want the matched corners, cheap-and-shallow for the mechanical work and expensive-and-deep for the genuinely hard problem, and you want to move between them as the work changes, which on a lumpy day happens often. A profile per corner is what makes that movement one flag instead of four.
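
One way to picture the corners and the leaks between them is a small lookup. A minimal sketch, assuming two hypothetical profile names, `quick` and `deep`, standing in for whatever profiles your tool lets you define; it does not reproduce any real CLI’s configuration format, it only classifies a (model, effort) choice against a task the way the paragraph does.

```python
# Hypothetical profiles, one per matched corner (names are assumptions).
PROFILES = {
    "quick": ("light", "low"),    # cheap-and-shallow: mechanical work
    "deep": ("capable", "high"),  # expensive-and-deep: genuinely hard problems
}


def diagnose(model: str, effort: str, task_is_hard: bool) -> str:
    """Classify a dial setting for a task as matched, underpowered, or leaking."""
    dial = (model, effort)
    if task_is_hard:
        return "matched" if dial == PROFILES["deep"] else "underpowered: expect a redo tax"
    if dial == PROFILES["quick"]:
        return "matched"
    if dial == ("light", "high"):
        return "leak: buying thinking the light model can't fully cash in"
    return "leak: capable-model rates for a task with one answer"


def dial_for(profile: str) -> tuple[str, str]:
    """Moving between corners means naming one profile, not setting each dial."""
    return PROFILES[profile]


print(diagnose(*dial_for("deep"), task_is_hard=False))   # top-dollar CRUD work
print(diagnose("light", "high", task_is_hard=False))
print(diagnose(*dial_for("quick"), task_is_hard=False))
```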

The test that decides the gear

Strip the commands away and every choice in this chapter answers one question about the task in front of you:

Would you hand this to a junior without a second thought, or would you want your most careful engineer reasoning it through?

Junior-obvious work — the CRUD endpoint, a rename, a format pass — gets the low gear and the light model. The problems that genuinely fork, where a confident wrong answer costs you hours of unpicking against your own financial data, get the deep gear and the capable model. That’s the rules engine, and it’s why the effort dial existed in the first place.
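
The test reduces to a two-question triage. A minimal sketch, assuming the hypothetical names `Dial`, `LOW_GEAR`, `DEEP_GEAR` and `choose_gear`, plus one rule the lesson does not spell out: a task that is not junior-obvious, or whose confident wrong answer would be expensive to unpick, gets the deep gear.

```python
from typing import NamedTuple


class Dial(NamedTuple):
    model: str
    effort: str


LOW_GEAR = Dial("light", "low")        # junior-obvious work
DEEP_GEAR = Dial("capable", "high")    # problems that genuinely fork


def choose_gear(junior_obvious: bool, wrong_answer_costs_hours: bool) -> Dial:
    """Apply the chapter's test to one task.

    Assumption: if a task is not junior-obvious, or a confident wrong answer
    would be expensive to unpick, it gets the deep gear.
    """
    if junior_obvious and not wrong_answer_costs_hours:
        return LOW_GEAR
    return DEEP_GEAR


# A hypothetical lumpy day: (junior_obvious, wrong_answer_costs_hours) per task.
day = {
    "CRUD endpoint": (True, False),
    "rename": (True, False),
    "format pass": (True, False),
    "rules engine": (False, True),
}
for task, answers in day.items():
    gear = choose_gear(*answers)
    print(f"{task}: {gear.model} model, {gear.effort} effort")
```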

The discipline, stated once

This whole chapter is a single context-engineering move. You learned to spend the context window on purpose two chapters back: watch it, fill it with signal, reclaim it. Model and effort are the same instinct aimed at the agent’s reasoning: spend the heavy gear where the problem is genuinely hard, and pull it back the moment the work goes mechanical. The gap this site is about is the agent’s missing context; the cost lever makes sure you pay to close that gap on the hard problems, not to over-power the easy ones. Do that by reflex and a lumpy day costs what it should: almost nothing on the endpoint, real budget on the design, and nothing wasted on the corners between them.

Where this goes next

Everything so far has been one agent, on one model, at one effort, working a task in front of you. That’s the base case — and it has a ceiling. When the job is genuinely big — recategorising three years of budgetcli history, or refactoring money-handling across dozens of files — the next move isn’t a bigger gear on a single agent. One agent doing all of that serially floods the very context window you learned to guard, and you wait on it the whole time. The answer is more agents: pushing the heavy, parallelisable work off the main thread into isolated contexts that each carry their own window and report back.

That’s the whole idea of Subagents — and it’s where the course goes next.