Skip to content

Match the dial to the task's value without ever reading a price

The CRUD endpoint shipped on a light gear, the rules-engine design earned a heavy one, and if you scroll back through the afternoon you can watch the dials moving in opposite directions across a single day. That’s not two settings you stumbled into — it’s the actual skill this chapter teaches: treating model and effort as a per-task spend, not a default you set once and forget. This last lesson is about making that a reflex, and about reasoning over cost honestly even though you’ll never see a price in this course.

The cost rule you can apply without a price tag

Section titled “The cost rule you can apply without a price tag”

You don’t need a number to make the right call, because the relationship is fixed regardless of what the number is:

More reasoning effort means more tokens generated and more latency before the answer. Always. A bigger model compounds it.

That’s true at every price point, so the discipline doesn’t depend on knowing the price — it depends on knowing the ratio. The question is never “what does high cost?” in absolute terms. It’s “does this task’s value justify the extra tokens and wait that high spends?” On the rules-engine design — where a wrong call means a rewrite and miscategorised money — yes, easily. On a CRUD endpoint with an existing pattern, no: the extra reasoning produces the same four routes more slowly and for more tokens. Same dial, opposite verdict, and you reached the verdict without a single dollar figure.

The chapter has been turning two independent knobs. Laid out together, the costly mistakes are obvious:

light model capable model
low effort CRUD endpoint simple change on a heavy model
(cheapest) (paying for a brain you idle)
high effort deep thinking on a rules-engine design
light brain (wasteful) (most expensive — reserve it)

The off-diagonal corners are where the tokens leak. Running a capable model at high effort on the accounts endpoint is top-dollar reasoning for a task with one answer. Running high effort on a light model is buying thinking the smaller brain can’t fully cash. You want the matched corners — cheap-and-shallow for the mechanical work, expensive-and-deep for the genuinely hard problem — and you want to move between them as the work changes, which on a lumpy day is often. A profile per corner is what makes that movement one flag instead of four.

Strip the commands away and every choice in this chapter answers one question about the task in front of you:

Would you hand this to a junior without a second thought, or would you want your most careful engineer reasoning it through?

Junior-obvious work — the CRUD endpoint, a rename, a format pass — gets the low gear and the light model. The problems that genuinely fork, where a confident wrong answer costs you hours of unpicking against your own financial data, get the deep gear and the capable model. That’s the rules engine, and it’s why the effort dial existed in the first place.

This whole chapter is a single context-engineering move. You learned to spend the context window on purpose two chapters back — watch it, fill it with signal, reclaim it. Model and effort are the same instinct aimed at the agent’s reasoning: spend the heavy gear where the context is genuinely hard, and pull it back the moment the work goes mechanical. The gap this site is about is the agent’s missing context; the cost lever is making sure you pay for closing that gap on the hard problems, not for over-powering the easy ones. Do that by reflex and a lumpy day costs what it should — almost nothing on the endpoint, real budget on the design, and nothing wasted on the corners between them.

Everything so far has been one agent, on one model, at one effort, working a task in front of you. That’s the base case — and it has a ceiling. When the job is genuinely big — recategorising three years of budgetcli history, or refactoring money-handling across dozens of files — the next move isn’t a bigger gear on a single agent. One agent doing all of that serially floods the very context window you learned to guard, and you wait on it the whole time. The answer is more agents: pushing the heavy, parallelisable work off the main thread into isolated contexts that each carry their own window and report back.

That’s the whole idea of Subagents — and it’s where the course goes next.