Models & effort

By the end of the last chapter you could move through a week of work on budgetcli without losing the thread: resume yesterday, branch a fork, reclaim a bloated window. That settled how you carry the work across time. This chapter settles a different dial — how hard the agent thinks on any given turn — and the whole skill is realising the right answer changes from one task to the next.

Here’s the trap most people fall into. They find a setting that works, leave it there, and pay for it on every prompt. Either they run the most capable model at the deepest reasoning on everything — and burn tokens and latency stamping out a CRUD endpoint — or they leave it light and fast and then watch it flail at a problem that genuinely needed thinking. Both are the same mistake: a fixed setting against a variable workload. The operator’s move is to shift gears, matching the model and the effort to the task instead of to habit.

The day we’ll follow

Same budgetcli, but today the work is deliberately lumpy — it spans the whole difficulty range in a single afternoon, which is exactly what makes the gear-shifting visible:

A boilerplate CRUD endpoint first: budgetcli needs a plain accounts endpoint — list, create, read, delete — following patterns that already exist in the codebase. Mechanical. There is nothing here to reason about, so dial the effort down and let it move fast.
Then the one that earns the deep gear: the categorisation rules engine. Years of imported transactions need tagging — groceries, rent, subscriptions — and the design of how rules match, how they’re ordered, and how conflicts resolve is a genuine fork with several reasonable answers. Turn the effort up and let it think before it writes.
Underneath both: which model answers at all, bundling a model-and-effort combination you’ll reuse, and reasoning about what each gear costs without ever seeing a price tag.

The through-line is the one this whole course keeps circling: context engineering is putting effort where it pays. You learned to spend the context window on purpose in the last chapter. This is the same instinct aimed at the agent’s reasoning — the rules-engine design gets the heavy gear, the CRUD endpoint gets the light one, and you stop paying max for a workload that’s mostly mechanical.

Start with the dial that does the most work — the reasoning-effort levels, and how to move them per task.