Skip to content

Agent — hand off the whole task

The shared-lib change is in. Now the other half of the work: build the actual approval workflow in orders-service on top of those new helpers. This isn’t a one-file edit. The endpoint needs to check the threshold, an order needs a new “awaiting approval” state, the approval action needs a handler, the audit event needs wiring, and there are tests to update. That’s the kind of multi-file, build-and-verify task Agent mode exists for.

Agent mode is the autonomous one. You give it a goal, and it runs the whole loop itself: it plans an approach, decides which files are relevant and edits across all of them, runs terminal commands and tests, reads the results, and self-corrects in a loop until it judges the task done. Where Edit mode waits for you to scope, Agent mode scopes itself. Where Edit mode hands back diffs and stops, Agent mode keeps going — edit, run, observe, fix — the way you would.

Two things about how it handles control are worth knowing before you turn it loose:

  • It auto-applies its edits as it works, rather than queuing each one for approval. The review moves to the end — you read the full set of changes once it’s done, not keystroke by keystroke. (How much it’s allowed to auto-run is itself configurable; that’s the Permissions chapter.)
  • It surfaces risky terminal commands for your approval. It’ll run tests and safe commands on its own, but when it wants to do something consequential at the shell, it stops and asks. That checkpoint is the one place the autonomy pauses for a human by default.

Switch to Agent, with orders-service open, and give it the goal rather than the steps:

Implement the large-order approval workflow. Orders at or above the approval threshold should enter an “awaiting approval” state instead of being fulfilled, expose an action for a second user to approve them, and record an approval-requested audit event using the shared library. Update the tests.

Then watch it work. It’ll lay out a plan, start editing across the endpoint, the state model, the handler, and the tests, run the suite, and react to failures — fixing its own mistakes without you pointing at each one. This is Copilot at its most capable, and the first time you see it drive a multi-file change to green it genuinely feels like a different tool than the autocomplete you started with.

The trap with Agent mode is that its competence invites you to stop paying attention. Don’t — but also don’t fall back to reviewing every keystroke, which would waste the whole point. The right posture is in between:

  • Watch the terminal-command prompts. When it pauses to ask before running something consequential, actually read what it wants to run. This is your live checkpoint; treat it as one, not as a dialog to dismiss.
  • Review the whole change at the end, as one diff. When it declares done, read everything it touched in one pass — with the same three questions from your first change, now across many files. “The tests pass” is necessary, not sufficient; tests it wrote itself can pass for the wrong reason.
  • Steer with words when it drifts. If it’s heading somewhere you don’t like, say so in the conversation and it re-plans. You don’t have to take its first answer.

Agent mode earns its autonomy on well-scoped, lower-stakes work like a feature in your own service. But notice the assumption hiding in “give it the goal and watch it go”: you trusted it to choose the approach. On a change where the approach itself is the risky part — say, anything back in shared-lib with a dozen consumers downstream — you’d want to see and approve the plan before it wrote a line. That’s a different mode. Next: Plan — see the plan first.