Skip to content

Model selection & multi-provider

Every CLI in scope can be steered to a specific underlying model — and most can swap models mid-session. The model is the engine: a “smarter” model reasons more deeply, plans longer, and costs more per token; a “lighter” model is faster, cheaper, and good enough for most mechanical work.

The single-vendor / multi-vendor split shapes everything downstream. Claude Code runs Anthropic models only. Codex runs OpenAI models only. opencode is fully multi-provider — any model that fits its provider abstraction. Cursor ships a wide built-in roster (Anthropic, OpenAI, Google, xAI, Moonshot, plus its own Composer family). Copilot offers a tiered roster (Claude, GPT, Gemini) gated by plan and org policy. That distinction shapes model swaps, cost optimisation, side-by-side comparisons, and vendor-risk posture.

You glance at the month’s API bill and realise you’ve been paying flagship-model rates to do… flagship-model things: renaming variables, regenerating docstrings, running greps and summarising the results, reformatting a JSON file. Tasks the cheapest model in the lineup would handle in half a second. You don’t need Opus to spell connectionTimeout correctly.

The reverse trap exists too. You’ve been letting the cheap fast model lead a complex refactor because you didn’t want to pay for the big one, and now you’re three rounds into untangling a design it tied itself in knots over. The cheap model’s hourly cost was negligible. The total cost — yours, in time spent unscrewing what it did — wasn’t.

That’s the gap model selection closes. Match the model to the work: heavyweight for hard reasoning (planning, code review of long diffs, debugging, architecture), default for everyday coding, lightweight for mechanical or exploratory tasks. The same CLI runs all three; you choose per task or per subagent.

Real-world examples of how teams allocate models:

  • The main loop uses the default — Sonnet (Claude Code) or GPT-5.x default (Codex) — calibrated for general coding.
  • Heavy planning / review uses the flagship — switch to Opus / strongest GPT-5 for “design this feature,” “review this 800-line diff,” “debug this race condition.” Switch back after.
  • Exploration subagents use the cheap fast model — Haiku / gpt-5.4-mini / a small Gemini for “find every place we call X” or “summarise these 40 files.” The work is high-volume and low-stakes per call.
  • Side-by-side comparisons (opencode) — same prompt to Claude, GPT-5, and a Gemini; diff the answers. Useful for hard architecture questions where you don’t trust any single answer.
  • Local / private models for sensitive code — opencode pointed at a local Ollama or LM Studio for code that mustn’t leave the machine.
  • Provider failover — when one provider has an outage or rate-limits, opencode lets you swap providers without changing tool.

The test: if you’ve been using the same model for everything, you’re either overpaying or under-thinking. The right answer is usually “two or three models, picked per task.”

You want to…Reach forNot
Cheaper, faster runs on simple tasksSwitch to a lighter modelA heavier prompt on the flagship
Deeper reasoning on a hard problemSwitch to the flagshipMore skills/context on a lighter one
Different model per workerPer-subagent model fieldRestarting the whole session
Compare two models on the same taskopencode side-by-side agentsRunning the same prompt twice manually
Run a model offline / on-premopencode with local providerClaude Code or Codex
Lock the team to one model for consistencyPin model in project configVerbal agreement

Switching models doesn’t replace your prompt, skills, or memory — it changes who’s reading them. If your output quality problem isn’t about reasoning depth, a model swap won’t fix it.

Models (current Claude 4.x tier):

  • claude-opus-4-7 — flagship; 1M-token context
  • claude-sonnet-4-6 — balanced; 1M-token context
  • claude-haiku-4-5 — fast; 200K-token context

Switching: /model mid-session lists available models; pick one to switch the rest of the conversation. Pin a default via the model setting or the ANTHROPIC_MODEL env var.

Tier strategy:

  • Opus — heaviest model; use for hard reasoning, planning, code review of long diffs.
  • Sonnet — default for most coding work.
  • Haiku — fast and cheap; use for trivial work or as a subagent’s model.

Claude Code does not support non-Anthropic models.

AspectClaude CodeCodexopencodeCursorCopilot
Vendor lock-inAnthropic onlyOpenAI onlyMulti-providerMulti-provider (built-in roster)Multi-provider (curated, plan-gated)
In-session switch/model/model, --modelPer-agent / TUIPer-chat pickerPer-chat picker (/model in CLI)
Curated model listImplicit (Opus/Sonnet/Haiku)Implicit (GPT-5.x)Zen (curated)Composer + rosterPlan-tier roster
Auto-routing modeAuto ModeAuto model selection (10% discount)
Different model per subagentYes (frontmatter model:)Yes (TOML model field)YesYes
Side-by-side comparisonN/AN/ANative (different agents)
Org / policy gatingManaged settingsManaged settingsConfigTeam / EnterpriseBusiness / Enterprise allow-deny
  • Pinning subagent costs. Giving a cheap exploration subagent a cheap model while the main loop uses a heavyweight one saves real money. Claude Code via model: in subagent frontmatter, Codex via model in the agent TOML file, opencode via per-agent model in the agent’s markdown frontmatter, Cursor via subagent frontmatter.
  • Comparison reviews. Want to compare how three models answer the same code-review question? opencode’s per-agent provider is the only built-in path.
  • Vendor risk. If you’ve standardised on Claude Code or Codex and need to switch providers, you switch tools, not just config. Cursor, opencode, and Copilot soften that by carrying multiple vendors inside one tool.
  • Premium-request burn (Copilot). Coding Agent runs consume premium requests; long autonomous sessions can burn quotas fast — a 2026 sticker-shock issue flagged in the press.
  • “Model” sometimes means the family (Opus, Sonnet, GPT-5) and sometimes the snapshot ID (claude-opus-4-7-20...). Be explicit when documenting model selection — snapshot IDs change over time.
  • “Auto” means different things across tools. Cursor’s Auto Mode picks a model on Cursor’s terms from a discounted pool. Copilot’s Auto model selection picks for you with a 10% premium-request discount. They are not the same primitive.