Skip to content

Context window management

How agents work establishes the substrate: the agent runs a loop, everything it can see lives in a finite context window, and that window both fills up and loses fidelity as a session grows. This chapter is the set of levers for operating within that limit over a long session — the management layer on top of the substrate.

There are three of them:

  • Inspect — see what’s currently eating the window, before it bites.
  • Compact — replace a long, noisy history with a short summary so the session can keep going.
  • Carry state across the boundary — resume, fork, or otherwise pick up a window instead of starting from an empty one.

None of these change what the agent can do. They change what it can still see clearly after an hour of work.

You’ve been pairing with the agent for two hours on a gnarly migration. It was sharp at the start. Now it’s started re-suggesting a fix it already tried and you already rejected, it’s “forgotten” the constraint you stated forty messages ago, and its edits are getting vaguer. Nothing is broken. The window is just full — mostly of stale file reads, greps that didn’t pan out, and abandoned attempts — and the signal you actually need is buried under noise it can no longer tell apart from signal.

That’s the failure context management addresses. The window has a hard size limit and a softer coherence limit, and a long session walks into both. The levers above are how you stay ahead of them: notice the window filling (inspect), squeeze the dead weight out without losing the thread (compact), and never pay to rebuild context you already had (carry it across).

The concrete moves teams make:

  • Compact at phase boundaries — finished the migration, about to start the tests? Compact first, so the test work starts on a lean window instead of dragging the migration trail behind it.
  • Clear, don’t compact, for unrelated work — switching to a different bug entirely? Start clean. Compaction summarises the old work; clearing drops it, which is what you want when none of it is relevant.
  • Watch the gauge — check what’s consuming the window (tool schemas, a giant file read, an over-long rules file) instead of waiting for degradation to tell you.
  • Delegate the noise instead of compacting after — if a sub-task will generate a lot of throwaway context, hand it to a subagent so it never enters your window in the first place.
  • Resume instead of re-explaining — pick up yesterday’s session with its context intact rather than opening a fresh window and re-establishing everything.

The test: if the agent is getting less sharp the longer you work — not wrong, just hazier — you have a context-management problem, not a model problem.

Compaction is lossy by definition: it trades a faithful, expensive transcript for a cheap, approximate summary. Knowing what survives the trade is the difference between compacting safely and compacting away the thing you needed.

BEFORE compaction (window ~90% full) AFTER compaction (~35%)
┌─────────────────────────────────────┐ ┌────────────────────────────┐
│ system prompt ← always kept │ │ system prompt │
│ rules (AGENTS.md / CLAUDE.md) kept │ │ rules │
│ tool definitions ← always kept │ │ tool definitions │
│ ── ~40 turns of work ────────────── │ │ ┌── summary (lossy) ─────┐ │
│ 17 file reads (full contents) │ ► │ │ goal · key decisions · │ │
│ 9 greps, 6 test runs │ │ │ files touched · │ │
│ 4 abandoned hypotheses │ │ │ next steps │ │
│ exact error/command output │ │ └────────────────────────┘ │
│ most recent turns ← kept │ │ most recent turns │
└─────────────────────────────────────┘ └────────────────────────────┘
the middle is what gets compressed

Reliably kept: the high-level goal, the decisions made and why, which files matter (by name), the state of the in-progress task, and the most recent turns verbatim.

Reliably lost: the full contents of files read (they become a name, not a body), exact error and command output (summarised to prose), and the precise sequence of steps. The practical consequence: write unsaved work to disk before compacting — a diff that exists only in the conversation gets flattened into “edited auth.ts,” and after compaction the agent may need to re-read the file to see its own change.

This is also the cleanest restatement of the rules argument: a constraint stated in conversation can be summarised away, but a constraint in the rules file is reloaded every turn and survives every compaction. Durable facts belong in always-loaded context, not in the chat.

The tools in scope reached the same design independently. Every one of them, when the window approaches full, makes a single summarisation call and replaces the bulk of the history with a structured Markdown summary — and the structure is nearly identical across tools: goal, what was done, key decisions, important files, next steps. Several expose a manual /compact that takes optional instructions (/compact focus on the API changes) so you can bias what the summary preserves. The vocabulary differs; the mechanism has standardised. If you understand compaction in one tool, you understand it in all of them — the only real differences left are how much control you get over when it fires and what it keeps (the comparison below).

You want to…Reach forNot
Free up a full / degrading window mid-taskCompactClear (drops everything, including what you still need)
Start unrelated work on a clean slateClear / new sessionCompact (keeps a summary you don’t want)
Keep a noisy side-quest out of the main window entirelySubagentCompacting after the noise already landed
Make a fact survive every turn and every compactionRulesRestating it in chat
See what’s actually consuming the windowInspect (/context, /status, usage gauge)Waiting for coherence to drop
Pick up a previous session with context intactResume / continueNew window + re-explaining
Explore a risky branch without losing the current threadFork (where supported)Compacting, then regretting it

Compaction is a recovery tool — it salvages a window that’s already heavy. The cheaper move is usually to keep the window light in the first place: delegate noise to subagents, keep rules lean, and lean on MCP tool-schema deferral so unused tool definitions don’t sit in the window as permanent weight.

Inspect. /context shows what’s currently occupying the window (system prompt, rules, tool definitions, MCP schemas, conversation). /usage shows token usage for the session with a breakdown attributed to skills, subagents, plugins, and individual MCP servers. You can also surface live window usage in the status line.

Compact. Auto-compaction summarises conversation history automatically as you approach the limit. Trigger it yourself with /compact, and steer it with inline instructions:

/compact Focus on the API changes and the failing test

You can also set standing compaction guidance in CLAUDE.md:

# Compact instructions
When compacting, preserve test output and code changes.

Clear vs rewind. /clear starts fresh for unrelated work (use /rename first so you can find the session later). /rewind (or double-tap Escape) restores both the conversation and the code to an earlier checkpoint — distinct from compaction; it’s an undo, not a summary.

Carry across. /resume reopens a prior session; claude --continue / -c continues the most recent one. A background job summarises past conversations so resume is fast.

Keep the window light upstream: MCP tool definitions are deferred by default (only names enter context until a tool is used), and moving rarely-needed instructions out of CLAUDE.md into skills keeps base context small.

AspectClaude CodeCodexopencodeCursorCopilot
Inspect usage/context, /usage, status line/statusstatus indicator Unverifiedinput-box gaugechat input context control
Manual compaction/compact [instructions]/compactagent-/auto-drivenmanual summarise Unverified/compact [instructions] (CLI)
Automatic compactionYes, near limitNot documented UnverifiedYes, on overflowYes, near limit (uses a faster model)Yes (~80% of window)
Custom compact instructionsYes (inline + CLAUDE.md)Not exposed UnverifiedFixed structured templateNot exposed UnverifiedYes (inline)
Disable auto-compactionVia config Unverifiedn/acompaction.auto=false / OPENCODE_DISABLE_AUTOCOMPACTNot exposed Unverifiedgithub.copilot.chat.summarizeAgentConversationHistory.enabled=false
Fresh window (same app)/clear/new, /clearnew sessionNew ChatNew Chat
Resume prior session/resume, claude --continue/-c/resume, codex resumeopencode --continue/-c, --session/-sChat historyChat history
Fork / branch a session/fork, /sideopencode --fork

Auto-compaction is now the default everywhere it’s offered; the axis that still separates the tools is control — whether you can see the window filling, steer what the summary keeps, and turn the automation off. The terminal CLIs expose the most knobs; the IDE tools manage it for you with the fewest. Verify tool-specific flags against the upstream docs below before relying on them — these settings move between releases.

  • Compact vs clear vs new. “Compact” summarises and keeps a thread going; “clear” / “new” abandons the thread for a fresh window. In Codex, /new starts a fresh conversation inside the same CLI process while /clear resets both the view and the conversation — neither is /compact.
  • “Summarize” vs “compact”. Cursor and Copilot surface the behaviour as “summarising conversation history”; the terminal tools call the same operation “compaction.” Same mechanism, different label.
  • “Session” is overloaded. In how agents work a session is one run of the loop with one window. opencode also has an opencode session CLI noun and /share-able sessions; don’t read the CLI sense as a different concept.
  • Compaction vs memory. Compaction works within a session to fit the window. Agent-managed memory persists facts across sessions and is a separate primitive (deferred; distinct from rules). Resuming a session is not the same as the agent remembering it.