Context window management
What context window management is
How agents work establishes the substrate: the agent runs a loop, everything it can see lives in a finite context window, and that window both fills up and loses fidelity as a session grows. This chapter covers the levers for operating within that limit over a long session: the management layer on top of the substrate.
There are three of them:
- Inspect — see what’s currently eating the window, before it bites.
- Compact — replace a long, noisy history with a short summary so the session can keep going.
- Carry state across the boundary — resume, fork, or otherwise pick up a window instead of starting from an empty one.
None of these change what the agent can do. They change what it can still see clearly after an hour of work.
Why you’d want to think about this
You’ve been pairing with the agent for two hours on a gnarly migration. It was sharp at the start. Now it’s started re-suggesting a fix it already tried and you already rejected, it’s “forgotten” the constraint you stated forty messages ago, and its edits are getting vaguer. Nothing is broken. The window is just full, mostly of stale file reads, greps that didn’t pan out, and abandoned attempts, and the signal you actually need is buried under noise the agent can no longer tell apart from it.
That’s the failure context management addresses. The window has a hard size limit and a softer coherence limit, and a long session walks into both. The levers above are how you stay ahead of them: notice the window filling (inspect), squeeze the dead weight out without losing the thread (compact), and never pay to rebuild context you already had (carry it across).
The concrete moves teams make:
- Compact at phase boundaries — finished the migration, about to start the tests? Compact first, so the test work starts on a lean window instead of dragging the migration trail behind it.
- Clear, don’t compact, for unrelated work — switching to a different bug entirely? Start clean. Compaction summarises the old work; clearing drops it, which is what you want when none of it is relevant.
- Watch the gauge — check what’s consuming the window (tool schemas, a giant file read, an over-long rules file) instead of waiting for degradation to tell you.
- Delegate the noise instead of compacting after — if a sub-task will generate a lot of throwaway context, hand it to a subagent so it never enters your window in the first place.
- Resume instead of re-explaining — pick up yesterday’s session with its context intact rather than opening a fresh window and re-establishing everything.
The test: if the agent is getting less sharp the longer you work — not wrong, just hazier — you have a context-management problem, not a model problem.
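The "watch the gauge" move can be pictured as a simple budget breakdown. This is a minimal sketch, not any tool's actual implementation: the category names, the token counts, and the 200,000-token window are hypothetical numbers chosen for illustration.

```python
# Hypothetical snapshot of what occupies a context window, in tokens.
# Categories and numbers are illustrative assumptions, not measurements.
WINDOW_LIMIT = 200_000

usage = {
    "system prompt": 3_000,
    "rules file": 12_000,
    "tool schemas": 25_000,
    "file reads": 90_000,
    "conversation": 30_000,
}


def breakdown(usage: dict[str, int], limit: int) -> list[tuple[str, int, float]]:
    """Return (category, tokens, share of window) rows, largest consumer first."""
    rows = [(name, tokens, tokens / limit) for name, tokens in usage.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fill_ratio(usage: dict[str, int], limit: int) -> float:
    """Fraction of the window currently occupied."""
    return sum(usage.values()) / limit


for name, tokens, share in breakdown(usage, WINDOW_LIMIT):
    print(f"{name:<14} {tokens:>7,} tokens  {share:6.1%}")
print(f"window fill: {fill_ratio(usage, WINDOW_LIMIT):.0%}")
```

Sorting by size is the point of the exercise: the biggest row is usually the first thing to delegate, trim, or compact.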
What compaction keeps, and what it drops
Compaction is lossy by definition: it trades a faithful, expensive transcript for a cheap, approximate summary. Knowing what survives the trade is the difference between compacting safely and compacting away the thing you needed.

```
BEFORE compaction (window ~90% full)        AFTER compaction (~35%)
┌─────────────────────────────────────┐     ┌────────────────────────────┐
│ system prompt        ← always kept  │     │ system prompt              │
│ rules (AGENTS.md / CLAUDE.md) kept  │     │ rules                      │
│ tool definitions     ← always kept  │     │ tool definitions           │
│ ── ~40 turns of work ────────────── │     │ ┌── summary (lossy) ─────┐ │
│ 17 file reads (full contents)       │  ►  │ │ goal · key decisions · │ │
│ 9 greps, 6 test runs                │     │ │ files touched ·        │ │
│ 4 abandoned hypotheses              │     │ │ next steps             │ │
│ exact error/command output          │     │ └────────────────────────┘ │
│ most recent turns           ← kept  │     │ most recent turns          │
└─────────────────────────────────────┘     └────────────────────────────┘
                  the middle is what gets compressed
```

Reliably kept: the high-level goal, the decisions made and why, which files matter (by name), the state of the in-progress task, and the most recent turns verbatim.
Reliably lost: the full contents of files read (they become a name, not a body), exact error and command output (summarised to prose), and the precise sequence of steps. The practical consequence: write unsaved work to disk before compacting — a diff that exists only in the conversation gets flattened into “edited auth.ts,” and after compaction the agent may need to re-read the file to see its own change.
This is also the cleanest restatement of the rules argument: a constraint stated in conversation can be summarised away, but a constraint in the rules file is reloaded every turn and survives every compaction. Durable facts belong in always-loaded context, not in the chat.
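The keep/drop behaviour described above can be sketched in a few lines. This is an illustrative model only: the `Item` type, the `compact` function, and the `naive_summary` stand-in are hypothetical names, and a real tool would replace `naive_summary` with a model summarisation call.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str  # "system", "rules", "tools", "turn", or "summary"
    text: str


def compact(window: list[Item], keep_recent: int, summarize) -> list[Item]:
    """Keep pinned items and the last `keep_recent` turns; summarise older turns."""
    pinned = [item for item in window if item.kind != "turn"]
    turns = [item for item in window if item.kind == "turn"]
    if len(turns) <= keep_recent:
        return list(window)  # nothing old enough to compress
    cut = len(turns) - keep_recent
    old, recent = turns[:cut], turns[cut:]
    return pinned + [Item("summary", summarize(old))] + recent


def naive_summary(old_turns: list[Item]) -> str:
    # Stand-in for the model's summarisation call: deliberately lossy.
    return f"summary of {len(old_turns)} earlier turns"


window = [
    Item("system", "You are a coding agent."),
    Item("rules", "Never edit files under vendor/."),
    Item("turn", "user: also, do not touch the public API"),
    Item("turn", "agent: read auth.ts (400 lines)"),
    Item("turn", "agent: ran tests, 3 failures"),
    Item("turn", "user: try the token refresh path"),
]

after = compact(window, keep_recent=2, summarize=naive_summary)
visible = " ".join(item.text for item in after)
print("rules constraint still visible:", "vendor/" in visible)
print("chat-only constraint still visible:", "public API" in visible)
```

The constraint in the rules item is pinned and survives; the one stated only in an early turn depends entirely on whether the summariser happens to keep it, and here it does not.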
The shape compaction is converging on
The tools in scope reached the same design independently. Every one of them, when the window approaches full, makes a single summarisation call and replaces the bulk of the history with a structured Markdown summary, and the structure is nearly identical across tools: goal, what was done, key decisions, important files, next steps. Several expose a manual /compact that takes optional instructions (/compact focus on the API changes) so you can bias what the summary preserves. The vocabulary differs; the mechanism has standardised. If you understand compaction in one tool, you understand it in all of them. The only real differences left are how much control you get over when it fires and what it keeps (the comparison below).
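As a rough picture of that shared structure, the sketch below renders the five sections as Markdown. The `CompactionSummary` class, its field names, and the sample content are hypothetical; each tool's real template and headings differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionSummary:
    """Illustrative container for the sections most tools' summaries share."""
    goal: str
    done: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        sections = [
            ("What was done", self.done),
            ("Key decisions", self.decisions),
            ("Important files", self.files),
            ("Next steps", self.next_steps),
        ]
        lines = ["## Goal", self.goal]
        for title, entries in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {entry}" for entry in entries)
        return "\n".join(lines)


summary = CompactionSummary(
    goal="Migrate session storage from cookies to tokens",
    done=["Replaced cookie parsing in auth.ts"],
    decisions=["Keep the old endpoint until clients upgrade"],
    files=["auth.ts", "session.test.ts"],
    next_steps=["Fix the 3 failing refresh tests"],
)
print(summary.to_markdown())
```

Note what the structure has no slot for: file bodies and raw command output. That is the lossiness described earlier, expressed as a schema.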
Why this and not…
| You want to… | Reach for | Not |
|---|---|---|
| Free up a full / degrading window mid-task | Compact | Clear (drops everything, including what you still need) |
| Start unrelated work on a clean slate | Clear / new session | Compact (keeps a summary you don’t want) |
| Keep a noisy side-quest out of the main window entirely | Subagent | Compacting after the noise already landed |
| Make a fact survive every turn and every compaction | Rules | Restating it in chat |
| See what’s actually consuming the window | Inspect (/context, /status, usage gauge) | Waiting for coherence to drop |
| Pick up a previous session with context intact | Resume / continue | New window + re-explaining |
| Explore a risky branch without losing the current thread | Fork (where supported) | Compacting, then regretting it |
Compaction is a recovery tool — it salvages a window that’s already heavy. The cheaper move is usually to keep the window light in the first place: delegate noise to subagents, keep rules lean, and lean on MCP tool-schema deferral so unused tool definitions don’t sit in the window as permanent weight.
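To see why tool-schema deferral keeps the window light, compare the token cost of loading every tool definition up front with loading only names until a tool is used. This is a back-of-the-envelope sketch: the tool names, the schema sizes, and the 10-token cost per deferred name are all made-up assumptions.

```python
def tool_context_cost(
    schemas: dict[str, int], used: set[str], name_cost: int = 10
) -> dict[str, int]:
    """Compare tokens spent on tool definitions when loaded eagerly vs deferred."""
    eager = sum(schemas.values())
    deferred = sum(
        tokens if name in used else name_cost
        for name, tokens in schemas.items()
    )
    return {"eager": eager, "deferred": deferred}


# Hypothetical MCP tools and schema sizes in tokens.
schemas = {
    "search_issues": 900,
    "create_pr": 1_400,
    "query_db": 2_200,
    "send_slack": 700,
}

cost = tool_context_cost(schemas, used={"query_db"})
print(cost)
```

Under these assumptions the session pays the full schema price only for the one tool it actually called; the unused definitions cost a name each instead of sitting in the window as permanent weight.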
How it works in each tool
Claude Code
Inspect. /context shows what’s currently occupying the window (system prompt, rules, tool definitions, MCP schemas, conversation). /usage shows token usage for the session with a breakdown attributed to skills, subagents, plugins, and individual MCP servers. You can also surface live window usage in the status line.
Compact. Auto-compaction summarises conversation history automatically as you approach the limit. Trigger it yourself with /compact, and steer it with inline instructions:
```
/compact Focus on the API changes and the failing test
```

You can also set standing compaction guidance in CLAUDE.md:

```markdown
# Compact instructions
When compacting, preserve test output and code changes.
```

Clear vs rewind. /clear starts fresh for unrelated work (use /rename first so you can find the session later). /rewind (or double-tap Escape) restores both the conversation and the code to an earlier checkpoint. It is distinct from compaction: an undo, not a summary.
Carry across. /resume reopens a prior session; claude --continue / -c continues the most recent one. A background job summarises past conversations so resume is fast.
Keep the window light upstream: MCP tool definitions are deferred by default (only names enter context until a tool is used), and moving rarely-needed instructions out of CLAUDE.md into skills keeps base context small.
Codex
Inspect. /status reports the live session state (active model, approval policy, writable roots, and current token usage) so you can see how full the window is before it degrades.
Compact. /compact summarises earlier turns to free up tokens while preserving the critical details, so a long transcript stays lean. Codex’s documented compaction is manual; automatic compaction at a token threshold is unverified at the spec level and has varied across releases, so don’t rely on it firing for you.
Fresh window. /new starts a fresh conversation within the same CLI process (switch tasks without restarting Codex). /clear resets the terminal view and the conversation together — broader than Ctrl+L, which only clears the display.
Branch without losing the thread. /fork clones the current conversation into a new thread with its own ID, leaving the original transcript intact — useful for trying an alternative approach. /side opens an ephemeral side conversation for a focused follow-up without polluting the main thread’s history.
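The fork semantics described here (a clone with its own ID, original left intact) can be modelled as follows. This is a conceptual sketch with hypothetical `Thread` and `fork` names; it says nothing about how Codex stores or copies sessions internally.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class Thread:
    messages: list[str]
    thread_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def fork(thread: Thread) -> Thread:
    """Clone the conversation into a new thread with its own ID."""
    return Thread(messages=copy.deepcopy(thread.messages))


main = Thread(["user: migrate the job runner", "agent: proposed a cron-based plan"])
branch = fork(main)
branch.messages.append("user: try a queue-based design instead")

print(len(main.messages), len(branch.messages))
print(main.thread_id != branch.thread_id)
```

Because the branch owns a copy of the history, the risky exploration can be abandoned without any cleanup in the original thread.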
Carry across. /resume restores a saved conversation from your session list inside the TUI; the shell command codex resume opens the saved-session picker from a cold start.
```shell
codex resume   # pick a past session to continue
```

Verify exact flags and any auto-compaction config keys against the current Codex docs before depending on them.
opencode
Compact (automatic). opencode leans on automatic compaction. It detects when a conversation is about to overflow the model’s context window, makes a summarisation call, and replaces the detailed history with a concise summary while pruning old tool output to reclaim space. User messages are replayed after compaction so the thread keeps flowing.
The summary follows a fixed structured template — goal, constraints, progress, key decisions, next steps, critical context — so you don’t steer it with per-call instructions the way you do in some other tools; the structure is the contract.
Turn it off. Disable automatic compaction with config or an environment variable:
```json
{ "compaction": { "auto": false } }
```

```shell
OPENCODE_DISABLE_AUTOCOMPACT=1 opencode
```

Carry across. Resume from the CLI with opencode --continue / -c (most recent session) or opencode --session <id> / -s (a specific one); add --fork to branch instead of continue. opencode session manages sessions, opencode stats reports usage, and /share creates a link to the current conversation. /undo and /redo step changes back and forth.
Plugin option. A community Dynamic Context Pruning plugin adds configurable pruning thresholds for finer control over when context is shed. Verify TUI slash-command names and config keys against the current opencode docs — these have moved between releases.
Cursor
Cursor manages the window largely for you: the IDE surface trades fine-grained control for automation.
Compact (automatic). When the context window is about to fill, Cursor automatically summarises the conversation so the agent keeps a working window. Notably, it runs that summarisation on a smaller, faster model rather than your selected one, so compaction doesn’t cost a full-model call. To reduce the lossiness, Cursor treats the prior chat history as a file the agent can search — if a detail is missing from the summary, the agent can look it back up rather than having truly lost it.
Inspect. The chat input shows a context-usage indicator as a conversation grows. Unverified: exact thresholds and any manual-summarise command vary by version.
The main lever is a new chat. Because summarisation is lossy, the highest-value move on Cursor is starting a new chat when you switch tasks — it gives the agent a clean window with none of the previous task’s material, which both improves output and delays hitting the limit. For unrelated work, prefer a new chat over riding a long thread through repeated summarisations.
Reads are bounded. In Agent mode Cursor reads roughly the first 250 lines of a file, extending by another ~250 when needed — so large files don’t dump their full contents into the window unprompted.
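The effect of bounded reads on the window can be illustrated with a small sketch. The `read_window` helper is hypothetical, and the 250-line chunk size is the approximate figure quoted above, not a documented constant.

```python
def read_window(lines: list[str], chunks: int = 1, chunk_size: int = 250) -> list[str]:
    """Return only the first `chunks * chunk_size` lines of a file's contents."""
    return lines[: chunks * chunk_size]


# A hypothetical 1,200-line file.
big_file = [f"line {i}" for i in range(1, 1201)]

first_read = read_window(big_file)              # initial bounded read
extended = read_window(big_file, chunks=2)      # extended once when more is needed

print(len(first_read), len(extended), len(big_file))
```

In this model the window only ever holds the chunks actually requested, so the remaining lines of the file never cost tokens unless the agent asks for them.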
Cursor is a future-scope tool here; verify specifics against the current Cursor docs before relying on them.
Copilot
Copilot manages context automatically across its surfaces, with a bit more control in the CLI than in the editor.
Inspect. The chat input box shows a context-window control — a visual fill indicator with total usage — that updates as you send requests, so you can see the window filling.
Compact (automatic). When a conversation grows large, Copilot summarises the history (“Summarized conversation history”). In Copilot CLI, automatic compaction kicks in around 80% of the window, leaving a ~20% buffer so in-flight tool calls can keep running while it compacts. The summary is structured — goals, what was done, key technical details, important files, planned next steps.
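The threshold-and-buffer arithmetic is easy to work through. In the sketch below the 80% figure comes from the text above, while the 128,000-token window and the function names are assumptions made for the example.

```python
def trigger_point(window_tokens: int, threshold_percent: int = 80) -> int:
    """Token count at which automatic compaction would start."""
    return window_tokens * threshold_percent // 100


def should_compact(used_tokens: int, window_tokens: int, threshold_percent: int = 80) -> bool:
    """True once usage reaches the compaction threshold."""
    return used_tokens >= trigger_point(window_tokens, threshold_percent)


def buffer_tokens(window_tokens: int, threshold_percent: int = 80) -> int:
    """Headroom left for in-flight tool calls while compaction runs."""
    return window_tokens - trigger_point(window_tokens, threshold_percent)


WINDOW = 128_000  # hypothetical window size
print(trigger_point(WINDOW))
print(buffer_tokens(WINDOW))
print(should_compact(95_000, WINDOW), should_compact(110_000, WINDOW))
```

Integer arithmetic is used deliberately so the trigger point is exact rather than subject to floating-point rounding.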
Compact (manual). In the CLI, run /compact at any time to compact proactively before a new phase of work, with optional inline guidance:
```
/compact focus on the database schema decisions
```

Turn it off. Disable automatic summarisation in VS Code with:

```
github.copilot.chat.summarizeAgentConversationHistory.enabled = false
```

Fresh window. Start a New Chat for unrelated work rather than letting one thread accumulate and summarise repeatedly.
Trade-off to expect. Summarisation is lossy: fine-grained details — exact message wording, full command output, minor early decisions — may not survive. Persist anything load-bearing outside the chat.
Copilot is a future-scope tool here; the editor and CLI differ, and settings move — verify against the current Copilot docs.
Comparison
| Aspect | Claude Code | Codex | opencode | Cursor | Copilot |
|---|---|---|---|---|---|
| Inspect usage | /context, /usage, status line | /status | status indicator (unverified) | input-box gauge | chat input context control |
| Manual compaction | /compact [instructions] | /compact | agent-/auto-driven | manual summarise (unverified) | /compact [instructions] (CLI) |
| Automatic compaction | Yes, near limit | Not documented (unverified) | Yes, on overflow | Yes, near limit (uses a faster model) | Yes (~80% of window) |
| Custom compact instructions | Yes (inline + CLAUDE.md) | Not exposed (unverified) | Fixed structured template | Not exposed (unverified) | Yes (inline) |
| Disable auto-compaction | Via config (unverified) | n/a | compaction.auto=false / OPENCODE_DISABLE_AUTOCOMPACT | Not exposed (unverified) | github.copilot.chat.summarizeAgentConversationHistory.enabled=false |
| Fresh window (same app) | /clear | /new, /clear | new session | New Chat | New Chat |
| Resume prior session | /resume, claude --continue/-c | /resume, codex resume | opencode --continue/-c, --session/-s | Chat history | Chat history |
| Fork / branch a session | — | /fork, /side | opencode --fork | — | — |
Auto-compaction is now the default everywhere it’s offered; the axis that still separates the tools is control — whether you can see the window filling, steer what the summary keeps, and turn the automation off. The terminal CLIs expose the most knobs; the IDE tools manage it for you with the fewest. Verify tool-specific flags against the upstream docs below before relying on them — these settings move between releases.
Name collisions
- Compact vs clear vs new. “Compact” summarises and keeps a thread going; “clear” / “new” abandons the thread for a fresh window. In Codex, /new starts a fresh conversation inside the same CLI process while /clear resets both the view and the conversation; neither is /compact.
- “Summarize” vs “compact”. Cursor and Copilot surface the behaviour as “summarising conversation history”; the terminal tools call the same operation “compaction.” Same mechanism, different label.
- “Session” is overloaded. In how agents work a session is one run of the loop with one window. opencode also has an opencode session CLI noun and /share-able sessions; don’t read the CLI sense as a different concept.
- Compaction vs memory. Compaction works within a session to fit the window. Agent-managed memory persists facts across sessions and is a separate primitive (deferred; distinct from rules). Resuming a session is not the same as the agent remembering it.