Context window management

What context window management is

The chapter “How agents work” establishes the substrate: the agent runs a loop, everything it can see lives in a finite context window, and that window both fills up and loses fidelity as a session grows. This chapter covers the levers for operating within that limit over a long session: the management layer on top of the substrate.

There are three of them:

Inspect — see what’s currently eating the window, before it bites.
Compact — replace a long, noisy history with a short summary so the session can keep going.
Carry state across the boundary — resume, fork, or otherwise pick up a window instead of starting from an empty one.

None of these change what the agent can do. They change what it can still see clearly after an hour of work.

Why you’d want to think about this

You’ve been pairing with the agent for two hours on a gnarly migration. It was sharp at the start. Now it’s started re-suggesting a fix it already tried and you already rejected, it’s “forgotten” the constraint you stated forty messages ago, and its edits are getting vaguer. Nothing is broken. The window is just full — mostly of stale file reads, greps that didn’t pan out, and abandoned attempts — and the signal you actually need is buried under noise it can no longer tell apart from signal.

That’s the failure context management addresses. The window has a hard size limit and a softer coherence limit, and a long session walks into both. The levers above are how you stay ahead of them: notice the window filling (inspect), squeeze the dead weight out without losing the thread (compact), and never pay to rebuild context you already had (carry it across).

The concrete moves teams make:

Compact at phase boundaries — finished the migration, about to start the tests? Compact first, so the test work starts on a lean window instead of dragging the migration trail behind it.
Clear, don’t compact, for unrelated work — switching to a different bug entirely? Start clean. Compaction summarises the old work; clearing drops it, which is what you want when none of it is relevant.
Watch the gauge — check what’s consuming the window (tool schemas, a giant file read, an over-long rules file) instead of waiting for degradation to tell you.
Delegate the noise instead of compacting after — if a sub-task will generate a lot of throwaway context, hand it to a subagent so it never enters your window in the first place.
Resume instead of re-explaining — pick up yesterday’s session with its context intact rather than opening a fresh window and re-establishing everything.

The test: if the agent is getting less sharp the longer you work — not wrong, just hazier — you have a context-management problem, not a model problem.
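To make "watch the gauge" concrete, here is a minimal sketch of the kind of breakdown an inspect command reports. The category names, token counts, and the 200,000-token window are illustrative assumptions, not output from any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WindowEntry:
    category: str  # hypothetical labels, e.g. "tool schemas", "file reads"
    tokens: int


def window_report(entries: List[WindowEntry], window_size: int) -> List[Tuple[str, int, float]]:
    """Return (category, tokens, share of window) rows, largest consumer first."""
    totals: Dict[str, int] = {}
    for entry in entries:
        totals[entry.category] = totals.get(entry.category, 0) + entry.tokens
    rows = [(cat, tok, tok / window_size) for cat, tok in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fill_ratio(entries: List[WindowEntry], window_size: int) -> float:
    """Fraction of the window currently occupied."""
    return sum(e.tokens for e in entries) / window_size


# Illustrative numbers only.
WINDOW = 200_000
entries = [
    WindowEntry("system prompt", 3_000),
    WindowEntry("rules", 6_000),
    WindowEntry("tool schemas", 22_000),
    WindowEntry("file reads", 95_000),
    WindowEntry("file reads", 30_000),
    WindowEntry("conversation", 24_000),
]

for category, tokens, share in window_report(entries, WINDOW):
    print(f"{category:<14} {tokens:>7} {share:6.1%}")
print(f"total fill: {fill_ratio(entries, WINDOW):.0%}")
```

Sorting by size is the point: the largest row (here, stale file reads) is usually the thing to compact, delegate, or stop loading.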

What compaction keeps, and what it drops

Compaction is lossy by definition: it trades a faithful, expensive transcript for a cheap, approximate summary. Knowing what survives the trade is the difference between compacting safely and compacting away the thing you needed.

   BEFORE compaction (window ~90% full)         AFTER compaction (~35%)
 ┌─────────────────────────────────────┐      ┌────────────────────────────┐
 │ system prompt        ← always kept  │      │ system prompt              │
 │ rules (AGENTS.md / CLAUDE.md) kept  │      │ rules                      │
 │ tool definitions     ← always kept  │      │ tool definitions           │
 │ ── ~40 turns of work ────────────── │      │ ┌── summary (lossy) ─────┐ │
 │    17 file reads (full contents)    │  ►   │ │ goal · key decisions · │ │
 │    9 greps, 6 test runs             │      │ │ files touched ·        │ │
 │    4 abandoned hypotheses           │      │ │ next steps             │ │
 │    exact error/command output       │      │ └────────────────────────┘ │
 │ most recent turns    ← kept         │      │ most recent turns          │
 └─────────────────────────────────────┘      └────────────────────────────┘
        the middle is what gets compressed

Reliably kept: the high-level goal, the decisions made and why, which files matter (by name), the state of the in-progress task, and the most recent turns verbatim.

Reliably lost: the full contents of files read (they become a name, not a body), exact error and command output (summarised to prose), and the precise sequence of steps. The practical consequence: write unsaved work to disk before compacting — a diff that exists only in the conversation gets flattened into “edited auth.ts,” and after compaction the agent may need to re-read the file to see its own change.

This is also the cleanest restatement of the rules argument: a constraint stated in conversation can be summarised away, but a constraint in the rules file is reloaded every turn and survives every compaction. Durable facts belong in always-loaded context, not in the chat.
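The keep/drop behaviour described above can be sketched in a few lines. This is a simplified model, not any tool's actual implementation: the `Message` type, the role names, and `stub_summarise` (standing in for the model call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str     # hypothetical roles: "system", "rules", "tools", "user", "assistant", "tool"
    content: str


# Always-loaded context: reloaded every turn, so compaction never touches it.
PINNED_ROLES = {"system", "rules", "tools"}


def compact(history: List[Message],
            summarise: Callable[[List[Message]], str],
            keep_recent: int = 4) -> List[Message]:
    """Replace the middle of the history with one lossy summary message."""
    pinned = [m for m in history if m.role in PINNED_ROLES]
    turns = [m for m in history if m.role not in PINNED_ROLES]
    cut = len(turns) - keep_recent
    if cut <= 0:
        return list(history)  # nothing old enough to compress
    middle, recent = turns[:cut], turns[cut:]
    summary = Message("summary", summarise(middle))  # the lossy step
    return pinned + [summary] + recent


def stub_summarise(messages: List[Message]) -> str:
    # Placeholder for the summarisation call a real tool would make.
    return f"[summary of {len(messages)} earlier messages]"


history = [
    Message("system", "You are a coding agent."),
    Message("rules", "Never edit generated files."),
    Message("user", "Constraint: keep the public API stable."),
    Message("tool", "<full contents of auth.ts>"),
    Message("assistant", "Tried approach A; it failed."),
    Message("user", "Try approach B."),
    Message("assistant", "Approach B works."),
]

compacted = compact(history, stub_summarise, keep_recent=2)
for m in compacted:
    print(m.role, "|", m.content)
```

In this run the rule in the `rules` entry survives verbatim, while the constraint stated in chat falls into the summarised middle and survives only if the summariser happens to keep it.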

The shape compaction is converging on

The tools in scope have converged on the same design. When the window approaches full, or when you ask, each one makes a single summarisation call and replaces the bulk of the history with a structured Markdown summary, and the structure is nearly identical across tools: goal, what was done, key decisions, important files, next steps. Several expose a manual /compact that takes optional instructions (/compact focus on the API changes) so you can bias what the summary preserves. The vocabulary differs; the mechanism has standardised. If you understand compaction in one tool, you largely understand it in all of them. The real differences left are how much control you get over when it fires and what it keeps (see the comparison below).
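As an illustration of that shared shape, the sketch below renders the five sections into Markdown. The class name, field names, and heading text are assumptions made for the example; each tool words its own template differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompactionSummary:
    goal: str
    done: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the summary as the structured Markdown that replaces the history."""
        sections = [
            ("What was done", self.done),
            ("Key decisions", self.decisions),
            ("Important files", self.files),
            ("Next steps", self.next_steps),
        ]
        lines = ["## Goal", self.goal]
        for title, items in sections:
            lines.append("")
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


summary = CompactionSummary(
    goal="Migrate session handling to the new auth module.",
    done=["Replaced cookie parsing", "Updated login handler"],
    decisions=["Keep the public API stable"],
    files=["src/auth.ts"],
    next_steps=["Write tests for token refresh"],
)
markdown = summary.to_markdown()
print(markdown)
```

Note what the structure has no slot for: file bodies and exact command output. That is why those are the first things to disappear.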

Why this and not…

You want to…	Reach for	Not
Free up a full / degrading window mid-task	Compact	Clear (drops everything, including what you still need)
Start unrelated work on a clean slate	Clear / new session	Compact (keeps a summary you don’t want)
Keep a noisy side-quest out of the main window entirely	Subagent	Compacting after the noise already landed
Make a fact survive every turn and every compaction	Rules	Restating it in chat
See what’s actually consuming the window	Inspect (`/context`, `/status`, usage gauge)	Waiting for coherence to drop
Pick up a previous session with context intact	Resume / continue	New window + re-explaining
Explore a risky branch without losing the current thread	Fork (where supported)	Compacting, then regretting it

Compaction is a recovery tool — it salvages a window that’s already heavy. The cheaper move is usually to keep the window light in the first place: delegate noise to subagents, keep rules lean, and lean on MCP tool-schema deferral so unused tool definitions don’t sit in the window as permanent weight.

How it works in each tool

Claude Code

Inspect. /context shows what’s currently occupying the window (system prompt, rules, tool definitions, MCP schemas, conversation). /usage shows token usage for the session with a breakdown attributed to skills, subagents, plugins, and individual MCP servers. You can also surface live window usage in the status line.

Compact. Auto-compaction summarises conversation history automatically as you approach the limit. Trigger it yourself with /compact, and steer it with inline instructions:

/compact Focus on the API changes and the failing test

You can also set standing compaction guidance in CLAUDE.md:

# Compact instructions
When compacting, preserve test output and code changes.

Clear vs rewind. /clear starts fresh for unrelated work (use /rename first so you can find the session later). /rewind (or double-tap Escape) restores both the conversation and the code to an earlier checkpoint — distinct from compaction; it’s an undo, not a summary.

Carry across. /resume reopens a prior session; claude --continue / -c continues the most recent one. A background job summarises past conversations so resume is fast.

Keep the window light upstream: MCP tool definitions are deferred by default (only names enter context until a tool is used), and moving rarely-needed instructions out of CLAUDE.md into skills keeps base context small.
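A rough sketch of why deferral helps: compare the cost of loading every schema up front with loading names only and paying for a schema on first use. The tool names and schemas are hypothetical, and the four-characters-per-token estimate is a crude assumption, not a real tokenizer.

```python
import json
from typing import Dict, Set


def approx_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


# Hypothetical tool schemas, for illustration only.
TOOLS: Dict[str, dict] = {
    "search_issues": {
        "description": "Search the issue tracker by free-text query and label.",
        "parameters": {"query": "string", "labels": "array of strings", "limit": "integer"},
    },
    "run_query": {
        "description": "Run a read-only SQL query against the analytics database.",
        "parameters": {"sql": "string", "timeout_seconds": "integer", "max_rows": "integer"},
    },
    "fetch_page": {
        "description": "Fetch a web page and return its text content as Markdown.",
        "parameters": {"url": "string", "follow_redirects": "boolean", "max_bytes": "integer"},
    },
}


def eager_cost(tools: Dict[str, dict]) -> int:
    """Every full schema sits in the window from the first turn."""
    return sum(approx_tokens(name + json.dumps(schema)) for name, schema in tools.items())


def deferred_cost(tools: Dict[str, dict], used: Set[str]) -> int:
    """Only names are loaded up front; a schema is paid for once the tool is used."""
    names = sum(approx_tokens(name) for name in tools)
    loaded = sum(approx_tokens(json.dumps(tools[name])) for name in used)
    return names + loaded


print("eager:", eager_cost(TOOLS))
print("deferred, nothing used:", deferred_cost(TOOLS, set()))
print("deferred, one tool used:", deferred_cost(TOOLS, {"run_query"}))
```

The gap grows with the number of connected servers, since unused schemas are the permanent weight.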

Codex

Inspect. /status reports the live session state (active model, approval policy, writable roots, and current token usage) so you can see how full the window is before it degrades.

Compact. /compact summarises earlier turns to free up tokens while preserving the critical details, so a long transcript stays lean. Codex’s documented compaction is manual. Automatic compaction at a token threshold is unverified against the documentation and has varied across releases, so don’t rely on it firing for you.

Fresh window. /new starts a fresh conversation within the same CLI process (switch tasks without restarting Codex). /clear resets the terminal view and the conversation together — broader than Ctrl+L, which only clears the display.

Branch without losing the thread. /fork clones the current conversation into a new thread with its own ID, leaving the original transcript intact — useful for trying an alternative approach. /side opens an ephemeral side conversation for a focused follow-up without polluting the main thread’s history.
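Conceptually, a fork is a deep copy of the transcript under a new identifier. The sketch below models only that idea; the `Thread` type and `fork` function are hypothetical and say nothing about how Codex stores sessions.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    messages: List[str]
    thread_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def fork(thread: Thread) -> Thread:
    """Clone the transcript into a new thread with its own ID."""
    return Thread(messages=copy.deepcopy(thread.messages))


main = Thread(["plan the migration", "apply step 1"])
branch = fork(main)
branch.messages.append("try the risky alternative")

print(len(main.messages), len(branch.messages))
print(main.thread_id != branch.thread_id)
```

Because the branch owns its copy, anything appended there leaves the original transcript as it was.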

Carry across. /resume restores a saved conversation from your session list inside the TUI; the shell command codex resume opens the saved-session picker from a cold start.

codex resume          # pick a past session to continue

Verify exact flags and any auto-compaction config keys against the current Codex docs before depending on them.

opencode

Compact (automatic). opencode leans on automatic compaction. It detects when a conversation is about to overflow the model’s context window, makes a summarisation call, and replaces the detailed history with a concise summary while pruning old tool output to reclaim space. User messages are replayed after compaction so the thread keeps flowing.

The summary follows a fixed structured template — goal, constraints, progress, key decisions, next steps, critical context — so you don’t steer it with per-call instructions the way you do in some other tools; the structure is the contract.
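Pruning old tool output, as described above, can be sketched independently of summarisation. This is a simplified model under stated assumptions, not opencode's implementation: the `Entry` type, the placeholder text, and the `keep_last` parameter are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Entry:
    role: str     # "user", "assistant", or "tool"
    content: str


PLACEHOLDER = "[tool output pruned]"


def prune_tool_output(history: List[Entry], keep_last: int = 2) -> List[Entry]:
    """Blank out all but the most recent tool outputs; leave other entries intact."""
    tool_positions = [i for i, e in enumerate(history) if e.role == "tool"]
    stale = set(tool_positions[:max(0, len(tool_positions) - keep_last)])
    return [replace(e, content=PLACEHOLDER) if i in stale else e
            for i, e in enumerate(history)]


history = [
    Entry("user", "Find where sessions are created."),
    Entry("tool", "<grep output, 400 lines>"),
    Entry("assistant", "It is in auth.ts."),
    Entry("tool", "<full contents of auth.ts>"),
    Entry("tool", "<test run output>"),
    Entry("user", "Now fix the failing test."),
]

pruned = prune_tool_output(history, keep_last=1)
for entry in pruned:
    print(entry.role, "|", entry.content)
```

User messages pass through untouched, which mirrors the idea that the user's side of the thread is preserved while bulky tool output is what gets shed.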

Turn it off. Disable automatic compaction with config or an environment variable:

{ "compaction": { "auto": false } }

OPENCODE_DISABLE_AUTOCOMPACT=1 opencode

Carry across. Resume from the CLI with opencode --continue / -c (most recent session) or opencode --session <id> / -s (a specific one); add --fork to branch instead of continue. opencode session manages sessions, opencode stats reports usage, and /share creates a link to the current conversation. /undo and /redo step changes back and forth.

Plugin option. A community Dynamic Context Pruning plugin adds configurable pruning thresholds for finer control over when context is shed. Verify TUI slash-command names and config keys against the current opencode docs — these have moved between releases.

Copilot

Copilot manages context automatically across its surfaces, with a bit more control in the CLI than in the editor.

Inspect. The chat input box shows a context-window control — a visual fill indicator with total usage — that updates as you send requests, so you can see the window filling.

Compact (automatic). When a conversation grows large, Copilot summarises the history (“Summarized conversation history”). In Copilot CLI, automatic compaction kicks in around 80% of the window, leaving a ~20% buffer so in-flight tool calls can keep running while it compacts. The summary is structured — goals, what was done, key technical details, important files, planned next steps.
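The threshold-and-buffer arithmetic is easy to work through. The sketch below assumes a 128,000-token window purely for illustration; the function names are hypothetical and this is not Copilot's code.

```python
def compaction_trigger(window_size: int, threshold_percent: int = 80) -> int:
    """Token count at which automatic compaction would start."""
    return window_size * threshold_percent // 100


def should_compact(used_tokens: int, window_size: int, threshold_percent: int = 80) -> bool:
    return used_tokens >= compaction_trigger(window_size, threshold_percent)


WINDOW = 128_000                      # illustrative window size, an assumption
trigger = compaction_trigger(WINDOW)  # 80% of the window
buffer_tokens = WINDOW - trigger      # headroom left for in-flight tool calls

print(trigger, buffer_tokens)
print(should_compact(100_000, WINDOW), should_compact(102_400, WINDOW))
```

With these numbers, compaction starts at 102,400 tokens and leaves 25,600 tokens of headroom, which is the space in-flight tool calls can use while the summary is produced.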

Compact (manual). In the CLI, run /compact at any time to compact proactively before a new phase of work, with optional inline guidance:

/compact focus on the database schema decisions

Turn it off. Disable automatic summarisation in VS Code with:

{ "github.copilot.chat.summarizeAgentConversationHistory.enabled": false }

Fresh window. Start a New Chat for unrelated work rather than letting one thread accumulate and summarise repeatedly.

Trade-off to expect. Summarisation is lossy: fine-grained details — exact message wording, full command output, minor early decisions — may not survive. Persist anything load-bearing outside the chat.

Copilot is a future-scope tool here; the editor and CLI differ, and settings move — verify against the current Copilot docs.

Comparison

Aspect	Claude Code	Codex	opencode	Cursor	Copilot
Inspect usage	`/context`, `/usage`, status line	`/status`	status indicator (unverified)	input-box gauge	chat input context control
Manual compaction	`/compact [instructions]`	`/compact`	agent-/auto-driven	manual summarise (unverified)	`/compact [instructions]` (CLI)
Automatic compaction	Yes, near limit	Not documented (unverified)	Yes, on overflow	Yes, near limit (uses a faster model)	Yes (~80% of window)
Custom compact instructions	Yes (inline + `CLAUDE.md`)	Not exposed (unverified)	Fixed structured template	Not exposed (unverified)	Yes (inline)
Disable auto-compaction	Via config (unverified)	n/a	`compaction.auto=false` / `OPENCODE_DISABLE_AUTOCOMPACT`	Not exposed (unverified)	`github.copilot.chat.summarizeAgentConversationHistory.enabled=false`
Fresh window (same app)	`/clear`	`/new`, `/clear`	new session	New Chat	New Chat
Resume prior session	`/resume`, `claude --continue`/`-c`	`/resume`, `codex resume`	`opencode --continue`/`-c`, `--session`/`-s`	Chat history	Chat history
Fork / branch a session	—	`/fork`, `/side`	`opencode --fork`	—	—

Auto-compaction is now the default everywhere it’s offered; the axis that still separates the tools is control — whether you can see the window filling, steer what the summary keeps, and turn the automation off. The terminal CLIs expose the most knobs; the IDE tools manage it for you with the fewest. Verify tool-specific flags against the upstream docs below before relying on them — these settings move between releases.

Name collisions

Compact vs clear vs new. “Compact” summarises and keeps a thread going; “clear” / “new” abandons the thread for a fresh window. In Codex, /new starts a fresh conversation inside the same CLI process while /clear resets both the view and the conversation — neither is /compact.
“Summarize” vs “compact”. Cursor and Copilot surface the behaviour as “summarising conversation history”; the terminal tools call the same operation “compaction.” Same mechanism, different label.
“Session” is overloaded. In “How agents work”, a session is one run of the loop with one window. opencode also has an opencode session CLI noun and /share-able sessions; don’t read the CLI sense as a different concept.
Compaction vs memory. Compaction works within a session to fit the window. Agent-managed memory persists facts across sessions and is a separate primitive (deferred; distinct from rules). Resuming a session is not the same as the agent remembering it.