Murph's Take

Context Windows as Working Memory: Designing Systems Around This Constraint

The context window isn't just a technical limit — it's a design constraint that should shape how you architect every system Claude is part of. Most people treat it as a ceiling. It's actually a forcing function.

Jason Murphy · March 29, 2026 · 7 min read

"Context window" is a technical term with an underappreciated design implication. The technical definition is simple: it's the maximum amount of text a language model can process in a single interaction. In other words, everything Claude can "see" when generating a response.

The design implication is less obvious and more consequential: the context window is a limited resource. How you fill it determines the quality of the reasoning it produces.

Most people treat this as a ceiling to work around. "My document is too long for the context window" becomes a problem to solve by summarizing the document or switching to a model with a larger window. That framing is correct but incomplete. The context window isn't just a ceiling — it's a forcing function for clarity about what actually needs to be in the room when Claude is making decisions.

The Working Memory Analogy

Cognitive science has a useful two-level model of memory. Working memory has limited capacity — roughly seven items at once, and that's being generous. It's what you're actively holding in mind when you're reasoning through a problem. Long-term memory is vast but requires retrieval — you don't have access to everything you know at once, but you can surface relevant knowledge when a cue activates it.

Claude's architecture maps reasonably well onto this model. The context window is working memory — it's what Claude has active access to during reasoning. Everything outside the context window — files, databases, MCP-connected systems, knowledge bases — is analogous to long-term memory, available through retrieval but not present unless you surface it.

The mistake most people make is trying to load everything into working memory. Dump the entire document into context. Include every piece of project history. Add all the standing instructions. The result is a cluttered working memory that spends cognitive load managing noise rather than reasoning about signal.

What Belongs in Working Memory

The discipline here is principled curation. Not "include everything relevant" but "include only what this specific reasoning task requires."

Task-specific context — the precise problem you're asking Claude to solve — belongs in context at high resolution. Everything else should be either summarized, referenced, or retrieved on demand.
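That prioritization can be made mechanical. Here's a minimal sketch of budget-aware context assembly: highest-priority items go in at full resolution, and lower-priority items fall back to their summaries when the full text no longer fits. The item names, the priority scheme, and the rough 4-characters-per-token heuristic are all illustrative assumptions, not part of any Claude API.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def assemble_context(items: list[dict], budget: int) -> list[str]:
    """Fill the window highest-priority first; fall back to an item's
    summary when its full text would blow the remaining budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i["priority"]):
        for candidate in (item["full"], item.get("summary", "")):
            cost = estimate_tokens(candidate)
            if candidate and used + cost <= budget:
                chosen.append(candidate)
                used += cost
                break  # took full text or summary; move on
    return chosen

items = [
    {"priority": 0, "full": "TASK: fix the login redirect bug in auth.py."},
    {"priority": 1,
     "full": "x" * 4000,  # stand-in for verbose project history
     "summary": "HISTORY: auth refactor done; sessions migrated."},
]
print(assemble_context(items, budget=100))
```

The task lands at full resolution; the 4,000-character history gets replaced by its 47-character summary. The point isn't this particular algorithm — it's that "summarize, reference, or retrieve" can be a deliberate policy rather than an ad-hoc paste.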

Standing instructions (what you'd put in a CLAUDE.md or system prompt) are expensive context real estate. Every line of instruction is competing with the actual task context for space. Good standing instructions are densely informative — they encode patterns, constraints, and conventions at high signal density. A standing instruction that says "always use TypeScript" and one that says "TypeScript is preferred for this project but JavaScript is acceptable for scripts" occupy similar space but encode very different amounts of information. The discipline is expressing constraints at the maximum signal density per token.
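As a sketch, here's what density-first standing instructions might look like in a hypothetical CLAUDE.md. Every project detail below is invented for illustration; the point is that each line states a constraint plus its boundary, not a bare rule.

```markdown
## Conventions
- TypeScript preferred; plain JavaScript acceptable for one-off scripts.
- API errors return structured JSON with an error code, never bare strings.
- Tests colocated as *.test.ts; no snapshot tests.

## Constraints
- Node 20. No new runtime dependencies without a note in DECISIONS.md.
```

Six lines, and each one would otherwise cost a clarifying round-trip mid-task.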

Project history is almost never needed at full resolution in working memory. What's needed is a structured summary of current state — what's done, what's in progress, what's blocked, what decisions have been made. That's a retrieval problem (MCP servers, context files), not a context problem.
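One way to make that summary retrievable is a small, regularly updated state file. The structure below is an illustration, not a standard format, and the project details are invented:

```markdown
# PROJECT_STATE.md
## Done
- Payments migrated to the new provider (2026-03-12).
## In progress
- Session handling refactor (owner: Dana).
## Blocked
- Mobile release: waiting on store review.
## Decisions
- 2026-03-01: Postgres over DynamoDB (relational queries dominate).
```

Ten lines that replace months of transcript history, and cheap enough to keep in context permanently.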

The Retrieval Architecture

The other half of the model — long-term memory — requires its own design work. You can't just leave it as "everything that's not in the context window." You need to design how Claude retrieves relevant information when it needs it.

This is where tools like MCP servers, RAG (retrieval-augmented generation) systems, and structured project files become infrastructure rather than optional enhancements. They're the retrieval mechanism that lets Claude's effective knowledge far exceed what fits in working memory, while keeping working memory clean enough to reason at full capacity.

A well-designed retrieval architecture has a few properties: it surfaces information at the right granularity for the task (not raw database records but synthesized context), it's fast enough that retrieval doesn't interrupt reasoning flow, and it's accurate enough that Claude can trust what it retrieves without spending context budget on verification.
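The "right granularity" property is worth making concrete. Here's a minimal sketch of task-scoped retrieval that returns pre-synthesized summaries rather than raw records. It assumes documents were already chunked and summarized offline, and keyword overlap stands in for a real embedding search — both are simplifying assumptions.

```python
def retrieve(query: str, chunks: list[dict], k: int = 2) -> list[str]:
    """Return the k most relevant pre-synthesized summaries,
    not the raw records they were built from."""
    terms = set(query.lower().split())

    def score(chunk: dict) -> int:
        # Naive relevance: count of shared words with the query.
        return len(terms & set(chunk["text"].lower().split()))

    ranked = sorted(chunks, key=score, reverse=True)
    return [c["summary"] for c in ranked[:k] if score(c) > 0]

chunks = [
    {"text": "billing invoices stripe webhook retries",
     "summary": "Billing: webhook retries use exponential backoff."},
    {"text": "auth login session tokens",
     "summary": "Auth: sessions are JWT-based, 24h expiry."},
    {"text": "deploy ci pipeline",
     "summary": "Deploy: CI runs on merge to main."},
]
print(retrieve("why do stripe webhook retries fail", chunks))
```

What reaches the context window is one synthesized sentence about billing, not three raw records — the retrieval layer did the granularity work so working memory doesn't have to.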

The Forcing Function

Here's what I find most valuable about treating context windows as working memory rather than just a size limit: it forces the right questions.

What does Claude actually need to know to do this task well? Not "what might be useful" but "what is load-bearing for this specific reasoning operation?" The constraint pushes you toward clarity about what matters and what's noise — which is, incidentally, also a useful exercise for thinking about your own work.

The organizations that design AI systems with this constraint in mind build something different from the ones that don't. They build systems where Claude operates with lean, high-quality context rather than bloated, noisy context. The output quality gap is significant and compounds over time.

What does your current context architecture look like — and is it designed, or just accumulated?


Frequently Asked

What is a context window and why does it matter for system design?

A context window is the amount of text a language model can process in a single interaction — everything Claude can 'see' at once when generating a response. It matters for system design because it's finite, and because the quality of Claude's reasoning degrades as the context fills with noise rather than signal. Good system design treats the context window as a first-class architectural constraint, not an afterthought.

What's the analogy to working memory and why is it useful?

Cognitive science distinguishes working memory (the limited capacity system you use for active reasoning) from long-term memory (the vast store of what you know). Claude's context window functions like working memory — it's what's active in the current reasoning process. Long-term memory exists outside the context window in files, databases, and knowledge bases that can be retrieved when needed. Designing AI systems well means designing both: what goes in working memory, and what retrieval system surfaces the right things from long-term storage.

How does context window design affect the quality of Claude's outputs?

A context window filled with irrelevant information degrades reasoning quality measurably — Claude has to work harder to extract the signal from the noise, and its outputs reflect that effort. A well-curated context window, where every included element is relevant to the current task, produces sharper, more coherent reasoning. The discipline is knowing what to include, what to leave out, and what to make retrievable rather than permanent.

What's the practical implication for CLAUDE.md files and project instructions?

Project instruction files are prime context real estate. Every line of standing instruction is competing for space with the task-specific context that makes Claude useful in the moment. Good project instructions are densely informative — they encode patterns, constraints, and conventions at high signal density. Verbose instructions that repeat obvious things or cover every possible edge case are expensive. The exercise of writing lean, precise instructions is itself a forcing function for clarity about what actually matters.

Written by

Murph

Jason Matthew Murphy. Twenty years building digital systems for businesses. Former CardinalCommerce (acquired by Visa). Now running VibeTokens — AI-built websites and content for small businesses.

The window is open.

It won't be forever.
