Standalone article · part of a sequenced guide
What you'll unlock: Memory is layered — in-context, files, memory system, Project instructions. A million tokens is a workspace, not a dump truck: position what matters, compress what ages, and build Projects as knowledge bases when retrieval must persist.
Memory, Context & the 1 Million Token Mindset
The complete guide to Claude's memory systems — what persists, what doesn't, and how to architect context for serious work
Chapter context
Your team hits context limits, loses thread continuity, or uploads entire data rooms hoping Claude 'finds the answer.' Quality is inconsistent and API bills spike. This chapter replaces hope with memory and retrieval design: the same discipline you'd apply to any data system, applied to how Claude sees your work.
Is this chapter for you?
Do multi-day Claude sessions lose early decisions or instructions?
Yes — Concept 1 in-context + handoffs; Concept 2.6–2.7 compression.
Are you loading large document sets (contracts, research, code)?
Yes — Concept 2 positioning and Concept 3 retrieval structure are mandatory.
Does your team re-upload the same files every week?
Yes — Concept 1.7 team Projects and Concept 3.4 knowledge-base pattern.
Are you evaluating vector databases for internal Q&A?
Yes — read Concept 3.6 before buying infrastructure.
Chapters 1 and 4 gave you the mental model and prompting craft. Chapter 5 is context architecture: memory layers, million-token strategy, and native RAG without a vector database. Power users do not ask 'does Claude remember?' They ask 'which memory layer owns this fact?' and design accordingly.
Reference diagrams
Four memory layers
Assign every fact to a layer — volatility and sensitivity drive the choice.
Long context stack order
Instructions → summary → bulk → question → recap — fight lost-in-the-middle.
Implementation paths
Three concepts — memory layers, 1M mastery, native RAG.
Concept 1
How Claude's Memory Actually Works
The architecture behind what Claude remembers — and the consequences of each layer for how you work
1.1
The four memory layers
In-context, external storage, memory system, and project instructions — what each does and who controls it
Key takeaway
Claude memory is not one thing — it is four layers: the live conversation window, uploaded documents, Claude's optional memory feature, and Project-level instructions. Each has different persistence, cost, and control.
Why this matters
Teams conflate 'Claude remembered' with the wrong layer and build workflows that break when context shifts. Layer literacy is architecture literacy.
In-context is what Claude sees right now. External storage is evidence you provide. The memory system is selective persistence. Project instructions are team-scoped policy and context.
You control external storage and Project instructions directly. Memory system is semi-automatic — review it. In-context is automatic but finite.
Workflow — do this next
- 01 Map your top 10 facts to a layer: in-context / file / memory / Project.
- 02 Nothing critical should live only in in-context.
- 03 Document the map in PROJECT_MEMORY.md.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Memory layer decision map
IN-CONTEXT → session decisions, draft text, this-thread reasoning
EXTERNAL FILES → specs, contracts, data room, codebase exports
MEMORY SYSTEM → stable prefs (role, tools, standing constraints)
PROJECT INSTRUCTIONS → team policy, voice, scope, links to canonical docs
Rule: if losing the chat would hurt → move up a layer.
1.2
In-context memory
Everything in the current conversation window — what it holds, how long it lasts, and when it is lost
Key takeaway
In-context is Claude's working RAM — fast, rich, and fragile. It includes messages, tool results, and attachments until the context limit is hit or the chat is abandoned.
Why this matters
Chapter 1 established stateless defaults. In-context is the only 'memory' in a single API call — everything else you engineer.
Everything sent in the thread competes for the same attention budget. Long threads degrade: early instructions weaken, middle documents suffer lost-in-the-middle effects (Concept 2.2).
In-context is lost when: you start a new chat, the window overflows, or you switch Projects without carrying summaries. Power users distill state before loss.
Workflow — do this next
- 01 Monitor thread length on multi-day work; restart with a handoff summary at ~60% felt capacity.
- 02 Pin critical constraints in the latest message, not only message one.
- 03 Use SESSION_STATE.md in the Project for continuity across chats.
Real example
Legal review — 180-message thread collapse
A team ran contract redlines in one chat for two weeks. Message 1 contained scope limits; by message 150 Claude suggested clauses outside mandate. Fix: weekly new chat with distilled OPEN_ISSUES.md + only active sections attached.
1.3
Claude's memory system
What gets saved, how to view it, how to edit it, and how it surfaces in future conversations
Key takeaway
Claude.ai memory stores inferred facts about you — useful for continuity, dangerous if wrong. You can view, edit, and delete memories; treat them like a profile you'd audit quarterly.
Why this matters
Unreviewed memories become confident wrong callbacks. Memory is not a substitute for canonical docs.
Memory captures preferences and recurring context Claude infers from chats — not full transcripts. It surfaces in new conversations as background, similar to a lightweight user model.
Audit memory after role changes, company pivots, or tool switches. Delete stale entries ('you prefer Opus for everything'). Prefer Project files for factual corp data over memory.
Workflow — do this next
- 01 Settings → Memory: read all entries monthly.
- 02 Delete anything project-specific or outdated.
- 03 Add critical standing facts manually if the product allows.
1.4
Project instructions as persistent context
The system prompt layer that persists across every conversation in a project
Key takeaway
Project instructions are the persistent brief for a workspace — scope, rules, links, and tone that apply to every new chat in that Project without re-pasting.
Why this matters
This is the closest Claude.ai gets to a system prompt. Under-invested Projects behave like amnesiac assistants.
Put here: what this Project is for, canonical file names, approval rules, output defaults, links to repo/wiki. Keep lean — link to large docs rather than paste them (Chapter 2 token economics).
Project instructions + attached knowledge files = persistent context layer. Different Projects = different memory boundaries (client A vs client B).
Workflow — do this next
- 01 One Project per client, product, or major initiative.
- 02 Instructions under 400 words; details in attached INDEX.md.
- 03 Review instructions when scope changes; version in filename.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Project instructions template
PURPOSE: [what this Project is for]
SCOPE: [in / out]
CANONICAL FILES: [list with one-line description each]
OUTPUT DEFAULTS: [format, tone, length]
RULES: [3–5 non-negotiables]
ESCALATION: [when to ask human before acting]
1.5
External memory via documents
Uploading files as external memory — what Claude can retrieve and how accurately
Key takeaway
Uploaded files are external memory — Claude reads them in context, not from a perfect database. Retrieval quality depends on structure, positioning, and question clarity.
Why this matters
Uploading a 400-page PDF without structure is hope, not architecture. Document design is memory design.
Claude processes uploaded text into the context window — it does not magically index every page for selective lookup unless you use tools/MCP or structure prompts for retrieval (Concept 3).
Accuracy improves with: clear headings, tables of contents, chunking large files, and explicit 'answer only from section X' instructions.
Workflow — do this next
- 01 Prepend TOC to long PDFs before upload.
- 02 Split multi-topic dumps into named files.
- 03 Ask Claude to quote supporting passages; verify against source.
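The quote-verification step can be made mechanical. Below is a minimal sketch, assuming you have copied the passages Claude quoted into a list and have the source document as plain text. The function names and the whitespace/case normalisation are illustrative choices, not part of any Claude feature, and the sample contract text is made up.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unsupported_quotes(quotes: list[str], source_text: str) -> list[str]:
    """Return the quotes that do not appear in the source after normalisation."""
    haystack = normalise(source_text)
    return [q for q in quotes if normalise(q) not in haystack]


# Hypothetical source and quotes, for illustration only.
source = "4.2 Indemnity\nLiability is capped at\nfees paid in the prior 12 months."
quotes = [
    "Liability is capped at fees paid in the prior 12 months.",
    "Liability is unlimited.",
]
print(find_unsupported_quotes(quotes, source))
```

Any quote this returns deserves a manual look: it may be a paraphrase, a hallucination, or text that OCR mangled in the source.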
1.6
Memory across conversations
Why Claude starts fresh and the three ways to give it continuity
Key takeaway
New chat = cold start unless you inject continuity via Project context, handoff summaries, or API session stores.
Why this matters
Assuming cross-chat memory causes duplicated work and contradictory decisions.
Three continuity patterns: (1) Project persistence, (2) handoff summaries, (3) API session store.
Workflow — do this next
- 01 Pick one primary continuity pattern per workstream.
- 02 Never rely on Claude memory alone for decisions.
- 03 Use the handoff block template from the Chapter 1 artifact.
1.7
Memory for teams
How shared Projects give teams shared context without each member re-explaining everything
Key takeaway
Team Projects are shared external memory — onboarding docs, approved prompts, client context — so new chats inherit org knowledge, not individual folklore.
Why this matters
Without shared Projects, every hire re-uploads the same PDFs and re-explains brand voice.
Claude Team enables shared Projects with role-appropriate access. Treat each shared Project as a team knowledge capsule — not a dumping ground for every file.
Assign Project owners: curate files quarterly, prune stale instructions, document what belongs here vs in the wiki.
Workflow — do this next
- 01 Create TEAM_HQ Project with onboarding README.
- 02 Migrate top 5 re-uploaded files into shared Project once.
- 03 Ban 'ask Sarah'; link to the Project instead.
Real example
Agency — client Projects as memory boundary
Each client Project: scope doc, brand guide, banned phrases, active campaign artifacts. Account managers start chats inside client Project — no cross-client bleed. New hire productive day one by reading Project INDEX.
1.8
The memory design decision
Choosing the right memory layer for each piece of information — the architecture mindset
Key takeaway
For every fact, ask: volatility, sensitivity, audience, and retrieval frequency — then assign a layer. Memory design is explicit, not accidental.
Why this matters
Random layer choice creates cost (token bloat), risk (wrong client data), and confusion (stale memory).
Volatile (changes weekly) → in-context or short-lived files, not memory system. Stable policy → Project instructions. Personal preference → memory or user prefs. Sensitive → external with access control, never public chat links.
High-frequency retrieval across many chats → Project file with good structure. One-off analysis → attach, extract conclusions to canonical doc, detach.
Workflow — do this next
- 01 Run memory design review when starting a new initiative.
- 02 Document layer choices in Project README.
- 03 Re-review when the team complains 'Claude forgot'; the cause is usually the wrong layer.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Memory design worksheet (per fact)
Fact: _______________
Volatility: [daily / weekly / stable]
Sensitivity: [public / internal / confidential]
Audience: [solo / team / org]
Frequency needed: [every chat / weekly / once]
→ Layer: [in-context | file | memory | Project instructions]
→ Owner: _______________
→ Review date: _______________
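The worksheet can also be expressed as a small rule function. This is a sketch only: the field values mirror the worksheet, but the order in which the rules are checked (sensitivity first, then volatility, then audience) is an assumption, and `Fact` and `suggest_layer` are hypothetical names. Treat the output as a starting suggestion for a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    name: str
    volatility: str   # "daily" | "weekly" | "stable"
    sensitivity: str  # "public" | "internal" | "confidential"
    audience: str     # "solo" | "team" | "org"
    frequency: str    # "every chat" | "weekly" | "once"


def suggest_layer(fact: Fact) -> str:
    """Suggest a memory layer. The precedence of these checks is an assumption."""
    if fact.sensitivity == "confidential":
        # Sensitive material goes to external files with access control.
        return "file"
    if fact.volatility in ("daily", "weekly"):
        # Volatile facts stay out of the memory system.
        return "in-context" if fact.frequency == "once" else "file"
    if fact.audience == "solo":
        # Stable personal preference.
        return "memory"
    # Stable, shared policy.
    return "Project instructions"


print(suggest_layer(Fact("brand voice", "stable", "internal", "team", "every chat")))
```

Encoding the rules this way makes disagreements visible: if the team dislikes a suggestion, the fix is a reviewed change to one rule instead of an ad hoc exception.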
Concept 2
The 1 Million Token Context — Practical Mastery
What a 1 million token context window actually enables — and the techniques for using it without wasting it
2.1
What 1 million tokens actually holds
Books, codebases, research corpora, conversation histories — the concrete capacity in human terms
Key takeaway
One million tokens is roughly 750k words — multiple books, a mid-size codebase snapshot, or dozens of long reports — but capacity is not the same as perfect recall.
Why this matters
Oversized context invites lazy dumping. Knowing human-scale capacity helps you plan what belongs in-window vs external RAG.
Rule of thumb: ~1.3 tokens per English word. 1M tokens ≈ 750k words ≈ 1,500 single-spaced pages of prose. Dense code or JSON uses more tokens per word, so the same window holds less of it.
Concrete fits: full novel + notes, 50–100 substantial PDFs if compressed, entire repo export for architecture review (not every binary). Always verify model tier supports 1M on your plan — see Chapter 2.
Workflow — do this next
- 01 Estimate token count before mega-upload.
- 02 Ask: does this task need full corpus or targeted sections?
- 03 Budget cost: 1M input is not free on API.
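For a first-pass estimate before uploading, the ~1.3 tokens-per-word rule of thumb can be applied directly. The sketch below is a rough heuristic for English prose only. The `reserve` parameter (window share kept free for questions and answers) is an assumption introduced here, and a real token counter should replace this before any cost decision.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb for English prose; code and JSON run higher


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_window(texts: list[str], window: int = 1_000_000, reserve: float = 0.2) -> bool:
    """True if the estimated total leaves `reserve` of the window free for Q&A."""
    total = sum(estimate_tokens(t) for t in texts)
    return total <= window * (1 - reserve)


sample = "one two three four five six seven eight nine ten"
print(estimate_tokens(sample))
```

If `fits_window` returns False, that is the cue to ask the second workflow question: does the task need the full corpus or only targeted sections?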
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
1M token capacity cheatsheet
~750k English words
~3–5 full business books (text only)
~50–80 long PDF reports (varies)
~1 mid-size codebase (source only, no node_modules)
~years of chat if summarised (not raw)
Always measure your actual corpus with a token counter.
2.2
The lost-in-the-middle problem
Why Claude's attention degrades on information buried in the middle of a long context — and how to structure documents to counter it
Key takeaway
Models attend strongly to the beginning and end of context; middle sections get under-weighted. Long dumps without structure produce missed details.
Why this matters
Teams upload everything, ask one question, and blame the model when mid-document facts vanish.
Lost in the middle means critical clauses on page 200 of 400 may be ignored. Mitigations: reposition key facts, summarise middle sections, or retrieve relevant chunks only.
Symptoms: contradictory answers, 'I don't see that' when text is present, confident omission of mid-doc requirements.
Workflow — do this next
- 01 Put must-read facts in intro and recap sections.
- 02 For contracts: extract key clauses to a 2-page SUMMARY.md at top of context.
- 03 Test with needle-in-haystack questions before trusting workflow.
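A needle-in-haystack test can be set up with a few lines of code: plant a known fact at different depths of a long document, then ask for it. The sketch below only builds the test documents; sending them to Claude and grading the answers is left out. The filler text and the needle sentence are made up for illustration.

```python
def plant_needle(paragraphs: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the document."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(len(paragraphs) * depth)
    planted = paragraphs[:position] + [needle] + paragraphs[position:]
    return "\n\n".join(planted)


# Hypothetical filler and needle, for illustration only.
filler = [f"Filler paragraph {i} about routine contract terms." for i in range(100)]
needle = "The indemnity cap is twice the annual fees."
test_docs = {depth: plant_needle(filler, needle, depth) for depth in (0.0, 0.5, 1.0)}
```

Comparing answers across the three depths shows whether mid-document facts are being missed in your actual setup, which is the symptom the lost-in-the-middle section describes.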
Real example
Procurement — indemnity clause missed
200-page MSA uploaded whole. Claude approved terms but missed indemnity cap in middle section. Fix: REQUIREMENTS.md listing 12 must-verify clauses at context start; ask Claude to tick each with page cite.
2.3
Document positioning strategy
Where to place the most important information in a long context — the positioning principles that preserve retrieval quality
Key takeaway
Order context deliberately: instructions first, critical facts next, supporting bulk in the middle, task and recap last.
Why this matters
Positioning is free and often beats buying more tokens.
Optimal stack: (1) system/Project instructions, (2) executive summary of all attachments, (3) full documents, (4) user question, (5) 'Before answering, list which sections you used.'
Repeat critical constraints in the final user message — recency reinforces attention.
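The stack order above can be captured in a small assembly function so nobody on the team reorders it by accident. This is a sketch under stated assumptions: the XML-style tag names (`instructions`, `summary`, `document`, `question`) are illustrative labels, not a required schema, and `build_context` is a hypothetical helper.

```python
def build_context(instructions: str, summary: str, documents: dict[str, str],
                  question: str, constraints: str) -> str:
    """Assemble a prompt in the order: instructions, summary, bulk, question, recap."""
    doc_blocks = [
        f'<document id="{doc_id}">\n{text}\n</document>'
        for doc_id, text in documents.items()
    ]
    parts = [
        f"<instructions>\n{instructions}\n</instructions>",
        f"<summary>\n{summary}\n</summary>",
        "\n".join(doc_blocks),
        f"<question>\n{question}\n</question>",
        # Critical constraints are repeated last so they sit closest to the answer.
        f"Constraints (repeated): {constraints}\n"
        "Before answering, list which sections you used.",
    ]
    return "\n\n".join(parts)


prompt = build_context(
    instructions="Answer only from the attached documents.",
    summary="Two supplier contracts, A1 and A2.",
    documents={"A1": "Contract text A1", "A2": "Contract text A2"},
    question="Which contract has the lower liability cap?",
    constraints="Cite document ID and section for every claim.",
)
```

Passing the documents as a dict keeps their IDs attached to the text, which makes per-source citation requests easier to check afterwards.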
Workflow — do this next
- 01 Build a CONTEXT_ORDER template for your team.
- 02 Never bury the ask; put the question after documents or in dual position.
- 03 Use XML tags to label sections (Chapter 4).
2.4
Loading a codebase
How to structure an entire codebase in context for software work — the format and the order that produces the best results
Key takeaway
For codebase-in-context: exclude noise (node_modules, build artifacts), lead with ARCHITECTURE.md and tree overview, group by module, put target files last before the task.
Why this matters
Raw repo dumps waste tokens on irrelevant files and bury the module you need to change.
Prefer Claude Code for repo work when possible — it navigates natively. For Claude.ai/API: export tree + key files, or use MCP git integration.
Include: README, package manifests, entry points, types/interfaces, files under change. Exclude: lockfiles content, minified assets, generated code unless task-specific.
Workflow — do this next
- 01 Generate tree: find . -type f -name '*.ts' | head, then curate the list.
- 02 Attach ARCHITECTURE.md written by humans first.
- 03 Scope task to one package/service per session.
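The include and exclude rules above can be turned into a filter that produces the curated file list. The sketch below assumes a TypeScript-style repo; the specific directory names, lockfile names, and suffixes are examples to adapt, not a definitive list.

```python
import os
from pathlib import PurePosixPath

# Example exclusions and inclusions; adjust for your own stack.
EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build"}
EXCLUDED_NAMES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}
EXCLUDED_SUFFIXES = (".min.js", ".map")
INCLUDED_SUFFIXES = (".ts", ".md", ".json")


def keep_path(rel_path: str) -> bool:
    """Decide whether a repo-relative POSIX path belongs in the context pack."""
    path = PurePosixPath(rel_path)
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    if path.name in EXCLUDED_NAMES or path.name.endswith(EXCLUDED_SUFFIXES):
        return False
    return path.name.endswith(INCLUDED_SUFFIXES)


def list_source_files(root: str) -> list[str]:
    """Walk `root` and return the sorted repo-relative paths that pass keep_path."""
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if keep_path(rel):
                kept.append(rel)
    return sorted(kept)
```

Running `list_source_files(".")` from the repo root gives a paths-only tree to review by hand before deciding which full files to attach.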
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Codebase context pack order
1. TASK + acceptance criteria
2. ARCHITECTURE.md (human-written)
3. Directory tree (paths only)
4. Shared types / API contracts
5. Files directly under change
6. Related tests
7. "Quote file:line for every claim"
2.5
Loading a corpus of research
Feeding multiple documents and asking cross-document questions — the research workflow that used to require a dedicated tool
Key takeaway
Multi-doc synthesis works when documents are labelled, summarised at the top, and questions specify comparison dimensions — not 'tell me everything.'
Why this matters
Cross-doc questions without structure produce shallow summaries that miss disagreements between sources.
Workflow: ingest docs with consistent naming (AUTHOR_YEAR_TOPIC.md), add 5-line abstract per doc at context start, ask matrix questions ('compare methods, sample size, conclusion across docs A–F').
Use artifacts for synthesis output; keep chat for methodology questions.
Workflow — do this next
- 01 Create CORPUS_INDEX.md with one row per source.
- 02 Ask for disagreement map before consensus summary.
- 03 Require citation format: [Doc ID, section].
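If files follow the AUTHOR_YEAR_TOPIC.md naming convention, the rows of CORPUS_INDEX.md can be generated from the filenames. The sketch below assumes a strict pattern (letters, four-digit year, hyphenated topic) and a pipe-separated row layout; both are illustrative choices, and the filenames in the example are hypothetical.

```python
import re

# Assumed strict form of the AUTHOR_YEAR_TOPIC.md convention.
FILENAME_PATTERN = re.compile(
    r"^(?P<author>[A-Za-z]+)_(?P<year>\d{4})_(?P<topic>[A-Za-z0-9-]+)\.md$"
)


def index_row(filename: str, abstract: str) -> dict[str, str]:
    """Parse one filename into an index row; reject names that break the convention."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow AUTHOR_YEAR_TOPIC.md")
    return {"doc_id": filename[:-3], **match.groupdict(), "abstract": abstract}


def render_index(rows: list[dict[str, str]]) -> str:
    """Render rows as plain pipe-separated lines, one per source."""
    header = "Doc ID | Author | Year | Topic | Abstract"
    lines = [
        " | ".join([r["doc_id"], r["author"], r["year"], r["topic"], r["abstract"]])
        for r in rows
    ]
    return "\n".join([header, *lines])


# Hypothetical files, for illustration only.
rows = [
    index_row("SMITH_2021_pricing.md", "Survey of pricing models."),
    index_row("LEE_2023_churn.md", "Churn drivers in B2B SaaS."),
]
print(render_index(rows))
```

Rejecting non-conforming filenames at index time keeps the doc IDs usable in the [Doc ID, section] citation format.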
Real example
Corp dev — 40 acquisition memos
PM indexed memos, loaded index + 12 most relevant full texts. Question: 'Which targets share regulatory risk pattern X?' Cross-doc table in artifact with cites. Work that previously took an analyst a week.
2.6
Conversation history management
When to continue a conversation and when to start fresh — the decision that affects quality as context grows
Key takeaway
Continue when thread is focused and under ~60% context; start fresh with handoff when scope shifts, quality drops, or instructions fight earlier messages.
Why this matters
Zombie threads accumulate contradictions and dilute instructions — sunk-cost fallacy keeps people in bad chats.
Fresh start triggers: new sub-project, role change in prompt, repeated corrections of same mistake, unexplained quality cliff.
Continue when: same deliverable, iterative refinement, artifact in progress, context still coherent.
Workflow — do this next
- 01 End sessions with 10-line HANDOFF block.
- 02 New chat starts with HANDOFF + 'confirm before proceeding'.
- 03 Archive old threads rather than deleting them; export conclusions to the Project.
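The continue-or-restart decision can be written down as an explicit checklist. The sketch below covers three of the triggers named above. The ~60% threshold comes from the text, while the cutoff of two repeated corrections and the function name are assumptions made for illustration.

```python
def fresh_start_triggers(used_tokens: int, window_tokens: int,
                         scope_changed: bool = False,
                         repeated_corrections: int = 0,
                         threshold: float = 0.6) -> list[str]:
    """Return the reasons to start a new chat; an empty list means continue."""
    triggers = []
    if used_tokens / window_tokens > threshold:
        triggers.append("context above threshold")
    if scope_changed:
        triggers.append("scope changed")
    # Assumed cutoff: the same mistake corrected twice or more.
    if repeated_corrections >= 2:
        triggers.append("same mistake corrected repeatedly")
    return triggers


print(fresh_start_triggers(150_000, 200_000, scope_changed=True))
```

Returning the reasons instead of a bare yes/no gives the handoff note something concrete to record about why the thread ended.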
2.7
Context compression
Summarising earlier context to preserve the window — the technique for long-running projects
Key takeaway
Compression = structured summaries that preserve decisions, open questions, and constraints — not lossy 'tl;dr' that drops nuance.
Why this matters
Long projects exceed any window without compression discipline.
Pattern: every N turns or daily, ask Claude to update STATE.md sections: Decisions, Open questions, Constraints, Next actions, Key quotes with cites. Replace raw history with STATE in new thread.
API: rolling summary in your DB — append new turns, re-summarise when summary exceeds token budget.
Workflow — do this next
- 01 Define non-negotiable fields in STATE template.
- 02 Human approves compression before it becomes canonical.
- 03 Never compress away numbers, dates, or named decisions.
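The API rolling-summary pattern can be sketched as a small class. Several things here are assumptions: the summary lives in memory instead of a database, `summarise` is a hypothetical callable standing in for a model call that uses your compression prompt, and token counts use the rough 1.3 tokens-per-word heuristic.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~1.3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.3)


class RollingSummary:
    """Append turns; re-summarise whenever the running text exceeds the budget."""

    def __init__(self, summarise: Callable[[str], str], budget_tokens: int = 2000):
        self.summarise = summarise  # hypothetical: in practice, a model call
        self.budget_tokens = budget_tokens
        self.summary = ""

    def append_turn(self, turn: str) -> None:
        self.summary = f"{self.summary}\n{turn}".strip()
        if estimate_tokens(self.summary) > self.budget_tokens:
            self.summary = self.summarise(self.summary)


# A stub summariser so the sketch runs without any API access.
state = RollingSummary(summarise=lambda text: "SUMMARY", budget_tokens=5)
state.append_turn("one two")
state.append_turn("three four five")
print(state.summary)
```

In a real pipeline the human-approval step from the workflow would sit between the `summarise` call and the write to the canonical store.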
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Context compression prompt
Update STATE.md from this thread.
Preserve:
- All numeric decisions and dates verbatim
- Open questions (numbered)
- Constraints labelled MUST / MUST NOT
- Remove duplicate reasoning; keep conclusions
Output markdown only. Flag anything ambiguous for human review.
2.8
The 1 million token mindset shift
The work that becomes possible when you stop thinking in single-document chunks — the workflow transformation
Key takeaway
1M context enables portfolio thinking — whole codebases, corpuses, deal rooms — but rewards architects who curate and position, not hoarders who dump.
Why this matters
Mindset shift: from 'what fits in one prompt' to 'what system of evidence supports this decision.'
New workflows: full-library code review, multi-contract comparison, longitudinal chat analysis, cross-team doc harmonisation — tasks that required teams of analysts or bespoke tools.
Still combine with verification, chunking for edge precision, and external RAG when corpuses exceed 1M or need real-time updates.
Workflow — do this next
- 01 List one task you previously chunked manually; try 1M with structure.
- 02 Measure quality vs cost vs latency.
- 03 Document when to use 1M vs retrieval as a decision tree in the Project.
Real example
Compliance — annual policy harmonisation
12 policy PDFs loaded with index. Claude produced conflict matrix across jurisdictions. Legal reviewed matrix, not 400 pages. 1M window + positioning beat six weeks of associate time — with human sign-off on conflicts only.
Concept 3
Claude as a RAG System — Hidden Architecture
Using Claude's native features to build retrieval-augmented workflows without external infrastructure
3.1
What RAG means in a Claude context
Using Claude's context window as a retrieval layer without a vector database
Key takeaway
Native Claude RAG = curated documents in context + prompts that force grounded answers — no Pinecone required for many knowledge-work tasks.
Why this matters
Teams over-build vector DBs before exhausting Project-based retrieval. Claude's window is the retrieval layer when corpus fits and updates are infrequent.
RAG traditionally: embed chunks, vector search, inject top-k. In Claude.ai: you are the retriever — attach the right files, structure the ask, demand citations.
Works when: corpus < context limit, updates weekly not per-second, team can curate files. Breaks when: millions of docs, strict ACL per chunk, sub-second fresh data.
Workflow — do this next
- 01 Try Project RAG before proposing vector infra.
- 02 Define success: citation accuracy on 10 test questions.
- 03 If it fails, note whether size, freshness, or ACL caused it.
3.2
Document upload as retrieval
How to structure uploaded documents so Claude retrieves from them accurately
Key takeaway
Retrieval-quality uploads have: descriptive filenames, headings, page/section markers, and a top-of-file summary — Claude reads like a human skimmer, not a DB.
Why this matters
Unstructured PDF exports are retrieval poison.
Before upload: add cover page with doc ID, date, 5-bullet abstract. Use H1/H2 hierarchy. For scans, OCR with structure preserved.
Prompt pattern: 'Answer using only [DOC_ID]. Quote section headers. If not found, say NOT IN SOURCE.'
Workflow — do this next
- 01 Rename files: ROLE_TOPIC_vDATE.ext
- 02 Add 10-line summary file per large upload.
- 03 Run 3 needle tests after upload.
3.3
Multi-document synthesis
Loading multiple sources and asking Claude to compare, synthesise, and reason across all of them
Key takeaway
Multi-doc synthesis needs an index layer, explicit comparison axes, and output format that forces per-source attribution.
Why this matters
Without axes, Claude averages sources into mushy consensus.
Load CORPUS_INDEX first, then full texts or summaries. Ask: 'Build comparison table: Source | Claim | Evidence | Conflicts with.'
For conflicting sources, instruct: 'Do not merge — list disagreement explicitly.'
Workflow — do this next
- 01 Cap active full texts; summarise peripheral docs.
- 02 One synthesis question per thread.
- 03 Export matrix to artifact; verify 3 random cells.
Real example
Strategy — three analyst reports
CEO wanted one view of market size. Three reports disagreed. Claude produced attributed table — not blended number. CEO picked assumption set consciously. Native RAG + synthesis prompt avoided false precision.
3.4
The project-as-knowledge-base pattern
Using a Claude Project with uploaded documents as a persistent knowledge base
Key takeaway
A Project with INDEX.md, curated files, and retrieval prompts functions as a zero-code knowledge base for a role or domain.
Why this matters
Cheapest path to team-wide grounded Q&A without engineering sprint.
Structure: INDEX (what's here), POLICY (how to answer), CORPUS (files), PROMPTS (saved question templates). New chat always starts in same Project.
Maintenance: owner reviews uploads monthly; deprecate files to ARCHIVE/ subfolder listing in INDEX.
Workflow — do this next
- 01 Clone template Project per domain (Legal KB, Product KB).
- 02 Add 5 canonical docs before inviting team.
- 03 Pin retrieval prompt in Project description.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Project knowledge base INDEX.md
# [Domain] Knowledge Base
Last reviewed: [DATE]
Owner: [name]

## Files
| ID | File | Summary | Use when |
|----|------|---------|----------|
| A1 | pricing_2025.pdf | … | pricing questions |

## Retrieval rules
- Cite file ID + section
- NOT IN SOURCE if missing
- Escalate legal to [human]
3.5
Chunking for Claude
How to split large documents before upload for better retrieval quality
Key takeaway
Chunk by semantic boundary (chapter, clause, module) — not arbitrary token splits — with chunk headers that repeat parent context.
Why this matters
Arbitrary 512-token chunks lose legal and technical meaning.
Each chunk file: CHUNK_META (parent doc, section, date) + content. INDEX maps questions → chunk IDs.
For API at scale, mirror same boundaries in vector DB — semantic chunks beat random splits.
Workflow — do this next
- 01 Split at H1/H2 boundaries.
- 02 Prefix each chunk: 'From CONTRACT_X, Section 4.2 Indemnity:'
- 03 Load only relevant chunks when corpus exceeds window.
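Splitting at H1/H2 boundaries with a parent-context prefix can be sketched as follows. This assumes the document is already markdown with `#` and `##` headings; PDFs or Word files would need converting first. The ` > ` separator between parent and child headings is an illustrative choice, and text before the first heading is labelled "Preamble".

```python
import re

HEADING = re.compile(r"^(#{1,2})\s+(.*)$")  # matches H1 and H2 only


def chunk_by_headings(doc_id: str, markdown: str) -> list[dict[str, str]]:
    """Split markdown at H1/H2 headings; prefix each chunk with its parent context."""
    sections: list[tuple[str, list[str]]] = []
    current_h1 = ""
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            if level == 1:
                current_h1 = title
                label = title
            else:
                label = f"{current_h1} > {title}" if current_h1 else title
            sections.append((label, []))
        elif sections:
            sections[-1][1].append(line)
        elif line.strip():
            # Text before the first heading gets its own chunk.
            sections.append(("Preamble", [line]))
    return [
        {"section": label,
         "text": f"From {doc_id}, {label}:\n" + "\n".join(body).strip()}
        for label, body in sections
    ]


sample = "# Terms\nIntro.\n## 4.2 Indemnity\nCap applies.\n"
for chunk in chunk_by_headings("CONTRACT_X", sample):
    print(chunk["text"])
```

Because each chunk carries its document ID and section path, it stays interpretable when loaded on its own, which is the point of repeating parent context in chunk headers.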
3.6
Native RAG vs external RAG
When Claude's built-in context window replaces a vector database and when you need the real thing
Key takeaway
Native wins for curated, bounded, human-maintained knowledge. External RAG wins for huge, dynamic, permissioned, or embedding-optimised corpora.
Why this matters
Wrong choice wastes months — vector DB for 12 PDFs, or Projects for 10M support tickets.
Choose native Project RAG: <100 docs, same team access, weekly updates, Q&A workflow.
Choose external RAG: per-user ACL, >1M tokens corpus, sub-minute data freshness, production customer-facing at scale, hybrid keyword+semantic needs.
Workflow — do this next
- 01 Score corpus on size, freshness, ACL, query volume.
- 02 Prototype native in one afternoon.
- 03 External only if scorecard fails 2+ criteria.
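The scorecard rule (external only if two or more criteria fail) is simple enough to encode directly. In this sketch each criterion is a yes/no judgement made by a person; the criterion keys and the function name are hypothetical.

```python
CRITERIA = ("size", "freshness", "acl", "query_volume")


def recommend_rag(native_ok: dict[str, bool]) -> str:
    """Return 'external' only when native Project RAG fails 2+ criteria."""
    missing = set(CRITERIA) - native_ok.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    failures = [c for c in CRITERIA if not native_ok[c]]
    return "external" if len(failures) >= 2 else "native"


# Example: the corpus fits and access is shared, but data must be near real-time.
print(recommend_rag({"size": True, "freshness": False, "acl": True, "query_volume": True}))
```

A single failing criterion still returns "native", matching the advice to prototype natively first and note which factor caused trouble.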
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Native vs external RAG matrix
NATIVE (Project) if:
□ Corpus fits in context with room for Q&A
□ Shared or single-user access OK
□ Updates manual/weekly
□ Internal power users
EXTERNAL if:
□ Millions of chunks or strict per-user ACL
□ Real-time DB / ticket stream
□ Customer-facing SLA + logging
□ Need hybrid search / reranking pipeline
3.7
The knowledge base Project
Building a Claude Project that functions as an always-available knowledge base for a domain or role
Key takeaway
A KB Project is a product: INDEX, owners, test questions, retrieval constitution, and changelog — not a folder of uploads.
Why this matters
KB Projects without governance become stale and untrusted.
Rollout: define 20 canonical questions, gather source docs, build INDEX, write retrieval constitution, test pass rate ≥90% on canonical set, train team on prompts, schedule quarterly audit.
Pair with Chapter 4 prompt templates for consistent retrieval behaviour.
Workflow — do this next
- 01 Write KB_CHARTER.md: scope, owner, review cadence.
- 02 Run canonical Q test suite before launch.
- 03 Log failed questions; fix the doc or prompt rather than blaming the user.
Real example
HR — internal policy KB Project
40 policies chunked, INDEX maintained by HR ops. Employees ask leave, equity, travel questions. Retrieval prompt requires policy ID cite. Escalation path for edge cases. IT ticket volume for basic HR questions down 35%.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Session handoff (memory-aware)
End every long thread; start the next with this block.
## Handoff: [DATE] [PROJECT]
### Decisions (verbatim numbers/dates)
-
### Open questions
-
### Layer map
- In Project files: [list]
- Updated STATE.md: Y/N
### Next chat first message
"Read HANDOFF below. Confirm. Then [task]."
Knowledge base canonical test set
20 questions every KB Project must answer with cites.
For each question record:
- Q#
- Expected source doc ID
- Pass: cite correct / Fail: reason
- Fix: [doc | prompt | chunk]
Launch threshold: 18/20 pass before team rollout.
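Scoring the canonical set against the launch threshold can be automated once results are recorded. The sketch below checks only whether the cited document ID matches the expected one; judging whether the answer itself is correct remains a human step. `QuestionResult` and `ready_to_launch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionResult:
    question_id: str
    expected_doc_id: str
    cited_doc_id: Optional[str]  # None when the answer carried no cite

    @property
    def passed(self) -> bool:
        return self.cited_doc_id == self.expected_doc_id


def ready_to_launch(results: list[QuestionResult], threshold: float = 0.9) -> bool:
    """True when the share of passing questions meets the threshold (18/20 = 0.9)."""
    if not results:
        return False
    passed = sum(r.passed for r in results)
    return passed / len(results) >= threshold


def failures(results: list[QuestionResult]) -> list[str]:
    """Question IDs to log for a doc, prompt, or chunk fix."""
    return [r.question_id for r in results if not r.passed]
```

The `failures` list feeds the "Fix" column of the template: each failed question gets a decision about whether the document, the prompt, or the chunking needs to change.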
Retrieval prompt shell
SOURCES: [attached / Project files]
RULES:
1. List section IDs you'll use (before answering)
2. Quote then interpret
3. NOT IN SOURCE if missing
4. Ignore: [appendix, marketing fluff]
QUESTION: [your question]
Repeat question: [your question]
Legal ops — deal room to Project KB
A 60-person company ran M&A diligence in ad hoc chats. Associates re-uploaded the same dataroom PDFs; partners got inconsistent clause summaries; one chat referenced the wrong target.
Before
No memory layers, no INDEX, full PDF dumps, questions without cite requirements.
After
Per-deal Project: INDEX, chunk files by section, retrieval prompt shell (see the artifacts above), weekly STATE compression. Native RAG for diligence; external vector only for historical deal search across 500+ deals.
- Associate re-upload time → down 70%
- Clause summary cite accuracy → 94% on audit sample
- Wrong-target context incidents → 2 in H1 to 0 in H2
- Vector DB scope → deferred; saved 6-week eng project
What goes wrong
Mistake: Treating 1M context as perfect memory, so middle content is ignored.
Fix: Positioning strategy (2.3) + lost-in-the-middle awareness (2.2).
Mistake: Storing company facts only in Claude memory system.
Fix: Canonical docs in Project files; memory for prefs only.
Mistake: Building Pinecone for 15 static policy PDFs.
Fix: Native Project KB (3.4–3.7); re-evaluate at scale triggers (3.6).
Mistake: Never starting fresh threads, so zombie context rots quality.
Fix: Handoff + compression (2.6–2.7); fresh chat with STATE.

Vetted by Krishna Kumar, Curator, FactorBeam