Standalone article · part of a sequenced guide

What you'll unlock: Memory is layered — in-context, files, memory system, Project instructions. A million tokens is a workspace, not a dump truck: position what matters, compress what ages, and build Projects as knowledge bases when retrieval must persist.

Tool guide · Chapter 5 of 10

Memory, Context & the 1 Million Token Mindset

~80 min read

The complete guide to Claude's memory systems — what persists, what doesn't, and how to architect context for serious work

Chapter context

Your team hits context limits, loses thread continuity, or uploads entire data rooms hoping Claude 'finds the answer.' Quality is inconsistent and API bills spike. This chapter replaces hope with memory and retrieval design: the same discipline you'd apply to any data system, applied to how Claude sees your work.

Is this chapter for you?

Do multi-day Claude sessions lose early decisions or instructions?

Yes — Concept 1 in-context + handoffs; Concept 2.6–2.7 compression.

Are you loading large document sets (contracts, research, code)?

Yes — Concept 2 positioning and Concept 3 retrieval structure are mandatory.

Does your team re-upload the same files every week?

Yes — Concept 1.7 team Projects and Concept 3.4 knowledge-base pattern.

Are you evaluating vector databases for internal Q&A?

Yes — read Concept 3.6 before buying infrastructure.

Chapters 1 and 4 gave you the mental model and prompting craft. Chapter 5 is context architecture: memory layers, million-token strategy, and native RAG without a vector database. Power users do not ask 'does Claude remember?' They ask 'which memory layer owns this fact?' and design accordingly.

Reference diagrams

Four memory layers

Assign every fact to a layer — volatility and sensitivity drive the choice.

In-context · This thread only · Volatile

Files · Project uploads · Evidence

Memory · User prefs · Personal

Project · Instructions · Team policy

Long context stack order

Instructions → summary → bulk → question → recap — fight lost-in-the-middle.

Instructions · Rules first · Top

Summary · Key facts · High attention

Corpus · Full docs · Middle

Ask + recap · Task last · Bottom

Implementation paths

Three concepts — memory layers, 1M mastery, native RAG.

Concept 1

How Claude's Memory Actually Works

The architecture behind what Claude remembers — and the consequences of each layer for how you work

1.1

The four memory layers

In-context, external storage, memory system, and project instructions — what each does and who controls it

Key takeaway

Claude memory is not one thing — it is four layers: the live conversation window, uploaded documents, Claude's optional memory feature, and Project-level instructions. Each has different persistence, cost, and control.

Why this matters

Teams conflate 'Claude remembered' with the wrong layer and build workflows that break when context shifts. Layer literacy is architecture literacy.

In-context is what Claude sees right now. External storage is evidence you provide. The memory system is selective persistence. Project instructions are team-scoped policy and context.

You control external storage and Project instructions directly. Memory system is semi-automatic — review it. In-context is automatic but finite.

Workflow — do this next

01 Map your top 10 facts to a layer: in-context / file / memory / Project.
02 Nothing critical should live only in in-context.
03 Document the map in PROJECT_MEMORY.md.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Memory layer decision map

IN-CONTEXT → session decisions, draft text, this-thread reasoning
EXTERNAL FILES → specs, contracts, data room, codebase exports
MEMORY SYSTEM → stable prefs (role, tools, standing constraints)
PROJECT INSTRUCTIONS → team policy, voice, scope, links to canonical docs

Rule: if losing the chat would hurt → move up a layer.

1.2

In-context memory

Everything in the current conversation window — what it holds, how long it lasts, and when it is lost

Key takeaway

In-context is Claude's working RAM — fast, rich, and fragile. It includes messages, tool results, and attachments until the context limit is hit or the chat is abandoned.

Why this matters

Chapter 1 established stateless defaults. In-context is the only 'memory' in a single API call — everything else you engineer.

Everything sent in the thread competes for the same attention budget. Long threads degrade: early instructions weaken, middle documents suffer lost-in-the-middle effects (Concept 2.2).

In-context is lost when: you start a new chat, the window overflows, or you switch Projects without carrying summaries. Power users distill state before loss.

Workflow — do this next

01 Monitor thread length on multi-day work; restart with a handoff summary once the thread feels about 60% full.
02 Pin critical constraints in the latest message, not only message one.
03 Use SESSION_STATE.md in the Project for continuity across chats.

Real example

Legal review — 180-message thread collapse

A team ran contract redlines in one chat for two weeks. Message 1 contained scope limits; by message 150 Claude suggested clauses outside mandate. Fix: weekly new chat with distilled OPEN_ISSUES.md + only active sections attached.

1.3

Claude's memory system

What gets saved, how to view it, how to edit it, and how it surfaces in future conversations

Key takeaway

Claude.ai memory stores inferred facts about you — useful for continuity, dangerous if wrong. You can view, edit, and delete memories; treat them like a profile you'd audit quarterly.

Why this matters

Unreviewed memories become confident wrong callbacks. Memory is not a substitute for canonical docs.

Memory captures preferences and recurring context Claude infers from chats — not full transcripts. It surfaces in new conversations as background, similar to a lightweight user model.

Audit memory after role changes, company pivots, or tool switches. Delete stale entries ('you prefer Opus for everything'). Prefer Project files over memory for factual company data.

Workflow — do this next

01 Settings → Memory: read all entries monthly.
02 Delete anything project-specific or outdated.
03 Add critical standing facts manually if the product allows.

1.4

Project instructions as persistent context

The system prompt layer that persists across every conversation in a project

Key takeaway

Project instructions are the persistent brief for a workspace — scope, rules, links, and tone that apply to every new chat in that Project without re-pasting.

Why this matters

This is the closest Claude.ai gets to a system prompt. Under-invested Projects behave like amnesiac assistants.

Put here: what this Project is for, canonical file names, approval rules, output defaults, links to repo/wiki. Keep lean — link to large docs rather than paste them (Chapter 2 token economics).

Project instructions + attached knowledge files = persistent context layer. Different Projects = different memory boundaries (client A vs client B).

Workflow — do this next

01 One Project per client, product, or major initiative.
02 Keep instructions under 400 words; put details in an attached INDEX.md.
03 Review instructions when scope changes, and put the version in the filename.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Project instructions template

PURPOSE: [what this Project is for]
SCOPE: [in / out]
CANONICAL FILES: [list with one-line description each]
OUTPUT DEFAULTS: [format, tone, length]
RULES: [3–5 non-negotiables]
ESCALATION: [when to ask human before acting]

1.5

External memory via documents

Uploading files as external memory — what Claude can retrieve and how accurately

Key takeaway

Uploaded files are external memory — Claude reads them in context, not from a perfect database. Retrieval quality depends on structure, positioning, and question clarity.

Why this matters

Uploading a 400-page PDF without structure is hope, not architecture. Document design is memory design.

Claude processes uploaded text into the context window — it does not magically index every page for selective lookup unless you use tools/MCP or structure prompts for retrieval (Concept 3).

Accuracy improves with: clear headings, tables of contents, chunking large files, and explicit 'answer only from section X' instructions.

Workflow — do this next

01 Prepend a TOC to long PDFs before upload.
02 Split multi-topic dumps into named files.
03 Ask Claude to quote supporting passages, then verify them against the source.
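
The verification step can be partly automated. The sketch below assumes you have the source text as a string and have copied the passages Claude quoted into a list. The function names and the whitespace normalisation are illustrative choices, not part of any Claude product.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(source_text: str, quotes: list[str]) -> dict[str, bool]:
    """Return, for each quoted passage, whether it appears verbatim in the source."""
    haystack = normalize(source_text)
    return {quote: normalize(quote) in haystack for quote in quotes}


if __name__ == "__main__":
    source = "Section 4.2 Indemnity.\nLiability is capped at\n12 months of fees."
    claimed = ["Liability is capped at 12 months of fees.", "Liability is uncapped."]
    for quote, found in verify_quotes(source, claimed).items():
        print("OK  " if found else "MISS", quote)
```

A MISS does not prove the claim is wrong (Claude may have paraphrased), but it tells you which passages to check by hand first.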

1.6

Memory across conversations

Why Claude starts fresh and the three ways to give it continuity

Key takeaway

New chat = cold start unless you inject continuity via Project context, handoff summaries, or API session stores.

Why this matters

Assuming cross-chat memory causes duplicated work and contradictory decisions.

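Three continuity patterns: (1) Project persistence, where instructions and files carry into every new chat; (2) handoff summaries pasted into the next chat; (3) an API session store that your own application maintains.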

Workflow — do this next

01 Pick one primary continuity pattern per workstream.
02 Never rely on Claude memory alone for decisions.
03 Use the handoff block template from the Chapter 1 artifact.

1.7

Memory for teams

How shared Projects give teams shared context without each member re-explaining everything

Key takeaway

Team Projects are shared external memory — onboarding docs, approved prompts, client context — so new chats inherit org knowledge, not individual folklore.

Why this matters

Without shared Projects, every hire re-uploads the same PDFs and re-explains brand voice.

Claude Team enables shared Projects with role-appropriate access. Treat each shared Project as a team knowledge capsule — not a dumping ground for every file.

Assign Project owners: curate files quarterly, prune stale instructions, document what belongs here vs in the wiki.

Workflow — do this next

01 Create a TEAM_HQ Project with an onboarding README.
02 Migrate the top 5 re-uploaded files into the shared Project once.
03 Ban 'ask Sarah': link to the Project instead.

Real example

Agency — client Projects as memory boundary

Each client Project: scope doc, brand guide, banned phrases, active campaign artifacts. Account managers start chats inside client Project — no cross-client bleed. New hire productive day one by reading Project INDEX.

1.8

The memory design decision

Choosing the right memory layer for each piece of information — the architecture mindset

Key takeaway

For every fact, ask: volatility, sensitivity, audience, and retrieval frequency — then assign a layer. Memory design is explicit, not accidental.

Why this matters

Random layer choice creates cost (token bloat), risk (wrong client data), and confusion (stale memory).

Volatile (changes weekly) → in-context or short-lived files, not memory system. Stable policy → Project instructions. Personal preference → memory or user prefs. Sensitive → external with access control, never public chat links.

High-frequency retrieval across many chats → Project file with good structure. One-off analysis → attach, extract conclusions to canonical doc, detach.
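
The assignment rules above can be written down as a small decision function. This is a minimal sketch under stated assumptions: the `Fact` fields mirror the worksheet, the `is_policy` flag is a hypothetical addition used to separate policy from evidence, and the order in which the rules are checked is a judgment call, not something the rules themselves dictate.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    volatility: str          # "daily" | "weekly" | "stable"
    sensitivity: str         # "public" | "internal" | "confidential"
    audience: str            # "solo" | "team" | "org"
    frequency: str           # "every chat" | "weekly" | "once"
    is_policy: bool = False  # hypothetical flag: rule or policy rather than evidence


def assign_layer(fact: Fact) -> str:
    """Apply the layer rules in a fixed order; the ordering is an assumption."""
    if fact.sensitivity == "confidential":
        return "file (access-controlled)"
    if fact.volatility in ("daily", "weekly"):
        return "in-context"
    if fact.audience == "solo":
        return "memory"
    if fact.is_policy:
        return "Project instructions"
    return "file"


if __name__ == "__main__":
    voice = Fact("Brand voice rules", "stable", "internal", "team", "every chat", is_policy=True)
    draft = Fact("Draft headline", "daily", "internal", "solo", "once")
    print(assign_layer(voice))  # Project instructions
    print(assign_layer(draft))  # in-context
```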

Workflow — do this next

01 Run a memory design review when starting a new initiative.
02 Document layer choices in the Project README.
03 Re-review when the team complains 'Claude forgot'; the cause is usually the wrong layer.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Memory design worksheet (per fact)

Fact: _______________
Volatility: [daily / weekly / stable]
Sensitivity: [public / internal / confidential]
Audience: [solo / team / org]
Frequency needed: [every chat / weekly / once]

→ Layer: [in-context | file | memory | Project instructions]
→ Owner: _______________
→ Review date: _______________

Concept 2

The 1 Million Token Context — Practical Mastery

What a 1 million token context window actually enables — and the techniques for using it without wasting it

2.1

What 1 million tokens actually holds

Books, codebases, research corpora, conversation histories — the concrete capacity in human terms

Key takeaway

One million tokens is roughly 750k words — multiple books, a mid-size codebase snapshot, or dozens of long reports — but capacity is not the same as perfect recall.

Why this matters

Oversized context invites lazy dumping. Knowing human-scale capacity helps you plan what belongs in-window vs external RAG.

Rule of thumb: ~1.3 tokens per English word. 1M tokens ≈ 750k words ≈ 1,500 single-spaced pages of prose — or less for dense code/JSON.
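
The arithmetic behind this rule of thumb is easy to script before an upload. The sketch below uses the 1.3 tokens-per-word figure from the text; the 500 words-per-page value and the 20% reserve for questions and answers are assumptions for illustration. A real token counter will give different numbers for code, JSON, or non-English text.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb from the text; real ratios vary by content
WORDS_PER_PAGE = 500   # assumption: single-spaced prose


def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from a word count using the rule of thumb."""
    return round(word_count * TOKENS_PER_WORD)


def pages_to_words(pages: int) -> int:
    """Convert a page count to words under the pages assumption above."""
    return pages * WORDS_PER_PAGE


def fits_in_window(word_count: int, window_tokens: int = 1_000_000, reserve: float = 0.2) -> bool:
    """True if the estimate leaves `reserve` of the window free for the conversation."""
    return estimate_tokens(word_count) <= window_tokens * (1 - reserve)


if __name__ == "__main__":
    words = pages_to_words(1_500)
    print(words, estimate_tokens(words), fits_in_window(words))
```

Note that 750k words already estimates to 975k tokens, so a corpus of that size leaves almost no room for the conversation itself.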

Concrete fits: full novel + notes, 50–100 substantial PDFs if compressed, entire repo export for architecture review (not every binary). Always verify model tier supports 1M on your plan — see Chapter 2.

Workflow — do this next

01 Estimate token count before a mega-upload.
02 Ask: does this task need the full corpus or targeted sections?
03 Budget for cost: 1M input tokens are not free on the API.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

1M token capacity cheatsheet

~750k English words
~3–5 full business books (text only)
~50–80 long PDF reports (varies)
~1 mid-size codebase (source only, no node_modules)
~years of chat if summarised — not raw

Always measure your actual corpus with a token counter.

2.2

The lost-in-the-middle problem

Why Claude's attention degrades on information buried in the middle of a long context — and how to structure documents to counter it

Key takeaway

Models attend strongly to the beginning and end of context; middle sections get under-weighted. Long dumps without structure produce missed details.

Why this matters

Teams upload everything, ask one question, and blame the model when mid-document facts vanish.

Lost in the middle means critical clauses on page 200 of 400 may be ignored. Mitigations: reposition key facts, summarise middle sections, or retrieve relevant chunks only.

Symptoms: contradictory answers, 'I don't see that' when text is present, confident omission of mid-doc requirements.

Workflow — do this next

01 Put must-read facts in intro and recap sections.
02 For contracts: extract key clauses to a 2-page SUMMARY.md at the top of context.
03 Test with needle-in-haystack questions before trusting the workflow.
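
A needle test plants a known fact at a chosen depth in the context and checks whether the answer recovers it. The helper below is a sketch only: `plant_needle` and `passed` are hypothetical names, and the step that sends the context to Claude is left out because it depends on your setup.

```python
def plant_needle(documents: list[str], needle: str, position: float = 0.5) -> str:
    """Join documents into one context with `needle` inserted at a relative depth.

    position=0.0 puts the needle before the first document, 1.0 after the last.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    index = round(position * len(documents))
    parts = documents[:index] + [needle] + documents[index:]
    return "\n\n".join(parts)


def passed(answer: str, expected: str) -> bool:
    """Crude check: the expected fact must appear in the model's answer."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    docs = ["Report A ...", "Report B ...", "Report C ...", "Report D ..."]
    context = plant_needle(docs, "The indemnity cap is 12 months of fees.", position=0.5)
    print(context)
```

Running the same question with the needle at the start, middle, and end gives a rough picture of where your particular corpus loses facts.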

Real example

Procurement — indemnity clause missed

200-page MSA uploaded whole. Claude approved terms but missed indemnity cap in middle section. Fix: REQUIREMENTS.md listing 12 must-verify clauses at context start; ask Claude to tick each with page cite.

2.3

Document positioning strategy

Where to place the most important information in a long context — the positioning principles that preserve retrieval quality

Key takeaway

Order context deliberately: instructions first, critical facts next, supporting bulk in the middle, task and recap last.

Why this matters

Positioning is free and often beats buying more tokens.

Optimal stack: (1) system/Project instructions, (2) executive summary of all attachments, (3) full documents, (4) user question, (5) 'Before answering, list which sections you used.'
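
The five-part stack can be assembled mechanically so every team member sends context in the same order. This is a minimal sketch; the XML tag names are illustrative assumptions, and any consistent labels would serve.

```python
def build_context(instructions: str, summary: str, documents: dict[str, str], question: str) -> str:
    """Assemble a long-context prompt in the five-part order described above."""
    doc_blocks = [
        f'<document id="{doc_id}">\n{text}\n</document>'
        for doc_id, text in documents.items()
    ]
    sections = [
        f"<instructions>\n{instructions}\n</instructions>",
        f"<summary>\n{summary}\n</summary>",
        "<documents>\n" + "\n".join(doc_blocks) + "\n</documents>",
        f"<question>\n{question}\n</question>",
        "Before answering, list which sections you used.",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_context(
        instructions="Answer only from the attached documents.",
        summary="A1 covers pricing; A2 covers termination.",
        documents={"A1": "Pricing terms ...", "A2": "Termination terms ..."},
        question="What is the notice period for termination?",
    )
    print(prompt)
```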

Repeat critical constraints in the final user message — recency reinforces attention.

Workflow — do this next

01 Build a CONTEXT_ORDER template for your team.
02 Never bury the ask: put the question after the documents, or in dual position.
03 Use XML tags to label sections (Chapter 4).

2.4

Loading a codebase

How to structure an entire codebase in context for software work — the format and the order that produces the best results

Key takeaway

For codebase-in-context: exclude noise (node_modules, build artifacts), lead with ARCHITECTURE.md and tree overview, group by module, put target files last before the task.

Why this matters

Raw repo dumps waste tokens on irrelevant files and bury the module you need to change.

Prefer Claude Code for repo work when possible — it navigates natively. For Claude.ai/API: export tree + key files, or use MCP git integration.

Include: README, package manifests, entry points, types/interfaces, files under change. Exclude: lockfiles content, minified assets, generated code unless task-specific.
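
The include and exclude rules above translate into a small file filter. The sketch below is one way to do it with the standard library; the specific directory names, lockfile names, and suffixes are assumptions to adapt to your repository.

```python
from pathlib import Path

EXCLUDED_DIRS = {"node_modules", "dist", "build", ".git"}
EXCLUDED_NAMES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}
EXCLUDED_SUFFIXES = (".min.js", ".min.css", ".map")


def curate_files(root: str) -> list[str]:
    """Return sorted relative paths worth loading, skipping the noise listed above."""
    root_path = Path(root)
    kept = []
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        # Skip anything that sits inside an excluded directory.
        if EXCLUDED_DIRS.intersection(rel.parts[:-1]):
            continue
        if rel.name in EXCLUDED_NAMES or rel.name.endswith(EXCLUDED_SUFFIXES):
            continue
        kept.append(rel.as_posix())
    return sorted(kept)


if __name__ == "__main__":
    for name in curate_files("."):
        print(name)
```

The output is a paths-only list, which can double as the directory tree in the context pack; a human still decides which of those files to attach in full.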

Workflow — do this next

01 Generate a file list, for example with find . -type f -name '*.ts' | head, then curate it by hand.
02 Attach ARCHITECTURE.md written by humans first.
03 Scope task to one package/service per session.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Codebase context pack order

1. TASK + acceptance criteria
2. ARCHITECTURE.md (human-written)
3. Directory tree (paths only)
4. Shared types / API contracts
5. Files directly under change
6. Related tests
7. Restate TASK, then: "Quote file:line for every claim"

2.5

Loading a corpus of research

Feeding multiple documents and asking cross-document questions — the research workflow that used to require a dedicated tool

Key takeaway

Multi-doc synthesis works when documents are labelled, summarised at the top, and questions specify comparison dimensions — not 'tell me everything.'

Why this matters

Cross-doc questions without structure produce shallow summaries that miss disagreements between sources.

Workflow: ingest docs with consistent naming (AUTHOR_YEAR_TOPIC.md), add 5-line abstract per doc at context start, ask matrix questions ('compare methods, sample size, conclusion across docs A–F').
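
If files follow the AUTHOR_YEAR_TOPIC.md convention, the index rows can be generated from the filenames. This is a sketch under that assumption; the column layout is illustrative, and the abstracts are supplied by you, not derived from the documents.

```python
from pathlib import PurePath


def index_row(filename: str, abstract: str) -> str:
    """Turn an AUTHOR_YEAR_TOPIC.md filename into one markdown table row."""
    stem = PurePath(filename).stem
    author, year, topic = stem.split("_", 2)
    return f"| {stem} | {author} | {year} | {topic.replace('_', ' ')} | {abstract} |"


def build_corpus_index(entries: dict[str, str]) -> str:
    """Build the index table from a mapping of filename to abstract."""
    header = "| Doc ID | Author | Year | Topic | Abstract |\n|---|---|---|---|---|"
    rows = [index_row(name, abstract) for name, abstract in sorted(entries.items())]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    print(build_corpus_index({
        "SMITH_2021_price_elasticity.md": "Survey of 40 firms.",
        "LEE_2023_churn_models.md": "Compares three churn models.",
    }))
```

A filename that does not match the convention raises a ValueError at the split, which is a useful early warning that the corpus naming has drifted.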

Use artifacts for synthesis output; keep chat for methodology questions.

Workflow — do this next

01 Create CORPUS_INDEX.md with one row per source.
02 Ask for a disagreement map before a consensus summary.
03 Require citation format: [Doc ID, section].

Real example

Corp dev — 40 acquisition memos

PM indexed memos, loaded index + 12 most relevant full texts. Question: 'Which targets share regulatory risk pattern X?' Cross-doc table in artifact with cites. Work that previously took an analyst a week.

2.6

Conversation history management

When to continue a conversation and when to start fresh — the decision that affects quality as context grows

Key takeaway

Continue when thread is focused and under ~60% context; start fresh with handoff when scope shifts, quality drops, or instructions fight earlier messages.

Why this matters

Zombie threads accumulate contradictions and dilute instructions — sunk-cost fallacy keeps people in bad chats.

Fresh start triggers: new sub-project, role change in prompt, repeated corrections of same mistake, unexplained quality cliff.

Continue when: same deliverable, iterative refinement, artifact in progress, context still coherent.

Workflow — do this next

01 End sessions with a 10-line HANDOFF block.
02 Start each new chat with HANDOFF + 'confirm before proceeding'.
03 Archive old threads rather than deleting them; export conclusions to the Project.

2.7

Context compression

Summarising earlier context to preserve the window — the technique for long-running projects

Key takeaway

Compression = structured summaries that preserve decisions, open questions, and constraints — not lossy 'tl;dr' that drops nuance.

Why this matters

Long projects exceed any window without compression discipline.

Pattern: every N turns or daily, ask Claude to update STATE.md sections: Decisions, Open questions, Constraints, Next actions, Key quotes with cites. Replace raw history with STATE in new thread.

API: rolling summary in your DB — append new turns, re-summarise when summary exceeds token budget.
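
The rolling-summary pattern for API use can be sketched as follows. Everything here is an assumption for illustration: `summarise` is a hypothetical callable that wraps your own model call, the summary is held in memory where a real system would use a database, and the token count is the rough words-times-1.3 estimate.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Rough estimate using the ~1.3 tokens per word rule of thumb."""
    return round(len(text.split()) * 1.3)


class RollingSummary:
    """Append turns to a running summary and compress it when it outgrows the budget."""

    def __init__(self, summarise: Callable[[str], str], budget_tokens: int = 2000):
        self.summarise = summarise  # hypothetical: wraps your model call
        self.budget_tokens = budget_tokens
        self.summary = ""           # in production this would live in your DB

    def append_turn(self, role: str, text: str) -> None:
        self.summary = f"{self.summary}\n{role}: {text}".strip()
        if rough_token_count(self.summary) > self.budget_tokens:
            self.summary = self.summarise(self.summary)


if __name__ == "__main__":
    # Stub summariser for demonstration: keeps only the last 20 words.
    store = RollingSummary(lambda text: " ".join(text.split()[-20:]), budget_tokens=30)
    for i in range(10):
        store.append_turn("user", f"Decision {i}: ship on day {i}.")
    print(store.summary)
```

The stub summariser is lossy on purpose; a real one should follow the compression rules in this section and keep numbers, dates, and named decisions verbatim.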

Workflow — do this next

01 Define non-negotiable fields in the STATE template.
02 A human approves each compression before it becomes canonical.
03 Never compress away numbers, dates, or named decisions.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Context compression prompt

Update STATE.md from this thread. Preserve:
- All numeric decisions and dates verbatim
- Open questions (numbered)
- Constraints labelled MUST / MUST NOT
- Remove duplicate reasoning; keep conclusions

Output markdown only. Flag anything ambiguous for human review.

2.8

The 1 million token mindset shift

The work that becomes possible when you stop thinking in single-document chunks — the workflow transformation

Key takeaway

1M context enables portfolio thinking — whole codebases, corpuses, deal rooms — but rewards architects who curate and position, not hoarders who dump.

Why this matters

Mindset shift: from 'what fits in one prompt' to 'what system of evidence supports this decision.'

New workflows: full-library code review, multi-contract comparison, longitudinal chat analysis, cross-team doc harmonisation — tasks that required teams of analysts or bespoke tools.

Still combine with verification, chunking for edge precision, and external RAG when corpuses exceed 1M or need real-time updates.

Workflow — do this next

01 List one task you previously chunked manually and try it in 1M context with structure.
02 Measure quality vs cost vs latency.
03 Document when to use 1M vs retrieval as a decision tree in the Project.

Real example

Compliance — annual policy harmonisation

12 policy PDFs loaded with index. Claude produced conflict matrix across jurisdictions. Legal reviewed matrix, not 400 pages. 1M window + positioning beat six weeks of associate time — with human sign-off on conflicts only.

Concept 3

Claude as a RAG System — Hidden Architecture

Using Claude's native features to build retrieval-augmented workflows without external infrastructure

3.1

What RAG means in a Claude context

Using Claude's context window as a retrieval layer without a vector database

Key takeaway

Native Claude RAG = curated documents in context + prompts that force grounded answers — no Pinecone required for many knowledge-work tasks.

Why this matters

Teams over-build vector DBs before exhausting Project-based retrieval. Claude's window is the retrieval layer when corpus fits and updates are infrequent.

RAG traditionally: embed chunks, vector search, inject top-k. In Claude.ai: you are the retriever — attach the right files, structure the ask, demand citations.

Works when: corpus < context limit, updates weekly not per-second, team can curate files. Breaks when: millions of docs, strict ACL per chunk, sub-second fresh data.

Workflow — do this next

01 Try Project RAG before proposing vector infra.
02 Define success: citation accuracy on 10 test questions.
03 If it fails, note whether size, freshness, or ACL caused it.

3.2

Document upload as retrieval

How to structure uploaded documents so Claude retrieves from them accurately

Key takeaway

Retrieval-quality uploads have: descriptive filenames, headings, page/section markers, and a top-of-file summary — Claude reads like a human skimmer, not a DB.

Why this matters

Unstructured PDF exports are retrieval poison.

Before upload: add cover page with doc ID, date, 5-bullet abstract. Use H1/H2 hierarchy. For scans, OCR with structure preserved.

Prompt pattern: 'Answer using only [DOC_ID]. Quote section headers. If not found, say NOT IN SOURCE.'

Workflow — do this next

01 Rename files: ROLE_TOPIC_vDATE.ext
02 Add a 10-line summary file per large upload.
03 Run 3 needle tests after upload.

3.3

Multi-document synthesis

Loading multiple sources and asking Claude to compare, synthesise, and reason across all of them

Key takeaway

Multi-doc synthesis needs an index layer, explicit comparison axes, and output format that forces per-source attribution.

Why this matters

Without axes, Claude averages sources into mushy consensus.

Load CORPUS_INDEX first, then full texts or summaries. Ask: 'Build comparison table: Source | Claim | Evidence | Conflicts with.'

For conflicting sources, instruct: 'Do not merge — list disagreement explicitly.'

Workflow — do this next

01 Cap active full texts; summarise peripheral docs.
02 One synthesis question per thread.
03 Export the matrix to an artifact; verify 3 random cells.

Real example

Strategy — three analyst reports

CEO wanted one view of market size. Three reports disagreed. Claude produced attributed table — not blended number. CEO picked assumption set consciously. Native RAG + synthesis prompt avoided false precision.

3.4

The project-as-knowledge-base pattern

Using a Claude Project with uploaded documents as a persistent knowledge base

Key takeaway

A Project with INDEX.md, curated files, and retrieval prompts functions as a zero-code knowledge base for a role or domain.

Why this matters

Cheapest path to team-wide grounded Q&A without engineering sprint.

Structure: INDEX (what's here), POLICY (how to answer), CORPUS (files), PROMPTS (saved question templates). New chat always starts in same Project.

Maintenance: the owner reviews uploads monthly and moves deprecated files to an ARCHIVE/ listing in INDEX.

Workflow — do this next

01 Clone a template Project per domain (Legal KB, Product KB).
02 Add 5 canonical docs before inviting the team.
03 Pin the retrieval prompt in the Project description.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Project knowledge base INDEX.md

# [Domain] Knowledge Base
Last reviewed: [DATE]
Owner: [name]

## Files
| ID | File | Summary | Use when |
|----|------|---------|----------|
| A1 | pricing_2025.pdf | … | pricing questions |

## Retrieval rules
- Cite file ID + section
- NOT IN SOURCE if missing
- Escalate legal to [human]

3.5

Chunking for Claude

How to split large documents before upload for better retrieval quality

Key takeaway

Chunk by semantic boundary (chapter, clause, module) — not arbitrary token splits — with chunk headers that repeat parent context.

Why this matters

Arbitrary 512-token chunks lose legal and technical meaning.

Each chunk file: CHUNK_META (parent doc, section, date) + content. INDEX maps questions → chunk IDs.

For API at scale, mirror same boundaries in vector DB — semantic chunks beat random splits.
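
Splitting at semantic boundaries can be automated for markdown sources. The sketch below splits at H1/H2 headings and prefixes each chunk with its parent document and section title. It assumes the source is markdown with `#` headings; the `From <doc>, <section>:` prefix follows the chunk-header pattern in this section, and `chunk_by_heading` is a hypothetical helper name.

```python
import re

HEADING = re.compile(r"^#{1,2}\s+(.*)")  # H1 and H2 only; deeper headings stay in the chunk


def chunk_by_heading(markdown: str, parent_doc: str) -> list[str]:
    """Split at H1/H2 headings and prefix each chunk with its parent context."""
    sections: list[tuple[str, list[str]]] = [("Preamble", [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)
    chunks = []
    for title, lines in sections:
        body = "\n".join(lines).strip()
        if body:  # drop headings that have no content of their own
            chunks.append(f"From {parent_doc}, {title}:\n{body}")
    return chunks


if __name__ == "__main__":
    text = "# Contract\nIntro text\n## 4.2 Indemnity\nCap is 12 months.\n### Detail\nMore."
    for chunk in chunk_by_heading(text, "CONTRACT_X"):
        print(chunk, end="\n---\n")
```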

Workflow — do this next

01 Split at H1/H2 boundaries.
02 Prefix each chunk: 'From CONTRACT_X, Section 4.2 Indemnity:'
03 Load only relevant chunks when the corpus exceeds the window.

3.6

Native RAG vs external RAG

When Claude's built-in context window replaces a vector database and when you need the real thing

Key takeaway

Native wins for curated, bounded, human-maintained knowledge. External RAG wins for huge, dynamic, permissioned, or embedding-optimised corpora.

Why this matters

Wrong choice wastes months — vector DB for 12 PDFs, or Projects for 10M support tickets.

Choose native Project RAG: <100 docs, same team access, weekly updates, Q&A workflow.

Choose external RAG: per-user ACL, a corpus larger than 1M tokens, sub-minute data freshness, production customer-facing at scale, hybrid keyword+semantic needs.

Workflow — do this next

01 Score the corpus on size, freshness, ACL, and query volume.
02 Prototype native in one afternoon.
03 Go external only if the scorecard fails 2+ criteria.
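
The scorecard can be expressed as a short function. This is a sketch with loudly assumed thresholds: the 1M-token window, the 24-hour freshness cut-off, and the use of "customer-facing" as a stand-in for query volume are illustrative choices. Only the rule of going external at two or more failures comes from the workflow above.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    tokens: int                   # measured with a token counter
    update_interval_hours: float  # how often the content changes
    needs_per_user_acl: bool
    customer_facing: bool         # assumption: proxy for high query volume


def failed_criteria(corpus: Corpus, window_tokens: int = 1_000_000) -> list[str]:
    """List which scorecard criteria rule out native Project RAG."""
    failures = []
    if corpus.tokens > window_tokens:
        failures.append("size")
    if corpus.update_interval_hours < 24:  # assumed threshold for "too fresh"
        failures.append("freshness")
    if corpus.needs_per_user_acl:
        failures.append("ACL")
    if corpus.customer_facing:
        failures.append("query volume")
    return failures


def recommend(corpus: Corpus) -> str:
    """External only if two or more criteria fail; otherwise start native."""
    return "external" if len(failed_criteria(corpus)) >= 2 else "native"


if __name__ == "__main__":
    policies = Corpus(tokens=200_000, update_interval_hours=168, needs_per_user_acl=False, customer_facing=False)
    tickets = Corpus(tokens=50_000_000, update_interval_hours=0.01, needs_per_user_acl=True, customer_facing=True)
    print(recommend(policies), recommend(tickets))
```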

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Native vs external RAG matrix

NATIVE (Project) if:
□ Corpus fits in context with room for Q&A
□ Shared or single-user access OK
□ Updates manual/weekly
□ Internal power users

EXTERNAL if:
□ Millions of chunks or strict per-user ACL
□ Real-time DB / ticket stream
□ Customer-facing SLA + logging
□ Need hybrid search / reranking pipeline

3.7

The knowledge base Project

Building a Claude Project that functions as an always-available knowledge base for a domain or role

Key takeaway

A KB Project is a product: INDEX, owners, test questions, retrieval constitution, and changelog — not a folder of uploads.

Why this matters

KB Projects without governance become stale and untrusted.

Rollout: define 20 canonical questions, gather source docs, build INDEX, write retrieval constitution, test pass rate ≥90% on canonical set, train team on prompts, schedule quarterly audit.

Pair with Chapter 4 prompt templates for consistent retrieval behaviour.

Workflow — do this next

01 Write KB_CHARTER.md: scope, owner, review cadence.
02 Run the canonical question test suite before launch.
03 Log failed questions, then fix the doc or the prompt rather than blaming the user.

Real example

HR — internal policy KB Project

40 policies chunked, INDEX maintained by HR ops. Employees ask leave, equity, travel questions. Retrieval prompt requires policy ID cite. Escalation path for edge cases. IT ticket volume for basic HR questions down 35%.

3.8

Hidden retrieval hacks

The prompt structures, document formats, and context positioning techniques that improve Claude's retrieval from long documents

Key takeaway

Retrieval hacks: quote-and-answer, section IDs, dual-position questions, negative scope ('ignore appendix'), markdown tables for facts, and forced uncertainty.

Why this matters

Small prompt and format changes can markedly improve retrieval accuracy without new infrastructure; measure the gain on your own test questions.

Hack 1: 'List relevant sections first, then answer' — forces browse before generate. Hack 2: Number every section in source; ask for §3.2. Hack 3: Put question after docs AND repeat in final line.

Hack 4: Markdown tables beat prose for numeric facts. Hack 5: 'If confidence < high, say UNCERTAIN' reduces false retrieval.
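
Hack 2 is easy to apply automatically when documents are in markdown. The sketch below prefixes each heading with a hierarchical section number so prompts can ask for §3.2 by ID. It assumes `#`-style headings and does not skip fenced code blocks, so a `#` comment inside a code fence would be numbered too.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def number_headings(markdown: str) -> str:
    """Prefix markdown headings with hierarchical section numbers (§1, §1.1, ...)."""
    counters: list[int] = []
    out = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            out.append(line)
            continue
        level = len(match.group(1))
        # Trim or pad the counter stack to this heading's depth, then increment.
        counters = counters[:level] + [0] * (level - len(counters))
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        out.append(f"{match.group(1)} §{number} {match.group(2)}")
    return "\n".join(out)


if __name__ == "__main__":
    print(number_headings("# Scope\ntext\n## Parties\n## Term\n# Fees"))
```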

Workflow — do this next

01 Add a 'sections used' step to every retrieval template.
02 Number headings in source docs automatically on ingest.
03 A/B test with and without hacks on 10 questions.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Retrieval prompt shell

SOURCES: [attached / Project files]
RULES:
1. List section IDs you'll use (before answering)
2. Quote then interpret
3. NOT IN SOURCE if missing
4. Ignore: [appendix, marketing fluff]

QUESTION: [your question]

Repeat question: [your question]

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Session handoff (memory-aware)

End every long thread; start the next with this block.

## Handoff — [DATE] [PROJECT]

### Decisions (verbatim numbers/dates)
-

### Open questions
-

### Layer map
- In Project files: [list]
- Updated STATE.md: Y/N

### Next chat first message
"Read HANDOFF below. Confirm. Then [task]."

Knowledge base canonical test set

20 questions every KB Project must answer with cites.

For each question record:
- Q#
- Expected source doc ID
- Pass: cite correct / Fail: reason
- Fix: [doc | prompt | chunk]

Launch threshold: 18/20 pass before team rollout.
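
Scoring the canonical test set can be scripted once results are recorded. This is a minimal sketch: `KBResult` and its fields are hypothetical names mirroring the record above, and a pass is simplified to "the cited doc ID matches the expected one". Whether the cited section is correct still needs a human check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KBResult:
    question_id: str
    expected_doc_id: str
    cited_doc_id: Optional[str]  # None when the answer carried no cite


def count_passes(results: list[KBResult]) -> int:
    """Count questions whose cited doc ID matches the expected source."""
    return sum(1 for r in results if r.cited_doc_id == r.expected_doc_id)


def ready_to_launch(results: list[KBResult], required: int = 18) -> bool:
    """Apply the launch threshold: at least `required` passes."""
    return count_passes(results) >= required


def failures(results: list[KBResult]) -> list[str]:
    """Question IDs to investigate: fix the doc, the prompt, or the chunking."""
    return [r.question_id for r in results if r.cited_doc_id != r.expected_doc_id]


if __name__ == "__main__":
    results = [KBResult(f"Q{i}", "A1", "A1") for i in range(1, 19)]
    results += [KBResult("Q19", "A2", None), KBResult("Q20", "A3", "A1")]
    print(count_passes(results), ready_to_launch(results), failures(results))
```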

Legal ops — deal room to Project KB

A 60-person company ran M&A diligence in ad hoc chats. Associates re-uploaded the same data room PDFs; partners got inconsistent clause summaries; one chat referenced the wrong target.

Before

No memory layers, no INDEX, full PDF dumps, questions without cite requirements.

After

Per-deal Project: INDEX, chunk files by section, retrieval prompt shell (3.8), weekly STATE compression. Native RAG for diligence; external vector only for historical deal search across 500+ deals.

Associate re-upload time → down 70%
Clause summary cite accuracy → 94% on audit sample
Wrong-target context incidents → 2 in H1 to 0 in H2
Vector DB scope → deferred; saved 6-week eng project

What goes wrong

Mistake: Treating 1M context as perfect memory, so middle content gets ignored.
Fix: Positioning strategy (2.3) plus lost-in-the-middle awareness (2.2).

Mistake: Storing company facts only in the Claude memory system.
Fix: Canonical docs in Project files; memory for prefs only.

Mistake: Building Pinecone for 15 static policy PDFs.
Fix: Native Project KB (3.4–3.7); re-evaluate at the scale triggers in 3.6.

Mistake: Never starting fresh threads, so zombie context rots quality.
Fix: Handoff + compression (2.6–2.7); fresh chat with STATE.

Vetted by Krishna Kumar, Curator, FactorBeam
