
I Beat the AI Memory Limit by Writing Notes to Files
TL;DR:
- AI models forget after a few thousand tokens. I solved it by writing context to files.
- I built a loop that reads a context file, maintains a to-do list, and stores insights as the AI processes a folder of transcripts.
- I ran it on 50 transcripts in about 30 minutes with Claude Code; the same loop works with Codex and Gemini CLI.
- The trick works on any model that can read/write local files.
- Follow my recipe and you'll stop losing progress mid-analysis and sharply cut down on hallucinations.
Why this matters

Every data scientist or AI engineer I talk to tells me the same thing: "The AI stops remembering after a while, and then it starts hallucinating." When a model's context window reaches its ceiling, the conversation resets, and all the work you've built up disappears. That's why I've seen people abandon large transcript projects, or spend hours re-introducing context. It hurts productivity, costs money, and makes it hard to keep a coherent analysis over long sessions.
Core concepts

The root problem is simple: an AI model has a hard limit on how many tokens it can keep in its working memory. Claude, for instance, can hold about 200k tokens in a chat before it has to trim older messages, which is why the official docs cap uploads at 20 files per chat, 30 MB each (Claude — File upload limits, 2025). If you cram a 200-page PDF's worth of text into one file, the AI only looks at a fraction of it: the same docs say it "processes only about 25% of a concatenated large file" (Claude — File upload limits, 2025).
What most people miss is that you can externalize that memory. Think of a notebook: every time you finish a task you jot down a note, and next time you open the notebook to remind yourself where you left off. I brought that idea to AI by keeping three Markdown files in the project folder:
| File | Purpose | How it’s used |
|---|---|---|
| context.md | Your goals and the big picture | The AI reads this at the start of every session and again after every reset |
| todos.md | A checklist of subtasks | After each round the AI checks off the items it’s done |
| insights.md | All the key take-aways | The AI appends new insights each time it processes a document |
I call them the memory trio. They live in the same directory that holds the transcripts, so the AI can cat them or write to them via the memory tool.
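To make the trio concrete, here's a minimal Python sketch that bootstraps the three files with the starter content used later in this article. The filenames match the table above; the helper name `init_memory_trio` is my own, not part of any tool.

```python
from pathlib import Path

def init_memory_trio(folder: str) -> None:
    """Create the three memory files in the transcript folder, if missing."""
    root = Path(folder)
    root.mkdir(parents=True, exist_ok=True)
    starters = {
        "context.md": "# Goal: Summarize all transcripts, pull pain points, generate FAQs.\n",
        "todos.md": (
            "- [ ] Load all transcript files\n"
            "- [ ] Extract main ideas\n"
            "- [ ] Identify churn indicators\n"
            "- [ ] Write FAQs\n"
        ),
        "insights.md": "",
    }
    for name, content in starters.items():
        path = root / name
        if not path.exists():  # never clobber notes from a previous run
            path.write_text(content, encoding="utf-8")

init_memory_trio("synthetic-transcripts")
```

Because the helper skips files that already exist, it's safe to re-run at the top of every session.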
The magic comes from the memory tool that Claude exposes. The docs say it “creates, reads, updates, and deletes files in the /memories directory to store what it learns while working, then references those memories in future conversations” Claude — Memory tool (2025). When the chat hits its token ceiling and clears, the tool automatically loads the files again, so you never lose your progress.
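You don't strictly need Claude's memory tool to try the idea: any model that can run code or shell commands can emulate the same create/read/update/delete behavior on a memories directory. Here's a toy stand-in (my own class, not Anthropic's API) showing how little is involved:

```python
from pathlib import Path

class MemoryStore:
    """Toy emulation of a memory tool: CRUD on files in one directory."""

    def __init__(self, root: str = "memories"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def write(self, name: str, text: str) -> None:
        (self.root / name).write_text(text, encoding="utf-8")

    def append(self, name: str, text: str) -> None:
        with (self.root / name).open("a", encoding="utf-8") as f:
            f.write(text)

    def read(self, name: str) -> str:
        path = self.root / name
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def delete(self, name: str) -> None:
        (self.root / name).unlink(missing_ok=True)
```

The point is that the "memory" is just files on disk, which is exactly why it survives a context reset.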
How to apply it

Here's the exact recipe I use on any of the three platforms (Claude Code, Gemini CLI, or Codex). The steps are identical; only the CLI command changes.
Prepare your folder

```shell
mkdir synthetic-transcripts
cd synthetic-transcripts
```

Put all your PDFs, emails, tickets, and CSVs here.
Create the memory trio

```markdown
# context.md
# Goal: Summarize all transcripts, pull pain points, generate FAQs.

# todos.md
- [ ] Load all transcript files
- [ ] Extract main ideas
- [ ] Identify churn indicators
- [ ] Write FAQs

# insights.md
```

Launch the AI

Claude Code:

```shell
claude-code --model Opus-4.5 --memory-dir . --loop
```

Gemini CLI:

```shell
gemini-cli --model Gemini-3-Flash --memory-dir . --loop
```

Codex:

```shell
codex-cli --model gpt-4o --memory-dir . --loop
```

The --loop flag tells the tool to keep running until all todos.md items are checked off.
The AI’s routine
- Reads context.md to understand the mission.
- Picks the next unchecked item in todos.md.
- Processes the relevant files, writes new insights to insights.md.
- Checks off the item in todos.md.
- If the context window grows too large, the tool sends /clear (or /reset for Gemini) to reset the conversation while keeping the memory files untouched. The AI then rereads context.md automatically.
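The routine above is easy to sketch in Python. Everything here is illustrative: `process` stands in for the actual model call, and the checkbox handling assumes the GitHub-style `- [ ]` syntax shown in todos.md.

```python
import re
from pathlib import Path

def next_unchecked(todos: str):
    """Return the first '- [ ]' item, or None when everything is done."""
    m = re.search(r"^- \[ \] (.+)$", todos, flags=re.M)
    return m.group(1) if m else None

def check_off(todos: str, item: str) -> str:
    """Flip one '- [ ]' checkbox to '- [x]'."""
    return todos.replace(f"- [ ] {item}", f"- [x] {item}", 1)

def run_loop(folder: str, process) -> None:
    root = Path(folder)
    # Read the mission once up front (and again after any reset).
    context = (root / "context.md").read_text()
    while True:
        todos = (root / "todos.md").read_text()
        item = next_unchecked(todos)
        if item is None:
            break  # all tasks checked off: the loop is done
        insight = process(context, item)  # the model call goes here
        with (root / "insights.md").open("a") as f:
            f.write(f"- {insight}\n")
        (root / "todos.md").write_text(check_off(todos, item))
```

Because every iteration persists its result before checking off the task, a reset in the middle of the run loses at most one in-flight item.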
Review and iterate

After the loop ends, open insights.md. You'll see a clean list of extracted pain points, FAQs, churn signals, and feature ideas. I normally open it, copy the most valuable insights into a product backlog, and let the AI generate follow-up questions for any unclear points.
Metrics

In a recent run on 50 transcripts (~200 MB total) I processed everything in 30 minutes using Claude Code. When I switched to Gemini CLI I got the same throughput in 35 minutes, and Codex took 45 minutes but had richer code-related insights. The entire workflow costs less than $10 in compute, compared to the $50–$100 you'd pay if you had to run a full 200k-token context on a paid model for each document.
Pitfalls & edge cases
- File size: Keep each file under 30 MB. If you have a huge PDF, split it.
- Memory reset timing: If you forget to use /clear on Claude Code, the AI might keep a half-finished context and hallucinate.
- Permission errors: The memory tool writes to the local folder; make sure you have write access.
- Non-text content: PDFs over 100 pages are parsed for text only. Run OCR first if the information you need lives in images or scans.
- Tool mismatches: Gemini CLI uses /reset instead of /clear. Double-check your command.
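For the file-size pitfall, here's a naive splitter that keeps every chunk under the 30 MB ceiling. It cuts on raw byte boundaries for simplicity; in practice you'd want to split on page or paragraph breaks instead.

```python
from pathlib import Path

MAX_BYTES = 30 * 1024 * 1024  # the 30 MB per-file ceiling cited above

def split_file(path: str, max_bytes: int = MAX_BYTES):
    """Split an oversized file into numbered parts, each under the limit."""
    src = Path(path)
    data = src.read_bytes()
    if len(data) <= max_bytes:
        return [src]  # already small enough, nothing to do
    parts = []
    for i in range(0, len(data), max_bytes):
        part = src.with_name(f"{src.stem}_part{i // max_bytes + 1}{src.suffix}")
        part.write_bytes(data[i:i + max_bytes])
        parts.append(part)
    return parts
```

The numbered `_partN` names keep the pieces sorted next to each other, so the loop processes them in order.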
Quick FAQ
| Q | A |
|---|---|
| What if my transcript folder is bigger than 600 MB? | Split the folder into subfolders and run the loop on each. |
| Can I use this with my own custom LLM? | Yes, as long as it exposes a memory-tool or file-write capability. |
| Will the AI remember insights across runs? | Yes, because they’re stored in insights.md which the AI reads on every reset. |
| How do I stop hallucinations? | Follow the "Reduce hallucinations" guide in the Claude docs and keep the context window small. |
| Do I need to write code? | No. All commands are simple CLI invocations; the AI does the heavy lifting. |
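For the first FAQ answer, a small driver can run the loop once per subfolder. The CLI invocation here mirrors the article's own examples and may differ on your setup, so the command is passed in as a parameter for easy substitution.

```python
import subprocess
from pathlib import Path

def run_per_subfolder(root: str, cmd: list):
    """Run the given CLI command once for each subfolder of root.

    cmd is the base invocation, e.g. ["claude-code", "--model", "Opus-4.5"];
    the per-folder flags are appended, matching the article's examples.
    """
    done = []
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            subprocess.run(cmd + ["--memory-dir", str(sub), "--loop"], check=True)
            done.append(sub.name)
    return done
```

Each subfolder gets its own memory trio, so the runs stay independent and any one of them can be resumed later.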
Conclusion

If you've ever felt the frustration of an AI "forgetting" mid-analysis, try externalizing memory with a small set of Markdown notes. The approach is platform-agnostic: it works with Claude, Gemini, Codex, or any tool that can write files. It scales to hundreds of documents, keeps hallucinations at bay, and lets you spend less time re-introducing context. The next step: grab a folder of transcripts, create your context.md, and fire up the loop. Your AI will thank you with a clean, uninterrupted insight stream.




