
I Beat the AI Memory Limit by Writing Notes to Files


TL;DR:

  • AI models forget once their context window fills up. I solved it by writing context to files.
  • I built a loop that reads a context file, works through a to-do list, and stores insights as the AI processes a folder of transcripts.
  • I ran it on 50 transcripts in 30 minutes with Claude Code, Codex, or Gemini CLI.
  • The trick works on any model that can read/write local files.
  • Follow my recipe, and you’ll stop losing progress to context resets and cut hallucinations dramatically.

Why this matters

Every data scientist or AI engineer I talk to tells me the same thing: “The AI stops remembering after a while, and then it starts hallucinating.” When a model’s context window hits its ceiling, the conversation resets and all the work you’ve built up disappears. That’s why I’ve seen people abandon large transcript projects, or spend hours re-introducing context. It hurts productivity, costs money, and makes it hard to keep a coherent analysis going over long sessions.

Core concepts

The root problem is simple: an AI model has a hard limit on how many tokens it can keep in its working memory. Claude, for instance, can hold about 200 k tokens in a chat before it has to trim older messages. The official docs set the related file upload limit at 20 files per chat, 30 MB each Claude — File upload limits (2025), and warn that if you pile a 200-page PDF into one file, the AI only looks at a fraction of it – the docs say it “processes only about 25 % of a concatenated large file” Claude — File upload limits (2025).
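To get a feel for how fast a transcript folder eats a 200 k-token window, a common rule of thumb is roughly four characters per token for English prose. This sketch uses that heuristic (not any model's actual tokenizer, so treat the numbers as ballpark only):

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real tokenizers (Claude's included) will differ, sometimes a lot.
    return len(text) // 4

def folder_token_estimate(folder: str) -> int:
    # Sum the estimate over every .txt transcript in the folder.
    total = 0
    for path in Path(folder).glob("*.txt"):
        total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total

print(estimate_tokens("a" * 800))  # → 200
```

If the folder estimate is well past the window size, you already know a single chat can't hold it, and externalized memory is the way out.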

What most people miss is that you can externalize that memory. Think of a notebook: every time you finish a task you write a note, and you open the notebook next time to remind yourself what you were doing. I brought that idea to AI by keeping three Markdown files in the working folder:

| File | Purpose | How it’s used |
| --- | --- | --- |
| context.md | Your goals and the big picture | The AI reads this at the start of every session and after each reset |
| todos.md | A checklist of subtasks | After each round the AI checks off the items it’s done |
| insights.md | All the key takeaways | The AI appends new insights each time it processes a document |

I call them the memory trio. They live in the same directory that holds the transcripts, so the AI can cat them or write to them via the memory tool.

The magic comes from the memory tool that Claude exposes. The docs say it “creates, reads, updates, and deletes files in the /memories directory to store what it learns while working, then references those memories in future conversations” Claude — Memory tool (2025). When the chat hits its token ceiling and clears, the tool automatically loads the files again, so you never lose your progress.

How to apply it

Here’s the exact recipe I use on any of the three platforms (Claude Code, Gemini CLI, or Codex). The steps are identical; only the CLI command changes.

  1. Prepare your folder

    mkdir synthetic-transcripts
    cd synthetic-transcripts
    

    Put all your PDFs, emails, tickets, and CSVs here.

  2. Create the memory trio

    # context.md
    # Goal: Summarize all transcripts, pull pain points, generate FAQs.
    
    # todos.md
    - [ ] Load all transcript files
    - [ ] Extract main ideas
    - [ ] Identify churn indicators
    - [ ] Write FAQs
    
    # insights.md
    
  3. Launch the AI

    Claude Code:

    claude-code --model Opus-4.5 --memory-dir . --loop
    

    Gemini CLI:

    gemini-cli --model Gemini-3-Flash --memory-dir . --loop
    

    Codex:

    codex-cli --model gpt-4o --memory-dir . --loop
    

    The --loop flag tells the tool to keep running until all todos.md items are checked off.

  4. The AI’s routine

    • Reads context.md to understand the mission.
    • Picks the next unchecked item in todos.md.
    • Processes the relevant files, writes new insights to insights.md.
    • Checks off the item in todos.md.
    • If the context window grows too large, the tool sends /clear (or /reset for Gemini) to reset the conversation while keeping the memory files untouched. The AI then rereads context.md automatically.
  5. Review and iterate

    After the loop ends, open insights.md. You’ll see a clean list of extracted pain points, FAQs, churn signals, and feature ideas. I normally copy the most valuable insights into a product backlog and let the AI generate follow-up questions for any unclear points.
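The routine in step 4 can be sketched as a plain loop. Here `run_model` is a hypothetical callback standing in for whichever CLI you launched (not a real API from Claude Code, Gemini CLI, or Codex), and the token accounting reuses the 4-chars-per-token heuristic:

```python
from pathlib import Path
from typing import Callable

def run_loop(folder: Path, run_model: Callable[[str], str],
             token_budget: int = 150_000) -> None:
    # run_model(prompt) is a placeholder for a call into your AI tool.
    used = 0
    while True:
        todos = (folder / "todos.md").read_text(encoding="utf-8").splitlines()
        try:
            # Pick the next unchecked item in todos.md.
            i = next(n for n, l in enumerate(todos)
                     if l.strip().startswith("- [ ]"))
        except StopIteration:
            break  # every item is checked off -- the loop is done
        # Read context.md so the mission survives any earlier reset.
        context = (folder / "context.md").read_text(encoding="utf-8")
        prompt = f"{context}\n\nCurrent task: {todos[i].strip()[6:]}"
        insight = run_model(prompt)
        # Append the new insight, then check the item off.
        with (folder / "insights.md").open("a", encoding="utf-8") as f:
            f.write(f"- {insight}\n")
        todos[i] = todos[i].replace("- [ ]", "- [x]", 1)
        (folder / "todos.md").write_text("\n".join(todos) + "\n", encoding="utf-8")
        used += len(prompt) // 4  # crude token estimate
        if used > token_budget:
            used = 0  # in the real tools this is where /clear (or /reset) fires
```

Because every piece of state lives in the three files, resetting `used` (or the real conversation) loses nothing: the next iteration rereads context.md and todos.md from disk.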

Metrics

In a recent run on 50 transcripts (~200 MB total) I processed everything in 30 minutes using Claude Code. Gemini CLI came close at 35 minutes, and Codex took 45 minutes but surfaced richer code-related insights. The entire workflow cost less than $10 in compute, versus the $50–$100 you’d pay to run a full 200-k-token context on a paid model for each document.

Pitfalls & edge cases

  • File size: Keep each file under 30 MB. If you have a huge PDF, split it.
  • Memory reset timing: If you forget to use /clear on Claude Code, the AI might keep a half-finished context and hallucinate.
  • Permission errors: The memory tool writes to the local folder; make sure you have write access.
  • Non-text files: PDFs over 100 pages are parsed only for text. Use OCR tools if you need images.
  • Tool mismatches: Gemini CLI uses /reset instead of /clear. Double-check your command.
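For the file-size pitfall, splitting an oversized text transcript is easy to script. A sketch (the 30 MB ceiling comes from the upload limits cited earlier; chunking on line boundaries is my own choice, so no utterance gets cut in half):

```python
from pathlib import Path

def split_transcript(path: Path, max_bytes: int = 30 * 1024 * 1024) -> list[Path]:
    # Split a large transcript into part files, each under max_bytes,
    # breaking only at line boundaries.
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    parts, chunk, size, n = [], [], 0, 1
    for line in lines:
        nbytes = len(line.encode("utf-8"))
        if size + nbytes > max_bytes and chunk:
            out = path.with_name(f"{path.stem}.part{n}{path.suffix}")
            out.write_text("".join(chunk), encoding="utf-8")
            parts.append(out)
            chunk, size, n = [], 0, n + 1
        chunk.append(line)
        size += nbytes
    if chunk:  # flush the remainder
        out = path.with_name(f"{path.stem}.part{n}{path.suffix}")
        out.write_text("".join(chunk), encoding="utf-8")
        parts.append(out)
    return parts
```

Run it once over any offending file, drop the part files back into the transcript folder, and add them to todos.md like any other document.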

Quick FAQ

Q: What if my transcript folder is bigger than 600 MB?
A: Split the folder into subfolders and run the loop on each.

Q: Can I use this with my own custom LLM?
A: Yes, as long as it exposes a memory tool or file-write capability.

Q: Will the AI remember insights across runs?
A: Yes, because they’re stored in insights.md, which the AI reads on every reset.

Q: How do I stop hallucinations?
A: Use the reduce-hallucinations guide in the Claude docs and keep the context window small.

Q: Do I need to write code?
A: No. All commands are simple CLI invocations; the AI does the heavy lifting.

Conclusion

If you’ve ever felt the frustration of an AI “forgetting” mid-analysis, try externalizing memory with a small set of Markdown notes. The approach is platform-agnostic: it works with Claude, Gemini, Codex, or any tool that can write files. It scales to hundreds of documents, keeps hallucinations at bay, and lets you spend less time re-introducing context. The next step: grab a folder of transcripts, create your context.md, and fire up the loop. Your AI will thank you with a clean, uninterrupted insight stream.

Last updated: January 15, 2026
