
Master context engineering for AI agents with a step-by-step playbook covering system prompts, hooks, sub-agents, memory, and compaction for building reliable agents.
Mastering Context Engineering for AI Agents: A Practical Playbook
Published by Brav
TL;DR
- Context rot and attention scarcity kill agent performance.
- Three pillars—System Prompt, Hooks, Progressive Disclosure—turn chaos into order.
- Sub-agents and compaction keep long sessions snappy.
- Memory files preserve identity across sessions.
- Avoid token bloat with step-by-step guidelines.
Why This Matters
Every time I run a long conversation with an AI agent, I feel like the model forgets my last instruction. One day it was generating random code; the next it missed a critical requirement. It turns out the culprit is the context window—an invisible memory bank that can hold only so many tokens before the model starts forgetting. Context rot happens when that bank fills up: performance degrades, the agent skips details, and you lose trust. The bigger the context, the less the model can focus on what matters. Redis — Context Rot (2023)
The same problem appears when you overload the prompt with unrelated docs, tool lists, or system logs. The model then suffers from attention scarcity: every token competes for a fixed budget, and irrelevant noise pushes important info out of range. OpenAI — Conversation State Documentation (2024)
These symptoms—context rot, attention scarcity, forgetting details—are why context engineering is the most important skill for AI agents. Anthropic — System Prompt Documentation (2023)
Core Concepts
- Context Window – The number of tokens the model can see at once. Claude Code sub-agents each get their own 200k-token window, separate from the main agent's. Think of it as a backpack that can only hold a fixed amount of gear: if you cram too much in, the model can't carry the essentials. Claude Code — Subagents Documentation (2025)
- Context Rot – The gradual loss of performance as the prompt grows. Long, untrimmed histories become noisy. It’s like trying to remember a story after hearing it over and over with background chatter. Redis — Context Rot (2023)
- Attention Scarcity – The model’s attention budget is shared across all tokens. When you feed it too many irrelevant items, it can’t focus on the key points. OpenAI — Conversation State Documentation (2024)
- System Prompt – The agent’s “instruction sheet.” It sets the role, environment, and output style, so the model doesn’t waste tokens guessing what to do. Anthropic — System Prompt Documentation (2023)
- Tool Definitions – A catalog of what the agent can call. Imagine a menu that the agent consults before making a call. Claude Code — Plugins Reference (2025)
- Hooks – Small scripts that run automatically when the agent stops or starts. They let you inject fresh context or update memory without cluttering the prompt. Claude Code — Hooks Documentation (2025)
- Progressive Disclosure – Load only the tools you need for the current step. It’s like opening a toolbox only when you need a particular tool, not the entire set. Claude Code — Progressive Disclosure (2025)
- Sub-Agents – Specialized mini-agents with their own context windows. They can handle a specific task while keeping the main agent lean. Claude Code — Subagents Documentation (2025)
- Memory System – Persistent storage that remembers what happened across sessions, so the agent behaves consistently. Think of it as a notebook that the agent can write in and read from. Claude Code — Memory System Medium article (2024)
- Compaction – Trimming conversation history while keeping key details. It’s like summarizing a long meeting into a short note. Claude Code — Slash Commands Documentation (2025)
- Task Management – Tracking what tasks are pending, in progress, or completed. This prevents the agent from running in circles. DeepWiki — Task Management System (2024)
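The progressive-disclosure idea above can be sketched as a lazy tool registry: definitions are registered as cheap factories and only materialized, and thus only counted against the context budget, when a step actually needs them. All names here are hypothetical, not a Claude Code API.

```python
# Progressive-disclosure sketch: tools are registered as factories and only
# materialized (added to the prompt) when a step needs them.
class ToolRegistry:
    def __init__(self):
        self._factories = {}   # name -> factory producing a tool definition
        self.loaded = {}       # only these count against the context budget

    def register(self, name, factory):
        self._factories[name] = factory

    def load(self, name):
        # Materialize the tool definition on first use, then cache it.
        if name not in self.loaded:
            self.loaded[name] = self._factories[name]()
        return self.loaded[name]
```

The key property is that `loaded` stays empty until a step calls `load`, so an agent that never touches a tool never pays its token cost.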
| Feature | Use Case | Limitation |
|---|---|---|
| Context Window | Store current conversation and relevant docs | Limited size; can cause rot |
| Memory System | Persist info across sessions | Requires careful syncing |
| Hooks | Inject context on stop/start | Misconfig can leak stale data |
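To make the window limit and attention scarcity concrete, here is a rough budget check. It assumes roughly 4 characters per token, a common back-of-envelope heuristic for English text; real tokenizers vary, so treat the numbers as estimates.

```python
# Rough token-budget check (~4 characters per token; real tokenizers vary).
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_window(messages: list, window: int = 200_000,
                   reserve: int = 20_000) -> bool:
    """True if the messages fit, leaving `reserve` tokens for the reply."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve <= window
```

Running a check like this before each turn tells you when to compact or prune instead of discovering the limit when the model starts dropping details.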
How to Apply It
Write a clear system prompt: Start the prompt with a brief role description, environment info (current directory, git status), and output style, and keep it under 1k tokens. This keeps the model on track. Anthropic — System Prompt Documentation (2023)
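A minimal template for that structure might look like the following. The field names and layout are illustrative, not a Claude Code format; the point is that role, environment, and output style each get one short, predictable slot.

```python
# Minimal system-prompt template: role, environment, output style.
# Layout and field names are illustrative, not a Claude Code API.
def build_system_prompt(role: str, cwd: str, git_status: str, style: str) -> str:
    return (
        f"You are {role}.\n"
        f"Environment:\n"
        f"  cwd: {cwd}\n"
        f"  git: {git_status}\n"
        f"Output style: {style}\n"
    )
```

For example, `build_system_prompt("a senior Python engineer", "/repo/app", "clean, on main", "concise diffs only")` yields a prompt well under 1k tokens that still pins down role, environment, and format.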
Load tools via progressive disclosure: Use the /tools slash command or the plugin catalog to pull in only the tools you need for the current step; avoid listing all MCP servers at startup. Claude Code — Progressive Disclosure (2025)
Hook memory updates on stop: In a stop hook, write the current turn to a memory file and prune obsolete entries. The agent reads this file on the next turn. Claude Code — Hooks Documentation (2025)
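A stop-hook script along these lines is sketched below. Claude Code hooks receive a JSON payload on stdin; the exact field names (`session_id` here) should be checked against the hooks reference, and the memory file path is an assumption for illustration.

```python
# Sketch of a stop-hook handler: append the session id and a timestamp to a
# memory file. In a real hook you would call this with the JSON payload the
# hook receives on stdin, e.g. record_stop(json.load(sys.stdin)).
import json
import time

def record_stop(payload: dict, memory_path: str = "memory.jsonl") -> dict:
    entry = {
        "session": payload.get("session_id", "unknown"),
        "stopped_at": time.time(),
    }
    # Append-only JSONL keeps the file cheap to write and easy to prune later.
    with open(memory_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Pruning obsolete entries can then be a separate pass that rewrites the file, so the hook itself stays fast.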
Compact when you hit ~20k tokens: Run /compact to condense the chat log into a short summary while preserving key details, preventing context rot. Claude Code — Slash Commands Documentation (2025)
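The mechanics of that step can be sketched as follows. The real /compact uses the model itself to summarize; here the summarizer is a stub you would swap out, and the ~4-chars-per-token estimate is a rough heuristic.

```python
# Compaction sketch: when history exceeds a token budget, replace the oldest
# turns with a one-line summary. The summarizer is a stub; a real /compact
# asks an LLM for the summary.
def compact(history: list, budget: int = 20_000,
            summarize=lambda turns: f"[summary of {len(turns)} earlier turns]") -> list:
    def tokens(msgs):
        return sum(len(m) // 4 for m in msgs)   # rough ~4 chars/token estimate
    if tokens(history) <= budget:
        return history                          # under budget: leave as-is
    keep = history[-4:]                         # always keep the recent turns
    return [summarize(history[:-4])] + keep
```

Keeping the most recent turns verbatim while summarizing the rest is what preserves "key details" without letting the log grow unbounded.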
Delegate heavy tasks to sub-agents: For long-running analysis, spin up a sub-agent with its own 200k window and a focused prompt, and have it return a concise report to the main agent. Claude Code — Subagents Documentation (2025)
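The delegation pattern looks roughly like this. `call_model` is a stand-in for whatever model API you use, not a real Claude Code call; the essential idea is that the sub-agent starts from a fresh message list and only a short report flows back.

```python
# Delegation sketch: the sub-agent gets its own fresh message list (its own
# context window) and a focused prompt, and returns only a concise report.
# `call_model` is a placeholder for your actual model API.
def run_subagent(task: str, call_model) -> str:
    messages = [
        # Fresh context: nothing from the main conversation leaks in.
        {"role": "system", "content": "You are a focused analysis sub-agent. "
                                      "Reply with a short report only."},
        {"role": "user", "content": task},
    ]
    return call_model(messages)
```

The main agent then appends only the returned report to its own history, never the sub-agent's full transcript, which is what keeps the main window lean.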
Persist critical context: Store high-value facts (e.g., user preferences, project milestones) in memory files that the agent loads at the start of each session. Claude Code — Memory System Medium article (2024)
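A minimal memory-file layer is sketched below; the file name and JSON schema are illustrative, not a Claude Code format.

```python
# Memory-file sketch: persist high-value facts as JSON and reload them at
# session start. File name and schema are illustrative.
import json
import os

def save_memory(facts: dict, path: str = "agent_memory.json") -> None:
    with open(path, "w") as f:
        json.dump(facts, f, indent=2)

def load_memory(path: str = "agent_memory.json") -> dict:
    # Missing file means a fresh session with no stored facts.
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

Calling `load_memory()` in a session-start hook and `save_memory()` in a stop hook closes the loop between sessions.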
Track tasks: Use the built-in task manager to log progress so the agent can pause, resume, or re-prioritize tasks automatically. DeepWiki — Task Management System (2024)
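A tracker with the three states mentioned above (pending, in progress, completed) can be as small as this; it is an illustrative sketch, not the built-in Claude Code task manager.

```python
# Minimal task tracker with the three states named above. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class TaskList:
    tasks: dict = field(default_factory=dict)  # name -> status

    def add(self, name: str) -> None:
        self.tasks[name] = "pending"

    def start(self, name: str) -> None:
        self.tasks[name] = "in_progress"

    def done(self, name: str) -> None:
        self.tasks[name] = "completed"

    def pending(self) -> list:
        return [n for n, s in self.tasks.items() if s == "pending"]
```

Persisting this dict in the memory file is what lets the agent resume exactly where it left off instead of re-deriving its plan each session.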
Pitfalls & Edge Cases
- Too many tools: Loading all MCP servers at once forces the model to scan a huge list, triggering attention scarcity. Use progressive disclosure.
- Neglecting compaction: When history grows unchecked, the model’s context window fills, causing context rot. Run /compact often.
- Misconfigured hooks: A hook that writes stale data can corrupt memory. Test hooks in isolation before deploying.
- Sub-agent misuse: Spinning up a sub-agent for a trivial task adds coordination overhead and an extra report to merge without saving meaningful context. Use sub-agents only for complex, long-running tasks.
- Missing environment info: If the system prompt omits current directory or git status, the agent may generate code for the wrong repo.
- Over-compressing: Compacting too aggressively can drop useful nuance. Tune the summary algorithm to preserve key facts.
Questions
- How exactly is context engineering implemented in Claude Code? Through the mechanisms above: the system prompt, hooks that inject fresh context at lifecycle events, memory files, and slash commands such as /compact for trimming the window.
- What specific steps reduce context rot in practice? Regular compaction, selective tool loading, and memory pruning are the proven tactics.
- How do hooks interact with system prompts in Claude Code? Hooks fire at lifecycle events (for example, when a prompt is submitted or the agent stops) and can inject additional context alongside the system prompt, so you can refresh state without editing the prompt itself.
Quick FAQ
- How do I set up hooks in Claude Code? Follow the Hooks reference guide and add a stop hook that writes context to a memory file. Claude Code — Hooks Documentation (2025)
- Why use sub-agents? Sub-agents give each task its own 200k context window, keeping the main agent lean. Claude Code — Subagents Documentation (2025)
- What is /compact and how often should I run it? /compact condenses the conversation history into a summary while preserving key details, preventing context rot. Use it when you hit ~20k tokens. Claude Code — Slash Commands Documentation (2025)
- Can I integrate external data sources? Yes, via MCP servers. Add them in the MCP docs and load only the needed ones via progressive disclosure. Claude Code — MCP Documentation (2025)
- How do I keep the agent consistent across sessions? Store crucial context in memory files and use hooks to load them at startup. Claude Code — Memory System Medium article (2024)
- What if I need task-specific tooling, such as generating YouTube thumbnails? Load the relevant plugin (for example, the Thumbkit CLI) via progressive disclosure rather than at startup. YouTube thumbnails CLI — Thumbkit (2025)
- How do I manage tool limits? Use progressive disclosure to load only the tools needed for each step, reducing token usage. Claude Code — Progressive Disclosure (2025)
Conclusion
Context engineering is not a luxury; it’s the backbone of any high-performance AI agent. Start by crafting a tight system prompt, hook memory updates, and load tools on demand. Use sub-agents for heavy lifting and keep the main chat lean with regular compaction. Persist key facts so the agent remembers you across sessions. With these practices, the agent will stay focused, follow instructions exactly, and produce reliable results every time.
References
- Anthropic — System Prompt Documentation (2023) (https://simonwillison.net/2025/May/25/claude-4-system-prompt/)
- OpenAI — Conversation State Documentation (2024) (https://platform.openai.com/docs/guides/conversation-state)
- Redis — Context Rot (2023) (https://redis.io/blog/context-rot/)
- Claude Code — Subagents Documentation (2025) (https://code.claude.com/docs/en/sub-agents)
- Claude Code — Hooks Documentation (2025) (https://code.claude.com/docs/en/hooks)
- Claude Code — Progressive Disclosure (2025) (https://docs.claude-mem.ai/progressive-disclosure)
- Claude Code — Plugins Reference (2025) (https://code.claude.com/docs/en/plugins-reference)
- Claude Code — Memory System Medium article (2024) (https://medium.com/@sonitanishk2003/the-ultimate-guide-to-llm-memory-from-context-windows-to-advanced-agent-memory-systems-3ec106d2a345)
- Claude Code — Slash Commands Documentation (2025) (https://code.claude.com/docs/en/slash-commands)
- Claude Code — MCP Documentation (2025) (https://code.claude.com/docs/en/mcp)
- DeepWiki — Task Management System (2024) (https://deepwiki.com/memodb-io/Acontext/5.1-task-agent-system)
- YouTube thumbnails CLI — Thumbkit (2025) (https://claude-plugins.dev/skills/@kenneth-liao/ai-launchpad-marketplace/youtube-thumbnail)
- UV — Tool Guide (2024) (https://docs.astral.sh/uv/guides/tools/)





