
MCP Mastery: How to Configure ref.tools & exa.ai for Lightning-Fast AI Code Generation
Published by Brav
TL;DR
- Set up MCPs so your LLM stops losing the middle of long conversations (context rot).
- Use ref.tools to pull just the docs your model needs, cutting token waste.
- Pair it with exa.ai for fast, high-quality code search.
- Secure your API keys and avoid hard-coded style values.
- Measure token usage to keep costs low.
Why this matters
When an LLM is asked to generate code for a large project, it can lose track of the conversation—this is context rot. If the assistant can’t remember earlier parts of the dialogue, it ends up guessing or repeating the same mistakes. Ref.tools was built to stop that. It stores only the snippet the model actually needs in a Model Context Protocol (MCP) session, so the model never has to remember the whole file. ref.tools — ref-tools-mcp (2025)
Large projects also drown the model in tokens. If you feed it a whole README or an entire API spec, the model spends tokens just reading, not writing. Ref.tools and exa.ai cut that waste by giving you a focused search. Using them can lower token usage for documentation by almost half. ref.tools — ref-tools-mcp (2025) exa.ai — Exa AI Search Engine (2025)
When you add a code search engine that pulls in code from many GitHub repos, you also risk a huge burst of tokens in a single query. Exa.ai’s search engine returns only the relevant lines, keeping token consumption in check. exa.ai — Exa AI Search Engine (2025)
You already know not to keep API keys in your repo. If a key slips into a public repo, anyone can burn through your quota and run up your bill. Secure your keys by storing them in your CI secret store or a local .env file, and never commit them. Ref.tools requires an API key; it never embeds it in the code. ref.tools — ref-tools-mcp (2025)
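A minimal fail-fast loader makes a missing or mis-named key obvious at startup instead of at the first failed request. This is a sketch; the `DEMO_API_KEY` variable is a placeholder, not a real key name:

```python
import os

def require_key(name: str) -> str:
    # Read an API key from the environment; fail fast with a clear message if absent.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your secret store or .env file")
    return value

# Demo with a throwaway variable (never hard-code real keys like this):
os.environ.setdefault("DEMO_API_KEY", "demo-value")
print(require_key("DEMO_API_KEY"))
```

Call `require_key("REF_API_KEY")` and `require_key("EXA_API_KEY")` once at startup so a bad deployment fails immediately.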
Multi-step tool calls can also drain the context window. If your model does a search, then reads, then writes code, each step adds tokens. Plan mode can help by letting the model outline the plan first, so you only feed the plan and the final code. Claude — Common Workflows (2025)
Finally, design consistency matters. Hard-coded colors or spacing in Tailwind v4 break when you update your design system. Using a unified design token system and the new Tailwind v4 utilities removes hard-coded values and keeps your UI consistent across themes. Tailwind CSS — Tailwind CSS v4.0 Release (2025)
Core concepts
| Parameter | Use Case | Limitation |
|---|---|---|
| ref.tools | Search and fetch documentation, PDFs, GitHub repos | Requires API key, initial indexing cost |
| exa.ai | Fast code search across open-source | Higher per-search token cost, requires API key |
| Naive fetch | Directly grab a URL | Loads whole page, high token usage, prone to context rot |
Model Context Protocol (MCP) is the glue that lets your LLM talk to these tools. An MCP server sits in front of your LLM and translates the model’s “search” or “read” calls into real HTTP requests. The server stores the results in the session, so the model never sees a huge chunk of text. That’s how you avoid context rot.
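Under the hood, MCP messages are JSON-RPC 2.0. A search issued by the model reaches the server as something like the message below (the tool name and arguments are illustrative, not ref.tools' actual schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_docs",
    "arguments": { "query": "how to configure retries" }
  }
}
```

The server performs the real HTTP request, trims the result, and returns only the relevant snippet in the JSON-RPC response, so the model's context stays small.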
Plan mode is a special mode in Claude (and other LLMs) that lets the model create a plan before it writes code. It’s like a programmer’s whiteboard: you see the steps first, then the implementation. Claude — Common Workflows (2025)
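For models without a native plan mode, the same pattern can be approximated with a two-step prompt chain. This is a sketch, not Claude's actual implementation; `complete` stands in for any text-completion client you use:

```python
def plan_then_code(task: str, complete) -> tuple[str, str]:
    # Step 1: ask only for a numbered plan (cheap, few tokens).
    plan = complete(f"Produce a short numbered plan for: {task}")
    # Step 2: feed the plan back with the task; ask only for the final code.
    code = complete(f"Task: {task}\nPlan:\n{plan}\nNow write only the code.")
    return plan, code

# Demo with a trivial stand-in "model" that echoes the last line of its prompt:
echo = lambda prompt: prompt.splitlines()[-1]
plan, code = plan_then_code("add a button component", echo)
```

The point of the split is that only the short plan, not the whole exploration, is carried into the code-generation step.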
Tailwind v4 brings a new high-performance engine. It’s 5× faster for full builds and 100× faster for incremental builds. That speed lets you re-generate your design tokens on the fly without waiting for a full build. Tailwind CSS — Tailwind CSS v4.0 Release (2025)
ShadCN is a component library that sits on top of Tailwind. It gives you ready-made components with a consistent design token system. When you combine ShadCN with Tailwind v4 and a unified design token system, hard-coded values disappear. ShadCN — UI Library (2025)
How to apply it
Add MCP servers once for all projects
In your root user folder, add the MCP servers to a single mcp.json file. This file is shared across all your projects.

```json
{
  "servers": [
    { "name": "ref-tools", "type": "http", "url": "https://api.ref.tools/mcp?apiKey=${REF_API_KEY}" },
    { "name": "exa.ai", "type": "http", "url": "https://exa.ai/mcp?apiKey=${EXA_API_KEY}" }
  ]
}
```

The ${REF_API_KEY} and ${EXA_API_KEY} variables come from your secret store. Never check them into the repo.
Secure your API keys
Store REF_API_KEY and EXA_API_KEY in your CI secret store or a local .env file that’s added to .gitignore. If you need to rotate keys, update the secret store and restart your MCP server. This keeps your keys out of version control. ref.tools — ref-tools-mcp (2025)
Configure Cursor to use the MCP servers
In Cursor, add a JSON section to your workspace config that points to the MCP servers.

```json
{
  "mcp": {
    "servers": [
      { "ref-tools": "http://localhost:8000/mcp" },
      { "exa.ai": "http://localhost:8001/mcp" }
    ]
  }
}
```

Now Cursor can issue search and read calls to the MCP.
Index your documentation with ref.tools
Run ref-tools index against your public docs, PDFs, and GitHub repos. The tool builds a lightweight vector store that can be queried in real time.

```shell
ref-tools index --docs public/README.md --pdf docs/architecture.pdf --repo https://github.com/myorg/myrepo
```

This process can take a few minutes, but it only runs once unless you add new content.
Use plan mode for large changes
Before writing code, ask Claude to generate a plan. The model will output steps like “1. Review architecture, 2. Define new component, 3. Write tests.” You then feed that plan into the code generation step. This keeps the token budget low and reduces the risk of context rot. Claude — Common Workflows (2025)
Search for code with exa.ai
When you need to copy or refactor code, ask Claude to search exa.ai. The search returns only the relevant lines, not the entire repo.

```
exa.ai.search("React component for button with Tailwind classes")
```

The response is a short snippet that you can paste directly into your editor.
Measure token usage
Use the MCP’s built-in metrics or a side-car service to track how many tokens are used per request. If you hit 100k tokens for a single query, split the request or refine the prompt. The MetaCTO blog shows how a 6k-token request can cost $0.09 for a single step with Claude Opus. MetaCTO — Anthropic API Pricing 2025 (2025)
Apply design tokens
Create a globals.css file that declares all your design tokens (colors, spacing, typography). Tailwind v4 can consume these tokens directly, so you never hard-code a hex value.

```css
:root {
  --color-primary: #ff3e00;
  --spacing-4: 1rem;
}
```

In Tailwind, reference the tokens with var(--color-primary) or use the Tailwind config to map them. This keeps the design consistent across the app. Tailwind CSS — Tailwind CSS v4.0 Release (2025)
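In Tailwind v4 you can go one step further and declare tokens inside an @theme block, which generates matching utility classes automatically (the generated class names below follow Tailwind's documented naming convention; verify against your setup):

```css
@import "tailwindcss";

@theme {
  --color-primary: #ff3e00;
  --spacing-4: 1rem;
}

/* Now utilities like bg-primary and text-primary resolve to these tokens. */
```

Because utilities are derived from the tokens, a theme change is a one-line edit rather than a find-and-replace across the codebase.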
Pitfalls & edge cases
- Context rot still occurs if you use plain fetch. Avoid using raw HTTP calls; always route through the MCP.
- Token spikes from large docs. If you search a huge PDF, the MCP will trim the result to the most relevant 5k tokens, but you still pay for the read. Keep your queries focused.
- API key leaks. Even if the key is in a secrets store, a mis-configured deployment can expose it. Regularly audit access logs.
- Plan mode not supported on all models. Plan mode is a Claude feature that only a few other assistants offer. If you’re using GPT-4, emulate it with a custom workflow that writes a plan first.
- Design tokens conflict across Tailwind versions. Tailwind v4’s token syntax changed. If you mix v3 and v4 code, you’ll see errors. Upgrade your entire codebase to v4 before adding tokens.
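To put a dollar figure on token spikes like these, a rough per-request estimator helps. The default per-million-token rates below are assumed Claude Opus list prices ($15 input / $75 output); substitute your model's current pricing:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int = 0,
                      input_per_mtok: float = 15.0,
                      output_per_mtok: float = 75.0) -> float:
    # Cost = tokens / 1M * price-per-million, summed over input and output.
    return (input_tokens / 1_000_000 * input_per_mtok
            + output_tokens / 1_000_000 * output_per_mtok)

# A 6k-token input step at the assumed Opus rate:
print(round(estimate_cost_usd(6_000), 4))
```

At these rates a 6k-token step comes to $0.09, which matches the MetaCTO figure cited earlier, and a 100k-token query already costs $1.50 before any output tokens.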
Quick FAQ
| Question | Answer |
|---|---|
| How do MCP servers handle authentication for private documentation? | The MCP server accepts an API key in the URL or environment. The server forwards the key to the underlying tool, so only authorized users can index private docs. ref.tools — ref-tools-mcp (2025) |
| What are the limitations on data size or indexing speed for ref.tools? | Indexing a 100-MB PDF can take a few seconds, but the tool throttles large requests to avoid OOM. The speed scales linearly with the number of documents. |
| How does Exa AI compare to other code search engines? | Exa AI returns only the most relevant lines, with a 0.5-second latency on most queries, and supports advanced filtering. It’s faster than a plain GitHub search and cheaper than commercial RAG engines. exa.ai — Exa AI Search Engine (2025) |
| How are plans generated by the agent in plan mode? | The model is prompted to produce a numbered list of steps. You can then feed each step as a separate prompt or let the model execute them in sequence. |
| What are the cost implications of using ref.tools and exa.ai? | Ref.tools is free for the open-source version; the hosted service costs $0.01 per 1,000 queries. Exa AI charges $0.0001 per token returned. The total cost depends on usage volume. |
| How does the integration with Cursor’s JSON config work in detail? | Cursor reads a cursor.json file at launch, looks for an mcp section, and registers the servers. It then exposes search and read calls as commands in the IDE. |
| How can I rotate API keys securely without breaking integration? | Update the key in your secret store and restart the MCP server. The server reads the environment at launch, so no code changes are needed. |
Conclusion
If you’re building a large codebase or a design-heavy app, MCPs give you a single, token-efficient way to give your LLM the exact context it needs. Ref.tools pulls in the docs you care about; exa.ai gives you fast, high-quality code snippets. Combined with plan mode and a unified design token system, you’ll write code faster, spend less on tokens, and keep your UI consistent.
Actionable next steps
- Add mcp.json to your root folder and point to ref.tools and exa.ai.
- Secure the API keys in your secret store.
- Index your docs with ref.tools.
- Test plan mode in Claude before writing code.
- Use exa.ai for code searches.
- Add design tokens to globals.css and upgrade to Tailwind v4.
- Measure token usage and iterate.
Who should use this?
- Software engineers who need fast, accurate code generation.
- AI developers who want to keep LLMs in context.
- Technical leads who care about design consistency and cost control.
Who should not?
- Small teams with a trivial codebase; the overhead may not be worth it.



