
Deploy Firecrawl, n8n, and Qdrant with Docker Compose for a cost-effective AI web-scraping and knowledge-base workflow, covering authentication, credits, and CI/CD.
From Web to Knowledge Base: Deploying Firecrawl, n8n, and Qdrant on Docker Compose
TL;DR
- Firecrawl is a free self-hosted web-scraping API that turns pages into clean markdown for LLMs.
- Combine it with n8n for automation and Qdrant for vector search to build a cost-effective knowledge base.
- Use Docker Compose for all three services and a CI/CD pipeline to keep them up to date.
- The Firecrawl free plan gives 500 credits per month; the $16 plan gives 3,000 credits.
- A typical workflow produces 5–20 Q&A pairs per page and retrieves the top 4 documents per query.
Published by Brav
Why this matters
Every AI developer who wants to scrape the web and feed the data into a chatbot runs into the same pain points:
- Finding a reliable, free API that does not throttle or block you.
- Managing a self-hosted stack without a dedicated ops team.
- Converting raw HTML into a format that an LLM can ingest.
- Building a deduplicated knowledge base that can be queried in real time.
- Watching your credit balance drift and blowing your budget on a single project.
In this article I walk through a real-world stack that solves all of these problems with a single docker-compose.yml file, a handful of n8n nodes, and a simple RAG assistant that uses Qdrant as its vector store. I also point out the edge cases that can trip you up.
Core concepts
| Component | What it does | Why it matters | Source |
|---|---|---|---|
| Firecrawl | Scrapes a URL and returns clean markdown | Gives you LLM-ready content in seconds | Firecrawl — Docs (2024) |
| n8n | Low-code workflow automation | Connects the scrape, summarization, and storage steps | n8n — Docs (2024) |
| Qdrant | Vector database that stores embeddings | Enables fast RAG queries | Qdrant — OpenAI Embeddings (2024) |
| OpenAI embeddings | Transforms text into vectors | Works seamlessly with Qdrant | Same as above |
| LLM summarizer | Generates 5–20 Q&A pairs from markdown | Turns noisy pages into usable knowledge | Medium — How to Generate QA (2024) |
| Recursive character splitter | Splits long markdown into chunks | Keeps prompt token limits in check | – |
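The recursive character splitter in the table can be sketched in a few lines of Python. This is a minimal illustration of the idea (split on coarse separators first, fall back to finer ones), not the exact implementation n8n's LangChain nodes use:

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring paragraph breaks, then line breaks, then spaces."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: hard-cut at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # This piece alone is too big: recurse with finer separators.
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Splitting before summarization keeps each prompt comfortably inside the model's token limit.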
Firecrawl: the free, self-hosted API
Firecrawl ships a Docker image you can self-host at no cost. If you use the hosted API instead, pricing is credit-based: the free tier gives you 500 credits per month, and the paid tier gives you 3,000 credits for $16/month. Credits are charged per page scraped, not per request, so the cost scales with the size of the website you crawl.
docker pull firecrawl/firecrawl
Once you run the container you can call its endpoint:
curl -X POST http://localhost:8080/scrape \
-u root:root \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","format":"markdown"}'
The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:
docker compose logs firecrawl
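If you would rather script the scrape than shell out to curl, here is a minimal Python sketch of the request body and response handling. It assumes the top-level content key described above; adjust the key path if your Firecrawl version nests the payload differently:

```python
import json

def build_scrape_request(url):
    """Build the same JSON body the curl example sends."""
    return json.dumps({"url": url, "format": "markdown"})

def extract_markdown(response_json):
    """Pull the markdown out of a Firecrawl scrape response.
    Assumes a top-level 'content' key, as described above."""
    if "content" not in response_json:
        raise ValueError(f"no content in response: {response_json}")
    return response_json["content"]
```

Send the body with your HTTP client of choice (with Basic auth, root:root in this stack) and pass the decoded JSON to `extract_markdown`.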
n8n: orchestrating the workflow
n8n lets you build a linear or branching workflow with minimal code. In my stack I use the following nodes:
Form – adds a single field called url.
HTTP Request –
- Method: POST
- URL: http://firecrawl:8080/scrape
- Authentication: Basic → username root, password root
- Body: {"url": "{{ $json['url'] }}", "format": "markdown"}
OpenAI:
- Model: gpt-4o-mini (or any model that fits your budget)
- Prompt:
Summarize the following Markdown into 5–20 question-answer pairs. Return a JSON array. Markdown: {{ $json['content'] }}
Qdrant:
- Collection: knowledge_base
- Vector: Pass the OpenAI embeddings of each Q&A pair.
- Payload: Include url and timestamp.
Webhook: expose the workflow so you can trigger it via curl or a browser.
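Before inserting into Qdrant, it is worth validating the summarizer's output, since LLMs occasionally return malformed JSON. A small sketch; the question/answer field names are an assumption, so match them to whatever your prompt asks for:

```python
import json

def parse_qa_pairs(llm_output):
    """Parse and validate the summarizer's output: a JSON array of
    objects with 'question' and 'answer' keys (assumed field names)."""
    pairs = json.loads(llm_output)
    if not isinstance(pairs, list):
        raise ValueError("expected a JSON array of Q&A pairs")
    for p in pairs:
        if not isinstance(p, dict) or not {"question", "answer"} <= p.keys():
            raise ValueError(f"malformed pair: {p}")
    return pairs
```

Wrapping this in a try/except and re-prompting on failure is a cheap way to make the workflow self-healing.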
Qdrant: the vector store
Qdrant stores each Q&A pair as a point with a 1536-dimensional vector (the size of OpenAI's default embeddings). The default distance metric is cosine similarity, which works well for semantic retrieval. The 1,000-character chunk size is set on the splitter that runs before embedding, not on the Qdrant collection itself; it keeps each embedding focused and manageable.
The query side of the workflow looks like this:
{
  "query_vector": "<embedding of user question>",
  "filter": {"must": [{"key": "url", "match": {"value": "https://example.com"}}]},
  "limit": 4
}
The result is an array of the four most relevant Q&A pairs, which the chatbot then feeds back to the user. This simple RAG pattern keeps the LLM’s context window small while still giving the user accurate answers.
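Conceptually, that query is a filtered cosine-similarity top-k search. A pure-Python sketch of the same logic (Qdrant does this with an HNSW index, far faster, but the semantics are identical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def query(points, query_vector, url_filter=None, limit=4):
    """Filter points by URL payload, rank by cosine similarity,
    and return the top `limit` hits -- what the Qdrant query does."""
    hits = [p for p in points
            if url_filter is None or p["payload"]["url"] == url_filter]
    hits.sort(key=lambda p: cosine(p["vector"], query_vector), reverse=True)
    return hits[:limit]
```

The `limit` of 4 is the same knob as in the JSON query above: it caps how much retrieved context reaches the LLM.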
How to apply it
Below is a step-by-step recipe that you can copy-paste into a fresh Git repository.
1. Create a docker-compose.yml
version: "3.9"
services:
  firecrawl:
    image: firecrawl/firecrawl
    container_name: firecrawl
    environment:
      - FIRECRAWL_ROOT_USER=root
      - FIRECRAWL_ROOT_PASSWORD=root
      - FIRECRAWL_MODE=api
      - FIRECRAWL_PORT=8080
    ports:
      - "8080:8080"
    volumes:
      - firecrawl_data:/data
  n8n:
    image: n8nio/n8n
    container_name: n8n
    environment:
      - N8N_HOST=0.0.0.0
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - N8N_INSECURE=true
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=admin
    depends_on:
      - firecrawl
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  firecrawl_data:
  n8n_data:
  qdrant_data:
This file spins up all three services with default credentials that you should change in production. Note that depends_on only controls start order: n8n starts after the Firecrawl container has started, which does not guarantee Firecrawl is ready to serve requests.
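If you need n8n to wait until Firecrawl actually answers, Compose supports a healthcheck plus a depends_on condition. A sketch; probing / with curl is an assumption (point the probe at whatever path your Firecrawl build serves, and check that curl exists in the image):

```yaml
services:
  firecrawl:
    # ...image, environment, ports as above...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/"]
      interval: 10s
      timeout: 5s
      retries: 5
  n8n:
    # ...image, environment, ports as above...
    depends_on:
      firecrawl:
        condition: service_healthy
```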
2. Bring everything up
docker compose up -d
Open http://localhost:5678 to see the n8n UI. When you first configure the HTTP Request node, n8n will ask for the Firecrawl URL and credentials; fill them in.
3. Test the Firecrawl endpoint
curl -X POST http://localhost:8080/scrape \
-u root:root \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","format":"markdown"}'
The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:
docker compose logs firecrawl
4. Build the n8n workflow
- Form: add a single field called url.
- HTTP Request:
- Method: POST
- URL: http://firecrawl:8080/scrape
- Authentication: Basic → username root, password root
- Body: {"url": "{{ $json['url'] }}", "format": "markdown"}
- OpenAI:
- Model: gpt-4o-mini (or any model that fits your budget)
- Prompt:
Summarize the following Markdown into 5–20 question-answer pairs. Return a JSON array. Markdown: {{ $json['content'] }}
- Qdrant:
- Collection: knowledge_base
- Vector: Pass the OpenAI embeddings of each Q&A pair.
- Payload: Include url and timestamp.
- Webhook: expose the workflow so you can trigger it via curl or a browser.
5. Hook up a simple chat
I use the Chat UI node from n8n’s AI collection. It sends the user’s question to an OpenAI model, calls Qdrant for the top 4 results, and then stitches the answer. The final prompt looks like:
You are an assistant that answers user questions using the following retrieved Q&A pairs. Only use these facts, and if you can’t answer, say “I don’t know.”
Retrieved Q&A: {{ $json['retrieved'] }}
User question: {{ $json['question'] }}
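Filling that template programmatically looks like this (a sketch; the question/answer field names in the retrieved pairs are assumptions):

```python
def build_rag_prompt(retrieved_pairs, question):
    """Render the chat prompt above from retrieved Q&A pairs
    and the user's question."""
    retrieved = "\n".join(
        f"Q: {p['question']}\nA: {p['answer']}" for p in retrieved_pairs
    )
    return (
        "You are an assistant that answers user questions using the following "
        "retrieved Q&A pairs. Only use these facts, and if you can't answer, "
        'say "I don\'t know."\n\n'
        f"Retrieved Q&A:\n{retrieved}\n\nUser question: {question}"
    )
```

Keeping the retrieved context explicit in the prompt makes the "I don't know" instruction enforceable: the model has nothing else to lean on.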
6. Set up CI/CD
Create a GitHub Actions workflow that triggers on pushes to the main branch:
name: Deploy stack
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/stack:latest
      - name: Deploy to server
        run: |
          ssh -o StrictHostKeyChecking=no user@yourserver <<'EOF'
          docker compose pull
          docker compose up -d
          EOF
7. Monitor credits
The Firecrawl API reports credit usage in the X-Api-Used-Credits response header. Capture that value in n8n and push it to a monitoring dashboard or an alerting service.
{{ $json['X-Api-Used-Credits'] }}
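A small accumulator for that header makes a reasonable guardrail. A sketch; the 80% alert threshold is arbitrary, and the header name is the one quoted above:

```python
def check_credits(headers, used_so_far, monthly_limit=500, alert_ratio=0.8):
    """Add this response's credit usage (from the X-Api-Used-Credits
    header) to a running total, and flag when the total crosses
    alert_ratio of the monthly limit."""
    used = used_so_far + int(headers.get("X-Api-Used-Credits", 0))
    return used, used >= monthly_limit * alert_ratio
```

Wire the boolean into an n8n IF node that pauses scraping or sends a Slack/email alert.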
Pitfalls & edge cases
| Issue | What goes wrong | Fix |
|---|---|---|
| Duplicate documents | Qdrant returns the same page multiple times | Store the source URL as metadata and add a unique constraint. |
| Token limits | Summarization prompt exceeds 8 K tokens | Use the recursive character splitter to chunk the markdown before summarizing. |
| API throttling | Firecrawl rate limits you | Increase concurrency by using multiple Firecrawl instances behind a load balancer. |
| Memory blowout | n8n workers consume >2 GB on large workflows | Run n8n in a separate container with a dedicated --max-old-space-size Node flag. |
| Credential leaks | API keys in public repo | Store credentials in Docker secrets or a secrets manager. |
| Cost overruns | 500 credits are exhausted quickly | Add a hard limit in the workflow to pause scraping when credits are low. |
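For the duplicate-documents row, one concrete approach is to derive deterministic point IDs, so re-scraping a page overwrites existing points instead of adding new ones. A sketch using UUIDv5; keying on URL plus question text is an assumption, not the only valid choice:

```python
import uuid

def point_id_for(url, question):
    """Derive a stable UUID from the source URL plus the question text.
    Upserting with the same ID replaces the old point in Qdrant,
    which deduplicates re-scrapes for free."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{url}#{question}"))
```

Pass the result as the point ID on insert; Qdrant upserts by ID, so the second scrape of the same page updates rather than duplicates.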
Quick FAQ
| Question | Answer |
|---|---|
| What’s the difference between the 500-credit free plan and the $16 plan? | The free plan gives 500 page scrapes per month; the paid plan gives 3,000 credits (≈3,000 pages) for $16/month. |
| How do I obtain an API key for a self-hosted Firecrawl instance? | After starting the container, open http://localhost:8080 and create an account. The UI will give you a key. |
| Which authentication method does Firecrawl support? | Basic authentication with a username and password; you can also send an API key in an Authorization: Bearer <key> header. |
| How can I avoid duplicate data in Qdrant? | Store the source URL as payload and filter by it before inserting. |
| How do I control the chunk size for the recursive splitter? | Set the chunk_size parameter in the LLM chain; a typical value is 1,000 characters. |
| When should I use the $16 plan instead of the free tier? | If you expect to scrape more than ~500 pages a month or need higher concurrency, upgrade to avoid rate limits. |
Conclusion
You now have a reproducible stack that turns any website into a searchable knowledge base with LLM-powered answers—all on a single Docker Compose file. The key takeaways:
- Scrape once, use many – Firecrawl gives you clean markdown that an LLM can ingest without extra preprocessing.
- Automation is cheap – n8n stitches the pieces together with a few clicks, and the workflow can be version-controlled.
- Vector search is fast – Qdrant’s built-in similarity search keeps your chatbot snappy, even on large knowledge bases.
- Watch your credits – Firecrawl’s credit system is simple; just add a counter in your workflow and let it trigger alerts.
- Keep it CI-driven – With Docker Compose and GitHub Actions you can keep the stack up-to-date without manual intervention.
If you’re building a knowledge-base chatbot, a research assistant, or a content-analysis tool, this stack scales from a local demo to a production-grade deployment with minimal friction. Give it a spin, tweak the chunk size or prompt, and watch your LLM get smarter without paying for cloud infra every time you scrape a new page.
Who should use this?
- AI developers who want to experiment with web-scraped data without the cost of a paid API.
- DevOps engineers who can ship a Docker Compose stack and maintain it via CI/CD.
- Automation specialists who need to tie together scraping, summarization, and vector search.
Who should avoid it?
- Projects that require real-time scraping of millions of pages per day (the free tier will hit limits).
- Teams that cannot run Docker or manage credentials securely.
Start small, test the workflow, then scale up to the $16 plan when your knowledge base starts answering real users. Happy scraping!

