
Deploy Firecrawl, n8n, and Qdrant with Docker Compose for a cost-effective AI web-scraping and knowledge-base workflow, covering authentication, credits, and CI/CD.
From Web to Knowledge Base: Deploying Firecrawl, n8n, and Qdrant on Docker Compose
TL;DR
- Firecrawl is a free self-hosted web-scraping API that turns pages into clean markdown for LLMs.
- Combine it with n8n for automation and Qdrant for vector search to build a cost-effective knowledge base.
- Use Docker Compose for all three services and a CI/CD pipeline to keep them up to date.
- The Firecrawl free plan gives 500 credits per month; the $16 plan gives 3,000 credits.
- A typical workflow produces 5–20 Q&A pairs per page and retrieves the top 4 documents per query.
Published by Brav
Why this matters
Every AI developer who wants to scrape the web and feed the data into a chatbot runs into the same pain points:
- Finding a reliable, free API that does not throttle or block you.
- Managing a self-hosted stack without a dedicated ops team.
- Converting raw HTML into a format that an LLM can ingest.
- Building a deduplicated knowledge base that can be queried in real time.
- Watching your credit balance drift and blowing your budget on a single project.
In this article I walk through a real-world stack that solves all of these problems with a single docker-compose.yml file, a handful of n8n nodes, and a simple RAG assistant that uses Qdrant as its vector store. I also point out the edge cases that can trip you up.
Core concepts
| Component | What it does | Why it matters | Source |
|---|---|---|---|
| Firecrawl | Scrapes a URL and returns clean markdown | Gives you LLM-ready content in seconds | Firecrawl — Docs (2024) |
| n8n | Low-code workflow automation | Connects the scrape, summarization, and storage steps | n8n — Docs (2024) |
| Qdrant | Vector database that stores embeddings | Enables fast RAG queries | Qdrant — OpenAI Embeddings (2024) |
| OpenAI embeddings | Transforms text into vectors | Works seamlessly with Qdrant | Same as above |
| LLM summarizer | Generates 5–20 Q&A pairs from markdown | Turns noisy pages into usable knowledge | Medium — How to Generate QA (2024) |
| Recursive character splitter | Splits long markdown into chunks | Keeps prompt token limits in check | – |
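The recursive character splitter in the table can be sketched in a few lines of Python. This is a minimal illustration of the idea (split on coarse separators first, fall back to finer ones), not the exact implementation n8n's LangChain nodes use:

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring paragraph breaks, then line breaks, then spaces."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: hard-cut at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # This piece alone is too big: recurse with finer separators.
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Splitting before summarization keeps each prompt comfortably inside the model's token limit.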
Firecrawl: the free, self-hosted API
Firecrawl ships a Docker image you can self-host at no cost. If you use the hosted API instead, pricing is credit-based: the free tier gives you 500 credits per month, and the paid tier gives you 3,000 credits for $16/month. Credits are charged per page scraped, not per request, so the cost scales with the size of the website you crawl.
docker pull firecrawl/firecrawl
Once you run the container you can call its endpoint:
curl -X POST http://localhost:8080/scrape \
-u root:root \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","format":"markdown"}'
The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:
docker compose logs firecrawl
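If you would rather script the scrape than shell out to curl, here is a minimal Python sketch of the request body and response handling. It assumes the top-level content key described above; adjust the key path if your Firecrawl version nests the payload differently:

```python
import json

def build_scrape_request(url):
    """Build the same JSON body the curl example sends."""
    return json.dumps({"url": url, "format": "markdown"})

def extract_markdown(response_json):
    """Pull the markdown out of a Firecrawl scrape response.
    Assumes a top-level 'content' key, as described above."""
    if "content" not in response_json:
        raise ValueError(f"no content in response: {response_json}")
    return response_json["content"]
```

Send the body with your HTTP client of choice (with Basic auth, root:root in this stack) and pass the decoded JSON to `extract_markdown`.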
n8n: orchestrating the workflow
n8n lets you build a linear or branching workflow with minimal code. In my stack I use the following nodes:
Form – adds a single field called url.
HTTP Request –
- Method: POST
- URL: http://firecrawl:8080/scrape
- Authentication: Basic → username root, password root
- Body: {"url": "{{ $json['url'] }}", "format": "markdown"}
OpenAI:
- Model: gpt-4o-mini (or any model that fits your budget)
- Prompt:
Summarize the following Markdown into 5–20 question-answer pairs. Return a JSON array. Markdown: {{ $json['content'] }}
Qdrant:
- Collection: knowledge_base
- Vector: Pass the OpenAI embeddings of each Q&A pair.
- Payload: Include url and timestamp.
Webhook: expose the workflow so you can trigger it via curl or a browser.
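Before inserting into Qdrant, it is worth validating the summarizer's output, since LLMs occasionally return malformed JSON. A small sketch; the question/answer field names are an assumption, so match them to whatever your prompt asks for:

```python
import json

def parse_qa_pairs(llm_output):
    """Parse and validate the summarizer's output: a JSON array of
    objects with 'question' and 'answer' keys (assumed field names)."""
    pairs = json.loads(llm_output)
    if not isinstance(pairs, list):
        raise ValueError("expected a JSON array of Q&A pairs")
    for p in pairs:
        if not isinstance(p, dict) or not {"question", "answer"} <= p.keys():
            raise ValueError(f"malformed pair: {p}")
    return pairs
```

Wrapping this in a try/except and re-prompting on failure is a cheap way to make the workflow self-healing.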
Qdrant: the vector store
Qdrant stores each Q&A pair as a point with a 1536-dimensional vector (the size of OpenAI's default embeddings). The default distance metric is cosine similarity, which works well for semantic retrieval. The 1,000-character chunk size is set on the splitter that runs before embedding, not on the Qdrant collection itself; it keeps each embedding focused and manageable.
The query side of the workflow looks like this:
{
  "query_vector": "<embedding of user question>",
  "filter": {"must": [{"key": "url", "match": {"value": "https://example.com"}}]},
  "limit": 4
}
The result is an array of the four most relevant Q&A pairs, which the chatbot then feeds back to the user. This simple RAG pattern keeps the LLM’s context window small while still giving the user accurate answers.
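Conceptually, that query is a filtered cosine-similarity top-k search. A pure-Python sketch of the same logic (Qdrant does this with an HNSW index, far faster, but the semantics are identical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def query(points, query_vector, url_filter=None, limit=4):
    """Filter points by URL payload, rank by cosine similarity,
    and return the top `limit` hits -- what the Qdrant query does."""
    hits = [p for p in points
            if url_filter is None or p["payload"]["url"] == url_filter]
    hits.sort(key=lambda p: cosine(p["vector"], query_vector), reverse=True)
    return hits[:limit]
```

The `limit` of 4 is the same knob as in the JSON query above: it caps how much retrieved context reaches the LLM.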
How to apply it
Below is a step-by-step recipe that you can copy-paste into a fresh Git repository.
1. Create a docker-compose.yml
version: "3.9"
services:
  firecrawl:
    image: firecrawl/firecrawl
    container_name: firecrawl
    environment:
      - FIRECRAWL_ROOT_USER=root
      - FIRECRAWL_ROOT_PASSWORD=root
      - FIRECRAWL_MODE=api
      - FIRECRAWL_PORT=8080
    ports:
      - "8080:8080"
    volumes:
      - firecrawl_data:/data
  n8n:
    image: n8nio/n8n
    container_name: n8n
    environment:
      - N8N_HOST=0.0.0.0
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - N8N_INSECURE=true
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=admin
    depends_on:
      - firecrawl
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  firecrawl_data:
  n8n_data:
  qdrant_data:
This file spins up all three services with default credentials that you should change in production. Note that depends_on only controls start order: n8n starts after the Firecrawl container has started, which does not guarantee Firecrawl is ready to serve requests.
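If you need n8n to wait until Firecrawl actually answers, Compose supports a healthcheck plus a depends_on condition. A sketch; probing / with curl is an assumption (point the probe at whatever path your Firecrawl build serves, and check that curl exists in the image):

```yaml
services:
  firecrawl:
    # ...image, environment, ports as above...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/"]
      interval: 10s
      timeout: 5s
      retries: 5
  n8n:
    # ...image, environment, ports as above...
    depends_on:
      firecrawl:
        condition: service_healthy
```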
2. Bring everything up
docker compose up -d
Open http://localhost:5678 to see the n8n UI. When you first configure the HTTP Request node, n8n will ask for the Firecrawl URL and credentials; fill them in.
3. Test the Firecrawl endpoint
curl -X POST http://localhost:8080/scrape \
-u root:root \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","format":"markdown"}'
The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:
docker compose logs firecrawl
4. Build the n8n workflow
- Form: add a single field called url.
- HTTP Request:
- Method: POST
- URL: http://firecrawl:8080/scrape
- Authentication: Basic → username root, password root
- Body: {"url": "{{ $json['url'] }}", "format": "markdown"}
- OpenAI:
- Model: gpt-4o-mini (or any model that fits your budget)
- Prompt:
Summarize the following Markdown into 5–20 question-answer pairs. Return a JSON array. Markdown: {{ $json['content'] }}
- Qdrant:
- Collection: knowledge_base
- Vector: Pass the OpenAI embeddings of each Q&A pair.
- Payload: Include url and timestamp.
- Webhook: expose the workflow so you can trigger it via curl or a browser.
5. Hook up a simple chat
I use the Chat UI node from n8n’s AI collection. It sends the user’s question to an OpenAI model, calls Qdrant for the top 4 results, and then stitches the answer. The final prompt looks like:
You are an assistant that answers user questions using the following retrieved Q&A pairs. Only use these facts, and if you can’t answer, say “I don’t know.”
Retrieved Q&A: {{ $json['retrieved'] }}
User question: {{ $json['question'] }}
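Filling that template programmatically looks like this (a sketch; the question/answer field names in the retrieved pairs are assumptions):

```python
def build_rag_prompt(retrieved_pairs, question):
    """Render the chat prompt above from retrieved Q&A pairs
    and the user's question."""
    retrieved = "\n".join(
        f"Q: {p['question']}\nA: {p['answer']}" for p in retrieved_pairs
    )
    return (
        "You are an assistant that answers user questions using the following "
        "retrieved Q&A pairs. Only use these facts, and if you can't answer, "
        'say "I don\'t know."\n\n'
        f"Retrieved Q&A:\n{retrieved}\n\nUser question: {question}"
    )
```

Keeping the retrieved context explicit in the prompt makes the "I don't know" instruction enforceable: the model has nothing else to lean on.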
6. Set up CI/CD
Create a GitHub Actions workflow that triggers on pushes to the main branch:
name: Deploy stack
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/stack:latest
      - name: Deploy to server
        run: |
          ssh -o StrictHostKeyChecking=no user@yourserver <<'EOF'
          docker compose pull
          docker compose up -d
          EOF
7. Monitor credits
The Firecrawl API reports credit usage in the X-Api-Used-Credits response header. Capture that value in n8n and push it to a monitoring dashboard or an alerting service.
{{ $json['X-Api-Used-Credits'] }}
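A small accumulator for that header makes a reasonable guardrail. A sketch; the 80% alert threshold is arbitrary, and the header name is the one quoted above:

```python
def check_credits(headers, used_so_far, monthly_limit=500, alert_ratio=0.8):
    """Add this response's credit usage (from the X-Api-Used-Credits
    header) to a running total, and flag when the total crosses
    alert_ratio of the monthly limit."""
    used = used_so_far + int(headers.get("X-Api-Used-Credits", 0))
    return used, used >= monthly_limit * alert_ratio
```

Wire the boolean into an n8n IF node that pauses scraping or sends a Slack/email alert.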
Pitfalls & edge cases
| Issue | What goes wrong | Fix |
|---|---|---|
| Duplicate documents | Qdrant returns the same page multiple times | Store the source URL as metadata and add a unique constraint. |
| Token limits | Summarization prompt exceeds 8 K tokens | Use the recursive character splitter to chunk the markdown before summarizing. |
| API throttling | Firecrawl rate limits you | Increase concurrency by using multiple Firecrawl instances behind a load balancer. |
| Memory blowout | n8n workers consume >2 GB on large workflows | Run n8n in a separate container with a dedicated --max-old-space-size Node flag. |
| Credential leaks | API keys in public repo | Store credentials in Docker secrets or a secrets manager. |
| Cost overruns | 500 credits are exhausted quickly | Add a hard limit in the workflow to pause scraping when credits are low. |
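For the duplicate-documents row, one concrete approach is to derive deterministic point IDs, so re-scraping a page overwrites existing points instead of adding new ones. A sketch using UUIDv5; keying on URL plus question text is an assumption, not the only valid choice:

```python
import uuid

def point_id_for(url, question):
    """Derive a stable UUID from the source URL plus the question text.
    Upserting with the same ID replaces the old point in Qdrant,
    which deduplicates re-scrapes for free."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{url}#{question}"))
```

Pass the result as the point ID on insert; Qdrant upserts by ID, so the second scrape of the same page updates rather than duplicates.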
Quick FAQ
| Question | Answer |
|---|---|
| What’s the difference between the 500-credit free plan and the $16 plan? | The free plan gives 500 page scrapes per month; the paid plan gives 3,000 credits (≈3,000 pages) for $16/month. |
| How do I obtain an API key for a self-hosted Firecrawl instance? | After starting the container, open http://localhost:8080 and create an account. The UI will give you a key. |
| Which authentication method does Firecrawl support? | Basic authentication with a username and password; you can also send an API key in an Authorization: Bearer <key> header. |
| How can I avoid duplicate data in Qdrant? | Store the source URL as payload and filter by it before inserting. |
| How do I control the chunk size for the recursive splitter? | Set the chunk_size parameter in the LLM chain; a typical value is 1,000 characters. |
| When should I use the $16 plan instead of the free tier? | If you expect to scrape more than ~500 pages a month or need higher concurrency, upgrade to avoid rate limits. |
Conclusion
You now have a reproducible stack that turns any website into a searchable knowledge base with LLM-powered answers—all on a single Docker Compose file. The key takeaways:
- Scrape once, use many – Firecrawl gives you clean markdown that an LLM can ingest without extra preprocessing.
- Automation is cheap – n8n stitches the pieces together with a few clicks, and the workflow can be version-controlled.
- Vector search is fast – Qdrant’s built-in similarity search keeps your chatbot snappy, even on large knowledge bases.
- Watch your credits – Firecrawl’s credit system is simple; just add a counter in your workflow and let it trigger alerts.
- Keep it CI-driven – With Docker Compose and GitHub Actions you can keep the stack up-to-date without manual intervention.
If you’re building a knowledge-base chatbot, a research assistant, or a content-analysis tool, this stack scales from a local demo to a production-grade deployment with minimal friction. Give it a spin, tweak the chunk size or prompt, and watch your LLM get smarter without paying for cloud infra every time you scrape a new page.
Who should use this?
- AI developers who want to experiment with web-scraped data without the cost of a paid API.
- DevOps engineers who can ship a Docker Compose stack and maintain it via CI/CD.
- Automation specialists who need to tie together scraping, summarization, and vector search.
Who should avoid it?
- Projects that require real-time scraping of millions of pages per day (the free tier will hit limits).
- Teams that cannot run Docker or manage credentials securely.
Start small, test the workflow, then scale up to the $16 plan when your knowledge base starts answering real users. Happy scraping!

