
From Web to Knowledge Base: Deploying Firecrawl, n8n, and Qdrant on Docker Compose

TL;DR

  • Firecrawl is a free self-hosted web-scraping API that turns pages into clean markdown for LLMs.
  • Combine it with n8n for automation and Qdrant for vector search to build a cost-effective knowledge base.
  • Use Docker Compose for all three services and a CI/CD pipeline to keep them up to date.
  • The Firecrawl free plan gives 500 credits per month; the $16 plan gives 3,000 credits.
  • A typical workflow produces 5–20 Q&A pairs per page and retrieves up to 4 documents per query.

Published by Brav

Why this matters

Every AI developer who wants to scrape the web and feed the data into a chatbot runs into the same pain points:

  • Finding a reliable, free API that does not throttle or block you.
  • Managing a self-hosted stack without a dedicated ops team.
  • Converting raw HTML into a format that an LLM can ingest.
  • Building a deduplicated knowledge base that can be queried in real time.
  • Watching your credit balance drift and blowing your budget on a single project.

In this article I walk through a real-world stack that solves all of these problems with a single docker-compose.yml file, a handful of n8n nodes, and a simple RAG assistant that uses Qdrant as its vector store. I also point out the edge cases that can trip you up.

Core concepts

| Component | What it does | Why it matters | Source |
| --- | --- | --- | --- |
| Firecrawl | Scrapes a URL and returns clean markdown | Gives you LLM-ready content in seconds | Firecrawl — Docs (2024) |
| n8n | Low-code workflow automation | Connects the scrape, summarization, and storage steps | n8n — Docs (2024) |
| Qdrant | Vector database that stores embeddings | Enables fast RAG queries | Qdrant — OpenAI Embeddings (2024) |
| OpenAI embeddings | Transforms text into vectors | Works seamlessly with Qdrant | Same as above |
| LLM summarizer | Generates 5–20 Q&A pairs from markdown | Turns noisy pages into usable knowledge | Medium — How to Generate QA (2024) |
| Recursive character splitter | Splits long markdown into chunks | Keeps prompt token limits in check | |

Firecrawl: the free, self-hosted API

Firecrawl offers a self-hosted Docker image that is free to run. If you use the hosted API instead, the free tier gives you 500 credits per month and the paid tier gives you 3,000 credits for $16/month. Credits are charged per page scraped, not per request, so the cost scales with the size of the website you crawl.

docker pull firecrawl/firecrawl

Once you run the container you can call its endpoint:

curl -X POST http://localhost:8080/scrape \
  -u root:root \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","format":"markdown"}'

The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:

docker compose logs firecrawl
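If you want to script this call outside n8n, a minimal Python sketch looks like the following. The `content` key mirrors the response shape described above, and the `root:root` credentials are the compose-file defaults; adjust both to your deployment.

```python
import base64
import json
import urllib.request

FIRECRAWL = "http://localhost:8080/scrape"  # endpoint exposed by the compose file

def basic_auth_header(user, password):
    """Build the Basic auth header value for the default root:root credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def extract_markdown(payload):
    """Pull the markdown out of a Firecrawl response payload.

    Assumes the markdown sits under a top-level 'content' key, as described
    above; raises KeyError if your Firecrawl version returns a different shape.
    """
    return payload["content"]

def scrape(url, user="root", password="root"):
    """POST a URL to the scrape endpoint and return clean markdown."""
    body = json.dumps({"url": url, "format": "markdown"}).encode()
    req = urllib.request.Request(FIRECRAWL, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(req) as resp:
        return extract_markdown(json.loads(resp.read()))
```

Splitting the parsing out of `scrape` keeps the network call thin and makes the response handling easy to unit-test.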

n8n: orchestrating the workflow

n8n lets you build a linear or branching workflow with minimal code. In my stack I use the following nodes:

  • Form – adds a single field called url.

  • HTTP Request

    • Method: POST
    • URL: http://firecrawl:8080/scrape
    • Authentication: Basic → username root, password root
    • Body: {"url":"{{ $json['url'] }}","format":"markdown"}
  • OpenAI:

    • Prompt:
      Summarize the following Markdown into 5–20 question-answer pairs. Return a JSON array.
      Markdown:
      {{ $json['content'] }}
      
    • Model: gpt-4o-mini (or any model that fits your budget)
  • Qdrant:

    • Collection: knowledge_base
    • Vector: Pass the OpenAI embeddings of each Q&A pair.
    • Payload: Include url and timestamp.
  • Webhook: expose the workflow so you can trigger it via curl or a browser.
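The OpenAI node is asked for a JSON array of 5–20 question-answer pairs, but models sometimes wrap JSON in code fences or drop fields. A defensive parser helps; this is a sketch, and the `question`/`answer` field names are an assumption you should match to your prompt:

```python
import json

def parse_qa_pairs(raw, min_pairs=5, max_pairs=20):
    """Parse the LLM response into a list of {'question', 'answer'} dicts.

    Strips markdown code fences the model sometimes adds, drops entries
    missing either field, and validates the expected 5-20 pair range.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (possibly "```json") and the closing fence.
        lines = text.splitlines()
        body = lines[1:-1] if lines[-1].startswith("```") else lines[1:]
        text = "\n".join(body)
    pairs = json.loads(text)
    pairs = [p for p in pairs if "question" in p and "answer" in p]
    if not (min_pairs <= len(pairs) <= max_pairs):
        raise ValueError(f"expected {min_pairs}-{max_pairs} pairs, got {len(pairs)}")
    return pairs
```

Failing loudly on an out-of-range count lets n8n retry the summarization instead of silently storing a half-empty page.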

Qdrant: the vector store

Qdrant stores each Q&A pair as a point with a 1536-dimensional vector (the size produced by OpenAI's embedding models). The default distance metric is cosine similarity, which works well for semantic retrieval. I configure the splitter with a chunk size of 1,000 characters so each embedded chunk stays manageable.

The query side of the workflow looks like this:

{
  "query_vector": "<embedding of user question>",
  "filter": {"must": [{"key": "url", "match": {"value": "https://example.com"}}]},
  "limit": 4
}

The result is an array of the four most relevant Q&A pairs, which the chatbot then feeds back to the user. This simple RAG pattern keeps the LLM’s context window small while still giving the user accurate answers.
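Stitching those hits into an LLM context takes only a few lines. This sketch assumes the question and answer text are stored in each point's payload alongside url; the workflow above only lists url and timestamp, so add the text fields when inserting:

```python
def build_context(hits, limit=4):
    """Turn Qdrant search hits into a compact context block for the LLM.

    Each hit is assumed to carry the Q&A text and source url in its
    payload; only the top `limit` hits are kept, matching the query above.
    """
    lines = []
    for hit in hits[:limit]:
        p = hit["payload"]
        lines.append(f"Q: {p['question']}\nA: {p['answer']}\n(Source: {p['url']})")
    return "\n\n".join(lines)
```

Including the source URL per pair lets the chatbot cite where an answer came from.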

How to apply it

Below is a step-by-step recipe that you can copy-paste into a fresh Git repository.

1. Create a docker-compose.yml

version: "3.9"
services:
  firecrawl:
    image: firecrawl/firecrawl
    container_name: firecrawl
    environment:
      - FIRECRAWL_ROOT_USER=root
      - FIRECRAWL_ROOT_PASSWORD=root
      - FIRECRAWL_MODE=api
      - FIRECRAWL_PORT=8080
    ports:
      - "8080:8080"
    volumes:
      - firecrawl_data:/data
  n8n:
    image: n8nio/n8n
    container_name: n8n
    environment:
      - N8N_HOST=0.0.0.0
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - N8N_INSECURE=true
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=admin
    depends_on:
      - firecrawl
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  firecrawl_data:
  n8n_data:
  qdrant_data:

This file spins up all three services with default credentials that you should change in production. Note that depends_on only controls start order: n8n starts after the Firecrawl container starts, but Compose does not wait for Firecrawl to actually be ready to accept requests.
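Because Compose only orders container start-up, the first workflow run can hit Firecrawl before it accepts connections. A small readiness poll guards against that race; this is a generic sketch where the probe is any zero-argument check you supply, such as an HTTP GET against the scrape endpoint:

```python
import time

def wait_until_ready(probe, attempts=30, delay=2.0):
    """Retry `probe` until it returns True or the attempts are exhausted.

    `probe` is any zero-argument callable, e.g. a function that tries an
    HTTP GET against http://localhost:8080 and returns True on success.
    Connection errors while the container is still booting are swallowed.
    """
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused: the service is not up yet
        time.sleep(delay)
    return False
```

Run this once after `docker compose up -d` and only then trigger the first scrape.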

2. Bring everything up

docker compose up -d

Open http://localhost:5678 to see the n8n UI. The first time you hit the Form node, n8n will ask you for the Firecrawl URL and credentials; fill them in.

3. Test the Firecrawl endpoint

curl -X POST http://localhost:8080/scrape \
  -u root:root \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","format":"markdown"}'

The response is a JSON payload that contains a content key with markdown. If you see an error, check the container logs:

docker compose logs firecrawl

4. Build the n8n workflow

Recreate the five nodes exactly as described in the "n8n: orchestrating the workflow" section above: the Form node with its single url field, the HTTP Request node pointing at http://firecrawl:8080/scrape with Basic auth, the OpenAI summarization node, the Qdrant node writing to the knowledge_base collection, and the Webhook trigger.

5. Hook up a simple chat

I use the Chat UI node from n8n’s AI collection. It sends the user’s question to an OpenAI model, calls Qdrant for the top 4 results, and then stitches the answer. The final prompt looks like:

You are an assistant that answers user questions using the following retrieved Q&A pairs. Only use these facts, and if you can’t answer, say “I don’t know.”  
Retrieved Q&A: {{ $json['retrieved'] }}  
User question: {{ $json['question'] }}
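The same prompt can be assembled outside n8n if you script the chat step yourself; a small formatter sketch using the field names from the workflow above:

```python
PROMPT = (
    "You are an assistant that answers user questions using the following "
    "retrieved Q&A pairs. Only use these facts, and if you can't answer, "
    "say \"I don't know.\"\n"
    "Retrieved Q&A: {retrieved}\n"
    "User question: {question}"
)

def build_prompt(retrieved, question):
    """Fill the RAG prompt template with retrieved context and the user question."""
    return PROMPT.format(retrieved=retrieved, question=question)
```

Keeping the template as a constant makes it easy to version-control prompt changes alongside the workflow.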

6. Set up CI/CD

Create a GitHub Actions workflow that triggers on pushes to the main branch:

name: Deploy stack
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/stack:latest
      - name: Deploy to server
        run: |
          ssh -o StrictHostKeyChecking=no user@yourserver <<'EOF'
          docker compose pull
          docker compose up -d
          EOF

7. Monitor credits

The Firecrawl API reports the number of credits consumed in the response header X-Api-Used-Credits. Capture that value in n8n and push it to a monitoring dashboard or an alerting service.

{{ $json['X-Api-Used-Credits'] }}
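To turn that header into an actual guard, keep a running counter and pause scraping before the plan runs dry. A sketch; the 500-credit limit comes from the free plan described earlier, and the header name is the one just mentioned:

```python
class CreditGuard:
    """Track credits consumed this month and pause scraping near the limit."""

    def __init__(self, monthly_limit=500, reserve=50):
        self.monthly_limit = monthly_limit
        self.reserve = reserve  # stop early to leave headroom for ad-hoc scrapes
        self.used = 0

    def record(self, headers):
        """Add the credits reported in the X-Api-Used-Credits response header."""
        self.used += int(headers.get("X-Api-Used-Credits", 0))

    def should_pause(self):
        """True when continuing would risk exhausting the plan's credits."""
        return self.used >= self.monthly_limit - self.reserve
```

Wire `should_pause()` into an IF node in n8n so the workflow stops scraping instead of silently failing mid-month.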

Pitfalls & edge cases

| Issue | What goes wrong | Fix |
| --- | --- | --- |
| Duplicate documents | Qdrant returns the same page multiple times | Store the source URL as metadata and add a unique constraint. |
| Token limits | Summarization prompt exceeds 8K tokens | Use the recursive character splitter to chunk the markdown before summarizing. |
| API throttling | Firecrawl rate limits you | Increase concurrency by using multiple Firecrawl instances behind a load balancer. |
| Memory blowout | n8n workers consume >2 GB on large workflows | Run n8n in a separate container with a dedicated --max-old-space-size flag. |
| Credential leaks | API keys in a public repo | Store credentials in Docker secrets or a secrets manager. |
| Cost overruns | 500 credits are exhausted quickly | Add a hard limit in the workflow to pause scraping when credits are low. |
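The splitter fix in the table above can be approximated in a few lines. This is a simplified sketch of the recursive idea (try the coarsest separator first, recurse when pieces are still too long), not the exact LangChain implementation:

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ")):
    """Recursively split text so every chunk is at most chunk_size characters.

    Tries paragraphs first, then lines, then words; falls back to a hard
    cut when no separator is left.
    """
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        if sep in text:
            chunks, buf = [], ""
            for part in text.split(sep):
                candidate = buf + sep + part if buf else part
                if len(candidate) <= chunk_size:
                    buf = candidate
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Pieces still too long (e.g. one huge paragraph) recurse deeper.
            out = []
            for c in chunks:
                out.extend(recursive_split(c, chunk_size, separators))
            return out
    # No separator available: hard cut.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Splitting on paragraph boundaries first keeps each chunk semantically coherent, which produces better Q&A pairs than cutting mid-sentence.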

Quick FAQ

| Question | Answer |
| --- | --- |
| What's the difference between the 500-credit free plan and the $16 plan? | The free plan gives 500 page scrapes per month; the paid plan gives 3,000 credits (≈3,000 pages) for $16/month. |
| How do I obtain an API key for a self-hosted Firecrawl instance? | After starting the container, open http://localhost:8080 and create an account. The UI will give you a key. |
| Which authentication method does Firecrawl support? | Basic authentication with a username and password; you can also pass an API key in the Authorization: Bearer header. |
| How can I avoid duplicate data in Qdrant? | Store the source URL as payload and filter by it before inserting. |
| How do I control the chunk size for the recursive splitter? | Set the chunk_size parameter in the LLM chain; a typical value is 1,000 characters. |
| When should I use the $16 plan instead of the free tier? | If you expect to scrape more than ~500 pages a month or need higher concurrency, upgrade to avoid rate limits. |
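One concrete way to implement the dedup answer above is to derive each point's ID deterministically from its source URL and question, so re-scraping the same page upserts instead of inserting a duplicate. A sketch; uuid5 is one stable choice that Qdrant accepts as a point ID:

```python
import uuid

def point_id(url, question):
    """Deterministic Qdrant point ID for a Q&A pair.

    The same url + question always maps to the same UUID, so an upsert
    overwrites the existing point instead of creating a duplicate.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{url}#{question}"))
```

Use this as the `id` when writing points from the Qdrant node, and duplicates disappear without any post-hoc filtering.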

Conclusion

You now have a reproducible stack that turns any website into a searchable knowledge base with LLM-powered answers—all on a single Docker Compose file. The key takeaways:

  • Scrape once, use many – Firecrawl gives you clean markdown that an LLM can ingest without extra preprocessing.
  • Automation is cheap – n8n stitches the pieces together with a few clicks, and the workflow can be version-controlled.
  • Vector search is fast – Qdrant’s built-in similarity search keeps your chatbot snappy, even on large knowledge bases.
  • Watch your credits – Firecrawl’s credit system is simple; just add a counter in your workflow and let it trigger alerts.
  • Keep it CI-driven – With Docker Compose and GitHub Actions you can keep the stack up-to-date without manual intervention.

If you’re building a knowledge-base chatbot, a research assistant, or a content-analysis tool, this stack scales from a local demo to a production-grade deployment with minimal friction. Give it a spin, tweak the chunk size or prompt, and watch your LLM get smarter without paying for cloud infra every time you scrape a new page.


Who should use this?

  • AI developers who want to experiment with web-scraped data without the cost of a paid API.
  • DevOps engineers who can ship a Docker Compose stack and maintain it via CI/CD.
  • Automation specialists who need to tie together scraping, summarization, and vector search.

Who should avoid it?

  • Projects that require real-time scraping of millions of pages per day (the free tier will hit limits).
  • Teams that cannot run Docker or manage credentials securely.

Start small, test the workflow, then scale up to the $16 plan when your knowledge base starts answering real users. Happy scraping!

Last updated: January 2, 2026
