Learn how to replace expensive background-removal APIs with a low-cost home GPU rig, using open-source models and Cloudflare R2 for privacy-first image processing.

How I Built a Budget-Friendly Background Removal Service on a Home GPU Rig

TL;DR

High-cost cloud APIs (≈$0.20/image) can be replaced by a self-hosted rig for ≈$0.01/image.
A nine-GPU P106 cluster in Belarus costs ≈$26/month in electricity, far cheaper than cloud rates.
Use FastAPI, Redis (or a simple task queue), Cloudflare R2, and open-source models (BiRefNet, Real ESRGAN, LaMa) for a scalable, privacy-first pipeline.
Cooling, power supply, and DDoS protection are the main hardware risks.
The next step? Scale to a dedicated GPU server or expose an API for developers.

Published by Brav

Why this matters

When I first started a side project that required background removal, I hit two walls: the API cost was bleeding money, and every image I sent out for processing left my apartment’s network, where I didn’t trust a third party to delete the originals. The numbers were brutal. remove.bg charges about $0.20 per image [remove.bg — Pricing (2026)], so a thousand images a month would cost me $200.

I also found that renting cloud compute (a DigitalOcean droplet, an AWS GPU instance, or a managed GPU service) did not improve the economics. A $20/month DigitalOcean droplet [DigitalOcean — Droplet Pricing (2026)] costs about as much as a single used P106, but it added another $20 to my bill every month, and as long as the background removal itself went through a third-party API I was still paying the same $0.20 per image.

The real comparison was the API bill against the electricity bill of a home server. Electricity in Belarus costs only $0.06–$0.09 per kWh [PVKnowhow — Belarus Electricity Price (2026)], and my rig pulls about 600 W on average, or roughly 432 kWh per month (0.6 kW × 24 h × 30 days). At $0.06 per kWh that works out to about $26 a month, and about $39 at $0.09 per kWh, a far cry from a $200 monthly API bill.
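The electricity arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes a constant average draw and a 30-day month; the function name is illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, which is what the 432 kWh figure implies


def monthly_electricity_cost(avg_watts, usd_per_kwh):
    """Return the monthly electricity cost in USD for a constant average draw."""
    kwh = avg_watts / 1000 * HOURS_PER_MONTH
    return kwh * usd_per_kwh


kwh_per_month = 600 / 1000 * HOURS_PER_MONTH
low = monthly_electricity_cost(600, 0.06)   # cheapest quoted tariff
high = monthly_electricity_cost(600, 0.09)  # most expensive quoted tariff
print(f"{kwh_per_month:.0f} kWh/month, ${low:.2f} to ${high:.2f} per month")
```

The $26 figure corresponds to the bottom of the quoted tariff range, so it is best read as a lower bound.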

Core concepts

The GPU economy

I started with an inexpensive NVIDIA P106, a 6 GB Pascal card originally sold as a mining GPU. A used card goes for roughly $22 on eBay [Nvidia P106 — eBay listing (2026)]. Each card draws about 100 W under load [Nvidia P106 — TechPowerUp spec (2026)], so nine cards can peak near 900 W when all of them are busy. Averaged over a month the rig draws about 600 W, which translates to roughly $26/month in Belarusian electricity [PVKnowhow — Belarus Electricity Price (2026)].
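To put the hardware numbers next to the API bill, here is a rough payback sketch. It uses only figures quoted in this article ($22 per card, 100 W per card under load, a $200/month API bill, $26/month electricity) and deliberately ignores the case, PSU, fans, and host machine, so treat the result as optimistic.

```python
CARDS = 9
PRICE_PER_CARD_USD = 22    # used eBay price quoted above
WATTS_PER_CARD_LOAD = 100  # per-card draw under load quoted above


def rig_summary(cards, price_usd, watts_per_card):
    """Return GPU purchase cost and the draw with every card under load."""
    return {"hardware_usd": cards * price_usd, "peak_watts": cards * watts_per_card}


def payback_months(hardware_usd, api_usd_per_month, electricity_usd_per_month):
    """Months until the GPUs pay for themselves versus the API bill."""
    saving = api_usd_per_month - electricity_usd_per_month
    if saving <= 0:
        raise ValueError("self-hosting does not save money at these rates")
    return hardware_usd / saving


summary = rig_summary(CARDS, PRICE_PER_CARD_USD, WATTS_PER_CARD_LOAD)
months = payback_months(summary["hardware_usd"], 200, 26)
print(summary, f"payback in about {months:.1f} months")
```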

Software stack

FastAPI: I chose FastAPI for its asynchronous nature and its clean Python API surface. The docs are straightforward [FastAPI — Docs (2026)].
Task queue: I kept it simple, using a small Redis-like queue stored in memory.
Storage: Cloudflare R2 is a drop-in, S3-compatible store with a free tier of 10 GB-month of storage and 1 M Class A requests per month [Cloudflare R2 — Pricing (2026)].
Open-source models:
- BiRefNet for fast, high-resolution foreground segmentation, which is the background-removal step [BiRefNet — GitHub Repo (2026)].
- Real ESRGAN for upscaling the background-removed image to a higher resolution [Real ESRGAN — GitHub Repo (2026)].
- LaMa for inpainting or object removal [LaMa — GitHub Repo (2026)].

Data privacy

All original images are deleted immediately after processing. No logs or image data persist on the server, and the Cloudflare R2 bucket is private: it is reachable only through short-lived pre-signed URLs, which the server generates and the client uses once and then discards.
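One way to make the delete-after-processing promise hard to break is to tie the file's lifetime to a context manager, so the original is removed even when processing raises. This is a minimal sketch using only the standard library; `ephemeral_image` is a hypothetical helper name, not part of the original pipeline.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def ephemeral_image(data, suffix=".png", directory=None):
    """Write `data` to a temp file, yield its path, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        # Runs on success and on error, so the original never lingers.
        if os.path.exists(path):
            os.remove(path)


with ephemeral_image(b"not a real image") as path:
    existed_during = os.path.exists(path)  # the model would read `path` here
existed_after = os.path.exists(path)
print(existed_during, existed_after)  # True False
```

Passing a `directory` that lives on a RAM disk keeps the original off persistent storage entirely.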

How to apply it

Gather GPUs – Search eBay or local mining-hardware resellers for used P106s. Each card costs $20–$30 used.
Build the rig – Mount nine cards in a 4U case and size the PSU with headroom: for a 600 W average draw, 600 W × 1.5 = 900 W is the recommended rating. Add active cooling (USB fans or small inline fans).
Install drivers – Install the official NVIDIA driver and CUDA 11, which supports Pascal cards.
Set up the OS – A lightweight Ubuntu 22.04 server install is enough (CPU: Intel Pentium, 2 cores, 20 GB RAM, 128 GB SSD).
Deploy FastAPI – Run it under Uvicorn and expose two endpoints: upload (works with a pre-signed R2 URL so the file goes straight to the bucket) and process (launches a background task).
Queue & workers – Run a small thread pool that pulls tasks from a simple in-memory queue and assigns each one to a GPU with enough free VRAM (BiRefNet needs 5.5 GB, Real ESRGAN 4 GB, LaMa 3 GB).
Use tmpfs – Mount /tmp as tmpfs (a RAM disk) to speed up intermediate I/O.
Handle power – Monitor wattage with a USB meter and export the readings to a simple Prometheus endpoint; set a threshold that pauses jobs if voltage dips.
Secure the API – Issue a short-lived JWT plus a client fingerprint for each request; store the token in an HttpOnly cookie.
Put Cloudflare in front – Point DNS to Cloudflare and enable the free tier for DDoS protection and WAF.
Delete originals – The pipeline deletes the source image from the local filesystem as soon as the R2 upload is confirmed.
Scale – If traffic increases, replace the 9-GPU rack with a dedicated server, or lease a GPU droplet on DigitalOcean for higher throughput.
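The queue-and-workers step is the least obvious part of the list, so here is a minimal sketch of how it could look with only the standard library. The VRAM figures come from the step above. The `GpuPool` class, the `worker` function, and the `fake_run_model` placeholder are hypothetical names for illustration, not the code that runs on my rig. The demo uses two cards instead of nine to keep the output short.

```python
import queue
import threading

# VRAM needs per model, from the step above; each P106 has 6 GB.
VRAM_NEEDED_GB = {"birefnet": 5.5, "realesrgan": 4.0, "lama": 3.0}


class GpuPool:
    """Tracks free VRAM per card and hands out a card that can fit a job."""

    def __init__(self, gpu_count=9, vram_gb=6.0):
        self._capacity = vram_gb
        self._free = [vram_gb] * gpu_count
        self._cond = threading.Condition()

    def acquire(self, need_gb):
        if need_gb > self._capacity:
            raise ValueError("job does not fit on any card")
        with self._cond:
            while True:
                for idx, free in enumerate(self._free):
                    if free >= need_gb:
                        self._free[idx] -= need_gb
                        return idx
                self._cond.wait()  # sleep until another job releases VRAM

    def release(self, idx, need_gb):
        with self._cond:
            self._free[idx] += need_gb
            self._cond.notify_all()

    def free_vram(self):
        with self._cond:
            return list(self._free)


def worker(tasks, pool, results, run_model):
    """Pull (model, image_id) tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        model, image_id = task
        need = VRAM_NEEDED_GB[model]
        gpu = pool.acquire(need)
        try:
            results.append((image_id, gpu, run_model(model, image_id, gpu)))
        finally:
            pool.release(gpu, need)  # give the VRAM back even on failure


def fake_run_model(model, image_id, gpu):
    # Placeholder: a real worker would run inference on device `gpu` here.
    return f"{model} done for image {image_id}"


tasks, results = queue.Queue(), []
pool = GpuPool(gpu_count=2)
threads = [
    threading.Thread(target=worker, args=(tasks, pool, results, fake_run_model))
    for _ in range(3)
]
for t in threads:
    t.start()
for image_id, model in enumerate(["birefnet", "lama", "realesrgan", "lama"]):
    tasks.put((model, image_id))
for _ in threads:
    tasks.put(None)  # one stop sentinel per worker
for t in threads:
    t.join()
print(sorted(results))
```

Because a LaMa job needs only 3 GB, two of them can share one 6 GB card, while a BiRefNet job at 5.5 GB effectively takes a card for itself. In this sketch an exception inside `run_model` ends that worker thread, so a real service would catch and record the error instead.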

Metrics you should monitor

Metric	Target	Tool
CPU load	< 50 %	top/htop
GPU memory	< 90 %	nvidia-smi
Power consumption	< 650 W	USB meter
Latency per image	< 5 s	Prometheus
Error rate	0 %	Sentry
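The GPU memory target in the table can be checked from a script by parsing `nvidia-smi` CSV output. The sketch below assumes the query fields `index`, `memory.used`, `memory.total`, and `temperature.gpu`. The 90 % memory limit comes from the table and the 90 °C limit from the pitfalls section; the function names are illustrative. The parser is exercised on a made-up sample string so it runs on a machine without a GPU.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV lines into dicts with memory percent and temperature."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, used, total, temp = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "mem_pct": 100 * float(used) / float(total),
            "temp_c": float(temp),
        })
    return stats


def gpus_over_limit(stats, mem_limit_pct=90.0, temp_limit_c=90.0):
    """Return the indices of cards that break either limit."""
    return [
        s["index"] for s in stats
        if s["mem_pct"] >= mem_limit_pct or s["temp_c"] >= temp_limit_c
    ]


def read_gpu_stats():
    """Query the real cards; requires the NVIDIA driver to be installed."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)


# Made-up sample output for two cards (values in MiB and degrees Celsius).
sample = "0, 5632, 6144, 71\n1, 1024, 6144, 55\n"
print(gpus_over_limit(parse_gpu_stats(sample)))  # [0]
```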

Pitfalls & edge cases

Issue	Why it matters	Mitigation
Cooling failure	GPUs overheat, throttling or hardware failure	Add redundant fans, monitor temps, set auto-shutdown at 90 °C
Power outages	Service crashes, data loss	UPS with 30 min backup, write-through caching
DDoS attacks	Free Cloudflare tier can still be abused	Enable rate limits, use Cloudflare WAF
GPU failure	Single card failure reduces capacity	Spare GPU, health checks, job retries
Storage limits	R2 free tier caps at 10 GB-month	Upgrade to paid plan or add a local cache
Data privacy	Accidental leakage via logs	No logging of image payloads, secure cookies
Scaling limits	9 GPUs only handle ~200 requests/hour	Move to GPU server or add more rigs
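For the GPU failure row, "job retries" can be as simple as trying the next card when one fails. This is a sketch under the assumption that a failing card surfaces as a `RuntimeError` in the worker; `run_with_retries` and `flaky_job` are hypothetical names.

```python
def run_with_retries(job, gpu_ids, max_attempts=3):
    """Try job(gpu_id) on successive cards; return (gpu_id, result)."""
    failed = []
    last_error = None
    for gpu_id in list(gpu_ids)[:max_attempts]:
        try:
            return gpu_id, job(gpu_id)
        except RuntimeError as exc:
            failed.append(gpu_id)  # remember the bad card and move on
            last_error = exc
    raise RuntimeError(f"job failed on GPUs {failed}") from last_error


def flaky_job(gpu_id):
    # Hypothetical job: pretend card 0 has died.
    if gpu_id == 0:
        raise RuntimeError("CUDA error on GPU 0")
    return f"ok on GPU {gpu_id}"


print(run_with_retries(flaky_job, [0, 1, 2]))  # (1, 'ok on GPU 1')
```

Cards that land in `failed` repeatedly are good candidates for the health check to take out of rotation.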

Quick FAQ

What if I need more than 9 GPUs?
You can add more P106 cards if you have a larger chassis, but the power supply and cooling must scale. Alternatively, use a cloud GPU instance for burst traffic.
Is the P106 still suitable for modern deep-learning models?
Yes, for models like BiRefNet and Real ESRGAN that fit in 6 GB VRAM, the P106 works fine. For larger models you’ll need 12 GB or more.
How do I keep the API private?
Use a short-lived JWT stored in an HttpOnly cookie. The token is validated on every request.
Can I use the same pipeline for other image tasks?
Absolutely. Just swap in a different model (e.g., Stable Diffusion for generative art) and adjust the VRAM thresholds.
Do I need a dedicated server?
No. A home server works as long as you manage power, cooling, and backup.
What about the cost of the GPUs over time?
GPUs depreciate, but used P106s are cheap: nine cards cost roughly $200 in total, so even 10 % annual depreciation is only about $20 a year, and the rig’s running cost stays far below the $200/month API bill.
How do I handle large file uploads?
Use Cloudflare R2 pre-signed URLs for direct upload; this bypasses the server and reduces load.
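To make the answer on keeping the API private more concrete, here is a minimal sketch of issuing and validating a short-lived HS256 JWT with only the standard library. It shows what "validated on every request" involves: checking the signature, the algorithm, and the expiry. The function names and the five-minute lifetime are assumptions, and in production a maintained library such as PyJWT is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret, signing_input):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret, subject, ttl_seconds=300, now=None):
    """Create a compact HS256 JWT with `sub` and `exp` claims."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": now + ttl_seconds}).encode())
    signature = _b64url(_sign(secret, f"{header}.{payload}"))
    return f"{header}.{payload}.{signature}"


def verify_token(secret, token, now=None):
    """Return the claims if the token is authentic and unexpired, else raise ValueError."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(secret, f"{header_b64}.{payload_b64}")
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims


secret = b"change-me"  # placeholder; load a real secret from the environment
token = issue_token(secret, "user-123")
print(verify_token(secret, token)["sub"])  # user-123
```

The resulting string is what goes into the HttpOnly cookie, and the API calls `verify_token` before it queues any work.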

Conclusion

If you’re a startup founder or hobbyist developer looking to keep costs low while maintaining full data privacy, a home GPU rig built around used P106 cards is a viable path. The key takeaways:

Cut API bills by moving to your own GPUs.
Cap electricity by running in a low-cost region (Belarus).
Simplify storage with Cloudflare R2’s free tier.
Build a resilient pipeline with FastAPI, a lightweight queue, and open-source models.

Who should use this? Anyone who cares about privacy and can tolerate the upfront hardware cost and the extra maintenance overhead. Who should not? Anyone who needs instant horizontal scaling or lacks the physical space and power capacity to run a GPU rig.

Ready to jump in? Grab a few P106s, set up FastAPI, and let the GPU do the heavy lifting.

References

remove.bg — Pricing (2026) https://www.deviantart.com/josephmoyers/journal/Easy-Ways-to-Remove-Background-from-Images-856541241
DigitalOcean — Droplet Pricing (2026) https://www.digitalocean.com/pricing/droplets
Cloudflare R2 — Pricing (2026) https://developers.cloudflare.com/r2/pricing/
FastAPI — Docs (2026) https://fastapi.tiangolo.com/
Nvidia P106 — eBay listing (2026) https://www.ebay.com.au/itm/156284589117
Nvidia P106 — TechPowerUp spec (2026) https://www.techpowerup.com/gpu-specs/p106-100.c2980
PVKnowhow — Belarus Electricity Price (2026) https://www.pvknowhow.com/solar-report/belarus/
BiRefNet — GitHub Repo (2026) https://github.com/ZhengPeng7/BiRefNet
Real ESRGAN — GitHub Repo (2026) https://github.com/xinntao/Real-ESRGAN
LaMa — GitHub Repo (2026) https://github.com/advimman/lama

Last updated: January 6, 2026
