Self-Hosted Monitoring: Why I Fired Datadog for Uptime Kuma & Grafana

Last Updated: January 2026 | Author: Raja | Reading Time: 7 mins

I remember the exact moment I decided to kill my Datadog account.

It wasn't a server crash. It was an invoice. $420 for a month. For what? Monitoring 3 VPS instances and a handful of containers?

The "hidden costs" of observability SaaS in 2026 are criminal. You pay for the host. Then you pay for the logs. Then you pay for "custom metrics" (the only metrics that actually matter).

So I fired them. I replaced the entire $420/mo stack with a $5/mo Hetzner VPS running Uptime Kuma and Grafana.

Here is how I did it, and why you should too.

1. The "Is It Down?" Layer: Uptime Kuma

If you are paying for PagerDuty or Pingdom for a side project, stop.

Uptime Kuma is the single best piece of open-source software I installed in 2025.

The UI: It looks like it was designed by a human, not a committee.
The Features: It monitors HTTP, TCP, DNS, and even Docker containers directly.
The "Status Page": It generates a beautiful "Is It Down?" page for your users automatically.

My Setup: I run Uptime Kuma on a separate cheap VPS. Never host your monitor on the same server you are monitoring: if that server dies, the monitor dies with it and nobody gets paged.

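# docker-compose.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1   # tracks the latest 1.x release
    container_name: uptime-kuma
    volumes:
      - ./uptime-kuma-data:/app/data   # persist monitors and history across restarts
    ports:
      - "3001:3001"   # web UI and status page
    restart: always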

Cost: $4/mo (VPS) vs $30/mo (Pingdom).

2. The "Why Is It Slow?" Layer: Prometheus + Grafana

Uptime Kuma tells you if it's down. Grafana tells you why.

I used to think setting up Prometheus was hard. "I don't want to write PromQL," I said. But in 2026, Grafana Alloy (the successor to the Grafana Agent) makes this trivial.

The Stack:

Node Exporter: Runs on every server. Exports CPU/RAM/Disk metrics.
Prometheus: Scrapes those metrics every 15s.
Grafana: Visualizes them.
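
For reference, the scrape side of that stack can be sketched in a few lines of Prometheus config. This is a minimal example, not my exact file: the hostnames are placeholders I made up, and it assumes Node Exporter is listening on its default port (9100) on each server.

```yaml
# prometheus.yml (sketch; hostnames are hypothetical placeholders)
global:
  scrape_interval: 15s   # the 15s scrape cadence mentioned above

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          # One entry per monitored server; 9100 is Node Exporter's default port.
          - app-1.example.internal:9100
          - app-2.example.internal:9100
          - app-3.example.internal:9100
```

Grafana then only needs this Prometheus instance added as a data source.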

The "Spicy" Config: I don't use the default dashboards. They are too noisy. I built a "CEO Dashboard" that shows me exactly 3 things:

Error Rate: Are users seeing 500s?
Latency (p95): Is the site feeling slow?
Disk Space: Am I about to crash?

If those are green, I sleep. If they are red, I wake up.
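
Those three panels boil down to three PromQL expressions. Below is a rough sketch written as Prometheus recording rules, so each panel can query a single precomputed series. The two HTTP metric names (http_requests_total with a status label, and the http_request_duration_seconds histogram) are assumptions about what your app exports; only the node_filesystem_* metrics come from Node Exporter itself. The rule names are my own invention.

```yaml
# ceo-dashboard.rules.yml (sketch; reference it from rule_files in prometheus.yml)
groups:
  - name: ceo_dashboard
    rules:
      # Error rate: share of requests answered with a 5xx over the last 5 minutes.
      # Assumes the app exposes a counter http_requests_total{status="..."}.
      - record: job:http_request_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))

      # Latency (p95): estimated from histogram buckets.
      # Assumes the app exposes a histogram http_request_duration_seconds.
      - record: job:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))

      # Disk space: fraction of the root filesystem still available (Node Exporter).
      - record: instance:node_filesystem_avail:ratio
        expr: |
          node_filesystem_avail_bytes{mountpoint="/"}
            /
          node_filesystem_size_bytes{mountpoint="/"}
```

Where "green" ends and "red" begins is up to you; the thresholds live in the Grafana panels and alert rules, not in this file.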

3. The "Alerting" Layer: Discord (Yes, really)

I don't use SMS alerts. I don't use email (too slow). I send everything to a private Discord server.

Uptime Kuma -> #uptime-alerts channel.
Grafana -> #performance-alerts channel.

Why Discord?

Push Notifications: Instant on my phone.
History: I can scroll back and see "Oh, the DB spiked yesterday at 3 AM too."
Free: It costs $0.
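
On the Grafana side, the Discord hookup can live in a provisioning file instead of being clicked together in the UI. Here is a minimal sketch, assuming Grafana's alerting file provisioning and a webhook you created for the #performance-alerts channel; the contact point name, uid, and webhook URL are placeholders. Uptime Kuma needs no file at all: you paste a webhook URL into its Discord notification settings.

```yaml
# provisioning/alerting/discord.yml (sketch; name, uid, and URL are placeholders)
apiVersion: 1
contactPoints:
  - orgId: 1
    name: discord-performance
    receivers:
      - uid: discord-performance
        type: discord
        settings:
          # Webhook created in Discord under the channel's Integrations settings.
          # Treat it like a password and keep the real value out of your repo.
          url: https://discord.com/api/webhooks/<webhook-id>/<webhook-token>
```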

The "Gotcha": Logs

This is where I will be honest. Self-hosting logs sucks.

I tried Loki. I tried ELK. They are heavy, resource-hungry beasts. If you are generating 100GB of logs a day, self-hosting will cost you more in time than Datadog costs in money.

My Solution: I don't aggregate logs anymore. I use OrbStack or SSH to tail logs when I need them. For production errors, I use Sentry (the free tier is generous).

The Rule: Monitor metrics centrally. Debug logs locally.

The Verdict

Total Monthly Cost:

Datadog: $420
My Stack: $9 (Hetzner VPS + Backups)

Total Setup Time: 2 hours.

Is Datadog better? Yes, if you are Netflix. But you are not Netflix. You are a developer who wants to keep their money.

Clone my docker-compose files from the repo below and save yourself $400/mo.

Raja CRN