
You can’t fix what you can’t see. Linux VPS monitoring with Prometheus and Grafana is still one of the clearest ways to answer the questions that matter during an incident: “Are we CPU-starved, memory-bound, running out of disk, or dropping packets?” This walkthrough aims for a setup that stays quiet on good days and only gets loud when something needs your attention.
Assume a simple, real-world box: one Ubuntu 24.04/25.04-class VPS (systemd, no Kubernetes) running a small API and a worker. You want host metrics, a dashboard you’ll actually trust, and alerts that don’t ping you at 3 a.m. for non-events. We’ll also leave room for a couple of app probes later, without turning the server into a science project.
If this is production, size for headroom. A 2 vCPU / 4 GB instance is a sensible floor once you add Prometheus, Grafana, and your app together. HostMyCode’s VPS plans map well to this “single box, real monitoring” setup.
What you’ll build (and what you won’t)
- Prometheus scraping host metrics via node_exporter
- Grafana dashboards for CPU, memory, disk, and network
- Alertmanager with a small set of actionable alerts (email example)
- Security: firewall rules, local-only bind where possible, and basic auth on Grafana
What we’re skipping on purpose: multi-node federation, long-term storage (Thanos/Mimir), and full tracing. You can add those later without ripping this apart. If you want a vendor-neutral “logs + metrics + traces” baseline instead, see VPS monitoring with OpenTelemetry Collector on Linux (2026).
Prerequisites
- An Ubuntu server (Ubuntu 24.04 LTS or newer recommended) with sudo
- Open ports: 22/tcp for SSH; optionally 443/tcp if you’ll put Grafana behind a reverse proxy
- A domain name if you want HTTPS and a friendly URL (optional). You can manage DNS via HostMyCode domains.
- Basic familiarity with systemd and editing files in /etc
Architecture choices that reduce noise
Most “Prometheus + Grafana on a VPS” guides go off the rails in predictable ways: they ship hundreds of alerts, or they expose port 3000 to the internet with weak credentials. In 2026, that’s not an “oops.” It’s a breach waiting to happen.
- Bind Prometheus and Alertmanager to localhost and only expose Grafana (or expose nothing and tunnel over SSH).
- Start with 6–10 alerts max, tuned to your instance size and disk layout.
- Use systemd services instead of ad-hoc background processes, so restarts and logs behave consistently.
Install Prometheus, node_exporter, Grafana, and Alertmanager
Ubuntu’s repos are fine for many things, but monitoring components can lag. For a VPS you care about, use distro packages where they serve well (node_exporter, Prometheus, Alertmanager below) and the vendor’s own repo where the distro lags (Grafana), so you get timely updates and predictable file layouts.
1) Create a dedicated directory layout
sudo mkdir -p /etc/prometheus /var/lib/prometheus /etc/alertmanager
sudo chown -R prometheus:prometheus /var/lib/prometheus 2>/dev/null || true
We’ll let packages create users where appropriate; the command above is safe even if prometheus doesn’t exist yet.
2) Install node_exporter
Keep node_exporter boring: small, stable, and always running.
sudo apt update
sudo apt install -y prometheus-node-exporter
Verify it’s up and listening on 9100:
systemctl status prometheus-node-exporter --no-pager
ss -ltnp | grep 9100
You should see a LISTEN line similar to:
LISTEN 0 4096 0.0.0.0:9100 ... prometheus-node-exporter
3) Install Prometheus
sudo apt install -y prometheus
Prometheus typically listens on 9090. We’ll bind it to localhost shortly.
4) Install Grafana from Grafana’s APT repo
sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
Start it:
sudo systemctl enable --now grafana-server
systemctl status grafana-server --no-pager
5) Install Alertmanager
On some Ubuntu versions Alertmanager ships as prometheus-alertmanager.
sudo apt install -y prometheus-alertmanager
Check status:
systemctl status prometheus-alertmanager --no-pager
Configure Prometheus scraping (and keep it private)
We’ll scrape node_exporter and Prometheus itself, then make Prometheus local-only so it isn’t reachable from the public internet.
1) Edit /etc/prometheus/prometheus.yml
sudo cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.bak.$(date +%F)
sudo nano /etc/prometheus/prometheus.yml
Use a minimal config like this (adjust comments to taste):
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["127.0.0.1:9100"]
Restart Prometheus:
sudo systemctl restart prometheus
sudo journalctl -u prometheus -n 50 --no-pager
2) Bind Prometheus to 127.0.0.1
The Prometheus systemd unit usually reads extra flags from /etc/default/prometheus or a systemd drop-in. First, check what your unit is doing:
sudo systemctl cat prometheus | sed -n '1,120p'
If your Prometheus uses /etc/default/prometheus, set:
sudo nano /etc/default/prometheus
Append or adjust:
ARGS="--web.listen-address=127.0.0.1:9090"
If your unit doesn’t use /etc/default, create a drop-in instead:
sudo systemctl edit prometheus
Add:
[Service]
ExecStart=
ExecStart=/usr/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.listen-address=127.0.0.1:9090
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart prometheus
ss -ltnp | grep 9090
You want to see 127.0.0.1:9090, not 0.0.0.0:9090.
Add Alertmanager and a small, useful rule set
Alert fatigue isn’t a rite of passage. On a single VPS you mostly care about two categories: “something is down” and “something is running out.”
1) Configure Alertmanager (email example)
Edit:
sudo cp /etc/alertmanager/alertmanager.yml /etc/alertmanager/alertmanager.yml.bak.$(date +%F)
sudo nano /etc/alertmanager/alertmanager.yml
Example (use your SMTP provider; many teams use a transactional account dedicated to alerts):
global:
  smtp_smarthost: "smtp.examplemail.net:587"
  smtp_from: "alerts@yourdomain.example"
  smtp_auth_username: "alerts@yourdomain.example"
  smtp_auth_password: "REPLACE_WITH_APP_PASSWORD"

route:
  receiver: "email-oncall"
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: "email-oncall"
    email_configs:
      - to: "oncall@yourdomain.example"
        send_resolved: true
Restart Alertmanager:
sudo systemctl restart prometheus-alertmanager
sudo journalctl -u prometheus-alertmanager -n 50 --no-pager
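Before relying on the restart to surface mistakes, you can lint the file first. On Ubuntu the amtool utility normally ships alongside Alertmanager (a quick sketch; the binary name or path may differ on other distros):

```shell
# Validate Alertmanager config syntax without restarting anything.
amtool check-config /etc/alertmanager/alertmanager.yml
```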
2) Create Prometheus alert rules
Create a new rules file:
sudo nano /etc/prometheus/rules-vps.yml
Paste a focused set (tune thresholds to your box):
groups:
  - name: vps.rules
    rules:
      - alert: NodeExporterDown
        expr: up{job="node"} == 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "node_exporter is down"
          description: "Prometheus cannot scrape node_exporter on {{ $labels.instance }}"

      - alert: HighCPUUsage
        expr: (1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.90
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "CPU > 90% for 10m"
          description: "Sustained CPU pressure on {{ $labels.instance }}"

      - alert: LowMemoryAvailable
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Memory available < 10%"
          description: "Likely OOM risk on {{ $labels.instance }}"

      - alert: DiskSpaceLowRoot
        expr: (node_filesystem_avail_bytes{mountpoint="/",fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{mountpoint="/",fstype!~"tmpfs|overlay"}) < 0.12
        for: 15m
        labels:
          severity: page
        annotations:
          summary: "Root disk free < 12%"
          description: "Clean up disk or grow volume on {{ $labels.instance }}"

      - alert: DiskWillFillSoonRoot
        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/",fstype!~"tmpfs|overlay"}[6h], 24*3600) < 0
        for: 30m
        labels:
          severity: warn
        annotations:
          summary: "Root disk on track to fill within ~24h"
          description: "Write rate suggests {{ $labels.instance }} will run out of space soon"

      - alert: HighNetworkRetransmits
        expr: rate(node_netstat_Tcp_RetransSegs[5m]) > 50
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "High TCP retransmits"
          description: "Potential packet loss or congestion on {{ $labels.instance }}"
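Before wiring the rules file into Prometheus, it’s worth linting it; promtool ships with the Prometheus package (a sketch, assuming the Ubuntu package paths used throughout this guide):

```shell
# Lint the rules file: catches YAML and PromQL syntax errors up front.
promtool check rules /etc/prometheus/rules-vps.yml

# After editing prometheus.yml, this also validates any referenced rule files.
promtool check config /etc/prometheus/prometheus.yml
```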
Now wire rules + Alertmanager into Prometheus. Edit /etc/prometheus/prometheus.yml:
sudo nano /etc/prometheus/prometheus.yml
Add:
rule_files:
  - /etc/prometheus/rules-vps.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["127.0.0.1:9093"]
Restart Prometheus:
sudo systemctl restart prometheus
If you want a deeper playbook for disk pressure, this pairs well with VPS disk space troubleshooting (2026).
Grafana: add Prometheus datasource and import a sane dashboard
Grafana’s UI shifts from release to release, but the steps below stay basically the same.
1) Secure initial access
Grafana listens on port 3000 by default. If inbound access isn’t restricted yet, don’t leave 3000 open to the internet while you “get around to it.”
The quickest safe option is an SSH tunnel from your laptop:
ssh -L 3000:127.0.0.1:3000 youruser@your-vps-ip
Then open http://localhost:3000 locally. Log in and change the admin password immediately.
2) Add Prometheus datasource
- Grafana → Connections → Data sources → Add data source → Prometheus
- URL: http://127.0.0.1:9090
- Save & test
You should get “Data source is working”. If it fails, confirm Prometheus is bound to 127.0.0.1:9090 and the service is running.
3) Import a node_exporter dashboard
Dashboard IDs come and go, so don’t bake one into your runbook. Pick a maintained “Node Exporter Full”-style dashboard from Grafana’s dashboard library and confirm it’s built on the metrics you actually have:
- node_cpu_seconds_total
- node_memory_MemAvailable_bytes
- node_filesystem_avail_bytes
After importing, set the dashboard variable job to node (or whatever your scrape job is named).
Expose Grafana safely (two options)
You’ll usually land on one of these patterns:
- Private-only Grafana over SSH or a VPN (smallest attack surface).
- Public HTTPS Grafana behind Nginx/Caddy with authentication and TLS.
If you want private admin access without opening extra ports, pair this with Tailscale VPS VPN setup (2026).
Option A: keep Grafana private and close port 3000
Use UFW to only allow SSH, and bind Grafana to localhost.
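A minimal UFW baseline might look like this; it assumes SSH on the default port 22, so adjust the allow rule first if you’ve moved SSH, or you can lock yourself out:

```shell
# Deny everything inbound by default, then re-allow SSH before enabling.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw enable
sudo ufw status verbose
```

Note that port 3000 is deliberately absent: with Grafana bound to 127.0.0.1 below, nothing else needs to be open.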
Edit /etc/grafana/grafana.ini:
sudo nano /etc/grafana/grafana.ini
Set:
[server]
http_addr = 127.0.0.1
http_port = 3000
Restart:
sudo systemctl restart grafana-server
ss -ltnp | grep 3000
Expected: 127.0.0.1:3000.
Option B: publish Grafana behind Nginx with HTTPS
If you already run Nginx as a reverse proxy, give Grafana its own vhost and put it behind TLS plus basic auth. If you need a clean multi-app proxy baseline first, see Nginx reverse proxy on a VPS (2026).
Install Nginx and htpasswd tooling:
sudo apt install -y nginx apache2-utils
Create credentials:
sudo htpasswd -c /etc/nginx/.htpasswd-grafana opsadmin
Create /etc/nginx/sites-available/grafana.ops.example:
server {
    listen 80;
    server_name grafana.ops.example;

    location / {
        auth_basic "Grafana";
        auth_basic_user_file /etc/nginx/.htpasswd-grafana;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:3000;
    }
}
Enable and reload:
sudo ln -s /etc/nginx/sites-available/grafana.ops.example /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Then obtain TLS certificates (Let’s Encrypt) using Certbot if you want public HTTPS. Keep Grafana itself bound to localhost, and only expose 80/443.
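One way to do that is Certbot’s Nginx plugin, sketched below; the domain is the example vhost name from above, so substitute your own:

```shell
# Obtain a certificate and let Certbot add TLS config to the vhost.
sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d grafana.ops.example

# Confirm automatic renewal is wired up correctly.
sudo certbot renew --dry-run
```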
Verification: prove metrics, dashboards, and alerts work
Don’t stop at “the services are running.” You want to confirm the whole chain: exporter → Prometheus → Grafana → Alertmanager.
1) Verify Prometheus targets
From the VPS:
curl -s http://127.0.0.1:9090/-/healthy && echo
curl -s http://127.0.0.1:9090/api/v1/targets | head
Expected: the health endpoint returns a short “Healthy” message. The targets JSON should include the node job with "health":"up".
2) Verify a query returns data
curl -sG http://127.0.0.1:9090/api/v1/query --data-urlencode 'query=up{job="node"}'
Expected: a JSON payload with value ending in "1".
3) Verify Grafana can query Prometheus
In Grafana’s Explore view, run:
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
You should get a CPU usage time series back.
4) Trigger a test alert (safe)
Create a temporary alert that always fires, confirm delivery, then delete it. That proves the delivery path without putting artificial load on the server.
Add to /etc/prometheus/rules-vps.yml:
      - alert: AlertPipelineTest
        expr: vector(1)
        for: 1m
        labels:
          severity: warn
        annotations:
          summary: "Test alert"
          description: "If you received this, Prometheus → Alertmanager delivery works."
Restart Prometheus and watch Alertmanager logs:
sudo systemctl restart prometheus
sudo journalctl -u prometheus-alertmanager -f
After you receive the email, remove the test alert and restart Prometheus again.
Common pitfalls (and how to avoid them)
- Prometheus scraping 0.0.0.0 targets: if you bind services to localhost but mix localhost and 127.0.0.1 in configs, you’ll waste time on avoidable debugging. Use 127.0.0.1 everywhere.
- Disk alerts firing instantly on small volumes: tune to your disk reality. On a 30 GB root volume, “12% free” is ~3.6 GB. That may be fine or it may be a crisis, depending on logs and containers.
- Grafana exposed publicly with weak auth: don’t expose port 3000 directly. Put it behind HTTPS + basic auth (or SSO), and restrict access by IP/VPN if you can.
- Email alerts silently failing: SMTP auth issues often show up only in logs. Check journalctl -u prometheus-alertmanager and validate delivery with the temporary rule.
- node_exporter metrics too broad: multiple disks and mounts can make dashboards look “wrong.” Filter filesystem panels to real mountpoints and exclude tmpfs/overlay.
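To sanity-check a percent-based disk threshold against your actual volume, a quick back-of-envelope helper (the values are examples; plug in your own disk size and alert threshold):

```shell
# How many GB are still free when a percent-free alert fires?
size_gb=30     # volume size
pct_free=12    # alert threshold from the rules file
awk -v s="$size_gb" -v p="$pct_free" \
  'BEGIN { printf "alert fires with %.1f GB still free\n", s * p / 100 }'
```

If that number is smaller than one day of log and container growth, raise the threshold or shorten the `for:` duration.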
Rollback plan (clean and quick)
If you decide this isn’t the direction you want, you can remove it cleanly.
- Stop services:
sudo systemctl disable --now grafana-server prometheus prometheus-node-exporter prometheus-alertmanager
- Remove packages:
sudo apt remove --purge -y grafana prometheus prometheus-node-exporter prometheus-alertmanager
sudo apt autoremove -y
- Remove config/data directories (only if you’re sure):
sudo rm -rf /etc/prometheus /var/lib/prometheus /etc/alertmanager /var/lib/grafana
If you changed Nginx, remove the Grafana vhost file and reload Nginx.
Next steps: make it production-grade without getting complex
- Add log hygiene so metrics aren’t your only early warning. Start with VPS log rotation best practices in 2026.
- Pin a few service-level metrics: HTTP 5xx rate, request latency, queue depth. Skip the 50-panel sprawl; add the five graphs you’ll use in an incident.
- Backups matter even for monitoring. If dashboards are critical, back up /var/lib/grafana/grafana.db.
- Scale up thoughtfully: if you need months of retention, move long-term storage to a dedicated system. If you’re already planning a bigger footprint, consider a larger managed VPS hosting plan so upgrades, patching, and monitoring upkeep don’t eat your engineering time.
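The Grafana backup step can be sketched with a small hypothetical backup_db helper; the paths in the usage comment are the Ubuntu package defaults, and stopping Grafana first gives you a consistent copy of the SQLite file:

```shell
# Hypothetical helper: copy a file into a backup directory with a date suffix.
backup_db() {
  src="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  cp -a "$src" "$dest_dir/$(basename "$src").$(date +%F)"
}

# On the VPS (stop Grafana first for a consistent snapshot):
#   sudo systemctl stop grafana-server
#   backup_db /var/lib/grafana/grafana.db /var/backups/grafana
#   sudo systemctl start grafana-server
```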
Summary
Linux VPS monitoring with Prometheus and Grafana stays useful when you keep it small, keep it private by default, and tune it to the one box you run. Start with node_exporter, a short alert list, and one dashboard you trust. You’ll catch disk pressure, memory exhaustion, and network trouble earlier—without turning your inbox into a firehose.
If you’re setting this up on a new server, pick a VPS with enough headroom for both the app and the monitoring stack. HostMyCode offers Affordable & Reliable Hosting with plans that fit single-instance production services and the operational tooling that comes with them.
If you want monitoring you can actually depend on, run it on a VPS with predictable performance and enough memory for Grafana to breathe. Start with a HostMyCode VPS, and if you’d rather offload routine ops, look at managed VPS hosting for patching and baseline hardening.
FAQ
Should I run Prometheus and Grafana on the same VPS I’m monitoring?
For a single VPS, yes—this is common and perfectly reasonable. If the box goes down, you lose monitoring too, but you also keep the setup simple. Once you have two or more servers, move monitoring to a separate instance.
How much RAM do Prometheus and Grafana need in 2026?
On one VPS with a 15s scrape interval and a handful of targets, Prometheus often sits around ~300–600 MB RAM. Grafana varies, but budgeting ~300–700 MB is usually safe. Leave at least 1–2 GB of headroom for your app and the OS page cache.
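To see what your own stack actually uses, one rough check is to list the largest resident-set sizes on the box (assumes a procps-style ps; RSS overstates shared memory slightly, so treat it as an upper bound):

```shell
# Top 5 processes by resident memory, converted from KB to MB.
ps -eo rss,comm --sort=-rss | awk 'NR > 1 && NR <= 6 { printf "%-20s %8.0f MB\n", $2, $1 / 1024 }'
```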
Is it safe to expose Grafana on the public internet?
It can be, but don’t expose port 3000 directly. Put Grafana behind HTTPS, add authentication (basic auth or SSO), and restrict access by IP/VPN where possible. For most teams, an SSH tunnel is the simplest secure default.
Why are my disk alerts noisy?
Noise usually comes from alerting on the wrong mountpoint or including tmpfs/overlay mounts. Filter filesystem metrics to your real volumes, then tune thresholds to your disk size and growth rate.
What’s the fastest way to test that alerts deliver?
Add a temporary always-firing rule (vector(1)), wait for delivery, then delete it. It verifies the whole pipeline without stressing the server.