
Graphs are nice, but they don’t wake anyone up. Your monitoring is “done” only when the right person gets a readable alert, with enough context to act, and enough noise control that the channel stays trusted. This guide shows a practical path from Linux VPS metrics to Slack alerts using Prometheus + Alertmanager, aimed at a small API or internal tool running on a single VPS—without treating alerting like an afterthought.
Assume you’re running a Go or Node API behind Nginx on an Ubuntu 24.04 LTS VPS. You either already scrape metrics, or you’re about to. You want Slack notifications for “API is down,” “disk will fill soon,” and “CPU saturation is sustained.” You also want a simple rollback plan for the inevitable moment your first rules are too spicy.
Why Slack alerts fail in real life (and how to avoid it)
Most alerting setups fall apart in one of three predictable ways:
- High noise: every spike becomes an incident. People mute the channel. The real outage arrives quietly.
- No context: “Instance down” with no hostname, service label, or runbook link turns a 2‑minute fix into a scavenger hunt.
- No routing: dev and infra alerts land in the same place, with no severity. The wrong people get pinged, and the right people tune out.
The fixes are unglamorous and effective: reasonable thresholds, a small set of severity labels, grouping to collapse duplicates, and a Slack message that includes instance/service labels plus a runbook (and ideally a Grafana/Prometheus link).
Prerequisites
- A Linux VPS (Ubuntu 24.04 LTS used below) with sudo access.
- Prometheus already installed, or willingness to install it. (If you need a full agent-first approach, see Linux VPS monitoring agent setup with OpenTelemetry Collector.)
- Node Exporter on the VPS for host metrics (disk, CPU, memory). You can run it on the same box as Prometheus for a single-node setup.
- A Slack workspace where you can create an incoming webhook for a channel like `#ops-alerts`.
- Ports: Prometheus on `9090` (localhost only), Alertmanager on `9093` (localhost only). You should not expose these publicly.
If you want this on a clean VM with predictable performance, start with a HostMyCode VPS and keep monitoring services on the same instance until you outgrow it.
Step 1: Install Alertmanager and create a dedicated user
Ubuntu packages work, but Prometheus components are often installed from upstream releases so you can pin versions. The steps below install the Alertmanager binaries to /usr/local/bin, with config in /etc/alertmanager and data in /var/lib/alertmanager, and run it via systemd.
- Create a system user and directories:

```shell
sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```

- Download and install Alertmanager (example version; check the latest stable release before you deploy):

```shell
cd /tmp
AM_VER="0.28.1"
wget -q "https://github.com/prometheus/alertmanager/releases/download/v${AM_VER}/alertmanager-${AM_VER}.linux-amd64.tar.gz"
tar -xzf "alertmanager-${AM_VER}.linux-amd64.tar.gz"
sudo cp "alertmanager-${AM_VER}.linux-amd64/alertmanager" /usr/local/bin/
sudo cp "alertmanager-${AM_VER}.linux-amd64/amtool" /usr/local/bin/
sudo chown root:root /usr/local/bin/alertmanager /usr/local/bin/amtool
sudo chmod 0755 /usr/local/bin/alertmanager /usr/local/bin/amtool
```

Expected output: silent on success. Verify versions:

```shell
alertmanager --version
amtool --version
```
Step 2: Create an Alertmanager config that routes by severity
Next you’ll set up alertmanager.yml to do a few key jobs:
- Group alerts by `alertname` and `instance` so a single failure doesn't flood Slack
- Send `severity=page` to `#ops-alerts`
- Send `severity=warn` to `#ops-observe`
- Create Slack incoming webhooks for two channels and copy the webhook URLs. Store them as environment variables while you test:

```shell
export SLACK_WEBHOOK_OPS_ALERTS="https://hooks.slack.com/services/XXX/YYY/ZZZ"
export SLACK_WEBHOOK_OPS_OBSERVE="https://hooks.slack.com/services/AAA/BBB/CCC"
```

In production, don't leave webhooks in shell history. Put them in root-owned files instead (we'll do that in Step 4).
- Create `/etc/alertmanager/alertmanager.yml`:

```shell
sudo tee /etc/alertmanager/alertmanager.yml >/dev/null <<'YAML'
global:
  resolve_timeout: 5m

route:
  receiver: slack-observe
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity = "page"
      receiver: slack-alerts
      repeat_interval: 30m
    - matchers:
        - severity = "warn"
      receiver: slack-observe
      repeat_interval: 6h

receivers:
  - name: slack-alerts
    slack_configs:
      - api_url_file: /etc/alertmanager/slack_webhook_ops_alerts
        channel: "#ops-alerts"
        send_resolved: true
        title: "[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}"
        text: |
          *Severity:* {{ .CommonLabels.severity }}
          *Instance:* {{ .CommonLabels.instance }}
          *Job:* {{ .CommonLabels.job }}
          {{- if .CommonAnnotations.summary }}
          *Summary:* {{ .CommonAnnotations.summary }}
          {{- end }}
          {{- if .CommonAnnotations.runbook_url }}
          *Runbook:* {{ .CommonAnnotations.runbook_url }}
          {{- end }}
  - name: slack-observe
    slack_configs:
      - api_url_file: /etc/alertmanager/slack_webhook_ops_observe
        channel: "#ops-observe"
        send_resolved: true
        title: "[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}"
        text: |
          *Severity:* {{ .CommonLabels.severity }}
          *Instance:* {{ .CommonLabels.instance }}
          *Summary:* {{ .CommonAnnotations.summary }}
YAML
```

Note the `api_url_file` fields. Alertmanager does not expand environment variables inside its config file, so the webhook URLs live in root-owned files rather than in the YAML itself. Step 4 creates those files with tight permissions.
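One refinement worth knowing about once both severities exist: if the same symptom ever produces a `page` and a `warn` alert at the same time, an `inhibit_rules` block keeps the warn copy quiet while the page is firing. A minimal sketch you could append to `alertmanager.yml` (optional; not required for the setup above):

```yaml
# Mute warn-level alerts while a page-level alert with the same
# alertname and instance is already firing.
inhibit_rules:
  - source_matchers:
      - severity = "page"
    target_matchers:
      - severity = "warn"
    equal: ['alertname', 'instance']
```

The `equal` list is what scopes the inhibition: only a page on the same alertname/instance pair silences the matching warn.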
Step 3: Add Prometheus alert rules (host + API checks)
Prometheus evaluates alert rules. Alertmanager handles routing, grouping, silences, and notifications. Below is a compact ruleset that fits a single VPS running an API and Nginx.
Create a rules file at /etc/prometheus/rules/vps-api.rules.yml. Adjust paths to match your Prometheus installation.
- Create the directory if needed:

```shell
sudo mkdir -p /etc/prometheus/rules
```

- Add rules (disk, CPU, memory pressure, and a blackbox-style HTTP check if you already scrape one):

```shell
sudo tee /etc/prometheus/rules/vps-api.rules.yml >/dev/null <<'YAML'
groups:
  - name: vps-host.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Prometheus target is down"
          runbook_url: "https://internal-wiki.example/runbooks/instance-down"
      - alert: DiskWillFillSoon
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
        for: 30m
        labels:
          severity: page
        annotations:
          summary: "Disk free space < 10% for 30 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/disk-space"
      - alert: HighCpuSustained
        expr: |
          100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 85
        for: 20m
        labels:
          severity: warn
        annotations:
          summary: "CPU > 85% for 20 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/high-cpu"
      - alert: MemoryPressure
        expr: |
          (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.90
        for: 15m
        labels:
          severity: warn
        annotations:
          summary: "Memory usage > 90% for 15 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/memory-pressure"
  - name: vps-app.rules
    rules:
      - alert: ApiHttpErrorRate
        expr: |
          sum by (instance) (rate(nginx_http_requests_total{status=~"5.."}[5m]))
            / sum by (instance) (rate(nginx_http_requests_total[5m])) > 0.02
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "5xx rate > 2% for 10 minutes (Nginx)"
          runbook_url: "https://internal-wiki.example/runbooks/api-5xx"
YAML
```

Notes:
- The Nginx metric `nginx_http_requests_total` assumes you expose Nginx metrics (via a stub_status-based exporter or the Nginx Prometheus exporter). If you don't have it yet, remove that rule for now.
- CPU and memory thresholds are intentionally conservative. Tune them after a week of baseline data.
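Before trusting the 10% disk rule, it can help to eyeball the same arithmetic against your live disk. Below is a rough `df`-based analogue of the rule's expression (a sanity check only; the real alert uses node_exporter series and honors the fstype filters):

```shell
#!/bin/sh
# Rough shell analogue of DiskWillFillSoon: avail/size < 0.10 on /
avail_kb=$(df -k / | awk 'NR==2 {print $4}')
size_kb=$(df -k / | awk 'NR==2 {print $2}')
# awk does the floating-point division; df only prints integers
frac=$(awk -v a="$avail_kb" -v s="$size_kb" 'BEGIN { printf "%.2f", a/s }')
echo "free fraction on /: $frac"
if awk -v f="$frac" 'BEGIN { exit !(f < 0.10) }'; then
  echo "below 10% free: DiskWillFillSoon would be firing"
else
  echo "above the 10% threshold"
fi
```

If this already prints "below 10%", fix the disk before wiring the alert, or you'll page yourself on day one.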
If you’re building out monitoring from scratch, you’ll get better signal over time with a full stack approach. The post Linux VPS monitoring with Prometheus and Grafana in 2026 pairs well with this Slack routing setup.
Step 4: Wire Prometheus to Alertmanager and keep Slack webhooks out of Git
This step has two moving parts: tell Prometheus where Alertmanager lives, and make sure Alertmanager gets the Slack webhook URLs from a protected file rather than from your repo.
- Store each webhook URL in its own root-owned file, matching the `api_url_file` paths used in `alertmanager.yml`:

```shell
# Create empty files with tight permissions first, then write the URLs
sudo install -o root -g alertmanager -m 0640 /dev/null /etc/alertmanager/slack_webhook_ops_alerts
sudo install -o root -g alertmanager -m 0640 /dev/null /etc/alertmanager/slack_webhook_ops_observe
printf '%s\n' "$SLACK_WEBHOOK_OPS_ALERTS" | sudo tee /etc/alertmanager/slack_webhook_ops_alerts >/dev/null
printf '%s\n' "$SLACK_WEBHOOK_OPS_OBSERVE" | sudo tee /etc/alertmanager/slack_webhook_ops_observe >/dev/null
```

This keeps the URLs readable by the Alertmanager process (via the `alertmanager` group) but not by regular users, and nothing secret ever lands in the YAML or your Git history.
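The failure mode worth catching here is "secret file readable by everyone". `stat` shows the octal mode in one line; this self-contained demo runs against a throwaway file, and the same check applies to the real files under `/etc/alertmanager`:

```shell
#!/bin/sh
# Demo on a throwaway file; on the real box, point stat at the
# secret files under /etc/alertmanager instead.
tmp=$(mktemp)
chmod 0640 "$tmp"
stat -c '%a' "$tmp"   # prints the octal mode: 640
rm -f "$tmp"
```

Anything looser than 640 (or 600 if the service user owns the file) means any local user can read your webhooks.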
- Create a systemd unit `/etc/systemd/system/alertmanager.service`:

```shell
sudo tee /etc/systemd/system/alertmanager.service >/dev/null <<'UNIT'
[Unit]
Description=Prometheus Alertmanager
After=network-online.target
Wants=network-online.target

[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=127.0.0.1:9093
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
UNIT
```
- Start Alertmanager:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
sudo systemctl status alertmanager --no-pager
```

Expected output includes `active (running)`.
- Update Prometheus config to load the rules file and point to Alertmanager. Edit `/etc/prometheus/prometheus.yml` (for example with `sudo nano`) and ensure you have both `rule_files` and `alerting` sections:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']

rule_files:
  - /etc/prometheus/rules/*.yml
```

Reload Prometheus (the method depends on how you installed it). With systemd:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```
Step 5: Verify end-to-end delivery (with a deliberate test alert)
Don’t wait for the first outage to find out you mis-indented YAML. Create a short-lived “always firing” alert, verify Slack delivery, then remove it.
- Create a test rule:

```shell
sudo tee /etc/prometheus/rules/test-slack.rules.yml >/dev/null <<'YAML'
groups:
  - name: test.rules
    rules:
      - alert: SlackDeliveryTest
        expr: vector(1)
        for: 1m
        labels:
          severity: warn
        annotations:
          summary: "Test alert to verify Slack delivery"
          runbook_url: "https://internal-wiki.example/runbooks/alerting-test"
YAML
```

- Reload Prometheus again:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```

- Check that Prometheus loaded the rules:

```shell
curl -s http://127.0.0.1:9090/api/v1/rules | jq -r '.data.groups[].name' | sort
```

Expected output includes `test.rules`.

- Confirm Alertmanager has the alert:

```shell
curl -s http://127.0.0.1:9093/api/v2/alerts | jq -r '.[].labels.alertname' | sort -u
```

Expected output includes `SlackDeliveryTest`.

- Verify the Slack message appears in `#ops-observe`. You should see the title `[FIRING] SlackDeliveryTest`.

- Remove the test rule and reload Prometheus:

```shell
sudo rm -f /etc/prometheus/rules/test-slack.rules.yml
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```

Within a few minutes, you should also see a `[RESOLVED]` message if `send_resolved` is enabled.
Hardening: keep Alertmanager private, and don’t leak webhooks
Prometheus and Alertmanager endpoints are operationally sensitive. If they’re reachable from the public internet, attackers can enumerate targets, learn internal hostnames, or abuse your notification pipeline.
- Bind to localhost (we did: `127.0.0.1:9093`), and reverse proxy only if you must.
- Firewall: allow inbound SSH and web traffic; deny 9090/9093 from the internet.
- Secrets hygiene: treat Slack webhooks like credentials. Keep them in root-owned files with restricted permissions.
If you’re tightening the rest of the box, the companion posts Linux VPS hardening checklist in 2026 and VPS firewall logging with nftables help you get to audit-friendly defaults without surprises.
Common pitfalls (the stuff that wastes your afternoon)
- Alerts never reach Alertmanager: Prometheus is missing the `alerting:` block, or it can't connect to `127.0.0.1:9093`. Check the loaded config with:

```shell
curl -s http://127.0.0.1:9090/api/v1/status/config | jq -r '.data.yaml' | sed -n '1,120p'
```

- Slack notifications don't send: usually a bad webhook URL or a YAML indentation mistake. Validate before restarting:

```shell
amtool check-config /etc/alertmanager/alertmanager.yml
```
- Environment variables not expanded: Alertmanager does not substitute `${VAR}` references inside its config file at all, regardless of what the process environment contains. Keep each webhook URL in a root-owned file referenced via `api_url_file`, and confirm the service user can actually read it:

```shell
sudo -u alertmanager head -c 24 /etc/alertmanager/slack_webhook_ops_alerts; echo
```
- Rule expression uses metrics you don't have: a rule referencing missing series (like the Nginx 5xx rule without an Nginx exporter) simply never fires; there's no error to find. Start with node_exporter alerts first, then add app-level rules.
- Noise from flappy targets: if a service restarts constantly, `up == 0` can fire all day. A `for:` window helps (we used 2 minutes), but you still need to fix the restarts. A systemd watchdog can help keep services stable; see systemd watchdog on a VPS.
Rollback plan (because you will overtune alerts)
Rollbacks should be quick and boring. Use either of the options below depending on what went wrong.
Rollback option A: disable the noisy rule file
- Move the offending rules out of the loaded directory:

```shell
sudo mkdir -p /etc/prometheus/rules.disabled
sudo mv /etc/prometheus/rules/vps-api.rules.yml /etc/prometheus/rules.disabled/
```

- Reload Prometheus:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```
Rollback option B: temporarily mute a route in Alertmanager
If the rules are correct but you need quiet during maintenance, create a silence instead of editing configs. Example: silence DiskWillFillSoon for one instance for 2 hours.
```shell
amtool --alertmanager.url=http://127.0.0.1:9093 \
  silence add alertname=DiskWillFillSoon instance="api-vps-01:9100" \
  --duration=2h --comment="Maintenance window"
```

List active silences:

```shell
amtool --alertmanager.url=http://127.0.0.1:9093 silence query
```
Noise control you can add without rebuilding everything
Once end-to-end delivery works, these small adjustments usually buy you the biggest improvement in signal quality:
- Group by service if you label targets with `service` (for example `service="api"`). This collapses "same problem, multiple targets" into one message. Even on a single VPS, it helps if you scrape several jobs.
- Add runbook URLs that point to a short page with a few commands and a decision tree. That beats tribal knowledge every time.
- Use severities consistently: start with `warn` and `page`. Don't create five levels you won't maintain.
- Prefer `rate()` plus `for:` over raw thresholds. A 10-second CPU spike isn't an incident.
Where HostMyCode fits in this setup
Alerting is easier when the server behaves predictably: stable disk, consistent CPU, and sane networking. If you're running Prometheus + Alertmanager on one box, a HostMyCode VPS gives you the consistent resources and root access that Prometheus tooling needs. And if you don't want to own OS patching and the usual reliability chores, managed VPS hosting handles the base system while you focus on the application.
FAQ
Should I run Alertmanager on the same VPS as Prometheus?
For a single-node setup, yes. Bind it to localhost and back it up. Once monitoring becomes mission-critical, move Alertmanager to a separate node or run two instances.
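If you do split it out later, Prometheus can be pointed at more than one Alertmanager; note that the Alertmanager instances themselves must be clustered (via `--cluster.peer`) so they deduplicate notifications between them. A sketch with placeholder hostnames:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'am1.internal:9093'   # placeholder hostnames
            - 'am2.internal:9093'
```

Prometheus sends every alert to both; the cluster decides which instance notifies Slack.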
Is a Slack webhook “secure enough” for alerts?
Yes—if you treat it like a password. Keep it out of Git, restrict file permissions, rotate it if it leaks, and don’t expose Alertmanager publicly.
How do I reduce duplicate alerts during an outage?
Use `group_by` and sensible `group_interval` values in Alertmanager, and avoid per-process alerts unless you truly need them. Grouping by `alertname` and `instance` is a good starting point.
What’s the fastest way to test that alerts still work after a change?
Add a temporary test rule like `vector(1)`, wait for it to fire, confirm Slack, then remove it and confirm the resolved message. Do this before and after any routing change.
Next steps
- Add log correlation: When an alert fires, you’ll want logs close by. A good next step is shipping logs to Loki; see VPS log shipping with Loki.
- Move from host metrics to SLO-ish alerts: Alert on user impact (latency, error rate) instead of CPU whenever you can.
- Document two runbooks: “Instance down” and “Disk will fill soon.” Keep them short, specific, and executable.
- Plan for recovery: Pair alerting with snapshots or restic backups. If you need a quick rollback strategy, start from Linux VPS snapshot backups.
Once you’re ready to run this on a clean production box, deploy it on a HostMyCode VPS and keep Alertmanager private. That first real 2 a.m. outage will feel a lot less chaotic.