
Graphs are nice, but they don’t wake anyone up. Your monitoring is “done” only when the right person gets a readable alert, with enough context to act, and enough noise control that the channel stays trusted. This guide shows a practical path from Linux VPS metrics to Slack alerts using Prometheus + Alertmanager, aimed at a small API or internal tool running on a single VPS—without treating alerting like an afterthought.
Assume you’re running a Go or Node API behind Nginx on an Ubuntu 24.04 LTS VPS. You either already scrape metrics, or you’re about to. You want Slack notifications for “API is down,” “disk will fill soon,” and “CPU saturation is sustained.” You also want a simple rollback plan for the inevitable moment your first rules are too spicy.
Why Slack alerts fail in real life (and how to avoid it)
Most alerting setups fall apart in one of three predictable ways:
- High noise: every spike becomes an incident. People mute the channel. The real outage arrives quietly.
- No context: “Instance down” with no hostname, service label, or runbook link turns a 2‑minute fix into a scavenger hunt.
- No routing: dev and infra alerts land in the same place, with no severity. The wrong people get pinged, and the right people tune out.
The fixes are unglamorous and effective: reasonable thresholds, a small set of severity labels, grouping to collapse duplicates, and a Slack message that includes instance/service labels plus a runbook (and ideally a Grafana/Prometheus link).
Prerequisites
- A Linux VPS (Ubuntu 24.04 LTS used below) with sudo access.
- Prometheus already installed, or willingness to install it. (If you need a full agent-first approach, see Linux VPS monitoring agent setup with OpenTelemetry Collector.)
- Node Exporter on the VPS for host metrics (disk, CPU, memory). You can run it on the same box as Prometheus for a single-node setup.
- A Slack workspace where you can create an incoming webhook for a channel like `#ops-alerts`.
- Ports: Prometheus on `9090` (localhost only), Alertmanager on `9093` (localhost only). You should not expose these publicly.
If you want this on a clean VM with predictable performance, start with a HostMyCode VPS and keep monitoring services on the same instance until you outgrow it.
Step 1: Install Alertmanager and create a dedicated user
Ubuntu packages work, but Prometheus components are often installed from upstream releases so you can pin versions. The steps below install the Alertmanager binaries to /usr/local/bin, with config in /etc/alertmanager and data in /var/lib/alertmanager, and run it via systemd.
- Create a system user and directories:

```shell
sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```

- Download and install Alertmanager (example version; check the latest stable release before you deploy):

```shell
cd /tmp
AM_VER="0.28.1"
wget -q "https://github.com/prometheus/alertmanager/releases/download/v${AM_VER}/alertmanager-${AM_VER}.linux-amd64.tar.gz"
tar -xzf "alertmanager-${AM_VER}.linux-amd64.tar.gz"
sudo cp "alertmanager-${AM_VER}.linux-amd64/alertmanager" /usr/local/bin/
sudo cp "alertmanager-${AM_VER}.linux-amd64/amtool" /usr/local/bin/
sudo chown root:root /usr/local/bin/alertmanager /usr/local/bin/amtool
sudo chmod 0755 /usr/local/bin/alertmanager /usr/local/bin/amtool
```

Expected output: silent on success. Verify versions:

```shell
alertmanager --version
amtool --version
```
Step 2: Create an Alertmanager config that routes by severity
Next you’ll set up alertmanager.yml to do a few key jobs:
- Group alerts by `alertname` and `instance` so a single failure doesn't flood Slack
- Send `severity=page` to `#ops-alerts`
- Send `severity=warn` to `#ops-observe`
- Create Slack incoming webhooks for two channels and copy the webhook URLs. Store them as environment variables while you test:

```shell
export SLACK_WEBHOOK_OPS_ALERTS="https://hooks.slack.com/services/XXX/YYY/ZZZ"
export SLACK_WEBHOOK_OPS_OBSERVE="https://hooks.slack.com/services/AAA/BBB/CCC"
```

In production, don't leave webhooks in shell history. Put them in root-owned files instead (we'll do that in Step 4).
- Create `/etc/alertmanager/alertmanager.yml`:

```shell
sudo tee /etc/alertmanager/alertmanager.yml >/dev/null <<'YAML'
global:
  resolve_timeout: 5m

route:
  receiver: slack-observe
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity = "page"
      receiver: slack-alerts
      repeat_interval: 30m
    - matchers:
        - severity = "warn"
      receiver: slack-observe
      repeat_interval: 6h

receivers:
  - name: slack-alerts
    slack_configs:
      - api_url_file: /etc/alertmanager/slack_webhook_ops_alerts
        channel: "#ops-alerts"
        send_resolved: true
        title: "[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}"
        text: |
          *Severity:* {{ .CommonLabels.severity }}
          *Instance:* {{ .CommonLabels.instance }}
          *Job:* {{ .CommonLabels.job }}
          {{- if .CommonAnnotations.summary }}
          *Summary:* {{ .CommonAnnotations.summary }}
          {{- end }}
          {{- if .CommonAnnotations.runbook_url }}
          *Runbook:* {{ .CommonAnnotations.runbook_url }}
          {{- end }}
  - name: slack-observe
    slack_configs:
      - api_url_file: /etc/alertmanager/slack_webhook_ops_observe
        channel: "#ops-observe"
        send_resolved: true
        title: "[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}"
        text: |
          *Severity:* {{ .CommonLabels.severity }}
          *Instance:* {{ .CommonLabels.instance }}
          *Summary:* {{ .CommonAnnotations.summary }}
YAML
```

Note the `api_url_file` fields. Alertmanager does not expand environment variables inside its config file, so the webhook URLs live in root-owned files rather than in the YAML itself. Step 4 creates those files with tight permissions.
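One refinement worth knowing about once both severities exist: if the same symptom ever produces a `page` and a `warn` alert at the same time, an `inhibit_rules` block keeps the warn copy quiet while the page is firing. A minimal sketch you could append to `alertmanager.yml` (optional; not required for the setup above):

```yaml
# Mute warn-level alerts while a page-level alert with the same
# alertname and instance is already firing.
inhibit_rules:
  - source_matchers:
      - severity = "page"
    target_matchers:
      - severity = "warn"
    equal: ['alertname', 'instance']
```

The `equal` list is what scopes the inhibition: only a page on the same alertname/instance pair silences the matching warn.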
Step 3: Add Prometheus alert rules (host + API checks)
Prometheus evaluates alert rules. Alertmanager handles routing, grouping, silences, and notifications. Below is a compact ruleset that fits a single VPS running an API and Nginx.
Create a rules file at /etc/prometheus/rules/vps-api.rules.yml. Adjust paths to match your Prometheus installation.
- Create the directory if needed:

```shell
sudo mkdir -p /etc/prometheus/rules
```

- Add rules (disk, CPU, memory pressure, and a blackbox-style HTTP check if you already scrape one):

```shell
sudo tee /etc/prometheus/rules/vps-api.rules.yml >/dev/null <<'YAML'
groups:
  - name: vps-host.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Prometheus target is down"
          runbook_url: "https://internal-wiki.example/runbooks/instance-down"
      - alert: DiskWillFillSoon
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
        for: 30m
        labels:
          severity: page
        annotations:
          summary: "Disk free space < 10% for 30 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/disk-space"
      - alert: HighCpuSustained
        expr: |
          100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 85
        for: 20m
        labels:
          severity: warn
        annotations:
          summary: "CPU > 85% for 20 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/high-cpu"
      - alert: MemoryPressure
        expr: |
          (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.90
        for: 15m
        labels:
          severity: warn
        annotations:
          summary: "Memory usage > 90% for 15 minutes"
          runbook_url: "https://internal-wiki.example/runbooks/memory-pressure"
  - name: vps-app.rules
    rules:
      - alert: ApiHttpErrorRate
        expr: |
          sum by (instance) (rate(nginx_http_requests_total{status=~"5.."}[5m]))
            / sum by (instance) (rate(nginx_http_requests_total[5m])) > 0.02
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "5xx rate > 2% for 10 minutes (Nginx)"
          runbook_url: "https://internal-wiki.example/runbooks/api-5xx"
YAML
```

Notes:
- The Nginx metric `nginx_http_requests_total` assumes you expose Nginx metrics (via a stub_status-based exporter or the Nginx Prometheus exporter). If you don't have it yet, remove that rule for now.
- CPU and memory thresholds are intentionally conservative. Tune them after a week of baseline data.
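Before trusting the 10% disk rule, it can help to eyeball the same arithmetic against your live disk. Below is a rough `df`-based analogue of the rule's expression (a sanity check only; the real alert uses node_exporter series and honors the fstype filters):

```shell
#!/bin/sh
# Rough shell analogue of DiskWillFillSoon: avail/size < 0.10 on /
avail_kb=$(df -k / | awk 'NR==2 {print $4}')
size_kb=$(df -k / | awk 'NR==2 {print $2}')
# awk does the floating-point division; df only prints integers
frac=$(awk -v a="$avail_kb" -v s="$size_kb" 'BEGIN { printf "%.2f", a/s }')
echo "free fraction on /: $frac"
if awk -v f="$frac" 'BEGIN { exit !(f < 0.10) }'; then
  echo "below 10% free: DiskWillFillSoon would be firing"
else
  echo "above the 10% threshold"
fi
```

If this already prints "below 10%", fix the disk before wiring the alert, or you'll page yourself on day one.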
If you’re building out monitoring from scratch, you’ll get better signal over time with a full stack approach. The post Linux VPS monitoring with Prometheus and Grafana in 2026 pairs well with this Slack routing setup.
Step 4: Wire Prometheus to Alertmanager and keep Slack webhooks out of Git
This step has two moving parts: tell Prometheus where Alertmanager lives, and make sure Alertmanager gets the Slack webhook URLs from a protected file rather than from your repo.
- Store each webhook URL in its own root-owned file, matching the `api_url_file` paths used in `alertmanager.yml`:

```shell
# Create empty files with tight permissions first, then write the URLs
sudo install -o root -g alertmanager -m 0640 /dev/null /etc/alertmanager/slack_webhook_ops_alerts
sudo install -o root -g alertmanager -m 0640 /dev/null /etc/alertmanager/slack_webhook_ops_observe
printf '%s\n' "$SLACK_WEBHOOK_OPS_ALERTS" | sudo tee /etc/alertmanager/slack_webhook_ops_alerts >/dev/null
printf '%s\n' "$SLACK_WEBHOOK_OPS_OBSERVE" | sudo tee /etc/alertmanager/slack_webhook_ops_observe >/dev/null
```

This keeps the URLs readable by the Alertmanager process (via the `alertmanager` group) but not by regular users, and nothing secret ever lands in the YAML or your Git history.
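The failure mode worth catching here is "secret file readable by everyone". `stat` shows the octal mode in one line; this self-contained demo runs against a throwaway file, and the same check applies to the real files under `/etc/alertmanager`:

```shell
#!/bin/sh
# Demo on a throwaway file; on the real box, point stat at the
# secret files under /etc/alertmanager instead.
tmp=$(mktemp)
chmod 0640 "$tmp"
stat -c '%a' "$tmp"   # prints the octal mode: 640
rm -f "$tmp"
```

Anything looser than 640 (or 600 if the service user owns the file) means any local user can read your webhooks.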
- Create a systemd unit `/etc/systemd/system/alertmanager.service`:

```shell
sudo tee /etc/systemd/system/alertmanager.service >/dev/null <<'UNIT'
[Unit]
Description=Prometheus Alertmanager
After=network-online.target
Wants=network-online.target

[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=127.0.0.1:9093
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
UNIT
```
- Start Alertmanager:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
sudo systemctl status alertmanager --no-pager
```

Expected output includes `active (running)`.
- Update Prometheus config to load the rules file and point to Alertmanager. Edit `/etc/prometheus/prometheus.yml` (for example with `sudo nano`) and ensure you have both `rule_files` and `alerting` sections:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']

rule_files:
  - /etc/prometheus/rules/*.yml
```

Reload Prometheus (the method depends on how you installed it). With systemd:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```
Step 5: Verify end-to-end delivery (with a deliberate test alert)
Don’t wait for the first outage to find out you mis-indented YAML. Create a short-lived “always firing” alert, verify Slack delivery, then remove it.
- Create a test rule:

```shell
sudo tee /etc/prometheus/rules/test-slack.rules.yml >/dev/null <<'YAML'
groups:
  - name: test.rules
    rules:
      - alert: SlackDeliveryTest
        expr: vector(1)
        for: 1m
        labels:
          severity: warn
        annotations:
          summary: "Test alert to verify Slack delivery"
          runbook_url: "https://internal-wiki.example/runbooks/alerting-test"
YAML
```

- Reload Prometheus again:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```

- Check that Prometheus loaded the rules:

```shell
curl -s http://127.0.0.1:9090/api/v1/rules | jq -r '.data.groups[].name' | sort
```

Expected output includes `test.rules`.

- Confirm Alertmanager has the alert:

```shell
curl -s http://127.0.0.1:9093/api/v2/alerts | jq -r '.[].labels.alertname' | sort -u
```

Expected output includes `SlackDeliveryTest`.

- Verify the Slack message appears in `#ops-observe`. You should see the title `[FIRING] SlackDeliveryTest`.

- Remove the test rule and reload Prometheus:

```shell
sudo rm -f /etc/prometheus/rules/test-slack.rules.yml
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```

Within a few minutes, you should also see a `[RESOLVED]` message if `send_resolved` is enabled.
Hardening: keep Alertmanager private, and don’t leak webhooks
Prometheus and Alertmanager endpoints are operationally sensitive. If they’re reachable from the public internet, attackers can enumerate targets, learn internal hostnames, or abuse your notification pipeline.
- Bind to localhost (we did: `127.0.0.1:9093`), and reverse proxy only if you must.
- Firewall: allow inbound SSH and web traffic; deny 9090/9093 from the internet.
- Secrets hygiene: treat Slack webhooks like credentials. Keep them in root-owned files with restricted permissions.
If you’re tightening the rest of the box, the companion posts Linux VPS hardening checklist in 2026 and VPS firewall logging with nftables help you get to audit-friendly defaults without surprises.
Common pitfalls (the stuff that wastes your afternoon)
- Alerts never reach Alertmanager: Prometheus is missing the `alerting:` block, or it can't connect to `127.0.0.1:9093`. Check the loaded config with:

```shell
curl -s http://127.0.0.1:9090/api/v1/status/config | jq -r '.data.yaml' | sed -n '1,120p'
```

- Slack notifications don't send: usually a bad webhook URL or a YAML indentation mistake. Validate before restarting:

```shell
amtool check-config /etc/alertmanager/alertmanager.yml
```
- Environment variables not expanded: Alertmanager does not substitute `${VAR}` references inside its config file at all, regardless of what the process environment contains. Keep each webhook URL in a root-owned file referenced via `api_url_file`, and confirm the service user can actually read it:

```shell
sudo -u alertmanager head -c 24 /etc/alertmanager/slack_webhook_ops_alerts; echo
```
- Rule expression uses metrics you don't have: a rule referencing missing series (like the Nginx 5xx rule without an Nginx exporter) simply never fires; there's no error to find. Start with node_exporter alerts first, then add app-level rules.
- Noise from flappy targets: if a service restarts constantly, `up == 0` can fire all day. A `for:` window helps (we used 2 minutes), but you still need to fix the restarts. A systemd watchdog can help keep services stable; see systemd watchdog on a VPS.
Rollback plan (because you will overtune alerts)
Rollbacks should be quick and boring. Use either of the options below depending on what went wrong.
Rollback option A: disable the noisy rule file
- Move the offending rules out of the loaded directory:

```shell
sudo mkdir -p /etc/prometheus/rules.disabled
sudo mv /etc/prometheus/rules/vps-api.rules.yml /etc/prometheus/rules.disabled/
```

- Reload Prometheus:

```shell
sudo systemctl reload prometheus 2>/dev/null || sudo systemctl restart prometheus
```
Rollback option B: temporarily mute a route in Alertmanager
If the rules are correct but you need quiet during maintenance, create a silence instead of editing configs. Example: silence DiskWillFillSoon for one instance for 2 hours.
```shell
amtool --alertmanager.url=http://127.0.0.1:9093 \
  silence add alertname=DiskWillFillSoon instance="api-vps-01:9100" \
  --duration=2h --comment="Maintenance window"
```

List active silences:

```shell
amtool --alertmanager.url=http://127.0.0.1:9093 silence query
```
Noise control you can add without rebuilding everything
Once end-to-end delivery works, these small adjustments usually buy you the biggest improvement in signal quality:
- Group by service if you label targets with `service` (for example `service="api"`). This collapses "same problem, multiple targets" into one message. Even on a single VPS, it helps if you scrape several jobs.
- Add runbook URLs that point to a short page with a few commands and a decision tree. That beats tribal knowledge every time.
- Use severities consistently: start with `warn` and `page`. Don't create five levels you won't maintain.
- Prefer `rate()` plus `for:` over raw thresholds. A 10-second CPU spike isn't an incident.
Where HostMyCode fits in this setup
Alerting is easier when the server behaves predictably: stable disk, consistent CPU, and sane networking. If you're running Prometheus + Alertmanager on one box, a HostMyCode VPS gives you the consistent resources and root access that Prometheus tooling needs. And if you don't want to own OS patching and the usual reliability chores, managed VPS hosting handles the base system while you focus on the application.
FAQ
Should I run Alertmanager on the same VPS as Prometheus?
For a single-node setup, yes. Bind it to localhost and back it up. Once monitoring becomes mission-critical, move Alertmanager to a separate node or run two instances.
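If you do split it out later, Prometheus can be pointed at more than one Alertmanager; note that the Alertmanager instances themselves must be clustered (via `--cluster.peer`) so they deduplicate notifications between them. A sketch with placeholder hostnames:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'am1.internal:9093'   # placeholder hostnames
            - 'am2.internal:9093'
```

Prometheus sends every alert to both; the cluster decides which instance notifies Slack.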
Is a Slack webhook “secure enough” for alerts?
Yes—if you treat it like a password. Keep it out of Git, restrict file permissions, rotate it if it leaks, and don’t expose Alertmanager publicly.
How do I reduce duplicate alerts during an outage?
Use `group_by` and sensible `group_interval` values in Alertmanager, and avoid per-process alerts unless you truly need them. Grouping by `alertname` and `instance` is a good starting point.
What’s the fastest way to test that alerts still work after a change?
Add a temporary test rule like `vector(1)`, wait for it to fire, confirm Slack, then remove it and confirm the resolved message. Do this before and after any routing change.
Next steps
- Add log correlation: When an alert fires, you’ll want logs close by. A good next step is shipping logs to Loki; see VPS log shipping with Loki.
- Move from host metrics to SLO-ish alerts: Alert on user impact (latency, error rate) instead of CPU whenever you can.
- Document two runbooks: “Instance down” and “Disk will fill soon.” Keep them short, specific, and executable.
- Plan for recovery: Pair alerting with snapshots or restic backups. If you need a quick rollback strategy, start from Linux VPS snapshot backups.
Once you’re ready to run this on a clean production box, deploy it on a HostMyCode VPS and keep Alertmanager private. That first real 2 a.m. outage will feel a lot less chaotic.