VPS Monitoring and Alerting Setup in 2026: Complete Server Health Tracking with Prometheus, Grafana, and Alertmanager

The Foundation of Reliable VPS Monitoring

Your VPS can fail in dozens of ways. Memory exhaustion crashes applications. Disk space fills up silently until services stop working. CPU spikes from runaway processes bring sites to a crawl. Network latency creeps up without warning. These problems compound quickly when left undetected.

VPS monitoring and alerting catches these problems before they become outages. You get notifications when metrics cross thresholds you define, not angry customer emails about downtime.

The stack we'll build uses three core tools. Prometheus collects metrics. Grafana creates visual dashboards. Alertmanager sends notifications.

This combination runs reliably on Ubuntu, Debian, and CentOS-family servers (CentOS Stream, Rocky Linux, AlmaLinux). HostMyCode VPS servers provide enough resources to run this monitoring setup alongside your applications.

Core Metrics That Matter for Server Health

Focus on metrics that predict problems. System load average shows if your server is overloaded. Memory usage reveals when you're approaching swap territory. Disk space monitoring prevents "No space left on device" errors.

CPU metrics include usage percentage, load average (1, 5, 15 minutes), and per-core utilization. Memory tracking covers total usage, available memory, swap usage, and buffer/cache allocation.

Disk monitoring watches free space, inode usage, and I/O wait times. Network metrics monitor bandwidth usage, packet loss, connection counts, and TCP retransmissions.

Application-specific metrics depend on your stack. Track Apache connection counts, MySQL slow queries, Redis memory usage, or PHP-FPM pool status.

Start simple. Monitor CPU, memory, disk space, and load average first. Add more metrics as you identify specific bottlenecks or failure patterns.
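Before installing any exporter, you can take a quick look at two of these starter metrics, load average and disk space, with the Python standard library alone. This is a minimal sketch assuming a Linux host (os.getloadavg() is Unix-only); treating a 5-minute load above the core count as saturation is a common rule of thumb, labeled as an assumption here, not a hard limit.

```python
import os
import shutil


def basic_health_snapshot(path: str = "/") -> dict:
    """Collect load averages and disk usage for one mount point (stdlib only)."""
    load1, load5, load15 = os.getloadavg()  # Unix-only
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "cpu_count": os.cpu_count() or 1,
        "disk_used_pct": usage.used / usage.total * 100,
        "disk_free_bytes": usage.free,
    }


if __name__ == "__main__":
    snap = basic_health_snapshot("/")
    # Rule of thumb (assumption): sustained load above the core count means saturation.
    saturated = snap["load5"] > snap["cpu_count"]
    print(f"load5={snap['load5']:.2f} cores={snap['cpu_count']} saturated={saturated}")
    print(f"disk used on /: {snap['disk_used_pct']:.1f}%")
```

Once node_exporter is running, Prometheus collects the same numbers continuously, so a script like this is only useful for a first baseline.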

Prometheus Installation and Configuration

Prometheus pulls metrics from targets at regular intervals. Create a dedicated user account:

sudo useradd --system --no-create-home --shell /bin/false prometheus

Download Prometheus 2.45 or later (3.x is the current major release) and copy the prometheus and promtool binaries to /usr/local/bin. Then create the configuration and data directories:

sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

The main configuration file, /etc/prometheus/prometheus.yml, defines scrape targets and intervals. Set the global scrape_interval to 15s for most VPS setups.

Define a scrape job for node_exporter (system metrics), and add jobs for nginx-prometheus-exporter and mysqld_exporter if you run those services.
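To show the shape of a minimal prometheus.yml, here is a small Python sketch that renders one. The job names are hypothetical, and the ports are the exporters' usual defaults (9100 for node_exporter, 9104 for mysqld_exporter, 9113 for nginx-prometheus-exporter); adjust both to your setup. Target lists are written as JSON arrays, which YAML accepts as flow sequences.

```python
import json


def render_prometheus_config(jobs: dict, scrape_interval: str = "15s") -> str:
    """Render a minimal prometheus.yml.

    jobs maps a job name to a list of "host:port" targets.
    """
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        f"  evaluation_interval: {scrape_interval}",
        "",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        lines += [
            f"  - job_name: {json.dumps(job_name)}",
            "    static_configs:",
            # JSON arrays are valid YAML flow sequences.
            f"      - targets: {json.dumps(targets)}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_prometheus_config({
        "node": ["localhost:9100"],
        "mysql": ["localhost:9104"],  # mysqld_exporter default port
        "nginx": ["localhost:9113"],  # nginx-prometheus-exporter default port
    }))
```

Save the output as /etc/prometheus/prometheus.yml and validate it with promtool check config before restarting Prometheus.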

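Prometheus stores data locally in /var/lib/prometheus (set with --storage.tsdb.path). Control retention with --storage.tsdb.retention.time (the default is 15d), or cap it by size with --storage.tsdb.retention.size, based on your disk space and historical analysis needs. Disk usage depends mainly on the number of active series and the scrape interval; budgeting 1-2GB for two weeks of data from a single server with standard metrics leaves comfortable headroom.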
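The Prometheus storage documentation sizes disk as retention time in seconds, times ingested samples per second, times bytes per sample (roughly 1-2 bytes after compression). A sketch of that arithmetic follows, assuming you read the active series count from the prometheus_tsdb_head_series metric on your own server. The WAL and index add overhead on top, so treat the result as a lower bound.

```python
def estimate_tsdb_bytes(active_series: int, scrape_interval_s: float,
                        retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb disk estimate for Prometheus block storage.

    retention_seconds * samples_per_second * bytes_per_sample, where
    samples_per_second = active_series / scrape_interval_s.
    Excludes WAL and index overhead.
    """
    retention_seconds = retention_days * 86400
    samples_per_second = active_series / scrape_interval_s
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # 1500 is a placeholder; use your own prometheus_tsdb_head_series value.
    est = estimate_tsdb_bytes(active_series=1500, scrape_interval_s=15, retention_days=14)
    print(f"~{est / 1e9:.2f} GB before WAL/index overhead")
```

Halving the scrape interval or doubling the series count doubles the estimate, which is why extra exporters and faster scraping grow storage roughly proportionally.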

Node Exporter Setup for System Metrics

Node Exporter collects system-level metrics from your VPS. It runs as a daemon and exposes metrics on port 9100.

Download the latest release and install it in /usr/local/bin.

Create a systemd service file for node_exporter. Configure it to start automatically and restart on failure.

The default collectors gather CPU, memory, disk, network, and filesystem metrics without additional configuration.

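Secure the metrics endpoint. If Prometheus runs on the same server, bind node_exporter to localhost with --web.listen-address=127.0.0.1:9100; otherwise enable basic authentication or TLS through a file passed with --web.config.file. For multiple servers, configure firewall rules so port 9100 accepts connections only from your monitoring server.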
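The systemd unit described above can be sketched as follows, rendered from Python so the listen address is a parameter. This assumes a node_exporter system user created the same way as the prometheus user and the binary at /usr/local/bin/node_exporter; both are assumptions about your layout.

```python
def render_node_exporter_unit(listen_address: str = "127.0.0.1:9100",
                              user: str = "node_exporter",
                              binary: str = "/usr/local/bin/node_exporter") -> str:
    """Return a systemd unit that starts at boot and restarts on failure."""
    return "\n".join([
        "[Unit]",
        "Description=Prometheus Node Exporter",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        f"User={user}",
        # Bind to localhost unless a remote Prometheus must reach this port.
        f"ExecStart={binary} --web.listen-address={listen_address}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


if __name__ == "__main__":
    print(render_node_exporter_unit())
```

Save the output as /etc/systemd/system/node_exporter.service, then run systemctl daemon-reload and systemctl enable --now node_exporter. For a remote Prometheus, pass listen_address=":9100" and rely on the firewall rules.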

Test by accessing http://your-server:9100/metrics. You should see hundreds of metrics in Prometheus format. Look for node_cpu_seconds_total, node_memory_MemAvailable_bytes, and node_filesystem_free_bytes as key indicators.
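For a scripted version of this check, the sketch below fetches the endpoint and parses the Prometheus text format into a dictionary. It is a deliberately minimal parser, assuming well-formed output: it skips # HELP and # TYPE lines, keeps label sets as raw strings, and ignores optional timestamps.

```python
import urllib.request


def parse_exposition(text: str) -> dict:
    """Map metric names to lists of (raw_labels, value) tuples."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, value_part = rest.rsplit("}", 1)
        else:
            name, value_part = line.split(None, 1)
            labels = ""
        value = float(value_part.split()[0])  # drop an optional timestamp
        samples.setdefault(name, []).append((labels, value))
    return samples


def fetch_metrics(url: str = "http://localhost:9100/metrics", timeout: float = 5.0) -> dict:
    """Download and parse a /metrics page."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

On the server, fetch_metrics()["node_memory_MemAvailable_bytes"] should return a single sample. A KeyError means the collector or endpoint is not what you expected.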

Grafana Dashboard Creation

Grafana transforms raw Prometheus metrics into readable dashboards. Install it from the official repository or download packages directly. The web interface runs on port 3000.

Add Prometheus as a data source pointing to http://localhost:9090. Create your first dashboard with panels for CPU usage, memory utilization, disk space, and network throughput.

CPU panels use queries like 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) to show percentage utilization.

Memory panels display (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 for usage percentage.

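Disk space monitoring uses 100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes * 100) for each mounted filesystem. The avail metric excludes blocks reserved for root, so it reflects what applications can actually write; add a label filter such as fstype!~"tmpfs|overlay" to both metrics to hide pseudo-filesystems.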

Network panels show rate(node_network_receive_bytes_total[5m]) and rate(node_network_transmit_bytes_total[5m]) for throughput.
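To see what the CPU query computes, the sketch below repeats its arithmetic on two hypothetical snapshots of the per-CPU idle counters. Idle seconds per second of wall time is each CPU's idle fraction; averaging across CPUs and subtracting from 100 gives busy percentage. It deliberately skips the counter-reset handling and extrapolation that PromQL's rate() performs.

```python
def cpu_busy_percent(idle_before: dict, idle_after: dict, elapsed_s: float) -> float:
    """Mirror 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[w])) * 100.

    idle_before / idle_after map a CPU id to its cumulative idle seconds
    at the start and end of a window lasting elapsed_s seconds.
    """
    idle_rates = [
        (idle_after[cpu] - idle_before[cpu]) / elapsed_s  # idle seconds per second
        for cpu in idle_before
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100
```

With CPU 0 idle for 240 of 300 seconds and CPU 1 idle for 150, the function reports roughly 35% busy. The network panels work the same way: rate() turns a cumulative byte counter into bytes per second.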

Import community dashboards from grafana.com for common use cases. The "Node Exporter Full" dashboard provides comprehensive system monitoring with minimal setup. Customize thresholds and time ranges based on your server specs and usage patterns.

Alertmanager Configuration for Critical Events

Alertmanager handles notifications when Prometheus alert rules trigger. Install it alongside Prometheus and create configuration for email, Slack, or webhook notifications.

Define alert rules in Prometheus rule files (listed under rule_files in prometheus.yml) for critical conditions. Reasonable starting points: high CPU usage alerts trigger when usage exceeds 80% for 5 minutes, memory alerts fire when available memory drops below 10%, and disk space alerts warn at 85% usage and become critical at 95%.
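The "for 5 minutes" part matters: a Prometheus alerting rule with for: 5m stays pending until its condition has held on every evaluation for that long. The sketch below simulates that behavior and the 85%/95% disk severity split in plain Python. It assumes one sample per rule evaluation at a fixed interval and is an illustration, not Prometheus's evaluation code.

```python
def alert_state(samples, threshold, for_seconds, interval_s):
    """Simulate an alerting rule with a `for` duration.

    samples holds one metric value per evaluation, oldest first. The alert
    turns pending on the first evaluation above threshold and fires once
    it has stayed above for at least for_seconds.
    """
    active_since = None
    state = "inactive"
    for i, value in enumerate(samples):
        now = i * interval_s
        if value > threshold:
            if active_since is None:
                active_since = now
            state = "firing" if now - active_since >= for_seconds else "pending"
        else:
            active_since = None  # any dip below the threshold resets the timer
            state = "inactive"
    return state


def disk_severity(used_pct, warn=85.0, crit=95.0):
    """Map disk usage to a severity label (85% warning, 95% critical)."""
    if used_pct >= crit:
        return "critical"
    if used_pct >= warn:
        return "warning"
    return None
```

With a 15-second evaluation interval, CPU must stay above 80% on 21 consecutive evaluations (0 through 300 seconds) before the alert fires. A single dip resets it, which is what keeps short spikes from paging you.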

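Create escalation rules in Alertmanager's routing tree. Route warnings to email, and send critical alerts as SMS or phone calls through a paging integration such as PagerDuty or Opsgenie, or through a webhook to your SMS provider.

Tune the notification timing settings (group_wait, group_interval, and repeat_interval) to avoid alert fatigue while ensuring important issues get immediate attention.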

Group related alerts together to prevent notification storms. With group_by: ['instance'] on a route, CPU, memory, and disk alerts firing simultaneously on your database server arrive as a single "server-critical" notification instead of three separate ones.
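Routing and grouping can be pictured with a small simulation. This sketch shows the idea behind Alertmanager's route tree and group_by option, not its implementation. The receiver names "pager" and "email" are hypothetical, and real routing is declared in alertmanager.yml rather than in code.

```python
from collections import defaultdict


def route_and_group(alerts, group_by=("instance",)):
    """Route by severity, then batch alerts sharing group_by label values.

    Critical alerts go to a hypothetical "pager" receiver and everything
    else to "email"; each (receiver, group key) becomes one notification.
    """
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        receiver = "pager" if labels.get("severity") == "critical" else "email"
        key = (receiver,) + tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels["alertname"])
    return dict(groups)
```

Three critical alerts on db1 collapse into one pager notification, while a warning on web1 becomes a separate email. Alerts with different severities on the same host land in different groups because they go to different receivers.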

Web Server and Database Monitoring

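Nginx monitoring requires nginx-prometheus-exporter or nginx_exporter. Enable the stub_status module in nginx and expose it on a location restricted to localhost.

The exporter collects active connections, connection states (reading, writing, waiting), and the total request counter from which Prometheus calculates requests per second. stub_status does not report response codes; per-status metrics require NGINX Plus or metrics derived from access logs.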
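To see exactly what the exporter has to work with, here is a sketch that parses the stub_status page. It assumes the standard four-line layout nginx produces; the sample numbers are illustrative.

```python
import re

SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text page served by nginx's stub_status directive."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    accepts, handled, requests = (
        int(n) for n in re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.M).groups()
    )
    reading, writing, waiting = (
        int(n) for n in re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text).groups()
    )
    return {
        "active": active, "accepts": accepts, "handled": handled,
        "requests": requests, "reading": reading, "writing": writing,
        "waiting": waiting,
        # handled falls behind accepts only when resource limits drop connections.
        "dropped": accepts - handled,
    }


def requests_per_second(before: dict, after: dict, elapsed_s: float) -> float:
    """Requests per second between two parsed snapshots."""
    return (after["requests"] - before["requests"]) / elapsed_s
```

Because requests is a cumulative counter, per-second throughput only exists as a difference between two readings. This is the same reason Prometheus applies rate() to it.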

Apache monitoring uses mod_status with apache_exporter. Enable the server-status handler and point the exporter at its machine-readable ?auto output. Monitor worker utilization (busy versus idle workers), scoreboard states, and bytes served per second.

MySQL monitoring needs mysqld_exporter with appropriate database permissions. Grant SELECT, PROCESS, and REPLICATION CLIENT privileges to a monitoring user. Track slow queries, connection count, buffer pool hit ratio, and replication lag if applicable.
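The buffer pool hit ratio comes from two InnoDB status counters: Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (reads that missed the pool and went to disk). Both are cumulative since server start, so this sketch compares two SHOW GLOBAL STATUS snapshots instead of the raw totals. The same calculation can be written in PromQL with rate() over the corresponding counters that mysqld_exporter exposes.

```python
def innodb_buffer_pool_hit_ratio(before: dict, after: dict):
    """Hit ratio over an interval from two SHOW GLOBAL STATUS snapshots.

    Values may be strings (as returned by most MySQL drivers) or ints.
    Returns None when no reads happened in the interval.
    """
    requests = (int(after["Innodb_buffer_pool_read_requests"])
                - int(before["Innodb_buffer_pool_read_requests"]))
    misses = (int(after["Innodb_buffer_pool_reads"])
              - int(before["Innodb_buffer_pool_reads"]))
    if requests <= 0:
        return None  # idle interval, or the server restarted and counters reset
    return 1 - misses / requests
```

10,000 logical reads with 50 disk reads in the interval gives a ratio of 0.995. A steady decline usually means the working set has outgrown innodb_buffer_pool_size.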

PostgreSQL monitoring uses postgres_exporter with a similar setup. Monitor connection counts, active queries, buffer hit ratios, and vacuum operations. Set alerts for approaching connection limits and for long-running queries.

Application Performance Monitoring Integration

Application-level metrics complement system monitoring. PHP applications can export metrics using a community client library such as promphp/prometheus_client_php. Track request duration, error rates, and business-specific counters.

Node.js applications integrate with prom-client module. Monitor event loop lag, garbage collection metrics, and HTTP response times.

Python applications use prometheus_client library for similar functionality.

Configure application metrics carefully. High-cardinality metrics (with many label combinations) can overwhelm Prometheus. Use labels judiciously: avoid unbounded values such as user IDs, email addresses, or raw URLs, and aggregate data before exporting it to maintain performance.
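Why cardinality explodes: every distinct combination of label values is its own time series, so the worst case for one metric is the product of each label's distinct-value count. A short sketch of that upper bound follows. The label names and counts are illustrative.

```python
from math import prod


def worst_case_series(label_cardinality: dict, histogram_buckets: int = 0) -> int:
    """Upper bound on time series for one metric name.

    label_cardinality maps each label to its number of distinct values.
    A histogram multiplies the result by its bucket series (including +Inf)
    plus the _sum and _count series.
    """
    combos = prod(label_cardinality.values())
    if histogram_buckets:
        return combos * (histogram_buckets + 2)
    return combos
```

With method (5 values), status (10), and path (20), a counter tops out at 1,000 series. Adding a user_id label with 10,000 users pushes the same metric to 10 million, which is the failure mode to avoid.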

Set up synthetic monitoring for external endpoints. Use blackbox_exporter, ideally running outside the monitored server, to check website availability, SSL certificate expiration, and response times from an external perspective.
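blackbox_exporter reports certificate expiry as the probe_ssl_earliest_cert_expiry timestamp, and a common alert subtracts time() from it. The standalone sketch below makes the same check with Python's ssl module. It assumes the default context (which verifies the chain) and a server reachable on port 443; the host name is yours to supply.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now=None) -> float:
    """Days left given a certificate notAfter string like 'Jan  5 09:34:43 2026 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the leaf certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage: days_until_expiry(fetch_not_after("example.com")) returns the remaining days. Alerting below 14 days leaves time to fix a broken renewal job.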

Alert Threshold Tuning and Management

Initial alert thresholds require adjustment based on your server's normal behavior. CPU usage patterns vary significantly between web servers, database servers, and application servers.

Analyze metric trends over several weeks to establish baseline performance. Set warning thresholds at levels that indicate potential problems without triggering false alarms. Critical thresholds should represent conditions that require immediate intervention.
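One way to turn several weeks of history into starting thresholds is to take high percentiles of normal behavior and add a safety margin. This is a heuristic chosen for illustration, not a standard method. The 95th/99th percentiles and the 10% margin are assumptions to tune, and the cap keeps percentage metrics at or below 100.

```python
import statistics


def suggest_thresholds(samples, warn_q=95, crit_q=99, margin=1.10, cap=100.0):
    """Return (warning, critical) thresholds from historical samples.

    Heuristic (assumption): percentile of normal data times a safety
    margin, capped for percentage metrics.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # 99 cut points
    warn = min(cuts[warn_q - 1] * margin, cap)
    crit = min(cuts[crit_q - 1] * margin, cap)
    return warn, crit
```

Feed it CPU or memory samples exported from Prometheus for a representative period. If the suggested warning sits above your critical policy limit, the server is already running too hot for that limit.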

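Memory alerts need careful tuning on Linux systems. The kernel fills otherwise idle memory with buffers and page cache, so "used" memory calculated as total minus free looks alarmingly high even on a healthy server. Alerts should use MemAvailable, the kernel's estimate of memory available to new workloads without swapping, rather than calculating used memory manually. Set warnings when available memory drops below 20% and critical alerts at 5%.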
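To see the difference between free and available memory, here is a sketch that parses /proc/meminfo and classifies available memory against the 20% warning and 5% critical levels. Values in /proc/meminfo are in kB, and some fields (such as HugePages_Total) have no unit, which the parser tolerates.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo into {field: int}; memory sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def memory_severity(fields: dict, warn_pct: float = 20.0, crit_pct: float = 5.0):
    """Classify by MemAvailable rather than MemFree, which ignores reclaimable cache."""
    available_pct = fields["MemAvailable"] / fields["MemTotal"] * 100
    if available_pct < crit_pct:
        return "critical", available_pct
    if available_pct < warn_pct:
        return "warning", available_pct
    return "ok", available_pct
```

On a live server, call memory_severity(parse_meminfo(open("/proc/meminfo").read())). A box with 2GB total, 100MB free, and 900MB available shows only 5% free but 45% available, and the available figure is the one that predicts swapping and OOM kills.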

Disk space alerts should account for log rotation and temporary file creation. Set warnings at 80% usage with critical alerts at 90% to provide sufficient response time for cleanup.
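Static percentages tell you how full a disk is, not how soon it will be full. PromQL's predict_linear() addresses this with a linear regression, for example predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0. The sketch below does the same kind of least-squares fit on (timestamp, available bytes) samples to estimate the hours remaining.

```python
def hours_until_full(samples):
    """Least-squares projection of when available bytes reach zero.

    samples: list of (unix_seconds, avail_bytes), oldest first.
    Returns None when there is too little data or free space is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (b - mean_b) for t, b in samples) / var_t  # bytes/s
    if slope >= 0:
        return None  # free space is stable or growing
    intercept = mean_b - slope * mean_t
    fitted_now = intercept + slope * samples[-1][0]
    return fitted_now / -slope / 3600
```

A filesystem losing 1GB per hour with 98GB left projects to about 98 hours. Pairing a forecast like this with the 80%/90% thresholds catches fast-filling disks that would otherwise jump from warning to full between checks.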

Log Integration and Centralized Analysis

Combine metrics monitoring with log analysis for complete observability. Netdata provides excellent real-time monitoring that complements Prometheus for immediate issue diagnosis.

Configure log shipping to centralized storage using rsyslog or syslog-ng. Parse application logs for error patterns and create metrics from log events. This approach catches issues that don't appear in system metrics.
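A lightweight way to turn log events into metrics is to count severity keywords and publish the counts through node_exporter's textfile collector, which reads *.prom files from the directory given by --collector.textfile.directory. This is a sketch only: the metric name app_log_lines_total and the ERROR/WARN/CRITICAL log format are hypothetical, and a full rescan after log rotation resets the counter, which rate() tolerates.

```python
import os
import re
import tempfile
from collections import Counter

LEVEL_RE = re.compile(r"\b(ERROR|WARN|CRITICAL)\b")  # assumed log format


def count_log_levels(lines) -> Counter:
    """Count lines by the first severity keyword they contain."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_textfile_format(counts: Counter, metric: str = "app_log_lines_total") -> str:
    """Render counts in the Prometheus text exposition format."""
    out = [f"# TYPE {metric} counter"]
    for level, n in sorted(counts.items()):
        out.append(f'{metric}{{level="{level}"}} {n}')
    return "\n".join(out) + "\n"


def write_atomically(path: str, content: str) -> None:
    """Write via temp file + rename so node_exporter never reads a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")  # .tmp is ignored by the collector
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; node_exporter must be able to read it
    os.replace(tmp, path)
```

Run it from cron every minute against the application log, then alert on rate(app_log_lines_total{level="ERROR"}[5m]) rising above your normal baseline.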

Set up correlation between metric spikes and log events. When CPU usage spikes, automatically display related error logs to speed troubleshooting. This integrated approach can significantly reduce mean time to resolution.

For comprehensive strategies, review our VPS resource monitoring guide covering what to track and when to scale resources.

Monitoring Data Storage and Retention

Prometheus stores data locally by default. Configure retention policies based on available disk space and analysis requirements.

Most VPS environments work well with 30-day retention for detailed metrics and longer retention for aggregated data.

Large deployments benefit from remote storage solutions. Thanos provides long-term storage with deduplication and downsampling.

VictoriaMetrics offers a Prometheus-compatible alternative designed for efficient compression and fast queries.

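Implement backup strategies for monitoring data. Losing historical metrics makes capacity planning and trend analysis much harder.

Take regular snapshots rather than copying the live data directory. Prometheus's TSDB snapshot API (enabled with --web.enable-admin-api) writes a consistent copy under the data directory's snapshots/ folder, which you can move off the server and restore if needed.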
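Snapshots can be triggered over HTTP with a POST to /api/v1/admin/tsdb/snapshot, which only works when Prometheus runs with --web.enable-admin-api. The sketch below assumes Prometheus listens on localhost:9090. The returned name is the directory created under the data directory's snapshots/ folder.

```python
import json
import urllib.request


def build_snapshot_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """POST request for the TSDB snapshot endpoint (requires --web.enable-admin-api)."""
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"
    return urllib.request.Request(url, data=b"", method="POST")


def take_snapshot(base_url: str = "http://localhost:9090", timeout: float = 60.0) -> str:
    """Trigger a snapshot and return its directory name under <data dir>/snapshots/."""
    with urllib.request.urlopen(build_snapshot_request(base_url), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {body}")
    return body["data"]["name"]
```

A nightly job can call take_snapshot(), archive /var/lib/prometheus/snapshots/<name> to off-server storage, and then delete the local copy to reclaim space.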

Monitor the monitoring system itself. Set alerts for Prometheus downtime, Grafana unavailability, and Alertmanager notification failures. A monitoring system that fails silently provides false confidence.
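Prometheus records an up series for every scrape target (1 when the last scrape succeeded, 0 when it failed), so a rule on up == 0 catches dead exporters. Prometheus cannot report its own outage, though. A sketch of an external check to run from a different host (from cron, for example) follows; it uses the HTTP API's /api/v1/query endpoint and assumes Prometheus at localhost:9090 unless you pass another URL.

```python
import json
import urllib.parse
import urllib.request


def query_prometheus(expr: str, base_url: str = "http://localhost:9090",
                     timeout: float = 10.0) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list:
    """Instances whose `up` sample is 0 in an instant-query response."""
    return [
        r["metric"].get("instance", "?")
        for r in response["data"]["result"]
        if float(r["value"][1]) == 0
    ]
```

Treat a connection error from query_prometheus("up", base_url=...) as "Prometheus itself is down", and a non-empty down_targets() result as a failed exporter. Both deserve a notification path that does not depend on Alertmanager.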

Reliable monitoring requires consistent performance and adequate resources. HostMyCode managed VPS hosting provides the stability and support needed for production monitoring systems. Our team can help optimize your setup and ensure your alerts reach you when they matter most.

Frequently Asked Questions

How much disk space does Prometheus monitoring require?

A typical single-server setup with standard metrics uses 50-100MB per day. With 30-day retention, expect 1.5-3GB total storage. High-frequency scraping or additional exporters increase storage requirements proportionally.

What are the most important alerts to set up first?

Start with disk space (critical at 90%), available memory (critical at 5%), and server availability checks. These catch the most common failure modes that lead to service outages.

How often should Prometheus scrape metrics?

15-second intervals work well for most VPS monitoring. Increase frequency to 5-10 seconds for critical applications requiring rapid response. Decrease to 30-60 seconds for resource-constrained environments.

Can I monitor multiple servers with one Prometheus instance?

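Yes, Prometheus can scrape metrics from multiple targets. List each server's node_exporter as a target in a shared node job; each target gets its own instance label. Large deployments may need Prometheus federation or a horizontally scalable backend such as Thanos or VictoriaMetrics, since Prometheus has no built-in clustering.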

Should I use cloud monitoring services instead of self-hosted?

Self-hosted monitoring provides complete control and privacy. Cloud services offer easier setup but can cost more over the long term and leave less room for customization. Consider hybrid approaches for critical production environments.