Linux VPS PostgreSQL High Availability Setup with Repmgr: Complete Failover and Recovery Tutorial for 2026

Why Database High Availability Matters for Production VPS

Database downtime costs businesses an average of $5,600 per minute. A single PostgreSQL crash wipes out hours of transactions, corrupts user sessions, and damages your reputation.

PostgreSQL high availability setup eliminates single points of failure. It maintains multiple synchronized database servers that automatically take over when the primary fails.

This tutorial builds a robust PostgreSQL cluster using repmgr on Ubuntu 24.04. You'll configure streaming replication, automatic failover, and monitoring across multiple HostMyCode VPS instances.

We'll use three servers: one primary, one standby, and one witness node. The witness prevents split-brain scenarios without running a full database instance.

Prerequisites and Server Specifications

You need three Linux VPS instances with these minimum specs:

Primary and Standby: 4GB RAM, 2 CPU cores, 40GB SSD
Witness: 1GB RAM, 1 CPU core, 10GB SSD
Ubuntu 24.04 LTS on all nodes
PostgreSQL 16 (we'll install this)
Private network connectivity between servers

Configure hostnames in /etc/hosts on all servers:

192.168.1.10 pg-primary
192.168.1.11 pg-standby
192.168.1.12 pg-witness

Replace these IP addresses with your actual private network IPs. Test connectivity with ping pg-primary from each node.

Install PostgreSQL 16 and Configure Base Settings

Run these commands on the primary and standby servers (not the witness):

sudo apt update
sudo apt install -y wget ca-certificates
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ jammy-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt install -y postgresql-16 postgresql-client-16 postgresql-16-repmgr

Stop PostgreSQL to configure it properly:

sudo systemctl stop postgresql

Edit /etc/postgresql/16/main/postgresql.conf on the primary server:

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/16/archive/%f && cp %p /var/lib/postgresql/16/archive/%f'
shared_preload_libraries = 'repmgr'

Create the archive directory:

sudo mkdir -p /var/lib/postgresql/16/archive
sudo chown postgres:postgres /var/lib/postgresql/16/archive

Configure authentication in /etc/postgresql/16/main/pg_hba.conf:

# Replication connections
host replication repmgr 192.168.1.0/24 trust
host repmgr repmgr 192.168.1.0/24 trust
# Application connections
host all all 192.168.1.0/24 md5

Start PostgreSQL on the primary:

sudo systemctl start postgresql
sudo systemctl enable postgresql

Create Repmgr User and Database

Connect to PostgreSQL as the postgres user:

sudo -u postgres psql

Create the repmgr user and database:

CREATE USER repmgr WITH REPLICATION SUPERUSER LOGIN;
CREATE DATABASE repmgr OWNER repmgr;
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
\q

This dedicated user manages replication and cluster operations. Superuser privilege allows repmgr to modify system settings during failover.

Configure Repmgr on Primary Server

Create /etc/repmgr.conf on the primary server:

node_id=1
node_name='pg-primary'
conninfo='host=pg-primary port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
config_directory='/etc/postgresql/16/main'
replication_user='repmgr'
replication_type=physical
location='datacenter1'
use_replication_slots=yes
witness_sync_interval=15
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=2
degradation_threshold=5
unhealthy_threshold=3

sudo -u postgres repmgr -f /etc/repmgr.conf primary register

Verify registration:

sudo -u postgres repmgr -f /etc/repmgr.conf cluster show

You should see the primary node listed with status "running."

Set Up Standby Server with Streaming Replication

On the standby server, stop PostgreSQL if it's running:

sudo systemctl stop postgresql

Remove the existing data directory:

sudo rm -rf /var/lib/postgresql/16/main/*

Clone the primary database to the standby:

sudo -u postgres repmgr -h pg-primary -U repmgr -d repmgr -f /etc/repmgr.conf standby clone

This command copies the entire database from the primary and configures streaming replication. The process takes 2-10 minutes depending on database size.

Create /etc/repmgr.conf on the standby server:

node_id=2
node_name='pg-standby'
conninfo='host=pg-standby port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
config_directory='/etc/postgresql/16/main'
replication_user='repmgr'
replication_type=physical
location='datacenter1'
use_replication_slots=yes
witness_sync_interval=15
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=2
degradation_threshold=5
unhealthy_threshold=3

Start PostgreSQL on the standby:

sudo systemctl start postgresql
sudo systemctl enable postgresql

sudo -u postgres repmgr -f /etc/repmgr.conf standby register

Configure Witness Node for Split-Brain Prevention

The witness node prevents split-brain scenarios when network partitions occur. Install only the repmgr client on the witness server:

sudo apt update
sudo apt install -y postgresql-client-16 postgresql-16-repmgr

Create /etc/repmgr.conf on the witness server:

node_id=3
node_name='pg-witness'
conninfo='host=pg-witness port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/witness'
replication_user='repmgr'
replication_type=physical
location='datacenter1'
witness_sync_interval=15
reconnect_attempts=6
reconnect_interval=10

sudo -u postgres repmgr -f /etc/repmgr.conf witness register -h pg-primary

Verify the complete cluster:

sudo -u postgres repmgr -f /etc/repmgr.conf cluster show

You should see three nodes: primary (running), standby (running), and witness (running).

Enable Automatic Failover with Repmgrd

The repmgrd daemon monitors cluster health and triggers automatic failover. Start it on all nodes.

Create systemd service files. On Ubuntu, copy the example service:

sudo cp /usr/share/doc/postgresql-16-repmgr/examples/repmgrd.service /etc/systemd/system/
sudo systemctl daemon-reload

Edit /etc/systemd/system/repmgrd.service to use the correct config path:

[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=postgresql.service

[Service]
Type=forking
User=postgres
ExecStart=/usr/bin/repmgrd -f /etc/repmgr.conf --pid-file=/var/run/postgresql/repmgrd.pid --daemonize
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/postgresql/repmgrd.pid

[Install]
WantedBy=multi-user.target

Start repmgrd on all nodes:

sudo systemctl enable repmgrd
sudo systemctl start repmgrd

Check repmgrd status:

sudo systemctl status repmgrd

For detailed insights on PostgreSQL monitoring, check our database performance monitoring guide. It covers essential metrics and alerting strategies.

Test Replication and Verify Data Synchronization

Create test data on the primary to verify replication works:

sudo -u postgres psql -c "CREATE DATABASE testdb;"
sudo -u postgres psql testdb -c "CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(50));"
sudo -u postgres psql testdb -c "INSERT INTO users (name) VALUES ('Alice'), ('Bob'), ('Charlie');"
sudo -u postgres psql testdb -c "SELECT * FROM users;"

Check if data appears on the standby (read-only):

sudo -u postgres psql -h pg-standby testdb -c "SELECT * FROM users;"

You should see the same three records. If not, check replication lag:

sudo -u postgres psql -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"

Healthy replication shows minimal lag values. Look for under 1MB for LSN differences.

Test Automatic Failover Scenarios

Simulate primary server failure to test automatic failover:

# On primary server - simulate crash
sudo systemctl stop postgresql

Monitor the standby server logs:

sudo tail -f /var/log/postgresql/postgresql-16-main.log

Within 30-60 seconds, repmgrd detects the failure and promotes the standby to primary. Check cluster status:

sudo -u postgres repmgr -f /etc/repmgr.conf cluster show

The former standby should now show as "primary" and the old primary as "failed." Test write operations on the new primary:

sudo -u postgres psql -h pg-standby testdb -c "INSERT INTO users (name) VALUES ('David');"

This should work without errors. When you restart the failed server, it rejoins as a standby:

# On the recovered server
sudo -u postgres repmgr node rejoin -f /etc/repmgr.conf -d 'host=pg-standby user=repmgr dbname=repmgr' --force-rewind

For comprehensive backup strategies that complement high availability, see our Linux VPS MySQL backup tutorial. It covers similar recovery principles.

Monitor Cluster Health and Performance

Set up monitoring to catch issues before they cause outages:

# View detailed cluster information
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show --compact

# Check replication lag
sudo -u postgres repmgr -f /etc/repmgr.conf cluster monitor

Monitor repmgrd logs for warnings:

sudo journalctl -u repmgrd -f

Key metrics to track:

Replication lag (should be under 1MB LSN difference)
Connection count to both servers
Disk space on archive directory
Network connectivity between nodes

Create a simple monitoring script at /home/monitor/check_repmgr.sh:

#!/bin/bash
LAG=$(sudo -u postgres psql -t -c "SELECT CASE WHEN pg_is_in_recovery() THEN pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() ELSE 0 END;" | tr -d ' ')

if [ "$LAG" -gt 16777216 ]; then  # 16MB in bytes
    echo "WARNING: Replication lag is ${LAG} bytes"
    exit 1
else
    echo "OK: Replication lag is ${LAG} bytes"
    exit 0
fi

Run this script every 5 minutes via cron to detect replication issues early.

Troubleshoot Common High Availability Issues

When replication stops working, check these common causes:

Connection refused errors: Verify pg_hba.conf allows replication connections from the standby IP. Check firewall rules allow port 5432 between servers.

Archive directory full: The /var/lib/postgresql/16/archive directory can fill up quickly. Set up log rotation or move old WAL files to backup storage.

Replication slot conflicts: If failover fails, you may have conflicting replication slots:

sudo -u postgres psql -c "SELECT slot_name, active FROM pg_replication_slots;"
sudo -u postgres psql -c "SELECT pg_drop_replication_slot('slot_name');"

Split-brain prevention: Always use a witness node or external fencing mechanism. Never run two primaries simultaneously as this corrupts data.

Our VPS troubleshooting checklist covers network and performance issues that can affect database replication.

Performance Tuning for High Availability

Optimize PostgreSQL settings for your managed VPS hosting environment:

In /etc/postgresql/16/main/postgresql.conf:

# Memory settings for 4GB VPS
shared_buffers = 1GB
effective_cache_size = 3GB
work_mem = 16MB
maintenance_work_mem = 256MB

# WAL settings for replication
wal_buffers = 16MB
checkpoint_completion_target = 0.9
wal_compression = on

# Connection settings
max_connections = 200
max_worker_processes = 4

Restart PostgreSQL after configuration changes:

sudo systemctl restart postgresql

Monitor performance with:

sudo -u postgres psql -c "SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit, temp_files, temp_bytes FROM pg_stat_database;"

High blks_read values indicate insufficient shared_buffers. Frequent temp_files suggest low work_mem.

Ready to deploy PostgreSQL high availability on production-ready infrastructure? HostMyCode VPS hosting provides the reliable network connectivity and performance you need for database clusters. Our managed VPS plans include 24/7 monitoring and expert support to keep your setup running smoothly.

Frequently Asked Questions

How long does PostgreSQL failover take with repmgr?

Automatic failover typically completes in 30-60 seconds. Detection takes 10-30 seconds based on your monitoring_history and degradation_threshold settings. Promotion itself takes 5-15 seconds. Applications experience a brief connection interruption during the switchover.

Can I run more than one standby server in the cluster?

Yes, you can add multiple standbys for read scaling and additional redundancy. Each standby needs a unique node_id in repmgr.conf. Use repmgr standby clone to create additional replicas from the primary or an existing standby.

What happens if the witness node goes down?

The cluster continues operating normally. The witness only prevents split-brain during network partitions. Without it, you risk having two primaries if the network splits but both database servers remain operational. Always restore the witness quickly.

How much additional storage does streaming replication require?

Each standby needs the same storage as the primary for the database itself. Additionally, configure at least 2-5GB for WAL archives on the primary. Monitor archive directory size and implement cleanup policies based on your backup retention needs.

Can I perform schema changes on a PostgreSQL HA cluster?

Yes, execute DDL statements on the primary and they replicate automatically to standbys. Avoid long-running schema migrations during peak hours as they can increase replication lag. Test major changes on a staging cluster first to verify timing and impact.