
PostgreSQL Hot Standby vs Standard Replication
PostgreSQL hot standby creates a warm backup server that accepts read-only queries. It continuously applies changes from the primary server.
Unlike standard replication that requires manual promotion, hot standby maintains a ready-to-activate secondary database. This setup cuts recovery time from hours to minutes.
Hot standby uses Write-Ahead Logging (WAL) streaming to transfer transaction logs in real-time. This approach provides better consistency than file-based log shipping.
It also enables read queries on the standby server for reporting workloads.
Your HostMyCode VPS hosting provides the network stability and performance needed for reliable WAL streaming between primary and standby PostgreSQL servers.
Prerequisites and Environment Setup
You'll need two Ubuntu 24.04 VPS instances with PostgreSQL 16 installed. The primary server requires at least 2GB RAM for production workloads.
The standby can operate with 1GB since it handles read-only queries.
Install PostgreSQL on both servers:
sudo apt update
sudo apt install postgresql-16 postgresql-contrib-16
sudo systemctl enable postgresql
sudo systemctl start postgresql
Create a replication user on the primary server:
sudo -u postgres psql
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password_here';
\q
Configure SSH key authentication between servers for easier management. Generate keys on the primary server and copy the public key to the standby server's postgres user.
Primary Server Configuration for WAL Streaming
Edit the PostgreSQL configuration file `/etc/postgresql/16/main/postgresql.conf` on the primary server. These settings enable WAL streaming and configure the server for standby support:
listen_addresses = '*'
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/16/main/archive/%f && cp %p /var/lib/postgresql/16/main/archive/%f'
max_replication_slots = 3
hot_standby = on
Create the WAL archive directory:
sudo -u postgres mkdir -p /var/lib/postgresql/16/main/archive
sudo chown postgres:postgres /var/lib/postgresql/16/main/archive
Configure client authentication in `/etc/postgresql/16/main/pg_hba.conf`. Add this line to allow replication connections:
host replication replicator STANDBY_SERVER_IP/32 md5
Replace `STANDBY_SERVER_IP` with your standby server's actual IP address.
Restart PostgreSQL to apply the changes:
sudo systemctl restart postgresql
Base Backup Creation and Transfer
Stop PostgreSQL on the standby server before creating the base backup:
sudo systemctl stop postgresql
Remove the existing data directory on the standby server:
sudo rm -rf /var/lib/postgresql/16/main/*
Create a base backup from the primary server. Run this command on the standby server:
sudo -u postgres pg_basebackup -h PRIMARY_SERVER_IP -D /var/lib/postgresql/16/main -U replicator -P -v -R -W
The `-R` flag automatically creates a `standby.signal` file and configures basic recovery settings. The `-P` option shows progress during the backup process.
This process transfers the entire database cluster from primary to standby. For large databases, expect transfer times of 10-15 minutes per GB over a standard network connection.
Standby Server PostgreSQL Hot Standby Configuration
PostgreSQL 16 uses a `standby.signal` file to indicate recovery mode. The `pg_basebackup` command creates this file automatically.
Verify its presence:
sudo -u postgres ls -la /var/lib/postgresql/16/main/standby.signal
Edit the standby server's `postgresql.conf` file to configure settings:
hot_standby = on
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
The `hot_standby_feedback` setting prevents the primary server from removing rows. This protects data that might be needed by long-running queries on the standby.
Start PostgreSQL on the standby server:
sudo systemctl start postgresql
Check the standby server logs to confirm WAL streaming is working:
sudo tail -f /var/log/postgresql/postgresql-16-main.log
Look for messages like "started streaming WAL" and "consistent recovery state reached."
Connection and Streaming Verification
Verify the replication connection from the primary server:
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
This query should show your standby server's IP address with state "streaming" and sync_state "async" for asynchronous replication.
Test read-only access on the standby server:
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
The result should be "t" (true). This indicates the server is in recovery mode and accepting read-only queries.
Create test data on the primary server to verify streaming:
sudo -u postgres psql -c "CREATE TABLE replication_test (id serial, created_at timestamp default now());"
Insert a record and check if it appears on the standby within seconds:
sudo -u postgres psql -c "INSERT INTO replication_test DEFAULT VALUES; SELECT * FROM replication_test;"
Proper configuration ensures data appears on the standby server within 2-3 seconds of insertion on the primary.
Monitoring WAL Streaming Performance
Monitor replication lag using PostgreSQL's built-in views. On the primary server, check sender statistics:
sudo -u postgres psql -c "SELECT pid, usename, client_addr, client_hostname, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"
The lag columns show time delays between WAL generation and application on the standby. Normal lag should stay under 100ms on a local network.
On the standby server, monitor recovery progress:
sudo -u postgres psql -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"
Set up log monitoring to catch streaming interruptions:
sudo tail -f /var/log/postgresql/postgresql-16-main.log | grep -E "(streaming|replication|recovery)"
Create a simple monitoring script that checks replication lag every minute:
#!/bin/bash
LAG=$(sudo -u postgres psql -t -c "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()));")
if (( $(echo "$LAG > 60" | bc -l) )); then
echo "WARNING: Replication lag is $LAG seconds"
fi
HostMyCode's managed VPS hosting provides the network reliability and SSD performance essential for PostgreSQL setups. Our monitoring team can help identify replication issues before they impact your applications.
Promoting Standby to Primary (Failover)
When the primary server fails, promote the standby to become the new primary. First, confirm the primary is unreachable:
ping PRIMARY_SERVER_IP
telnet PRIMARY_SERVER_IP 5432
Promote the standby server by removing the standby signal file:
sudo -u postgres rm /var/lib/postgresql/16/main/standby.signal
sudo systemctl restart postgresql
Alternatively, use PostgreSQL's promotion function without restarting:
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main
Verify the server is accepting write operations:
sudo -u postgres psql -c "SELECT pg_is_in_recovery(); INSERT INTO replication_test DEFAULT VALUES;"
The recovery status should return "f" (false). The INSERT should succeed without errors.
Update your application connection strings to point to the promoted server. Consider using a connection pooler like pgBouncer to minimize application changes during failover.
Setting Up Monitoring and Alerts
Create a comprehensive monitoring solution for your setup. Install monitoring tools on both servers:
sudo apt install postgresql-contrib monitoring-plugins-basic
Set up a replication monitoring script that checks multiple metrics:
#!/bin/bash
# Check replication connection
REPL_COUNT=$(sudo -u postgres psql -t -c "SELECT count(*) FROM pg_stat_replication;")
if [ "$REPL_COUNT" -lt 1 ]; then
echo "CRITICAL: No active replication connections"
fi
# Check WAL sender process
WAL_SENDER=$(ps aux | grep "wal sender" | grep -v grep | wc -l)
if [ "$WAL_SENDER" -lt 1 ]; then
echo "WARNING: No WAL sender processes found"
fi
Configure automatic email alerts when replication breaks. Install and configure a mail transfer agent:
sudo apt install postfix mailutils
sudo dpkg-reconfigure postfix
Add the monitoring script to cron for regular execution:
*/5 * * * * /home/postgres/check_replication.sh | mail -s "PostgreSQL Replication Status" admin@yourdomain.com
Troubleshooting Common Issues
When WAL streaming stops, check network connectivity first. Verify the standby can reach the primary server on port 5432:
telnet PRIMARY_SERVER_IP 5432
sudo -u postgres pg_isready -h PRIMARY_SERVER_IP
If authentication fails, verify the replication user exists and has proper permissions:
sudo -u postgres psql -c "SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'replicator';"
Check pg_hba.conf entries on the primary server. The replication line must appear before more restrictive rules.
For "requested WAL segment has already been removed" errors, increase `wal_keep_size` on the primary server:
wal_keep_size = 2GB
Monitor disk space on both servers. WAL files can accumulate quickly during high-write periods:
du -sh /var/lib/postgresql/16/main/pg_wal/
If the standby falls too far behind, create a fresh base backup. This works better than trying to catch up with archived WAL files.
Performance Optimization
Tune PostgreSQL settings for optimal performance. On the primary server, adjust WAL settings for faster streaming:
wal_compression = on
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
On the standby server, optimize for read queries:
effective_cache_size = 3GB
random_page_cost = 1.1
max_worker_processes = 4
Consider using synchronous replication for critical data. Change the primary server configuration:
synchronous_standby_names = 'standby_server_name'
synchronous_commit = on
This ensures transactions wait for confirmation from the standby before committing. This provides zero data loss but higher latency.
For read-heavy workloads, distribute queries between primary and standby servers. Use connection pooling tools like pgBouncer or HAProxy.
Frequently Asked Questions
How long does failover take?
Typical failover time ranges from 30 seconds to 2 minutes. This depends on the amount of outstanding WAL data to replay. Applications with prepared connection pools can reconnect within 10-15 seconds of promotion completion.
Can I run different PostgreSQL versions on primary and standby?
Hot standby requires identical major PostgreSQL versions. Minor version differences (like 16.1 vs 16.2) are acceptable. Major version mismatches (15.x vs 16.x) will prevent WAL streaming from working.
What happens if the network connection between servers fails?
The standby server continues operating in recovery mode using archived WAL files if configured. Once network connectivity returns, streaming automatically resumes from the last received WAL position.
How much additional storage does the standby require?
The standby server needs the same storage capacity as the primary server plus space for WAL files during network outages. Plan for at least 110% of your primary server's database size on the standby server.
Can applications write to the standby server?
No, standby servers are read-only. Any write attempts will return an error. Use connection string logic or load balancers to route write queries to the primary server and read queries to either server.