
In production, PostgreSQL success looks a lot like boredom. Not max benchmarks—just staying online through the reboot, the flaky disk, or the kernel panic that shows up at 2 a.m. This guide walks you through building a Postgres 16/17 high-availability cluster on HostMyCode VPS with Patroni for orchestration, etcd for leader election, and HAProxy as the stable place your apps connect. You’ll set up streaming replication, run a controlled failover, and do backup/restore drills that actually prove you can recover.
The design is intentionally small: two database nodes (primary + replica) plus a third lightweight node that runs etcd and HAProxy. That third node gives you a tie-breaker for leader election and keeps the DCS (distributed configuration store) from becoming a fragile two-node “quorum.”
Architecture and prerequisites (versions, sizing, and network)
Here’s the target topology:
- db1 (Postgres + Patroni) – starts as primary
- db2 (Postgres + Patroni) – starts as replica
- proxy1 (HAProxy + etcd) – stable connection point + DCS quorum tie-breaker
Suggested HostMyCode VPS sizing for most small-to-mid workloads:
- db1/db2: 2–4 vCPU, 4–8 GB RAM, NVMe storage sized for your data + WAL growth
- proxy1: 1 vCPU, 1–2 GB RAM is usually enough
If you’re hosting this on your own infrastructure, you can provision instances quickly with a HostMyCode VPS. If you’d rather offload OS hardening and the steady drumbeat of patching, consider managed VPS hosting.
OS assumption: Ubuntu 24.04 LTS (still a common baseline in 2026). Commands below use systemd paths typical for Ubuntu/Debian family.
Component versions:
- PostgreSQL 16.x or 17.x (either works; pick one and keep both nodes identical)
- Patroni 4.x (Python package)
- etcd 3.5.x
- HAProxy 2.8+ (2.9 is fine)
Network: Put all three servers on a private network if you can. If you can’t, lock ports down so only cluster members can reach them.
- Postgres: 5432/tcp (db1 <-> db2 + HAProxy)
- Patroni REST API: 8008/tcp (HAProxy health checks + cluster nodes)
- etcd client: 2379/tcp (db1/db2 to proxy1)
- etcd peer: 2380/tcp (only needed if you later add more etcd nodes)
- HAProxy frontend: 5000 (writes), 5001 (reads) in this tutorial
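One way to enforce that lockdown is UFW. A minimal sketch for db1, assuming the 10.0.0.0/24 addresses above and that UFW is installed (adjust the source addresses per node, and only open the ports that node actually serves):

```shell
# db1: allow cluster traffic only from known members (illustrative, not exhaustive)
sudo ufw allow from 10.0.0.12 to any port 5432 proto tcp   # db2 replication
sudo ufw allow from 10.0.0.10 to any port 5432 proto tcp   # HAProxy on proxy1
sudo ufw allow from 10.0.0.10 to any port 8008 proto tcp   # HAProxy health checks
sudo ufw allow from 10.0.0.12 to any port 8008 proto tcp   # Patroni REST between nodes
sudo ufw enable
```

Make sure you have allowed SSH before running ufw enable, or you will lock yourself out of the VPS.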
Before you expose any database endpoint publicly, run through a hardening checklist. This one is a solid baseline: Linux VPS hardening for production in 2026.
Build the servers on HostMyCode VPS (hostnames, /etc/hosts, time sync)
Assign static private IPs (examples):
- db1: 10.0.0.11
- db2: 10.0.0.12
- proxy1: 10.0.0.10
Set hostnames and map them (do this on all nodes):
sudo hostnamectl set-hostname db1 # on db1
sudo hostnamectl set-hostname db2 # on db2
sudo hostnamectl set-hostname proxy1 # on proxy1
sudo nano /etc/hosts
# Add:
10.0.0.10 proxy1
10.0.0.11 db1
10.0.0.12 db2
Make sure clocks agree. Patroni behavior and TLS validation both get weird when time drifts:
sudo apt-get update
sudo apt-get install -y chrony
sudo systemctl enable --now chrony
chronyc tracking
Install PostgreSQL 16 or 17 and prepare data directories
On db1 and db2, install PostgreSQL from the PGDG repository. Ubuntu’s default packages can lag behind; PGDG stays current in 2026.
sudo apt-get update
sudo apt-get install -y curl ca-certificates gnupg lsb-release
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql.gpg
echo "deb [signed-by=/usr/share/keyrings/postgresql.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt-get update
# Choose ONE:
sudo apt-get install -y postgresql-16
# or
# sudo apt-get install -y postgresql-17
Patroni will control the PostgreSQL instance, so stop the default cluster service to avoid fights over port 5432:
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Pick a clean data directory. You can use /var/lib/postgresql/16/main, but this tutorial keeps it obvious and separate with /var/lib/postgresql/patroni:
sudo mkdir -p /var/lib/postgresql/patroni
sudo chown -R postgres:postgres /var/lib/postgresql
sudo chmod 700 /var/lib/postgresql/patroni
Install etcd on the tie-breaker node (proxy1)
With only two database nodes, you need a third “vote” for leader election. etcd is the DCS Patroni uses to coordinate who is primary.
On proxy1:
sudo apt-get update
# Ubuntu 24.04 ships etcd as split server/client packages:
sudo apt-get install -y etcd-server etcd-client
sudo nano /etc/default/etcd
Set a minimal single-node etcd config:
ETCD_NAME="proxy1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_CLIENT_URLS="http://10.0.0.10:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.0.10:2379"
ETCD_LISTEN_PEER_URLS="http://10.0.0.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.0.10:2380"
ETCD_INITIAL_CLUSTER="proxy1=http://10.0.0.10:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="patroni-etcd"
Start it and sanity-check:
sudo systemctl enable --now etcd
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
etcdctl --endpoints=http://127.0.0.1:2379 member list
If you plan to add more database nodes, move to a 3-node etcd cluster. For this tutorial (primary + replica), a single etcd on proxy1 is a reasonable compromise.
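The quorum arithmetic behind that advice is simple: Raft needs a strict majority, so a two-member etcd tolerates zero failures — no more resilient than one member, and operationally worse. A quick sketch of the math:

```shell
# Raft quorum: a cluster of n members needs floor(n/2)+1 votes,
# so it tolerates n - (floor(n/2)+1) member failures.
for n in 1 2 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

Two members need both votes to elect a leader, so losing either one freezes the DCS. Three members tolerate one failure, which is why the next step up from this tutorial's layout is a 3-node etcd cluster.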
Install Patroni and configure the cluster (db1 + db2)
You can install Patroni via apt (if your distro ships a current build) or via pip. In practice, pip is usually the simpler way to keep Patroni current. Two details matter on a 2026-era stack: Ubuntu 24.04 marks the system Python as externally managed (PEP 668), so a system-wide pip install needs --break-system-packages (or a dedicated virtualenv), and etcd has shipped with its v2 API disabled by default since 3.4, so install the etcd3 extra and configure Patroni to use the v3 API.
On db1 and db2:
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv gcc libpq-dev
sudo pip3 install --break-system-packages "patroni[etcd3]"
Create a systemd unit for Patroni (same file on both nodes):
sudo nano /etc/systemd/system/patroni.service
[Unit]
Description=Patroni PostgreSQL HA Cluster
After=network-online.target
Wants=network-online.target
[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
Restart=on-failure
RestartSec=5s
LimitNOFILE=1048576
[Install]
WantedBy=multi-user.target
Now create /etc/patroni.yml on db1. If you installed PostgreSQL 17, update bin_dir accordingly.
sudo nano /etc/patroni.yml
scope: pg-ha
namespace: /service/
name: db1

restapi:
  listen: 10.0.0.11:8008
  connect_address: 10.0.0.11:8008

# etcd 3.4+ disables the v2 API by default, so use Patroni's etcd3 section (v3 API):
etcd3:
  hosts: 10.0.0.10:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 1024MB
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_command: 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.0.0/24 scram-sha-256
    - host all all 10.0.0.0/24 scram-sha-256
    - host all all 127.0.0.1/32 scram-sha-256
  users:
    admin:
      password: "REPLACE_ME_ADMIN_PASSWORD"
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.0.0.11:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/patroni
  bin_dir: /usr/lib/postgresql/16/bin
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    superuser:
      username: postgres
      password: "REPLACE_ME_POSTGRES_PASSWORD"
    replication:
      username: replicator
      password: "REPLACE_ME_REPL_PASSWORD"
  parameters:
    password_encryption: scram-sha-256
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: fast

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Create the WAL archive directory:
sudo -u postgres mkdir -p /var/lib/postgresql/wal_archive
sudo -u postgres chmod 700 /var/lib/postgresql/wal_archive
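The archive_command in patroni.yml is the classic "copy once, never overwrite" pattern from the PostgreSQL docs. A self-contained dry run of the same logic, with temp paths standing in for the real WAL segment and archive directory:

```shell
# Simulate archive_command behavior: test ! -f <archive>/%f && cp %p <archive>/%f
ARCHIVE=$(mktemp -d)          # stands in for /var/lib/postgresql/wal_archive
WAL=$(mktemp)                 # stands in for a WAL segment (%p)
f=0000000100000000000000A1    # hypothetical segment name (%f)

# First attempt: segment absent, so it is copied and the command succeeds.
first=$(test ! -f "$ARCHIVE/$f" && cp "$WAL" "$ARCHIVE/$f" && echo archived)

# Second attempt: segment present, so the test fails and nothing is overwritten.
second=$(test ! -f "$ARCHIVE/$f" && cp "$WAL" "$ARCHIVE/$f" && echo archived || echo skipped)

echo "first=$first second=$second"   # first=archived second=skipped
rm -rf "$ARCHIVE" "$WAL"
```

Note that a re-archive attempt for an existing file returns nonzero, which Postgres treats as an archive failure. That is intentional: silently overwriting already-archived WAL would be worse.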
Copy the config to db2 and change node-specific fields:
- name: db2
- restapi.listen and restapi.connect_address: 10.0.0.12:8008
- postgresql.listen and postgresql.connect_address: 10.0.0.12:5432
On db2:
sudo nano /etc/patroni.yml
Enable and start Patroni (db1 first, then db2):
# db1
sudo systemctl daemon-reload
sudo systemctl enable --now patroni
# db2
sudo systemctl daemon-reload
sudo systemctl enable --now patroni
Check cluster state using patronictl (run from either DB node):
sudo -u postgres patronictl -c /etc/patroni.yml list
You should see one Leader and one Replica. If both show “stopped,” go straight to the logs:
sudo journalctl -u patroni -n 200 --no-pager
Put HAProxy in front of the cluster (stable write/read endpoints)
Don’t point applications at db1 or db2. Point them at HAProxy. HAProxy polls Patroni’s REST API and sends writes to the current leader.
On proxy1:
sudo apt-get update
sudo apt-get install -y haproxy
sudo nano /etc/haproxy/haproxy.cfg
Replace the config with a minimal Postgres-aware setup:
global
    log /dev/log local0
    maxconn 4096

defaults
    log global
    mode tcp
    timeout connect 5s
    timeout client 60s
    timeout server 60s

frontend pg_write
    bind 10.0.0.10:5000
    default_backend pg_primary

frontend pg_read
    bind 10.0.0.10:5001
    default_backend pg_replicas

backend pg_primary
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 2s fall 2 rise 2 on-marked-down shutdown-sessions
    server db1 10.0.0.11:5432 check port 8008
    server db2 10.0.0.12:5432 check port 8008

backend pg_replicas
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    default-server inter 2s fall 2 rise 2
    server db1 10.0.0.11:5432 check port 8008
    server db2 10.0.0.12:5432 check port 8008
Restart HAProxy and confirm the listeners are up:
sudo systemctl enable --now haproxy
sudo systemctl restart haproxy
sudo ss -lntp | grep -E '5000|5001'
Quick connectivity tests from any node that has psql installed:
psql "host=10.0.0.10 port=5000 user=postgres dbname=postgres" -c "select inet_server_addr(), pg_is_in_recovery();"
psql "host=10.0.0.10 port=5001 user=postgres dbname=postgres" -c "select inet_server_addr(), pg_is_in_recovery();"
On port 5000, pg_is_in_recovery() should be false. On port 5001, it should usually be true (unless only the leader is up).
Verify streaming replication (WAL, slots, and replay lag)
Patroni uses physical streaming replication via pg_basebackup. If slots are enabled, it also uses replication slots to keep WAL from disappearing under a lagging replica. Check status from the leader:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select client_addr, state, sync_state, write_lag, flush_lag, replay_lag from pg_stat_replication;"
On the replica, confirm it’s replaying WAL:
psql -h 10.0.0.10 -p 5001 -U postgres -d postgres -c "select pg_is_in_recovery(), now() - pg_last_xact_replay_timestamp() as replay_delay;"
If replay_delay stays large, the replica can’t keep up. Look at I/O wait, network throughput, and WAL generation rate. From there, you’re usually choosing between faster storage and tuning checkpoint/WAL behavior.
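A small lag check makes that threshold explicit, and drops easily into cron or a monitoring agent. This sketch hardcodes a sample value so it runs standalone; in production, LAG would come from the replica, e.g. psql -h 10.0.0.10 -p 5001 -U postgres -Atc "select coalesce(extract(epoch from now() - pg_last_xact_replay_timestamp()), 0)" (the threshold is illustrative):

```shell
# Alert when replica replay lag exceeds a threshold (in seconds).
LAG=3.2          # sample value; replace with the psql query described above
THRESHOLD=30

# awk handles the fractional comparison that POSIX sh arithmetic cannot.
if awk -v l="$LAG" -v t="$THRESHOLD" 'BEGIN { exit !(l > t) }'; then
  STATUS="ALERT: replay lag ${LAG}s exceeds ${THRESHOLD}s"
else
  STATUS="OK: replay lag ${LAG}s within ${THRESHOLD}s"
fi
echo "$STATUS"
```

Pick the threshold from your RPO, not from what the replica usually manages: if you can afford to lose 30 seconds of writes in a failover, alert well before lag reaches 30 seconds.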
Automatic failover testing (do it safely, then prove it)
High availability isn’t “done” until you’ve broken it on purpose. You’re validating three things:
- Patroni promotes the replica to leader
- HAProxy notices and routes writes to the new leader
- Your application reconnects cleanly (or you at least understand what it does)
1) Create a test table and write through HAProxy:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "create table if not exists ha_test(id bigserial primary key, note text, created_at timestamptz default now());"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "insert into ha_test(note) values('before failover');"
2) Identify the current leader:
sudo -u postgres patronictl -c /etc/patroni.yml list
3) Simulate a leader outage (on the leader node):
sudo systemctl stop patroni
Give it 15–40 seconds (based on your TTL/loop_wait), then check the cluster from the remaining node:
sudo -u postgres patronictl -c /etc/patroni.yml list
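That wait window isn't arbitrary; it falls out of the DCS parameters in patroni.yml. With ttl: 30 and loop_wait: 10, the leader key expires at most ttl seconds after its last renewal, and a candidate notices within roughly one loop_wait after that. A rough bound (a sketch of the intuition, not Patroni's exact internal timing):

```shell
# Rough failover detection bounds from the Patroni settings used in this tutorial.
TTL=30
LOOP_WAIT=10

BEST_CASE=$LOOP_WAIT                   # leader key near expiry; next loop promotes
WORST_CASE=$(( TTL + LOOP_WAIT ))      # key just renewed: full ttl plus one loop

echo "expect promotion in roughly ${BEST_CASE}-${WORST_CASE}s"
```

Lowering ttl and loop_wait speeds up failover but makes the cluster more sensitive to transient network blips and DCS latency; change them deliberately and re-test.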
4) Confirm HAProxy routes writes to the new leader:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select inet_server_addr(), pg_is_in_recovery();"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "insert into ha_test(note) values('after failover');"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select count(*) from ha_test;"
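To put a number on client-visible downtime, run a probe loop against the write endpoint while you stop Patroni on the leader. This sketch stubs the probe with a function that fails twice and then succeeds, so the loop logic runs standalone; in a real drill, probe() would be psql "host=10.0.0.10 port=5000 user=postgres dbname=postgres connect_timeout=1" -Atc "select 1" >/dev/null 2>&1, with a sleep 1 between attempts:

```shell
# Measure how long the write endpoint stays unreachable during failover.
attempts=0
probe() {                      # stub: fails twice, then succeeds
  attempts=$(( attempts + 1 ))
  [ "$attempts" -gt 2 ]
}

elapsed=0
until probe; do
  elapsed=$(( elapsed + 1 ))   # with a real probe, add: sleep 1
done
echo "write endpoint back after ~${elapsed}s (${attempts} probes)"
```

Record the number you measure; it is the honest answer to "how long are we down in a failover," and it should land inside the detection window derived from ttl and loop_wait.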
5) Bring the old leader back:
sudo systemctl start patroni
It should rejoin as a replica (often via pg_rewind). If it doesn’t, tail the logs and fix the real issue instead of rerunning commands blindly:
sudo journalctl -u patroni -f
To track uptime and failover events with low overhead, pair the cluster with Beszel monitoring on all nodes.
Backups you can actually restore: pgBackRest + restore drills
Streaming replication isn’t a backup. It happily replicates dropped tables, bad migrations, and corruption. Treat backups as a separate system, with a restore process you practice.
For Postgres 16/17, pgBackRest is a solid default. You get full/incremental backups, compression, retention policies, and WAL archiving for point-in-time recovery (PITR).
In this tutorial, you’ll store backups on proxy1 (or a separate backup VPS). In production, a dedicated backup target with tighter access controls and plenty of disk is usually worth it.
Install pgBackRest (db1 and proxy1)
Since Patroni can move leadership at any time, install pgBackRest on both database nodes so backups keep working whichever node is primary, and on proxy1, which hosts the repository.
On db1 and db2:
sudo apt-get update
sudo apt-get install -y pgbackrest
On proxy1 (backup repository):
sudo apt-get update
sudo apt-get install -y pgbackrest
Set up SSH for repository access
Create a restricted user on proxy1 for backup storage:
sudo adduser --disabled-password --gecos "" pgbackrest
sudo mkdir -p /var/lib/pgbackrest
sudo chown -R pgbackrest:pgbackrest /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
pgBackRest over SSH needs keys in both directions: postgres on each DB node pushes WAL to proxy1, and pgbackrest on proxy1 runs backups against the DB nodes. Generate keys for those users:
# On db1 and db2:
sudo -u postgres mkdir -p -m 700 /var/lib/postgresql/.ssh
sudo -u postgres ssh-keygen -t ed25519 -f /var/lib/postgresql/.ssh/id_ed25519 -N ""
# On proxy1:
sudo -u pgbackrest mkdir -p -m 700 /home/pgbackrest/.ssh
sudo -u pgbackrest ssh-keygen -t ed25519 -f /home/pgbackrest/.ssh/id_ed25519 -N ""
Because the pgbackrest user has no password, ssh-copy-id won't work. Append each public key to the matching authorized_keys by hand: the postgres keys from db1 and db2 go into /home/pgbackrest/.ssh/authorized_keys on proxy1, and the pgbackrest key from proxy1 goes into /var/lib/postgresql/.ssh/authorized_keys on both DB nodes. Then accept host keys once in each direction:
sudo -u postgres ssh pgbackrest@10.0.0.10 true # from db1 and db2
sudo -u pgbackrest ssh postgres@db1 true # from proxy1; repeat for db2
Configure pgBackRest
On proxy1 (/etc/pgbackrest/pgbackrest.conf):
sudo nano /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=4
repo1-retention-diff=8
compress-type=zst
start-fast=y
[pg-ha]
pg1-host=db1
pg1-host-user=postgres
pg1-path=/var/lib/postgresql/patroni
pg2-host=db2
pg2-host-user=postgres
pg2-path=/var/lib/postgresql/patroni
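With repo1-retention-full=4, pgbackrest expire keeps the four newest full backups and removes older ones (dependent differential and incremental backups are removed with their full). A sketch of that bookkeeping, using made-up backup labels:

```shell
# Which full backups survive repo1-retention-full=4 (labels are hypothetical).
RETENTION=4
BACKUPS="20260101-000000F 20260108-000000F 20260115-000000F 20260122-000000F 20260129-000000F 20260205-000000F"

set -- $BACKUPS
EXPIRE=$(( $# - RETENTION ))    # everything older than the newest four

i=0
for b in $BACKUPS; do
  i=$(( i + 1 ))
  if [ "$i" -le "$EXPIRE" ]; then
    echo "expire: $b"
  else
    echo "keep:   $b"
  fi
done
```

Size the repository disk for the retained backups plus archived WAL between them, not just one backup; retention is only enforced when expire runs after a backup.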
On db1 and db2, create a matching config pointing to the repo on proxy1 (same stanza name):
sudo nano /etc/pgbackrest/pgbackrest.conf
[global]
repo1-host=10.0.0.10
repo1-host-user=pgbackrest
repo1-path=/var/lib/pgbackrest
compress-type=zst
[pg-ha]
pg1-path=/var/lib/postgresql/patroni
Now hook WAL archiving into pgBackRest. One catch: the archive settings live under bootstrap.dcs in /etc/patroni.yml, and Patroni reads that section only when it first bootstraps the cluster, so editing the file now changes nothing. Update the running cluster configuration through the DCS instead:
sudo -u postgres patronictl -c /etc/patroni.yml edit-config
In the editor, set under postgresql.parameters:
archive_mode: "on"
archive_command: 'pgbackrest --stanza=pg-ha archive-push %p'
Patroni pushes the change to both nodes. Since archive_mode was already "on" at bootstrap, only archive_command changes, and that takes effect with the reload Patroni performs automatically; no restart needed.
Create the stanza from the repository host. With a dedicated repo host, pgBackRest expects stanza-create, backup, and expire to run on proxy1; the DB nodes only archive-push:
sudo -u pgbackrest pgbackrest --stanza=pg-ha --log-level-console=info stanza-create
Run a full backup:
sudo -u pgbackrest pgbackrest --stanza=pg-ha --type=full --log-level-console=info backup
Verify:
sudo -u pgbackrest pgbackrest --stanza=pg-ha info
Restore drill (the part most teams skip)
Run this on a separate test VPS, not on your live cluster. The goal is simple: restore a backup and confirm the database starts cleanly and answers queries.
High-level restore steps on a fresh test node:
- Install Postgres (same major version) and pgBackRest
- Configure pgBackRest to point at the repo on proxy1
- Stop Postgres, wipe the data dir, restore, start Postgres
# On the restore target node:
sudo systemctl stop postgresql || true
sudo rm -rf /var/lib/postgresql/patroni/*
# Restore latest backup:
sudo -u postgres pgbackrest --stanza=pg-ha --log-level-console=info restore --pg1-path=/var/lib/postgresql/patroni
# Start postgres manually for verification (outside Patroni for the drill):
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/patroni -l /tmp/pg.log start
sudo -u postgres psql -d postgres -c "select now();"
If you need PITR, pass a recovery target to the restore command; pgBackRest writes recovery.signal and the restore_command for you. Don't leave PITR as a future task: run at least one PITR drill per quarter.
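A PITR restore differs from the latest-backup restore above mainly in the target flags. A sketch, assuming the same stanza and a hypothetical target time, run on the restore node with an empty data directory:

```shell
# Point-in-time restore to just before a bad migration (timestamp is illustrative).
sudo -u postgres pgbackrest --stanza=pg-ha \
  --type=time "--target=2026-02-05 09:30:00+00" \
  --target-action=promote \
  --log-level-console=info restore

# pgBackRest writes recovery.signal and restore_command into the data directory;
# start Postgres and it replays WAL up to the target, then promotes.
```

--type, --target, and --target-action are standard pgBackRest restore options, but verify the timestamp format and timezone handling against your server settings before you depend on this in an incident.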
Operational checklist (common pitfalls that bite later)
- Two nodes is a risk tradeoff. Add a third Postgres node if your RPO/RTO is strict, or if maintenance windows are painful.
- Keep versions aligned. Postgres minor version differences are usually fine, but Patroni upgrades should be tested on staging first.
- Make HAProxy timeouts explicit. Long idle transaction sessions can collide with low HAProxy timeouts. Tune to your app’s behavior.
- Watch disk for WAL growth. A replica that falls behind can explode WAL usage on the leader if slots are enabled.
- Use SCRAM everywhere. The sample uses scram-sha-256; keep it that way unless you have a hard constraint.
If your application stack also sits behind Nginx, the troubleshooting mindset is similar: health checks, timeouts, and upstream behavior. This guide is useful when you’re trying to separate “app slow” from “server slow”: Fix high TTFB and improve server response time.
If you want predictable performance for Postgres HA, start with a VPS that has consistent CPU and fast NVMe. A HostMyCode VPS is a clean fit for Patroni-based clusters, and managed VPS hosting is a good option if you don’t want to own patch cycles and baseline hardening.
FAQ: Patroni + etcd + HAProxy for PostgreSQL HA
Do I need three database nodes for “real” HA?
For strict high availability, three database nodes is the safer design. With two DB nodes, you can fail over, but you have less tolerance for maintenance and less margin for network partitions. The third node in this tutorial is for etcd tie-breaking, not extra database capacity.
Can I run etcd on the database nodes instead of proxy1?
You can, but a two-node etcd cluster is a bad idea. It can’t form a healthy quorum reliably during failures. Putting etcd on a third node avoids that trap.
How do clients connect: port 5432 or HAProxy ports?
Point applications at HAProxy. Use the write port (5000 here) for normal app traffic. Use the read port (5001) only if your app can tolerate replica lag and read-after-write inconsistencies.
What’s the simplest way to test failover without corrupting data?
Stop Patroni on the leader with systemctl stop patroni and let Patroni promote the replica. Avoid pulling power until you’ve validated clean failover and rewind behavior in your environment.
What should I back up: the replica or the leader?
Back up from the leader unless you have a clear reason to offload to a replica and you’ve tested it. Either way, the backup must include WAL archiving, and you must run restore drills.
Summary: a practical HA baseline you can validate
You now have Patroni coordinating Postgres 16/17 leadership via etcd, HAProxy providing stable write/read endpoints, streaming replication in place, and a repeatable failover test. The restore drill is what turns “we have HA” into “we can recover.”
If you’re putting a Postgres 16/17 high-availability cluster on HostMyCode VPS into production, size for WAL and I/O—not just dataset size—and put failover and restore drills on the calendar. The goal is a cluster that stays boring even on bad days.