
In production, PostgreSQL success looks a lot like boredom. Not max benchmarks—just staying online through the reboot, the flaky disk, or the kernel panic that shows up at 2 a.m. This guide walks you through building a Postgres 16/17 high-availability cluster on HostMyCode VPS with Patroni for orchestration, etcd for leader election, and HAProxy as the stable place your apps connect. You’ll set up streaming replication, run a controlled failover, and do backup/restore drills that actually prove you can recover.
The design is intentionally small: two database nodes (primary + replica) plus a third lightweight node that runs etcd and HAProxy. That third node gives you a tie-breaker for leader election and keeps the DCS (distributed configuration store) from becoming a fragile two-node “quorum.”
Architecture and prerequisites (versions, sizing, and network)
Here’s the target topology:
- db1 (Postgres + Patroni) – starts as primary
- db2 (Postgres + Patroni) – starts as replica
- proxy1 (HAProxy + etcd) – stable connection point + DCS quorum tie-breaker
Suggested HostMyCode VPS sizing for most small-to-mid workloads:
- db1/db2: 2–4 vCPU, 4–8 GB RAM, NVMe storage sized for your data + WAL growth
- proxy1: 1 vCPU, 1–2 GB RAM is usually enough
If you’re hosting this on your own infrastructure, you can provision instances quickly with a HostMyCode VPS. If you’d rather offload OS hardening and the steady drumbeat of patching, consider managed VPS hosting.
OS assumption: Ubuntu 24.04 LTS (still a common baseline in 2026). Commands below use systemd paths typical for Ubuntu/Debian family.
Component versions:
- PostgreSQL 16.x or 17.x (either works; pick one and keep both nodes identical)
- Patroni 4.x (Python package)
- etcd 3.5.x
- HAProxy 2.8+ (2.9 is fine)
Network: Put all three servers on a private network if you can. If you can’t, lock ports down so only cluster members can reach them.
- Postgres: 5432/tcp (db1 <-> db2 + HAProxy)
- Patroni REST API: 8008/tcp (HAProxy health checks + cluster nodes)
- etcd client: 2379/tcp (db1/db2 to proxy1)
- etcd peer: 2380/tcp (only needed if you later add more etcd nodes)
- HAProxy frontend: 5000 (writes), 5001 (reads) in this tutorial
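One way to enforce that lockdown is UFW. A minimal sketch for db1, assuming the 10.0.0.0/24 addresses above and that UFW is installed (adjust the source addresses per node, and only open the ports that node actually serves):

```shell
# db1: allow cluster traffic only from known members (illustrative, not exhaustive)
sudo ufw allow from 10.0.0.12 to any port 5432 proto tcp   # db2 replication
sudo ufw allow from 10.0.0.10 to any port 5432 proto tcp   # HAProxy on proxy1
sudo ufw allow from 10.0.0.10 to any port 8008 proto tcp   # HAProxy health checks
sudo ufw allow from 10.0.0.12 to any port 8008 proto tcp   # Patroni REST between nodes
sudo ufw enable
```

Make sure you have allowed SSH before running ufw enable, or you will lock yourself out of the VPS.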
Before you expose any database endpoint publicly, run through a hardening checklist. This one is a solid baseline: Linux VPS hardening for production in 2026.
Build the servers on HostMyCode VPS (hostnames, /etc/hosts, time sync)
Assign static private IPs (examples):
- db1: 10.0.0.11
- db2: 10.0.0.12
- proxy1: 10.0.0.10
Set hostnames and map them (do this on all nodes):
sudo hostnamectl set-hostname db1 # on db1
sudo hostnamectl set-hostname db2 # on db2
sudo hostnamectl set-hostname proxy1 # on proxy1
sudo nano /etc/hosts
# Add:
10.0.0.10 proxy1
10.0.0.11 db1
10.0.0.12 db2
Make sure clocks agree. Patroni behavior and TLS validation both get weird when time drifts:
sudo apt-get update
sudo apt-get install -y chrony
sudo systemctl enable --now chrony
chronyc tracking
Install PostgreSQL 16 or 17 and prepare data directories
On db1 and db2, install PostgreSQL from the PGDG repository. Ubuntu’s default packages can lag behind; PGDG stays current in 2026.
sudo apt-get update
sudo apt-get install -y curl ca-certificates gnupg lsb-release
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql.gpg
echo "deb [signed-by=/usr/share/keyrings/postgresql.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt-get update
# Choose ONE:
sudo apt-get install -y postgresql-16
# or
# sudo apt-get install -y postgresql-17
Patroni will control the PostgreSQL instance, so stop the default cluster service to avoid fights over port 5432:
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Pick a clean data directory. You can use /var/lib/postgresql/16/main, but this tutorial keeps it obvious and separate with /var/lib/postgresql/patroni:
sudo mkdir -p /var/lib/postgresql/patroni
sudo chown -R postgres:postgres /var/lib/postgresql
sudo chmod 700 /var/lib/postgresql/patroni
Install etcd on the tie-breaker node (proxy1)
With only two database nodes, you need a third “vote” for leader election. etcd is the DCS Patroni uses to coordinate who is primary.
On proxy1:
sudo apt-get update
# Ubuntu 24.04 ships etcd as split server/client packages:
sudo apt-get install -y etcd-server etcd-client
sudo nano /etc/default/etcd
Set a minimal single-node etcd config:
ETCD_NAME="proxy1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_CLIENT_URLS="http://10.0.0.10:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.0.10:2379"
ETCD_LISTEN_PEER_URLS="http://10.0.0.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.0.10:2380"
ETCD_INITIAL_CLUSTER="proxy1=http://10.0.0.10:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="patroni-etcd"
Start it and sanity-check:
sudo systemctl enable --now etcd
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
etcdctl --endpoints=http://127.0.0.1:2379 member list
If you plan to add more database nodes, move to a 3-node etcd cluster. For this tutorial (primary + replica), a single etcd on proxy1 is a reasonable compromise.
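The quorum arithmetic behind that advice is simple: Raft needs a strict majority, so a two-member etcd tolerates zero failures — no more resilient than one member, and operationally worse. A quick sketch of the math:

```shell
# Raft quorum: a cluster of n members needs floor(n/2)+1 votes,
# so it tolerates n - (floor(n/2)+1) member failures.
for n in 1 2 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

Two members need both votes to elect a leader, so losing either one freezes the DCS. Three members tolerate one failure, which is why the next step up from this tutorial's layout is a 3-node etcd cluster.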
Install Patroni and configure the cluster (db1 + db2)
You can install Patroni via apt (if your distro ships a current build) or via pip. In practice, pip is usually the simpler way to keep Patroni current. Two details matter on a 2026-era stack: Ubuntu 24.04 marks the system Python as externally managed (PEP 668), so a system-wide pip install needs --break-system-packages (or a dedicated virtualenv), and etcd has shipped with its v2 API disabled by default since 3.4, so install the etcd3 extra and configure Patroni to use the v3 API.
On db1 and db2:
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv gcc libpq-dev
sudo pip3 install --break-system-packages "patroni[etcd3]"
Create a systemd unit for Patroni (same file on both nodes):
sudo nano /etc/systemd/system/patroni.service
[Unit]
Description=Patroni PostgreSQL HA Cluster
After=network-online.target
Wants=network-online.target
[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
Restart=on-failure
RestartSec=5s
LimitNOFILE=1048576
[Install]
WantedBy=multi-user.target
Now create /etc/patroni.yml on db1. If you installed PostgreSQL 17, update bin_dir accordingly.
sudo nano /etc/patroni.yml
scope: pg-ha
namespace: /service/
name: db1

restapi:
  listen: 10.0.0.11:8008
  connect_address: 10.0.0.11:8008

# etcd 3.4+ disables the v2 API by default, so use Patroni's etcd3 section (v3 API):
etcd3:
  hosts: 10.0.0.10:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 1024MB
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_command: 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.0.0/24 scram-sha-256
    - host all all 10.0.0.0/24 scram-sha-256
    - host all all 127.0.0.1/32 scram-sha-256
  users:
    admin:
      password: "REPLACE_ME_ADMIN_PASSWORD"
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.0.0.11:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/patroni
  bin_dir: /usr/lib/postgresql/16/bin
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    superuser:
      username: postgres
      password: "REPLACE_ME_POSTGRES_PASSWORD"
    replication:
      username: replicator
      password: "REPLACE_ME_REPL_PASSWORD"
  parameters:
    password_encryption: scram-sha-256
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: fast

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Create the WAL archive directory:
sudo -u postgres mkdir -p /var/lib/postgresql/wal_archive
sudo -u postgres chmod 700 /var/lib/postgresql/wal_archive
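The archive_command in patroni.yml is the classic "copy once, never overwrite" pattern from the PostgreSQL docs. A self-contained dry run of the same logic, with temp paths standing in for the real WAL segment and archive directory:

```shell
# Simulate archive_command behavior: test ! -f <archive>/%f && cp %p <archive>/%f
ARCHIVE=$(mktemp -d)          # stands in for /var/lib/postgresql/wal_archive
WAL=$(mktemp)                 # stands in for a WAL segment (%p)
f=0000000100000000000000A1    # hypothetical segment name (%f)

# First attempt: segment absent, so it is copied and the command succeeds.
first=$(test ! -f "$ARCHIVE/$f" && cp "$WAL" "$ARCHIVE/$f" && echo archived)

# Second attempt: segment present, so the test fails and nothing is overwritten.
second=$(test ! -f "$ARCHIVE/$f" && cp "$WAL" "$ARCHIVE/$f" && echo archived || echo skipped)

echo "first=$first second=$second"   # first=archived second=skipped
rm -rf "$ARCHIVE" "$WAL"
```

Note that a re-archive attempt for an existing file returns nonzero, which Postgres treats as an archive failure. That is intentional: silently overwriting already-archived WAL would be worse.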
Copy the config to db2 and change node-specific fields:
- name: db2
- restapi.listen and restapi.connect_address: 10.0.0.12:8008
- postgresql.listen and postgresql.connect_address: 10.0.0.12:5432
On db2:
sudo nano /etc/patroni.yml
Enable and start Patroni (db1 first, then db2):
# db1
sudo systemctl daemon-reload
sudo systemctl enable --now patroni
# db2
sudo systemctl daemon-reload
sudo systemctl enable --now patroni
Check cluster state using patronictl (run from either DB node):
sudo -u postgres patronictl -c /etc/patroni.yml list
You should see one Leader and one Replica. If both show “stopped,” go straight to the logs:
sudo journalctl -u patroni -n 200 --no-pager
Put HAProxy in front of the cluster (stable write/read endpoints)
Don’t point applications at db1 or db2. Point them at HAProxy. HAProxy polls Patroni’s REST API and sends writes to the current leader.
On proxy1:
sudo apt-get update
sudo apt-get install -y haproxy
sudo nano /etc/haproxy/haproxy.cfg
Replace the config with a minimal Postgres-aware setup:
global
    log /dev/log local0
    maxconn 4096

defaults
    log global
    mode tcp
    timeout connect 5s
    timeout client 60s
    timeout server 60s

frontend pg_write
    bind 10.0.0.10:5000
    default_backend pg_primary

frontend pg_read
    bind 10.0.0.10:5001
    default_backend pg_replicas

backend pg_primary
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 2s fall 2 rise 2 on-marked-down shutdown-sessions
    server db1 10.0.0.11:5432 check port 8008
    server db2 10.0.0.12:5432 check port 8008

backend pg_replicas
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    default-server inter 2s fall 2 rise 2
    server db1 10.0.0.11:5432 check port 8008
    server db2 10.0.0.12:5432 check port 8008
Restart HAProxy and confirm the listeners are up:
sudo systemctl enable --now haproxy
sudo systemctl restart haproxy
sudo ss -lntp | grep -E '5000|5001'
Quick connectivity tests from any node that has psql installed:
psql "host=10.0.0.10 port=5000 user=postgres dbname=postgres" -c "select inet_server_addr(), pg_is_in_recovery();"
psql "host=10.0.0.10 port=5001 user=postgres dbname=postgres" -c "select inet_server_addr(), pg_is_in_recovery();"
On port 5000, pg_is_in_recovery() should be false. On port 5001, it should usually be true (unless only the leader is up).
Verify streaming replication (WAL, slots, and replay lag)
Patroni uses physical streaming replication via pg_basebackup. If slots are enabled, it also uses replication slots to keep WAL from disappearing under a lagging replica. Check status from the leader:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select client_addr, state, sync_state, write_lag, flush_lag, replay_lag from pg_stat_replication;"
On the replica, confirm it’s replaying WAL:
psql -h 10.0.0.10 -p 5001 -U postgres -d postgres -c "select pg_is_in_recovery(), now() - pg_last_xact_replay_timestamp() as replay_delay;"
If replay_delay stays large, the replica can’t keep up. Look at I/O wait, network throughput, and WAL generation rate. From there, you’re usually choosing between faster storage and tuning checkpoint/WAL behavior.
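A small lag check makes that threshold explicit, and drops easily into cron or a monitoring agent. This sketch hardcodes a sample value so it runs standalone; in production, LAG would come from the replica, e.g. psql -h 10.0.0.10 -p 5001 -U postgres -Atc "select coalesce(extract(epoch from now() - pg_last_xact_replay_timestamp()), 0)" (the threshold is illustrative):

```shell
# Alert when replica replay lag exceeds a threshold (in seconds).
LAG=3.2          # sample value; replace with the psql query described above
THRESHOLD=30

# awk handles the fractional comparison that POSIX sh arithmetic cannot.
if awk -v l="$LAG" -v t="$THRESHOLD" 'BEGIN { exit !(l > t) }'; then
  STATUS="ALERT: replay lag ${LAG}s exceeds ${THRESHOLD}s"
else
  STATUS="OK: replay lag ${LAG}s within ${THRESHOLD}s"
fi
echo "$STATUS"
```

Pick the threshold from your RPO, not from what the replica usually manages: if you can afford to lose 30 seconds of writes in a failover, alert well before lag reaches 30 seconds.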
Automatic failover testing (do it safely, then prove it)
High availability isn’t “done” until you’ve broken it on purpose. You’re validating three things:
- Patroni promotes the replica to leader
- HAProxy notices and routes writes to the new leader
- Your application reconnects cleanly (or you at least understand what it does)
1) Create a test table and write through HAProxy:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "create table if not exists ha_test(id bigserial primary key, note text, created_at timestamptz default now());"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "insert into ha_test(note) values('before failover');"
2) Identify the current leader:
sudo -u postgres patronictl -c /etc/patroni.yml list
3) Simulate a leader outage (on the leader node):
sudo systemctl stop patroni
Give it 15–40 seconds (based on your TTL/loop_wait), then check the cluster from the remaining node:
sudo -u postgres patronictl -c /etc/patroni.yml list
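That wait window isn't arbitrary; it falls out of the DCS parameters in patroni.yml. With ttl: 30 and loop_wait: 10, the leader key expires at most ttl seconds after its last renewal, and a candidate notices within roughly one loop_wait after that. A rough bound (a sketch of the intuition, not Patroni's exact internal timing):

```shell
# Rough failover detection bounds from the Patroni settings used in this tutorial.
TTL=30
LOOP_WAIT=10

BEST_CASE=$LOOP_WAIT                   # leader key near expiry; next loop promotes
WORST_CASE=$(( TTL + LOOP_WAIT ))      # key just renewed: full ttl plus one loop

echo "expect promotion in roughly ${BEST_CASE}-${WORST_CASE}s"
```

Lowering ttl and loop_wait speeds up failover but makes the cluster more sensitive to transient network blips and DCS latency; change them deliberately and re-test.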
4) Confirm HAProxy routes writes to the new leader:
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select inet_server_addr(), pg_is_in_recovery();"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "insert into ha_test(note) values('after failover');"
psql -h 10.0.0.10 -p 5000 -U postgres -d postgres -c "select count(*) from ha_test;"
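To put a number on client-visible downtime, run a probe loop against the write endpoint while you stop Patroni on the leader. This sketch stubs the probe with a function that fails twice and then succeeds, so the loop logic runs standalone; in a real drill, probe() would be psql "host=10.0.0.10 port=5000 user=postgres dbname=postgres connect_timeout=1" -Atc "select 1" >/dev/null 2>&1, with a sleep 1 between attempts:

```shell
# Measure how long the write endpoint stays unreachable during failover.
attempts=0
probe() {                      # stub: fails twice, then succeeds
  attempts=$(( attempts + 1 ))
  [ "$attempts" -gt 2 ]
}

elapsed=0
until probe; do
  elapsed=$(( elapsed + 1 ))   # with a real probe, add: sleep 1
done
echo "write endpoint back after ~${elapsed}s (${attempts} probes)"
```

Record the number you measure; it is the honest answer to "how long are we down in a failover," and it should land inside the detection window derived from ttl and loop_wait.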
5) Bring the old leader back:
sudo systemctl start patroni
It should rejoin as a replica (often via pg_rewind). If it doesn’t, tail the logs and fix the real issue instead of rerunning commands blindly:
sudo journalctl -u patroni -f
To track uptime and failover events with low overhead, pair the cluster with Beszel monitoring on all nodes.
Backups you can actually restore: pgBackRest + restore drills
Streaming replication isn’t a backup. It happily replicates dropped tables, bad migrations, and corruption. Treat backups as a separate system, with a restore process you practice.
For Postgres 16/17, pgBackRest is a solid default. You get full/incremental backups, compression, retention policies, and WAL archiving for point-in-time recovery (PITR).
In this tutorial, you’ll store backups on proxy1 (or a separate backup VPS). In production, a dedicated backup target with tighter access controls and plenty of disk is usually worth it.
Install pgBackRest (db1 and proxy1)
Since Patroni can move leadership at any time, install pgBackRest on both database nodes so backups keep working whichever node is primary, and on proxy1, which hosts the repository.
On db1 and db2:
sudo apt-get update
sudo apt-get install -y pgbackrest
On proxy1 (backup repository):
sudo apt-get update
sudo apt-get install -y pgbackrest
Set up SSH for repository access
Create a restricted user on proxy1 for backup storage:
sudo adduser --disabled-password --gecos "" pgbackrest
sudo mkdir -p /var/lib/pgbackrest
sudo chown -R pgbackrest:pgbackrest /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
pgBackRest over SSH needs keys in both directions: postgres on each DB node pushes WAL to proxy1, and pgbackrest on proxy1 runs backups against the DB nodes. Generate keys for those users:
# On db1 and db2:
sudo -u postgres mkdir -p -m 700 /var/lib/postgresql/.ssh
sudo -u postgres ssh-keygen -t ed25519 -f /var/lib/postgresql/.ssh/id_ed25519 -N ""
# On proxy1:
sudo -u pgbackrest mkdir -p -m 700 /home/pgbackrest/.ssh
sudo -u pgbackrest ssh-keygen -t ed25519 -f /home/pgbackrest/.ssh/id_ed25519 -N ""
Because the pgbackrest user has no password, ssh-copy-id won't work. Append each public key to the matching authorized_keys by hand: the postgres keys from db1 and db2 go into /home/pgbackrest/.ssh/authorized_keys on proxy1, and the pgbackrest key from proxy1 goes into /var/lib/postgresql/.ssh/authorized_keys on both DB nodes. Then accept host keys once in each direction:
sudo -u postgres ssh pgbackrest@10.0.0.10 true # from db1 and db2
sudo -u pgbackrest ssh postgres@db1 true # from proxy1; repeat for db2
Configure pgBackRest
On proxy1 (/etc/pgbackrest/pgbackrest.conf):
sudo nano /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=4
repo1-retention-diff=8
compress-type=zst
start-fast=y
[pg-ha]
pg1-host=db1
pg1-host-user=postgres
pg1-path=/var/lib/postgresql/patroni
pg2-host=db2
pg2-host-user=postgres
pg2-path=/var/lib/postgresql/patroni
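With repo1-retention-full=4, pgbackrest expire keeps the four newest full backups and removes older ones (dependent differential and incremental backups are removed with their full). A sketch of that bookkeeping, using made-up backup labels:

```shell
# Which full backups survive repo1-retention-full=4 (labels are hypothetical).
RETENTION=4
BACKUPS="20260101-000000F 20260108-000000F 20260115-000000F 20260122-000000F 20260129-000000F 20260205-000000F"

set -- $BACKUPS
EXPIRE=$(( $# - RETENTION ))    # everything older than the newest four

i=0
for b in $BACKUPS; do
  i=$(( i + 1 ))
  if [ "$i" -le "$EXPIRE" ]; then
    echo "expire: $b"
  else
    echo "keep:   $b"
  fi
done
```

Size the repository disk for the retained backups plus archived WAL between them, not just one backup; retention is only enforced when expire runs after a backup.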
On db1 and db2, create a matching config pointing to the repo on proxy1 (same stanza name):
sudo nano /etc/pgbackrest/pgbackrest.conf
[global]
repo1-host=10.0.0.10
repo1-host-user=pgbackrest
repo1-path=/var/lib/pgbackrest
compress-type=zst
[pg-ha]
pg1-path=/var/lib/postgresql/patroni
Now hook WAL archiving into pgBackRest. One catch: the archive settings live under bootstrap.dcs in /etc/patroni.yml, and Patroni reads that section only when it first bootstraps the cluster, so editing the file now changes nothing. Update the running cluster configuration through the DCS instead:
sudo -u postgres patronictl -c /etc/patroni.yml edit-config
In the editor, set under postgresql.parameters:
archive_mode: "on"
archive_command: 'pgbackrest --stanza=pg-ha archive-push %p'
Patroni pushes the change to both nodes. Since archive_mode was already "on" at bootstrap, only archive_command changes, and that takes effect with the reload Patroni performs automatically; no restart needed.
Create the stanza from the repository host. With a dedicated repo host, pgBackRest expects stanza-create, backup, and expire to run on proxy1; the DB nodes only archive-push:
sudo -u pgbackrest pgbackrest --stanza=pg-ha --log-level-console=info stanza-create
Run a full backup:
sudo -u pgbackrest pgbackrest --stanza=pg-ha --type=full --log-level-console=info backup
Verify:
sudo -u pgbackrest pgbackrest --stanza=pg-ha info
Restore drill (the part most teams skip)
Run this on a separate test VPS, not on your live cluster. The goal is simple: restore a backup and confirm the database starts cleanly and answers queries.
High-level restore steps on a fresh test node:
- Install Postgres (same major version) and pgBackRest
- Configure pgBackRest to point at the repo on proxy1
- Stop Postgres, wipe the data dir, restore, start Postgres
# On the restore target node:
sudo systemctl stop postgresql || true
sudo rm -rf /var/lib/postgresql/patroni/*
# Restore latest backup:
sudo -u postgres pgbackrest --stanza=pg-ha --log-level-console=info restore --pg1-path=/var/lib/postgresql/patroni
# Start postgres manually for verification (outside Patroni for the drill):
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/patroni -l /tmp/pg.log start
sudo -u postgres psql -d postgres -c "select now();"
If you need PITR, pass a recovery target to the restore command; pgBackRest writes recovery.signal and the restore_command for you. Don't leave PITR as a future task: run at least one PITR drill per quarter.
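A PITR restore differs from the latest-backup restore above mainly in the target flags. A sketch, assuming the same stanza and a hypothetical target time, run on the restore node with an empty data directory:

```shell
# Point-in-time restore to just before a bad migration (timestamp is illustrative).
sudo -u postgres pgbackrest --stanza=pg-ha \
  --type=time "--target=2026-02-05 09:30:00+00" \
  --target-action=promote \
  --log-level-console=info restore

# pgBackRest writes recovery.signal and restore_command into the data directory;
# start Postgres and it replays WAL up to the target, then promotes.
```

--type, --target, and --target-action are standard pgBackRest restore options, but verify the timestamp format and timezone handling against your server settings before you depend on this in an incident.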
Operational checklist (common pitfalls that bite later)
- Two nodes is a risk tradeoff. Add a third Postgres node if your RPO/RTO is strict, or if maintenance windows are painful.
- Keep versions aligned. Postgres minor version differences are usually fine, but Patroni upgrades should be tested on staging first.
- Make HAProxy timeouts explicit. Long idle transaction sessions can collide with low HAProxy timeouts. Tune to your app’s behavior.
- Watch disk for WAL growth. A replica that falls behind can explode WAL usage on the leader if slots are enabled.
- Use SCRAM everywhere. The sample uses scram-sha-256; keep it that way unless you have a hard constraint.
If your application stack also sits behind Nginx, the troubleshooting mindset is similar: health checks, timeouts, and upstream behavior. This guide is useful when you’re trying to separate “app slow” from “server slow”: Fix high TTFB and improve server response time.
If you want predictable performance for Postgres HA, start with a VPS that has consistent CPU and fast NVMe. A HostMyCode VPS is a clean fit for Patroni-based clusters, and managed VPS hosting is a good option if you don’t want to own patch cycles and baseline hardening.
FAQ: Patroni + etcd + HAProxy for PostgreSQL HA
Do I need three database nodes for “real” HA?
For strict high availability, three database nodes is the safer design. With two DB nodes, you can fail over, but you have less tolerance for maintenance and less margin for network partitions. The third node in this tutorial is for etcd tie-breaking, not extra database capacity.
Can I run etcd on the database nodes instead of proxy1?
You can, but a two-node etcd cluster is a bad idea. It can’t form a healthy quorum reliably during failures. Putting etcd on a third node avoids that trap.
How do clients connect: port 5432 or HAProxy ports?
Point applications at HAProxy. Use the write port (5000 here) for normal app traffic. Use the read port (5001) only if your app can tolerate replica lag and read-after-write inconsistencies.
What’s the simplest way to test failover without corrupting data?
Stop Patroni on the leader with systemctl stop patroni and let Patroni promote the replica. Avoid pulling power until you’ve validated clean failover and rewind behavior in your environment.
What should I back up: the replica or the leader?
Back up from the leader unless you have a clear reason to offload to a replica and you’ve tested it. Either way, the backup must include WAL archiving, and you must run restore drills.
Summary: a practical HA baseline you can validate
You now have Patroni coordinating Postgres 16/17 leadership via etcd, HAProxy providing stable write/read endpoints, streaming replication in place, and a repeatable failover test. The restore drill is what turns “we have HA” into “we can recover.”
If you’re putting a Postgres 16/17 high-availability cluster on HostMyCode VPS into production, size for WAL and I/O—not just dataset size—and put failover and restore drills on the calendar. The goal is a cluster that stays boring even on bad days.