
VPS Disaster Recovery Planning in 2026: Practical Runbook for Backups, Restore Tests, and Fast Rollback

VPS disaster recovery planning in 2026: build a tested runbook for backups, restore drills, DNS failover, and rollback.

By Anurag Singh
Updated on Apr 13, 2026
Category: Blog

Your backup isn’t your safety net. Your restore is.

VPS disaster recovery planning gets real the first time you’re rebuilding a server at 3 a.m. after a bad deploy, a disk failure, or an accidental rm -rf. Keep this runbook in your repo and treat it like production code: it spells out what to back up, how to prove restores work on a schedule, and how to roll back without inventing steps under pressure.

What “disaster recovery” looks like for a typical VPS in 2026

Most VPS outages aren’t dramatic. They’re the same few problems on repeat:

  • Configuration drift after “just this one quick fix” on production.
  • Bad releases that break boot, networking, TLS, or migrations.
  • Storage problems (filesystem corruption, degraded NVMe, or plain old “disk full”).
  • Credential mistakes (rotated secrets not updated everywhere, locked-out SSH access).
  • Provider-side issues that force you to move fast to a new VPS.

A usable DR plan for a VPS comes down to two targets you can defend:

  • RPO (Recovery Point Objective): how much data you can lose (e.g., 15 minutes for a DB, 24 hours for a static site).
  • RTO (Recovery Time Objective): how quickly you can be back online (e.g., 30 minutes for an internal API, 2 hours for a SaaS).

If you don’t write down RPO/RTO, you’ll either overpay for backup frequency or learn too late that daily backups can’t meet the expectations you’ve set.
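Once an RPO is written down, it can be enforced with a tiny freshness check against the newest backup file. A minimal sketch, assuming GNU `stat` (Linux); the path and threshold are placeholders for your own runbook values, and the `touch` is only here to make the demo self-contained:

```shell
# Sketch: fail if the newest backup is older than the RPO target.
RPO_SECONDS=900                          # the 15-minute DB example above
mkdir -p /tmp/rpo-demo
touch /tmp/rpo-demo/appdb.dump           # stand-in for your newest real dump
LATEST=/tmp/rpo-demo/appdb.dump

AGE=$(( $(date +%s) - $(stat -c %Y "$LATEST") ))
if [ "$AGE" -gt "$RPO_SECONDS" ]; then
  echo "RPO breach: newest backup is ${AGE}s old" >&2
  exit 2
fi
echo "RPO ok: newest backup is ${AGE}s old"
```

Wire a check like this into monitoring and an RPO breach becomes an alert instead of a surprise during an incident.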

Prerequisites: what you need before you write the runbook

  • A VPS running Linux (examples below use Ubuntu 24.04 and Debian 13 conventions).
  • Root or sudo access.
  • One external backup target: object storage, a second VPS, or an offsite machine reachable via SSH.
  • A DNS provider that supports low TTL changes (or an API for failover).
  • A place to store your runbook and recovery keys (private repo + password manager).

If your production workload is outgrowing shared hosting, a VPS is usually where DR discipline starts paying off. For repeatable recovery workflows, a HostMyCode VPS gives you the control you need for snapshots, scripting, and clean rebuilds.

Step 1: Inventory your “must restore” assets (don’t guess)

Start your runbook with a short table. Keep it blunt and specific. For each component, record where it lives, how you back it up, and the exact restore method.

  • Application code: Git remote; restore = redeploy from tag.
  • Database: Postgres/MySQL data; restore = import + verify schema/app health.
  • Uploads: /var/www/app/uploads/; restore = rsync/object sync.
  • Secrets: environment files, tokens, TLS private keys; restore = vault/password manager + file placement.
  • System config: /etc/nginx/, /etc/systemd/system/, firewall rules.
  • Observability: alerts/contact routes; restore = reapply config so you know when you’re broken.

Common miss: you back up /etc and still lose custom systemd units in /etc/systemd/system, or the app secrets sitting in /opt/app/.env.

Step 2: Choose a backup approach you can actually restore from

On a VPS, you usually want two layers:

  • System images (snapshots): fast for full-server rollback, but not always granular, and not always portable across providers.
  • File/database backups: portable and auditable, but they require restore steps and verification.

The safest pattern is “snapshot + app-aware backups.” Snapshots buy you speed. App-aware backups keep you safe from “the snapshot captured the same corruption” and let you restore onto a fresh VPS.

Also connect DR to your patch strategy. Reboots and patch windows are where weak recovery habits show up fast. HostMyCode’s kernel patching guide pairs well with this runbook: Linux kernel security patches and automated patching.

Step 3: Write your backup checklist (with real paths and commands)

Paste the checklist below into RUNBOOK.md and adjust it for your server. The goal is the same every time: predictable output you can restore.

3.1 Files to include (example)

  • /etc/nginx/
  • /etc/ssh/sshd_config (and /etc/ssh/sshd_config.d/)
  • /etc/systemd/system/myapp.service
  • /opt/myapp/.env (if you store secrets here—consider avoiding it)
  • /var/www/myapp/uploads/
  • /var/lib/postgresql/ (only if you do file-level DB backups; prefer dumps/basebackups)

3.2 Database backup commands (Postgres example)

For a small-to-medium Postgres instance, pg_dump is still a solid choice. For larger databases, use base backups or WAL archiving.

# Create a timestamped dump (custom format supports parallel restore)
export PGHOST=127.0.0.1
export PGPORT=5432
export PGUSER=app_backup
export PGPASSWORD='REDACTED'

TS=$(date -u +%Y%m%dT%H%M%SZ)
pg_dump -Fc -d appdb -f /var/backups/postgres/appdb_${TS}.dump

# pg_dump prints nothing on success; confirm a file landed and check its size
ls -lh /var/backups/postgres/ | tail -n 3

Look for a non-trivial file size (not a few KB). A tiny dump usually means you hit the wrong database, dumped a template DB, or ran into permissions.

3.3 File backups with tar + zstd (portable and fast)

sudo install -d -m 0750 /var/backups/files
TS=$(date -u +%Y%m%dT%H%M%SZ)

sudo tar --xattrs --acls -I 'zstd -6 -T0' \
  -cpf /var/backups/files/vps_files_${TS}.tar.zst \
  /etc/nginx /etc/systemd/system /var/www/myapp/uploads

sudo zstd -t /var/backups/files/vps_files_${TS}.tar.zst
# Expected output: ... OK

The --xattrs --acls flags aren’t decoration. If you rely on ACLs for shared directories or hardened service files, you want them in the archive.

3.4 Ship backups off the VPS (rsync over SSH)

Keep at least one copy off-host. A backup that lives on the same disk is handy, but it’s not recovery.

# On the backup target, create a restricted user (once)
sudo useradd -m -s /bin/bash backuprecv
sudo install -d -m 0750 -o backuprecv -g backuprecv /srv/backup/vps-01

# From the VPS, push backups
rsync -a --delete \
  -e 'ssh -p 22' \
  /var/backups/ \
  backuprecv@203.0.113.50:/srv/backup/vps-01/

If you want to harden the receiving side (forced commands, restricted keys, separate volume), add it to your backlog. Don’t let “perfect” delay your first working DR loop.
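When you do get to that hardening item, the lowest-effort version is the `rrsync` helper script that ships with rsync (its install path varies by distro and rsync version; on some Debian/Ubuntu releases it lives under rsync’s doc scripts rather than /usr/bin). A restricted `authorized_keys` entry on the backup target might look like this; the key material and paths are placeholders:

```
# ~backuprecv/.ssh/authorized_keys on the backup target
# "restrict" disables forwarding and PTY allocation; rrsync confines
# the key to rsync transfers under one directory.
command="/usr/bin/rrsync /srv/backup/vps-01",restrict ssh-ed25519 AAAA...your-key... vps-01-backup
```

Note that with rrsync in place, the destination path in your push command becomes relative to the restricted directory, so adjust the rsync target accordingly.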

Step 4: Schedule backups and add one sanity check

Silent failures are the norm. Fix that with one cheap sanity check: alert when today’s files are missing or suspiciously small.

Create /usr/local/sbin/backup-nightly.sh:

sudo tee /usr/local/sbin/backup-nightly.sh > /dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

TS=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP_DIR=/var/backups

mkdir -p ${BACKUP_DIR}/postgres ${BACKUP_DIR}/files

# Postgres dump
export PGHOST=127.0.0.1
export PGPORT=5432
export PGUSER=app_backup
export PGPASSWORD='REDACTED'
pg_dump -Fc -d appdb -f ${BACKUP_DIR}/postgres/appdb_${TS}.dump

# File archive
tar --xattrs --acls -I 'zstd -6 -T0' -cpf \
  ${BACKUP_DIR}/files/vps_files_${TS}.tar.zst \
  /etc/nginx /etc/systemd/system /var/www/myapp/uploads

# Sanity checks
zstd -t ${BACKUP_DIR}/files/vps_files_${TS}.tar.zst >/dev/null
DUMPSIZE=$(stat -c%s ${BACKUP_DIR}/postgres/appdb_${TS}.dump)
if [ "${DUMPSIZE}" -lt 5000000 ]; then
  echo "ERROR: dump too small (${DUMPSIZE} bytes)" 1>&2
  exit 2
fi

# Retention (keep 14 days locally)
find ${BACKUP_DIR}/postgres -type f -name '*.dump' -mtime +14 -delete
find ${BACKUP_DIR}/files -type f -name '*.tar.zst' -mtime +14 -delete
EOF

sudo chmod 0750 /usr/local/sbin/backup-nightly.sh

Schedule it with systemd timers (more reliable than cron for logging):

sudo tee /etc/systemd/system/backup-nightly.service > /dev/null <<'EOF'
[Unit]
Description=Nightly backup (files + database)

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/backup-nightly.sh
EOF

sudo tee /etc/systemd/system/backup-nightly.timer > /dev/null <<'EOF'
[Unit]
Description=Run nightly backup at 02:20 UTC

[Timer]
OnCalendar=*-*-* 02:20:00 UTC
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now backup-nightly.timer
systemctl list-timers --all | grep backup-nightly

Expected output includes the unit name and a next run time. If it doesn’t, the timer isn’t enabled.

Step 5: The restore drill (the part most teams skip)

Backups are comforting. Restore drills are what keep you out of trouble.

Run a restore test at least monthly, and after any of these changes: major OS upgrade, DB major version upgrade, new secrets layout, new storage layout, or new reverse proxy config.

Use a scratch VPS that mirrors production (same distro family if possible). If production runs Debian 13, test on Debian 13. If you can’t match it exactly, write down what’s different and why.

If you need a disposable environment for restore testing, a second low-cost HostMyCode VPS works well because you can rebuild it repeatedly without touching production.

5.1 Restore files

# Pull the backup from the backup target
rsync -a -e 'ssh -p 22' \
  backuprecv@203.0.113.50:/srv/backup/vps-01/files/ \
  /tmp/restore/files/

# Extract
sudo tar --xattrs --acls -I zstd -xpf /tmp/restore/files/vps_files_20260115T022001Z.tar.zst -C /

# Validate key configs exist
sudo nginx -t
# Expected output: syntax is ok / test is successful

5.2 Restore Postgres dump

# Create DB + restore role (example)
sudo -u postgres psql <<'SQL'
CREATE ROLE app_user LOGIN PASSWORD 'REDACTED';
CREATE DATABASE appdb OWNER app_user;
SQL

# Restore the dump (run as the postgres OS user so peer auth works)
sudo -u postgres pg_restore --no-owner -d appdb /tmp/restore/postgres/appdb_20260115T022001Z.dump

# Quick verification
sudo -u postgres psql -d appdb -c "\dt" | head

Expected output should list tables. If you see “Did not find any relations.”, you likely restored to the wrong database or hit a permissions issue.

5.3 Verify application health (HTTP + logs)

Use a known endpoint. Example: your API health check on port 8081 behind Nginx.

curl -fsS http://127.0.0.1/healthz
# Expected output: ok

sudo journalctl -u myapp --no-pager -n 50
sudo journalctl -u nginx --no-pager -n 50

If recovery drags because the server “works” but the app times out, keep this performance diagnostic handy: fix high TTFB.

Step 6: Design rollback paths for the failures you actually have

Rollback isn’t a single button. You need a few levers, and you should know which one you’re pulling.

  • App rollback: redeploy previous Git tag or container image.
  • Config rollback: revert Nginx/systemd config from version control or config backup.
  • DB rollback: restore to point-in-time (hard) or restore last known good dump (simpler, more data loss).
  • Server rollback: snapshot revert (fastest, least portable).

One practice that pays off quickly: keep your Nginx config and systemd units in Git, and treat what’s on the server as deployable artifacts. If you run multiple apps behind one proxy, stick to a predictable routing pattern; this guide helps you avoid rewrite loops and path bugs during recovery: route multiple applications using Nginx URL paths.
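The first lever is small enough to sketch end to end. This self-contained demo builds a throwaway repo in /tmp and rolls it back to a tag; in the real runbook you would run the checkout in your app directory and restart the service afterwards (e.g. sudo systemctl restart myapp). The tag names are examples:

```shell
# Self-contained demo of tag-based app rollback (throwaway repo in /tmp).
set -e
REPO=/tmp/rollback-demo
rm -rf "$REPO" && mkdir -p "$REPO" && cd "$REPO"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "v1" > app.txt
git add app.txt && git commit -qm "good release" && git tag v1.0.0
echo "v2-broken" > app.txt
git commit -qam "bad release" && git tag v1.1.0

# Roll back: check out the last known good tag
git checkout -q v1.0.0
cat app.txt
# → v1
```

The point of rehearsing this is that “which tag was last known good?” should be answered by your deploy log, not by memory at 3 a.m.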

Step 7: DNS and failover: the quick win most VPS owners miss

Even without active-active, you can shave real downtime by preparing DNS and TLS ahead of time.

  • Lower TTL for key records (e.g., 60–300 seconds) so IP changes propagate faster.
  • Document cert strategy: if you use Let’s Encrypt, record how you’ll re-issue on a new box (and where ACME challenge paths live).
  • Keep a “cold spare” plan: a minimal VPS template that can be promoted quickly.
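It’s worth checking what TTL resolvers actually see before you need the fast change; a quick sketch with dig, with the domain as a placeholder:

```shell
# The second field of each answer line is the TTL, in seconds, that
# caches will honor. Verify it matches your DR target (e.g. 60-300s).
dig +noall +answer example.com A
```

If the observed TTL is still hours long, lower it now; a TTL change only helps after the old value has expired from caches.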

DNS is also where recoveries stall because someone can’t access the zone, forgot where renewal emails go, or doesn’t have the right account. Keep ownership clean. If you’re consolidating, get domains and DNS under one roof: HostMyCode Domains.

Common pitfalls (and the small fixes that prevent them)

  • Backups succeed but restores fail because you didn’t capture extensions, roles, or permissions. Fix: include pg_dumpall --globals-only (or recreate roles in the runbook).
  • “Works on restore VPS” but not on prod because you forgot kernel parameters, firewall rules, or system packages. Fix: document sysctl changes and package lists (dpkg --get-selections for Debian/Ubuntu).
  • Secrets drift because you rotate tokens in one place only. Fix: keep a single source of truth (vault/password manager) and scripted deployment.
  • Uploads missing because they live outside your assumed directory. Fix: grep configs for upload paths and check your app settings.
  • Disk fills during backup because you compress too slowly or keep too many copies locally. Fix: retention + use zstd + ship off-host early.
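The first two pitfalls have one-line fixes worth adding to the nightly script. A sketch, assuming local peer auth for the postgres OS user; the output paths are examples:

```shell
# Roles, memberships, and grants that per-database dumps omit
sudo -u postgres pg_dumpall --globals-only > /var/backups/postgres/globals.sql

# Installed package list, for rebuilding a matching server (Debian/Ubuntu)
dpkg --get-selections > /var/backups/files/packages.txt

# On restore, apply globals BEFORE pg_restore, e.g.:
#   sudo -u postgres psql -f globals.sql
```

Both files are tiny, so there is no retention cost to capturing them every night alongside the dumps.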

If high memory or OOM events show up during compression or restores, investigate before the next incident. This guide is a solid reference: troubleshooting high memory usage.

Rollback plan: how to back out safely after a failed restore or deploy

Your runbook needs a rollback section that assumes the last action made things worse. Keep it short, and keep it executable.

  1. Stop the app to prevent further writes:
    sudo systemctl stop myapp
  2. Revert reverse proxy config to the last known good version (from Git or your file backup), then validate:
    sudo nginx -t && sudo systemctl reload nginx
  3. Restore DB from the previous backup (accepting the RPO loss), or switch traffic to the known-good node if you have one.
  4. If you used snapshots, revert the VPS snapshot only after you’ve confirmed you can’t recover with app-level rollback. Snapshots roll back everything, including logs you may need for forensics.
  5. Post-rollback verification: confirm /healthz, confirm write path (create a test record), confirm background jobs and queues are running.
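The verification in step 5 is worth keeping as a short script so nobody improvises it under pressure. A sketch; the canary table, worker unit name, and endpoint are assumptions to replace with your own:

```shell
#!/usr/bin/env bash
# Post-rollback verification sketch: read path, write path, background jobs.
set -euo pipefail

# 1) Read path: the health endpoint answers
curl -fsS http://127.0.0.1/healthz

# 2) Write path: round-trip a canary row (dr_canary is a hypothetical table)
sudo -u postgres psql -d appdb -c \
  "INSERT INTO dr_canary (note) VALUES ('rollback-check');
   DELETE FROM dr_canary WHERE note = 'rollback-check';"

# 3) Background jobs: the worker unit is active (name is an example)
systemctl is-active myapp-worker

echo "post-rollback checks passed"
```

A rollback isn’t done when the page loads; it’s done when reads, writes, and queues are all confirmed.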

Next steps: make the plan boring (and that’s a compliment)

  • Add a restore calendar event: monthly restore drills to a scratch VPS.
  • Automate verification: a simple script that restores the latest DB dump into a disposable container and runs a few queries.
  • Improve RPO for databases: consider WAL archiving or replica-based recovery if daily dumps aren’t enough.
  • Add monitoring: alert on backup failures, low disk space, and SSL expiry.
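The “automate verification” bullet above can start as small as this. A sketch assuming Docker is available and dumps live in /var/backups/postgres; the image tag and readiness loop are rough placeholders:

```shell
#!/usr/bin/env bash
# Restore the newest dump into a throwaway Postgres container, run one query.
set -euo pipefail

LATEST=$(ls -t /var/backups/postgres/*.dump | head -n 1)

docker run -d --name restore-check -e POSTGRES_PASSWORD=check postgres:16
until docker exec restore-check pg_isready -U postgres >/dev/null 2>&1; do
  sleep 1
done

docker cp "${LATEST}" restore-check:/tmp/appdb.dump
docker exec restore-check createdb -U postgres appdb
docker exec restore-check pg_restore -U postgres -d appdb --no-owner /tmp/appdb.dump
docker exec restore-check psql -U postgres -d appdb \
  -c 'SELECT count(*) FROM information_schema.tables;'

docker rm -f restore-check
```

Run it from the nightly timer after backups complete and a dump that can’t be restored fails loudly the same night it was taken.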

If you want recovery to feel predictable, start with infrastructure you can rebuild quickly and repeatably. A managed VPS hosting setup can offload a lot of OS-level maintenance, while a standard HostMyCode VPS gives you full control for snapshots, scripted restores, and repeatable drills.

FAQ

How often should I test restores for a VPS?

At least monthly, and after any major changes to your OS, database version, secrets handling, or reverse proxy configuration. The point is to find restore surprises on your schedule, not during an outage.

Is a VPS snapshot enough for disaster recovery?

It helps with fast rollback, but it’s not enough by itself. Snapshots can capture the same corruption or misconfiguration, and they may not be portable. Pair snapshots with app-aware backups (DB dumps/base backups + file archives) stored off-host.

What’s a reasonable TTL for DNS in a DR plan?

For services where you may need to change IPs quickly, 60–300 seconds is a practical range in 2026. Don’t set it low everywhere; lower TTL on the records that actually matter for failover.

What should I back up first if I’m behind and need a minimal plan?

Database + uploads + reverse proxy config + secrets. Code should already be in Git; if it isn’t, fix that before you call it “production.”

How do I avoid losing logs during a snapshot rollback?

Ship logs off-host (syslog/agent) or store important logs on a separate volume. If you rely on on-box logs for incident review, snapshot rollback can erase your evidence.

Summary

VPS disaster recovery planning isn’t about collecting tools. It’s about writing down exact steps, proving restores work, and keeping rollback options simple. If you can restore onto a clean VPS on a timer, most “disasters” become routine maintenance.

For teams that want consistent rebuilds and predictable performance, start with a reliable platform like a HostMyCode VPS, then refine your runbook until restores feel boring.