Disaster Recovery Tutorial (2026): Build a Restore-First Runbook for VPS & Dedicated Servers

A backup you’ve never restored isn’t a backup. It’s a comforting file sitting on a disk somewhere. This disaster recovery tutorial walks you through building a restore-first runbook for VPS and dedicated servers, then proving it works with a timed restore drill and a clean DNS cutover.

This isn’t about buying more tools. It’s about shrinking recovery time and removing guesswork when you’re tired and stressed and clients keep asking for updates.

Disaster recovery tutorial: define your recovery targets (RTO/RPO) for real hosting incidents

Before you write a single command, decide what “recovered” means for each service you run. In hosting, two targets do most of the work:

RTO (Recovery Time Objective): how long you can be down. Example: 60 minutes.
RPO (Recovery Point Objective): how much data you can lose. Example: 15 minutes (orders, contact forms, email).

Set these per workload, not per server. A brochure site can tolerate a larger RPO than WooCommerce or business email.
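To make the RPO something you check rather than assume, compare the age of each workload’s newest backup with its target. A minimal sketch, assuming each workload writes backups to its own directory and GNU find is available; the path and minutes in the usage line are hypothetical examples:

```shell
# RPO check: is the newest backup of a workload younger than its target?
# Assumptions: one backup directory per workload; GNU find (for -printf).

# newest_backup_age_min DIR -> prints the age in whole minutes of the newest file in DIR
newest_backup_age_min() {
  local dir="$1" newest now
  # %T@ is the modification time in seconds since the epoch (with a fraction)
  newest=$(find "$dir" -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1          # no backups at all counts as a failure
  now=$(date +%s)
  echo $(( (now - ${newest%.*}) / 60 ))
}

# check_rpo WORKLOAD DIR RPO_MINUTES -> prints OK/BREACH, returns 1 on a breach
check_rpo() {
  local name="$1" dir="$2" rpo="$3" age
  if ! age=$(newest_backup_age_min "$dir"); then
    echo "BREACH $name: no backups found in $dir"
    return 1
  fi
  if [ "$age" -le "$rpo" ]; then
    echo "OK $name: newest backup is ${age}m old (RPO ${rpo}m)"
  else
    echo "BREACH $name: newest backup is ${age}m old (RPO ${rpo}m)"
    return 1
  fi
}
```

For example, check_rpo woocommerce /backups/shop/db 15 prints OK or BREACH and returns non-zero on a breach, so it can feed an existing monitoring check.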

Scope your DR plan: what you will restore (and what you won’t)

Restores often stall because the plan tries to bring back “everything” at once. Don’t do that. Break the runbook into tiers, and restore in order:

Tier 0 (identity + access): SSH access, sudo, firewall rules, provider console access.
Tier 1 (web): Nginx/Apache, PHP-FPM, vhosts, TLS, app files, cron.
Tier 2 (data): database dumps/replicas, uploads, object storage mounts.
Tier 3 (email): mailboxes, queues, DKIM keys, reputation-sensitive settings.

If you host multiple sites, treat each domain as its own workload with a clear “done” definition.

Prerequisites: a staging “recovery box”, safe DNS, and credentials you can reach during an outage

You need somewhere to restore to. For many teams, the fastest path is spinning up a fresh VPS and treating it as the recovery target.

Keep a provider account with API access and MFA enabled.
Store SSH keys and a break-glass password in a password manager with offline access.
Maintain a small “recovery VPS” budget line item if your RTO is strict.

If you want a predictable recovery environment, a managed VPS hosting plan helps because patching, baseline hardening, and monitoring stay consistent across rebuilds. If you prefer full control, start with a HostMyCode VPS and keep the runbook explicit and short.

Build a “restore-first” inventory (your runbook’s backbone)

Create one file you can print or open offline. Name it DR-INVENTORY.md. Keep it in a private repo and in your password manager’s secure notes.

Include:

Server facts: OS (Ubuntu 24.04/26.04 LTS, Debian 12/13, AlmaLinux 9/10), CPU/RAM/disk, public IPs.
Network: firewall policy (UFW/nftables), allowed admin IPs, SSH port, fail2ban status.
DNS: registrar, DNS provider, TTL defaults, current A/AAAA/CNAME/MX records.
TLS: Let’s Encrypt vs purchased cert, renewal method, where certs live.
Web stack: Nginx/Apache, PHP version, PHP-FPM pools, document roots.
Data locations: database name/user, uploads path, any external storage.
Monitoring: what alerts you rely on and how to silence them during drills.
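You can pre-fill the server-facts part of DR-INVENTORY.md from the machine itself, then add DNS, TLS, and monitoring details by hand. A minimal sketch, assuming a Linux host; each probe is skipped when its command or file is missing, so treat the output as a starting point rather than a complete inventory:

```shell
# Append a "Server facts" snapshot to an inventory file (default: DR-INVENTORY.md).
# Assumption: Linux host; probes whose command or file is missing are skipped.
write_server_facts() {
  local out="${1:-DR-INVENTORY.md}"
  {
    echo "## Server facts ($(uname -n), captured $(date -u +%Y-%m-%dT%H:%MZ))"
    if [ -r /etc/os-release ]; then
      # PRETTY_NAME looks like "Ubuntu 24.04.1 LTS"; sourced in a subshell
      echo "- OS: $(. /etc/os-release && echo "$PRETTY_NAME")"
    fi
    echo "- Kernel: $(uname -r)"
    if command -v nproc >/dev/null; then
      echo "- CPUs: $(nproc)"
    fi
    if [ -r /proc/meminfo ]; then
      # MemTotal is reported in KiB
      echo "- RAM: $(awk '/^MemTotal/ {printf "%.1f GiB", $2 / 1048576}' /proc/meminfo)"
    fi
    # -P keeps one line per filesystem; columns 2 and 4 are size and free space
    echo "- Root disk: $(df -hP / | awk 'NR == 2 {print $2 " total, " $4 " free"}')"
    if command -v ip >/dev/null; then
      echo "- Global IPs: $(ip -o addr show scope global | awk '{print $4}' | tr '\n' ' ')"
    fi
  } >> "$out"
}
```

Run write_server_facts on production now and after major changes, then commit the updated file; the DNS, TLS, and monitoring sections still need a human.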

Tip: if you need a repeatable hardening baseline for rebuilds, link your internal steps and keep a public reference handy, such as Server hardening tutorial for a new Ubuntu VPS.

Create the runbook structure (so you can execute under stress)

Use an order that matches how outages actually go. You want a sequence you can follow even when you’re running on caffeine and bad news:

Triage: decide if this is restore-worthy or fixable in place.
Containment: stop the bleeding (block traffic, disable compromised keys, freeze deployments).
Provision: bring up a clean recovery server.
Restore: restore configs + app + data.
Validate: functional tests and log checks.
Cutover: DNS or IP switch with rollback plan.
Post-incident: rotate secrets, patch root cause, document timings.

Triage checklist: decide whether to restore, rebuild, or repair

Keep the decision tree short and opinionated. Ambiguity wastes time.

Restore/rebuild if: root compromise suspected, ransomware, unknown cron jobs, or kernel-level tampering.
Repair in place if: one service is down (PHP-FPM, Redis), disk is full, certificate renewal failed.

If compromise is even plausible, treat the server like evidence. Don’t “clean it up” and continue serving production. Fail over to recovery, then do forensics later. For a workable flow you can mirror, see: VPS incident response tutorial.
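“Unknown cron jobs” is only a useful signal if you know which ones are expected. One way is to keep a known-good list of cron file paths with the inventory and diff it against the live server. A minimal sketch; it scans the common system locations plus user crontab spools (Debian/Ubuntu use /var/spool/cron/crontabs, RHEL-family systems /var/spool/cron), and the baseline file name in the usage note is a hypothetical example:

```shell
# Compare cron files on a server against a known-good baseline list.
# Assumption: the baseline is a file of paths produced earlier by list_cron_files.

# list_cron_files ROOT -> sorted cron file paths under ROOT (use / on a live server)
list_cron_files() {
  local root="${1%/}" p
  for p in "$root/etc/crontab" "$root"/etc/cron.d/* \
           "$root"/etc/cron.hourly/* "$root"/etc/cron.daily/* \
           "$root"/etc/cron.weekly/* "$root"/etc/cron.monthly/* \
           "$root"/var/spool/cron/crontabs/* "$root"/var/spool/cron/*; do
    # Unmatched globs stay literal and directories are skipped by -f
    if [ -f "$p" ]; then
      echo "${p#"$root"}"
    fi
  done | sort -u
}

# unknown_cron_files BASELINE [ROOT] -> paths present now but missing from BASELINE
unknown_cron_files() {
  local baseline="$1" root="${2:-/}" tmp
  tmp=$(mktemp)
  sort -u "$baseline" > "$tmp"
  # comm -13 keeps lines that appear only in the second input (the live listing)
  list_cron_files "$root" | comm -13 "$tmp" -
  rm -f "$tmp"
}
```

Capture the baseline as root while the server is known-good (list_cron_files / > cron-baseline.txt). During triage, any path printed by unknown_cron_files cron-baseline.txt / is a reason to lean toward restore. It compares file names only, so an edited existing cron file won’t show up; pair it with checksums if that matters to you.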

Provision a clean recovery server (Ubuntu example)

This drill assumes a WordPress or PHP site with Nginx and PHP-FPM on Ubuntu. If you run Apache or LiteSpeed, swap packages accordingly.

Deploy a new VPS in the same region as production (latency matters) with enough disk for the restore.
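“Enough disk for the restore” is easy to verify before you start copying. A minimal sketch that compares the size of the backup set with the free space on the target filesystem, assuming GNU du/df; the 20% default headroom is an arbitrary example:

```shell
# Check that TARGET's filesystem can hold BACKUP_DIR plus a safety margin.
# Assumptions: GNU coreutils (du -sk, df -Pk); 20% default headroom is an example value.
has_room_for_restore() {
  local backup_dir="$1" target="$2" headroom_pct="${3:-20}"
  local need_kb avail_kb
  need_kb=$(du -sk "$backup_dir" | awk '{print $1}')
  # df -P prints one line per filesystem; column 4 is available space in 1K blocks
  avail_kb=$(df -Pk "$target" | awk 'NR == 2 {print $4}')
  need_kb=$(( need_kb + need_kb * headroom_pct / 100 ))
  echo "need ${need_kb}K (incl. ${headroom_pct}% headroom), available ${avail_kb}K"
  [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, run has_room_for_restore /mnt/backup/example.com /var/www before the file copy. Compressed database dumps can grow considerably once imported, so raise the headroom for database-heavy workloads.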

Update and install baseline tooling:

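sudo apt update
sudo apt -y upgrade
# php-mysql is required for WordPress; add mariadb-server if the database runs on this host
sudo apt -y install nginx php-fpm php-mysql unzip curl jq ufw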

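Lock down the firewall (keep it minimal during a drill). If SSH listens on a non-standard port, allow that port before you run ufw enable, or you’ll lock yourself out: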

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status

If you run cPanel/WHM, provisioning changes (licensing, services, port policy). For Nginx fronting Apache/cPanel, keep this nearby: Reverse proxy setup guide for Nginx in front of Apache/cPanel.

Restore web files safely (avoid permissions and path mistakes)

Use a consistent document root and stick to it. Example:

sudo mkdir -p /var/www/example.com
sudo chown -R www-data:www-data /var/www/example.com

Restore your app files using your existing backup method. For file-based backups, rsync is a solid default:

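# Preview with -n (--dry-run) first: --delete removes target files missing from the backup.
# Trailing slashes copy the directory's contents, not the directory itself.
sudo rsync -aHAX --numeric-ids --delete \
  /mnt/backup/example.com/ \
  /var/www/example.com/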
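--delete is what makes the target an exact mirror, and also what makes a wrong source path dangerous: if the backup volume isn’t mounted, the source may be an empty directory and rsync will empty the document root. A minimal guard, assuming the same rsync flags as the manual command and a hypothetical marker file that every good backup of the site contains:

```shell
# Refuse to run a --delete restore from a source that looks wrong.
# Assumption: MARKER is a file every good backup contains (hypothetical choice,
# e.g. wp-config.php for WordPress).
safe_restore_files() {
  local src="${1%/}/" dst="${2%/}/" marker="${3:-}"
  # Empty arguments also collapse to "/", so this catches them too
  if [ "$src" = / ] || [ "$dst" = / ]; then
    echo "usage: safe_restore_files SRC DST [MARKER] (neither may be / or empty)" >&2
    return 2
  fi
  # An unmounted backup volume often shows up as an empty mount point
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src")" ]; then
    echo "refusing: $src is missing or empty (is the backup volume mounted?)" >&2
    return 1
  fi
  if [ -n "$marker" ] && [ ! -e "$src$marker" ]; then
    echo "refusing: no $marker in $src (wrong backup path?)" >&2
    return 1
  fi
  # Same flags as the manual command; run as root so owners, ACLs, and xattrs are kept
  rsync -aHAX --numeric-ids --delete "$src" "$dst"
}
```

Run it as root, for example: safe_restore_files /mnt/backup/example.com /var/www/example.com wp-config.php.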

Pitfall: restoring as root and forgetting ownership breaks uploads, cache directories, and auto-updates. Fix ownership explicitly after the restore:

sudo chown -R www-data:www-data /var/www/example.com
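To confirm the fix covered everything, including files a later root-run step may create, you can list anything under the document root that isn’t owned by the web server user. A minimal sketch; www-data is the Debian/Ubuntu default, while other distributions use users such as nginx or apache:

```shell
# Print paths under DIR not owned by USER; return 1 if any are found.
# Assumption: USER defaults to www-data (the Debian/Ubuntu web server user).
audit_ownership() {
  local dir="$1" user="${2:-www-data}" bad
  # find -user with an unknown name errors out, which would look like "nothing wrong"
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "unknown user: $user" >&2
    return 2
  fi
  bad=$(find "$dir" ! -user "$user" -print)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    echo "$(printf '%s\n' "$bad" | wc -l) path(s) under $dir not owned by $user" >&2
    return 1
  fi
  echo "all paths under $dir are owned by $user"
}
```

Run audit_ownership /var/www/example.com after the restore and again after validation; exit status 1 means something still needs a chown.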

Restore Nginx site config and verify syntax before you reload

If possible, keep Nginx config in version control. If you can’t, back it up as plain text and restore to:

/etc/nginx/nginx.conf
/etc/nginx/sites-available/example.com
/etc/nginx/snippets/ (if you use snippets)

Example minimal server block (PHP-FPM socket path varies by PHP version):

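server {
  listen 80;
  server_name example.com www.example.com;
  # Must match the path you restored into; use /var/www/example.com/public only
  # if your app (for example Laravel) serves from a public/ subdirectory
  root /var/www/example.com;
  index index.php index.html;

  location / {
    # WordPress-style front controller: fall back to index.php with the query string
    try_files $uri $uri/ /index.php?$args;
  }

  location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    # Socket name follows the installed PHP version (php8.3 on Ubuntu 24.04)
    fastcgi_pass unix:/run/php/php8.3-fpm.sock;
  }
}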
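Because the socket name changes with the PHP version, a 502 right after a restore often just means fastcgi_pass points at a socket that doesn’t exist on the new server. A minimal sketch that lists the PHP-FPM sockets actually present, assuming the Debian/Ubuntu phpX.Y-fpm.sock naming under /run/php (the directory is a parameter):

```shell
# List PHP-FPM sockets present in a runtime directory (Debian/Ubuntu: /run/php).
# Assumption: sockets follow the phpX.Y-fpm.sock naming used by Ubuntu's packages.
find_fpm_sockets() {
  local dir="${1:-/run/php}" found=0 s
  for s in "$dir"/php*-fpm.sock; do
    # -S is true only for sockets; this also skips the literal pattern when nothing matches
    if [ -S "$s" ]; then
      echo "unix:$s"
      found=1
    fi
  done
  [ "$found" -eq 1 ]
}
```

Compare its output with grep -R fastcgi_pass /etc/nginx/sites-enabled/ and fix whichever side is wrong before reloading.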

Enable and test:

sudo ln -sf /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/example.com
sudo nginx -t
sudo systemctl reload nginx

Restore TLS certificates (or re-issue quickly)

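In many recoveries, re-issuing TLS is faster than copying old cert files around. The catch: Certbot’s usual HTTP-01 validation only succeeds once DNS points to the new server. If you need HTTPS working before cutover (for example, to test through a hosts file override), restore the existing certificate files or use DNS-01 validation instead.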

If you use Let’s Encrypt with Certbot, follow your usual flow and then confirm redirects and renewal timers. This guide covers safe renewals and common redirect mistakes: SSL Setup Guide Tutorial (2026).

Fast check:

sudo certbot --version
sudo certbot certificates
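Whether you restored old certificate files or issued new ones, check that the certificate on disk won’t expire in the middle of the incident. A minimal sketch using openssl x509 -checkend; the 14-day default window is an arbitrary example:

```shell
# Check a PEM certificate's expiry before relying on it.
# Assumptions: openssl CLI installed; 14-day default window is an example value.
cert_ok_for_days() {
  local cert="$1" days="${2:-14}"
  # Print subject and notAfter date for the runbook log; 2 = unreadable cert
  openssl x509 -in "$cert" -noout -subject -enddate || return 2
  # -checkend N exits 0 if the cert is still valid N seconds from now
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null; then
    echo "OK: valid for at least $days more days"
  else
    echo "WARN: expires within $days days; re-issue before cutover"
    return 1
  fi
}
```

For example, cert_ok_for_days /etc/letsencrypt/live/example.com/fullchain.pem 14 checks Certbot’s default location for example.com.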

Restore the database and validate app connectivity

This tutorial stays focused on the restore path, not database tuning. Your objective is simple: get a known-good dataset in place and confirm the app can read and write.

Typical steps for a WordPress-style restore:

Restore the DB dump into the target database.
Confirm credentials in your app config (for WordPress: wp-config.php).
Run a quick health check page load and log scan.
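Before importing, confirm the dump is complete: a dump that was cut off mid-run can still import without errors and quietly miss tables. A minimal sketch, assuming a gzip-compressed mysqldump file made with its default comments enabled, in which case the final line reads “-- Dump completed on <date>”; the path and database name in the usage note are hypothetical:

```shell
# Verify a gzip-compressed mysqldump before importing it.
# Assumption: the dump was made with mysqldump's default comments (completion marker).
verify_mysql_dump() {
  local dump="$1"
  gzip -t "$dump" 2>/dev/null || { echo "corrupt or not gzip: $dump" >&2; return 1; }
  # A complete dump ends with "-- Dump completed ..."; check the last few lines
  if ! gzip -dc "$dump" | tail -n 3 | grep -q '^-- Dump completed'; then
    echo "no completion marker, dump may be truncated: $dump" >&2
    return 1
  fi
  echo "looks complete: $dump"
}

# import_mysql_dump DUMP DB -> verify first, then stream into the existing database
import_mysql_dump() {
  verify_mysql_dump "$1" || return 1
  gzip -dc "$1" | mysql "$2"
}
```

For example, import_mysql_dump /mnt/backup/example.com/db.sql.gz example_wp. The database must already exist, and mysql picks up credentials from the usual places (such as ~/.my.cnf, or socket authentication when run as root).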

Quick diagnostics: if the homepage throws 502/504, look at PHP-FPM first:

sudo systemctl status php8.3-fpm --no-pager
sudo tail -n 80 /var/log/nginx/error.log
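When the error log is noisy, counting the two most common PHP-FPM upstream failures tells you quickly whether FPM is unreachable or just slow. A minimal sketch; the patterns match Nginx’s standard “connect() to unix:… failed” and “upstream timed out” messages, and anything else still needs a manual read:

```shell
# Count common PHP-FPM upstream failures in an Nginx error log.
# Assumption: default Nginx error message wording; log path is a parameter.
summarize_upstream_errors() {
  local log="${1:-/var/log/nginx/error.log}" refused timeout
  # 502-style: socket missing or refused (FPM stopped, wrong version in fastcgi_pass)
  refused=$(grep -c 'connect() to unix:.* failed' "$log" || true)
  # 504-style: FPM reachable but too slow (long queries, exhausted workers)
  timeout=$(grep -c 'upstream timed out' "$log" || true)
  echo "socket connect failures: $refused"
  echo "upstream timeouts: $timeout"
}
```

A non-zero first count usually points at the fastcgi_pass socket path; a non-zero second count points at slow PHP or database work rather than a broken config.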

Don’t forget the “small stuff” that breaks production: cron, uploads, and outbound email

Restores often “work” and still fail in production because key details weren’t in the backup set. Make these explicit runbook steps:

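Cron jobs: restore /etc/cron.d/ and /etc/crontab, plus each user’s crontab (export with crontab -l -u USER before you need it, and load it back with crontab -u USER FILE).
Uploads and writable dirs: confirm permissions on wp-content/uploads, cache, and temp dirs.
SMTP/email: confirm the recovery server can reach port 25 (some providers block it on new servers), that its IP isn’t on a blocklist, and that SPF/DKIM/DMARC still align (SPF must cover the new IP if the server sends mail directly).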

If your mail depends on DNS auth records, keep this in the runbook: Email authentication setup tutorial. It’s the difference between “site is up” and “nobody receives password resets”.
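You can test the “port 25 isn’t blocked” part from the recovery server by opening a plain TCP connection to a mail server. A minimal sketch, assuming bash (for /dev/tcp) and coreutils timeout are installed; it checks connectivity only, not IP reputation or SPF/DKIM alignment:

```shell
# Test whether an outbound TCP connection to HOST:PORT succeeds within a timeout.
# Assumptions: bash (for /dev/tcp) and coreutils timeout are available.
tcp_port_open() {
  local host="$1" port="${2:-25}" secs="${3:-5}"
  # A child bash opens fd 3 on /dev/tcp/HOST/PORT; success means the handshake completed
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "blocked or closed: $host:$port"
    return 1
  fi
}
```

For example, tcp_port_open mx.example.net 25, with mx.example.net standing in for a real MX host (dig +short MX on a domain you send to will list some).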

Validation: a tight checklist you can run in 10 minutes

Validate before you touch DNS. Use a hosts file override so you can test the recovery server privately.

Point your laptop at the recovery IP (macOS/Linux):

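sudo sh -c 'echo "203.0.113.50 example.com www.example.com" >> /etc/hosts'

Delete that line from /etc/hosts when you’re done, or your laptop will keep bypassing DNS and hide problems after cutover.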

HTTP checks:

curl -I http://example.com
curl -I https://example.com
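If you’d rather not edit /etc/hosts at all, curl’s --resolve option pins a hostname to an IP for a single request and scripts easily across several URLs. A minimal sketch, assuming curl is installed; expected status codes depend on your site, URLs with explicit ports aren’t handled, and -k is deliberately not used so a mismatched certificate shows up as a failure:

```shell
# Fetch a URL from a specific IP (bypassing DNS) and compare the HTTP status code.
# Assumptions: curl installed; URL has no explicit port; expected code depends on your site.
check_via_ip() {
  local url="$1" ip="$2" expect="${3:-200}" host port code
  host=$(printf '%s' "$url" | sed -E 's#^[a-z]+://([^/:]+).*#\1#')
  case "$url" in https://*) port=443 ;; *) port=80 ;; esac
  # --resolve HOST:PORT:IP sends the request to IP while keeping Host and SNI as HOST
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
          --resolve "$host:$port:$ip" "$url") || code="000"
  echo "$code $url"
  [ "$code" = "$expect" ]
}
```

For example, check_via_ip https://example.com/ 203.0.113.50 200, plus check_via_ip http://example.com/ 203.0.113.50 301 if you redirect HTTP to HTTPS.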

App checks: log in, submit a test form, place a test order (if eCommerce), upload an image.

Error scan (look for fatals, permission errors, and missing extensions):

sudo tail -n 200 /var/log/nginx/error.log
sudo journalctl -u php8.3-fpm -n 200 --no-pager

Keep this checklist printable. In a real incident, anything that isn’t short and concrete gets skipped.

Cutover plan: low-downtime DNS switch with rollback

DNS is how you steer traffic. If you rush it, you can create split-brain behavior: some users land on the old server while others hit the recovery host, and anything written on the old side (orders, form entries) won’t exist in the restored database.

Set sane TTLs ahead of time for key records (A/AAAA and CNAME). For many sites, 300 seconds is a practical default. For high-risk windows (migrations, big releases), temporarily drop to 60–120 seconds.

Use a predictable cutover sequence:

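Lower TTL ahead of time, ideally before any outage, and at least one old-TTL period before the switch: resolvers keep serving a cached record until its old TTL expires.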
Verify recovery server is ready using hosts override.
Change A/AAAA records to the recovery IP.
Monitor logs for real traffic and errors.
Keep the old server online (if safe) until traffic stabilizes.
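To watch the switch happen, query a few public resolvers and compare their answers with the recovery IP; the TTL in each answer is how many more seconds that resolver may keep its cached value. A minimal sketch, assuming dig (from dnsutils or bind-utils) is installed; the resolver list is an example, so swap in your own:

```shell
# Check which resolvers already return the new IP, and for how long others may lag.
# Assumptions: dig is installed; resolvers below are common public ones (examples).

# parse_answer EXPECTED_IP -> reads `dig +noall +answer` lines, prints "TTL IP match|old"
parse_answer() {
  # Answer line fields: name TTL class type data, e.g. "example.com. 300 IN A 203.0.113.50"
  awk -v want="$1" '$4 == "A" || $4 == "AAAA" {
    print $2, $5, ($5 == want ? "match" : "old")
  }'
}

check_propagation() {
  local name="$1" want="$2" r
  for r in 1.1.1.1 8.8.8.8 9.9.9.9; do
    echo "$r: $(dig +noall +answer @"$r" "$name" A | parse_answer "$want" | tr '\n' ' ')"
  done
}
```

Run check_propagation example.com 203.0.113.50 every few minutes after the change. A line marked old shows a resolver still serving the previous IP, and its TTL is roughly how long that will continue.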

If you want a hosting-focused walkthrough (including rollback), keep this bookmarked: DNS Cutover Tutorial (2026).

Time the drill and capture metrics (so DR improves every quarter)

A runbook earns its keep when you can prove it gets faster. After each drill, record:

Provisioning time: from “need a server” to “SSH works”.
Restore time: files + DB + config.
Validation time: to pass the checklist.
Cutover time: until most clients resolve to the new IP.
Issues found: missing secrets, wrong permissions, outdated docs, unknown cron.
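Recording timings is easier if the runbook captures them as you go. A minimal sketch of a phase timer that appends one CSV row per phase, ready to paste into the runbook’s table; the DR_TIMINGS variable, file names, and phase names are hypothetical, and it assumes GNU date:

```shell
# Record how long each drill phase takes, one CSV row per phase.
# Assumptions: GNU date (-d @EPOCH); DR_TIMINGS is a hypothetical variable naming the CSV.
DR_TIMINGS="${DR_TIMINGS:-dr-timings.csv}"

phase_start() {
  # Store "name epoch" in a side file so start and end can run in different shells
  echo "$1 $(date +%s)" >> "$DR_TIMINGS.starts"
}

phase_end() {
  local start now
  # Use the most recent start recorded for this phase name
  start=$(awk -v p="$1" '$1 == p {s = $2} END {print s}' "$DR_TIMINGS.starts" 2>/dev/null || true)
  if [ -z "$start" ]; then
    echo "phase $1 was never started" >&2
    return 1
  fi
  now=$(date +%s)
  [ -s "$DR_TIMINGS" ] || echo "phase,started_utc,seconds" > "$DR_TIMINGS"
  echo "$1,$(date -u -d "@$start" +%Y-%m-%dT%H:%M:%SZ),$(( now - start ))" >> "$DR_TIMINGS"
}
```

For example, call phase_start provisioning when you order the server and phase_end provisioning once SSH works. Because start times live in a side file, the two calls can run in different SSH sessions as long as both use the same DR_TIMINGS path.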

Put the numbers into a simple table inside the runbook. Next quarter, you’ll know exactly what slowed you down.

Hardening after recovery: what to rotate and what to audit

Once the service is stable, assume credentials might be exposed, especially after compromise or shared-access incidents. Rotate in a clear order:

Control panel logins (WHM/cPanel/Plesk/DirectAdmin) and SSH keys.
Application admin users and database passwords.
API keys (payment gateways, SMTP providers, backups, CDN).
Re-issue TLS certificates with new private keys, and revoke the old certificates, if keys may have leaked.
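For database passwords, rotation means generating a new secret, applying it in the database, and updating the app config in one sitting. A minimal sketch that generates a random value with openssl and prints the MySQL/MariaDB ALTER USER statement; the wp_user account is a hypothetical example, and the secret is shown on stdout, so avoid running it where terminal output is logged:

```shell
# Generate a random secret and build the SQL that sets it for a database user.
# Assumptions: openssl installed; 'wp_user'@'localhost' in the usage note is hypothetical.
new_secret() {
  # 24 random bytes as hex -> 48 characters, safe inside SQL, shell, and PHP quotes
  openssl rand -hex "${1:-24}"
}

# sql_set_password USER HOST PASSWORD -> prints the ALTER USER statement
sql_set_password() {
  local user="$1" host="$2" pw="$3"
  echo "ALTER USER '$user'@'$host' IDENTIFIED BY '$pw';"
}
```

For example: pw=$(new_secret); sql_set_password wp_user localhost "$pw" | sudo mysql, then put the same value into DB_PASSWORD in wp-config.php. Hex output is used instead of base64 so no quoting or escaping is needed anywhere.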

If you rely on security headers, confirm they survived the restore and weren’t removed during the scramble. Reference: HTTP Security Headers Tutorial (2026).
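A quick way to spot headers that didn’t survive the restore is to compare a response’s header names with the list you expect. A minimal sketch that reads curl -sI output on stdin; the expected list is an example set, so use the headers your production config actually sends:

```shell
# Report expected security headers missing from an HTTP response header dump.
# Assumptions: input is `curl -sI URL` output; the header list is an example set.
missing_security_headers() {
  local headers h missing=0
  # Header names are case-insensitive; strip CRs and lowercase everything
  headers=$(tr -d '\r' | tr 'A-Z' 'a-z')
  for h in strict-transport-security x-content-type-options content-security-policy; do
    if ! printf '%s\n' "$headers" | grep -q "^$h:"; then
      echo "missing: $h"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: curl -sI https://example.com | missing_security_headers, run against production and the recovery server so you can compare the two.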

Dedicated server notes: restoring is similar, provisioning is not

The restore steps look similar on a dedicated server. Provisioning rarely does. Plan for two recovery paths:

Alternate host recovery: restore to a standby VPS or another dedicated server, then cut over DNS.
Same-host rebuild: reimage, restore, validate, cut over at the IP level (riskier if hardware is failing).

If your business needs a strict RTO, keep enough spare capacity to restore elsewhere quickly. A smaller standby HostMyCode VPS is often enough to keep core sites online while you rebuild the primary server.

Summary: your DR plan is a runbook plus a calendar

Write the inventory and keep it accessible during an outage.
Provision a clean recovery server and practice restoring into it.
Validate with a short checklist and cut over DNS with rollback.
Measure your timings and repeat the drill quarterly.

If you want fewer moving parts during restores, consider managed VPS hosting so baseline security and system upkeep stay consistent across rebuilds. If you want hands-on control with predictable pricing, start with a HostMyCode VPS and keep your runbook in version control.


FAQ: disaster recovery runbooks for hosting servers

How often should I run a disaster recovery drill?

Quarterly is a solid baseline in 2026. Run an extra drill after big changes: control panel upgrades, major PHP version jumps, or DNS/provider moves.

Should I restore to the same server or a new one?

For suspected compromise or unknown changes, restore to a clean new server. For simple outages (disk full, service crash), repair in place can be faster.

What’s the fastest way to test a restored site without changing DNS?

Use a local hosts file override to point your computer at the recovery IP, then run curl -I checks and real browser tests before cutover.

What DR step is most commonly missed for WordPress hosting?

Outbound email and scheduled tasks. Password resets, form notifications, and order confirmations often fail until SPF/DKIM and cron are verified.

Do I need low TTL all the time?

No. A TTL around 300s is a reasonable balance between DNS query volume and how quickly changes take effect. Drop TTL temporarily before planned work, and note in the runbook when to restore your normal TTL after cutover.