VPS Troubleshooting Tutorial (2026): Diagnose Slow Website Response on Ubuntu with Nginx/Apache, PHP-FPM, and System Metrics

Slow pages on a VPS usually aren’t “bad server” problems. They’re measurable bottlenecks.

Common causes include one CPU core pegged, a PHP-FPM pool stuck in a queue, a disk queue that never drains, or an upstream that starts timing out. This VPS troubleshooting tutorial gives you a repeatable workflow on Ubuntu. You’ll find the choke point fast. Then you’ll apply the smallest change that improves response time.

You’re not here to tune everything. You’re here to prove what’s slow (TTFB, upstream, PHP, disk, DNS, TLS), capture evidence, and change one variable at a time.

What you’ll need (and what “slow” means)

This tutorial assumes:

Ubuntu 22.04 LTS or Ubuntu 24.04 LTS
Nginx or Apache (or both with Nginx as a reverse proxy)
PHP-FPM for PHP sites (WordPress, WooCommerce, Laravel, etc.)
Root or sudo access

Define “slow” upfront so you’re not chasing feelings. For many sites, problems show up when:

TTFB (time to first byte) exceeds ~400–800 ms consistently
p95 response time exceeds ~1.5–2.5 s
HTTP 499/502/504 rates increase during traffic spikes
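To put a number on p95 rather than eyeballing it, a small helper can compute a percentile from a list of timings. This is a minimal sketch using the nearest-rank method; the function name percentile is hypothetical and the URL in the usage comment is a placeholder.

```shell
# percentile P: read one number per line on stdin and print the P-th
# percentile using the nearest-rank method (P is an integer, 1 to 100).
percentile() {
  local p="$1"
  LC_ALL=C sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage sketch (placeholder URL): 20 TTFB samples, then the 95th percentile.
# for i in $(seq 1 20); do
#   curl -s -o /dev/null -w '%{time_starttransfer}\n' https://example.com/
# done | percentile 95
```

Twenty samples is a small set, so treat the result as a rough indicator and compare it against the same measurement taken after each change.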

If you’re dealing with recurring slowness, start with a stable baseline. A properly sized HostMyCode VPS gives you dedicated resources and predictable IO. Those two things make troubleshooting far more reliable.

Step 1: Confirm it’s server-side (quick external checks)

Start from your laptop. First, separate network/setup time (DNS/connect/TLS) from server processing time.

curl -s -o /dev/null -w 'DNS: %{time_namelookup}
Connect: %{time_connect}
TLS: %{time_appconnect}
TTFB: %{time_starttransfer}
Total: %{time_total}
HTTP: %{http_code}
' https://example.com/

How to read it (each timer is cumulative, in seconds from the start of the request):

DNS high: resolver/TTL problem (often outside your VPS)
Connect/TLS high: network path, packet loss, or TLS handshake overhead
TTFB high but connect/TLS normal: web server/upstream/app bottleneck
Total high but TTFB normal: large downloads, slow client, or bandwidth constraints
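Because curl's timers are cumulative, it can help to subtract them into per-phase durations before comparing runs. The sketch below assumes the five timers are passed in the order shown; phase_breakdown is a hypothetical helper name, not a curl feature.

```shell
# phase_breakdown: turn curl's cumulative timers into per-phase durations.
# Arguments, in seconds: namelookup connect appconnect starttransfer total
phase_breakdown() {
  awk -v dns="$1" -v conn="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
    # time_appconnect is 0 for plain HTTP, so fall back to the connect time
    if (tls + 0 == 0) tls = conn
    printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f download=%.3f\n",
      dns, conn - dns, tls - conn, ttfb - tls, total - ttfb
  }'
}

# Usage sketch (placeholder URL):
# phase_breakdown $(curl -s -o /dev/null \
#   -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}' \
#   https://example.com/)
```

The wait value is the gap between the end of the handshake and the first byte, which is the closest external estimate of server processing time.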

If DNS time jumps around during incidents, review failover behavior and TTLs.

Pair this workflow with DNS failover with low TTL and health checks.

Step 2: Capture a “slow window” on the VPS (CPU, RAM, IO, load)

SSH into the VPS and collect metrics while the site is slow. Don’t troubleshoot a “fast” moment and hope it translates.

# Overall load and process view
uptime
sudo top -o %CPU

# Memory pressure and swap activity
free -h
sudo vmstat 1 10

# Disk IO saturation (look for high %util and long r_await/w_await)
sudo iostat -xz 1 10

# If iostat isn't installed
sudo apt-get update && sudo apt-get install -y sysstat

Signals that usually point to the real constraint:

CPU-bound: one core pinned, high user CPU, many PHP workers running
Memory pressure: low “available” memory, swap-in/out in vmstat, rising load
IO-bound: %util near 100% and await spikes; PHP requests back up
Run queue: vmstat column r > CPU cores for sustained periods
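The signals above can be checked mechanically from a vmstat capture. This sketch averages the r, si, so, and wa columns and compares the run queue with the core count; vmstat_summary is a hypothetical name, and the column positions are looked up from the header row because they differ between procps versions.

```shell
# vmstat_summary CORES: summarize `vmstat 1 N` output read from stdin.
vmstat_summary() {
  local cores="$1"
  awk -v cores="$cores" '
    $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row
    ("r" in col) && $1 ~ /^[0-9]+$/ {
      if (++row == 1) next      # first row is the average since boot
      n++
      r  += $(col["r"]);  wa += $(col["wa"])
      si += $(col["si"]); so += $(col["so"])
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "samples=%d avg_r=%.1f cores=%d avg_si=%.1f avg_so=%.1f avg_wa=%.1f\n",
        n, r / n, cores, si / n, so / n, wa / n
      if (r / n > cores) print "run queue exceeds core count: likely CPU-bound"
      if (si + so > 0)   print "swap activity seen: likely memory pressure"
    }'
}

# Usage sketch:
# vmstat 1 10 | vmstat_summary "$(nproc)"
```

The two printed hints are heuristics, not verdicts; a high avg_wa with a short run queue still points at storage rather than CPU.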

If you keep getting caught without data, add basic monitoring. Graphs and alerts can show the pattern before you SSH in.

See VPS monitoring with uptime, alerts, and resource tracking.

Step 3: Identify which layer is stalling (web server vs upstream vs PHP)

Now narrow the delay to a specific layer. Use status endpoints and logs to see where requests wait.

Follow the path that matches your stack.

Nginx: check active connections and upstream behavior

If you use Nginx, enable a stub status page. Lock it down to localhost.

# /etc/nginx/conf.d/status.conf
server {
  listen 127.0.0.1:8080;
  location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
  }
}

sudo nginx -t && sudo systemctl reload nginx
curl -s http://127.0.0.1:8080/nginx_status

Red flags:

Active connections climbs and stays high
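Reading grows: Nginx is still receiving request headers, which points to slow clients or slowloris-type behavior
Writing grows: requests are in flight; with normal clients that usually means they are waiting on a slow upstream
Waiting is only idle keep-alive connections: a large number is normally harmless, but each one still counts against worker_connections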
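If you want to sample the status page repeatedly during an incident, a one-line summary is easier to compare than the raw output. This is a sketch that assumes the standard four-line stub_status layout; stub_status_summary is a hypothetical helper name.

```shell
# stub_status_summary: condense stub_status output (stdin) into one line.
# "dropped" is accepts minus handled; a non-zero value suggests a resource
# limit such as worker_connections was reached.
stub_status_summary() {
  awk '
    /^Active connections:/            { active = $3 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/    { accepts = $1; handled = $2 }
    /^Reading:/                       { reading = $2; writing = $4; waiting = $6 }
    END {
      printf "active=%d reading=%d writing=%d waiting=%d dropped=%d\n",
        active, reading, writing, waiting, accepts - handled
    }'
}

# Usage sketch, one sample every 5 seconds:
# while sleep 5; do
#   curl -s http://127.0.0.1:8080/nginx_status | stub_status_summary
# done
```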

Apache: check scoreboard (if prefork/event is overloaded)

On Apache, mod_status shows what workers are doing. Keep it local.

sudo a2enmod status

# /etc/apache2/conf-available/status.conf
<Location /server-status>
    SetHandler server-status
    Require ip 127.0.0.1
</Location>

sudo a2enconf status
sudo systemctl reload apache2
curl -s 'http://127.0.0.1/server-status?auto' | head

If many workers sit in one state (for example W sending reply or R reading), something is limiting concurrency.

Typical causes are slow clients or a backend that can’t keep up.
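Counting worker states by hand from the scoreboard string is error-prone, so a small tally can help. The sketch below reads the machine-readable ?auto output and counts a few of the state characters; scoreboard_counts is a hypothetical name, and only five of the scoreboard keys are reported.

```shell
# scoreboard_counts: tally worker states from `server-status?auto` (stdin).
# Keys: _ waiting for connection, R reading request, W sending reply,
#       K keepalive, . open slot with no current process
scoreboard_counts() {
  awk -F': ' '/^Scoreboard:/ {
    s = $2
    for (i = 1; i <= length(s); i++) count[substr(s, i, 1)]++
    printf "waiting(_)=%d reading(R)=%d writing(W)=%d keepalive(K)=%d open(.)=%d\n",
      count["_"], count["R"], count["W"], count["K"], count["."]
  }'
}

# Usage sketch:
# curl -s 'http://127.0.0.1/server-status?auto' | scoreboard_counts
```

If writing(W) is close to the total worker count while CPU is idle, the workers are most likely blocked on the backend rather than on Apache itself.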

PHP-FPM: see if the pool is maxing out

PHP-FPM is a common choke point for WordPress and other PHP apps. Check pool status and look for queuing.

Enable status for your pool (example for PHP 8.3 on Ubuntu):

# /etc/php/8.3/fpm/pool.d/www.conf
pm.status_path = /fpm-status
ping.path = /fpm-ping

Expose it only locally via Nginx:

# inside your site server block or a localhost-only server block
location = /fpm-status {
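  # Use fastcgi.conf here: the stock Ubuntu snippets/fastcgi-php.conf runs
  # try_files on the script name and returns 404 for a path with no file on disk
  include fastcgi.conf;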
  fastcgi_pass unix:/run/php/php8.3-fpm.sock;
  allow 127.0.0.1;
  deny all;
}

sudo systemctl reload php8.3-fpm
sudo nginx -t && sudo systemctl reload nginx
curl -s http://127.0.0.1/fpm-status

Key fields to watch:

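max children reached: a counter of how often the pool hit pm.max_children; if it keeps rising, requests are queuing and users experience “random” slowness
listen queue and max listen queue: connections waiting for a free worker right now, and the peak since start; a non-zero or growing value means you’re under-provisioned or PHP is stalled (listen queue len is only the size of the socket backlog)
slow requests: requests that exceeded request_slowlog_timeout; enable slowlog to identify the scripts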
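To watch these fields over time, the plain-text status page can be reduced to one line per sample. This is a sketch that assumes the default text output with "name: value" rows; fpm_status_check is a hypothetical helper name.

```shell
# fpm_status_check: condense the PHP-FPM status page (stdin) into one line
# and flag a non-empty listen queue.
fpm_status_check() {
  awk -F': +' '
    { v[$1] = $2 }
    END {
      printf "queue=%d max_queue=%d active=%d total=%d max_children_reached=%d slow=%d\n",
        v["listen queue"], v["max listen queue"], v["active processes"],
        v["total processes"], v["max children reached"], v["slow requests"]
      if (v["listen queue"] + 0 > 0)
        print "requests are waiting for a free worker right now"
    }'
}

# Usage sketch:
# curl -s http://127.0.0.1/fpm-status | fpm_status_check
```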

Step 4: Turn on the right logs (without drowning your disk)

You want logs that answer two questions: what waited, and for how long?

Keep logging targeted. Treat it as temporary instrumentation during an investigation.

Nginx: log request time and upstream time

Add a log format that records $request_time and upstream timings.

# /etc/nginx/nginx.conf (http {} block)
log_format timed '$remote_addr - $host "$request" '
                 'status=$status rt=$request_time '
                 'urt=$upstream_response_time uct=$upstream_connect_time '
                 'uht=$upstream_header_time ref="$http_referer" ua="$http_user_agent"';

Apply it to the vhost you’re debugging:

# /etc/nginx/sites-available/example
access_log /var/log/nginx/example.timed.log timed;

sudo nginx -t && sudo systemctl reload nginx
sudo tail -f /var/log/nginx/example.timed.log

Interpretation:

rt high, urt low/empty: the time went outside the upstream, typically reading the request body from, or sending the response to, a slow client
urt high: upstream (PHP-FPM/app) is slow
uct high: upstream connect delay (socket contention, overloaded PHP-FPM, or network upstream)
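The same interpretation can be applied to a whole log at once. The sketch below assumes the timed log format defined earlier (so the request path is the fifth field) and labels each slow request; classify_timed, the 1.0 s default threshold, and the 80% cut-off are all illustrative assumptions, not Nginx features.

```shell
# classify_timed [THRESHOLD]: label slow requests in a "timed" format log (stdin).
# A request counts as upstream-slow when upstream time is at least 80% of the
# total request time (a heuristic chosen for this sketch).
classify_timed() {
  local threshold="${1:-1.0}"
  awk -v t="$threshold" '
    {
      rt = ""; urt = ""
      for (i = 1; i <= NF; i++) {
        if (rt == ""  && $i ~ /^rt=/)  rt  = substr($i, 4)
        if (urt == "" && $i ~ /^urt=/) urt = substr($i, 5)
      }
      if (rt == "" || rt + 0 < t + 0) next        # skip fast requests
      if (urt == "" || urt == "-")    verdict = "no-upstream"
      else if (urt + 0 >= rt * 0.8)   verdict = "upstream-slow"
      else                            verdict = "client-or-nginx"
      print verdict, "rt=" rt, "urt=" urt, $5
    }'
}

# Usage sketch:
# sudo cat /var/log/nginx/example.timed.log | classify_timed 1.0 | sort | uniq -c
```

When a request passes through several upstreams, Nginx writes a comma-separated list of times; this sketch only reads the first value.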

PHP-FPM slowlog: name the exact script

Enable slowlog briefly, for 15 to 60 minutes during a known slow window. Turn it back off when you’re done.

# /etc/php/8.3/fpm/pool.d/www.conf
request_slowlog_timeout = 3s
slowlog = /var/log/php8.3-fpm/www-slow.log

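# The slowlog directory may not exist yet; PHP-FPM rejects the config if it can't open the file
sudo mkdir -p /var/log/php8.3-fpm
sudo systemctl reload php8.3-fpm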
sudo tail -f /var/log/php8.3-fpm/www-slow.log

If the slowlog points to WordPress admin-ajax calls, a specific plugin, or repeated outbound API calls, you have a concrete lead. You can reproduce it, then fix it.
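Once the slowlog has a few dozen entries, a quick tally shows which scripts and which calls dominate. This sketch assumes the usual entry layout (a script_filename line followed by stack frames that start with an address in brackets); slowlog_top is a hypothetical helper name.

```shell
# slowlog_top: count slow entries per script and per top stack frame (stdin).
slowlog_top() {
  awk '
    /^script_filename = /  { scripts[$3]++; want_top = 1; next }
    want_top && /^\[0x/    { funcs[$2]++; want_top = 0 }   # first frame only
    END {
      for (s in scripts) print scripts[s], "script", s
      for (f in funcs)   print funcs[f], "top-frame", f
    }' | sort -k2,2 -k1,1nr
}

# Usage sketch:
# sudo cat /var/log/php8.3-fpm/www-slow.log | slowlog_top | head -20
```

A top frame such as curl_exec() or a database call repeated across many entries usually says more than the script name, since WordPress routes most requests through index.php.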

Step 5: Fix the common bottlenecks (small changes that show up immediately)

Once you pin slowness to a layer, make one narrow change. Then re-measure.

These fixes most often reduce TTFB on typical VPS-hosted stacks.

Fix A: PHP-FPM pool sizing (stop queueing)

If you see max children reached, the pool is out of workers. Requests stack up in a queue.

Increase capacity carefully. Too many workers can push the VPS into swapping.

Swap usually makes latency worse.

For a 2 GB VPS running a single WordPress site, a common starting point:

# /etc/php/8.3/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 12
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6
pm.max_requests = 500

Reload and check real memory use per worker:

sudo systemctl reload php8.3-fpm
ps -o pid,rss,cmd -C php-fpm8.3 --sort=-rss | head

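Rule of thumb: pm.max_children ≈ (RAM you can spare for PHP) ÷ (average worker RSS under load). For example, if each PHP worker uses ~60–120 MB and you reserve about 800 MB of a 2 GB VPS for the OS, web server, and database, that leaves room for roughly 10–20 workers. Don’t set pm.max_children so high that you force swapping.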
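The arithmetic behind that rule of thumb can be scripted from live ps output. This is a rough sketch: suggest_max_children is a hypothetical name, the memory budget is a number you choose, and RSS counts shared memory (such as OPcache) once per process, so the result errs on the low side.

```shell
# suggest_max_children BUDGET_MB: read worker RSS values in KB (one per line)
# on stdin and print how many average-sized workers fit in BUDGET_MB.
suggest_max_children() {
  local budget_mb="$1"
  awk -v budget="$budget_mb" '
    $1 ~ /^[0-9]+$/ { sum += $1; n++ }
    END {
      if (n == 0) { print "no workers found"; exit 1 }
      avg_mb = sum / n / 1024
      printf "workers=%d avg_rss_mb=%.0f suggested_max_children=%d\n",
        n, avg_mb, int(budget / avg_mb)
    }'
}

# Usage sketch, with 1200 MB set aside for PHP (an assumed budget):
# ps -o rss= -C php-fpm8.3 | suggest_max_children 1200
```

Run it while the site is under real load; idle workers report a much smaller RSS and would inflate the suggestion.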

Fix B: Nginx buffering and timeouts for PHP upstream

Defaults are a compromise. Some sites need slightly longer timeouts for admin jobs.

Busy frontends often benefit from tighter limits. Base changes on what you saw in logs.

# inside your PHP location {} block
fastcgi_connect_timeout 10s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
fastcgi_buffers 16 16k;
fastcgi_buffer_size 32k;

If you’re seeing 504s during long admin operations, raise fastcgi_read_timeout for /wp-admin/ only.

Don’t increase it globally.

Fix C: Stop bot traffic from eating concurrency

Bots don’t need huge bandwidth to hurt performance. They just need enough concurrent requests to hit expensive endpoints.

If you run Nginx, rate-limit login and XML-RPC traffic (or disable XML-RPC if you don’t use it). Follow the pattern in this Nginx rate limiting tutorial. Then watch whether the PHP-FPM queue drops during spikes.

Fix D: Disk IO saturation (reduce write amplification)

If iostat shows high await and %util, requests are waiting on storage.

You can often reduce unnecessary writes quickly:

Sample or filter access logs during incident response (for example with the if= parameter of access_log) instead of logging everything for hours.
Ensure log rotation is working and not compressing massive files at peak times.
Move heavy application debug logs off production levels.

If storage is genuinely too small or too slow for your traffic pattern, the clean fix is faster IO. Upgrading to faster NVMe-backed plans (or moving to a dedicated server for sustained heavy writes) removes that bottleneck.

For predictable performance under load, consider HostMyCode dedicated servers for high-traffic sites.

Step 6: Quick “slow request triage” with one-liners

Use these during an incident to spot patterns fast. They help you find slow requests, failing endpoints, and abusive sources.

Top slow Nginx requests (by request_time)

# Prints request time and path (field 5 in the timed format), slowest first
sudo awk '{for(i=1;i<=NF;i++) if($i ~ /^rt=/){print substr($i,4), $5}}' \
  /var/log/nginx/example.timed.log | sort -nr | head

Top endpoints hitting 499/502/504

sudo awk '$9 ~ /^(499|502|504)$/{print $7}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -nr | head -20

Who is hammering wp-login.php or xmlrpc.php

sudo awk '$7 ~ /(wp-login\.php|xmlrpc\.php)/{print $1}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -nr | head -20

Step 7: Validate improvements with before/after measurements

After each change, re-run the curl timing test from Step 1. Then capture a short “server snapshot” so you can compare like-for-like.

date
uptime
free -h
sudo vmstat 1 5
sudo iostat -xz 1 5

Keep a small incident log (a plain text file is fine). Note what changed, when you changed it, and which metric improved.

This prevents tuning by guesswork.
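A tiny helper makes it more likely the incident log actually gets written. This sketch appends a UTC timestamp, the current load averages from /proc/loadavg, and your note to a text file; incident_note and the INCIDENT_LOG variable are hypothetical names, and the default path is an assumption.

```shell
# incident_note TEXT...: append a timestamped line to the incident log.
# Set INCIDENT_LOG to change the file; defaults to ~/incident.log.
incident_note() {
  local log="${INCIDENT_LOG:-$HOME/incident.log}"
  printf '%s | load=%s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(cut -d' ' -f1-3 /proc/loadavg)" \
    "$*" >> "$log"
}

# Usage sketch:
# incident_note "raised pm.max_children 8 -> 12, TTFB before 1.9s"
# incident_note "TTFB after 0.6s, listen queue back to 0"
```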

Step 8: Prevent a repeat with lightweight guardrails

Once performance is stable, add guardrails to catch regressions before users do:

Daily service reports (auth failures, disk usage, mail warnings) using Logwatch on Ubuntu.
Backups you’ve restored, not just backups you scheduled. Use automated snapshots + offsite copies + restore tests.
A staging workflow for WordPress plugin/theme changes so performance regressions don’t land in production. See WordPress staging on a VPS.

If you keep hitting resource ceilings after cleanup, it’s likely a sizing problem rather than an operations problem.

Moving to managed VPS hosting can also help if you want another set of eyes on the stack and its performance signals.

Common pitfalls that waste hours

Restarting services first. It clears symptoms, erases evidence, and hides the pattern.
Changing five settings at once. You won’t know which change helped.
Ignoring IO wait. Low CPU can still mean a struggling server; high IO wait makes slowness feel “random.”
Over-allocating PHP workers. More children can mean more swapping and worse latency.
Leaving debug logging on after the incident. It quietly becomes the next incident.

Summary: the fastest path to a stable, fast site

Troubleshooting slowness is mostly order of operations. Measure TTFB. Map the delay to a layer.

Confirm it with logs and status endpoints. Then make a small change and re-test.

That workflow holds up under pressure.

If your current plan has noisy neighbors or inconsistent storage performance, you’ll get more repeatable results on a HostMyCode VPS.

For sustained high traffic and heavy workloads, stepping up to dedicated servers removes resource contention from the equation.

If you’re troubleshooting a slow production site, start with infrastructure you can trust. HostMyCode VPS plans give you dedicated CPU/RAM and predictable performance for WordPress and PHP stacks. If you’d rather spend your time on the site instead of server upkeep, managed VPS hosting can handle ongoing updates, tuning, and routine operational checks.

FAQ

How do I know if PHP-FPM is the bottleneck?

Check PHP-FPM status for max children reached and a growing listen queue. Then confirm in Nginx logs with high urt= (upstream response time). Together, that’s strong evidence the pool is queueing.

My CPU is low but the site is slow—what next?

Check disk and memory pressure. Run iostat -xz 1 10 for IO wait and vmstat 1 10 for swapping.

High await or swap-in/out will slow PHP and page generation even with low CPU.

Should I increase Nginx worker_processes and worker_connections?

Only after you confirm you’re hitting those limits (errors in logs, stub_status showing saturation). On most VPS setups, upstream capacity (PHP-FPM, database, disk) becomes the bottleneck first.

What’s the safest first change during an incident?

Enable timing logs (request time and upstream time) on the affected vhost. Then turn on PHP-FPM slowlog for a short window. Those changes are low-risk and usually produce usable data quickly.

When is it time to upgrade from a VPS to a dedicated server?

If you consistently hit IO saturation or CPU limits during normal traffic (not just brief spikes), and optimizations don’t reduce p95 latency, a dedicated server is typically the cleanest next step.