
A “slow site” report is rarely one clean issue. It’s a symptom. Common causes include CPU steal, disk I/O wait, PHP-FPM saturation, noisy MySQL queries, or a DNS/SSL mistake that affects only some visitors.
The quickest path back to stable service is a VPS hosting troubleshooting checklist. Start with user impact. Then move down the stack in a fixed order.
This is a runbook-style flow for 2026. Use it at 2 a.m. when a WordPress store is throwing 502s, Nginx access logs look normal, and uptime monitors won’t stop flapping.
Start with a 3-minute triage: what’s broken, for whom, and since when?
Before you touch the VPS, lock in three facts. They keep you from debugging the wrong layer.
- Scope: one URL, one domain, or everything on the VPS?
- Failure mode: slow (TTFB high), error (502/504/500), or hard-down (connection refused/timeouts)?
- Timeline: did it start after a deploy, plugin update, SSL renewal, or traffic spike?
Quick external checks from your laptop:
- curl -I https://example.com (look at status, headers, and response time)
- curl -Iv https://example.com (TLS handshake details; useful for cert chain/SNI problems)
- dig +short A example.com and dig +short AAAA example.com (confirm DNS returns what you expect)
If failures are intermittent, test from two networks (mobile + office).
A bad IPv6 (AAAA) record can look “random,” because only some clients prefer IPv6.
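To record the failure mode consistently across several test networks, a small helper along these lines may help. It is only a sketch: classify_failure is a hypothetical name, and it assumes you pass it the value of curl's %{http_code} write-out variable. It cannot detect the "slow" mode, which needs timing rather than a status code.

```shell
# Hypothetical helper: map an HTTP status (from curl -w '%{http_code}') to a failure mode.
# curl reports 000 when it never received an HTTP response (refused, timeout, TLS failure).
classify_failure() {
  case "$1" in
    000)             echo "hard-down" ;;
    5[0-9][0-9])     echo "error" ;;
    [1-4][0-9][0-9]) echo "responding" ;;
    *)               echo "unknown" ;;
  esac
}

# Example call (commented out so the sketch makes no network request):
#   classify_failure "$(curl -s -o /dev/null -m 10 -w '%{http_code}' https://example.com/)"
```

Run the same call from both networks; a different answer per network points at DNS or routing rather than the VPS.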
Confirm the VPS itself is healthy (CPU, RAM, disk, and network)
If the host is struggling, everything above it becomes misleading. Start with these basics.
- Load + CPU saturation: uptime, top or htop
- Memory pressure: free -h, and look for swap churn
- Disk space: df -h (a full root partition breaks logging, MySQL temp files, and PHP sessions)
- Disk I/O wait: iostat -xz 1 (install via sysstat)
- Network errors: ip -s link and ss -s
Two patterns show up constantly on production VPSes:
- High load with high wa (I/O wait): the CPU isn’t “busy.” It’s waiting on disk. Think slow queries, logging storms, backups running at peak time, or a saturated volume.
- Normal load but users still time out: you’re likely hitting a connection bottleneck (PHP-FPM workers, MySQL max connections, or a proxy timeout). This is different from raw resource exhaustion.
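The two patterns can be told apart with a rough rule of thumb. The sketch below is illustrative only: classify_load is a hypothetical helper, and the thresholds (load per core above 1.0, I/O wait above 20%) are assumptions to adjust against your own baseline.

```shell
# Hypothetical helper. Arguments: 1-minute load average, CPU core count, iowait percent.
# Thresholds are illustrative assumptions, not universal limits.
classify_load() {
  awk -v load="$1" -v cores="$2" -v wa="$3" 'BEGIN {
    per_core = load / cores
    if (per_core > 1.0 && wa > 20) print "io-wait"
    else if (per_core > 1.0)       print "cpu-saturated"
    else                           print "load-normal"
  }'
}

# Example with live values (commented out; read the wa figure from top or iostat):
#   classify_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 35
```

A "load-normal" result while users still time out is the cue to look at worker and connection limits instead of host resources.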
If you’re running into host-level limits and need consistent headroom, a HostMyCode VPS plan with dedicated resources can help.
It reduces “noisy neighbor” variables and gives you a cleaner baseline while troubleshooting.
Diagnose “hard down”: is the service listening and reachable?
A true outage usually comes down to one of three things. The web server stopped. The firewall blocked it. Or the process is running but not listening where you think it is.
- Is the port open locally? ss -lntp | grep -E ':80|:443'
- Is the service running? systemctl status nginx or systemctl status apache2
- Are you blocked at the firewall? Check UFW (ufw status) or nftables rules (nft list ruleset)
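If you script the listening check, match the port exactly. The following is a minimal sketch under the assumption that it receives `ss -lnt` output on stdin (State in column 1, Local Address:Port in column 4); port_listening is a hypothetical name.

```shell
# Hypothetical helper: read `ss -lnt` output on stdin and report whether a TCP port is listening.
# Anchoring the match at the end of the Local Address column keeps ":80" from matching ":8080".
port_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { print (found ? "listening" : "not-listening") }
  '
}

# Example call (commented out):
#   ss -lnt | port_listening 443
```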
If Nginx/Apache won’t start, don’t guess. Ask the daemon what it’s refusing to load.
- Nginx: nginx -t (syntax + include errors)
- Apache: apachectl -t and journalctl -u apache2 -n 200 --no-pager
A classic “it worked yesterday” cause is a failed log write because the disk filled up.
If df -h shows 100% usage on / or /var, free space first. Then restart services.
Fix 502/504 errors by separating web, PHP, and upstream timeouts
Most VPS stacks in 2026 look like: Nginx (or LiteSpeed) → PHP-FPM → MySQL/MariaDB.
A 502/504 usually means the proxy didn’t get a timely response from an upstream.
That upstream is often PHP-FPM, not the web server itself.
First, pinpoint where the timeout happens:
- Nginx error log: typically /var/log/nginx/error.log
- PHP-FPM log: varies by distro/version, commonly /var/log/php8.3-fpm.log or journalctl -u php8.3-fpm
- App log: WordPress often writes to wp-content/debug.log if enabled
Then look for saturation and queuing:
- PHP-FPM pool status: check pm.max_children and whether you’re hitting it. Pool config is typically under /etc/php/8.3/fpm/pool.d/www.conf (Debian/Ubuntu).
- Active connections: ss -ant | wc -l and ss -ant state established '( sport = :443 )' | head
Mid-incident mitigation is fine, but be careful.
You can temporarily raise pm.max_children only if you have free RAM.
Otherwise you’ll swap or trigger OOM kills. That’s worse than a clean 502.
If the kernel starts killing processes, you’ll see it in dmesg -T | tail -n 50.
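Before raising pm.max_children, it helps to estimate how many workers the available RAM can hold. This is a back-of-the-envelope sketch: fpm_max_children is a hypothetical helper, and both the reserve and the per-worker memory figure are assumptions you need to measure on your own host.

```shell
# Hypothetical helper. Arguments, all in MB:
#   $1 = RAM currently available, $2 = reserve for MySQL/OS/page cache, $3 = average PHP-FPM worker RSS
# Prints the largest worker count that fits in (available - reserve), or 0 if nothing fits.
fpm_max_children() {
  budget_mb=$(( $1 - $2 ))
  if [ "$budget_mb" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( budget_mb / $3 ))
}

# One way to estimate average worker RSS in MB (commented out; assumes the process name contains "php-fpm"):
#   ps -eo rss,comm | awk '/php-fpm/ {sum += $1; n++} END {if (n) print int(sum / n / 1024)}'
```

For example, 2048 MB available with a 512 MB reserve and 64 MB workers leaves room for 24 children. Treat the result as a ceiling and step toward it gradually.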
If you run a reverse proxy in front of an app service, treat timeouts as part of the design.
A too-low proxy_read_timeout converts real backend slowness into noisy 504s.
A too-high timeout can tie up connections long enough to starve workers.
If you want a solid Nginx + TLS proxying baseline to compare against, this guide is a good reference: Nginx SSL reverse proxy configuration guide.
Track down “slow site” complaints with one metric: TTFB
Users say “slow.” You need to decide what “slow” means in your stack.
It could be network latency, TLS handshake time, static delivery, or backend generation time.
Time to first byte (TTFB) is the quickest divider.
- Low TTFB, slow overall: large pages, heavy images, missing compression, or browser-side work.
- High TTFB: backend delay (PHP, database, external APIs, or overloaded workers).
Quick check with curl:
curl -o /dev/null -s -w 'namelookup:%{time_namelookup} connect:%{time_connect} tls:%{time_appconnect} ttfb:%{time_starttransfer} total:%{time_total}\n' https://example.com/
If time_namelookup spikes, you’re likely dealing with DNS (or resolver settings on the VPS).
If TLS time spikes, inspect the certificate chain and stapling. Also confirm you aren’t serving an outdated intermediate.
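One detail when reading that output: curl's time_* values are cumulative from the start of the request, so the tls figure already includes DNS and TCP connect time. A rough sketch for turning one output line into per-phase durations follows; curl_phases is a hypothetical helper that assumes the exact label:value format of the command above.

```shell
# Hypothetical helper: read one line of the curl -w output on stdin and print per-phase durations.
# Each phase is the difference between two cumulative timers.
curl_phases() {
  awk '{
    for (i = 1; i <= NF; i++) { split($i, kv, ":"); t[kv[1]] = kv[2] }
    # time_appconnect is 0 for plain HTTP, so fall back to the connect timer
    tls_end = (t["tls"] > 0) ? t["tls"] : t["connect"]
    printf "dns:%.3f tcp:%.3f tls:%.3f server:%.3f transfer:%.3f\n",
      t["namelookup"], t["connect"] - t["namelookup"], tls_end - t["connect"],
      t["ttfb"] - tls_end, t["total"] - t["ttfb"]
  }'
}
```

Pipe the curl command into curl_phases. A large server value points at backend generation time, while a large tls value points at the handshake itself.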
Database bottlenecks: the quiet cause behind PHP timeouts
On WordPress and many PHP apps, the database is where “random slowness” tends to hide.
One missing index can turn a 20 ms query into a 2-second problem under load.
PHP-FPM backs up, Nginx hits its upstream timeout, and you end up staring at 504s.
Two quick signals:
- MySQL/MariaDB CPU: top shows mysqld burning CPU, often with load climbing.
- Thread/connection pressure: slow queries cause connection pileups and “too many connections.”
If you don’t already have the slow query log enabled, enable it.
It’s one of the highest-ROI switches you can flip on a VPS.
HostMyCode has a hands-on walkthrough here: MySQL slow query log tutorial.
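Once the slow query log is collecting data, a quick count shows whether slow queries are a handful of outliers or the norm. The sketch below assumes the standard log layout, where each entry has a header line beginning with "# Query_time:". slow_query_count is a hypothetical helper, and the log path depends on your configuration.

```shell
# Hypothetical helper: read a MySQL/MariaDB slow query log on stdin and count
# entries whose Query_time is at or above the given number of seconds.
slow_query_count() {
  awk -v limit="$1" '
    $1 == "#" && $2 == "Query_time:" { total++; if ($3 + 0 >= limit + 0) slow++ }
    END { printf "%d of %d logged queries took >= %ss\n", slow, total, limit }
  '
}

# Example call (commented out; substitute your configured slow_query_log_file):
#   slow_query_count 2 < /path/to/slow.log
```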
Also watch disk behavior while the database is under stress.
If the database volume is near capacity, InnoDB can struggle during temp table creation and write bursts.
For database-heavy sites where you want cleaner boundaries, consider HostMyCode database hosting.
It lets the web tier scale independently, while the database sits on storage tuned for endurance and latency.
DNS, SSL, and redirect loops: outages that look like “server problems”
A lot of “server is down” tickets aren’t CPU or RAM issues.
They’re edge misconfigurations that show up as timeouts or “site can’t be reached.”
- DNS points to the wrong IP: common after migrations or when an old AAAA record lingers.
- SSL mismatch: wrong certificate served for a hostname due to missing SNI server block.
- Redirect loops: HTTP→HTTPS rules fighting with WordPress “site URL” settings or proxy headers.
Practical checks:
- dig A example.com +short and compare to your VPS public IP.
- openssl s_client -connect example.com:443 -servername example.com -showcerts to confirm the served cert chain.
- curl -I http://example.com and curl -I https://example.com to see redirect paths.
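Redirect loops are easier to spot when the whole chain is printed at once. The following is a minimal sketch: redirect_chain is a hypothetical helper that assumes it receives raw response headers on stdin, for example from curl with -sIL.

```shell
# Hypothetical helper: read HTTP response headers on stdin, print each Location target,
# and flag a loop when the same target appears more than once.
redirect_chain() {
  awk '
    tolower($1) == "location:" {
      url = $2
      sub(/\r$/, "", url)          # header lines end in CRLF
      print url
      if (seen[url]++) loop = 1
    }
    END { print (loop ? "loop-detected" : "no-loop") }
  '
}

# Example call (commented out so the sketch makes no network request):
#   curl -sIL --max-redirs 10 http://example.com | redirect_chain
```

If the chain alternates between the http:// and https:// versions of the same URL, check proxy headers and the WordPress site URL settings before touching the web server rules.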
If you move projects between servers often, centralizing DNS helps during cutovers.
HostMyCode domains and DNS can reduce “which provider holds this record?” confusion while you’re mid-migration.
Disk-full incidents: the most preventable downtime on a VPS
Disk-full failures cascade fast. Logs stop writing. Databases can’t create temp files. Services refuse to restart.
The upside is that most disk incidents are preventable once you treat log growth and backups as real operational risks.
When space is tight, find the biggest offenders quickly:
- du -xhd1 /var | sort -h
- journalctl --disk-usage (systemd journal can grow quietly)
- find /var/log -type f -size +200M -printf '%s %p\n' | sort -n | tail
Then add guardrails.
If logrotate exists but isn’t catching custom app logs, tune it and validate with a dry run.
This guide covers a reliable baseline: log rotation setup with logrotate and systemd.
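A simple threshold check is another possible guardrail. The sketch below assumes POSIX-format df output (`df -P`), where column 5 is the usage percentage and column 6 the mount point. disk_over is a hypothetical helper, and the threshold is yours to pick.

```shell
# Hypothetical helper: read `df -P` output on stdin and list mount points
# whose usage is at or above the given percentage.
disk_over() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6 " " use "%"
    }
  '
}

# Example call (commented out):
#   df -P | disk_over 90
```

Run from cron, with any output mailed or logged, this gives early warning before services start failing on writes.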
Email alerts, contact forms, and “my site can’t send mail” incidents
Website mail is deceptively messy. A working PHP mail() call doesn’t mean messages land in inboxes.
In 2026, providers rate-limit aggressively. They also junk mail that lacks aligned DNS and a clean sending identity.
If WordPress password resets and WooCommerce receipts stopped arriving, check:
- DNS auth: SPF, DKIM, DMARC
- Reverse DNS: rDNS matches your sending hostname
- Queue growth: Postfix queue filling indicates upstream blocks or auth failures
HostMyCode’s deliverability checklist is a solid reference: VPS email deliverability checklist for 2026.
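For the SPF part of the DNS check, one easy-to-miss failure is publishing two SPF records, which receivers treat as a permanent error under RFC 7208. A small sketch of that check follows; spf_status is a hypothetical helper that assumes one TXT record per line on stdin, as printed by dig with +short.

```shell
# Hypothetical helper: read TXT records (one per line) on stdin and report
# whether the domain publishes exactly one SPF record.
spf_status() {
  awk '
    /^"?v=spf1( |"|$)/ { n++ }
    END {
      if (n == 1)      print "ok"
      else if (n == 0) print "missing"
      else             print "multiple"   # more than one SPF record is a permerror
    }
  '
}

# Example call (commented out so the sketch makes no DNS query):
#   dig +short TXT example.com | spf_status
```

This only checks that a single record exists. It does not validate the mechanisms inside it, and DKIM and DMARC need their own lookups.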
If email is business-critical, treat sending as its own system with monitoring and clear DNS ownership.
“It worked on staging” doesn’t survive real-world reputation and rate limits.
Incidents after a migration: the 6 things that break most often
Migrations usually don’t fail because rsync missed files.
They fail because one small dependency stayed pointed at the old environment.
- DNS TTL wasn’t lowered: users keep hitting the old IP for hours.
- Firewall rules differ: new VPS blocks SMTP, Redis, or SSH from your office IP.
- Missing PHP extensions: the app loads but breaks under specific paths.
- Different default PHP settings: upload_max_filesize, memory_limit, max_execution_time.
- Wrong file ownership/permissions: cache and upload directories become unwritable.
- SSL automation not installed: cert renewals fail or the wrong vhost serves traffic.
If you want a battle-tested sequence for near-zero downtime moves, use this as a runbook: VPS migration checklist.
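The missing-extension case in particular is cheap to check before cutover. The sketch below assumes you have saved the output of php -m from each server to a file. missing_extensions is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: print modules present in the old server's `php -m` output
# but absent from the new server's. Section headers like "[PHP Modules]" are skipped.
missing_extensions() {
  old_list=$1
  new_list=$2
  awk '
    FILENAME == ARGV[1] { have[tolower($0)] = 1; next }
    $0 != "" && $0 !~ /^\[/ && !(tolower($0) in have)
  ' "$new_list" "$old_list"
}

# Example call (commented out; run `php -m > old-modules.txt` on the old VPS
# and `php -m > new-modules.txt` on the new one first):
#   missing_extensions old-modules.txt new-modules.txt
```

The same two-file comparison works for `php -i` output if you also want to diff settings such as memory_limit.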
Make your troubleshooting repeatable: lightweight monitoring beats guesswork
In the middle of an incident, “what changed?” matters more than “what do I see right now?”
Baseline monitoring belongs in the troubleshooting toolkit because it answers that question quickly.
At minimum, collect:
- Host metrics: CPU, load, RAM, swap, disk space, disk I/O wait
- Service checks: HTTP 200 on key endpoints, TLS expiry, DNS resolution
- Logs: Nginx/Apache errors, PHP-FPM errors, database errors
This HostMyCode post lays out a useful metric set and alert thresholds that map to real outages: Linux VPS performance monitoring in 2026.
You don’t need an elaborate stack to get value.
Alerts for “disk 90%,” “load 2× baseline,” and “HTTP 5xx spike” cover the failure paths this checklist keeps returning to: full disks, saturated hosts, and failing upstreams.
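As one example of how little tooling the 5xx alert needs, the sketch below computes an error rate from access log lines. It assumes Nginx's default combined log format, where the status code is the ninth whitespace-separated field. error_rate_5xx is a hypothetical helper.

```shell
# Hypothetical helper: read access log lines on stdin (combined format assumed)
# and report how many requests returned a 5xx status.
error_rate_5xx() {
  awk '
    $9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 >= 500) errors++ }
    END {
      if (total) { printf "%d/%d requests returned 5xx (%.1f%%)\n", errors, total, 100 * errors / total }
      else       { print "no requests parsed" }
    }
  '
}

# Example call (commented out; the log path may differ on your system):
#   tail -n 1000 /var/log/nginx/access.log | error_rate_5xx
```

Comparing the figure for the last thousand requests against a normal day gives a usable "spike" definition without a metrics stack.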
Summary: the order that saves time (and prevents repeat incidents)
A good VPS hosting troubleshooting checklist feels boring because it’s consistent.
Confirm VPS health. Verify services are reachable. Then isolate 502/504s across web, PHP, and database.
Only after that should you start tweaking configs.
If every incident feels ambiguous because the underlying platform is unpredictable, consider moving to infrastructure with clearer baselines and reachable support.
Start with a managed VPS hosting plan for production sites that need fast response, or deploy on a HostMyCode VPS when you want full control without sacrificing reliability.
If you run client sites or revenue-critical WordPress on a VPS, small outages add up fast. HostMyCode offers managed VPS hosting if you want a stable stack, predictable updates, and help during incidents, plus flexible HostMyCode VPS plans when you prefer to self-manage.
FAQ
What’s the fastest way to tell if a 502 is Nginx or PHP-FPM?
Check /var/log/nginx/error.log for “upstream” errors. Then check journalctl -u php8.3-fpm (or your PHP-FPM unit) for worker exhaustion, crashes, or slow script warnings.
My VPS load is high, but CPU usage is low. Why?
Look for high I/O wait (wa) and slow storage. Use iostat -xz 1. Database writes, backups, or log spikes can inflate load while the CPU mostly waits on disk.
How do I confirm DNS isn’t causing “random” outages?
Run dig A and dig AAAA from two networks. Confirm both point to valid, reachable IPs. A stale AAAA record is a common cause of intermittent failures for IPv6-preferred clients.
What should I do first if the disk is full?
Free space safely (large logs, old backups, runaway journals), then restart affected services. After recovery, fix the cause with logrotate rules, retention policies, and monitoring alerts.
When is it time to move from a VPS to a dedicated server?
If you consistently hit CPU/RAM limits, need higher sustained I/O, or want strict isolation for multiple high-traffic sites, a dedicated server removes contention and simplifies capacity planning.