
Auto Scaling Architecture Overview
Linux VPS auto scaling creates responsive infrastructure that grows and shrinks with actual demand. For example, the system can deploy additional server instances automatically when CPU usage hits 75%, then terminate them once load stays below 30% for ten minutes.
This tutorial builds a complete auto scaling system using three core components:
- Ansible for server provisioning and configuration management
- NGINX as the load balancer distributing traffic across instances
- Custom monitoring scripts that trigger scaling decisions
The setup handles WordPress sites, Node.js applications, or any web workload. HostMyCode VPS instances provide the consistent performance needed for reliable auto scaling behavior.
Prerequisites and Environment Setup
You need three Ubuntu 24.04 servers to start:
- Load balancer node (2 GB RAM minimum)
- Primary application server (baseline capacity)
- Template server for cloning new instances
Install Ansible on your control machine. Note that Ubuntu 24.04's system Python is externally managed (PEP 668), so the pip packages need either a virtual environment or the --break-system-packages flag:
sudo apt update
sudo apt install ansible python3-pip
pip3 install --break-system-packages ansible-runner requests
Create the project structure:
mkdir vps-autoscaler
cd vps-autoscaler
mkdir -p playbooks inventory group_vars scripts templates
Your hosting provider API credentials go in group_vars/all.yml. This tutorial assumes you can programmatically create and destroy VPS instances through REST calls.
NGINX Load Balancer Configuration
Configure NGINX to distribute traffic across your server pool. Create templates/nginx-upstream.conf.j2:
upstream app_servers {
    least_conn;
{% for server in active_servers %}
    server {{ server.ip }}:80 max_fails=3 fail_timeout=30s;
{% endfor %}
    server {{ primary_server_ip }}:80 backup;
}

server {
    listen 80;
    server_name {{ domain_name }};

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 5s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
    }

    location /health {
        access_log off;
        add_header Content-Type text/plain;
        return 200 "healthy";
    }
}
The least_conn directive sends new requests to the server handling the fewest active connections. Failed servers get marked unavailable for 30 seconds after three consecutive failures.
Deploy this configuration with the playbook playbooks/update-loadbalancer.yml:
---
- hosts: loadbalancer
  tasks:
    - name: Generate NGINX upstream config
      template:
        src: nginx-upstream.conf.j2
        dest: /etc/nginx/sites-available/app
      notify: reload nginx

    - name: Enable site configuration
      file:
        src: /etc/nginx/sites-available/app
        dest: /etc/nginx/sites-enabled/app
        state: link
      notify: reload nginx

  handlers:
    - name: reload nginx
      systemd:
        name: nginx
        state: reloaded
Server Provisioning Playbook
Ansible handles the entire server lifecycle. Create playbooks/provision-server.yml to standardize new instance setup:
---
- hosts: localhost
  tasks:
    - name: Create new VPS instance
      uri:
        url: "{{ provider_api_url }}/servers"
        method: POST
        headers:
          Authorization: "Bearer {{ api_token }}"
        body_format: json
        body:
          name: "app-{{ ansible_date_time.epoch }}"
          image: "ubuntu-24.04"
          size: "{{ server_size }}"
          region: "{{ server_region }}"
      register: new_server

    - name: Wait for server to boot
      uri:
        url: "{{ provider_api_url }}/servers/{{ new_server.json.id }}"
        headers:
          Authorization: "Bearer {{ api_token }}"
      register: server_status
      until: server_status.json.status == "active"
      retries: 30
      delay: 10

    - name: Add server to inventory
      add_host:
        hostname: "{{ new_server.json.networks.v4[0].ip_address }}"
        groups: new_servers
        server_id: "{{ new_server.json.id }}"

- hosts: new_servers
  tasks:
    - name: Wait for SSH to be available
      wait_for_connection:
        timeout: 300

    - name: Install application dependencies
      apt:
        name:
          - nginx
          - nodejs
          - npm
          - htop
        state: present
        update_cache: yes

    - name: Deploy application code
      synchronize:
        src: "{{ app_source_path }}/"
        dest: /var/www/app/
      notify: restart app

    - name: Configure application service
      template:
        src: app.service.j2
        dest: /etc/systemd/system/app.service
      notify:
        - reload systemd
        - restart app

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart app
      systemd:
        name: app
        state: restarted
        enabled: yes
This playbook creates a server through your hosting provider's API, waits for it to become available, then configures the complete application stack.
Resource Monitoring Script
Create scripts/monitor-load.py to collect metrics from all servers and make scaling decisions:
#!/usr/bin/env python3
import json
import subprocess
import time
from datetime import datetime


class AutoScaler:
    def __init__(self, config_file):
        with open(config_file, 'r') as f:
            self.config = json.load(f)
        self.metrics_history = []

    def get_server_metrics(self, server_ip):
        try:
            # Pass the remote command as a single argument so quoting survives
            cmd = ['ssh', server_ip, 'cat /proc/loadavg; free -m; df -h /']
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            lines = result.stdout.strip().split('\n')
            load_avg = float(lines[0].split()[0])
            mem_line = [l for l in lines if 'Mem:' in l][0].split()
            mem_used_percent = (int(mem_line[2]) / int(mem_line[1])) * 100
            return {
                'timestamp': datetime.now(),
                'load_avg': load_avg,
                'memory_percent': mem_used_percent,
                'server_ip': server_ip
            }
        except Exception as e:
            print(f"Failed to get metrics from {server_ip}: {e}")
            return None

    def should_scale_up(self):
        if len(self.metrics_history) < 3:
            return False
        recent_metrics = self.metrics_history[-3:]
        avg_load = sum(m['load_avg'] for m in recent_metrics) / len(recent_metrics)
        return avg_load > self.config['scale_up_threshold']

    def should_scale_down(self):
        if len(self.metrics_history) < 6:
            return False
        recent_metrics = self.metrics_history[-6:]
        avg_load = sum(m['load_avg'] for m in recent_metrics) / len(recent_metrics)
        return avg_load < self.config['scale_down_threshold']

    def scale_up(self):
        print("Scaling up - creating new server")
        subprocess.run(['ansible-playbook', 'playbooks/provision-server.yml'])
        time.sleep(30)
        self.update_load_balancer()

    def scale_down(self):
        if len(self.config['active_servers']) <= self.config['min_servers']:
            return
        server_to_remove = self.config['active_servers'][-1]
        print(f"Scaling down - removing {server_to_remove}")
        subprocess.run(['ansible-playbook', 'playbooks/decommission-server.yml',
                        '-e', f'target_server={server_to_remove}'])
        self.update_load_balancer()

    def update_load_balancer(self):
        subprocess.run(['ansible-playbook', 'playbooks/update-loadbalancer.yml'])

    def run(self):
        # Poll every 30 seconds and act on the rolling load averages
        while True:
            for ip in self.config['active_servers']:
                metrics = self.get_server_metrics(ip)
                if metrics:
                    self.metrics_history.append(metrics)
            if self.should_scale_up():
                self.scale_up()
            elif self.should_scale_down():
                self.scale_down()
            time.sleep(30)


if __name__ == '__main__':
    AutoScaler('config.json').run()
The monitoring script checks server load every 30 seconds. It triggers scale-up when average load exceeds 2.0 across three consecutive checks. It triggers scale-down when load stays below 0.5 for six consecutive checks.
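The AutoScaler class loads its settings from a JSON file. A minimal config.json mirroring the values in group_vars/all.yml might look like this (the filename, and the active_servers list that the scaler maintains itself, are assumptions of this tutorial's setup):

```json
{
  "scale_up_threshold": 2.0,
  "scale_down_threshold": 0.5,
  "min_servers": 2,
  "active_servers": ["10.0.1.100", "10.0.1.101"]
}
```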
Configuration Management
Create group_vars/all.yml for your scaling parameters:
---
api_token: "your-hosting-provider-token"
provider_api_url: "https://api.yourprovider.com/v1"
server_size: "s-2vcpu-4gb"
server_region: "nyc1"
min_servers: 2
max_servers: 8
scale_up_threshold: 2.0
scale_down_threshold: 0.5
domain_name: "your-app.com"
primary_server_ip: "10.0.1.100"
app_source_path: "/home/deploy/app"
The inventory/hosts file tracks your current server pool:
[loadbalancer]
10.0.1.10
[app_servers]
10.0.1.100
10.0.1.101
[all:vars]
ansible_user=root
ansible_ssh_private_key_file=~/.ssh/vps_key
Keep the inventory/hosts file in sync each time servers are added or removed; the monitoring script handles this automatically as part of each scaling action.
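One way the monitoring script can keep the inventory current is by rewriting inventory/hosts from its server list after each scaling action — a minimal sketch (the helper name and signature are illustrative, not part of the tutorial's codebase):

```python
def render_inventory(lb_ip, app_ips, user="root", key="~/.ssh/vps_key"):
    """Render an Ansible INI inventory matching the layout above."""
    lines = ["[loadbalancer]", lb_ip, "", "[app_servers]", *app_ips,
             "", "[all:vars]", f"ansible_user={user}",
             f"ansible_ssh_private_key_file={key}"]
    return "\n".join(lines) + "\n"

print(render_inventory("10.0.1.10", ["10.0.1.100", "10.0.1.101"]))
```

The scaler would write this string to inventory/hosts before invoking any playbooks.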
Health Checks and Failover
Open-source NGINX health-checks backends passively: the max_fails and fail_timeout settings in the upstream block mark a server down after repeated failed requests. Configure more sophisticated active health monitoring in playbooks/setup-health-checks.yml:
---
- hosts: app_servers
  tasks:
    - name: Create health check endpoint
      copy:
        content: |
          #!/usr/bin/env python3
          import json
          import subprocess
          import sys
          import time

          def check_app_health():
              try:
                  # Check if application process is running
                  result = subprocess.run(['pgrep', '-f', 'node.*app'],
                                          capture_output=True, text=True)
                  if not result.stdout.strip():
                      return False
                  # Check database connectivity
                  db_check = subprocess.run(['nc', '-z', 'localhost', '3306'],
                                            capture_output=True)
                  return db_check.returncode == 0
              except Exception:
                  return False

          if __name__ == '__main__':
              if check_app_health():
                  print(json.dumps({'status': 'healthy', 'timestamp': time.time()}))
                  sys.exit(0)
              else:
                  print(json.dumps({'status': 'unhealthy', 'timestamp': time.time()}))
                  sys.exit(1)
        dest: /usr/local/bin/health-check
        mode: '0755'

    - name: Configure health check service
      copy:
        content: |
          [Unit]
          Description=Application Health Check

          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/health-check

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/health-check.service

    - name: Configure health check timer
      copy:
        content: |
          [Unit]
          Description=Run health check every 30 seconds
          Requires=health-check.service

          [Timer]
          OnCalendar=*:*:0/30
          Persistent=true

          [Install]
          WantedBy=timers.target
        dest: /etc/systemd/system/health-check.timer

    - name: Enable and start health check timer
      systemd:
        name: health-check.timer
        enabled: yes
        state: started
        daemon_reload: yes
Servers that fail health checks get automatically removed from the load balancer rotation within 30 seconds. The auto scaling system detects reduced capacity and provisions replacement instances.
Automated Deployment Integration
Integrate scaling with your deployment pipeline. Create playbooks/rolling-deployment.yml:
---
- hosts: app_servers
  serial: 1
  tasks:
    - name: Remove server from load balancer
      uri:
        url: "http://{{ loadbalancer_ip }}/admin/upstream/{{ inventory_hostname }}"
        method: POST
        body: "down=1"
      delegate_to: localhost

    - name: Wait for connections to drain
      wait_for:
        timeout: 30

    - name: Deploy new application version
      synchronize:
        src: "{{ app_source_path }}/"
        dest: /var/www/app/
        delete: yes
      notify: restart app

    - name: Verify application health
      uri:
        url: "http://{{ inventory_hostname }}/health"
      register: health_check
      until: health_check.status == 200
      retries: 10
      delay: 5

    - name: Add server back to load balancer
      uri:
        url: "http://{{ loadbalancer_ip }}/admin/upstream/{{ inventory_hostname }}"
        method: POST
        body: "down=0"
      delegate_to: localhost

  handlers:
    - name: restart app
      systemd:
        name: app
        state: restarted
This approach deploys to one server at a time, ensuring zero-downtime updates even during scaling events. Note that the /admin/upstream endpoint assumes your load balancer exposes a dynamic upstream API (stock open-source NGINX does not); adapt those two tasks to however your setup toggles backends.
Cost Optimization and Monitoring
Track auto scaling costs and performance. Add cost monitoring to scripts/cost-tracker.py:
#!/usr/bin/env python3
from datetime import datetime


def calculate_hourly_costs(servers, hourly_rate=0.024):
    total_cost = 0
    for server in servers:
        uptime_hours = (datetime.now() - server['created']).total_seconds() / 3600
        server_cost = uptime_hours * hourly_rate
        total_cost += server_cost
        print(f"Server {server['name']}: ${server_cost:.2f}")
    return total_cost


def get_performance_metrics():
    # Collect response times, throughput, error rates
    metrics = {
        'avg_response_time': get_avg_response_time(),
        'requests_per_second': get_rps(),
        'error_rate': get_error_rate()
    }
    return metrics


def optimize_instance_sizes():
    # Analyze resource utilization patterns
    # Recommend right-sizing opportunities
    pass
Run this script hourly to track spending patterns, and look for servers consistently running below 20% CPU utilization; they indicate oversized instances.
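The 20% rule above can be expressed as a small helper that flags right-sizing candidates (the function name and sample values are illustrative):

```python
def underutilized(cpu_samples, threshold=20.0):
    """Flag a server whose average CPU utilization sits below the threshold."""
    return sum(cpu_samples) / len(cpu_samples) < threshold

# A server averaging ~13% CPU over three hourly samples is a downsizing candidate
print(underutilized([12.0, 18.5, 9.0]))  # True
```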
For production deployments, managed VPS hosting provides the monitoring, backup, and support infrastructure needed for reliable auto scaling operations.
Testing and Validation
Test your auto scaling setup with controlled load:
# Install load testing tools
sudo apt install apache2-utils wrk
# Generate baseline load
ab -n 10000 -c 50 http://your-app.com/
# Monitor scaling decisions
tail -f /var/log/autoscaler.log
# Verify new servers join load balancer
curl http://loadbalancer-ip/nginx_status
Gradually increase concurrent connections until you trigger scale-up events. Verify that new servers appear in the NGINX upstream configuration within two minutes.
Test scale-down by stopping the load generator. Excess servers should terminate after the configured cooldown period (typically 10 minutes).
Common Troubleshooting Steps
Debug scaling issues systematically:
Servers won't scale up:
- Check API credentials and quota limits
- Verify load threshold calculations in monitoring logs
- Ensure Ansible playbooks have correct permissions
New servers don't receive traffic:
- Confirm health checks pass on new instances
- Check NGINX upstream configuration updates
- Verify firewall rules allow load balancer connections
Scaling oscillation (rapid up/down cycles):
- Increase cooldown periods between scaling decisions
- Adjust load thresholds to create wider decision bands
- Review metric collection frequency and averaging windows
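A simple way to increase the cooldown period is a guard that blocks a second scaling action until a fixed interval has elapsed — a minimal sketch (the class name and interval are illustrative, not part of the tutorial's codebase):

```python
class CooldownGuard:
    """Allow a scaling action only if the cooldown has elapsed since the last one."""

    def __init__(self, cooldown_seconds=600):
        self.cooldown = cooldown_seconds
        self.last_action = None

    def allow(self, now):
        # `now` is a monotonic timestamp in seconds (e.g. time.monotonic())
        if self.last_action is None or now - self.last_action >= self.cooldown:
            self.last_action = now
            return True
        return False

guard = CooldownGuard(600)
print(guard.allow(0))    # True: the first action is always allowed
print(guard.allow(300))  # False: still inside the 10-minute cooldown
print(guard.allow(700))  # True: cooldown elapsed
```

The monitoring script would consult one guard instance before each scale_up or scale_down call.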
The performance monitoring guide covers additional metrics collection and alerting strategies that complement auto scaling.
Ready to implement auto scaling for your applications? HostMyCode VPS instances provide the consistent performance and API access needed for reliable auto scaling. Our managed VPS hosting includes monitoring, automated backups, and 24/7 support to keep your scaled infrastructure running smoothly.
Frequently Asked Questions
Q: How quickly can auto scaling respond to traffic spikes?
A: New servers typically provision and join the load balancer pool within 2-3 minutes. For faster response, maintain warm standby instances that activate immediately.
Q: What's the minimum server count for reliable auto scaling?
A: Keep at least 2 baseline servers running at all times. This prevents single points of failure and provides immediate capacity for traffic handling during scaling events.
Q: Can auto scaling work with stateful applications?
A: Yes, but you need session affinity or external session storage. Configure NGINX with ip_hash directive or use Redis/database-backed sessions to maintain state across scaled instances.
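For example, session affinity with open-source NGINX replaces least_conn with ip_hash in the upstream block (the two balancing methods are mutually exclusive; IPs shown are from this tutorial's example pool):

```nginx
upstream app_servers {
    ip_hash;  # route each client IP to the same backend
    server 10.0.1.100:80;
    server 10.0.1.101:80;
}
```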
Q: How do you prevent runaway scaling costs?
A: Set strict max_servers limits, implement cost alerts, and use proper cooldown periods. Monitor your scaling patterns weekly and adjust thresholds based on actual traffic patterns.
Q: What happens if the monitoring script fails?
A: Run multiple monitoring instances with leader election, or configure external monitoring services. Always design scaling systems to fail safely by maintaining minimum capacity.