Reverse Proxy & Load Balancing with Nginx

Why This Matters

In the real world, almost no production application is served directly by its application process. Instead, a reverse proxy sits in front, handling tasks that the application should not care about: TLS termination, load balancing, caching, compression, rate limiting, and connection management.

Here is a scenario you will encounter: your team runs three instances of a Node.js API on ports 3001, 3002, and 3003. Users should hit a single URL (https://api.example.com). If one instance crashes, traffic should seamlessly go to the other two. Response times are slow, so you want to cache certain endpoints. And you need to add rate limiting to prevent abuse. Nginx handles all of this with a few dozen lines of configuration.

This chapter covers the patterns that power nearly every production web deployment.


Try This Right Now

Let us set up a minimal reverse proxy. First, start a simple backend (Python's built-in HTTP server works perfectly):

# Terminal 1: Start a backend on port 8001
$ mkdir -p /tmp/backend1 && echo "Hello from Backend 1" > /tmp/backend1/index.html
$ cd /tmp/backend1 && python3 -m http.server 8001 &

# Terminal 2: Create an Nginx reverse proxy config
$ sudo tee /etc/nginx/sites-available/proxy-demo > /dev/null << 'EOF'
server {
    listen 80;
    server_name proxy-demo.local;

    location / {
        proxy_pass http://127.0.0.1:8001;
    }
}
EOF

$ sudo ln -sf /etc/nginx/sites-available/proxy-demo /etc/nginx/sites-enabled/
$ sudo nginx -t && sudo systemctl reload nginx

# Test it
$ curl -H "Host: proxy-demo.local" http://localhost
Hello from Backend 1

That single proxy_pass line turned Nginx into a reverse proxy. The client talks to Nginx; Nginx talks to the backend.


What Is a Reverse Proxy?

A forward proxy sits in front of clients (like a corporate proxy that employees use to access the internet). A reverse proxy sits in front of servers, and the client usually does not know it exists.

┌──────────────────────────────────────────────────────────────┐
│                     Forward Proxy                             │
│                                                              │
│  ┌────────┐    ┌───────────┐    ┌────────────┐              │
│  │ Client ├───>│  Proxy    ├───>│  Internet  │              │
│  │        │    │ (client's │    │  Servers   │              │
│  │        │<───┤  side)    │<───┤            │              │
│  └────────┘    └───────────┘    └────────────┘              │
│  Client KNOWS about the proxy                                │
├──────────────────────────────────────────────────────────────┤
│                     Reverse Proxy                             │
│                                                              │
│  ┌────────┐    ┌───────────┐    ┌────────────┐              │
│  │ Client ├───>│  Nginx    ├───>│  Backend   │              │
│  │        │    │ (server's │    │  App(s)    │              │
│  │        │<───┤  side)    │<───┤            │              │
│  └────────┘    └───────────┘    └────────────┘              │
│  Client has NO idea the backend exists                       │
└──────────────────────────────────────────────────────────────┘

Why use a reverse proxy?

  • Security -- the backend is never directly exposed to the internet
  • TLS termination -- Nginx handles HTTPS; the backend speaks plain HTTP
  • Load balancing -- distribute traffic across multiple backends
  • Caching -- serve cached responses without hitting the backend
  • Compression -- Nginx compresses responses, saving backend CPU
  • Connection management -- Nginx handles thousands of client connections while maintaining only a few to the backend
  • Rate limiting -- protect backends from abuse

The proxy_pass Directive

proxy_pass is the heart of Nginx's reverse proxy functionality.

location / {
    proxy_pass http://127.0.0.1:8001;
}

URI Handling: Trailing Slash Matters

This is one of the most common Nginx gotchas:

# WITHOUT trailing slash in proxy_pass:
# Request: /api/users
# Proxied to: http://backend:8001/api/users  (path preserved)
location /api/ {
    proxy_pass http://backend:8001;
}

# WITH trailing slash in proxy_pass:
# Request: /api/users
# Proxied to: http://backend:8001/users  (location prefix stripped!)
location /api/ {
    proxy_pass http://backend:8001/;
}

The rule: if proxy_pass has a URI component (even just /), Nginx replaces the matched location prefix with that URI. If it has no URI, the full original path is forwarded.
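The prefix-replacement rule can be sketched in a few lines of shell (purely illustrative -- this mimics the rule, it is not how Nginx implements it):

```shell
# map_uri mimics Nginx's proxy_pass URI handling (illustrative only)
map_uri() {
  uri=$1; loc=$2; pass_uri=$3
  if [ -n "$pass_uri" ]; then
    # proxy_pass has a URI part: the matched location prefix is replaced with it
    printf '%s%s\n' "$pass_uri" "${uri#"$loc"}"
  else
    # no URI part: the original path is forwarded untouched
    printf '%s\n' "$uri"
  fi
}

map_uri /api/users /api/ ""    # proxy_pass http://backend:8001;  -> /api/users
map_uri /api/users /api/ /     # proxy_pass http://backend:8001/; -> /users
```

Run both calls and you get /api/users and /users -- the same two results the configs above produce.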

Essential Proxy Headers

When Nginx proxies a request, the backend loses information about the original client. You need to pass it along with headers:

location / {
    proxy_pass http://127.0.0.1:8001;

    # Pass the original Host header
    proxy_set_header Host $host;

    # Pass the real client IP
    proxy_set_header X-Real-IP $remote_addr;

    # Pass the chain of proxies
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Tell the backend if the original request was HTTPS
    proxy_set_header X-Forwarded-Proto $scheme;

    # Timeouts
    proxy_connect_timeout 5s;       # Time to establish connection to backend
    proxy_send_timeout 10s;         # Time to send request to backend
    proxy_read_timeout 30s;         # Time to read response from backend
}

Without proxy_set_header Host, the backend receives Host: 127.0.0.1:8001 instead of Host: api.example.com. Without X-Real-IP, the backend sees all requests coming from Nginx's IP instead of the real client.

Think About It: Why is X-Forwarded-For a chain of IPs rather than a single IP? What happens when there are multiple proxies in front of the backend?
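One way to see the chain mechanics: each proxy appends the peer address it received the connection from, which is exactly what $proxy_add_x_forwarded_for does. A shell sketch (illustrative; the IPs are made up):

```shell
# append_xff mimics $proxy_add_x_forwarded_for: append the connecting
# peer's address to any existing X-Forwarded-For value
append_xff() {
  existing=$1; remote=$2
  if [ -n "$existing" ]; then
    printf '%s, %s\n' "$existing" "$remote"
  else
    printf '%s\n' "$remote"
  fi
}

hop1=$(append_xff ""      203.0.113.7)    # client connects to a CDN/proxy
hop2=$(append_xff "$hop1" 198.51.100.2)   # that proxy connects to Nginx
echo "$hop2"    # 203.0.113.7, 198.51.100.2
```

The leftmost entry is the original client; each later hop adds the address it actually saw.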


Upstream Blocks and Load Balancing

An upstream block defines a group of backend servers that Nginx can distribute traffic across.

Basic Round-Robin Load Balancing

upstream backend_pool {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://backend_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

By default, Nginx uses round-robin: requests go to each backend in order (1, 2, 3, 1, 2, 3, ...).

Hands-On: Set Up Load Balancing

Let us create a realistic multi-backend setup:

# Start three backends
$ for port in 8001 8002 8003; do
    dir="/tmp/backend${port}"
    mkdir -p "$dir"
    echo "Response from backend on port ${port}" > "$dir/index.html"
    (cd "$dir" && python3 -m http.server "$port") &
  done

# Verify they are running
$ curl http://localhost:8001
$ curl http://localhost:8002
$ curl http://localhost:8003

Now define the load balancer configuration:

# /etc/nginx/sites-available/loadbalancer
upstream app_backends {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;
    server_name lb.local;

    location / {
        proxy_pass http://app_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Enable the site and reload Nginx:

$ sudo ln -sf /etc/nginx/sites-available/loadbalancer /etc/nginx/sites-enabled/
$ sudo nginx -t && sudo systemctl reload nginx

# Send 6 requests and watch round-robin in action
$ for i in $(seq 1 6); do
    curl -s -H "Host: lb.local" http://localhost
  done

Expected output:

Response from backend on port 8001
Response from backend on port 8002
Response from backend on port 8003
Response from backend on port 8001
Response from backend on port 8002
Response from backend on port 8003

Load Balancing Algorithms

Round-Robin (Default)

Distributes requests evenly in order. Simple and effective when backends have equal capacity.

upstream backend {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

Weighted Round-Robin

Give more traffic to more powerful servers:

upstream backend {
    server 10.0.1.10:8001 weight=5;    # Gets 5x the traffic
    server 10.0.1.11:8001 weight=3;    # Gets 3x the traffic
    server 10.0.1.12:8001 weight=1;    # Gets 1x the traffic
}

Out of every 9 requests: 5 go to server 1, 3 go to server 2, 1 goes to server 3.
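Nginx spreads those weighted picks out with a smooth weighted round-robin, so the 5 requests to server 1 are interleaved with the others rather than sent back-to-back. A sketch of that algorithm in awk (illustrative, not Nginx source code):

```shell
awk 'BEGIN {
  w[0] = 5; w[1] = 3; w[2] = 1; total = 9
  for (n = 1; n <= 9; n++) {
    best = 0
    for (i = 0; i < 3; i++) {
      cur[i] += w[i]                       # every server gains its weight
      if (cur[i] > cur[best]) best = i     # pick the current leader
    }
    cur[best] -= total                     # the leader pays the total weight
    picks = picks best " "
    count[best]++
  }
  print "pick order:", picks
  for (i = 0; i < 3; i++) printf "server %d: %d of 9 requests\n", i, count[i]
}'
```

The pick order comes out 0 1 0 2 0 1 0 1 0 -- the 5/3/1 ratio holds, but no server is hammered consecutively.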

Least Connections

Send the request to the backend with the fewest active connections. Best when request processing time varies:

upstream backend {
    least_conn;
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

IP Hash

Always send the same client IP to the same backend. Useful for applications that store session state locally:

upstream backend {
    ip_hash;
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

Safety Warning: ip_hash provides "sticky sessions" but has drawbacks. If one backend fails, all sessions pinned to it are disrupted. And if many users share an IP (corporate NAT), one backend gets overloaded. Prefer stateless application design when possible.

Hash (Generic)

Hash on any variable for consistent routing:

upstream backend {
    hash $request_uri consistent;     # Same URL always goes to same backend
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

The consistent keyword uses a consistent hashing ring, which minimizes redistribution when backends are added or removed.
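The core idea -- same input, same backend -- is easy to demonstrate with a plain modulo hash (illustrative only; this is NOT the consistent ring, which exists precisely because a modulo reshuffles almost every key when the server count changes):

```shell
# Pick a backend by hashing the URI (simple modulo, for illustration only)
pick_backend() {
  uri=$1; n=$2
  h=$(printf '%s' "$uri" | cksum | cut -d' ' -f1)
  echo "backend $(( h % n ))"
}

pick_backend /api/users 3
pick_backend /api/users 3     # same URI -> same backend, every time
pick_backend /api/users 4     # change the server count: the mapping can move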

Comparison

┌──────────────────────────────────────────────────────────────┐
│              Load Balancing Algorithm Comparison               │
├────────────────┬─────────────────────────────────────────────┤
│ Round-Robin    │ Simple, even distribution, best default      │
│ Weighted       │ When backends have different capacities      │
│ Least Conn     │ When request duration varies widely          │
│ IP Hash        │ When sessions must be sticky (not ideal)     │
│ Hash URI       │ When caching per-URL on specific backends    │
└────────────────┴─────────────────────────────────────────────┘

Health Checks and Failure Detection

Nginx has built-in passive health checks. If a backend fails, Nginx temporarily removes it from the pool.

Passive Health Checks (Open Source Nginx)

upstream backend {
    server 10.0.1.10:8001 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8001 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8001 max_fails=3 fail_timeout=30s;
}

  • max_fails=3 -- after 3 failed requests within the fail_timeout window, mark the server as down
  • fail_timeout=30s -- keep it marked as down for 30 seconds, then try again

What counts as a failure? By default, a connection error or a timeout. HTTP error responses count only if you list them with proxy_next_upstream:

location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503;
    proxy_next_upstream_tries 2;      # Try at most 2 other backends
    proxy_next_upstream_timeout 10s;  # Give up after 10s total
}

Marking a Server as Down

upstream backend {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001 down;       # Temporarily removed from rotation
    server 10.0.1.13:8001 backup;     # Only used if all others are down
}

Think About It: Why is passive health checking imperfect? (Answer: it only detects failure when a real user request fails. Active health checks -- available in Nginx Plus or HAProxy -- proactively probe backends so failures are detected before any user is affected.)
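Open-source Nginx cannot probe backends on its own, but a crude external prober is only a few lines of shell (a sketch; the /health path and IPs are hypothetical, and real setups use Nginx Plus's health_check or a dedicated load balancer instead):

```shell
# Probe each backend's (hypothetical) /health endpoint once; run from cron
# or a loop. Both branches print, so you always get one line per backend.
for host in 10.0.1.10 10.0.1.11 10.0.1.12; do
  if curl -fsS --max-time 2 "http://$host:8001/health" > /dev/null 2>&1; then
    echo "$host: healthy"
  else
    echo "$host: FAILED"   # here you could alert, or mark the server 'down'
  fi
done
```

This still requires a config edit and reload to actually remove a backend -- which is exactly the gap Nginx Plus's active checks close.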


Caching with proxy_cache

Nginx can cache backend responses, dramatically reducing backend load and improving response times.

# Define a cache zone in the http context
http {
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
                     max_size=1g inactive=60m use_temp_path=off;
}

Parameters explained:

  • /var/cache/nginx -- where cache files are stored on disk
  • levels=1:2 -- two-level directory structure (prevents too many files in one dir)
  • keys_zone=my_cache:10m -- 10 MB of shared memory for cache keys
  • max_size=1g -- maximum total cache size on disk
  • inactive=60m -- remove items not accessed for 60 minutes
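Where do cached responses actually live? Nginx names each cache file after the MD5 of the cache key (by default $scheme$proxy_host$request_uri) and builds the directory levels from the last characters of that hash. A sketch of the mapping (the key string here is a hypothetical example):

```shell
# Reconstruct the on-disk path for one cache entry under levels=1:2
key='httpbackend_pool/api/users'        # $scheme$proxy_host$request_uri
md5=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)

l1=$(echo "$md5" | cut -c32)            # levels=1:2 -> last hex char is dir 1
l2=$(echo "$md5" | cut -c30-31)         # ...then the two chars before it
echo "/var/cache/nginx/$l1/$l2/$md5"
```

The levels=1:2 split keeps any single directory from accumulating millions of files.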

Using the Cache

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://backend_pool;
        proxy_cache my_cache;
        proxy_cache_valid 200 10m;         # Cache 200 responses for 10 min
        proxy_cache_valid 404 1m;          # Cache 404 responses for 1 min
        proxy_cache_use_stale error timeout updating http_500 http_502;

        # Add header so you can see cache status
        add_header X-Cache-Status $upstream_cache_status;
    }
}

The X-Cache-Status header tells you whether the response came from cache:

$ curl -I -H "Host: api.example.com" http://localhost
X-Cache-Status: MISS       # First request, not cached yet

$ curl -I -H "Host: api.example.com" http://localhost
X-Cache-Status: HIT        # Served from cache!

Possible values: MISS, HIT, EXPIRED, STALE, UPDATING, BYPASS.

Bypassing the Cache

Sometimes you need to skip the cache:

location / {
    proxy_pass http://backend_pool;
    proxy_cache my_cache;

    # Don't cache POST requests
    proxy_cache_methods GET HEAD;

    # Bypass cache if the client sends a specific header
    proxy_cache_bypass $http_x_no_cache;

    # Don't cache responses the backend marks with Set-Cookie
    proxy_no_cache $upstream_http_set_cookie;
}

WebSocket Proxying

WebSocket connections start as HTTP and then upgrade to a persistent bidirectional connection. Nginx can proxy them, but you need to handle the upgrade:

location /ws/ {
    proxy_pass http://websocket_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;

    # WebSocket connections are long-lived
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}

The critical lines are Upgrade and Connection -- without them, the WebSocket handshake fails and you get a 400 error.

┌──────────────────────────────────────────────────────────────┐
│                     WebSocket Proxy Flow                     │
│                                                              │
│  Client                 Nginx                  Backend       │
│    │                       │                      │          │
│    │── GET /ws/ HTTP/1.1 ─>│                      │          │
│    │   Upgrade: websocket  │─ GET /ws/ HTTP/1.1 ─>│          │
│    │   Connection: Upgrade │  Upgrade: websocket  │          │
│    │                       │  Connection: upgrade │          │
│    │                       │                      │          │
│    │<── 101 Switching ─────│<── 101 Switching ────│          │
│    │                       │                      │          │
│    │<══ Bidirectional ════>│<══ Bidirectional ═══>│          │
│    │     WebSocket data    │    WebSocket data    │          │
└──────────────────────────────────────────────────────────────┘
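The config above hardcodes Connection "upgrade", which is fine for a WebSocket-only location. The pattern from the Nginx documentation uses a map so that ordinary HTTP requests through the same location do not carry a stray upgrade header:

```nginx
# In the http context: derive the Connection header from the client's
# Upgrade header -- "upgrade" for WebSocket handshakes, "close" otherwise
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then, in the location block, use the mapped value:
#     proxy_set_header Connection $connection_upgrade;
```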

SSL/TLS Termination

SSL termination means Nginx handles HTTPS while talking to backends over plain HTTP. This simplifies backend configuration and consolidates certificate management.

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    # Modern TLS configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    # HSTS (tell browsers to always use HTTPS)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    location / {
        proxy_pass http://backend_pool;           # Plain HTTP to backend
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;   # Tells backend "this was HTTPS"
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}

Rate Limiting

Protect your backends from abuse or accidental overload:

# Define rate limit zones in the http context
http {
    # 10 requests per second per client IP, 10MB zone
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    # Connection limit: max 20 simultaneous connections per IP
    limit_conn_zone $binary_remote_addr zone=conn_limit:10m;
}

server {
    listen 80;
    server_name api.example.com;

    location /api/ {
        # Allow bursts of 20 requests, then enforce the rate
        limit_req zone=api_limit burst=20 nodelay;
        limit_conn conn_limit 20;

        # Return 429 instead of 503 when rate limited
        limit_req_status 429;
        limit_conn_status 429;

        proxy_pass http://backend_pool;
    }
}

  • rate=10r/s -- 10 requests per second per client IP
  • burst=20 -- allow up to 20 requests to queue up
  • nodelay -- process burst requests immediately rather than throttling them
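Under the hood, limit_req is a leaky bucket: each request adds one to an "excess" counter that drains at the configured rate, and a request is rejected once excess passes burst. A sketch of 30 simultaneous requests against rate=10r/s burst=20 (illustrative; real counts shift with request timing):

```shell
awk 'BEGIN {
  burst = 20; excess = 0; ok = 0; rejected = 0
  for (r = 1; r <= 30; r++) {
    if (r > 1) excess += 1        # requests arrive together: nothing drains
    if (excess <= burst) ok++     # within burst: accepted (nodelay: no queueing delay)
    else rejected++               # over burst: 429
  }
  printf "%d accepted, %d rejected\n", ok, rejected
}'
```

The first request plus the burst of 20 get through, so an instantaneous flood of 30 sees about 21 accepted -- give or take one, depending on how much time really elapses between requests.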

Hands-On: Testing Rate Limiting

# Send 30 rapid requests
$ for i in $(seq 1 30); do
    curl -s -o /dev/null -w "%{http_code} " -H "Host: api.example.com" http://localhost/api/
  done
$ echo ""

# You should see roughly this (the exact split can vary by a request or two):
# 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 429 429 429 429 429 429 429 429 429 429

Practical: Complete Multi-Backend Setup

Here is a production-style configuration that ties everything together:

# /etc/nginx/sites-available/production-app

upstream api_backends {
    least_conn;
    server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
}

upstream websocket_backends {
    ip_hash;
    server 10.0.1.10:3001;
    server 10.0.1.11:3001;
    server 10.0.1.12:3001;
}

# Cache zone
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m
                 max_size=500m inactive=30m;

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_rate:10m rate=20r/s;

server {
    listen 443 ssl http2;
    server_name app.example.com;

    # TLS
    ssl_certificate /etc/letsencrypt/live/app.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Strict-Transport-Security "max-age=31536000" always;
    server_tokens off;

    # Static files (served directly by Nginx)
    location /static/ {
        alias /var/www/app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # API endpoints (proxied, cached, rate-limited)
    location /api/ {
        limit_req zone=api_rate burst=40 nodelay;

        proxy_pass http://api_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Cache GET requests
        proxy_cache api_cache;
        proxy_cache_valid 200 5m;
        proxy_cache_methods GET HEAD;
        proxy_cache_bypass $http_authorization;
        add_header X-Cache-Status $upstream_cache_status;

        # Retry on backend failure
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

    # WebSocket endpoint
    location /ws/ {
        proxy_pass http://websocket_backends;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
    }

    # Health check endpoint (no proxy, instant response)
    location = /health {
        access_log off;
        return 200 "OK\n";
    }
}

# HTTP -> HTTPS redirect
server {
    listen 80;
    server_name app.example.com;
    return 301 https://$host$request_uri;
}

Debug This

Users report intermittent 502 Bad Gateway errors. The application team says all backends are healthy.

# Step 1: Check the Nginx error log for upstream failures
$ sudo tail -50 /var/log/nginx/error.log | grep -i upstream
upstream prematurely closed connection while reading response header

# Step 2: Check backend connectivity from the Nginx server
$ curl http://10.0.1.10:3000/health
curl: (7) Failed to connect to 10.0.1.10 port 3000: Connection refused

# Step 3: Check upstream response times in the access log
# (requires a log_format that records $upstream_response_time)
$ awk '{print $NF}' /var/log/nginx/access.log | sort -n | tail -20
# If you see values like "urt=30.000", backends are hitting proxy_read_timeout

# Step 4: Check whether failures cluster on one backend
$ grep 'upstream' /var/log/nginx/error.log | grep -oP 'upstream: "\K[^"]+' | sort | uniq -c
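The urt= values in Step 3 assume the access log records upstream timings; the default combined format does not include them. A log_format along these lines (the name upstream_time is arbitrary) makes 502 debugging far easier:

```nginx
# In the http context: log upstream connect/response times per request
log_format upstream_time '$remote_addr "$request" $status '
                         'uct=$upstream_connect_time urt=$upstream_response_time '
                         'upstream=$upstream_addr';
access_log /var/log/nginx/access.log upstream_time;
```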

Common causes:

  • Backend process crashed or is not listening
  • Backend is too slow and Nginx's proxy_read_timeout is exceeded
  • Backend closes the connection before sending a response (keepalive mismatch)
  • OS-level connection limits reached (check ulimit -n and ss -s)

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                     Chapter 45 Recap                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  A reverse proxy sits in front of backends, handling TLS,    │
│  load balancing, caching, and rate limiting.                  │
│                                                              │
│  Key directives:                                              │
│  - proxy_pass: forward requests to a backend                  │
│  - upstream { }: define a pool of backend servers             │
│  - proxy_cache: cache backend responses                       │
│  - limit_req: rate-limit requests                             │
│                                                              │
│  Load balancing algorithms:                                   │
│  - round-robin (default), weighted, least_conn, ip_hash      │
│                                                              │
│  Always set these proxy headers:                              │
│  - Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto      │
│                                                              │
│  For WebSockets: set Upgrade and Connection headers           │
│  For TLS: terminate at Nginx, plain HTTP to backends          │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Weighted Load Balancing

Start three backends, give them different weights (5, 3, 1). Send 90 requests and count how many each backend receives. Does it match the expected ratio?

Exercise 2: Failure Detection

Set up three backends with max_fails=2 fail_timeout=15s. Kill one backend. Send requests and observe that Nginx stops sending traffic to the dead backend. Start it again and verify it rejoins the pool.

Exercise 3: Caching

Configure proxy_cache for a backend. Send the same request five times. Use the X-Cache-Status header to verify that the first request is a MISS and subsequent requests are HITs. Then add proxy_cache_bypass $http_cache_control to the location, send curl -H "Cache-Control: no-cache", and observe a BYPASS.

Exercise 4: Rate Limiting

Set a rate limit of 2 requests per second with burst=5. Use a bash loop to send 20 rapid requests. Count how many succeed (200) and how many are rate-limited (429).

Bonus Challenge

Set up Nginx as a reverse proxy for two different applications on the same domain: /api/ goes to a Node.js (or Python) backend, and / serves a static React/Vue build. Add caching for the API and long-lived cache headers for static assets. This is the most common production pattern you will encounter.


What Comes Next

Nginx is excellent for many scenarios, but it is not the only option. The next chapter covers Apache -- the original web server that is still widely used -- and helps you understand when to choose one over the other.