HTTP Protocol Essentials
Why This Matters
Every time you open a browser, call an API, download a package with apt or dnf, or deploy a web application, you are using HTTP. It is the protocol that glues the web together. If you are going to manage web servers, set up reverse proxies, debug application issues, or secure web traffic, you need to understand HTTP at a level deeper than "it shows web pages."
Consider this real scenario: your company's API is returning intermittent 502 errors to users. The developers say "the app is fine." The load balancer logs show upstream timeouts. Without understanding HTTP status codes, headers, connection behavior, and the difference between the client, proxy, and backend, you will be guessing in the dark. This chapter gives you the foundation to diagnose and reason about every web request that flows through your infrastructure.
Try This Right Now
If you have any Linux system with curl installed (it is available on virtually every distribution), run this:
$ curl -v http://example.com 2>&1 | head -30
You should see something like:
* Trying 93.184.216.34:80...
* Connected to example.com (93.184.216.34) port 80
> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1256
< Cache-Control: max-age=604800
Those lines starting with > are what your machine sent (the request). Lines starting with < are what the server replied (the response). You just watched an HTTP conversation happen in real time. Let us break it all apart.
What HTTP Is
HTTP stands for HyperText Transfer Protocol. It is an application-layer protocol (Layer 7 in the OSI model) that defines how a client and a server communicate. The model is simple:
- The client sends a request.
- The server sends back a response.
- That is one transaction. Done.
┌──────────┐ ┌──────────┐
│ │ ── HTTP Request ──> │ │
│ Client │ │ Server │
│ (curl, │ <── HTTP Response ── │ (nginx, │
│ browser)│ │ apache) │
└──────────┘ └──────────┘
HTTP is stateless -- each request-response pair is independent. The server does not remember your previous request unless something at the application layer (cookies, sessions, tokens) adds that memory.
HTTP/1.1 and HTTP/2 run on top of TCP (HTTPS adds a TLS layer in between), while HTTP/3 replaces TCP with QUIC, which runs over UDP. Plain HTTP typically uses port 80; HTTPS uses port 443.
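Because HTTP/1.1 is plain text over TCP, you can construct a valid request with nothing but printf. A minimal sketch (Connection: close is added so a real server would hang up after responding):

```shell
# A complete HTTP/1.1 request is just text: request line, headers, blank line.
# Header lines end in CRLF (\r\n); the empty line ends the header section.
printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

Pipe those bytes into `nc example.com 80` and the raw response comes back the same way -- no HTTP library required.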
Anatomy of an HTTP Request
An HTTP request has four parts:
┌─────────────────────────────────────────────────────┐
│ REQUEST LINE │
│ GET /api/users?page=2 HTTP/1.1 │
├─────────────────────────────────────────────────────┤
│ HEADERS │
│ Host: api.example.com │
│ User-Agent: curl/8.5.0 │
│ Accept: application/json │
│ Authorization: Bearer eyJhbGciOi... │
├─────────────────────────────────────────────────────┤
│ BLANK LINE (separates headers from body) │
├─────────────────────────────────────────────────────┤
│ BODY (optional) │
│ {"name": "alice", "email": "alice@example.com"} │
└─────────────────────────────────────────────────────┘
The Request Line
The first line contains three things:
- Method -- what action to perform (GET, POST, PUT, DELETE, etc.)
- URL/Path -- what resource you want (/api/users?page=2)
- HTTP Version -- which version of the protocol (HTTP/1.1)
Headers
Headers are key-value pairs that carry metadata about the request. Each one sits on its own line in the format Header-Name: value. Headers are case-insensitive (Content-Type and content-type are the same).
Body
The body carries the actual data payload. GET requests usually have no body. POST and PUT requests typically do. The body is separated from headers by a blank line.
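Because the format is so regular (request line, headers, blank line, body), parsing a request takes only shell string operations. A sketch using an invented request; real parsers must also handle edge cases like chunked bodies:

```shell
#!/bin/bash
# A raw request as it travels on the wire (CRLF line endings, made-up data).
raw=$'POST /api/users HTTP/1.1\r\nHost: api.example.com\r\nContent-Type: application/json\r\n\r\n{"name": "alice"}'

request_line=${raw%%$'\r\n'*}              # everything before the first CRLF
read -r method path version <<< "$request_line"
body=${raw#*$'\r\n\r\n'}                   # everything after the blank line

echo "method:  $method"                    # POST
echo "path:    $path"                      # /api/users
echo "version: $version"                   # HTTP/1.1
echo "body:    $body"                      # {"name": "alice"}
```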
HTTP Methods
HTTP defines several methods (sometimes called "verbs"). Here are the ones you will encounter constantly:
| Method | Purpose | Has Body? | Idempotent? | Safe? |
|---|---|---|---|---|
| GET | Retrieve a resource | No | Yes | Yes |
| POST | Create a resource / submit data | Yes | No | No |
| PUT | Replace a resource entirely | Yes | Yes | No |
| PATCH | Partially update a resource | Yes | No | No |
| DELETE | Remove a resource | Optional | Yes | No |
| HEAD | Same as GET but no response body | No | Yes | Yes |
| OPTIONS | Ask what methods are supported | No | Yes | Yes |
Idempotent means doing it once or doing it ten times produces the same result. Sending the same PUT request ten times replaces the resource with the same data each time -- same outcome. Sending the same POST ten times might create ten different records.
Safe means the method should not change anything on the server. GET and HEAD are safe -- they only read.
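The PUT-versus-POST distinction is easier to feel with a toy "server" faked in shell variables (all names here are invented for illustration):

```shell
#!/bin/bash
# Toy in-memory "server": PUT overwrites one named record, POST always
# creates a new one.
alice_record=""
record_count=0

handle_put()  { alice_record="$1"; }                   # idempotent: same end state
handle_post() { record_count=$((record_count + 1)); }  # not idempotent: new record each time

handle_put '{"name":"alice"}'; handle_put '{"name":"alice"}'; handle_put '{"name":"alice"}'
handle_post '{"name":"bob"}';  handle_post '{"name":"bob"}';  handle_post '{"name":"bob"}'

echo "alice record after 3 PUTs:  $alice_record"   # still one record, same data
echo "records created by 3 POSTs: $record_count"   # 3 -- each POST made a new one
```

This is exactly why a browser warns before re-submitting a form (a POST) but happily retries a page load (a GET).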
Hands-On: Exploring Methods with curl
# A simple GET request
$ curl -X GET http://httpbin.org/get
# A POST request with JSON data
$ curl -X POST http://httpbin.org/post \
-H "Content-Type: application/json" \
-d '{"name": "linux-book", "topic": "HTTP"}'
# A PUT request
$ curl -X PUT http://httpbin.org/put \
-H "Content-Type: application/json" \
-d '{"name": "updated-name"}'
# A DELETE request
$ curl -X DELETE http://httpbin.org/delete
# A HEAD request (only headers, no body)
$ curl -I http://httpbin.org/get
# An OPTIONS request (check allowed methods)
$ curl -X OPTIONS http://httpbin.org/get -v 2>&1 | grep -i "allow"
Think About It: Why would a browser send an OPTIONS request before a POST? Look up "CORS preflight" -- it is directly related to this.
Anatomy of an HTTP Response
The server's response follows a similar structure:
┌─────────────────────────────────────────────────────┐
│ STATUS LINE │
│ HTTP/1.1 200 OK │
├─────────────────────────────────────────────────────┤
│ HEADERS │
│ Content-Type: application/json │
│ Content-Length: 245 │
│ Cache-Control: no-cache │
│ X-Request-Id: a3f8c9d2 │
├─────────────────────────────────────────────────────┤
│ BLANK LINE │
├─────────────────────────────────────────────────────┤
│ BODY │
│ {"users": [{"id": 1, "name": "alice"}, ...]} │
└─────────────────────────────────────────────────────┘
The status line has the HTTP version, a status code (a three-digit number), and a reason phrase (a human-readable description).
HTTP Status Codes
Status codes are grouped into five classes. Memorize the common ones -- you will see them daily.
1xx -- Informational
| Code | Meaning | When You See It |
|---|---|---|
| 100 | Continue | Server says "go ahead, send the body" |
| 101 | Switching Protocols | Upgrading to WebSocket |
2xx -- Success
| Code | Meaning | When You See It |
|---|---|---|
| 200 | OK | Standard successful response |
| 201 | Created | Resource successfully created (POST) |
| 204 | No Content | Success, but no body to return (DELETE) |
3xx -- Redirection
| Code | Meaning | When You See It |
|---|---|---|
| 301 | Moved Permanently | URL changed forever, update your bookmarks |
| 302 | Found (Temp Redirect) | Temporary redirect |
| 304 | Not Modified | Cached version is still valid |
| 307 | Temporary Redirect | Like 302 but keeps the method |
| 308 | Permanent Redirect | Like 301 but keeps the method |
4xx -- Client Error
| Code | Meaning | When You See It |
|---|---|---|
| 400 | Bad Request | Malformed request syntax |
| 401 | Unauthorized | Authentication required |
| 403 | Forbidden | Authenticated but not authorized |
| 404 | Not Found | Resource does not exist |
| 405 | Method Not Allowed | Used POST where only GET is accepted |
| 408 | Request Timeout | Client took too long |
| 429 | Too Many Requests | Rate limit exceeded |
5xx -- Server Error
| Code | Meaning | When You See It |
|---|---|---|
| 500 | Internal Server Error | Unhandled exception / generic server bug |
| 502 | Bad Gateway | Proxy got invalid response from upstream |
| 503 | Service Unavailable | Server overloaded or in maintenance |
| 504 | Gateway Timeout | Proxy timed out waiting for upstream |
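Since the class of a status code is just its first digit, triage logic in a script can be a simple case statement. A sketch:

```shell
#!/bin/bash
# Classify an HTTP status code by its first digit.
status_class() {
  case "$1" in
    1??) echo "informational" ;;
    2??) echo "success" ;;
    3??) echo "redirection" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "invalid" ;;
  esac
}

status_class 200   # success
status_class 304   # redirection
status_class 502   # server error
```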
Hands-On: Observing Status Codes
# 200 OK
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/200
200
# 404 Not Found
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/404
404
# 302 Redirect (curl shows the redirect status; it does not follow by default)
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/redirect-to?url=http://example.com
302
# Follow the redirect
$ curl -L -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/redirect-to?url=http://example.com
200
Think About It: You see 502 Bad Gateway errors. Is the problem on the client, the proxy, or the backend? What would you check first?
Essential HTTP Headers
Headers control everything from content negotiation to caching to authentication. Here are the ones you must know:
Request Headers
| Header | Purpose | Example |
|---|---|---|
| Host | Which virtual host to reach | Host: api.example.com |
| User-Agent | Identifies the client software | User-Agent: curl/8.5.0 |
| Accept | What content types the client wants | Accept: application/json |
| Content-Type | Format of the request body | Content-Type: application/json |
| Authorization | Credentials for authentication | Authorization: Bearer eyJ... |
| Cookie | Session cookies | Cookie: session=abc123 |
| Cache-Control | Caching directives from client | Cache-Control: no-cache |
| If-None-Match | Conditional request (ETag-based) | If-None-Match: "abc123" |
Response Headers
| Header | Purpose | Example |
|---|---|---|
| Content-Type | Format of the response body | Content-Type: text/html; charset=UTF-8 |
| Content-Length | Size of response body in bytes | Content-Length: 1256 |
| Cache-Control | How long clients/proxies can cache | Cache-Control: max-age=3600 |
| Set-Cookie | Send cookies to the client | Set-Cookie: session=abc123; HttpOnly |
| Location | URL to redirect to (with 3xx codes) | Location: https://example.com/new |
| X-Request-Id | Unique ID for tracing (custom header) | X-Request-Id: a3f8c9d2 |
| Server | Identifies the server software | Server: nginx/1.24.0 |
Hands-On: Inspecting Headers
# See all response headers
$ curl -I https://www.google.com
# Send custom headers
$ curl -H "Accept: application/json" \
-H "X-Custom-Header: myvalue" \
http://httpbin.org/headers
# See both request and response headers
$ curl -v http://httpbin.org/get 2>&1 | grep -E "^[<>]"
The Host Header and Virtual Hosting
One crucial thing to understand: a single server (one IP address) can host hundreds of different websites. How does the server know which site you want? The Host header.
GET / HTTP/1.1
Host: blog.example.com <-- THIS tells the server which site
This is called name-based virtual hosting. When you configure Nginx or Apache with multiple server blocks (or VirtualHosts), they use the Host header to route the request to the right configuration.
# These hit the same IP but get different sites:
$ curl -H "Host: site-a.example.com" http://93.184.216.34/
$ curl -H "Host: site-b.example.com" http://93.184.216.34/
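On the server side, name-based virtual hosting boils down to reading the Host header and selecting a configuration. A toy dispatcher, with invented hostnames and document roots:

```shell
#!/bin/bash
# Toy virtual-host router: pick a document root from the Host header,
# the way nginx matches a server block.
route_by_host() {
  case "$1" in
    blog.example.com) echo "/var/www/blog" ;;
    api.example.com)  echo "/var/www/api" ;;
    *)                echo "/var/www/default" ;;   # like nginx's default_server
  esac
}

raw=$'GET / HTTP/1.1\r\nHost: blog.example.com\r\nAccept: */*\r\n\r\n'
# Header names are case-insensitive, so lowercase before comparing.
host=$(printf '%s' "$raw" | tr -d '\r' | awk -F': ' 'tolower($1)=="host" {print $2; exit}')

echo "host: $host"                       # blog.example.com
echo "root: $(route_by_host "$host")"    # /var/www/blog
```

In practice, curl's --resolve host:port:ip option is often preferable to overriding the Host header for this kind of test, because it also works for HTTPS, where the certificate must match the hostname.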
HTTP/1.1 vs HTTP/2
HTTP/1.1 (1997 -- still everywhere)
HTTP/1.1 is text-based and human-readable. Connections are persistent by default (keep-alive), so multiple request-response cycles can reuse one TCP connection -- but each connection handles only one request at a time.
The major bottleneck: head-of-line blocking. If a page needs 10 files, the browser must request them one at a time on each connection. Browsers work around this by opening 6-8 parallel TCP connections per host, but this is wasteful.
HTTP/2 (2015 -- widely adopted)
HTTP/2 solves these problems:
┌──────────────────────────────────────────────────────────┐
│ HTTP/1.1 │
│ │
│ Connection 1: GET /style.css ──> response │
│ Connection 2: GET /app.js ──> response │
│ Connection 3: GET /logo.png ──> response │
│ Connection 4: GET /data.json ──> response │
│ (One request per connection at a time) │
├──────────────────────────────────────────────────────────┤
│ HTTP/2 │
│ │
│ Single Connection: │
│ Stream 1: GET /style.css ──> response ┐ │
│ Stream 2: GET /app.js ──> response │ All at once │
│ Stream 3: GET /logo.png ──> response │ (multiplexed│
│ Stream 4: GET /data.json ──> response ┘ binary) │
└──────────────────────────────────────────────────────────┘
Key improvements in HTTP/2:
- Multiplexing -- multiple requests/responses over a single TCP connection simultaneously
- Binary framing -- more efficient parsing (not human-readable on the wire)
- Header compression (HPACK) -- reduces redundant header data
- Server push -- server can proactively send resources it predicts the client needs
- Stream prioritization -- clients can hint which resources matter most
Hands-On: Checking HTTP/2 Support
# Check if a site supports HTTP/2
$ curl -I --http2 -s https://www.google.com | head -1
HTTP/2 200
# Force HTTP/1.1 for comparison
$ curl -I --http1.1 -s https://www.google.com | head -1
HTTP/1.1 200 OK
# Verbose to see the negotiation
$ curl -v --http2 https://example.com 2>&1 | grep -i "ALPN"
* ALPN: offers h2,http/1.1
* ALPN: server accepted h2
ALPN (Application-Layer Protocol Negotiation) is how the client and server agree to use HTTP/2 during the TLS handshake.
HTTPS: HTTP + TLS
HTTPS is not a different protocol -- it is HTTP wrapped in a TLS (Transport Layer Security) encrypted tunnel. Everything we have discussed (methods, headers, status codes) works identically; the difference is that the entire conversation is encrypted.
┌──────────────────────────────────────────────────────────┐
│ HTTPS Flow │
│ │
│ 1. Client connects to port 443 │
│ 2. TLS handshake occurs: │
│ - Server presents its certificate │
│ - Client verifies the certificate │
│ - Both sides agree on encryption keys │
│ 3. Encrypted tunnel established │
│ 4. HTTP request/response flows inside the tunnel │
│ │
│ ┌────────┐ TLS Tunnel ┌────────┐ │
│ │ Client ├══════════════════┤ Server │ │
│ │ │ HTTP inside │ │ │
│ └────────┘ └────────┘ │
│ │
│ Anyone sniffing the network sees encrypted gibberish. │
└──────────────────────────────────────────────────────────┘
Hands-On: Inspecting a TLS Connection
# See the full TLS handshake + certificate details
$ curl -v https://example.com 2>&1 | grep -E "(SSL|TLS|subject|issuer|expire)"
# Check certificate details specifically
$ openssl s_client -connect example.com:443 -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
We covered TLS in depth in Chapters 39-41. The key point here: always use HTTPS in production. There is no excuse not to, especially with free certificates from Let's Encrypt.
Mastering curl for HTTP Exploration
curl is the Swiss Army knife of HTTP. Every sysadmin and developer should be fluent in it. Here is your reference:
# Basic GET request
$ curl http://example.com
# Save output to a file
$ curl -o page.html http://example.com
# Show response headers only
$ curl -I http://example.com
# Show the full conversation (verbose)
$ curl -v http://example.com
# Follow redirects
$ curl -L http://example.com
# POST with form data
$ curl -X POST -d "user=alice&pass=secret" http://httpbin.org/post
# POST with JSON
$ curl -X POST \
-H "Content-Type: application/json" \
-d '{"user": "alice"}' \
http://httpbin.org/post
# Send custom headers
$ curl -H "Authorization: Bearer mytoken" http://httpbin.org/headers
# Show only the HTTP status code
$ curl -o /dev/null -s -w "%{http_code}\n" http://example.com
# Show timing information
$ curl -o /dev/null -s -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" https://example.com
# Download with progress bar
$ curl -# -O https://example.com/largefile.tar.gz
# Resume a broken download
$ curl -C - -O https://example.com/largefile.tar.gz
# Send a request with basic auth
$ curl -u username:password http://httpbin.org/basic-auth/username/password
# Ignore SSL certificate errors (testing only!)
$ curl -k https://self-signed.badssl.com/
Safety Warning: The -k flag disables certificate verification. Never use this in production scripts. It defeats the entire purpose of HTTPS.
Timing a Request End-to-End
This is invaluable for debugging slow responses:
$ curl -o /dev/null -s -w "\
DNS Lookup: %{time_namelookup}s\n\
TCP Connect: %{time_connect}s\n\
TLS Handshake: %{time_appconnect}s\n\
First Byte: %{time_starttransfer}s\n\
Total Time: %{time_total}s\n\
Download Size: %{size_download} bytes\n\
" https://www.google.com
Example output:
DNS Lookup: 0.012s
TCP Connect: 0.025s
TLS Handshake: 0.078s
First Byte: 0.142s
Total Time: 0.155s
Download Size: 19876 bytes
If DNS lookup is slow, you have a DNS problem. If TLS handshake is slow, check the certificate chain. If time-to-first-byte (TTFB) is slow, the backend application is slow.
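That triage can itself be scripted. curl's time_* values are cumulative timestamps, so each phase is the difference between adjacent values. The numbers below are hard-coded samples, not real measurements; a real script would fill them in from a curl -w format string:

```shell
#!/bin/bash
# Phase timings as curl reports them: cumulative, in seconds (sample values).
t_dns=0.012 t_connect=0.025 t_tls=0.078 t_ttfb=0.542 t_total=0.555

# Each phase's cost is the delta from the previous cumulative timestamp.
phase() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f", b - a }'; }

echo "DNS:     $(phase 0 "$t_dns")s"
echo "TCP:     $(phase "$t_dns" "$t_connect")s"
echo "TLS:     $(phase "$t_connect" "$t_tls")s"
echo "Backend: $(phase "$t_tls" "$t_ttfb")s"   # TTFB minus handshakes = server think time

# Flag the dominant phase; with these samples, the backend wait dwarfs the rest.
awk -v tls="$t_tls" -v ttfb="$t_ttfb" -v total="$t_total" \
  'BEGIN { if ((ttfb - tls) / total > 0.5) print "WARNING: backend is the bottleneck" }'
```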
Debug This
A developer reports that their API call is failing with "connection refused." They show you this curl command:
$ curl -v http://api.internal.company.com:8080/health
* Trying 10.0.1.50:8080...
* connect to 10.0.1.50 port 8080 failed: Connection refused
* Failed to connect to api.internal.company.com port 8080: Connection refused
Questions to work through:
- DNS resolved successfully (to 10.0.1.50). Is DNS the problem?
- "Connection refused" means TCP got a RST packet. What does this tell you about the server?
- What would you check on the server at 10.0.1.50?
- How is "Connection refused" different from "Connection timed out"?
Answers:
- No, DNS is fine. The name resolved to an IP.
- "Connection refused" means the server is reachable at the network level, but nothing is listening on port 8080. The TCP SYN got a RST back.
- Check if the application is running (ss -tlnp | grep 8080), check if it crashed (journalctl -u myapp), and check whether it is listening on a different port or bound only to localhost (127.0.0.1 instead of 0.0.0.0).
- "Connection timed out" means packets are being silently dropped (firewall, wrong IP, host down). "Connection refused" means the host is alive and actively rejecting the connection.
Connection Keep-Alive and Persistent Connections
In HTTP/1.0, every request opened a new TCP connection and closed it after the response. This was wasteful -- TCP handshakes and TLS negotiations are expensive.
HTTP/1.1 introduced persistent connections (keep-alive) as the default. The TCP connection stays open for multiple request-response cycles:
Without Keep-Alive (HTTP/1.0):
TCP connect → Request 1 → Response 1 → TCP close
TCP connect → Request 2 → Response 2 → TCP close
TCP connect → Request 3 → Response 3 → TCP close
With Keep-Alive (HTTP/1.1 default):
TCP connect → Request 1 → Response 1
→ Request 2 → Response 2
→ Request 3 → Response 3
→ ... → TCP close (after timeout)
You can see this in action:
# curl reuses connections when given multiple URLs
$ curl -v http://example.com http://example.com 2>&1 | grep -E "(Connected|Re-using)"
* Connected to example.com (93.184.216.34) port 80
* Re-using existing connection with host example.com
Content Negotiation
When a client and server need to agree on the format of data, they use content negotiation headers:
# Client says "I want JSON"
$ curl -H "Accept: application/json" http://httpbin.org/get
# Client says "I want XML"
$ curl -H "Accept: application/xml" http://httpbin.org/get
# Client says "I'm sending JSON"
$ curl -H "Content-Type: application/json" \
-d '{"key": "value"}' \
http://httpbin.org/post
Common content types you will encounter:
| Content-Type | What It Is |
|---|---|
| text/html | HTML web page |
| text/plain | Plain text |
| application/json | JSON data |
| application/xml | XML data |
| application/x-www-form-urlencoded | HTML form data |
| multipart/form-data | File uploads |
| application/octet-stream | Raw binary data |
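Web servers usually derive the response Content-Type from the file extension (nginx ships a mime.types map for exactly this). A miniature version covering only the types in the table above:

```shell
#!/bin/bash
# Map a file extension to a Content-Type, like a tiny mime.types lookup.
content_type() {
  case "${1##*.}" in
    html) echo "text/html" ;;
    txt)  echo "text/plain" ;;
    json) echo "application/json" ;;
    xml)  echo "application/xml" ;;
    *)    echo "application/octet-stream" ;;   # safe default for unknown files
  esac
}

content_type index.html      # text/html
content_type data.json       # application/json
content_type backup.tar.gz   # application/octet-stream
```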
Caching Basics
HTTP has built-in caching mechanisms that reduce load and speed up responses:
┌────────┐ ┌───────────┐ ┌────────┐
│ Client │ ──> │ Cache │ ──> │ Server │
│ │ <── │ (browser, │ <── │ │
│ │ │ proxy, │ │ │
│ │ │ CDN) │ │ │
└────────┘ └───────────┘ └────────┘
Key caching headers:
- Cache-Control: max-age=3600 -- cache this for 3600 seconds
- Cache-Control: no-cache -- always revalidate with the server before using the cached copy
- Cache-Control: no-store -- never cache this at all
- ETag: "abc123" -- a fingerprint of the content; the client can ask "has it changed?"
- If-None-Match: "abc123" -- the client sends the old ETag; the server returns 304 if unchanged
# See caching headers
$ curl -I https://www.google.com 2>/dev/null | grep -i cache
Cache-Control: private, max-age=0
# See ETag header
$ curl -I http://example.com 2>/dev/null | grep -i etag
ETag: "3147526947+gzip"
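You can sketch the ETag revalidation dance locally by treating a file hash as the ETag and comparing it the way a server compares If-None-Match (the hash-prefix choice here is illustrative, not how any particular server computes ETags):

```shell
#!/bin/bash
# Simulate ETag revalidation: hash the resource, compare against the
# client's cached ETag, answer 304 or 200 accordingly.
resource=$(mktemp)
echo "hello, cache" > "$resource"

etag() { sha256sum "$1" | cut -c1-16; }   # a hash prefix standing in for an ETag

client_etag=$(etag "$resource")           # client caches the ETag it was served

# Unchanged resource: ETags match, the server can answer 304 with no body.
[ "$(etag "$resource")" = "$client_etag" ] && echo "304 Not Modified"

# The resource changes on the server: ETags differ, a full 200 is needed.
echo "new content" >> "$resource"
[ "$(etag "$resource")" = "$client_etag" ] || echo "200 OK (body changed)"

rm -f "$resource"
```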
What Just Happened?
┌──────────────────────────────────────────────────────────┐
│ Chapter 43 Recap │
├──────────────────────────────────────────────────────────┤
│ │
│ HTTP is a request-response protocol (client asks, │
│ server answers). Each transaction is stateless. │
│ │
│ A REQUEST has: Method + URL + Headers + optional Body │
│ A RESPONSE has: Status Code + Headers + optional Body │
│ │
│ Methods: GET (read), POST (create), PUT (replace), │
│ DELETE (remove), HEAD (headers only), │
│ OPTIONS (capabilities) │
│ │
│ Status codes: │
│ 2xx = success 3xx = redirect 4xx = client error │
│ 5xx = server error │
│ │
│ Key headers: Host (virtual hosting), Content-Type │
│ (data format), Authorization (credentials), │
│ Cache-Control (caching behavior) │
│ │
│ HTTP/2 = binary, multiplexed, single connection │
│ HTTPS = HTTP inside a TLS encrypted tunnel │
│ │
│ curl is your best friend for HTTP debugging. │
│ │
└──────────────────────────────────────────────────────────┘
Try This
Exercise 1: Decode a Full Request
Use curl -v against any public URL. Identify and label every part: the method, URL, HTTP version, each request header, the status code, each response header, and the body.
Exercise 2: Status Code Scavenger Hunt
Using httpbin.org/status/{code}, get curl to show you a 200, 301, 403, 404, 500, and 502. Observe how the responses differ.
$ for code in 200 301 403 404 500 502; do
echo "=== $code ==="
curl -o /dev/null -s -w "Status: %{http_code}\n" http://httpbin.org/status/$code
done
Exercise 3: Timing Deep Dive
Use the curl timing format string from this chapter to measure the response time of five different websites. Which has the fastest TTFB? Which has the slowest DNS lookup?
Exercise 4: Content Negotiation
Send requests to httpbin.org/get with different Accept headers. Try application/json, text/html, application/xml, and text/plain. Compare the responses.
Bonus Challenge
Write a bash script that takes a URL as an argument and produces a "health check report" including: the HTTP status code, the server header, the content type, the TLS version (if HTTPS), and the total response time. Format it nicely.
#!/bin/bash
URL="${1:?Usage: $0 <url>}"
echo "=== Health Check: $URL ==="
curl -o /dev/null -s -w "\
Status Code: %{http_code}\n\
Content Type: %{content_type}\n\
TLS Version: %{ssl_version}\n\
Response Time: %{time_total}s\n\
Download Size: %{size_download} bytes\n\
" "$URL"
What Comes Next
Now that you understand HTTP at the protocol level, it is time to set up the software that actually speaks this protocol. In the next chapter, we will install Nginx and configure it from scratch to serve web content -- your first web server.