HTTP Protocol Essentials
Why This Matters
Every time you open a browser, call an API, download a package with apt or dnf, or deploy a web application, you are using HTTP. It is the protocol that glues the web together. If you are going to manage web servers, set up reverse proxies, debug application issues, or secure web traffic, you need to understand HTTP at a level deeper than "it shows web pages."
Consider this real scenario: your company's API is returning intermittent 502 errors to users. The developers say "the app is fine." The load balancer logs show upstream timeouts. Without understanding HTTP status codes, headers, connection behavior, and the difference between the client, proxy, and backend, you will be guessing in the dark. This chapter gives you the foundation to diagnose and reason about every web request that flows through your infrastructure.
Try This Right Now
If you have any Linux system with curl installed (it is available on virtually every distribution), run this:
$ curl -v http://example.com 2>&1 | head -30
You should see something like:
* Trying 93.184.216.34:80...
* Connected to example.com (93.184.216.34) port 80
> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1256
< Cache-Control: max-age=604800
Those lines starting with > are what your machine sent (the request). Lines starting with < are what the server replied (the response). You just watched an HTTP conversation happen in real time. Let us break it all apart.
What HTTP Is
HTTP stands for HyperText Transfer Protocol. It is an application-layer protocol (Layer 7 in the OSI model) that defines how a client and a server communicate. The model is simple:
- The client sends a request.
- The server sends back a response.
- That is one transaction. Done.
┌──────────┐ ┌──────────┐
│ │ ── HTTP Request ──> │ │
│ Client │ │ Server │
│ (curl, │ <── HTTP Response ── │ (nginx, │
│ browser)│ │ apache) │
└──────────┘ └──────────┘
HTTP is stateless -- each request-response pair is independent. The server does not remember your previous request unless something at the application layer (cookies, sessions, tokens) adds that memory.
HTTP/1.1 and HTTP/2 run on top of TCP (HTTPS adds a TLS layer in between), while HTTP/3 replaces TCP with QUIC, which runs over UDP. Plain HTTP typically uses port 80; HTTPS uses port 443.
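Because HTTP/1.1 is plain text over TCP, you can construct a valid request with nothing but printf. A minimal sketch (Connection: close is added so a real server would hang up after responding):

```shell
# A complete HTTP/1.1 request is just text: request line, headers, blank line.
# Header lines end in CRLF (\r\n); the empty line ends the header section.
printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

Pipe those bytes into `nc example.com 80` and the raw response comes back the same way -- no HTTP library required.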
Anatomy of an HTTP Request
An HTTP request has four parts:
┌─────────────────────────────────────────────────────┐
│ REQUEST LINE │
│ GET /api/users?page=2 HTTP/1.1 │
├─────────────────────────────────────────────────────┤
│ HEADERS │
│ Host: api.example.com │
│ User-Agent: curl/8.5.0 │
│ Accept: application/json │
│ Authorization: Bearer eyJhbGciOi... │
├─────────────────────────────────────────────────────┤
│ BLANK LINE (separates headers from body) │
├─────────────────────────────────────────────────────┤
│ BODY (optional) │
│ {"name": "alice", "email": "alice@example.com"} │
└─────────────────────────────────────────────────────┘
The Request Line
The first line contains three things:
- Method -- what action to perform (GET, POST, PUT, DELETE, etc.)
- URL/Path -- what resource you want (/api/users?page=2)
- HTTP Version -- which version of the protocol (HTTP/1.1)
Headers
Headers are key-value pairs that carry metadata about the request. Each one sits on its own line in the format Header-Name: value. Headers are case-insensitive (Content-Type and content-type are the same).
Body
The body carries the actual data payload. GET requests usually have no body. POST and PUT requests typically do. The body is separated from headers by a blank line.
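Because the format is so regular (request line, headers, blank line, body), parsing a request takes only shell string operations. A sketch using an invented request; real parsers must also handle edge cases like chunked bodies:

```shell
#!/bin/bash
# A raw request as it travels on the wire (CRLF line endings, made-up data).
raw=$'POST /api/users HTTP/1.1\r\nHost: api.example.com\r\nContent-Type: application/json\r\n\r\n{"name": "alice"}'

request_line=${raw%%$'\r\n'*}              # everything before the first CRLF
read -r method path version <<< "$request_line"
body=${raw#*$'\r\n\r\n'}                   # everything after the blank line

echo "method:  $method"                    # POST
echo "path:    $path"                      # /api/users
echo "version: $version"                   # HTTP/1.1
echo "body:    $body"                      # {"name": "alice"}
```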
HTTP Methods
HTTP defines several methods (sometimes called "verbs"). Here are the ones you will encounter constantly:
| Method | Purpose | Has Body? | Idempotent? | Safe? |
|---|---|---|---|---|
| GET | Retrieve a resource | No | Yes | Yes |
| POST | Create a resource / submit data | Yes | No | No |
| PUT | Replace a resource entirely | Yes | Yes | No |
| PATCH | Partially update a resource | Yes | No | No |
| DELETE | Remove a resource | Optional | Yes | No |
| HEAD | Same as GET but no response body | No | Yes | Yes |
| OPTIONS | Ask what methods are supported | No | Yes | Yes |
Idempotent means doing it once or doing it ten times produces the same result. Sending the same PUT request ten times replaces the resource with the same data each time -- same outcome. Sending the same POST ten times might create ten different records.
Safe means the method should not change anything on the server. GET and HEAD are safe -- they only read.
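The PUT-versus-POST distinction is easier to feel with a toy "server" faked in shell variables (all names here are invented for illustration):

```shell
#!/bin/bash
# Toy in-memory "server": PUT overwrites one named record, POST always
# creates a new one.
alice_record=""
record_count=0

handle_put()  { alice_record="$1"; }                   # idempotent: same end state
handle_post() { record_count=$((record_count + 1)); }  # not idempotent: new record each time

handle_put '{"name":"alice"}'; handle_put '{"name":"alice"}'; handle_put '{"name":"alice"}'
handle_post '{"name":"bob"}';  handle_post '{"name":"bob"}';  handle_post '{"name":"bob"}'

echo "alice record after 3 PUTs:  $alice_record"   # still one record, same data
echo "records created by 3 POSTs: $record_count"   # 3 -- each POST made a new one
```

This is exactly why a browser warns before re-submitting a form (a POST) but happily retries a page load (a GET).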
Hands-On: Exploring Methods with curl
# A simple GET request
$ curl -X GET http://httpbin.org/get
# A POST request with JSON data
$ curl -X POST http://httpbin.org/post \
-H "Content-Type: application/json" \
-d '{"name": "linux-book", "topic": "HTTP"}'
# A PUT request
$ curl -X PUT http://httpbin.org/put \
-H "Content-Type: application/json" \
-d '{"name": "updated-name"}'
# A DELETE request
$ curl -X DELETE http://httpbin.org/delete
# A HEAD request (only headers, no body)
$ curl -I http://httpbin.org/get
# An OPTIONS request (check allowed methods)
$ curl -X OPTIONS http://httpbin.org/get -v 2>&1 | grep -i "allow"
Think About It: Why would a browser send an OPTIONS request before a POST? Look up "CORS preflight" -- it is directly related to this.
Anatomy of an HTTP Response
The server's response follows a similar structure:
┌─────────────────────────────────────────────────────┐
│ STATUS LINE │
│ HTTP/1.1 200 OK │
├─────────────────────────────────────────────────────┤
│ HEADERS │
│ Content-Type: application/json │
│ Content-Length: 245 │
│ Cache-Control: no-cache │
│ X-Request-Id: a3f8c9d2 │
├─────────────────────────────────────────────────────┤
│ BLANK LINE │
├─────────────────────────────────────────────────────┤
│ BODY │
│ {"users": [{"id": 1, "name": "alice"}, ...]} │
└─────────────────────────────────────────────────────┘
The status line has the HTTP version, a status code (a three-digit number), and a reason phrase (a human-readable description).
HTTP Status Codes
Status codes are grouped into five classes. Memorize the common ones -- you will see them daily.
1xx -- Informational
| Code | Meaning | When You See It |
|---|---|---|
| 100 | Continue | Server says "go ahead, send the body" |
| 101 | Switching Protocols | Upgrading to WebSocket |
2xx -- Success
| Code | Meaning | When You See It |
|---|---|---|
| 200 | OK | Standard successful response |
| 201 | Created | Resource successfully created (POST) |
| 204 | No Content | Success, but no body to return (DELETE) |
3xx -- Redirection
| Code | Meaning | When You See It |
|---|---|---|
| 301 | Moved Permanently | URL changed forever, update your bookmarks |
| 302 | Found (Temp Redirect) | Temporary redirect |
| 304 | Not Modified | Cached version is still valid |
| 307 | Temporary Redirect | Like 302 but keeps the method |
| 308 | Permanent Redirect | Like 301 but keeps the method |
4xx -- Client Error
| Code | Meaning | When You See It |
|---|---|---|
| 400 | Bad Request | Malformed request syntax |
| 401 | Unauthorized | Authentication required |
| 403 | Forbidden | Authenticated but not authorized |
| 404 | Not Found | Resource does not exist |
| 405 | Method Not Allowed | Used POST where only GET is accepted |
| 408 | Request Timeout | Client took too long |
| 429 | Too Many Requests | Rate limit exceeded |
5xx -- Server Error
| Code | Meaning | When You See It |
|---|---|---|
| 500 | Internal Server Error | Unhandled exception / generic server bug |
| 502 | Bad Gateway | Proxy got invalid response from upstream |
| 503 | Service Unavailable | Server overloaded or in maintenance |
| 504 | Gateway Timeout | Proxy timed out waiting for upstream |
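Since the class of a status code is just its first digit, triage logic in a script can be a simple case statement. A sketch:

```shell
#!/bin/bash
# Classify an HTTP status code by its first digit.
status_class() {
  case "$1" in
    1??) echo "informational" ;;
    2??) echo "success" ;;
    3??) echo "redirection" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "invalid" ;;
  esac
}

status_class 200   # success
status_class 304   # redirection
status_class 502   # server error
```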
Hands-On: Observing Status Codes
# 200 OK
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/200
200
# 404 Not Found
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/404
404
# 302 Redirect (curl shows the redirect status; it does not follow by default)
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/redirect-to?url=http://example.com
302
# Follow the redirect
$ curl -L -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/redirect-to?url=http://example.com
200
Think About It: You see 502 Bad Gateway errors. Is the problem on the client, the proxy, or the backend? What would you check first?
Essential HTTP Headers
Headers control everything from content negotiation to caching to authentication. Here are the ones you must know:
Request Headers
| Header | Purpose | Example |
|---|---|---|
| Host | Which virtual host to reach | Host: api.example.com |
| User-Agent | Identifies the client software | User-Agent: curl/8.5.0 |
| Accept | What content types the client wants | Accept: application/json |
| Content-Type | Format of the request body | Content-Type: application/json |
| Authorization | Credentials for authentication | Authorization: Bearer eyJ... |
| Cookie | Session cookies | Cookie: session=abc123 |
| Cache-Control | Caching directives from client | Cache-Control: no-cache |
| If-None-Match | Conditional request (ETag-based) | If-None-Match: "abc123" |
Response Headers
| Header | Purpose | Example |
|---|---|---|
| Content-Type | Format of the response body | Content-Type: text/html; charset=UTF-8 |
| Content-Length | Size of response body in bytes | Content-Length: 1256 |
| Cache-Control | How long clients/proxies can cache | Cache-Control: max-age=3600 |
| Set-Cookie | Send cookies to the client | Set-Cookie: session=abc123; HttpOnly |
| Location | URL to redirect to (with 3xx codes) | Location: https://example.com/new |
| X-Request-Id | Unique ID for tracing (custom header) | X-Request-Id: a3f8c9d2 |
| Server | Identifies the server software | Server: nginx/1.24.0 |
Hands-On: Inspecting Headers
# See all response headers
$ curl -I https://www.google.com
# Send custom headers
$ curl -H "Accept: application/json" \
-H "X-Custom-Header: myvalue" \
http://httpbin.org/headers
# See both request and response headers
$ curl -v http://httpbin.org/get 2>&1 | grep -E "^[<>]"
The Host Header and Virtual Hosting
One crucial thing to understand: a single server (one IP address) can host hundreds of different websites. How does the server know which site you want? The Host header.
GET / HTTP/1.1
Host: blog.example.com <-- THIS tells the server which site
This is called name-based virtual hosting. When you configure Nginx or Apache with multiple server blocks (or VirtualHosts), they use the Host header to route the request to the right configuration.
# These hit the same IP but get different sites:
$ curl -H "Host: site-a.example.com" http://93.184.216.34/
$ curl -H "Host: site-b.example.com" http://93.184.216.34/
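On the server side, name-based virtual hosting boils down to reading the Host header and selecting a configuration. A toy dispatcher, with invented hostnames and document roots:

```shell
#!/bin/bash
# Toy virtual-host router: pick a document root from the Host header,
# the way nginx matches a server block.
route_by_host() {
  case "$1" in
    blog.example.com) echo "/var/www/blog" ;;
    api.example.com)  echo "/var/www/api" ;;
    *)                echo "/var/www/default" ;;   # like nginx's default_server
  esac
}

raw=$'GET / HTTP/1.1\r\nHost: blog.example.com\r\nAccept: */*\r\n\r\n'
# Header names are case-insensitive, so lowercase before comparing.
host=$(printf '%s' "$raw" | tr -d '\r' | awk -F': ' 'tolower($1)=="host" {print $2; exit}')

echo "host: $host"                       # blog.example.com
echo "root: $(route_by_host "$host")"    # /var/www/blog
```

In practice, curl's --resolve host:port:ip option is often preferable to overriding the Host header for this kind of test, because it also works for HTTPS, where the certificate must match the hostname.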
HTTP/1.1 vs HTTP/2
HTTP/1.1 (1997 -- still everywhere)
HTTP/1.1 is text-based and human-readable. Connections are persistent by default (keep-alive), so multiple request-response cycles can reuse one TCP connection -- but each connection handles only one request at a time.
The major bottleneck: head-of-line blocking. If a page needs 10 files, the browser must request them one at a time on each connection. Browsers work around this by opening 6-8 parallel TCP connections per host, but this is wasteful.
HTTP/2 (2015 -- widely adopted)
HTTP/2 solves these problems:
┌──────────────────────────────────────────────────────────┐
│ HTTP/1.1 │
│ │
│ Connection 1: GET /style.css ──> response │
│ Connection 2: GET /app.js ──> response │
│ Connection 3: GET /logo.png ──> response │
│ Connection 4: GET /data.json ──> response │
│ (One request per connection at a time) │
├──────────────────────────────────────────────────────────┤
│ HTTP/2 │
│ │
│ Single Connection: │
│ Stream 1: GET /style.css ──> response ┐ │
│ Stream 2: GET /app.js ──> response │ All at once │
│ Stream 3: GET /logo.png ──> response │ (multiplexed│
│ Stream 4: GET /data.json ──> response ┘ binary) │
└──────────────────────────────────────────────────────────┘
Key improvements in HTTP/2:
- Multiplexing -- multiple requests/responses over a single TCP connection simultaneously
- Binary framing -- more efficient parsing (not human-readable on the wire)
- Header compression (HPACK) -- reduces redundant header data
- Server push -- server can proactively send resources it predicts the client needs
- Stream prioritization -- clients can hint which resources matter most
Hands-On: Checking HTTP/2 Support
# Check if a site supports HTTP/2
$ curl -I --http2 -s https://www.google.com | head -1
HTTP/2 200
# Force HTTP/1.1 for comparison
$ curl -I --http1.1 -s https://www.google.com | head -1
HTTP/1.1 200 OK
# Verbose to see the negotiation
$ curl -v --http2 https://example.com 2>&1 | grep -i "ALPN"
* ALPN: offers h2,http/1.1
* ALPN: server accepted h2
ALPN (Application-Layer Protocol Negotiation) is how the client and server agree to use HTTP/2 during the TLS handshake.
HTTPS: HTTP + TLS
HTTPS is not a different protocol -- it is HTTP wrapped in a TLS (Transport Layer Security) encrypted tunnel. Everything we have discussed (methods, headers, status codes) works identically; the difference is that the entire conversation is encrypted.
┌──────────────────────────────────────────────────────────┐
│ HTTPS Flow │
│ │
│ 1. Client connects to port 443 │
│ 2. TLS handshake occurs: │
│ - Server presents its certificate │
│ - Client verifies the certificate │
│ - Both sides agree on encryption keys │
│ 3. Encrypted tunnel established │
│ 4. HTTP request/response flows inside the tunnel │
│ │
│ ┌────────┐ TLS Tunnel ┌────────┐ │
│ │ Client ├══════════════════┤ Server │ │
│ │ │ HTTP inside │ │ │
│ └────────┘ └────────┘ │
│ │
│ Anyone sniffing the network sees encrypted gibberish. │
└──────────────────────────────────────────────────────────┘
Hands-On: Inspecting a TLS Connection
# See the full TLS handshake + certificate details
$ curl -v https://example.com 2>&1 | grep -E "(SSL|TLS|subject|issuer|expire)"
# Check certificate details specifically
$ openssl s_client -connect example.com:443 -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
We covered TLS in depth in Chapters 39-41. The key point here: always use HTTPS in production. There is no excuse not to, especially with free certificates from Let's Encrypt.
Mastering curl for HTTP Exploration
curl is the Swiss Army knife of HTTP. Every sysadmin and developer should be fluent in it. Here is your reference:
# Basic GET request
$ curl http://example.com
# Save output to a file
$ curl -o page.html http://example.com
# Show response headers only
$ curl -I http://example.com
# Show the full conversation (verbose)
$ curl -v http://example.com
# Follow redirects
$ curl -L http://example.com
# POST with form data
$ curl -X POST -d "user=alice&pass=secret" http://httpbin.org/post
# POST with JSON
$ curl -X POST \
-H "Content-Type: application/json" \
-d '{"user": "alice"}' \
http://httpbin.org/post
# Send custom headers
$ curl -H "Authorization: Bearer mytoken" http://httpbin.org/headers
# Show only the HTTP status code
$ curl -o /dev/null -s -w "%{http_code}\n" http://example.com
# Show timing information
$ curl -o /dev/null -s -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" https://example.com
# Download with progress bar
$ curl -# -O https://example.com/largefile.tar.gz
# Resume a broken download
$ curl -C - -O https://example.com/largefile.tar.gz
# Send a request with basic auth
$ curl -u username:password http://httpbin.org/basic-auth/username/password
# Ignore SSL certificate errors (testing only!)
$ curl -k https://self-signed.badssl.com/
Safety Warning: The -k flag disables certificate verification. Never use this in production scripts. It defeats the entire purpose of HTTPS.
Timing a Request End-to-End
This is invaluable for debugging slow responses:
$ curl -o /dev/null -s -w "\
DNS Lookup: %{time_namelookup}s\n\
TCP Connect: %{time_connect}s\n\
TLS Handshake: %{time_appconnect}s\n\
First Byte: %{time_starttransfer}s\n\
Total Time: %{time_total}s\n\
Download Size: %{size_download} bytes\n\
" https://www.google.com
Example output:
DNS Lookup: 0.012s
TCP Connect: 0.025s
TLS Handshake: 0.078s
First Byte: 0.142s
Total Time: 0.155s
Download Size: 19876 bytes
If DNS lookup is slow, you have a DNS problem. If TLS handshake is slow, check the certificate chain. If time-to-first-byte (TTFB) is slow, the backend application is slow.
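That triage can itself be scripted. curl's time_* values are cumulative timestamps, so each phase is the difference between adjacent values. The numbers below are hard-coded samples, not real measurements; a real script would fill them in from a curl -w format string:

```shell
#!/bin/bash
# Phase timings as curl reports them: cumulative, in seconds (sample values).
t_dns=0.012 t_connect=0.025 t_tls=0.078 t_ttfb=0.542 t_total=0.555

# Each phase's cost is the delta from the previous cumulative timestamp.
phase() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f", b - a }'; }

echo "DNS:     $(phase 0 "$t_dns")s"
echo "TCP:     $(phase "$t_dns" "$t_connect")s"
echo "TLS:     $(phase "$t_connect" "$t_tls")s"
echo "Backend: $(phase "$t_tls" "$t_ttfb")s"   # TTFB minus handshakes = server think time

# Flag the dominant phase; with these samples, the backend wait dwarfs the rest.
awk -v tls="$t_tls" -v ttfb="$t_ttfb" -v total="$t_total" \
  'BEGIN { if ((ttfb - tls) / total > 0.5) print "WARNING: backend is the bottleneck" }'
```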
Debug This
A developer reports that their API call is failing with "connection refused." They show you this curl command:
$ curl -v http://api.internal.company.com:8080/health
* Trying 10.0.1.50:8080...
* connect to 10.0.1.50 port 8080 failed: Connection refused
* Failed to connect to api.internal.company.com port 8080: Connection refused
Questions to work through:
- DNS resolved successfully (to 10.0.1.50). Is DNS the problem?
- "Connection refused" means TCP got a RST packet. What does this tell you about the server?
- What would you check on the server at 10.0.1.50?
- How is "Connection refused" different from "Connection timed out"?
Answers:
- No, DNS is fine. The name resolved to an IP.
- "Connection refused" means the server is reachable at the network level, but nothing is listening on port 8080. The TCP SYN got a RST back.
- Check if the application is running (ss -tlnp | grep 8080), check if it crashed (journalctl -u myapp), and check whether it is listening on a different port or bound only to localhost (127.0.0.1 instead of 0.0.0.0).
- "Connection timed out" means packets are being silently dropped (firewall, wrong IP, host down). "Connection refused" means the host is alive and actively rejecting the connection.
Connection Keep-Alive and Persistent Connections
In HTTP/1.0, every request opened a new TCP connection and closed it after the response. This was wasteful -- TCP handshakes and TLS negotiations are expensive.
HTTP/1.1 introduced persistent connections (keep-alive) as the default. The TCP connection stays open for multiple request-response cycles:
Without Keep-Alive (HTTP/1.0):
TCP connect → Request 1 → Response 1 → TCP close
TCP connect → Request 2 → Response 2 → TCP close
TCP connect → Request 3 → Response 3 → TCP close
With Keep-Alive (HTTP/1.1 default):
TCP connect → Request 1 → Response 1
→ Request 2 → Response 2
→ Request 3 → Response 3
→ ... → TCP close (after timeout)
You can see this in action:
# curl reuses connections when given multiple URLs
$ curl -v http://example.com http://example.com 2>&1 | grep -E "(Connected|Re-using)"
* Connected to example.com (93.184.216.34) port 80
* Re-using existing connection with host example.com
Content Negotiation
When a client and server need to agree on the format of data, they use content negotiation headers:
# Client says "I want JSON"
$ curl -H "Accept: application/json" http://httpbin.org/get
# Client says "I want XML"
$ curl -H "Accept: application/xml" http://httpbin.org/get
# Client says "I'm sending JSON"
$ curl -H "Content-Type: application/json" \
-d '{"key": "value"}' \
http://httpbin.org/post
Common content types you will encounter:
| Content-Type | What It Is |
|---|---|
| text/html | HTML web page |
| text/plain | Plain text |
| application/json | JSON data |
| application/xml | XML data |
| application/x-www-form-urlencoded | HTML form data |
| multipart/form-data | File uploads |
| application/octet-stream | Raw binary data |
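Web servers usually derive the response Content-Type from the file extension (nginx ships a mime.types map for exactly this). A miniature version covering only the types in the table above:

```shell
#!/bin/bash
# Map a file extension to a Content-Type, like a tiny mime.types lookup.
content_type() {
  case "${1##*.}" in
    html) echo "text/html" ;;
    txt)  echo "text/plain" ;;
    json) echo "application/json" ;;
    xml)  echo "application/xml" ;;
    *)    echo "application/octet-stream" ;;   # safe default for unknown files
  esac
}

content_type index.html      # text/html
content_type data.json       # application/json
content_type backup.tar.gz   # application/octet-stream
```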
Caching Basics
HTTP has built-in caching mechanisms that reduce load and speed up responses:
┌────────┐ ┌───────────┐ ┌────────┐
│ Client │ ──> │ Cache │ ──> │ Server │
│ │ <── │ (browser, │ <── │ │
│ │ │ proxy, │ │ │
│ │ │ CDN) │ │ │
└────────┘ └───────────┘ └────────┘
Key caching headers:
- Cache-Control: max-age=3600 -- cache this for 3600 seconds
- Cache-Control: no-cache -- always revalidate with the server before using the cached copy
- Cache-Control: no-store -- never cache this at all
- ETag: "abc123" -- a fingerprint of the content; the client can ask "has it changed?"
- If-None-Match: "abc123" -- the client sends the old ETag; the server returns 304 if unchanged
# See caching headers
$ curl -I https://www.google.com 2>/dev/null | grep -i cache
Cache-Control: private, max-age=0
# See ETag header
$ curl -I http://example.com 2>/dev/null | grep -i etag
ETag: "3147526947+gzip"
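You can sketch the ETag revalidation dance locally by treating a file hash as the ETag and comparing it the way a server compares If-None-Match (the hash-prefix choice here is illustrative, not how any particular server computes ETags):

```shell
#!/bin/bash
# Simulate ETag revalidation: hash the resource, compare against the
# client's cached ETag, answer 304 or 200 accordingly.
resource=$(mktemp)
echo "hello, cache" > "$resource"

etag() { sha256sum "$1" | cut -c1-16; }   # a hash prefix standing in for an ETag

client_etag=$(etag "$resource")           # client caches the ETag it was served

# Unchanged resource: ETags match, the server can answer 304 with no body.
[ "$(etag "$resource")" = "$client_etag" ] && echo "304 Not Modified"

# The resource changes on the server: ETags differ, a full 200 is needed.
echo "new content" >> "$resource"
[ "$(etag "$resource")" = "$client_etag" ] || echo "200 OK (body changed)"

rm -f "$resource"
```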
What Just Happened?
┌──────────────────────────────────────────────────────────┐
│ Chapter 43 Recap │
├──────────────────────────────────────────────────────────┤
│ │
│ HTTP is a request-response protocol (client asks, │
│ server answers). Each transaction is stateless. │
│ │
│ A REQUEST has: Method + URL + Headers + optional Body │
│ A RESPONSE has: Status Code + Headers + optional Body │
│ │
│ Methods: GET (read), POST (create), PUT (replace), │
│ DELETE (remove), HEAD (headers only), │
│ OPTIONS (capabilities) │
│ │
│ Status codes: │
│ 2xx = success 3xx = redirect 4xx = client error │
│ 5xx = server error │
│ │
│ Key headers: Host (virtual hosting), Content-Type │
│ (data format), Authorization (credentials), │
│ Cache-Control (caching behavior) │
│ │
│ HTTP/2 = binary, multiplexed, single connection │
│ HTTPS = HTTP inside a TLS encrypted tunnel │
│ │
│ curl is your best friend for HTTP debugging. │
│ │
└──────────────────────────────────────────────────────────┘
Try This
Exercise 1: Decode a Full Request
Use curl -v against any public URL. Identify and label every part: the method, URL, HTTP version, each request header, the status code, each response header, and the body.
Exercise 2: Status Code Scavenger Hunt
Using httpbin.org/status/{code}, get curl to show you a 200, 301, 403, 404, 500, and 502. Observe how the responses differ.
$ for code in 200 301 403 404 500 502; do
echo "=== $code ==="
curl -o /dev/null -s -w "Status: %{http_code}\n" http://httpbin.org/status/$code
done
Exercise 3: Timing Deep Dive
Use the curl timing format string from this chapter to measure the response time of five different websites. Which has the fastest TTFB? Which has the slowest DNS lookup?
Exercise 4: Content Negotiation
Send requests to httpbin.org/get with different Accept headers. Try application/json, text/html, application/xml, and text/plain. Compare the responses.
Bonus Challenge
Write a bash script that takes a URL as an argument and produces a "health check report" including: the HTTP status code, the server header, the content type, the TLS version (if HTTPS), and the total response time. Format it nicely.
#!/bin/bash
URL="${1:?Usage: $0 <url>}"
echo "=== Health Check: $URL ==="
curl -o /dev/null -s -w "\
Status Code: %{http_code}\n\
Content Type: %{content_type}\n\
TLS Version: %{ssl_version}\n\
Response Time: %{time_total}s\n\
Download Size: %{size_download} bytes\n\
" "$URL"
What Comes Next
Now that you understand HTTP at the protocol level, it is time to set up the software that actually speaks this protocol. In the next chapter, we will install Nginx and configure it from scratch to serve web content -- your first web server.