Certificate Lifecycle
"A certificate is not a set-and-forget artifact. It is born, it lives, and it must be allowed to die gracefully -- or it will die catastrophically." -- Unknown sysadmin, 3 AM during a cert expiry outage
Picture this: the payment gateway returns 502 errors. All transactions are failing. Revenue impact is roughly $12,000 per minute. The load balancer health checks are failing; the backend itself is healthy, but the TLS handshake is dying.
echo | openssl s_client -connect payments.internal:443 -servername payments.internal 2>/dev/null \
| openssl x509 -noout -dates
The output reads notAfter=Mar 12 00:00:00 2026 GMT. The certificate expired today. A certificate that nobody was watching just cost the company six figures. Understanding the entire certificate lifecycle is what prevents this from happening on your watch.
The Certificate Lifecycle: End to End
Every certificate goes through a defined series of stages. Understanding each stage -- and what can go wrong at each -- is the difference between smooth operations and midnight outages. The lifecycle is not linear; it is a loop with an emergency exit.
stateDiagram-v2
[*] --> KeyGeneration: Generate cryptographic key pair
KeyGeneration --> CSRCreation: Create Certificate Signing Request
CSRCreation --> Validation: Submit CSR to CA
Validation --> Issuance: CA validates and signs
Issuance --> Deployment: Install cert + key on server
Deployment --> Monitoring: Monitor expiry, revocation, health
Monitoring --> Renewal: Approaching expiry threshold
Renewal --> Validation: New CSR or ACME renewal
Monitoring --> Revocation: Key compromise or policy change
Revocation --> KeyGeneration: Generate new key pair
note right of KeyGeneration
RSA 2048+ or ECDSA P-256
Protect with strict file permissions
end note
note right of Monitoring
Alert at 30, 14, 7, 3, 1 days
Multiple independent systems
end note
note right of Revocation
CRL, OCSP, or OCSP Stapling
Immediate action required
end note
Stage 1: Key Generation
Everything starts with generating a cryptographic key pair. The private key is the most sensitive artifact in the entire lifecycle. If it is compromised at any point, the certificate must be revoked and the entire lifecycle restarts from scratch.
# Generate an RSA 2048-bit private key
openssl genrsa -out server.key 2048
# Generate a 4096-bit key for higher security
# Trade-off: slower TLS handshakes (RSA 4096 private-key operations,
# i.e. signing, take roughly 6-8x longer than RSA 2048)
openssl genrsa -out server-strong.key 4096
# Generate an ECDSA key with P-256 curve (recommended for most use cases)
openssl ecparam -genkey -name prime256v1 -noout -out server-ec.key
# Generate an ECDSA key with P-384 curve (required by some government standards)
openssl ecparam -genkey -name secp384r1 -noout -out server-ec384.key
# Encrypt the private key at rest with AES-256
# This means the key file requires a passphrase to use
openssl rsa -aes256 -in server.key -out server-encrypted.key
# Verify the key was generated correctly
openssl rsa -in server.key -check -noout
# or for EC keys:
openssl ec -in server-ec.key -check -noout
So how do you choose between RSA and ECDSA? In 2026, ECDSA P-256 is the default recommendation for most use cases. Here is the comparison that matters operationally:
| Property | RSA 2048 | RSA 4096 | ECDSA P-256 |
|---|---|---|---|
| Security level | ~112 bits | ~140 bits | ~128 bits |
| Key size | 256 bytes | 512 bytes | 32 bytes |
| Signature size | 256 bytes | 512 bytes | 64 bytes |
| Sign speed | Moderate | Slow | Very fast |
| Verify speed | Very fast | Fast | Moderate |
| TLS handshake impact | Baseline | ~2x slower | ~1.5x faster |
| Client compatibility | Universal | Universal | All modern clients |
| Post-quantum resistance | None | None | None |
ECDSA keys are smaller, which means smaller certificates, less bandwidth during the TLS handshake, and faster operations on mobile devices. RSA 2048 is still widely deployed and not broken, but there is no reason to choose it for new deployments unless you need to support very old clients (Android 2.x era).
For post-quantum considerations, neither RSA nor ECDSA is resistant. The industry is beginning to experiment with hybrid certificates that include both classical and post-quantum key material, but standardization is ongoing.
**Private key security is non-negotiable.**
- Never generate keys on shared systems where other users have root access
- Never store unencrypted private keys in version control (check your `.gitignore`)
- Never transmit private keys over unencrypted channels -- not even internal chat tools
- Set file permissions immediately: `chmod 600 server.key` (owner read/write only)
- For production CAs, use Hardware Security Modules (HSMs) where the key never leaves tamper-resistant hardware
- Consider generating the key on the target server so it never traverses a network
- Use `shred` or `srm` when deleting old key files -- regular `rm` leaves data recoverable on disk
- Audit who has access to key files regularly. A key is only as secure as the most careless person who can read it.
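The permission advice above can be folded into the generation step itself, so the key file is never world-readable even for an instant. A minimal sketch (filenames are illustrative):

```bash
#!/bin/bash
set -euo pipefail
umask 077                                   # new files default to 0600
openssl genpkey -algorithm EC \
    -pkeyopt ec_paramgen_curve:prime256v1 \
    -out server-ec.key
chmod 600 server-ec.key                     # explicit, in case umask was reset
ls -l server-ec.key                         # should show -rw-------
```

The `umask 077` matters more than the `chmod`: without it, the key briefly exists with default permissions before the `chmod` runs.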
Stage 2: Certificate Signing Request (CSR)
A CSR is a formal request to a CA to sign your public key. It contains your identity information and your public key, signed with your private key to prove possession. The CSR itself is not secret -- it contains only public information. But the process of creating it correctly matters.
Creating a CSR Step by Step
# Method 1: Interactive CSR generation (prompts for each field)
openssl req -new -key server.key -out server.csr
# You'll be prompted for:
# Country Name (2 letter code) []: IN
# State or Province Name []: Karnataka
# Locality Name []: Bangalore
# Organization Name []: Acme Corp
# Organizational Unit Name []: Engineering
# Common Name []: api.acme.com
# Email Address []: (leave blank for server certs)
# Method 2: Non-interactive CSR generation (all in one command)
openssl req -new -key server.key -out server.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/OU=Engineering/CN=api.acme.com"
# Method 3: CSR with Subject Alternative Names
# This is the recommended approach for modern certificates
openssl req -new -key server.key -out server.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/CN=acme.com" \
-addext "subjectAltName=DNS:acme.com,DNS:www.acme.com,DNS:api.acme.com"
# Method 4: Generate key and CSR in a single command
openssl req -new -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
-keyout server.key -out server.csr -nodes \
-subj "/CN=api.acme.com" \
-addext "subjectAltName=DNS:api.acme.com,DNS:acme.com"
Inspecting a CSR Before Submission
Always inspect the CSR before submitting it to a CA. Mistakes here mean you get a certificate with wrong information and have to start over.
# Display the CSR contents in human-readable form
openssl req -in server.csr -noout -text
# Verify the CSR's self-signature (proves the CSR creator has the private key)
openssl req -in server.csr -noout -verify
# verify OK
# Just show the subject
openssl req -in server.csr -noout -subject
# subject=C = IN, ST = Karnataka, L = Bangalore, O = Acme Corp, CN = api.acme.com
Here is what the full CSR output looks like and what to check:
Certificate Request:
Data:
Version: 1 (0x0)
Subject: C=IN, ST=Karnataka, L=Bangalore, O=Acme Corp, CN=api.acme.com
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit) <-- Verify key type and size
pub:
04:a1:b2:c3:... <-- The actual public key bytes
ASN1 OID: prime256v1 <-- Curve name
Attributes:
Requested Extensions:
X509v3 Subject Alternative Name:
DNS:api.acme.com, DNS:acme.com <-- Verify all hostnames are correct
Signature Algorithm: ecdsa-with-SHA256 <-- Self-signature algorithm
30:45:02:21:00:... <-- Proves possession of private key
Why does the CSR need to be signed? The self-signature on the CSR serves as a proof-of-possession. It proves that whoever created the CSR actually holds the private key corresponding to the public key in the request. Without that signature, an attacker could submit a CSR containing someone else's public key and trick a CA into issuing a certificate that ties a victim's identity to the attacker's key. The CSR signature makes this impossible -- only the private key holder can create a valid CSR signature.
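That proof-of-possession can also be checked locally: the public key embedded in the CSR must be byte-for-byte identical to the one derived from your private key. A quick sketch, assuming the server.key and server.csr from the examples above:

```bash
# Extract the public key from both the CSR and the private key, then compare
openssl req -in server.csr -noout -pubkey > csr-pub.pem
openssl pkey -in server.key -pubout > key-pub.pem
if cmp -s csr-pub.pem key-pub.pem; then
    echo "CSR public key matches private key"
else
    echo "MISMATCH: this CSR was not generated from server.key"
fi
```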
Stage 3: CA Signing (Validation and Issuance)
The CA receives your CSR, validates your identity (to the level required by the certificate type), and issues a signed certificate. For DV certificates, validation means proving domain control.
Domain Validation Methods
graph LR
subgraph "HTTP-01 Challenge"
A1[CA sends token] --> B1[Place token at<br/>/.well-known/acme-challenge/TOKEN]
B1 --> C1[CA fetches token via HTTP<br/>from port 80]
C1 --> D1[Token matches = Domain control proven]
end
subgraph "DNS-01 Challenge"
A2[CA sends token] --> B2[Create TXT record<br/>_acme-challenge.domain.com]
B2 --> C2[CA queries DNS<br/>for TXT record]
C2 --> D2[Record matches = Domain control proven]
end
subgraph "TLS-ALPN-01 Challenge"
A3[CA sends token] --> B3[Configure server to present<br/>self-signed cert with token<br/>via ALPN protocol on port 443]
B3 --> C3[CA connects to port 443<br/>using acme-tls/1 ALPN]
C3 --> D3[Token in cert = Domain control proven]
end
Each method has specific trade-offs:
HTTP-01 is the simplest and works for most servers with port 80 open. Limitations: requires port 80 accessible from the internet; cannot be used for wildcard certificates; the CA's validation servers must be able to reach your server directly.
DNS-01 is required for wildcard certificates and works even when the server is not publicly accessible. You just need API access to your DNS provider. Limitations: DNS propagation can take minutes; requires automating DNS record creation; if your DNS provider's API is slow or unreliable, renewals can fail.
TLS-ALPN-01 is useful when port 80 is blocked but 443 is open. Limitations: requires server-side support; less widely supported by ACME clients; cannot be used for wildcard certificates.
Which challenge should you use? For single servers with port 80 open, HTTP-01 is simplest. For wildcard certificates, DNS-01 is your only option. For automation at scale, DNS-01 with an API-driven DNS provider (Cloudflare, Route 53, Google Cloud DNS) is the most flexible because it does not require the server to be reachable from the internet.
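To make HTTP-01 concrete, here is the token-placement step done by hand. The token and key-authorization values below are hypothetical, and DOCROOT stands in for your web root; a real ACME client computes these values for you:

```bash
# Hypothetical HTTP-01 token placement
DOCROOT=${DOCROOT:-/var/www/html}
TOKEN="abc123"
KEYAUTH="abc123.9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"
mkdir -p "$DOCROOT/.well-known/acme-challenge"
printf '%s' "$KEYAUTH" > "$DOCROOT/.well-known/acme-challenge/$TOKEN"
# The CA then fetches:
#   http://<your-domain>/.well-known/acme-challenge/abc123
# over plain HTTP and compares the body against the key authorization
```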
Stage 4: Certificate Deployment
Once you have the signed certificate, you deploy it alongside the private key and the intermediate certificate chain. Getting the chain right is where most teams stumble.
Server Configuration
# Nginx -- CORRECT configuration
server {
listen 443 ssl http2;
server_name api.acme.com;
# fullchain.pem = your cert + intermediate(s), concatenated in order
ssl_certificate /etc/nginx/ssl/fullchain.pem;
# Private key -- must match the public key in the cert
ssl_certificate_key /etc/nginx/ssl/privkey.pem;
# Modern TLS configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
# OCSP stapling (server fetches its own revocation status)
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/nginx/ssl/chain.pem; # intermediate + root
resolver 8.8.8.8 8.8.4.4 valid=300s;
}
Building the Chain File Correctly
# CORRECT: end-entity cert first, then intermediate(s)
cat server.crt intermediate.crt > fullchain.pem
# WRONG: reversed order
cat intermediate.crt server.crt > fullchain.pem # DO NOT DO THIS
# WRONG: including the root
cat server.crt intermediate.crt root.crt > fullchain.pem # DO NOT DO THIS
# The root must already be in the client's trust store
# WRONG: only the server cert, no intermediate
cp server.crt fullchain.pem # DO NOT DO THIS
# Works on some browsers (cached intermediate) but fails on Android/Java/curl
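Before the chain file ever reaches a server, two cheap sanity checks catch most of the mistakes above -- a sketch assuming the fullchain.pem just built:

```bash
# 1. The file should contain at least two certificates (leaf + intermediate)
grep -c 'BEGIN CERTIFICATE' fullchain.pem

# 2. openssl x509 reads only the FIRST cert in the file -- its subject
#    should be YOUR hostname, not the intermediate CA's name
openssl x509 -in fullchain.pem -noout -subject
```

If check 2 prints the intermediate's name, you concatenated the files in the wrong order.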
Verification After Deployment
# Test the deployed certificate from outside
openssl s_client -connect api.acme.com:443 -servername api.acme.com < /dev/null 2>&1 \
| grep -E "Verify return code|depth|Protocol|Cipher"
# Look for: Verify return code: 0 (ok)
# Verify the full chain
openssl s_client -connect api.acme.com:443 -servername api.acme.com \
-showcerts < /dev/null 2>/dev/null | grep -E "s:|i:" | head -6
# Should show: depth 0 (your cert), depth 1 (intermediate)
# Confirm cert and key match on the server
# (modulus comparison works for RSA keys; for EC keys, compare public keys
# instead: openssl x509 -pubkey vs openssl pkey -pubout)
CERT_HASH=$(openssl x509 -noout -modulus -in /etc/nginx/ssl/fullchain.pem | openssl md5)
KEY_HASH=$(openssl rsa -noout -modulus -in /etc/nginx/ssl/privkey.pem | openssl md5)
[ "$CERT_HASH" = "$KEY_HASH" ] && echo "MATCH" || echo "MISMATCH -- FIX THIS"
# Check from a completely clean environment
curl -vI https://api.acme.com 2>&1 | grep -E "SSL|subject|issuer|expire"
After deploying a certificate, always verify from three perspectives:
1. **openssl s_client** -- shows the raw TLS negotiation and chain
2. **curl with -v flag** -- shows what a typical HTTP client sees
3. **SSL Labs** (ssllabs.com/ssltest/) or **testssl.sh** -- comprehensive audit
```bash
# Quick smoke test from the command line
echo | openssl s_client -connect api.acme.com:443 -servername api.acme.com 2>/dev/null \
| openssl x509 -noout -dates -issuer -subject -checkend 0
```
Do not trust that the cert works just because your browser shows a padlock. Your browser may have cached the intermediate from a previous visit. Test from a clean environment with no cached state.
Stage 5: Monitoring and Maintenance
This is where most teams fail. They deploy the certificate and forget about it. Then it expires, and everything breaks at the worst possible time.
Certificate Monitoring Script
#!/bin/bash
# cert-check.sh -- Check certificate expiry for a list of domains
DOMAINS="api.acme.com www.acme.com payments.acme.com auth.acme.com"
WARN_DAYS=30
CRIT_DAYS=7
for domain in $DOMAINS; do
expiry=$(echo | openssl s_client -connect "$domain:443" \
-servername "$domain" 2>/dev/null \
| openssl x509 -noout -enddate 2>/dev/null \
| cut -d= -f2)
if [ -z "$expiry" ]; then
echo "CRITICAL: Could not retrieve cert for $domain"
continue
fi
# macOS date syntax; for Linux use: date -d "$expiry" +%s
expiry_epoch=$(date -j -f "%b %d %T %Y %Z" "$expiry" +%s 2>/dev/null \
|| date -d "$expiry" +%s 2>/dev/null)
now_epoch=$(date +%s)
days_left=$(( (expiry_epoch - now_epoch) / 86400 ))
if [ "$days_left" -lt "$CRIT_DAYS" ]; then
echo "CRITICAL: $domain expires in $days_left days ($expiry)"
elif [ "$days_left" -lt "$WARN_DAYS" ]; then
echo "WARNING: $domain expires in $days_left days ($expiry)"
else
echo "OK: $domain expires in $days_left days"
fi
done
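When the certificate file is on local disk, `openssl x509 -checkend` gives the same answer without a network round-trip: it exits non-zero if the certificate expires within the given number of seconds. The path below is assumed from the nginx configuration earlier:

```bash
# Exit status 0 = valid beyond the window, 1 = expires within it
CERT=/etc/nginx/ssl/fullchain.pem
if openssl x509 -in "$CERT" -noout -checkend $((30 * 86400)); then
    echo "OK: more than 30 days remaining"
else
    echo "WARNING: expires within 30 days -- renew now"
fi
```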
Professional Monitoring Approaches
- Prometheus + blackbox_exporter -- Scrapes TLS endpoints and exposes `probe_ssl_earliest_cert_expiry` as a metric. Set up Alertmanager rules for 30/14/7/3/1 day thresholds.
- Nagios/Icinga `check_ssl_cert` plugin -- Traditional monitoring with alerting thresholds and escalation paths.
- cert-manager (Kubernetes) -- Built-in certificate lifecycle management that monitors, renews, and deploys certificates automatically.
- Datadog/Grafana synthetic monitors -- SaaS-based TLS monitoring with dashboards and PagerDuty integration.
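To make the Prometheus approach concrete, here is a hedged sketch of an alerting rule, assuming blackbox_exporter's `probe_ssl_earliest_cert_expiry` metric is being scraped; the threshold, group name, and labels are illustrative:

```yaml
groups:
  - name: tls-expiry
    rules:
      - alert: CertificateExpiringSoon
        # Days until the earliest certificate in the chain expires
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS cert for {{ $labels.instance }} expires in under 14 days"
```

Add parallel rules at the 7, 3, and 1 day marks with escalating severity so a single missed page does not become an outage.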
**The Equifax Certificate Expiry Disaster (2017)**
In 2017, Equifax suffered one of the largest data breaches in history, exposing personal data of 147 million people. While the root cause was an unpatched Apache Struts vulnerability (CVE-2017-5638), the breach went **undetected for 76 days** partly because an expired SSL certificate on their network inspection appliance blinded their security monitoring.
The full timeline:
- **March 7, 2017**: Apache Struts vulnerability CVE-2017-5638 is publicly disclosed with a patch available.
- **March 9, 2017**: Equifax's security team sends an email directing that the patch be applied. It is not applied to the affected system.
- **March 15, 2017**: Equifax runs vulnerability scans but the affected system is not properly scanned.
- **May 13, 2017**: Attackers exploit the vulnerability and begin exfiltrating data through encrypted channels.
- **January 31, 2016 to July 29, 2017**: An SSL inspection certificate had been expired for **roughly 18 months**. The network monitoring appliance (a McAfee device that performed SSL/TLS interception to inspect encrypted traffic) could not decrypt outbound traffic during this entire period. The data exfiltration was happening in encrypted sessions that the monitoring tool could not see.
- **July 29, 2017**: Equifax finally updates the expired certificate. The monitoring tool immediately detects suspicious traffic.
- **July 30, 2017**: Breach is formally discovered and incident response begins.
An expired certificate -- something that costs nothing to renew -- contributed to 76 days of undetected data theft affecting nearly half the US adult population.
**The lesson:** Certificate monitoring is not just about preventing outages. It is about maintaining your security visibility. Every expired certificate is a potential blind spot in your defense. Equifax had the monitoring tool in place; it was the expired certificate that made it useless.
Stage 6: Renewal
Certificate renewal should be a non-event -- automated, tested, and completed well before expiry.
Manual Renewal Process
# 1. Generate a new key (recommended) or reuse the existing one
openssl ecparam -genkey -name prime256v1 -noout -out server-new.key
# 2. Create a new CSR
openssl req -new -key server-new.key -out server-renewal.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/CN=api.acme.com" \
-addext "subjectAltName=DNS:api.acme.com,DNS:acme.com"
# 3. Submit CSR to CA (process varies by CA)
# 4. Receive new certificate
# 5. Build the new fullchain
cat server-new.crt intermediate.crt > fullchain-new.pem
# 6. Verify the new chain locally (and test in staging) before deploying
openssl verify -CAfile root.crt -untrusted intermediate.crt server-new.crt
# 7. Deploy to production and reload the web server
cp fullchain-new.pem /etc/nginx/ssl/fullchain.pem
cp server-new.key /etc/nginx/ssl/privkey.pem
nginx -t && systemctl reload nginx
# 8. Verify from outside
openssl s_client -connect api.acme.com:443 -servername api.acme.com < /dev/null
Automated Renewal with Certbot
# Install certbot
sudo apt install certbot python3-certbot-nginx
# Obtain initial certificate (nginx plugin handles everything)
sudo certbot --nginx -d api.acme.com -d www.acme.com
# Certbot sets up automatic renewal via systemd timer
systemctl list-timers | grep certbot
# certbot.timer - twice daily check
# Test the renewal process without actually renewing
sudo certbot renew --dry-run
# The renewal flow:
# 1. certbot checks each certificate's expiry
# 2. If within 30 days of expiry, begins renewal
# 3. Generates new CSR, completes ACME challenge
# 4. Downloads new cert, installs to /etc/letsencrypt/live/
# 5. Runs deploy hooks (e.g., reload nginx)
Create a deploy hook to reload your web server after renewal:
# Create deploy hook
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh << 'EOF'
#!/bin/bash
nginx -t && systemctl reload nginx
echo "$(date): Certificate renewed and nginx reloaded" >> /var/log/certbot-deploy.log
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
Let's Encrypt and the ACME Protocol
How does Let's Encrypt actually work under the hood? It is not magic. It is a well-designed protocol called ACME -- Automatic Certificate Management Environment, standardized as RFC 8555.
The ACME Protocol Flow
sequenceDiagram
participant Client as ACME Client<br/>(certbot)
participant Server as ACME Server<br/>(Let's Encrypt)
participant Challenge as Validation<br/>Target
Note over Client,Server: Phase 1: Account Setup
Client->>Server: POST /acme/new-account<br/>{contact: "admin@acme.com",<br/>termsOfServiceAgreed: true}
Server->>Client: 201 Created<br/>{account URL, status: "valid"}
Note over Client,Server: Phase 2: Order Creation
Client->>Server: POST /acme/new-order<br/>{identifiers: [{type: "dns",<br/>value: "api.acme.com"}]}
Server->>Client: 201 Created<br/>{authorizations: [authz-url],<br/>finalize: finalize-url}
Note over Client,Server: Phase 3: Authorization Challenge
Client->>Server: GET authz-url
Server->>Client: {challenges: [{type: "http-01",<br/>token: "abc123",<br/>url: challenge-url}]}
Client->>Challenge: Place token file at<br/>/.well-known/acme-challenge/abc123<br/>Content: abc123.thumbprint
Client->>Server: POST challenge-url<br/>{} (empty, signals readiness)
Server->>Challenge: GET http://api.acme.com/<br/>.well-known/acme-challenge/abc123
Challenge->>Server: abc123.thumbprint
Server->>Server: Validate token matches
Note over Client,Server: Phase 4: Certificate Issuance
Client->>Server: POST finalize-url<br/>{csr: "base64url-encoded-CSR"}
Server->>Server: Verify CSR, sign certificate,<br/>submit to CT logs, embed SCTs
Client->>Server: GET certificate-url
Server->>Client: Certificate chain in PEM format<br/>(end-entity + intermediate)
Let's Encrypt Architecture
Let's Encrypt's infrastructure is a marvel of engineering, designed to issue millions of certificates per day with high availability:
graph TD
subgraph "Let's Encrypt Infrastructure"
Boulder["Boulder<br/>(ACME Server)<br/>Open-source Go application"]
HSMs["HSM Cluster<br/>(Hardware Security Modules)<br/>Stores CA signing keys"]
DB["MariaDB Cluster<br/>(Certificate database)<br/>Multi-datacenter replication"]
VA["Validation Authority<br/>(Multi-perspective)<br/>Validates from multiple<br/>network vantage points"]
CT["CT Log Submission<br/>(Submits to Google, Cloudflare,<br/>and other CT logs)"]
end
Client["ACME Clients<br/>(certbot, acme.sh,<br/>Caddy, cert-manager)"]
Client -->|HTTPS| Boulder
Boulder --> HSMs
Boulder --> DB
Boulder --> VA
Boulder --> CT
VA -->|HTTP/DNS| Internet["Internet<br/>(validates domain control<br/>from multiple locations)"]
Multi-perspective validation is a critical security feature. When Let's Encrypt validates a domain, it does so from multiple network vantage points simultaneously. An attacker performing a BGP hijack to redirect traffic to their server would need to hijack routes from all vantage points simultaneously, which is significantly harder than hijacking a single path. This was added after research showed that single-perspective validation was vulnerable to network-level attacks.
Why Let's Encrypt Changed Everything
Before Let's Encrypt launched in December 2015:
- Certificates cost $50-$300/year from commercial CAs
- Issuance required manual email exchanges, sometimes taking days
- Renewal was manual and easy to forget
- Many sites ran without HTTPS because of cost and complexity
- HTTPS adoption was around 40% of web traffic
After Let's Encrypt:
- Certificates are free, forever
- Issuance is fully automated, taking seconds
- Renewal is automated with no human intervention needed
- As of 2026, Let's Encrypt has issued over 5 billion certificates
- HTTPS adoption exceeds 95% of page loads in major browsers
Why 90 days? That seems short -- and it is intentionally short for two reasons. First, if a key is compromised, the window of exposure is limited to at most 90 days (minus the time until the next renewal). Second, short lifetimes force automation. You cannot manually manage a 90-day certificate across hundreds of servers -- you have to automate. And automation is fundamentally more reliable than humans remembering to renew certificates. The industry is moving toward even shorter lifetimes: Apple has proposed 45-day maximum certificate validity by 2027, and some organizations already issue certificates valid for only 24 hours.
**ACME Beyond Let's Encrypt**
ACME is an open standard (RFC 8555), not proprietary to Let's Encrypt. Other CAs that support ACME:
- **ZeroSSL** -- Free DV certificates via ACME, operated by Stack Holdings
- **Buypass** -- Norwegian CA with free ACME-based certificates
- **Google Trust Services** -- Google's CA with ACME support
- **Sectigo** -- Commercial CA with ACME API for paid certificates
- **SSL.com** -- Commercial CA with ACME support
You can also run your own internal ACME server:
- **step-ca** (Smallstep) -- Open-source, easy to deploy, supports short-lived certificates (minutes to hours)
- **Caddy** -- Web server with built-in ACME client that automatically obtains and renews certificates
- **Boulder** -- Let's Encrypt's actual ACME server software (complex to self-host but fully featured)
Internal ACME servers let you bring the same automation that Let's Encrypt provides for public certificates to your internal services. This is the recommended approach for organizations with significant internal TLS infrastructure.
Stage 7: Revocation
Sometimes a certificate needs to die before its natural expiration. Key compromise, employee departure, domain ownership change, CA misissuance -- all require immediate revocation. The challenge is that revocation is one of the hardest problems in PKI, and none of the existing mechanisms work perfectly.
CRL: Certificate Revocation Lists
The oldest revocation mechanism. The CA periodically publishes a signed list of revoked certificate serial numbers at a URL specified in the certificate's CRL Distribution Points extension.
# Download and inspect a CRL
curl -o crl.der http://crl.example.com/ca.crl
openssl crl -in crl.der -inform DER -noout -text | head -30
# Check how many certificates are in the CRL
openssl crl -in crl.der -inform DER -noout -text | grep -c "Serial Number"
Problems with CRLs:
- CRLs can grow to megabytes for busy CAs (Let's Encrypt's CRL would be enormous given their volume)
- Clients must download the entire list to check one certificate -- there is no way to query for a single serial number
- CRLs are cached with a "Next Update" timestamp; there is a window between revocation and the next CRL publication where revoked certificates are still accepted
- Downloading a CRL on every TLS connection is too slow for interactive web browsing
- Most browsers stopped checking CRLs years ago due to performance impact and privacy concerns (the download reveals which sites the user visits)
OCSP: Online Certificate Status Protocol
OCSP is a real-time, per-certificate revocation check. The client sends the certificate's serial number to the CA's OCSP responder and gets back a signed "good," "revoked," or "unknown" status.
sequenceDiagram
participant Browser as Browser
participant OCSP as CA's OCSP Responder
Browser->>Browser: Received certificate with<br/>serial 0A:3C:7F...
Browser->>OCSP: Is serial 0A:3C:7F still valid?<br/>(HTTP GET or POST)
OCSP->>OCSP: Look up revocation database
OCSP->>Browser: Signed OCSP Response:<br/>Status: GOOD<br/>This Update: 2026-03-12<br/>Next Update: 2026-03-19<br/>Signed by: CA's OCSP key
Browser->>Browser: Verify OCSP response signature<br/>Check status and freshness<br/>Proceed with connection
# Get the OCSP responder URL from a certificate
openssl x509 -in server.crt -noout -ocsp_uri
# http://ocsp.digicert.com
# Query the OCSP responder
openssl ocsp -issuer intermediate.crt -cert server.crt \
-url http://ocsp.digicert.com -resp_text -noverify
Problems with OCSP:
- Privacy: The CA's OCSP responder sees every site the user visits, because the browser queries the CA for every new TLS connection. This is a massive privacy leak -- the CA can build browsing profiles for every user.
- Latency: An extra HTTP round-trip (sometimes 100-300ms) for every new TLS connection, directly impacting page load times.
- Availability: If the OCSP responder is down, the browser must decide: fail-open (accept the certificate, defeating the purpose of revocation checking) or fail-closed (reject the certificate, breaking the website). Most browsers fail-open because the alternative breaks too many websites. This means OCSP provides no security benefit when the responder is unavailable -- which is exactly when an attacker might suppress it.
OCSP Stapling: The Best of Both Worlds
OCSP stapling elegantly solves the privacy and latency problems by having the server fetch its own OCSP response and include ("staple") it in the TLS handshake.
sequenceDiagram
participant Server as Web Server
participant OCSP as CA's OCSP Responder
participant Browser as Browser
Note over Server,OCSP: Background: Server periodically<br/>fetches its own OCSP response
Server->>OCSP: Am I still valid?<br/>(every few hours)
OCSP->>Server: Signed OCSP Response:<br/>Status: GOOD<br/>Valid for 7 days
Note over Server: Server caches the<br/>signed OCSP response
Note over Server,Browser: During TLS Handshake
Browser->>Server: ClientHello<br/>(with status_request extension)
Server->>Browser: ServerHello + Certificate +<br/>Stapled OCSP Response
Browser->>Browser: Verify OCSP response:<br/>1. Signed by CA's OCSP key ✓<br/>2. Status is GOOD ✓<br/>3. Response is fresh ✓<br/>No need to contact CA directly!
Enabling OCSP Stapling:
# Nginx
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
ssl_trusted_certificate /etc/nginx/ssl/chain.pem; # intermediate + root
# Test if a server supports OCSP stapling
openssl s_client -connect www.example.com:443 -servername www.example.com \
-status < /dev/null 2>/dev/null | grep -A5 "OCSP Response"
# If stapling is working, you'll see "OCSP Response Status: successful"
# If not, you'll see "OCSP response: no response sent"
OCSP stapling limitations: stapling is only as strong as the client's insistence on it. An attacker who has stolen the private key can present the certificate and simply not staple the (revoked) OCSP response -- most clients will proceed anyway. The OCSP Must-Staple extension (OID 1.3.6.1.5.5.7.1.24) was designed to fix this: a certificate with this extension requires the server to staple, and clients must reject connections without a stapled response. However, OCSP Must-Staple has not been widely adopted because OCSP responder outages would cause hard failures.
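You can check whether a certificate carries the Must-Staple marker: it appears in openssl's text dump as a TLS Feature extension with the value `status_request` (certbot can request it at issuance with `--must-staple`, if your CA supports it):

```bash
# grep exits non-zero if the extension is absent
openssl x509 -in server.crt -noout -text | grep -B1 "status_request"
```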
So we have three revocation mechanisms and none of them fully works. This is one of the genuinely unsolved problems in PKI. The industry's current pragmatic approach combines multiple strategies:
- Short certificate lifetimes -- If a certificate expires in 90 days (or less), the window where revocation matters is small
- CRLite (Mozilla) -- Compresses the entire revocation status of every certificate on the internet into a ~10MB Bloom filter cascade that Firefox downloads daily, checking revocation locally without privacy leakage
- CRLSets (Chrome) -- Google maintains a curated list of high-priority revocations pushed to Chrome, covering major incidents but not all revoked certificates
- OCSP stapling where supported -- Eliminates privacy and latency concerns
- Very short-lived certificates -- Some organizations issue certificates valid for hours, eliminating the need for revocation entirely
Revoking a Certificate in Practice
# Revoke via Let's Encrypt / certbot
sudo certbot revoke --cert-path /etc/letsencrypt/live/api.acme.com/cert.pem \
--reason keycompromise
# Revoke using the account key (alternative proof of ownership)
sudo certbot revoke --cert-path cert.pem --key-path privkey.pem
# Revocation reasons (RFC 5280 Section 5.3.1):
# 0 - unspecified
# 1 - keyCompromise (private key stolen or leaked)
# 2 - cACompromise (CA's key was compromised)
# 3 - affiliationChanged (subject's name or affiliation changed)
# 4 - superseded (replaced by a new certificate)
# 5 - cessationOfOperation (subject no longer operates the domain)
# 9 - privilegeWithdrawn (authorization has been revoked)
Certificate Automation at Scale
When you have hundreds or thousands of certificates across dozens of services, manual management is impossible. You need automation that handles the full lifecycle.
Kubernetes cert-manager
# cert-manager ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@acme.com
    privateKeySecretRef:
      name: letsencrypt-prod-account
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Certificate resource -- cert-manager handles everything else
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret  # K8s secret where cert+key are stored
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.acme.com
    - www.acme.com
  renewBefore: 720h  # Renew 30 days before expiry
cert-manager handles the entire lifecycle: creates the private key, generates the CSR, completes the ACME challenge, downloads the certificate, stores it in a Kubernetes Secret, monitors expiry, and renews automatically. It is the standard solution for TLS in Kubernetes.
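Trust but verify: a few kubectl checks confirm that cert-manager actually issued and stored the certificate. The names and namespace match the manifests above:

```shell
# Check the Certificate resource's Ready condition
kubectl get certificate api-tls -n production

# Inspect events if issuance is stuck (failed challenges, rate limits)
kubectl describe certificate api-tls -n production

# Decode the issued cert from the Secret and confirm its expiry
kubectl get secret api-tls-secret -n production \
    -o jsonpath='{.data.tls\.crt}' | base64 -d \
    | openssl x509 -noout -enddate
```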
HashiCorp Vault PKI
# Enable the PKI secrets engine
vault secrets enable pki
# Set max TTL
vault secrets tune -max-lease-ttl=87600h pki
# Generate the internal root CA
vault write pki/root/generate/internal \
    common_name="Acme Internal Root CA" \
    ttl=87600h
# Create a role for issuing certificates
vault write pki/roles/acme-services \
    allowed_domains="acme.internal" \
    allow_subdomains=true \
    max_ttl=2160h \
    key_type=ec \
    key_bits=256
# Issue a certificate (returns cert, key, and chain)
vault write pki/issue/acme-services \
    common_name="api.acme.internal" \
    alt_names="grpc.acme.internal" \
    ttl=720h
Vault's PKI engine is particularly powerful for short-lived certificates. Some teams configure TTLs of 24 hours or even 1 hour, completely eliminating the need for revocation infrastructure. The trade-off is that every service must be able to request new certificates frequently, which requires tight integration with your deployment platform.
Build complete certificate lifecycle automation on your local machine:
1. Set up a local CA using `step-ca`:
```bash
# Install step CLI and step-ca
brew install step # macOS
# or: wget https://dl.step.sm/gh-release/cli/latest/step-cli.tar.gz
# Initialize a new CA
step ca init --name "Dev CA" --dns localhost --address :8443 \
    --provisioner admin
# Start the CA server
step-ca $(step path)/config/ca.json
```
2. Use ACME to get a certificate:
```bash
# Add an ACME provisioner first -- step ca init creates only the JWK
# "admin" provisioner (may require restarting step-ca)
step ca provisioner add acme --type ACME
step ca certificate api.dev.local api.crt api.key \
    --provisioner acme --san api.dev.local
```
3. Write the monitoring script from earlier in this chapter to check the cert's expiry
4. Set up a cron job to renew before expiry
This gives you hands-on experience with every lifecycle stage in a safe, local environment.
Common Certificate Lifecycle Failures
Failure 1: The 3 AM Expiry
A startup had a single wildcard certificate that covered their entire infrastructure: API, customer dashboard, webhook endpoints, partner integrations, and internal monitoring. The certificate expired on a Saturday night. Their on-call engineer was at a wedding with their phone on silent.
By the time someone noticed:
- Webhook deliveries to 200+ partners had been failing for 6 hours
- Partner systems had queued up millions of retry events
- Their own monitoring dashboard was behind the same certificate, so Grafana was inaccessible and PagerDuty alerts were delayed
- Recovery took 14 hours because nobody could find the CA login credentials -- they were in the personal email of an engineer who had left the company
- Several enterprise partners triggered breach notification procedures because they assumed the failed webhooks indicated a security incident
**Total cost**: an estimated $400,000 in lost revenue, SLA penalty payments to partners, and engineering time for recovery.
**Prevention (any one of these would have avoided the outage):**
- Monitor certificates with at least two independent systems (external + internal)
- Alert at 30, 14, 7, 3, and 1 day before expiry, with escalation
- Store CA credentials in a shared secrets manager (Vault, 1Password Teams), not one person's email
- Never put your monitoring system behind the same certificates you are monitoring
- Automate renewal with certbot or cert-manager so certificates renew without human intervention
Failure 2: The Silent Renewal Failure
Auto-renewal can fail silently for many reasons:
- DNS provider API credentials were rotated but the certbot configuration was not updated
- A firewall rule change blocked port 80, so HTTP-01 challenges could no longer reach the server
- The server was replaced during a migration and the certbot cron job was not carried into the new image
- Let's Encrypt rate limits were hit (50 certificates per registered domain per week)
- DNS propagation delay caused the DNS-01 challenge to fail
- A certbot package upgrade changed the renewal configuration format
Always verify that renewals are actually succeeding, not just that the timer is running:
# Check certbot renewal logs
tail -n 50 /var/log/letsencrypt/letsencrypt.log
# Verify the actual certificate on disk matches what the server is serving
openssl x509 -in /etc/letsencrypt/live/api.acme.com/cert.pem -noout -enddate
# Compare with:
echo | openssl s_client -connect api.acme.com:443 -servername api.acme.com 2>/dev/null \
| openssl x509 -noout -enddate
# If these dates differ, the server hasn't loaded the new cert (needs reload)
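Two certbot features help close this gap: `--dry-run` exercises the full renewal path against the staging environment, and a deploy hook makes sure the server actually loads what was renewed. The reload command is illustrative; match it to your web server:

```shell
# Exercise the full renewal path without replacing real certificates
sudo certbot renew --dry-run

# Run a reload after every successful renewal so the new cert is served
sudo certbot renew --deploy-hook "systemctl reload nginx"
```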
Failure 3: The Let's Encrypt Root Transition
In September 2021, the IdenTrust root that Let's Encrypt originally chained to (DST Root CA X3) expired. Let's Encrypt had long since transitioned to its own root (ISRG Root X1), but older devices that did not have ISRG Root X1 in their trust stores suddenly could not verify Let's Encrypt certificates. This affected millions of devices, particularly:
- Android devices running versions older than 7.1.1 (released in 2016)
- Older OpenSSL versions (1.0.2 and earlier) that did not handle the cross-sign chain correctly
- Embedded devices and IoT hardware with frozen trust stores
The mitigation was a creative cross-signing arrangement, but the incident demonstrated that root certificate transitions affect the real world in surprising ways, and that the long tail of legacy devices creates lasting compatibility challenges.
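Chain issues like this are visible from the client side. Printing the subject and issuer of every certificate a server sends shows exactly which chain, cross-signed or not, is being served (the hostname is illustrative):

```shell
# Print subject (s:) and issuer (i:) of each cert in the served chain;
# a cross-signed chain shows the same intermediate subject with a
# different issuer than the default chain.
echo | openssl s_client -connect api.acme.com:443 \
    -servername api.acme.com -showcerts 2>/dev/null \
    | grep -E " (s|i):"
```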
What You've Learned
This chapter walked through every stage of a certificate's life, from birth to death:
- Key generation -- ECDSA P-256 for modern deployments, with strict file permissions and HSMs for production CAs; the private key is the most sensitive artifact in the entire lifecycle
- CSR creation -- The formal request that ties your identity to your public key, self-signed to prove possession; always inspect the CSR before submitting
- CA signing -- Domain validation methods (HTTP-01, DNS-01, TLS-ALPN-01) each have distinct trade-offs for automation, wildcard support, and network accessibility
- Deployment -- Always send the full chain (end-entity + intermediates); verify from outside with openssl, curl, and SSL Labs; never trust browser caching
- Monitoring -- Automate expiry alerts at multiple thresholds using multiple independent systems; an expired certificate can blind your security monitoring (Equifax) or halt your revenue
- Renewal -- Automate with certbot, cert-manager, or Vault; manual renewal is a ticking time bomb that will eventually explode
- Revocation -- CRL, OCSP, and OCSP stapling each have fundamental trade-offs; the industry is converging on short-lived certificates as the pragmatic solution
- ACME protocol -- The open standard (RFC 8555) that Let's Encrypt popularized, enabling free, automated certificate management for the entire internet
- Automation at scale -- cert-manager for Kubernetes, Vault PKI for multi-platform, ACME for standardized automation
That payment gateway outage? Entirely preventable. One monitoring alert, one certbot cron job, one cert-manager resource. Pick any one of those, and nobody would have learned about certificate lifecycles the expensive way. Set up monitoring for every certificate you have. Today. And while you are at it, audit the monitoring system's own certificates. Too many teams have had their monitoring dashboard go down alongside the service it was supposed to be monitoring, all because they used the same wildcard certificate.