Certificate Lifecycle
"A certificate is not a set-and-forget artifact. It is born, it lives, and it must be allowed to die gracefully -- or it will die catastrophically." -- Unknown sysadmin, 3 AM during a cert expiry outage
Picture this: the payment gateway returns 502 errors. All transactions are failing. Revenue impact is roughly $12,000 per minute. The load balancer health checks are failing; the backend itself is healthy, but the TLS handshake is dying.
echo | openssl s_client -connect payments.internal:443 -servername payments.internal 2>/dev/null \
| openssl x509 -noout -dates
The output reads notAfter=Mar 12 00:00:00 2026 GMT. The certificate expired today. A certificate that nobody was watching just cost the company six figures. Understanding the entire certificate lifecycle is what prevents this from happening on your watch.
The Certificate Lifecycle: End to End
Every certificate goes through a defined series of stages. Understanding each stage -- and what can go wrong at each -- is the difference between smooth operations and midnight outages. The lifecycle is not linear; it is a loop with an emergency exit.
stateDiagram-v2
[*] --> KeyGeneration: Generate cryptographic key pair
KeyGeneration --> CSRCreation: Create Certificate Signing Request
CSRCreation --> Validation: Submit CSR to CA
Validation --> Issuance: CA validates and signs
Issuance --> Deployment: Install cert + key on server
Deployment --> Monitoring: Monitor expiry, revocation, health
Monitoring --> Renewal: Approaching expiry threshold
Renewal --> Validation: New CSR or ACME renewal
Monitoring --> Revocation: Key compromise or policy change
Revocation --> KeyGeneration: Generate new key pair
note right of KeyGeneration
RSA 2048+ or ECDSA P-256
Protect with strict file permissions
end note
note right of Monitoring
Alert at 30, 14, 7, 3, 1 days
Multiple independent systems
end note
note right of Revocation
CRL, OCSP, or OCSP Stapling
Immediate action required
end note
Stage 1: Key Generation
Everything starts with generating a cryptographic key pair. The private key is the most sensitive artifact in the entire lifecycle. If it is compromised at any point, the certificate must be revoked and the entire lifecycle restarts from scratch.
# Generate an RSA 2048-bit private key
openssl genrsa -out server.key 2048
# Generate a 4096-bit key for higher security
# Trade-off: slower TLS handshakes (RSA 4096 private-key operations,
# i.e. signing, take roughly 6-8x longer than RSA 2048)
openssl genrsa -out server-strong.key 4096
# Generate an ECDSA key with P-256 curve (recommended for most use cases)
openssl ecparam -genkey -name prime256v1 -noout -out server-ec.key
# Generate an ECDSA key with P-384 curve (required by some government standards)
openssl ecparam -genkey -name secp384r1 -noout -out server-ec384.key
# Encrypt the private key at rest with AES-256
# This means the key file requires a passphrase to use
openssl rsa -aes256 -in server.key -out server-encrypted.key
# Verify the key was generated correctly
openssl rsa -in server.key -check -noout
# or for EC keys:
openssl ec -in server-ec.key -check -noout
So how do you choose between RSA and ECDSA? In 2026, ECDSA P-256 is the default recommendation for most use cases. Here is the comparison that matters operationally:
| Property | RSA 2048 | RSA 4096 | ECDSA P-256 |
|---|---|---|---|
| Security level | ~112 bits | ~140 bits | ~128 bits |
| Key size | 256 bytes | 512 bytes | 32 bytes |
| Signature size | 256 bytes | 512 bytes | 64 bytes |
| Sign speed | Moderate | Slow | Very fast |
| Verify speed | Very fast | Fast | Moderate |
| TLS handshake impact | Baseline | ~2x slower | ~1.5x faster |
| Client compatibility | Universal | Universal | All modern clients |
| Post-quantum resistance | None | None | None |
ECDSA keys are smaller, which means smaller certificates, less bandwidth during the TLS handshake, and faster operations on mobile devices. RSA 2048 is still widely deployed and not broken, but there is no reason to choose it for new deployments unless you need to support very old clients (Android 2.x era).
For post-quantum considerations, neither RSA nor ECDSA is resistant. The industry is beginning to experiment with hybrid certificates that include both classical and post-quantum key material, but standardization is ongoing.
**Private key security is non-negotiable.**
- Never generate keys on shared systems where other users have root access
- Never store unencrypted private keys in version control (check your `.gitignore`)
- Never transmit private keys over unencrypted channels -- not even internal chat tools
- Set file permissions immediately: `chmod 600 server.key` (owner read/write only)
- For production CAs, use Hardware Security Modules (HSMs) where the key never leaves tamper-resistant hardware
- Consider generating the key on the target server so it never traverses a network
- Use `shred` or `srm` when deleting old key files -- regular `rm` leaves data recoverable on disk
- Audit who has access to key files regularly. A key is only as secure as the most careless person who can read it.
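The permission advice above can be folded into the generation step itself, so the key file is never world-readable even for an instant. A minimal sketch (filenames are illustrative):

```bash
#!/bin/bash
set -euo pipefail
umask 077                                   # new files default to 0600
openssl genpkey -algorithm EC \
    -pkeyopt ec_paramgen_curve:prime256v1 \
    -out server-ec.key
chmod 600 server-ec.key                     # explicit, in case umask was reset
ls -l server-ec.key                         # should show -rw-------
```

The `umask 077` matters more than the `chmod`: without it, the key briefly exists with default permissions before the `chmod` runs.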
Stage 2: Certificate Signing Request (CSR)
A CSR is a formal request to a CA to sign your public key. It contains your identity information and your public key, signed with your private key to prove possession. The CSR itself is not secret -- it contains only public information. But the process of creating it correctly matters.
Creating a CSR Step by Step
# Method 1: Interactive CSR generation (prompts for each field)
openssl req -new -key server.key -out server.csr
# You'll be prompted for:
# Country Name (2 letter code) []: IN
# State or Province Name []: Karnataka
# Locality Name []: Bangalore
# Organization Name []: Acme Corp
# Organizational Unit Name []: Engineering
# Common Name []: api.acme.com
# Email Address []: (leave blank for server certs)
# Method 2: Non-interactive CSR generation (all in one command)
openssl req -new -key server.key -out server.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/OU=Engineering/CN=api.acme.com"
# Method 3: CSR with Subject Alternative Names
# This is the recommended approach for modern certificates
openssl req -new -key server.key -out server.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/CN=acme.com" \
-addext "subjectAltName=DNS:acme.com,DNS:www.acme.com,DNS:api.acme.com"
# Method 4: Generate key and CSR in a single command
openssl req -new -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
-keyout server.key -out server.csr -nodes \
-subj "/CN=api.acme.com" \
-addext "subjectAltName=DNS:api.acme.com,DNS:acme.com"
Inspecting a CSR Before Submission
Always inspect the CSR before submitting it to a CA. Mistakes here mean you get a certificate with wrong information and have to start over.
# Display the CSR contents in human-readable form
openssl req -in server.csr -noout -text
# Verify the CSR's self-signature (proves the CSR creator has the private key)
openssl req -in server.csr -noout -verify
# verify OK
# Just show the subject
openssl req -in server.csr -noout -subject
# subject=C = IN, ST = Karnataka, L = Bangalore, O = Acme Corp, CN = api.acme.com
Here is what the full CSR output looks like and what to check:
Certificate Request:
Data:
Version: 1 (0x0)
Subject: C=IN, ST=Karnataka, L=Bangalore, O=Acme Corp, CN=api.acme.com
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit) <-- Verify key type and size
pub:
04:a1:b2:c3:... <-- The actual public key bytes
ASN1 OID: prime256v1 <-- Curve name
Attributes:
Requested Extensions:
X509v3 Subject Alternative Name:
DNS:api.acme.com, DNS:acme.com <-- Verify all hostnames are correct
Signature Algorithm: ecdsa-with-SHA256 <-- Self-signature algorithm
30:45:02:21:00:... <-- Proves possession of private key
Why does the CSR need to be signed? The self-signature on the CSR serves as a proof-of-possession. It proves that whoever created the CSR actually holds the private key corresponding to the public key in the request. Without that signature, an attacker could submit a CSR containing someone else's public key and trick a CA into issuing a certificate that ties a victim's identity to the attacker's key. The CSR signature makes this impossible -- only the private key holder can create a valid CSR signature.
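That proof-of-possession can also be checked locally: the public key embedded in the CSR must be byte-for-byte identical to the one derived from your private key. A quick sketch, assuming the server.key and server.csr from the examples above:

```bash
# Extract the public key from both the CSR and the private key, then compare
openssl req -in server.csr -noout -pubkey > csr-pub.pem
openssl pkey -in server.key -pubout > key-pub.pem
if cmp -s csr-pub.pem key-pub.pem; then
    echo "CSR public key matches private key"
else
    echo "MISMATCH: this CSR was not generated from server.key"
fi
```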
Stage 3: CA Signing (Validation and Issuance)
The CA receives your CSR, validates your identity (to the level required by the certificate type), and issues a signed certificate. For DV certificates, validation means proving domain control.
Domain Validation Methods
graph LR
subgraph "HTTP-01 Challenge"
A1[CA sends token] --> B1[Place token at<br/>/.well-known/acme-challenge/TOKEN]
B1 --> C1[CA fetches token via HTTP<br/>from port 80]
C1 --> D1[Token matches = Domain control proven]
end
subgraph "DNS-01 Challenge"
A2[CA sends token] --> B2[Create TXT record<br/>_acme-challenge.domain.com]
B2 --> C2[CA queries DNS<br/>for TXT record]
C2 --> D2[Record matches = Domain control proven]
end
subgraph "TLS-ALPN-01 Challenge"
A3[CA sends token] --> B3[Configure server to present<br/>self-signed cert with token<br/>via ALPN protocol on port 443]
B3 --> C3[CA connects to port 443<br/>using acme-tls/1 ALPN]
C3 --> D3[Token in cert = Domain control proven]
end
Each method has specific trade-offs:
HTTP-01 is the simplest and works for most servers with port 80 open. Limitations: requires port 80 accessible from the internet; cannot be used for wildcard certificates; the CA's validation servers must be able to reach your server directly.
DNS-01 is required for wildcard certificates and works even when the server is not publicly accessible. You just need API access to your DNS provider. Limitations: DNS propagation can take minutes; requires automating DNS record creation; if your DNS provider's API is slow or unreliable, renewals can fail.
TLS-ALPN-01 is useful when port 80 is blocked but 443 is open. Limitations: requires server-side support; less widely supported by ACME clients; cannot be used for wildcard certificates.
Which challenge should you use? For single servers with port 80 open, HTTP-01 is simplest. For wildcard certificates, DNS-01 is your only option. For automation at scale, DNS-01 with an API-driven DNS provider (Cloudflare, Route 53, Google Cloud DNS) is the most flexible because it does not require the server to be reachable from the internet.
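To make HTTP-01 concrete, here is the token-placement step done by hand. The token and key-authorization values below are hypothetical, and DOCROOT stands in for your web root; a real ACME client computes these values for you:

```bash
# Hypothetical HTTP-01 token placement
DOCROOT=${DOCROOT:-/var/www/html}
TOKEN="abc123"
KEYAUTH="abc123.9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"
mkdir -p "$DOCROOT/.well-known/acme-challenge"
printf '%s' "$KEYAUTH" > "$DOCROOT/.well-known/acme-challenge/$TOKEN"
# The CA then fetches:
#   http://<your-domain>/.well-known/acme-challenge/abc123
# over plain HTTP and compares the body against the key authorization
```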
Stage 4: Certificate Deployment
Once you have the signed certificate, you deploy it alongside the private key and the intermediate certificate chain. Getting the chain right is where most teams stumble.
Server Configuration
# Nginx -- CORRECT configuration
server {
listen 443 ssl http2;
server_name api.acme.com;
# fullchain.pem = your cert + intermediate(s), concatenated in order
ssl_certificate /etc/nginx/ssl/fullchain.pem;
# Private key -- must match the public key in the cert
ssl_certificate_key /etc/nginx/ssl/privkey.pem;
# Modern TLS configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
# OCSP stapling (server fetches its own revocation status)
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/nginx/ssl/chain.pem; # intermediate + root
resolver 8.8.8.8 8.8.4.4 valid=300s;
}
Building the Chain File Correctly
# CORRECT: end-entity cert first, then intermediate(s)
cat server.crt intermediate.crt > fullchain.pem
# WRONG: reversed order
cat intermediate.crt server.crt > fullchain.pem # DO NOT DO THIS
# WRONG: including the root
cat server.crt intermediate.crt root.crt > fullchain.pem # DO NOT DO THIS
# The root must already be in the client's trust store
# WRONG: only the server cert, no intermediate
cp server.crt fullchain.pem # DO NOT DO THIS
# Works on some browsers (cached intermediate) but fails on Android/Java/curl
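Before the chain file ever reaches a server, two cheap sanity checks catch most of the mistakes above -- a sketch assuming the fullchain.pem just built:

```bash
# 1. The file should contain at least two certificates (leaf + intermediate)
grep -c 'BEGIN CERTIFICATE' fullchain.pem

# 2. openssl x509 reads only the FIRST cert in the file -- its subject
#    should be YOUR hostname, not the intermediate CA's name
openssl x509 -in fullchain.pem -noout -subject
```

If check 2 prints the intermediate's name, you concatenated the files in the wrong order.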
Verification After Deployment
# Test the deployed certificate from outside
openssl s_client -connect api.acme.com:443 -servername api.acme.com < /dev/null 2>&1 \
| grep -E "Verify return code|depth|Protocol|Cipher"
# Look for: Verify return code: 0 (ok)
# Verify the full chain
openssl s_client -connect api.acme.com:443 -servername api.acme.com \
-showcerts < /dev/null 2>/dev/null | grep -E "s:|i:" | head -6
# Should show: depth 0 (your cert), depth 1 (intermediate)
# Confirm cert and key match on the server
# (modulus comparison works for RSA keys; for EC keys, compare public keys
# instead: openssl x509 -pubkey vs openssl pkey -pubout)
CERT_HASH=$(openssl x509 -noout -modulus -in /etc/nginx/ssl/fullchain.pem | openssl md5)
KEY_HASH=$(openssl rsa -noout -modulus -in /etc/nginx/ssl/privkey.pem | openssl md5)
[ "$CERT_HASH" = "$KEY_HASH" ] && echo "MATCH" || echo "MISMATCH -- FIX THIS"
# Check from a completely clean environment
curl -vI https://api.acme.com 2>&1 | grep -E "SSL|subject|issuer|expire"
After deploying a certificate, always verify from three perspectives:
1. **openssl s_client** -- shows the raw TLS negotiation and chain
2. **curl with -v flag** -- shows what a typical HTTP client sees
3. **SSL Labs** (ssllabs.com/ssltest/) or **testssl.sh** -- comprehensive audit
```bash
# Quick smoke test from the command line
echo | openssl s_client -connect api.acme.com:443 -servername api.acme.com 2>/dev/null \
| openssl x509 -noout -dates -issuer -subject -checkend 0
```
Do not trust that the cert works just because your browser shows a padlock. Your browser may have cached the intermediate from a previous visit. Test from a clean environment with no cached state.
Stage 5: Monitoring and Maintenance
This is where most teams fail. They deploy the certificate and forget about it. Then it expires, and everything breaks at the worst possible time.
Certificate Monitoring Script
#!/bin/bash
# cert-check.sh -- Check certificate expiry for a list of domains
DOMAINS="api.acme.com www.acme.com payments.acme.com auth.acme.com"
WARN_DAYS=30
CRIT_DAYS=7
for domain in $DOMAINS; do
expiry=$(echo | openssl s_client -connect "$domain:443" \
-servername "$domain" 2>/dev/null \
| openssl x509 -noout -enddate 2>/dev/null \
| cut -d= -f2)
if [ -z "$expiry" ]; then
echo "CRITICAL: Could not retrieve cert for $domain"
continue
fi
# macOS date syntax; for Linux use: date -d "$expiry" +%s
expiry_epoch=$(date -j -f "%b %d %T %Y %Z" "$expiry" +%s 2>/dev/null \
|| date -d "$expiry" +%s 2>/dev/null)
now_epoch=$(date +%s)
days_left=$(( (expiry_epoch - now_epoch) / 86400 ))
if [ "$days_left" -lt "$CRIT_DAYS" ]; then
echo "CRITICAL: $domain expires in $days_left days ($expiry)"
elif [ "$days_left" -lt "$WARN_DAYS" ]; then
echo "WARNING: $domain expires in $days_left days ($expiry)"
else
echo "OK: $domain expires in $days_left days"
fi
done
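When the certificate file is on local disk, `openssl x509 -checkend` gives the same answer without a network round-trip: it exits non-zero if the certificate expires within the given number of seconds. The path below is assumed from the nginx configuration earlier:

```bash
# Exit status 0 = valid beyond the window, 1 = expires within it
CERT=/etc/nginx/ssl/fullchain.pem
if openssl x509 -in "$CERT" -noout -checkend $((30 * 86400)); then
    echo "OK: more than 30 days remaining"
else
    echo "WARNING: expires within 30 days -- renew now"
fi
```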
Professional Monitoring Approaches
- Prometheus + blackbox_exporter -- Scrapes TLS endpoints and exposes `probe_ssl_earliest_cert_expiry` as a metric. Set up Alertmanager rules for 30/14/7/3/1 day thresholds.
- Nagios/Icinga `check_ssl_cert` plugin -- Traditional monitoring with alerting thresholds and escalation paths.
- cert-manager (Kubernetes) -- Built-in certificate lifecycle management that monitors, renews, and deploys certificates automatically.
- Datadog/Grafana synthetic monitors -- SaaS-based TLS monitoring with dashboards and PagerDuty integration.
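To make the Prometheus approach concrete, here is a hedged sketch of an alerting rule, assuming blackbox_exporter's `probe_ssl_earliest_cert_expiry` metric is being scraped; the threshold, group name, and labels are illustrative:

```yaml
groups:
  - name: tls-expiry
    rules:
      - alert: CertificateExpiringSoon
        # Days until the earliest certificate in the chain expires
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS cert for {{ $labels.instance }} expires in under 14 days"
```

Add parallel rules at the 7, 3, and 1 day marks with escalating severity so a single missed page does not become an outage.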
**The Equifax Certificate Expiry Disaster (2017)**
In 2017, Equifax suffered one of the largest data breaches in history, exposing personal data of 147 million people. While the root cause was an unpatched Apache Struts vulnerability (CVE-2017-5638), the breach went **undetected for 76 days** partly because an expired SSL certificate on their network inspection appliance blinded their security monitoring.
The full timeline:
- **March 7, 2017**: Apache Struts vulnerability CVE-2017-5638 is publicly disclosed with a patch available.
- **March 9, 2017**: Equifax's security team sends an email directing that the patch be applied. It is not applied to the affected system.
- **March 15, 2017**: Equifax runs vulnerability scans but the affected system is not properly scanned.
- **May 13, 2017**: Attackers exploit the vulnerability and begin exfiltrating data through encrypted channels.
- **January 31, 2016 to July 29, 2017**: An SSL inspection certificate had been expired for **roughly 18 months**. The network monitoring appliance (a McAfee device that performed SSL/TLS interception to inspect encrypted traffic) could not decrypt outbound traffic during this entire period. The data exfiltration was happening in encrypted sessions that the monitoring tool could not see.
- **July 29, 2017**: Equifax finally updates the expired certificate. The monitoring tool immediately detects suspicious traffic.
- **July 30, 2017**: Breach is formally discovered and incident response begins.
An expired certificate -- something that costs nothing to renew -- contributed to 76 days of undetected data theft affecting nearly half the US adult population.
**The lesson:** Certificate monitoring is not just about preventing outages. It is about maintaining your security visibility. Every expired certificate is a potential blind spot in your defense. Equifax had the monitoring tool in place; it was the expired certificate that made it useless.
Stage 6: Renewal
Certificate renewal should be a non-event -- automated, tested, and completed well before expiry.
Manual Renewal Process
# 1. Generate a new key (recommended) or reuse the existing one
openssl ecparam -genkey -name prime256v1 -noout -out server-new.key
# 2. Create a new CSR
openssl req -new -key server-new.key -out server-renewal.csr \
-subj "/C=IN/ST=Karnataka/L=Bangalore/O=Acme Corp/CN=api.acme.com" \
-addext "subjectAltName=DNS:api.acme.com,DNS:acme.com"
# 3. Submit CSR to CA (process varies by CA)
# 4. Receive new certificate
# 5. Build the new fullchain
cat server-new.crt intermediate.crt > fullchain-new.pem
# 6. Verify the new chain locally (and test in staging) before deploying
openssl verify -CAfile root.crt -untrusted intermediate.crt server-new.crt
# 7. Deploy to production and reload the web server
cp fullchain-new.pem /etc/nginx/ssl/fullchain.pem
cp server-new.key /etc/nginx/ssl/privkey.pem
nginx -t && systemctl reload nginx
# 8. Verify from outside
openssl s_client -connect api.acme.com:443 -servername api.acme.com < /dev/null
Automated Renewal with Certbot
# Install certbot
sudo apt install certbot python3-certbot-nginx
# Obtain initial certificate (nginx plugin handles everything)
sudo certbot --nginx -d api.acme.com -d www.acme.com
# Certbot sets up automatic renewal via systemd timer
systemctl list-timers | grep certbot
# certbot.timer - twice daily check
# Test the renewal process without actually renewing
sudo certbot renew --dry-run
# The renewal flow:
# 1. certbot checks each certificate's expiry
# 2. If within 30 days of expiry, begins renewal
# 3. Generates new CSR, completes ACME challenge
# 4. Downloads new cert, installs to /etc/letsencrypt/live/
# 5. Runs deploy hooks (e.g., reload nginx)
Create a deploy hook to reload your web server after renewal:
# Create deploy hook
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh << 'EOF'
#!/bin/bash
nginx -t && systemctl reload nginx
echo "$(date): Certificate renewed and nginx reloaded" >> /var/log/certbot-deploy.log
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
Let's Encrypt and the ACME Protocol
How does Let's Encrypt actually work under the hood? It is not magic. It is a well-designed protocol called ACME -- Automatic Certificate Management Environment, standardized as RFC 8555.
The ACME Protocol Flow
sequenceDiagram
participant Client as ACME Client<br/>(certbot)
participant Server as ACME Server<br/>(Let's Encrypt)
participant Challenge as Validation<br/>Target
Note over Client,Server: Phase 1: Account Setup
Client->>Server: POST /acme/new-account<br/>{contact: "admin@acme.com",<br/>termsOfServiceAgreed: true}
Server->>Client: 201 Created<br/>{account URL, status: "valid"}
Note over Client,Server: Phase 2: Order Creation
Client->>Server: POST /acme/new-order<br/>{identifiers: [{type: "dns",<br/>value: "api.acme.com"}]}
Server->>Client: 201 Created<br/>{authorizations: [authz-url],<br/>finalize: finalize-url}
Note over Client,Server: Phase 3: Authorization Challenge
Client->>Server: GET authz-url
Server->>Client: {challenges: [{type: "http-01",<br/>token: "abc123",<br/>url: challenge-url}]}
Client->>Challenge: Place token file at<br/>/.well-known/acme-challenge/abc123<br/>Content: abc123.thumbprint
Client->>Server: POST challenge-url<br/>{} (empty, signals readiness)
Server->>Challenge: GET http://api.acme.com/<br/>.well-known/acme-challenge/abc123
Challenge->>Server: abc123.thumbprint
Server->>Server: Validate token matches
Note over Client,Server: Phase 4: Certificate Issuance
Client->>Server: POST finalize-url<br/>{csr: "base64url-encoded-CSR"}
Server->>Server: Verify CSR, sign certificate,<br/>submit to CT logs, embed SCTs
Client->>Server: GET certificate-url
Server->>Client: Certificate chain in PEM format<br/>(end-entity + intermediate)
Let's Encrypt Architecture
Let's Encrypt's infrastructure is a marvel of engineering, designed to issue millions of certificates per day with high availability:
graph TD
subgraph "Let's Encrypt Infrastructure"
Boulder["Boulder<br/>(ACME Server)<br/>Open-source Go application"]
HSMs["HSM Cluster<br/>(Hardware Security Modules)<br/>Stores CA signing keys"]
DB["MariaDB Cluster<br/>(Certificate database)<br/>Multi-datacenter replication"]
VA["Validation Authority<br/>(Multi-perspective)<br/>Validates from multiple<br/>network vantage points"]
CT["CT Log Submission<br/>(Submits to Google, Cloudflare,<br/>and other CT logs)"]
end
Client["ACME Clients<br/>(certbot, acme.sh,<br/>Caddy, cert-manager)"]
Client -->|HTTPS| Boulder
Boulder --> HSMs
Boulder --> DB
Boulder --> VA
Boulder --> CT
VA -->|HTTP/DNS| Internet["Internet<br/>(validates domain control<br/>from multiple locations)"]
Multi-perspective validation is a critical security feature. When Let's Encrypt validates a domain, it does so from multiple network vantage points simultaneously. An attacker performing a BGP hijack to redirect traffic to their server would need to hijack routes from all vantage points simultaneously, which is significantly harder than hijacking a single path. This was added after research showed that single-perspective validation was vulnerable to network-level attacks.
Why Let's Encrypt Changed Everything
Before Let's Encrypt launched in December 2015:
- Certificates cost $50-$300/year from commercial CAs
- Issuance required manual email exchanges, sometimes taking days
- Renewal was manual and easy to forget
- Many sites ran without HTTPS because of cost and complexity
- HTTPS adoption was around 40% of web traffic
After Let's Encrypt:
- Certificates are free, forever
- Issuance is fully automated, taking seconds
- Renewal is automated with no human intervention needed
- As of 2026, Let's Encrypt has issued over 5 billion certificates
- HTTPS adoption exceeds 95% of page loads in major browsers
Why 90 days? That seems short -- and it is intentionally short for two reasons. First, if a key is compromised, the window of exposure is limited to at most 90 days (minus the time until the next renewal). Second, short lifetimes force automation. You cannot manually manage a 90-day certificate across hundreds of servers -- you have to automate. And automation is fundamentally more reliable than humans remembering to renew certificates. The industry is moving toward even shorter lifetimes: Apple has proposed 45-day maximum certificate validity by 2027, and some organizations already issue certificates valid for only 24 hours.
**ACME Beyond Let's Encrypt**
ACME is an open standard (RFC 8555), not proprietary to Let's Encrypt. Other CAs that support ACME:
- **ZeroSSL** -- Free DV certificates via ACME, operated by Stack Holdings
- **Buypass** -- Norwegian CA with free ACME-based certificates
- **Google Trust Services** -- Google's CA with ACME support
- **Sectigo** -- Commercial CA with ACME API for paid certificates
- **SSL.com** -- Commercial CA with ACME support
You can also run your own internal ACME server:
- **step-ca** (Smallstep) -- Open-source, easy to deploy, supports short-lived certificates (minutes to hours)
- **Caddy** -- Web server with built-in ACME client that automatically obtains and renews certificates
- **Boulder** -- Let's Encrypt's actual ACME server software (complex to self-host but fully featured)
Internal ACME servers let you bring the same automation that Let's Encrypt provides for public certificates to your internal services. This is the recommended approach for organizations with significant internal TLS infrastructure.
Stage 7: Revocation
Sometimes a certificate needs to die before its natural expiration. Key compromise, employee departure, domain ownership change, CA misissuance -- all require immediate revocation. The challenge is that revocation is one of the hardest problems in PKI, and none of the existing mechanisms work perfectly.
CRL: Certificate Revocation Lists
The oldest revocation mechanism. The CA periodically publishes a signed list of revoked certificate serial numbers at a URL specified in the certificate's CRL Distribution Points extension.
# Download and inspect a CRL
curl -o crl.der http://crl.example.com/ca.crl
openssl crl -in crl.der -inform DER -noout -text | head -30
# Check how many certificates are in the CRL
openssl crl -in crl.der -inform DER -noout -text | grep -c "Serial Number"
Problems with CRLs:
- CRLs can grow to megabytes for busy CAs (Let's Encrypt's CRL would be enormous given their volume)
- Clients must download the entire list to check one certificate -- there is no way to query for a single serial number
- CRLs are cached with a "Next Update" timestamp; there is a window between revocation and the next CRL publication where revoked certificates are still accepted
- Downloading a CRL on every TLS connection is too slow for interactive web browsing
- Most browsers stopped checking CRLs years ago due to performance impact and privacy concerns (the download reveals which sites the user visits)
OCSP: Online Certificate Status Protocol
OCSP is a real-time, per-certificate revocation check. The client sends the certificate's serial number to the CA's OCSP responder and gets back a signed "good," "revoked," or "unknown" status.
sequenceDiagram
participant Browser as Browser
participant OCSP as CA's OCSP Responder
Browser->>Browser: Received certificate with<br/>serial 0A:3C:7F...
Browser->>OCSP: Is serial 0A:3C:7F still valid?<br/>(HTTP GET or POST)
OCSP->>OCSP: Look up revocation database
OCSP->>Browser: Signed OCSP Response:<br/>Status: GOOD<br/>This Update: 2026-03-12<br/>Next Update: 2026-03-19<br/>Signed by: CA's OCSP key
Browser->>Browser: Verify OCSP response signature<br/>Check status and freshness<br/>Proceed with connection
# Get the OCSP responder URL from a certificate
openssl x509 -in server.crt -noout -ocsp_uri
# http://ocsp.digicert.com
# Query the OCSP responder
openssl ocsp -issuer intermediate.crt -cert server.crt \
-url http://ocsp.digicert.com -resp_text -noverify
Problems with OCSP:
- Privacy: The CA's OCSP responder sees every site the user visits, because the browser queries the CA for every new TLS connection. This is a massive privacy leak -- the CA can build browsing profiles for every user.
- Latency: An extra HTTP round-trip (sometimes 100-300ms) for every new TLS connection, directly impacting page load times.
- Availability: If the OCSP responder is down, the browser must decide: fail-open (accept the certificate, defeating the purpose of revocation checking) or fail-closed (reject the certificate, breaking the website). Most browsers fail-open because the alternative breaks too many websites. This means OCSP provides no security benefit when the responder is unavailable -- which is exactly when an attacker might suppress it.
OCSP Stapling: The Best of Both Worlds
OCSP stapling elegantly solves the privacy and latency problems by having the server fetch its own OCSP response and include ("staple") it in the TLS handshake.
sequenceDiagram
participant Server as Web Server
participant OCSP as CA's OCSP Responder
participant Browser as Browser
Note over Server,OCSP: Background: Server periodically<br/>fetches its own OCSP response
Server->>OCSP: Am I still valid?<br/>(every few hours)
OCSP->>Server: Signed OCSP Response:<br/>Status: GOOD<br/>Valid for 7 days
Note over Server: Server caches the<br/>signed OCSP response
Note over Server,Browser: During TLS Handshake
Browser->>Server: ClientHello<br/>(with status_request extension)
Server->>Browser: ServerHello + Certificate +<br/>Stapled OCSP Response
Browser->>Browser: Verify OCSP response:<br/>1. Signed by CA's OCSP key ✓<br/>2. Status is GOOD ✓<br/>3. Response is fresh ✓<br/>No need to contact CA directly!
Enabling OCSP Stapling:
# Nginx
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
ssl_trusted_certificate /etc/nginx/ssl/chain.pem; # intermediate + root
# Test if a server supports OCSP stapling
openssl s_client -connect www.example.com:443 -servername www.example.com \
-status < /dev/null 2>/dev/null | grep -A5 "OCSP Response"
# If stapling is working, you'll see "OCSP Response Status: successful"
# If not, you'll see "OCSP response: no response sent"
OCSP stapling limitations: stapling is only as strong as the client's insistence on it. An attacker who has stolen the private key can present the certificate and simply not staple the (revoked) OCSP response -- most clients will proceed anyway. The OCSP Must-Staple extension (OID 1.3.6.1.5.5.7.1.24) was designed to fix this: a certificate with this extension requires the server to staple, and clients must reject connections without a stapled response. However, OCSP Must-Staple has not been widely adopted because OCSP responder outages would cause hard failures.
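You can check whether a certificate carries the Must-Staple marker: it appears in openssl's text dump as a TLS Feature extension with the value `status_request` (certbot can request it at issuance with `--must-staple`, if your CA supports it):

```bash
# grep exits non-zero if the extension is absent
openssl x509 -in server.crt -noout -text | grep -B1 "status_request"
```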
So we have three revocation mechanisms and none of them fully works. This is one of the genuinely unsolved problems in PKI. The industry's current pragmatic approach combines multiple strategies:
- Short certificate lifetimes -- If a certificate expires in 90 days (or less), the window where revocation matters is small
- CRLite (Mozilla) -- Compresses the entire revocation status of every certificate on the internet into a ~10MB Bloom filter cascade that Firefox downloads daily, checking revocation locally without privacy leakage
- CRLSets (Chrome) -- Google maintains a curated list of high-priority revocations pushed to Chrome, covering major incidents but not all revoked certificates
- OCSP stapling where supported -- Eliminates privacy and latency concerns
- Very short-lived certificates -- Some organizations issue certificates valid for hours, eliminating the need for revocation entirely
Revoking a Certificate in Practice
# Revoke via Let's Encrypt / certbot
sudo certbot revoke --cert-path /etc/letsencrypt/live/api.acme.com/cert.pem \
--reason keycompromise
# Revoke using the account key (alternative proof of ownership)
sudo certbot revoke --cert-path cert.pem --key-path privkey.pem
# Revocation reasons (RFC 5280 Section 5.3.1):
# 0 - unspecified
# 1 - keyCompromise (private key stolen or leaked)
# 2 - cACompromise (CA's key was compromised)
# 3 - affiliationChanged (subject's name or affiliation changed)
# 4 - superseded (replaced by a new certificate)
# 5 - cessationOfOperation (subject no longer operates the domain)
# 9 - privilegeWithdrawn (authorization has been revoked)
Certificate Automation at Scale
When you have hundreds or thousands of certificates across dozens of services, manual management is impossible. You need automation that handles the full lifecycle.
Kubernetes cert-manager
# cert-manager ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@acme.com
    privateKeySecretRef:
      name: letsencrypt-prod-account
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Certificate resource -- cert-manager handles everything else
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret  # K8s secret where cert+key are stored
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.acme.com
    - www.acme.com
  renewBefore: 720h  # Renew 30 days before expiry
cert-manager handles the entire lifecycle: creates the private key, generates the CSR, completes the ACME challenge, downloads the certificate, stores it in a Kubernetes Secret, monitors expiry, and renews automatically. It is the standard solution for TLS in Kubernetes.
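Trust but verify: a few kubectl checks confirm that cert-manager actually issued and stored the certificate. The names and namespace match the manifests above:

```shell
# Check the Certificate resource's Ready condition
kubectl get certificate api-tls -n production

# Inspect events if issuance is stuck (failed challenges, rate limits)
kubectl describe certificate api-tls -n production

# Decode the issued cert from the Secret and confirm its expiry
kubectl get secret api-tls-secret -n production \
    -o jsonpath='{.data.tls\.crt}' | base64 -d \
    | openssl x509 -noout -enddate
```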
HashiCorp Vault PKI
# Enable the PKI secrets engine
vault secrets enable pki
# Set max TTL
vault secrets tune -max-lease-ttl=87600h pki
# Generate the internal root CA
vault write pki/root/generate/internal \
    common_name="Acme Internal Root CA" \
    ttl=87600h
# Create a role for issuing certificates
vault write pki/roles/acme-services \
    allowed_domains="acme.internal" \
    allow_subdomains=true \
    max_ttl=2160h \
    key_type=ec \
    key_bits=256
# Issue a certificate (returns cert, key, and chain)
vault write pki/issue/acme-services \
    common_name="api.acme.internal" \
    alt_names="grpc.acme.internal" \
    ttl=720h
Vault's PKI engine is particularly powerful for short-lived certificates. Some teams configure TTLs of 24 hours or even 1 hour, completely eliminating the need for revocation infrastructure. The trade-off is that every service must be able to request new certificates frequently, which requires tight integration with your deployment platform.
Build complete certificate lifecycle automation on your local machine:
1. Set up a local CA using `step-ca`:
```bash
# Install step CLI and step-ca
brew install step # macOS
# or: wget https://dl.step.sm/gh-release/cli/latest/step-cli.tar.gz
# Initialize a new CA
step ca init --name "Dev CA" --dns localhost --address :8443 \
    --provisioner admin
# Start the CA server
step-ca $(step path)/config/ca.json
```
2. Use ACME to get a certificate:
```bash
# Add an ACME provisioner first -- step ca init creates only the JWK
# "admin" provisioner (may require restarting step-ca)
step ca provisioner add acme --type ACME
step ca certificate api.dev.local api.crt api.key \
    --provisioner acme --san api.dev.local
```
3. Write the monitoring script from earlier in this chapter to check the cert's expiry
4. Set up a cron job to renew before expiry
This gives you hands-on experience with every lifecycle stage in a safe, local environment.
Common Certificate Lifecycle Failures
Failure 1: The 3 AM Expiry
A startup had a single wildcard certificate that covered their entire infrastructure: API, customer dashboard, webhook endpoints, partner integrations, and internal monitoring. The certificate expired on a Saturday night. Their on-call engineer was at a wedding with their phone on silent.
By the time someone noticed:
- Webhook deliveries to 200+ partners had been failing for 6 hours
- Partner systems had queued up millions of retry events
- Their own monitoring dashboard was behind the same certificate, so Grafana was inaccessible and PagerDuty alerts were delayed
- Recovery took 14 hours because nobody could find the CA login credentials -- they were in the personal email of an engineer who had left the company
- Several enterprise partners triggered breach notification procedures because they assumed the failed webhooks indicated a security incident
**Total cost**: an estimated $400,000 in lost revenue, SLA penalty payments to partners, and engineering time for recovery.
**Prevention (any one of these would have avoided the outage):**
- Monitor certificates with at least two independent systems (external + internal)
- Alert at 30, 14, 7, 3, and 1 day before expiry, with escalation
- Store CA credentials in a shared secrets manager (Vault, 1Password Teams), not one person's email
- Never put your monitoring system behind the same certificates you are monitoring
- Automate renewal with certbot or cert-manager so certificates renew without human intervention
Failure 2: The Silent Renewal Failure
Auto-renewal can fail silently for many reasons:
- DNS provider API credentials were rotated but the certbot configuration was not updated
- A firewall rule change blocked port 80, so HTTP-01 challenges could no longer reach the server
- The server was replaced during a migration and the certbot cron job was not carried into the new image
- Let's Encrypt rate limits were hit (50 certificates per registered domain per week)
- DNS propagation delay caused the DNS-01 challenge to fail
- A certbot package upgrade changed the renewal configuration format
Always verify that renewals are actually succeeding, not just that the timer is running:
# Check certbot renewal logs
tail -n 50 /var/log/letsencrypt/letsencrypt.log
# Verify the actual certificate on disk matches what the server is serving
openssl x509 -in /etc/letsencrypt/live/api.acme.com/cert.pem -noout -enddate
# Compare with:
echo | openssl s_client -connect api.acme.com:443 -servername api.acme.com 2>/dev/null \
| openssl x509 -noout -enddate
# If these dates differ, the server hasn't loaded the new cert (needs reload)
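Two certbot features help close this gap: `--dry-run` exercises the full renewal path against the staging environment, and a deploy hook makes sure the server actually loads what was renewed. The reload command is illustrative; match it to your web server:

```shell
# Exercise the full renewal path without replacing real certificates
sudo certbot renew --dry-run

# Run a reload after every successful renewal so the new cert is served
sudo certbot renew --deploy-hook "systemctl reload nginx"
```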
Failure 3: The Let's Encrypt Root Transition
In September 2021, the IdenTrust root that Let's Encrypt originally chained to (DST Root CA X3) expired. Let's Encrypt had long since transitioned to its own root (ISRG Root X1), but older devices that did not have ISRG Root X1 in their trust stores suddenly could not verify Let's Encrypt certificates. This affected millions of devices, particularly:
- Android devices running versions older than 7.1.1 (released in 2016)
- Older OpenSSL versions (1.0.2 and earlier) that did not handle the cross-sign chain correctly
- Embedded devices and IoT hardware with frozen trust stores
The mitigation was a creative cross-signing arrangement, but the incident demonstrated that root certificate transitions affect the real world in surprising ways, and that the long tail of legacy devices creates lasting compatibility challenges.
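Chain issues like this are visible from the client side. Printing the subject and issuer of every certificate a server sends shows exactly which chain, cross-signed or not, is being served (the hostname is illustrative):

```shell
# Print subject (s:) and issuer (i:) of each cert in the served chain;
# a cross-signed chain shows the same intermediate subject with a
# different issuer than the default chain.
echo | openssl s_client -connect api.acme.com:443 \
    -servername api.acme.com -showcerts 2>/dev/null \
    | grep -E " (s|i):"
```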
What You've Learned
This chapter walked through every stage of a certificate's life, from birth to death:
- Key generation -- ECDSA P-256 for modern deployments, with strict file permissions and HSMs for production CAs; the private key is the most sensitive artifact in the entire lifecycle
- CSR creation -- The formal request that ties your identity to your public key, self-signed to prove possession; always inspect the CSR before submitting
- CA signing -- Domain validation methods (HTTP-01, DNS-01, TLS-ALPN-01) each have distinct trade-offs for automation, wildcard support, and network accessibility
- Deployment -- Always send the full chain (end-entity + intermediates); verify from outside with openssl, curl, and SSL Labs; never trust browser caching
- Monitoring -- Automate expiry alerts at multiple thresholds using multiple independent systems; an expired certificate can blind your security monitoring (Equifax) or halt your revenue
- Renewal -- Automate with certbot, cert-manager, or Vault; manual renewal is a ticking time bomb that will eventually explode
- Revocation -- CRL, OCSP, and OCSP stapling each have fundamental trade-offs; the industry is converging on short-lived certificates as the pragmatic solution
- ACME protocol -- The open standard (RFC 8555) that Let's Encrypt popularized, enabling free, automated certificate management for the entire internet
- Automation at scale -- cert-manager for Kubernetes, Vault PKI for multi-platform, ACME for standardized automation
That payment gateway outage? Entirely preventable. One monitoring alert, one certbot cron job, one cert-manager resource. Pick any one of those, and nobody would have learned about certificate lifecycles the expensive way. Set up monitoring for every certificate you have. Today. And while you are at it, audit the monitoring system's own certificates. Too many teams have had their monitoring dashboard go down alongside the service it was supposed to be monitoring, all because they used the same wildcard certificate.