Chapter 25: Secret Management

"Three may keep a secret, if two of them are dead." — Benjamin Franklin

What do the most expensive four lines of code look like? An AWS access key, hardcoded, committed, and pushed to a public repository. GitHub's search index caches it within ninety seconds. Bots scraping every public commit in near real-time find it even faster. In one real incident, an attacker spun up sixty-four GPU instances for cryptocurrency mining before the key could be revoked. The bill: $47,000 in four hours.

Secret management is not an afterthought. It is infrastructure.


The Problem with Secrets

Every application has secrets: database passwords, API keys, TLS certificates, encryption keys, OAuth tokens, SSH keys. The question is never whether you have secrets — it's where they live and who can access them.

Let's map the evolution of how developers typically handle secrets, from worst to best:

graph TD
    L0["Level 0: Hardcoded in source code"] --> L1["Level 1: Configuration files<br/>(not committed)"]
    L1 --> L2["Level 2: Environment variables"]
    L2 --> L3["Level 3: Encrypted config files<br/>(SOPS, Sealed Secrets)"]
    L3 --> L4["Level 4: Centralized secret store<br/>(Vault, AWS Secrets Manager)"]
    L4 --> L5["Level 5: Dynamic secrets with<br/>automatic rotation"]
    L5 --> L6["Level 6: Zero-trust identity-based<br/>access (no static secrets)"]

    style L0 fill:#ff4444,color:#fff
    style L1 fill:#ff6644,color:#fff
    style L2 fill:#ff8844,color:#fff
    style L3 fill:#ccaa00,color:#000
    style L4 fill:#44aa44,color:#fff
    style L5 fill:#2288cc,color:#fff
    style L6 fill:#6644aa,color:#fff

Most teams hover between Level 1 and Level 2. Environment variables seem safe enough — but they have real, exploitable problems.


Why Environment Variables Fail

Environment variables feel safe because they're "not in the code." But consider these concrete attack surfaces:

1. Process Inspection Reveals Them

Any process running as the same user can read another process's environment:

# On Linux, /proc exposes every process's environment
$ cat /proc/$(pgrep -f "myapp")/environ | tr '\0' '\n'
DATABASE_URL=postgres://admin:SuperSecret123@db.prod.internal:5432/app
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
STRIPE_SECRET_KEY=sk_live_4eC39HqLyjWDarjtT1zdp7dc
JWT_SIGNING_KEY=my-256-bit-secret

# With the BSD-style `e` option, ps can reveal them too
$ ps auxe | grep myapp
root  1234  0.0  0.1  myapp  DATABASE_URL=postgres://admin:SuperSecret123@...

If an attacker gets code execution on your server — even through a dependency vulnerability like a prototype-pollution flaw in a Node.js library — they can dump every environment variable in seconds. And most container orchestrators inject secrets as environment variables by default.

2. Docker Inspect Exposes Everything

# Docker stores env vars in the container config - readable by anyone
# with access to the Docker socket
$ docker inspect my-container | jq '.[0].Config.Env'
[
  "DATABASE_URL=postgres://admin:SuperSecret123@db.prod.internal:5432/app",
  "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
  "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "STRIPE_SECRET_KEY=sk_live_4eC39HqLyjWDarjtT1zdp7dc"
]

# docker-compose.yml with env_file is equally exposed
$ docker compose config
services:
  web:
    environment:
      DATABASE_URL: "postgres://admin:SuperSecret123@..."

3. They Leak into Logs and Error Reporters

Application crash dumps, debug logs, error reporters like Sentry, and core dumps will cheerfully display environment variables. One misconfigured logging middleware and your database password is sitting in Elasticsearch:

# A common Python pattern that leaks secrets to Sentry
import sentry_sdk
sentry_sdk.init(dsn="...")

# When an exception occurs, Sentry captures the environment
# including all env vars. Unless you explicitly filter:
sentry_sdk.init(
    dsn="...",
    before_send=lambda event, hint: strip_sensitive_data(event),
)

# Node.js crash dumps include process.env
$ node --abort-on-uncaught-exception app.js
# The resulting core dump contains all env vars in memory

# Kubernetes pod describe shows env vars too
$ kubectl describe pod myapp-pod-abc123
    Environment:
      DATABASE_URL:  postgres://admin:SuperSecret123@db.prod.internal:5432/app

4. Child Process Inheritance

Every subprocess spawned by your application inherits the full environment. That shell command you exec'd? It now has your Stripe API key.

import subprocess
# This subprocess inherits ALL parent env vars including secrets
result = subprocess.run(["curl", "https://api.example.com"], capture_output=True)

# Even a simple log rotation script gets your AWS credentials
subprocess.run(["logrotate", "/etc/logrotate.conf"])
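The mitigation is straightforward: pass an explicit, minimal environment to each subprocess instead of inheriting everything. A small sketch (the secret name is illustrative):

```python
import os
import subprocess
import sys

# Simulate a secret sitting in the parent's environment.
os.environ["STRIPE_SECRET_KEY"] = "sk_live_example"

# Spawn the child with an explicit allowlist instead of the full
# inherited environment — the secret never reaches it.
result = subprocess.run(
    [sys.executable, "-c", "import os; print('STRIPE_SECRET_KEY' in os.environ)"],
    env={"PATH": os.environ.get("PATH", "")},
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # False
```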

5. No Rotation Without Restart

Environment variables are set at process start. To rotate a secret, you must restart the process. In a zero-downtime deployment, this means coordinating rolling restarts across multiple instances — during which some instances have the old secret and some have the new one.

6. No Audit Trail

There's no log of who accessed which environment variable and when. If an attacker reads DATABASE_URL, you'll never know it happened. Compare this with Vault, which logs every secret access with the caller's identity, IP, and timestamp.

Environment variables are a *delivery mechanism*, not a *secret management system*. They answer the question "how does my app receive the secret?" but not "who manages the secret's lifecycle, rotation, access control, and audit trail?"

The Secret Zero Problem

Before diving into tools, you need to confront the fundamental paradox of secret management.

You use Vault to store your secrets. But you need a token to authenticate to Vault. Where do you store that token? This is what the industry calls the "Secret Zero" problem. It is turtles all the way down.

graph TD
    A["App needs DB_PASSWORD"] --> B["Stored in Vault"]
    B --> C["App needs VAULT_TOKEN<br/>to access Vault"]
    C --> D{"Where is VAULT_TOKEN stored?"}
    D --> E["In an env var?<br/>(same problem)"]
    D --> F["In a config file?<br/>(same problem)"]
    D --> G["Somewhere else?<br/>(still needs a secret)"]

    H["You always need at least<br/>ONE bootstrap secret"]

    E --> H
    F --> H
    G --> H

    style D fill:#ff8800,color:#fff
    style H fill:#cc0000,color:#fff

The industry has converged on several approaches that minimize (but do not eliminate) the Secret Zero problem:

Platform Identity (the best current answer): Cloud providers offer instance identity. An EC2 instance has an IAM role. A Kubernetes pod has a service account. A GCE instance has a service account attached at creation. These identities are asserted by the platform itself, not by a static credential. The platform signs a cryptographic proof of identity that Vault can verify.

# AWS: EC2 instance metadata provides temporary credentials
# automatically — no static key needed. (With IMDSv2, you first
# fetch a session token via PUT /latest/api/token.)
$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/my-role
{
  "AccessKeyId": "ASIAIOSFODNN7EXAMPLE",
  "SecretAccessKey": "temporary-secret-key",
  "Token": "session-token...",
  "Expiration": "2026-03-12T18:00:00Z"
}
# These rotate automatically every ~6 hours

# Kubernetes: Service account token is injected by the kubelet
$ cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
# This JWT is signed by the API server and can be verified by Vault

Response Wrapping: Vault can wrap a secret in a single-use token that can only be unwrapped once. If an attacker intercepts and unwraps it, the legitimate app's own unwrap fails — so the theft is detected immediately.

AppRole with SecretID: Vault's AppRole auth method splits authentication into a RoleID (known, like a username) and a SecretID (short-lived, delivered through a trusted channel). The SecretID can be single-use and tightly scoped.

sequenceDiagram
    participant CI as CI/CD Pipeline
    participant Vault as HashiCorp Vault
    participant App as Application

    CI->>Vault: Request wrapped SecretID<br/>(using CI credentials)
    Vault-->>CI: Wrapped SecretID<br/>(single-use token)
    CI->>App: Deliver wrapped token<br/>(via trusted channel)
    App->>Vault: Unwrap to get SecretID
    Vault-->>App: SecretID (single-use)
    App->>Vault: Authenticate with<br/>RoleID + SecretID
    Vault-->>App: Vault Token<br/>(scoped, time-limited)
    App->>Vault: Read secrets using token
    Vault-->>App: Database credentials,<br/>API keys, etc.

    Note over CI,App: If attacker intercepts the wrapped token,<br/>the legitimate app's unwrap will fail,<br/>triggering an alert.

The goal is not to eliminate Secret Zero. It is to make it as small, short-lived, and tightly scoped as possible. An IAM role attached to an EC2 instance is better than a static token in an environment variable, because the role is platform-asserted and the temporary credentials rotate automatically.


HashiCorp Vault: Architecture Deep Dive

Vault has become the de facto standard for secret management in production environments. Let's understand its architecture in detail.

graph TD
    subgraph Clients
        C1["App / Microservice"]
        C2["CI/CD Pipeline"]
        C3["Developer CLI"]
    end

    subgraph VaultCluster["Vault Cluster"]
        API["Vault API<br/>(HTTPS :8200)"]

        subgraph Core["Vault Core"]
            Auth["Auth Methods<br/>Token, LDAP, OIDC,<br/>AWS IAM, K8s SA,<br/>TLS Certs, GitHub"]
            SE["Secret Engines<br/>KV, Database, PKI,<br/>Transit, SSH, AWS,<br/>TOTP, Consul"]
            PE["Policy Engine<br/>HCL-based ACLs,<br/>deny-by-default"]
            Audit["Audit Devices<br/>File, Syslog,<br/>Socket"]
        end

        Barrier["Encryption Barrier<br/>AES-256-GCM"]
    end

    Storage["Storage Backend<br/>Integrated Raft, Consul,<br/>S3, GCS, DynamoDB"]

    C1 --> API
    C2 --> API
    C3 --> API
    API --> Auth
    API --> SE
    API --> PE
    API --> Audit
    Core --> Barrier
    Barrier --> Storage

    style Barrier fill:#cc4400,color:#fff
    style Auth fill:#2266aa,color:#fff
    style SE fill:#228844,color:#fff
    style PE fill:#886622,color:#fff

The Seal/Unseal Mechanism

Vault starts in a sealed state. When sealed, it has encrypted data but cannot decrypt it. The master key needed to decrypt is itself split using Shamir's Secret Sharing — a cryptographic algorithm that splits a key into N shares where any K shares (the threshold) can reconstruct the original key.
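Shamir's scheme itself is compact enough to sketch. This toy version over a prime field shows the split/recover mechanics (illustrative only — real implementations use a vetted library and handle byte encoding carefully):

```python
import random

# Toy Shamir split/recover over a prime field — mechanics only.
P = 2**127 - 1  # a Mersenne prime; the secret must be < P

def split(secret: int, n: int, k: int):
    # Random polynomial of degree k-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]  # n shares

def recover(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * -xm % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

secret = 0xDEADBEEF
shares = split(secret, n=5, k=3)
print(recover(shares[:3]) == secret)   # True: any 3 of the 5 shares suffice
print(recover(shares[2:]) == secret)   # True: a different 3 also work
```

Fewer than k shares reveal nothing useful: with only two points of a degree-2 polynomial, every possible constant term remains equally likely.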

stateDiagram-v2
    [*] --> Sealed: Vault starts

    Sealed --> Unseal1: Key share 1 provided
    Unseal1 --> Unseal2: Key share 2 provided
    Unseal2 --> Unsealed: Key share 3 provided<br/>(threshold met: 3 of 5)

    Unsealed --> Sealed: Manual seal command
    Unsealed --> Sealed: Vault restart
    Unsealed --> Sealed: HA failover (some configs)

    state Sealed {
        [*] --> EncryptedStorage: Data exists but<br/>cannot be decrypted
    }

    state Unsealed {
        [*] --> Operational: Master key in memory,<br/>all operations available
    }

    note right of Sealed
        No secret operations possible.
        Only status and unseal endpoints work.
        API returns 503 for all other requests.
    end note

    note right of Unsealed
        Master key reconstructed in memory.
        Encryption barrier is open.
        All auth, secret, and audit operations work.
    end note

# Initialize Vault with Shamir's Secret Sharing
# 5 key shares, any 3 needed to unseal
$ vault operator init -key-shares=5 -key-threshold=3
Unseal Key 1: kF5jNMqPz2XrLLmV+RUbQnz8TN7xCz1B5nOtKVlq3Jkx
Unseal Key 2: qzT8pRWV/GhXhYFnKQx9Y0jYqJfLmNO3x2P4+kYE7xAy
Unseal Key 3: xWm2vKDL+8TkQx/JhVYAqZ5nBcMp+FqLx0N8E1jKYR0z
Unseal Key 4: bL9hYdRFn+pW3xKMz8TQvJ2hNq0L5cXr7A9BkWjFmC1v
Unseal Key 5: mN3sJfKx+2Rz9YBwE4hLqT7vPcAd0X6nU8gWjFlMm5Nk

Initial Root Token: hvs.pQrsTuVwXyZAbCdEfGhI

Vault initialized with 5 key shares and a key threshold of 3.
Please securely distribute the key shares printed above.
Store the initial root token SECURELY - it grants full access.

# Unseal process (must provide 3 of 5 keys)
$ vault operator unseal kF5jNMqPz2XrLLmV+RUbQnz8TN7xCz1B5nOtKVlq3Jkx
Sealed: true
Unseal Progress: 1/3

$ vault operator unseal qzT8pRWV/GhXhYFnKQx9Y0jYqJfLmNO3x2P4+kYE7xAy
Sealed: true
Unseal Progress: 2/3

$ vault operator unseal xWm2vKDL+8TkQx/JhVYAqZ5nBcMp+FqLx0N8E1jKYR0z
Sealed: false
Cluster Name: vault-cluster-abc123
Cluster ID: 12345678-abcd-efgh-ijkl-123456789012
HA Enabled: true
HA Mode: active
# Vault is now operational

Auto-Unseal with Cloud KMS

Manual unsealing is operationally painful — every restart, every upgrade, every node failure requires human intervention. Auto-unseal delegates the master key protection to a cloud KMS:

# vault.hcl configuration for auto-unseal with AWS KMS
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal-key"
}

# With GCP:
seal "gcpckms" {
  project     = "my-project"
  region      = "global"
  key_ring    = "vault-keyring"
  crypto_key  = "vault-unseal-key"
}

The trade-off: you now depend on cloud KMS availability for Vault to start. If AWS KMS has an outage, your Vault nodes cannot unseal. This is a deliberate exchange of operational convenience for a dependency on cloud infrastructure.

Auth Methods in Practice

Vault supports many authentication backends. The choice determines how your Secret Zero problem is solved:

# Kubernetes Auth: pods authenticate with their service account JWT
$ vault auth enable kubernetes

$ vault write auth/kubernetes/config \
    kubernetes_host="https://kubernetes.default.svc:443" \
    kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

$ vault write auth/kubernetes/role/myapp \
    bound_service_account_names=myapp-sa \
    bound_service_account_namespaces=production \
    policies=myapp-production \
    ttl=1h

# AWS IAM Auth: EC2 instances authenticate with their IAM role
$ vault auth enable aws

$ vault write auth/aws/role/myapp \
    auth_type=iam \
    bound_iam_principal_arn=arn:aws:iam::123456789012:role/myapp-role \
    policies=myapp-production \
    ttl=1h

# OIDC Auth: humans authenticate via SSO (Okta, Google, Azure AD)
$ vault auth enable oidc

$ vault write auth/oidc/config \
    oidc_discovery_url="https://accounts.google.com" \
    oidc_client_id="abc123.apps.googleusercontent.com" \
    oidc_client_secret="client-secret" \
    default_role="developer"

Secret Engines: The Power of Vault

Secret engines are pluggable backends that store, generate, or encrypt data:

# KV Engine (static secrets with versioning)
$ vault secrets enable -path=secret kv-v2

$ vault kv put secret/myapp/database \
    username="dbadmin" \
    password="hunter2" \
    host="db.prod.internal" \
    port="5432"
====== Secret Path ======
secret/data/myapp/database

======= Metadata =======
Key                Value
---                -----
created_time       2026-03-12T10:30:00.000Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            1

# Read the secret
$ vault kv get secret/myapp/database
====== Secret Path ======
secret/data/myapp/database

======= Metadata =======
Key                Value
---                -----
created_time       2026-03-12T10:30:00.000Z
version            1

====== Data ======
Key         Value
---         -----
host        db.prod.internal
password    hunter2
port        5432
username    dbadmin

# Read a single field (useful in scripts)
$ vault kv get -field=password secret/myapp/database
hunter2

# JSON output (useful for automation)
$ vault kv get -format=json secret/myapp/database | jq '.data.data'
{
  "host": "db.prod.internal",
  "password": "hunter2",
  "port": "5432",
  "username": "dbadmin"
}

# Version history
$ vault kv metadata get secret/myapp/database

Dynamic Secrets: The Game Changer

Static secrets are secrets that someone creates and stores. Dynamic secrets are generated on-demand with automatic expiration. This is where Vault transforms from a "secure key-value store" to a security infrastructure platform.

sequenceDiagram
    participant App as Application
    participant Vault as Vault
    participant DB as PostgreSQL

    App->>Vault: GET /database/creds/myapp-readonly
    Vault->>DB: CREATE ROLE "v-token-myapp-xyz"<br/>WITH LOGIN PASSWORD 'auto-generated'<br/>VALID UNTIL '2026-03-12T11:30:00Z'<br/>GRANT SELECT ON ALL TABLES
    DB-->>Vault: Role created
    Vault-->>App: username: v-token-myapp-xyz<br/>password: A1B2-c3d4-E5F6<br/>lease_duration: 1h<br/>lease_id: database/creds/...

    Note over App,DB: App uses credentials for 1 hour

    Vault->>DB: DROP ROLE "v-token-myapp-xyz"
    Note over Vault,DB: Lease expires, credentials auto-revoked

# Enable the database secret engine
$ vault secrets enable database

# Configure a PostgreSQL connection
$ vault write database/config/myapp-db \
    plugin_name=postgresql-database-plugin \
    allowed_roles="myapp-readonly,myapp-readwrite" \
    connection_url="postgresql://{{username}}:{{password}}@db.prod.internal:5432/myapp" \
    username="vault_admin" \
    password="vault_admin_password"

# Create a read-only role
$ vault write database/roles/myapp-readonly \
    db_name=myapp-db \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
        VALID UNTIL '{{expiration}}'; \
        GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
    default_ttl="1h" \
    max_ttl="24h"

# Request credentials — a new user is created on the fly
$ vault read database/creds/myapp-readonly
Key                Value
---                -----
lease_id           database/creds/myapp-readonly/abcd-1234-efgh-5678
lease_duration     1h
lease_renewable    true
password           A1B2-c3d4-E5F6-g7h8
username           v-token-myapp-read-xyz123-1234567890

# Request again — completely different credentials
$ vault read database/creds/myapp-readonly
Key                Value
---                -----
lease_id           database/creds/myapp-readonly/ijkl-5678-mnop-9012
lease_duration     1h
lease_renewable    true
password           Q9R8-s7t6-U5V4-w3x2
username           v-token-myapp-read-abc456-0987654321

# Renew a lease before expiry
$ vault lease renew database/creds/myapp-readonly/abcd-1234-efgh-5678
Key                Value
---                -----
lease_id           database/creds/myapp-readonly/abcd-1234-efgh-5678
lease_duration     1h
lease_renewable    true

# Revoke credentials immediately (incident response)
$ vault lease revoke database/creds/myapp-readonly/abcd-1234-efgh-5678
All revocation operations queued successfully!

# Revoke ALL credentials for a path (nuclear option)
$ vault lease revoke -prefix database/creds/myapp-readonly
All revocation operations queued successfully!

Every time you request credentials, you get a different username and password. They expire after one hour. If credentials are compromised, the blast radius is tiny — they stop working automatically. No manual rotation needed. And if you detect a breach, you can revoke all dynamic credentials for a service in one command. Try doing that with static passwords shared across twenty microservices.

Dynamic secrets work for far more than databases:

- **AWS Secret Engine:** Generates temporary IAM credentials with specific policies. Your application never has long-lived AWS keys.
- **PKI Secret Engine:** Issues X.509 certificates on demand with short TTLs. No more year-long certificates sitting on disk.
- **SSH Secret Engine:** Signs SSH public keys with a CA, providing time-limited SSH access without distributing authorized_keys files.
- **TOTP Secret Engine:** Generates TOTP codes for service-to-service authentication.
- **Consul Secret Engine:** Generates Consul ACL tokens dynamically.

The pattern is always the same: instead of a long-lived credential that someone creates and forgets about, Vault generates a short-lived credential that self-destructs.
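The lease lifecycle can be modeled in a few lines. This is a toy in-process sketch of the semantics — unique credentials per request, expiry, prefix revocation — not the Vault API; all names are illustrative:

```python
import secrets
import time

class DynamicSecretStore:
    """Toy model of Vault's dynamic-secret lease lifecycle (not the real API)."""
    def __init__(self):
        self._leases = {}  # lease_id -> (credentials, expiry)

    def issue(self, role, ttl_seconds):
        # Every request mints fresh, unique credentials under a new lease.
        lease_id = f"database/creds/{role}/{secrets.token_hex(8)}"
        creds = {"username": f"v-{role}-{secrets.token_hex(4)}",
                 "password": secrets.token_urlsafe(16)}
        self._leases[lease_id] = (creds, time.monotonic() + ttl_seconds)
        return lease_id, creds

    def read(self, lease_id):
        creds, expiry = self._leases.get(lease_id, (None, 0.0))
        if creds is None or time.monotonic() > expiry:
            self._leases.pop(lease_id, None)
            return None  # expired or revoked: credentials stop working
        return creds

    def revoke_prefix(self, prefix):
        # The "nuclear option": kill every lease under a path at once.
        for lid in [l for l in self._leases if l.startswith(prefix)]:
            del self._leases[lid]

store = DynamicSecretStore()
l1, c1 = store.issue("myapp-readonly", ttl_seconds=3600)
l2, c2 = store.issue("myapp-readonly", ttl_seconds=3600)
print(c1["username"] != c2["username"])  # True: fresh creds per request
store.revoke_prefix("database/creds/myapp-readonly")
print(store.read(l1))  # None: revoked in one operation
```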

Vault Policies: Least Privilege in Practice

Policies control who can access what. They follow a deny-by-default model written in HCL (HashiCorp Configuration Language).

# policy: myapp-production.hcl
# Grants the production app read-only access to its secrets

# Allow reading the app's static secrets
path "secret/data/myapp/*" {
  capabilities = ["read", "list"]
}

# Allow generating dynamic database credentials
path "database/creds/myapp-readonly" {
  capabilities = ["read"]
}

# Allow encrypting/decrypting via the transit engine
path "transit/encrypt/myapp-key" {
  capabilities = ["update"]
}
path "transit/decrypt/myapp-key" {
  capabilities = ["update"]
}

# Explicitly deny access to other apps' secrets
path "secret/data/otherapp/*" {
  capabilities = ["deny"]
}

# Deny all sys operations (no vault management)
path "sys/*" {
  capabilities = ["deny"]
}

# Write the policy
$ vault policy write myapp-production myapp-production.hcl
Success! Uploaded policy: myapp-production

# Create a token with this policy
$ vault token create -policy=myapp-production -ttl=8h
Key                  Value
---                  -----
token                hvs.CAESIGx5Y2...
token_accessor       accessor123abc
token_duration       8h
token_renewable      true
token_policies       ["default" "myapp-production"]

# Test what the token can and cannot do
$ VAULT_TOKEN=hvs.CAESIGx5Y2... vault kv get secret/myapp/database
# SUCCESS: allowed by policy

$ VAULT_TOKEN=hvs.CAESIGx5Y2... vault kv put secret/myapp/database password="new"
# ERROR: 1 error occurred:
#   * permission denied

$ VAULT_TOKEN=hvs.CAESIGx5Y2... vault kv get secret/otherapp/database
# ERROR: 1 error occurred:
#   * permission denied

Audit Logging: Every Access Recorded

# Enable file audit logging
$ vault audit enable file file_path=/var/log/vault/audit.log

# Enable syslog for centralized logging
$ vault audit enable syslog tag="vault" facility="AUTH"

# Vault logs EVERY request and response
# Sensitive values are HMAC'd — not stored in plaintext
$ tail -1 /var/log/vault/audit.log | jq .
{
  "time": "2026-03-12T10:30:15.123Z",
  "type": "response",
  "auth": {
    "client_token": "hmac-sha256:a1b2c3...",
    "accessor": "hmac-sha256:d4e5f6...",
    "display_name": "kubernetes-production-myapp-sa",
    "policies": ["default", "myapp-production"],
    "token_type": "service",
    "token_ttl": 3600
  },
  "request": {
    "id": "req-abc-123",
    "operation": "read",
    "path": "secret/data/myapp/database",
    "remote_address": "10.0.1.50",
    "namespace": { "id": "root" }
  },
  "response": {
    "data": {
      "data": {
        "password": "hmac-sha256:f3a1b2c3...",
        "username": "hmac-sha256:g4h5i6..."
      }
    }
  }
}

Notice the HMAC'd values. Vault does not log the actual secrets in the audit trail — that would defeat the purpose. Instead, it logs a deterministic hash. This means you can search for "was this specific secret accessed?" by computing the HMAC of the secret value and searching the logs, without the logs themselves containing cleartext secrets.
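To make that concrete, here is the search trick in miniature — the audit key and log line are hypothetical, but the construction (HMAC-SHA256 of the secret value) matches what Vault records:

```python
import hashlib
import hmac

# Hypothetical per-audit-device HMAC key (Vault manages the real one).
audit_key = b"per-audit-device-key"

def audit_hash(value: str) -> str:
    digest = hmac.new(audit_key, value.encode(), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest

# A mock audit-log line: only the HMAC of the password appears.
log_line = '{"response": {"data": {"password": "%s"}}}' % audit_hash("SuperSecret123")

# "Was this specific secret ever accessed?" — compute the same HMAC
# and search the logs; no plaintext needed on either side.
needle = audit_hash("SuperSecret123")
print(needle in log_line)            # True
print("SuperSecret123" in log_line)  # False
```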

Once one or more audit devices are enabled, Vault requires that at least one of them successfully record each request. If every enabled audit device fails (disk full, syslog down), Vault stops responding to requests entirely. This is a deliberate security decision — Vault will not process secrets without an audit trail. Always configure at least two audit devices for redundancy.

Envelope Encryption Explained

Why not encrypt data directly with the master key? Because that approach scales poorly, for reasons we'll see in a moment. With envelope encryption, you generate a data key (DEK), encrypt the data with the DEK, then encrypt the DEK with the master key. You store the encrypted DEK alongside the encrypted data.

sequenceDiagram
    participant App as Application
    participant KMS as KMS / Vault Transit
    participant Store as Storage (S3, DB)

    Note over App,Store: ENCRYPTION FLOW
    App->>KMS: GenerateDataKey()
    KMS-->>App: Plaintext DEK + Encrypted DEK

    Note over App: Encrypt data locally<br/>with Plaintext DEK<br/>(AES-256-GCM)
    App->>App: ciphertext = AES(data, plaintext_DEK)
    App->>App: Discard plaintext DEK from memory

    App->>Store: Store: Encrypted DEK + Ciphertext

    Note over App,Store: DECRYPTION FLOW
    App->>Store: Retrieve: Encrypted DEK + Ciphertext
    Store-->>App: Encrypted DEK + Ciphertext

    App->>KMS: Decrypt(encrypted_DEK)
    KMS-->>App: Plaintext DEK

    Note over App: Decrypt data locally<br/>with Plaintext DEK
    App->>App: data = AES_decrypt(ciphertext, plaintext_DEK)
    App->>App: Discard plaintext DEK from memory

Why this indirection? Four critical reasons:

  1. Performance: The master key in KMS can only encrypt small amounts of data (4KB for AWS KMS). The DEK can encrypt gigabytes locally using fast symmetric encryption (AES-256-GCM at hardware speeds).

  2. Key rotation without re-encryption: When you rotate the master key, you only need to re-encrypt the DEKs (256-bit values), not all the data. Re-encrypting a 256-bit key is instant; re-encrypting terabytes of data is not.

  3. Security boundary: The master key never leaves the KMS hardware security module (HSM). Even cloud provider engineers cannot extract it. Your plaintext data never leaves your application — it's encrypted locally and never sent to KMS.

  4. Granularity: Each record, file, or object can have its own DEK. Compromising one DEK affects only one piece of data, not everything encrypted with the master key.
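The whole flow fits in a short sketch. The "cipher" below is a placeholder keystream (structure demo only — a real system uses AES-256-GCM here), and the in-process "KMS" stands in for a real HSM-backed service:

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Placeholder cipher: SHA-256 counter keystream XOR. Structure
    # demo only — not real cryptography.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

# The "KMS" holds the master key and never releases it.
master_key = secrets.token_bytes(32)

def kms_generate_data_key():
    dek = secrets.token_bytes(32)
    return dek, keystream_xor(master_key, dek)  # plaintext DEK, encrypted DEK

def kms_decrypt_data_key(encrypted_dek: bytes) -> bytes:
    return keystream_xor(master_key, encrypted_dek)

# ENCRYPT: data is encrypted locally; only the *encrypted* DEK is stored.
dek, encrypted_dek = kms_generate_data_key()
ciphertext = keystream_xor(dek, b"credit-card-4111-1111-1111-1111")
del dek  # discard the plaintext DEK from memory

# DECRYPT: ask the KMS for the DEK back, then decrypt locally.
dek = kms_decrypt_data_key(encrypted_dek)
print(keystream_xor(dek, ciphertext).decode())  # credit-card-4111-1111-1111-1111
```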

**Vault's Transit engine: encryption as a service**

Vault's Transit engine provides encryption as a service. In its basic encrypt/decrypt mode, the key never leaves Vault — your application sends plaintext to Vault and receives ciphertext back (or vice versa), which suits small payloads. That differs from envelope encryption, where the DEK is handed to the application; for that pattern, Transit also exposes a datakey endpoint that returns a plaintext DEK along with a copy encrypted under the named key, so large data can be encrypted locally.

# Enable the transit engine
$ vault secrets enable transit

# Create a named encryption key
$ vault write -f transit/keys/myapp-key
Success! Data written to: transit/keys/myapp-key

# Encrypt data (plaintext must be base64-encoded)
$ vault write transit/encrypt/myapp-key \
    plaintext=$(echo -n "credit-card-4111-1111-1111-1111" | base64)
Key           Value
---           -----
ciphertext    vault:v1:8SDd3whDYlHmMr0+VIQ7YFpLBL...
key_version   1

# Decrypt
$ vault write transit/decrypt/myapp-key \
    ciphertext="vault:v1:8SDd3whDYlHmMr0+VIQ7YFpLBL..."
Key          Value
---          -----
plaintext    Y3JlZGl0LWNhcmQtNDExMS0xMTExLTExMTEtMTExMQ==

$ echo "Y3JlZGl0LWNhcmQtNDExMS0xMTExLTExMTEtMTExMQ==" | base64 -d
credit-card-4111-1111-1111-1111

The vault:v1: prefix in the ciphertext tells Vault which version of the key was used, enabling seamless key rotation.



AWS KMS and GCP KMS in Practice

AWS KMS

# Create a Customer Master Key (CMK)
$ aws kms create-key \
    --description "MyApp production encryption key" \
    --key-usage ENCRYPT_DECRYPT \
    --origin AWS_KMS
{
    "KeyMetadata": {
        "KeyId": "abcd1234-ab12-cd34-ef56-abcdef123456",
        "Arn": "arn:aws:kms:us-east-1:123456789012:key/abcd1234...",
        "KeyState": "Enabled",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "EncryptionAlgorithms": ["SYMMETRIC_DEFAULT"],
        "Origin": "AWS_KMS"
    }
}

# Create an alias for human-readable reference
$ aws kms create-alias \
    --alias-name alias/myapp-production \
    --target-key-id abcd1234-ab12-cd34-ef56-abcdef123456

# Generate a data key for envelope encryption
$ aws kms generate-data-key \
    --key-id alias/myapp-production \
    --key-spec AES_256
{
    "CiphertextBlob": "AQIDAHhN...(base64 encrypted DEK)...",
    "Plaintext": "SGVsbG8g...(base64 plaintext DEK - use and discard!)...",
    "KeyId": "arn:aws:kms:us-east-1:123456789012:key/abcd1234..."
}

# Encrypt data directly (up to 4KB — use envelope encryption for larger data)
$ aws kms encrypt \
    --key-id alias/myapp-production \
    --plaintext fileb://secret.txt \
    --output text --query CiphertextBlob | base64 --decode > secret.enc

# Decrypt
$ aws kms decrypt \
    --ciphertext-blob fileb://secret.enc \
    --output text --query Plaintext | base64 --decode > secret.txt

# Key policy: who can use this key
$ aws kms put-key-policy \
    --key-id alias/myapp-production \
    --policy-name default \
    --policy '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/myapp-role"},
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*"
        }]
    }'

GCP KMS

# Create a key ring (container for keys)
$ gcloud kms keyrings create myapp-keyring \
    --location=global

# Create a key with automatic rotation
$ gcloud kms keys create myapp-key \
    --location=global \
    --keyring=myapp-keyring \
    --purpose=encryption \
    --rotation-period=90d \
    --next-rotation-time=2026-06-12T00:00:00Z

# Encrypt
$ gcloud kms encrypt \
    --location=global \
    --keyring=myapp-keyring \
    --key=myapp-key \
    --plaintext-file=secret.txt \
    --ciphertext-file=secret.enc

# Decrypt
$ gcloud kms decrypt \
    --location=global \
    --keyring=myapp-keyring \
    --key=myapp-key \
    --ciphertext-file=secret.enc \
    --plaintext-file=secret.txt

How do you choose between AWS KMS, GCP KMS, and Vault's Transit engine? They solve the same problem — encryption as a service — but differ in where the operational burden and the control sit. Use a cloud KMS if you are all-in on one cloud and want the simplest possible setup with zero operational overhead. Use Vault's Transit engine if you need cloud-agnostic encryption, multi-cloud support, or are already running Vault and want the extra control and portability it brings.


Key Rotation Strategies

Key rotation limits the damage if a key is compromised and satisfies compliance requirements: PCI DSS requires keys to be rotated at the end of their defined cryptoperiod, SOC 2 audits check for a rotation policy, and HIPAA expects one.

graph TD
    subgraph Auto["1. AUTOMATIC ROTATION (KMS-managed)"]
        K1["Key v1 (Jan 2025)"] --> D1["Old data encrypted with v1"]
        K2["Key v2 (Apr 2025)"] --> D2["Old data encrypted with v2"]
        K3["Key v3 (Jul 2025)"] --> D3["New data encrypted with v3"]
        Note1["KMS tracks which version<br/>was used per ciphertext.<br/>Decrypt uses correct version<br/>automatically."]
    end

    subgraph ReEnc["2. RE-ENCRYPTION ROTATION"]
        RK1["Generate new key version"] --> RK2["Re-encrypt all data<br/>with new version"]
        RK2 --> RK3["Delete old key version"]
        RK4["Necessary when you<br/>suspect key compromise"]
    end

    subgraph Dual["3. DUAL-WRITING (gradual migration)"]
        DW1["Write new data with new key"] --> DW2["Read can use either key"]
        DW2 --> DW3["Background job re-encrypts<br/>old data"]
        DW3 --> DW4["Remove old key when<br/>migration complete"]
    end
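The dual-writing read path can be sketched in a few lines. The version prefix (mirroring Vault's vault:vN: convention) tells the reader which key to use; the "encryption" here is a stand-in (string reversal) and the key registry is hypothetical:

```python
# Hypothetical key registry; only the version tags matter here.
keys = {"v1": "old-key", "v2": "new-key"}

def encrypt(plaintext: str, version: str = "v2") -> str:
    # Stand-in "encryption" (string reversal): the point is the
    # version prefix, which routes reads to the right key.
    return f"vault:{version}:{plaintext[::-1]}"

def decrypt(ciphertext: str) -> str:
    _, version, body = ciphertext.split(":", 2)
    assert version in keys  # old versions stay readable until migration ends
    return body[::-1]

old = "vault:v1:2retnuh"  # written before rotation
new = encrypt("hunter2")  # new writes are tagged with the new version
print(decrypt(old))  # hunter2
print(decrypt(new))  # hunter2
```

Once the background job has re-encrypted (or rewrapped) every v1 ciphertext, the v1 entry can be dropped from the registry and old ciphertexts stop decrypting.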

# AWS KMS: Enable automatic key rotation (annual by default)
$ aws kms enable-key-rotation \
    --key-id alias/myapp-production

# Check rotation status
$ aws kms get-key-rotation-status \
    --key-id alias/myapp-production
{
    "KeyRotationEnabled": true
}

# Vault Transit: Rotate the encryption key
$ vault write -f transit/keys/myapp-key/rotate
Success! Data written to: transit/keys/myapp-key/rotate

# Check key versions
$ vault read transit/keys/myapp-key
Key                       Value
---                       -----
latest_version            2
min_decryption_version    1
min_encryption_version    0

# Set minimum decryption version (forces re-encryption of old data)
$ vault write transit/keys/myapp-key \
    min_decryption_version=2

# Rewrap existing ciphertext with the latest key version
# (without exposing plaintext — Vault decrypts and re-encrypts internally)
$ vault write transit/rewrap/myapp-key \
    ciphertext="vault:v1:8SDd3whDYlHmMr0..."
Key           Value
---           -----
ciphertext    vault:v2:newEncryptedData...
key_version   2
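
The KMS-managed pattern in the first diagram (each ciphertext tagged with the key version that produced it) can be sketched in a few lines of Python. This is a toy in-memory model, not a real KMS client; the XOR keystream is a stand-in for the AES-GCM a real HSM would use:

```python
import secrets
import hashlib
from dataclasses import dataclass, field

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher for illustration only: XOR with a SHA-256-derived
    keystream. A real KMS would use AES-GCM inside an HSM."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

@dataclass
class VersionedKey:
    """Mimics KMS behavior: ciphertexts carry the key version that made them."""
    versions: dict = field(default_factory=dict)
    latest: int = 0

    def rotate(self) -> int:
        self.latest += 1
        self.versions[self.latest] = secrets.token_bytes(32)
        return self.latest

    def encrypt(self, plaintext: bytes) -> tuple[int, bytes]:
        # Always encrypt with the newest version, like KMS after rotation.
        return self.latest, _keystream_xor(self.versions[self.latest], plaintext)

    def decrypt(self, version: int, ciphertext: bytes) -> bytes:
        # Old ciphertext still decrypts: the stored version selects the key.
        return _keystream_xor(self.versions[version], ciphertext)

    def rewrap(self, version: int, ciphertext: bytes) -> tuple[int, bytes]:
        # Like Vault's transit/rewrap: decrypt with the old version,
        # re-encrypt with the latest, never exposing plaintext to callers.
        return self.encrypt(self.decrypt(version, ciphertext))

key = VersionedKey()
key.rotate()                                   # v1
v, ct = key.encrypt(b"customer-record")
key.rotate()                                   # v2; old ciphertext untouched
assert key.decrypt(v, ct) == b"customer-record"
v2, ct2 = key.rewrap(v, ct)
assert v2 == 2 and key.decrypt(v2, ct2) == b"customer-record"
```

The `rewrap` method is the interesting part: it is why raising `min_decryption_version` is safe only after every stored ciphertext has been rewrapped to the latest version.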

Sealed Secrets for Kubernetes

By default, Kubernetes Secrets are merely base64-encoded (not encrypted!) and stored in etcd. Anyone with access to the etcd datastore, or with `get secrets` RBAC permission, can read them. And because the values are effectively plaintext, the manifests can't be committed to Git.

# How "secure" Kubernetes secrets really are:
$ kubectl get secret myapp-db -o jsonpath='{.data.password}' | base64 -d
SuperSecret123
# That's all it takes. base64 is encoding, not encryption.

Bitnami's Sealed Secrets solves this with asymmetric encryption:

sequenceDiagram
    participant Dev as Developer Workstation
    participant Git as Git Repository
    participant K8s as Kubernetes Cluster
    participant SC as SealedSecret Controller

    Dev->>Dev: Create regular Secret YAML
    Dev->>Dev: kubeseal encrypts with<br/>cluster's public key
    Dev->>Git: Commit SealedSecret<br/>(safe — encrypted!)
    Git->>K8s: GitOps deploys<br/>SealedSecret resource
    K8s->>SC: Controller detects<br/>new SealedSecret
    SC->>SC: Decrypt with private key<br/>(never leaves cluster)
    SC->>K8s: Create regular K8s Secret
    K8s->>K8s: Pods mount Secret<br/>as env vars or files

    Note over SC: Private key is stored in-cluster<br/>as a Secret in kube-system namespace.<br/>If lost, all SealedSecrets must be re-created.

# Install the sealed-secrets controller
$ helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
$ helm install sealed-secrets sealed-secrets/sealed-secrets \
    --namespace kube-system

# Create a regular secret manifest (don't commit this!)
$ kubectl create secret generic myapp-db \
    --from-literal=password=SuperSecret123 \
    --from-literal=username=dbadmin \
    --dry-run=client -o yaml > myapp-db-secret.yaml

# Seal it (encrypt with the cluster's public key)
$ kubeseal --format=yaml < myapp-db-secret.yaml > myapp-db-sealed.yaml

$ cat myapp-db-sealed.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myapp-db
  namespace: default
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiTySYZZMpJkJW1X9...long-base64-string...
    username: AgCE7j2PLRTK+QjUzTZAPMpKkLW2Y0...

# This file is safe to commit to Git!
$ git add myapp-db-sealed.yaml
$ git commit -m "Add sealed database credentials"

# Apply it — the controller decrypts and creates a regular Secret
$ kubectl apply -f myapp-db-sealed.yaml

# Verify the Secret was created
$ kubectl get secret myapp-db
NAME       TYPE     DATA   AGE
myapp-db   Opaque   2      5s

Other approaches to Kubernetes secrets management:

- **External Secrets Operator (ESO):** Syncs secrets from Vault, AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault into Kubernetes Secrets. The source of truth lives outside the cluster. Best for teams already using a centralized secret store.
- **SOPS (Secrets OPerationS):** Mozilla's tool that encrypts specific values in YAML/JSON files using KMS, PGP, or age. Works well with GitOps (Flux, ArgoCD). Lets you see the keys but not the values in version control.
- **Vault Agent Sidecar Injector:** Runs a Vault agent as a sidecar container that fetches secrets and writes them to a shared volume. The application reads secrets from files, not environment variables — avoiding the env var problems discussed earlier.
- **Vault CSI Provider:** Mounts Vault secrets as volumes using the Container Storage Interface. Similar to the sidecar approach but uses the standard CSI mechanism.

Each approach has different operational trade-offs. ESO is simplest if you already use a cloud secret manager. Sealed Secrets is simplest for pure GitOps. Vault sidecar gives the most control but requires running Vault.
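
Whichever injector you choose, the application side of the file-based pattern is the same: read credentials from a file and re-read when the sidecar rewrites it. A minimal sketch, using a temp file to stand in for a path like `/vault/secrets/db-creds.json` (the filename and JSON shape here are assumptions for illustration):

```python
import json
import os
import tempfile

class FileSecret:
    """Re-reads a sidecar-managed secret file when its mtime changes,
    so the app picks up rotated credentials without a restart."""
    def __init__(self, path: str):
        self.path = path
        self._mtime = 0.0
        self._value = None

    def get(self) -> dict:
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:          # file rewritten by the agent
            with open(self.path) as f:
                self._value = json.load(f)
            self._mtime = mtime
        return self._value

# Demo: a temp file stands in for the sidecar's tmpfs-mounted secret file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"username": "v-k8s-abc", "password": "old"}, f)
    path = f.name

creds = FileSecret(path)
print(creds.get()["password"])            # prints "old"

# The sidecar rotates the credential by rewriting the file...
with open(path, "w") as f:
    json.dump({"username": "v-k8s-def", "password": "new"}, f)
os.utime(path, (0, os.stat(path).st_mtime + 1))   # ensure mtime moves
rotated = creds.get()
print(rotated["password"])                # prints "new"
os.unlink(path)
```

The mtime check is the design point: the app never restarts, never talks to Vault directly, and never sees the Secret Zero; it only watches a file the platform keeps fresh.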

Git Secret Scanning

The last line of defense is catching secrets before they reach a repository. GitHub and secret-scanning vendors report tens of millions of leaked secrets detected in public repositories every year. Secrets leak into Git repos at an alarming rate, and bots scan for them within seconds of commit.

Pre-Commit Hooks (Local Defense)

# Install gitleaks
$ brew install gitleaks   # macOS
$ apt install gitleaks    # Ubuntu/Debian

# Scan the current repo for secrets
$ gitleaks detect --source=. --verbose
Finding:     AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfi
Secret:      wJalrXUtnFEMI/K7MDENG/bPxRfi
RuleID:      aws-secret-access-key
Entropy:     4.7
File:        config/deploy.sh
Line:        23
Commit:      a1b2c3d4e5f6
Author:      developer@example.com
Date:        2026-03-10T15:30:00Z
Fingerprint: config/deploy.sh:aws-secret-access-key:23

Finding:     STRIPE_SECRET_KEY=sk_live_4eC39HqLyjWDarjtT1zdp7dc
Secret:      sk_live_4eC39HqLyjWDarjtT1zdp7dc
RuleID:      stripe-secret-key
Entropy:     4.2
File:        src/payments.py
Line:        12

2 findings detected. Scan complete.

# Set up as a pre-commit hook (catches secrets before they're committed)
$ cat .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.1
    hooks:
      - id: gitleaks

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

# Now any commit containing secrets will be blocked:
$ git commit -m "Add deploy config"
gitleaks..........................................................Failed
- hook id: gitleaks
- exit code: 1
Secret detected in config/deploy.sh

CI/CD Pipeline Scanning

# Scan only the latest commit diff in CI
$ gitleaks detect --source=. --log-opts="HEAD~1..HEAD" --verbose

# truffleHog scans for high-entropy strings AND known patterns
$ trufflehog git file://. --since-commit HEAD~1 --only-verified

# GitHub Actions workflow
# .github/workflows/secret-scan.yml
name: Secret Scanning
on: [push, pull_request]
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

GitHub's Built-In Secret Scanning

GitHub automatically scans public repos for known secret formats (AWS keys, Stripe keys, GCP credentials, etc.) and notifies both the repo owner and the secret provider. GitHub Advanced Security extends this to private repos and includes push protection — blocking the commit on the server side if it contains a recognized secret.

Secret scanning catches known patterns. It will NOT catch:
- Custom API keys with no recognizable format
- Passwords in configuration files that don't match patterns
- Private keys embedded in unusual formats
- Secrets obfuscated with base64 or other encoding
- Secrets split across multiple variables or lines

Defense in depth means combining scanning with proper secret management practices. Scanning is the safety net, not the primary control.
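
The entropy heuristic behind these scanners is worth seeing concretely. gitleaks-style tools compute Shannon entropy over candidate strings and flag high scores; the sketch below shows why a random machine-generated key trips the rule while a weak human-chosen password slips past it:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character -- the heuristic that
    gitleaks-style scanners use to flag random-looking strings."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

# A random AWS-style secret scores high (near the maximum for its alphabet)...
print(round(shannon_entropy("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"), 2))

# ...while a weak human-chosen password scores low and evades entropy
# rules entirely -- one reason scanning can't be the primary control.
print(round(shannon_entropy("SuperSecret123"), 2))
```

Real scanners pair entropy with regex rules for known prefixes (`AKIA`, `sk_live_`, etc.), which is exactly why custom, unprefixed secrets are on the "will NOT catch" list above.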

The Uber Breach: A Case Study

In 2016, Uber suffered a breach that exposed the personal data of 57 million riders and drivers, including the driver's license numbers of roughly 600,000 drivers. The attack path was devastatingly simple:

**The attack chain:**
1. Two attackers found Uber engineers' credentials in a **private** GitHub repository
2. Those credentials included AWS access keys
3. The AWS keys had overly broad IAM permissions — including access to an S3 bucket containing rider and driver data
4. The attackers downloaded the entire dataset and contacted Uber demanding $100,000

**What made it catastrophically worse:**
- Uber's CSO and security team **paid the ransom** ($100,000 in Bitcoin)
- They had the attackers sign NDAs (non-disclosure agreements)
- They **disguised the payment as a bug bounty**
- They **covered up the breach for over a year**, failing to notify affected users or regulators
- When the cover-up was discovered in November 2017, the CSO was fired and later **criminally charged**

**The technical failures:**
- Long-lived AWS access keys (instead of IAM roles with temporary credentials)
- Keys stored in a code repository (even a private one)
- No secret scanning in CI/CD pipeline
- Overly permissive IAM policies (the keys could access S3 buckets they shouldn't have)
- No alerts on unusual S3 access patterns (bulk download of PII)

**The organizational failures:**
- Leadership chose concealment over disclosure
- Bug bounty program was abused to disguise ransom payments
- No incident response process was followed
- Regulatory obligations were deliberately ignored

**The consequences:**
- $148 million settlement with US states
- $1.2 million fine from UK and Dutch regulators
- CSO Joe Sullivan convicted of obstruction and misprision of felony
- First criminal conviction of a CISO for breach cover-up

This case established the legal precedent that **CISOs can face personal criminal liability** for covering up breaches.

The entire breach chain started with credentials in a Git repo. An AWS access key. Twenty characters that cost Uber $148 million in settlements, criminal charges for their CISO, and immeasurable reputation damage.


Vault High Availability in Production

graph TD
    subgraph VaultHA["Vault HA Cluster"]
        Active["Vault Active Node<br/>(Leader)<br/>Serves all requests"]
        Standby1["Vault Standby 1<br/>(Warm standby)<br/>Forwards to leader"]
        Standby2["Vault Standby 2<br/>(Warm standby)<br/>Forwards to leader"]
    end

    subgraph Storage["Integrated Raft Storage"]
        R1["Raft Node 1<br/>(Leader)"]
        R2["Raft Node 2<br/>(Follower)"]
        R3["Raft Node 3<br/>(Follower)"]
        R1 <--> R2
        R2 <--> R3
        R1 <--> R3
    end

    LB["Load Balancer<br/>Routes to active node"]
    Client["Client Applications"] --> LB
    LB --> Active
    LB --> Standby1
    LB --> Standby2
    Active --> R1
    Standby1 --> R2
    Standby2 --> R3

    KMS["Cloud KMS<br/>(Auto-Unseal)"]
    Active --> KMS
    Standby1 --> KMS
    Standby2 --> KMS

    Note1["Only the active node serves requests.<br/>Standbys forward to active or return 307 redirect.<br/>If active fails, Raft leader election promotes a standby."]

# Production vault.hcl configuration
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/vault-cert.pem"
  tls_key_file  = "/opt/vault/tls/vault-key.pem"
}

seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal-key"
}

api_addr     = "https://vault-1.internal:8200"
cluster_addr = "https://vault-1.internal:8201"

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
}

# Join additional nodes to the Raft cluster
$ vault operator raft join https://vault-1.internal:8200

# Check Raft cluster status
$ vault operator raft list-peers
Node          Address                    State       Voter
----          -------                    -----       -----
vault-node-1  vault-1.internal:8201      leader      true
vault-node-2  vault-2.internal:8201      follower    true
vault-node-3  vault-3.internal:8201      follower    true

Hands-On Exercise

Build a complete secret management pipeline:

1. Start Vault in dev mode (`vault server -dev`)
2. Store three secrets: DB credentials, API key, JWT signing key at `secret/workshop/`
3. Write a policy allowing read-only access to those secrets
4. Create a token with that policy and test permissions
5. Enable the transit engine, create a key, encrypt a message
6. Rotate the transit key and verify old ciphertext still decrypts
7. Enable an audit device and review what gets captured
8. Use `vault kv get -format=json` and pipe through `jq` — practice the scripting interface
9. Try `vault kv rollback` to revert a secret to a previous version
10. Explore `vault kv metadata get` to see version history

Production Architecture: Putting It All Together

graph TD
    Dev["Developer Workstation"]
    Git["GitHub<br/>(Code only, no secrets)"]
    CI["CI/CD Pipeline<br/>(GitHub Actions)"]
    Scan["gitleaks / truffleHog<br/>Secret Scanner"]
    Block["Block if secrets found"]

    subgraph K8s["Kubernetes Cluster"]
        Pod["App Pod"]
        Sidecar["Vault Agent<br/>(Sidecar)"]
        SA["K8s Service Account<br/>(Secret Zero)"]
        Secrets["/vault/secrets/<br/>(tmpfs volume)"]
    end

    subgraph VaultInfra["Vault Infrastructure"]
        Vault["HashiCorp Vault<br/>(HA Raft Cluster)"]
        Dynamic["Dynamic Secrets:<br/>DB credentials<br/>AWS STS tokens<br/>TLS certificates"]
        Transit["Transit Engine:<br/>Encryption as a service"]
        AuditLog["Audit Log<br/>(every access recorded)"]
    end

    KMS["AWS KMS<br/>(Auto-Unseal)"]
    SIEM["SIEM / Splunk<br/>(Audit analysis)"]

    Dev -->|"git push<br/>(code only)"| Git
    Git --> CI
    CI --> Scan
    Scan -->|"secrets found"| Block
    Scan -->|"clean"| K8s
    Sidecar -->|"Auth via K8s SA"| Vault
    Vault --> Dynamic
    Vault --> Transit
    Vault --> AuditLog
    Vault --> KMS
    AuditLog --> SIEM
    Sidecar -->|"Write secrets to<br/>shared tmpfs"| Secrets
    Pod -->|"Read from<br/>/vault/secrets/"| Secrets
    SA -->|"Platform-asserted<br/>identity"| Sidecar

Notice that in this architecture, no human ever handles a production secret directly. Vault generates dynamic credentials, the app receives them through a sidecar, they expire automatically, and everything is audited. The Secret Zero is the Kubernetes service account token, which is platform-asserted — nobody creates it, nobody stores it, nobody rotates it manually.

For a two-person startup, start with AWS Secrets Manager or GCP Secret Manager — managed services that handle the operational burden. But once you have more than a handful of services, the investment in proper secret management pays for itself the first time you don't have to do an emergency credential rotation at 3 AM. And the first time you can show an auditor a complete trail of every secret access for the past year.


What You've Learned

In this chapter, we covered the full landscape of secret management:

  • Environment variables are insufficient as a secret management strategy because they leak through /proc/PID/environ, docker inspect, ps auxe, crash dumps, and child process inheritance, and they have no audit trail or rotation mechanism.
  • The Secret Zero problem is the fundamental bootstrapping paradox — you always need at least one initial secret. Platform identity (IAM roles, Kubernetes service accounts) minimizes this to a platform-asserted credential that no human creates or manages.
  • HashiCorp Vault provides centralized secret storage with seal/unseal (Shamir's Secret Sharing), pluggable auth methods (Kubernetes, AWS IAM, OIDC), multiple secret engines (KV, database, PKI, transit), HCL-based policies, and comprehensive audit logging.
  • Dynamic secrets are generated on demand with automatic expiration, providing unique per-requester credentials with minimal blast radius. Every request gets different credentials that self-destruct.
  • Envelope encryption separates the master key (in KMS/HSM) from data encryption keys (used locally), enabling efficient encryption of large data sets, practical key rotation, and strict security boundaries.
  • AWS KMS and GCP KMS provide managed encryption key services backed by hardware security modules, ideal for envelope encryption and Vault auto-unseal.
  • Key rotation limits the damage window of a compromised key. Automatic rotation in KMS and Vault's rewrap capability make this operationally painless.
  • Sealed Secrets enable GitOps for Kubernetes secrets by encrypting them with the cluster's public key, making them safe to commit to version control.
  • Git secret scanning (gitleaks, truffleHog, GitHub secret scanning) is a critical safety net against accidental credential commits — but scanning is the last line of defense, not the primary control.
  • The Uber breach demonstrates the catastrophic consequences of poor secret management: AWS access keys in a Git repo led to exposure of 57 million records, $148 million in settlements, and the first criminal conviction of a CISO.

The core principle: secrets should be short-lived, narrowly scoped, automatically rotated, centrally managed, and comprehensively audited. Every step away from this ideal is a step toward the next breach.