Chapter 24: Zero Trust Architecture — Never Trust, Always Verify

"The perimeter is dead. Long live the identity." — John Kindervag, creator of the Zero Trust model

Consider two network diagrams. The first shows a traditional network — a thick outer wall with everything inside it marked "Trusted." The second shows a network where every connection has a small padlock icon, every service has a question mark, and nothing is marked trusted.

The first one looks simpler. Everything inside the wall is safe — until it is not. Remember the Target breach? The SolarWinds supply chain attack? The Colonial Pipeline ransomware? In every case, the attacker got inside the perimeter, and then the "trusted" network became their playground. Inside the wall, there were no checkpoints, no verification, no encryption. The attacker moved freely.

In the second diagram, nothing is trusted — even internal traffic. Every request is authenticated, authorized, and encrypted regardless of where it comes from. A request from the office network is treated with the same suspicion as a request from a coffee shop in another country. That is Zero Trust.

Implementing it is a journey, not a light switch. But the alternative — trusting everything inside your perimeter — has been proven catastrophically wrong, again and again.


The Failure of Perimeter Security

Traditional network security follows the castle-and-moat model: build a strong perimeter (firewall, VPN, DMZ), and trust everything inside it.

graph TD
    subgraph "Castle-and-Moat Model"
        WALL["PERIMETER<br/>Firewall + VPN"]

        subgraph "Trusted Internal Zone"
            EMP[Employees] <--> SRV[Servers]
            SRV <--> DBS[Databases]
            EMP <--> PRINTER[Printers]
            PRINTER <--> IOT[IoT Devices]
            IOT <--> SRV
        end
    end

    EXT[External User] -->|"VPN login → full access"| WALL
    WALL --> EMP

    ATTACKER[Attacker<br/>with stolen VPN creds] -->|"Same VPN login → same full access"| WALL

    style ATTACKER fill:#ff6b6b,stroke:#c0392b,color:#fff

Why the perimeter model fails:

| Assumption | Reality |
|---|---|
| "Inside = safe, outside = dangerous" | Phishing puts attackers inside. Supply chain attacks start inside. Insiders exist. |
| "If you passed the VPN, you're trusted" | VPN credentials are stolen via phishing, credential stuffing, infostealer malware |
| "Internal traffic doesn't need encryption" | Lateral movement, ARP spoofing, rogue access points all exploit unencrypted internal traffic |
| "The perimeter won't be breached" | Every perimeter is breached eventually; the question is when, not if |
| "We know where the perimeter is" | Remote work, cloud services, mobile devices, partner integrations: where does the perimeter end? |

Breaches that exploited perimeter trust:

  • SolarWinds (2020): Supply chain attack delivered a backdoor to roughly 18,000 organizations, including US government agencies. Attackers operated inside "trusted" networks for 9+ months.
  • Colonial Pipeline (2021): Compromised VPN credentials (no MFA) gave attackers access to the internal network. Ransomware shut down the largest fuel pipeline in the US.
  • Uber (2022): Attacker used MFA fatigue (repeated push notifications) to compromise an employee's VPN access. From there, the attacker reached internal tools, source code, and financial data.

Zero Trust Principles — Deep Dive

Zero Trust is not a product or technology — it is a security philosophy with specific, actionable principles:

1. Never Trust, Always Verify

Every access request is fully authenticated, authorized, and encrypted before being granted, regardless of network location.

sequenceDiagram
    participant User as User / Service
    participant PEP as Policy Enforcement Point<br/>(Access Proxy / Gateway)
    participant PDP as Policy Decision Point<br/>(Policy Engine)
    participant IdP as Identity Provider
    participant DPS as Device Posture Service
    participant TI as Threat Intelligence
    participant Resource as Protected Resource

    User->>PEP: 1. Access request
    PEP->>IdP: 2. Verify identity (SSO + MFA)
    IdP-->>PEP: 3. Identity confirmed + attributes

    PEP->>DPS: 4. Check device posture
    DPS-->>PEP: 5. Device status: managed, patched, encrypted

    PEP->>TI: 6. Check threat context
    TI-->>PEP: 7. No known threats for this IP/user

    PEP->>PDP: 8. Evaluate policy with ALL signals:<br/>identity + device + resource + context
    PDP-->>PEP: 9. Decision: ALLOW with conditions<br/>(read-only, 4-hour session, log all)

    PEP->>Resource: 10. Forward request with auth context
    Resource-->>PEP: 11. Response
    PEP-->>User: 12. Response (filtered if necessary)

    Note over PEP,PDP: Steps 2-9 happen on EVERY request,<br/>not just at session start
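
The split between enforcement and decision in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real gateway: `Signals`, `Decision`, `decide`, and `enforce` are hypothetical names, and the lookups that would call an identity provider, a posture service, and a threat feed are represented as plain boolean values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Inputs the PEP gathers in steps 2-7 (all fields are illustrative)."""
    identity_verified: bool   # IdP confirmed the user via SSO + MFA
    device_compliant: bool    # posture service: managed, patched, encrypted
    threat_flagged: bool      # threat intelligence has a hit for this IP/user


@dataclass(frozen=True)
class Decision:
    allow: bool
    conditions: tuple = ()


def decide(signals: Signals, resource: str, action: str) -> Decision:
    """Policy Decision Point (steps 8-9): a pure function of the signals."""
    if not signals.identity_verified or signals.threat_flagged:
        return Decision(allow=False)
    if not signals.device_compliant:
        # Degrade instead of denying outright: reads only, everything logged.
        if action == "read":
            return Decision(allow=True, conditions=("read-only", "log-all"))
        return Decision(allow=False)
    return Decision(allow=True, conditions=("log-all",))


def enforce(signals: Signals, resource: str, action: str, backend) -> str:
    """Policy Enforcement Point: asks the PDP on every request, then forwards or blocks."""
    decision = decide(signals, resource, action)
    if not decision.allow:
        return "403 Forbidden"
    return backend(resource, action)  # step 10: forward to the protected resource
```

Because `enforce` calls `decide` for each request rather than caching a login result, a signal that changes mid-session takes effect on the very next call.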

2. Least Privilege Access

Users and services receive only the minimum permissions needed for their specific task, for the minimum necessary duration.

# Traditional: broad role-based access
# "Developer" role = access to ALL repositories, ALL databases, ALL internal tools
# Even though this developer only works on the payments service

# Zero Trust: fine-grained, context-aware access
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Allow:
    """Grant with an expiry, conditions the enforcement point must check, and a log level."""
    duration: timedelta
    conditions: tuple = ()
    logging: str = "standard"


@dataclass
class Deny:
    """Refusal with a human-readable reason for the audit log."""
    reason: str


def evaluate_access(user, resource, action, context):
    """Grant minimum necessary access based on all available signals.

    `user` and `resource` are assumed to expose the attributes used below;
    `context` (device, location, time) is reserved for further checks.
    """

    # Check: Does this user need access to this specific resource?
    if resource.team != user.team and not user.is_oncall_for(resource):
        return Deny("User is not on the team that owns this resource")

    # Check: Is the action appropriate for their role?
    if action == "write" and resource.environment == "production":
        if not user.has_production_access:
            return Deny("Write access to production requires approval")

    # Check: Time-bounded access for sensitive operations
    if resource.classification == "restricted":
        return Allow(
            duration=timedelta(hours=4),  # Auto-expire in 4 hours
            conditions=("MFA verified", "Corporate device"),
            logging="enhanced",
        )

    return Allow(duration=timedelta(hours=8))

3. Assume Breach

Design systems as if an attacker is already inside the network. This mindset drives:

  • Micro-segmentation between every service
  • Encryption of all traffic, including internal
  • Monitoring for lateral movement patterns
  • Blast radius minimization through isolation
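
Monitoring for lateral movement can start very simply: compare observed service-to-service flows against a baseline of expected ones and flag the rest. The sketch below assumes such a baseline already exists (for example from a flow inventory); the function name, service names, and data shapes are hypothetical.

```python
from collections import defaultdict


def find_unexpected_flows(baseline, observed):
    """Return observed (source, destination) pairs that the baseline does not list.

    baseline: dict mapping a source service to the set of destinations it
              normally talks to.
    observed: iterable of (source, destination) tuples seen on the network.
    """
    unexpected = defaultdict(set)
    for source, destination in observed:
        if destination not in baseline.get(source, set()):
            unexpected[source].add(destination)
    return dict(unexpected)


baseline = {
    "orders": {"payments"},
    "payments": {"ledger-db"},
}
observed = [
    ("orders", "payments"),     # expected
    ("orders", "hr-db"),        # orders has no reason to reach HR data
    ("printer", "ledger-db"),   # unknown source probing a database
]
alerts = find_unexpected_flows(baseline, observed)
```

Here `alerts` contains the two flows that fall outside the baseline. A production detector would also consider volume, timing, and protocol, but the allow-list comparison is the core idea.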

4. Verify Explicitly

Make access decisions based on ALL available data points, not just username/password:

graph TD
    subgraph "Signals for Access Decision"
        ID[User Identity<br/>Who are you?]
        DEV[Device Health<br/>Is your device secure?]
        LOC[Location & Network<br/>Where are you?]
        TIME[Time & Behavior<br/>Is this normal?]
        RESOURCE[Resource Sensitivity<br/>What are you accessing?]
        RISK[Risk Score<br/>Cumulative risk assessment]
    end

    ID --> ENGINE[Policy Engine]
    DEV --> ENGINE
    LOC --> ENGINE
    TIME --> ENGINE
    RESOURCE --> ENGINE
    RISK --> ENGINE

    ENGINE --> DECISION{Decision}
    DECISION -->|Low risk| ALLOW[Allow<br/>Full access]
    DECISION -->|Medium risk| STEP_UP[Step-up auth<br/>Re-verify MFA]
    DECISION -->|High risk| RESTRICT[Restrict<br/>Read-only access]
    DECISION -->|Critical risk| BLOCK[Block<br/>Deny + alert SOC]

    style ALLOW fill:#2ecc71,stroke:#27ae60,color:#fff
    style STEP_UP fill:#f39c12,stroke:#e67e22,color:#fff
    style RESTRICT fill:#e67e22,stroke:#d35400,color:#fff
    style BLOCK fill:#ff6b6b,stroke:#c0392b,color:#fff
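
The last stage of the diagram, turning a cumulative risk score into one of four outcomes, might look like the following sketch. The 0-100 scale and the band boundaries are assumptions made for illustration; real cut-offs are a policy choice.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "step-up auth"
    RESTRICT = "read-only"
    BLOCK = "block + alert SOC"


# Hypothetical band boundaries on a 0-100 risk scale (exclusive upper bounds).
RISK_BANDS = [
    (25, Action.ALLOW),      # 0-24: low risk
    (50, Action.STEP_UP),    # 25-49: medium risk
    (75, Action.RESTRICT),   # 50-74: high risk
]


def action_for(risk_score: int) -> Action:
    """Map a cumulative risk score to one of the four outcomes in the diagram."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    for upper_bound, action in RISK_BANDS:
        if risk_score < upper_bound:
            return action
    return Action.BLOCK      # 75-100: critical risk
```

Keeping the bands in a data table rather than in nested `if` statements makes the thresholds easy to review and change without touching the logic.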

5. Continuous Verification

Authentication is not a one-time event at login. Continuously reevaluate access decisions throughout a session:

  • Device becomes non-compliant mid-session (AV disabled) → reduce access to read-only
  • User behavior anomaly detected (accessing resources they never use) → require re-authentication
  • Threat intelligence updated (user's IP now on a botnet list) → terminate session
  • Session duration exceeds policy → require re-authentication
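
The four triggers above can be modeled as events applied to a live session. The following is a rough sketch: the event names, the `Session` fields, and the 8-hour limit are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    user: str
    started_at: datetime
    access: str = "full"          # "full", "read-only", or "terminated"
    needs_reauth: bool = False


MAX_SESSION = timedelta(hours=8)  # assumed policy value


def reevaluate(session: Session, event: str, now: datetime) -> Session:
    """Apply one mid-session signal to an existing session."""
    if session.access == "terminated":
        return session  # terminated sessions never come back
    if event == "ip_on_threat_list":
        session.access = "terminated"
    elif event == "device_noncompliant":
        session.access = "read-only"
    elif event == "behavior_anomaly":
        session.needs_reauth = True
    # Session age is checked on every evaluation, whatever the event was.
    if session.access != "terminated" and now - session.started_at > MAX_SESSION:
        session.needs_reauth = True
    return session
```

The point of the design is that access only ever narrows in response to a signal; restoring full access requires a fresh authentication, not another event.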

The Five Pillars of Zero Trust

CISA (Cybersecurity and Infrastructure Security Agency) defines five pillars of Zero Trust maturity:

graph LR
    subgraph "Five Pillars"
        P1[Identity<br/>Who?]
        P2[Device<br/>What device?]
        P3[Network<br/>Where?]
        P4[Application<br/>What workload?]
        P5[Data<br/>What data?]
    end

    subgraph "Cross-Cutting"
        VIS[Visibility &<br/>Analytics]
        AUTO[Automation &<br/>Orchestration]
        GOV[Governance]
    end

    P1 --> VIS
    P2 --> VIS
    P3 --> VIS
    P4 --> VIS
    P5 --> VIS
    VIS --> AUTO
    AUTO --> GOV

Pillar 1: Identity

The cornerstone of Zero Trust. Identity replaces network location as the primary security boundary.

  • Multi-factor authentication (MFA): Passwords alone are insufficient. Require phishing-resistant MFA: FIDO2/WebAuthn hardware security keys are the gold standard. TOTP apps are an acceptable fallback, although their codes can still be relayed by a real-time phishing proxy. SMS-based MFA is vulnerable to SIM swapping and should be phased out.
  • Single sign-on (SSO): Centralize authentication through an IdP (Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace). When an employee is terminated, disabling one account cuts ALL access within minutes, not days.
  • Service identity: Every microservice has an identity (service account, workload identity, mTLS certificate). Services authenticate to each other, never assuming trust based on network location.

MFA is non-negotiable in Zero Trust. The 2022 Uber breach demonstrated MFA fatigue attacks — the attacker sent repeated push notifications until the employee, exhausted by the alerts, approved one at 1:00 AM.

**Defenses against MFA fatigue:**
- **Number matching:** The user must type a code displayed on screen, not just tap "approve"
- **Geographic context:** Show the location of the authentication request
- **Rate limiting:** Block push notifications after 3 unanswered prompts
- **FIDO2 hardware keys:** No prompt to approve — requires physical possession and touch
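
Two of these defenses, rate limiting and number matching, are easy to sketch together. The class below is a hypothetical illustration, not any vendor's API; the limit of three unanswered prompts mirrors the bullet above.

```python
import secrets


class PushChallengeLimiter:
    """Stops sending push prompts after too many go unanswered."""

    def __init__(self, max_unanswered: int = 3):
        self.max_unanswered = max_unanswered
        self._unanswered = {}  # user -> prompts sent without a correct response

    def send_prompt(self, user: str):
        """Return a two-digit code to display on the login screen, or None if blocked."""
        if self._unanswered.get(user, 0) >= self.max_unanswered:
            return None  # caller should fall back to another factor and raise an alert
        self._unanswered[user] = self._unanswered.get(user, 0) + 1
        return f"{secrets.randbelow(100):02d}"

    def verify(self, user: str, shown_code: str, typed_code: str) -> bool:
        """Number matching: approval counts only if the user typed the displayed code."""
        matched = secrets.compare_digest(shown_code, typed_code)
        if matched:
            self._unanswered[user] = 0  # a correct response resets the counter
        return matched
```

An attacker who triggers prompts cannot see the code shown on the victim's login screen, so a tired user tapping "approve" is no longer enough, and the fourth unanswered prompt is never sent.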

Pillar 2: Device

The security posture of the device matters as much as the identity of the user.

Device health signals checked on every access:

| Signal | What it means | Impact on access |
|---|---|---|
| Managed device? | Enrolled in MDM (Jamf, Intune) | Unmanaged → deny or read-only |
| OS patched? | Latest security updates installed | Unpatched → deny sensitive apps |
| Disk encrypted? | FileVault, BitLocker enabled | Unencrypted → deny all |
| Firewall enabled? | Host firewall active | Disabled → warning |
| EDR running? | Endpoint detection agent active | Missing → deny sensitive apps |
| Jailbroken/rooted? | OS integrity compromised | Jailbroken → deny all |
| Certificate present? | Device certificate from internal CA | Missing → deny |
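
The table translates naturally into a rule function that returns the most restrictive outcome triggered by any failed signal. This is a sketch under assumptions: the outcome names and their ordering are illustrative, and where the table offers a choice (deny or read-only for unmanaged devices) the code picks read-only.

```python
from dataclasses import dataclass

# Ordered from least to most restrictive so outcomes can be compared.
# The relative order of "read-only" and "deny-sensitive" is an assumption.
SEVERITY = ["allow", "warn", "read-only", "deny-sensitive", "deny"]


@dataclass(frozen=True)
class DevicePosture:
    managed: bool
    os_patched: bool
    disk_encrypted: bool
    firewall_enabled: bool
    edr_running: bool
    jailbroken: bool
    has_certificate: bool


def posture_outcome(p: DevicePosture) -> str:
    """Return the most restrictive outcome triggered by any failed signal."""
    triggered = ["allow"]
    if not p.managed:
        triggered.append("read-only")
    if not p.os_patched or not p.edr_running:
        triggered.append("deny-sensitive")
    if not p.firewall_enabled:
        triggered.append("warn")
    if not p.disk_encrypted or p.jailbroken or not p.has_certificate:
        triggered.append("deny")
    return max(triggered, key=SEVERITY.index)
```

Evaluating every signal and taking the maximum, rather than returning on the first failure, keeps the result independent of the order in which checks are written.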

Pillar 3: Network

In Zero Trust, the network is untrusted by default. Security does not depend on which network you are on.

  • Micro-segmentation: Every service-to-service communication is explicitly authorized (Chapter 23)
  • Encrypted transport: All traffic is encrypted, even within the data center — mTLS between services
  • Software-defined perimeter (SDP): Services are invisible to unauthorized users. Ports are closed by default; they only open after identity verification.

Pillar 4: Application and Workload

  • Per-request authorization: Every API call checks "can this identity access this resource with this action?"
  • Secure software supply chain: Verify container images (Sigstore/cosign), dependencies (SBOMs), and deployment artifacts
  • Runtime protection: Monitor application behavior for anomalies

Pillar 5: Data

Data is the ultimate target. Zero Trust protects data regardless of where it resides.

  • Data classification: Label data as public, internal, confidential, restricted
  • Encryption everywhere: At rest (AES-256), in transit (TLS 1.3), in use (confidential computing)
  • Data loss prevention (DLP): Monitor and prevent unauthorized exfiltration
  • Access logging: Every access to sensitive data is logged for audit

BeyondCorp: Google's Zero Trust Implementation

Has anyone actually implemented full Zero Trust at scale? Google did. They called it BeyondCorp, and they started in 2011 — partly in response to Operation Aurora, a sophisticated attack attributed to Chinese state actors that compromised Google's internal network in 2009. If Google's perimeter could be breached, whose couldn't?

graph TD
    subgraph "Any Network (Office, Home, Coffee Shop)"
        USER[Employee on any device]
    end

    USER --> PROXY

    subgraph "BeyondCorp Components"
        PROXY[Access Proxy<br/>Internet-facing<br/>reverse proxy]

        PROXY --> AUTH{Authenticate}
        AUTH -->|"SSO + MFA<br/>(hardware key)"| DEVICE_CHECK{Check Device}

        DEVICE_CHECK -->|"Query Device<br/>Inventory DB"| TRUST{Calculate<br/>Trust Score}

        TRUST -->|"Combine signals:<br/>• User identity & groups<br/>• Device certificate (TPM-bound)<br/>• OS patch level<br/>• Disk encryption status<br/>• EDR agent status<br/>• Location anomalies<br/>• Time of day"| POLICY{Access<br/>Policy Engine}

        POLICY -->|"Trust score HIGH<br/>+ matching policy"| ALLOW[Forward to<br/>internal app]
        POLICY -->|"Trust score LOW<br/>(e.g., unpatched device)"| REMEDIATE[Redirect to<br/>device compliance page]
        POLICY -->|"Trust score FAILED<br/>(unknown device, no cert)"| DENY[Access denied]
    end

    subgraph "Internal Applications"
        APP1[Gmail Admin]
        APP2[Code Search]
        APP3[Bug Tracker]
        APP4[HR Systems]
    end

    ALLOW --> APP1
    ALLOW --> APP2
    ALLOW --> APP3
    ALLOW --> APP4

    style DENY fill:#ff6b6b,stroke:#c0392b,color:#fff
    style ALLOW fill:#2ecc71,stroke:#27ae60,color:#fff
    style REMEDIATE fill:#f39c12,stroke:#e67e22,color:#fff

Key Properties of BeyondCorp

  1. No VPN. Google eliminated their VPN entirely. Every employee, whether in a Google office or on a beach in Thailand, accesses corporate applications through the same access proxy with the same identity and device checks.

  2. Network location is irrelevant. The corporate WiFi and Starbucks WiFi are treated identically. This eliminated enormous complexity around office network security.

  3. Device certificates are TPM-bound. The device certificate is tied to the hardware Trusted Platform Module — it cannot be exported or copied to another device. If the device is stolen, the certificate is useless without the user's credentials.

  4. Trust is dynamic. A fully patched, encrypted, managed device gets a high trust score. The same device, after missing an OS update, gets a lower score and may lose access to sensitive applications — automatically.

  5. Per-application access. Users do not get "access to the network." They get access to specific applications based on their role, team membership, and device posture. An engineer accessing the code repository goes through the same authentication as the same engineer accessing the HR system — but the policy may require additional verification for the HR system.

Google described BeyondCorp in a series of papers that began in 2014, including:

- **BeyondCorp: A New Approach to Enterprise Security** (2014): The core architecture
- **BeyondCorp: Design to Deployment at Google** (2016): Implementation details
- **BeyondCorp: The Access Proxy** (2016): The internet-facing component

Key insight: the migration took **years**, not months. Google did not flip a switch. They:
1. Built the device inventory and device certificate infrastructure
2. Migrated applications one by one behind the access proxy
3. Ran the VPN and BeyondCorp in parallel during transition
4. Gradually tightened policies as confidence grew
5. Eventually decommissioned the VPN

This phased approach is the model for every organization implementing Zero Trust — you cannot do it all at once, and you should not try.

A company running a traditional VPN allowed every remote employee to connect and land on the corporate network with full access to everything — file servers, databases, admin panels, CI/CD, source code repositories.

During a penetration test, a single employee's VPN credentials were compromised through a phishing exercise. Within two hours, the testers had:
- Accessed the HR database (employee SSNs and salaries)
- Cloned the entire source code repository
- Read secrets from the CI/CD pipeline configuration
- Connected to the production database (credentials were in a shared config file)

Total time from phished credential to full crown-jewel access: 2 hours 14 minutes.

The company's response: "We need a better VPN." The correct recommendation: "You need to stop trusting the VPN. The problem is not the quality of the wall — it is the assumption that everything inside the wall is safe."

Zero Trust vs. VPN

So does Zero Trust replace VPNs entirely? In most modern architectures, yes. Here is exactly why.

graph LR
    subgraph "VPN Model"
        V_USER[User] -->|"VPN tunnel"| V_GW[VPN Gateway]
        V_GW -->|"Full network access<br/>to entire subnet"| V_NET[Internal Network]
        V_NET --> V_APP1[App 1]
        V_NET --> V_APP2[App 2]
        V_NET --> V_DB[Database]
        V_NET --> V_AD[Active Directory]
        V_NET --> V_CI[CI/CD Pipeline]
    end

    subgraph "Zero Trust Model"
        ZT_USER[User] -->|"HTTPS"| ZT_PROXY[Access Proxy]
        ZT_PROXY -->|"Identity + Device<br/>verified per-app"| ZT_APP1[App 1 ✓]
        ZT_PROXY -->|"Not authorized"| ZT_APP2[App 2 ✗]
        ZT_PROXY -->|"Not authorized"| ZT_DB[Database ✗]
    end

    style V_NET fill:#ff6b6b,stroke:#c0392b,color:#fff
    style ZT_APP2 fill:#ff6b6b,stroke:#c0392b,color:#fff
    style ZT_DB fill:#ff6b6b,stroke:#c0392b,color:#fff
    style ZT_APP1 fill:#2ecc71,stroke:#27ae60,color:#fff

| Aspect | VPN | Zero Trust (ZTNA) |
|---|---|---|
| Access scope | Entire network subnet | Specific applications only |
| Authentication | Once at connection | Continuous, per-request |
| Device posture | Rarely checked after connect | Checked on every request |
| Compromised credentials | Full network access | Access to authorized apps only (with device check) |
| Split tunneling | Creates security gaps | No tunnel; direct to authorized apps |
| Performance | All traffic through VPN concentrator (bottleneck) | Direct connections via edge proxy (scales horizontally) |
| Visibility | VPN logs show connection/disconnection only | Full request-level audit trail |

ZTNA products that replace VPNs:

| Product | Approach | Key Feature |
|---|---|---|
| Cloudflare Access | Edge-based reverse proxy + tunnels | Global edge network, Cloudflare tunnel |
| Zscaler Private Access | Cloud-delivered ZTNA | Inside-out connectivity (no inbound ports) |
| Palo Alto Prisma Access | SASE platform | Integrates with on-prem firewalls |
| Google BeyondCorp Enterprise | Google's commercial version | Chrome Enterprise integration |
| Tailscale | WireGuard-based mesh | Peer-to-peer, ACL-based, minimal infrastructure |
| Twingate | Software-defined perimeter | Resource-level access control |

# Example: Cloudflare Access setup for an internal application

# 1. Install cloudflared tunnel on the internal network
cloudflared tunnel create my-app-tunnel
cloudflared tunnel route dns my-app-tunnel internal-app.example.com

# 2. Configure the tunnel to point to the internal service
cat > ~/.cloudflared/config.yml << 'EOF'
tunnel: <tunnel-id>
credentials-file: /root/.cloudflared/<tunnel-id>.json
ingress:
  - hostname: internal-app.example.com
    service: http://10.0.1.50:8080
  - service: http_status:404
EOF

# 3. Start the tunnel
cloudflared tunnel run my-app-tunnel

# 4. Configure Access Policy (via Cloudflare dashboard or API):
#    - Require: email domain = @example.com (IdP integration)
#    - Require: device posture = managed device with EDR
#    - Allow: groups = engineering, ops
#    - Session duration: 12 hours
#    - Re-auth for sensitive operations: every 1 hour

# 5. Users navigate to https://internal-app.example.com
#    Cloudflare Access authenticates them, checks device posture,
#    evaluates policy, and proxies the request through the tunnel.
#    The internal service is NEVER directly exposed to the internet.
#    No VPN. No inbound firewall rules. No exposed ports.

Service Mesh for Zero Trust: Istio and Envoy

Zero Trust for humans is one challenge. Zero Trust for service-to-service communication — the east-west traffic discussed in Chapter 23 — is another. That is where service meshes come in.

A service mesh provides a dedicated infrastructure layer for handling service-to-service communication, implementing Zero Trust principles automatically:

graph TD
    subgraph "Control Plane (istiod)"
        CITADEL[Citadel<br/>Certificate Authority<br/>Issues short-lived mTLS certs<br/>Auto-rotates every 24h]
        PILOT[Pilot<br/>Configuration<br/>Distributes routing rules]
        POLICY[Policy Engine<br/>Authorization rules<br/>Per-service, per-method]
    end

    CITADEL -->|"Certs"| E1 & E2 & E3
    PILOT -->|"Config"| E1 & E2 & E3
    POLICY -->|"AuthZ rules"| E1 & E2 & E3

    subgraph "Pod A: Orders Service"
        APP_A[Application<br/>Container] <-->|"localhost"| E1[Envoy<br/>Sidecar Proxy]
    end

    subgraph "Pod B: Payments Service"
        APP_B[Application<br/>Container] <-->|"localhost"| E2[Envoy<br/>Sidecar Proxy]
    end

    subgraph "Pod C: Inventory Service"
        APP_C[Application<br/>Container] <-->|"localhost"| E3[Envoy<br/>Sidecar Proxy]
    end

    E1 <-->|"mTLS<br/>Encrypted + Authenticated<br/>Identity: orders-service"| E2
    E1 <-.->|"DENIED by policy<br/>Orders cannot call Inventory directly"| E3
    E2 <-->|"mTLS"| E3

    style E1 fill:#3498db,stroke:#2980b9,color:#fff
    style E2 fill:#3498db,stroke:#2980b9,color:#fff
    style E3 fill:#3498db,stroke:#2980b9,color:#fff

How Istio Implements Zero Trust

Every inter-service call is automatically:

  1. Authenticated: Envoy presents its mTLS certificate (issued by istiod's built-in CA, the component formerly called Citadel) to the destination. Both sides verify each other's identity.
  2. Authorized: The destination's Envoy sidecar evaluates the authorization policies distributed by istiod: "can the orders-service call the payments-service's /api/v1/charge endpoint with POST?"
  3. Encrypted: All traffic between sidecars uses mTLS — encrypted in transit, always.
  4. Logged: Full request metadata (source, destination, method, path, response code, latency) is logged for audit.

The application code has no knowledge of mTLS. It makes a plain HTTP call to http://payments-service:8080/api/v1/charge. The Envoy sidecar intercepts the call, establishes mTLS with the destination's Envoy sidecar, and forwards the request. Zero Trust at the infrastructure level, invisible to developers.
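
For intuition, what the sidecars do on the application's behalf can be approximated with Python's standard `ssl` module. This is only a sketch of mutual TLS in general, not of Envoy's implementation: the helper names and file-path parameters are hypothetical, and Istio identifies workloads by SPIFFE URIs in the certificate rather than by hostname.

```python
import ssl


def make_server_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Build a TLS server context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED            # the "mutual" part of mTLS
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs workload certificates
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_client_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server and presents its own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The only difference from ordinary server-side TLS is `verify_mode = ssl.CERT_REQUIRED` on the server: the handshake fails unless the client also proves its identity. A mesh adds what this sketch leaves out, namely issuing, distributing, and rotating those certificates automatically.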

Istio Authorization Policies

# Only allow the "orders" service to call the "payments" service
# on POST /api/v1/charge — deny everything else
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
          - "cluster.local/ns/production/sa/orders-service"
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/v1/charge"]
  # Implicit deny — if no rule matches, the request is DENIED
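
# Require STRICT mTLS for all workloads in the production namespace
# (for mesh-wide enforcement, apply the policy in the root namespace, istio-system by default)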
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # Reject ANY non-mTLS connection
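
The "implicit deny" behavior of the ALLOW policy shown earlier in this section can be made concrete with a small evaluator. This is a simplified model, not Istio's code: it supports only exact matches on principal, method, and path, whereas Istio also accepts prefix and suffix wildcards, and the `Rule` and `Request` types are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    principals: frozenset
    methods: frozenset
    paths: frozenset


@dataclass(frozen=True)
class Request:
    principal: str   # identity taken from the caller's mTLS certificate
    method: str
    path: str


def is_allowed(rules, request: Request) -> bool:
    """ALLOW semantics: permit only if at least one rule matches, otherwise deny."""
    return any(
        request.principal in rule.principals
        and request.method in rule.methods
        and request.path in rule.paths
        for rule in rules
    )


# Mirrors the payments-policy example: orders-service may POST /api/v1/charge.
payments_rules = [
    Rule(
        principals=frozenset({"cluster.local/ns/production/sa/orders-service"}),
        methods=frozenset({"POST"}),
        paths=frozenset({"/api/v1/charge"}),
    )
]
```

Any request that differs in identity, method, or path falls through `any()` and is denied, which is why no explicit deny rule is needed.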

Explore Istio security features on a test cluster:

~~~bash
# Install Istio
istioctl install --set profile=demo -y

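# Create the namespace and enable sidecar injection for it
kubectl create namespace production
kubectl label namespace production istio-injection=enabled

# Deploy sample application
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml -n production

# Verify mTLS is active (replace productpage-v1-xxx with the real pod name)
istioctl proxy-config listeners productpage-v1-xxx.production
# "istioctl authn tls-check" was removed in newer Istio releases;
# "istioctl x describe pod" reports the mTLS mode that applies to a pod
istioctl x describe pod productpage-v1-xxx -n production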

# Apply strict mTLS
kubectl apply -n production -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: STRICT
EOF

# Test from a non-mesh client (should FAIL — no sidecar, no mTLS cert)
kubectl run test -n default --image=curlimages/curl --rm -it -- \
  curl http://productpage.production:9080
# Expected: connection refused or reset — the service requires mTLS

# View security dashboard
# (Kiali is an add-on; install it first, for example from samples/addons/kiali.yaml)
istioctl dashboard kiali
# Kiali shows mTLS status for all service-to-service connections
~~~

NIST SP 800-207: Zero Trust Architecture Standard

The National Institute of Standards and Technology published SP 800-207 in August 2020 as the definitive reference for Zero Trust architecture. It became a federal mandate through Executive Order 14028 in May 2021.

Policy Decision Point and Policy Enforcement Point

The core of NIST's Zero Trust model is the separation of the decision ("should this access be allowed?") from the enforcement ("actually allow or block the access"):

graph TD
    subgraph "Data Sources"
        IDP[Identity Provider<br/>User/service identity]
        DEVICE[Device Posture Service<br/>OS patches, encryption, EDR]
        THREAT[Threat Intelligence<br/>Known malicious IPs, IOCs]
        SIEM_D[SIEM / Logs<br/>Historical access patterns]
        DATA_CLASS[Data Classification<br/>Sensitivity labels]
        COMPLIANCE[Compliance Engine<br/>Regulatory requirements]
    end

    subgraph "Policy Decision Point (PDP)"
        PE[Policy Engine<br/>Evaluates all signals]
        PA[Policy Administrator<br/>Grants/revokes access<br/>Configures PEP]
    end

    subgraph "Policy Enforcement Point (PEP)"
        PROXY[Access Proxy / Gateway<br/>API Gateway / Service Mesh Sidecar<br/>Actually allows or blocks the connection]
    end

    IDP --> PE
    DEVICE --> PE
    THREAT --> PE
    SIEM_D --> PE
    DATA_CLASS --> PE
    COMPLIANCE --> PE

    PE -->|"Decision:<br/>ALLOW / DENY / STEP-UP"| PA
    PA -->|"Configure"| PROXY

    USER[User / Service] -->|"1. Access request"| PROXY
    PROXY -->|"2. Ask for decision"| PE
    PE -->|"3. Decision + conditions"| PROXY
    PROXY -->|"4a. ALLOW"| RESOURCE[Protected Resource]
    PROXY -->|"4b. DENY"| BLOCKED[Access Denied]

    style PE fill:#3498db,stroke:#2980b9,color:#fff
    style PROXY fill:#e74c3c,stroke:#c0392b,color:#fff

NIST Trust Algorithm

NIST describes a trust algorithm that combines multiple signals into an access decision:

Trust Score = f(
    identity_confidence,      # How sure are we this is who they claim?
    device_health_score,      # Is the device secure and managed?
    request_risk_score,       # How sensitive is the requested resource?
    behavioral_anomaly_score, # Is this access pattern normal?
    threat_intelligence,      # Any known threats related to this context?
    environmental_factors     # Time, location, network
)

if Trust Score >= Resource Trust Threshold:
    ALLOW (with logging and conditions)
else if Trust Score >= Step-Up Threshold:
    REQUIRE additional verification (re-auth, MFA, manager approval)
else:
    DENY (log and alert)
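
A runnable version of the pseudocode above might look like this. It is a sketch under stated assumptions: the weights, the 0.0-1.0 signal scale, and the signal names are illustrative (NIST SP 800-207 does not prescribe values), and the sensitivity of the requested resource is expressed through its thresholds rather than as an input to the score.

```python
# Weights are illustrative assumptions and sum to 1.0.
WEIGHTS = {
    "identity_confidence": 0.30,
    "device_health": 0.25,
    "behavior_normality": 0.20,
    "threat_clearance": 0.15,
    "environment": 0.10,
}


def trust_score(signals: dict) -> float:
    """Weighted average of signals, each normalized to the range 0.0-1.0."""
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def decide(signals: dict, resource_threshold: float, step_up_threshold: float) -> str:
    """Compare the score with the resource's two thresholds."""
    score = trust_score(signals)
    if score >= resource_threshold:
        return "ALLOW"
    if score >= step_up_threshold:
        return "STEP_UP"
    return "DENY"
```

With thresholds of 0.8 and 0.5, a request with perfect signals is allowed, while the same request from a device scoring 0.0 on health drops to 0.75 and triggers step-up verification instead.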

**Executive Order 14028** (May 2021) mandated that US federal agencies adopt Zero Trust architecture. The subsequent OMB Memorandum M-22-09 set specific requirements:

- Agency staff use enterprise-managed identities with **phishing-resistant MFA** (such as FIDO2/WebAuthn or PIV)
- Every device accessing resources is **inventoried and tracked**
- **Encrypted DNS** (DoH/DoT) and **HTTPS-only** traffic
- Application-level access controls **independent of network location**
- Data categorization with **automated monitoring**

This drove massive adoption across government and the defense industrial base. Federal contractors and suppliers began implementing Zero Trust to meet supply chain security requirements — cascading the mandate through the private sector.

CISA's Zero Trust Maturity Model defines four stages: Traditional → Initial → Advanced → Optimal, providing a concrete roadmap for organizations at any starting point.

Practical Implementation Roadmap

How do you actually implement Zero Trust? You cannot just flip a switch. Here is a practical roadmap from "we have a VPN and a perimeter firewall" to "we have Zero Trust." Most organizations take 18-36 months.

gantt
    title Zero Trust Implementation Roadmap
    dateFormat  YYYY-MM
    axisFormat  %b %Y

    section Phase 1: Foundation
    Deploy SSO + MFA for all users     :done, p1a, 2025-01, 2025-03
    Inventory all devices              :done, p1b, 2025-01, 2025-03
    Inventory all applications         :done, p1c, 2025-02, 2025-03
    Map service communication flows    :done, p1d, 2025-02, 2025-04

    section Phase 2: Quick Wins
    Move 2-3 apps behind ZTNA proxy    :active, p2a, 2025-04, 2025-06
    Device posture checks for those apps :active, p2b, 2025-04, 2025-06
    Begin micro-segmentation (critical) :active, p2c, 2025-05, 2025-07
    Data classification starts          :p2d, 2025-05, 2025-07

    section Phase 3: Expand
    Migrate remaining apps to ZTNA      :p3a, 2025-07, 2025-12
    Deploy service mesh (mTLS)          :p3b, 2025-08, 2025-12
    MDM for all corporate devices       :p3c, 2025-07, 2025-09
    Centralized access logging          :p3d, 2025-08, 2025-10

    section Phase 4: Mature
    Continuous verification             :p4a, 2026-01, 2026-06
    Dynamic risk-adaptive policies      :p4b, 2026-01, 2026-06
    Decommission VPN                    :crit, p4c, 2026-03, 2026-06
    Policy as code + automation         :p4d, 2026-04, 2026-06

Phase 1: Foundation (Months 1-3)

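# Discover active services and communication patterns
# AWS: Enable VPC Flow Logs (the flag is the plural --resource-ids)
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-abc123 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::flow-logs-bucket

# Kubernetes: Map service communication with Cilium Hubble
# jq -c prints one object per line so that sort -u can deduplicate.
# Field paths assume the {"flow": {...}} JSON layout of recent Hubble releases,
# where labels are a list of strings such as "k8s:app=orders".
hubble observe --namespace production --output json \
  | jq -c '{src: (.flow.source.labels // [] | map(select(startswith("k8s:app="))) | first),
            dst: (.flow.destination.labels // [] | map(select(startswith("k8s:app="))) | first),
            port: .flow.l4.TCP.destination_port}' \
  | sort -u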

# Inventory all service accounts and their permissions
aws iam list-users --query 'Users[*].[UserName,CreateDate]' --output table
aws iam list-roles --query 'Roles[*].[RoleName,CreateDate]' --output table
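
Once flow logs are arriving, a short script can turn them into the communication map this phase calls for. The sketch below assumes the default VPC Flow Logs record layout (14 space-separated fields) and uses made-up sample records; the function name is hypothetical.

```python
from collections import Counter


def summarize_flows(lines):
    """Count accepted flows per (source address, destination address, destination port).

    Assumes the default VPC Flow Logs record layout:
    version account-id interface-id srcaddr dstaddr srcport dstport
    protocol packets bytes start end action log-status
    """
    flows = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 14 or fields[0] == "version":
            continue  # skip header rows and malformed records
        srcaddr, dstaddr, dstport, action = fields[3], fields[4], fields[6], fields[12]
        if action != "ACCEPT":
            continue  # rejected traffic and no-data records are ignored here
        flows[(srcaddr, dstaddr, int(dstport))] += 1
    return flows


sample = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.10 10.0.2.20 49152 5432 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.10 10.0.2.20 49153 5432 6 8 672 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.3.30 10.0.2.20 50000 22 6 1 40 1700000060 1700000120 REJECT OK",
]
flow_counts = summarize_flows(sample)
```

Flow records cover both directions of a connection, so return traffic appears with an ephemeral destination port; filtering to known service ports is a sensible next step before treating the result as an allow-list.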

Phase 2: Quick Wins (Months 3-6)

Move 2-3 internal applications behind a Zero Trust proxy. Choose applications that are:

  • Widely used (validates the approach with real users)
  • Not mission-critical (lower risk during migration)
  • Currently accessed via VPN (demonstrates VPN replacement)

Phase 3: Expand (Months 6-12)

  • Migrate remaining applications behind the access proxy
  • Deploy service mesh for inter-service mTLS
  • Deploy MDM and enforce device compliance
  • Centralize all access logs into SIEM

Phase 4: Mature (Months 12-24)

  • Implement continuous verification (session re-evaluation)
  • Deploy risk-adaptive policies (dynamic trust scoring)
  • Decommission VPN infrastructure
  • Automate policy management (policy as code in git)

Common Zero Trust Mistakes

Before diving into maturity models, it is worth examining the pitfalls that organizations commonly fall into.

1. Treating Zero Trust as a product purchase. "We bought a Zero Trust solution" is not Zero Trust. No single vendor delivers end-to-end Zero Trust. It is an architecture implemented through identity, device management, network segmentation, application-level authorization, data protection, and monitoring — spanning multiple tools and teams.

2. Implementing Zero Trust only for remote workers. If office employees bypass the Zero Trust proxy because they are "on the corporate network," you do not have Zero Trust. The office network must be treated identically to any external network. This is the core insight of BeyondCorp.

3. Ignoring service-to-service communication. Zero Trust for human users but implicit trust between microservices leaves a massive gap. If Service A can call any other service without authentication, a compromised Service A has the same lateral movement capability as an attacker on a flat network. Service mesh and mTLS are essential.

4. Making the user experience terrible. If Zero Trust means employees are constantly reauthenticating and losing access, they will find workarounds (sharing credentials, disabling security tools). Good Zero Trust is transparent — strong security with minimal friction through SSO, device certificates, and risk-based adaptive policies that only challenge users when something is unusual.

5. Not investing in monitoring. Zero Trust generates an enormous amount of access data. Without proper logging, analysis, and alerting, you cannot detect compromised credentials, policy violations, or anomalous behavior. The data is there — but someone must look at it.

6. Forgetting about legacy systems. The mainframe that has been running since 1997 and only supports Telnet is not going to get an Envoy sidecar. Plan for legacy systems: wrap them in an access proxy, segment them aggressively, monitor them closely, and create a modernization plan.

A financial services company implemented Zero Trust for all their web applications. Beautiful architecture: identity-aware proxy, device posture checks, mTLS between services, centralized logging. Then someone asked about their mainframe.

"Oh, that's on the internal network." The mainframe processed wire transfers. It was accessible via TN3270 to anyone on the corporate VLAN. No authentication beyond the VLAN boundary. Their $3 million Zero Trust project had left a gap that cost an attacker nothing to exploit and could move millions in unauthorized transfers.

The fix: wrap the mainframe access in a Zero Trust proxy (Teleport for legacy protocols), add MFA for every session, and implement session recording. The mainframe itself did not change — the access path changed.

Zero Trust is only as strong as the weakest component in the system. Legacy systems need special attention, not exemption.
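
One way to catch this kind of exemption early is to audit the system inventory against the same controls the mainframe fix applied: access proxy, MFA, and session recording. The sketch below assumes a hypothetical inventory format; the `System` fields and the system names are invented for illustration, and a real audit would pull them from a CMDB or asset database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """One inventory entry (hypothetical schema)."""
    name: str
    behind_access_proxy: bool
    mfa_enforced: bool
    sessions_recorded: bool


# Controls every system must have, legacy or not.
REQUIRED_CONTROLS = ("behind_access_proxy", "mfa_enforced", "sessions_recorded")


def find_gaps(inventory):
    """Return {system name: [missing controls]} for systems missing any control."""
    gaps = {}
    for system in inventory:
        missing = [c for c in REQUIRED_CONTROLS if not getattr(system, c)]
        if missing:
            gaps[system.name] = missing
    return gaps


inventory = [
    System("web-portal", True, True, True),
    System("wire-transfer-mainframe", False, False, False),
]
gaps = find_gaps(inventory)
```

An empty result does not prove coverage; it only shows that every system in the inventory claims the controls. The audit is only as good as the inventory behind it.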

Zero Trust Maturity Model

graph TD
    subgraph "Level 0: Traditional"
        L0["VPN-based remote access<br/>Flat internal network<br/>Passwords only<br/>Implicit trust for internal traffic<br/>No device management"]
    end

    subgraph "Level 1: Initial"
        L1["MFA deployed for all users<br/>SSO for most applications<br/>Basic network segmentation (VLANs)<br/>Device inventory exists<br/>Some logging"]
    end

    subgraph "Level 2: Advanced"
        L2["Identity-based access replacing VPN<br/>Micro-segmentation for critical workloads<br/>Device posture checked for access<br/>mTLS for service-to-service<br/>Centralized access logging + monitoring"]
    end

    subgraph "Level 3: Optimal"
        L3["All access is identity-based (no VPN)<br/>Continuous verification throughout sessions<br/>Dynamic risk-adaptive policies<br/>Full micro-segmentation<br/>All traffic encrypted<br/>Automated anomaly response<br/>Policy as code, fully automated lifecycle"]
    end

    L0 -->|"3-6 months"| L1
    L1 -->|"6-12 months"| L2
    L2 -->|"12-24 months"| L3

    style L0 fill:#ff6b6b,stroke:#c0392b,color:#fff
    style L1 fill:#f39c12,stroke:#e67e22,color:#fff
    style L2 fill:#3498db,stroke:#2980b9,color:#fff
    style L3 fill:#2ecc71,stroke:#27ae60,color:#fff
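
The maturity levels in the diagram can be turned into a rough self-assessment. The sketch below encodes each level's capabilities as a set of labels (the label names are shorthand invented here, not a standard vocabulary) and reports the highest level whose requirements, and those of every level below it, are fully met. Treat it as a conversation starter, not as a scoring standard.

```python
# Capabilities per level, transcribed from the diagram above (labels are assumed shorthand).
LEVEL_REQUIREMENTS = {
    1: {"mfa_all_users", "sso_most_apps", "vlan_segmentation",
        "device_inventory", "basic_logging"},
    2: {"identity_based_access", "microsegmentation_critical",
        "device_posture_checks", "service_mtls", "centralized_logging"},
    3: {"no_vpn", "continuous_verification", "risk_adaptive_policies",
        "full_microsegmentation", "all_traffic_encrypted",
        "automated_anomaly_response", "policy_as_code"},
}


def maturity_level(capabilities):
    """Return the highest level whose requirements are all met, cumulatively."""
    level = 0
    for candidate in sorted(LEVEL_REQUIREMENTS):
        if LEVEL_REQUIREMENTS[candidate] <= capabilities:
            level = candidate
        else:
            break  # a gap at this level blocks credit for higher levels
    return level


def missing_for_next(capabilities):
    """List what is still needed to reach the next level (empty at Level 3)."""
    next_level = maturity_level(capabilities) + 1
    return sorted(LEVEL_REQUIREMENTS.get(next_level, set()) - capabilities)


# Example: Level 1 complete, with mTLS already deployed ahead of schedule.
current = LEVEL_REQUIREMENTS[1] | {"service_mtls"}
level = maturity_level(current)
todo = missing_for_next(current)
```

The cumulative rule is a deliberate design choice: an organization with mTLS everywhere but no MFA is still at Level 0, which mirrors the chapter's point that identity comes first.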

What You've Learned

This chapter covered Zero Trust Architecture, the modern approach to network security that abandons perimeter-based trust:

  • Perimeter security has failed because the perimeter is porous (phishing, supply chain), dissolving (remote work, cloud), and insufficient (lateral movement after breach). SolarWinds, Colonial Pipeline, and Uber all demonstrated this.

  • Zero Trust principles: never trust, always verify; least privilege; assume breach; verify explicitly with all available signals; continuous verification throughout sessions. These apply to every user, every device, every service, every request.

  • The five pillars — identity, device, network, application, and data — must all be addressed for complete Zero Trust coverage. Identity is the cornerstone: it replaces network location as the primary trust signal.

  • BeyondCorp (Google's implementation) proved Zero Trust works at massive scale. Google eliminated its VPN, treated the corporate network as hostile, and made all access decisions through an identity-aware access proxy with device posture checks.

  • Zero Trust replaces VPN with ZTNA: application-level access through identity-aware proxies instead of network-level access through VPN tunnels. With ZTNA, compromised credentials expose only specific apps (and must still pass device checks), not the entire network.

  • Service mesh (Istio/Envoy) implements Zero Trust for east-west service-to-service communication through automatic mTLS, identity-based authorization policies, and comprehensive observability — invisible to application code.

  • NIST SP 800-207 provides the standard reference architecture. The Policy Decision Point (PDP) evaluates access requests using multiple signals. The Policy Enforcement Point (PEP) enforces the decision. The U.S. federal mandate under Executive Order 14028 is driving adoption.

  • Implementation is a journey: start with identity and MFA (Phase 1), migrate applications behind a ZTNA proxy (Phase 2), deploy service mesh and MDM (Phase 3), and mature toward continuous verification and VPN decommissioning (Phase 4). Budget 18-36 months.

Zero Trust is applying the principle of least privilege to everything — every user, every device, every network, every service, every piece of data — and verifying it continuously. Not "verify once at login." Not "verify when they're off the VPN." Continuously. Because the attacker who compromises a session five minutes after authentication does not care that you verified the user at login time.

It is a lot of work. But compare it to the alternative: responding to a breach where an attacker had free rein inside your network for months because everything was "trusted." The SolarWinds attackers were inside government networks for nine months. Colonial Pipeline was shut down for six days. The cost of those breaches, in money, reputation, and national security, dwarfs the cost of implementing Zero Trust. Zero Trust invests effort upfront so that when a breach happens, the blast radius is a single compromised session, not the entire organization.

When a breach happens. Not if. That is how security engineers think.