Chapter 24: Zero Trust Architecture — Never Trust, Always Verify
"The perimeter is dead. Long live the identity." — John Kindervag, creator of the Zero Trust model
Consider two network diagrams. The first shows a traditional network — a thick outer wall with everything inside it marked "Trusted." The second shows a network where every connection has a small padlock icon, every service has a question mark, and nothing is marked trusted.
The first one looks simpler. Everything inside the wall is safe — until it is not. Remember the Target breach? The SolarWinds supply chain attack? The Colonial Pipeline ransomware? In every case, the attacker got inside the perimeter, and then the "trusted" network became their playground. Inside the wall, there were no checkpoints, no verification, no encryption. The attacker moved freely.
In the second diagram, nothing is trusted — even internal traffic. Every request is authenticated, authorized, and encrypted regardless of where it comes from. A request from the office network is treated with the same suspicion as a request from a coffee shop in another country. That is Zero Trust.
Implementing it is a journey, not a light switch. But the alternative — trusting everything inside your perimeter — has been proven catastrophically wrong, again and again.
The Failure of Perimeter Security
Traditional network security follows the castle-and-moat model: build a strong perimeter (firewall, VPN, DMZ), and trust everything inside it.
graph TD
subgraph "Castle-and-Moat Model"
WALL["PERIMETER<br/>Firewall + VPN"]
subgraph "Trusted Internal Zone"
EMP[Employees] <--> SRV[Servers]
SRV <--> DBS[Databases]
EMP <--> PRINTER[Printers]
PRINTER <--> IOT[IoT Devices]
IOT <--> SRV
end
end
EXT[External User] -->|"VPN login → full access"| WALL
WALL --> EMP
ATTACKER[Attacker<br/>with stolen VPN creds] -->|"Same VPN login → same full access"| WALL
style ATTACKER fill:#ff6b6b,stroke:#c0392b,color:#fff
Why the perimeter model fails:
| Assumption | Reality |
|---|---|
| "Inside = safe, outside = dangerous" | Phishing puts attackers inside. Supply chain attacks start inside. Insiders exist. |
| "If you passed the VPN, you're trusted" | VPN credentials are stolen via phishing, credential stuffing, infostealer malware |
| "Internal traffic doesn't need encryption" | Lateral movement, ARP spoofing, rogue access points all exploit unencrypted internal traffic |
| "The perimeter won't be breached" | Every perimeter is breached eventually — the question is when, not if |
| "We know where the perimeter is" | Remote work, cloud services, mobile devices, partner integrations — where does the perimeter end? |
Breaches that exploited perimeter trust:
- SolarWinds (2020): A supply chain attack placed a backdoor in a software update that reached up to roughly 18,000 organizations, including US government agencies. Attackers operated inside "trusted" networks for about nine months before detection.
- Colonial Pipeline (2021): Compromised VPN credentials (no MFA) gave attackers access to the internal network. Ransomware shut down the largest fuel pipeline in the US.
- Uber (2022): Attacker used MFA fatigue (repeated push notifications) to compromise an employee's VPN access. From there, accessed internal tools, source code, and financial data.
Zero Trust Principles — Deep Dive
Zero Trust is not a product or technology — it is a security philosophy with specific, actionable principles:
1. Never Trust, Always Verify
Every access request is fully authenticated, authorized, and encrypted before being granted, regardless of network location.
sequenceDiagram
participant User as User / Service
participant PEP as Policy Enforcement Point<br/>(Access Proxy / Gateway)
participant PDP as Policy Decision Point<br/>(Policy Engine)
participant IdP as Identity Provider
participant DPS as Device Posture Service
participant TI as Threat Intelligence
participant Resource as Protected Resource
User->>PEP: 1. Access request
PEP->>IdP: 2. Verify identity (SSO + MFA)
IdP-->>PEP: 3. Identity confirmed + attributes
PEP->>DPS: 4. Check device posture
DPS-->>PEP: 5. Device status: managed, patched, encrypted
PEP->>TI: 6. Check threat context
TI-->>PEP: 7. No known threats for this IP/user
PEP->>PDP: 8. Evaluate policy with ALL signals:<br/>identity + device + resource + context
PDP-->>PEP: 9. Decision: ALLOW with conditions<br/>(read-only, 4-hour session, log all)
PEP->>Resource: 10. Forward request with auth context
Resource-->>PEP: 11. Response
PEP-->>User: 12. Response (filtered if necessary)
Note over PEP,PDP: Steps 2-9 happen on EVERY request,<br/>not just at session start
2. Least Privilege Access
Users and services receive only the minimum permissions needed for their specific task, for the minimum necessary duration.
# Traditional: broad role-based access
# "Developer" role = access to ALL repositories, ALL databases, ALL internal tools
# Even though this developer only works on the payments service

# Zero Trust: fine-grained, context-aware access
# Allow and Deny are assumed to be simple result types defined elsewhere.
from datetime import timedelta

def evaluate_access(user, resource, action, context):
    """Grant minimum necessary access based on all available signals."""
    # Check: Does this user need access to this specific resource?
    if resource.team != user.team and not user.is_oncall_for(resource):
        return Deny("User is not on the team that owns this resource")

    # Check: Is the action appropriate for their role?
    if action == "write" and resource.environment == "production":
        if not user.has_production_access:
            return Deny("Write access to production requires approval")

    # Check: Time-bounded access for sensitive operations
    if resource.classification == "restricted":
        return Allow(
            duration=timedelta(hours=4),  # Auto-expire in 4 hours
            conditions=["MFA verified", "Corporate device"],
            logging="enhanced",
        )

    # Default: allow for a standard working day
    return Allow(duration=timedelta(hours=8))
3. Assume Breach
Design systems as if an attacker is already inside the network. This mindset drives:
- Micro-segmentation between every service
- Encryption of all traffic, including internal
- Monitoring for lateral movement patterns
- Blast radius minimization through isolation
4. Verify Explicitly
Make access decisions based on ALL available data points, not just username/password:
graph TD
subgraph "Signals for Access Decision"
ID[User Identity<br/>Who are you?]
DEV[Device Health<br/>Is your device secure?]
LOC[Location & Network<br/>Where are you?]
TIME[Time & Behavior<br/>Is this normal?]
RESOURCE[Resource Sensitivity<br/>What are you accessing?]
RISK[Risk Score<br/>Cumulative risk assessment]
end
ID --> ENGINE[Policy Engine]
DEV --> ENGINE
LOC --> ENGINE
TIME --> ENGINE
RESOURCE --> ENGINE
RISK --> ENGINE
ENGINE --> DECISION{Decision}
DECISION -->|Low risk| ALLOW[Allow<br/>Full access]
DECISION -->|Medium risk| STEP_UP[Step-up auth<br/>Re-verify MFA]
DECISION -->|High risk| RESTRICT[Restrict<br/>Read-only access]
DECISION -->|Critical risk| BLOCK[Block<br/>Deny + alert SOC]
style ALLOW fill:#2ecc71,stroke:#27ae60,color:#fff
style STEP_UP fill:#f39c12,stroke:#e67e22,color:#fff
style RESTRICT fill:#e67e22,stroke:#d35400,color:#fff
style BLOCK fill:#ff6b6b,stroke:#c0392b,color:#fff
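The four branches in the diagram above can be sketched as a simple threshold lookup. This is a minimal illustration, assuming a cumulative risk score on a 0-100 scale. The scale and the cut-offs (30, 60, 85) are made up for the example and do not come from any standard.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow: full access"
    STEP_UP = "step-up auth: re-verify MFA"
    RESTRICT = "restrict: read-only access"
    BLOCK = "block: deny and alert SOC"


# Hypothetical thresholds on a 0-100 risk scale (illustrative only).
MEDIUM_RISK = 30
HIGH_RISK = 60
CRITICAL_RISK = 85


def decide(risk_score: float) -> Decision:
    """Map a cumulative risk score to one of the four outcomes."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    # Check the most severe band first so the strictest outcome wins.
    if risk_score >= CRITICAL_RISK:
        return Decision.BLOCK
    if risk_score >= HIGH_RISK:
        return Decision.RESTRICT
    if risk_score >= MEDIUM_RISK:
        return Decision.STEP_UP
    return Decision.ALLOW
```

In a real policy engine the thresholds would likely vary with the sensitivity of the resource rather than being global constants.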
5. Continuous Verification
Authentication is not a one-time event at login. Continuously reevaluate access decisions throughout a session:
- Device becomes non-compliant mid-session (AV disabled) → reduce access to read-only
- User behavior anomaly detected (accessing resources they never use) → require re-authentication
- Threat intelligence updated (user's IP now on a botnet list) → terminate session
- Session duration exceeds policy → require re-authentication
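The four triggers above could be re-checked on every request with something like the following sketch. The `Session` fields and the returned action names are hypothetical, and the order of the checks (terminate before re-authenticate before read-only) is a design assumption, not a rule from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    started_at: datetime
    max_duration: timedelta
    device_compliant: bool = True
    behavior_anomaly: bool = False
    ip_on_threat_list: bool = False


def reevaluate(session: Session, now: datetime) -> str:
    """Return the action to take for the session's current state."""
    if session.ip_on_threat_list:
        return "terminate"            # threat intelligence flagged the IP
    if session.behavior_anomaly:
        return "reauthenticate"       # unusual access pattern
    if now - session.started_at > session.max_duration:
        return "reauthenticate"       # session exceeded policy duration
    if not session.device_compliant:
        return "read_only"            # e.g. AV disabled mid-session
    return "continue"
```

Calling `reevaluate` from the access proxy on each request, rather than only at login, is what turns authentication from a one-time event into a continuous check.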
The Five Pillars of Zero Trust
CISA (Cybersecurity and Infrastructure Security Agency) defines five pillars of Zero Trust maturity:
graph LR
subgraph "Five Pillars"
P1[Identity<br/>Who?]
P2[Device<br/>What device?]
P3[Network<br/>Where?]
P4[Application<br/>What workload?]
P5[Data<br/>What data?]
end
subgraph "Cross-Cutting"
VIS[Visibility &<br/>Analytics]
AUTO[Automation &<br/>Orchestration]
GOV[Governance]
end
P1 --> VIS
P2 --> VIS
P3 --> VIS
P4 --> VIS
P5 --> VIS
VIS --> AUTO
AUTO --> GOV
Pillar 1: Identity
The cornerstone of Zero Trust. Identity replaces network location as the primary security boundary.
- Multi-factor authentication (MFA): Passwords alone are insufficient. Prefer phishing-resistant MFA: FIDO2/WebAuthn hardware security keys are the gold standard. TOTP apps are an acceptable fallback, but a phishing page can relay their codes in real time, so they are not phishing-resistant. SMS-based MFA is vulnerable to SIM swapping and should be phased out.
- Single sign-on (SSO): Centralize authentication through an IdP such as Okta, Microsoft Entra ID (formerly Azure AD), or Google Workspace. When an employee is terminated, disabling one account cuts ALL access within minutes, not days.
- Service identity: Every microservice has an identity (service account, workload identity, mTLS certificate). Services authenticate to each other, never assuming trust based on network location.
MFA is non-negotiable in Zero Trust, but not every form of MFA holds up under attack. The 2022 Uber breach demonstrated MFA fatigue: the attacker sent repeated push notifications until the employee, exhausted by the alerts, approved one at 1:00 AM.
**Defenses against MFA fatigue:**
- **Number matching:** The user must type a code displayed on screen, not just tap "approve"
- **Geographic context:** Show the location of the authentication request
- **Rate limiting:** Block push notifications after 3 unanswered prompts
- **FIDO2 hardware keys:** No prompt to approve — requires physical possession and touch
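Two of these defenses, rate limiting and number matching, can be sketched together. This is an illustrative model only: the `PushMfa` class and its method names are hypothetical, and a real implementation would live inside the identity provider and also expire pending challenges.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

MAX_UNANSWERED_PROMPTS = 3  # mirrors the "3 unanswered prompts" rule above


@dataclass
class PushChallenge:
    code: str  # two-digit number displayed on the login screen


class PushMfa:
    """Tracks unanswered prompts per user and enforces number matching."""

    def __init__(self) -> None:
        self._unanswered = {}  # user -> count of prompts without a valid answer
        self._pending = {}     # user -> latest outstanding challenge

    def send_prompt(self, user: str) -> Optional[PushChallenge]:
        """Issue a challenge, or return None once the prompt limit is hit."""
        if self._unanswered.get(user, 0) >= MAX_UNANSWERED_PROMPTS:
            return None  # stop pushing; fall back to another factor
        self._unanswered[user] = self._unanswered.get(user, 0) + 1
        challenge = PushChallenge(code=f"{secrets.randbelow(100):02d}")
        self._pending[user] = challenge
        return challenge

    def approve(self, user: str, typed_code: str) -> bool:
        """Approve only if the user typed the code shown on the login screen."""
        challenge = self._pending.pop(user, None)
        if challenge is None:
            return False
        if not secrets.compare_digest(challenge.code, typed_code):
            return False
        self._unanswered[user] = 0  # a correct answer resets the counter
        return True
```

Because approval requires the code from the login screen, a user who is merely tired of notifications cannot approve an attacker's prompt by tapping a button.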
Pillar 2: Device
The security posture of the device matters as much as the identity of the user.
Device health signals checked on every access:
| Signal | What it means | Impact on access |
|---|---|---|
| Managed device? | Enrolled in MDM (Jamf, Intune) | Unmanaged → deny or read-only |
| OS patched? | Latest security updates installed | Unpatched → deny sensitive apps |
| Disk encrypted? | FileVault, BitLocker enabled | Unencrypted → deny all |
| Firewall enabled? | Host firewall active | Disabled → warning |
| EDR running? | Endpoint detection agent active | Missing → deny sensitive apps |
| Jailbroken/rooted? | OS integrity compromised | Jailbroken → deny all |
| Certificate present? | Device certificate from internal CA | Missing → deny |
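The table above translates fairly directly into code. The sketch below is one possible reading of it: the field names and access-level strings are hypothetical, and where the table says "deny or read-only" for unmanaged devices, this version picks read-only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DevicePosture:
    managed: bool
    os_patched: bool
    disk_encrypted: bool
    firewall_enabled: bool
    edr_running: bool
    jailbroken: bool
    has_device_cert: bool


def evaluate_posture(d: DevicePosture) -> Tuple[str, List[str]]:
    """Return (access_level, findings) following the table above.

    access_level is "deny", "read_only", "no_sensitive_apps", or "full".
    """
    findings = []
    # Hard failures: the table maps these to "deny" or "deny all".
    if d.jailbroken:
        findings.append("OS integrity compromised")
    if not d.disk_encrypted:
        findings.append("disk not encrypted")
    if not d.has_device_cert:
        findings.append("device certificate missing")
    if findings:
        return "deny", findings

    level = "full"
    if not d.os_patched:
        findings.append("OS not patched")
        level = "no_sensitive_apps"
    if not d.edr_running:
        findings.append("EDR agent missing")
        level = "no_sensitive_apps"
    if not d.managed:
        findings.append("device not enrolled in MDM")
        level = "read_only"  # stricter than no_sensitive_apps
    if not d.firewall_enabled:
        findings.append("host firewall disabled (warning only)")
    return level, findings
```

Returning the findings alongside the level lets the access proxy tell the user what to fix instead of showing a bare denial.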
Pillar 3: Network
In Zero Trust, the network is untrusted by default. Security does not depend on which network you are on.
- Micro-segmentation: Every service-to-service communication is explicitly authorized (Chapter 23)
- Encrypted transport: All traffic is encrypted, even within the data center — mTLS between services
- Software-defined perimeter (SDP): Services are invisible to unauthorized users. Ports are closed by default; they only open after identity verification.
Pillar 4: Application and Workload
- Per-request authorization: Every API call checks "can this identity access this resource with this action?"
- Secure software supply chain: Verify container images (Sigstore/cosign), dependencies (SBOMs), and deployment artifacts
- Runtime protection: Monitor application behavior for anomalies
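Per-request authorization is often implemented as a check that wraps every handler. The decorator below is a minimal sketch of that idea. The in-memory `PERMISSIONS` set and the `authorize` name are hypothetical stand-ins for a call to a real policy engine.

```python
from functools import wraps

# Hypothetical permission table: (identity, resource, action) triples.
PERMISSIONS = {
    ("orders-service", "payments", "charge"),
    ("alice", "reports", "read"),
}


class Forbidden(Exception):
    """Raised when the identity may not perform the action."""


def authorize(resource: str, action: str):
    """Decorator that checks the caller's permission on every call."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(identity: str, *args, **kwargs):
            # No caching and no session shortcut: evaluated per request.
            if (identity, resource, action) not in PERMISSIONS:
                raise Forbidden(f"{identity} may not {action} {resource}")
            return handler(identity, *args, **kwargs)
        return wrapper
    return decorator


@authorize("reports", "read")
def read_report(identity: str, report_id: int) -> str:
    return f"report {report_id} for {identity}"
```

Here `read_report("alice", 7)` succeeds, while the same call with an identity that has no matching triple raises `Forbidden` before the handler body runs.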
Pillar 5: Data
Data is the ultimate target. Zero Trust protects data regardless of where it resides.
- Data classification: Label data as public, internal, confidential, restricted
- Encryption everywhere: At rest (AES-256), in transit (TLS 1.3), in use (confidential computing)
- Data loss prevention (DLP): Monitor and prevent unauthorized exfiltration
- Access logging: Every access to sensitive data is logged for audit
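Classification and access logging work together: the label decides both who may read a record and whether the read is audited. The sketch below assumes a per-user clearance level, which is a modelling choice made for this example and not something the pillar prescribes.

```python
import logging
from enum import IntEnum

logger = logging.getLogger("data_access")


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def read_record(user: str, clearance: Classification,
                record_id: str, label: Classification) -> bool:
    """Allow the read only if clearance covers the label; audit sensitive reads."""
    allowed = clearance >= label
    if label >= Classification.CONFIDENTIAL:
        # Sensitive data: log every attempt, whether allowed or denied.
        logger.info("user=%s record=%s label=%s allowed=%s",
                    user, record_id, label.name, allowed)
    return allowed
```

Logging denied attempts as well as successful ones matters, because repeated denials on restricted data are often the first visible sign of a compromised account.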
BeyondCorp: Google's Zero Trust Implementation
Has anyone actually implemented full Zero Trust at scale? Google did. They called it BeyondCorp, and they started in 2011 — partly in response to Operation Aurora, a sophisticated attack attributed to Chinese state actors that compromised Google's internal network in 2009. If Google's perimeter could be breached, whose couldn't?
graph TD
subgraph "Any Network (Office, Home, Coffee Shop)"
USER[Employee on any device]
end
USER --> PROXY
subgraph "BeyondCorp Components"
PROXY[Access Proxy<br/>Internet-facing<br/>reverse proxy]
PROXY --> AUTH{Authenticate}
AUTH -->|"SSO + MFA<br/>(hardware key)"| DEVICE_CHECK{Check Device}
DEVICE_CHECK -->|"Query Device<br/>Inventory DB"| TRUST{Calculate<br/>Trust Score}
TRUST -->|"Combine signals:<br/>• User identity & groups<br/>• Device certificate (TPM-bound)<br/>• OS patch level<br/>• Disk encryption status<br/>• EDR agent status<br/>• Location anomalies<br/>• Time of day"| POLICY{Access<br/>Policy Engine}
POLICY -->|"Trust score HIGH<br/>+ matching policy"| ALLOW[Forward to<br/>internal app]
POLICY -->|"Trust score LOW<br/>(e.g., unpatched device)"| REMEDIATE[Redirect to<br/>device compliance page]
POLICY -->|"Trust score FAILED<br/>(unknown device, no cert)"| DENY[Access denied]
end
subgraph "Internal Applications"
APP1[Gmail Admin]
APP2[Code Search]
APP3[Bug Tracker]
APP4[HR Systems]
end
ALLOW --> APP1
ALLOW --> APP2
ALLOW --> APP3
ALLOW --> APP4
style DENY fill:#ff6b6b,stroke:#c0392b,color:#fff
style ALLOW fill:#2ecc71,stroke:#27ae60,color:#fff
style REMEDIATE fill:#f39c12,stroke:#e67e22,color:#fff
Key Properties of BeyondCorp
- No VPN. Google eliminated their VPN entirely. Every employee, whether in a Google office or on a beach in Thailand, accesses corporate applications through the same access proxy with the same identity and device checks.
- Network location is irrelevant. The corporate WiFi and Starbucks WiFi are treated identically. This eliminated enormous complexity around office network security.
- Device certificates are TPM-bound. The device certificate is tied to the hardware Trusted Platform Module, so it cannot be exported or copied to another device. If the device is stolen, the certificate is useless without the user's credentials.
- Trust is dynamic. A fully patched, encrypted, managed device gets a high trust score. The same device, after missing an OS update, gets a lower score and may automatically lose access to sensitive applications.
- Per-application access. Users do not get "access to the network." They get access to specific applications based on their role, team membership, and device posture. An engineer goes through the same authentication flow for the code repository as for the HR system, but the policy may require additional verification for the HR system.
Google published the BeyondCorp papers between 2014 and 2017:
- **BeyondCorp: A New Approach to Enterprise Security** (2014) — The core architecture
- **BeyondCorp: Design to Deployment at Google** (2016) — Implementation details
- **BeyondCorp: The Access Proxy** (2017) — The internet-facing component
Key insight: the migration took **years**, not months. Google did not flip a switch. They:
1. Built the device inventory and device certificate infrastructure
2. Migrated applications one by one behind the access proxy
3. Ran the VPN and BeyondCorp in parallel during transition
4. Gradually tightened policies as confidence grew
5. Eventually decommissioned the VPN
This phased approach is the model for every organization implementing Zero Trust — you cannot do it all at once, and you should not try.
A company running a traditional VPN allowed every remote employee to connect and land on the corporate network with full access to everything — file servers, databases, admin panels, CI/CD, source code repositories.
During a penetration test, a single employee's VPN credentials were compromised through a phishing exercise. Within two hours, the testers had:
- Accessed the HR database (employee SSNs and salaries)
- Cloned the entire source code repository
- Read secrets from the CI/CD pipeline configuration
- Connected to the production database (credentials were in a shared config file)
Total time from phished credential to full crown-jewel access: 2 hours 14 minutes.
The company's response: "We need a better VPN." The correct recommendation: "You need to stop trusting the VPN. The problem is not the quality of the wall — it is the assumption that everything inside the wall is safe."
Zero Trust vs. VPN
So does Zero Trust replace VPNs entirely? In most modern architectures, yes. Here is exactly why.
graph LR
subgraph "VPN Model"
V_USER[User] -->|"VPN tunnel"| V_GW[VPN Gateway]
V_GW -->|"Full network access<br/>to entire subnet"| V_NET[Internal Network]
V_NET --> V_APP1[App 1]
V_NET --> V_APP2[App 2]
V_NET --> V_DB[Database]
V_NET --> V_AD[Active Directory]
V_NET --> V_CI[CI/CD Pipeline]
end
subgraph "Zero Trust Model"
ZT_USER[User] -->|"HTTPS"| ZT_PROXY[Access Proxy]
ZT_PROXY -->|"Identity + Device<br/>verified per-app"| ZT_APP1[App 1 ✓]
ZT_PROXY -->|"Not authorized"| ZT_APP2[App 2 ✗]
ZT_PROXY -->|"Not authorized"| ZT_DB[Database ✗]
end
style V_NET fill:#ff6b6b,stroke:#c0392b,color:#fff
style ZT_APP2 fill:#ff6b6b,stroke:#c0392b,color:#fff
style ZT_DB fill:#ff6b6b,stroke:#c0392b,color:#fff
style ZT_APP1 fill:#2ecc71,stroke:#27ae60,color:#fff
| Aspect | VPN | Zero Trust (ZTNA) |
|---|---|---|
| Access scope | Entire network subnet | Specific applications only |
| Authentication | Once at connection | Continuous, per-request |
| Device posture | Rarely checked after connect | Checked on every request |
| Compromised credentials | Full network access | Access to authorized apps only (with device check) |
| Split tunneling | Creates security gaps | No tunnel — direct to authorized apps |
| Performance | All traffic through VPN concentrator (bottleneck) | Direct connections via edge proxy (scales horizontally) |
| Visibility | VPN logs show connection/disconnection only | Full request-level audit trail |
ZTNA products that replace VPNs:
| Product | Approach | Key Feature |
|---|---|---|
| Cloudflare Access | Edge-based reverse proxy + tunnels | Global edge network, Cloudflare tunnel |
| Zscaler Private Access | Cloud-delivered ZTNA | Inside-out connectivity (no inbound ports) |
| Palo Alto Prisma Access | SASE platform | Integrates with on-prem firewalls |
| Google BeyondCorp Enterprise | Google's commercial version | Chrome Enterprise integration |
| Tailscale | WireGuard-based mesh | Peer-to-peer, ACL-based, minimal infrastructure |
| Twingate | Software-defined perimeter | Resource-level access control |
# Example: Cloudflare Access setup for an internal application
# 1. Install cloudflared tunnel on the internal network
cloudflared tunnel create my-app-tunnel
cloudflared tunnel route dns my-app-tunnel internal-app.example.com
# 2. Configure the tunnel to point to the internal service
cat > ~/.cloudflared/config.yml << 'EOF'
tunnel: <tunnel-id>
credentials-file: /root/.cloudflared/<tunnel-id>.json
ingress:
  - hostname: internal-app.example.com
    service: http://10.0.1.50:8080
  - service: http_status:404
EOF
# 3. Start the tunnel
cloudflared tunnel run my-app-tunnel
# 4. Configure Access Policy (via Cloudflare dashboard or API):
# - Require: email domain = @example.com (IdP integration)
# - Require: device posture = managed device with EDR
# - Allow: groups = engineering, ops
# - Session duration: 12 hours
# - Re-auth for sensitive operations: every 1 hour
# 5. Users navigate to https://internal-app.example.com
# Cloudflare Access authenticates them, checks device posture,
# evaluates policy, and proxies the request through the tunnel.
# The internal service is NEVER directly exposed to the internet.
# No VPN. No inbound firewall rules. No exposed ports.
Service Mesh for Zero Trust: Istio and Envoy
Zero Trust for humans is one challenge. Zero Trust for service-to-service communication — the east-west traffic discussed in Chapter 23 — is another. That is where service meshes come in.
A service mesh provides a dedicated infrastructure layer for handling service-to-service communication, implementing Zero Trust principles automatically:
graph TD
subgraph "Control Plane (istiod)"
CITADEL[Citadel<br/>Certificate Authority<br/>Issues short-lived mTLS certs<br/>Auto-rotates every 24h]
PILOT[Pilot<br/>Configuration<br/>Distributes routing rules]
POLICY[Policy Engine<br/>Authorization rules<br/>Per-service, per-method]
end
CITADEL -->|"Certs"| E1 & E2 & E3
PILOT -->|"Config"| E1 & E2 & E3
POLICY -->|"AuthZ rules"| E1 & E2 & E3
subgraph "Pod A: Orders Service"
APP_A[Application<br/>Container] <-->|"localhost"| E1[Envoy<br/>Sidecar Proxy]
end
subgraph "Pod B: Payments Service"
APP_B[Application<br/>Container] <-->|"localhost"| E2[Envoy<br/>Sidecar Proxy]
end
subgraph "Pod C: Inventory Service"
APP_C[Application<br/>Container] <-->|"localhost"| E3[Envoy<br/>Sidecar Proxy]
end
E1 <-->|"mTLS<br/>Encrypted + Authenticated<br/>Identity: orders-service"| E2
E1 <-.->|"DENIED by policy<br/>Orders cannot call Inventory directly"| E3
E2 <-->|"mTLS"| E3
style E1 fill:#3498db,stroke:#2980b9,color:#fff
style E2 fill:#3498db,stroke:#2980b9,color:#fff
style E3 fill:#3498db,stroke:#2980b9,color:#fff
How Istio Implements Zero Trust
Every inter-service call is automatically:
- Authenticated: Envoy presents its mTLS certificate (issued by Citadel, the certificate authority built into istiod) to the destination. Both sides verify each other's identity.
- Authorized: The policy engine checks "can the orders-service call the payments-service's /api/v1/charge endpoint with POST?"
- Encrypted: All traffic between sidecars uses mTLS, so it is always encrypted in transit.
- Logged: Full request metadata (source, destination, method, path, response code, latency) is logged for audit.
The application code has no knowledge of mTLS. It makes a plain HTTP call to http://payments-service:8080/api/v1/charge. The Envoy sidecar intercepts the call, establishes mTLS with the destination's Envoy sidecar, and forwards the request. Zero Trust at the infrastructure level, invisible to developers.
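In a mesh the Envoy sidecar does this work, not the application. Still, it can help to see what "both sides verify each other" means at the TLS level. The sketch below uses Python's standard `ssl` module to build a server context that refuses clients without a certificate. The function name and file-path parameters are hypothetical, and the paths are optional only so the configuration can be inspected without real certificates.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the TLS "mutual": the handshake fails
    # unless the client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity, presented to the client.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA used to validate client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

A service mesh automates exactly the parts this sketch leaves out: issuing the certificates, rotating them, and distributing the CA bundle to every workload.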
Istio Authorization Policies
# Only allow the "orders" service to call the "payments" service
# on POST /api/v1/charge and deny everything else
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/orders-service"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/charge"]
# Implicit deny: if no rule matches, the request is DENIED
---
# Require STRICT mTLS for all services in the production namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # Reject ANY non-mTLS connection
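The allow-list semantics of the AuthorizationPolicy above can be modelled in a few lines. This is a simplified sketch with exact-match rules only. Real Istio matching also supports wildcards and many more fields, and the class names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    principals: frozenset
    methods: frozenset
    paths: frozenset


@dataclass(frozen=True)
class Request:
    principal: str  # identity taken from the peer's mTLS certificate
    method: str
    path: str


def is_allowed(rules: list, req: Request) -> bool:
    """ALLOW-policy semantics: permit only if some rule matches fully."""
    for rule in rules:
        if (req.principal in rule.principals
                and req.method in rule.methods
                and req.path in rule.paths):
            return True
    return False  # implicit deny when no rule matches


# Mirrors the payments-policy rule shown above.
PAYMENTS_RULES = [
    Rule(
        principals=frozenset({"cluster.local/ns/production/sa/orders-service"}),
        methods=frozenset({"POST"}),
        paths=frozenset({"/api/v1/charge"}),
    )
]
```

With these rules, a POST to `/api/v1/charge` from the orders service account is allowed, while a GET to the same path, or any call from another service account, falls through to the implicit deny.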
Explore Istio security features on a test cluster:
~~~bash
# Install Istio
istioctl install --set profile=demo -y
# Enable sidecar injection for a namespace
kubectl label namespace production istio-injection=enabled
# Deploy sample application
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml -n production
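# Verify mTLS is active
istioctl proxy-config listeners productpage-v1-xxx.production
# Older Istio releases: "istioctl authn tls-check" (removed from newer releases)
# Newer releases: describe the pod to see its effective mTLS settings
istioctl x describe pod productpage-v1-xxx -n production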
# Apply strict mTLS
kubectl apply -n production -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: STRICT
EOF
# Test from a non-mesh client (should FAIL — no sidecar, no mTLS cert)
kubectl run test -n default --image=curlimages/curl --rm -it -- \
curl http://productpage.production:9080
# Expected: connection refused or reset — the service requires mTLS
# View security dashboard
istioctl dashboard kiali
# Kiali shows mTLS status for all service-to-service connections
~~~
NIST SP 800-207: Zero Trust Architecture Standard
The National Institute of Standards and Technology published SP 800-207 in August 2020 as the definitive reference for Zero Trust architecture. It became a federal mandate through Executive Order 14028 in May 2021.
Policy Decision Point and Policy Enforcement Point
The core of NIST's Zero Trust model is the separation of the decision ("should this access be allowed?") from the enforcement ("actually allow or block the access"):
graph TD
subgraph "Data Sources"
IDP[Identity Provider<br/>User/service identity]
DEVICE[Device Posture Service<br/>OS patches, encryption, EDR]
THREAT[Threat Intelligence<br/>Known malicious IPs, IOCs]
SIEM_D[SIEM / Logs<br/>Historical access patterns]
DATA_CLASS[Data Classification<br/>Sensitivity labels]
COMPLIANCE[Compliance Engine<br/>Regulatory requirements]
end
subgraph "Policy Decision Point (PDP)"
PE[Policy Engine<br/>Evaluates all signals]
PA[Policy Administrator<br/>Grants/revokes access<br/>Configures PEP]
end
subgraph "Policy Enforcement Point (PEP)"
PROXY[Access Proxy / Gateway<br/>API Gateway / Service Mesh Sidecar<br/>Actually allows or blocks the connection]
end
IDP --> PE
DEVICE --> PE
THREAT --> PE
SIEM_D --> PE
DATA_CLASS --> PE
COMPLIANCE --> PE
PE -->|"Decision:<br/>ALLOW / DENY / STEP-UP"| PA
PA -->|"Configure"| PROXY
USER[User / Service] -->|"1. Access request"| PROXY
PROXY -->|"2. Ask for decision"| PE
PE -->|"3. Decision + conditions"| PROXY
PROXY -->|"4a. ALLOW"| RESOURCE[Protected Resource]
PROXY -->|"4b. DENY"| BLOCKED[Access Denied]
style PE fill:#3498db,stroke:#2980b9,color:#fff
style PROXY fill:#e74c3c,stroke:#c0392b,color:#fff
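The value of the split is that enforcement code never contains policy logic, so policy can change without touching the proxy. A minimal sketch of that separation follows. The class names echo the diagram, while the request fields and the toy decision rules are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessRequest:
    subject: str
    resource: str
    mfa_verified: bool
    device_compliant: bool


class PolicyDecisionPoint:
    """Decides; never touches the protected resource."""

    def decide(self, req: AccessRequest) -> str:
        if not req.device_compliant:
            return "DENY"
        if not req.mfa_verified:
            return "STEP-UP"
        return "ALLOW"


class PolicyEnforcementPoint:
    """Enforces; holds no policy logic of its own."""

    def __init__(self, pdp: PolicyDecisionPoint,
                 resource: Callable[[AccessRequest], str]) -> None:
        self._pdp = pdp
        self._resource = resource

    def handle(self, req: AccessRequest) -> str:
        decision = self._pdp.decide(req)   # steps 2-3 in the diagram
        if decision == "ALLOW":
            return self._resource(req)     # step 4a
        if decision == "STEP-UP":
            return "401 additional verification required"
        return "403 access denied"         # step 4b
```

Swapping in a different `PolicyDecisionPoint` changes behaviour everywhere the enforcement point is deployed, which is the property that makes centralized policy management possible.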
NIST Trust Algorithm
NIST describes a trust algorithm that combines multiple signals into an access decision:
Trust Score = f(
    identity_confidence,       # How sure are we this is who they claim?
    device_health_score,       # Is the device secure and managed?
    request_risk_score,        # How sensitive is the requested resource?
    behavioral_anomaly_score,  # Is this access pattern normal?
    threat_intelligence,       # Any known threats related to this context?
    environmental_factors      # Time, location, network
)

if Trust Score >= Resource Trust Threshold:
    ALLOW (with logging and conditions)
else if Trust Score >= Step-Up Threshold:
    REQUIRE additional verification (re-auth, MFA, manager approval)
else:
    DENY (log and alert)
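One way to make the pseudocode above concrete is a weighted average. The sketch below assumes every signal is normalised to the range 0.0 to 1.0 with higher meaning more trustworthy, and it models resource sensitivity as the per-resource threshold rather than as an input to `f`. The weights and the step-up threshold are invented for illustration.

```python
# Hypothetical weights; each signal is a score in [0.0, 1.0] where
# higher means more trustworthy. The weights sum to 1.0.
WEIGHTS = {
    "identity_confidence": 0.30,
    "device_health": 0.25,
    "behavior_normality": 0.20,
    "threat_intel_clean": 0.15,
    "environment": 0.10,
}

STEP_UP_THRESHOLD = 0.50  # illustrative value


def trust_score(signals: dict) -> float:
    """Weighted average of the signals; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())


def decide(signals: dict, resource_threshold: float) -> str:
    """Compare the score to the resource's threshold, as in the pseudocode."""
    score = trust_score(signals)
    if score >= resource_threshold:
        return "ALLOW"
    if score >= STEP_UP_THRESHOLD:
        return "STEP_UP"
    return "DENY"
```

Treating a missing signal as zero is a deliberately conservative choice: a request that arrives without device posture data is scored as if the device were unhealthy.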
**Executive Order 14028** (May 2021) mandated that US federal agencies adopt Zero Trust architecture. The subsequent OMB Memorandum M-22-09 set specific requirements:
- Agency staff use enterprise-managed identities with **phishing-resistant MFA** (for example PIV smart cards or FIDO2/WebAuthn)
- Every device accessing resources is **inventoried and tracked**
- **Encrypted DNS** (DoH/DoT) and **HTTPS-only** traffic
- Application-level access controls **independent of network location**
- Data categorization with **automated monitoring**
This drove massive adoption across government and the defense industrial base. Federal contractors and suppliers began implementing Zero Trust to meet supply chain security requirements — cascading the mandate through the private sector.
CISA's Zero Trust Maturity Model defines four stages: Traditional → Initial → Advanced → Optimal, providing a concrete roadmap for organizations at any starting point.
Practical Implementation Roadmap
How do you actually implement Zero Trust? You cannot just flip a switch. Here is a practical roadmap from "we have a VPN and a perimeter firewall" to "we have Zero Trust." Most organizations take 18-36 months.
gantt
title Zero Trust Implementation Roadmap
dateFormat YYYY-MM
axisFormat %b %Y
section Phase 1: Foundation
Deploy SSO + MFA for all users :done, p1a, 2025-01, 2025-03
Inventory all devices :done, p1b, 2025-01, 2025-03
Inventory all applications :done, p1c, 2025-02, 2025-03
Map service communication flows :done, p1d, 2025-02, 2025-04
section Phase 2: Quick Wins
Move 2-3 apps behind ZTNA proxy :active, p2a, 2025-04, 2025-06
Device posture checks for those apps:active, p2b, 2025-04, 2025-06
Begin micro-segmentation (critical) :active, p2c, 2025-05, 2025-07
Data classification starts :p2d, 2025-05, 2025-07
section Phase 3: Expand
Migrate remaining apps to ZTNA :p3a, 2025-07, 2025-12
Deploy service mesh (mTLS) :p3b, 2025-08, 2025-12
MDM for all corporate devices :p3c, 2025-07, 2025-09
Centralized access logging :p3d, 2025-08, 2025-10
section Phase 4: Mature
Continuous verification :p4a, 2026-01, 2026-06
Dynamic risk-adaptive policies :p4b, 2026-01, 2026-06
Decommission VPN :crit, p4c, 2026-03, 2026-06
Policy as code + automation :p4d, 2026-04, 2026-06
Phase 1: Foundation (Months 1-3)
# Discover active services and communication patterns
# AWS: Enable VPC Flow Logs
aws ec2 create-flow-logs \
--resource-type VPC \
--resource-id vpc-abc123 \
--traffic-type ALL \
--log-destination-type s3 \
--log-destination arn:aws:s3:::flow-logs-bucket
# Kubernetes: Map service communication with Cilium Hubble
# jq -c prints one object per line so that sort -u deduplicates whole records
hubble observe --namespace production --output json \
  | jq -c '{src: .source.labels.app, dst: .destination.labels.app,
            port: .l4.TCP.destination_port}' \
  | sort -u
# Inventory all service accounts and their permissions
aws iam list-users --query 'Users[*].[UserName,CreateDate]' --output table
aws iam list-roles --query 'Roles[*].[RoleName,CreateDate]' --output table
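Once flow records are exported, a short script can turn them into a communication map that later feeds micro-segmentation rules. The sketch below assumes JSON-lines input with `src`, `dst`, and `port` fields, similar to what the jq step above emits. Those field names are an assumption about the export format, not a guaranteed schema.

```python
import json
from collections import Counter


def build_flow_map(lines):
    """Count observed (src, dst, port) edges from JSON-lines records.

    Each line is assumed to look like
    {"src": "...", "dst": "...", "port": 8080};
    records missing any of the three fields are skipped.
    """
    edges = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        src, dst, port = rec.get("src"), rec.get("dst"), rec.get("port")
        if src is None or dst is None or port is None:
            continue  # unlabeled traffic needs separate investigation
        edges[(src, dst, port)] += 1
    return edges


sample = [
    '{"src": "orders", "dst": "payments", "port": 8080}',
    '{"src": "orders", "dst": "payments", "port": 8080}',
    '{"src": "payments", "dst": "inventory", "port": 9090}',
    '{"src": null, "dst": "payments", "port": 8080}',
]
flow_map = build_flow_map(sample)
for (src, dst, port), count in sorted(flow_map.items()):
    print(f"{src} -> {dst}:{port}  ({count} flows)")
```

Every edge in the resulting map is a candidate allow rule, and anything that never appears over a representative observation window is a candidate for default deny.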
Phase 2: Quick Wins (Months 3-6)
Move 2-3 internal applications behind a Zero Trust proxy. Choose applications that are:
- Widely used (validates the approach with real users)
- Not mission-critical (lower risk during migration)
- Currently accessed via VPN (demonstrates VPN replacement)
Phase 3: Expand (Months 6-12)
- Migrate remaining applications behind the access proxy
- Deploy service mesh for inter-service mTLS
- Deploy MDM and enforce device compliance
- Centralize all access logs into SIEM
Phase 4: Mature (Months 12-24)
- Implement continuous verification (session re-evaluation)
- Deploy risk-adaptive policies (dynamic trust scoring)
- Decommission VPN infrastructure
- Automate policy management (policy as code in git)
Common Zero Trust Mistakes
Before diving into maturity models, it is worth examining the pitfalls that organizations commonly fall into.
1. Treating Zero Trust as a product purchase. "We bought a Zero Trust solution" is not Zero Trust. No single vendor delivers end-to-end Zero Trust. It is an architecture implemented through identity, device management, network segmentation, application-level authorization, data protection, and monitoring — spanning multiple tools and teams.
2. Implementing Zero Trust only for remote workers. If office employees bypass the Zero Trust proxy because they are "on the corporate network," you do not have Zero Trust. The office network must be treated identically to any external network. This is the core insight of BeyondCorp.
3. Ignoring service-to-service communication. Zero Trust for human users but implicit trust between microservices leaves a massive gap. If Service A can call any other service without authentication, a compromised Service A has the same lateral movement capability as an attacker on a flat network. Service mesh and mTLS are essential.
4. Making the user experience terrible. If Zero Trust means employees are constantly reauthenticating and losing access, they will find workarounds (sharing credentials, disabling security tools). Good Zero Trust is transparent — strong security with minimal friction through SSO, device certificates, and risk-based adaptive policies that only challenge users when something is unusual.
5. Not investing in monitoring. Zero Trust generates an enormous amount of access data. Without proper logging, analysis, and alerting, you cannot detect compromised credentials, policy violations, or anomalous behavior. The data is there — but someone must look at it.
6. Forgetting about legacy systems. The mainframe that has been running since 1997 and only supports Telnet is not going to get an Envoy sidecar. Plan for legacy systems: wrap them in an access proxy, segment them aggressively, monitor them closely, and create a modernization plan.
A financial services company implemented Zero Trust for all their web applications. The architecture was beautiful: identity-aware proxy, device posture checks, mTLS between services, centralized logging. Then someone asked about their mainframe.
"Oh, that's on the internal network." The mainframe processed wire transfers. It was accessible via TN3270 from anyone on the corporate VLAN, with no authentication beyond the VLAN boundary. Their $3 million Zero Trust project had a gap that would cost an attacker nothing to exploit and could move millions in unauthorized transfers.
The fix: wrap the mainframe access in a Zero Trust proxy (Teleport for legacy protocols), add MFA for every session, and implement session recording. The mainframe itself did not change — the access path changed.
Zero Trust is only as strong as the weakest component in the system. Legacy systems need special attention, not exemption.
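The shape of that fix can be sketched in a few lines. This is an illustration of the idea only, not Teleport's API: `RecordedSession` and `open_legacy_session` are hypothetical names, and the sketch assumes some upstream component has already performed the MFA check and reports the result as a boolean.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RecordedSession:
    """A proxied session whose every command is appended to an audit log."""
    user: str
    target: str
    log: list = field(default_factory=list)

    def send(self, command: str) -> None:
        # Record first, so no command reaches the legacy host unlogged.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append((timestamp, self.user, command))
        # A real proxy would forward the command to the legacy host here.


def open_legacy_session(user: str, mfa_passed: bool, target: str) -> RecordedSession:
    """Open a session only after MFA succeeds; network location grants nothing."""
    if not mfa_passed:
        raise PermissionError(f"MFA required for {user} to reach {target}")
    return RecordedSession(user=user, target=target)
```

The legacy host is untouched in this sketch. All of the control lives in the access path, which is the point of the story above.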
Zero Trust Maturity Model
graph TD
subgraph "Level 0: Traditional"
L0["VPN-based remote access<br/>Flat internal network<br/>Passwords only<br/>Implicit trust for internal traffic<br/>No device management"]
end
subgraph "Level 1: Initial"
L1["MFA deployed for all users<br/>SSO for most applications<br/>Basic network segmentation (VLANs)<br/>Device inventory exists<br/>Some logging"]
end
subgraph "Level 2: Advanced"
L2["Identity-based access replacing VPN<br/>Micro-segmentation for critical workloads<br/>Device posture checked for access<br/>mTLS for service-to-service<br/>Centralized access logging + monitoring"]
end
subgraph "Level 3: Optimal"
L3["All access is identity-based (no VPN)<br/>Continuous verification throughout sessions<br/>Dynamic risk-adaptive policies<br/>Full micro-segmentation<br/>All traffic encrypted<br/>Automated anomaly response<br/>Policy as code, fully automated lifecycle"]
end
L0 -->|"3-6 months"| L1
L1 -->|"6-12 months"| L2
L2 -->|"12-24 months"| L3
style L0 fill:#ff6b6b,stroke:#c0392b,color:#fff
style L1 fill:#f39c12,stroke:#e67e22,color:#fff
style L2 fill:#3498db,stroke:#2980b9,color:#fff
style L3 fill:#2ecc71,stroke:#27ae60,color:#fff
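One way to use the maturity model is as a self-assessment checklist. The sketch below assumes levels are cumulative (a level counts only if every lower level is also complete), which the diagram implies but does not state. The capability keys are hypothetical shorthand for the diagram's bullets.

```python
# Hypothetical shorthand keys for the capabilities listed at each level.
LEVEL_REQUIREMENTS = {
    1: {"mfa_all_users", "sso", "vlan_segmentation", "device_inventory", "basic_logging"},
    2: {"identity_based_access", "microseg_critical", "device_posture",
        "service_mtls", "central_logging"},
    3: {"no_vpn", "continuous_verification", "risk_adaptive_policy", "full_microseg",
        "all_traffic_encrypted", "automated_response", "policy_as_code"},
}


def maturity_level(capabilities: set) -> int:
    """Highest level whose requirements, and all lower levels', are fully met."""
    level = 0
    for candidate in sorted(LEVEL_REQUIREMENTS):
        if LEVEL_REQUIREMENTS[candidate] <= capabilities:
            level = candidate
        else:
            break  # assumption: a gap at one level blocks every higher level
    return level


def missing_for_next_level(capabilities: set) -> set:
    """Capabilities still needed to reach the next level (empty at Level 3)."""
    next_level = maturity_level(capabilities) + 1
    return LEVEL_REQUIREMENTS.get(next_level, set()) - capabilities
```

An organization with mTLS deployed but no MFA would still score Level 0 here, which matches the chapter's advice to start with identity.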
What You've Learned
This chapter covered Zero Trust Architecture, the modern approach to network security that abandons perimeter-based trust:
- Perimeter security has failed because the perimeter is porous (phishing, supply chain), dissolving (remote work, cloud), and insufficient (lateral movement after breach). SolarWinds, Colonial Pipeline, and Uber all demonstrated this.
- Zero Trust principles: never trust, always verify; least privilege; assume breach; verify explicitly with all available signals; continuous verification throughout sessions. These apply to every user, every device, every service, every request.
- The five pillars (identity, device, network, application, and data) must all be addressed for complete Zero Trust coverage. Identity is the cornerstone: it replaces network location as the primary trust signal.
- BeyondCorp (Google's implementation) proved Zero Trust works at massive scale. They eliminated their VPN, treated the corporate network as hostile, and made all access decisions through an identity-aware access proxy with device posture checks.
- Zero Trust replaces VPN with ZTNA: application-level access through identity-aware proxies instead of network-level access through VPN tunnels. Compromised credentials with ZTNA give access to specific apps (with device checks), not the entire network.
- Service mesh (Istio/Envoy) implements Zero Trust for east-west service-to-service communication through automatic mTLS, identity-based authorization policies, and comprehensive observability, all invisible to application code.
- NIST SP 800-207 provides the standard reference architecture. The Policy Decision Point (PDP) evaluates access requests using multiple signals. The Policy Enforcement Point (PEP) enforces the decision. Federal mandate through EO 14028 is driving adoption.
- Implementation is a journey: start with identity and MFA (Phase 1), migrate applications behind a ZTNA proxy (Phase 2), deploy service mesh and MDM (Phase 3), and mature toward continuous verification and VPN decommission (Phase 4). Budget 18-36 months.
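The PDP/PEP split from the NIST bullet above can be sketched as two small functions. This is a toy illustration under stated assumptions, not the NIST reference architecture: the signal names, the sensitivity scale, and the policy rules are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # step-up authentication, e.g. an MFA prompt
    DENY = "deny"


@dataclass(frozen=True)
class AccessRequest:
    user_authenticated: bool
    device_compliant: bool
    known_location: bool
    resource_sensitivity: int  # hypothetical scale: 1 (low) to 3 (high)


def policy_decision_point(req: AccessRequest) -> Decision:
    """Combine several signals into one decision (the rules are illustrative)."""
    if not req.user_authenticated or not req.device_compliant:
        return Decision.DENY
    if not req.known_location and req.resource_sensitivity >= 2:
        return Decision.CHALLENGE  # unusual context: ask for more proof
    return Decision.ALLOW


def policy_enforcement_point(req: AccessRequest, handler):
    """Ask the PDP on every request and call the handler only on ALLOW."""
    decision = policy_decision_point(req)
    if decision is Decision.ALLOW:
        return handler()
    if decision is Decision.CHALLENGE:
        return "step-up authentication required"
    return "access denied"
```

Calling `policy_enforcement_point` on every request, rather than once at login, is what makes verification continuous in this sketch. The `CHALLENGE` outcome is how a risk-adaptive policy avoids prompting users when nothing looks unusual.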
Zero Trust is applying the principle of least privilege to everything — every user, every device, every network, every service, every piece of data — and verifying it continuously. Not "verify once at login." Not "verify when they're off the VPN." Continuously. Because the attacker who compromises a session five minutes after authentication does not care that you verified the user at login time.
It is a lot of work. But compare it to the alternative: responding to a breach where an attacker had free rein inside your network for months because everything was "trusted." SolarWinds attackers were inside government networks for nine months. Colonial Pipeline was shut down for six days. The cost of those breaches, in money, reputation, and national security, dwarfs the cost of implementing Zero Trust. Zero Trust invests effort upfront so that when a breach happens, the blast radius is a single compromised session, not the entire organization.
When a breach happens. Not if. That is how security engineers think.