Passwords, Hashing, and Credential Storage

"The best password is the one you don't have to store." -- Every security engineer who has cleaned up after a breach

Consider a dark web monitoring alert: a competitor just got breached. 2.3 million user records. The passwords were stored as MD5 hashes, without salt. With a decent GPU, the overwhelming majority of them can be cracked in under a minute. In 2026.

Unsalted MD5 is worse than plaintext storage in a subtle way. Plaintext is honest incompetence. Unsalted MD5 is the illusion of competence. Someone looked at that code and thought "we hash our passwords" and moved on to the next feature. Understanding what proper credential storage actually looks like -- and why getting it wrong has destroyed companies -- is essential knowledge.


The Hall of Shame: Password Storage Disasters

Before discussing how to do it right, let us look at how some of the biggest companies got it catastrophically wrong. These are not theoretical scenarios -- they are well-documented breaches that exposed real users to real harm.

RockYou (2009) -- 32 Million Plaintext Passwords

RockYou was a social media application company that made widgets for MySpace and Facebook. In December 2009, a SQL injection vulnerability in their web application gave an attacker direct read access to their database. The passwords were stored in **plaintext**. Not hashed. Not encrypted. Plain, readable, copy-pasteable text.

32,603,388 passwords. Exposed. Downloadable.

This breach became the foundation of modern password research. The "rockyou.txt" wordlist, which holds the roughly 14.3 million unique passwords from the breach in a flat file, is now the most widely used dictionary for password cracking. It ships with Kali Linux. Every penetration tester has used it, and it is a staple of password-cracking benchmarks.

The most common passwords from the RockYou breach:

| Rank | Password | Count | Percentage |
|------|----------|-------|-----------|
| 1 | 123456 | 290,731 | 0.89% |
| 2 | 12345 | 79,078 | 0.24% |
| 3 | 123456789 | 76,790 | 0.24% |
| 4 | password | 61,958 | 0.19% |
| 5 | iloveyou | 51,622 | 0.16% |
| 6 | princess | 35,231 | 0.11% |
| 7 | 1234567 | 35,078 | 0.11% |
| 8 | rockyou | 22,588 | 0.07% |
| 9 | 12345678 | 20,553 | 0.06% |
| 10 | abc123 | 17,542 | 0.05% |

Nearly 1% of all users chose "123456" as their password. The top 100 passwords covered over 5% of all accounts. This data set proved empirically what security researchers had long suspected: humans are terrible at choosing passwords, and the distribution of password choices follows a power law -- a small number of passwords cover a large fraction of users.

Adobe (2013) -- 153 Million Poorly Encrypted Passwords

Adobe's breach in October 2013 exposed 153 million user records. They did not hash passwords -- they **encrypted** them using 3DES in ECB (Electronic Codebook) mode with a single key for all passwords. This sounds like it should be better than hashing, but they made three critical errors that made it far worse:

**Error 1: ECB mode.** ECB encrypts each block independently with the same key. Identical plaintext blocks produce identical ciphertext blocks. This means every user with the password "123456" had the exact same encrypted value in the database. An attacker did not need to break the encryption -- they just needed to count frequencies.

**Error 2: Single encryption key.** All 153 million passwords were encrypted with one key. If that key were ever recovered (through a separate breach, insider threat, or legal compulsion), every password would be instantly decryptable. Hashing does not have this single-point-of-failure property.

**Error 3: Plaintext password hints.** Adobe stored user-written "password hints" in plaintext alongside the encrypted passwords. Users wrote hints like "the usual," "123456," "my dog's name + birthday," and even "the password is monkey123." Security researchers were able to crack millions of passwords simply by reading the hints associated with the most common encrypted values.

The Adobe breach data became a famous puzzle (xkcd's "Encryptic" presented it as a crossword). Researchers grouped users by identical encrypted passwords and combined their hints to work out what the passwords were, without ever breaking the encryption.
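The hint attack needs no cryptanalysis, only grouping. Below is a minimal sketch of that grouping step. It assumes a dump of (user, ciphertext, hint) rows. The rows and the `group_by_ciphertext` helper are hypothetical, not Adobe's actual data or format.

```python
from collections import Counter, defaultdict

# Hypothetical dump rows: (user, ciphertext, hint). Under ECB with a single key,
# identical passwords give identical ciphertexts, modeled here as equal strings.
rows = [
    ("u1", "CT_A", "numbers"),
    ("u2", "CT_A", "one to six"),
    ("u3", "CT_A", "123456"),
    ("u4", "CT_B", "my dog"),
]

def group_by_ciphertext(rows: list[tuple[str, str, str]]) -> list[tuple[str, int, list[str]]]:
    """Return (ciphertext, count, pooled hints), most common ciphertext first."""
    counts = Counter(ct for _, ct, _ in rows)
    hints = defaultdict(list)
    for _, ct, hint in rows:
        hints[ct].append(hint)
    # The most frequent ciphertexts almost certainly hide the most common
    # passwords, and their pooled hints usually give the answer away.
    return [(ct, n, hints[ct]) for ct, n in counts.most_common()]

for ct, n, pooled in group_by_ciphertext(rows):
    print(ct, n, pooled)
```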

**Key lesson:** Encryption is not hashing. Encryption is reversible by design -- that is its purpose. Password storage must be a one-way function. You should never be able to recover the original password from stored data.

LinkedIn (2012, 2016) -- 117 Million SHA-1 Hashes Without Salt

LinkedIn originally announced that 6.5 million password hashes were leaked in 2012. In 2016, the full database surfaced: 117 million email-password pairs. LinkedIn had hashed passwords with SHA-1 but without salting.

Without salt, identical passwords produce identical hashes. Attackers used precomputed rainbow tables (massive databases mapping hashes back to passwords) to crack millions of passwords in hours. The lack of salt also meant that every user with the same password could be cracked simultaneously: compute the hash of "password123" once, and every user who chose it is compromised.

LinkedIn migrated to bcrypt after the breach, but for 117 million users, the damage was done. Their credentials were being sold on dark web markets within days and used in credential stuffing attacks against other services.


Why Plaintext Is Catastrophic: The Cascade Effect

If an attacker already has your database, and with it all of your user data, why does the password format matter so much? Because the damage from password exposure extends far beyond your own service. Three amplification effects make password breaches cascade.

Effect 1: Password Reuse

Studies consistently show that 60-80% of users reuse passwords across multiple services. A 2019 Google/Harris Poll survey found that 65% of respondents reuse passwords, and 13% use the same password for all accounts. This means if you store passwords in plaintext and get breached, you are handing attackers the keys to your users' email, banking, healthcare, and social media accounts -- services you have no control over.

Effect 2: Credential Stuffing Economics

```mermaid
graph TD
    A["Breach: 10 million<br/>email:password pairs<br/>obtained for ~$500<br/>on dark web"] --> B["Credential Stuffing Service<br/>(Automated login attempts<br/>across hundreds of services)"]
    B --> C["Gmail: 0.5% success<br/>= 50,000 accounts"]
    B --> D["Banking apps: 0.1%<br/>= 10,000 accounts"]
    B --> E["Corporate VPN: 0.2%<br/>= 20,000 accounts"]
    B --> F["Shopping sites: 1%<br/>= 100,000 accounts"]

    C --> G["Sell access: $5-50/account<br/>depending on value"]
    D --> H["Drain funds directly<br/>or sell access: $100+/account"]
    E --> I["Ransomware deployment<br/>or data theft<br/>Value: $10,000-millions"]
    F --> J["Make fraudulent purchases<br/>using stored payment methods"]

    style A fill:#ff6b6b,color:#fff
    style H fill:#ff6b6b,color:#fff
    style I fill:#ff6b6b,color:#fff
```

Credential stuffing is an industrial-scale operation. Attackers buy breached credential databases for a few hundred dollars. They then use automated tools to try each credential against hundreds of popular services at once. Even at success rates of 0.1% to 2%, a single breach yields tens of thousands of compromised accounts. Dedicated tools such as Sentry MBA and STORM, along with custom scripts, spread attacks across thousands of proxy IPs to avoid rate limiting.

The economics are stark: the cost of the attack is nearly zero (automated tools, cheap proxies, freely available breach data), and the return is significant. This is why your password storage decisions affect not just your users on your platform, but your users everywhere.

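Effect 3: Regulatory Exposure

Under GDPR, storing passwords in plaintext or with inadequate hashing (MD5, SHA-1 without salt) is hard to defend as the "appropriate technical and organisational measures" required by Article 32 (security of processing) and Article 25 (data protection by design). Fines for those failures can reach 10 million euros or 2% of global annual turnover. Where a regulator also finds a breach of the core data-protection principles, the ceiling rises to 20 million euros or 4%. In each case, whichever figure is higher applies. The UK ICO fined TalkTalk 400,000 pounds in 2016 for failing to implement basic security measures that would have protected customer data. PCI DSS requires stored authentication credentials to be rendered unreadable using strong cryptography. HIPAA's Security Rule treats encryption of electronic PHI at rest as an addressable safeguard: covered entities must implement it or document why an equivalent measure is appropriate.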


Cryptographic Hashing for Passwords: Why Speed Is the Enemy

You might think SHA-256 is the right choice -- it is a strong hash function, after all. But SHA-256 is a strong general-purpose hash. It is terrible for passwords.

The Problem with Fast Hashes

```mermaid
graph LR
    subgraph "SHA-256 on RTX 4090 GPU"
        S1["22 BILLION hashes/second"]
        S2["rockyou.txt dictionary<br/>(14 million words):<br/>< 0.001 seconds"]
        S3["All 8-char lowercase:<br/>208 billion combinations<br/>~10 seconds"]
        S4["All 8-char lowercase + digits:<br/>2.8 trillion combinations<br/>~2 minutes"]
    end

    subgraph "Argon2id on same GPU"
        A1["~10 hashes/second<br/>(with proper parameters)"]
        A2["rockyou.txt dictionary:<br/>~16 days"]
        A3["All 8-char lowercase:<br/>~660 years"]
        A4["All 8-char lowercase + digits:<br/>~8,900 years"]
    end

    style S1 fill:#ff6b6b,color:#fff
    style S2 fill:#ff6b6b,color:#fff
    style S3 fill:#ff6b6b,color:#fff
    style S4 fill:#ff6b6b,color:#fff
    style A1 fill:#69db7c,color:#000
    style A2 fill:#69db7c,color:#000
    style A3 fill:#69db7c,color:#000
    style A4 fill:#69db7c,color:#000
```

SHA-256 is designed to be fast. That is a feature when you are checksumming files, building Merkle trees, or verifying data integrity. Speed is a catastrophic vulnerability when you are storing passwords, because an attacker who obtains your hashed password database can try billions of guesses per second.

For password hashing, you need algorithms that are intentionally slow and expensive to compute. You want it to take 200-400 milliseconds to verify a single password. A legitimate user logging in once waits 400ms, which is barely noticeable. An attacker trying a billion passwords at 400ms each waits 12.7 years on a single core. That asymmetry is the entire point of password hashing functions.
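The numbers in the comparison come from simple division. The sketch below reproduces them using the diagram's illustrative rates: about 22 billion SHA-256 guesses per second, and about 10 Argon2id guesses per second. Note that the 2.8-trillion keyspace corresponds to 36^8, i.e. lowercase letters plus digits.

```python
# Back-of-the-envelope exhaustive-search times.
# Both rates are the illustrative figures from the comparison, not measurements.
SHA256_RATE = 22e9       # assumed SHA-256 guesses/second on one high-end GPU
ARGON2ID_RATE = 10       # assumed Argon2id guesses/second with strong parameters
SECONDS_PER_YEAR = 365.25 * 86_400

def exhaust_seconds(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate in a keyspace."""
    return keyspace / guesses_per_second

lower8 = 26 ** 8          # all 8-character lowercase strings (~208 billion)
lower_digits8 = 36 ** 8   # all 8-character lowercase + digit strings (~2.8 trillion)

print(f"SHA-256:  {exhaust_seconds(lower8, SHA256_RATE):.1f} s, "
      f"{exhaust_seconds(lower_digits8, SHA256_RATE) / 60:.1f} min")
print(f"Argon2id: {exhaust_seconds(lower8, ARGON2ID_RATE) / SECONDS_PER_YEAR:,.0f} years, "
      f"{exhaust_seconds(lower_digits8, ARGON2ID_RATE) / SECONDS_PER_YEAR:,.0f} years")

# The single-core asymmetry: a billion guesses at 400 ms each
print(f"{1e9 * 0.4 / SECONDS_PER_YEAR:.1f} years")
```

The results land at roughly 9.5 seconds and 2 minutes for SHA-256, versus roughly 660 and 8,900 years for Argon2id. Only the rate changes between the two columns; the keyspace is identical.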


Salting: Why It Is Essential

A salt is a random value unique to each user, generated when the password is first stored and saved alongside the hash. It ensures that identical passwords produce different hashes.

Without Salt: Batch Cracking

```
Database without salting:
User     Password      SHA-256 Hash
alice    password123   ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
bob      password123   ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f  ← SAME
charlie  letmein       1c8bfe8f801d79745c4631d09fff36c82aa37fc4cce4fc946683d7b336b63032
dave     password123   ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f  ← SAME
```

Three users have identical hashes. The attacker knows instantly that alice, bob, and dave share a password. Crack one, crack all three. This also enables rainbow table attacks -- precomputed tables mapping hashes to passwords. A rainbow table for SHA-256 covering all common passwords is a one-time cost that can be reused against every unsalted database.
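The batch effect is easy to demonstrate. In this sketch the usernames, passwords, and tiny wordlist are illustrative. The attacker indexes the leaked table by hash, so each candidate is hashed once no matter how many users chose it.

```python
import hashlib
from collections import defaultdict

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Illustrative leaked table: user -> unsalted SHA-256 of their password
leaked = {
    "alice": sha256_hex("password123"),
    "bob": sha256_hex("password123"),
    "charlie": sha256_hex("letmein"),
    "dave": sha256_hex("password123"),
}

def crack_unsalted(leaked: dict[str, str], wordlist: list[str]) -> dict[str, str]:
    """Hash each candidate once, then look up every user holding that hash."""
    users_by_hash = defaultdict(list)
    for user, digest in leaked.items():
        users_by_hash[digest].append(user)

    cracked = {}
    for candidate in wordlist:
        for user in users_by_hash.get(sha256_hex(candidate), []):
            cracked[user] = candidate
    return cracked

print(crack_unsalted(leaked, ["123456", "letmein", "password123"]))
```

A single SHA-256 of `password123` cracks alice, bob, and dave at once. With per-user salts, the attacker would have to hash that candidate separately for every salt.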

With Salt: Every Hash Is Unique

```
Database with salting (illustrative salts; hashes truncated):
User     Salt (16 bytes, random)          Password      Hash (SHA-256 of salt+password)
alice    a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6  password123   7f2e8a91c34b... (unique)
bob      e5f6a7b8c9d0e1f2a3b4c5d6d7e8f9a0  password123   3d4c5b6a7e8f... (different!)
charlie  c9d0e1f2a3b4c5d6d7e8f9a0b1c2d3e4  letmein       9a8b7c6d5e4f... (unique)
dave     d7e8f9a0b1c2d3e4e5f6a7b8c9d0e1f2  password123   1a2b3c4d5e6f... (different!)
```

Even though alice, bob, and dave have the same password, every hash is different because each has a unique salt. Rainbow tables are useless because the attacker would need a separate table for every possible salt value -- with 128-bit salts, that is 2^128 tables, which is computationally impossible.

**How salting works mathematically:**

Without salt: `hash = H(password)`
With salt: `hash = H(salt || password)` (salt concatenated with password)

The salt is stored in plaintext alongside the hash. It does not need to be secret -- its purpose is to ensure uniqueness, not to provide secrecy. A common misconception is that salts should be kept secret like "peppers." While adding a server-side pepper (a secret value mixed in) can provide additional defense, the salt itself serves a different purpose: it defeats precomputation attacks and prevents identical passwords from producing identical hashes.

A proper salt must be:
- **Random**: Generated using a CSPRNG (cryptographically secure pseudorandom number generator), not derived from the username or any other predictable value
- **Unique per password**: Never reuse salts across users, and generate a new salt when a password changes
- **Sufficient length**: At least 16 bytes (128 bits) to make per-salt precomputation infeasible
- **Stored with the hash**: Modern password hashing functions (bcrypt, scrypt, Argon2) generate and embed the salt automatically in their output string, so you do not handle salts manually
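Here is a minimal sketch of `H(salt || password)` with per-user random salts. Plain SHA-256 appears here only to make the effect of salting visible; it is far too fast for real password storage. The helper names are illustrative.

```python
import hashlib
import hmac
import secrets
from typing import Optional

def salted_sha256(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest) with digest = SHA-256(salt || password).
    Illustration only: use bcrypt, scrypt, or Argon2id for real storage."""
    if salt is None:
        salt = secrets.token_bytes(16)   # 128-bit salt from a CSPRNG
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def check(password: str, salt: bytes, digest: bytes) -> bool:
    _, candidate = salted_sha256(password, salt)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

alice_salt, alice_hash = salted_sha256("password123")
bob_salt, bob_hash = salted_sha256("password123")
print(alice_hash != bob_hash)                        # same password, different hashes
print(check("password123", alice_salt, alice_hash))  # verification still works
```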

The Password Hashing Trinity: bcrypt, scrypt, Argon2

bcrypt (1999)

The original purpose-built password hashing function, designed by Niels Provos and David Mazieres based on the Blowfish cipher's expensive key schedule.

How bcrypt works internally: bcrypt takes the password and salt, and uses them to set up the Blowfish cipher's key schedule. It then encrypts the string "OrpheanBeholderScryDoubt" 64 times using the derived key. The cost factor determines how many iterations of the key schedule expansion are performed: cost factor N means 2^N iterations. Each increment of the cost factor (for example, from 12 to 13) doubles the computation time.

Anatomy of a bcrypt hash string:
```
$2b$12$LJ3m4ys3Lg2VJz5yKvOUn.CEX/jOB9oQ1P0yRcIBD1OrzfhqGXKy6
 │   │  │                    │
 │   │  │                    └── Hash (31 chars, Radix-64 encoded)
 │   │  └── Salt (22 chars, Radix-64 = 128 bits)
 │   └── Cost factor: 12 (2^12 = 4,096 iterations)
 └── Algorithm identifier: 2b (current bcrypt version)
```

```python
# Python bcrypt example (pyca "bcrypt" package)
import bcrypt

# Hash a password
password = b"correct horse battery staple"
salt = bcrypt.gensalt(rounds=12)  # Cost factor = 12
hashed = bcrypt.hashpw(password, salt)
print(hashed)
# Example output (the salt, and therefore the hash, changes on every run):
# b'$2b$12$LJ3m4ys3Lg2VJz5yKvOUn.CEX/jOB9oQ1P0yRcIBD1OrzfhqGXKy6'

# Verify a password -- constant-time comparison built in
if bcrypt.checkpw(password, hashed):
    print("Password matches!")
else:
    print("Invalid password")

# Cost factor timing (approximate, varies by hardware):
# Cost 10: ~65ms   (fast, minimum for low-security apps)
# Cost 12: ~250ms  (recommended default for most apps)
# Cost 13: ~500ms  (good for sensitive applications)
# Cost 14: ~1s     (high security, noticeable delay for user)
```

bcrypt limitations:

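  • Maximum input length is 72 bytes. Many implementations silently truncate longer passwords; newer releases of some libraries reject them with an error instead. If you expect very long passphrases, pre-hash with SHA-256 and base64-encode the digest before passing it to bcrypt, so that no NUL bytes reach bcrypt.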
  • CPU-hard only, not memory-hard. GPUs can still achieve significant parallelism (though much less than with SHA-256).
  • The Blowfish cipher is not widely used elsewhere, so bcrypt benefits from less hardware optimization than SHA-family hashes.
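The pre-hash workaround for long passphrases can be sketched with the standard library. This sketch assumes the pyca `bcrypt` package for the final step, which is shown only as a comment. `prehash_for_bcrypt` is a hypothetical helper name.

```python
import base64
import hashlib

def prehash_for_bcrypt(password: str) -> bytes:
    """Hypothetical helper: SHA-256 the password, then base64-encode the digest.
    The raw 32-byte digest may contain NUL bytes, which some bcrypt
    implementations reject or treat as end-of-string; base64 yields
    44 printable ASCII bytes, safely under bcrypt's 72-byte limit."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)

long_passphrase = "correct horse battery staple " * 5   # 145 characters
material = prehash_for_bcrypt(long_passphrase)
print(len(material))   # 44
# With the pyca bcrypt package installed, the next step would be:
#   bcrypt.hashpw(material, bcrypt.gensalt(rounds=12))
```

One caveat applies. Suppose an unsalted SHA-256 of the same password ever leaks from another system. An attacker can then test those digests directly against the bcrypt layer, a technique called password shucking. This is why some guidance prefers a keyed HMAC with a server-side pepper for this step.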

scrypt (2009)

Designed by Colin Percival for Tarsnap, scrypt added memory-hardness to CPU-hardness. The insight: GPUs have thousands of cores but limited memory per core. If the hash function requires a large block of memory that must be accessed randomly, GPUs cannot parallelize it efficiently.

scrypt parameters:
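```
scrypt(password, salt, N=16384, r=8, p=1, dkLen=32)

N (CPU/memory cost): Must be a power of 2.
  Memory required ≈ 128 × N × r bytes
  N=16384, r=8: 128 × 16384 × 8 = 16 MiB per hash
  N=32768, r=8: 128 × 32768 × 8 = 32 MiB per hash
  Doubling N doubles both CPU time AND memory usage.

r (block size): Controls memory chunk size. r=8 is standard.

p (parallelism): Number of independent mixing operations.
  p=1 for interactive logins (serial computation).
  Higher p multiplies the total CPU work. Memory stays at 128 × N × r
  only if the p lanes run one after another; running them in parallel
  needs p times the memory.
```

```python
# Python scrypt example (standard library only)
import base64
import hashlib
import hmac
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # 128-bit random salt from the OS CSPRNG

# Hash with scrypt
hashed = hashlib.scrypt(
    password,
    salt=salt,
    n=16384,    # CPU/memory cost (16 MiB with r=8)
    r=8,        # Block size
    p=1,        # Parallelism
    dklen=32    # Output length in bytes
)

# Store both salt and hash (both needed for verification).
# A real system should also store n, r, p so they can be raised later.
stored = base64.b64encode(salt + hashed).decode()

# Verify: split the stored value, recompute, compare in constant time
raw = base64.b64decode(stored)
stored_salt, stored_hash = raw[:16], raw[16:]
candidate = hashlib.scrypt(password, salt=stored_salt, n=16384, r=8, p=1, dklen=32)
print(hmac.compare_digest(candidate, stored_hash))  # True
```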

Argon2 (2015) -- The PHC Winner

Argon2 won the Password Hashing Competition (PHC) in 2015, beating 23 other submissions over two years of public review. It is the current recommendation for all new systems.

Argon2 comes in three variants:

```mermaid
graph TD
    A[Argon2 Family] --> B["Argon2d<br/>Data-dependent memory access<br/>Strongest GPU/ASIC resistance<br/>Vulnerable to side-channel attacks<br/>Best for: backend hashing,<br/>cryptocurrency"]
    A --> C["Argon2i<br/>Data-independent memory access<br/>Side-channel resistant<br/>Slightly weaker GPU resistance<br/>Best for: key derivation,<br/>disk encryption"]
    A --> D["Argon2id (RECOMMENDED)<br/>Hybrid: Argon2i-style access for<br/>the first half of the first pass,<br/>Argon2d-style access afterwards<br/>Best of both worlds<br/>Best for: password hashing"]

    style D fill:#69db7c,color:#000
```

Why Argon2id is the right choice for passwords: During the first half of the first pass over memory, Argon2id uses data-independent access (as in Argon2i), which resists side-channel attacks while memory is initially being filled. The rest of the computation uses data-dependent access (as in Argon2d), which provides stronger resistance against GPU and ASIC attacks. This hybrid approach gives you the security benefits of both variants.

```python
# Python Argon2 example (using the argon2-cffi library)
from argon2 import PasswordHasher, Type
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher(
    time_cost=3,          # Number of iterations (t)
    memory_cost=65536,    # Memory in KiB: 64 MiB (m)
    parallelism=1,        # Number of threads (p)
    hash_len=32,          # Output hash length in bytes
    salt_len=16,          # Salt length in bytes
    type=Type.ID          # Argon2id
)

# Hash a password (salt is generated automatically)
password = "correct horse battery staple"
hashed = ph.hash(password)
print(hashed)
# Output shape (salt and hash change on every run):
# $argon2id$v=19$m=65536,t=3,p=1$<22-char base64 salt>$<43-char base64 hash>

# The output string contains everything needed for verification:
# $argon2id  -- algorithm variant
# $v=19      -- version (0x13 = 19)
# $m=65536   -- memory cost (64 MiB)
# $t=3       -- time cost (3 iterations)
# $p=1       -- parallelism (1 thread)
# $<salt>    -- 16-byte salt (base64, padding stripped)
# $<hash>    -- 32-byte hash (base64, padding stripped)

# Verify a password
try:
    ph.verify(hashed, password)
    print("Password matches!")
except VerifyMismatchError:
    print("Invalid password")
else:
    # Progressive rehashing: after a successful login, check whether the
    # stored hash uses outdated parameters and upgrade it if so
    if ph.check_needs_rehash(hashed):
        new_hash = ph.hash(password)
        # Store new_hash in database, replacing old hash
```

Algorithm Comparison

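| Feature | bcrypt | scrypt | Argon2id |
|---------|--------|--------|----------|
| Year | 1999 | 2009 | 2015 |
| CPU-hard | Yes | Yes | Yes |
| Memory-hard | No | Yes | Yes |
| GPU-resistant | Moderate | Strong | Strong |
| ASIC-resistant | Weak | Moderate | Strong |
| Side-channel resistant | N/A | No | Yes (hybrid) |
| Max input length | 72 bytes | Unlimited | Unlimited |
| Tunable parameters | 1 (cost) | 3 (N, r, p) | 3 (t, m, p) |
| Competition winner | No | No | Yes (PHC 2015) |
| OWASP recommendation | Acceptable | Acceptable | Preferred |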

For a new project, use Argon2id with at least 64 MiB of memory, 3 iterations, and a parallelism degree of 1. If you are working on an existing system using bcrypt with cost 12 or higher, that is fine -- do not rewrite working security code without a compelling reason. But for new systems, Argon2id is the standard answer. Also implement progressive rehashing: when a user logs in successfully with an old bcrypt hash, rehash their password with Argon2id and store the new hash. Over time, every active user migrates to the stronger algorithm without any user action.
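Progressive rehashing is mostly bookkeeping. You verify against whatever scheme the stored hash names, then upgrade it while the plaintext is in hand. The sketch below uses standard-library stand-ins so it runs without extra packages. PBKDF2 plays the legacy role that bcrypt plays in the text, and scrypt plays the current role of Argon2id. The `scheme$salt$hash` storage format is hypothetical.

```python
import base64
import hashlib
import hmac
import os

# Stand-ins: "pbkdf2" = legacy scheme (think bcrypt), "scrypt" = current (think Argon2id).
# The scheme$salt$hash format is a hypothetical storage layout.
def _b64(b: bytes) -> str:
    return base64.b64encode(b).decode()

def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt, n=16384, r=8, p=1, dklen=32)

def _pbkdf2(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def hash_current(password: str) -> str:
    salt = os.urandom(16)
    return f"scrypt${_b64(salt)}${_b64(_scrypt(password, salt))}"

def hash_legacy(password: str) -> str:
    salt = os.urandom(16)
    return f"pbkdf2${_b64(salt)}${_b64(_pbkdf2(password, salt))}"

def verify(password: str, stored: str) -> bool:
    scheme, salt_b64, hash_b64 = stored.split("$")
    salt, expected = base64.b64decode(salt_b64), base64.b64decode(hash_b64)
    if scheme == "scrypt":
        actual = _scrypt(password, salt)
    elif scheme == "pbkdf2":
        actual = _pbkdf2(password, salt)
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return hmac.compare_digest(actual, expected)

def login(password: str, stored: str) -> tuple[bool, str]:
    """Return (ok, hash_to_store). Upgrade only after a successful check,
    because that is the only moment the plaintext password is available."""
    if not verify(password, stored):
        return False, stored
    if not stored.startswith("scrypt$"):
        return True, hash_current(password)   # migrate the legacy hash
    return True, stored
```

With argon2-cffi and bcrypt installed, the same shape applies. You dispatch on the `$2b$` or `$argon2id$` prefix. For hashes that are already Argon2id but use old parameters, you call `PasswordHasher.check_needs_rehash`.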


Password Policies: Science vs. Security Theater

Consider the typical corporate password policy: minimum 8 characters, at least one uppercase letter, one lowercase letter, one number, one special character, and mandatory rotation every 90 days. Sound familiar? It is security theater. NIST updated their guidelines in Special Publication 800-63B, and they explicitly recommend against most of those rules. The research shows they do more harm than good.

What NIST SP 800-63B Actually Recommends

```mermaid
graph TD
    subgraph "DO (Evidence-Based)"
        D1["Minimum 8 characters<br/>(15+ when the password<br/>is the only factor)"]
        D2["Allow at least 64 characters<br/>(support passphrases)"]
        D3["Allow all printable ASCII<br/>+ Unicode + spaces"]
        D4["Check against breach databases<br/>(HIBP API)"]
        D5["Check against common<br/>password dictionaries"]
        D6["Allow paste in password fields<br/>(enables password managers)"]
        D7["Show password strength meter"]
    end

    subgraph "DO NOT (Counter-Productive)"
        N1["Require specific character classes<br/>(uppercase, numbers, symbols)"]
        N2["Force periodic password rotation"]
        N3["Use knowledge-based questions<br/>(mother's maiden name, etc.)"]
        N4["Truncate passwords silently"]
        N5["Disallow paste in password fields"]
        N6["Apply composition rules<br/>beyond minimum length"]
    end

    style D1 fill:#69db7c,color:#000
    style D2 fill:#69db7c,color:#000
    style D3 fill:#69db7c,color:#000
    style D4 fill:#69db7c,color:#000
    style D5 fill:#69db7c,color:#000
    style D6 fill:#69db7c,color:#000
    style D7 fill:#69db7c,color:#000
    style N1 fill:#ff6b6b,color:#fff
    style N2 fill:#ff6b6b,color:#fff
    style N3 fill:#ff6b6b,color:#fff
    style N4 fill:#ff6b6b,color:#fff
    style N5 fill:#ff6b6b,color:#fff
    style N6 fill:#ff6b6b,color:#fff
```
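These rules translate into a short validator. The sketch below assumes a tiny placeholder blocklist; a real deployment would load a large common-password list and also query a breach corpus. `check_password` and the constants are illustrative. Unicode is normalized with NFKC, as SP 800-63B suggests.

```python
import unicodedata

MIN_LENGTH = 8      # raise to 15 if the password is the only factor
MAX_LENGTH = 128    # must allow at least 64; a cap only bounds hashing cost
# Placeholder blocklist; a real one would hold many thousands of entries
BLOCKLIST = {"password", "password1!", "123456", "12345678", "qwerty", "iloveyou"}

def check_password(candidate: str) -> list[str]:
    """Return the reasons a password is rejected (empty list = accepted).
    Deliberately no uppercase/digit/symbol rules."""
    # Normalize so visually identical Unicode input compares equal
    normalized = unicodedata.normalize("NFKC", candidate)
    problems = []
    # len() counts code points, so each Unicode character counts as one
    if len(normalized) < MIN_LENGTH:
        problems.append(f"use at least {MIN_LENGTH} characters")
    if len(normalized) > MAX_LENGTH:
        problems.append(f"use at most {MAX_LENGTH} characters")  # reject, never truncate
    if normalized.lower() in BLOCKLIST:
        problems.append("this password is too common")
    return problems

print(check_password("Password1!"))                     # ['this password is too common']
print(check_password("purple elephant dances wildly"))  # []
```

Note what is absent: there are no uppercase, digit, or symbol requirements. `Password1!` satisfies a typical composition policy and is rejected anyway, while a long lowercase passphrase passes.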

Why composition rules backfire: When you require "at least one uppercase, one number, one special character," users satisfy the minimum: Password1!. This pattern is so common that password crackers have specific rules for it. Research by Weir et al. (2009) and Shay et al. (2014) showed that composition rules increase the predictability of passwords by pushing users into common patterns rather than truly random choices.

Why forced rotation is harmful: Research by Cranor et al. at CMU (2016) studied password changes under mandatory rotation. Users made minimal, predictable changes: Summer2025! becomes Fall2025! becomes Winter2026!. An attacker who cracks one password in the rotation can predict the next with high probability. Microsoft, NIST, the UK's NCSC, and the Canadian Centre for Cyber Security all now recommend against mandatory rotation unless there is evidence of compromise.
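The predictability is mechanical enough to automate. The toy sketch below assumes just two transformation rules: bump a trailing number, and advance a season name. Real cracking rulesets are far larger, and `likely_successors` is a hypothetical helper.

```python
import re
from typing import Optional

SEASONS = ["Winter", "Spring", "Summer", "Fall"]

def bump_number(pw: str) -> Optional[str]:
    """Toy rule: increment the last run of digits (Password7! -> Password8!)."""
    m = re.match(r"^(.*?)(\d+)(\D*)$", pw)
    if not m:
        return None
    prefix, number, suffix = m.groups()
    return prefix + str(int(number) + 1).zfill(len(number)) + suffix

def next_season(pw: str) -> Optional[str]:
    """Toy rule: advance a leading season name (Summer2025! -> Fall2025!)."""
    for i, season in enumerate(SEASONS):
        if pw.startswith(season):
            return SEASONS[(i + 1) % len(SEASONS)] + pw[len(season):]
    return None

def likely_successors(old: str) -> list[str]:
    """Guesses for the next password in a forced rotation, most obvious first."""
    moved = next_season(old)
    candidates = [bump_number(old), moved, bump_number(moved) if moved else None]
    out: list[str] = []
    for c in candidates:
        if c and c not in out:
            out.append(c)
    return out

print(likely_successors("Fall2025!"))   # includes 'Winter2026!'
```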

Checking Against Breach Databases

```python
# Integrate HIBP (Have I Been Pwned) breach checking into your application
import hashlib

import requests

def is_password_pwned(password: str) -> tuple[bool, int]:
    """Check if password appears in known breaches.

    Uses k-anonymity: only sends the first 5 characters of the
    SHA-1 hash to the API. The server returns all hash suffixes
    matching that prefix. The full hash never leaves your server.
    """
    sha1 = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
    prefix = sha1[:5]   # Send this to HIBP
    suffix = sha1[5:]   # Keep this private

    response = requests.get(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"Add-Padding": "true"},  # Padding prevents response length analysis
        timeout=5,                        # Never let a third-party call hang a login
    )
    response.raise_for_status()

    for line in response.text.splitlines():
        hash_suffix, count = line.split(':')
        # Padding entries are fake suffixes with a count of 0; ignore them
        if hash_suffix == suffix and int(count) > 0:
            return True, int(count)

    return False, 0

# Usage in a registration flow
pwned, count = is_password_pwned("password123")
if pwned:
    print(f"REJECTED: This password has appeared {count:,} times in known data breaches.")
```
Integrate breach checking into your application at three points:

1. **At registration**: Reject passwords found in HIBP. Show a clear message explaining why.
2. **At login**: Check asynchronously after successful authentication. If the password is pwned, show a non-blocking warning and prompt for a password change.
3. **At password change**: Reject the new password if it appears in HIBP.

```bash
# Test the HIBP API from the command line
# SHA-1 of "password" is 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
# Send first 5 chars: 5BAA6
curl -s https://api.pwnedpasswords.com/range/5BAA6 | grep "1E4C9B93F3F0682250B6CF8331B7EE68FD8"
# Returns a line like: 1E4C9B93F3F0682250B6CF8331B7EE68FD8:10437236
# i.e. "password" appears over 10 million times in breach data
# (the count grows as HIBP adds new breaches)
```

The k-anonymity model means you never send the full password hash to HIBP. Your user's password remains private while you check against 800+ million breached passwords.

Credential Stuffing Defense

Credential stuffing is the automated use of stolen credentials from one breach to attempt access on other services. Defending against it requires layered controls because no single measure is sufficient.

```mermaid
graph TD
    subgraph "Layer 1: Rate Limiting"
        R1["Limit login attempts per IP<br/>(e.g., 10/minute)"]
        R2["Limit login attempts per account<br/>(e.g., 5/hour before lockout)"]
        R3["Progressive delays<br/>(1s, 2s, 4s, 8s, 16s...)"]
        R4["Temporary lockout with<br/>exponential backoff"]
    end

    subgraph "Layer 2: Bot Detection"
        B1["CAPTCHA after N failed attempts"]
        B2["Device fingerprinting<br/>(screen size, fonts, WebGL)"]
        B3["Behavioral analysis<br/>(typing cadence, mouse movement)"]
        B4["JavaScript proof-of-work challenges"]
    end

    subgraph "Layer 3: Credential Hygiene"
        C1["Check passwords against HIBP<br/>at registration"]
        C2["Notify users when their<br/>credentials appear in new breaches"]
        C3["Encourage password manager<br/>adoption"]
    end

    subgraph "Layer 4: Multi-Factor Authentication"
        M1["TOTP authenticator app"]
        M2["WebAuthn/FIDO2 hardware keys<br/>(strongest, phishing-resistant)"]
        M3["Push notifications"]
        M4["SMS codes<br/>(weakest, but better than nothing)"]
    end

    subgraph "Layer 5: Monitoring"
        O1["Alert on login from<br/>new location/device"]
        O2["Detect distributed attacks<br/>(many IPs, same pattern)"]
        O3["Track impossible travel<br/>(login from US then Russia<br/>within 5 minutes)"]
        O4["Monitor account takeover<br/>indicators (password change<br/>+ email change + new device)"]
    end
```
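The progressive-delay idea from Layer 1 can be sketched in a few lines. The sketch keeps its state in memory and takes an injectable clock for testing. A production service would keep the counters in shared storage and pair this with per-IP limits. `LoginThrottle` and its parameters are illustrative.

```python
import time
from typing import Callable

class LoginThrottle:
    """Illustrative per-account progressive delay: after free_attempts
    consecutive failures, lock for 1s, 2s, 4s, ... capped at max_delay.
    State is in memory only; production code would use shared storage."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 900.0,
                 free_attempts: int = 3, clock: Callable[[], float] = time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.free_attempts = free_attempts
        self.clock = clock
        self._failures: dict[str, int] = {}
        self._locked_until: dict[str, float] = {}

    def retry_after(self, account: str) -> float:
        """Seconds to wait before the next attempt is allowed (0.0 = allowed)."""
        return max(0.0, self._locked_until.get(account, 0.0) - self.clock())

    def record_failure(self, account: str) -> None:
        n = self._failures.get(account, 0) + 1
        self._failures[account] = n
        if n > self.free_attempts:
            # Doubles with each extra failure: 1s, 2s, 4s, 8s, ...
            exponent = n - self.free_attempts - 1
            delay = min(self.max_delay, self.base_delay * 2 ** exponent)
            self._locked_until[account] = self.clock() + delay

    def record_success(self, account: str) -> None:
        self._failures.pop(account, None)
        self._locked_until.pop(account, None)
```

Callers check `retry_after` before verifying the password and refuse the attempt while it is non-zero. Keying on the account rather than the IP is what slows a distributed attack that rotates through proxies. The cost is that an attacker can delay a victim's logins; the cap on the delay limits that abuse.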

Password Managers: The Practical Answer

The honest truth is that humans are terrible at passwords. You cannot remember strong, unique passwords for a hundred services. The practical answer is password managers.

How Password Managers Work

The password manager derives an encryption key from your master password using a slow KDF (usually Argon2id or PBKDF2 with high iterations). This key encrypts a vault containing all your stored credentials. The vault can be stored locally or synced to a cloud service.

```
Master password: "purple-elephant-dances-wildly-on-saturday"
        |
        v
Key Derivation Function (Argon2id, m=256MiB, t=4, p=2)
        |
        v
256-bit encryption key
        |
        v
AES-256-GCM encrypted vault:
  gmail.com    -> kX9#mP2$vL7@nQ4dRt8!wY6...  (32 random chars)
  github.com   -> Ry5!wT8&jF3*bH6aKm2#pZ9...  (32 random chars)
  bank.com     -> pN1^cZ4#aM7%dK9eLx3$qW5...  (32 random chars)
  ... (hundreds more, each unique and random)
```
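The key-derivation step in the diagram can be sketched with the standard library. PBKDF2-HMAC-SHA256 stands in for Argon2id here because it ships with Python. The 600,000-iteration default follows OWASP's current guidance for PBKDF2-HMAC-SHA256, and `derive_vault_key` is a hypothetical name.

```python
import hashlib
import hmac
import os

def derive_vault_key(master_password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Hypothetical helper: stretch a master password into a 256-bit key.
    PBKDF2-HMAC-SHA256 is a stand-in for Argon2id in this sketch."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        iterations,
        dklen=32,   # 256-bit key for AES-256
    )

salt = os.urandom(16)   # stored alongside the encrypted vault; not secret
key = derive_vault_key("purple-elephant-dances-wildly-on-saturday", salt)
same = derive_vault_key("purple-elephant-dances-wildly-on-saturday", salt)
print(len(key), hmac.compare_digest(key, same))   # 32 True
```

The resulting 32-byte key would then feed an authenticated cipher such as AES-256-GCM, for example `AESGCM` from the `cryptography` package. The salt is stored next to the vault; only the master password must stay secret.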

Is that not a single point of failure? Yes, there is a concentration risk. But the trade-off is overwhelmingly positive. Without a password manager, people use weak, reused passwords across all their accounts. With a password manager, every password is truly random and unique. The master password should be a long passphrase (25+ characters) protected with MFA. The alternative -- memorizing a hundred unique strong passwords -- is not humanly possible. The risk profile of "one very strong master password protecting unique per-site passwords" is dramatically better than "one weak password reused everywhere."

**The LastPass Breach (2022-2023)**

In August 2022, an attacker compromised a LastPass developer's machine, then used stolen credentials to access LastPass's cloud storage. In December 2022, LastPass disclosed that encrypted customer vaults had been stolen.

The vaults were protected by each user's master password via PBKDF2 with 100,100 iterations (for accounts created after 2018 -- older accounts had as few as 5,000 iterations). If a user had a weak or short master password, the vault encryption could be brute-forced.

**What happened next:**
- Cryptocurrency theft attributed to cracked LastPass vaults reached $35+ million by late 2023
- Users with weak master passwords and stored cryptocurrency seed phrases were the primary targets
- LastPass was criticized for storing some vault metadata (URLs) unencrypted, allowing attackers to see which sites users had accounts on even without cracking the vault

**Lessons from this breach:**
- Your master password must be genuinely strong: a 25+ character passphrase with no common phrases
- Enable MFA on your password manager account
- Use a password manager with strong KDF parameters (Argon2id with high memory cost, not PBKDF2 with low iterations)
- Consider self-hosted solutions (Bitwarden/Vaultwarden) if you do not trust cloud providers with your vault
- Even in the worst case, a properly encrypted vault with a truly strong master password remains secure -- the PBKDF2 and Argon2id KDFs make brute force infeasible for strong passwords

Multi-Factor Authentication (MFA)

MFA adds additional authentication factors beyond passwords, creating defense in depth. Even if a password is compromised, the attacker needs to also compromise the second factor.

MFA Strength Hierarchy

| Rank | Factor Type | Mechanism | Phishing Resistant? | Weaknesses |
|------|-------------|-----------|---------------------|------------|
| 1 | Hardware security key | FIDO2/WebAuthn | Yes -- bound to origin | Physical theft, cost ($25-50/key) |
| 2 | Platform authenticator | Touch ID, Windows Hello | Yes -- bound to origin | Tied to specific device |
| 3 | Authenticator app | TOTP (6-digit codes) | No -- user can enter code on phishing site | Shared secret, phishable |
| 4 | Push notification | Approve/deny prompt | No -- "MFA fatigue" attacks | Social engineering, prompt bombing |
| 5 | SMS code | One-time code via text | No -- phishable | SIM swapping, SS7 interception |
| 6 | Email code | One-time code via email | No -- phishable | Email account compromise |

TOTP: How It Works

```python
# Time-based One-Time Password (RFC 6238)
import hmac
import hashlib
import struct
import time

def generate_totp(secret: bytes, time_step: int = 30, digits: int = 6) -> str:
    """Generate a TOTP code.

    Both the server and the authenticator app share the same secret.
    They independently compute the same code based on the current time.
    Codes change every `time_step` seconds (default: 30).
    """
    # Current time window
    counter = int(time.time()) // time_step

    # HMAC-SHA1 of the counter using the shared secret
    counter_bytes = struct.pack('>Q', counter)
    hmac_hash = hmac.new(secret, counter_bytes, hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226 Section 5.4)
    offset = hmac_hash[-1] & 0x0F
    truncated = struct.unpack('>I', hmac_hash[offset:offset + 4])[0]
    truncated &= 0x7FFFFFFF  # Mask the sign bit

    # Modulo to get the desired number of digits
    code = truncated % (10 ** digits)
    return str(code).zfill(digits)

# The shared secret is typically a 160-bit random value
# encoded as base32 and displayed as a QR code for the user to scan
```
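Verification on the server mirrors generation. The sketch below assumes two common conventions: a base32-encoded secret, and a tolerance of one 30-second step either side for clock drift. `totp_at` and `verify_totp` are illustrative names.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp_at(secret: bytes, timestamp: float, time_step: int = 30, digits: int = 6) -> str:
    """TOTP for an explicit timestamp (same algorithm as generate_totp)."""
    counter = int(timestamp) // time_step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

def verify_totp(secret_b32: str, code: str, now: Optional[float] = None,
                window: int = 1, time_step: int = 30, digits: int = 6) -> bool:
    """Accept the current step plus/minus `window` steps (assumed drift tolerance)."""
    if len(code) != digits or not (code.isascii() and code.isdigit()):
        return False
    # Authenticator apps often omit base32 padding; restore it before decoding
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    secret = base64.b32decode(padded)
    now = time.time() if now is None else now
    for drift in range(-window, window + 1):
        t = now + drift * time_step
        if t < 0:
            continue
        if hmac.compare_digest(totp_at(secret, t, time_step, digits), code):
            return True
    return False
```

Take the RFC 6238 test secret `12345678901234567890`. For it, `totp_at(secret, 59, digits=8)` returns `94287082`, matching the SHA-1 vector in the RFC's Appendix B. A production verifier should also remember the last accepted step per user, so that a code cannot be replayed within its window.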

FIDO2/WebAuthn: Phishing-Resistant Authentication

```mermaid
sequenceDiagram
    participant User as User
    participant Browser as Browser
    participant Server as Server (Relying Party)
    participant Auth as Authenticator<br/>(YubiKey / Touch ID)

    Note over User,Auth: Registration

    User->>Browser: Click "Register Security Key"
    Browser->>Server: Start registration
    Server->>Browser: Challenge + RP ID (origin-bound)
    Browser->>Auth: Create credential request<br/>(includes origin: example.com)
    Auth->>User: Touch the key / scan fingerprint
    User->>Auth: Physical confirmation
    Auth->>Auth: Generate key pair<br/>Private key stays on device<br/>Bound to origin "example.com"
    Auth->>Browser: Public key + attestation
    Browser->>Server: Public key + attestation
    Server->>Server: Store public key for user

    Note over User,Auth: Authentication (later)

    User->>Browser: Click "Sign in"
    Browser->>Server: Start authentication
    Server->>Browser: Challenge + RP ID
    Browser->>Auth: Sign challenge<br/>(includes origin: example.com)
    Auth->>User: Touch the key / scan fingerprint
    User->>Auth: Physical confirmation
    Auth->>Auth: Sign challenge with<br/>private key for "example.com"
    Auth->>Browser: Signed assertion
    Browser->>Server: Signed assertion
    Server->>Server: Verify signature<br/>with stored public key
    Server->>Browser: Authenticated!
```

Why WebAuthn defeats phishing: The credential is cryptographically bound to the site. Each credential is scoped to a relying party ID (the domain, such as example.com), and the data the authenticator signs covers both a hash of that RP ID and the page origin as recorded by the browser, which the page itself cannot forge. If an attacker creates a phishing site at g00gle.com, the authenticator will not find any credential for that origin and will not respond. Even if you navigate to the phishing site and click "sign in," the hardware key simply does nothing -- there is no credential to use. This is fundamentally different from TOTP or SMS codes, where you see a code and can type it into any website.
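The server enforces the same binding when it checks an assertion. Below is a minimal sketch of those relying-party checks, following the verification steps in the W3C WebAuthn specification: the client data must say `webauthn.get`, carry the challenge the server issued, and name the expected origin, and the first 32 bytes of the authenticator data must be the SHA-256 of the RP ID. The function names are hypothetical. The signature check itself is deliberately omitted because it needs a public-key crypto library, so in production a maintained WebAuthn library should do all of this.

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(data: str) -> bytes:
    """Decode unpadded base64url, the encoding WebAuthn uses for the challenge."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def check_assertion_binding(client_data_json: bytes, authenticator_data: bytes,
                            expected_challenge: bytes, expected_origin: str,
                            rp_id: str) -> None:
    """Raise ValueError unless the assertion is bound to this site and challenge.

    Only the binding checks are shown. The signature over
    authenticator_data + SHA-256(client_data_json) must still be verified
    against the stored public key before the login is accepted.
    """
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.get":
        raise ValueError("not an authentication assertion")
    challenge = b64url_decode(client_data.get("challenge", ""))
    if not hmac.compare_digest(challenge, expected_challenge):
        raise ValueError("challenge mismatch (stale or replayed assertion)")
    # The browser, not the page, fills in the origin; a phishing page cannot fake it
    if client_data.get("origin") != expected_origin:
        raise ValueError(f"unexpected origin {client_data.get('origin')!r}")
    # authenticatorData layout: rpIdHash (32 bytes) | flags (1) | signCount (4) | ...
    if len(authenticator_data) < 37:
        raise ValueError("authenticatorData too short")
    if not hmac.compare_digest(authenticator_data[:32],
                               hashlib.sha256(rp_id.encode()).digest()):
        raise ValueError("credential is scoped to a different RP ID")
    if not authenticator_data[32] & 0x01:  # UP flag: the user touched the key
        raise ValueError("user presence flag not set")
```

A relayed assertion from a page at `https://g00gle.com` fails the origin check even before the signature is examined, which is the server-side half of the phishing resistance described above.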

Google reported in 2018 that none of its more than 85,000 employees had been successfully phished on their work accounts since it began requiring hardware security keys in early 2017. Zero. That is the power of phishing-resistant authentication.


Implementing Secure Password Storage: A Checklist

When implementing password storage for a new application, follow this checklist in order of importance:

1. **Choose Argon2id** as your hashing algorithm (or bcrypt cost 12+ if Argon2 is unavailable)
2. **Configure parameters properly**: Argon2id with m=65536 (64 MiB), t=3, p=1 is a sensible starting point. Tune these so hashing takes 200-500ms on your server hardware.
3. **Salts are automatic**: Argon2 and bcrypt generate and embed unique salts automatically. Do not implement salting manually.
4. **Check against HIBP** at registration and password change using the k-anonymity API
5. **Check against common password lists** (the top 100,000 most common passwords from breach data)
6. **Enforce minimum length**: 12+ characters recommended (8 absolute minimum)
7. **Allow long passwords**: At least 64 characters, ideally 128+. Never truncate silently.
8. **Allow all characters**: Spaces, Unicode, emoji, special characters. Do not restrict character sets.
9. **Show a strength meter** using zxcvbn or a similar library that estimates actual crack time
10. **Support paste** in password fields -- disabling paste punishes password manager users
11. **Implement MFA**: TOTP at minimum, WebAuthn/FIDO2 preferred for high-value accounts
12. **Use constant-time comparison** for hash verification to prevent timing side-channel attacks
13. **Rate limit** login attempts by IP and by account with exponential backoff
14. **Log authentication events** (failures, successes, password changes, MFA events) for security monitoring
15. **Support progressive rehashing**: When a user logs in with an old hash algorithm, rehash with current parameters transparently
16. **Never log passwords**: Audit your logging to ensure passwords are never captured in access logs, application logs, or error reports
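Items 4 and 6 through 8 can be combined into a single registration-time check. Below is a minimal sketch, assuming Python 3.9+ and the public Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/{prefix}`), which returns lines of the form `SUFFIX:COUNT`. The helper names and the 12/256 length bounds are illustrative choices. Only the first five hex characters of the password's SHA-1 ever leave your server; the suffix comparison happens locally.

```python
import hashlib
import unicodedata
import urllib.request

MIN_LENGTH = 12    # item 6
MAX_LENGTH = 256   # item 7: generous cap that still bounds hashing work


def normalize_password(password: str) -> str:
    """NFKC-normalize so the same passphrase typed on different devices matches."""
    return unicodedata.normalize("NFKC", password)


def policy_problems(password: str) -> list[str]:
    """Length-only policy (item 8): no composition rules, every character allowed."""
    length = len(normalize_password(password))
    problems = []
    if length < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    if length > MAX_LENGTH:
        problems.append(f"must be at most {MAX_LENGTH} characters")
    return problems


def hibp_prefix_and_suffix(password: str) -> tuple[str, str]:
    """k-anonymity split: send the 5-char prefix, keep the 35-char suffix local."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(range_response: str, suffix: str) -> int:
    """Find our suffix in a range response ('SUFFIX:COUNT' per line); 0 if absent."""
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Network call to the range API; add retries and error handling in production."""
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    request = urllib.request.Request(url, headers={"User-Agent": "password-check-sketch"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

A registration handler would reject the password if `policy_problems` returns anything, or if `breach_count(fetch_range(prefix), suffix)` is above zero, and explain the reason to the user. Normalizing before both the breach check and the Argon2id hash keeps the two consistent.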

What You've Learned

This chapter covered the principles and practice of secure credential storage:

  • Plaintext and weakly protected password storage is catastrophic -- RockYou (plaintext), Adobe (reversible encryption), and LinkedIn (unsalted SHA-1) demonstrate the cascading damage from password exposure through credential stuffing, password reuse, and regulatory liability
  • General-purpose hashes (MD5, SHA-1, SHA-256) are too fast for password hashing; a modern GPU can try billions of SHA-256 hashes per second, cracking most passwords in minutes
  • Salting ensures identical passwords produce different hashes, defeating rainbow tables and batch cracking; use at least 128-bit random salts (or let your hashing library handle it automatically)
  • bcrypt, scrypt, and Argon2id are purpose-built password hashing functions with adjustable cost parameters; Argon2id is the current best choice for new systems
  • NIST SP 800-63B recommends against forced rotation, composition rules, and security questions -- policies that research shows cause users to choose weaker, more predictable passwords
  • Breach database checking via the HIBP API (using k-anonymity) catches actually compromised passwords without exposing your users' choices
  • Credential stuffing requires layered defenses: rate limiting, bot detection, breach checking, and MFA
  • Password managers are the practical answer to the human inability to memorize strong unique passwords; a strong master password with MFA provides better security than memorized passwords
  • Multi-factor authentication adds defense in depth; FIDO2/WebAuthn provides phishing-resistant authentication that has proven effective at scale (Google's zero phishing incidents after mandating hardware keys)

How long would it take to crack that competitor's MD5 database of 2.3 million passwords? With a modern GPU rig and the rockyou wordlist, the dictionary attack finishes in under a second. Brute-forcing all remaining 8-character passwords takes maybe an hour. If they had used Argon2id with proper parameters, each guess would take around 400 milliseconds instead of 0.00000005 milliseconds, roughly eight billion times slower. And because every hash carries its own salt, that cost is paid again for every account: on the same hardware, the one-hour brute force becomes something like 900,000 years for a single user (one hour times eight billion). Only passwords near the top of the rockyou list remain realistically within reach, one account at a time. The algorithm choice is the difference between trivial and impractical.
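The per-guess figures above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming the 400 ms Argon2id and 0.00000005 ms MD5 figures from the text, roughly 14.3 million unique entries in rockyou.txt, and an attacker limited to one CPU core. Real attackers parallelize across many cores and GPUs, though Argon2id's memory cost limits how far that goes.

```python
SECONDS_PER_DAY = 86_400

MD5_SECONDS_PER_GUESS = 0.00000005 / 1000   # 0.00000005 ms, the figure used above
ARGON2_SECONDS_PER_GUESS = 0.400            # a 400 ms Argon2id configuration
ROCKYOU_UNIQUE = 14_300_000                 # approx. unique entries in rockyou.txt
USERS = 2_300_000                           # size of the breached database

slowdown = ARGON2_SECONDS_PER_GUESS / MD5_SECONDS_PER_GUESS

# Unique salts mean the whole wordlist must be re-run for every single account
days_per_account = ROCKYOU_UNIQUE * ARGON2_SECONDS_PER_GUESS / SECONDS_PER_DAY
core_years_all_users = days_per_account * USERS / 365.25

print(f"Argon2id is {slowdown:.1e}x slower per guess")
print(f"rockyou vs one account, one core: {days_per_account:.0f} days")
print(f"rockyou vs every account: {core_years_all_users:,.0f} core-years")
```

Running the wordlist against one account costs about two months of a single core, and against all 2.3 million accounts over 400,000 core-years. Even spread across a thousand cores, that is still more than four centuries of compute. A targeted attack on one account with a top-of-list password can still succeed, which is why items 4 and 5 of the checklist reject known-breached and common passwords outright.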

Audit your password storage. Check the hashing algorithm, check the parameters, check that salts are unique per user. And for the love of everything secure, grep your codebase for any debug logging that might be capturing passwords in plaintext. The most perfectly configured Argon2id hash is worthless if someone added logger.debug(f"Login attempt: user={username}, password={password}") during development and never removed it.
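A grep catches the mistakes you can find, and a logging filter is a cheap safety net for the ones you miss. Below is a minimal sketch using the standard library's `logging.Filter`, assuming secrets appear in messages as `key=value` pairs. The key list and the `RedactSecretsFilter` name are hypothetical. Regex redaction is best-effort: it will not catch a password logged as a bare value, inside JSON, or one containing spaces or commas. It complements the audit rather than replacing it.

```python
import logging
import re

# Hypothetical list of sensitive keys; extend it to match what your code logs
SENSITIVE_KV = re.compile(r"(?i)\b(password|passwd|pwd|secret|token)=([^\s,&]+)")


class RedactSecretsFilter(logging.Filter):
    """Mask the value of sensitive key=value pairs before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies msg % args, so both logging styles work
        redacted = SENSITIVE_KV.sub(r"\1=[REDACTED]", message)
        if redacted != message:
            # Freeze the redacted text so no handler can re-expand the original args
            record.msg, record.args = redacted, ()
        return True  # never drop the record, only sanitize it


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactSecretsFilter())
    logging.basicConfig(level=logging.DEBUG, handlers=[handler])
    username, password = "alice", "hunter2"
    logging.getLogger("auth").debug(f"Login attempt: user={username}, password={password}")
```

Attach the filter to handlers rather than to a single logger: filters on a logger are only consulted for records logged directly to it, not for records propagated up from child loggers.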