Undefined Behavior: C's Silent Killer

Type this right now

// save as ub_demo.c — compile TWICE with different optimization levels
#include <stdio.h>
#include <limits.h>

int main() {
    int x = INT_MAX;
    printf("x = %d\n", x);
    printf("x + 1 = %d\n", x + 1);

    if (x + 1 > x) {
        printf("Overflow detected? x + 1 > x is TRUE\n");
    } else {
        printf("No overflow? x + 1 > x is FALSE\n");
    }
    return 0;
}
$ gcc -O0 -o ub_O0 ub_demo.c && ./ub_O0
x = 2147483647
x + 1 = -2147483648
No overflow? x + 1 > x is FALSE

$ gcc -O2 -o ub_O2 ub_demo.c && ./ub_O2
x = 2147483647
x + 1 = -2147483648
Overflow detected? x + 1 > x is TRUE

Read that again. The same code, compiled with different flags, produces opposite results. The -O0 build performs the wrapping addition and reports FALSE. The -O2 build reports TRUE: signed overflow is undefined behavior, so the compiler assumes it can never happen, concludes that x + 1 > x is always true, and deletes the comparison entirely.

This isn't a bug in GCC. This is exactly what the C standard permits. Welcome to undefined behavior.


What undefined behavior actually means

The C standard defines three categories of problematic code:

  • Implementation-defined: The behavior varies by compiler, but the compiler must document what it does (e.g., size of int, right-shifting signed numbers).
  • Unspecified: The compiler can choose among several options, but doesn't have to tell you (e.g., order of evaluation of function arguments).
  • Undefined: The standard imposes no requirements whatsoever.

That last one is what kills you. When your program has undefined behavior, the compiler is allowed to:

  • Produce the result you expected
  • Produce a different result
  • Crash
  • Delete your code
  • Make demons fly out of your nose (the community joke)
  • Optimize as if the UB can never happen

That last point is the real danger. Modern optimizing compilers actively exploit undefined behavior. They don't just ignore your bug — they use it to justify removing your safety checks.


Example 1: signed integer overflow

The C standard says signed integer overflow is undefined. The compiler exploits this:

int check_overflow(int x) {
    if (x + 1 > x)        // compiler: "signed overflow can't happen"
        return 1;          //           "so x + 1 is always > x"
    else                   //           "this branch is dead code"
        return 0;          //           "I'll remove it"
}

At -O2, GCC compiles this to:

check_overflow:
    mov eax, 1             ; just return 1, always
    ret

The entire if statement is gone. The compiler reasoned: "Signed overflow is UB. I'm allowed to assume the programmer never does UB. Therefore x + 1 is always greater than x. Therefore the function always returns 1."

The logic is airtight given the assumption. The assumption is wrong for INT_MAX. But the standard says a correct program never computes INT_MAX + 1, so the compiler is technically correct.

🧠 What do you think happens?

What if you change int to unsigned int in the example above? Unsigned overflow IS defined in C — it wraps around modulo 2^n. The compiler can no longer optimize away the check. Try it.


Example 2: null pointer "optimization"

#include <stdio.h>

void process(int *ptr) {
    int value = *ptr;       // dereference ptr — if ptr is NULL, this is UB
    if (ptr == NULL) {      // null check AFTER dereference
        printf("ptr is NULL!\n");
        return;
    }
    printf("value = %d\n", value);
}

A human reads this and thinks "the null check is there for safety." The compiler reads it differently:

Compiler's reasoning:
1. The first statement dereferences ptr
2. If ptr were NULL, that dereference would be UB
3. I'm allowed to assume UB doesn't happen
4. Therefore ptr is not NULL at the dereference
5. Therefore ptr is still not NULL at the null check
6. Therefore the NULL check always fails
7. I'll remove it

At -O2:

process:
    mov    eax, DWORD PTR [rdi]    ; dereference ptr (no null check)
    ; the if (ptr == NULL) block is GONE
    mov    esi, eax
    lea    rdi, .LC0               ; "value = %d\n"
    jmp    printf

The null check was deleted. If ptr is NULL, the program crashes with no diagnostic. The "safety" code you carefully wrote is not in the binary.

This is not a contrived example. The Linux kernel had exactly this bug in a network driver (CVE-2009-1897). A null check was removed by GCC because a dereference appeared earlier in the function.


Example 3: use-after-free, the optimizer's playground

#include <stdlib.h>
#include <stdio.h>

int main() {
    int *p = malloc(sizeof(int));
    *p = 42;
    free(p);

    // UB: accessing freed memory
    // The compiler may reuse p's register for something else,
    // or the allocator may recycle the memory.
    printf("value = %d\n", *p);
    return 0;
}

This might print 42, or 0, or 1735289204, or crash, depending on:

  • Optimization level
  • Allocator implementation
  • Whether another thread allocated between free and *p
  • Phase of the moon (only slightly joking)

The insidious part: it might work perfectly in testing and crash only in production. UB doesn't guarantee a crash. It guarantees nothing.


Example 4: uninitialized variables

int foo() {
    int x;      // uninitialized — reading it is UB
    return x;   // what value?
}

You might expect "whatever was on the stack." But the compiler is allowed to assume you never read uninitialized memory. This means:

#include <stdio.h>

void bar(void) {
    int x;
    if (x == 0) {
        printf("zero\n");
    }
    if (x != 0) {
        printf("nonzero\n");
    }
}

The compiler can print both, neither, or one of these messages. It's not required to be consistent — x doesn't have to have the same value in both checks. GCC and Clang have both been observed making different choices at different optimization levels.

💡 Fun Fact

The phrase "nasal demons" comes from a 1992 comp.std.c Usenet post. Someone argued that undefined behavior could cause "demons to fly out of your nose." It became a running joke, but it captures a real truth: the standard truly places NO constraints on what happens. The joke persists because the reality is hard to believe.


UB on Godbolt: see it live

You can see these optimizations yourself at godbolt.org. Paste the signed overflow example, select x86-64 GCC with -O2, and watch the assembly. The comparison vanishes.

Try these experiments:

  1. Change -O2 to -O0 — the comparison reappears
  2. Change int to unsigned int — the comparison stays even at -O2
  3. Add -fwrapv (tells GCC to treat signed overflow as wrapping) — the comparison stays

-fwrapv is GCC's escape hatch: it makes signed overflow defined (as two's complement wrapping). Some projects compile with -fwrapv, or the closely related -fno-strict-overflow, to eliminate this entire class of UB; the Linux kernel has shipped with these flags for years.


Rust's answer: no undefined behavior in safe code

Rust makes a bold guarantee: safe Rust has no undefined behavior.

This isn't a suggestion or a best practice. It's a hard property of the language. The compiler rejects programs that could exhibit UB, or inserts runtime checks where compile-time prevention isn't possible.

Signed integer overflow:

fn main() {
    let x: i32 = i32::MAX;
    let y = x + 1;   // In debug: PANIC at runtime
    println!("{}", y);
}
$ cargo run
thread 'main' panicked at 'attempt to add with overflow'

$ cargo run --release
# In release mode: wraps to -2147483648 (defined behavior, not UB)

Rust made a choice: in debug mode, overflow panics (catching the bug where it happens). In release mode, overflow wraps (for performance), and you can opt back into panics with overflow-checks = true in Cargo.toml. Either way, the behavior is defined. The compiler cannot exploit it for optimizations that break your code.

Null pointers:

// Rust doesn't have null pointers.
// Option<&T> is the equivalent, and you MUST check it:
fn process(ptr: Option<&i32>) {
    match ptr {
        Some(value) => println!("value = {}", value),
        None => println!("ptr is None!"),
    }
}

fn main() {
    process(Some(&42));  // prints "value = 42"
    process(None);       // prints "ptr is None!"
}
// The compiler enforces the check. You can't dereference without matching.

Uninitialized variables:

fn main() {
    let x: i32;
    println!("{}", x);  // COMPILE ERROR: use of possibly-uninitialized variable
}
error[E0381]: used binding `x` isn't initialized
 --> src/main.rs:3:20
  |
2 |     let x: i32;
  |         - binding declared here but left uninitialized
3 |     println!("{}", x);
  |                    ^ `x` used here but it isn't initialized

No guessing. No "whatever was on the stack." The compiler rejects it outright.


The unsafe boundary

Rust does allow operations that could cause UB — but only inside unsafe blocks:

fn main() {
    let x: i32;

    // This is safe Rust — compiler prevents UB
    // println!("{}", x);  // won't compile

    unsafe {
        // In unsafe, you can do things that might cause UB
        let ptr: *const i32 = 0x1234 as *const i32;
        // let val = *ptr;  // UB if address is invalid
    }
}

The unsafe keyword is:

  • Opt-in: You must explicitly ask for it
  • Explicit: It marks exactly which code has elevated risk
  • Auditable: You can grep for unsafe in any codebase
  • Contained: The responsibility for correctness is localized

The idea is that 95% of your code is safe Rust (no UB possible), and 5% is unsafe (UB is your problem). You audit the 5%. In C, you audit 100%.

🧠 What do you think happens?

If you have 10,000 lines of Rust and UB occurs, where do you look? You grep for unsafe — maybe 50 lines. In C, if you have 10,000 lines and UB occurs, where do you look? Everywhere. That's the practical value of the safe/unsafe boundary.


UB in unsafe Rust: still possible

unsafe Rust can still invoke UB. The most common causes:

fn main() {
unsafe {
    // 1. Dereferencing a raw pointer to invalid memory
    let ptr: *const i32 = std::ptr::null();
    let _val = *ptr;  // UB: null dereference

    // 2. Creating an invalid reference
    let _ref: &i32 = &*ptr;  // UB: references must never be null

    // 3. Data races
    // Two threads writing to the same memory without synchronization

    // 4. Breaking aliasing rules
    // Having &T and &mut T to the same data simultaneously

    // 5. Calling a function with wrong ABI or invalid arguments
}
}

Unsafe Rust is roughly as dangerous as C for the code inside the unsafe block. The difference is that the blast radius is contained — safe code can rely on invariants that unsafe code must uphold.


The complete comparison

The same classes of UB, side by side:

  • Signed overflow. C: INT_MAX + 1, and the compiler removes your checks. Rust: i32::MAX + 1 panics in debug, wraps in release (defined either way).
  • Null dereference. C: *NULL, and the compiler removes null checks. Rust: no null pointers; Option<&T> forces you to check.
  • Use-after-free. C: free(p); *p causes silent corruption. Rust: using p after drop(p) is a compile error.
  • Uninitialized read. C: int x; return x; yields anything, and the optimizer goes wild. Rust: let x: i32; x is a compile error.
  • Buffer overflow. C: a[11] on a 10-element array causes silent corruption. Rust: the same index panics with "index out of bounds".
  • Double-free. C: free(p); free(p) corrupts the allocator. Rust: a second drop(p) is a compile error.
  • Data race. C: two threads writing without synchronization cause torn reads and corruption. Rust: the &mut T aliasing rules make it a compile error.
  • Invalid enum value. C: the optimizer makes wrong branch assumptions. Rust: safe code can't create one; it takes unsafe.

Compiler flags that help

If you must write C, these flags make UB less likely to bite you:

# Compile-time warnings:
gcc -Wall -Wextra -Wpedantic -Werror

# Runtime sanitizers (debug builds):
gcc -fsanitize=undefined     # UBSan: catches UB at runtime
gcc -fsanitize=address       # ASan: catches memory bugs
gcc -fsanitize=thread        # TSan: catches data races

# UB-safe overflow handling:
gcc -fwrapv                  # Signed overflow wraps (like unsigned)
gcc -ftrapv                  # Signed overflow traps (abort)

UBSan in action:

$ gcc -fsanitize=undefined -o ub_demo ub_demo.c && ./ub_demo
ub_demo.c:12:28: runtime error: signed integer overflow:
  2147483647 + 1 cannot be represented in type 'int'

UBSan caught it. But remember: sanitizers only catch UB that actually executes. Code paths you don't test can still harbor UB. Rust's approach — preventing UB at compile time — catches it whether you test that path or not.

💡 Fun Fact

The LLVM optimizer (used by both Clang and rustc) has a concept called "poison values." When you compute something with UB (like signed overflow), the result is "poison" — a special marker that infects everything it touches. If a poison value reaches a branch condition, both branches become valid. If it reaches a store, the stored value is undefined. This is the formal mechanism by which UB propagates through your program.


🔧 Task

  1. Compile the signed overflow example from the start of this chapter at -O0 and -O2:

    gcc -O0 -o ub_O0 ub_demo.c && ./ub_O0
    gcc -O2 -o ub_O2 ub_demo.c && ./ub_O2
    

    Observe the different output. The compiler is not broken — it's exploiting UB.

  2. Add -fwrapv to the -O2 build:

    gcc -O2 -fwrapv -o ub_wrap ub_demo.c && ./ub_wrap
    

    Does the output match -O0 now? Why?

  3. Compile with UBSan:

    gcc -fsanitize=undefined -o ub_san ub_demo.c && ./ub_san
    

    Read the sanitizer's report carefully.

  4. Write the equivalent in Rust:

    fn main() {
        let x: i32 = i32::MAX;
        let y = x + 1;
        println!("{}", y);
    }

    Run in debug mode (cargo run) and release mode (cargo run --release). Compare. Neither is undefined behavior — one panics, the other wraps. Both are specified outcomes.

  5. Challenge: Find the null-pointer example on Godbolt. Compile with -O0 and -O2. At -O2, confirm the null check is deleted from the assembly. Then add -fno-delete-null-pointer-checks and verify it comes back.