Volatile, Type Punning, and Hardware Access Patterns

When your code talks directly to hardware, two things break: the compiler's assumptions about memory, and the type system's assumptions about data. This chapter covers the volatile keyword, type punning, strict aliasing, and the register access patterns used in embedded and driver code.

The Problem: The Compiler Is Too Smart

Compilers optimize aggressively. They assume that a variable's value does not change unless the code writes to it, and that a write whose result is never read afterward is dead and can be removed. Both assumptions are wrong when hardware is involved.

#include <stdio.h>
#include <stdint.h>

/* Simulating a hardware status register */
static uint32_t fake_hw_register = 0;

void wait_for_ready_broken(void) {
    uint32_t *status = &fake_hw_register;

    /* BUG: compiler sees *status never changes in this loop */
    /* At -O2, this becomes an infinite loop or is removed entirely */
    while ((*status & 0x01) == 0) {
        /* spin */
    }
}

int main(void) {
    printf("This function has a bug -- see the source.\n");
    /* Do NOT call wait_for_ready_broken -- it will hang */
    return 0;
}

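At -O2, the compiler may load *status once, see that nothing inside the loop stores to it, and hoist the read out of the loop. If that single read returns zero, the loop spins forever; because the loop body has no side effects, the compiler is also allowed to delete the loop outright. Either way, the compiler does not know that hardware can change the value behind its back.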

The volatile Keyword

volatile tells the compiler: do not optimize away accesses to this variable. Read it every time the code says to read it. Write it every time the code says to write it. In the order the code specifies.

#include <stdio.h>
#include <stdint.h>

static volatile uint32_t fake_hw_status = 0;

void wait_for_ready(void) {
    /* volatile forces a real memory read on every iteration */
    while ((fake_hw_status & 0x01) == 0) {
        /* spin -- compiler MUST re-read fake_hw_status each time */
    }
}

int main(void) {
    printf("volatile prevents the compiler from caching the read.\n");
    /* Still do not call wait_for_ready in this demo -- there is no */
    /* other thread or hardware changing the value.                 */
    return 0;
}

What volatile Does and Does NOT Do

volatile guarantees:

  • Every read in the source produces a load instruction
  • Every write in the source produces a store instruction
  • The compiler does not reorder volatile accesses relative to other volatile accesses

volatile does NOT guarantee:

  • Atomicity -- a 64-bit volatile read on a 32-bit CPU may tear
  • Ordering against non-volatile accesses, or ordering as seen by the CPU and bus (use memory barriers for that)
  • Thread safety (use _Atomic or stdatomic.h for threads)

Caution: volatile is NOT a substitute for atomic operations in multithreaded code. In C, use _Atomic. In Rust, use std::sync::atomic. volatile is for hardware registers and memory-mapped I/O only.

Memory-Mapped Hardware Registers

Real hardware appears as addresses in the CPU's memory map. Reading or writing those addresses talks to the device.

Physical memory map (simplified):
0x0000_0000 - 0x3FFF_FFFF   RAM
0x4000_0000 - 0x4000_00FF   UART registers
0x4000_0100 - 0x4000_01FF   GPIO registers
0x4000_0200 - 0x4000_02FF   Timer registers

A typical register block for a UART:

Offset  Register    Access
0x00    DATA        R/W    (read = receive, write = transmit)
0x04    STATUS      R      (bit 0 = TX ready, bit 1 = RX data available)
0x08    CONTROL     R/W    (bit 0 = enable, bit 1 = interrupt enable)
0x0C    BAUD_DIV    R/W    (baud rate divisor)

Accessing Registers in C

#include <stdio.h>
#include <stdint.h>

/* In real code, UART_BASE comes from device tree or platform header */
/* Here we simulate with a static array */
static uint32_t simulated_uart[4] = {0, 0x03, 0, 0};

#define UART_BASE    ((volatile uint32_t *)simulated_uart)
#define UART_DATA    (UART_BASE[0])
#define UART_STATUS  (UART_BASE[1])
#define UART_CONTROL (UART_BASE[2])
#define UART_BAUD    (UART_BASE[3])

#define STATUS_TX_READY  (1u << 0)
#define STATUS_RX_AVAIL  (1u << 1)
#define CTRL_ENABLE      (1u << 0)
#define CTRL_IRQ_EN      (1u << 1)

void uart_init(uint32_t baud_divisor) {
    UART_BAUD    = baud_divisor;
    UART_CONTROL = CTRL_ENABLE;
}

void uart_send(uint8_t byte) {
    while (!(UART_STATUS & STATUS_TX_READY)) {
        /* spin -- volatile ensures re-read */
    }
    UART_DATA = byte;
}

int main(void) {
    uart_init(26);  /* e.g., 115200 baud */

    /* STATUS already has TX_READY set in our simulation */
    uart_send('H');

    printf("Sent 'H' (0x%02X) to simulated UART\n",
           (unsigned)simulated_uart[0]);
    printf("CONTROL = 0x%08X\n", (unsigned)simulated_uart[2]);
    printf("BAUD    = %u\n", (unsigned)simulated_uart[3]);
    return 0;
}

Driver Prep: In the Linux kernel, you never access physical addresses directly. The kernel provides ioremap() to map physical addresses into kernel virtual space, and readl()/writel() to perform volatile MMIO reads/writes with proper barriers. The pattern is: void __iomem *base = ioremap(phys, size); then val = readl(base + OFFSET);.

Type Punning in C

Type punning means reinterpreting the bytes of one type as another. There are three ways to do it in C: a pointer cast (undefined behavior), a union (valid in C, but not in C++), and memcpy (always correct).

Method 1: Pointer Cast (Dangerous)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    float f = 3.14f;
    uint32_t *p = (uint32_t *)&f;   /* strict aliasing violation! */
    printf("float 3.14 as uint32: 0x%08X\n", (unsigned)*p);
    return 0;
}

This compiles, and it usually appears to work at low optimization levels. But reading *p violates the strict aliasing rule, so the program has undefined behavior.

Method 2: Union (Common, Practical)

#include <stdio.h>
#include <stdint.h>

union float_bits {
    float    f;
    uint32_t u;
};

int main(void) {
    union float_bits fb;
    fb.f = 3.14f;
    printf("float 3.14 as uint32: 0x%08X\n", (unsigned)fb.u);

    /* Inspect the IEEE 754 parts */
    uint32_t sign     = (fb.u >> 31) & 1;
    uint32_t exponent = (fb.u >> 23) & 0xFF;
    uint32_t mantissa = fb.u & 0x7FFFFF;
    printf("sign=%u  exp=%u  mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)exponent, (unsigned)mantissa);
    return 0;
}

Caution: Union type-punning is well-defined in C11 (6.5.2.3) but NOT in C++. If you write code that must compile as both C and C++, use memcpy.

Method 3: memcpy (Always Correct)

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 3.14f;
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    printf("float 3.14 as uint32: 0x%08X\n", (unsigned)u);

    /* Round-trip */
    float f2;
    memcpy(&f2, &u, sizeof(f2));
    printf("back to float: %f\n", f2);
    return 0;
}

Of the three, memcpy is the only method that is well-defined in both C and C++ at every optimization level, provided the two types have the same size. Modern compilers optimize small fixed-size memcpy calls into register moves, so there is typically no performance penalty.

The Strict Aliasing Rule

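The strict aliasing rule (C11 6.5 paragraph 7) says that an object's stored value may only be accessed through an lvalue of a type compatible with the object's effective type, a qualified or signed/unsigned variant of that type, an aggregate or union that contains such a type, or a character type.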

#include <stdio.h>
#include <stdint.h>

/* This violates strict aliasing: */
void bad_example(void) {
    int x = 42;
    float *fp = (float *)&x;  /* int* -> float*: VIOLATION */
    /* Reading *fp is undefined behavior */
    printf("%f\n", *fp);  /* compiler may return garbage at -O2 */
}

/* This is fine -- char* can alias anything: */
void ok_example(void) {
    int x = 42;
    unsigned char *cp = (unsigned char *)&x;
    for (size_t i = 0; i < sizeof(x); i++)
        printf("%02X ", cp[i]);
    printf("\n");
}

int main(void) {
    ok_example();
    /* bad_example();  -- do not rely on this */
    return 0;
}

GCC's -fstrict-aliasing (enabled at -O2 and above) lets the compiler assume the rule is followed. Violations cause real, baffling, optimization-dependent bugs.

$ gcc -O0 -o alias alias.c    # might "work"
$ gcc -O2 -o alias alias.c    # might break -- UB
$ gcc -O2 -fno-strict-aliasing -o alias alias.c  # disables the optimization

Caution: The Linux kernel compiles with -fno-strict-aliasing because kernel code routinely casts between pointer types. This is a pragmatic choice -- not a license to ignore aliasing in your own code.

Rust: read_volatile / write_volatile

Rust has no volatile keyword. Instead, it provides two functions in std::ptr:

use std::ptr;

fn main() {
    let mut hw_reg: u32 = 0;

    // Volatile write
    unsafe {
        ptr::write_volatile(&mut hw_reg as *mut u32, 0xDEAD_BEEF);
    }

    // Volatile read
    let val = unsafe {
        ptr::read_volatile(&hw_reg as *const u32)
    };

    println!("Register value: 0x{val:08X}");
}

Rust Note: read_volatile and write_volatile are unsafe because they dereference raw pointers: the compiler cannot check that the pointer is properly aligned and points to valid memory. The volatility is a property of the access, not the variable. This is more precise than C's model, where volatility is part of the type.

Modeling Hardware Registers in Rust

In idiomatic Rust, you wrap register access in a struct that encapsulates the unsafe volatile operations.

use std::ptr;

/// A read-write hardware register at a fixed memory address.
struct Register {
    addr: *mut u32,
}

impl Register {
    /// # Safety
    /// `addr` must point to a valid, mapped hardware register.
    unsafe fn new(addr: *mut u32) -> Self {
        Register { addr }
    }

    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }

    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }

    fn set_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old | mask);
    }

    fn clear_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old & !mask);
    }

    fn read_field(&self, mask: u32, shift: u32) -> u32 {
        (self.read() & mask) >> shift
    }

    fn write_field(&self, mask: u32, shift: u32, val: u32) {
        let old = self.read() & !mask;
        self.write(old | ((val << shift) & mask));
    }
}

// Demonstration using a simulated register
fn main() {
    let mut simulated_reg: u32 = 0;

    let reg = unsafe { Register::new(&mut simulated_reg as *mut u32) };

    reg.write(0x0000_0000);
    reg.set_bits(0x01);              // enable
    reg.write_field(0x0E, 1, 5);     // set mode field [3:1] = 5

    println!("Register = 0x{:08X}", reg.read());
    println!("Mode     = {}", reg.read_field(0x0E, 1));
}

Read-Only and Write-Only Registers

Some registers must not be written (status registers), and some must not be read (command/data FIFOs where reading has side effects). Encode this in the type system.

use std::ptr;
use std::marker::PhantomData;

struct ReadOnly;
struct WriteOnly;
struct ReadWrite;

struct Reg<MODE> {
    addr: *mut u32,
    _mode: PhantomData<MODE>,
}

impl<MODE> Reg<MODE> {
    unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr, _mode: PhantomData }
    }
}

impl Reg<ReadOnly> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    // No write method -- compile error if you try
}

impl Reg<WriteOnly> {
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
    // No read method
}

impl Reg<ReadWrite> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
}

fn main() {
    let mut status_mem: u32 = 0x42;
    let mut data_mem: u32 = 0;
    let mut ctrl_mem: u32 = 0;

    let status: Reg<ReadOnly>  = unsafe { Reg::new(&mut status_mem) };
    let data:   Reg<WriteOnly> = unsafe { Reg::new(&mut data_mem) };
    let ctrl:   Reg<ReadWrite> = unsafe { Reg::new(&mut ctrl_mem) };

    println!("Status = 0x{:02X}", status.read());
    // status.write(0);  // COMPILE ERROR -- ReadOnly has no write()

    data.write(0xFF);
    // data.read();  // COMPILE ERROR -- WriteOnly has no read()

    ctrl.write(0x01);
    println!("Ctrl = 0x{:02X}", ctrl.read());
}

Driver Prep: The Rust embedded ecosystem (cortex-m, svd2rust) generates register access code with exactly this pattern. The SVD file from the chip vendor describes which registers are read-only, write-only, or read-write, and the generated code enforces it at compile time.

Type Punning in Rust

Rust does have a union type, but reading a union field is unsafe, so it is rarely used for punning. For float/integer conversions the idiomatic tools are the to_bits/from_bits methods; transmute is the general-purpose (and most dangerous) option.

Using transmute

fn main() {
    let f: f32 = 3.14;
    let bits: u32 = unsafe { std::mem::transmute(f) };
    println!("f32 3.14 as u32: 0x{bits:08X}");

    let sign     = (bits >> 31) & 1;
    let exponent = (bits >> 23) & 0xFF;
    let mantissa = bits & 0x7F_FFFF;
    println!("sign={sign}  exp={exponent}  mantissa=0x{mantissa:06X}");

    // Round-trip
    let f2: f32 = unsafe { std::mem::transmute(bits) };
    println!("back to f32: {f2}");
}

Caution: transmute is extremely unsafe. The source and destination types must have the same size (checked at compile time) but the compiler cannot verify that the bit pattern is valid for the destination type. Prefer safer alternatives when they exist.

Using to_bits / from_bits (Preferred)

fn main() {
    let f: f32 = 3.14;
    let bits = f.to_bits();
    println!("f32 3.14 as u32: 0x{bits:08X}");

    let f2 = f32::from_bits(bits);
    println!("back to f32: {f2}");

    // For f64 <-> u64:
    let d: f64 = 2.718281828;
    let dbits = d.to_bits();
    println!("f64 as u64: 0x{dbits:016X}");
}

to_bits() and from_bits() are safe, stable, and produce the same code as transmute. Always prefer them for float/integer conversions.

The Register Access Pattern for Embedded/Driver Code

Putting it all together: a complete register block definition in the style of a hand-written embedded Rust driver.

use std::ptr;

/// UART register block starting at a base address.
struct Uart {
    base: *mut u8,
}

impl Uart {
    const DATA_OFF:    usize = 0x00;
    const STATUS_OFF:  usize = 0x04;
    const CONTROL_OFF: usize = 0x08;
    const BAUD_OFF:    usize = 0x0C;

    const STATUS_TX_READY: u32 = 1 << 0;
    const STATUS_RX_AVAIL: u32 = 1 << 1;
    const CTRL_ENABLE:     u32 = 1 << 0;

    /// # Safety
    /// `base` must point to a mapped UART register block.
    unsafe fn new(base: *mut u8) -> Self {
        Uart { base }
    }

    fn read_reg(&self, offset: usize) -> u32 {
        unsafe {
            ptr::read_volatile(self.base.add(offset) as *const u32)
        }
    }

    fn write_reg(&self, offset: usize, val: u32) {
        unsafe {
            ptr::write_volatile(self.base.add(offset) as *mut u32, val);
        }
    }

    fn init(&self, baud_divisor: u32) {
        self.write_reg(Self::BAUD_OFF, baud_divisor);
        self.write_reg(Self::CONTROL_OFF, Self::CTRL_ENABLE);
    }

    fn send_byte(&self, byte: u8) {
        while (self.read_reg(Self::STATUS_OFF) & Self::STATUS_TX_READY) == 0 {
            // spin
        }
        self.write_reg(Self::DATA_OFF, byte as u32);
    }

    fn try_recv(&self) -> Option<u8> {
        if (self.read_reg(Self::STATUS_OFF) & Self::STATUS_RX_AVAIL) != 0 {
            Some(self.read_reg(Self::DATA_OFF) as u8)
        } else {
            None
        }
    }
}

fn main() {
    // Simulate a register block in memory
    let mut regs = [0u32; 4];
    regs[1] = 0x01;  // STATUS: TX_READY set

    let uart = unsafe { Uart::new(regs.as_mut_ptr() as *mut u8) };
    uart.init(26);
    uart.send_byte(b'R');

    println!("DATA    = 0x{:08X}", regs[0]);  // 'R' = 0x52
    println!("CONTROL = 0x{:08X}", regs[2]);  // ENABLE = 0x01
    println!("BAUD    = {}", regs[3]);          // 26
}

This pattern -- base pointer plus offsets, volatile reads/writes, bit masks for fields -- is the foundation of every hardware driver, whether you write it in C or Rust.

Try It: Add an interrupt-enable bit to the CONTROL register (bit 1). Write a method enable_interrupts(&self) that sets bit 1 without clearing bit 0. This is the read-modify-write pattern that every driver uses.

Quick Knowledge Check

  1. What happens if you remove volatile from a hardware status register poll loop and compile at -O2?
  2. In C, why is memcpy preferred over pointer casts for type punning?
  3. Why does Rust make read_volatile / write_volatile unsafe, when C just uses a type qualifier?

Common Pitfalls

  • Using volatile for thread synchronization. It does not provide atomicity or memory ordering between threads. Use atomics.
  • Forgetting volatile on MMIO. The compiler may merge, reorder, or remove your register accesses. One missing volatile can make a device non-functional.
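  • Read-modify-write races. reg |= BIT is three steps: read, modify, write. If an interrupt handler or another CPU changes the same register between the read and the write, that change is overwritten and lost. In kernel code, guard the sequence with a spinlock (spin_lock_irqsave() when an interrupt handler touches the same register).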
  • Strict aliasing violations. Accessing an object through a pointer to an unrelated type is UB at every optimization level; the symptoms just tend to surface at -O2 and above. Use memcpy.
  • transmute misuse in Rust. If the bit pattern is invalid for the target type (e.g., transmuting 2u8 to bool), it is instant UB. Prefer to_bits() or TryFrom.
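  • Assuming volatile gives hardware-level ordering. volatile only stops the compiler from reordering volatile accesses relative to each other. It does not order them against non-volatile accesses, and it emits no CPU barriers. Use compiler/kernel barrier macros when ordering matters.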