C, Rust, and Linux Systems Programming

Learn C and Rust together while becoming a Linux systems programmer.


This book teaches C and Rust side by side — C first at each concept, then the Rust equivalent — while building you into a capable Linux user-space systems programmer. You'll go from writing your first hello world in both languages to building event-driven servers, manipulating bits in hardware registers, and talking directly to the kernel.

Who This Book Is For

You understand programming concepts — recursion, iteration, functions, data structures — but you haven't written C or Rust. You want to become a Linux systems programmer and you're preparing for protocol implementations and device driver work.

What You'll Learn

  • C and Rust fundamentals — types, control flow, functions, structs, enums
  • Pointers, memory, and ownership — from raw pointers to Rust's borrow checker
  • Bit-level programming — bitwise operations, masks, alignment, endianness, volatile access
  • Advanced patterns — data structures, generics, function pointers, state machines, error handling
  • The build pipeline — compilation stages, Make, CMake, Cargo, libraries, cross-compilation
  • Linux system programming — file descriptors, processes, signals, threads, IPC, networking
  • Performance — optimization, memory pools, zero-copy, atomics
  • The user-kernel boundary — /proc, /sys, ioctl, netlink, preparing for kernel space

How to Read This Book

Each chapter follows the same rhythm:

  1. Brief setup — what and why (not a lecture)
  2. C code — annotated, runnable, minimal
  3. Try it — modify something, see what happens
  4. Rust equivalent — what's the same, what's different
  5. Diagrams where they help
  6. Knowledge check — test your understanding
  7. Pitfalls — short, punchy list
  8. Move on

Every code snippet compiles and runs. Learn by doing.

Relationship to "How Programs Really Run"

That book explains the machine — CPU architecture, memory hierarchy, ELF format, virtual memory. It assumes C/Rust knowledge.

This book teaches C and Rust and focuses on programming the machine — system calls, IPC, signals, networking, bit-level manipulation.

They're complementary: read "How Programs Really Run" to understand what's underneath, read this book to learn to program with it.



Hello from C, Hello from Rust

Every systems programmer's journey starts the same way: make the machine say something. In this chapter you will write, compile, and run your first program in both C and Rust, and you will see how the two languages differ before a single line of logic appears.

Your First C Program

Create a file called hello.c:

/* hello.c -- the smallest useful C program */
#include <stdio.h>

int main(void)
{
    printf("Hello from C!\n");
    return 0;
}

Compile and run it:

$ gcc -Wall -o hello hello.c
$ ./hello
Hello from C!

Let us walk through every piece.

#include <stdio.h> -- This is a preprocessor directive. Before the compiler ever sees your code, a separate tool (the C preprocessor) pastes the entire contents of stdio.h into your file. That header declares printf and hundreds of other I/O functions. Without it, the compiler does not know what printf is.

int main(void) -- The entry point. The operating system's C runtime calls main after setting up the process. It returns int because the OS expects an exit code. void in the parameter list means "no arguments" (in C, empty parentheses mean "unspecified arguments", which is different).

printf("Hello from C!\n") -- Writes a string to standard output. The \n is a newline character. printf is a variadic function; it accepts a format string followed by zero or more arguments. We will use it heavily.

return 0; -- Exit code 0 means success. Any non-zero value signals an error. The shell stores this value in $?.

$ ./hello
Hello from C!
$ echo $?
0

The gcc flags you should always use

+-----------+--------------------------------+
| Flag      | Purpose                        |
+-----------+--------------------------------+
| -Wall     | Enable most warnings           |
| -Wextra   | Enable even more warnings      |
| -std=c17  | Use the C17 standard           |
| -pedantic | Reject non-standard extensions |
| -o name   | Name the output binary         |
+-----------+--------------------------------+

A solid default:

$ gcc -Wall -Wextra -std=c17 -pedantic -o hello hello.c

Driver Prep: Kernel modules are compiled with an even stricter set of warnings. Getting comfortable with -Wall -Wextra now saves pain later.

Try It: Change the return 0; to return 42;. Recompile, run, then check echo $?. What do you see?

Your First Rust Program

Create a file called hello.rs:

// hello.rs -- the smallest useful Rust program
fn main() {
    println!("Hello from Rust!");
}

Compile and run it:

$ rustc hello.rs
$ ./hello
Hello from Rust!

fn main() -- Rust's entry point. No return type is written because main implicitly returns () (the unit type, similar to void). No header includes, no preprocessor. The compiler already knows about println!.

println!("Hello from Rust!") -- The ! marks this as a macro, not a function. Macros in Rust are expanded at compile time. println! handles formatting, type checking of arguments, and writes to stdout with an appended newline.

There is no explicit return 0. Rust's main returns exit code 0 on success automatically. If you want to return a custom exit code:

// hello_exit.rs -- returning a custom exit code
use std::process::ExitCode;

fn main() -> ExitCode {
    println!("Hello from Rust!");
    ExitCode::from(0)
}

Rust Note: Rust does not have a preprocessor. There are no #include directives. Modules, use statements, and the compiler's built-in knowledge of the standard library replace that entire mechanism.

The Compilation Model

C and Rust compile your source code down to native machine code, but the journey is different.

C compilation pipeline

                  +-------------+
  hello.c  ----->| Preprocessor|----> hello.i  (expanded source)
                  +-------------+
                        |
                  +-------------+
                  |  Compiler   |----> hello.s  (assembly)
                  +-------------+
                        |
                  +-------------+
                  |  Assembler  |----> hello.o  (object file)
                  +-------------+
                        |
                  +-------------+
                  |   Linker    |----> hello    (executable)
                  +-------------+

You can see each stage:

$ gcc -E hello.c -o hello.i      # preprocess only
$ gcc -S hello.c -o hello.s      # compile to assembly
$ gcc -c hello.c -o hello.o      # assemble to object file
$ gcc    hello.o -o hello         # link

Rust compilation pipeline

                  +-----------+
  hello.rs  ----->|   rustc   |----> hello  (executable)
                  | (frontend |
                  |  + LLVM   |
                  |  backend) |
                  +-----------+

rustc handles everything in one invocation. Internally it parses, type-checks, performs borrow checking, generates LLVM IR, and invokes LLVM to produce machine code. There is no separate preprocessor or linker step visible to the user (though a linker is invoked behind the scenes).

Try It: Run gcc -S hello.c and open hello.s. Find the call instruction that invokes printf. On x86-64 Linux it will look something like call printf@PLT.

Cargo: Rust's Build System

For anything beyond a single file, Rust programmers use Cargo.

$ cargo new hello_project
     Created binary (application) `hello_project` package
$ cd hello_project
$ tree .
.
├── Cargo.toml
└── src
    └── main.rs

src/main.rs already contains:

fn main() {
    println!("Hello, world!");
}

Build and run:

$ cargo build
   Compiling hello_project v0.1.0
    Finished dev [unoptimized + debuginfo] target(s)
$ cargo run
Hello, world!

+-----------------------+---------------------------------------+
| Cargo command         | Purpose                               |
+-----------------------+---------------------------------------+
| cargo new name        | Create a new project                  |
| cargo build           | Compile (debug mode)                  |
| cargo build --release | Compile with optimizations            |
| cargo run             | Build and run                         |
| cargo check           | Type-check without producing a binary |
+-----------------------+---------------------------------------+

C has no official build system. Projects use Makefiles, CMake, Meson, or plain shell scripts. Here is a minimal Makefile for our hello program:

# Makefile
CC = gcc
CFLAGS = -Wall -Wextra -std=c17 -pedantic

hello: hello.c
	$(CC) $(CFLAGS) -o hello hello.c

clean:
	rm -f hello
$ make
gcc -Wall -Wextra -std=c17 -pedantic -o hello hello.c
$ make clean
rm -f hello

Driver Prep: The Linux kernel uses its own Kbuild Makefile system. Understanding basic Make targets (all, clean, modules) is essential for kernel module work.

printf vs println!

The two are deceptively similar but work very differently under the hood.

C: printf

/* printf_demo.c */
#include <stdio.h>

int main(void)
{
    int x = 42;
    double pi = 3.14159;
    char ch = 'A';

    printf("integer: %d\n", x);
    printf("float:   %.2f\n", pi);
    printf("char:    %c\n", ch);
    printf("hex:     0x%08x\n", x);

    return 0;
}
$ gcc -Wall -o printf_demo printf_demo.c && ./printf_demo
integer: 42
float:   3.14
char:    A
hex:     0x0000002a

printf format specifiers: %d (int), %f (double), %c (char), %s (string), %x (hex), %p (pointer), %zu (size_t). Use the wrong one and you get undefined behavior -- the compiler may warn you, but it is not required to.

Caution: Passing the wrong type to printf is undefined behavior. For example, printf("%d\n", 3.14) may print garbage -- or do anything else, because the behavior is undefined. The compiler cannot always catch this because printf is a variadic function with no type information in its signature.

Rust: println!

// println_demo.rs
fn main() {
    let x: i32 = 42;
    let pi: f64 = 3.14159;
    let ch: char = 'A';

    println!("integer: {}", x);
    println!("float:   {:.2}", pi);
    println!("char:    {}", ch);
    println!("hex:     {:#010x}", x);
}
$ rustc println_demo.rs && ./println_demo
integer: 42
float:   3.14
char:    A
hex:     0x0000002a

println! uses {} as the default placeholder. Formatting traits (Display, Debug) determine how a type is printed. The compiler checks at compile time that every argument matches a placeholder and implements the required trait.

Rust Note: You cannot pass the wrong type to println!. It is a compile-time error, not undefined behavior. The macro expands into code that the type checker validates before any binary is produced.

Return Codes and Error Signaling

Both languages use the process exit code to signal success or failure to the OS.

/* exit_codes.c */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <name>\n", argv[0]);
        return EXIT_FAILURE;  /* defined as 1 in stdlib.h */
    }
    printf("Hello, %s!\n", argv[1]);
    return EXIT_SUCCESS;       /* defined as 0 */
}
$ gcc -Wall -o exit_codes exit_codes.c
$ ./exit_codes
Usage: ./exit_codes <name>
$ echo $?
1
$ ./exit_codes Alice
Hello, Alice!
$ echo $?
0

The Rust equivalent:

// exit_codes.rs
use std::env;
use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();

    if args.len() < 2 {
        eprintln!("Usage: {} <name>", args[0]);
        process::exit(1);
    }
    println!("Hello, {}!", args[1]);
}
$ rustc exit_codes.rs
$ ./exit_codes
Usage: ./exit_codes <name>
$ echo $?
1
$ ./exit_codes Alice
Hello, Alice!
$ echo $?
0

eprintln! writes to stderr, just like fprintf(stderr, ...) in C.

The Edit-Compile-Run Cycle

Both languages follow the same workflow:

  +------+      +---------+      +-----+
  | Edit | ---> | Compile | ---> | Run |
  +------+      +---------+      +-----+
      ^                              |
      |         (fix bugs)           |
      +------------------------------+

In C the cycle is: edit hello.c, run gcc, run ./hello. In Rust the cycle is: edit src/main.rs, run cargo run (which compiles and runs).

Rust's cargo check lets you skip code generation entirely when you only want to see if your code type-checks. This is faster than a full build and useful during development.

$ cargo check
    Checking hello_project v0.1.0
    Finished dev [unoptimized + debuginfo] target(s)

Multiple Source Files

A real C project splits code across files. Here is a minimal two-file example.

/* greet.h */
#ifndef GREET_H
#define GREET_H

void greet(const char *name);

#endif
/* greet.c */
#include <stdio.h>
#include "greet.h"

void greet(const char *name)
{
    printf("Hello, %s!\n", name);
}
/* main.c */
#include "greet.h"

int main(void)
{
    greet("world");
    return 0;
}
$ gcc -Wall -c greet.c -o greet.o
$ gcc -Wall -c main.c  -o main.o
$ gcc greet.o main.o   -o hello
$ ./hello
Hello, world!

In Rust, you create a module:

// src/greet.rs
pub fn greet(name: &str) {
    println!("Hello, {}!", name);
}
// src/main.rs
mod greet;

fn main() {
    greet::greet("world");
}
$ cargo run
Hello, world!

No header files. No include guards. No separate compilation step. Cargo handles it.

Driver Prep: Kernel modules in C use header files extensively. The kernel headers (linux/module.h, linux/kernel.h, etc.) declare the interfaces you will call. Understanding #include and header guards is not optional.

Quick Knowledge Check

  1. What does return 0; in C's main tell the operating system?
  2. Why does println! have an exclamation mark?
  3. What gcc flag enables most compiler warnings?

Common Pitfalls

  • Forgetting \n in printf. Output may not appear until the buffer flushes. println! adds the newline automatically.
  • Empty parentheses in C. int main() means "unspecified parameters", not "no parameters". Write int main(void) to mean "no parameters".
  • Using rustc directly for multi-file projects. rustc can compile them, but it will not manage dependencies, paths, or incremental builds for you. Use cargo; keep rustc for single-file experiments.
  • Ignoring compiler warnings. Both gcc -Wall and rustc produce warnings for a reason. Treat warnings as errors during learning (-Werror in gcc, #![deny(warnings)] in Rust).
  • Mixing up printf format specifiers. %d for int, %ld for long, %zu for size_t. Getting them wrong is undefined behavior in C.

Types and Variables

Every value in a running program occupies bytes in memory. C and Rust both force you to think about types, but Rust does it with stricter rules and stronger guarantees. This chapter maps the C type system onto Rust's so you can translate between them without hesitation.

Integer Types in C

C gives you a menu of integer types whose exact sizes are platform-dependent.

/* int_types.c */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    char           c  = 'A';
    short          s  = 32000;
    int            i  = 2000000000;
    long           l  = 2000000000L;
    long long      ll = 9000000000000000000LL;
    unsigned int   u  = 4000000000U;

    printf("char:      %d  (size: %zu)\n", c,  sizeof(c));
    printf("short:     %d  (size: %zu)\n", s,  sizeof(s));
    printf("int:       %d  (size: %zu)\n", i,  sizeof(i));
    printf("long:      %ld (size: %zu)\n", l,  sizeof(l));
    printf("long long: %lld (size: %zu)\n", ll, sizeof(ll));
    printf("unsigned:  %u  (size: %zu)\n", u,  sizeof(u));

    return 0;
}
$ gcc -Wall -o int_types int_types.c && ./int_types
char:      65  (size: 1)
short:     32000  (size: 2)
int:       2000000000  (size: 4)
long:      2000000000 (size: 8)
long long: 9000000000000000000 (size: 8)
unsigned:  4000000000  (size: 4)

The C standard only guarantees minimum sizes. int is at least 16 bits, long at least 32, long long at least 64. For exact-width integers, use <stdint.h>: int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t. For object sizes, use size_t (unsigned, pointer-width).

Driver Prep: The Linux kernel uses fixed-width types extensively: u8, u16, u32, u64, s8, s16, s32, s64. These map directly to the <stdint.h> types. Always prefer fixed-width types in systems code.

Integer Types in Rust

Rust makes the bit width part of the type name. No ambiguity.

// int_types.rs
fn main() {
    let a: i8   = -128;
    let b: u8   = 255;
    let c: i16  = -32_768;
    let d: u16  = 65_535;
    let e: i32  = -2_147_483_647;
    let f: u32  = 4_294_967_295;
    let g: i64  = -9_223_372_036_854_775_807;
    let h: u64  = 18_446_744_073_709_551_615;
    let s: usize = 1024;

    println!("i8:    {}", a);
    println!("u8:    {}", b);
    println!("i32:   {}", e);
    println!("u64:   {}", h);
    println!("usize: {} (size: {} bytes)", s, std::mem::size_of::<usize>());
}

Note the underscores in numeric literals (65_535). Rust allows them for readability.

Type size comparison

+------------+----------+------------------+
| C type     | Rust     | Size (bytes)     |
+------------+----------+------------------+
| char       | i8 / u8  | 1                |
| short      | i16      | 2                |
| int        | i32      | 4                |
| long       | i64*     | 8 (on LP64)      |
| long long  | i64      | 8                |
| size_t     | usize    | pointer-width    |
| ptrdiff_t  | isize    | pointer-width    |
+------------+----------+------------------+
  * C long is 4 bytes on Windows (LLP64), 8 on Linux (LP64).
    Rust i64 is always 8 bytes.

Rust Note: Rust has no type whose size varies by platform except usize and isize, which are always pointer-width. Everything else is fixed. This eliminates an entire class of portability bugs.

Try It: In both C and Rust, print sizeof(long) / size_of::<i64>() and confirm the sizes on your machine.

Floating-Point Types

C provides float (32-bit) and double (64-bit). Rust provides f32 and f64.

/* floats.c */
#include <stdio.h>

int main(void)
{
    float  f = 3.14f;
    double d = 3.141592653589793;

    printf("float:  %.7f  (size: %zu)\n", f, sizeof(f));
    printf("double: %.15f (size: %zu)\n", d, sizeof(d));

    return 0;
}
// floats.rs
fn main() {
    let f: f32 = 3.14;
    let d: f64 = 3.141592653589793;

    println!("f32: {:.7}  (size: {})", f, std::mem::size_of::<f32>());
    println!("f64: {:.15} (size: {})", d, std::mem::size_of::<f64>());
}

Both follow IEEE 754. The behavior is identical at the bit level.

Characters

This is where C and Rust diverge sharply.

C: char is one byte

/* chars_c.c */
#include <stdio.h>

int main(void)
{
    char c = 'A';
    printf("char: %c, value: %d, size: %zu\n", c, c, sizeof(c));
    return 0;
}

C's char is a single byte. It can hold any 8-bit value; the ASCII characters occupy 0-127. Whether plain char is signed or unsigned is implementation-defined.

Rust: char is four bytes (a Unicode scalar value)

// chars_rust.rs
fn main() {
    let c: char = 'A';
    let heart: char = '\u{2764}';
    let kanji: char = '\u{6F22}';

    println!("char: {}, value: {}, size: {}", c, c as u32, std::mem::size_of::<char>());
    println!("heart: {}, value: U+{:04X}", heart, heart as u32);
    println!("kanji: {}, value: U+{:04X}", kanji, kanji as u32);
}

Rust Note: Rust's char is a Unicode scalar value and always occupies 4 bytes. This is fundamentally different from C's 1-byte char. In Rust, u8 is the equivalent of C's char when you want a raw byte.

C char 'A':
  +----+
  | 41 |    1 byte
  +----+

Rust char 'A':
  +----+----+----+----+
  | 41 | 00 | 00 | 00 |    4 bytes (little-endian, Unicode scalar)
  +----+----+----+----+

Booleans

C: Booleans are integers

/* bools_c.c */
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    bool a = true;
    bool b = false;
    int  n = 42;

    printf("true:  %d (size: %zu)\n", a, sizeof(a));
    printf("false: %d (size: %zu)\n", b, sizeof(b));

    if (n) {
        printf("%d is truthy in C\n", n);
    }
    return 0;
}

In C, true is 1, false is 0, and any integer can be used where a boolean is expected. Zero is false; everything else is true.

Rust: bool is a distinct type

// bools_rust.rs
fn main() {
    let a: bool = true;
    let b: bool = false;
    let n: i32 = 42;

    println!("true:  {} (size: {})", a, std::mem::size_of::<bool>());
    println!("false: {} (size: {})", b, std::mem::size_of::<bool>());

    // if n { } // ERROR: expected `bool`, found `i32`
    if n != 0 {
        println!("{} is non-zero", n);
    }
}

Caution: C's implicit integer-to-boolean conversion is a source of bugs. if (x = 0) (assignment, not comparison) sets x to 0, evaluates to false, and compiles without complaint unless warnings are enabled. Rust rejects this at compile time.

Constants

C: const and #define

/* constants_c.c */
#include <stdio.h>

#define MAX_BUFFER 1024

static const int MAX_RETRIES = 5;

int main(void)
{
    printf("buffer size: %d\n", MAX_BUFFER);
    printf("max retries: %d\n", MAX_RETRIES);
    return 0;
}

#define performs textual substitution -- no type, no scope, no address. const creates a typed, scoped value.

Rust: const and static

// constants_rust.rs
const MAX_BUFFER: usize = 1024;       // compile-time constant, inlined
static MAX_RETRIES: i32 = 5;          // fixed address in memory

fn main() {
    println!("buffer size: {}", MAX_BUFFER);
    println!("max retries: {}", MAX_RETRIES);
}

+-------------+---------------+--------------+----------+
| Keyword     | Compile-time? | Has address? | Mutable? |
+-------------+---------------+--------------+----------+
| C #define   | Preprocessor  | No           | N/A      |
| C const     | No            | Yes          | No       |
| Rust const  | Yes           | No (inlined) | No       |
| Rust static | No            | Yes          | No*      |
+-------------+---------------+--------------+----------+

(*) static mut exists but is unsafe to access.

Rust Note: Rust's const is evaluated at compile time and inlined at every use site. Rust's static lives at a fixed address for the entire program lifetime.

Variable Declaration and Mutability

C: mutable by default

/* mutability_c.c */
#include <stdio.h>

int main(void)
{
    int x = 10;
    x = 20;           /* fine -- variables are mutable by default */

    const int y = 30;
    /* y = 40; */      /* error: assignment of read-only variable */

    printf("x = %d, y = %d\n", x, y);
    return 0;
}

Rust: immutable by default

// mutability_rust.rs
fn main() {
    let x = 10;
    // x = 20;        // error: cannot assign twice to immutable variable

    let mut y = 30;
    y = 40;            // fine -- declared with `mut`

    println!("x = {}, y = {}", x, y);
}

This is the opposite default. In C, you opt into immutability with const. In Rust, you opt into mutability with mut.

Try It: In Rust, try reassigning an immutable variable. Read the compiler error message. Rust error messages are famously helpful -- get used to reading them.

Type Casting and Coercion

C: implicit and explicit casts

/* casting_c.c */
#include <stdio.h>

int main(void)
{
    int    i = 42;
    double d = i;         /* implicit: int -> double */
    int    j = (int)3.99; /* explicit: double -> int (truncation!) */
    char   c = 300;       /* implicit: int -> char (overflow!) */

    printf("d = %f\n", d);
    printf("j = %d\n", j);
    printf("c = %d\n", c);

    return 0;
}

Caution: C silently narrows values. char c = 300 wraps around without error. The compiler may warn with -Wall, but it compiles.

Rust: explicit only

// casting_rust.rs
fn main() {
    let i: i32 = 42;
    let d: f64 = i as f64;          // explicit: i32 -> f64
    let j: i32 = 3.99_f64 as i32;  // explicit: f64 -> i32 (truncates to 3)
    let c: u8 = 300_u16 as u8;     // explicit: wraps to 44

    println!("d = {}", d);
    println!("j = {}", j);
    println!("c = {}", c);
}

Rust requires as for every numeric conversion. No implicit narrowing or widening.

sizeof in C, size_of in Rust

/* sizeof_demo.c */
#include <stdio.h>

int main(void)
{
    printf("char:      %zu bytes\n", sizeof(char));
    printf("int:       %zu bytes\n", sizeof(int));
    printf("long:      %zu bytes\n", sizeof(long));
    printf("double:    %zu bytes\n", sizeof(double));
    printf("void*:     %zu bytes\n", sizeof(void *));

    return 0;
}
// sizeof_demo.rs
use std::mem::size_of;

fn main() {
    println!("i8:    {} bytes", size_of::<i8>());
    println!("i32:   {} bytes", size_of::<i32>());
    println!("i64:   {} bytes", size_of::<i64>());
    println!("f64:   {} bytes", size_of::<f64>());
    println!("bool:  {} bytes", size_of::<bool>());
    println!("char:  {} bytes", size_of::<char>());
    println!("usize: {} bytes", size_of::<usize>());
}

Complete type size reference (64-bit Linux)

+---------------+-------+---------------+-------+
| C type        | Bytes | Rust type     | Bytes |
+---------------+-------+---------------+-------+
| char          |   1   | i8 / u8       |   1   |
| short         |   2   | i16 / u16     |   2   |
| int           |   4   | i32 / u32     |   4   |
| long          |   8   | i64 / u64     |   8   |
| long long     |   8   | i64 / u64     |   8   |
| (none)        |  16   | i128 / u128   |  16   |
| float         |   4   | f32           |   4   |
| double        |   8   | f64           |   8   |
| _Bool         |   1   | bool          |   1   |
| char          |   1   | char          |   4   |
| void*         |   8   | *const T      |   8   |
| size_t        |   8   | usize         |   8   |
+---------------+-------+---------------+-------+

Integer Overflow

C: undefined behavior for signed, wrapping for unsigned

/* overflow_c.c */
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    printf("UINT_MAX:     %u\n", u);
    printf("UINT_MAX + 1: %u\n", u + 1);  /* wraps to 0 -- defined behavior */

    int s = INT_MAX;
    printf("INT_MAX:      %d\n", s);
    /* s + 1 is UNDEFINED BEHAVIOR for signed integers */
    printf("INT_MAX + 1:  %d\n", s + 1);

    return 0;
}

Caution: Signed integer overflow in C is undefined behavior. The compiler is allowed to assume it never happens, and it may optimize your code in surprising ways based on that assumption.

Rust: panics in debug, wraps in release

// overflow_rust.rs
fn main() {
    let u: u32 = u32::MAX;

    // Use wrapping_add for explicit wrapping:
    let v = u.wrapping_add(1);
    println!("u32::MAX wrapping_add(1) = {}", v);

    // Use checked_add to detect overflow:
    match u.checked_add(1) {
        Some(val) => println!("result: {}", val),
        None      => println!("overflow detected!"),
    }

    // Use saturating_add to clamp at max:
    let w = u.saturating_add(1);
    println!("u32::MAX saturating_add(1) = {}", w);
}
$ rustc overflow_rust.rs && ./overflow_rust
u32::MAX wrapping_add(1) = 0
overflow detected!
u32::MAX saturating_add(1) = 4294967295

Rust Note: Rust gives you four explicit choices for overflow: wrapping_*, checked_*, saturating_*, and overflowing_*. In debug builds, the standard + operator panics on overflow. In release builds, it wraps. There is no undefined behavior.

Quick Knowledge Check

  1. What is the size of char in C versus char in Rust?
  2. What happens when you add 1 to INT_MAX in C? In Rust (debug mode)?
  3. How do you declare a mutable variable in Rust?

Common Pitfalls

  • Assuming int is always 32 bits. The C standard only guarantees at least 16. Use int32_t when you need exactly 32 bits.
  • Forgetting that C's char signedness is implementation-defined. On ARM, char is unsigned. On x86, it is signed. Use signed char or unsigned char to be explicit.
  • Using %d to print size_t. Use %zu. The wrong format specifier is undefined behavior.
  • Implicit narrowing in C. Assigning a long to an int silently truncates. Rust forces you to write as i32.
  • Forgetting mut in Rust. Variables are immutable by default. The compiler error is clear, but it catches newcomers off guard.

Control Flow

Programs need to make decisions and repeat work. C and Rust share the same fundamental constructs but differ in important ways around type safety, exhaustiveness, and what counts as a boolean. This chapter covers every branching and looping construct you will use in systems programming.

if / else

C

/* if_else.c */
#include <stdio.h>

int main(void)
{
    int temp = 37;

    if (temp > 100) {
        printf("boiling\n");
    } else if (temp > 0) {
        printf("liquid\n");
    } else {
        printf("frozen\n");
    }

    return 0;
}

C's if condition is any expression. Zero is false, non-zero is true. Parentheses around the condition are required.

Rust

// if_else.rs
fn main() {
    let temp = 37;

    if temp > 100 {
        println!("boiling");
    } else if temp > 0 {
        println!("liquid");
    } else {
        println!("frozen");
    }
}

Parentheses around the condition are allowed in Rust, but idiomatic style omits them. The condition must have type bool. You cannot write if temp { ... } when temp is an integer.

if as an expression in Rust

// if_expression.rs
fn main() {
    let temp = 37;
    let state = if temp > 100 {
        "boiling"
    } else if temp > 0 {
        "liquid"
    } else {
        "frozen"
    };
    println!("water is {}", state);
}

Both arms must return the same type.

Rust Note: Because if is an expression, Rust has no need for a ternary operator. let x = if cond { a } else { b }; replaces C's x = cond ? a : b;.

Truthiness: 0 Is False in C

C: integers as booleans

/* truthiness.c */
#include <stdio.h>

int main(void)
{
    int x = 0;
    int y = 42;
    int *p = NULL;

    if (x) printf("x is truthy\n");
    else   printf("x is falsy\n");

    if (y) printf("y is truthy\n");
    else   printf("y is falsy\n");

    if (p) printf("p is non-null\n");
    else   printf("p is null\n");

    return 0;
}

In C: 0, 0.0, NULL, and '\0' are all false. Everything else is true.

Rust: only bool is bool

// truthiness_rust.rs
fn main() {
    let x: i32 = 0;
    // if x { } // ERROR: expected `bool`, found `i32`

    if x == 0 {
        println!("x is zero");
    }

    let p: Option<i32> = None;
    if p.is_none() {
        println!("p is None");
    }
}

Caution: In C, if (x = 0) assigns 0 to x and evaluates to false. This is a common bug that compilers warn about but do not reject. In Rust, if x = 0 is a type error because assignment returns (), not bool.

Try It: In C, write if (x = 5) (single equals) inside an if statement. Compile with -Wall. Read the warning. Then try the same in Rust.

The Ternary Operator (C Only)

/* ternary.c */
#include <stdio.h>

int main(void)
{
    int x = 7;
    const char *parity = (x % 2 == 0) ? "even" : "odd";
    printf("%d is %s\n", x, parity);

    int sign = (x > 0) ? 1 : (x < 0) ? -1 : 0;
    printf("sign of %d is %d\n", x, sign);

    return 0;
}

Rust replacement -- if/else expressions:

// ternary_rust.rs
fn main() {
    let x = 7;
    let parity = if x % 2 == 0 { "even" } else { "odd" };
    println!("{} is {}", x, parity);
}

while Loops

/* while_loop.c */
#include <stdio.h>

int main(void)
{
    int i = 0;
    while (i < 5) {
        printf("%d ", i);
        i++;
    }
    printf("\n");
    return 0;
}
// while_loop.rs
fn main() {
    let mut i = 0;
    while i < 5 {
        print!("{} ", i);
        i += 1;
    }
    println!();
}

Rust has no ++ or -- operators. Use i += 1 and i -= 1.

do-while (C Only)

/* do_while.c */
#include <stdio.h>

int main(void)
{
    int i = 10;
    do {
        printf("%d ", i);
        i++;
    } while (i < 5);  /* condition is false, but body ran once */
    printf("\n");
    return 0;
}

Rust has no do-while. The idiomatic replacement uses loop:

// do_while_rust.rs
fn main() {
    let mut i = 10;
    loop {
        print!("{} ", i);
        i += 1;
        if i >= 5 { break; }
    }
    println!();
}

for Loops

C

/* for_loop.c */
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 5; i++) {
        printf("%d ", i);
    }
    printf("\n");

    int nums[] = {10, 20, 30, 40, 50};
    size_t len = sizeof(nums) / sizeof(nums[0]);
    for (size_t i = 0; i < len; i++) {
        printf("%d ", nums[i]);
    }
    printf("\n");

    return 0;
}

Rust

// for_loop.rs
fn main() {
    for i in 0..5 {
        print!("{} ", i);
    }
    println!();

    let nums = [10, 20, 30, 40, 50];
    for n in &nums {
        print!("{} ", n);
    }
    println!();

    for (i, n) in nums.iter().enumerate() {
        print!("[{}]={} ", i, n);
    }
    println!();
}
C for loop anatomy:
  for (init; condition; update) { body }

Rust for loop anatomy:
  for variable in iterator { body }

Try It: In Rust, change 0..5 to 0..=5 (inclusive range). What is the difference in output?

loop: Rust's Infinite Loop

// loop_demo.rs
fn main() {
    let mut count = 0;
    let result = loop {
        count += 1;
        if count == 10 {
            break count * 2;  // loop can return a value via break
        }
    };
    println!("result = {}", result);
}

In C, you write while (1) or for (;;):

/* infinite_loop.c */
#include <stdio.h>

int main(void)
{
    int count = 0;
    int result;
    for (;;) {
        count++;
        if (count == 10) {
            result = count * 2;
            break;
        }
    }
    printf("result = %d\n", result);
    return 0;
}

Driver Prep: Kernel code is full of infinite loops. The main kernel thread never returns. Device polling loops use while (1) with break on status changes. Rust's loop maps directly to this pattern.

break and continue

Both languages support break (exit the loop) and continue (skip to next iteration). The semantics are identical.

/* break_continue.c */
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 10; i++) {
        if (i == 3) continue;
        if (i == 7) break;
        printf("%d ", i);
    }
    printf("\n");
    return 0;
}
// break_continue.rs
fn main() {
    for i in 0..10 {
        if i == 3 { continue; }
        if i == 7 { break; }
        print!("{} ", i);
    }
    println!();
}

Both print: 0 1 2 4 5 6

Loop Labels (Rust)

Rust allows labeling loops and breaking/continuing to an outer loop by name.

// loop_labels.rs
fn main() {
    'outer: for i in 0..5 {
        for j in 0..5 {
            if i + j == 6 {
                println!("breaking outer at i={}, j={}", i, j);
                break 'outer;
            }
            if j == 3 {
                continue 'outer;
            }
            print!("({},{}) ", i, j);
        }
    }
    println!("done");
}

C has no labeled break or continue. The typical workaround is a flag variable (a goto past the outer loop is the other common option):

/* break_outer.c */
#include <stdio.h>

int main(void)
{
    int done = 0;
    for (int i = 0; i < 5 && !done; i++) {
        for (int j = 0; j < 5; j++) {
            if (i + j == 6) {
                printf("breaking at i=%d, j=%d\n", i, j);
                done = 1;
                break;
            }
        }
    }
    printf("done\n");
    return 0;
}

switch (C) vs match (Rust)

This is where the languages diverge most in control flow.

C: switch

/* switch_demo.c */
#include <stdio.h>

int main(void)
{
    int day = 3;

    switch (day) {
    case 1: printf("Monday\n");    break;
    case 2: printf("Tuesday\n");   break;
    case 3: printf("Wednesday\n"); break;
    case 4: printf("Thursday\n");  break;
    case 5: printf("Friday\n");    break;
    case 6: printf("Saturday\n");  break;
    case 7: printf("Sunday\n");    break;
    default: printf("Invalid\n");  break;
    }

    return 0;
}

Caution: Forgetting break in a C switch causes fallthrough -- execution continues into the next case. This is a legendary source of bugs.

Rust: match

// match_demo.rs
fn main() {
    let day = 3;

    let name = match day {
        1 => "Monday",
        2 => "Tuesday",
        3 => "Wednesday",
        4 => "Thursday",
        5 => "Friday",
        6 | 7 => "Weekend",
        _ => "Invalid",
    };
    println!("day {} is {}", day, name);
}

Key differences: no fallthrough, exhaustive (compiler rejects non-exhaustive matches), and match is an expression that returns a value.

Pattern matching with ranges and guards

// match_patterns.rs
fn main() {
    let score = 85;

    let grade = match score {
        90..=100 => "A",
        80..=89  => "B",
        70..=79  => "C",
        60..=69  => "D",
        0..=59   => "F",
        _        => "Invalid",
    };
    println!("score {} = grade {}", score, grade);

    let temp = 37;
    let status = match temp {
        t if t > 100 => "boiling",
        t if t == 37 => "body temperature",
        t if t > 0   => "cool",
        _            => "freezing",
    };
    println!("{}C is {}", temp, status);
}

Destructuring in match

// match_destructure.rs
fn main() {
    let point = (3, -5);

    match point {
        (0, 0)     => println!("origin"),
        (x, 0)     => println!("on x-axis at x={}", x),
        (0, y)     => println!("on y-axis at y={}", y),
        (x, y)     => println!("point at ({}, {})", x, y),
    }
}

Rust Note: Rust's match can destructure tuples, structs, and enums, bind variables, use guards, and combine patterns. It is one of Rust's most distinctive features.

Combining Constructs: FizzBuzz

C

/* fizzbuzz.c */
#include <stdio.h>

int main(void)
{
    for (int i = 1; i <= 20; i++) {
        if (i % 15 == 0)      printf("FizzBuzz\n");
        else if (i % 3 == 0)  printf("Fizz\n");
        else if (i % 5 == 0)  printf("Buzz\n");
        else                   printf("%d\n", i);
    }
    return 0;
}

Rust

// fizzbuzz.rs
fn main() {
    for i in 1..=20 {
        match (i % 3, i % 5) {
            (0, 0) => println!("FizzBuzz"),
            (0, _) => println!("Fizz"),
            (_, 0) => println!("Buzz"),
            _      => println!("{}", i),
        }
    }
}

The Rust version uses tuple matching to handle all four cases cleanly.

Quick Knowledge Check

  1. What does if (x = 5) do in C? What happens in Rust?
  2. Can you use an integer as a condition in a Rust if statement?
  3. What happens if you omit the _ wildcard in a Rust match on an i32?

Common Pitfalls

  • Missing break in C switch. Every case falls through without it. Use -Wimplicit-fallthrough to catch this.
  • Using = instead of == in C conditions. if (x = 5) assigns 5 to x and always evaluates to true. Use -Wall to get a warning.
  • Non-exhaustive match in Rust. The compiler will reject it. Always include a _ wildcard or cover every variant.
  • Off-by-one in ranges. C's for (i = 0; i < n; i++) corresponds to Rust's 0..n (exclusive). Use 0..=n for inclusive.
  • No ++/-- in Rust. Use += 1 and -= 1. This is deliberate to avoid the confusion between prefix and postfix increment.
  • Forgetting that Rust's for consumes the iterator. Use &collection to borrow instead of consuming.

Functions

Functions are the fundamental unit of code organization in both C and Rust. But the two languages differ in how they declare them, how they pass arguments, and how they organize code across files. This chapter covers all of it.

Declaring and Defining Functions in C

C distinguishes between a function declaration (prototype) and its definition (body).

/* functions_basic.c */
#include <stdio.h>

/* Declaration (prototype) */
int add(int a, int b);

int main(void)
{
    int result = add(3, 4);
    printf("3 + 4 = %d\n", result);
    return 0;
}

/* Definition */
int add(int a, int b)
{
    return a + b;
}
$ gcc -Wall -o functions_basic functions_basic.c && ./functions_basic
3 + 4 = 7

A declaration (or the definition itself) must appear before the first call; the full definition can appear anywhere in the file. In C89, calling an undeclared function was allowed -- the compiler assumed an int return type. C99 removed implicit declarations, so modern C requires a declaration before use.

Caution: In older C code, you may see functions called without declarations. This is dangerous because the compiler cannot check argument types. Always use -Wall -Wextra.

Defining Functions in Rust

Rust has no separation between declaration and definition. A function is defined once and can be called from anywhere in the same module, regardless of order.

// functions_basic.rs
fn main() {
    let result = add(3, 4);
    println!("3 + 4 = {}", result);
}

fn add(a: i32, b: i32) -> i32 {
    a + b   // no semicolon: this is the return expression
}

The return type is specified with ->. If omitted, the function returns () (unit). The last expression without a semicolon is the return value.

Parameter Passing: By Value

Both C and Rust pass arguments by value by default.

/* pass_by_value.c */
#include <stdio.h>

void increment(int x)
{
    x = x + 1;
    printf("inside: x = %d\n", x);
}

int main(void)
{
    int a = 10;
    increment(a);
    printf("outside: a = %d\n", a);
    return 0;
}
// pass_by_value.rs
fn increment(mut x: i32) {
    x += 1;
    println!("inside: x = {}", x);
}

fn main() {
    let a = 10;
    increment(a);
    println!("outside: a = {}", a);
}

Both print inside: 11, outside: 10. The function receives a copy.

For non-Copy types like String, Rust's pass-by-value transfers ownership:

// move_demo.rs
fn take_string(s: String) {
    println!("got: {}", s);
}

fn main() {
    let msg = String::from("hello");
    take_string(msg);
    // println!("{}", msg); // ERROR: value used after move
}
Value flow (move):
  main:  msg ----[ownership transferred]----> take_string: s
         msg is now invalid                   s is valid, then dropped

Parameter Passing: By Pointer (C)

To modify the caller's variable, C passes a pointer.

/* pass_by_pointer.c */
#include <stdio.h>

void increment(int *x)
{
    *x = *x + 1;
}

int main(void)
{
    int a = 10;
    increment(&a);
    printf("a = %d\n", a);  /* 11 */
    return 0;
}
Memory layout during the call:

  main's stack frame          increment's stack frame
  +----------+                +----------+
  | a = 10   | <------------ | x = &a   |
  +----------+   pointer     +----------+
  addr: 0x100                 *x dereferences to 0x100

Caution: C does not prevent you from passing NULL. Dereferencing a null pointer is undefined behavior and typically causes a segfault.

Parameter Passing: By Reference (Rust)

Rust uses references (& for shared, &mut for exclusive) instead of raw pointers.

// references.rs
fn print_value(x: &i32) {
    println!("value = {}", x);
}

fn increment(x: &mut i32) {
    *x += 1;
}

fn main() {
    let mut a = 10;
    print_value(&a);
    increment(&mut a);
    println!("a = {}", a);  // 11
}
Rust borrowing rules:
  1. You can have MANY shared references (&T) at the same time
  2. You can have ONE mutable reference (&mut T) at a time
  3. You cannot have both at the same time

  These rules are enforced at compile time.

Rust Note: References in Rust are always valid. They cannot be null, they cannot dangle, and the borrow checker ensures no data races. This is fundamentally safer than C pointers.

Try It: In Rust, try creating a &mut a while a &a is still in scope. Read the compiler error.

Multiple Return Values

C: returning a struct

/* multi_return_c.c */
#include <stdio.h>

typedef struct {
    int quot;
    int rem;
} divmod_result;

divmod_result divmod(int a, int b)
{
    divmod_result r;
    r.quot = a / b;
    r.rem  = a % b;
    return r;
}

int main(void)
{
    divmod_result r = divmod(17, 5);
    printf("17 / 5 = %d remainder %d\n", r.quot, r.rem);
    return 0;
}

Alternative: out-parameters via pointers.

/* out_params.c */
#include <stdio.h>

void divmod(int a, int b, int *quot, int *rem)
{
    *quot = a / b;
    *rem  = a % b;
}

int main(void)
{
    int q, r;
    divmod(17, 5, &q, &r);
    printf("17 / 5 = %d remainder %d\n", q, r);
    return 0;
}

Rust: returning a tuple

// multi_return_rust.rs
fn divmod(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

fn main() {
    let (quot, rem) = divmod(17, 5);
    println!("17 / 5 = {} remainder {}", quot, rem);
}

Tuples are first-class. No struct or out-parameter boilerplate needed.

Try It: Write a function that returns (min, max, sum) for a slice of integers in both C (using a struct) and Rust (using a tuple).

Function Pointers

C

/* fn_pointer.c */
#include <stdio.h>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

void apply(int (*op)(int, int), int x, int y)
{
    printf("result = %d\n", op(x, y));
}

int main(void)
{
    apply(add, 3, 4);
    apply(mul, 3, 4);
    return 0;
}

Rust

// fn_pointer.rs
fn add(a: i32, b: i32) -> i32 { a + b }
fn mul(a: i32, b: i32) -> i32 { a * b }

fn apply(op: fn(i32, i32) -> i32, x: i32, y: i32) {
    println!("result = {}", op(x, y));
}

fn main() {
    apply(add, 3, 4);
    apply(mul, 3, 4);
}

Rust also supports closures that capture their environment:

// closures.rs
fn apply(op: &dyn Fn(i32, i32) -> i32, x: i32, y: i32) {
    println!("result = {}", op(x, y));
}

fn main() {
    let offset = 10;
    let add_with_offset = |a, b| a + b + offset;
    apply(&add_with_offset, 3, 4);  // result = 17
}

Driver Prep: The Linux kernel makes heavy use of function pointers for abstraction. Every device driver fills in a struct of function pointers (struct file_operations, struct net_device_ops). Understanding function pointers is essential for driver work.

Forward Declarations and Header Files (C)

In real C projects, declarations go in headers (.h), definitions in sources (.c).

/* math_ops.h */
#ifndef MATH_OPS_H
#define MATH_OPS_H

int add(int a, int b);
int mul(int a, int b);

#endif
/* math_ops.c */
#include "math_ops.h"

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
/* main.c */
#include <stdio.h>
#include "math_ops.h"

int main(void)
{
    printf("add: %d\n", add(3, 4));
    printf("mul: %d\n", mul(3, 4));
    return 0;
}
$ gcc -Wall -c math_ops.c -o math_ops.o
$ gcc -Wall -c main.c -o main.o
$ gcc math_ops.o main.o -o math_demo
$ ./math_demo
add: 7
mul: 12
C compilation flow (multi-file):

  math_ops.h
      |
      v
  math_ops.c ---[gcc -c]---> math_ops.o ---+
                                             +--> [linker] --> math_demo
  main.c -------[gcc -c]---> main.o -------+
      ^
      |
  math_ops.h (included)

Modules in Rust

Rust replaces header files with a module system.

// src/math_ops.rs
pub fn add(a: i32, b: i32) -> i32 { a + b }
pub fn mul(a: i32, b: i32) -> i32 { a * b }
// src/main.rs
mod math_ops;

fn main() {
    println!("add: {}", math_ops::add(3, 4));
    println!("mul: {}", math_ops::mul(3, 4));
}

No header files. No include guards. The pub keyword controls visibility.

Rust module system:

  src/main.rs  --[mod math_ops]--> src/math_ops.rs
                                     pub fn add(...)
                                     pub fn mul(...)

Rust Note: Rust's module system enforces encapsulation at compile time. Items without pub are genuinely inaccessible from outside the module. In C, header files are documentation, not enforcement.

Static and Private Functions

C: static limits visibility to the current file

/* helpers.c */
static int helper(int x) { return x * 2; }

int public_function(int x) { return helper(x) + 1; }

Rust: omit pub

// helpers.rs
fn helper(x: i32) -> i32 { x * 2 }

pub fn public_function(x: i32) -> i32 { helper(x) + 1 }

Functions are private by default in Rust. No keyword needed.

Recursion

/* factorial_c.c */
#include <stdio.h>

unsigned long factorial(unsigned int n)
{
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

int main(void)
{
    for (unsigned int i = 0; i <= 10; i++) {
        printf("%2u! = %lu\n", i, factorial(i));
    }
    return 0;
}
// factorial_rust.rs
fn factorial(n: u64) -> u64 {
    if n <= 1 { 1 } else { n * factorial(n - 1) }
}

fn main() {
    for i in 0..=10 {
        println!("{:2}! = {}", i, factorial(i));
    }
}

Caution: Neither C nor Rust guarantees tail-call optimization. Deep recursion can overflow the stack. Prefer iterative solutions when depth is unbounded.

Quick Knowledge Check

  1. In C, what is the difference between a function declaration and a definition?
  2. What does pub do in Rust?
  3. Why can you not use a String after passing it by value to a Rust function?

Common Pitfalls

  • Forgetting the forward declaration in C. The compiler may assume int return type, leading to subtle bugs.
  • Passing NULL where a pointer is expected in C. No compile-time protection. Check for NULL defensively.
  • Confusing & and &mut in Rust. If you need to modify the argument, the function must take &mut T, and the caller must pass &mut value.
  • Forgetting that Rust strings are UTF-8. You cannot index a String with an integer (s[0] does not compile). Use .chars() to iterate over characters, or slice byte ranges at valid character boundaries.
  • Returning a pointer to a local variable in C. The stack frame is gone after return. The pointer dangles. Rust prevents this at compile time.
  • Overusing return in Rust. Idiomatic style omits return for the last expression. Use return only for early exits.

Structs, Enums, and Unions

Primitive types only get you so far. Real programs model real things: a network packet has a source, a destination, and a payload. A device register has named bit fields. This chapter covers the composite types that make systems programming possible.

C Structs

A struct groups related values under one name.

/* struct_basic.c */
#include <stdio.h>
#include <math.h>

typedef struct {
    double x;
    double y;
} Point;

double distance(Point a, Point b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return sqrt(dx * dx + dy * dy);
}

int main(void)
{
    Point a = { .x = 0.0, .y = 0.0 };
    Point b = { .x = 3.0, .y = 4.0 };
    printf("distance = %f\n", distance(a, b));
    return 0;
}
$ gcc -Wall -std=c17 -o struct_basic struct_basic.c -lm && ./struct_basic
distance = 5.000000

typedef lets you write Point instead of struct Point everywhere. The .x syntax in the initializer is a C99 designated initializer.

Driver Prep: The Linux kernel uses structs constantly: struct file, struct inode, struct task_struct, struct sk_buff. Understanding struct layout and passing is foundational.

Rust Structs

Named-field struct

// struct_basic.rs
struct Point {
    x: f64,
    y: f64,
}

fn distance(a: &Point, b: &Point) -> f64 {
    let dx = a.x - b.x;
    let dy = a.y - b.y;
    (dx * dx + dy * dy).sqrt()
}

fn main() {
    let a = Point { x: 0.0, y: 0.0 };
    let b = Point { x: 3.0, y: 4.0 };
    println!("distance = {}", distance(&a, &b));
}

No typedef needed. The struct name is the type name directly.

Tuple struct and unit struct

// tuple_struct.rs
struct Color(u8, u8, u8);
struct Meters(f64);
struct Marker;    // unit struct, zero-sized

fn main() {
    let red = Color(255, 0, 0);
    println!("R={}, G={}, B={}", red.0, red.1, red.2);

    let height = Meters(1.82);
    println!("height = {} m", height.0);

    let _m = Marker;
    println!("size of Marker = {}", std::mem::size_of::<Marker>());
}

Tuple structs are useful for the "newtype" pattern -- wrapping a value in a distinct type for type safety. Unit structs take no memory at runtime.

Methods (impl blocks in Rust)

Rust attaches methods to structs via impl. C has no methods; you pass the struct to a function manually.

C: functions that take a struct pointer

/* rect_c.c */
#include <stdio.h>

typedef struct {
    double width;
    double height;
} Rect;

double rect_area(const Rect *r)
{
    return r->width * r->height;
}

int main(void)
{
    Rect r = { .width = 5.0, .height = 3.0 };
    printf("area = %f\n", rect_area(&r));
    return 0;
}

Rust: methods with self

// rect_rust.rs
struct Rect {
    width: f64,
    height: f64,
}

impl Rect {
    fn area(&self) -> f64 {
        self.width * self.height
    }

    fn new(width: f64, height: f64) -> Rect {
        Rect { width, height }
    }
}

fn main() {
    let r = Rect::new(5.0, 3.0);
    println!("area = {}", r.area());
}
C struct "method" call:      rect_area(&r)
Rust method call:            r.area()

Under the hood, both pass a pointer to the struct.
  &self   == const Rect*
  &mut self == Rect*

Try It: Add a scale method to the Rust Rect that takes &mut self and a factor: f64, and multiplies both width and height by the factor.

C Enums

In C, enums are just named integer constants.

/* enum_c.c */
#include <stdio.h>

enum Direction { NORTH = 0, SOUTH = 1, EAST = 2, WEST = 3 };

const char *direction_name(enum Direction d)
{
    switch (d) {
    case NORTH: return "North";
    case SOUTH: return "South";
    case EAST:  return "East";
    case WEST:  return "West";
    default:    return "Unknown";
    }
}

int main(void)
{
    enum Direction d = EAST;
    printf("direction = %s (%d)\n", direction_name(d), d);

    /* C allows any integer -- no type safety */
    enum Direction invalid = 99;
    printf("invalid = %s (%d)\n", direction_name(invalid), invalid);

    return 0;
}

Caution: C enums provide no type safety. You can assign any integer to an enum variable. The default case is your only defense.

Rust Enums: Algebraic Data Types

Rust enums are fundamentally more powerful. Each variant can carry data.

Simple enum

// enum_simple.rs
#[derive(Debug)]
enum Direction {
    North, South, East, West,
}

fn direction_name(d: &Direction) -> &str {
    match d {
        Direction::North => "North",
        Direction::South => "South",
        Direction::East  => "East",
        Direction::West  => "West",
    }
}

fn main() {
    let d = Direction::East;
    println!("direction = {} ({:?})", direction_name(&d), d);
    // let invalid: Direction = 99;  // does NOT compile
}

The match is exhaustive. Add a fifth variant and the compiler forces you to handle it everywhere.

Enums with data

// enum_data.rs
#[derive(Debug)]
enum Shape {
    Circle(f64),
    Rectangle(f64, f64),
    Triangle { base: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(r)                 => std::f64::consts::PI * r * r,
        Shape::Rectangle(w, h)           => w * h,
        Shape::Triangle { base, height } => 0.5 * base * height,
    }
}

fn main() {
    let shapes = vec![
        Shape::Circle(5.0),
        Shape::Rectangle(4.0, 6.0),
        Shape::Triangle { base: 3.0, height: 8.0 },
    ];

    for s in &shapes {
        println!("{:?} -> area = {:.2}", s, area(s));
    }
}
$ rustc enum_data.rs && ./enum_data
Circle(5.0) -> area = 78.54
Rectangle(4.0, 6.0) -> area = 24.00
Triangle { base: 3.0, height: 8.0 } -> area = 12.00

This is impossible in C with plain enums. You would need a struct with a tag and union.

Option and Result

Rust's standard library uses enums for two critical types.

Option replaces null pointers:

// option_demo.rs
fn find_first_negative(nums: &[i32]) -> Option<usize> {
    for (i, &n) in nums.iter().enumerate() {
        if n < 0 { return Some(i); }
    }
    None
}

fn main() {
    let data = [10, 20, -5, 30];
    match find_first_negative(&data) {
        Some(idx) => println!("first negative at index {}", idx),
        None      => println!("no negatives found"),
    }
}

Result replaces error codes:

// result_demo.rs
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    match parse_and_double("21") {
        Ok(val)  => println!("success: {}", val),
        Err(e)   => println!("error: {}", e),
    }
    match parse_and_double("abc") {
        Ok(val)  => println!("success: {}", val),
        Err(e)   => println!("error: {}", e),
    }
}

Rust Note: Option<T> is enum { Some(T), None }. Result<T, E> is enum { Ok(T), Err(E) }. These are ordinary enums with generics. The power comes from match and the ? operator.

C Unions

A union stores different types in the same memory. Only one field is valid at a time.

/* union_c.c */
#include <stdio.h>
#include <string.h>

typedef struct {
    enum { INT_VAL, FLOAT_VAL, STR_VAL } tag;
    union {
        int    i;
        double f;
        char   s[32];
    } data;
} Value;

void print_value(const Value *v)
{
    switch (v->tag) {
    case INT_VAL:   printf("int: %d\n", v->data.i);   break;
    case FLOAT_VAL: printf("float: %f\n", v->data.f); break;
    case STR_VAL:   printf("str: %s\n", v->data.s);   break;
    }
}

int main(void)
{
    Value a = { .tag = INT_VAL,   .data.i = 42 };
    Value b = { .tag = FLOAT_VAL, .data.f = 3.14 };
    Value c = { .tag = STR_VAL };
    strncpy(c.data.s, "hello", sizeof(c.data.s) - 1);

    print_value(&a);
    print_value(&b);
    print_value(&c);

    return 0;
}
Union memory layout:

  +------+------+------+------+------+------+------+------+
  |              shared memory (32 bytes)                   |
  +------+------+------+------+------+------+------+------+

  When tag == INT_VAL:    first 4 bytes hold int
  When tag == FLOAT_VAL:  first 8 bytes hold double
  When tag == STR_VAL:    all 32 bytes hold char[32]

  sizeof(union) = size of largest member = 32

Caution: In C, reading a union member other than the one last written reinterprets the raw bytes; if the resulting bit pattern is invalid for that type, behavior is undefined. The tag field is a convention, not an enforcement. There is no runtime check.

Driver Prep: Type punning through unions is common in low-level code -- reading hardware registers, parsing binary protocols. The Linux kernel uses unions in structures like union sigval and union nf_inet_addr.

Rust's Safe Alternative to Unions

Rust enums with data are tagged unions with the tag built in and enforced by the compiler:

// tagged_union_rust.rs
enum Value {
    Int(i32),
    Float(f64),
    Str(String),
}

fn print_value(v: &Value) {
    match v {
        Value::Int(i)   => println!("int: {}", i),
        Value::Float(f) => println!("float: {}", f),
        Value::Str(s)   => println!("str: {}", s),
    }
}

fn main() {
    let values = vec![
        Value::Int(42),
        Value::Float(3.14),
        Value::Str(String::from("hello")),
    ];
    for v in &values { print_value(v); }
}

For low-level type punning, Rust has raw union types (access requires unsafe):

// raw_union.rs
union FloatBits {
    f: f32,
    u: u32,
}

fn main() {
    let fb = FloatBits { f: 1.0 };
    let bits = unsafe { fb.u };
    println!("float 1.0 as bits: 0x{:08X}", bits);
}

Rust Note: Raw Rust unions exist primarily for C interop (FFI). In pure Rust code, prefer enums. The unsafe block signals that the programmer is taking responsibility for correctness.

Memory Layout Comparison

/* layout_c.c */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    char   a;    /* 1 byte  */
    int    b;    /* 4 bytes */
    char   c;    /* 1 byte  */
    double d;    /* 8 bytes */
} Example;

int main(void)
{
    printf("sizeof(Example) = %zu\n", sizeof(Example));
    printf("offset of a = %zu\n", offsetof(Example, a));
    printf("offset of b = %zu\n", offsetof(Example, b));
    printf("offset of c = %zu\n", offsetof(Example, c));
    printf("offset of d = %zu\n", offsetof(Example, d));
    return 0;
}
C struct layout (with padding):

  Byte:  0    1    2    3    4    5    6    7
        +----+----+----+----+----+----+----+----+
        | a  | pad| pad| pad| b  | b  | b  | b  |
        +----+----+----+----+----+----+----+----+
  Byte:  8    9   10   11   12   13   14   15
        +----+----+----+----+----+----+----+----+
        | c  | pad| pad| pad| pad| pad| pad| pad|
        +----+----+----+----+----+----+----+----+
  Byte: 16   17   18   19   20   21   22   23
        +----+----+----+----+----+----+----+----+
        | d  | d  | d  | d  | d  | d  | d  | d  |
        +----+----+----+----+----+----+----+----+

  Total: 24 bytes (10 bytes of padding!)

Rust's default layout is unspecified; in practice the compiler reorders fields to minimize padding:

// layout_rust.rs
use std::mem;

struct Example { a: u8, b: i32, c: u8, d: f64 }

fn main() {
    println!("size of Example = {}", mem::size_of::<Example>());
}
Rust struct layout (fields reordered by compiler):

  Byte:  0    1    2    3    4    5    6    7
        +----+----+----+----+----+----+----+----+
        | d  | d  | d  | d  | d  | d  | d  | d  |
        +----+----+----+----+----+----+----+----+
  Byte:  8    9   10   11   12   13   14   15
        +----+----+----+----+----+----+----+----+
        | b  | b  | b  | b  | a  | c  | pad| pad|
        +----+----+----+----+----+----+----+----+

  Total: 16 bytes (2 bytes of padding)

To force C-compatible layout, use #[repr(C)]:

// repr_c.rs
#[repr(C)]
struct Example { a: u8, b: i32, c: u8, d: f64 }

fn main() {
    println!("size (#[repr(C)]) = {}", std::mem::size_of::<Example>());
    // prints 24, same as C
}

Driver Prep: When passing structs to the kernel or hardware, you must control the layout. Use #[repr(C)] in Rust. In C, use __attribute__((packed)) if you need to eliminate padding entirely.

Try It: Reorder the fields in the C struct to minimize padding manually. What is the smallest sizeof you can achieve?

Enum Memory Layout

// enum_size.rs
use std::mem;

enum Color { Red, Green, Blue }

fn main() {
    println!("size of Color = {}", mem::size_of::<Color>());
    println!("size of Option<u8> = {}", mem::size_of::<Option<u8>>());
    println!("size of Option<Box<i32>> = {}", mem::size_of::<Option<Box<i32>>>());
}
$ rustc enum_size.rs && ./enum_size
size of Color = 1
size of Option<u8> = 2
size of Option<Box<i32>> = 8

Rust uses the smallest discriminant that fits. Color needs only 1 byte. A C enum is typically 4 bytes (int-sized).

Option<Box<i32>> is the same size as Box<i32> -- Rust uses "niche optimization": since Box can never be null, the null bit pattern represents None.

Option<Box<i32>> layout:

  Some(ptr):  | non-zero pointer value (8 bytes) |
  None:       | 0x0000000000000000     (8 bytes) |

  No extra tag byte needed.

Quick Knowledge Check

  1. What is the difference between a C union and a Rust enum with data?
  2. Why does Rust reorder struct fields by default?
  3. What does #[repr(C)] do?

Common Pitfalls

  • Reading the wrong union member in C. Undefined behavior. No runtime check. Use a tag field and validate it in every access.
  • Forgetting padding in C structs. sizeof(struct) may be larger than the sum of field sizes. Use offsetof to check.
  • Assuming C enum values are contiguous. You can assign arbitrary values: enum E { A = 0, B = 100 }. Do not use them as array indices without bounds checks.
  • Forgetting pub on Rust struct fields. The struct may be public, but fields are private by default.
  • Using #[repr(C)] everywhere in Rust. Only use it when you need C-compatible layout (FFI, memory-mapped I/O). Otherwise let the compiler optimize.
  • Ignoring niche optimization. Option<&T> is the same size as &T. Do not wrap references in custom tagged enums when Option already does it for free.

Pointers in C

Pointers are the single most important concept in C. They are how C talks to hardware, manages memory, and builds every non-trivial data structure. If you do not understand pointers, you cannot write a device driver, a kernel module, or any serious systems code.

The Address-Of Operator (&)

Every variable lives at a memory address. The & operator gives you that address.

/* addr.c */
#include <stdio.h>

int main(void)
{
    int x = 42;
    printf("value of x:   %d\n", x);
    printf("address of x: %p\n", (void *)&x);
    return 0;
}

Compile and run:

$ gcc -o addr addr.c && ./addr
value of x:   42
address of x: 0x7ffd3a2b1c4c

The exact address changes every run (ASLR). The point: &x yields a number that identifies where x sits in memory.

Declaring and Dereferencing Pointers

A pointer variable stores an address. The * in a declaration says "this variable holds an address." The * in an expression says "follow this address."

/* deref.c */
#include <stdio.h>

int main(void)
{
    int x = 10;
    int *p = &x;        /* p holds the address of x  */

    printf("x  = %d\n", x);
    printf("*p = %d\n", *p);   /* dereference: follow the address */

    *p = 99;             /* write through the pointer */
    printf("x  = %d\n", x);   /* x changed */

    return 0;
}

Output:

x  = 10
*p = 10
x  = 99

ASCII memory layout:

  Stack
  +--------+--------+
  |  name  | value  |
  +--------+--------+
  |   x    |   99   |  <-- address 0x1000 (example)
  +--------+--------+
  |   p    | 0x1000 |  <-- p stores address of x
  +--------+--------+

Driver Prep: In kernel code, hardware registers are accessed through pointers to specific physical addresses. volatile unsigned int *reg = (volatile unsigned int *)0x40021000; is real embedded C.

NULL Pointers

A pointer that points to nothing should be set to NULL. Dereferencing NULL is undefined behavior -- on most systems, a segfault.

/* null.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = NULL;

    if (p == NULL) {
        printf("p is NULL, not dereferencing\n");
    }

    /* Uncomment the next line to crash: */
    /* printf("%d\n", *p); */

    return 0;
}

Caution: Dereferencing a NULL pointer is undefined behavior. NULL-pointer dereferences are among the most common crash bugs in kernel code. Always check before you dereference.

Pointer Arithmetic

When you add 1 to a pointer, it advances by sizeof(*p) bytes, not one byte. This is how C walks through arrays.

/* arith.c */
#include <stdio.h>

int main(void)
{
    int arr[] = {10, 20, 30, 40, 50};
    int *p = arr;  /* arr decays to pointer to first element */

    for (int i = 0; i < 5; i++) {
        printf("p + %d = %p, value = %d\n", i, (void *)(p + i), *(p + i));
    }

    return 0;
}

Output:

p + 0 = 0x7ffc..., value = 10
p + 1 = 0x7ffc..., value = 20
p + 2 = 0x7ffc..., value = 30
p + 3 = 0x7ffc..., value = 40
p + 4 = 0x7ffc..., value = 50

Each step moves sizeof(int) bytes (4 on most platforms), not 1.

  Memory (each cell = 4 bytes for int)
  +----+----+----+----+----+
  | 10 | 20 | 30 | 40 | 50 |
  +----+----+----+----+----+
  p+0  p+1  p+2  p+3  p+4

Try It: Change the array to char arr[] and print addresses. Notice that each step now moves 1 byte instead of 4. Pointer arithmetic is always in units of the pointed-to type.

Arrays Decay to Pointers

In most expressions, an array name becomes a pointer to its first element. This is called "decay."

/* decay.c */
#include <stdio.h>

void print_first(int *p)
{
    printf("first element via pointer: %d\n", *p);
}

int main(void)
{
    int arr[] = {100, 200, 300};

    print_first(arr);  /* arr decays to &arr[0] */

    /* These are equivalent: */
    printf("arr[1]   = %d\n", arr[1]);
    printf("*(arr+1) = %d\n", *(arr + 1));

    return 0;
}

The key exception: sizeof(arr) gives the full array size, not the pointer size. Once passed to a function, the size information is lost.

/* sizeof_decay.c */
#include <stdio.h>

void show_size(int *p)
{
    /* This prints the size of the pointer, not the array */
    printf("inside function: sizeof(p) = %zu\n", sizeof(p));
}

int main(void)
{
    int arr[5] = {0};
    printf("in main: sizeof(arr) = %zu\n", sizeof(arr)); /* 20 */
    show_size(arr);   /* 8 on 64-bit */
    return 0;
}

Caution: This is why C functions that take arrays always need a separate length parameter. Forgetting it is a root cause of countless buffer overflows.

Pointers to Structs and the -> Operator

When you have a pointer to a struct, you access members with -> instead of the . operator. p->member is shorthand for (*p).member and far more readable.

/* structptr.c */
#include <stdio.h>

struct point {
    int x;
    int y;
};

void move_right(struct point *p, int dx)
{
    p->x += dx;   /* same as (*p).x += dx */
}

int main(void)
{
    struct point pt = {3, 7};
    printf("before: (%d, %d)\n", pt.x, pt.y);

    move_right(&pt, 10);
    printf("after:  (%d, %d)\n", pt.x, pt.y);

    return 0;
}

Output:

before: (3, 7)
after:  (13, 7)

Driver Prep: Kernel data structures (file_operations, net_device, device_driver) are all accessed through struct pointers. You will write code like dev->irq and filp->private_data constantly.

Double Pointers (Pointer to Pointer)

A double pointer stores the address of another pointer. This is used when a function needs to change which address a pointer holds.

/* doubleptr.c */
#include <stdio.h>
#include <stdlib.h>

void allocate(int **pp, int value)
{
    *pp = malloc(sizeof(int));
    if (*pp == NULL) {
        perror("malloc");
        exit(1);
    }
    **pp = value;
}

int main(void)
{
    int *p = NULL;

    allocate(&p, 42);
    printf("*p = %d\n", *p);

    free(p);
    return 0;
}

  Stack                    Heap
  +------+---------+       +------+
  |  p   | 0x9000 -|------>|  42  |
  +------+---------+       +------+
                            0x9000

  Inside allocate():
  +------+---------+
  |  pp  | &p     -|----> p (on caller's stack)
  +------+---------+

Common uses of double pointers:

  • Functions that allocate memory and return it via a parameter
  • Arrays of strings (char **argv)
  • Linked list head modification

void * -- The Generic Pointer

void * can point to any type. You cannot dereference it directly; you must cast it first. This is C's mechanism for generic programming.

/* voidptr.c */
#include <stdio.h>

void print_bytes(const void *data, int len)
{
    const unsigned char *bytes = (const unsigned char *)data;
    for (int i = 0; i < len; i++) {
        printf("%02x ", bytes[i]);
    }
    printf("\n");
}

int main(void)
{
    int x = 0x12345678;
    float f = 3.14f;

    printf("int bytes:   ");
    print_bytes(&x, sizeof(x));

    printf("float bytes: ");
    print_bytes(&f, sizeof(f));

    return 0;
}

malloc returns void *. The kernel's kmalloc does the same. Every callback mechanism in C uses void * for user data.

Rust Note: Rust does not have void *. Generics and trait objects (dyn Trait) replace it with type safety. In unsafe Rust, *const u8 or *mut u8 serve a similar role.

Common Pointer Bugs

Dangling Pointer

A pointer to memory that has been freed or gone out of scope.

/* dangling.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>

int *bad_function(void)
{
    int local = 42;
    return &local;  /* WARNING: returning address of local variable */
}

int main(void)
{
    int *p = bad_function();
    /* p is now dangling -- local no longer exists */
    printf("%d\n", *p);  /* undefined behavior */
    return 0;
}

$ gcc -Wall -o dangling dangling.c
dangling.c: warning: function returns address of local variable

Always heed compiler warnings. They catch this.

Wild Pointer

An uninitialized pointer contains garbage. Dereferencing it is undefined.

/* wild.c -- DO NOT DO THIS */
#include <stdio.h>

int main(void)
{
    int *p;          /* uninitialized -- points to random address */
    /* *p = 10; */   /* undefined behavior, likely segfault */
    printf("p = %p\n", (void *)p);  /* garbage address */
    return 0;
}

Always initialize pointers to NULL or a valid address.

Use-After-Free

/* use_after_free.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof(int));
    *p = 42;
    free(p);
    /* p still holds the old address but the memory is freed */
    /* *p = 99; */  /* undefined behavior */

    /* Good practice: set to NULL after free */
    p = NULL;
    return 0;
}

Off-By-One

/* offbyone.c -- DO NOT DO THIS */
#include <stdio.h>

int main(void)
{
    int arr[5] = {1, 2, 3, 4, 5};
    /* Bug: accessing arr[5], which is one past the end */
    for (int i = 0; i <= 5; i++) {   /* should be i < 5 */
        printf("%d ", arr[i]);
    }
    printf("\n");
    return 0;
}

Caution: Off-by-one errors in pointer/array access are the most common source of buffer overflows and security vulnerabilities in C code. The Linux kernel has had hundreds of CVEs from this class of bug.

Pointers and const

The placement of const matters. Learn to read declarations right-to-left.

/* constptr.c */
#include <stdio.h>

int main(void)
{
    int x = 10, y = 20;

    const int *p1 = &x;      /* pointer to const int: cannot change *p1 */
    /* *p1 = 99; */           /* ERROR */
    p1 = &y;                  /* OK: can change where p1 points */

    int *const p2 = &x;      /* const pointer to int: cannot change p2 */
    *p2 = 99;                 /* OK: can change the value */
    /* p2 = &y; */            /* ERROR */

    const int *const p3 = &x; /* const pointer to const int: nothing changes */
    /* *p3 = 99; */           /* ERROR */
    /* p3 = &y; */            /* ERROR */

    printf("x = %d\n", x);
    return 0;
}

  Read declarations right-to-left:

  const int *p       --> p is a pointer to const int
  int *const p       --> p is a const pointer to int
  const int *const p --> p is a const pointer to const int

Rust Note: Rust's &T is like const int * (read-only). Rust's &mut T is like int * (read-write). There is no equivalent of int *const because Rust bindings are immutable by default (let vs let mut).

Function Pointers

Functions have addresses too. A function pointer lets you call a function indirectly -- the basis of callbacks.

/* funcptr.c */
#include <stdio.h>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

void apply(int (*op)(int, int), int x, int y)
{
    printf("result = %d\n", op(x, y));
}

int main(void)
{
    apply(add, 3, 4);   /* result = 7  */
    apply(mul, 3, 4);   /* result = 12 */

    /* Array of function pointers */
    int (*ops[])(int, int) = {add, mul};
    for (int i = 0; i < 2; i++) {
        printf("ops[%d](5, 6) = %d\n", i, ops[i](5, 6));
    }

    return 0;
}

Driver Prep: The Linux kernel's struct file_operations is a struct of function pointers. Every device driver fills one in. Understanding function pointers is non-negotiable for kernel work.

Putting It Together: A Tiny Stack

/* stack.c */
#include <stdio.h>
#include <stdlib.h>

#define STACK_MAX 16

struct stack {
    int data[STACK_MAX];
    int top;
};

void stack_init(struct stack *s)
{
    s->top = 0;
}

int stack_push(struct stack *s, int value)
{
    if (s->top >= STACK_MAX)
        return -1;   /* full */
    s->data[s->top++] = value;
    return 0;
}

int stack_pop(struct stack *s, int *out)
{
    if (s->top <= 0)
        return -1;   /* empty */
    *out = s->data[--s->top];
    return 0;
}

int main(void)
{
    struct stack s;
    stack_init(&s);

    stack_push(&s, 10);
    stack_push(&s, 20);
    stack_push(&s, 30);

    int val;
    while (stack_pop(&s, &val) == 0) {
        printf("popped: %d\n", val);
    }

    return 0;
}

Output:

popped: 30
popped: 20
popped: 10

Notice: every function takes struct stack *. The caller owns the struct; the functions borrow it via pointer. This is the C pattern that Rust formalizes with borrowing.

Try It: Add a stack_peek function that returns the top value without removing it. Use a pointer parameter for the output, just like stack_pop.

Knowledge Check

  1. What does *(arr + 3) mean if arr is an int array?
  2. Why must you pass array length separately in C?
  3. What is the difference between const int *p and int *const p?

Common Pitfalls

  • Forgetting to check for NULL after malloc -- crashes in production.
  • Returning a pointer to a local variable -- instant dangling pointer.
  • Confusing *p++ precedence -- it increments p, not *p. Use (*p)++.
  • Casting away const -- the compiler lets you, the program breaks at runtime.
  • Not setting freed pointers to NULL -- use-after-free becomes silent corruption.
  • Sizeof on a decayed pointer -- gives pointer size, not array size.
  • Pointer arithmetic on void* -- not standard C (GCC allows it as extension, treating it as char*).

References and Borrowing in Rust

Rust replaces C pointers with references that the compiler can reason about. You get the power of indirection without the bugs. This chapter shows how Rust's borrowing system works, what you give up compared to C, and what you gain.

Shared References: &T

A shared reference lets you read data without owning it. Multiple shared references can exist at the same time.

// shared_ref.rs
fn print_value(r: &i32) {
    println!("value = {}", r);
    // *r = 99;  // ERROR: cannot assign through a shared reference
}

fn main() {
    let x = 42;
    let r1 = &x;
    let r2 = &x;   // multiple shared references: OK

    print_value(r1);
    print_value(r2);
    println!("x is still {}", x);
}

$ rustc shared_ref.rs && ./shared_ref
value = 42
value = 42
x is still 42

Compare to C:

/* shared_ref.c */
#include <stdio.h>

void print_value(const int *r)
{
    printf("value = %d\n", *r);
    /* *r = 99; */  /* ERROR: r points to const int */
}

int main(void)
{
    int x = 42;
    const int *r1 = &x;
    const int *r2 = &x;

    print_value(r1);
    print_value(r2);
    printf("x is still %d\n", x);
    return 0;
}

The surface similarity is real: &T in Rust behaves like const T * in C. But Rust enforces it at a deeper level -- safe Rust gives you no way to cast the constness away.

Mutable References: &mut T

A mutable reference gives exclusive read-write access. Only one &mut T can exist for a given value at a time, and no shared references may coexist with it.

// mut_ref.rs
fn increment(r: &mut i32) {
    *r += 1;
}

fn main() {
    let mut x = 10;
    increment(&mut x);
    increment(&mut x);
    println!("x = {}", x);  // 12
}

$ rustc mut_ref.rs && ./mut_ref
x = 12

The C equivalent:

/* mut_ref.c */
#include <stdio.h>

void increment(int *r)
{
    (*r)++;
}

int main(void)
{
    int x = 10;
    increment(&x);
    increment(&x);
    printf("x = %d\n", x);  /* 12 */
    return 0;
}

In C, any int * is mutable. There is no compiler-enforced exclusivity.

The Borrowing Rules

Rust enforces exactly two rules at compile time:

  1. One mutable reference, OR any number of shared references -- never both.
  2. References must always be valid -- no dangling references.

// borrow_rules.rs -- This will NOT compile
fn main() {
    let mut x = 10;
    let r1 = &x;       // shared borrow
    let r2 = &mut x;   // ERROR: cannot borrow x as mutable
                        // because it is also borrowed as immutable
    println!("{} {}", r1, r2);
}

error[E0502]: cannot borrow `x` as mutable because it is
              also borrowed as immutable

This is the key innovation. The compiler prevents data races and aliased mutation at compile time.

  Borrowing Rules Visualized
  ===========================

  Allowed:                    Allowed:
  +---+   &T    +---+        +---+  &mut T  +---+
  | A |-------->| x |        | A |--------->| x |
  +---+         +---+        +---+          +---+
  +---+   &T      ^               (only one)
  | B |----------/
  +---+

  FORBIDDEN:
  +---+   &T    +---+
  | A |-------->| x |
  +---+         +---+
  +---+ &mut T    ^
  | B |---------/          <-- compile error
  +---+

Rust Note: This is the fundamental difference from C. In C, you can have a const int * and an int * to the same address at the same time. The compiler cannot catch the resulting bugs.

Why This Prevents Data Races

A data race requires three conditions simultaneously:

  1. Two or more threads access the same memory
  2. At least one access is a write
  3. No synchronization

Rust's borrowing rules make condition 2 impossible when condition 1 is true. If multiple references exist, none can write. If a mutable reference exists, no other reference exists.

// no_data_race.rs
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3];

    // This would fail to compile:
    // let r = &data;
    // thread::spawn(move || {
    //     data.push(4);  // cannot move `data` while borrowed
    // });
    // println!("{:?}", r);

    // Instead, you must choose: move or borrow, not both
    thread::spawn(move || {
        data.push(4);
        println!("{:?}", data);
    }).join().unwrap();
}

$ rustc no_data_race.rs && ./no_data_race
[1, 2, 3, 4]

In C, the equivalent code compiles without complaint and races silently.

Non-Lexical Lifetimes (NLL)

Rust's borrow checker is smart. A borrow ends at its last use, not at the end of the scope. This is called Non-Lexical Lifetimes.

// nll.rs
fn main() {
    let mut x = 10;

    let r1 = &x;
    println!("r1 = {}", r1);
    // r1's borrow ends here (last use)

    let r2 = &mut x;   // OK: r1 is no longer active
    *r2 += 5;
    println!("r2 = {}", r2);
}

$ rustc nll.rs && ./nll
r1 = 10
r2 = 15

Without NLL (older Rust), this would not compile. The borrow checker has gotten significantly smarter over time.

References to Structs

Like C's -> operator, Rust auto-dereferences through references when calling methods or accessing fields.

// struct_ref.rs
struct Point {
    x: i32,
    y: i32,
}

fn move_right(p: &mut Point, dx: i32) {
    p.x += dx;  // no -> needed, Rust auto-dereferences
}

fn show(p: &Point) {
    println!("({}, {})", p.x, p.y);
}

fn main() {
    let mut pt = Point { x: 3, y: 7 };
    show(&pt);
    move_right(&mut pt, 10);
    show(&pt);
}

$ rustc struct_ref.rs && ./struct_ref
(3, 7)
(13, 7)

Compare to the C version from Chapter 6: the logic is identical, but Rust distinguishes &Point (read-only) from &mut Point (read-write) at the type level.

Reborrowing

When you pass a &mut T to a function, Rust implicitly creates a shorter-lived mutable borrow. The original reference is "frozen" until the reborrow ends.

// reborrow.rs
fn add_one(val: &mut i32) {
    *val += 1;
}

fn add_two(val: &mut i32) {
    add_one(val);  // reborrow: val is implicitly &mut *val
    add_one(val);  // works again after first reborrow ends
}

fn main() {
    let mut x = 0;
    add_two(&mut x);
    println!("x = {}", x);  // 2
}

$ rustc reborrow.rs && ./reborrow
x = 2

This is why you can call multiple &mut functions in sequence -- each reborrow is temporary.

Dangling References: Impossible in Safe Rust

In C, returning a pointer to a local variable is a dangling pointer bug. In Rust, the compiler rejects it outright.

// dangling.rs -- This will NOT compile
fn bad() -> &i32 {
    let x = 42;
    &x   // ERROR: `x` does not live long enough
}

fn main() {
    let r = bad();
    println!("{}", r);
}

error[E0106]: missing lifetime specifier
error[E0515]: cannot return reference to local variable `x`

Compare to C, where this compiles with only a warning:

/* dangling.c */
#include <stdio.h>

int *bad(void)
{
    int x = 42;
    return &x;  /* warning: returning address of local variable */
}

int main(void)
{
    int *r = bad();
    printf("%d\n", *r);  /* undefined behavior */
    return 0;
}

Caution: The C version "works" on many systems because the stack frame has not been overwritten yet. This makes the bug hard to detect. Rust eliminates the entire class of bug.

Slices: References to Contiguous Data

A slice &[T] is a reference plus a length. It is Rust's answer to C's "pointer plus separate length parameter."

// slice.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for val in data {
        total += val;
    }
    total
}

fn main() {
    let arr = [10, 20, 30, 40, 50];
    println!("sum of all: {}", sum(&arr));
    println!("sum of [1..4]: {}", sum(&arr[1..4]));  // 20+30+40
}

$ rustc slice.rs && ./slice
sum of all: 150
sum of [1..4]: 90

  Slice layout in memory:

  &arr[1..4]
  +----------+--------+
  | pointer  | length |  <-- fat pointer (2 words)
  | &arr[1]  |   3    |
  +----------+--------+
       |
       v
  +----+----+----+----+----+
  | 10 | 20 | 30 | 40 | 50 |   <-- underlying array
  +----+----+----+----+----+
       [1]  [2]  [3]

Try It: Create a function fn largest(data: &[i32]) -> i32 that finds the maximum value in a slice. Test it with different sub-slices of an array.

What You Give Up, What You Gain

Compared to C pointers, Rust references give up:

  • Pointer arithmetic -- no p + 3 on references. Use indexing or iterators.
  • NULL -- references cannot be null. Use Option<&T> instead.
  • Aliased mutation -- you cannot have multiple mutable paths to the same data.
  • Casting -- no implicit type-punning through references.

What you gain:

  • No dangling references -- compile-time guarantee.
  • No data races -- compile-time guarantee.
  • No null dereferences -- no null to dereference.
  • No buffer overflows -- slices carry their length.

When You Need unsafe

Sometimes you genuinely need raw pointer behavior. Rust's unsafe blocks let you opt out of borrow checking for specific operations.

// raw_ptr.rs
fn main() {
    let mut x = 42;

    // Create raw pointers (safe -- creating is fine)
    let r1 = &x as *const i32;
    let r2 = &mut x as *mut i32;

    // Dereference raw pointers (unsafe -- you take responsibility)
    unsafe {
        println!("r1 = {}", *r1);
        *r2 = 99;
        println!("r2 = {}", *r2);
    }
}

$ rustc raw_ptr.rs && ./raw_ptr
r1 = 42
r2 = 99

Driver Prep: The Rust-for-Linux project uses unsafe blocks to interact with kernel C APIs. The idea is to build safe abstractions on top of unsafe foundations. The unsafe surface area is small and auditable.

Pattern: Option Instead of NULL

Rust uses Option<&T> where C uses "pointer or NULL."

// option_ref.rs
fn find(data: &[i32], target: i32) -> Option<&i32> {
    for val in data {
        if *val == target {
            return Some(val);
        }
    }
    None
}

fn main() {
    let arr = [10, 20, 30, 40];

    match find(&arr, 30) {
        Some(val) => println!("found: {}", val),
        None => println!("not found"),
    }

    match find(&arr, 99) {
        Some(val) => println!("found: {}", val),
        None => println!("not found"),
    }
}

$ rustc option_ref.rs && ./option_ref
found: 30
not found

The C equivalent:

/* find.c */
#include <stdio.h>

const int *find(const int *data, int len, int target)
{
    for (int i = 0; i < len; i++) {
        if (data[i] == target)
            return &data[i];
    }
    return NULL;
}

int main(void)
{
    int arr[] = {10, 20, 30, 40};
    const int *result = find(arr, 4, 30);

    if (result)
        printf("found: %d\n", *result);
    else
        printf("not found\n");

    return 0;
}

The Rust version forces you to handle None. The C version lets you forget to check for NULL.

Putting It Together: A Borrowing Exercise

// borrow_exercise.rs
struct Stats {
    count: usize,
    sum: f64,
}

fn compute_stats(data: &[f64]) -> Stats {
    let count = data.len();
    let sum: f64 = data.iter().sum();
    Stats { count, sum }
}

fn print_stats(s: &Stats) {
    println!("count = {}", s.count);
    println!("sum   = {:.2}", s.sum);
    if s.count > 0 {
        println!("mean  = {:.2}", s.sum / s.count as f64);
    }
}

fn normalize(data: &mut [f64], factor: f64) {
    for val in data.iter_mut() {
        *val /= factor;
    }
}

fn main() {
    let mut data = vec![10.0, 20.0, 30.0, 40.0, 50.0];

    let stats = compute_stats(&data);  // shared borrow
    print_stats(&stats);               // shared borrow of stats

    normalize(&mut data, stats.sum);   // mutable borrow of data
    println!("\nnormalized: {:?}", data);
}

$ rustc borrow_exercise.rs && ./borrow_exercise
count = 5
sum   = 150.00
mean  = 30.00

normalized: [0.06666666666666667, 0.13333333333333333, 0.2, ...]

Try It: Modify the program to normalize by the mean instead of the sum. Make sure you compute the stats before the mutable borrow.

Knowledge Check

  1. What happens if you try to hold &x and &mut x at the same time?
  2. How does Rust represent "this function might return no value"?
  3. What is a "fat pointer" in the context of slices?

Common Pitfalls

  • Fighting the borrow checker -- restructure your code, do not reach for unsafe.
  • Holding borrows across .push() -- pushing to a Vec might reallocate, invalidating references.
  • Forgetting &mut -- writing increment(&x) when you mean increment(&mut x).
  • Confusing &T with T -- a reference is not a copy; you must dereference to get the value.
  • Using unsafe to "shut up the compiler" -- if the compiler says no, you probably have a real bug.
  • Indexing instead of iterating -- for val in &data is safer than for i in 0..data.len().

Arrays, Slices, and Strings

Arrays and strings are the data structures that break the most programs. In C, they are raw memory with no guardrails. In Rust, they carry their length and check bounds. This chapter covers both approaches and shows why buffer overflows keep making headlines.

C Arrays: Fixed Size on the Stack

A C array is a contiguous block of elements. In C89 the size must be a compile-time constant; C99 relaxed this with variable-length arrays (next section).

/* c_array.c */
#include <stdio.h>

int main(void)
{
    int arr[5] = {10, 20, 30, 40, 50};

    printf("sizeof(arr) = %zu bytes\n", sizeof(arr));   /* 20 */
    printf("elements    = %zu\n", sizeof(arr) / sizeof(arr[0])); /* 5 */

    for (int i = 0; i < 5; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }

    return 0;
}

  Stack layout:
  +----+----+----+----+----+
  | 10 | 20 | 30 | 40 | 50 |
  +----+----+----+----+----+
  arr[0]              arr[4]

  Total: 5 * sizeof(int) = 20 bytes

No length is stored anywhere. You, the programmer, must track it.

Variable-Length Arrays (VLAs)

C99 added VLAs where the size comes from a runtime value. They live on the stack and can blow it up.

/* vla.c */
#include <stdio.h>

void fill(int n)
{
    int arr[n];   /* VLA: size determined at runtime */
    for (int i = 0; i < n; i++) {
        arr[i] = i * i;
    }
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main(void)
{
    fill(5);
    fill(10);
    return 0;
}

Caution: VLAs are banned in the Linux kernel (-Wvla flag). A large n overflows the kernel stack (typically 8 KB or 16 KB). Use kmalloc or fixed-size arrays instead.

Heap Arrays in C

For dynamic sizes, allocate on the heap with malloc.

/* heap_array.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 5;
    int *arr = malloc(n * sizeof(int));
    if (arr == NULL) {
        perror("malloc");
        return 1;
    }

    for (int i = 0; i < n; i++) {
        arr[i] = (i + 1) * 100;
    }

    for (int i = 0; i < n; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }

    free(arr);
    return 0;
}

  Stack                        Heap
  +-----+----------+          +-----+-----+-----+-----+-----+
  | arr | 0x8000  -|--------->| 100 | 200 | 300 | 400 | 500 |
  +-----+----------+          +-----+-----+-----+-----+-----+
  | n   |    5     |
  +-----+----------+

Rust Arrays: [T; N]

Rust arrays have their size baked into the type. [i32; 5] is a different type from [i32; 3].

// rust_array.rs
fn main() {
    let arr: [i32; 5] = [10, 20, 30, 40, 50];

    println!("length = {}", arr.len());

    for (i, val) in arr.iter().enumerate() {
        println!("arr[{}] = {}", i, val);
    }

    // Bounds checking at runtime:
    // let bad = arr[10];  // panics: index out of bounds
}

$ rustc rust_array.rs && ./rust_array
length = 5
arr[0] = 10
arr[1] = 20
arr[2] = 30
arr[3] = 40
arr[4] = 50

The length is part of the type. No separate variable needed.

Rust Vec<T>: The Growable Array

Vec<T> is Rust's heap-allocated, growable array. It replaces C's malloc/realloc pattern.

// vec_demo.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();

    v.push(10);
    v.push(20);
    v.push(30);

    println!("length   = {}", v.len());
    println!("capacity = {}", v.capacity());

    for val in &v {
        println!("{}", val);
    }

    v.pop();  // removes last element
    println!("after pop: {:?}", v);
}

$ rustc vec_demo.rs && ./vec_demo
length   = 3
capacity = 4
10
20
30
after pop: [10, 20]

  Vec<T> layout:

  Stack (Vec struct)         Heap (buffer)
  +----------+---------+    +----+----+----+----+
  | pointer  | 0x5000 -|--->| 10 | 20 | 30 |    |
  +----------+---------+    +----+----+----+----+
  | length   |    3    |    [0]  [1]  [2]  unused
  +----------+---------+
  | capacity |    4    |
  +----------+---------+

When you push beyond capacity, Vec allocates a new, larger buffer, copies the data, and frees the old one. This is automatic realloc.

Slices: &[T]

C has no concept of a slice. When you pass an array to a function in C, you pass a bare pointer and hope the length passed alongside it is correct.

Rust slices bundle pointer and length together.

// slices.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for &val in data {
        total += val;
    }
    total
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];

    // Slice from array
    println!("sum(arr)       = {}", sum(&arr));
    println!("sum(arr[1..4]) = {}", sum(&arr[1..4]));  // 2+3+4

    // Slice from Vec
    println!("sum(v)         = {}", sum(&v));
    println!("sum(v[..2])    = {}", sum(&v[..2]));     // 10+20
}

$ rustc slices.rs && ./slices
sum(arr)       = 15
sum(arr[1..4]) = 9
sum(v)         = 60
sum(v[..2])    = 30

The C equivalent requires explicit length passing:

/* sum.c */
#include <stdio.h>

int sum(const int *data, int len)
{
    int total = 0;
    for (int i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};
    printf("sum = %d\n", sum(arr, 5));
    printf("sum[1..4] = %d\n", sum(arr + 1, 3));
    return 0;
}

Rust Note: Slices perform bounds checking on every index access. This costs a branch instruction but prevents buffer overflows. In hot loops, the optimizer often eliminates the check.

Try It: Write a Rust function fn max_value(data: &[i32]) -> Option<i32> that returns None for an empty slice and Some(max) otherwise. Compare how much simpler it is than the C equivalent.

C Strings: Null-Terminated char *

C strings are arrays of char terminated by a zero byte ('\0'). There is no stored length.

/* cstring.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char greeting[] = "Hello";

    printf("string:  %s\n", greeting);
    printf("strlen:  %zu\n", strlen(greeting));   /* 5 */
    printf("sizeof:  %zu\n", sizeof(greeting));   /* 6 (includes \0) */

    /* Print each byte */
    for (int i = 0; i <= (int)strlen(greeting); i++) {
        printf("  [%d] = '%c' (%d)\n", i, greeting[i], greeting[i]);
    }

    return 0;
}

Output:

string:  Hello
strlen:  5
sizeof:  6
  [0] = 'H' (72)
  [1] = 'e' (101)
  [2] = 'l' (108)
  [3] = 'l' (108)
  [4] = 'o' (111)
  [5] = '' (0)       <-- null terminator

  C string in memory:
  +---+---+---+---+---+----+
  | H | e | l | l | o | \0 |
  +---+---+---+---+---+----+
  greeting[0]        greeting[5]

The Dangerous String Functions

strcpy -- No Bounds Checking

/* strcpy_bad.c -- DO NOT DO THIS in production */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    char *input = "This string is way too long for buf";

    strcpy(buf, input);  /* BUFFER OVERFLOW */
    printf("%s\n", buf);
    return 0;
}

Caution: strcpy writes until it hits \0 in the source. It has no idea how big the destination is. This is the cause of thousands of CVEs.

strncpy -- Better, But Tricky

/* strncpy_demo.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    strncpy(buf, "Hello, World!", sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';  /* strncpy may not null-terminate! */

    printf("buf = '%s'\n", buf);  /* "Hello, " (truncated) */
    return 0;
}

Caution: strncpy does NOT guarantee null-termination if the source is longer than the buffer. Always set the last byte to \0 manually.

snprintf -- The Safe Choice

/* snprintf_demo.c */
#include <stdio.h>

int main(void)
{
    char buf[16];
    int written = snprintf(buf, sizeof(buf), "Count: %d", 42);

    printf("buf = '%s'\n", buf);
    printf("would have written %d chars\n", written);

    /* If written >= sizeof(buf), truncation occurred */
    if (written >= (int)sizeof(buf)) {
        printf("WARNING: output truncated\n");
    }

    return 0;
}

snprintf always null-terminates (if size > 0) and tells you how many characters it wanted to write. Use it for all string formatting in C.

Driver Prep: The Linux kernel uses scnprintf, a variant that returns the number of characters actually written (not the would-have-been count). Never use sprintf in kernel code.

Rust Strings: String and &str

Rust has two main string types:

  • String -- owned, heap-allocated, growable (like Vec<u8> with UTF-8 guarantee)
  • &str -- borrowed string slice (like &[u8] but guaranteed UTF-8)

// rust_strings.rs
fn greet(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    // String literal -> &str (stored in binary, read-only)
    let s1: &str = "world";
    greet(s1);

    // Owned String on the heap
    let s2: String = String::from("Rust");
    greet(&s2);  // &String auto-coerces to &str

    // Building strings
    let mut s3 = String::new();
    s3.push_str("Hello");
    s3.push(' ');
    s3.push_str("World");
    println!("{}", s3);

    // Length is always known
    println!("len = {}", s3.len());      // bytes
    println!("chars = {}", s3.chars().count()); // unicode scalar values
}

$ rustc rust_strings.rs && ./rust_strings
Hello, world!
Hello, Rust!
Hello World
len = 11
chars = 11

  String layout:

  Stack (String struct)       Heap
  +----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
  | pointer  | 0x7000 -|--->| H | e | l | l | o |   | W | o | r | l | d |
  +----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
  | length   |   11    |    UTF-8 bytes, NO null terminator
  +----------+---------+
  | capacity |   16    |
  +----------+---------+

  &str layout:
  +----------+---------+
  | pointer  | 0x7000  |    Points into String or binary data
  +----------+---------+
  | length   |   11    |    Fat pointer, always knows its length
  +----------+---------+

Rust Note: Rust strings are always valid UTF-8. You cannot put arbitrary bytes in a String. For raw bytes, use Vec<u8> or &[u8]. For OS-interface strings, use OsString and OsStr.

Buffer Overflows: Why They Happen

Buffer overflows happen when code writes past the end of a buffer. In C, this is trivially easy:

/* overflow.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char password[8] = "secret";
    char buffer[8];

    printf("Enter name: ");
    /* gets() has been removed from the C standard.
       scanf without width limit is equally dangerous: */
    scanf("%s", buffer);  /* no length limit! */

    printf("buffer = '%s'\n", buffer);
    printf("password = '%s'\n", password);

    return 0;
}

If the user types more than 7 characters, buffer overflows into password (or whatever is adjacent on the stack). This is how stack-smashing attacks work. The Rust equivalent simply cannot overflow:

// no_overflow.rs
use std::io;

fn main() {
    let mut buffer = String::new();
    println!("Enter name:");
    io::stdin().read_line(&mut buffer).unwrap();

    // String grows as needed -- cannot overflow
    println!("buffer = '{}'", buffer.trim());
}

Side by Side: Processing CSV Lines

A practical example showing the difference in safety.

C Version

/* csv_c.c */
#include <stdio.h>
#include <string.h>

void parse_line(const char *line)
{
    char buf[256];
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    char *token = strtok(buf, ",");
    int col = 0;
    while (token != NULL) {
        printf("  col %d: '%s'\n", col, token);
        token = strtok(NULL, ",");
        col++;
    }
}

int main(void)
{
    const char *lines[] = {
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    };

    for (int i = 0; i < 3; i++) {
        printf("Line %d:\n", i);
        parse_line(lines[i]);
    }

    return 0;
}

Rust Version

// csv_rust.rs
fn parse_line(line: &str) {
    for (col, token) in line.split(',').enumerate() {
        println!("  col {}: '{}'", col, token);
    }
}

fn main() {
    let lines = [
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    ];

    for (i, line) in lines.iter().enumerate() {
        println!("Line {}:", i);
        parse_line(line);
    }
}

The Rust version has no fixed-size buffer, no null terminator management, no strtok with its hidden static state, and no possible overflow.

Try It: Extend the C version to handle lines longer than 256 characters. Notice how much code you need. Then notice that the Rust version already handles any length.

Knowledge Check

  1. What is the difference between strlen(s) and sizeof(s) for a char array?
  2. Why is strcpy dangerous? What should you use instead?
  3. How does a Rust &str differ from a C const char *?

Common Pitfalls

  • Forgetting the null terminator -- C strings need +1 byte. char buf[5] holds at most 4 characters.
  • Using strlen in a loop condition -- it traverses the string every call. Cache the length.
  • strncpy does not null-terminate -- if source is longer than n, the destination has no \0.
  • Mixing up bytes and characters -- UTF-8 characters can be 1-4 bytes. strlen counts bytes.
  • Array decay in sizeof -- sizeof(arr) inside the declaring function gives array size; inside a called function, it gives pointer size.
  • Off-by-one in loop bounds -- i <= n when you mean i < n.
  • Not checking snprintf return -- it tells you if truncation occurred; ignoring it means silent data loss.

Dynamic Memory: malloc/free vs Box/Vec

Stack memory is fast but limited in size and lifetime. When you need memory that outlives a function call or whose size is not known at compile time, you allocate it on the heap. C gives you raw tools; Rust gives you safe abstractions over the same tools.

malloc and free

malloc requests bytes from the heap. free returns them. Everything between is your responsibility.

/* malloc_basic.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof(int));
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    *p = 42;
    printf("*p = %d\n", *p);

    free(p);
    p = NULL;   /* good practice: prevent use-after-free */

    return 0;
}
  Before malloc:              After malloc:
  Stack                       Stack           Heap
  +---+------+                +---+--------+  +----+
  | p | NULL |                | p | 0x5000-|->| 42 |
  +---+------+                +---+--------+  +----+

  After free:
  Stack           Heap
  +---+------+    +----+
  | p | NULL |    | ?? |  <-- memory returned to allocator
  +---+------+    +----+

calloc and realloc

calloc allocates and zero-initializes. realloc resizes an existing allocation.

/* calloc_realloc.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* calloc: allocate 5 ints, all zeroed */
    int *arr = calloc(5, sizeof(int));
    if (arr == NULL) {
        perror("calloc");
        return 1;
    }

    for (int i = 0; i < 5; i++) {
        arr[i] = (i + 1) * 10;
    }

    /* realloc: grow to 10 ints */
    int *tmp = realloc(arr, 10 * sizeof(int));
    if (tmp == NULL) {
        perror("realloc");
        free(arr);
        return 1;
    }
    arr = tmp;

    /* New elements are uninitialized (not zeroed!) */
    for (int i = 5; i < 10; i++) {
        arr[i] = (i + 1) * 10;
    }

    for (int i = 0; i < 10; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }

    free(arr);
    return 0;
}

Caution: Never do arr = realloc(arr, new_size). If realloc fails, it returns NULL and you lose the original pointer -- a memory leak. Always use a temporary variable.

Memory Leaks

A memory leak occurs when allocated memory is never freed. The process holds onto memory it can never use again.

/* leak.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>

void leaky(void)
{
    int *p = malloc(1024);
    if (p == NULL) return;
    *p = 42;
    /* forgot to free(p) -- leaked 1024 bytes */
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        leaky();  /* leaks 1 MB total */
    }
    printf("Done (but leaked ~1 MB)\n");
    return 0;
}

Run with Valgrind to detect:

$ gcc -g -o leak leak.c
$ valgrind --leak-check=full ./leak
...
==12345== LEAK SUMMARY:
==12345==    definitely lost: 1,024,000 bytes in 1,000 blocks

Double-Free

Freeing the same pointer twice is undefined behavior. It can corrupt the allocator's internal data structures, leading to crashes or exploits.

/* double_free.c -- DO NOT DO THIS */
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof(int));
    *p = 42;
    free(p);
    /* free(p); */  /* UNDEFINED BEHAVIOR: double-free */

    p = NULL;       /* Setting to NULL after free prevents double-free */
    free(p);        /* free(NULL) is safe -- it does nothing */

    return 0;
}

Caution: Double-free bugs are a major source of security vulnerabilities. Attackers can exploit heap corruption caused by double-free to execute arbitrary code.

Rust's Box<T>

Box<T> allocates a single value on the heap. When the Box goes out of scope, the memory is freed automatically.

// box_demo.rs
fn main() {
    let b = Box::new(42);
    println!("*b = {}", *b);

    // b goes out of scope here -> memory freed automatically
    // No free() needed. No leak possible. No double-free possible.
}
  Stack                  Heap
  +---+---------+       +----+
  | b | 0x5000 -|------>| 42 |
  +---+---------+       +----+

  When b drops:
  - Heap memory at 0x5000 is freed
  - b is gone from the stack

Rust's Vec<T>

Vec<T> is a growable heap array. It replaces C's malloc/realloc/free pattern.

// vec_grow.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();
    println!("len={}, cap={}", v.len(), v.capacity());

    for i in 0..10 {
        v.push(i * 10);
        println!("pushed {}: len={}, cap={}", i * 10, v.len(), v.capacity());
    }

    println!("\ncontents: {:?}", v);
}
$ rustc vec_grow.rs && ./vec_grow
len=0, cap=0
pushed 0: len=1, cap=4
pushed 10: len=2, cap=4
pushed 20: len=3, cap=4
pushed 30: len=4, cap=4
pushed 40: len=5, cap=8
pushed 50: len=6, cap=8
pushed 60: len=7, cap=8
pushed 70: len=8, cap=8
pushed 80: len=9, cap=16
pushed 90: len=10, cap=16

contents: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Vec doubles its capacity when full (the exact growth factor is an implementation detail). Compare this to manually calling realloc in C.

RAII and the Drop Trait

RAII (Resource Acquisition Is Initialization) means: tie resource cleanup to scope exit. In Rust, every type can implement the Drop trait to run cleanup code when a value goes out of scope.

// drop_demo.rs
struct Resource {
    name: String,
}

impl Resource {
    fn new(name: &str) -> Self {
        println!("[{}] acquired", name);
        Resource {
            name: String::from(name),
        }
    }
}

impl Drop for Resource {
    fn drop(&mut self) {
        println!("[{}] released", self.name);
    }
}

fn main() {
    let _a = Resource::new("A");
    {
        let _b = Resource::new("B");
        let _c = Resource::new("C");
        println!("-- end of inner scope --");
    }  // B and C dropped here (reverse order)
    println!("-- end of main --");
}  // A dropped here
$ rustc drop_demo.rs && ./drop_demo
[A] acquired
[B] acquired
[C] acquired
-- end of inner scope --
[C] released
[B] released
-- end of main --
[A] released

Drop order is reverse of creation order, just like C++ destructors.

Driver Prep: The Rust-for-Linux project uses RAII extensively. When a device driver struct is dropped, it automatically unregisters the device, frees DMA buffers, and releases IRQs. This eliminates an entire class of kernel resource leaks.

C Equivalent of RAII: goto cleanup

C does not have destructors. The standard pattern is goto cleanup:

/* goto_cleanup.c */
#include <stdio.h>
#include <stdlib.h>

int process_data(int n)
{
    int ret = -1;

    int *buf1 = malloc(n * sizeof(int));
    if (buf1 == NULL) goto out;

    int *buf2 = malloc(n * sizeof(int));
    if (buf2 == NULL) goto free_buf1;

    /* Do work with buf1 and buf2 */
    for (int i = 0; i < n; i++) {
        buf1[i] = i;
        buf2[i] = i * 2;
    }
    printf("Processed %d elements\n", n);
    ret = 0;

    free(buf2);
free_buf1:
    free(buf1);
out:
    return ret;
}

int main(void)
{
    if (process_data(10) != 0) {
        fprintf(stderr, "processing failed\n");
        return 1;
    }
    return 0;
}

Driver Prep: This goto cleanup pattern is the single most common pattern in Linux kernel code. Every probe() function looks like this. Rust's RAII replaces it entirely.

Detecting Memory Bugs

Valgrind

$ gcc -g -o program program.c
$ valgrind --leak-check=full --show-leak-kinds=all ./program

Valgrind instruments every memory access at runtime. It catches:

  • Memory leaks
  • Use-after-free
  • Double-free
  • Buffer overflows (heap)
  • Uninitialized reads

AddressSanitizer (ASan)

$ gcc -g -fsanitize=address -o program program.c
$ ./program

ASan is faster than Valgrind (2x slowdown vs 20x). It catches the same bugs plus stack buffer overflows. Both GCC and Clang support it.

/* asan_demo.c -- compile with -fsanitize=address */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(5 * sizeof(int));
    p[5] = 99;  /* heap-buffer-overflow */
    free(p);
    return 0;
}
$ gcc -g -fsanitize=address -o asan_demo asan_demo.c && ./asan_demo
=================================================================
==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
WRITE of size 4 at 0x... thread T0
    #0 0x... in main asan_demo.c:8

Try It: Compile the leak example above with -fsanitize=address and compare the output to Valgrind. ASan's reports are often more readable.

Side-by-Side: Linked List

This is the definitive comparison. A singly-linked list shows every difference between C and Rust memory management.

C Linked List

/* linked_list_c.c */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(struct node));
    if (n == NULL) {
        perror("malloc");
        exit(1);
    }
    n->value = value;
    n->next = head;
    return n;
}

void list_print(const struct node *head)
{
    const struct node *cur = head;
    while (cur != NULL) {
        printf("%d -> ", cur->value);
        cur = cur->next;
    }
    printf("NULL\n");
}

void list_free(struct node *head)
{
    struct node *cur = head;
    while (cur != NULL) {
        struct node *next = cur->next;
        free(cur);
        cur = next;
    }
}

int main(void)
{
    struct node *list = NULL;

    list = list_push(list, 10);
    list = list_push(list, 20);
    list = list_push(list, 30);

    list_print(list);   /* 30 -> 20 -> 10 -> NULL */

    list_free(list);
    return 0;
}

Three things that can go wrong in the C version:

  1. Forget list_free -- memory leak
  2. Use list after list_free -- use-after-free
  3. Free a node twice -- double-free

Rust Linked List

// linked_list_rust.rs

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    fn push(self, value: i32) -> List {
        Cons(value, Box::new(self))
    }

    fn print(&self) {
        let mut current = self;
        loop {
            match current {
                Cons(val, next) => {
                    print!("{} -> ", val);
                    current = next;
                }
                Nil => {
                    println!("Nil");
                    break;
                }
            }
        }
    }
}

fn main() {
    let list = Nil
        .push(10)
        .push(20)
        .push(30);

    list.print();  // 30 -> 20 -> 10 -> Nil

    // list goes out of scope here.
    // Each Box is dropped recursively. No leak. No double-free.
}
$ rustc linked_list_rust.rs && ./linked_list_rust
30 -> 20 -> 10 -> Nil
  Memory layout:

  C version:
  head -> [30|*]----> [20|*]----> [10|NULL]
          malloc'd    malloc'd    malloc'd
          (must free  (must free  (must free
           manually)   manually)   manually)

  Rust version:
  list = Cons(30, Box::new(
           Cons(20, Box::new(
             Cons(10, Box::new(Nil))))))

  Stack           Heap           Heap           Heap
  +------+       +------+       +------+       +------+
  | list |------>| 30   |       | 20   |       | 10   |
  +------+       | Box -|------>| Box -|------>| Nil  |
                 +------+       +------+       +------+
                 (auto-drop)    (auto-drop)    (auto-drop)

No list_free function needed. When list goes out of scope, each Box drops its contents, which triggers the next drop, recursively freeing the entire chain.

Rust Note: For long lists, recursive drop can overflow the stack. In production, you would implement Drop manually with an iterative loop. The standard library's LinkedList<T> handles this.

Comparing Memory Management Styles

  +-------------------+-------------------------+------------------------+
  | Operation         | C                       | Rust                   |
  +-------------------+-------------------------+------------------------+
  | Allocate one      | malloc(sizeof(T))       | Box::new(val)          |
  | Allocate array    | malloc(n * sizeof(T))   | Vec::with_capacity(n)  |
  | Zero-allocate     | calloc(n, sizeof(T))    | vec![0; n]             |
  | Resize            | realloc(p, new_size)    | v.reserve(additional)  |
  | Free              | free(p)                 | automatic (Drop)       |
  | Detect leaks      | valgrind, ASan          | not needed*            |
  | Detect use-after  | valgrind, ASan          | compile error          |
  | Detect double-free| valgrind, ASan          | compile error          |
  +-------------------+-------------------------+------------------------+
  * Rust can still leak via mem::forget or Rc cycles, but accidental
    leaks from forgetting free() are impossible.

std::mem::drop and Early Cleanup

Sometimes you want to free memory before a scope ends. Rust's drop() function consumes a value, triggering its destructor.

// early_drop.rs
fn main() {
    let data = vec![1, 2, 3, 4, 5];
    println!("data = {:?}", data);

    drop(data);  // free heap memory now

    // println!("{:?}", data);  // ERROR: use of moved value
    println!("data has been freed");
}

This is safe because drop takes ownership (moves the value). After the move, the compiler prevents any further access.

Try It: Create a struct that holds a large Vec<u8> (say, 10 MB). Print its size, then drop it, then allocate another. Observe that you cannot accidentally use the first after dropping.

Knowledge Check

  1. What happens if realloc fails and you wrote p = realloc(p, new_size)?
  2. What is the Rust equivalent of C's goto cleanup pattern?
  3. Why can Rust guarantee no double-free at compile time?

Common Pitfalls

  • Forgetting to check malloc's return -- it can return NULL on allocation failure.
  • Using realloc incorrectly -- always assign to a temporary first.
  • Mixing allocators -- do not free() memory from a custom allocator, or vice versa.
  • Forgetting to free in every code path -- C's goto cleanup exists because error paths leak memory.
  • Leaking Rc cycles in Rust -- Rc<T> does not use a garbage collector. Cycles leak. Use Weak<T> to break them.
  • Calling mem::forget casually -- it prevents Drop from running. Use it only when you know what you are doing.
  • Heap fragmentation -- many small allocations can fragment memory. Use pool allocators or arena allocation for high-frequency allocation patterns.

Ownership and Lifetimes

This is the chapter that makes Rust click. Ownership is how Rust manages memory without a garbage collector and without manual free. Lifetimes are how the compiler proves your references are always valid. Together, they replace the entire class of memory bugs that plague C programs.

The Three Rules of Ownership

  1. Every value has exactly one owner.
  2. When the owner goes out of scope, the value is dropped (freed).
  3. Ownership can be transferred (moved), not duplicated (by default).
// ownership_basic.rs
fn main() {
    let s1 = String::from("hello");  // s1 owns the String
    let s2 = s1;                     // ownership moves to s2
    // println!("{}", s1);           // ERROR: s1 no longer valid
    println!("{}", s2);              // OK: s2 is the owner now
}
  Before move:
  Stack                  Heap
  +------+---------+    +---+---+---+---+---+
  |  s1  | ptr   --|----| h | e | l | l | o |
  |      | len=5   |    +---+---+---+---+---+
  |      | cap=5   |
  +------+---------+

  After move (s1 = s2):
  Stack                  Heap
  +------+---------+
  |  s1  | invalid |    (no longer accessible)
  +------+---------+
  +------+---------+    +---+---+---+---+---+
  |  s2  | ptr   --|----| h | e | l | l | o |
  |      | len=5   |    +---+---+---+---+---+
  |      | cap=5   |
  +------+---------+

There is still only one pointer to the heap data. When s2 goes out of scope, the heap memory is freed exactly once.

Compare to C, where this "just works" but is dangerous:

/* ownership_c.c -- C has no concept of ownership */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *s1 = malloc(6);
    strcpy(s1, "hello");

    char *s2 = s1;  /* both pointers to same memory */

    free(s1);
    /* s2 is now dangling -- use-after-free if accessed */
    /* free(s2); */  /* double-free if called */

    return 0;
}

Rust Note: C trusts the programmer to track who "owns" each allocation mentally. Rust makes ownership explicit in the type system and enforces it at compile time. There is zero runtime cost.

Move Semantics

Passing a value to a function moves it. The caller loses access.

// move_fn.rs
fn take_ownership(s: String) {
    println!("got: {}", s);
}  // s is dropped here

fn main() {
    let s = String::from("hello");
    take_ownership(s);
    // println!("{}", s);  // ERROR: value moved
}

To let the function use the value without taking ownership, pass a reference (borrowing, covered in Chapter 7).

// borrow_fn.rs
fn borrow(s: &String) {
    println!("borrowed: {}", s);
}  // nothing dropped -- we only had a reference

fn main() {
    let s = String::from("hello");
    borrow(&s);
    println!("still mine: {}", s);  // OK
}

Copy vs Clone

Copy Types

Simple, stack-only types implement the Copy trait. Assignment copies the bits instead of moving.

// copy_demo.rs
fn main() {
    let x: i32 = 42;
    let y = x;      // copy, not move -- i32 implements Copy
    println!("x = {}, y = {}", x, y);  // both valid

    let a: f64 = 3.14;
    let b = a;      // copy
    println!("a = {}, b = {}", a, b);  // both valid
}

Types that implement Copy: all integer types, f32, f64, bool, char, tuples of Copy types, fixed-size arrays of Copy types.

Types that do NOT implement Copy: String, Vec<T>, Box<T>, anything that owns heap memory.

Clone

Clone is explicit duplication. You call .clone() to make a deep copy.

// clone_demo.rs
fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();  // deep copy: new heap allocation

    println!("s1 = {}", s1);  // OK -- s1 still valid
    println!("s2 = {}", s2);
}
  After clone:
  Stack                  Heap
  +------+---------+    +---+---+---+---+---+
  |  s1  | ptr   --|----| h | e | l | l | o |  <-- allocation 1
  |      | len=5   |    +---+---+---+---+---+
  +------+---------+
  +------+---------+    +---+---+---+---+---+
  |  s2  | ptr   --|----| h | e | l | l | o |  <-- allocation 2
  |      | len=5   |    +---+---+---+---+---+
  +------+---------+

  Two separate heap allocations. No aliasing.

Try It: Try to assign a Vec<i32> without cloning. Observe the move error. Then add .clone() and see that both vectors work independently.

Lifetimes: What 'a Means

A lifetime is the scope during which a reference is valid. Usually, the compiler infers lifetimes automatically. When it cannot, you annotate them.

// lifetime_basic.rs
fn longer<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() >= s2.len() {
        s1
    } else {
        s2
    }
}

fn main() {
    let s1 = String::from("long string");
    let result;
    {
        let s2 = String::from("hi");
        result = longer(&s1, &s2);
        println!("longer: {}", result);
    }
    // println!("{}", result);  // ERROR: 'a ended when s2 was dropped
}

The 'a annotation says: "the returned reference is valid only while both input references are valid" -- its lifetime is capped at the shorter of the two. This lets the compiler verify that the returned reference does not outlive its data.

  Lifetime visualization:

  fn longer<'a>(s1: &'a str, s2: &'a str) -> &'a str
                 ^^            ^^              ^^
                 |             |               |
                 +------- all the same 'a -----+
                 |
                 The returned reference is valid for
                 the INTERSECTION of s1 and s2's lifetimes.

  Timeline:
  |---- s1 valid -----------------------------|
  |         |---- s2 valid ----|              |
  |         |---- 'a ----------|              |
  |         |-- result valid --|              |

Why the Compiler Needs Lifetimes

Without lifetimes, the compiler cannot tell if this function is safe:

// lifetime_needed.rs -- what if there were no annotations?
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &byte) in bytes.iter().enumerate() {
        if byte == b' ' {
            return &s[0..i];
        }
    }
    s
}

fn main() {
    let sentence = String::from("hello world");
    let word = first_word(&sentence);
    println!("first word: {}", word);
}

This compiles without explicit lifetime annotations because of lifetime elision rules. The compiler infers that the output lifetime matches the input. Here are the three elision rules:

  1. Each reference parameter gets its own lifetime.
  2. If there is exactly one input lifetime, it is assigned to all output lifetimes.
  3. If one parameter is &self or &mut self, its lifetime is assigned to outputs.

When these rules are insufficient, you must annotate manually.

Structs with References

If a struct holds a reference, it needs a lifetime parameter.

// struct_lifetime.rs
struct Excerpt<'a> {
    text: &'a str,
}

impl<'a> Excerpt<'a> {
    fn new(text: &'a str) -> Self {
        Excerpt { text }
    }

    fn display(&self) {
        println!("Excerpt: {}", self.text);
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let excerpt = Excerpt::new(&novel[..16]);
    excerpt.display();
}
$ rustc struct_lifetime.rs && ./struct_lifetime
Excerpt: Call me Ishmael.

The lifetime 'a guarantees that the Excerpt cannot outlive the string it references. In C, you would just store a const char * with no such guarantee.

/* struct_lifetime.c -- C version: no safety */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct excerpt {
    const char *text;  /* dangling pointer? who knows */
};

int main(void)
{
    struct excerpt e;
    {
        char *novel = strdup("Call me Ishmael. Some years ago...");
        e.text = novel;
        free(novel);  /* e.text is now dangling */
    }
    /* printf("%s\n", e.text); */  /* undefined behavior */
    return 0;
}

Caution: The C version compiles without warnings. The dangling pointer is invisible to the compiler. Rust rejects the equivalent code outright.

The Borrow Checker in Action

The borrow checker enforces ownership and lifetime rules at compile time. Here is a classic example it catches:

// borrow_checker.rs -- This will NOT compile
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];   // immutable borrow

    v.push(4);            // mutable borrow (push might reallocate!)

    println!("{}", first); // ERROR: first might be dangling
}
error[E0502]: cannot borrow `v` as mutable because it is also
              borrowed as immutable

Why? push might reallocate the underlying buffer, invalidating first. The borrow checker catches this at compile time. In C, this is a silent bug:

/* borrow_bug.c -- C version: silent bug */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *arr = malloc(3 * sizeof(int));
    arr[0] = 1; arr[1] = 2; arr[2] = 3;

    int *first = &arr[0];  /* pointer into arr */

    /* realloc might move the buffer */
    int *tmp = realloc(arr, 100 * sizeof(int));
    if (tmp) arr = tmp;

    /* first might point to freed memory now */
    printf("%d\n", *first);  /* undefined behavior */

    free(arr);
    return 0;
}

Rc<T>: Reference-Counted Shared Ownership

Sometimes multiple parts of your program need to own the same data. Rc<T> (Reference Counted) tracks how many owners exist and frees the data when the count reaches zero.

// rc_demo.rs
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared data"));
    println!("ref count after a: {}", Rc::strong_count(&a));

    let b = Rc::clone(&a);  // increment ref count, not deep copy
    println!("ref count after b: {}", Rc::strong_count(&a));

    {
        let c = Rc::clone(&a);
        println!("ref count after c: {}", Rc::strong_count(&a));
    }  // c dropped, ref count decremented

    println!("ref count after c dropped: {}", Rc::strong_count(&a));
    println!("data: {}", a);
}
$ rustc rc_demo.rs && ./rc_demo
ref count after a: 1
ref count after b: 2
ref count after c: 3
ref count after c dropped: 2
data: shared data
  Rc layout:

  Stack                 Heap (Rc control block + data)
  +---+---------+      +----------------+-------------------+
  | a | ptr   --|----->| strong_count=2 | "shared data"     |
  +---+---------+      | weak_count=0   |                   |
  +---+---------+      +----------------+-------------------+
  | b | ptr   --|------^
  +---+---------+

  When all Rc's drop, strong_count hits 0 -> data freed.

Caution: Rc<T> is single-threaded only. It does not use atomic operations. Using it across threads is a compile error.

Arc<T>: Atomic Reference Counting

Arc<T> is the thread-safe version of Rc<T>. It uses atomic operations to update the reference count.

// arc_demo.rs
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4, 5]);

    let mut handles = vec![];

    for i in 0..3 {
        let data_clone = Arc::clone(&data);
        let handle = thread::spawn(move || {
            let sum: i32 = data_clone.iter().sum();
            println!("thread {}: sum = {}", i, sum);
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("ref count: {}", Arc::strong_count(&data));
}
$ rustc arc_demo.rs && ./arc_demo
thread 0: sum = 15
thread 1: sum = 15
thread 2: sum = 15
ref count: 1

Driver Prep: In Rust-for-Linux, Arc<T> is used for shared device state. The kernel's own struct kref is the C equivalent -- a manual reference counter with explicit kref_get and kref_put calls.

When to Use What

  +------------------+--------------------------------------------+
  | Ownership Model  | Use When                                   |
  +------------------+--------------------------------------------+
  | T (owned)        | Single owner, value moves with assignment  |
  | &T               | Read-only access, no ownership change      |
  | &mut T           | Exclusive read-write access, temporary     |
  | Box<T>           | Single owner, heap allocation needed       |
  | Rc<T>            | Multiple owners, single-threaded           |
  | Arc<T>           | Multiple owners, multi-threaded            |
  | Rc<RefCell<T>>   | Multiple owners + interior mutability (ST) |
  | Arc<Mutex<T>>    | Multiple owners + interior mutability (MT) |
  +------------------+--------------------------------------------+

'static Lifetime

The 'static lifetime means the reference is valid for the entire program duration. String literals have 'static lifetime because they are embedded in the binary.

// static_lifetime.rs
fn get_greeting() -> &'static str {
    "Hello, world!"  // string literal: lives forever
}

fn main() {
    let s = get_greeting();
    println!("{}", s);
}

'static does NOT mean "allocated forever." It means "valid for the rest of the program." Leaked memory is also 'static, but that is usually a bug.

When to Use unsafe

unsafe does not turn off the borrow checker. It unlocks five specific powers:

  1. Dereference raw pointers (*const T, *mut T)
  2. Call unsafe functions
  3. Access mutable statics
  4. Implement unsafe traits
  5. Access fields of union types
// unsafe_demo.rs
fn main() {
    let mut x = 42;

    // Creating raw pointers is safe
    let r1 = &x as *const i32;
    let r2 = &mut x as *mut i32;

    // Dereferencing raw pointers requires unsafe
    unsafe {
        println!("r1 = {}", *r1);
        *r2 = 99;
        println!("r2 = {}", *r2);
    }

    // Calling C functions via FFI
    unsafe {
        let pid = libc_getpid();
        println!("pid = {}", pid);
    }
}

// Declaring an external C function
extern "C" {
    #[link_name = "getpid"]
    fn libc_getpid() -> i32;
}
$ rustc unsafe_demo.rs && ./unsafe_demo
r1 = 42
r2 = 99
pid = 12345

Driver Prep: Rust-for-Linux kernel modules use unsafe at the boundary between Rust and the C kernel API. The goal is to wrap unsafe operations in safe abstractions so that driver authors rarely need unsafe in their own code.

C Trusts the Programmer, Rust Trusts the Compiler

This is the philosophical divide. C gives you full control and assumes you know what you are doing. Rust restricts what you can express and proves correctness at compile time.

/* trust_programmer.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof(int));
    *p = 42;

    int *alias = p;    /* two pointers, same memory */
    free(p);           /* free through one */
    *alias = 99;       /* use through other -- UB */

    /* C: compiles, runs, "works" until it doesn't */
    return 0;
}

// trust_compiler.rs
fn main() {
    let p = Box::new(42);
    // let alias = p;  // ownership moves, p is invalidated
    // println!("{}", *p);  // compile error: use of moved value

    // Rust: does not compile. Bug caught before it exists.
}

The cost of Rust's approach: a learning curve and occasional fights with the borrow checker. The benefit: entire categories of bugs are impossible.

Knowledge Check

  1. What is the difference between Copy and Clone?
  2. What does the lifetime 'a in fn foo<'a>(x: &'a str) -> &'a str mean?
  3. Why is Rc<T> not safe to use across threads?

Common Pitfalls

  • Cloning everything to avoid the borrow checker -- this works but defeats the purpose. Restructure your code instead.
  • Confusing move in closures with ownership transfer -- move captures variables by value, taking ownership.
  • Forgetting that Rc cycles leak -- use Weak<T> references to break cycles.
  • Overusing 'static -- not every reference needs to live forever. Use the narrowest lifetime that works.
  • Putting lifetimes on everything -- trust the elision rules first. Annotate only when the compiler asks.
  • Using unsafe to bypass the borrow checker -- if the borrow checker rejects your code, you almost certainly have a real bug. unsafe does not fix logic errors.
  • Thinking unsafe means "no rules" -- you still must uphold Rust's safety invariants. unsafe means "I, the programmer, guarantee these invariants hold."

Binary, Hex, and Bitwise Operations

Systems code lives at the bit level. Device registers, protocol headers, permission flags -- all of them demand that you read, set, and clear individual bits. This chapter gives you the vocabulary and the muscle memory for that work, first in C, then in Rust.

Number Representations

A byte is eight bits. How you display those bits is a matter of base.

Base 10 (decimal):   42
Base 2  (binary):    00101010
Base 8  (octal):     052
Base 16 (hex):       0x2A

Hex is the lingua franca of systems programming because one hex digit maps to exactly four bits. Two hex digits map to one byte. Clean, compact, no ambiguity.

Hex digit:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
Binary:    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

C Literals

#include <stdio.h>

int main(void) {
    int dec = 42;
    int oct = 052;       /* leading zero = octal */
    int hex = 0x2A;
    int bin = 0b00101010; /* C23 / GCC extension */

    printf("dec=%d  oct=%o  hex=0x%X  bin(manual)=00101010\n", dec, oct, hex);
    printf("All equal? %d\n", (dec == oct) && (oct == hex) && (hex == bin));
    return 0;
}

Compile: gcc -std=c2x -o numrep numrep.c && ./numrep

Caution: A leading zero makes a literal octal in C. Writing int x = 010; gives you 8, not 10. This has caused real bugs in real codebases.

Rust Literals

fn main() {
    let dec = 42;
    let oct = 0o52;        // explicit 0o prefix -- no silent octal trap
    let hex = 0x2A;
    let bin = 0b00101010;

    println!("dec={dec}  oct={oct}  hex=0x{hex:X}  bin=0b{bin:08b}");
    println!("All equal? {}", dec == oct && oct == hex && hex == bin);
}

Rust Note: Rust requires 0o for octal. There is no silent leading-zero trap. Rust also lets you use underscores as visual separators: 0b0010_1010, 1_000_000.

Try It: Print the value 0xDEADBEEF (written 0xDEAD_BEEF in Rust, which allows underscore separators) in decimal, octal, and binary in both C and Rust. How many bits does it need?

Bitwise Operators

Six operators manipulate bits directly. They work on integer types only.

Operator   C     Rust    Meaning
-------------------------------------------------------
AND        &     &       1 only if both bits are 1
OR         |     |       1 if either bit is 1
XOR        ^     ^       1 if bits differ
NOT        ~     !       flip every bit (C: ~x, Rust: !x)
Left shift <<    <<      shift bits left, fill with 0
Right shift>>    >>      shift bits right (see below)

AND -- Masking

AND keeps only the bits that are 1 in both operands. Use it to extract bits.

#include <stdio.h>

int main(void) {
    unsigned char val  = 0b11010110;
    unsigned char mask = 0b00001111;  /* keep low nibble */

    unsigned char result = val & mask;
    printf("0x%02X & 0x%02X = 0x%02X\n", val, mask, result);
    /* output: 0xD6 & 0x0F = 0x06 */
    return 0;
}

    1 1 0 1 0 1 1 0     val   (0xD6)
AND 0 0 0 0 1 1 1 1     mask  (0x0F)
  = 0 0 0 0 0 1 1 0     result(0x06)

OR -- Setting Bits

OR forces bits to 1. Use it to set flags.

#include <stdio.h>

int main(void) {
    unsigned char flags = 0b00000010;  /* bit 1 already set */
    unsigned char bit4  = 0b00010000;  /* want to set bit 4 */

    flags = flags | bit4;
    printf("flags = 0x%02X\n", flags);  /* 0x12 */
    return 0;
}

XOR -- Toggling Bits

XOR flips bits where the mask is 1, leaves others untouched.

#include <stdio.h>

int main(void) {
    unsigned char val = 0b11001100;
    unsigned char tog = 0b00001111;

    val ^= tog;
    printf("After toggle: 0x%02X\n", val);  /* 0xC3 = 0b11000011 */
    return 0;
}

NOT -- Inverting All Bits

#include <stdio.h>

int main(void) {
    unsigned char val = 0b00001111;
    unsigned char inv = ~val;
    printf("~0x%02X = 0x%02X\n", val, inv);  /* ~0x0F = 0xF0 */
    return 0;
}

Rust Note: Rust uses ! for bitwise NOT (not ~). The ! operator on a boolean gives logical NOT; on an integer, bitwise NOT. The operand's type determines which you get.

Rust Equivalents -- All Operators at Once

fn main() {
    let a: u8 = 0b1100_1010;
    let b: u8 = 0b0011_1100;

    println!("AND:  {:08b}", a & b);   // 00001000
    println!("OR:   {:08b}", a | b);   // 11111110
    println!("XOR:  {:08b}", a ^ b);   // 11110110
    println!("NOT:  {:08b}", !a);      // 00110101
    println!("SHL:  {:08b}", a << 2);  // 00101000 (top bits lost)
    println!("SHR:  {:08b}", a >> 2);  // 00110010
}

Shifts: Arithmetic vs Logical

Left shift always fills with zeros. Right shift is where trouble lurks.

Logical right shift: fills the vacated high bits with 0. Arithmetic right shift: fills the vacated high bits with the sign bit.

Logical  >> 2:   1100 0000  ->  0011 0000   (zero-filled)
Arithmetic >> 2: 1100 0000  ->  1111 0000   (sign-extended)

In C, the behavior of >> on signed integers is implementation-defined. Most compilers do arithmetic shift, but it is not guaranteed.

#include <stdio.h>

int main(void) {
    int           signed_val   = -128;  /* 0xFFFFFF80 in 32-bit */
    unsigned int  unsigned_val = 0xFF000000u;

    printf("signed   >> 4 = 0x%08X\n", signed_val >> 4);   /* likely 0xFFFFFFF8 */
    printf("unsigned >> 4 = 0x%08X\n", unsigned_val >> 4);  /* always 0x0FF00000 */
    return 0;
}

Caution: Never right-shift a negative value in portable C code. Use unsigned types for bit manipulation. Always.

In Rust, the rules are explicit:

fn main() {
    let s: i8 = -128_i8;     // 0x80
    let u: u8 = 0x80;

    // Rust: >> is arithmetic on signed, logical on unsigned. Always.
    println!("signed   >> 2 = {}", s >> 2);   // -32 (arithmetic)
    println!("unsigned >> 2 = {}", u >> 2);   // 32  (logical)
}

Rust Note: Rust defines shift behavior precisely: arithmetic for signed, logical for unsigned. In debug mode, shifting by >= the bit width panics. In release mode, it wraps. No undefined behavior either way.

Common Bit Patterns

These are the bread and butter of driver and kernel code.

Check if Bit N is Set

#include <stdio.h>
#include <stdint.h>

int bit_is_set(uint32_t val, int n) {
    return (val >> n) & 1;
}

int main(void) {
    uint32_t reg = 0xA5;  /* 1010 0101 */
    for (int i = 7; i >= 0; i--)
        printf("%d", bit_is_set(reg, i));
    printf("\n");
    return 0;
}

Set Bit N

val |= (1u << n);

Clear Bit N

val &= ~(1u << n);

Toggle Bit N

val ^= (1u << n);

All Four in Rust

fn main() {
    let mut val: u32 = 0b1010_0101;
    let n = 3;

    let is_set = (val >> n) & 1 == 1;
    println!("Bit {n} set? {is_set}");

    val |= 1 << n;           // set bit 3
    println!("After set:    0b{val:08b}");

    val &= !(1u32 << n);     // clear bit 3
    println!("After clear:  0b{val:08b}");

    val ^= 1 << n;           // toggle bit 3
    println!("After toggle: 0b{val:08b}");
}

Driver Prep: Every hardware register you touch in a driver uses exactly these four operations. A typical register write looks like: reg |= ENABLE_BIT; writel(reg, base + OFFSET);

Powers of Two

Bit shifts and powers of two are the same thing.

1 << 0  =  1       =  2^0
1 << 1  =  2       =  2^1
1 << 4  =  16      =  2^4
1 << 10 =  1024    =  2^10
1 << 20 =  1048576 =  2^20  (1 MiB)

A classic trick: check if a number is a power of two.

#include <stdio.h>
#include <stdbool.h>

bool is_power_of_two(unsigned int x) {
    return x != 0 && (x & (x - 1)) == 0;
}

int main(void) {
    unsigned int tests[] = {0, 1, 2, 3, 4, 15, 16, 255, 256};
    int n = sizeof(tests) / sizeof(tests[0]);

    for (int i = 0; i < n; i++)
        printf("%3u -> %s\n", tests[i], is_power_of_two(tests[i]) ? "yes" : "no");
    return 0;
}

Why does x & (x - 1) work?

x     = 0001 0000   (16, a power of 2)
x - 1 = 0000 1111
x & (x-1) = 0000 0000   => zero, so it IS a power of 2

x     = 0001 0100   (20, NOT a power of 2)
x - 1 = 0001 0011
x & (x-1) = 0001 0000   => non-zero, so it is NOT

fn is_power_of_two(x: u32) -> bool {
    x != 0 && (x & (x - 1)) == 0
}

fn main() {
    for x in [0, 1, 2, 3, 4, 15, 16, 255, 256] {
        println!("{x:>3} -> {}", if is_power_of_two(x) { "yes" } else { "no" });
    }
}

Rust Note: Rust provides u32::is_power_of_two() in the standard library. But knowing the bit trick matters -- you will see x & (x - 1) in kernel code.

Counting Set Bits (Population Count)

How many bits are 1 in a value? This operation is called popcount.

#include <stdio.h>
#include <stdint.h>

int popcount_naive(uint32_t x) {
    int count = 0;
    while (x) {
        count += x & 1;
        x >>= 1;
    }
    return count;
}

/* Brian Kernighan's trick: x & (x-1) clears the lowest set bit */
int popcount_kernighan(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

int main(void) {
    uint32_t val = 0xDEADBEEF;
    printf("popcount(0x%X) = %d (naive)\n", val, popcount_naive(val));
    printf("popcount(0x%X) = %d (kernighan)\n", val, popcount_kernighan(val));

    /* GCC/Clang built-in -- compiles to a single POPCNT instruction */
    printf("popcount(0x%X) = %d (builtin)\n", val, __builtin_popcount(val));
    return 0;
}

fn main() {
    let val: u32 = 0xDEAD_BEEF;
    println!("popcount(0x{val:X}) = {}", val.count_ones());

    // Also available: count_zeros, leading_zeros, trailing_zeros
    println!("leading zeros:  {}", val.leading_zeros());
    println!("trailing zeros: {}", val.trailing_zeros());
}

Try It: Write a function that returns the position of the highest set bit in a u32. Test it with 0x00000001 (should return 0) and 0x80000000 (should return 31).

Extracting and Inserting Bit Fields

Registers often pack multiple values into a single word.

 31       24 23    16 15     8 7       0
+----------+---------+--------+---------+
|  field_d |field_c  |field_b | field_a |
+----------+---------+--------+---------+

Extract field_b (bits 8..15):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t reg = 0xAABBCCDD;

    /* Extract bits [15:8] */
    uint32_t field_b = (reg >> 8) & 0xFF;
    printf("field_b = 0x%02X\n", field_b);  /* 0xCC */

    /* Insert new value 0x42 into bits [15:8] */
    reg &= ~(0xFF << 8);       /* clear the field */
    reg |= (0x42u << 8);       /* set new value   */
    printf("reg = 0x%08X\n", reg);  /* 0xAABB42DD */
    return 0;
}

fn main() {
    let mut reg: u32 = 0xAABB_CCDD;

    // Extract bits [15:8]
    let field_b = (reg >> 8) & 0xFF;
    println!("field_b = 0x{field_b:02X}");  // 0xCC

    // Insert 0x42 into bits [15:8]
    reg &= !(0xFFu32 << 8);
    reg |= 0x42u32 << 8;
    println!("reg = 0x{reg:08X}");  // 0xAABB42DD
}

Driver Prep: The pattern (reg >> SHIFT) & MASK to read and reg = (reg & ~(MASK << SHIFT)) | (val << SHIFT) to write is the single most common operation in Linux driver code. Burn it into memory.

No Implicit Conversions in Rust

In C, bitwise operators can silently promote or truncate types:

#include <stdio.h>

int main(void) {
    unsigned char a = 0xFF;
    /* ~a promotes a to int first, result is NOT 0x00 */
    printf("~a = 0x%08X\n", ~a);  /* 0xFFFFFF00 -- surprise! */
    return 0;
}

Caution: In C, ~ on a char promotes to int first. The result has int's width (typically 32 bits), not 8. This causes subtle bugs in mask comparisons.

Rust does not promote:

fn main() {
    let a: u8 = 0xFF;
    let b: u8 = !a;
    println!("!0xFF = 0x{b:02X}");  // 0x00 -- exactly 8 bits, no surprise
}

Quick Knowledge Check

  1. What does 0x1F & 0x0F evaluate to? Work it out in binary before running code.
  2. You have a 32-bit register value. Bits [7:4] contain a 4-bit version number. Write the C expression to extract it.
  3. Why is x & (x - 1) == 0 not a correct power-of-two test when x is 0?

Common Pitfalls

  • Shifting by the type width. 1 << 32 on a 32-bit int is undefined in C. Use 1u << 31 as the maximum, or 1ULL << 32 for 64-bit.
  • Signed operands in bit ops. ~(-1) is well-defined but confusing. Use unsigned.
  • Forgetting the u suffix. 1 << 31 in C is signed overflow (UB on 32-bit int). Write 1u << 31.
  • Comparing after NOT. ~(unsigned char)0xFF is int, not unsigned char. Cast or mask the result.
  • Rust shift panics. 1u32 << 32 panics in debug. Design around it.

Bit Masks and Bit Fields

Hardware registers and protocol headers cram multiple values into single words. You need two skills: defining named masks for individual bits, and using C's bit field syntax for struct-level packing. This chapter covers both, along with the Rust alternatives.

Defining Flags with #define

The simplest approach: one #define per bit.

#include <stdio.h>
#include <stdint.h>

/* Permission flags -- each is a single bit */
#define PERM_READ    (1u << 0)   /* 0x01 */
#define PERM_WRITE   (1u << 1)   /* 0x02 */
#define PERM_EXEC    (1u << 2)   /* 0x04 */
#define PERM_SETUID  (1u << 3)   /* 0x08 */

void print_perms(uint8_t flags) {
    printf("Permissions:");
    if (flags & PERM_READ)   printf(" READ");
    if (flags & PERM_WRITE)  printf(" WRITE");
    if (flags & PERM_EXEC)   printf(" EXEC");
    if (flags & PERM_SETUID) printf(" SETUID");
    printf("\n");
}

int main(void) {
    uint8_t file_perms = PERM_READ | PERM_WRITE;
    print_perms(file_perms);

    /* Add execute */
    file_perms |= PERM_EXEC;
    print_perms(file_perms);

    /* Remove write */
    file_perms &= ~PERM_WRITE;
    print_perms(file_perms);

    /* Check a specific flag */
    if (file_perms & PERM_EXEC)
        printf("File is executable\n");

    return 0;
}

The pattern is always the same:

Set flag:     flags |=  FLAG;
Clear flag:   flags &= ~FLAG;
Toggle flag:  flags ^=  FLAG;
Test flag:    if (flags & FLAG)

Using enum for Flags

Some codebases use enum instead of #define. The effect is similar, but enums are visible in debuggers.

#include <stdio.h>
#include <stdint.h>

typedef enum {
    OPT_VERBOSE = (1u << 0),
    OPT_DEBUG   = (1u << 1),
    OPT_FORCE   = (1u << 2),
    OPT_DRY_RUN = (1u << 3),
} options_t;

int main(void) {
    uint32_t opts = OPT_VERBOSE | OPT_DEBUG;

    if (opts & OPT_VERBOSE)
        printf("Verbose mode on\n");
    if (opts & OPT_DEBUG)
        printf("Debug mode on\n");
    if (!(opts & OPT_FORCE))
        printf("Force mode off\n");

    return 0;
}

Caution: In C, enum values are int-sized. Combining them with | may produce a value outside the enum's defined range, which is technically valid but some compilers warn. Use an unsigned integer type for the combined flags variable.

Combining and Testing Multiple Flags

Test whether all of several flags are set:

#include <stdio.h>
#include <stdint.h>

#define FLAG_A  (1u << 0)
#define FLAG_B  (1u << 1)
#define FLAG_C  (1u << 2)

int main(void) {
    uint32_t flags = FLAG_A | FLAG_C;
    uint32_t required = FLAG_A | FLAG_B;

    /* Test if ALL required flags are set */
    if ((flags & required) == required)
        printf("All required flags set\n");
    else
        printf("Missing some required flags\n");

    /* Test if ANY of the required flags are set */
    if (flags & required)
        printf("At least one required flag set\n");

    return 0;
}

Caution: if (flags & required) tests if any bit matches. if ((flags & required) == required) tests if all bits match. Confusing the two is a classic bug.

C Bit Fields

C lets you declare struct members with explicit bit widths.

#include <stdio.h>

struct status_reg {
    unsigned int enabled  : 1;
    unsigned int mode     : 3;   /* 0-7 */
    unsigned int priority : 4;   /* 0-15 */
    unsigned int error    : 1;
    unsigned int reserved : 23;
};

int main(void) {
    struct status_reg sr = {0};
    sr.enabled  = 1;
    sr.mode     = 5;
    sr.priority = 12;

    printf("enabled=%u  mode=%u  priority=%u  error=%u\n",
           sr.enabled, sr.mode, sr.priority, sr.error);
    printf("sizeof(struct status_reg) = %zu\n", sizeof(struct status_reg));
    return 0;
}

The layout in memory (assuming little-endian, no padding):

Bit:  31                 9  8    7      4 3    1  0
     +--------------------+---+----------+------+---+
     |     reserved (23)  |err| pri (4)  |mode 3|en |
     +--------------------+---+----------+------+---+

When to Use Bit Fields

Good uses:

  • Modeling hardware registers in documentation or test code
  • Compact storage of boolean flags
  • Quick prototyping

Bad uses:

  • Anything that crosses a machine boundary (network, file, IPC)
  • Portable code that must work on multiple compilers/architectures

Why Bit Fields Are Dangerous for Portable Code

The C standard leaves almost everything about bit fields implementation-defined:

  • Allocation order (MSB-first or LSB-first) is compiler-dependent
  • Whether a bit field can straddle a storage-unit boundary is compiler-dependent
  • Signedness of plain int bit fields is compiler-dependent
  • Padding between bit fields is compiler-dependent

Caution: Two different compilers (or the same compiler on two architectures) can lay out the same bit field struct differently. Never use bit fields for data that leaves the current process -- use explicit shifts and masks instead.

Try It: Compile the status_reg example on your machine. Cast the struct to uint32_t via memcpy and print the raw hex value. Does bit 0 correspond to enabled? Try on a different compiler or with -m32 if available.

Register Definitions with Bit Fields and Masks

Real driver code uses both approaches. Bit fields for readability during development, explicit masks for the actual hardware access.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Mask-based definitions (portable, used for real HW access) */
#define CTRL_ENABLE_BIT    (1u << 0)
#define CTRL_MODE_MASK     (0x7u << 1)
#define CTRL_MODE_SHIFT    1
#define CTRL_PRIO_MASK     (0xFu << 4)
#define CTRL_PRIO_SHIFT    4
#define CTRL_ERR_BIT       (1u << 8)

/* Helper macros */
#define CTRL_SET_MODE(reg, m)  \
    (((reg) & ~CTRL_MODE_MASK) | (((m) & 0x7u) << CTRL_MODE_SHIFT))
#define CTRL_GET_MODE(reg)     \
    (((reg) & CTRL_MODE_MASK) >> CTRL_MODE_SHIFT)

int main(void) {
    uint32_t ctrl = 0;

    ctrl |= CTRL_ENABLE_BIT;            /* enable */
    ctrl = CTRL_SET_MODE(ctrl, 5);       /* mode = 5 */
    ctrl |= (12u << CTRL_PRIO_SHIFT);   /* priority = 12 */

    printf("ctrl = 0x%08X\n", ctrl);
    printf("mode = %u\n", CTRL_GET_MODE(ctrl));
    printf("enabled = %u\n", (ctrl & CTRL_ENABLE_BIT) ? 1 : 0);
    return 0;
}

Driver Prep: Linux kernel drivers follow this exact pattern. Look at any register header file in drivers/ -- you will see _MASK, _SHIFT, and helper macros everywhere. The kernel avoids bit fields for hardware registers.

Rust: Manual Masks

The same mask-and-shift approach works in Rust, but with stronger types.

const CTRL_ENABLE_BIT: u32  = 1 << 0;
const CTRL_MODE_MASK: u32   = 0x7 << 1;
const CTRL_MODE_SHIFT: u32  = 1;
const CTRL_PRIO_MASK: u32   = 0xF << 4;
const CTRL_PRIO_SHIFT: u32  = 4;
const CTRL_ERR_BIT: u32     = 1 << 8;

fn ctrl_set_mode(reg: u32, mode: u32) -> u32 {
    (reg & !CTRL_MODE_MASK) | ((mode & 0x7) << CTRL_MODE_SHIFT)
}

fn ctrl_get_mode(reg: u32) -> u32 {
    (reg & CTRL_MODE_MASK) >> CTRL_MODE_SHIFT
}

fn main() {
    let mut ctrl: u32 = 0;

    ctrl |= CTRL_ENABLE_BIT;
    ctrl = ctrl_set_mode(ctrl, 5);
    ctrl |= 12 << CTRL_PRIO_SHIFT;

    println!("ctrl = 0x{ctrl:08X}");
    println!("mode = {}", ctrl_get_mode(ctrl));
    println!("enabled = {}", (ctrl & CTRL_ENABLE_BIT) != 0);
}

Rust: The bitflags Crate

For flag-style bitmasks (not multi-bit fields), the bitflags crate is the Rust community standard. Add bitflags = "2" to Cargo.toml.

// In Cargo.toml: bitflags = "2"
use bitflags::bitflags;

bitflags! {
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct Permissions: u8 {
        const READ    = 0b0000_0001;
        const WRITE   = 0b0000_0010;
        const EXEC    = 0b0000_0100;
        const SETUID  = 0b0000_1000;
    }
}

fn main() {
    let mut perms = Permissions::READ | Permissions::WRITE;
    println!("{:?}", perms);  // Permissions(READ | WRITE)

    perms.insert(Permissions::EXEC);
    println!("{:?}", perms);  // Permissions(READ | WRITE | EXEC)

    perms.remove(Permissions::WRITE);
    println!("{:?}", perms);  // Permissions(READ | EXEC)

    if perms.contains(Permissions::EXEC) {
        println!("Executable");
    }

    // Test multiple flags at once
    let required = Permissions::READ | Permissions::EXEC;
    println!("Has required? {}", perms.contains(required));

    // Raw bits access
    println!("raw bits = 0b{:08b}", perms.bits());
}

Rust Note: bitflags gives you type safety -- you cannot accidentally OR a Permissions with an unrelated flag type. The raw bits are always accessible via .bits() when you need to pass them to hardware or system calls.

Protocol Header Parsing with Bit Masks

Real-world example: parsing an IPv4 header's first byte.

Byte 0 of IPv4 header:
  +---+---+---+---+---+---+---+---+
  | Version (4b)  |  IHL (4b)     |
  +---+---+---+---+---+---+---+---+
  Bits: 7 6 5 4   3 2 1 0

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Simulated first byte of an IPv4 header: version=4, IHL=5 */
    uint8_t byte0 = 0x45;

    uint8_t version = (byte0 >> 4) & 0x0F;
    uint8_t ihl     = byte0 & 0x0F;

    printf("Version: %u\n", version);  /* 4 */
    printf("IHL:     %u (header = %u bytes)\n", ihl, ihl * 4);  /* 5 (20 bytes) */

    /* Construct a byte from fields */
    uint8_t built = ((4u & 0x0F) << 4) | (5u & 0x0F);
    printf("Built:   0x%02X\n", built);  /* 0x45 */
    return 0;
}
fn main() {
    let byte0: u8 = 0x45;

    let version = (byte0 >> 4) & 0x0F;
    let ihl = byte0 & 0x0F;

    println!("Version: {version}");
    println!("IHL:     {ihl} (header = {} bytes)", ihl as u32 * 4);

    let built: u8 = ((4 & 0x0F) << 4) | (5 & 0x0F);
    println!("Built:   0x{built:02X}");
}

A Larger Example: TCP Flags

TCP flags live in a single byte. Let us define, combine, and test them.

#include <stdio.h>
#include <stdint.h>

#define TCP_FIN  (1u << 0)
#define TCP_SYN  (1u << 1)
#define TCP_RST  (1u << 2)
#define TCP_PSH  (1u << 3)
#define TCP_ACK  (1u << 4)
#define TCP_URG  (1u << 5)

void print_tcp_flags(uint8_t flags) {
    const char *names[] = {"FIN","SYN","RST","PSH","ACK","URG"};
    printf("Flags:");
    for (int i = 0; i < 6; i++) {
        if (flags & (1u << i))
            printf(" %s", names[i]);
    }
    printf("\n");
}

int main(void) {
    /* SYN packet */
    uint8_t syn = TCP_SYN;
    print_tcp_flags(syn);

    /* SYN-ACK response */
    uint8_t syn_ack = TCP_SYN | TCP_ACK;
    print_tcp_flags(syn_ack);

    /* Is this a SYN without ACK? */
    if ((syn_ack & (TCP_SYN | TCP_ACK)) == TCP_SYN)
        printf("Pure SYN\n");
    else
        printf("Not a pure SYN\n");

    return 0;
}

use std::fmt;

#[derive(Clone, Copy)]
struct TcpFlags(u8);

impl TcpFlags {
    const FIN: u8 = 1 << 0;
    const SYN: u8 = 1 << 1;
    const RST: u8 = 1 << 2;
    const PSH: u8 = 1 << 3;
    const ACK: u8 = 1 << 4;
    const URG: u8 = 1 << 5;

    fn new(bits: u8) -> Self { TcpFlags(bits) }
    fn has(self, flag: u8) -> bool { (self.0 & flag) != 0 }
}

impl fmt::Display for TcpFlags {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let names = [
            (Self::FIN, "FIN"), (Self::SYN, "SYN"), (Self::RST, "RST"),
            (Self::PSH, "PSH"), (Self::ACK, "ACK"), (Self::URG, "URG"),
        ];
        let mut first = true;
        for (bit, name) in &names {
            if self.0 & bit != 0 {
                if !first { write!(f, " | ")?; }
                write!(f, "{name}")?;
                first = false;
            }
        }
        Ok(())
    }
}

fn main() {
    let syn = TcpFlags::new(TcpFlags::SYN);
    println!("Flags: {syn}");

    let syn_ack = TcpFlags::new(TcpFlags::SYN | TcpFlags::ACK);
    println!("Flags: {syn_ack}");

    let is_pure_syn = syn_ack.has(TcpFlags::SYN) && !syn_ack.has(TcpFlags::ACK);
    println!("Pure SYN? {is_pure_syn}");
}

Try It: Add ECE and CWR flags (bits 6 and 7) to both the C and Rust versions. Create a flags byte with SYN | ECE | CWR -- this represents a SYN packet with ECN support.

Quick Knowledge Check

  1. You have uint32_t flags = 0; and want bits 3, 5, and 7 set. Write one expression using the defined flag constants.
  2. What is the difference between if (flags & MASK) and if ((flags & MASK) == MASK)?
  3. Why does the Linux kernel avoid C bit fields for hardware register definitions?

Common Pitfalls

  • Forgetting ~ on clear. flags &= FLAG does not clear FLAG. You need flags &= ~FLAG.
  • Using == instead of & to test. if (flags == FLAG) only matches if FLAG is the only bit set. Use if (flags & FLAG).
  • Bit field portability. Layout is compiler-defined. Never serialize bit fields.
  • Missing parentheses in macros. #define FLAG 1 << 3 without parens causes precedence bugs: FLAG + 1 expands to 1 << 3 + 1, which is 1 << 4 because + binds tighter than <<. Always write (1u << 3).
  • Rust ! vs C ~. Remember: Rust's bitwise NOT is !, not ~. Writing ~mask in Rust is a compile error.

Alignment, Padding, and Packing

The compiler silently inserts invisible bytes into your structs. This chapter shows you exactly where, why, and how to control it. You need this knowledge for network protocols, file formats, hardware registers, and shared memory -- any time data must match an exact byte layout.

Why the Compiler Inserts Padding

CPUs access memory most efficiently when data falls on natural boundaries. A 4-byte int is fastest to read when its address is a multiple of 4. A 2-byte short wants a multiple of 2. The compiler enforces this by inserting padding bytes between struct members.

#include <stdio.h>
#include <stddef.h>

struct example {
    char   a;    /* 1 byte */
    int    b;    /* 4 bytes */
    char   c;    /* 1 byte */
};

int main(void) {
    printf("sizeof(struct example) = %zu\n", sizeof(struct example));
    printf("offsetof(a) = %zu\n", offsetof(struct example, a));
    printf("offsetof(b) = %zu\n", offsetof(struct example, b));
    printf("offsetof(c) = %zu\n", offsetof(struct example, c));
    return 0;
}

Typical output on a 64-bit system:

sizeof(struct example) = 12
offsetof(a) = 0
offsetof(b) = 4
offsetof(c) = 8

The layout with padding:

Offset:  0    1    2    3    4    5    6    7    8    9   10   11
       +----+----+----+----+----+----+----+----+----+----+----+----+
       | a  | pad| pad| pad|    b (4 bytes)    | c  | pad| pad| pad|
       +----+----+----+----+----+----+----+----+----+----+----+----+

Three bytes of padding after a to align b on a 4-byte boundary. Three bytes of trailing padding after c so that an array of these structs keeps b aligned.

The offsetof Macro

offsetof(type, member) from <stddef.h> tells you the exact byte offset of any member. It is the essential tool for verifying layout.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct packet {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};

int main(void) {
    printf("Field        Offset  Size\n");
    printf("version      %zu       %zu\n", offsetof(struct packet, version),  sizeof(uint8_t));
    printf("length       %zu       %zu\n", offsetof(struct packet, length),   sizeof(uint16_t));
    printf("sequence     %zu       %zu\n", offsetof(struct packet, sequence), sizeof(uint32_t));
    printf("flags        %zu       %zu\n", offsetof(struct packet, flags),    sizeof(uint8_t));
    printf("total size   %zu\n", sizeof(struct packet));
    return 0;
}

Likely output:

Field        Offset  Size
version      0       1
length       2       2
sequence     4       4
flags        8       1
total size   12

One byte of padding after version, three bytes of trailing padding after flags.

Reordering Fields to Minimize Padding

Simply reordering members from largest to smallest eliminates most internal padding.

#include <stdio.h>
#include <stddef.h>

struct bad_order {
    char   a;   /* 1 byte + 7 padding */
    double b;   /* 8 bytes */
    char   c;   /* 1 byte + 7 padding */
};   /* total: 24 bytes */

struct good_order {
    double b;   /* 8 bytes */
    char   a;   /* 1 byte */
    char   c;   /* 1 byte + 6 padding */
};   /* total: 16 bytes */

int main(void) {
    printf("bad_order:  %zu bytes\n", sizeof(struct bad_order));
    printf("good_order: %zu bytes\n", sizeof(struct good_order));
    return 0;
}

bad_order layout (24 bytes):
+---+-------+--------+---+-------+
| a |  pad7 | b (8)  | c |  pad7 |
+---+-------+--------+---+-------+

good_order layout (16 bytes):
+--------+---+---+------+
| b (8)  | a | c | pad6 |
+--------+---+---+------+

Driver Prep: In kernel code, struct layout matters for cache performance. Hot fields are grouped together. The pahole tool shows struct layouts including padding holes. Run pahole my_object.o on compiled code to see real layouts.

Packed Structs in C

When you need exact byte layout -- network packets, file headers, hardware registers -- you must eliminate padding entirely.

__attribute__((packed)) (GCC/Clang)

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};

int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header));  /* 8, not 12 */
    printf("offsetof(length)   = %zu\n", offsetof(struct wire_header, length));   /* 1 */
    printf("offsetof(sequence) = %zu\n", offsetof(struct wire_header, sequence)); /* 3 */
    printf("offsetof(flags)    = %zu\n", offsetof(struct wire_header, flags));    /* 7 */
    return 0;
}

Packed layout:

Offset:  0    1    2    3    4    5    6    7
       +----+----+----+----+----+----+----+----+
       |ver |  length |     sequence      |flag|
       +----+----+----+----+----+----+----+----+

#pragma pack

MSVC and GCC both support #pragma pack. It affects all structs until reset.

#include <stdio.h>
#include <stdint.h>

#pragma pack(push, 1)
struct wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};
#pragma pack(pop)

int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header));  /* 8 */
    return 0;
}

Caution: Always use push/pop with #pragma pack. Forgetting pop silently packs every subsequent struct in the translation unit, causing baffling bugs.

Performance Cost of Unaligned Access

On x86, unaligned access works but is slower. On ARM and RISC-V, it can trap or silently produce wrong results depending on the configuration.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ITERATIONS 100000000

int main(void) {
    /* Aligned access */
    uint32_t aligned_val = 0;
    uint32_t *aligned_ptr = &aligned_val;

    clock_t start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        *aligned_ptr = *aligned_ptr + 1;
    clock_t aligned_time = clock() - start;

    /* Unaligned access via packed struct */
    struct __attribute__((packed)) {
        uint8_t  pad;
        uint32_t val;
    } unaligned = {0, 0};

    start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        unaligned.val = unaligned.val + 1;
    clock_t unaligned_time = clock() - start;

    printf("Aligned:   %ld ticks\n", (long)aligned_time);
    printf("Unaligned: %ld ticks\n", (long)unaligned_time);
    return 0;
}

Try It: Compile with -O0 and -O2 and compare the results. The compiler may generate special unaligned-access instructions at higher optimization levels. Also try on an ARM machine if you have one -- the difference may be dramatic.

Rust: repr(C)

By default, Rust makes no guarantees about struct layout. The compiler is free to reorder fields, add padding, or change layout between compilations. To get a predictable C-compatible layout, use #[repr(C)].

use std::mem;

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("size  = {}", mem::size_of::<Example>());
    println!("align = {}", mem::align_of::<Example>());

    // offset_of! was stabilized in Rust 1.77
    println!("offset of a = {}", mem::offset_of!(Example, a));
    println!("offset of b = {}", mem::offset_of!(Example, b));
    println!("offset of c = {}", mem::offset_of!(Example, c));
}

Output matches the C version: size 12, offsets 0/4/8.

Rust Note: Without #[repr(C)], Rust's default repr(Rust) may reorder fields to minimize padding. This is an optimization -- but it means you cannot predict the layout. Always use repr(C) for FFI or hardware-facing structs.

Rust: repr(packed)

use std::mem;

#[repr(C, packed)]
struct WireHeader {
    version:  u8,
    length:   u16,
    sequence: u32,
    flags:    u8,
}

fn main() {
    println!("size = {}", mem::size_of::<WireHeader>());  // 8

    println!("offset version  = {}", mem::offset_of!(WireHeader, version));
    println!("offset length   = {}", mem::offset_of!(WireHeader, length));
    println!("offset sequence = {}", mem::offset_of!(WireHeader, sequence));
    println!("offset flags    = {}", mem::offset_of!(WireHeader, flags));
}

Caution: In Rust, taking a reference to a field in a packed struct is undefined behavior if the field is not naturally aligned. The compiler will refuse to create &header.sequence if it might be unaligned. You must use addr_of!(header.sequence).read_unaligned() or copy the field first.

use std::ptr::addr_of;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    let p = Packed { a: 1, b: 0xDEADBEEF };

    // This would be UB: let r = &p.b;
    // Safe way:
    let b_val = unsafe { addr_of!(p.b).read_unaligned() };
    println!("b = 0x{b_val:08X}");
}

Rust: repr(align(N))

Force a minimum alignment, useful for cache-line alignment.

use std::mem;

#[repr(C, align(64))]
struct CacheAligned {
    counter: u64,
    data: [u8; 32],
}

fn main() {
    println!("size  = {}", mem::size_of::<CacheAligned>());   // 64
    println!("align = {}", mem::align_of::<CacheAligned>());  // 64

    let obj = CacheAligned { counter: 0, data: [0; 32] };
    let addr = &obj as *const CacheAligned as usize;
    println!("address = 0x{addr:X}");
    println!("aligned to 64? {}", addr % 64 == 0);
}

Driver Prep: Cache-line alignment prevents false sharing in concurrent code. When two threads write to different fields that share a cache line, the CPU bounces the line between cores. Aligning to 64 bytes (typical cache line) avoids this. The Linux kernel uses ____cacheline_aligned for this purpose.

Verifying Layout at Compile Time

In C, use _Static_assert (C11):

#include <stdint.h>
#include <stddef.h>

struct __attribute__((packed)) wire_msg {
    uint8_t  type;
    uint16_t length;
    uint32_t payload;
};

_Static_assert(sizeof(struct wire_msg) == 7, "wire_msg must be 7 bytes");
_Static_assert(offsetof(struct wire_msg, payload) == 3, "payload at offset 3");

int main(void) {
    return 0;
}

In Rust, use const assertions:

#[repr(C, packed)]
struct WireMsg {
    msg_type: u8,
    length:   u16,
    payload:  u32,
}

const _: () = assert!(std::mem::size_of::<WireMsg>() == 7);

fn main() {
    println!("Layout verified at compile time.");
}

A Real-World Example: ELF Header

The ELF file format begins with a fixed-layout header. Here is a partial version:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct __attribute__((packed)) elf_ident {
    uint8_t  magic[4];   /* 0x7F 'E' 'L' 'F' */
    uint8_t  class;      /* 1=32-bit, 2=64-bit */
    uint8_t  data;       /* 1=LE, 2=BE */
    uint8_t  version;
    uint8_t  osabi;
    uint8_t  pad[8];
};

_Static_assert(sizeof(struct elf_ident) == 16, "ELF ident must be 16 bytes");

int main(void) {
    struct elf_ident ident;
    memset(&ident, 0, sizeof(ident));
    ident.magic[0] = 0x7F;
    ident.magic[1] = 'E';
    ident.magic[2] = 'L';
    ident.magic[3] = 'F';
    ident.class = 2;  /* 64-bit */
    ident.data  = 1;  /* little-endian */
    ident.version = 1;

    printf("ELF ident: ");
    uint8_t *bytes = (uint8_t *)&ident;
    for (size_t i = 0; i < sizeof(ident); i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}

The Rust equivalent:

use std::mem;

#[repr(C, packed)]
struct ElfIdent {
    magic:   [u8; 4],
    class:   u8,
    data:    u8,
    version: u8,
    osabi:   u8,
    pad:     [u8; 8],
}

const _: () = assert!(mem::size_of::<ElfIdent>() == 16);

fn main() {
    let ident = ElfIdent {
        magic:   [0x7F, b'E', b'L', b'F'],
        class:   2,   // 64-bit
        data:    1,   // little-endian
        version: 1,
        osabi:   0,
        pad:     [0; 8],
    };

    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            &ident as *const ElfIdent as *const u8,
            mem::size_of::<ElfIdent>(),
        )
    };

    print!("ELF ident: ");
    for b in bytes {
        print!("{b:02X} ");
    }
    println!();
}

Try It: Read the first 16 bytes of /bin/ls (or any ELF binary) into this struct and verify the magic number. In C, use fread. In Rust, use std::fs::read and slice the first 16 bytes.

Quick Knowledge Check

  1. A struct has fields u8, u32, u8 with repr(C). What is its size and why?
  2. What happens on ARM if you read a u32 from an odd address without packed access?
  3. Why does Rust refuse to let you create &packed_struct.unaligned_field?

Common Pitfalls

  • Assuming struct size equals sum of field sizes. Padding exists. Always verify with sizeof/size_of.
  • Forgetting trailing padding. The struct's total size is rounded up to its alignment so that arrays work.
  • Using packed structs everywhere. Pack only when the wire format demands it. Unpacked structs are faster.
  • Taking references to packed fields in Rust. This is UB. Use read_unaligned.
  • Forgetting repr(C). Default Rust layout is unspecified. Without repr(C), your struct will not match the C equivalent.
  • Not asserting layout. Always add static assertions for struct size when the layout must be exact. Catch mistakes at compile time, not in production.

Endianness and Byte Order

A 32-bit integer is four bytes. Which byte goes first? The answer depends on the machine -- and it matters every time data crosses a boundary: network sockets, file formats, shared memory between architectures, or talking to hardware.

Little-Endian vs Big-Endian

The value 0x01020304 stored in memory:

Little-endian (x86, ARM default, RISC-V):
Address:   0x00  0x01  0x02  0x03
Content:   0x04  0x03  0x02  0x01
           LSB                MSB      (least significant byte first)

Big-endian (network byte order, SPARC, some ARM modes):
Address:   0x00  0x01  0x02  0x03
Content:   0x01  0x02  0x03  0x04
           MSB                LSB      (most significant byte first)

The term comes from Gulliver's Travels -- which end of the egg do you crack first? In systems programming, you crack whichever end the spec says.

Why It Matters

Within a single machine, endianness is invisible. The CPU loads and stores multi-byte values in its native order, and everything just works.

Problems appear when bytes cross boundaries:

  • Network protocols. TCP/IP headers are big-endian (by convention, "network byte order"). An x86 machine must swap bytes before sending and after receiving.
  • File formats. Some use big-endian (Java .class files), some use little-endian (most Windows formats), some specify per-file (TIFF).
  • Hardware registers. PCI is little-endian. Some SoC peripherals are big-endian. The driver must know.

Seeing It In C: The Union Trick

#include <stdio.h>
#include <stdint.h>

union endian_check {
    uint32_t word;
    uint8_t  bytes[4];
};

int main(void) {
    union endian_check ec;
    ec.word = 0x01020304;

    printf("word = 0x%08X\n", ec.word);
    printf("bytes: [0]=0x%02X [1]=0x%02X [2]=0x%02X [3]=0x%02X\n",
           ec.bytes[0], ec.bytes[1], ec.bytes[2], ec.bytes[3]);

    if (ec.bytes[0] == 0x04)
        printf("Little-endian\n");
    else if (ec.bytes[0] == 0x01)
        printf("Big-endian\n");
    else
        printf("Mixed endian (?)\n");

    return 0;
}

On x86 you will see:

word = 0x01020304
bytes: [0]=0x04 [1]=0x03 [2]=0x02 [3]=0x01
Little-endian

Caution: Reading a union member other than the one last written is well-defined in C (C11 6.5.2.3 and its footnote: the bytes are reinterpreted in the new type), but it is undefined behavior in C++. For code that must compile as both, use memcpy instead.

Detecting Endianness at Runtime

The union trick above is one approach. Here is another, using a pointer cast:

#include <stdio.h>
#include <stdint.h>

int is_little_endian(void) {
    uint16_t val = 1;
    uint8_t *byte = (uint8_t *)&val;
    return byte[0] == 1;
}

int main(void) {
    printf("This machine is %s-endian\n",
           is_little_endian() ? "little" : "big");
    return 0;
}

In Rust:

fn is_little_endian() -> bool {
    let val: u16 = 1;
    let bytes = val.to_ne_bytes();  // native endian
    bytes[0] == 1
}

fn main() {
    if is_little_endian() {
        println!("Little-endian");
    } else {
        println!("Big-endian");
    }
}

Rust Note: In practice you rarely need runtime detection. Rust's byte conversion methods (to_be_bytes, to_le_bytes, etc.) handle the conversion for you regardless of the host platform.

The C Conversion Functions: htons, htonl, ntohs, ntohl

POSIX provides four functions to convert between host and network byte order.

htons -- host to network, short (16-bit)
htonl -- host to network, long  (32-bit)
ntohs -- network to host, short (16-bit)
ntohl -- network to host, long  (32-bit)

On a little-endian machine, these swap bytes. On a big-endian machine, they are no-ops.

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(void) {
    uint16_t host_port = 8080;
    uint16_t net_port  = htons(host_port);

    printf("Host order: 0x%04X\n", host_port);   /* 0x1F90 */
    printf("Net  order: 0x%04X\n", net_port);     /* 0x901F on LE */

    uint32_t host_addr = 0xC0A80001;  /* 192.168.0.1 */
    uint32_t net_addr  = htonl(host_addr);

    printf("Host order: 0x%08X\n", host_addr);
    printf("Net  order: 0x%08X\n", net_addr);

    /* Round-trip */
    printf("Back:       0x%08X\n", ntohl(net_addr));
    return 0;
}

Compile: gcc -o endian endian.c && ./endian

What About 64-bit?

POSIX does not define htonll. You can build your own or use compiler builtins:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

uint64_t htonll(uint64_t val) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap64(val);
#else
    return val;
#endif
}

uint64_t ntohll(uint64_t val) {
    return htonll(val);  /* same operation -- swap is its own inverse */
}

int main(void) {
    uint64_t host_val = 0x0102030405060708ULL;
    uint64_t net_val  = htonll(host_val);

    printf("Host: 0x%016llX\n", (unsigned long long)host_val);
    printf("Net:  0x%016llX\n", (unsigned long long)net_val);
    printf("Back: 0x%016llX\n", (unsigned long long)ntohll(net_val));
    return 0;
}

Rust: to_be_bytes / to_le_bytes / from_be_bytes

Rust takes a different approach: no separate functions, just methods on integer types.

fn main() {
    let port: u16 = 8080;

    // Convert to big-endian (network order) bytes
    let net_bytes = port.to_be_bytes();
    println!("port {port} as network bytes: {:02X} {:02X}",
             net_bytes[0], net_bytes[1]);

    // Convert back
    let recovered = u16::from_be_bytes(net_bytes);
    println!("recovered: {recovered}");

    // 32-bit example
    let addr: u32 = 0xC0A80001;  // 192.168.0.1
    let net = addr.to_be_bytes();
    println!("IP as network bytes: {}.{}.{}.{}",
             net[0], net[1], net[2], net[3]);

    // 64-bit -- just works, no special function needed
    let val: u64 = 0x0102030405060708;
    let be = val.to_be_bytes();
    println!("64-bit big-endian: {:02X?}", be);
    let back = u64::from_be_bytes(be);
    println!("recovered: 0x{back:016X}");
}

The full set of methods:

.to_be_bytes()    -- convert to big-endian byte array
.to_le_bytes()    -- convert to little-endian byte array
.to_ne_bytes()    -- convert to native-endian byte array
from_be_bytes()   -- construct from big-endian bytes
from_le_bytes()   -- construct from little-endian bytes
from_ne_bytes()   -- construct from native-endian bytes

Rust Note: These methods return and consume fixed-size arrays ([u8; 2], [u8; 4], [u8; 8]), not slices. This means the conversion is zero-cost when the compiler can see both the conversion and the use -- it often just emits a bswap instruction or nothing at all.

Wire Format Patterns

When parsing a network packet, always convert from network byte order to host order. When building a packet, always convert from host to network order.

Parsing a Simple Packet Header in C

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct __attribute__((packed)) msg_header {
    uint16_t msg_type;
    uint16_t msg_length;
    uint32_t sequence;
};

void parse_header(const uint8_t *data) {
    struct msg_header hdr;
    memcpy(&hdr, data, sizeof(hdr));

    /* Convert from network byte order */
    uint16_t type = ntohs(hdr.msg_type);
    uint16_t len  = ntohs(hdr.msg_length);
    uint32_t seq  = ntohl(hdr.sequence);

    printf("Type: %u  Length: %u  Seq: %u\n", type, len, seq);
}

void build_header(uint8_t *buf, uint16_t type, uint16_t len, uint32_t seq) {
    struct msg_header hdr;
    hdr.msg_type   = htons(type);
    hdr.msg_length = htons(len);
    hdr.sequence   = htonl(seq);
    memcpy(buf, &hdr, sizeof(hdr));
}

int main(void) {
    uint8_t wire[8];
    build_header(wire, 1, 128, 42);

    printf("Wire bytes: ");
    for (int i = 0; i < 8; i++)
        printf("%02X ", wire[i]);
    printf("\n");

    parse_header(wire);
    return 0;
}

Same Pattern in Rust

fn parse_header(data: &[u8]) {
    if data.len() < 8 {
        eprintln!("Short packet");
        return;
    }

    let msg_type = u16::from_be_bytes([data[0], data[1]]);
    let msg_len  = u16::from_be_bytes([data[2], data[3]]);
    let sequence = u32::from_be_bytes([data[4], data[5], data[6], data[7]]);

    println!("Type: {msg_type}  Length: {msg_len}  Seq: {sequence}");
}

fn build_header(msg_type: u16, msg_len: u16, sequence: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[0..2].copy_from_slice(&msg_type.to_be_bytes());
    buf[2..4].copy_from_slice(&msg_len.to_be_bytes());
    buf[4..8].copy_from_slice(&sequence.to_be_bytes());
    buf
}

fn main() {
    let wire = build_header(1, 128, 42);

    print!("Wire bytes: ");
    for b in &wire {
        print!("{b:02X} ");
    }
    println!();

    parse_header(&wire);
}

Try It: Extend both programs to include a uint64_t timestamp field in the header. Use htonll/ntohll in C and to_be_bytes/from_be_bytes in Rust.

Byte Swapping Internals

What does a byte swap actually do?

Original (LE):  0x04 0x03 0x02 0x01
Swapped  (BE):  0x01 0x02 0x03 0x04

A manual 32-bit swap:

#include <stdio.h>
#include <stdint.h>

uint32_t swap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) <<  8)
         | ((x & 0x00FF0000u) >>  8)
         | ((x & 0xFF000000u) >> 24);
}

int main(void) {
    uint32_t val = 0x01020304;
    uint32_t swapped = swap32(val);
    printf("0x%08X -> 0x%08X\n", val, swapped);
    return 0;
}

Modern compilers recognize this pattern and emit a single bswap instruction. You can also use the builtins directly:

/* GCC/Clang */
uint16_t s = __builtin_bswap16(0x0102);  /* 0x0201 */
uint32_t w = __builtin_bswap32(0x01020304);
uint64_t d = __builtin_bswap64(0x0102030405060708ULL);

In Rust:

fn main() {
    let val: u32 = 0x01020304;
    let swapped = val.swap_bytes();
    println!("0x{val:08X} -> 0x{swapped:08X}");

    // Also available on u16, u64, u128, i32, etc.
    let s: u16 = 0x0102;
    println!("0x{s:04X} -> 0x{:04X}", s.swap_bytes());
}

Endianness in Structs: A Complete Example

Suppose a sensor sends data in big-endian format. Here is how you would parse it.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct __attribute__((packed)) sensor_reading {
    uint8_t  sensor_id;
    uint16_t temperature;   /* big-endian, units: 0.1 deg C */
    uint32_t timestamp;     /* big-endian, Unix epoch */
};

void decode_reading(const uint8_t *raw) {
    struct sensor_reading r;
    memcpy(&r, raw, sizeof(r));

    uint8_t  id   = r.sensor_id;
    uint16_t temp  = ntohs(r.temperature);
    uint32_t ts    = ntohl(r.timestamp);

    printf("Sensor %u: %.1f C at t=%u\n", id, temp / 10.0, ts);
}

int main(void) {
    /* Simulated wire data: sensor 3, 25.6 C (256 = 0x0100), ts=1000 */
    uint8_t wire[] = {
        0x03,             /* sensor_id */
        0x01, 0x00,       /* temperature = 256 (big-endian) */
        0x00, 0x00, 0x03, 0xE8  /* timestamp = 1000 (big-endian) */
    };

    decode_reading(wire);
    return 0;
}

The Rust equivalent:

fn decode_reading(raw: &[u8]) {
    if raw.len() < 7 {
        eprintln!("Short reading");
        return;
    }

    let id   = raw[0];
    let temp  = u16::from_be_bytes([raw[1], raw[2]]);
    let ts    = u32::from_be_bytes([raw[3], raw[4], raw[5], raw[6]]);

    println!("Sensor {id}: {:.1} C at t={ts}", temp as f64 / 10.0);
}

fn main() {
    let wire: &[u8] = &[
        0x03,
        0x01, 0x00,
        0x00, 0x00, 0x03, 0xE8,
    ];

    decode_reading(wire);
}

Driver Prep: PCI and PCIe are little-endian by specification. When your driver reads a register on an x86 host, no swapping is needed. But some SoC buses are big-endian, and the kernel provides ioread32be / iowrite32be for those. Always check the hardware manual for the device's byte order.

Mixed Endianness in the Wild

Some formats mix endianness. The classic example: the ELF file format. The ELF identification bytes specify the endianness of the rest of the file.

Byte 5 of ELF header (e_ident[EI_DATA]):
  1 = ELFDATA2LSB (little-endian)
  2 = ELFDATA2MSB (big-endian)

Your parser must read this byte first, then decide how to interpret all subsequent multi-byte fields.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

uint16_t read_u16(const uint8_t *p, int big_endian) {
    if (big_endian)
        return ((uint16_t)p[0] << 8) | p[1];
    else
        return ((uint16_t)p[1] << 8) | p[0];
}

uint32_t read_u32(const uint8_t *p, int big_endian) {
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8)  | p[3];
    else
        return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[1] << 8)  | p[0];
}

int main(void) {
    uint8_t be_data[] = {0x00, 0x01, 0x00, 0x02};
    uint8_t le_data[] = {0x01, 0x00, 0x02, 0x00};

    printf("BE u16: %u\n", read_u16(be_data, 1));      /* 1 */
    printf("LE u16: %u\n", read_u16(le_data, 0));      /* 1 */
    printf("BE u32: %u\n", read_u32(be_data, 1));      /* 65538 */
    printf("LE u32: %u\n", read_u32(le_data, 0));      /* 131073 */
    return 0;
}

The Rust equivalent:

fn read_u16(p: &[u8], big_endian: bool) -> u16 {
    if big_endian {
        u16::from_be_bytes([p[0], p[1]])
    } else {
        u16::from_le_bytes([p[0], p[1]])
    }
}

fn read_u32(p: &[u8], big_endian: bool) -> u32 {
    if big_endian {
        u32::from_be_bytes([p[0], p[1], p[2], p[3]])
    } else {
        u32::from_le_bytes([p[0], p[1], p[2], p[3]])
    }
}

fn main() {
    let be_data = [0x00u8, 0x01, 0x00, 0x02];
    let le_data = [0x01u8, 0x00, 0x02, 0x00];

    println!("BE u16: {}", read_u16(&be_data, true));
    println!("LE u16: {}", read_u16(&le_data, false));
    println!("BE u32: {}", read_u32(&be_data, true));
    println!("LE u32: {}", read_u32(&le_data, false));
}

Quick Knowledge Check

  1. On a little-endian machine, uint32_t x = 1; -- what is the value of the byte at ((uint8_t *)&x)[0]? What about [3]?
  2. A protocol spec says the port field is "in network byte order." You receive bytes 0x1F 0x90. What is the port number?
  3. You see htonl(INADDR_ANY) in network code. INADDR_ANY is 0. Does htonl change it? Why or why not?

Common Pitfalls

  • Forgetting to convert. The most common network bug: sending uint32_t without htonl. It works on big-endian machines, fails on little-endian, and the test server was big-endian.
  • Double-converting. Calling htonl on a value that is already in network order swaps it back. Convert exactly once.
  • Assuming endianness. Your code might run on ARM big-endian someday. Always use explicit conversions for portable code.
  • Casting instead of memcpy. *(uint32_t *)buf is an alignment violation if buf is not 4-byte aligned. Use memcpy or from_be_bytes.
  • No 64-bit POSIX function. htonll is not standard. Roll your own or use __builtin_bswap64.

Volatile, Type Punning, and Hardware Access Patterns

When your code talks directly to hardware, two things break: the compiler's assumptions about memory, and the type system's assumptions about data. This chapter covers the volatile keyword, type punning, strict aliasing, and the register access patterns used in embedded and driver code.

The Problem: The Compiler Is Too Smart

Compilers optimize aggressively. They assume that if no code writes to a variable, its value does not change. They assume that writing to a variable that is never read afterward is dead code. Both assumptions are wrong when hardware is involved.

#include <stdio.h>
#include <stdint.h>

/* Simulating a hardware status register */
static uint32_t fake_hw_register = 0;

void wait_for_ready_broken(void) {
    uint32_t *status = &fake_hw_register;

    /* BUG: compiler sees *status never changes in this loop */
    /* At -O2, this becomes an infinite loop or is removed entirely */
    while ((*status & 0x01) == 0) {
        /* spin */
    }
}

int main(void) {
    printf("This function has a bug -- see the source.\n");
    /* Do NOT call wait_for_ready_broken -- it will hang */
    return 0;
}

At -O2, the compiler loads *status once, sees it is zero, and generates an infinite loop -- or removes the loop as dead code. The compiler does not know that hardware can change the value behind its back.

The volatile Keyword

volatile tells the compiler: do not optimize away accesses to this variable. Read it every time the code says to read it. Write it every time the code says to write it. In the order the code specifies.

#include <stdio.h>
#include <stdint.h>

static volatile uint32_t fake_hw_status = 0;

void wait_for_ready(void) {
    /* volatile forces a real memory read on every iteration */
    while ((fake_hw_status & 0x01) == 0) {
        /* spin -- compiler MUST re-read fake_hw_status each time */
    }
}

int main(void) {
    printf("volatile prevents the compiler from caching the read.\n");
    /* Still do not call wait_for_ready in this demo -- there is no */
    /* other thread or hardware changing the value.                 */
    return 0;
}

What volatile Does and Does NOT Do

volatile guarantees:

  • Every read in the source produces a load instruction
  • Every write in the source produces a store instruction
  • Reads and writes to the same volatile variable are not reordered relative to each other

volatile does NOT guarantee:

  • Atomicity -- a 64-bit volatile read on a 32-bit CPU may tear
  • Memory ordering between different variables (use memory barriers for that)
  • Thread safety (use _Atomic or stdatomic.h for threads)

Caution: volatile is NOT a substitute for atomic operations in multithreaded code. In C, use _Atomic. In Rust, use std::sync::atomic. volatile is for hardware registers and memory-mapped I/O only.

Memory-Mapped Hardware Registers

Real hardware appears as addresses in the CPU's memory map. Reading or writing those addresses talks to the device.

Physical memory map (simplified):
0x0000_0000 - 0x3FFF_FFFF   RAM
0x4000_0000 - 0x4000_00FF   UART registers
0x4000_0100 - 0x4000_01FF   GPIO registers
0x4000_0200 - 0x4000_02FF   Timer registers

A typical register block for a UART:

Offset  Register    Access
0x00    DATA        R/W    (read = receive, write = transmit)
0x04    STATUS      R      (bit 0 = TX ready, bit 1 = RX data available)
0x08    CONTROL     R/W    (bit 0 = enable, bit 1 = interrupt enable)
0x0C    BAUD_DIV    R/W    (baud rate divisor)

Accessing Registers in C

#include <stdio.h>
#include <stdint.h>

/* In real code, UART_BASE comes from device tree or platform header */
/* Here we simulate with a static array */
static uint32_t simulated_uart[4] = {0, 0x03, 0, 0};

#define UART_BASE    ((volatile uint32_t *)simulated_uart)
#define UART_DATA    (UART_BASE[0])
#define UART_STATUS  (UART_BASE[1])
#define UART_CONTROL (UART_BASE[2])
#define UART_BAUD    (UART_BASE[3])

#define STATUS_TX_READY  (1u << 0)
#define STATUS_RX_AVAIL  (1u << 1)
#define CTRL_ENABLE      (1u << 0)
#define CTRL_IRQ_EN      (1u << 1)

void uart_init(uint32_t baud_divisor) {
    UART_BAUD    = baud_divisor;
    UART_CONTROL = CTRL_ENABLE;
}

void uart_send(uint8_t byte) {
    while (!(UART_STATUS & STATUS_TX_READY)) {
        /* spin -- volatile ensures re-read */
    }
    UART_DATA = byte;
}

int main(void) {
    uart_init(26);  /* e.g., 115200 baud */

    /* STATUS already has TX_READY set in our simulation */
    uart_send('H');

    printf("Sent 'H' (0x%02X) to simulated UART\n",
           (unsigned)simulated_uart[0]);
    printf("CONTROL = 0x%08X\n", (unsigned)simulated_uart[2]);
    printf("BAUD    = %u\n", (unsigned)simulated_uart[3]);
    return 0;
}

Driver Prep: In the Linux kernel, you never access physical addresses directly. The kernel provides ioremap() to map physical addresses into kernel virtual space, and readl()/writel() to perform volatile MMIO reads/writes with proper barriers. The pattern is: void __iomem *base = ioremap(phys, size); then val = readl(base + OFFSET);.

Type Punning in C

Type punning means reinterpreting the bytes of one type as another. There are three ways to do it in C, and two of them are problematic.

Method 1: Pointer Cast (Dangerous)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    float f = 3.14f;
    uint32_t *p = (uint32_t *)&f;   /* strict aliasing violation! */
    printf("float 3.14 as uint32: 0x%08X\n", *p);
    return 0;
}

This compiles and works on most compilers with default settings. But it violates the strict aliasing rule and is technically undefined behavior.

Method 2: Union (Common, Practical)

#include <stdio.h>
#include <stdint.h>

union float_bits {
    float    f;
    uint32_t u;
};

int main(void) {
    union float_bits fb;
    fb.f = 3.14f;
    printf("float 3.14 as uint32: 0x%08X\n", fb.u);

    /* Inspect the IEEE 754 parts */
    uint32_t sign     = (fb.u >> 31) & 1;
    uint32_t exponent = (fb.u >> 23) & 0xFF;
    uint32_t mantissa = fb.u & 0x7FFFFF;
    printf("sign=%u  exp=%u  mantissa=0x%06X\n", sign, exponent, mantissa);
    return 0;
}

Caution: Union type-punning is well-defined in C11 (6.5.2.3) but NOT in C++. If you write code that must compile as both C and C++, use memcpy.

Method 3: memcpy (Always Correct)

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 3.14f;
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    printf("float 3.14 as uint32: 0x%08X\n", u);

    /* Round-trip */
    float f2;
    memcpy(&f2, &u, sizeof(f2));
    printf("back to float: %f\n", f2);
    return 0;
}

memcpy is the only method that is correct under all standards, all compilers, and all optimization levels. Modern compilers optimize small memcpy calls into register moves -- there is no performance penalty.

The Strict Aliasing Rule

The strict aliasing rule (C11 6.5 paragraph 7) says: you may only access an object through a pointer to a compatible type, a character type, or a signed/unsigned variant of its declared type.

#include <stdio.h>
#include <stdint.h>

/* This violates strict aliasing: */
void bad_example(void) {
    int x = 42;
    float *fp = (float *)&x;  /* int* -> float*: VIOLATION */
    /* Reading *fp is undefined behavior */
    printf("%f\n", *fp);  /* compiler may return garbage at -O2 */
}

/* This is fine -- char* can alias anything: */
void ok_example(void) {
    int x = 42;
    unsigned char *cp = (unsigned char *)&x;
    for (size_t i = 0; i < sizeof(x); i++)
        printf("%02X ", cp[i]);
    printf("\n");
}

int main(void) {
    ok_example();
    /* bad_example();  -- do not rely on this */
    return 0;
}

GCC's -fstrict-aliasing (enabled at -O2 and above) lets the compiler assume the rule is followed. Violations cause real, baffling, optimization-dependent bugs.

$ gcc -O0 -o alias alias.c    # might "work"
$ gcc -O2 -o alias alias.c    # might break -- UB
$ gcc -O2 -fno-strict-aliasing -o alias alias.c  # disables the optimization

Caution: The Linux kernel compiles with -fno-strict-aliasing because kernel code routinely casts between pointer types. This is a pragmatic choice -- not a license to ignore aliasing in your own code.

Rust: read_volatile / write_volatile

Rust has no volatile keyword. Instead, it provides two functions in std::ptr:

use std::ptr;

fn main() {
    let mut hw_reg: u32 = 0;

    // Volatile write
    unsafe {
        ptr::write_volatile(&mut hw_reg as *mut u32, 0xDEAD_BEEF);
    }

    // Volatile read
    let val = unsafe {
        ptr::read_volatile(&hw_reg as *const u32)
    };

    println!("Register value: 0x{val:08X}");
}

Rust Note: read_volatile and write_volatile are unsafe because they take raw pointers. The volatility is a property of the access, not the variable. This is more precise than C's model, where volatility is part of the type.

Modeling Hardware Registers in Rust

In idiomatic Rust, you wrap register access in a struct that encapsulates the unsafe volatile operations.

use std::ptr;

/// A read-write hardware register at a fixed memory address.
struct Register {
    addr: *mut u32,
}

impl Register {
    /// # Safety
    /// `addr` must point to a valid, mapped hardware register.
    unsafe fn new(addr: *mut u32) -> Self {
        Register { addr }
    }

    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }

    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }

    fn set_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old | mask);
    }

    fn clear_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old & !mask);
    }

    fn read_field(&self, mask: u32, shift: u32) -> u32 {
        (self.read() & mask) >> shift
    }

    fn write_field(&self, mask: u32, shift: u32, val: u32) {
        let old = self.read() & !mask;
        self.write(old | ((val << shift) & mask));
    }
}

// Demonstration using a simulated register
fn main() {
    let mut simulated_reg: u32 = 0;

    let reg = unsafe { Register::new(&mut simulated_reg as *mut u32) };

    reg.write(0x0000_0000);
    reg.set_bits(0x01);              // enable
    reg.write_field(0x0E, 1, 5);     // set mode field [3:1] = 5

    println!("Register = 0x{:08X}", reg.read());
    println!("Mode     = {}", reg.read_field(0x0E, 1));
}

Read-Only and Write-Only Registers

Some registers must not be written (status registers), and some must not be read (command/data FIFOs where reading has side effects). Encode this in the type system.

use std::ptr;
use std::marker::PhantomData;

struct ReadOnly;
struct WriteOnly;
struct ReadWrite;

struct Reg<MODE> {
    addr: *mut u32,
    _mode: PhantomData<MODE>,
}

impl<MODE> Reg<MODE> {
    unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr, _mode: PhantomData }
    }
}

impl Reg<ReadOnly> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    // No write method -- compile error if you try
}

impl Reg<WriteOnly> {
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
    // No read method
}

impl Reg<ReadWrite> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
}

fn main() {
    let mut status_mem: u32 = 0x42;
    let mut data_mem: u32 = 0;
    let mut ctrl_mem: u32 = 0;

    let status: Reg<ReadOnly>  = unsafe { Reg::new(&mut status_mem) };
    let data:   Reg<WriteOnly> = unsafe { Reg::new(&mut data_mem) };
    let ctrl:   Reg<ReadWrite> = unsafe { Reg::new(&mut ctrl_mem) };

    println!("Status = 0x{:02X}", status.read());
    // status.write(0);  // COMPILE ERROR -- ReadOnly has no write()

    data.write(0xFF);
    // data.read();  // COMPILE ERROR -- WriteOnly has no read()

    ctrl.write(0x01);
    println!("Ctrl = 0x{:02X}", ctrl.read());
}

Driver Prep: The Rust embedded ecosystem (cortex-m, svd2rust) generates register access code with exactly this pattern. The SVD file from the chip vendor describes which registers are read-only, write-only, or read-write, and the generated code enforces it at compile time.

Type Punning in Rust

Rust does not have unions in the C sense (it has union, but reading a field is unsafe, so union punning buys you nothing). Instead you reach for transmute or, better, byte-level methods such as to_bits.

Using transmute

fn main() {
    let f: f32 = 3.14;
    let bits: u32 = unsafe { std::mem::transmute(f) };
    println!("f32 3.14 as u32: 0x{bits:08X}");

    let sign     = (bits >> 31) & 1;
    let exponent = (bits >> 23) & 0xFF;
    let mantissa = bits & 0x7F_FFFF;
    println!("sign={sign}  exp={exponent}  mantissa=0x{mantissa:06X}");

    // Round-trip
    let f2: f32 = unsafe { std::mem::transmute(bits) };
    println!("back to f32: {f2}");
}

Caution: transmute is extremely unsafe. The source and destination types must have the same size (checked at compile time) but the compiler cannot verify that the bit pattern is valid for the destination type. Prefer safer alternatives when they exist.

Using to_bits / from_bits (Preferred)

fn main() {
    let f: f32 = 3.14;
    let bits = f.to_bits();
    println!("f32 3.14 as u32: 0x{bits:08X}");

    let f2 = f32::from_bits(bits);
    println!("back to f32: {f2}");

    // For f64 <-> u64:
    let d: f64 = 2.718281828;
    let dbits = d.to_bits();
    println!("f64 as u64: 0x{dbits:016X}");
}

to_bits() and from_bits() are safe, stable, and produce the same code as transmute. Always prefer them for float/integer conversions.
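The byte-level route works in safe code too: to_ne_bytes / from_ne_bytes convert between a value and its raw bytes, and the _le / _be variants fix the endianness explicitly -- which is what you want for wire formats. A brief sketch:

```rust
fn main() {
    let x: u32 = 0xDEAD_BEEF;

    // Value -> native-endian byte array, entirely safe code
    let bytes = x.to_ne_bytes();
    print!("native bytes: ");
    for b in bytes {
        print!("{b:02X} ");
    }
    println!();

    // Byte array -> value: exact round trip
    assert_eq!(u32::from_ne_bytes(bytes), x);

    // Explicit endianness for serialization
    assert_eq!(x.to_le_bytes(), [0xEF, 0xBE, 0xAD, 0xDE]);
    assert_eq!(x.to_be_bytes(), [0xDE, 0xAD, 0xBE, 0xEF]);
}
```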

The Register Access Pattern for Embedded/Driver Code

Putting it all together: a complete register block definition as used in real embedded Rust.

use std::ptr;

/// UART register block starting at a base address.
struct Uart {
    base: *mut u8,
}

impl Uart {
    const DATA_OFF:    usize = 0x00;
    const STATUS_OFF:  usize = 0x04;
    const CONTROL_OFF: usize = 0x08;
    const BAUD_OFF:    usize = 0x0C;

    const STATUS_TX_READY: u32 = 1 << 0;
    const STATUS_RX_AVAIL: u32 = 1 << 1;
    const CTRL_ENABLE:     u32 = 1 << 0;

    /// # Safety
    /// `base` must point to a mapped UART register block.
    unsafe fn new(base: *mut u8) -> Self {
        Uart { base }
    }

    fn read_reg(&self, offset: usize) -> u32 {
        unsafe {
            ptr::read_volatile(self.base.add(offset) as *const u32)
        }
    }

    fn write_reg(&self, offset: usize, val: u32) {
        unsafe {
            ptr::write_volatile(self.base.add(offset) as *mut u32, val);
        }
    }

    fn init(&self, baud_divisor: u32) {
        self.write_reg(Self::BAUD_OFF, baud_divisor);
        self.write_reg(Self::CONTROL_OFF, Self::CTRL_ENABLE);
    }

    fn send_byte(&self, byte: u8) {
        while (self.read_reg(Self::STATUS_OFF) & Self::STATUS_TX_READY) == 0 {
            // spin
        }
        self.write_reg(Self::DATA_OFF, byte as u32);
    }

    fn try_recv(&self) -> Option<u8> {
        if (self.read_reg(Self::STATUS_OFF) & Self::STATUS_RX_AVAIL) != 0 {
            Some(self.read_reg(Self::DATA_OFF) as u8)
        } else {
            None
        }
    }
}

fn main() {
    // Simulate a register block in memory
    let mut regs = [0u32; 4];
    regs[1] = 0x01;  // STATUS: TX_READY set

    let uart = unsafe { Uart::new(regs.as_mut_ptr() as *mut u8) };
    uart.init(26);
    uart.send_byte(b'R');

    println!("DATA    = 0x{:08X}", regs[0]);  // 'R' = 0x52
    println!("CONTROL = 0x{:08X}", regs[2]);  // ENABLE = 0x01
    println!("BAUD    = {}", regs[3]);          // 26
}

This pattern -- base pointer plus offsets, volatile reads/writes, bit masks for fields -- is the foundation of every hardware driver, whether you write it in C or Rust.

Try It: Add an interrupt-enable bit to the CONTROL register (bit 1). Write a method enable_interrupts(&self) that sets bit 1 without clearing bit 0. This is the read-modify-write pattern that every driver uses.

Quick Knowledge Check

  1. What happens if you remove volatile from a hardware status register poll loop and compile at -O2?
  2. In C, why is memcpy preferred over pointer casts for type punning?
  3. Why does Rust make read_volatile / write_volatile unsafe, when C just uses a type qualifier?

Common Pitfalls

  • Using volatile for thread synchronization. It does not provide atomicity or memory ordering between threads. Use atomics.
  • Forgetting volatile on MMIO. The compiler will optimize your register writes away. One missing volatile can make a device non-functional.
  • Read-modify-write races. reg |= BIT is read, modify, write. If an interrupt fires between the read and write, the change is lost. Use spin locks in kernel code.
  • Strict aliasing violations. Pointer casts between unrelated types are UB at -O2. Use memcpy.
  • transmute misuse in Rust. If the bit pattern is invalid for the target type (e.g., transmuting 2u8 to bool), it is instant UB. Prefer to_bits() or TryFrom.
  • Assuming volatile ordering across variables. volatile orders accesses to the same variable only. Use compiler/kernel barrier macros for cross-variable ordering.

Data Structures in C and Rust

Every systems program is a data structure program. This chapter builds the classics by hand in C -- linked lists, hash tables, trees, stacks, queues -- then shows how Rust's standard library replaces most of that labor.

The Textbook Linked List

The singly linked list is the "hello world" of dynamic data structures. A node holds data plus a pointer to the next node. The list ends when next is NULL.

/* slist.c -- singly linked list in C */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Prepend a new node to the front of the list. */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (!n) {
        perror("malloc");
        exit(1);
    }
    n->data  = value;
    n->next  = head;
    return n;
}

/* Print every element. */
void list_print(const struct node *head)
{
    for (const struct node *cur = head; cur; cur = cur->next)
        printf("%d -> ", cur->data);
    printf("NULL\n");
}

/* Free every node. */
void list_free(struct node *head)
{
    while (head) {
        struct node *tmp = head;
        head = head->next;
        free(tmp);
    }
}

int main(void)
{
    struct node *list = NULL;
    for (int i = 1; i <= 5; i++)
        list = list_push(list, i);

    list_print(list);   /* 5 -> 4 -> 3 -> 2 -> 1 -> NULL */
    list_free(list);
    return 0;
}

Memory layout after pushing 3, 2, 1:

  head
   |
   v
 +---+---+    +---+---+    +---+---+
 | 1 | *-+--->| 2 | *-+--->| 3 | / |
 +---+---+    +---+---+    +---+---+
  data next    data next    data next (NULL)

Caution: Every malloc must pair with exactly one free. Forget one -- memory leak. Free twice -- undefined behavior and likely a crash.

Try It: Add a list_find function that returns a pointer to the first node whose data equals a given value, or NULL if not found. Then add a list_remove function that unlinks and frees that node.

The Kernel Way: Intrusive Lists

The textbook list above embeds data inside the list node. The Linux kernel flips this: it embeds a list node inside the data struct. This is called an intrusive list.

Textbook:                     Kernel (intrusive):

 struct node {                struct task_info {
     int data;       <--+         int pid;
     struct node *next;  |        char name[16];
 };                      |        struct list_head tasks;  <-- just prev/next
                         |    };
                         |
              data lives      data owns the link
              inside node     node lives inside data

The kernel's struct list_head is simply:

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

It is a doubly linked, circular list. The magic happens when you need to get back from the embedded list_head to the enclosing struct.

container_of and offsetof

offsetof(type, member) returns the byte offset of member within type. container_of subtracts that offset from a pointer to the member to recover the parent struct.

/* container_of.c -- demonstrate container_of */
#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

struct task_info {
    int pid;
    char name[16];
    struct list_head link;
};

int main(void)
{
    struct task_info t = { .pid = 42, .name = "init" };

    /* Given only a pointer to the embedded link ... */
    struct list_head *lh = &t.link;

    /* ... recover the enclosing task_info. */
    struct task_info *owner = container_of(lh, struct task_info, link);

    printf("pid = %d, name = %s\n", owner->pid, owner->name);

    printf("offsetof(task_info, link) = %zu\n",
           offsetof(struct task_info, link));
    return 0;
}

struct task_info layout (64-bit, 8-byte pointers):

 byte 0  byte 4       byte 20  byte 24
 +------+------------+------+-----------+-----------+
 | pid  |  name[16]  | pad  | link.next | link.prev |
 +------+------------+------+-----------+-----------+
 ^                           ^
 |                           |
 owner               lh points here
                     owner = lh - offsetof(..., link)

name[16] has no alignment requirement, so it follows pid directly at
byte 4. The 8-byte pointers in link force 4 bytes of padding after
name, putting link at offset 24 -- which is what the program prints.

Driver Prep: The kernel uses container_of thousands of times. When you write a driver, your struct my_device embeds a struct list_head for the subsystem's device list, and you use container_of to recover it.

Hash Table with Chaining

A hash table maps keys to buckets. Collisions are handled by chaining -- each bucket is a linked list.

/* hashtable.c -- simple chained hash table */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16

struct entry {
    char *key;
    int   value;
    struct entry *next;
};

struct hashtable {
    struct entry *buckets[NBUCKETS];
};

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void ht_insert(struct hashtable *ht, const char *key, int value)
{
    unsigned idx = hash(key);
    struct entry *e = malloc(sizeof(*e));
    if (!e) { perror("malloc"); exit(1); }
    e->key   = strdup(key);
    e->value = value;
    e->next  = ht->buckets[idx];
    ht->buckets[idx] = e;
}

int ht_lookup(struct hashtable *ht, const char *key, int *out)
{
    unsigned idx = hash(key);
    for (struct entry *e = ht->buckets[idx]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *out = e->value;
            return 1;   /* found */
        }
    }
    return 0;  /* not found */
}

void ht_free(struct hashtable *ht)
{
    for (int i = 0; i < NBUCKETS; i++) {
        struct entry *e = ht->buckets[i];
        while (e) {
            struct entry *tmp = e;
            e = e->next;
            free(tmp->key);
            free(tmp);
        }
    }
}

int main(void)
{
    struct hashtable ht = {0};

    ht_insert(&ht, "alice", 100);
    ht_insert(&ht, "bob",   200);
    ht_insert(&ht, "carol", 300);

    int val;
    if (ht_lookup(&ht, "bob", &val))
        printf("bob => %d\n", val);

    if (!ht_lookup(&ht, "dave", &val))
        printf("dave not found\n");

    ht_free(&ht);
    return 0;
}

One possible bucket layout (the exact indices depend on the hash values):

  buckets[0]  -> NULL
  buckets[1]  -> NULL
  buckets[2]  -> [alice|100] -> NULL
  ...
  buckets[7]  -> [carol|300] -> [bob|200] -> NULL
  ...
  buckets[15] -> NULL

Try It: Add an ht_delete function that removes a key-value pair and frees its memory. Watch out for the head-of-list special case.

Binary Search Tree

/* bst.c -- binary search tree */
#include <stdio.h>
#include <stdlib.h>

struct bst_node {
    int data;
    struct bst_node *left;
    struct bst_node *right;
};

struct bst_node *bst_insert(struct bst_node *root, int value)
{
    if (!root) {
        struct bst_node *n = malloc(sizeof(*n));
        if (!n) { perror("malloc"); exit(1); }
        n->data  = value;
        n->left  = NULL;
        n->right = NULL;
        return n;
    }
    if (value < root->data)
        root->left  = bst_insert(root->left, value);
    else if (value > root->data)
        root->right = bst_insert(root->right, value);
    return root;
}

void bst_inorder(const struct bst_node *root)
{
    if (!root) return;
    bst_inorder(root->left);
    printf("%d ", root->data);
    bst_inorder(root->right);
}

void bst_free(struct bst_node *root)
{
    if (!root) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

int main(void)
{
    struct bst_node *tree = NULL;
    int vals[] = {5, 3, 7, 1, 4, 6, 8};

    for (int i = 0; i < 7; i++)
        tree = bst_insert(tree, vals[i]);

    printf("In-order: ");
    bst_inorder(tree);   /* 1 3 4 5 6 7 8 */
    printf("\n");

    bst_free(tree);
    return 0;
}

Resulting tree:

         5
        / \
       3   7
      / \ / \
     1  4 6  8

Stack and Queue from Scratch

A stack is last-in-first-out. A queue is first-in-first-out. Both can be built on a linked list or on a contiguous array.

/* stack_queue.c -- array-based stack and queue */
#include <stdio.h>
#include <stdlib.h>

/* ---- Stack (LIFO) ---- */
struct stack {
    int *data;
    int  top;
    int  cap;
};

struct stack stack_new(int cap)
{
    struct stack s;
    s.data = malloc(cap * sizeof(int));
    if (!s.data) { perror("malloc"); exit(1); }
    s.top = 0;
    s.cap = cap;
    return s;
}

void stack_push(struct stack *s, int val)
{
    if (s->top >= s->cap) {
        fprintf(stderr, "stack overflow\n");
        exit(1);
    }
    s->data[s->top++] = val;
}

int stack_pop(struct stack *s)
{
    if (s->top == 0) {
        fprintf(stderr, "stack underflow\n");
        exit(1);
    }
    return s->data[--s->top];
}

void stack_free(struct stack *s) { free(s->data); }

/* ---- Queue (FIFO, circular buffer) ---- */
struct queue {
    int *data;
    int  head;
    int  tail;
    int  count;
    int  cap;
};

struct queue queue_new(int cap)
{
    struct queue q;
    q.data  = malloc(cap * sizeof(int));
    if (!q.data) { perror("malloc"); exit(1); }
    q.head  = 0;
    q.tail  = 0;
    q.count = 0;
    q.cap   = cap;
    return q;
}

void queue_enqueue(struct queue *q, int val)
{
    if (q->count >= q->cap) {
        fprintf(stderr, "queue full\n");
        exit(1);
    }
    q->data[q->tail] = val;
    q->tail = (q->tail + 1) % q->cap;
    q->count++;
}

int queue_dequeue(struct queue *q)
{
    if (q->count == 0) {
        fprintf(stderr, "queue empty\n");
        exit(1);
    }
    int val = q->data[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return val;
}

void queue_free(struct queue *q) { free(q->data); }

int main(void)
{
    /* Stack demo */
    struct stack s = stack_new(8);
    stack_push(&s, 10);
    stack_push(&s, 20);
    stack_push(&s, 30);
    printf("stack pop: %d\n", stack_pop(&s));  /* 30 */
    printf("stack pop: %d\n", stack_pop(&s));  /* 20 */
    stack_free(&s);

    /* Queue demo */
    struct queue q = queue_new(8);
    queue_enqueue(&q, 10);
    queue_enqueue(&q, 20);
    queue_enqueue(&q, 30);
    printf("queue deq: %d\n", queue_dequeue(&q));  /* 10 */
    printf("queue deq: %d\n", queue_dequeue(&q));  /* 20 */
    queue_free(&q);

    return 0;
}

  Stack (after push 10, 20, 30):

               top=3 (next free slot)
               |
               v
  [10][20][30][  ][  ][  ][  ][  ]
    0   1   2   3   4   5   6   7

  Queue (circular buffer, after enqueue 10, 20, 30):

    head=0     tail=3
    |          |
    v          v
  [10][20][30][  ][  ][  ][  ][  ]
    0   1   2   3   4   5   6   7

Rust: The Standard Library Does the Heavy Lifting

Rust ships Vec, VecDeque, HashMap, BTreeMap, LinkedList, and more in std::collections. You rarely build these from scratch.

// rust_collections.rs
use std::collections::{VecDeque, HashMap, BTreeMap};

fn main() {
    // Vec -- growable array, also works as a stack
    let mut stack: Vec<i32> = Vec::new();
    stack.push(10);
    stack.push(20);
    stack.push(30);
    println!("stack pop: {:?}", stack.pop());  // Some(30)
    println!("stack pop: {:?}", stack.pop());  // Some(20)

    // VecDeque -- double-ended queue (ring buffer internally)
    let mut queue: VecDeque<i32> = VecDeque::new();
    queue.push_back(10);
    queue.push_back(20);
    queue.push_back(30);
    println!("queue deq: {:?}", queue.pop_front());  // Some(10)
    println!("queue deq: {:?}", queue.pop_front());  // Some(20)

    // HashMap
    let mut map = HashMap::new();
    map.insert("alice", 100);
    map.insert("bob", 200);
    map.insert("carol", 300);
    if let Some(val) = map.get("bob") {
        println!("bob => {}", val);
    }

    // BTreeMap -- sorted by keys
    let mut btree = BTreeMap::new();
    btree.insert(5, "five");
    btree.insert(3, "three");
    btree.insert(7, "seven");
    for (k, v) in &btree {
        println!("{}: {}", k, v);  // printed in sorted order
    }
}

Rust Note: Rust's Vec is not a linked list -- it is a contiguous, growable array. This is almost always what you want. LinkedList exists in std::collections but is rarely the right choice because of poor cache locality.

Side-by-Side: C Linked List vs Rust Vec

Operation        C (manual linked list)            Rust (Vec<T>)
---------------  --------------------------------  ------------------------
Create           node *head = NULL;                let mut v = Vec::new();
Prepend          allocate node, fix pointers       v.insert(0, val);
Append           walk to end, allocate, link       v.push(val);
Index access     walk the list -- O(n)             v[i] -- O(1)
Remove by idx    walk, relink, free -- O(n)        v.remove(i); -- O(n)
Memory layout    scattered heap allocations        one contiguous buffer
Cache behavior   terrible (pointer chasing)        excellent (sequential)
Safety           dangling pointers, double free    borrow checker enforced

For almost all user-space programming, Vec beats a linked list. The linked list wins only when you need O(1) insertion/removal in the middle and you already hold a pointer to the node.

When You'd Still Write Your Own

  • Embedded/no-alloc environments: You cannot use Vec without an allocator. You write intrusive lists on a pre-allocated pool.
  • Kernel modules: The kernel has its own allocator and its own list macros. You use struct list_head, not std::collections.
  • Lock-free data structures: Standard collections are not lock-free. You hand-roll with atomics.
  • Performance-critical hot paths: When the profiler says the standard container is the bottleneck (rare, but it happens).

Driver Prep: In kernel space, you will use struct list_head, struct hlist_head (hash list), and struct rb_root (red-black tree). All are intrusive. Learn the pattern now; the kernel macros are the same idea.

Knowledge Check

  1. What does container_of(ptr, type, member) compute, and why does the kernel need it?
  2. Why is a Vec (contiguous array) usually faster than a linked list for iteration, even though both are O(n)?
  3. In the circular-buffer queue implementation, what happens if you forget the modulo operation on head and tail?

Common Pitfalls

  • Forgetting to free every node in a linked list -- walk the list and free each node; do not just free the head.
  • Off-by-one in circular buffers -- the modulo wrap must use the capacity, not the count.
  • Using a linked list when a Vec would do -- cache misses from pointer chasing dominate on modern hardware.
  • Dangling pointers after removal -- in C, after you free a node, any pointer still referencing it is undefined behavior.
  • Hash function quality -- a bad hash clusters entries into a few buckets, turning O(1) average into O(n).

Generic Programming: void* to Generics

C has no generics in the language. Instead it has three escape hatches: void*, macros, and _Generic. This chapter shows all three, then shows how Rust does it properly with monomorphized generics and trait bounds.

The void* Pattern

void* is C's universal pointer -- it can point to any type. The cost is total loss of type information. The compiler cannot check that you cast correctly.

/* void_swap.c -- generic swap with void* */
#include <stdio.h>
#include <string.h>

void generic_swap(void *a, void *b, size_t size)
{
    unsigned char tmp[size];   /* VLA -- C99 */
    memcpy(tmp, a, size);
    memcpy(a, b, size);
    memcpy(b, tmp, size);
}

int main(void)
{
    int x = 10, y = 20;
    generic_swap(&x, &y, sizeof(int));
    printf("x=%d y=%d\n", x, y);   /* x=20 y=10 */

    double p = 3.14, q = 2.72;
    generic_swap(&p, &q, sizeof(double));
    printf("p=%.2f q=%.2f\n", p, q);  /* p=2.72 q=3.14 */

    return 0;
}

This works, but nothing stops you from passing the wrong size or casting the result to the wrong type. The bug compiles cleanly and corrupts memory at runtime.

Caution: void* erases the type. The compiler will not warn if you pass sizeof(int) for a double*. The resulting memory corruption is silent and undefined.

qsort: The Classic void* API

The C standard library's qsort is the canonical example. Its signature:

void qsort(void *base, size_t nmemb, size_t size,
            int (*compar)(const void *, const void *));

Every argument is void* or size_t. The comparison function receives void* and must cast to the real type.

/* qsort_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int cmp_int(const void *a, const void *b)
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return (ia > ib) - (ia < ib);
}

int cmp_str(const void *a, const void *b)
{
    const char *sa = *(const char **)a;
    const char *sb = *(const char **)b;
    return strcmp(sa, sb);
}

int main(void)
{
    int nums[] = {5, 3, 8, 1, 4};
    qsort(nums, 5, sizeof(int), cmp_int);
    for (int i = 0; i < 5; i++)
        printf("%d ", nums[i]);
    printf("\n");   /* 1 3 4 5 8 */

    const char *words[] = {"cherry", "apple", "banana"};
    qsort(words, 3, sizeof(char *), cmp_str);
    for (int i = 0; i < 3; i++)
        printf("%s ", words[i]);
    printf("\n");   /* apple banana cherry */

    return 0;
}

Try It: Write a cmp_int_desc comparator that sorts integers in descending order. Pass it to qsort and verify the output.

typedef for Clarity

Raw function-pointer types are hard to read. typedef helps.

/* typedef_demo.c */
#include <stdio.h>

/* Without typedef: hard to parse */
int (*get_operation_raw(char op))(int, int);

/* With typedef: much clearer */
typedef int (*binop_fn)(int, int);

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

binop_fn get_operation(char op)
{
    switch (op) {
    case '+': return add;
    case '*': return mul;
    default:  return NULL;
    }
}

int main(void)
{
    binop_fn f = get_operation('+');
    if (f)
        printf("3 + 4 = %d\n", f(3, 4));  /* 7 */
    return 0;
}

Macro-Based Generics

When void* is too dangerous, C programmers reach for macros. The preprocessor does text substitution before the compiler sees the code, so a macro can "generate" type-specific functions.

/* macro_generic.c -- type-safe min/max via macros */
#include <stdio.h>

#define DEFINE_MIN(TYPE, NAME)          \
    static inline TYPE NAME(TYPE a, TYPE b) { return a < b ? a : b; }

DEFINE_MIN(int,    min_int)
DEFINE_MIN(double, min_double)

int main(void)
{
    printf("min_int(3,7) = %d\n",      min_int(3, 7));
    printf("min_double(3.1,2.7) = %f\n", min_double(3.1, 2.7));
    return 0;
}

The kernel takes this further. list_for_each_entry iterates over an intrusive list and hands you typed pointers -- no casting needed.

/* Simplified kernel-style iteration macro */
#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Initialize a list head to point to itself (empty circular list) */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static inline void list_add(struct list_head *new_node,
                            struct list_head *head)
{
    new_node->next = head->next;
    new_node->prev = head;
    head->next->prev = new_node;
    head->next = new_node;
}

/* typeof is a GNU C extension (standardized in C23). */
#define list_for_each_entry(pos, head, member)                      \
    for (pos = container_of((head)->next, typeof(*pos), member);    \
         &pos->member != (head);                                    \
         pos = container_of(pos->member.next, typeof(*pos), member))

struct task {
    int pid;
    struct list_head link;
};

int main(void)
{
    LIST_HEAD(tasks);

    struct task t1 = { .pid = 1 };
    struct task t2 = { .pid = 2 };
    struct task t3 = { .pid = 3 };

    list_add(&t1.link, &tasks);
    list_add(&t2.link, &tasks);
    list_add(&t3.link, &tasks);

    struct task *pos;
    list_for_each_entry(pos, &tasks, link)
        printf("pid = %d\n", pos->pid);

    return 0;
}
Circular doubly linked list after adding t1, t2, t3:

  tasks (sentinel)
    |
    +--next--> t3.link --next--> t2.link --next--> t1.link --+
    +--prev--- t1.link <--prev-- t2.link <--prev-- t3.link <-+

Driver Prep: list_for_each_entry and container_of are the two macros you will use most often in kernel code. Master them now.

_Generic in C11

C11 introduced _Generic, a compile-time type dispatch. It selects an expression based on the type of its controlling argument.

/* generic_c11.c -- _Generic dispatch (requires C11) */
#include <stdio.h>
#include <stdlib.h>   /* abs, labs */
#include <math.h>     /* fabsf, fabs */

#define abs_val(x) _Generic((x),       \
    int:    abs,                         \
    long:   labs,                        \
    float:  fabsf,                       \
    double: fabs                         \
)(x)

#define print_val(x) _Generic((x),     \
    int:    print_int,                   \
    double: print_double,                \
    char *: print_str                    \
)(x)

void print_int(int x)       { printf("int: %d\n", x); }
void print_double(double x) { printf("double: %.2f\n", x); }
void print_str(char *s)     { printf("string: %s\n", s); }

int main(void)
{
    printf("abs(-3)   = %d\n",   abs_val(-3));
    printf("abs(-3.5) = %.1f\n", abs_val(-3.5));

    print_val(42);
    print_val(3.14);
    print_val("hello");

    return 0;
}

_Generic is limited: it dispatches on a fixed set of types you enumerate. You cannot write open-ended generic code with it. It is best used for type-overloaded convenience macros.

The typedef + Function Pointer + Macro Trinity

In practice, "generic" C code combines all three techniques:

  1. typedef names the function-pointer type so humans can read it.
  2. Function pointers supply type-specific operations at runtime.
  3. Macros stamp out boilerplate and provide type-safe wrappers.
/* trinity.c -- combining typedef, fn pointers, and macros */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 1. typedef for clarity */
typedef int (*cmp_fn)(const void *, const void *);

/* 2. "Generic" sorted array with function pointer for comparison */
struct sorted_array {
    void   *data;
    size_t  elem_size;
    size_t  len;
    size_t  cap;
    cmp_fn  cmp;
};

struct sorted_array sa_new(size_t elem_size, size_t cap, cmp_fn cmp)
{
    struct sorted_array sa;
    sa.data      = malloc(elem_size * cap);
    sa.elem_size = elem_size;
    sa.len       = 0;
    sa.cap       = cap;
    sa.cmp       = cmp;
    return sa;
}

void sa_insert(struct sorted_array *sa, const void *elem)
{
    if (sa->len >= sa->cap) {
        fprintf(stderr, "sorted_array full\n");
        return;
    }
    /* Find insertion point (linear scan for simplicity) */
    size_t i;
    char *base = (char *)sa->data;
    for (i = 0; i < sa->len; i++) {
        if (sa->cmp(elem, base + i * sa->elem_size) < 0)
            break;
    }
    /* Shift elements right */
    memmove(base + (i + 1) * sa->elem_size,
            base + i * sa->elem_size,
            (sa->len - i) * sa->elem_size);
    memcpy(base + i * sa->elem_size, elem, sa->elem_size);
    sa->len++;
}

/* 3. Macro for type-safe access */
#define SA_GET(sa, type, index) (((type *)(sa)->data)[index])

void sa_free(struct sorted_array *sa) { free(sa->data); }

int cmp_int(const void *a, const void *b)
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return (ia > ib) - (ia < ib);
}

int main(void)
{
    struct sorted_array sa = sa_new(sizeof(int), 16, cmp_int);

    int vals[] = {5, 3, 8, 1, 7};
    for (int i = 0; i < 5; i++)
        sa_insert(&sa, &vals[i]);

    printf("Sorted: ");
    for (size_t i = 0; i < sa.len; i++)
        printf("%d ", SA_GET(&sa, int, i));
    printf("\n");   /* 1 3 5 7 8 */

    sa_free(&sa);
    return 0;
}

Caution: The SA_GET macro trusts you to pass the right type. Get it wrong and you read garbage. C has no defense against this.

Rust: Real Generics

Rust generics are monomorphized -- the compiler generates a separate copy of the function for each concrete type used. You get type safety and zero runtime cost.

// generics.rs -- Rust generics with trait bounds
fn min_val<T: PartialOrd>(a: T, b: T) -> T {
    if a < b { a } else { b }
}

fn print_sorted<T: Ord + std::fmt::Debug>(mut items: Vec<T>) {
    items.sort();
    println!("{:?}", items);
}

fn main() {
    println!("min(3, 7) = {}", min_val(3, 7));
    println!("min(3.1, 2.7) = {}", min_val(3.1, 2.7));

    print_sorted(vec![5, 3, 8, 1, 7]);    // [1, 3, 5, 7, 8]
    print_sorted(vec!["cherry", "apple", "banana"]);
}

Rust Note: Monomorphization means min_val::<i32> and min_val::<f64> are two separate compiled functions. There is no void*, no casting, no runtime dispatch. The compiler catches type errors at compile time.

Trait Bounds and Where Clauses

Trait bounds constrain what a generic type must support. A where clause is just a cleaner way to write the same constraints.

// trait_bounds.rs
use std::fmt::Display;
use std::ops::Add;

// Inline bounds
fn sum_and_print<T: Add<Output = T> + Display + Copy>(a: T, b: T) {
    let result = a + b;
    println!("{} + {} = {}", a, b, result);
}

// Equivalent with where clause (clearer for many bounds)
fn describe<T>(item: &T)
where
    T: Display + PartialOrd + Copy,
{
    println!("Value: {}", item);
}

// Returning a reference tied to the input slice
fn largest<T>(list: &[T]) -> &T
where
    T: PartialOrd,
{
    let mut max = &list[0];
    for item in &list[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    sum_and_print(3, 4);       // 3 + 4 = 7
    sum_and_print(1.5, 2.5);   // 1.5 + 2.5 = 4

    describe(&42);
    describe(&"hello");

    let nums = vec![5, 3, 8, 1, 7];
    println!("largest = {}", largest(&nums));   // 8
}

Generic Structs

In C, you fake generic containers with void* and macros. In Rust, you declare a generic struct directly.

// generic_struct.rs
struct SortedVec<T: Ord> {
    data: Vec<T>,
}

impl<T: Ord> SortedVec<T> {
    fn new() -> Self {
        SortedVec { data: Vec::new() }
    }

    fn insert(&mut self, value: T) {
        let pos = self.data.binary_search(&value).unwrap_or_else(|e| e);
        self.data.insert(pos, value);
    }

    fn contains(&self, value: &T) -> bool {
        self.data.binary_search(value).is_ok()
    }

    fn iter(&self) -> impl Iterator<Item = &T> {
        self.data.iter()
    }
}

fn main() {
    let mut sv = SortedVec::new();
    sv.insert(5);
    sv.insert(3);
    sv.insert(8);
    sv.insert(1);
    sv.insert(7);

    print!("Sorted: ");
    for v in sv.iter() {
        print!("{} ", v);
    }
    println!();  // 1 3 5 7 8

    println!("contains 3? {}", sv.contains(&3));  // true
    println!("contains 6? {}", sv.contains(&6));  // false
}

Why the Kernel Chose Macros Over void*

The kernel avoids void* for linked lists. Here is why:

  void* approach:                Macro approach:

  struct list_node {             struct list_head {
      void *data;  <-- cast!         struct list_head *next;
      struct list_node *next;        struct list_head *prev;
  };                             };
                                 // Embed list_head in your struct.
  Requires:                      // container_of recovers parent.
  - runtime cast on every access
  - extra indirection (pointer   Requires:
    to data, not data itself)    - typeof / offsetof at compile time
  - no type checking             - zero extra indirection
                                 - type-safe iteration macros

The macro approach gives:

  • No extra allocations: the list node lives inside the data struct.
  • No casts at use sites: list_for_each_entry hands you a typed pointer.
  • No double indirection: one fewer pointer dereference per access.

Try It: Rewrite the generic_swap function from the beginning of this chapter using a macro instead of void*. The macro version should work for any type without requiring a size parameter. (Hint: use typeof.)

Knowledge Check

  1. What goes wrong if you pass sizeof(int) to generic_swap but the pointers actually point to double values?
  2. In _Generic, what happens if the controlling expression's type does not match any of the listed types and there is no default case?
  3. What does "monomorphization" mean in Rust, and how does it differ from C++'s template instantiation?

Common Pitfalls

  • Casting void* to the wrong type -- the compiler says nothing, the program corrupts memory silently.
  • Macro hygiene -- macro arguments evaluated multiple times cause bugs: MIN(x++, y) increments x twice.
  • _Generic does not support user-defined types easily -- you must enumerate every type explicitly.
  • Forgetting trait bounds in Rust -- the compiler will tell you exactly which trait is missing, but the error messages can be long.
  • Over-constraining generics -- requiring more traits than necessary reduces reusability.

Function Pointers and Callbacks

A function pointer stores the address of a function. Combined with a context pointer, it becomes a callback -- the mechanism C uses for polymorphism, event handling, and plugin architectures. Rust replaces the pattern with closures and traits.

C Function Pointer Syntax

The syntax is notoriously hard to read. The parentheses in int (*op)(int, int) are mandatory -- without them you declare a function returning a pointer, not a pointer to a function.

/* fnptr_basic.c */
#include <stdio.h>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

int main(void)
{
    /* Declare a function pointer */
    int (*op)(int, int);

    op = add;
    printf("add(3,4) = %d\n", op(3, 4));   /* 7 */

    op = mul;
    printf("mul(3,4) = %d\n", op(3, 4));   /* 12 */

    /* You can also call through the pointer explicitly */
    printf("(*op)(5,6) = %d\n", (*op)(5, 6));  /* 30 */

    return 0;
}
  Memory layout:

  op (8 bytes on x86-64)
  +------------------+
  | address of add() |   or   | address of mul() |
  +------------------+
          |
          v
  .text section: machine code for the function

typedef for Readability

Always typedef function pointer types. Compare:

/* Without typedef */
int (*get_op(char c))(int, int);

/* With typedef */
typedef int (*binop_fn)(int, int);
binop_fn get_op(char c);

The second is instantly readable. The first requires right-left parsing.

/* fnptr_typedef.c */
#include <stdio.h>

typedef int (*binop_fn)(int, int);

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }

binop_fn get_op(char c)
{
    switch (c) {
    case '+': return add;
    case '-': return sub;
    case '*': return mul;
    default:  return NULL;
    }
}

int main(void)
{
    char ops[] = "+-*";
    for (int i = 0; i < 3; i++) {
        binop_fn f = get_op(ops[i]);
        if (f)
            printf("10 %c 3 = %d\n", ops[i], f(10, 3));
    }
    return 0;
}

Try It: Add a division operator to get_op. Handle division by zero inside the div function by returning 0 and printing a warning.

Passing Functions as Arguments

The C standard library uses function pointers extensively.

qsort

/* qsort_fp.c */
#include <stdio.h>
#include <stdlib.h>

int ascending(const void *a, const void *b)
{
    /* Note: plain subtraction (ia - ib) can overflow for large-magnitude
     * ints; the comparison idiom below is the safe general form. */
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return (ia > ib) - (ia < ib);
}

int descending(const void *a, const void *b)
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return (ib > ia) - (ib < ia);
}

void print_array(const int *arr, int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main(void)
{
    int nums[] = {5, 1, 4, 2, 3};

    qsort(nums, 5, sizeof(int), ascending);
    printf("ascending:  "); print_array(nums, 5);

    qsort(nums, 5, sizeof(int), descending);
    printf("descending: "); print_array(nums, 5);

    return 0;
}

signal

/* signal_fp.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

void handle_sigint(int sig)
{
    /* Async-signal-safe: only write() is safe here */
    const char msg[] = "\nCaught SIGINT\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);
}

int main(void)
{
    signal(SIGINT, handle_sigint);
    printf("Press Ctrl-C...\n");
    while (1)
        pause();   /* sleep until a signal arrives */
    return 0;
}

pthread_create

/* pthread_fp.c */
#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("Thread %d running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[3];
    int ids[3] = {1, 2, 3};

    for (int i = 0; i < 3; i++)
        pthread_create(&threads[i], NULL, worker, &ids[i]);

    for (int i = 0; i < 3; i++)
        pthread_join(threads[i], NULL);

    return 0;
}

Compile with: gcc -pthread pthread_fp.c -o pthread_fp

Callback Pattern: Function Pointer + void* Context

A callback alone is rarely enough. You usually need to pass some context -- user-defined data the callback can access. In C, this is done with a void*.

/* callback_ctx.c -- callback with context */
#include <stdio.h>

typedef void (*event_handler)(int event_id, void *ctx);

struct logger_ctx {
    const char *prefix;
    int         count;
};

void log_event(int event_id, void *ctx)
{
    struct logger_ctx *log = (struct logger_ctx *)ctx;
    log->count++;
    printf("[%s] event %d (total: %d)\n", log->prefix, event_id, log->count);
}

struct event_source {
    event_handler handler;
    void         *ctx;
};

void event_source_fire(struct event_source *src, int event_id)
{
    if (src->handler)
        src->handler(event_id, src->ctx);
}

int main(void)
{
    struct logger_ctx my_log = { .prefix = "app", .count = 0 };

    struct event_source src = {
        .handler = log_event,
        .ctx     = &my_log,
    };

    event_source_fire(&src, 100);
    event_source_fire(&src, 200);
    event_source_fire(&src, 300);

    return 0;
}
  event_source
  +-------------+-----------+
  | handler: *  | ctx: *    |
  +------+------+-----+-----+
         |            |
         v            v
    log_event()   logger_ctx { prefix, count }

Caution: The void* ctx pointer is completely unchecked. If you register the wrong context type for a given handler, the cast inside the handler will silently interpret garbage. This is a common source of bugs in C codebases.

Vtables: Struct of Function Pointers

When an object needs multiple operations, you group the function pointers into a struct. This is C's version of a vtable -- the same pattern the kernel uses for file_operations, inode_operations, and dozens of other interfaces.

/* vtable.c -- polymorphism via struct of function pointers */
#include <stdio.h>
#include <math.h>

/* The "interface" */
struct shape_ops {
    double (*area)(const void *self);
    double (*perimeter)(const void *self);
    void   (*describe)(const void *self);
};

/* A "class": circle */
struct circle {
    const struct shape_ops *ops;   /* vtable pointer */
    double radius;
};

double circle_area(const void *self)
{
    const struct circle *c = self;
    return M_PI * c->radius * c->radius;
}

double circle_perimeter(const void *self)
{
    const struct circle *c = self;
    return 2.0 * M_PI * c->radius;
}

void circle_describe(const void *self)
{
    const struct circle *c = self;
    printf("Circle(r=%.1f)\n", c->radius);
}

static const struct shape_ops circle_ops = {
    .area      = circle_area,
    .perimeter = circle_perimeter,
    .describe  = circle_describe,
};

/* A "class": rectangle */
struct rectangle {
    const struct shape_ops *ops;
    double width, height;
};

double rect_area(const void *self)
{
    const struct rectangle *r = self;
    return r->width * r->height;
}

double rect_perimeter(const void *self)
{
    const struct rectangle *r = self;
    return 2.0 * (r->width + r->height);
}

void rect_describe(const void *self)
{
    const struct rectangle *r = self;
    printf("Rectangle(%.1f x %.1f)\n", r->width, r->height);
}

static const struct shape_ops rect_ops = {
    .area      = rect_area,
    .perimeter = rect_perimeter,
    .describe  = rect_describe,
};

/* Polymorphic function -- works with any "shape" */
void print_shape_info(const void *shape)
{
    /* The first field of every "shape" is the ops pointer */
    const struct shape_ops *ops = *(const struct shape_ops **)shape;
    ops->describe(shape);
    printf("  area      = %.2f\n", ops->area(shape));
    printf("  perimeter = %.2f\n", ops->perimeter(shape));
}

int main(void)
{
    struct circle c = { .ops = &circle_ops, .radius = 5.0 };
    struct rectangle r = { .ops = &rect_ops, .width = 4.0, .height = 6.0 };

    print_shape_info(&c);
    print_shape_info(&r);

    return 0;
}

Compile with: gcc vtable.c -o vtable -lm (libraries go after the source files on the GNU linker command line)

  circle layout:             rectangle layout:

  +----------+----------+    +----------+-------+--------+
  | ops: *   | radius   |    | ops: *   | width | height |
  +----+-----+----------+    +----+-----+-------+--------+
       |                          |
       v                          v
  circle_ops {               rect_ops {
    .area      = circle_area     .area      = rect_area
    .perimeter = circle_peri     .perimeter = rect_perimeter
    .describe  = circle_desc     .describe  = rect_describe
  }                          }

Driver Prep: The kernel's struct file_operations is exactly this pattern. When you write a character device driver, you fill in a file_operations struct with pointers to your read, write, open, release, and ioctl functions. The VFS calls them through the vtable.

Rust: fn Pointers

Rust has bare function pointers, spelled as fn(args) -> ret. They are used less often than closures but exist for FFI and when no state capture is needed.

// fn_pointer.rs
fn add(a: i32, b: i32) -> i32 { a + b }
fn mul(a: i32, b: i32) -> i32 { a * b }

fn apply(f: fn(i32, i32) -> i32, x: i32, y: i32) -> i32 {
    f(x, y)
}

fn main() {
    println!("add(3,4) = {}", apply(add, 3, 4));
    println!("mul(3,4) = {}", apply(mul, 3, 4));

    // Store in a variable
    let op: fn(i32, i32) -> i32 = add;
    println!("op(5,6) = {}", op(5, 6));
}

Rust: Closures and the Fn Traits

Closures capture variables from their environment. They implement one or more of three traits:

  Trait    Captures by           Can call       C analogy
  ------   -------------------   ------------   ---------------------
  Fn       shared reference      many times     const void *ctx
  FnMut    mutable reference     many times     void *ctx (mutating)
  FnOnce   by move (ownership)   exactly once   consuming the context
// closures.rs
fn apply_fn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn apply_fn_mut(f: &mut dyn FnMut(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn apply_fn_once(f: impl FnOnce(i32) -> String, x: i32) -> String {
    f(x)
}

fn main() {
    // Fn -- captures `offset` by shared reference
    let offset = 10;
    let add_offset = |x: i32| x + offset;
    println!("Fn: {}", apply_fn(&add_offset, 5));       // 15

    // FnMut -- captures `count` by mutable reference
    let mut count = 0;
    let mut counter = |x: i32| -> i32 {
        count += 1;
        x + count
    };
    println!("FnMut: {}", apply_fn_mut(&mut counter, 5));   // 6
    println!("FnMut: {}", apply_fn_mut(&mut counter, 5));   // 7

    // FnOnce -- moves `name` into the closure
    let name = String::from("event");
    let describe = move |x: i32| -> String {
        format!("{} #{}", name, x)
    };
    println!("FnOnce: {}", apply_fn_once(describe, 42));
    // `describe` is consumed -- cannot call it again
}

Rust Note: Every closure has a unique, anonymous type. You cannot name it. When you need to store closures in a struct, use Box<dyn Fn(...)> for trait objects or generics with impl Fn(...) bounds.

How Closures Capture

By default, closures borrow variables with the least permissions needed. Use move to force ownership transfer -- required when the closure outlives the current scope (e.g., sending to another thread).

  Without move:                    With move:

  stack frame:                     stack frame:
  +---+                            +---+
  | x | = 5                        | x | = 5  (copied into closure)
  +---+                            +---+
  | y | = "hello" (on heap)        | y | = MOVED
  +---+                            +---+
    |
    |  closure captures &y         closure owns y's data
    v                              directly in its anonymous struct

Rust Traits as Vtables

Rust's dyn Trait is the direct equivalent of C's struct-of-function-pointers.

// trait_vtable.rs
use std::f64::consts::PI;

trait Shape {
    fn area(&self) -> f64;
    fn perimeter(&self) -> f64;
    fn describe(&self);
}

struct Circle {
    radius: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 { PI * self.radius * self.radius }
    fn perimeter(&self) -> f64 { 2.0 * PI * self.radius }
    fn describe(&self) { println!("Circle(r={:.1})", self.radius); }
}

struct Rectangle {
    width: f64,
    height: f64,
}

impl Shape for Rectangle {
    fn area(&self) -> f64 { self.width * self.height }
    fn perimeter(&self) -> f64 { 2.0 * (self.width + self.height) }
    fn describe(&self) {
        println!("Rectangle({:.1} x {:.1})", self.width, self.height);
    }
}

fn print_shape_info(shape: &dyn Shape) {
    shape.describe();
    println!("  area      = {:.2}", shape.area());
    println!("  perimeter = {:.2}", shape.perimeter());
}

fn main() {
    let c = Circle { radius: 5.0 };
    let r = Rectangle { width: 4.0, height: 6.0 };

    print_shape_info(&c);
    print_shape_info(&r);

    // Store heterogeneous shapes in a Vec
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { radius: 3.0 }),
        Box::new(Rectangle { width: 2.0, height: 8.0 }),
    ];
    for s in &shapes {
        print_shape_info(s.as_ref());
    }
}

Rust Note: &dyn Shape is a fat pointer: it stores a pointer to the data AND a pointer to the vtable. This is exactly the same layout as the C pattern where every struct starts with const struct shape_ops *ops. The difference: Rust enforces it at compile time.

  &dyn Shape (fat pointer):

  +----------+----------+
  | data_ptr | vtbl_ptr |
  +----+-----+----+-----+
       |          |
       v          v
    Circle {    Shape vtable for Circle {
      radius      area:      -> Circle::area
    }             perimeter: -> Circle::perimeter
                  describe:  -> Circle::describe
                  drop:      -> Circle::drop
                }

Try It: Add a Triangle struct that implements Shape. Add it to the shapes vector and verify polymorphic dispatch works.

Knowledge Check

  1. In C, why must you use parentheses in int (*fn)(int, int) -- what happens if you write int *fn(int, int) instead?
  2. Why does the C callback pattern need both a function pointer and a void* context, while a Rust closure needs only one value?
  3. How does dyn Trait in Rust achieve the same result as a C vtable?

Common Pitfalls

  • Calling a NULL function pointer -- undefined behavior. Always check before calling.
  • Mismatched callback signatures -- C will not always warn if the function pointer type does not match the actual function.
  • Context lifetime -- if the void* context points to a local variable that goes out of scope, the callback reads dangling memory.
  • Confusing fn and Fn in Rust -- fn is a bare function pointer; Fn is a trait that closures implement. They are not interchangeable.
  • Forgetting move on closures passed to threads or stored in structs -- the closure captures references to locals that will be dropped.

State Machines with Function Pointers

A state machine has a finite set of states, a set of events, and transition rules. When an event arrives, the machine moves from its current state to a new one and optionally performs an action. This chapter builds state machines in C with function-pointer dispatch tables, then rebuilds them in Rust with enums and pattern matching.

Why State Machines Matter

Drivers, protocol parsers, network stacks, and user-interface logic are all state machines. A TCP connection goes through LISTEN, SYN_SENT, ESTABLISHED, FIN_WAIT, and more. A UART driver handles IDLE, RECEIVING, ERROR. If you write systems code, you write state machines.

Dispatch Tables: Array of Function Pointers

The simplest state machine is an array of function pointers indexed by state. Each function handles the current state and returns the next state.

/* dispatch_table.c -- state machine via function pointer array */
#include <stdio.h>

typedef enum {
    STATE_IDLE,
    STATE_RUNNING,
    STATE_STOPPED,
    STATE_COUNT   /* number of states */
} state_t;

typedef enum {
    EVENT_START,
    EVENT_STOP,
    EVENT_RESET,
    EVENT_COUNT
} event_t;

typedef state_t (*handler_fn)(void);

state_t on_idle_start(void)
{
    printf("  IDLE -> start -> RUNNING\n");
    return STATE_RUNNING;
}

state_t on_idle_stop(void)
{
    printf("  IDLE -> stop -> (stay IDLE)\n");
    return STATE_IDLE;
}

state_t on_idle_reset(void)
{
    printf("  IDLE -> reset -> (stay IDLE)\n");
    return STATE_IDLE;
}

state_t on_running_start(void)
{
    printf("  RUNNING -> start -> (stay RUNNING)\n");
    return STATE_RUNNING;
}

state_t on_running_stop(void)
{
    printf("  RUNNING -> stop -> STOPPED\n");
    return STATE_STOPPED;
}

state_t on_running_reset(void)
{
    printf("  RUNNING -> reset -> IDLE\n");
    return STATE_IDLE;
}

state_t on_stopped_start(void)
{
    printf("  STOPPED -> start -> RUNNING\n");
    return STATE_RUNNING;
}

state_t on_stopped_stop(void)
{
    printf("  STOPPED -> stop -> (stay STOPPED)\n");
    return STATE_STOPPED;
}

state_t on_stopped_reset(void)
{
    printf("  STOPPED -> reset -> IDLE\n");
    return STATE_IDLE;
}

/* 2D dispatch table: [state][event] -> handler */
static handler_fn dispatch[STATE_COUNT][EVENT_COUNT] = {
    [STATE_IDLE]    = { on_idle_start,    on_idle_stop,    on_idle_reset    },
    [STATE_RUNNING] = { on_running_start, on_running_stop, on_running_reset },
    [STATE_STOPPED] = { on_stopped_start, on_stopped_stop, on_stopped_reset },
};

static const char *state_names[] = { "IDLE", "RUNNING", "STOPPED" };

int main(void)
{
    state_t current = STATE_IDLE;

    event_t events[] = {
        EVENT_START, EVENT_STOP, EVENT_RESET, EVENT_START, EVENT_STOP
    };

    for (int i = 0; i < 5; i++) {
        printf("State: %s\n", state_names[current]);
        current = dispatch[current][events[i]]();
    }
    printf("Final state: %s\n", state_names[current]);

    return 0;
}
  Dispatch table layout:

              EVENT_START       EVENT_STOP        EVENT_RESET
            +----------------+----------------+----------------+
  IDLE      | on_idle_start  | on_idle_stop   | on_idle_reset  |
            +----------------+----------------+----------------+
  RUNNING   | on_run_start   | on_run_stop    | on_run_reset   |
            +----------------+----------------+----------------+
  STOPPED   | on_stop_start  | on_stop_stop   | on_stop_reset  |
            +----------------+----------------+----------------+

  Lookup: dispatch[current_state][event]() -> next_state

Try It: Add a STATE_ERROR and an EVENT_ERROR that transitions from any state to STATE_ERROR. Only EVENT_RESET can leave STATE_ERROR.

Protocol Parser State Machine

A real use case: parsing a simple key=value protocol. Messages look like KEY=VALUE\n. The parser walks through states as it reads each character.

/* parser_sm.c -- protocol parser state machine */
#include <stdio.h>
#include <string.h>

typedef enum {
    PS_KEY,       /* reading the key */
    PS_VALUE,     /* reading the value */
    PS_DONE,      /* line complete */
    PS_ERROR      /* malformed input */
} parse_state_t;

struct parser {
    parse_state_t state;
    char key[64];
    char value[256];
    int  key_len;
    int  val_len;
};

void parser_init(struct parser *p)
{
    p->state   = PS_KEY;
    p->key_len = 0;
    p->val_len = 0;
    p->key[0]  = '\0';
    p->value[0] = '\0';
}

parse_state_t feed_key(struct parser *p, char c)
{
    if (c == '=') {
        p->key[p->key_len] = '\0';
        return PS_VALUE;
    }
    if (c == '\n' || c == '\r')
        return PS_ERROR;   /* newline before '=' */
    if (p->key_len < 63) {
        p->key[p->key_len++] = c;
    }
    return PS_KEY;
}

parse_state_t feed_value(struct parser *p, char c)
{
    if (c == '\n') {
        p->value[p->val_len] = '\0';
        return PS_DONE;
    }
    if (p->val_len < 255) {
        p->value[p->val_len++] = c;
    }
    return PS_VALUE;
}

parse_state_t feed_done(struct parser *p, char c)
{
    (void)c;
    (void)p;
    return PS_DONE;   /* ignore further input */
}

parse_state_t feed_error(struct parser *p, char c)
{
    (void)c;
    (void)p;
    return PS_ERROR;
}

typedef parse_state_t (*feed_fn)(struct parser *, char);

static feed_fn state_handlers[] = {
    [PS_KEY]   = feed_key,
    [PS_VALUE] = feed_value,
    [PS_DONE]  = feed_done,
    [PS_ERROR] = feed_error,
};

void parser_feed(struct parser *p, char c)
{
    p->state = state_handlers[p->state](p, c);
}

int main(void)
{
    const char *input = "host=192.168.1.1\n";

    struct parser p;
    parser_init(&p);

    for (int i = 0; input[i] != '\0'; i++)
        parser_feed(&p, input[i]);

    if (p.state == PS_DONE)
        printf("Parsed: key='%s', value='%s'\n", p.key, p.value);
    else
        printf("Parse error\n");

    return 0;
}
  State transitions for "host=192.168.1.1\n":

  'h' -> PS_KEY
  'o' -> PS_KEY
  's' -> PS_KEY
  't' -> PS_KEY
  '=' -> PS_VALUE  (key complete: "host")
  '1' -> PS_VALUE
  '9' -> PS_VALUE
  '2' -> PS_VALUE
  ...
  '1' -> PS_VALUE
  '\n' -> PS_DONE  (value complete: "192.168.1.1")

Driver Prep: Many driver protocols (I2C, SPI, USB) use state machines for packet parsing. The function-pointer-per-state pattern keeps the code modular: each state handler is a small, testable function.

Event-Driven Design

In event-driven systems, a main loop reads events and dispatches them to the current state handler. This decouples event sources from state logic.

/* event_driven.c -- event loop with state machine */
#include <stdio.h>
#include <string.h>

typedef enum { ST_OFF, ST_ON, ST_COUNT } state_t;
typedef enum { EV_PRESS, EV_TIMER, EV_COUNT } event_t;

typedef struct {
    state_t  state;
    int      press_count;
} context_t;

typedef state_t (*handler_fn)(context_t *ctx);

state_t off_press(context_t *ctx)
{
    ctx->press_count++;
    printf("  [OFF] Button pressed (#%d) -> ON\n", ctx->press_count);
    return ST_ON;
}

state_t off_timer(context_t *ctx)
{
    (void)ctx;
    printf("  [OFF] Timer tick -> (stay OFF)\n");
    return ST_OFF;
}

state_t on_press(context_t *ctx)
{
    ctx->press_count++;
    printf("  [ON] Button pressed (#%d) -> OFF\n", ctx->press_count);
    return ST_OFF;
}

state_t on_timer(context_t *ctx)
{
    (void)ctx;
    printf("  [ON] Timer tick -> (stay ON)\n");
    return ST_ON;
}

static handler_fn dispatch[ST_COUNT][EV_COUNT] = {
    [ST_OFF] = { off_press, off_timer },
    [ST_ON]  = { on_press,  on_timer  },
};

void process_event(context_t *ctx, event_t ev)
{
    ctx->state = dispatch[ctx->state][ev](ctx);
}

int main(void)
{
    context_t ctx = { .state = ST_OFF, .press_count = 0 };

    /* Simulate an event stream */
    event_t events[] = { EV_TIMER, EV_PRESS, EV_TIMER, EV_PRESS, EV_PRESS };

    for (int i = 0; i < 5; i++)
        process_event(&ctx, events[i]);

    printf("Final press count: %d\n", ctx.press_count);
    return 0;
}

Try It: Add a ST_BLINK state entered by pressing the button while ON. A timer tick in BLINK goes back to ON. A press in BLINK goes to OFF.

Rust: enum + match

Rust's enums with data (algebraic data types) and exhaustive pattern matching are a natural fit for state machines. The compiler verifies you handle every state.

// state_machine.rs -- state machine with enum + match
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Running,
    Stopped,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Start,
    Stop,
    Reset,
}

fn transition(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle,    Event::Start) => {
            println!("  IDLE -> Start -> RUNNING");
            State::Running
        }
        (State::Running, Event::Stop) => {
            println!("  RUNNING -> Stop -> STOPPED");
            State::Stopped
        }
        (State::Running, Event::Reset) => {
            println!("  RUNNING -> Reset -> IDLE");
            State::Idle
        }
        (State::Stopped, Event::Start) => {
            println!("  STOPPED -> Start -> RUNNING");
            State::Running
        }
        (State::Stopped, Event::Reset) => {
            println!("  STOPPED -> Reset -> IDLE");
            State::Idle
        }
        (s, e) => {
            println!("  {:?} -> {:?} -> (no change)", s, e);
            s
        }
    }
}

fn main() {
    let mut state = State::Idle;

    let events = [
        Event::Start, Event::Stop, Event::Reset,
        Event::Start, Event::Stop,
    ];

    for &event in &events {
        println!("State: {:?}", state);
        state = transition(state, event);
    }
    println!("Final state: {:?}", state);
}

Rust Note: If you add a new variant to the State enum and forget to handle it in the match, the compiler refuses to build. In C, adding a new state to the enum but forgetting to add a row to the dispatch table compiles fine and crashes at runtime with a NULL function pointer call.

Protocol Parser in Rust

The same key=value parser, but using Rust enums.

// parser_sm.rs -- protocol parser with enum states
#[derive(Debug)]
enum ParseState {
    Key,
    Value,
    Done,
    Error(String),
}

struct Parser {
    state: ParseState,
    key: String,
    value: String,
}

impl Parser {
    fn new() -> Self {
        Parser {
            state: ParseState::Key,
            key: String::new(),
            value: String::new(),
        }
    }

    fn feed(&mut self, c: char) {
        self.state = match &self.state {
            ParseState::Key => {
                if c == '=' {
                    ParseState::Value
                } else if c == '\n' || c == '\r' {
                    ParseState::Error("newline before '='".into())
                } else {
                    self.key.push(c);
                    ParseState::Key
                }
            }
            ParseState::Value => {
                if c == '\n' {
                    ParseState::Done
                } else {
                    self.value.push(c);
                    ParseState::Value
                }
            }
            ParseState::Done => ParseState::Done,
            ParseState::Error(msg) => ParseState::Error(msg.clone()),
        };
    }

    fn result(&self) -> Option<(&str, &str)> {
        match &self.state {
            ParseState::Done => Some((&self.key, &self.value)),
            _ => None,
        }
    }
}

fn main() {
    let input = "host=192.168.1.1\n";

    let mut parser = Parser::new();
    for c in input.chars() {
        parser.feed(c);
    }

    match parser.result() {
        Some((k, v)) => println!("Parsed: key='{}', value='{}'", k, v),
        None => println!("Parse error: {:?}", parser.state),
    }
}

Notice that the Rust Error variant carries a message. Rust enums can hold data per variant -- C enums cannot.

Try It: Extend the Rust parser to handle multiple key=value pairs separated by newlines. After each Done state, reset to Key and collect results into a Vec<(String, String)>.

Why This Matters for Drivers

Device drivers are inherently state machines. A block device goes through initialization, ready, busy, and error states. A network driver manages link negotiation states. An interrupt handler transitions a UART between idle, receiving, and transmitting.

The C function-pointer dispatch table maps directly to how the kernel structures driver state machines. The Rust enum + match approach is how the kernel's Rust abstractions will express the same logic with compile-time exhaustiveness checking.

  Typical driver state machine:

          init_hw()        ready_for_io()
  PROBE ----------> INIT --------------> READY
                      |                    |  ^
                      | error              |  | io_complete()
                      v                    v  |
                    ERROR <--- error --- BUSY
                      |
                      | reset()
                      v
                    PROBE (retry)

Driver Prep: When you write a driver, draw the state diagram first. Enumerate every state and every event. Then implement it as a dispatch table (C) or enum + match (Rust). Missing transitions become obvious on paper before they become bugs in production.

Knowledge Check

  1. What advantage does a 2D dispatch table [state][event] have over a large switch statement with nested switches?
  2. In the Rust parser, why do we use std::mem::replace to take ownership of the current state before matching on it?
  3. How does Rust's exhaustive match checking prevent a class of bugs that C dispatch tables are vulnerable to?

Common Pitfalls

  • Uninitialized dispatch table entries -- in C, a missing entry is a NULL pointer. Calling it is undefined behavior. Always initialize every cell or add a bounds check.
  • Forgetting to handle "no transition" -- some state/event combinations should be no-ops. Make this explicit, not accidental.
  • State explosion -- too many states and events make the table unwieldy. Decompose into hierarchical state machines if the table exceeds about 5x5.
  • Side effects in transition logic -- keep handlers pure when possible. Separate "compute next state" from "perform action" for testability.
  • Rust: forgetting std::mem::replace -- if you match on &self.state you cannot move data out of variants. The take-and-replace idiom is standard for owned state machines.

Opaque Types and Encapsulation

Good APIs hide their guts. In C, the technique is called the "opaque pointer" or "handle" pattern — you forward-declare a struct in a header and only define it in the implementation file. In Rust, the compiler enforces privacy by default. This chapter shows both approaches and why the difference matters for large codebases.

The Problem: Leaking Implementation Details

When you put a full struct definition in a header, every file that includes it can reach into the struct's fields. Change a field name and you recompile the world. Worse, callers start depending on layout details you never promised.

+-------------------------------+
|  widget.h                     |
|  struct widget {              |
|      int x;   <-- exposed     |
|      int y;   <-- exposed     |
|      char *name; <-- exposed  |
|  };                           |
+-------------------------------+
        |           |
   file_a.c     file_b.c
   w->x = 5;   free(w->name);  <-- both reach inside

Every consumer is now coupled to the exact layout. This is fragile.

C: The Opaque Pointer Pattern

The fix in C is simple: forward-declare the struct in the header, define it only in the .c file, and expose functions that operate on pointers to it.

The Header (widget.h)

/* widget.h -- public interface only */
#ifndef WIDGET_H
#define WIDGET_H

#include <stddef.h>

/* Forward declaration -- callers never see the fields */
typedef struct widget widget_t;

/* Constructor / destructor */
widget_t *widget_create(const char *name, int x, int y);
void      widget_destroy(widget_t *w);

/* Accessors */
const char *widget_name(const widget_t *w);
int         widget_x(const widget_t *w);
int         widget_y(const widget_t *w);

/* Mutator */
void widget_move(widget_t *w, int dx, int dy);

#endif /* WIDGET_H */

The Implementation (widget.c)

/* widget.c -- only this file knows the struct layout */
#include "widget.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct widget {
    char *name;
    int   x;
    int   y;
};

widget_t *widget_create(const char *name, int x, int y)
{
    widget_t *w = malloc(sizeof(*w));
    if (!w)
        return NULL;

    w->name = strdup(name);
    if (!w->name) {
        free(w);
        return NULL;
    }
    w->x = x;
    w->y = y;
    return w;
}

void widget_destroy(widget_t *w)
{
    if (!w)
        return;
    free(w->name);
    free(w);
}

const char *widget_name(const widget_t *w)
{
    return w->name;
}

int widget_x(const widget_t *w)
{
    return w->x;
}

int widget_y(const widget_t *w)
{
    return w->y;
}

void widget_move(widget_t *w, int dx, int dy)
{
    w->x += dx;
    w->y += dy;
}

A Caller (main.c)

/* main.c */
#include <stdio.h>
#include "widget.h"

int main(void)
{
    widget_t *w = widget_create("button", 10, 20);
    if (!w) {
        fprintf(stderr, "widget_create failed\n");
        return 1;
    }

    printf("Widget '%s' at (%d, %d)\n",
           widget_name(w), widget_x(w), widget_y(w));

    widget_move(w, 5, -3);
    printf("After move: (%d, %d)\n", widget_x(w), widget_y(w));

    /* w->x = 99;  <-- compile error: incomplete type */

    widget_destroy(w);
    return 0;
}

Compile and run:

gcc -Wall -c widget.c -o widget.o
gcc -Wall main.c widget.o -o main
./main

Output:

Widget 'button' at (10, 20)
After move: (15, 17)

The caller cannot access w->x directly because the compiler only sees the forward declaration — the struct is an incomplete type in main.c.

Caution: The opaque pointer pattern in C relies entirely on programmer discipline. Nothing stops someone from copying the struct definition into their own file. It is a convention, not a guarantee.

Handles You Already Know

The C standard library and POSIX use this pattern everywhere:

+------------------+-----------------------------+
| Handle           | Hidden struct               |
+------------------+-----------------------------+
| FILE *           | struct _IO_FILE (glibc)     |
| DIR *            | struct __dirstream          |
| pthread_t        | unsigned long (or struct)   |
| sqlite3 *        | struct sqlite3              |
+------------------+-----------------------------+

You call fopen() and get a FILE *. You never allocate a FILE yourself. You never inspect its fields. You pass it to fread, fwrite, fclose. That is the handle pattern.

Driver Prep: The Linux kernel uses opaque handles constantly. A struct file * is passed through the VFS layer. Driver authors implement operations behind function pointers without exposing internal state to userspace.

Try It: Add a widget_rename function that changes the widget's name. Make sure the old name is freed. Only modify widget.c and widget.h — the caller should not need to know how names are stored.

Rust: Privacy by Default

Rust flips the default. Struct fields are private unless you say pub.

// widget.rs (or lib.rs in a library crate)

pub struct Widget {
    name: String,   // private -- only this module can touch it
    x: i32,         // private
    y: i32,         // private
}

impl Widget {
    /// Constructor -- the only way to create a Widget from outside
    pub fn new(name: &str, x: i32, y: i32) -> Self {
        Widget {
            name: name.to_string(),
            x,
            y,
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn x(&self) -> i32 {
        self.x
    }

    pub fn y(&self) -> i32 {
        self.y
    }

    pub fn move_by(&mut self, dx: i32, dy: i32) {
        self.x += dx;
        self.y += dy;
    }
}

fn main() {
    let mut w = Widget::new("button", 10, 20);
    println!("Widget '{}' at ({}, {})", w.name(), w.x(), w.y());

    w.move_by(5, -3);
    println!("After move: ({}, {})", w.x(), w.y());

    // w.x = 99;  // compile error: field `x` is private
}

Compile and run:

rustc widget.rs && ./widget

Output:

Widget 'button' at (10, 20)
After move: (15, 17)

Rust Note: In Rust, privacy is enforced at the module level, not the file level. All code in the same module can access private fields. But code outside the module cannot, even within the same crate, unless fields are marked pub.

Module Visibility in Detail

Rust gives you fine-grained control:

mod engine {
    pub struct Motor {
        pub horsepower: u32,        // anyone can read/write
        pub(crate) serial: u64,     // only this crate
        pub(super) temperature: f64,// only the parent module
        rpm: u32,                   // only this module
    }

    impl Motor {
        pub fn new(hp: u32) -> Self {
            Motor {
                horsepower: hp,
                serial: 12345,
                temperature: 90.0,
                rpm: 0,
            }
        }

        pub fn start(&mut self) {
            self.rpm = 800;
        }

        pub fn rpm(&self) -> u32 {
            self.rpm
        }
    }
}

fn main() {
    let mut m = engine::Motor::new(250);
    m.start();
    println!("HP: {}, RPM: {}", m.horsepower, m.rpm());

    // m.rpm = 9000;   // error: field `rpm` is private to the module
    m.serial = 0;      // compiles here: pub(crate) is visible crate-wide,
                       // and main() lives in the same crate. From another
                       // crate, `serial` would be inaccessible.
}

The visibility ladder:

+---------------------+--------------------------------+
| Visibility          | Who can access                 |
+---------------------+--------------------------------+
| (none) / private    | current module only            |
| pub(self)           | same as private (explicit)     |
| pub(super)          | parent module                  |
| pub(crate)          | anywhere in the same crate     |
| pub                 | anyone, including other crates |
+---------------------+--------------------------------+

The Newtype Pattern

Sometimes you want a distinct type that wraps a primitive. In C you use typedef, but it creates only an alias — not a separate type.

C: typedef is a Weak Alias

/* c_newtype.c */
#include <stdio.h>

typedef int user_id;
typedef int product_id;

void print_user(user_id uid)
{
    printf("User: %d\n", uid);
}

int main(void)
{
    user_id   u = 42;
    product_id p = 99;

    print_user(u);   /* correct */
    print_user(p);   /* compiles fine -- oops! */

    return 0;
}

The compiler treats user_id and product_id as the same type. No warning. No error. Just a bug waiting to happen.

Rust: Newtype Is a Real Type

// newtype.rs
struct UserId(i32);
struct ProductId(i32);

fn print_user(uid: &UserId) {
    println!("User: {}", uid.0);
}

fn main() {
    let u = UserId(42);
    let _p = ProductId(99);

    print_user(&u);     // ok
    // print_user(&_p); // compile error: expected `&UserId`, found `&ProductId`
}

Compile and run:

rustc newtype.rs && ./newtype

Output:

User: 42

The newtype wrapper has zero runtime overhead — it is the same size as the inner value. But the compiler treats them as distinct types.

Memory layout (both are identical at runtime):

  UserId(42)      ProductId(99)
  +----------+    +----------+
  | 42 (i32) |    | 99 (i32) |
  +----------+    +----------+
  4 bytes         4 bytes

But the type system sees them as DIFFERENT types.

Try It: Add a Meters and Feet newtype in Rust. Write a function add_meters(Meters, Meters) -> Meters. Verify that passing a Feet value is a compile error.

C: Opaque Handle with Function Pointers (vtable-style)

A more advanced C pattern combines opaque types with function pointers. This is how the Linux kernel implements polymorphism (e.g., struct file_operations).

/* stream.h */
#ifndef STREAM_H
#define STREAM_H

#include <stddef.h>

typedef struct stream stream_t;

/* Operations table -- like a vtable */
typedef struct {
    int  (*read)(stream_t *s, void *buf, size_t len);
    int  (*write)(stream_t *s, const void *buf, size_t len);
    void (*close)(stream_t *s);
} stream_ops_t;

stream_t *stream_create(const stream_ops_t *ops, void *private_data);
void      stream_destroy(stream_t *s);
int       stream_read(stream_t *s, void *buf, size_t len);
int       stream_write(stream_t *s, const void *buf, size_t len);

#endif

/* stream.c */
#include "stream.h"
#include <stdlib.h>

struct stream {
    const stream_ops_t *ops;
    void               *private_data;
};

stream_t *stream_create(const stream_ops_t *ops, void *private_data)
{
    stream_t *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->ops = ops;
    s->private_data = private_data;
    return s;
}

void stream_destroy(stream_t *s)
{
    if (s && s->ops->close)
        s->ops->close(s);
    free(s);
}

int stream_read(stream_t *s, void *buf, size_t len)
{
    if (!s || !s->ops->read)
        return -1;
    return s->ops->read(s, buf, len);
}

int stream_write(stream_t *s, const void *buf, size_t len)
{
    if (!s || !s->ops->write)
        return -1;
    return s->ops->write(s, buf, len);
}

This is the same architecture as struct file_operations in the Linux kernel. The caller never sees the internal struct. The operations table provides polymorphism.

Driver Prep: When you write a Linux driver, you fill in a struct file_operations with function pointers for read, write, open, release, and ioctl. The kernel calls your functions through these pointers. The opaque-handle-plus-vtable pattern is the foundation of the entire VFS layer.

Rust: Traits as the Clean Equivalent

// stream_trait.rs
use std::io;

trait Stream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
}

struct MemoryStream {
    data: Vec<u8>,
    pos: usize,
}

impl MemoryStream {
    fn new(data: Vec<u8>) -> Self {
        MemoryStream { data, pos: 0 }
    }
}

impl Stream for MemoryStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = &self.data[self.pos..];
        let n = buf.len().min(remaining.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }

    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
}

fn read_all(stream: &mut dyn Stream) -> io::Result<Vec<u8>> {
    let mut result = Vec::new();
    let mut buf = [0u8; 64];
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        result.extend_from_slice(&buf[..n]);
    }
    Ok(result)
}

fn main() {
    let mut ms = MemoryStream::new(b"Hello, opaque world!".to_vec());
    let data = read_all(&mut ms).unwrap();
    println!("{}", String::from_utf8_lossy(&data));
}

Compile and run:

rustc stream_trait.rs && ./stream_trait

The trait object &mut dyn Stream is Rust's version of the vtable pattern. The compiler generates a vtable automatically. No manual function-pointer tables needed.

Side-by-Side Comparison

+--------------------------+----------------------------+
| C (opaque pointer)       | Rust (private fields)      |
+--------------------------+----------------------------+
| Forward-declare struct   | Fields private by default  |
| Define in .c file only   | Define in module           |
| Expose via pointer       | Expose via pub methods     |
| Convention-based         | Compiler-enforced          |
| Caller can cast around   | Caller cannot bypass       |
| typedef = weak alias     | Newtype = distinct type    |
| Function ptr table       | Trait object (dyn Trait)   |
+--------------------------+----------------------------+

Knowledge Check

  1. In C, what makes a struct "opaque" to callers? What compiler concept prevents the caller from accessing fields?

  2. In Rust, what is the default visibility of a struct field? How do you make it accessible outside the module?

  3. Why is a Rust newtype safer than a C typedef for preventing argument mix-ups?

Common Pitfalls

  • Forgetting the destructor in C opaque types. The caller cannot free the internals because they cannot see them. You must provide a destroy function.
  • Leaking the definition by putting the struct in a "private" header that someone else includes anyway. In C, there is no enforcement.
  • Making all fields pub in Rust "just to get it compiling." This throws away the safety you get for free.
  • Returning mutable references to private fields from Rust methods. This leaks internal state just as badly as making the field public.
  • Confusing typedef with a newtype. In C, typedef int foo does not create a new type. The compiler still treats it as int.

Error Handling: errno to Result

Every syscall can fail. Every allocation can return NULL. How a language handles errors defines how reliable the software built with it can be. C gives you conventions. Rust gives you a type system. This chapter covers both, from the humble return code to the ? operator.

C: The Return Code Convention

The simplest C error-handling pattern: return 0 for success, non-zero for failure.

/* retcode.c */
#include <stdio.h>

int parse_positive_int(const char *s, int *out)
{
    int val = 0;
    if (!s || !out)
        return -1;  /* invalid argument */

    for (const char *p = s; *p; p++) {
        if (*p < '0' || *p > '9')
            return -2;  /* not a digit */
        val = val * 10 + (*p - '0');
    }

    *out = val;
    return 0;  /* success */
}

int main(void)
{
    int val;
    int rc;

    rc = parse_positive_int("42", &val);
    if (rc == 0)
        printf("Parsed: %d\n", val);
    else
        printf("Error: %d\n", rc);

    rc = parse_positive_int("12ab", &val);
    if (rc == 0)
        printf("Parsed: %d\n", val);
    else
        printf("Error: %d\n", rc);

    return 0;
}

Compile and run:

gcc -Wall -o retcode retcode.c && ./retcode

Output:

Parsed: 42
Error: -2

This works for small programs. But notice: the caller must remember to check the return value. The compiler will not warn if they forget.

C: errno -- The Global Error Variable

POSIX functions return -1 (or NULL) on failure and set errno to indicate what went wrong.

/* errno_demo.c */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/nonexistent/file.txt", O_RDONLY);
    if (fd == -1) {
        printf("errno = %d\n", errno);
        printf("strerror: %s\n", strerror(errno));
        perror("open");  /* prints: open: No such file or directory */
    }

    errno = 0;  /* manual reset */

    FILE *f = fopen("/etc/shadow", "r");
    if (!f) {
        perror("fopen /etc/shadow");
    }

    return 0;
}

Compile and run:

gcc -Wall -o errno_demo errno_demo.c && ./errno_demo

The flow:

  open("/nonexistent/file.txt", O_RDONLY)
      |
      +-- kernel returns error
      +-- glibc sets errno = ENOENT (2)
      +-- returns -1 to caller
      +-- caller checks return value, reads errno for detail

Caution: errno is thread-local in modern C (C11 / POSIX), but it is still fragile. Any function call between the failing call and reading errno can overwrite it. Always check errno immediately after the call that failed.

C: The -1 / NULL / errno Pattern

Most POSIX and standard library functions follow one of these patterns:

+---------------------------+------------------+------------------+
| Function returns          | On success       | On failure       |
+---------------------------+------------------+------------------+
| int (file descriptor)     | >= 0             | -1, sets errno   |
| pointer                   | valid pointer    | NULL, sets errno |
| ssize_t (byte count)      | >= 0             | -1, sets errno   |
| int (status)              | 0                | -1, sets errno   |
+---------------------------+------------------+------------------+

Not all functions follow this. getchar() returns EOF on end-of-file or error. pthread_create returns the error number directly (not through errno). You must read the man page for every function you call.

Caution: Some functions (like strtol) have ambiguous return values. A return of 0 might mean "parsed zero" or "parse failed." You must set errno = 0 before calling and check it afterward.

C: The goto cleanup Pattern

When a function acquires multiple resources, you need to release them all on any error path. The goto pattern is standard practice in C -- and is used extensively in the Linux kernel.

/* goto_cleanup.c */
#include <stdio.h>
#include <stdlib.h>

int process_file(const char *path)
{
    int ret = -1;
    FILE *f = NULL;
    char *buf = NULL;

    f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        goto cleanup;
    }

    buf = malloc(4096);
    if (!buf) {
        perror("malloc");
        goto cleanup;
    }

    if (!fgets(buf, 4096, f)) {
        if (ferror(f)) {
            perror("fgets");
            goto cleanup;
        }
        buf[0] = '\0';
    }

    printf("First line: %s", buf);
    ret = 0;  /* success */

cleanup:
    free(buf);   /* free(NULL) is safe */
    if (f)
        fclose(f);
    return ret;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }
    return process_file(argv[1]) == 0 ? 0 : 1;
}

Compile and run:

gcc -Wall -o goto_cleanup goto_cleanup.c && echo "hello world" > /tmp/test.txt && ./goto_cleanup /tmp/test.txt

The goto-cleanup pattern:

  function entry
      |
      allocate resource A ---fail---> goto cleanup
      |
      allocate resource B ---fail---> goto cleanup
      |
      do work -------------fail---> goto cleanup
      |
      success: ret = 0
      |
  cleanup:
      free B (if allocated)
      free A (if allocated)
      return ret

Driver Prep: The Linux kernel style guide explicitly endorses goto for error handling. You will see this pattern in virtually every kernel function that acquires resources.

Try It: Modify process_file to also malloc a second buffer for a processed output. Add the proper cleanup. What happens if you forget to free the second buffer on the error path?

Rust: Result<T, E>

Rust replaces all of the above with a single type:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

You cannot ignore it. If a function returns Result, the compiler warns you if you do not handle it.

// result_basic.rs
use std::fs;
use std::io;
fn read_first_line(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?;
    let first = contents.lines().next().unwrap_or("").to_string();
    Ok(first)
}

fn main() {
    match read_first_line("/tmp/test.txt") {
        Ok(line) => println!("First line: {}", line),
        Err(e) => eprintln!("Error: {}", e),
    }

    match read_first_line("/nonexistent/file.txt") {
        Ok(line) => println!("First line: {}", line),
        Err(e) => eprintln!("Error: {}", e),
    }
}

Compile and run:

rustc result_basic.rs && ./result_basic

The ? Operator

The ? operator is Rust's answer to the goto-cleanup pattern. It does three things:

  1. If the Result is Ok(val), unwrap val and continue.
  2. If the Result is Err(e), convert e into the function's error type and return early.
  3. All cleanup happens automatically via Drop.

// question_mark.rs
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

fn first_line_length(path: &str) -> Result<usize, io::Error> {
    let file = File::open(path)?;          // returns Err if open fails
    let reader = BufReader::new(file);
    let mut line = String::new();
    reader.take(4096).read_line(&mut line)?; // returns Err if read fails
    Ok(line.trim_end().len())
}

fn main() {
    match first_line_length("/tmp/test.txt") {
        Ok(len) => println!("Length: {}", len),
        Err(e) => eprintln!("Error: {}", e),
    }
}

Compare the flow to C's goto:

  C (goto cleanup)                Rust (? operator)
  ----------------                ------------------
  f = fopen(path)                 let file = File::open(path)?;
  if (!f) goto cleanup;           // auto-returns Err on failure

  buf = malloc(4096);             let mut line = String::new();
  if (!buf) goto cleanup;         // String manages its own memory

  fgets(buf, 4096, f);            reader.read_line(&mut line)?;
  if error goto cleanup;          // auto-returns, auto-cleans up

  cleanup:                        // no cleanup block needed --
    free(buf);                    // Drop runs automatically
    fclose(f);

Option -- Nullable Without NULL

C uses NULL for "no value." Rust uses Option<T>:

// option_demo.rs
fn find_first_negative(nums: &[i32]) -> Option<usize> {
    for (i, &n) in nums.iter().enumerate() {
        if n < 0 {
            return Some(i);
        }
    }
    None
}

fn main() {
    let data = [3, 7, -2, 5, -8];

    match find_first_negative(&data) {
        Some(idx) => println!("First negative at index {}", idx),
        None => println!("No negatives found"),
    }

    if let Some(idx) = find_first_negative(&data) {
        println!("Value: {}", data[idx]);
    }

    let empty: &[i32] = &[];
    println!("In empty: {:?}", find_first_negative(empty));
}

Compile and run:

rustc option_demo.rs && ./option_demo

Rust Note: Option<T> has zero overhead for pointer types. The compiler uses the null representation internally, so Option<&T> is the same size as a raw pointer. This is called the "null pointer optimization."

Custom Error Types

Real programs have multiple error sources. In Rust, you define an enum that implements std::error::Error:

// custom_error.rs
use std::fmt;
use std::io;
use std::num::ParseIntError;

#[derive(Debug)]
enum AppError {
    Io(io::Error),
    Parse(ParseIntError),
    InvalidArg(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "I/O error: {}", e),
            AppError::Parse(e) => write!(f, "Parse error: {}", e),
            AppError::InvalidArg(msg) => write!(f, "Invalid argument: {}", msg),
        }
    }
}

impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError::Io(e)
    }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Parse(e)
    }
}

fn read_number_from_file(path: &str) -> Result<i64, AppError> {
    if path.is_empty() {
        return Err(AppError::InvalidArg("empty path".to_string()));
    }
    let contents = std::fs::read_to_string(path)?;  // io::Error -> AppError
    let num: i64 = contents.trim().parse()?;         // ParseIntError -> AppError
    Ok(num)
}

fn main() {
    match read_number_from_file("/tmp/number.txt") {
        Ok(n) => println!("Number: {}", n),
        Err(e) => eprintln!("Error: {}", e),
    }

    match read_number_from_file("") {
        Ok(n) => println!("Number: {}", n),
        Err(e) => eprintln!("Error: {}", e),
    }
}

Compile and run:

echo "42" > /tmp/number.txt
rustc custom_error.rs && ./custom_error

The From implementations let the ? operator automatically convert different error types into your unified AppError.

Why C Error Handling Leads to Bugs

Consider this real-world pattern:

/* bug_demo.c -- spot the bugs */
#include <stdio.h>
#include <stdlib.h>

char *load_config(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;

    char *buf = malloc(1024);
    if (!buf)
        return NULL;  /* BUG: f is leaked! */

    if (!fgets(buf, 1024, f)) {
        free(buf);
        return NULL;  /* BUG: f is leaked again! */
    }

    fclose(f);
    return buf;
}

int main(void)
{
    char *cfg = load_config("/tmp/test.txt");
    if (cfg) {
        printf("Config: %s", cfg);
        free(cfg);
    }
    return 0;
}

Two resource leaks in a 15-line function. This is everywhere in C codebases. The language simply does not help.

Try It: Fix the bugs in load_config using the goto-cleanup pattern. Then write the equivalent in Rust and observe that the resource leaks are impossible.

Using thiserror and anyhow (Cargo ecosystem)

For real Rust projects, two crates simplify error handling enormously.

thiserror -- for library code, auto-generates Display and From:

// In Cargo.toml: thiserror = "1"
use thiserror::Error;

#[derive(Error, Debug)]
enum DataError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("parse error: {0}")]
    Parse(#[from] std::num::ParseIntError),

    #[error("value {0} out of range {1}..{2}")]
    OutOfRange(i64, i64, i64),
}

anyhow -- for application code, wraps any error into a single type:

// In Cargo.toml: anyhow = "1"
use anyhow::{Context, Result};

fn load_config(path: &str) -> Result<String> {
    let contents = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read config from {}", path))?;
    Ok(contents)
}

fn main() -> Result<()> {
    let cfg = load_config("/tmp/test.txt")?;
    println!("Config: {}", cfg.trim());
    Ok(())
}

Error Handling Decision Tree

  Are you writing a library?
      |
      YES --> Use thiserror (or manual impl)
      |         - Callers need to match on specific errors
      |
      NO --> Are you writing an application?
              |
              YES --> Use anyhow
              |         - Just print errors and bail
              |         - Use .context() for readable messages
              |
              (learning / small scripts) --> Use Box<dyn Error>

Mapping C errno to Rust

When a syscall fails, Rust's standard library captures errno and wraps it in std::io::Error. When you call C functions yourself (via FFI), std::io::Error::last_os_error() performs the same conversion:

// errno_to_rust.rs
fn main() {
    let result = std::fs::File::open("/nonexistent/path");
    match result {
        Ok(_) => println!("opened"),
        Err(e) => {
            println!("Error: {}", e);
            println!("OS error code: {:?}", e.raw_os_error());
            println!("Error kind: {:?}", e.kind());
        }
    }
}

Output:

Error: No such file or directory (os error 2)
OS error code: Some(2)
Error kind: NotFound

io::Error wraps errno values and maps them to the ErrorKind enum. This is how Rust bridges the C world.

Knowledge Check

  1. What happens in C if you call two POSIX functions in a row and only check errno after the second one?

  2. In Rust, what does the ? operator do when it encounters an Err value?

  3. Why does Rust not need a goto-cleanup pattern for resource management?

Common Pitfalls

  • Forgetting to check return values in C. The compiler will happily let you ignore the return value of read(), write(), or close(). Bugs hide here for years.
  • Checking errno after an intervening call. Even printf can overwrite errno. Read it immediately after the failing call.
  • Using unwrap() in Rust production code. It panics on Err. Use ? or match instead. Reserve unwrap() for cases where failure is truly impossible.
  • Ignoring #[must_use] warnings. Rust marks Result as #[must_use]. If you see a warning about an unused Result, you are ignoring an error.
  • Confusing Option with Result. Use Option for "might not exist" and Result for "might fail." Do not use Result<T, ()> when you mean Option<T>.

The Preprocessor and Macros

Before the C compiler sees your code, the preprocessor runs a text-substitution pass over it. This is powerful, dangerous, and entirely unlike anything in most modern languages. Rust replaces the preprocessor with a hygienic macro system and feature flags. This chapter covers both.

The C Preprocessor: A Text Rewriting Engine

The preprocessor operates on text, not on syntax trees. It knows nothing about types, scopes, or semantics. Every directive starts with #.

  Source file (.c)
      |
      v
  [Preprocessor]  -- #include, #define, #ifdef
      |
      v
  Translation unit (expanded text)
      |
      v
  [Compiler]  -- parsing, type checking, codegen
      |
      v
  Object file (.o)

#include and Include Guards

#include literally copies the contents of another file into the current position. No module system, no namespacing -- just text insertion. Without protection, including the same header twice causes redefinition errors:

/* sensor.h */
#ifndef SENSOR_H
#define SENSOR_H

struct sensor {
    int id;
    float value;
};

int sensor_read(struct sensor *s);

#endif /* SENSOR_H */

Many compilers also support #pragma once as a non-standard shortcut.

#define: Constants and Macros

Simple Constants

/* constants.c */
#include <stdio.h>

#define MAX_SENSORS  16
#define PI           3.14159265358979
#define VERSION      "1.0.3"

int main(void)
{
    printf("Max sensors: %d\n", MAX_SENSORS);
    printf("Pi: %.5f\n", PI);
    printf("Version: %s\n", VERSION);
    return 0;
}

The preprocessor replaces every occurrence of MAX_SENSORS with the literal text 16. No type checking. No scoping.

Caution: #define constants have no type. MAX_SENSORS is not an int -- it is the text 16. Prefer enum or static const for typed constants in modern C.

Function-Like Macros

/* macros.c */
#include <stdio.h>

#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))
#define SQUARE(x)  ((x) * (x))

int main(void)
{
    printf("min(3, 7) = %d\n", MIN(3, 7));
    printf("max(3, 7) = %d\n", MAX(3, 7));
    printf("square(5) = %d\n", SQUARE(5));
    printf("square(2+3) = %d\n", SQUARE(2 + 3));  /* 25, correct with parens */
    return 0;
}

Compile and run:

gcc -Wall -o macros macros.c && ./macros

Every parameter is wrapped in parentheses and the whole expression is wrapped in outer parentheses. Without this, operator precedence causes silent bugs.

Caution: Macro arguments are evaluated each time they appear. Consider SQUARE(i++) -- this expands to ((i++) * (i++)), which increments i twice and invokes undefined behavior. This is the most infamous macro pitfall in C. Never pass expressions with side effects to C macros.

Useful Kernel Patterns

The Linux kernel defines several macros that every systems programmer should know:

/* kernel_patterns.c */
#include <stdio.h>
#include <stddef.h>

/* Array size -- works only on true arrays, not pointers */
#define ARRAY_SIZE(arr)  (sizeof(arr) / sizeof((arr)[0]))

/* Build-time assertion (simplified) */
#define BUILD_BUG_ON(cond) \
    ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Container-of: get the parent struct from a member pointer */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device {
    int id;
    char name[32];
};

int main(void)
{
    int data[] = {10, 20, 30, 40, 50};
    printf("Array size: %zu\n", ARRAY_SIZE(data));

    BUILD_BUG_ON(sizeof(int) != 4);  /* passes on most platforms */
    /* BUILD_BUG_ON(sizeof(int) == 4);  -- would fail to compile */

    struct device dev = { .id = 42, .name = "sensor" };
    char *name_ptr = dev.name;
    struct device *dev_ptr = container_of(name_ptr, struct device, name);
    printf("Device ID via container_of: %d\n", dev_ptr->id);

    return 0;
}

Compile and run:

gcc -Wall -o kernel_patterns kernel_patterns.c && ./kernel_patterns

Output:

Array size: 5
Device ID via container_of: 42

Driver Prep: container_of is used on nearly every page of the Linux kernel source. When you have a pointer to a member of a struct (like a list_head), this macro recovers the pointer to the enclosing struct.

Stringification and Token Pasting

The preprocessor has two special operators: # turns a macro argument into a string, and ## pastes two tokens together.

/* stringify.c */
#include <stdio.h>

#define STRINGIFY(x)  #x
#define TO_STRING(x)  STRINGIFY(x)

#define CONCAT(a, b)  a##b

#define DEBUG_VAR(var)  printf(#var " = %d\n", var)

#define VERSION_MAJOR 2
#define VERSION_MINOR 7

int main(void)
{
    int count = 42;
    DEBUG_VAR(count);  /* expands to: printf("count" " = %d\n", count); */

    int xy = 100;
    printf("CONCAT(x, y) = %d\n", CONCAT(x, y));  /* becomes: xy */

    /* Two-level stringification for expanding macros */
    printf("Version: %s.%s\n",
           TO_STRING(VERSION_MAJOR),
           TO_STRING(VERSION_MINOR));

    return 0;
}

Compile and run:

gcc -Wall -o stringify stringify.c && ./stringify

Output:

count = 42
CONCAT(x, y) = 100
Version: 2.7

Note the two-level TO_STRING / STRINGIFY trick. If you write STRINGIFY(VERSION_MAJOR), you get the string "VERSION_MAJOR". The extra indirection forces macro expansion first.

Variadic Macros

/* variadic_macro.c */
#include <stdio.h>

#define LOG(fmt, ...) \
    fprintf(stderr, "[LOG] " fmt "\n", ##__VA_ARGS__)

int main(void)
{
    LOG("Starting up");
    LOG("Sensor %d: value = %.2f", 3, 27.5);
    LOG("Shutting down with code %d", 0);
    return 0;
}

Compile and run:

gcc -Wall -o variadic_macro variadic_macro.c && ./variadic_macro

The ##__VA_ARGS__ is a GCC extension that removes the trailing comma when no variadic arguments are passed.
C23 standardizes __VA_OPT__ for the same job: write "[LOG] " fmt "\n" __VA_OPT__(,) __VA_ARGS__ and the comma appears only when arguments are present.

X-Macros: Code Generation

X-macros generate repetitive code from a single list definition.

/* x_macro.c */
#include <stdio.h>

#define ERROR_LIST \
    X(ERR_NONE,     "no error")       \
    X(ERR_IO,       "I/O error")      \
    X(ERR_PARSE,    "parse error")    \
    X(ERR_OVERFLOW, "overflow")       \
    X(ERR_TIMEOUT,  "timeout")

/* Generate the enum */
#define X(code, str) code,
typedef enum {
    ERROR_LIST
} error_code_t;
#undef X

/* Generate the string table */
#define X(code, str) [code] = str,
static const char *error_strings[] = {
    ERROR_LIST
};
#undef X

const char *error_to_string(error_code_t e)
{
    if (e < 0 || (size_t)e >= sizeof(error_strings) / sizeof(error_strings[0]))
        return "unknown error";
    return error_strings[e];
}

int main(void)
{
    for (int i = ERR_NONE; i <= ERR_TIMEOUT; i++) {
        printf("%d: %s\n", i, error_to_string(i));
    }
    return 0;
}

Compile and run:

gcc -Wall -o x_macro x_macro.c && ./x_macro

Output:

0: no error
1: I/O error
2: parse error
3: overflow
4: timeout

The error codes and their string representations are defined in one place. You can never add a code and forget its string, or vice versa.

Conditional Compilation

/* conditional.c */
#include <stdio.h>

#ifdef __linux__
    #define PLATFORM "Linux"
#elif defined(_WIN32)
    #define PLATFORM "Windows"
#elif defined(__APPLE__)
    #define PLATFORM "macOS"
#else
    #define PLATFORM "Unknown"
#endif

#ifndef NDEBUG
    #define DBG(fmt, ...) fprintf(stderr, "DBG: " fmt "\n", ##__VA_ARGS__)
#else
    #define DBG(fmt, ...) ((void)0)
#endif

int main(void)
{
    printf("Platform: %s\n", PLATFORM);
    DBG("This only prints in debug mode");
    DBG("x = %d", 42);
    return 0;
}

Compile and run:

gcc -Wall -o conditional conditional.c && ./conditional
gcc -Wall -DNDEBUG -o conditional_rel conditional.c && ./conditional_rel

In the release build (-DNDEBUG), the DBG macro expands to nothing.

Try It: Add a #define VERBOSE flag. When defined, make the LOG macro also print the file name and line number using __FILE__ and __LINE__.

Rust: macro_rules! -- Pattern-Matching Macros

Rust macros operate on the syntax tree, not on raw text. They are hygienic: they cannot accidentally capture variables from the surrounding scope.

// rust_macros.rs

macro_rules! min {
    ($a:expr, $b:expr) => {{
        let a = $a;
        let b = $b;
        if a < b { a } else { b }
    }};
}

macro_rules! debug_var {
    ($var:expr) => {
        eprintln!("{} = {:?}", stringify!($var), $var);
    };
}

macro_rules! make_vec {
    ( $( $elem:expr ),* $(,)? ) => {{
        let mut v = Vec::new();
        $( v.push($elem); )*
        v
    }};
}

fn main() {
    let x = 10;
    let y = 3;
    println!("min({}, {}) = {}", x, y, min!(x, y));

    // Safe with side effects -- each argument evaluated once
    let mut counter = 0;
    let result = min!({ counter += 1; counter }, 5);
    println!("result = {}, counter = {}", result, counter);
    // counter is exactly 1, not 2

    debug_var!(x + y);
    debug_var!("hello");

    let v = make_vec![1, 2, 3, 4, 5];
    println!("vec: {:?}", v);
}

Compile and run:

rustc rust_macros.rs && ./rust_macros

Output:

min(10, 3) = 3
result = 1, counter = 1
x + y = 13
"hello" = "hello"
vec: [1, 2, 3, 4, 5]

Rust Note: Macros like min! above evaluate each argument once, because the expansion binds it to a local variable first. The SQUARE(i++) bug from C is therefore impossible to write by accident. Hygiene is what makes this pattern safe: the macro's internal a and b can never collide with, or capture, variables of the same name at the call site.

Rust: cfg Attributes for Conditional Compilation

Rust replaces #ifdef with the cfg attribute system:

// cfg_demo.rs

#[cfg(target_os = "linux")]
fn platform() -> &'static str {
    "Linux"
}

#[cfg(target_os = "windows")]
fn platform() -> &'static str {
    "Windows"
}

#[cfg(target_os = "macos")]
fn platform() -> &'static str {
    "macOS"
}

#[cfg(not(any(target_os = "linux", target_os = "windows", target_os = "macos")))]
fn platform() -> &'static str {
    "Unknown"
}

fn main() {
    println!("Platform: {}", platform());

    if cfg!(debug_assertions) {
        println!("Debug mode is ON");
    } else {
        println!("Release mode");
    }
}

Compile and run:

rustc cfg_demo.rs && ./cfg_demo

The cfg! macro evaluates at compile time. Dead branches are eliminated entirely.

Rust: Feature Flags in Cargo

Cargo supports feature flags for conditional compilation:

# Cargo.toml
[package]
name = "myapp"
version = "0.1.0"
edition = "2021"

[features]
default = ["json"]
json = ["dep:serde_json"]
verbose_logging = []

[dependencies]
serde_json = { version = "1", optional = true }

// src/main.rs (Cargo project)

#[cfg(feature = "json")]
fn parse_config(data: &str) {
    let v: serde_json::Value = serde_json::from_str(data).unwrap();
    println!("Parsed JSON: {}", v);
}

#[cfg(not(feature = "json"))]
fn parse_config(_data: &str) {
    println!("JSON support not compiled in");
}

fn main() {
    parse_config(r#"{"key": "value"}"#);
}

Build with different features:

cargo run                              # default features (json)
cargo run --no-default-features        # no json
cargo run --features verbose_logging   # default + verbose

Procedural Macros: A Brief Overview

Rust also has procedural macros: Rust functions that transform token streams at compile time. The three kinds are derive macros (#[derive(Debug, Serialize)]), attribute macros (#[route("GET", "/users")]), and function-like macros (sql!(SELECT * FROM users)). They are defined in a separate crate. Derive macros are by far the most common.

Side-by-Side: C Preprocessor vs Rust Macros

+----------------------------+----------------------------------+
| C Preprocessor             | Rust Macros                      |
+----------------------------+----------------------------------+
| Text substitution          | Syntax tree transformation       |
| No hygiene                 | Hygienic -- no name capture      |
| Arguments re-evaluated     | Arguments evaluated once         |
| No type safety             | Type-checked after expansion     |
| #ifdef for platforms       | #[cfg()] attributes              |
| #define constants          | const / static                   |
| Include guards needed      | Module system handles it         |
| Errors point to expanded   | Errors point to macro call site  |
|   code (unreadable)        |   (usually readable)             |
+----------------------------+----------------------------------+

Debugging Macro Expansions

In C, use gcc -E to see the preprocessed output:

gcc -E macros.c | tail -20

In Rust, use cargo expand (requires the cargo-expand tool):

cargo install cargo-expand
cargo expand

Try It: Write a C macro CLAMP(x, lo, hi) that clamps a value to a range. Then write the Rust equivalent using macro_rules!. Verify that the Rust version is safe with side effects by passing { counter += 1; counter } as an argument.

Knowledge Check

  1. What happens if you write #define SQUARE(x) x * x without parentheses and then call SQUARE(2 + 3)?

  2. Why does SQUARE(i++) cause undefined behavior in C but not in a Rust macro?

  3. What is the difference between cfg!(target_os = "linux") and #[cfg(target_os = "linux")] in Rust?

Common Pitfalls

  • Missing parentheses in C macros. Always wrap every parameter and the entire expression: #define M(x) ((x) + 1).
  • Multi-statement macros without do-while. Use do { ... } while(0) for macros that expand to multiple statements, or they break if/else chains.
  • Macro arguments with side effects. Never pass i++ or function calls to C macros unless you know the argument is used exactly once.
  • Include guard name collisions. Using a common name like UTILS_H in two different libraries causes silent header suppression.
  • Over-using macros when functions work. Modern C compilers inline small functions automatically. Use static inline instead of function-like macros when possible.
  • Overcomplicating Rust macros. If a function does the job, use a function. Macros are for cases where you need syntax flexibility (variadic arguments, code generation, compile-time string manipulation).

Inline Assembly

Sometimes you need to drop below the language and talk directly to the CPU. Reading hardware registers, executing specific instructions the compiler does not emit, or inserting precise memory barriers -- these require inline assembly. This chapter shows how to embed assembly in both C (GCC extended asm) and Rust (the asm! macro), with real examples on x86-64.

When You Need Inline Assembly

Most code never needs it. But these situations demand it:

  • Reading CPU-specific registers (cycle counter, model info, control registers)
  • Memory and compiler barriers (preventing reordering in lock-free code)
  • Specific SIMD instructions the compiler will not emit through auto-vectorization
  • Hardware I/O (in/out instructions for port-mapped I/O)
  • Atomic operations not provided by the language or library
  • System calls (the raw syscall instruction)

Driver Prep: Kernel code and device drivers use inline assembly for all of the above. The Linux kernel's arch/x86/include/asm/ directory is full of inline assembly wrappers. Understanding the constraint system is essential.

C: GCC Extended Assembly Syntax

The basic form:

asm volatile (
    "assembly template"
    : output operands
    : input operands
    : clobber list
);

Example: Reading the CPU Cycle Counter (RDTSC)

The rdtsc instruction reads the Time Stamp Counter into EDX:EAX (high 32 bits in EDX, low 32 bits in EAX).

/* rdtsc.c */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    asm volatile (
        "rdtsc"
        : "=a" (lo),   /* output: EAX -> lo */
          "=d" (hi)    /* output: EDX -> hi */
        :              /* no inputs */
        :              /* no extra clobbers */
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint64_t start = read_tsc();

    /* Do some work */
    volatile int sum = 0;
    for (int i = 0; i < 1000000; i++)
        sum += i;

    uint64_t end = read_tsc();
    printf("Cycles: %lu\n", end - start);
    printf("Sum: %d\n", sum);
    return 0;
}

Compile and run:

gcc -O2 -Wall -o rdtsc rdtsc.c && ./rdtsc

The constraint letters tell GCC which registers to use:

+------------+---------------------------+
| Constraint | Meaning                   |
+------------+---------------------------+
| "=a"       | output in EAX             |
| "=d"       | output in EDX             |
| "=r"       | output in any GPR         |
| "=m"       | output in memory          |
| "r"        | input in any GPR          |
| "i"        | immediate constant        |
| "m"        | input in memory           |
+------------+---------------------------+
| Modifiers                              |
+------------+---------------------------+
| "="        | write-only output         |
| "+"        | read-write operand        |
| "&"        | early-clobber output      |
+------------+---------------------------+

Example: CPUID

The cpuid instruction returns CPU identification data. It reads EAX as a "leaf" selector and writes results to EAX, EBX, ECX, EDX.

/* cpuid.c */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

void get_cpu_vendor(char *vendor)
{
    uint32_t eax = 0, ebx, ecx, edx;
    asm volatile (
        "cpuid"
        : "+a" (eax),   /* leaf 0 in; cpuid overwrites EAX, so "+a" not "a" */
          "=b" (ebx),
          "=c" (ecx),
          "=d" (edx)
    );
    /* Vendor string is in EBX:EDX:ECX (yes, that order) */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
}

int main(void)
{
    char vendor[13];
    get_cpu_vendor(vendor);
    printf("CPU Vendor: %s\n", vendor);
    return 0;
}

Compile and run:

gcc -O2 -Wall -o cpuid cpuid.c && ./cpuid

The volatile Keyword

asm volatile tells the compiler: "Do not optimize this away, do not move it, do not assume it has no side effects." Without volatile, the compiler may remove the asm block, reorder it, or merge duplicates. Always use volatile unless the asm block is a pure function with no side effects (rare).

Caution: Even volatile does not prevent CPU-level reordering. For that you need memory barriers (fence instructions). volatile only constrains the compiler's optimizer.

Memory Barriers

On modern out-of-order CPUs, the processor can reorder memory operations. Compilers can also reorder loads and stores. Barriers prevent both.

/* barriers.c */
#include <stdio.h>
#include <stdint.h>

/* Compiler barrier -- prevents compiler reordering, not CPU reordering */
#define compiler_barrier()  asm volatile ("" ::: "memory")

/* Full memory fence -- prevents both compiler and CPU reordering */
#define full_fence()        asm volatile ("mfence" ::: "memory")

/* Store fence -- all prior stores complete before any later store */
#define store_fence()       asm volatile ("sfence" ::: "memory")

/* Load fence -- all prior loads complete before any later load */
#define load_fence()        asm volatile ("lfence" ::: "memory")

volatile int shared_flag = 0;
volatile int shared_data = 0;

void producer(void)
{
    shared_data = 42;       /* write data first */
    store_fence();          /* ensure data is visible before flag */
    shared_flag = 1;        /* signal that data is ready */
}

void consumer(void)
{
    while (!shared_flag)    /* wait for flag */
        ;
    load_fence();           /* ensure we read data after flag */
    printf("Data: %d\n", shared_data);  /* guaranteed to see 42 */
}

int main(void)
{
    /* Single-threaded demo -- the barriers matter in multi-threaded code */
    producer();
    consumer();
    return 0;
}

Compile and run:

gcc -O2 -Wall -o barriers barriers.c && ./barriers

The "memory" clobber tells the compiler that the asm block may read or write any memory, so it must not reorder loads/stores across it.

  Without barrier:           With barrier:

  store data = 42            store data = 42
  store flag = 1             sfence
  (CPU may reorder these!)   store flag = 1
                             (stores are ordered)

Driver Prep: The Linux kernel defines mb(), rmb(), wmb() (full, read, write memory barriers) and smp_mb(), smp_rmb(), smp_wmb() (SMP variants). These are thin wrappers around inline assembly fence instructions.

C: Inline Assembly for a System Call

On x86-64 Linux, system calls use the syscall instruction:

/* raw_syscall.c */
#include <stdio.h>
#include <stdint.h>

/* write(fd, buf, count) -- syscall number 1 on x86-64 */
static long raw_write(int fd, const void *buf, unsigned long count)
{
    long ret;
    asm volatile (
        "syscall"
        : "=a" (ret)                /* output: return value in RAX */
        : "a" (1),                  /* input: syscall number in RAX */
          "D" ((long)fd),           /* input: arg1 in RDI */
          "S" ((long)buf),          /* input: arg2 in RSI */
          "d" ((long)count)         /* input: arg3 in RDX */
        : "rcx", "r11", "memory"   /* clobbers: syscall destroys RCX, R11 */
    );
    return ret;
}

int main(void)
{
    const char msg[] = "Hello from raw syscall!\n";
    long written = raw_write(1, msg, sizeof(msg) - 1);
    printf("Bytes written: %ld\n", written);
    return 0;
}

Compile and run:

gcc -O2 -Wall -o raw_syscall raw_syscall.c && ./raw_syscall

Output:

Hello from raw syscall!
Bytes written: 24

The x86-64 Linux syscall convention: RAX = syscall number and return value, arguments in RDI, RSI, RDX, R10, R8, R9. The CPU clobbers RCX and R11.

Try It: Implement a raw_exit(int code) function using the syscall instruction (syscall number 60 on x86-64). Call it instead of return 0 and verify the exit code with echo $?.

Rust: The asm! Macro

Rust stabilized inline assembly in Rust 1.59. The syntax is different from GCC's but the concept is the same.

Example: Reading the CPU Cycle Counter

// rdtsc_rust.rs
use std::arch::asm;

fn read_tsc() -> u64 {
    let lo: u32;
    let hi: u32;
    unsafe {
        asm!(
            "rdtsc",
            out("eax") lo,
            out("edx") hi,
            options(nostack, nomem),
        );
    }
    ((hi as u64) << 32) | (lo as u64)
}

fn main() {
    let start = read_tsc();

    let mut sum: u64 = 0;
    for i in 0..1_000_000u64 {
        sum = sum.wrapping_add(i);
    }

    let end = read_tsc();
    println!("Cycles: {}", end - start);
    println!("Sum: {}", sum);
}

Compile and run:

rustc -O rdtsc_rust.rs && ./rdtsc_rust

Key differences from GCC syntax:

+---------------------------+--------------------------------+
| GCC (C)                   | Rust asm!                      |
+---------------------------+--------------------------------+
| "=a" (lo)                 | out("eax") lo                  |
| "=d" (hi)                 | out("edx") hi                  |
| : "memory"                | options(nomem) if no memory    |
|                           | access, otherwise omit         |
| asm volatile              | asm! is volatile by default    |
| "r" (input)               | in(reg) input                  |
+---------------------------+--------------------------------+

Example: CPUID in Rust

// cpuid_rust.rs
use std::arch::asm;

fn cpu_vendor() -> String {
    let ebx: u32;
    let ecx: u32;
    let edx: u32;

    unsafe {
        // LLVM reserves rbx for internal use, so `out("ebx")` is rejected.
        // Save rbx, copy cpuid's EBX result out, then restore rbx.
        asm!(
            "mov {tmp:r}, rbx",
            "cpuid",
            "mov {out:e}, ebx",
            "mov rbx, {tmp:r}",
            tmp = out(reg) _,
            out = out(reg) ebx,
            inout("eax") 0u32 => _,
            out("ecx") ecx,
            out("edx") edx,
            options(nostack),
        );
    }

    let mut vendor = [0u8; 12];
    vendor[0..4].copy_from_slice(&ebx.to_le_bytes());
    vendor[4..8].copy_from_slice(&edx.to_le_bytes());
    vendor[8..12].copy_from_slice(&ecx.to_le_bytes());
    String::from_utf8_lossy(&vendor).to_string()
}

fn main() {
    println!("CPU Vendor: {}", cpu_vendor());
}

Compile and run:

rustc -O cpuid_rust.rs && ./cpuid_rust

Rust asm! Operand and Option Reference

+-------------------+------------------------------------------+
| Operand           | Meaning                                  |
+-------------------+------------------------------------------+
| in(reg) expr      | Input in any general-purpose register    |
| in("eax") expr    | Input in a specific register             |
| out(reg) var      | Output to any GPR                        |
| out("edx") var    | Output to a specific register            |
| inout(reg) var    | Read-write operand                       |
| inout("eax") x=>y | Input x, output y, same register        |
| out(reg) _        | Clobbered register (output discarded)    |
+-------------------+------------------------------------------+

+-------------------+------------------------------------------+
| Option            | Meaning                                  |
+-------------------+------------------------------------------+
| nomem             | Asm does not read/write memory           |
| nostack           | Asm does not use the stack               |
| pure              | No side effects (allows optimization)    |
| preserves_flags   | Does not modify CPU flags (EFLAGS)       |
| att_syntax        | Use AT&T syntax instead of Intel         |
+-------------------+------------------------------------------+

By default, asm! blocks are treated as volatile (they will not be removed or reordered). Adding pure and nomem together allows the compiler to optimize like a regular function call.

Rust: A Raw System Call

// raw_syscall_rust.rs
use std::arch::asm;

/// Perform a raw write() system call on x86-64 Linux.
unsafe fn raw_write(fd: u64, buf: *const u8, count: u64) -> i64 {
    let ret: i64;
    asm!(
        "syscall",
        in("rax") 1u64,      // syscall number: write = 1
        in("rdi") fd,         // arg1: file descriptor
        in("rsi") buf as u64, // arg2: buffer pointer
        in("rdx") count,      // arg3: byte count
        out("rcx") _,         // clobbered by syscall
        out("r11") _,         // clobbered by syscall
        lateout("rax") ret,   // return value
        options(nostack),
    );
    ret
}

fn main() {
    let msg = b"Hello from Rust raw syscall!\n";
    let written = unsafe { raw_write(1, msg.as_ptr(), msg.len() as u64) };
    println!("Bytes written: {}", written);
}

Compile and run:

rustc -O raw_syscall_rust.rs && ./raw_syscall_rust

Note lateout("rax") instead of out("rax"). This tells the compiler that the output is written late (after inputs are consumed), which is necessary because rax is also used as an input.

Rust Note: In practice, use std::sync::atomic with proper Ordering values (SeqCst, Acquire, Release) instead of raw fence instructions for memory barriers. The atomic types generate the correct barriers automatically. Inline assembly for barriers is only needed when interfacing with hardware or writing the lowest levels of a synchronization library.

SIMD: A Practical Example

Let us use SSSE3 to sum an array of four 32-bit integers using 128-bit SIMD registers.

/* simd_sum.c */
#include <stdio.h>
#include <stdint.h>

int32_t simd_sum_4(const int32_t vals[4])
{
    int32_t result;
    asm volatile (
        "movdqu (%1), %%xmm0\n\t"     /* load 4 ints into xmm0 */
        "phaddd %%xmm0, %%xmm0\n\t"   /* horizontal add pairs */
        "phaddd %%xmm0, %%xmm0\n\t"   /* horizontal add again */
        "movd   %%xmm0, %0\n\t"       /* extract low 32 bits */
        : "=r" (result)
        : "r" (vals)
        : "xmm0", "memory"
    );
    return result;
}

int main(void)
{
    int32_t data[4] = {10, 20, 30, 40};
    printf("SIMD sum: %d\n", simd_sum_4(data));
    printf("Expected: %d\n", 10 + 20 + 30 + 40);
    return 0;
}

Compile and run (requires SSSE3):

gcc -O2 -Wall -mssse3 -o simd_sum simd_sum.c && ./simd_sum

The equivalent in Rust:

// simd_sum_rust.rs
use std::arch::asm;

fn simd_sum_4(vals: &[i32; 4]) -> i32 {
    let result: i32;
    unsafe {
        asm!(
            "movdqu ({ptr}), %xmm0",
            "phaddd %xmm0, %xmm0",
            "phaddd %xmm0, %xmm0",
            "movd   %xmm0, {out}",
            ptr = in(reg) vals.as_ptr(),
            out = out(reg) result,
            out("xmm0") _,
            options(att_syntax, nostack),
        );
    }
    result
}

fn main() {
    let data = [10i32, 20, 30, 40];
    println!("SIMD sum: {}", simd_sum_4(&data));
    println!("Expected: {}", 10 + 20 + 30 + 40);
}

Compile and run:

rustc -O -C target-feature=+ssse3 simd_sum_rust.rs && ./simd_sum_rust

(RUSTFLAGS is read by Cargo, not by a bare rustc invocation, so the flag is passed directly.)

Rust Note: For SIMD in production Rust code, prefer the std::arch intrinsics (like _mm_hadd_epi32) or the portable std::simd module (nightly). Use inline assembly only when the specific instruction you need has no intrinsic wrapper.

Try It: Modify the SIMD example to sum 8 integers using two movdqu loads and a paddd to combine them before the horizontal adds.

Safety Considerations

Inline assembly bypasses every safety guarantee both languages provide.

+----------------------------------+-----------------------------------+
| Risk                             | Mitigation                        |
+----------------------------------+-----------------------------------+
| Wrong register constraints       | Test on multiple opt levels       |
| Missing clobber declaration      | List ALL modified registers       |
| Stack misalignment               | Use nostack or align manually     |
| Forgetting "memory" clobber      | Add if asm touches any memory     |
| Platform-specific code           | Guard with #ifdef / #[cfg()]      |
| Compiler upgrades break asm      | Minimize asm surface area         |
+----------------------------------+-----------------------------------+

Caution: Incorrect constraints are silent killers. The assembler will not warn you. The program will appear to work, then break under different optimization levels or with different surrounding code. Always test with -O0, -O2, and -O3.

Wrap each asm block in a small, well-named inline function. Keep the asm to the one or two instructions you actually need, and let the compiler handle everything else. The compiler is better at register allocation and instruction scheduling than you are.

Knowledge Check

  1. Why is asm volatile used instead of plain asm for reading hardware counters?

  2. What happens if you forget to list a register in the clobber list that your assembly modifies?

  3. In Rust's asm! macro, what is the difference between out("rax") and lateout("rax")?

Common Pitfalls

  • Missing volatile. Without it, the compiler may eliminate or move your asm block. Use volatile for anything with side effects.
  • Incomplete clobber lists. If your assembly modifies a register and you did not declare it, the compiler may store a value there that gets silently corrupted.
  • Forgetting the "memory" clobber. If your assembly reads or writes memory through a pointer, you must include "memory" in the clobber list (or use nomem/omit it appropriately in Rust).
  • Assuming AT&T vs Intel syntax. GCC uses AT&T syntax by default (src, dst). Rust's asm! uses Intel syntax by default (dst, src). Use the att_syntax option in Rust if you prefer AT&T.
  • Writing complex logic in assembly. Keep asm blocks to one or two instructions. Let the compiler handle the rest.
  • Not guarding platform-specific code. Wrap all inline assembly in #ifdef __x86_64__ (C) or #[cfg(target_arch = "x86_64")] (Rust) so the code does not break on ARM or other architectures.

From Source to Binary

When you type gcc main.c -o main, four distinct stages run in sequence. Understanding each stage turns opaque compiler errors into something you can reason about -- and makes debugging linker failures, ABI mismatches, and cross-compilation issues far less painful.

The Four Stages

 Source (.c)
    |
    v
 [Preprocessor]  -->  Expanded source (.i)
    |
    v
 [Compiler]       -->  Assembly (.s)
    |
    v
 [Assembler]      -->  Object file (.o)
    |
    v
 [Linker]         -->  Executable (ELF)

Each stage is a separate program. GCC orchestrates them, but you can stop at any point and inspect the output.

Stage 1: Preprocessing

The preprocessor handles #include, #define, #ifdef, and macro expansion. It produces pure C with no directives left.

/* version.h */
#ifndef VERSION_H
#define VERSION_H
#define APP_VERSION "1.0.3"
#define MAX_RETRIES 5
#endif

/* stage1.c */
#include <stdio.h>
#include "version.h"

#ifdef DEBUG
  #define LOG(msg) fprintf(stderr, "DEBUG: %s\n", msg)
#else
  #define LOG(msg) ((void)0)
#endif

int main(void) {
    LOG("starting up");
    printf("App version: %s\n", APP_VERSION);
    printf("Max retries: %d\n", MAX_RETRIES);
    return 0;
}

Stop after preprocessing:

gcc -E stage1.c -o stage1.i

Open stage1.i -- it will be thousands of lines long because <stdio.h> gets fully expanded. Scroll to the bottom and you will see your code with all macros replaced:

int main(void) {
    ((void)0);
    printf("App version: %s\n", "1.0.3");
    printf("Max retries: %d\n", 5);
    return 0;
}

The string "1.0.3" is inlined. LOG became ((void)0) because DEBUG was not defined. Now try:

gcc -E -DDEBUG stage1.c -o stage1_debug.i

The LOG call now expands to an actual fprintf.

Try It: Add a #define PLATFORM "linux" to version.h and use it in main. Run gcc -E and confirm the string appears in the .i file.

Stage 2: Compilation (to Assembly)

The compiler translates the preprocessed C into assembly for the target architecture. On x86-64:

gcc -S stage1.c -o stage1.s

The assembly from stage1.c is cluttered by printf plumbing, so use a smaller example:

/* arith.c */
int add(int a, int b) {
    return a + b;
}

int square(int x) {
    return x * x;
}

Compile to assembly:

gcc -S -O0 arith.c -o arith.s

The output (simplified, x86-64):

add:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -4(%rbp), %edx
    movl    -8(%rbp), %eax
    addl    %edx, %eax
    popq    %rbp
    ret

square:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -4(%rbp)
    movl    -4(%rbp), %eax
    imull   %eax, %eax
    popq    %rbp
    ret

Now try with optimization:

gcc -S -O2 arith.c -o arith_opt.s

The optimized output is dramatically shorter -- the compiler may skip the frame pointer entirely and use registers directly.

Try It: Compile arith.c with -O0, -O1, -O2, and -O3. Compare the assembly output with diff. Notice how the compiler eliminates unnecessary memory operations at higher levels.

Stage 3: Assembly (to Object Code)

The assembler translates assembly into machine code, producing an ELF object file:

gcc -c arith.c -o arith.o

Inspect it:

file arith.o
# arith.o: ELF 64-bit LSB relocatable, x86-64, ...

objdump -d arith.o

The object file contains machine instructions, but addresses are not yet resolved. Function calls to external symbols are placeholders.

/* caller.c */
#include <stdio.h>

extern int add(int a, int b);
extern int square(int x);

int main(void) {
    printf("add(3,4) = %d\n", add(3, 4));
    printf("square(5) = %d\n", square(5));
    return 0;
}
gcc -c caller.c -o caller.o
objdump -d caller.o

In the disassembly, calls to add, square, and printf show placeholder addresses (often all zeros). These are relocations -- the linker fills them in later.

Stage 4: Linking

The linker combines object files, resolves symbols, and produces the final executable:

gcc caller.o arith.o -o program
./program

Output:

add(3,4) = 7
square(5) = 25

Symbols and the Symbol Table

Every object file carries a symbol table. View it with nm:

nm arith.o
0000000000000000 T add
0000000000000014 T square

T means the symbol is in the text (code) section and is globally visible.

nm caller.o
                 U add
0000000000000000 T main
                 U printf
                 U square

U means undefined -- these symbols must be provided by another object file or library at link time.

Relocations

View relocations with readelf:

readelf -r caller.o

Each relocation entry says: "At offset X in section Y, insert the address of symbol Z." The linker processes every relocation in every object file.

+------------------+     +------------------+
|   caller.o       |     |   arith.o        |
|                  |     |                  |
|  main            |     |  add       [T]   |
|  calls add   [U] |---->|  square    [T]   |
|  calls square[U] |---->|                  |
|  calls printf[U] |--+  +------------------+
+------------------+  |
                      |  +------------------+
                      +->|   libc.so        |
                         |  printf     [T]  |
                         +------------------+

Caution: If you see "undefined reference to ..." at link time, it means the linker cannot find a symbol. Check that you are passing all required object files and libraries. Order matters with static libraries -- the linker processes files left to right.

Examining the Final Executable

file program
# program: ELF 64-bit LSB executable, x86-64, ...

readelf -h program     # ELF header
readelf -l program     # program headers (segments)
readelf -S program     # section headers
objdump -d program     # full disassembly

Key sections in an ELF binary:

+-------------------+
| .text             |  Executable code
+-------------------+
| .rodata           |  Read-only data (string literals)
+-------------------+
| .data             |  Initialized global/static variables
+-------------------+
| .bss              |  Uninitialized global/static variables
+-------------------+
| .symtab           |  Symbol table
+-------------------+
| .strtab           |  String table for symbols
+-------------------+
| .rela.text        |  Relocations (in .o files)
+-------------------+

Driver Prep: Kernel modules are ELF relocatable objects (.ko files). The kernel's module loader performs its own linking at insmod time, resolving symbols against the running kernel's symbol table. Understanding relocations now pays off directly when debugging module load failures.

Rust's Compilation Model

Rust does not follow the same four-stage pipeline. Instead:

 Source (.rs)
    |
    v
 [rustc frontend]   -->  HIR --> MIR
    |
    v
 [LLVM backend]     -->  Object files (.o) or LLVM IR (.ll)
    |
    v
 [Linker]           -->  Executable (ELF)

The Rust compiler (rustc) handles preprocessing-like tasks (macro expansion, conditional compilation with cfg) internally. There is no separate preprocessor.

A Rust Example

// arith.rs
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn square(x: i32) -> i32 {
    x * x
}

fn main() {
    println!("add(3,4) = {}", add(3, 4));
    println!("square(5) = {}", square(5));
}
rustc arith.rs -o arith_rust
./arith_rust

Viewing Intermediate Representations

Emit LLVM IR:

rustc --emit=llvm-ir arith.rs

This produces arith.ll -- LLVM's intermediate representation. It is largely target-independent, though each .ll file embeds a target triple and data layout for the platform it was generated on.

Emit assembly:

rustc --emit=asm arith.rs

Emit object file only (no linking):

rustc --emit=obj arith.rs

Inspect the resulting object file the same way:

nm arith.o
objdump -d arith.o

Rust Note: Rust mangles symbol names by default. You will see names like _ZN5arith3add17h...E rather than plain add. Use #[no_mangle] and extern "C" when you need C-compatible symbol names. We cover this in Chapter 26.

Crates and Incremental Compilation

Rust's unit of compilation is the crate, not the individual .rs file. A crate can contain many modules spread across multiple files, but rustc compiles the entire crate as one unit.

Cargo enables incremental compilation: when you change one function, only the affected parts of the crate are recompiled. Incremental data is cached in target/debug/incremental/.

cargo build          # first build -- compiles everything
# edit one function
cargo build          # incremental -- only recompiles the changed parts

Compare with C, where each .c file is compiled independently into a .o file, and the build system (Make) decides which files to recompile based on timestamps.

C model:                     Rust/Cargo model:

file1.c --> file1.o          +------------------+
file2.c --> file2.o    vs    |  entire crate    |---> crate .rlib
file3.c --> file3.o          |  (all .rs files) |
   \       |      /          +------------------+
    \      |     /
     v     v    v
     [  linker  ]
     [executable]

Comparing Object Files from C and Rust

Let us compile equivalent functions in both languages and compare:

/* cfunc.c */
#include <stdint.h>

int32_t multiply(int32_t a, int32_t b) {
    return a * b;
}
// rfunc.rs
#[no_mangle]
pub extern "C" fn multiply(a: i32, b: i32) -> i32 {
    a * b
}
gcc -c -O2 cfunc.c -o cfunc.o
rustc --crate-type=staticlib --emit=obj -C opt-level=2 rfunc.rs -o rfunc.o

objdump -d cfunc.o
objdump -d rfunc.o

At -O2, both produce nearly identical machine code for this simple function:

multiply:
    movl    %edi, %eax
    imull   %esi, %eax
    ret

The LLVM backend (used by Rust) and GCC's backend produce equivalent output for straightforward arithmetic. Differences appear with more complex code -- different inlining decisions, vectorization strategies, and so on.

Try It: Write a function that sums an array of integers in both C and Rust. Compile with -O2 / -C opt-level=2 and compare the assembly. Does one auto-vectorize and the other not?

Practical: Walking Through All Four Stages

Here is a complete C program that we will take through every stage manually:

/* pipeline.c */
#include <stdio.h>

#define GREETING "Hello from the pipeline"

static int helper(int n) {
    return n * 2 + 1;
}

int main(void) {
    int result = helper(21);
    printf("%s: result = %d\n", GREETING, result);
    return 0;
}

Run each stage explicitly:

# Stage 1: Preprocess
gcc -E pipeline.c -o pipeline.i
wc -l pipeline.i        # thousands of lines

# Stage 2: Compile to assembly
gcc -S pipeline.i -o pipeline.s
wc -l pipeline.s         # tens of lines

# Stage 3: Assemble to object
gcc -c pipeline.s -o pipeline.o
nm pipeline.o            # main is T, printf is U, helper may be t (static)

# Stage 4: Link
gcc pipeline.o -o pipeline
./pipeline
# Hello from the pipeline: result = 43

Notice that helper might appear as t (lowercase) in nm output -- the lowercase means it is a local symbol (because of static). Local symbols are not visible to the linker from other object files.

The static Keyword and Symbol Visibility

/* visibility.c */
static int internal_func(void) {  /* local to this file */
    return 42;
}

int public_func(void) {  /* visible to linker */
    return internal_func();
}
gcc -c visibility.c -o visibility.o
nm visibility.o
0000000000000000 t internal_func
0000000000000014 T public_func

Lowercase t = local. Uppercase T = global.

In Rust, the equivalent is pub vs non-pub:

// visibility.rs
fn internal_func() -> i32 {
    42
}

pub fn public_func() -> i32 {
    internal_func()
}

Non-pub functions are not exported from the crate. When generating a C-ABI library, only #[no_mangle] pub extern "C" functions appear as global symbols.

Knowledge Check

  1. What does the preprocessor do with #include <stdio.h>? What does the resulting .i file contain?

  2. An object file contains a call to printf but its address is all zeros. What mechanism resolves this to the real address?

  3. In nm output, what is the difference between T and U?

Common Pitfalls

  • Forgetting to link all object files. If main.o calls add defined in arith.o, you must pass both to the linker.

  • Confusing compilation errors with linker errors. "undefined reference" is a linker error, not a compiler error. The code compiled fine; the symbol is just missing at link time.

  • Assuming identical assembly from C and Rust. Different compilers (GCC vs Clang/LLVM) make different optimization choices. Close does not mean identical.

  • Ignoring static visibility. A static function in one .c file cannot be called from another. This is intentional encapsulation, not a bug.

  • Stripping debug binaries during development. Keep symbols during development; strip only for release.

Make, CMake, and Cargo

No serious project compiles files by hand. Build systems track dependencies, recompile only what changed, and manage flags across platforms. This chapter covers the three build tools you will encounter most: Make for C, CMake for portable C/C++, and Cargo for Rust.

Make: The Foundation

Make reads a Makefile and builds targets based on dependency rules. The core idea is simple: if a target is older than its dependencies, run the recipe to rebuild it.

Anatomy of a Rule

target: dependencies
	recipe

The recipe line must start with a tab character, not spaces.

A Minimal Makefile

Given this project structure:

project/
  main.c
  mathlib.c
  mathlib.h
  Makefile
/* mathlib.h */
#ifndef MATHLIB_H
#define MATHLIB_H

int add(int a, int b);
int multiply(int a, int b);

#endif
/* mathlib.c */
#include "mathlib.h"

int add(int a, int b) {
    return a + b;
}

int multiply(int a, int b) {
    return a * b;
}
/* main.c */
#include <stdio.h>
#include "mathlib.h"

int main(void) {
    printf("add(3,4) = %d\n", add(3, 4));
    printf("multiply(3,4) = %d\n", multiply(3, 4));
    return 0;
}
# Makefile
CC      = gcc
CFLAGS  = -Wall -Wextra -std=c11
LDFLAGS =

SRCS    = main.c mathlib.c
OBJS    = $(SRCS:.c=.o)
TARGET  = calculator

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^

%.o: %.c mathlib.h
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(OBJS) $(TARGET)

How it works:

  • $@ is the target name.
  • $^ is all dependencies.
  • $< is the first dependency.
  • %.o: %.c is a pattern rule: any .o depends on its corresponding .c.
  • .PHONY tells Make that all and clean are not real files.
make            # builds calculator
make clean      # removes build artifacts

Variables and Overrides

Override variables from the command line:

make CC=clang CFLAGS="-Wall -O2"

Automatic Dependency Generation

Manually listing header dependencies is fragile. Use GCC's -MMD flag:

# Makefile with auto-deps
CC      = gcc
CFLAGS  = -Wall -Wextra -std=c11 -MMD -MP
LDFLAGS =

SRCS    = main.c mathlib.c
OBJS    = $(SRCS:.c=.o)
DEPS    = $(OBJS:.o=.d)
TARGET  = calculator

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

-include $(DEPS)

clean:
	rm -f $(OBJS) $(DEPS) $(TARGET)

-MMD generates .d files listing each .c file's header dependencies. -include $(DEPS) pulls them in silently (the - suppresses errors on first build when .d files do not exist).
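Each generated .d file is itself a small Makefile fragment. For this project, main.d would contain something along these lines (contents illustrative; exact paths vary):

```make
# main.d -- generated by gcc -MMD -MP
main.o: main.c mathlib.h

# -MP adds an empty phony rule per header, so a deleted header
# does not break the build with "No such file or directory":
mathlib.h:
```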

Try It: Add a new file utils.c / utils.h to the project. Update the SRCS variable and verify that make rebuilds correctly when you modify utils.h.

CMake: Portable Build Generation

Make works well for single-platform projects, but CMake generates build files for Make, Ninja, Visual Studio, Xcode, and more. CMake is the standard for cross-platform C and C++ projects.

CMakeLists.txt Basics

# CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(Calculator VERSION 1.0 LANGUAGES C)

set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)

add_executable(calculator
    main.c
    mathlib.c
)

target_compile_options(calculator PRIVATE -Wall -Wextra)

Out-of-Tree Build

CMake strongly recommends building outside the source directory:

mkdir build && cd build
cmake ..
make
./calculator

The source directory stays clean. All generated files live in build/.

Libraries in CMake

Split the math library into its own target:

# CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(Calculator VERSION 1.0 LANGUAGES C)

set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)

# Build mathlib as a static library
add_library(mathlib STATIC mathlib.c)
target_include_directories(mathlib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

# Build the executable and link against mathlib
add_executable(calculator main.c)
target_link_libraries(calculator PRIVATE mathlib)
target_compile_options(calculator PRIVATE -Wall -Wextra)

Change STATIC to SHARED to build a shared library instead.

Finding External Libraries

find_package(Threads REQUIRED)
target_link_libraries(calculator PRIVATE Threads::Threads)

find_package(ZLIB REQUIRED)
target_link_libraries(calculator PRIVATE ZLIB::ZLIB)

find_package searches standard system paths and produces imported targets you can link against.

CMake Build Types

cmake -DCMAKE_BUILD_TYPE=Debug ..     # -g, no optimization
cmake -DCMAKE_BUILD_TYPE=Release ..   # -O3, NDEBUG defined
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..  # -O2 -g

Try It: Create a CMake project with a SHARED library. Build it, then run ldd calculator to see that it links against the shared library.

Cargo: Rust's Build System and Package Manager

Cargo is to Rust what Make + a package manager is to C, but integrated into one tool. Every Rust project starts with:

cargo new myproject
cd myproject

This creates:

myproject/
  Cargo.toml
  src/
    main.rs

Cargo.toml

[package]
name = "myproject"
version = "0.1.0"
edition = "2021"

[dependencies]

Add a dependency:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
clap = "4"

Run:

cargo build    # downloads deps, compiles everything
cargo run      # build + run
cargo test     # build + run tests
cargo check    # type-check only, no codegen (fast)

A Complete Cargo Project

// src/mathlib.rs
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(3, 4), 7);
    }

    #[test]
    fn test_multiply() {
        assert_eq!(multiply(3, 4), 12);
    }
}
// src/main.rs
mod mathlib;

fn main() {
    println!("add(3,4) = {}", mathlib::add(3, 4));
    println!("multiply(3,4) = {}", mathlib::multiply(3, 4));
}
cargo run
cargo test

Build Profiles

Cargo ships with two primary built-in profiles, dev and release (the test and bench profiles inherit from them):

# These are the defaults -- you can override them in Cargo.toml

[profile.dev]
opt-level = 0
debug = true
overflow-checks = true

[profile.release]
opt-level = 3
debug = false
overflow-checks = false
lto = false
cargo build             # uses dev profile
cargo build --release   # uses release profile

The release binary goes to target/release/ instead of target/debug/.

Custom profiles are possible:

[profile.release-with-debug]
inherits = "release"
debug = true
cargo build --profile release-with-debug

Rust Note: Unlike C where you pass -O2 or -g to the compiler directly, Cargo manages optimization and debug info through profiles. This centralizes build configuration and makes it reproducible.

Workspaces

Large Rust projects split into multiple crates within a workspace:

# Cargo.toml (workspace root)
[workspace]
members = [
    "mathlib",
    "calculator",
]
# mathlib/Cargo.toml
[package]
name = "mathlib"
version = "0.1.0"
edition = "2021"
# calculator/Cargo.toml
[package]
name = "calculator"
version = "0.1.0"
edition = "2021"

[dependencies]
mathlib = { path = "../mathlib" }
cargo build              # builds all workspace members
cargo test -p mathlib    # test only the mathlib crate

Features

Cargo features enable conditional compilation:

# mathlib/Cargo.toml
[package]
name = "mathlib"
version = "0.1.0"
edition = "2021"

[features]
default = []
advanced = []
// mathlib/src/lib.rs
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(feature = "advanced")]
pub fn power(base: i32, exp: u32) -> i32 {
    (0..exp).fold(1, |acc, _| acc * base)
}
cargo build                              # power() not compiled
cargo build --features advanced          # power() included

Comparing the Three

+------------------+-----------+----------------+----------------+
| Feature          | Make      | CMake          | Cargo          |
+------------------+-----------+----------------+----------------+
| Config file      | Makefile  | CMakeLists.txt | Cargo.toml     |
| Language         | Make DSL  | CMake DSL      | TOML + Rust    |
| Dep management   | Manual    | find_package   | crates.io      |
| Cross-platform   | Weak      | Strong         | Strong         |
| Incremental      | File-time | File-time      | Crate-internal |
| Parallel build   | make -j   | Inherited      | Built-in       |
+------------------+-----------+----------------+----------------+

Integrating C and Rust

Real projects often mix C and Rust. Three crates make this practical.

The cc Crate: Compiling C from Cargo

# Cargo.toml
[package]
name = "c-from-rust"
version = "0.1.0"
edition = "2021"

[build-dependencies]
cc = "1"
/* csrc/helper.c */
#include <stdint.h>

int32_t c_add(int32_t a, int32_t b) {
    return a + b;
}
// build.rs
fn main() {
    cc::Build::new()
        .file("csrc/helper.c")
        .compile("helper");
}
// src/main.rs
extern "C" {
    fn c_add(a: i32, b: i32) -> i32;
}

fn main() {
    let result = unsafe { c_add(10, 20) };
    println!("c_add(10, 20) = {}", result);
}
cargo run
# c_add(10, 20) = 30

The cc crate compiles the C file, produces a static library, and tells Cargo to link it. The build.rs script runs before compilation of the main crate.

bindgen and cbindgen

Writing extern "C" blocks by hand is error-prone. The bindgen crate reads a C header and auto-generates Rust FFI declarations. Add bindgen = "0.70" to [build-dependencies], call bindgen::Builder::default().header("mylib.h").generate() in build.rs, and use include!(concat!(env!("OUT_DIR"), "/bindings.rs")) in your Rust source. It handles structs, enums, typedefs, and function declarations.

The cbindgen crate does the reverse -- it reads Rust source with #[no_mangle] pub extern "C" functions and generates a C header file automatically. Add cbindgen = "0.27" to [build-dependencies] and call cbindgen::generate(crate_dir) from build.rs. The generated header contains proper C declarations matching your Rust exports.
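For instance, for the multiply function shown earlier, cbindgen would emit a header along these lines (guard style and formatting depend on the cbindgen configuration; this sketch is illustrative):

```c
/* rfunc.h -- illustrative cbindgen output for the multiply example */
#ifndef RFUNC_H
#define RFUNC_H

#include <stdint.h>

int32_t multiply(int32_t a, int32_t b);

#endif /* RFUNC_H */
```

C callers include this header and link against the staticlib or cdylib, never seeing any Rust.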

Driver Prep: The Linux kernel build system uses a highly customized Kbuild system built on Make. Rust-for-Linux integrates with Kbuild to compile Rust kernel modules alongside C. Understanding both Make and Cargo is essential for this workflow.

Knowledge Check

  1. In a Makefile, what does $< expand to? What about $@ and $^?

  2. Why does CMake recommend out-of-tree builds?

  3. How does cargo build --release differ from cargo build in terms of optimization and debug info?

Common Pitfalls

  • Spaces instead of tabs in Makefiles. Make requires literal tab characters for recipe lines. Many editors silently convert tabs to spaces.

  • Forgetting -fPIC for shared libraries in CMake. Use set(CMAKE_POSITION_INDEPENDENT_CODE ON) or let CMake handle it with add_library(... SHARED ...).

  • Not running cargo clean when switching profiles. Stale artifacts in target/ can cause confusing behavior.

  • Linking order with static libraries in Make. The linker processes left to right. If main.o depends on libmath.a, write gcc main.o -lmath, not gcc -lmath main.o.

  • Forgetting build.rs in cc / bindgen workflows. The build script must exist and be referenced correctly in Cargo.toml.

Static and Shared Libraries

Libraries let you package compiled code for reuse without distributing source. The distinction between static and shared libraries affects binary size, load time, memory usage, and update strategy. This chapter covers both, plus how to bridge C and Rust libraries across the language boundary.

Static Libraries (.a)

A static library is an archive of object files. At link time, the linker copies the needed object code directly into the final executable.

Creating a Static Library in C

/* vec2.h */
#ifndef VEC2_H
#define VEC2_H

typedef struct {
    double x;
    double y;
} Vec2;

Vec2 vec2_add(Vec2 a, Vec2 b);
Vec2 vec2_scale(Vec2 v, double s);
double vec2_dot(Vec2 a, Vec2 b);

#endif
/* vec2.c */
#include "vec2.h"

Vec2 vec2_add(Vec2 a, Vec2 b) {
    return (Vec2){ a.x + b.x, a.y + b.y };
}

Vec2 vec2_scale(Vec2 v, double s) {
    return (Vec2){ v.x * s, v.y * s };
}

double vec2_dot(Vec2 a, Vec2 b) {
    return a.x * b.x + a.y * b.y;
}

Build the static library:

gcc -c -O2 vec2.c -o vec2.o
ar rcs libvec2.a vec2.o
  • ar is the archiver.
  • r inserts files into the archive (replacing if they exist).
  • c creates the archive if it does not exist.
  • s writes an index (equivalent to running ranlib).

Inspect it:

ar t libvec2.a      # list contents
nm libvec2.a        # list symbols

Linking Against a Static Library

/* main.c */
#include <stdio.h>
#include "vec2.h"

int main(void) {
    Vec2 a = {1.0, 2.0};
    Vec2 b = {3.0, 4.0};

    Vec2 sum = vec2_add(a, b);
    printf("sum = (%.1f, %.1f)\n", sum.x, sum.y);

    double d = vec2_dot(a, b);
    printf("dot = %.1f\n", d);

    Vec2 scaled = vec2_scale(a, 3.0);
    printf("scaled = (%.1f, %.1f)\n", scaled.x, scaled.y);

    return 0;
}
gcc -O2 main.c -L. -lvec2 -o vectest
./vectest
  • -L. tells the linker to search the current directory for libraries.
  • -lvec2 tells it to look for libvec2.a (or libvec2.so).

The resulting binary is self-contained -- it does not need libvec2.a at runtime.

+-------------------+       +-------------------+
|  main.o           |       |  libvec2.a        |
|  main() [T]       |       |   vec2.o:         |
|  vec2_add [U]  ---+------>|     vec2_add [T]  |
|  vec2_dot [U]  ---+------>|     vec2_dot [T]  |
+-------------------+       +-------------------+
         \                         /
          \                       /
           v                     v
        +---------------------------+
        |  vectest (executable)     |
        |  main()                   |
        |  vec2_add()  (copied in)  |
        |  vec2_dot()  (copied in)  |
        +---------------------------+

Caution: Static linking copies code into every executable that uses it. If ten programs link libvec2.a, each gets its own copy. Security patches to the library require recompiling all ten programs.

Shared Libraries (.so)

A shared library is loaded at runtime. Multiple programs can share a single copy in memory.

Creating a Shared Library

gcc -c -O2 -fPIC vec2.c -o vec2_pic.o
gcc -shared -o libvec2.so vec2_pic.o
  • -fPIC generates position-independent code, required for shared libs.
  • -shared tells the linker to produce a shared object.

Linking Against a Shared Library

gcc -O2 main.c -L. -lvec2 -o vectest_shared

But running it may fail:

./vectest_shared
# error: libvec2.so: cannot open shared object file

The dynamic linker does not search the current directory by default. Solutions:

# Option 1: Set LD_LIBRARY_PATH
LD_LIBRARY_PATH=. ./vectest_shared

# Option 2: Install to a system path
sudo cp libvec2.so /usr/local/lib/
sudo ldconfig

# Option 3: Embed the path at link time
gcc -O2 main.c -L. -lvec2 -Wl,-rpath,'$ORIGIN' -o vectest_shared

The -Wl,-rpath,'$ORIGIN' approach embeds a relative search path in the binary itself. $ORIGIN expands to the directory containing the executable.

Runtime vs. Compile Time

+-----------------------+
|   Compile/Link Time   |
|-----------------------|
|  gcc finds libvec2.so |
|  records dependency   |
|  does NOT copy code   |
+-----------------------+
         |
         v
+-----------------------+
|   Runtime             |
|-----------------------|
|  ld.so loads .so      |
|  maps into memory     |
|  resolves symbols     |
+-----------------------+

Check what shared libraries an executable needs:

ldd vectest_shared

Soname Versioning

Shared libraries use a versioning scheme:

libvec2.so.1.2.3      # real name (major.minor.patch)
libvec2.so.1          # soname (major version)
libvec2.so            # linker name (symlink)
gcc -shared -Wl,-soname,libvec2.so.1 -o libvec2.so.1.0.0 vec2_pic.o
ln -s libvec2.so.1.0.0 libvec2.so.1
ln -s libvec2.so.1 libvec2.so

The executable records the soname (libvec2.so.1), not the full version. This means you can update libvec2.so.1.0.0 to libvec2.so.1.1.0 without relinking executables, as long as the ABI is compatible.

readelf -d vectest_shared | grep NEEDED
# 0x0000000000000001 (NEEDED)  Shared library: [libvec2.so.1]

The ldconfig command manages the soname symlinks system-wide. Run sudo ldconfig after installing a new library to update the cache.

dlopen / dlsym: Runtime Loading

Sometimes you need to load a library at runtime -- for plugins, optional features, or late binding.

Define a plugin with a clean ABI:

/* plugin_api.h */
#ifndef PLUGIN_API_H
#define PLUGIN_API_H

int plugin_init(void);
int plugin_process(int input);
void plugin_cleanup(void);

#endif
/* my_plugin.c */
#include <stdio.h>
#include "plugin_api.h"

int plugin_init(void) {
    printf("[plugin] initialized\n");
    return 0;
}

int plugin_process(int input) {
    return input * 3 + 1;
}

void plugin_cleanup(void) {
    printf("[plugin] cleaned up\n");
}
gcc -shared -fPIC -o my_plugin.so my_plugin.c
/* host.c */
#include <stdio.h>
#include <dlfcn.h>

typedef int (*init_fn)(void);
typedef int (*process_fn)(int);
typedef void (*cleanup_fn)(void);

int main(int argc, char *argv[]) {
    const char *plugin_path = (argc > 1) ? argv[1] : "./my_plugin.so";

    void *handle = dlopen(plugin_path, RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    init_fn    init    = (init_fn)dlsym(handle, "plugin_init");
    process_fn process = (process_fn)dlsym(handle, "plugin_process");
    cleanup_fn cleanup = (cleanup_fn)dlsym(handle, "plugin_cleanup");

    if (!init || !process || !cleanup) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    init();
    printf("process(10) = %d\n", process(10));
    cleanup();

    dlclose(handle);
    return 0;
}
gcc -o host host.c -ldl
./host ./my_plugin.so

Output:

[plugin] initialized
process(10) = 31
[plugin] cleaned up

Link with -ldl to get dlopen / dlsym / dlclose.

Driver Prep: The Linux kernel's module system is conceptually similar to dlopen. When you run insmod mydriver.ko, the kernel loads the module's ELF object, resolves symbols against the kernel's exported symbol table, and calls the module's init function.

Rust Library Types

Rust supports several library output types, configured in Cargo.toml:

[lib]
crate-type = ["rlib"]       # default: Rust-native library
# crate-type = ["staticlib"]  # C-compatible static library (.a)
# crate-type = ["cdylib"]     # C-compatible shared library (.so)
# crate-type = ["dylib"]      # Rust-native shared library
+-----------+-------+----------------------------------+
| Type      | File  | Use case                         |
+-----------+-------+----------------------------------+
| rlib      | .rlib | Dependency for other Rust crates |
| staticlib | .a    | Link into a C/C++ program        |
| cdylib    | .so   | Shared lib callable from C       |
| dylib     | .so   | Shared lib for other Rust code   |
+-----------+-------+----------------------------------+

Building a Rust Static Library for C

# Cargo.toml
[package]
name = "rustmath"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib"]
// src/lib.rs
use std::os::raw::c_int;

#[no_mangle]
pub extern "C" fn rust_add(a: c_int, b: c_int) -> c_int {
    a + b
}

#[no_mangle]
pub extern "C" fn rust_factorial(n: c_int) -> c_int {
    if n <= 1 { 1 } else { n * rust_factorial(n - 1) }
}
cargo build --release
ls target/release/librustmath.a

Now call it from C:

/* use_rustlib.c */
#include <stdio.h>
#include <stdint.h>

/* Declarations matching Rust's extern "C" functions */
int32_t rust_add(int32_t a, int32_t b);
int32_t rust_factorial(int32_t n);

int main(void) {
    printf("rust_add(10, 20) = %d\n", rust_add(10, 20));
    printf("rust_factorial(6) = %d\n", rust_factorial(6));
    return 0;
}
gcc -O2 use_rustlib.c -L target/release -lrustmath -lpthread -ldl -lm -o use_rustlib
./use_rustlib

The extra -lpthread -ldl -lm flags are needed because Rust's standard library depends on them.

Rust Note: When producing a staticlib, Rust statically links its own standard library into the .a file. This makes the archive self-contained but larger. A cdylib also bundles the parts of the standard library it needs by default, so the resulting .so does not depend on a separate Rust runtime at load time.

Building a Rust Shared Library for C

Change the crate type to ["cdylib"] and rebuild. This produces a librustmath.so that can be dynamically linked from C the same way.

Writing a C Library Callable from Rust

The reverse direction: wrap an existing C library for use in Rust.

/* cstack.h */
#ifndef CSTACK_H
#define CSTACK_H

#include <stdint.h>
#include <stdbool.h>

#define STACK_CAPACITY 64

typedef struct {
    int32_t data[STACK_CAPACITY];
    int32_t top;
} Stack;

void stack_init(Stack *s);
bool stack_push(Stack *s, int32_t value);
bool stack_pop(Stack *s, int32_t *out);
int32_t stack_size(const Stack *s);

#endif
/* cstack.c */
#include "cstack.h"

void stack_init(Stack *s) {
    s->top = -1;
}

bool stack_push(Stack *s, int32_t value) {
    if (s->top >= STACK_CAPACITY - 1) return false;
    s->data[++(s->top)] = value;
    return true;
}

bool stack_pop(Stack *s, int32_t *out) {
    if (s->top < 0) return false;
    *out = s->data[(s->top)--];
    return true;
}

int32_t stack_size(const Stack *s) {
    return s->top + 1;
}

Use it from Rust with the cc crate:

# Cargo.toml
[package]
name = "use-cstack"
version = "0.1.0"
edition = "2021"

[build-dependencies]
cc = "1"
// build.rs
fn main() {
    cc::Build::new()
        .file("cstack.c")
        .compile("cstack");
}
// src/main.rs
use std::os::raw::c_int;

const STACK_CAPACITY: usize = 64;

#[repr(C)]
struct Stack {
    data: [c_int; STACK_CAPACITY],
    top: c_int,
}

extern "C" {
    fn stack_init(s: *mut Stack);
    fn stack_push(s: *mut Stack, value: c_int) -> bool;
    fn stack_pop(s: *mut Stack, out: *mut c_int) -> bool;
    fn stack_size(s: *const Stack) -> c_int;
}

fn main() {
    unsafe {
        let mut s = std::mem::MaybeUninit::<Stack>::uninit();
        stack_init(s.as_mut_ptr());
        let mut s = s.assume_init();

        stack_push(&mut s, 10);
        stack_push(&mut s, 20);
        stack_push(&mut s, 30);

        println!("size = {}", stack_size(&s));

        let mut val: c_int = 0;
        while stack_pop(&mut s, &mut val) {
            println!("popped: {}", val);
        }
    }
}
cargo run

Output:

size = 3
popped: 30
popped: 20
popped: 10

Caution: When defining #[repr(C)] structs in Rust to match C structs, you must get the field order, types, and sizes exactly right. A mismatch causes silent memory corruption. Use bindgen to generate these automatically for anything non-trivial.

ABI Compatibility

ABI (Application Binary Interface) defines how functions pass arguments, return values, and lay out structs at the machine level. On x86-64 Linux, the System V AMD64 ABI passes the first six integer arguments in registers RDI, RSI, RDX, RCX, R8, R9. Return values go in RAX.

  +--------+--------+--------+--------+--------+--------+
  | Arg 1  | Arg 2  | Arg 3  | Arg 4  | Arg 5  | Arg 6  |
  | RDI    | RSI    | RDX    | RCX    | R8     | R9     |
  +--------+--------+--------+--------+--------+--------+
  | Remaining args go on the stack, right to left        |
  +------------------------------------------------------+

When Rust uses extern "C", it follows this exact convention.

Rust Note: Rust's native ABI is not stable and can change between compiler versions. Always use extern "C" when crossing language boundaries. The #[no_mangle] attribute prevents Rust from mangling the symbol name, making it findable by C code.

Knowledge Check

  1. What is the difference between ar rcs libfoo.a foo.o and gcc -shared -o libfoo.so foo.o?

  2. An executable built against libvec2.so.1 fails to run after you update the library. What might have changed?

  3. Why must you compile with -fPIC before creating a shared library?

Common Pitfalls

  • Forgetting -fPIC. Without position-independent code, the shared library cannot be loaded at arbitrary addresses. The linker will error.

  • Library search order confusion. The linker prefers .so over .a when both exist. Use -static or pass the .a path directly to force static linking.

  • Missing transitive dependencies. If libA.so depends on libB.so, you may need to link both explicitly. Use pkg-config or CMake's target_link_libraries to manage this.

  • Forgetting -ldl for dlopen. On glibc systems, dlopen and dlsym live in libdl. Link with -ldl.

  • ABI mismatch between C and Rust structs. If you define a struct in both languages, the layout must match exactly. Use #[repr(C)] in Rust and verify with offsetof / std::mem::offset_of!.

  • Stripping symbols from a shared library. Stripping all symbols from a .so makes it useless. Use strip --strip-unneeded to keep only the dynamic symbols.

Cross-Compilation and Targets

Cross-compilation means building code on one machine (the host) to run on a different machine (the target). You compile on your x86-64 laptop and produce a binary for an ARM Raspberry Pi, a RISC-V board, or an embedded microcontroller. This is essential for embedded systems, driver development, and any scenario where the target cannot compile its own code.

Why Cross-Compile?

The target machine may be too slow, too resource-constrained, or not yet booted. You cannot compile a kernel for a board that has no operating system running. Embedded ARM devices, IoT sensors, and custom hardware all require cross-compilation from a development workstation.

+---------------------+          +----------------------+
|   Host (x86-64)     |          |   Target (aarch64)   |
|   - gcc / rustc     |  build   |   - no compiler      |
|   - full OS         | -------> |   - runs the binary  |
|   - cross-toolchain |          |   - limited resources|
+---------------------+          +----------------------+

The Target Triple

Both GCC and LLVM/Rust use a target triple (sometimes a quadruple) to identify the target platform:

<arch>-<vendor>-<os>-<abi>

Examples:
  x86_64-unknown-linux-gnu       Your typical desktop Linux
  aarch64-unknown-linux-gnu      64-bit ARM Linux
  arm-unknown-linux-gnueabihf    32-bit ARM Linux, hard float
  riscv64gc-unknown-linux-gnu    64-bit RISC-V Linux
  x86_64-unknown-linux-musl      x86-64 Linux with musl libc
  aarch64-unknown-none           Bare-metal ARM (no OS)
  thumbv7em-none-eabihf          ARM Cortex-M4/M7, no OS

Each component:

  Field    Meaning
  arch     CPU architecture (x86_64, aarch64, arm, riscv64)
  vendor   Who made it (unknown, apple, pc)
  os       Operating system (linux, windows, none)
  abi      ABI / libc (gnu, musl, eabi, eabihf)

The triple determines what instruction set the compiler emits, what system call conventions to use, and what C library to link against.

Cross-Compilation in C

Installing a Cross Toolchain

On Debian/Ubuntu, install cross-compilation tools:

sudo apt install gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu

This gives you aarch64-linux-gnu-gcc, aarch64-linux-gnu-ld, aarch64-linux-gnu-objdump, and friends.

For 32-bit ARM:

sudo apt install gcc-arm-linux-gnueabihf

For RISC-V:

sudo apt install gcc-riscv64-linux-gnu

A Complete Cross-Compilation Example

/* hello_cross.c */
#include <stdio.h>

int main(void) {
    printf("Hello from cross-compiled code!\n");
    printf("sizeof(void*) = %zu\n", sizeof(void *));
    printf("sizeof(long)  = %zu\n", sizeof(long));
    return 0;
}

Compile for aarch64:

aarch64-linux-gnu-gcc -O2 hello_cross.c -o hello_aarch64

Inspect the result:

file hello_aarch64
# hello_aarch64: ELF 64-bit LSB executable, ARM aarch64, ...

objdump -d hello_aarch64 | head -30
# You'll see ARM instructions, not x86

You cannot run it directly on x86-64:

./hello_aarch64
# bash: ./hello_aarch64: cannot execute binary file: Exec format error

But you can run it with QEMU user-mode emulation:

sudo apt install qemu-user qemu-user-static
qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello_aarch64

Output:

Hello from cross-compiled code!
sizeof(void*) = 8
sizeof(long)  = 8

Try It: Install gcc-arm-linux-gnueabihf and cross-compile the same program for 32-bit ARM. Use file to confirm it is an ARM executable. Run it with qemu-arm. Check what sizeof(void*) reports -- it should be 4.

The Sysroot

A sysroot is a directory containing the target's headers and libraries. When you install gcc-aarch64-linux-gnu, the sysroot is typically at /usr/aarch64-linux-gnu/.

/usr/aarch64-linux-gnu/
  include/        # target's C headers
  lib/            # target's C library, crt*.o, etc.

The cross-compiler knows its sysroot. You can override it:

aarch64-linux-gnu-gcc --sysroot=/path/to/my/sysroot -O2 hello_cross.c -o hello

This is essential when building for custom Linux distributions or embedded systems with non-standard libraries.

Cross-compilation data flow:

  hello_cross.c
       |
       v
  aarch64-linux-gnu-gcc
       |
       +-- uses headers from /usr/aarch64-linux-gnu/include/
       +-- links against /usr/aarch64-linux-gnu/lib/libc.so
       |
       v
  hello_aarch64 (ELF for aarch64)

Cross-Compiling with a Makefile

Modify the Makefile to accept a CROSS_COMPILE prefix:

# Makefile
CROSS_COMPILE ?=
CC       = $(CROSS_COMPILE)gcc
AR       = $(CROSS_COMPILE)ar
STRIP    = $(CROSS_COMPILE)strip
CFLAGS   = -Wall -Wextra -O2
LDFLAGS  =

SRCS     = hello_cross.c
OBJS     = $(SRCS:.c=.o)
TARGET   = hello

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(OBJS) $(TARGET)
make                                    # native build
make CROSS_COMPILE=aarch64-linux-gnu-   # cross-compile for ARM64
make CROSS_COMPILE=arm-linux-gnueabihf- # cross-compile for ARM32

Driver Prep: The Linux kernel uses exactly this pattern. The kernel Makefile accepts CROSS_COMPILE and ARCH variables: make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-. This is how you build a kernel for a Raspberry Pi on your laptop.

Cross-Compilation in Rust

Rust makes cross-compilation substantially easier than C. The Rust compiler uses LLVM, which can emit code for many targets from a single compiler binary. You do not need a separate rustc for each target.

Adding a Target

List installed targets:

rustup target list --installed

Add a new target:

rustup target add aarch64-unknown-linux-gnu
rustup target add arm-unknown-linux-gnueabihf
rustup target add x86_64-unknown-linux-musl

A Simple Cross-Compile

// src/main.rs
fn main() {
    println!("Hello from Rust cross-compilation!");
    println!("Target arch: {}", std::env::consts::ARCH);
    println!("Target OS:   {}", std::env::consts::OS);
    println!("Pointer size: {} bytes", std::mem::size_of::<*const u8>());
}
cargo build --target aarch64-unknown-linux-gnu

This compiles Rust code for aarch64 but fails at linking because Cargo does not know where the aarch64 linker is. You need to tell it.

Configuring the Linker

Create or edit .cargo/config.toml:

[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[target.arm-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc"

[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc"

Now:

cargo build --target aarch64-unknown-linux-gnu
file target/aarch64-unknown-linux-gnu/debug/myproject
# ELF 64-bit LSB pie executable, ARM aarch64, ...

Run with QEMU:

qemu-aarch64 -L /usr/aarch64-linux-gnu target/aarch64-unknown-linux-gnu/debug/myproject

Output:

Hello from Rust cross-compilation!
Target arch: aarch64
Target OS:   linux
Pointer size: 8 bytes

Static Linking with musl

For maximum portability, link statically against musl libc. The resulting binary has zero runtime dependencies:

rustup target add x86_64-unknown-linux-musl
cargo build --target x86_64-unknown-linux-musl --release
file target/x86_64-unknown-linux-musl/release/myproject
# ELF 64-bit LSB executable, x86-64, statically linked, ...

ldd target/x86_64-unknown-linux-musl/release/myproject
# not a dynamic executable

This binary runs on any x86-64 Linux system regardless of the installed glibc version.

Rust Note: Rust's musl target produces fully static binaries by default. In C, achieving the same requires musl-gcc or musl-cross-make and careful management of all dependencies. Rust makes this trivial.

Conditional Compilation for Target Architecture

// src/main.rs
fn main() {
    #[cfg(target_arch = "x86_64")]
    println!("Running on x86-64");

    #[cfg(target_arch = "aarch64")]
    println!("Running on ARM64");

    #[cfg(target_arch = "arm")]
    println!("Running on 32-bit ARM");

    #[cfg(target_os = "linux")]
    println!("Operating system: Linux");

    #[cfg(target_os = "none")]
    println!("No OS (bare metal)");

    #[cfg(target_pointer_width = "64")]
    println!("64-bit pointers");

    #[cfg(target_pointer_width = "32")]
    println!("32-bit pointers");
}

The C equivalent uses preprocessor macros:

/* arch_detect.c */
#include <stdio.h>

int main(void) {
#if defined(__x86_64__)
    printf("Running on x86-64\n");
#elif defined(__aarch64__)
    printf("Running on ARM64\n");
#elif defined(__arm__)
    printf("Running on 32-bit ARM\n");
#elif defined(__riscv)
    printf("Running on RISC-V\n");
#else
    printf("Unknown architecture\n");
#endif

#if defined(__linux__)
    printf("Operating system: Linux\n");
#endif

    printf("Pointer size: %zu bytes\n", sizeof(void *));
    return 0;
}
gcc arch_detect.c -o arch_native
./arch_native
# Running on x86-64
# Operating system: Linux
# Pointer size: 8 bytes

aarch64-linux-gnu-gcc arch_detect.c -o arch_arm64
qemu-aarch64 -L /usr/aarch64-linux-gnu ./arch_arm64
# Running on ARM64
# Operating system: Linux
# Pointer size: 8 bytes

Cross-Compiling a C Library for ARM

Cross-compile a static library for aarch64 using the same tools:

aarch64-linux-gnu-gcc -c -O2 sensor.c -o sensor_arm64.o
aarch64-linux-gnu-ar rcs libsensor_arm64.a sensor_arm64.o

file sensor_arm64.o
# sensor_arm64.o: ELF 64-bit LSB relocatable, ARM aarch64

When using the cc crate in a Rust project, it automatically detects the Cargo target triple and invokes the correct cross-compiler. If you run cargo build --target aarch64-unknown-linux-gnu, the cc crate calls aarch64-linux-gnu-gcc instead of gcc.

Caution: Struct layout across architectures can differ. Fields may have different alignment requirements on ARM vs x86. Always use #[repr(C)] in Rust and fixed-width types (int16_t, uint32_t) in C to ensure consistent layout across platforms.

Checking Available Targets

GCC

GCC cross-compilers are separate binaries. List what is installed:

ls /usr/bin/*-gcc 2>/dev/null
# /usr/bin/aarch64-linux-gnu-gcc
# /usr/bin/arm-linux-gnueabihf-gcc
# /usr/bin/riscv64-linux-gnu-gcc

Rust

Rust shows all supported targets:

rustc --print target-list | wc -l
# Over 200 targets

rustc --print target-list | grep linux
# aarch64-unknown-linux-gnu
# arm-unknown-linux-gnueabihf
# riscv64gc-unknown-linux-gnu
# x86_64-unknown-linux-gnu
# x86_64-unknown-linux-musl
# ... many more

Get detailed target info:

rustc --print cfg --target aarch64-unknown-linux-gnu

This prints all cfg attributes that are true for that target, which determines what code #[cfg(...)] includes or excludes.

Bare-Metal Cross-Compilation

For embedded targets with no OS, the approach changes. There is no libc, no printf, no standard file I/O.

C for Bare Metal

/* bare.c -- for a bare-metal ARM target */
#include <stdint.h>

/* Memory-mapped UART register (hypothetical) */
#define UART0_DR  (*(volatile uint32_t *)0x09000000)

void uart_putc(char c) {
    UART0_DR = (uint32_t)c;
}

void uart_puts(const char *s) {
    while (*s) {
        uart_putc(*s++);
    }
}

void _start(void) {
    uart_puts("Hello, bare metal!\n");
    while (1) {}  /* hang */
}
aarch64-linux-gnu-gcc -ffreestanding -nostdlib -T linker.ld bare.c -o bare.elf
  • -ffreestanding tells the compiler not to assume a hosted environment.
  • -nostdlib tells the linker not to link the standard library.
  • -T linker.ld provides a custom linker script.

Rust for Bare Metal

// src/main.rs
#![no_std]
#![no_main]

use core::panic::PanicInfo;

const UART0_DR: *mut u32 = 0x0900_0000 as *mut u32;

fn uart_putc(c: u8) {
    unsafe {
        core::ptr::write_volatile(UART0_DR, c as u32);
    }
}

fn uart_puts(s: &str) {
    for b in s.bytes() {
        uart_putc(b);
    }
}

#[no_mangle]
pub extern "C" fn _start() -> ! {
    uart_puts("Hello from bare-metal Rust!\n");
    loop {}
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
cargo build --target aarch64-unknown-none

The aarch64-unknown-none target means: aarch64 architecture, no vendor, no operating system. Rust's core library is available (basic types, iterators, Option, Result), but std is not (no heap, no file I/O, no threads).

Driver Prep: Kernel modules operate in a similar environment to bare metal. There is no standard library, no heap by default, and you interact with hardware through memory-mapped registers. Cross-compilation to aarch64-unknown-linux-gnu is how you build kernel modules for ARM64 boards from your x86 workstation.

Summary Diagram: Cross-Compilation Workflow

Development Machine (x86-64)
+--------------------------------------------------+
|                                                  |
|  Source Code (.c / .rs)                          |
|       |                                          |
|       v                                          |
|  Cross-Compiler                                  |
|  (aarch64-linux-gnu-gcc / rustc --target ...)    |
|       |                                          |
|       +-- Sysroot: target headers + libs         |
|       |                                          |
|       v                                          |
|  Cross-Compiled Binary (ELF aarch64)             |
|       |                                          |
+-------+------------------------------------------+
        |
        |  scp / flash / JTAG / TFTP
        v
Target Machine (aarch64)
+--------------------------------------------------+
|                                                  |
|  Runs the binary natively                        |
|                                                  |
+--------------------------------------------------+

Knowledge Check

  1. What does the "gnu" part of aarch64-unknown-linux-gnu specify? What would "musl" mean instead?

  2. You cross-compile a Rust program for aarch64-unknown-linux-gnu but linking fails. What is the most likely missing piece?

  3. Why can you not just copy a dynamically-linked x86-64 binary to an aarch64 machine and run it?

Common Pitfalls

  • Forgetting to install the cross-linker for Rust. rustc can emit aarch64 code, but it needs aarch64-linux-gnu-gcc (or equivalent) to link. Configure this in .cargo/config.toml.

  • Mixing host and target libraries. If your Makefile picks up /usr/lib instead of the sysroot's lib/, you get x86 libraries linked into an ARM binary. The result may link but will crash at runtime.

  • Assuming identical struct layout across targets. Padding and alignment differ between 32-bit and 64-bit architectures. Use fixed-width types and #pragma pack or #[repr(C, packed)] when layout must be exact.

  • Not testing with QEMU. Before deploying to real hardware, test cross-compiled binaries with qemu-user. It catches most issues without needing the physical device.

  • Forgetting endianness in wire protocols. If you serialize a struct to bytes on one architecture and deserialize on another, byte order mismatches will corrupt every multi-byte field.

File Descriptors

On Linux, everything is a file. A regular file, a terminal, a pipe, a network socket, even a device -- they are all accessed through the same interface: the file descriptor. This chapter shows you that interface from both sides of the C/Rust divide.

The File Descriptor Table

Every process has a small integer table managed by the kernel. Each entry points to an open file description (an in-kernel structure). When you call open(), the kernel picks the lowest available integer, fills the slot, and returns that integer to you.

Process File-Descriptor Table            Kernel Open-File Descriptions
+-----+------------------+          +------------------------------------+
|  0  | ──────────────────────────> | struct file (terminal /dev/pts/0)  |
+-----+------------------+          +------------------------------------+
|  1  | ──────────────────────────> | struct file (terminal /dev/pts/0)  |
+-----+------------------+          +------------------------------------+
|  2  | ──────────────────────────> | struct file (terminal /dev/pts/0)  |
+-----+------------------+          +------------------------------------+
|  3  | ──────────────────────────> | struct file (/tmp/data.txt)        |
+-----+------------------+          +------------------------------------+
|  4  |     (unused)     |
+-----+------------------+
| ... |                  |
+-----+------------------+

File descriptors 0, 1, and 2 are pre-opened by the shell before your program starts:

  fd   Symbolic name     C macro         Purpose
  0    standard input    STDIN_FILENO    keyboard / pipe
  1    standard output   STDOUT_FILENO   terminal / pipe
  2    standard error    STDERR_FILENO   terminal / pipe

Driver Prep: In kernel modules you will work with struct file directly. Understanding the user-space side now makes the kernel side feel familiar.

Opening a File in C

/* open_file.c -- open, write, read, close */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    const char *path = "/tmp/fd_demo.txt";

    /* O_WRONLY  -- write only
       O_CREAT   -- create if missing
       O_TRUNC   -- truncate to zero length if exists
       0644      -- rw-r--r-- permissions */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    printf("opened %s as fd %d\n", path, fd);

    const char *msg = "hello from file descriptor land\n";
    ssize_t nw = write(fd, msg, strlen(msg));
    if (nw == -1) {
        perror("write");
        close(fd);
        return 1;
    }
    printf("wrote %zd bytes\n", nw);

    if (close(fd) == -1) {
        perror("close");
        return 1;
    }

    /* Now reopen for reading */
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open (read)");
        return 1;
    }

    char buf[128];
    ssize_t nr = read(fd, buf, sizeof(buf) - 1);
    if (nr == -1) {
        perror("read");
        close(fd);
        return 1;
    }
    buf[nr] = '\0';
    printf("read back: %s", buf);

    close(fd);
    return 0;
}

Compile and run:

$ gcc -Wall -o open_file open_file.c && ./open_file
opened /tmp/fd_demo.txt as fd 3
wrote 32 bytes
read back: hello from file descriptor land

Notice the fd is 3 -- the first slot after stdin/stdout/stderr.

Caution: Always check the return value of open(). A return of -1 means failure, and errno tells you why. Forgetting this check is the single most common file-handling bug in C.

The Open Flags

Here are the flags you will use constantly:

  Flag        Meaning
  O_RDONLY    Open for reading only
  O_WRONLY    Open for writing only
  O_RDWR      Open for reading and writing
  O_CREAT     Create the file if it does not exist
  O_TRUNC     Truncate existing file to zero length
  O_APPEND    Writes always go to end of file
  O_EXCL      Fail if file already exists (with O_CREAT)
  O_CLOEXEC   Close fd automatically on exec()

O_CREAT requires a third argument to open() specifying the permission bits. Without it, the mode is garbage -- whatever value happens to be sitting in the register or stack slot where the argument should have been.

Caution: Forgetting the mode argument when using O_CREAT is undefined behavior. The compiler will not warn you because open() uses variadic arguments.

Partial Reads and Writes

read() and write() are not guaranteed to transfer the full amount you requested. A read() of 4096 bytes might return 17 if only 17 bytes are available. A write() on a non-blocking socket might write half your buffer.

A robust write loop:

/* write_all.c -- handle partial writes */
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <stdio.h>

ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t remaining = count;

    while (remaining > 0) {
        ssize_t n = write(fd, p, remaining);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by signal, retry */
            return -1;
        }
        p += n;
        remaining -= (size_t)n;
    }
    return (ssize_t)count;
}

int main(void)
{
    int fd = open("/tmp/robust.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    const char *data = "robust write completed\n";
    if (write_all(fd, data, strlen(data)) == -1) {
        perror("write_all");
        close(fd);
        return 1;
    }

    close(fd);
    printf("done\n");
    return 0;
}

Try It: Modify write_all to also handle EAGAIN (would block on non-blocking descriptors). What should you do -- retry immediately or sleep?

The Rust Equivalent

Rust wraps file descriptors in std::fs::File. The Read and Write traits provide read() and write(). Dropping a File closes it automatically.

// open_file.rs -- open, write, read, close via drop
use std::fs::{File, OpenOptions};
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    let path = "/tmp/fd_demo_rs.txt";

    // Create and write
    {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(path)?;

        let msg = b"hello from Rust file descriptor land\n";
        f.write_all(msg)?;
        println!("wrote {} bytes", msg.len());
    } // f is dropped here -- close() called automatically

    // Reopen and read
    {
        let mut f = File::open(path)?;
        let mut contents = String::new();
        f.read_to_string(&mut contents)?;
        print!("read back: {}", contents);
    }

    Ok(())
}

Compile and run:

$ rustc open_file.rs && ./open_file
wrote 37 bytes
read back: hello from Rust file descriptor land

Rust Note: write_all() already handles partial writes internally. You never need to write a retry loop in Rust -- the standard library does it for you. The ? operator propagates errors cleanly.

Accessing the Raw File Descriptor in Rust

Sometimes you need the raw integer, for example when calling a Linux-specific ioctl. Rust provides traits for this:

// raw_fd.rs -- access the underlying file descriptor number
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let f = File::open("/tmp/fd_demo_rs.txt")?;
    let raw: i32 = f.as_raw_fd();
    println!("raw fd = {}", raw);
    // f still owns the fd -- it will close on drop
    Ok(())
}
$ rustc raw_fd.rs && ./raw_fd
raw fd = 3

There is also FromRawFd for wrapping an existing fd, and IntoRawFd for giving up ownership. Use these when bridging C libraries.

// from_raw_fd sketch -- take ownership of an fd obtained elsewhere (e.g. from C)
use std::fs::File;
use std::os::unix::io::FromRawFd;

fn wrap_fd(raw_fd: i32) -> File {
    // SAFETY: raw_fd must be a valid, open file descriptor that we now own.
    unsafe { File::from_raw_fd(raw_fd) }
}

Caution: from_raw_fd is unsafe because Rust cannot verify the fd is valid or that nobody else will close it. Double-close is undefined behavior at the OS level.

Duplicating File Descriptors: dup and dup2

dup(fd) duplicates a file descriptor, returning the lowest available number. dup2(oldfd, newfd) duplicates oldfd onto newfd, closing newfd first if it was open.

This is how shells implement redirection. ls > output.txt is roughly:

fd = open("output.txt", ...)
dup2(fd, STDOUT_FILENO)    // stdout now points to output.txt
close(fd)                   // don't need the extra fd
exec("ls", ...)
/* dup_demo.c -- redirect stdout to a file */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/dup_out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    /* Save original stdout */
    int saved_stdout = dup(STDOUT_FILENO);
    if (saved_stdout == -1) { perror("dup"); return 1; }

    /* Redirect stdout to the file */
    if (dup2(fd, STDOUT_FILENO) == -1) { perror("dup2"); return 1; }
    close(fd);  /* fd is no longer needed; stdout points to the file */

    /* This printf goes to /tmp/dup_out.txt */
    printf("this line goes to the file\n");
    fflush(stdout);

    /* Restore stdout */
    dup2(saved_stdout, STDOUT_FILENO);
    close(saved_stdout);

    /* This printf goes to the terminal */
    printf("this line goes to the terminal\n");

    return 0;
}
$ gcc -Wall -o dup_demo dup_demo.c && ./dup_demo
this line goes to the terminal
$ cat /tmp/dup_out.txt
this line goes to the file
Before dup2:             After dup2(fd, 1):       After close(fd):
  fd 1 ──> terminal        fd 1 ──> file            fd 1 ──> file
  fd 3 ──> file            fd 3 ──> file            fd 3 ──  (closed)

dup2 in Rust

Rust has no safe wrapper for dup2 in the standard library. Use the libc crate or the nix crate:

// dup2_demo.rs -- redirect stdout using libc::dup2
// Cargo.toml needs: libc = "0.2"
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let file = OpenOptions::new()
        .write(true).create(true).truncate(true)
        .open("/tmp/dup_out_rs.txt")?;

    let saved = unsafe { libc::dup(1) };
    if saved == -1 { return Err(std::io::Error::last_os_error()); }

    if unsafe { libc::dup2(file.as_raw_fd(), 1) } == -1 {
        return Err(std::io::Error::last_os_error());
    }

    println!("this line goes to the file");

    unsafe { libc::dup2(saved, 1); libc::close(saved); }
    println!("this line goes to the terminal");
    Ok(())
}

Rust Note: The nix crate provides safe wrappers: nix::unistd::dup2(). For production code, prefer nix over raw libc calls.

Error Handling at Every Syscall

In C, every system call can fail. The pattern is always the same:

int result = some_syscall(...);
if (result == -1) {
    perror("some_syscall");
    // handle error: cleanup, return, exit
}

The global variable errno is set on failure. perror() prints a human-readable message. strerror(errno) gives you the string directly.

Common errors:

  errno    Meaning
  ENOENT   No such file or directory
  EACCES   Permission denied
  EEXIST   File already exists (with O_EXCL)
  EMFILE   Too many open files (per-process)
  ENFILE   Too many open files (system-wide)
  EINTR    Interrupted by signal
  EBADF    Bad file descriptor

In Rust, std::io::Error wraps all of this. The kind() method maps to ErrorKind variants, and raw_os_error() gives you the raw errno.

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    match File::open("/nonexistent/path") {
        Ok(_) => println!("opened"),
        Err(e) => {
            println!("error kind: {:?}", e.kind());
            println!("os error:   {:?}", e.raw_os_error());
            println!("message:    {}", e);

            if e.kind() == ErrorKind::NotFound {
                println!("file does not exist");
            }
        }
    }
}
$ rustc error_demo.rs && ./error_demo
error kind: NotFound
os error:   Some(2)
message:    No such file or directory (os error 2)
file does not exist

O_CLOEXEC and File Descriptor Leaks

When you fork() and then exec(), all open file descriptors are inherited by the child process unless they are marked close-on-exec. This is a common source of file descriptor leaks and security bugs.

/* cloexec.c -- demonstrate O_CLOEXEC */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Without O_CLOEXEC: fd leaks to child after exec */
    int fd_leak = open("/tmp/fd_demo.txt", O_RDONLY);

    /* With O_CLOEXEC: fd is automatically closed on exec */
    int fd_safe = open("/tmp/fd_demo.txt", O_RDONLY | O_CLOEXEC);

    printf("fd_leak = %d, fd_safe = %d\n", fd_leak, fd_safe);

    /* In a fork+exec scenario, fd_leak would be visible to the child
       process, but fd_safe would not. */

    close(fd_leak);
    close(fd_safe);
    return 0;
}

Driver Prep: Kernel drivers deal with struct file directly. The release callback in struct file_operations is called when the last fd referring to a file is closed. Understanding reference counting of descriptors here prepares you for that.

lseek: Moving the File Offset

Every open file description has a current offset. read() and write() advance it. lseek() repositions it.

/* lseek_demo.c -- seek within a file */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    int fd = open("/tmp/seek_demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    write(fd, "ABCDEFGHIJ", 10);

    /* Seek back to offset 3 */
    off_t pos = lseek(fd, 3, SEEK_SET);
    printf("position after lseek: %ld\n", (long)pos);

    /* Overwrite from position 3 */
    write(fd, "xyz", 3);

    /* Seek to beginning and read everything */
    lseek(fd, 0, SEEK_SET);
    char buf[32];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("contents: %s\n", buf);

    close(fd);
    return 0;
}
$ gcc -Wall -o lseek_demo lseek_demo.c && ./lseek_demo
position after lseek: 3
contents: ABCxyzGHIJ

SEEK_SET -- offset from beginning. SEEK_CUR -- offset from current position. SEEK_END -- offset from end of file.

In Rust, use the Seek trait:

// seek_demo.rs
use std::fs::OpenOptions;
use std::io::{Read, Write, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let mut f = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open("/tmp/seek_demo_rs.txt")?;

    f.write_all(b"ABCDEFGHIJ")?;

    // Seek to offset 3
    let pos = f.seek(SeekFrom::Start(3))?;
    println!("position after seek: {}", pos);

    f.write_all(b"xyz")?;

    // Seek to start and read
    f.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    f.read_to_string(&mut contents)?;
    println!("contents: {}", contents);

    Ok(())
}

Quick Knowledge Check

  1. What file descriptor number does open() return if stdin, stdout, and stderr are all open and no other files are open?

  2. What happens if you call open() with O_CREAT but forget the third argument (the mode)?

  3. After dup2(fd, STDOUT_FILENO), which file descriptor should you close -- fd, STDOUT_FILENO, or both?

Common Pitfalls

  • Forgetting to check return values. Every syscall can fail. Every one.

  • Forgetting the mode with O_CREAT. The file gets garbage permissions.

  • Not handling partial reads/writes. read() returning less than requested is normal, not an error.

  • Leaking file descriptors. Every open() must have a matching close(). In Rust, Drop handles this, but in C it is your job.

  • Using fd after close. Just like use-after-free, using a closed fd is a bug. The number might be reassigned to a different file.

  • Ignoring EINTR. Signals can interrupt blocking syscalls. Always retry on EINTR.

  • Forgetting O_CLOEXEC. File descriptors leak across exec() by default. Always use O_CLOEXEC unless you specifically want inheritance.

Buffered vs Unbuffered I/O

Every write() system call crosses the user-kernel boundary. That crossing is expensive -- hundreds of nanoseconds at minimum. Buffered I/O collects small writes into a large buffer and flushes them in one syscall. This chapter shows you both layers and when to use each.

The Two Layers

Your Program
    |
    v
+------------------------------+
|  stdio (fopen, fprintf, ...) |   <-- buffered (user-space)
|  internal buffer: 4096+ bytes|
+------------------------------+
    |  fflush() or buffer full
    v
+------------------------------+
|  syscalls (open, write, ...) |   <-- unbuffered (kernel boundary)
+------------------------------+
    |
    v
  Kernel page cache / disk

The unbuffered layer (open, read, write, close) is what we covered in Chapter 28. The buffered layer (fopen, fread, fwrite, fprintf, fclose) wraps the unbuffered layer with a user-space buffer.

Buffered I/O in C: stdio

/* stdio_demo.c -- buffered I/O with fopen, fprintf, fclose */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("/tmp/buffered.txt", "w");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    /* fprintf writes to an internal buffer, not directly to disk */
    fprintf(fp, "line one\n");
    fprintf(fp, "line two\n");
    fprintf(fp, "value: %d\n", 42);

    /* fclose flushes the buffer and then calls close() */
    if (fclose(fp) != 0) {
        perror("fclose");
        return 1;
    }

    /* Read it back */
    fp = fopen("/tmp/buffered.txt", "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof(line), fp)) {
        printf("read: %s", line);
    }

    fclose(fp);
    return 0;
}
$ gcc -Wall -o stdio_demo stdio_demo.c && ./stdio_demo
read: line one
read: line two
read: value: 42

fread and fwrite

For binary data or bulk transfers, use fread and fwrite instead of fprintf and fgets:

/* fread_fwrite.c -- binary I/O */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int data[] = {10, 20, 30, 40, 50};
    size_t count = sizeof(data) / sizeof(data[0]);

    /* Write binary data */
    FILE *fp = fopen("/tmp/binary.dat", "wb");
    if (!fp) { perror("fopen"); return 1; }

    size_t written = fwrite(data, sizeof(int), count, fp);
    printf("wrote %zu integers\n", written);
    fclose(fp);

    /* Read it back */
    int buf[5] = {0};
    fp = fopen("/tmp/binary.dat", "rb");
    if (!fp) { perror("fopen"); return 1; }

    size_t nread = fread(buf, sizeof(int), count, fp);
    printf("read %zu integers:", nread);
    for (size_t i = 0; i < nread; i++) {
        printf(" %d", buf[i]);
    }
    printf("\n");
    fclose(fp);

    return 0;
}
$ gcc -Wall -o fread_fwrite fread_fwrite.c && ./fread_fwrite
wrote 5 integers
read 5 integers: 10 20 30 40 50

Caution: fwrite of raw structs is not portable across architectures due to endianness and padding differences. For files that must be portable, serialize field by field.

Buffer Modes: Full, Line, None

stdio supports three buffering modes, set with setvbuf():

Mode             Constant   Behavior
Full buffering   _IOFBF     Flush when buffer is full
Line buffering   _IOLBF     Flush on newline or when full
No buffering     _IONBF     Every write goes to kernel immediately

Default behavior:

  • stderr is unbuffered (_IONBF) -- errors appear immediately
  • stdout is line-buffered when connected to a terminal, full-buffered when connected to a pipe or file
  • Files opened with fopen are full-buffered

/* setvbuf_demo.c -- control buffering mode */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("/tmp/setvbuf_demo.txt", "w");
    if (!fp) { perror("fopen"); return 1; }

    /* Set line buffering with a 1024-byte buffer */
    char mybuf[1024];
    if (setvbuf(fp, mybuf, _IOLBF, sizeof(mybuf)) != 0) {
        perror("setvbuf");
        fclose(fp);
        return 1;
    }

    fprintf(fp, "this flushes on newline\n");  /* flushed now */
    fprintf(fp, "no newline yet...");           /* still in buffer */
    fprintf(fp, " now!\n");                    /* flushed now */

    fclose(fp);

    /* Verify */
    fp = fopen("/tmp/setvbuf_demo.txt", "r");
    if (!fp) { perror("fopen"); return 1; }
    char line[256];
    while (fgets(line, sizeof(line), fp))
        printf("%s", line);
    fclose(fp);

    return 0;
}

To disable buffering entirely:

setvbuf(fp, NULL, _IONBF, 0);

Try It: Write a program that prints to stdout without a newline, then sleeps for 3 seconds, then prints a newline. Run it piped to cat vs directly on the terminal. Observe the difference in when output appears.

When to Flush: fflush

fflush(fp) forces the buffer to be written to the kernel. Common situations where you need it:

  • Before a fork() -- otherwise the child inherits the buffer and you get double output
  • Before reading from the same file you are writing to
  • Before a crash-sensitive section -- data in the buffer is lost on crash
  • Before switching between stdio and raw fd operations on the same file

/* fflush_demo.c -- explicit flush */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("prompt: ");
    fflush(stdout);    /* force output before blocking on read */

    char buf[64];
    if (fgets(buf, sizeof(buf), stdin)) {
        printf("you typed: %s", buf);
    }

    return 0;
}

fflush(NULL) flushes all open output streams. Useful before fork().

Caution: fflush(stdin) is undefined behavior in the C standard, even though some implementations (like glibc on Linux) define it to discard input. Do not rely on it.

The Rust Equivalent: BufReader and BufWriter

Rust separates buffering from the file type. You wrap any reader in BufReader and any writer in BufWriter.

// buffered_io.rs -- BufWriter and BufReader
use std::fs::File;
use std::io::{BufWriter, BufReader, BufRead, Write};

fn main() -> std::io::Result<()> {
    let path = "/tmp/buffered_rs.txt";

    // Buffered writing
    {
        let file = File::create(path)?;
        let mut writer = BufWriter::new(file);

        writeln!(writer, "line one")?;
        writeln!(writer, "line two")?;
        writeln!(writer, "value: {}", 42)?;

        // BufWriter flushes on drop, but explicit flush
        // lets you catch errors
        writer.flush()?;
    }

    // Buffered reading
    {
        let file = File::open(path)?;
        let reader = BufReader::new(file);

        for line in reader.lines() {
            let line = line?;
            println!("read: {}", line);
        }
    }

    Ok(())
}
$ rustc buffered_io.rs && ./buffered_io
read: line one
read: line two
read: value: 42

Rust Note: BufWriter flushes its buffer when dropped. However, any error during that flush is silently ignored. Always call .flush() explicitly before dropping if you need to detect write failures.

The BufRead Trait

BufReader implements the BufRead trait, which gives you lines(), read_line(), and read_until():

// bufread_demo.rs
use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let handle = stdin.lock();  // locked handle implements BufRead

    println!("Type lines (Ctrl-D to stop):");
    for (i, line) in handle.lines().enumerate() {
        match line {
            Ok(text) => println!("  line {}: {}", i + 1, text),
            Err(e) => {
                eprintln!("error: {}", e);
                break;
            }
        }
    }
}

The lines() iterator strips trailing newlines and yields io::Result<String> for each line.

write! and writeln! Macros

Rust's write! and writeln! macros work on any type implementing the Write trait -- not just stdout:

// write_macro.rs
use std::io::Write;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let mut f = File::create("/tmp/write_macro.txt")?;

    write!(f, "no newline")?;
    writeln!(f, " -- now with newline")?;
    writeln!(f, "pi is approximately {:.4}", std::f64::consts::PI)?;

    // Also works with Vec<u8> as an in-memory buffer
    let mut buf: Vec<u8> = Vec::new();
    writeln!(buf, "hello into a vector")?;
    println!("buf contains: {:?}", String::from_utf8(buf).unwrap());

    Ok(())
}
$ rustc write_macro.rs && ./write_macro
buf contains: "hello into a vector\n"

Do Not Mix Buffered and Unbuffered I/O

Using both write() and fprintf() on the same file descriptor leads to interleaved, corrupted output because the stdio buffer and the kernel see different states.

/* bad_mix.c -- DO NOT DO THIS */
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    /* stdout is fd 1, and printf uses a buffer on fd 1 */
    printf("buffered line");        /* sits in stdio buffer */
    const char *msg = "unbuffered line\n";
    write(1, msg, strlen(msg));     /* goes directly to kernel */
    printf(" -- surprise!\n");      /* still in buffer, flushed later */

    return 0;
}
$ gcc -Wall -o bad_mix bad_mix.c && ./bad_mix | cat
unbuffered line
buffered line -- surprise!

The unbuffered write() bypasses the stdio buffer and reaches the output first. The buffered printf output appears later when the buffer flushes.

Caution: Never mix write()/read() and fprintf()/fread() on the same file descriptor. Pick one layer and stick with it.

Performance: Buffered vs Unbuffered

Let us measure the difference. Writing one million single-byte writes:

/* perf_test.c -- compare buffered vs unbuffered single-byte writes */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define N 1000000

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    /* Unbuffered: 1 million write() syscalls */
    int fd = open("/tmp/perf_unbuf.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    double t0 = now();
    for (int i = 0; i < N; i++) {
        write(fd, "x", 1);
    }
    double t1 = now();
    close(fd);
    printf("unbuffered: %.3f seconds (%d syscalls)\n", t1 - t0, N);

    /* Buffered: stdio batches into ~4096-byte chunks */
    FILE *fp = fopen("/tmp/perf_buf.txt", "w");
    t0 = now();
    for (int i = 0; i < N; i++) {
        fputc('x', fp);
    }
    fclose(fp);
    t1 = now();
    printf("buffered:   %.3f seconds (~%d syscalls)\n", t1 - t0, N / 4096);

    return 0;
}
$ gcc -O2 -Wall -o perf_test perf_test.c && ./perf_test
unbuffered: 1.247 seconds (1000000 syscalls)
buffered:   0.018 seconds (~244 syscalls)

The buffered version is roughly 70x faster. Each fputc copies one byte into the stdio buffer. Only when the buffer fills (~4096 bytes) does a write() syscall happen.

Unbuffered:  write("x")  write("x")  write("x")  ...  (1,000,000 syscalls)
                |            |            |
                v            v            v
             kernel        kernel       kernel

Buffered:    fputc -> [....buffer fills....] -> write(4096 bytes) -> kernel
             fputc -> [....buffer fills....] -> write(4096 bytes) -> kernel
             ...                                (~244 syscalls total)

Rust Performance Comparison

// perf_test.rs -- buffered vs unbuffered in Rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::time::Instant;

const N: usize = 1_000_000;

fn main() -> std::io::Result<()> {
    // Unbuffered: each write_all is a syscall
    {
        let mut f = File::create("/tmp/perf_unbuf_rs.txt")?;
        let t0 = Instant::now();
        for _ in 0..N {
            f.write_all(b"x")?;
        }
        let elapsed = t0.elapsed();
        println!("unbuffered: {:.3} seconds", elapsed.as_secs_f64());
    }

    // Buffered: BufWriter batches writes
    {
        let f = File::create("/tmp/perf_buf_rs.txt")?;
        let mut writer = BufWriter::new(f);
        let t0 = Instant::now();
        for _ in 0..N {
            writer.write_all(b"x")?;
        }
        writer.flush()?;
        let elapsed = t0.elapsed();
        println!("buffered:   {:.3} seconds", elapsed.as_secs_f64());
    }

    Ok(())
}

Rust Note: File::write_all does not buffer -- each call goes directly to the kernel. Always wrap File in BufWriter when doing many small writes. This is one of the most common Rust I/O performance mistakes.

Custom Buffer Sizes

The default BufWriter buffer is 8 KiB. For large sequential writes (like copying a multi-gigabyte file), a larger buffer can help:

// bufwriter_capacity.rs -- custom buffer size
use std::fs::File;
use std::io::BufWriter;

fn main() -> std::io::Result<()> {
    let f = File::create("/tmp/large_output.bin")?;
    let mut writer = BufWriter::with_capacity(64 * 1024, f);  // 64 KiB buffer
    // ... use writer ...
    Ok(())
}

In C, setvbuf does the same:

FILE *fp = fopen("/tmp/large_output.bin", "w");
char *buf = malloc(64 * 1024);
setvbuf(fp, buf, _IOFBF, 64 * 1024);
/* ... use fp ... */
fclose(fp);
free(buf);

Driver Prep: In kernel space there is no stdio. Drivers use raw copy_to_user/copy_from_user for transferring data between kernel and user buffers. Understanding why buffering matters at the user level helps you design efficient kernel interfaces.

Quick Knowledge Check

  1. Why is stderr unbuffered by default?

  2. You call printf("hello") (no newline) and then your program crashes. Does "hello" appear on the terminal? What if stdout is connected to a pipe?

  3. In Rust, what happens if BufWriter::flush() is never called and the BufWriter is simply dropped?

Common Pitfalls

  • Forgetting to flush before fork(). Both parent and child inherit the buffer contents. When both eventually flush, you get duplicate output.

  • Assuming printf output appears immediately. It does on a terminal (line-buffered), but not when piped to another process (full-buffered).

  • Using fflush(stdin). Undefined behavior in the C standard.

  • Dropping BufWriter without explicit flush(). The implicit flush on drop silently discards errors. Always flush explicitly when error handling matters.

  • Using unbuffered I/O for many small writes. The syscall overhead dominates. Always buffer.

  • Mixing buffered and unbuffered on the same fd. Output arrives in unpredictable order.

  • Not setting binary mode on Windows. On Windows, fopen without "b" translates \n to \r\n. On Linux this is not an issue, but portable code should use "wb" / "rb" for binary files.

File Metadata and Directories

Files are not just data blobs. The kernel stores metadata about every file: its size, owner, permissions, timestamps, and more. This chapter teaches you how to query and modify that metadata, and how to navigate the directory tree from both C and Rust.

The stat() Family

The stat() system call fills a struct stat with information about a file. There are three variants:

Function   Operates on        Follows symlinks?
stat()     a path (string)    Yes
lstat()    a path (string)    No
fstat()    an open fd (int)   N/A

/* stat_demo.c -- query file metadata */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <pwd.h>
#include <grp.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    struct stat st;
    if (stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }

    printf("file:        %s\n", argv[1]);
    printf("inode:       %lu\n", (unsigned long)st.st_ino);
    printf("size:        %ld bytes\n", (long)st.st_size);
    printf("blocks:      %ld (512-byte units)\n", (long)st.st_blocks);
    printf("hard links:  %lu\n", (unsigned long)st.st_nlink);

    struct passwd *pw = getpwuid(st.st_uid);
    struct group  *gr = getgrgid(st.st_gid);
    printf("owner:       %s (uid %d)\n", pw ? pw->pw_name : "?", (int)st.st_uid);
    printf("group:       %s (gid %d)\n", gr ? gr->gr_name : "?", (int)st.st_gid);
    printf("permissions: %o\n", st.st_mode & 07777);

    if (S_ISREG(st.st_mode))       printf("type:        regular file\n");
    else if (S_ISDIR(st.st_mode))  printf("type:        directory\n");
    else if (S_ISLNK(st.st_mode))  printf("type:        symlink\n");
    else if (S_ISCHR(st.st_mode))  printf("type:        char device\n");
    else if (S_ISBLK(st.st_mode))  printf("type:        block device\n");
    else if (S_ISFIFO(st.st_mode)) printf("type:        FIFO/pipe\n");
    else if (S_ISSOCK(st.st_mode)) printf("type:        socket\n");

    char timebuf[64];
    struct tm *tm = localtime(&st.st_mtime);
    strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S", tm);
    printf("modified:    %s\n", timebuf);

    return 0;
}
$ gcc -Wall -o stat_demo stat_demo.c && ./stat_demo /etc/passwd
file:        /etc/passwd
inode:       1048594
size:        2773 bytes
blocks:      8 (512-byte units)
hard links:  1
owner:       root (uid 0)
group:       root (gid 0)
permissions: 644
type:        regular file
modified:    2025-01-15 10:22:33

The struct stat Layout

struct stat
+------------------+--------------------------------------------+
| st_dev           | device ID of filesystem                    |
| st_ino           | inode number (unique within filesystem)    |
| st_mode          | file type + permissions (see below)        |
| st_nlink         | number of hard links                       |
| st_uid           | owner user ID                              |
| st_gid           | owner group ID                             |
| st_rdev          | device ID (for char/block devices)         |
| st_size          | file size in bytes                         |
| st_blksize       | optimal I/O block size                     |
| st_blocks        | number of 512-byte blocks allocated        |
| st_atim          | last access time                           |
| st_mtim          | last modification time                     |
| st_ctim          | last status change time                    |
+------------------+--------------------------------------------+

st_mode bit layout (16 bits):
+------+------+------+------+------+------+------+
| type |setuid|setgid|sticky| user |group |other |
| 4bit |  1   |  1   |  1   | rwx  | rwx  | rwx  |
+------+------+------+------+------+------+------+

Driver Prep: When you implement a character device driver, the kernel populates some of these fields for you. Your driver's getattr callback can override them. Understanding what each field means is essential.

Checking File Type with Macros

The S_IS* macros decode the file type from st_mode:

if (S_ISREG(st.st_mode))  { /* regular file */ }
if (S_ISDIR(st.st_mode))  { /* directory */ }
if (S_ISLNK(st.st_mode))  { /* symbolic link -- use lstat! */ }
if (S_ISCHR(st.st_mode))  { /* character device */ }
if (S_ISBLK(st.st_mode))  { /* block device */ }
if (S_ISFIFO(st.st_mode)) { /* FIFO (named pipe) */ }
if (S_ISSOCK(st.st_mode)) { /* socket */ }

Caution: stat() follows symbolic links. If you call stat() on a symlink, you get the metadata of the target file. Use lstat() to get the metadata of the symlink itself.

Rust: std::fs::metadata

// metadata_demo.rs -- file metadata in Rust
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::os::unix::fs::PermissionsExt;

fn main() -> std::io::Result<()> {
    let path = "/etc/passwd";
    let meta = fs::metadata(path)?;

    println!("file:        {}", path);
    println!("inode:       {}", meta.ino());
    println!("size:        {} bytes", meta.len());
    println!("blocks:      {}", meta.blocks());
    println!("hard links:  {}", meta.nlink());
    println!("uid:         {}", meta.uid());
    println!("gid:         {}", meta.gid());
    println!("permissions: {:o}", meta.permissions().mode() & 0o7777);

    if meta.is_file()    { println!("type:        regular file"); }
    if meta.is_dir()     { println!("type:        directory"); }
    if meta.is_symlink() { println!("type:        symlink"); }

    if let Ok(modified) = meta.modified() {
        println!("modified:    {:?}", modified);
    }

    Ok(())
}

For symlink metadata (equivalent to lstat), use fs::symlink_metadata():

// symlink_meta.rs
use std::fs;

fn main() -> std::io::Result<()> {
    let meta = fs::symlink_metadata("/some/symlink")?;
    println!("is symlink: {}", meta.is_symlink());
    Ok(())
}

Rust Note: MetadataExt is Unix-specific (imported from std::os::unix::fs). The cross-platform Metadata type only exposes len(), is_file(), is_dir(), and timestamps. For inode, uid, gid, and other Unix-specific fields, you need the extension trait.

Changing Permissions and Ownership

/* chmod_demo.c -- change file permissions */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/tmp/chmod_test.txt";

    FILE *fp = fopen(path, "w");
    if (!fp) { perror("fopen"); return 1; }
    fprintf(fp, "secret data\n");
    fclose(fp);

    if (chmod(path, 0400) == -1) {
        perror("chmod");
        return 1;
    }

    struct stat st;
    stat(path, &st);
    printf("permissions: %o\n", st.st_mode & 07777);

    chmod(path, 0644);  /* restore */
    return 0;
}

For changing ownership, chown(path, uid, gid) works the same way. Only root (or a process with CAP_CHOWN) can change file ownership to another user.

In Rust:

use std::fs;
use std::os::unix::fs::PermissionsExt;

fn main() -> std::io::Result<()> {
    let path = "/tmp/chmod_test_rs.txt";
    fs::write(path, "secret data\n")?;

    let perms = fs::Permissions::from_mode(0o400);
    fs::set_permissions(path, perms)?;

    let meta = fs::metadata(path)?;
    println!("permissions: {:o}", meta.permissions().mode() & 0o7777);

    fs::set_permissions(path, fs::Permissions::from_mode(0o644))?;
    Ok(())
}

Directory Operations in C

Reading a Directory

/* readdir_demo.c -- list directory contents */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>

int main(int argc, char *argv[])
{
    const char *dirpath = argc > 1 ? argv[1] : ".";

    DIR *dp = opendir(dirpath);
    if (!dp) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        char fullpath[4096];
        snprintf(fullpath, sizeof(fullpath), "%s/%s", dirpath, entry->d_name);

        struct stat st;
        if (lstat(fullpath, &st) == -1) {
            perror(fullpath);
            continue;
        }

        char type = '-';
        if (S_ISDIR(st.st_mode))       type = 'd';
        else if (S_ISLNK(st.st_mode))  type = 'l';
        else if (S_ISCHR(st.st_mode))  type = 'c';
        else if (S_ISBLK(st.st_mode))  type = 'b';

        printf("%c %8ld %s\n", type, (long)st.st_size, entry->d_name);
    }

    closedir(dp);
    return 0;
}
$ gcc -Wall -o readdir_demo readdir_demo.c && ./readdir_demo /tmp
-       32 fd_demo.txt
-       20 buffered.txt
-        0 chmod_test.txt

Creating and Removing Directories

/* mkdir_rmdir.c -- create and remove directories */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    if (mkdir("/tmp/mydir", 0755) == -1)
        perror("mkdir");

    if (mkdir("/tmp/mydir/sub", 0755) == -1)
        perror("mkdir sub");

    printf("created /tmp/mydir/sub\n");

    /* rmdir only works on empty directories */
    rmdir("/tmp/mydir/sub");
    rmdir("/tmp/mydir");
    printf("removed both directories\n");

    return 0;
}

Caution: rmdir() fails with ENOTEMPTY if the directory is not empty. To remove a directory tree, you must remove its contents first (recursively).

Renaming and Deleting Files

/* unlink_rename.c -- rename() and unlink() */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("/tmp/old_name.txt", "w");
    if (!fp) { perror("fopen"); return 1; }
    fprintf(fp, "I will be renamed\n");
    fclose(fp);

    fp = fopen("/tmp/to_delete.txt", "w");
    if (!fp) { perror("fopen"); return 1; }
    fprintf(fp, "I will be deleted\n");
    fclose(fp);

    /* rename() -- atomic on the same filesystem */
    if (rename("/tmp/old_name.txt", "/tmp/new_name.txt") == -1) {
        perror("rename");
        return 1;
    }
    printf("renamed old_name.txt -> new_name.txt\n");

    /* unlink() -- remove a hard link (deletes file if last link) */
    if (unlink("/tmp/to_delete.txt") == -1) {
        perror("unlink");
        return 1;
    }
    printf("deleted to_delete.txt\n");

    unlink("/tmp/new_name.txt");
    return 0;
}

Directory Operations in Rust

// readdir_demo.rs -- list directory contents
use std::fs;

fn main() -> std::io::Result<()> {
    let dirpath = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());

    for entry in fs::read_dir(&dirpath)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        let file_type = entry.file_type()?;

        let type_char = if file_type.is_dir() { 'd' }
            else if file_type.is_symlink() { 'l' }
            else { '-' };

        println!("{} {:>8} {}", type_char, meta.len(),
                 entry.file_name().to_string_lossy());
    }

    Ok(())
}

Creating, Removing, Renaming

// dir_ops.rs -- mkdir, rmdir, rename, remove
use std::fs;

fn main() -> std::io::Result<()> {
    fs::create_dir_all("/tmp/rustdir/sub")?;
    println!("created /tmp/rustdir/sub");

    fs::write("/tmp/rustdir/sub/hello.txt", "hello\n")?;

    fs::rename("/tmp/rustdir/sub/hello.txt",
               "/tmp/rustdir/sub/world.txt")?;
    println!("renamed hello.txt -> world.txt");

    fs::remove_file("/tmp/rustdir/sub/world.txt")?;
    println!("removed world.txt");

    fs::remove_dir_all("/tmp/rustdir")?;
    println!("removed /tmp/rustdir and all contents");

    Ok(())
}

Rust Note: fs::create_dir_all is like mkdir -p -- it creates parent directories as needed. fs::remove_dir_all is like rm -rf -- it removes everything recursively. In C you must walk the tree yourself.

Hard Links and Symbolic Links

Hard link:
  name_a  ──>  inode 12345  <──  name_b
  (both names point to the same inode; same data blocks)

Symbolic link:
  symlink  ──>  "path/to/target"  (just stores a path string)
      |
      +-- readlink() returns "path/to/target"
      +-- stat() follows to the target
      +-- lstat() returns info about the symlink itself

/* links_demo.c -- create hard and symbolic links */
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    FILE *fp = fopen("/tmp/original.txt", "w");
    fprintf(fp, "original content\n");
    fclose(fp);

    link("/tmp/original.txt", "/tmp/hardlink.txt");
    symlink("/tmp/original.txt", "/tmp/symlink.txt");

    struct stat st;
    stat("/tmp/original.txt", &st);
    printf("original inode: %lu, nlink: %lu\n",
           (unsigned long)st.st_ino, (unsigned long)st.st_nlink);

    stat("/tmp/hardlink.txt", &st);
    printf("hardlink inode: %lu (same!)\n", (unsigned long)st.st_ino);

    lstat("/tmp/symlink.txt", &st);
    printf("symlink  inode: %lu (different)\n", (unsigned long)st.st_ino);

    char target[256];
    ssize_t n = readlink("/tmp/symlink.txt", target, sizeof(target) - 1);
    if (n != -1) {
        target[n] = '\0';
        printf("symlink target: %s\n", target);
    }

    unlink("/tmp/hardlink.txt");
    unlink("/tmp/symlink.txt");
    unlink("/tmp/original.txt");
    return 0;
}
$ gcc -Wall -o links_demo links_demo.c && ./links_demo
original inode: 2359301, nlink: 2
hardlink inode: 2359301 (same!)
symlink  inode: 2359447 (different)
symlink target: /tmp/original.txt

In Rust:

// links_demo.rs
use std::fs;
use std::os::unix::fs as unix_fs;
use std::os::unix::fs::MetadataExt;

fn main() -> std::io::Result<()> {
    fs::write("/tmp/original_rs.txt", "original content\n")?;

    fs::hard_link("/tmp/original_rs.txt", "/tmp/hardlink_rs.txt")?;
    unix_fs::symlink("/tmp/original_rs.txt", "/tmp/symlink_rs.txt")?;

    let orig = fs::metadata("/tmp/original_rs.txt")?;
    let hard = fs::metadata("/tmp/hardlink_rs.txt")?;
    let sym  = fs::symlink_metadata("/tmp/symlink_rs.txt")?;

    println!("original inode: {}, nlink: {}", orig.ino(), orig.nlink());
    println!("hardlink inode: {} (same!)", hard.ino());
    println!("symlink  inode: {} (different)", sym.ino());

    let target = fs::read_link("/tmp/symlink_rs.txt")?;
    println!("symlink target: {}", target.display());

    fs::remove_file("/tmp/hardlink_rs.txt")?;
    fs::remove_file("/tmp/symlink_rs.txt")?;
    fs::remove_file("/tmp/original_rs.txt")?;
    Ok(())
}

Working with Paths in Rust

Rust provides Path and PathBuf for safe path manipulation:

// path_demo.rs
use std::path::{Path, PathBuf};

fn main() {
    let p = Path::new("/home/user/documents/report.txt");

    println!("file name:   {:?}", p.file_name());
    println!("stem:        {:?}", p.file_stem());
    println!("extension:   {:?}", p.extension());
    println!("parent:      {:?}", p.parent());
    println!("is absolute: {}", p.is_absolute());

    let mut pb = PathBuf::from("/home/user");
    pb.push("documents");
    pb.push("report.txt");
    println!("built path:  {}", pb.display());

    let full = Path::new("/var/log").join("syslog");
    println!("joined:      {}", full.display());
}
$ rustc path_demo.rs && ./path_demo
file name:   Some("report.txt")
stem:        Some("report")
extension:   Some("txt")
parent:      Some("/home/user/documents")
is absolute: true
built path:  /home/user/documents/report.txt
joined:      /var/log/syslog

Try It: Write a program (C or Rust) that recursively walks a directory tree, printing each file's path and size. In C, you will need a recursive function that calls opendir/readdir/stat. In Rust, consider writing a recursive function or use the walkdir crate.

Quick Knowledge Check

  1. What is the difference between stat() and lstat() when called on a symbolic link?

  2. What does st_nlink equal for a regular file with no extra hard links?

  3. Why does rmdir() fail on a non-empty directory, and what must you do to remove a full directory tree in C?

Common Pitfalls

  • Using stat() on symlinks when you meant lstat(). You silently get the target's metadata.

  • Buffer overflow in path construction. In C, building paths with sprintf without length checks is a classic vulnerability. Use snprintf.

  • TOCTOU races. Checking a file's existence with stat() and then opening it is a race condition. Another process can change the file between your check and your open. Use O_CREAT | O_EXCL for atomic creation.

  • Forgetting closedir(). Leaks a file descriptor just like forgetting close().

  • Assuming d_type in struct dirent. Not all filesystems populate d_type. Always fall back to stat() if d_type == DT_UNKNOWN.

  • Hard-linking across filesystems. Hard links only work within a single filesystem. Use symbolic links for cross-filesystem references.

Memory-Mapped I/O

Instead of copying data between kernel and user space with read() and write(), you can map a file directly into your process's address space. The kernel handles paging data in and out transparently. This is mmap() -- one of the most powerful system calls on Linux.

How mmap Works

Traditional I/O:

  User space            Kernel space             Disk
  +--------+           +------------+          +------+
  | buffer | <--copy-- | page cache | <--DMA-- | file |
  +--------+           +------------+          +------+
     read() copies data from kernel to user buffer

Memory-mapped I/O:

  User space
  +--------+
  | mapped |  <-- page fault --> kernel loads page from disk
  | region |                     directly into this address range
  +--------+
     No copy -- your pointer IS the data

When you access a mapped page for the first time, a page fault occurs. The kernel loads the data from disk into a physical page and maps it into your address space. Subsequent accesses hit that page directly -- no syscall overhead at all.

mmap in C

/* mmap_read.c -- read a file via mmap */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }

    if (st.st_size == 0) {
        printf("(empty file)\n");
        close(fd);
        return 0;
    }

    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* We can close fd now -- the mapping keeps the file open internally */
    close(fd);

    const char *data = (const char *)addr;
    printf("first 80 chars:\n");
    size_t len = (size_t)st.st_size < 80 ? (size_t)st.st_size : 80;
    fwrite(data, 1, len, stdout);
    printf("\n");

    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (data[i] == '\n') lines++;
    }
    printf("total lines: %zu\n", lines);

    munmap(addr, st.st_size);
    return 0;
}
$ gcc -Wall -o mmap_read mmap_read.c && ./mmap_read /etc/passwd
first 80 chars:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/n
total lines: 42

The mmap Arguments

void *mmap(
    void   *addr,    /* suggested address (NULL = let kernel choose) */
    size_t  length,  /* how many bytes to map */
    int     prot,    /* protection: PROT_READ, PROT_WRITE, PROT_EXEC */
    int     flags,   /* MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, ... */
    int     fd,      /* file descriptor (-1 with MAP_ANONYMOUS) */
    off_t   offset   /* offset within file (must be page-aligned) */
);

Flag            Meaning
----            -------
MAP_PRIVATE     Copy-on-write: writes go to a private copy, not the file
MAP_SHARED      Writes go through to the file (visible to others)
MAP_ANONYMOUS   No file backing; memory initialized to zero
MAP_FIXED       Use exact address (dangerous if misused)

Protection      Meaning
----------      -------
PROT_READ       Pages can be read
PROT_WRITE      Pages can be written
PROT_EXEC       Pages can be executed
PROT_NONE       No access (guard pages)

Caution: MAP_FIXED will silently overwrite any existing mapping at that address, including your heap or stack. Almost never use it in application code.

MAP_SHARED vs MAP_PRIVATE

MAP_PRIVATE (copy-on-write):

  Process A          Process B
  +--------+        +--------+
  | page 1 |--+  +--| page 1 |    Both point to same physical pages
  | page 2 |--+--+--| page 2 |    (read-only until a write)
  +--------+        +--------+

  When A writes to page 1:
  +--------+        +--------+
  | page 1'| (new)  | page 1 |    A gets a private copy
  | page 2 |--+--+--| page 2 |    page 2 still shared
  +--------+        +--------+

MAP_SHARED:

  Process A          Process B
  +--------+        +--------+
  | page 1 |--+--+--| page 1 |    Same physical pages, writable
  | page 2 |--+--+--| page 2 |    Writes by A visible to B
  +--------+        +--------+

Writing with mmap

/* mmap_write.c -- modify a file via mmap */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

int main(void)
{
    const char *path = "/tmp/mmap_write_demo.txt";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    const char *initial = "Hello, World! This is memory-mapped.\n";
    size_t len = strlen(initial);
    if (write(fd, initial, len) != (ssize_t)len) { perror("write"); close(fd); return 1; }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    close(fd);

    char *data = (char *)addr;
    printf("before: %s", data);

    memcpy(data, "HOWDY", 5);
    printf("after:  %s", data);

    /* Ensure changes reach disk */
    msync(addr, len, MS_SYNC);
    munmap(addr, len);

    /* Verify by reading normally */
    fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n < 0) { perror("read"); close(fd); return 1; }
    buf[n] = '\0';
    printf("verify: %s", buf);
    close(fd);

    return 0;
}
$ gcc -Wall -o mmap_write mmap_write.c && ./mmap_write
before: Hello, World! This is memory-mapped.
after:  HOWDY, World! This is memory-mapped.
verify: HOWDY, World! This is memory-mapped.

Caution: You cannot extend a file by writing past its end via mmap. The mapping size is fixed at mmap() time. To grow a file, use ftruncate() first, then remap.

msync: Flushing to Disk

msync() ensures that modifications to a MAP_SHARED mapping are written back to the underlying file on disk.

Flag            Meaning
----            -------
MS_SYNC         Block until the write is complete
MS_ASYNC        Initiate the write, return immediately
MS_INVALIDATE   Invalidate other mappings (force re-read)

Without msync, the kernel will eventually flush dirty pages, but the timing is unpredictable. For data integrity, always msync before considering data durable.

Anonymous mmap: Shared Memory Without a File

MAP_ANONYMOUS creates a mapping not backed by any file. The memory is initialized to zero. Combined with MAP_SHARED, the mapping is inherited across fork() and the pages stay shared, enabling parent-child communication.

/* anon_mmap.c -- shared memory between parent and child */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int *shared = mmap(NULL, sizeof(int),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS,
                       -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {
        *shared = 42;
        printf("child set *shared = %d\n", *shared);
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("parent reads *shared = %d\n", *shared);

    munmap(shared, sizeof(int));
    return 0;
}
$ gcc -Wall -o anon_mmap anon_mmap.c && ./anon_mmap
child set *shared = 42
parent reads *shared = 42

Driver Prep: Kernel drivers often use remap_pfn_range() to map device memory or DMA buffers into user space. The user-space side calls mmap() on the device file. Understanding MAP_SHARED here is essential preparation.

Large File Processing with mmap and madvise

mmap is ideal for processing large files. The kernel pages data in on demand and can evict pages under memory pressure.

/* mmap_large.c -- count bytes in a large file via mmap */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }
    if (st.st_size == 0) { printf("empty file\n"); close(fd); return 0; }

    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    close(fd);

    /* Advise kernel we will read sequentially */
    madvise((void *)data, st.st_size, MADV_SEQUENTIAL);

    unsigned char target = 'e';
    size_t count = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if ((unsigned char)data[i] == target) count++;
    }

    printf("'%c' appears %zu times in %s (%ld bytes)\n",
           target, count, argv[1], (long)st.st_size);

    munmap((void *)data, st.st_size);
    return 0;
}

madvise() hints to the kernel how you plan to access the data:

Hint              Meaning
----              -------
MADV_SEQUENTIAL   Will read sequentially; prefetch aggressively
MADV_RANDOM       Will read randomly; do not prefetch
MADV_WILLNEED     Will need these pages soon; start loading
MADV_DONTNEED     Done with these pages; they can be reclaimed

Try It: Map /usr/share/dict/words (if available) and count how many words start with the letter 'z'. Compare the speed against read() in a loop.

Rust: The memmap2 Crate

The Rust standard library does not include mmap. The memmap2 crate provides a safe wrapper. Add to Cargo.toml:

[dependencies]
memmap2 = "0.9"

// mmap_read.rs -- read a file via mmap in Rust
use memmap2::Mmap;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: mmap_read <file>");

    let file = File::open(&path)?;
    let mmap = unsafe { Mmap::map(&file)? };

    // mmap implements Deref<Target=[u8]>, so we can use it as a byte slice
    println!("file size: {} bytes", mmap.len());

    let preview = std::cmp::min(80, mmap.len());
    let text = String::from_utf8_lossy(&mmap[..preview]);
    println!("first {} bytes:\n{}", preview, text);

    let lines = mmap.iter().filter(|&&b| b == b'\n').count();
    println!("total lines: {}", lines);

    Ok(())
    // mmap is automatically unmapped when dropped
}

Rust Note: Mmap::map() is unsafe because the file could be modified by another process or truncated while you hold the mapping, causing undefined behavior (SIGBUS). This is the same risk as in C -- mmap is inherently a shared-memory interface.

Writable mmap in Rust

// mmap_write.rs -- modify a file via mmap in Rust
use memmap2::MmapMut;
use std::fs::OpenOptions;

fn main() -> std::io::Result<()> {
    let path = "/tmp/mmap_write_rs.txt";
    std::fs::write(path, b"Hello, World! Memory-mapped Rust.\n")?;

    let file = OpenOptions::new().read(true).write(true).open(path)?;
    let mut mmap = unsafe { MmapMut::map_mut(&file)? };

    println!("before: {}", String::from_utf8_lossy(&mmap[..]));

    mmap[..5].copy_from_slice(b"HOWDY");
    mmap.flush()?;

    println!("after:  {}", String::from_utf8_lossy(&mmap[..]));

    let contents = std::fs::read_to_string(path)?;
    print!("verify: {}", contents);
    Ok(())
}

mprotect: Guard Pages

mprotect() changes the protection on an existing mapping. One use case is guard pages -- regions marked PROT_NONE that cause a segfault on access, used to detect stack overflows or buffer overruns.

/* guard_page.c -- use mprotect to create a guard page */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    /* printf is not async-signal-safe; tolerable here only because
       the handler exits immediately afterward */
    printf("caught %s at address %p\n",
           sig == SIGSEGV ? "SIGSEGV" : "SIGBUS",
           info->si_addr);
    _exit(1);
}

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    void *region = mmap(NULL, 2 * page_size,
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    void *guard = (char *)region + page_size;
    if (mprotect(guard, page_size, PROT_NONE) == -1) {
        perror("mprotect");
        return 1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    char *usable = (char *)region;
    usable[0] = 'A';
    printf("wrote to usable page OK\n");

    printf("about to touch guard page...\n");
    char *bad = (char *)guard;
    bad[0] = 'B';  /* triggers SIGSEGV */

    munmap(region, 2 * page_size);
    return 0;
}
$ gcc -Wall -o guard_page guard_page.c && ./guard_page
wrote to usable page OK
about to touch guard page...
caught SIGSEGV at address 0x7f8a12341000
Memory layout:
+-------------------+-------------------+
|   usable page     |   guard page      |
|   PROT_READ |     |   PROT_NONE       |
|   PROT_WRITE      |   (any access =   |
|                   |    SIGSEGV)       |
+-------------------+-------------------+
^                   ^
region              region + page_size

mmap for Device Register Access (Preview)

In embedded and driver work, hardware registers live at fixed physical addresses. User-space programs can access them by mapping /dev/mem:

/* devmem_preview.c -- concept only, requires root */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

int main(void)
{
    off_t phys_addr = 0xFE200000;  /* hypothetical GPIO base */
    size_t page_size = sysconf(_SC_PAGESIZE);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd == -1) { perror("open /dev/mem (need root)"); return 1; }

    void *map = mmap(NULL, page_size,
                     PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, phys_addr);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    volatile uint32_t *regs = (volatile uint32_t *)map;
    uint32_t val = regs[0];
    printf("register at 0x%lx = 0x%08x\n", (long)phys_addr, val);

    munmap(map, page_size);
    close(fd);
    return 0;
}

Caution: Writing to the wrong physical address via /dev/mem can crash your system, corrupt data, or damage hardware. Production systems use proper kernel drivers with request_mem_region() and ioremap().

The volatile keyword is critical. Without it, the compiler may optimize away reads and writes to hardware registers. Hardware registers are side-effectful -- reading a status register may clear an interrupt flag.

Driver Prep: Device drivers map hardware registers into user space via the driver's mmap file_operation, which calls remap_pfn_range(). The user-space pattern is always: open device fd, mmap it, read/write through pointers.

Comparing I/O Methods

Method          Copies    Syscalls/access   Best for
-----------     ------    ---------------   --------
read()/write()  1-2       1 per call        Small files, streaming
stdio (fread)   2         ~1 per buffer     General purpose
mmap            0         0 (after fault)   Large files, random access,
                                            shared memory, device regs

mmap wins on: zero-copy access, automatic caching, cheap random access, and inter-process shared memory. It loses on: overhead for small files (minimum one page), harder error handling (SIGBUS), inability to grow without remapping, and inability to work with pipes, sockets, or non-seekable files.

Quick Knowledge Check

  1. What is the difference between MAP_SHARED and MAP_PRIVATE when you write to a mapped page?

  2. Why must you call msync() if you need to guarantee data has reached disk?

  3. What signal does the kernel deliver if you access an mmap'd region after the file has been truncated shorter than the mapping?

Common Pitfalls

  • Mapping a zero-length file. mmap with length 0 returns an error. Always check st_size before mapping.

  • Forgetting munmap(). Leaked mappings consume virtual address space. In long-running processes this eventually causes mmap to fail.

  • Ignoring SIGBUS. If the file is truncated while mapped, accessing beyond the new end delivers SIGBUS, not SIGSEGV.

  • Using MAP_FIXED casually. It silently overwrites existing mappings.

  • Writing past the mapping size. The mapping covers exactly the bytes you requested. Writing beyond it is a segfault.

  • Missing volatile on device registers. The compiler will optimize away your hardware accesses without it.

  • Forgetting O_SYNC for device memory. Without it, the kernel may use caching that reorders stores to device registers.

Creating Processes: fork, exec, wait

Every program you run from a shell starts as a clone. The kernel duplicates the running process, then the clone replaces itself with a new program. This fork-exec pattern is the foundation of Unix process creation, and understanding it is non-negotiable for systems work.

fork(): Duplicating a Process

fork() creates an almost-exact copy of the calling process. The parent gets the child's PID as the return value; the child gets zero.

/* fork_basic.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    printf("Before fork: PID = %d\n", getpid());

    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child process */
        printf("Child:  PID = %d, parent PID = %d\n", getpid(), getppid());
    } else {
        /* Parent process */
        printf("Parent: PID = %d, child PID = %d\n", getpid(), pid);
    }

    return 0;
}

Compile and run:

$ gcc -o fork_basic fork_basic.c
$ ./fork_basic
Before fork: PID = 1234
Parent: PID = 1234, child PID = 1235
Child:  PID = 1235, parent PID = 1234

The "Before fork" line prints once. After fork(), two processes execute the same code. The return value tells each process which role it plays.

         fork()
           |
     +-----+-----+
     |             |
  Parent          Child
  pid > 0         pid == 0
  (original)      (copy)

Caution: After fork(), the child inherits copies of all open file descriptors. If both parent and child write to the same fd without coordination, output will interleave unpredictably.

What the Child Inherits

The child gets copies of:

  • Memory (stack, heap, data, text segments -- copy-on-write)
  • Open file descriptors
  • Signal dispositions
  • Environment variables
  • Current working directory
  • umask

The child gets its own:

  • PID
  • Parent PID (set to the forking process)
  • Pending signals (cleared)
  • File locks (not inherited)
Parent Memory           Child Memory (after fork)
+------------------+    +------------------+
| text  (shared)   |    | text  (shared)   |
| data             |    | data  (COW copy) |
| heap             |    | heap  (COW copy) |
| stack            |    | stack (COW copy) |
| fd table [0,1,2] |    | fd table [0,1,2] |
+------------------+    +------------------+
         \                      /
          \-----> kernel <-----/
           (same open file descriptions)

Try It: Add a variable int x = 42; before fork(). In the child, set x = 99; and print it. In the parent, sleep one second, then print x. Confirm the parent still sees 42.

exec(): Replacing the Process Image

fork() gives you a clone. exec() replaces that clone with a different program entirely. The exec family includes execl, execlp, execle, execv, execvp, and execvpe. They differ in how arguments are passed (inline list vs. array), whether PATH is searched, and whether you supply the environment explicitly.

/* exec_basic.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("About to exec 'ls -la /tmp'\n");

    /* execlp searches PATH for the binary */
    execlp("ls", "ls", "-la", "/tmp", (char *)NULL);

    /* If exec returns, it failed */
    perror("execlp");
    return 1;
}

After a successful exec(), the calling process's code, data, and stack are replaced. The PID stays the same. Open file descriptors without FD_CLOEXEC remain open.

Caution: If exec() returns at all, it has failed. Always follow an exec() call with error handling.

The fork-exec Pattern

This is the standard Unix way to run a new program:

/* fork_exec.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: replace self with 'date' */
        execlp("date", "date", "+%Y-%m-%d %H:%M:%S", (char *)NULL);
        perror("execlp");
        _exit(127);  /* Use _exit in child after failed exec */
    }

    /* Parent: wait for child to finish */
    int status;
    waitpid(pid, &status, 0);

    if (WIFEXITED(status)) {
        printf("Child exited with status %d\n", WEXITSTATUS(status));
    }

    return 0;
}

Caution: In the child after a failed exec(), use _exit() instead of exit(). The exit() function flushes stdio buffers -- which are copies from the parent. This can cause duplicated output.

wait() and waitpid(): Reaping Children

When a child process terminates, the kernel keeps a small record of its exit status until the parent retrieves it. Until then, the child is a zombie.

/* wait_status.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        printf("Child running, PID = %d\n", getpid());
        exit(42);
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);

    if (waited < 0) {
        perror("waitpid");
        return 1;
    }

    if (WIFEXITED(status)) {
        printf("Child %d exited normally, status = %d\n",
               waited, WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        printf("Child %d killed by signal %d\n",
               waited, WTERMSIG(status));
    } else if (WIFSTOPPED(status)) {
        printf("Child %d stopped by signal %d\n",
               waited, WSTOPSIG(status));
    }

    return 0;
}

The status macros decode the packed integer:

Macro             Meaning
-----             -------
WIFEXITED(s)      True if the child exited normally
WEXITSTATUS(s)    Exit code (0-255)
WIFSIGNALED(s)    True if killed by a signal
WTERMSIG(s)       Signal that killed it
WIFSTOPPED(s)     True if stopped (traced)
WSTOPSIG(s)       Signal that stopped it

The Zombie Problem

A zombie is a process that has exited but whose parent has not called wait(). It occupies a slot in the process table.

/* zombie.c -- creates a zombie for 30 seconds */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child exiting immediately\n");
        exit(0);
    }

    printf("Parent sleeping 30s -- child %d is a zombie\n", pid);
    printf("Run: ps aux | grep Z\n");
    sleep(30);

    /* Never reaps the child -- zombie persists until parent exits */
    return 0;
}

Try It: Run the zombie program. In another terminal, run ps aux | grep Z to see the zombie entry. Note the Z+ in the STAT column.

To avoid zombies in long-running servers, either:

  1. Call waitpid() periodically or after SIGCHLD.
  2. Set SIGCHLD's disposition to SIG_IGN (POSIX-standard: exited children are reaped automatically).
  3. Double-fork (child forks again, middle process exits immediately).

Rust: std::process::Command

Rust's standard library wraps fork-exec-wait into a safe, ergonomic API.

// command_basic.rs
use std::process::Command;

fn main() {
    let output = Command::new("date")
        .arg("+%Y-%m-%d %H:%M:%S")
        .output()
        .expect("failed to execute 'date'");

    println!("stdout: {}", String::from_utf8_lossy(&output.stdout));
    println!("status: {}", output.status);
}

Command::new does not fork immediately. Calling .output() forks, execs, and waits, returning the child's captured stdout and stderr along with its exit status. For streaming output:

// command_stream.rs
use std::process::{Command, Stdio};
use std::io::{BufRead, BufReader};

fn main() {
    let mut child = Command::new("ls")
        .arg("-la")
        .arg("/tmp")
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn");

    if let Some(stdout) = child.stdout.take() {
        let reader = BufReader::new(stdout);
        for line in reader.lines() {
            let line = line.expect("read error");
            println!("LINE: {}", line);
        }
    }

    let status = child.wait().expect("wait failed");
    println!("Exit status: {}", status);
}

Rust Note: Command handles the fork-exec-wait dance, fd cleanup, and error propagation. You never touch raw PIDs. Note that dropping a Child handle does not wait on the process -- if you never call wait(), the child is detached and can linger as a zombie until your process exits.

Rust: Raw fork with the nix Crate

When you need the full power of fork(), the nix crate provides it:

// fork_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "signal"] }
use nix::unistd::{fork, ForkResult, getpid, getppid};
use nix::sys::wait::waitpid;
use std::process::exit;

fn main() {
    println!("Before fork: PID = {}", getpid());

    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Parent { child } => {
            println!("Parent: PID = {}, child = {}", getpid(), child);
            let status = waitpid(child, None).expect("waitpid failed");
            println!("Child exited: {:?}", status);
        }
        ForkResult::Child => {
            println!("Child: PID = {}, parent = {}", getpid(), getppid());
            exit(0);
        }
    }
}

Rust Note: fork() is unsafe in Rust because only the calling thread exists in the child, while the entire address space -- including mutexes held by other threads -- is duplicated. A mutex locked by another thread at fork time stays locked forever in the child. Prefer Command unless you need pre-exec setup.

A Minimal Shell in 50 Lines

Putting it all together -- a shell that reads commands and runs them:

/* minishell.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define MAX_ARGS 64
#define MAX_LINE 1024

int main(void)
{
    char line[MAX_LINE];
    char *args[MAX_ARGS];

    for (;;) {
        printf("mini$ ");
        fflush(stdout);

        if (!fgets(line, sizeof(line), stdin))
            break;

        /* Strip newline */
        line[strcspn(line, "\n")] = '\0';
        if (line[0] == '\0')
            continue;

        /* Built-in: exit */
        if (strcmp(line, "exit") == 0)
            break;

        /* Tokenize */
        int argc = 0;
        char *tok = strtok(line, " \t");
        while (tok && argc < MAX_ARGS - 1) {
            args[argc++] = tok;
            tok = strtok(NULL, " \t");
        }
        args[argc] = NULL;

        /* Fork-exec */
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            continue;
        }
        if (pid == 0) {
            execvp(args[0], args);
            perror(args[0]);
            _exit(127);
        }

        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
            printf("[exit %d]\n", WEXITSTATUS(status));
    }

    printf("\n");
    return 0;
}

Try It: Extend the mini shell to support the cd built-in command. Hint: cd must be handled by the parent process since chdir() in a child only affects the child.

Driver Prep: Kernel modules do not use fork/exec -- the kernel spawns kernel threads with kthread_create(). But user-space driver helpers, udev rules, and firmware loaders all rely on fork-exec. Understanding this pattern is essential for writing device manager daemons.

Knowledge Check

  1. What value does fork() return to the child process? What does the parent receive?

  2. Why should you call _exit() instead of exit() in a child process after a failed exec()?

  3. What is a zombie process, and how do you prevent zombies in a long-running server?

Common Pitfalls

  • Forgetting to wait: Every fork() needs a corresponding wait() or SIGCHLD handler. Otherwise: zombies.

  • Using exit() after failed exec in child: Flushes the parent's buffered stdio. Use _exit().

  • Assuming execution order: After fork(), the scheduler decides who runs first. Do not assume parent runs before child or vice versa.

  • Fork bombs: A loop that calls fork() unconditionally will exhaust the process table. Always guard fork with proper termination logic.

  • Ignoring exec failure: If exec() returns, it failed. Handle it.

  • Sharing file descriptors carelessly: Both parent and child share the same open file descriptions. Close fds you do not need in each process.

Process Groups, Sessions, and Daemons

Unix organizes processes into groups and sessions. This hierarchy controls which processes receive signals from the terminal, how job control works, and how daemons detach from everything. If you write a server, a driver helper, or anything that outlives a login session, you need this.

The Process Hierarchy

Session (SID)
  |
  +-- Process Group (PGID) -- foreground job
  |     +-- Process (PID)
  |     +-- Process (PID)
  |
  +-- Process Group (PGID) -- background job
  |     +-- Process (PID)
  |
  +-- Process Group (PGID) -- background job
        +-- Process (PID)
        +-- Process (PID)

A session is a collection of process groups, typically one per login. A process group is a collection of processes, typically one per pipeline. The session leader is the process that called setsid() -- usually the shell.

Process Groups

Every process belongs to a process group. The group is identified by the PID of its leader.

/* pgid_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    printf("Parent: PID=%d  PGID=%d  SID=%d\n",
           getpid(), getpgrp(), getsid(0));

    pid_t pid = fork();

    if (pid == 0) {
        printf("Child before setpgid: PID=%d  PGID=%d\n",
               getpid(), getpgrp());

        /* Put child in its own process group */
        setpgid(0, 0);

        printf("Child after  setpgid: PID=%d  PGID=%d\n",
               getpid(), getpgrp());
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    return 0;
}
$ gcc -o pgid_demo pgid_demo.c && ./pgid_demo
Parent: PID=5000  PGID=5000  SID=4900
Child before setpgid: PID=5001  PGID=5000
Child after  setpgid: PID=5001  PGID=5001

setpgid(0, 0) means "set my PGID to my own PID" -- making the calling process a new group leader.

Why Process Groups Matter

When you press Ctrl+C in a terminal, the kernel sends SIGINT to the entire foreground process group, not just one process. A pipeline like cat file | grep pattern | wc -l runs as three processes in one group, so Ctrl+C kills them all.

Terminal (controlling terminal)
  |
  | SIGINT (Ctrl+C)
  v
Foreground Process Group
  +-- cat   (receives SIGINT)
  +-- grep  (receives SIGINT)
  +-- wc    (receives SIGINT)

Sessions and Controlling Terminals

A session is created by setsid(). The calling process becomes the session leader and is disconnected from any controlling terminal.

/* session_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    printf("Original: PID=%d  PGID=%d  SID=%d\n",
           getpid(), getpgrp(), getsid(0));

    pid_t pid = fork();

    if (pid == 0) {
        /* setsid() fails if caller is already a group leader,
           so we fork first to guarantee we are not */
        pid_t new_sid = setsid();
        if (new_sid < 0) {
            perror("setsid");
            _exit(1);
        }

        printf("Child:    PID=%d  PGID=%d  SID=%d\n",
               getpid(), getpgrp(), getsid(0));

        /* Now PID == PGID == SID -- session leader */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    return 0;
}

Caution: setsid() fails if the calling process is already a process group leader (PID == PGID). The standard trick: fork first, then call setsid() in the child.

Job Control Basics

The shell manages foreground and background jobs by manipulating process groups and the terminal's foreground group.

Action              Shell command   What happens
------              -------------   ------------
Run foreground      ./prog          Shell makes prog's PGID the terminal foreground group
Run background      ./prog &        Shell keeps its own PGID as the foreground group
Suspend             Ctrl+Z          Kernel sends SIGTSTP to the foreground group
Resume foreground   fg              Shell calls tcsetpgrp() + sends SIGCONT
Resume background   bg              Shell sends SIGCONT without changing the foreground group

/* fg_group.c -- show foreground process group */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t fg = tcgetpgrp(STDIN_FILENO);
    printf("Foreground PGID: %d\n", fg);
    printf("My PID:          %d\n", getpid());
    printf("My PGID:         %d\n", getpgrp());
    return 0;
}

Try It: Run fg_group normally, then run it with ./fg_group &. Compare the foreground PGID to your PGID in each case.

The Classic Daemon Recipe

A daemon is a process that runs in the background, detached from any terminal. The traditional recipe:

/* daemon_classic.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>

static void daemonize(void)
{
    /* Step 1: Fork and let parent exit */
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) _exit(0);   /* Parent exits */

    /* Step 2: Create new session */
    if (setsid() < 0) { perror("setsid"); exit(1); }

    /* Step 3: Ignore SIGHUP, fork again to prevent
       acquiring a controlling terminal */
    signal(SIGHUP, SIG_IGN);

    pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) _exit(0);   /* First child exits */

    /* Step 4: Change working directory */
    chdir("/");

    /* Step 5: Reset umask */
    umask(0);

    /* Step 6: Close all open file descriptors */
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0) maxfd = 1024;   /* fall back if the limit is unknown */
    for (long fd = maxfd - 1; fd >= 0; fd--)
        close(fd);

    /* Step 7: Redirect stdin/stdout/stderr to /dev/null */
    open("/dev/null", O_RDWR);   /* stdin  = fd 0 */
    dup(0);                       /* stdout = fd 1 */
    dup(0);                       /* stderr = fd 2 */
}

int main(void)
{
    daemonize();

    /* Daemon work loop */
    FILE *log = fopen("/tmp/daemon_demo.log", "a");
    if (!log) _exit(1);

    for (int i = 0; i < 10; i++) {
        fprintf(log, "Daemon tick %d, PID=%d\n", i, getpid());
        fflush(log);
        sleep(2);
    }

    fclose(log);
    return 0;
}

The double-fork pattern:

Shell
  |
  +-- fork() --> Parent exits (shell gets exit status)
        |
        +-- setsid() --> New session, no controlling terminal
              |
              +-- fork() --> First child exits
                    |
                    +-- Daemon (not session leader,
                        cannot acquire controlling terminal)

Caution: The double fork is critical. A session leader that opens a terminal device can acquire it as a controlling terminal. The second fork ensures the daemon is not a session leader.

The Modern Way: systemd

On modern Linux systems, systemd manages daemons. You do not need to daemonize manually. Instead, write a simple foreground program and let systemd handle it.

A systemd service file:

# /etc/systemd/system/myservice.service
[Unit]
Description=My Demo Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/myservice
Restart=on-failure
User=nobody

[Install]
WantedBy=multi-user.target

Your program just runs in the foreground:

/* myservice.c -- systemd-friendly daemon */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

static volatile sig_atomic_t running = 1;

static void handle_term(int sig)
{
    (void)sig;
    running = 0;
}

int main(void)
{
    signal(SIGTERM, handle_term);

    while (running) {
        printf("Service tick, PID=%d\n", getpid());
        fflush(stdout);
        sleep(5);
    }

    printf("Service shutting down\n");
    return 0;
}

systemd captures stdout to the journal. No syslog gymnastics needed.

Driver Prep: Kernel drivers do not daemonize -- they are loaded into kernel space. But user-space driver companions (firmware loaders, device managers, monitoring daemons) absolutely do. The udevd daemon is a perfect example: it manages device nodes and runs in the background from boot.

Rust: Daemon Patterns with nix

// daemon_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "signal", "fs"] }
use nix::unistd::{fork, ForkResult, setsid, chdir, close, dup2};
use nix::sys::stat::umask;
use nix::sys::stat::Mode;
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;
use std::process::exit;
use std::io::Write;
use std::thread;
use std::time::Duration;

fn daemonize() {
    // First fork
    match unsafe { fork() }.expect("first fork failed") {
        ForkResult::Parent { .. } => exit(0),
        ForkResult::Child => {}
    }

    // New session
    setsid().expect("setsid failed");

    // Second fork
    match unsafe { fork() }.expect("second fork failed") {
        ForkResult::Parent { .. } => exit(0),
        ForkResult::Child => {}
    }

    // Change directory
    chdir("/").expect("chdir failed");

    // Reset umask
    umask(Mode::empty());

    // Redirect std fds to /dev/null
    let devnull = OpenOptions::new()
        .read(true)
        .write(true)
        .open("/dev/null")
        .expect("open /dev/null");

    let fd = devnull.as_raw_fd();
    dup2(fd, 0).ok();
    dup2(fd, 1).ok();
    dup2(fd, 2).ok();
    if fd > 2 {
        close(fd).ok();
    }
}

fn main() {
    daemonize();

    let mut log = OpenOptions::new()
        .create(true)
        .append(true)
        .open("/tmp/rust_daemon.log")
        .expect("open log");

    for i in 0..10 {
        writeln!(log, "Rust daemon tick {}, PID={}", i, std::process::id())
            .expect("write log");
        log.flush().expect("flush");
        thread::sleep(Duration::from_secs(2));
    }
}

Rust Note: In practice, most Rust services run as simple foreground processes under systemd. The daemonize crate exists for cases where you truly need the classic pattern, but it is increasingly rare.

Rust: Process Groups with nix

// pgid_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process"] }
use nix::unistd::{fork, ForkResult, getpid, getpgrp, setpgid, Pid};
use nix::sys::wait::waitpid;
use std::process::exit;

fn main() {
    println!("Parent: PID={} PGID={}", getpid(), getpgrp());

    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Parent { child } => {
            waitpid(child, None).expect("waitpid");
        }
        ForkResult::Child => {
            println!("Child before: PID={} PGID={}", getpid(), getpgrp());
            setpgid(Pid::from_raw(0), Pid::from_raw(0))
                .expect("setpgid");
            println!("Child after:  PID={} PGID={}", getpid(), getpgrp());
            exit(0);
        }
    }
}

A Process Hierarchy Inspector

This utility prints the session, process group, and parent for the current process:

/* proc_info.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("PID:                 %d\n", getpid());
    printf("Parent PID:          %d\n", getppid());
    printf("Process Group ID:    %d\n", getpgrp());
    printf("Session ID:          %d\n", getsid(0));
    printf("Foreground PGID:     %d\n", tcgetpgrp(STDIN_FILENO));
    printf("Is session leader:   %s\n",
           getpid() == getsid(0) ? "yes" : "no");
    printf("Is group leader:     %s\n",
           getpid() == getpgrp() ? "yes" : "no");
    return 0;
}

Try It: Run proc_info from a shell. Then run cat | ./proc_info (pipe into it). Compare the Session ID and Foreground PGID. Why does the PGID change in the piped case?

Sending Signals to Process Groups

The kill() system call with a negative PID sends a signal to an entire process group:

/* kill_group.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

int main(void)
{
    /* Create 3 children in the same new process group */
    pid_t first_child = 0;

    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            if (i == 0) setpgid(0, 0);
            else        setpgid(0, first_child);
            printf("Child %d: PID=%d PGID=%d\n", i, getpid(), getpgrp());
            pause();  /* Wait for signal */
            _exit(0);
        }
        if (i == 0) first_child = pid;
        setpgid(pid, first_child);
    }

    sleep(1);
    printf("Parent: sending SIGTERM to group %d\n", first_child);
    kill(-first_child, SIGTERM);  /* Negative PID = process group */

    for (int i = 0; i < 3; i++) {
        int status;
        pid_t w = wait(&status);
        if (WIFSIGNALED(status))
            printf("Reaped %d, killed by signal %d\n", w, WTERMSIG(status));
    }

    return 0;
}

                  kill(-PGID, SIGTERM)
                         |
              +----------+---------+
              |          |         |
           Child 0    Child 1   Child 2
           (all share the same PGID)

Knowledge Check

  1. What does setsid() do, and why must the caller not be a process group leader?

  2. Why does the classic daemon recipe fork twice?

  3. When you press Ctrl+C in a terminal, which processes receive SIGINT?

Common Pitfalls

  • Calling setsid() as group leader: It fails with EPERM. Fork first.

  • Single fork for daemons: The process remains a session leader and can accidentally acquire a controlling terminal.

  • Forgetting to close file descriptors: Inherited fds can hold locks, keep files open, or leak information. Close them all.

  • Not redirecting stdio: After closing all descriptors, the next file the daemon opens lands on fd 0. Any library code that later writes to "stdout" or "stderr" then scribbles into that file. Redirect fds 0-2 to /dev/null.

  • Manual daemonization under systemd: If systemd starts your service, do not daemonize. systemd expects Type=simple services to stay in the foreground.

  • Ignoring SIGHUP in daemons: When the session leader exits, SIGHUP is sent to the session. Daemons must handle or ignore it.

Environment and Configuration

Every Unix process inherits a block of key-value strings from its parent. This environment block controls program behavior without code changes. Understanding how it works -- and how to combine it with command-line arguments and configuration files -- is essential for writing well-behaved Unix tools.

The Environment Block

The kernel passes the environment to a new process on the stack, right after the argument strings. Each entry is a KEY=VALUE string.

/* print_env.c */
#include <stdio.h>

extern char **environ;  /* Global pointer to environment array */

int main(void)
{
    for (char **ep = environ; *ep != NULL; ep++) {
        printf("%s\n", *ep);
    }
    return 0;
}
$ gcc -o print_env print_env.c
$ ./print_env | head -5
SHELL=/bin/bash
HOME=/home/user
PATH=/usr/local/bin:/usr/bin:/bin
LANG=en_US.UTF-8
TERM=xterm-256color

The layout in memory:

Stack (high address)
+----------------------------+
| environment strings        |
| "HOME=/home/user\0"       |
| "PATH=/usr/bin:/bin\0"    |
| ...                        |
+----------------------------+
| environ[0] -> "HOME=..."  |
| environ[1] -> "PATH=..."  |
| environ[N] -> NULL         |
+----------------------------+
| argv strings               |
| argv[0] -> "./print_env"  |
| argv[1] -> NULL            |
+----------------------------+
| argc = 1                   |
+----------------------------+
        (stack grows down)

Reading and Writing the Environment

/* env_ops.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Read */
    const char *home = getenv("HOME");
    if (home)
        printf("HOME = %s\n", home);
    else
        printf("HOME not set\n");

    /* Write -- adds or overwrites */
    setenv("MY_APP_DEBUG", "1", 1);  /* 1 = overwrite if exists */
    printf("MY_APP_DEBUG = %s\n", getenv("MY_APP_DEBUG"));

    /* Write without overwrite */
    setenv("MY_APP_DEBUG", "2", 0);  /* 0 = do not overwrite */
    printf("MY_APP_DEBUG = %s\n", getenv("MY_APP_DEBUG"));  /* Still "1" */

    /* Remove */
    unsetenv("MY_APP_DEBUG");
    printf("After unsetenv: %s\n",
           getenv("MY_APP_DEBUG") ? getenv("MY_APP_DEBUG") : "(null)");

    return 0;
}

Caution: putenv() inserts a pointer to your string directly into the environment. If that string is on the stack, it becomes a dangling pointer when the function returns. Prefer setenv(), which copies the string.

Caution: None of the environment functions are thread-safe. Calling setenv() or getenv() from multiple threads without synchronization is undefined behavior.

PATH Resolution and exec

When you call execlp() or execvp() (the "p" variants), the C library searches the directories listed in PATH for the binary. (The kernel's execve() itself takes a full path; the search is a libc convenience.)

/* path_search.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *path = getenv("PATH");
    if (!path) {
        printf("PATH not set\n");
        return 1;
    }

    printf("PATH directories:\n");

    /* strtok modifies the string, so copy it */
    char *copy = strdup(path);
    char *dir = strtok(copy, ":");

    int i = 0;
    while (dir) {
        printf("  [%d] %s\n", i++, dir);
        dir = strtok(NULL, ":");
    }

    free(copy);
    return 0;
}

The search order matters. If /usr/local/bin appears before /usr/bin, a binary in /usr/local/bin shadows the system version.

Caution: A PATH that includes . (current directory) or an empty component (like :/usr/bin -- note the leading colon) is a security risk. An attacker can place a malicious binary in the current directory.

Command-Line Parsing: getopt

getopt() is the traditional Unix way to parse command-line options.

/* getopt_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int verbose = 0;
    int count = 1;
    const char *output = NULL;
    int opt;

    while ((opt = getopt(argc, argv, "vc:o:")) != -1) {
        switch (opt) {
        case 'v':
            verbose = 1;
            break;
        case 'c':
            count = atoi(optarg);
            break;
        case 'o':
            output = optarg;
            break;
        default:
            fprintf(stderr, "Usage: %s [-v] [-c count] [-o output] [files...]\n",
                    argv[0]);
            return 1;
        }
    }

    printf("verbose=%d count=%d output=%s\n",
           verbose, count, output ? output : "(none)");

    /* Remaining arguments (non-option) */
    for (int i = optind; i < argc; i++)
        printf("arg: %s\n", argv[i]);

    return 0;
}
$ ./getopt_demo -v -c 5 -o result.txt file1.txt file2.txt
verbose=1 count=5 output=result.txt
arg: file1.txt
arg: file2.txt

The option string "vc:o:" means: -v takes no argument, -c and -o each require one (indicated by the colon).

Long Options: getopt_long

For modern tools, long options like --verbose are expected.

/* getopt_long_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

int main(int argc, char *argv[])
{
    int verbose = 0;
    int count = 1;
    const char *output = NULL;

    static struct option long_options[] = {
        {"verbose", no_argument,       NULL, 'v'},
        {"count",   required_argument, NULL, 'c'},
        {"output",  required_argument, NULL, 'o'},
        {"help",    no_argument,       NULL, 'h'},
        {NULL,      0,                 NULL,  0 }
    };

    int opt;
    while ((opt = getopt_long(argc, argv, "vc:o:h", long_options, NULL)) != -1) {
        switch (opt) {
        case 'v': verbose = 1; break;
        case 'c': count = atoi(optarg); break;
        case 'o': output = optarg; break;
        case 'h':
            printf("Usage: %s [--verbose] [--count N] [--output FILE]\n",
                   argv[0]);
            return 0;
        default:
            return 1;
        }
    }

    printf("verbose=%d count=%d output=%s\n",
           verbose, count, output ? output : "(none)");

    return 0;
}
$ ./getopt_long_demo --verbose --count 10 --output data.csv
verbose=1 count=10 output=data.csv

Rust: std::env

Rust's standard library provides safe environment access.

// env_demo.rs
use std::env;

fn main() {
    // Read
    match env::var("HOME") {
        Ok(val) => println!("HOME = {}", val),
        Err(_)  => println!("HOME not set"),
    }

    // Set (unsafe in the 2024 edition -- not thread-safe; wrap in
    // an unsafe block there)
    env::set_var("MY_APP_DEBUG", "1");
    println!("MY_APP_DEBUG = {}", env::var("MY_APP_DEBUG").unwrap());

    // Remove (same caveat)
    env::remove_var("MY_APP_DEBUG");

    // Iterate all
    println!("\nAll environment variables:");
    for (key, value) in env::vars() {
        println!("  {}={}", key, value);
    }

    // PATH directories
    if let Some(path) = env::var_os("PATH") {
        println!("\nPATH directories:");
        for dir in env::split_paths(&path) {
            println!("  {}", dir.display());
        }
    }
}

Rust Note: env::set_var() and env::remove_var() became unsafe functions in the Rust 2024 edition. The Rust team recognized the same thread-safety issue that plagues C's setenv(). Prefer reading the environment at startup and storing values in your own data structures.

Rust: Command-Line Parsing with clap

The clap crate is the standard Rust approach to argument parsing.

// clap_demo.rs
// Cargo.toml:
//   [dependencies]
//   clap = { version = "4", features = ["derive"] }

use clap::Parser;

/// A well-behaved Unix tool
#[derive(Parser, Debug)]
#[command(name = "mytool", version, about = "Does useful things")]
struct Args {
    /// Enable verbose output
    #[arg(short, long)]
    verbose: bool,

    /// Number of iterations
    #[arg(short, long, default_value_t = 1)]
    count: u32,

    /// Output file path
    #[arg(short, long)]
    output: Option<String>,

    /// Input files
    files: Vec<String>,
}

fn main() {
    let args = Args::parse();

    println!("verbose={} count={} output={:?}",
             args.verbose, args.count, args.output);

    for f in &args.files {
        println!("file: {}", f);
    }
}
$ cargo run -- --verbose --count 5 -o result.txt input1.dat input2.dat
verbose=true count=5 output=Some("result.txt")
file: input1.dat
file: input2.dat

The --help flag auto-generates usage text from the struct annotations.

Rust Note: clap with derive macros generates the help text, validation, and parsing code at compile time. The C equivalent requires writing all of this by hand or using a library like argp.

Configuration File Patterns

A well-behaved Unix tool checks configuration in this order (later overrides earlier):

1. Compiled-in defaults
2. System config:  /etc/myapp/config
3. User config:    ~/.config/myapp/config  (XDG_CONFIG_HOME)
4. Environment:    MYAPP_DEBUG=1
5. Command-line:   --debug

A minimal config file parser in C:

/* config_parse.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_LINE 256

struct config {
    int   port;
    int   verbose;
    char  logfile[256];
};

static void config_defaults(struct config *cfg)
{
    cfg->port = 8080;
    cfg->verbose = 0;
    strncpy(cfg->logfile, "/var/log/myapp.log", sizeof(cfg->logfile) - 1);
}

static int config_load(struct config *cfg, const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;

    char line[MAX_LINE];
    while (fgets(line, sizeof(line), f)) {
        /* Skip comments and empty lines */
        if (line[0] == '#' || line[0] == '\n')
            continue;

        char key[128], value[128];
        if (sscanf(line, "%127[^=]=%127[^\n]", key, value) == 2) {
            if (strcmp(key, "port") == 0)
                cfg->port = atoi(value);
            else if (strcmp(key, "verbose") == 0)
                cfg->verbose = atoi(value);
            else if (strcmp(key, "logfile") == 0)
                strncpy(cfg->logfile, value, sizeof(cfg->logfile) - 1);
        }
    }

    fclose(f);
    return 0;
}

int main(int argc, char *argv[])
{
    struct config cfg;
    config_defaults(&cfg);

    /* Try system config, then user config */
    config_load(&cfg, "/etc/myapp.conf");

    char user_conf[512];
    const char *home = getenv("HOME");
    if (home) {
        snprintf(user_conf, sizeof(user_conf), "%s/.myapp.conf", home);
        config_load(&cfg, user_conf);
    }

    /* Environment overrides */
    const char *env_port = getenv("MYAPP_PORT");
    if (env_port) cfg.port = atoi(env_port);

    printf("port=%d verbose=%d logfile=%s\n",
           cfg.port, cfg.verbose, cfg.logfile);

    return 0;
}

The /etc Convention

System-wide configuration lives under /etc. Per-application patterns:

Path                Purpose
/etc/myapp.conf     Single config file
/etc/myapp/         Config directory
/etc/myapp/conf.d/  Drop-in overrides (processed alphabetically)
/etc/default/myapp  Default environment for init scripts

Putting It Together: A Well-Behaved Unix Tool

Here is a complete C program that follows all conventions:

/* wellbehaved.c -- a well-behaved Unix tool */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>

static struct {
    int   verbose;
    int   count;
    const char *output;
} config = {
    .verbose = 0,
    .count   = 1,
    .output  = NULL,
};

static void usage(const char *prog)
{
    fprintf(stderr,
        "Usage: %s [OPTIONS] [FILE...]\n"
        "\n"
        "Options:\n"
        "  -v, --verbose       Enable verbose output\n"
        "  -c, --count=N       Number of iterations (default: 1)\n"
        "  -o, --output=FILE   Output file\n"
        "  -h, --help          Show this help\n"
        "\n"
        "Environment:\n"
        "  WELLBEHAVED_VERBOSE  Set to 1 for verbose mode\n"
        "  WELLBEHAVED_COUNT    Default iteration count\n",
        prog);
}

int main(int argc, char *argv[])
{
    /* 1. Environment */
    const char *env_v = getenv("WELLBEHAVED_VERBOSE");
    if (env_v && strcmp(env_v, "1") == 0)
        config.verbose = 1;

    const char *env_c = getenv("WELLBEHAVED_COUNT");
    if (env_c) config.count = atoi(env_c);

    /* 2. Command line (overrides environment) */
    static struct option long_opts[] = {
        {"verbose", no_argument,       NULL, 'v'},
        {"count",   required_argument, NULL, 'c'},
        {"output",  required_argument, NULL, 'o'},
        {"help",    no_argument,       NULL, 'h'},
        {NULL, 0, NULL, 0}
    };

    int opt;
    while ((opt = getopt_long(argc, argv, "vc:o:h", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'v': config.verbose = 1; break;
        case 'c': config.count = atoi(optarg); break;
        case 'o': config.output = optarg; break;
        case 'h': usage(argv[0]); return 0;
        default:  usage(argv[0]); return 1;
        }
    }

    /* 3. Act on stdin if no files given (Unix filter convention) */
    if (optind >= argc) {
        if (config.verbose)
            fprintf(stderr, "Reading from stdin...\n");
        /* Process stdin here */
    }

    /* 4. Process each file argument */
    for (int i = optind; i < argc; i++) {
        if (config.verbose)
            fprintf(stderr, "Processing: %s\n", argv[i]);

        FILE *f = fopen(argv[i], "r");
        if (!f) {
            fprintf(stderr, "%s: %s: %s\n", argv[0], argv[i], strerror(errno));
            continue;  /* Keep going -- do not abort on one bad file */
        }
        /* Process file here */
        fclose(f);
    }

    /* 5. Diagnostic output to stderr, data output to stdout */
    if (config.verbose)
        fprintf(stderr, "Done. Processed %d iteration(s).\n", config.count);

    return 0;
}

Key conventions this follows:

  • Diagnostic messages go to stderr, data to stdout
  • Works as a filter (reads stdin when no files given)
  • Continues on error (does not abort for one bad file)
  • Documents environment variables in --help
  • Uses exit code 0 for success, nonzero for failure

Try It: Write the Rust equivalent of wellbehaved.c using clap and std::env. Make it read from stdin when no files are given, using std::io::stdin().

Driver Prep: Kernel modules receive configuration through module parameters (module_param() macro) and device tree entries, not environment variables. But user-space tools that load, configure, and test drivers rely heavily on environment and command-line patterns. Tools like modprobe read /etc/modprobe.d/ for configuration.

Knowledge Check

  1. What is the order of precedence when a program checks compiled-in defaults, environment variables, and command-line arguments?

  2. Why is putenv() dangerous compared to setenv()?

  3. What does a leading colon or dot in PATH mean, and why is it a security risk?

Common Pitfalls

  • Not checking getenv() return value: It returns NULL if the variable is not set. Passing NULL to strcmp() or printf("%s", ...) is undefined behavior.

  • Modifying the string returned by getenv(): The returned pointer may point into the environment block. Modifying it has undefined behavior. Copy it first.

  • Thread-unsafe environment access: setenv() and getenv() are not thread-safe. Read everything you need at startup.

  • Hardcoding paths: Use environment variables (HOME, XDG_CONFIG_HOME) or /etc conventions. Never assume a home directory path.

  • Ignoring stdin: Unix tools that accept files should also work as filters. If no files are given, read from stdin.

  • Error messages to stdout: Diagnostic output must go to stderr so it does not corrupt piped data.

Signal Fundamentals

Signals are asynchronous notifications delivered by the kernel to a process. They interrupt whatever the process is doing -- right now, at any instruction boundary. They are Unix's oldest form of inter-process communication, and they are everywhere: Ctrl+C, child process death, illegal memory access, broken pipes. You cannot write robust systems code without understanding them.

What Signals Are

A signal is a small integer sent from the kernel (or another process) to a target process. When a signal arrives, the process can:

  1. Run a handler function (custom code).
  2. Accept the default action (terminate, core dump, ignore, or stop).
  3. Block the signal temporarily (it stays pending).

  Kernel / Other Process
         |
         | signal (e.g., SIGINT)
         v
  +-----------------+
  | Target Process  |
  |                 |
  |  Normal code    |  <-- interrupted
  |  ...            |
  |  Handler runs   |  <-- if installed
  |  ...            |
  |  Normal code    |  <-- resumes
  +-----------------+

Common Signals

Signal    Number  Default Action  Trigger
SIGHUP       1    Terminate       Terminal hangup
SIGINT       2    Terminate       Ctrl+C
SIGQUIT      3    Core dump       Ctrl+\
SIGILL       4    Core dump       Illegal instruction
SIGABRT      6    Core dump       abort()
SIGFPE       8    Core dump       Divide by zero (integer)
SIGKILL      9    Terminate       Uncatchable kill
SIGUSR1     10    Terminate       User-defined
SIGSEGV     11    Core dump       Bad memory access
SIGUSR2     12    Terminate       User-defined
SIGPIPE     13    Terminate       Write to broken pipe
SIGALRM     14    Terminate       alarm() timer
SIGTERM     15    Terminate       Polite termination request
SIGCHLD     17    Ignore          Child process stopped/exited
SIGCONT     18    Continue        Resume stopped process
SIGSTOP     19    Stop            Uncatchable stop
SIGTSTP     20    Stop            Ctrl+Z

(Numbers shown are for x86-64 and ARM Linux; a few differ on other architectures.)

Caution: SIGKILL (9) and SIGSTOP (19) cannot be caught, blocked, or ignored. The kernel enforces this. Do not waste time trying to handle them.

Default Actions

There are four possible default actions:

  • Terminate: Process exits.
  • Core dump: Process exits and writes a core file (if enabled).
  • Ignore: Signal is silently discarded.
  • Stop: Process is suspended (like Ctrl+Z).

SIGCHLD and SIGURG default to ignore. Most signals default to terminate.

Sending Signals

From the shell:

$ kill -TERM 1234       # Send SIGTERM to PID 1234
$ kill -9 1234          # Send SIGKILL (uncatchable)
$ kill -SIGUSR1 1234    # Send SIGUSR1
$ kill -0 1234          # Test if process exists (no signal sent)

From C:

/* send_signal.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        printf("Child %d: waiting for signal...\n", getpid());
        pause();  /* Suspend until any signal arrives */
        printf("Child: this line never reached (default SIGTERM kills)\n");
        _exit(0);
    }

    sleep(1);
    printf("Parent: sending SIGTERM to child %d\n", pid);
    kill(pid, SIGTERM);

    int status;
    waitpid(pid, &status, 0);

    if (WIFSIGNALED(status))
        printf("Child killed by signal %d\n", WTERMSIG(status));

    return 0;
}

The signal() Function (And Why You Should Not Use It)

The venerable signal() function -- inherited from early Unix and standardized by C -- installs a handler:

/* signal_old.c -- for demonstration only */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static void handler(int sig)
{
    /* UNSAFE: printf is not async-signal-safe -- demo only */
    printf("Caught signal %d\n", sig);
}

int main(void)
{
    signal(SIGINT, handler);

    printf("PID %d: Press Ctrl+C -- the handler catches it\n", getpid());

    for (int i = 0; i < 30; i++) {
        printf("tick %d\n", i);
        sleep(1);
    }

    return 0;
}

Caution: signal() has portability problems. On some systems it resets the handler to SIG_DFL after each delivery (System V behavior). On others it does not (BSD behavior). The behavior of signal() is implementation-defined by POSIX. Always use sigaction() instead (covered in the next chapter).

A Signal Demo: Handling SIGINT and SIGTERM

Here is a proper pattern using a flag (still using signal() for simplicity -- we will fix this with sigaction() in the next chapter):

/* graceful_shutdown.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    got_signal = sig;
}

int main(void)
{
    signal(SIGINT,  handler);
    signal(SIGTERM, handler);

    printf("PID %d: Running... (Ctrl+C to stop)\n", getpid());

    while (!got_signal) {
        /* Main work loop */
        printf("Working...\n");
        sleep(2);
    }

    printf("Received signal %d, shutting down gracefully.\n", got_signal);
    /* Cleanup code here */

    return 0;
}

The key type is volatile sig_atomic_t -- an integer type guaranteed to be read and written atomically with respect to signal delivery.

Main thread:              Signal delivery:

while (!got_signal) {     handler(SIGINT) {
    work();                   got_signal = SIGINT;
    sleep(2);             }
}
  |                         |
  +---- reads got_signal ---+

Try It: Modify graceful_shutdown.c to count how many times Ctrl+C is pressed. After 3 presses, exit. Print the count in the main loop.

SIGCHLD: Child Process Notifications

When a child process exits, the kernel sends SIGCHLD to the parent. This is how servers avoid blocking on waitpid():

/* sigchld_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <errno.h>

static void sigchld_handler(int sig)
{
    (void)sig;
    int saved_errno = errno;

    /* Reap all dead children (non-blocking) */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;

    errno = saved_errno;
}

int main(void)
{
    signal(SIGCHLD, sigchld_handler);

    /* Spawn 3 children that exit at different times */
    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            sleep(i + 1);
            printf("Child %d (PID %d) exiting\n", i, getpid());
            _exit(0);
        }
        printf("Spawned child %d: PID %d\n", i, pid);
    }

    /* Parent does other work */
    for (int i = 0; i < 5; i++) {
        printf("Parent working (tick %d)...\n", i);
        sleep(2);
    }

    return 0;
}

Caution: The SIGCHLD handler must call waitpid() in a loop with WNOHANG. Multiple children can exit before the handler runs, and standard signals are not queued -- a single SIGCHLD delivery may represent several dead children.

SIGPIPE: Broken Pipes

When you write to a pipe or socket whose read end is closed, the kernel sends SIGPIPE. The default action kills the process.

/* sigpipe_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    /* Ignore SIGPIPE -- check write() return instead */
    signal(SIGPIPE, SIG_IGN);

    int pipefd[2];
    pipe(pipefd);

    /* Close the read end immediately */
    close(pipefd[0]);

    /* Write to the broken pipe */
    const char *msg = "Hello, pipe!\n";
    ssize_t n = write(pipefd[1], msg, strlen(msg));

    if (n < 0) {
        printf("Write failed: %s (errno=%d)\n", strerror(errno), errno);
        /* errno == EPIPE */
    } else {
        printf("Wrote %zd bytes\n", n);
    }

    close(pipefd[1]);
    return 0;
}

Driver Prep: Kernel drivers handle signals indirectly. When a user-space process receives a signal while blocked in a system call, the kernel returns EINTR (or ERESTARTSYS internally). Driver code must check for signal_pending(current) and return -ERESTARTSYS so the VFS layer can restart or abort the system call.

Rust: The signal-hook Crate

Rust has no built-in signal handling in std. The signal-hook crate provides safe abstractions.

// signal_hook_demo.rs
// Cargo.toml:
//   [dependencies]
//   signal-hook = "0.3"

use signal_hook::consts::{SIGINT, SIGTERM};
use signal_hook::flag;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

fn main() {
    let running = Arc::new(AtomicBool::new(true));

    // Register signal handlers that set the flag to false
    flag::register(SIGINT, Arc::clone(&running)).expect("register SIGINT");
    flag::register(SIGTERM, Arc::clone(&running)).expect("register SIGTERM");

    println!("PID {}: Running... (Ctrl+C to stop)", std::process::id());

    while running.load(Ordering::Relaxed) {
        println!("Working...");
        thread::sleep(Duration::from_secs(2));
    }

    println!("Signal received, shutting down.");
}

Rust Note: signal-hook uses atomic flags internally, avoiding the async-signal-safety pitfalls of C handlers. The flag::register function installs a minimal handler that just flips an AtomicBool. No unsafe code is needed in user code.

Rust: Using nix for Signals

The nix crate wraps the POSIX signal API:

// nix_signal_demo.rs
// Cargo.toml:
//   nix = { version = "0.29", features = ["signal", "process"] }
//   libc = "0.2"
use nix::sys::signal::{self, Signal, SigHandler};
use nix::unistd::Pid;
use std::thread;
use std::time::Duration;

extern "C" fn handler(sig: libc::c_int) {
    // Minimal handler -- only async-signal-safe operations
    // We just use write() to fd 1 (not println!)
    let msg = b"Signal caught!\n";
    unsafe { libc::write(1, msg.as_ptr() as *const libc::c_void, msg.len()); }
    let _ = sig;
}

fn main() {
    // Install handler for SIGUSR1
    unsafe {
        signal::signal(Signal::SIGUSR1, SigHandler::Handler(handler))
            .expect("signal");
    }

    let pid = nix::unistd::getpid();
    println!("PID {}: send SIGUSR1 to me", pid);
    println!("  kill -USR1 {}", pid);

    // Also send it to ourselves
    thread::sleep(Duration::from_secs(1));
    signal::kill(Pid::this(), Signal::SIGUSR1).expect("kill");

    thread::sleep(Duration::from_secs(1));
    println!("Done.");
}

Listing Signals on Your System

/* list_signals.c */
#include <stdio.h>
#include <string.h>
#include <signal.h>

int main(void)
{
    for (int i = 1; i < NSIG; i++) {
        const char *name = strsignal(i);
        if (name)
            printf("%2d  %s\n", i, name);
    }
    return 0;
}

$ gcc -o list_signals list_signals.c && ./list_signals
 1  Hangup
 2  Interrupt
 3  Quit
 ...

Try It: Run kill -l in your shell to see the full signal list. Compare it with the output of list_signals. Note the real-time signals at the end (32+).

Signal Delivery Flow

Event occurs (Ctrl+C, child dies, bad memory access, kill())
         |
         v
Kernel sets signal as "pending" for target process
         |
         v
Process is scheduled to run (or already running)
         |
         v
Kernel checks: is signal blocked?
    |                    |
   YES                  NO
    |                    |
    v                    v
Signal stays       Check disposition:
pending            SIG_DFL / SIG_IGN / handler
                        |
              +---------+----------+
              |         |          |
           SIG_DFL    SIG_IGN   handler()
              |         |          |
           Default   Discard   Run handler,
           action              then resume

Knowledge Check

  1. Name two signals that cannot be caught or ignored.

  2. What is the default action for SIGCHLD? Why is this important for servers that fork child processes?

  3. Why is volatile sig_atomic_t required for variables shared between a signal handler and the main program?

Common Pitfalls

  • Ignoring SIGPIPE: Network servers must ignore SIGPIPE or they will die when a client disconnects mid-write. Use signal(SIGPIPE, SIG_IGN) and check write() return values.

  • Not saving/restoring errno in handlers: Signal handlers can clobber errno. Save it on entry, restore on exit.

  • Assuming signals are queued: Standard signals (1-31) are not queued. If two SIGCHLD signals arrive before the handler runs, you get one delivery. Always loop in waitpid() with WNOHANG.

  • Using printf in handlers: It is not async-signal-safe. Use write() to a file descriptor if you must produce output.

  • Forgetting that sleep() is interrupted: sleep(), read(), write(), and other blocking calls return early with EINTR when a signal is caught. Always retry or handle the short return.

  • Catching SIGSEGV to "handle" crashes: You can catch it, but you cannot safely resume. The faulting instruction will re-execute and fault again unless you fix the underlying memory issue (which you almost certainly cannot do portably).

Signal Handlers and Masks

The previous chapter introduced signals. This chapter is about controlling them properly: installing handlers with sigaction(), restricting what runs inside a handler, and using signal masks to create critical sections where signals are deferred.

sigaction(): The Proper Way

sigaction() replaces signal() with well-defined, portable behavior. It does not reset the handler after delivery, it lets you control which signals are blocked during handler execution, and it provides additional flags.

/* sigaction_basic.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static volatile sig_atomic_t got_int = 0;

static void handler(int sig)
{
    (void)sig;
    got_int = 1;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGINT, &sa, NULL) < 0) {
        perror("sigaction");
        return 1;
    }

    printf("PID %d: Press Ctrl+C\n", getpid());

    while (!got_int) {
        printf("Waiting...\n");
        sleep(2);
    }

    printf("Caught SIGINT, exiting gracefully.\n");
    return 0;
}

The struct sigaction fields:

Field          Purpose
sa_handler     Pointer to handler function, SIG_DFL, or SIG_IGN
sa_mask        Additional signals to block during handler execution
sa_flags       Behavior flags (see below)
sa_sigaction   Extended handler (used with SA_SIGINFO)

Common flags:

Flag           Effect
SA_RESTART     Auto-restart interrupted system calls
SA_NOCLDSTOP   Do not deliver SIGCHLD when a child stops (only on exit)
SA_SIGINFO     Use sa_sigaction instead of sa_handler
SA_RESETHAND   Reset to SIG_DFL after one delivery (like old signal())

SA_RESTART: Restarting Interrupted System Calls

Without SA_RESTART, a caught signal causes blocking calls like read() to return -1 with errno == EINTR. With it, the kernel restarts the call automatically.

/* sa_restart.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

static void handler(int sig)
{
    (void)sig;
    const char msg[] = "[signal caught]\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);

    /* Try with and without SA_RESTART */
    sa.sa_flags = SA_RESTART;  /* Comment this out to see EINTR */

    sigaction(SIGINT, &sa, NULL);

    printf("PID %d: Type something (Ctrl+C to test):\n", getpid());

    char buf[256];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);

    if (n < 0) {
        if (errno == EINTR)
            printf("read() interrupted by signal (EINTR)\n");
        else
            perror("read");
    } else {
        buf[n] = '\0';
        printf("Read: %s", buf);
    }

    return 0;
}

Try It: Compile and run with SA_RESTART. Press Ctrl+C, then type something. The read completes normally. Now remove SA_RESTART, recompile, and press Ctrl+C. The read returns EINTR.

Async-Signal-Safe Functions: The Short List

Inside a signal handler, you can only call async-signal-safe functions. These are functions guaranteed to work correctly even when interrupting arbitrary code.

The POSIX-mandated safe list (selected):

_exit       write       read        open
close       signal      sigaction   sigprocmask
sigaddset   sigdelset   sigemptyset sigfillset
kill        raise       alarm       pause
fork        execve      waitpid     getpid

Not safe (most common traps):

printf      fprintf     malloc      free
syslog      strerror    localtime   gmtime
pthread_*   exit        atexit

Caution: Calling printf() from a signal handler is undefined behavior. It can deadlock if the signal interrupts printf() in the main program (both try to acquire the stdio lock). Use write() with a fixed-size buffer for any handler output.

/* safe_handler_output.c */
#include <signal.h>
#include <unistd.h>
#include <string.h>

static void handler(int sig)
{
    /* Only async-signal-safe calls here */
    const char msg[] = "Caught SIGINT\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    (void)sig;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    pause();  /* Wait for signal */
    return 0;
}

Why printf in a Handler Is Undefined Behavior

Here is the scenario:

Main program                    Signal handler
     |                               |
printf("Working...\n")               |
  |                                  |
  +-- acquires stdio lock            |
  |                                  |
  +-- SIGINT arrives here!           |
  |                                  |
  |                           printf("Caught!\n")
  |                             |
  |                             +-- tries to acquire stdio lock
  |                             |
  |                             +-- DEADLOCK (same thread holds lock)

The program hangs forever. This is not theoretical -- it happens in production.

sig_atomic_t: The Shared Flag

The only safe way to communicate between a handler and the main program is through variables of type volatile sig_atomic_t.

/* sig_atomic_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static volatile sig_atomic_t signal_count = 0;

static void handler(int sig)
{
    (void)sig;
    signal_count++;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    printf("PID %d: Press Ctrl+C multiple times. Ctrl+\\ to quit.\n", getpid());

    while (1) {
        pause();
        printf("Signal count: %d\n", (int)signal_count);
    }

    return 0;
}

Caution: sig_atomic_t is guaranteed to be atomically readable and writable, but it is NOT a general-purpose atomic type -- it is typically just an int, and even signal_count++ is a read-modify-write that is safe here only because the handler is the sole writer. Use it for simple flags and counters set in signal handlers. For anything more complex, use signal masks or the self-pipe trick (next chapter).

Signal Masks: sigprocmask

A signal mask is a set of signals that are blocked (deferred) for the calling thread. Blocked signals stay pending until unblocked.

/* sigmask_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static void handler(int sig)
{
    const char msg[] = "SIGINT delivered!\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    (void)sig;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    sigset_t block_set, old_set;
    sigemptyset(&block_set);
    sigaddset(&block_set, SIGINT);

    /* Block SIGINT */
    sigprocmask(SIG_BLOCK, &block_set, &old_set);
    printf("SIGINT blocked. Press Ctrl+C now (within 5 seconds).\n");
    sleep(5);

    /* Check if SIGINT is pending */
    sigset_t pending;
    sigpending(&pending);
    if (sigismember(&pending, SIGINT))
        printf("SIGINT is pending (was sent while blocked).\n");

    /* Unblock SIGINT -- pending signal will be delivered now */
    printf("Unblocking SIGINT...\n");
    sigprocmask(SIG_SETMASK, &old_set, NULL);

    printf("After unblock.\n");
    return 0;
}

$ ./sigmask_demo
SIGINT blocked. Press Ctrl+C now (within 5 seconds).
^C                          <-- pressed Ctrl+C during sleep
SIGINT is pending (was sent while blocked).
Unblocking SIGINT...
SIGINT delivered!           <-- handler runs when unblocked
After unblock.

Signal mask operations:

Operation      Meaning
SIG_BLOCK      Add signals to the mask
SIG_UNBLOCK    Remove signals from the mask
SIG_SETMASK    Replace the entire mask

Signal Mask (per-thread):
+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |...|
+---+---+---+---+---+---+---+---+---+
  0   1   0   0   0   0   0   0  ...
      ^
      |
  SIGINT blocked (bit set = blocked)

Pending Signals:
+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |...|
+---+---+---+---+---+---+---+---+---+
  0   1   0   0   0   0   0   0  ...
      ^
      |
  SIGINT pending (received while blocked)

The Critical Section Pattern

Block signals around code that must not be interrupted:

/* critical_section.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);

    /* --- Critical section: SIGINT deferred --- */
    sigprocmask(SIG_BLOCK, &block, &old);

    printf("Updating shared data structure...\n");
    sleep(3);  /* Simulate long update */
    printf("Update complete.\n");

    sigprocmask(SIG_SETMASK, &old, NULL);
    /* --- End critical section: pending SIGINT delivered here --- */

    if (got_signal)
        printf("Signal was deferred and delivered after critical section.\n");

    return 0;
}

This pattern is essential for data structures that must be consistent. If a signal handler touches the same data, blocking the signal prevents corruption.

Blocking Signals During Handler Execution

The sa_mask field of struct sigaction specifies additional signals to block while the handler is running. The caught signal is always blocked by default (unless SA_NODEFER is set).

/* sa_mask_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static void int_handler(int sig)
{
    (void)sig;
    const char msg[] = "SIGINT handler start\n";
    write(1, msg, strlen(msg));
    sleep(3);  /* SIGTERM blocked during this time via sa_mask */
    const char msg2[] = "SIGINT handler end\n";
    write(1, msg2, strlen(msg2));
}

static void term_handler(int sig)
{
    (void)sig;
    const char msg[] = "SIGTERM handler\n";
    write(1, msg, strlen(msg));
}

int main(void)
{
    struct sigaction sa_int, sa_term;

    memset(&sa_term, 0, sizeof(sa_term));
    sa_term.sa_handler = term_handler;
    sigemptyset(&sa_term.sa_mask);
    sa_term.sa_flags = 0;
    sigaction(SIGTERM, &sa_term, NULL);

    memset(&sa_int, 0, sizeof(sa_int));
    sa_int.sa_handler = int_handler;
    sigemptyset(&sa_int.sa_mask);
    sigaddset(&sa_int.sa_mask, SIGTERM);  /* Block SIGTERM during SIGINT handler */
    sa_int.sa_flags = 0;
    sigaction(SIGINT, &sa_int, NULL);

    printf("PID %d: Press Ctrl+C, then quickly send SIGTERM\n", getpid());
    printf("  kill -TERM %d\n", getpid());

    for (;;) pause();
    return 0;
}

Try It: Run the program. Press Ctrl+C. While "SIGINT handler start" is displayed, send kill -TERM <pid> from another terminal. Notice that SIGTERM is delivered only after the SIGINT handler finishes.

SA_SIGINFO: Extended Signal Information

With SA_SIGINFO, the handler receives a siginfo_t struct with details about who sent the signal and why.

/* siginfo_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

static void handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    char buf[128];
    int len = snprintf(buf, sizeof(buf),
        "Signal %d from PID %d (uid %d)\n",
        sig, info->si_pid, info->si_uid);
    write(STDOUT_FILENO, buf, len);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;

    sigaction(SIGUSR1, &sa, NULL);

    printf("PID %d: send SIGUSR1 to me\n", getpid());
    printf("  kill -USR1 %d\n", getpid());

    pause();
    return 0;
}

Caution: snprintf is technically not on the async-signal-safe list. For production code, format the output manually using write() and integer-to-string conversion. The example above is simplified for clarity.

Rust: Safe Signal Handling with signal-hook

The signal-hook crate provides multiple safe patterns:

// signal_hook_iterator.rs
// Cargo.toml:
//   [dependencies]
//   signal-hook = "0.3"

use signal_hook::consts::{SIGINT, SIGTERM, SIGUSR1};
use signal_hook::iterator::Signals;
use std::thread;
use std::time::Duration;

fn main() {
    let mut signals = Signals::new(&[SIGINT, SIGTERM, SIGUSR1])
        .expect("register signals");

    // Spawn a thread to handle signals
    let handle = thread::spawn(move || {
        for sig in signals.forever() {
            match sig {
                SIGINT => {
                    println!("Received SIGINT (Ctrl+C)");
                    println!("Shutting down...");
                    return;
                }
                SIGTERM => {
                    println!("Received SIGTERM");
                    println!("Shutting down...");
                    return;
                }
                SIGUSR1 => {
                    println!("Received SIGUSR1 -- reloading config");
                }
                _ => unreachable!(),
            }
        }
    });

    println!("PID {}: Running... (Ctrl+C to stop)", std::process::id());
    println!("  kill -USR1 {} to reload", std::process::id());

    // Main work loop
    loop {
        if handle.is_finished() {
            break;
        }
        println!("Working...");
        thread::sleep(Duration::from_secs(2));
    }

    handle.join().expect("signal thread panicked");
    println!("Clean shutdown complete.");
}

Rust Note: signal-hook's Signals::forever() uses a self-pipe internally. The signal handler writes a byte to a pipe; the iterator reads from it. This converts asynchronous signals into synchronous iteration, completely avoiding async-signal-safety issues. All the complex, unsafe C patterns become a simple for loop.

Rust Note: The nix crate also exposes sigprocmask via nix::sys::signal::sigprocmask(). The API mirrors C: create a SigSet, add signals to it, and call sigprocmask(SigmaskHow::SIG_BLOCK, Some(&set), None). Pass Some(&mut old_set) as the third argument to capture the previous mask for later restoration.

Driver Prep: Kernel signal handling is different -- kernel code does not receive signals. Instead, the kernel checks signal_pending(current) when returning from a blocking operation. If a signal is pending, the kernel returns -ERESTARTSYS to allow the system call to be restarted. Driver authors must handle this return code in any sleeping function.

Knowledge Check

  1. What happens if you call printf() inside a signal handler that interrupts another printf() call in the main program?

  2. What does SA_RESTART do, and when would you want to omit it?

  3. How do you temporarily block a signal, do some work, then unblock it so any pending instances are delivered?

Common Pitfalls

  • Using signal() instead of sigaction(): signal() has undefined reset behavior across platforms. Always use sigaction().

  • Calling malloc/free in handlers: Both use global state (the heap free list) and can deadlock or corrupt memory.

  • Not blocking signals during data structure updates: If a handler accesses shared data, block the signal during modifications in the main code.

  • Forgetting SA_RESTART: Without it, every read(), write(), accept(), and select() must check for EINTR and retry manually.

  • Using complex types in handlers: Only volatile sig_atomic_t is safe for communication between handlers and the main program.

  • Not saving errno: Signal handlers can be called between a system call failing and the program reading errno. Save and restore it.

Advanced Signals: signalfd and the Self-Pipe Trick

Standard signal handlers are awkward. They interrupt your code at arbitrary points, restrict you to a tiny set of safe functions, and make shared state management painful. This chapter covers techniques that convert signals from asynchronous interrupts into synchronous events you can handle in a normal event loop -- alongside sockets, timers, and other file descriptors.

The Problem with Async Signal Handlers

Consider a server using select() or epoll_wait() to multiplex I/O. A signal handler fires, setting a flag, but the loop is blocked inside select() and will not check the flag until a file descriptor event wakes it up -- which might not happen for seconds or minutes.

Event loop:
  while (running) {
      n = select(...)     <-- blocks here
      handle_fd_events()
      check_signal_flag() <-- too late if no fd events
  }

Signal arrives during select():
  - Handler sets flag
  - select() returns EINTR (if no SA_RESTART)
  - OR select() keeps sleeping (with SA_RESTART)

We need signals to appear as file descriptor events.

Real-Time Signals (SIGRTMIN to SIGRTMAX)

Standard signals (1-31) have a critical limitation: they are not queued. If two SIGCHLD signals arrive before the handler runs, you get one delivery.

Real-time signals (SIGRTMIN through SIGRTMAX, typically 34-64) fix this:

  • They are queued: each send results in one delivery.
  • They carry data (an integer or pointer via sigqueue()).
  • They are delivered in order (lowest signal number first).

/* rt_signal.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>

static void handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    char buf[128];
    int len = snprintf(buf, sizeof(buf),
        "RT signal %d, value=%d, from PID %d\n",
        sig, info->si_value.sival_int, info->si_pid);
    write(STDOUT_FILENO, buf, len);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;

    /* Install handler for SIGRTMIN */
    sigaction(SIGRTMIN, &sa, NULL);

    printf("PID %d: installing handler for signal %d (SIGRTMIN)\n",
           getpid(), SIGRTMIN);
    printf("RT signal range: %d - %d\n", SIGRTMIN, SIGRTMAX);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child sends 3 queued signals with different values */
        union sigval val;
        for (int i = 1; i <= 3; i++) {
            val.sival_int = i * 100;
            sigqueue(getppid(), SIGRTMIN, val);
        }
        _exit(0);
    }

    /* Parent: block briefly to let all signals queue up */
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &block, NULL);

    waitpid(pid, NULL, 0);
    sleep(1);

    /* Unblock: all 3 queued signals should now deliver */
    printf("Unblocking RT signal...\n");
    sigprocmask(SIG_UNBLOCK, &block, NULL);

    sleep(1);
    return 0;
}

$ ./rt_signal
PID 5000: installing handler for signal 34 (SIGRTMIN)
RT signal range: 34 - 64
Unblocking RT signal...
RT signal 34, value=100, from PID 5001
RT signal 34, value=200, from PID 5001
RT signal 34, value=300, from PID 5001

All three deliveries happen. With a standard signal, only one would.

signalfd(): Signals as File Descriptor Events

Linux provides signalfd(), which creates a file descriptor that becomes readable when a signal is pending. This is the cleanest integration with event loops.

/* signalfd_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <string.h>
#include <stdlib.h>
#include <poll.h>

int main(void)
{
    /* Block SIGINT and SIGTERM so they go to signalfd */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    /* Create signalfd */
    int sfd = signalfd(-1, &mask, 0);
    if (sfd < 0) {
        perror("signalfd");
        return 1;
    }

    printf("PID %d: Press Ctrl+C or send SIGTERM\n", getpid());

    /* Event loop using poll */
    struct pollfd pfd = { .fd = sfd, .events = POLLIN };

    for (;;) {
        int ret = poll(&pfd, 1, 5000);  /* 5 second timeout */

        if (ret < 0) {
            perror("poll");
            break;
        }

        if (ret == 0) {
            printf("Tick (no signals)...\n");
            continue;
        }

        /* Read the signal info */
        struct signalfd_siginfo si;
        ssize_t n = read(sfd, &si, sizeof(si));
        if (n != sizeof(si)) {
            perror("read signalfd");
            break;
        }

        printf("Received signal %d from PID %u\n",
               si.ssi_signo, si.ssi_pid);

        if (si.ssi_signo == SIGINT || si.ssi_signo == SIGTERM) {
            printf("Shutting down.\n");
            break;
        }
    }

    close(sfd);
    return 0;
}

The flow:

1. Block signals with sigprocmask()
2. Create signalfd() with same mask
3. Signals arrive -> kernel queues them on the fd
4. poll()/epoll() reports fd as readable
5. read() from fd returns struct signalfd_siginfo
6. Handle signal synchronously in your event loop

+----------+     +----------+     +-----------+
| Kernel   | --> | signalfd | --> | poll/epoll|
| signal   |     | (fd)     |     | event     |
| delivery |     |          |     | loop      |
+----------+     +----------+     +-----------+

Caution: You must block the signals with sigprocmask() before creating the signalfd. If they are not blocked, they are delivered the normal way -- default action or installed handler -- instead of becoming readable on the fd.

The Self-Pipe Trick

Before signalfd() existed (or on non-Linux systems), the self-pipe trick was the standard solution. Create a pipe. In the signal handler, write one byte to the pipe. In the event loop, include the pipe's read end in your select()/poll() set.

/* self_pipe.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <poll.h>
#include <errno.h>

static int pipe_fds[2];

static void handler(int sig)
{
    int saved_errno = errno;  /* write() may clobber errno */

    /* Write one byte -- the signal number */
    unsigned char s = (unsigned char)sig;
    ssize_t r = write(pipe_fds[1], &s, 1);
    (void)r;  /* if the pipe is full, failing silently is fine */

    errno = saved_errno;
}

static void make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void)
{
    if (pipe(pipe_fds) < 0) {
        perror("pipe");
        return 1;
    }

    make_nonblocking(pipe_fds[0]);
    make_nonblocking(pipe_fds[1]);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    printf("PID %d: Press Ctrl+C or send SIGTERM\n", getpid());

    struct pollfd pfds[2] = {
        { .fd = STDIN_FILENO, .events = POLLIN },
        { .fd = pipe_fds[0],  .events = POLLIN },
    };

    for (;;) {
        int ret = poll(pfds, 2, 5000);

        if (ret < 0 && errno == EINTR)
            continue;

        if (ret < 0) {
            perror("poll");
            break;
        }

        if (ret == 0) {
            printf("Tick...\n");
            continue;
        }

        /* Check for stdin input */
        if (pfds[0].revents & POLLIN) {
            char buf[256];
            ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("Input: %s", buf);
            }
        }

        /* Check for signal via pipe */
        if (pfds[1].revents & POLLIN) {
            unsigned char sig;
            while (read(pipe_fds[0], &sig, 1) > 0) {
                printf("Signal %d via self-pipe\n", sig);
                if (sig == SIGINT || sig == SIGTERM) {
                    printf("Shutting down.\n");
                    close(pipe_fds[0]);
                    close(pipe_fds[1]);
                    return 0;
                }
            }
        }
    }

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return 0;
}

Why make the pipe non-blocking? If signals fire rapidly, the pipe buffer can fill up. A blocking write() in the handler would then hang -- potentially forever, if the handler interrupted the very loop that drains the pipe. With O_NONBLOCK, the write simply fails if the pipe is full, which is fine: we only need one byte to wake the event loop.

Try It: Modify the self-pipe program to handle SIGUSR1 as a "reload configuration" trigger. When received, print "Reloading config..." in the event loop (not in the handler).

timerfd: Timers as File Descriptors

While not a signal mechanism, timerfd solves the same integration problem for timers. Instead of SIGALRM, you get a readable file descriptor.

/* timerfd_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/timerfd.h>
#include <stdint.h>
#include <poll.h>

int main(void)
{
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (tfd < 0) {
        perror("timerfd_create");
        return 1;
    }

    /* Fire every 2 seconds, first fire in 1 second */
    struct itimerspec ts = {
        .it_interval = { .tv_sec = 2, .tv_nsec = 0 },
        .it_value    = { .tv_sec = 1, .tv_nsec = 0 },
    };

    timerfd_settime(tfd, 0, &ts, NULL);

    printf("Timer started. Reading 5 ticks...\n");

    for (int i = 0; i < 5; i++) {
        uint64_t expirations;
        ssize_t n = read(tfd, &expirations, sizeof(expirations));
        if (n != sizeof(expirations)) {
            perror("read timerfd");
            break;
        }
        printf("Timer tick %d (expirations: %llu)\n", i,
               (unsigned long long)expirations);
    }

    close(tfd);
    return 0;
}

Integrating Everything: An Event-Driven Server Skeleton

Here is a skeleton that combines signalfd, timerfd, and socket I/O in one event loop:

/* event_server.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/timerfd.h>
#include <poll.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    /* 1. Set up signalfd for SIGINT, SIGTERM */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sig_fd = signalfd(-1, &mask, 0);

    /* 2. Set up timerfd for periodic work */
    int tmr_fd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec ts = {
        .it_interval = { .tv_sec = 3, .tv_nsec = 0 },
        .it_value    = { .tv_sec = 3, .tv_nsec = 0 },
    };
    timerfd_settime(tmr_fd, 0, &ts, NULL);

    /* 3. Event loop */
    enum { FD_SIGNAL, FD_TIMER, FD_COUNT };
    struct pollfd pfds[FD_COUNT] = {
        [FD_SIGNAL] = { .fd = sig_fd, .events = POLLIN },
        [FD_TIMER]  = { .fd = tmr_fd, .events = POLLIN },
    };

    printf("PID %d: event server running\n", getpid());

    int running = 1;
    while (running) {
        int n = poll(pfds, FD_COUNT, -1);
        if (n < 0) { perror("poll"); break; }

        /* Signal event */
        if (pfds[FD_SIGNAL].revents & POLLIN) {
            struct signalfd_siginfo si;
            read(sig_fd, &si, sizeof(si));
            printf("Signal %d received. Shutting down.\n", si.ssi_signo);
            running = 0;
        }

        /* Timer event */
        if (pfds[FD_TIMER].revents & POLLIN) {
            uint64_t exp;
            read(tmr_fd, &exp, sizeof(exp));
            printf("Timer tick (expirations: %llu)\n", (unsigned long long)exp);
        }
    }

    close(sig_fd);
    close(tmr_fd);
    return 0;
}

Event Loop Architecture:

  +----------+   +----------+   +----------+
  | signalfd |   | timerfd  |   | socket   |
  | (signals)|   | (timers) |   | (network)|
  +----+-----+   +----+-----+   +----+-----+
       |              |              |
       v              v              v
  +--------------------------------------+
  |         poll() / epoll_wait()        |
  +--------------------------------------+
       |
       v
  Handle event synchronously
  (no async-signal-safety concerns)

Driver Prep: Kernel drivers use wait_event() and wake_up() for event notification, not signalfd or timerfd. But user-space driver frameworks (like DPDK, SPDK, or UIO helpers) often build event loops with these exact Linux fd types. The pattern of multiplexing heterogeneous event sources into one loop translates directly.

Rust: signalfd via nix

// signalfd_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["signal", "signalfd", "poll"] }
use nix::sys::signal::{self, SigSet, Signal, SigmaskHow};
use nix::sys::signalfd::{SignalFd, SfdFlags};
use nix::poll::{poll, PollFd, PollFlags, PollTimeout};
use std::os::unix::io::AsFd;

fn main() {
    // Block signals
    let mut mask = SigSet::empty();
    mask.add(Signal::SIGINT);
    mask.add(Signal::SIGTERM);
    signal::sigprocmask(SigmaskHow::SIG_BLOCK, Some(&mask), None)
        .expect("sigprocmask");

    // Create signalfd
    let mut sfd = SignalFd::with_flags(&mask, SfdFlags::empty())
        .expect("signalfd");

    println!("PID {}: Press Ctrl+C or send SIGTERM", std::process::id());

    loop {
        let poll_fd = PollFd::new(sfd.as_fd(), PollFlags::POLLIN);
        let ret = poll(&mut [poll_fd], PollTimeout::from(5000u16))
            .expect("poll");

        if ret == 0 {
            println!("Tick...");
            continue;
        }

        if let Some(info) = sfd.read_signal().expect("read_signal") {
            println!("Received signal {} from PID {}",
                     info.ssi_signo, info.ssi_pid);

            let sig = info.ssi_signo as i32;
            if sig == Signal::SIGINT as i32 || sig == Signal::SIGTERM as i32 {
                println!("Shutting down.");
                break;
            }
        }
    }
}

Rust Note: For production async Rust, signal-hook integrates with mio (the I/O reactor behind tokio) via signal-hook-mio. Tokio also provides built-in tokio::signal that uses signalfd on Linux internally. You simply await a signal future. The ecosystem has converged on treating signals as just another async event.

Comparison: Signal Handling Approaches

+--------------------------------+----------------+------------+------------------------+
| Approach                       | Portability    | Complexity | Event Loop Integration |
+--------------------------------+----------------+------------+------------------------+
| signal() / sigaction() handler | POSIX          | Low        | Poor                   |
| Self-pipe trick                | POSIX          | Medium     | Good                   |
| signalfd()                     | Linux only     | Low        | Excellent              |
| signal-hook (Rust)             | Cross-platform | Low        | Excellent              |
+--------------------------------+----------------+------------+------------------------+

When to Use What

  • Simple CLI tools: sigaction() with a volatile sig_atomic_t flag. Nothing more needed.

  • Event-driven servers on Linux: signalfd() with epoll(). Clean, efficient, no race conditions.

  • Portable servers: Self-pipe trick. Works everywhere, adds one extra fd.

  • Rust programs: signal-hook crate. It picks the right backend automatically.

Try It: Write a program that uses signalfd and timerfd together in one poll() loop. The timer fires every second and prints a count. SIGUSR1 resets the count to zero. SIGINT exits cleanly.

Knowledge Check

  1. Why must you block signals with sigprocmask() before creating a signalfd?

  2. What advantage do real-time signals have over standard signals?

  3. In the self-pipe trick, why must both ends of the pipe be set to non-blocking mode?

Common Pitfalls

  • Forgetting to block signals before signalfd: The signal gets delivered to the default handler instead of the fd. The fd never becomes readable.

  • Not draining the self-pipe: If you only read one byte but multiple signals arrived, the pipe stays readable. Always read in a loop until EAGAIN.

  • Blocking write in signal handler: If the self-pipe fills up and the write end is blocking, the handler blocks forever. Always use O_NONBLOCK.

  • Mixing signalfd and handlers: If you have both a signalfd and a sigaction handler for the same signal, behavior is undefined. Pick one.

  • Ignoring timerfd expirations count: read() on a timerfd returns a uint64_t with the number of expirations since last read. If your process was delayed, this count can be greater than 1.

  • Using signalfd in multithreaded programs: Signal masks are per-thread. Block the signals in all threads, then read the signalfd from one thread only.

Threads and pthreads

Threads let you run multiple execution paths inside a single process, sharing the same address space. They are lighter than fork() because there is no page-table copy, no duplicated file descriptors, no COW overhead. This chapter covers POSIX threads in C and std::thread in Rust.

Why Threads?

Process with one thread:          Process with three threads:

+---------------------------+     +---------------------------+
| Code   | Data  | Heap     |     | Code   | Data  | Heap     |
|        |       |          |     |        | (shared)          |
+---------------------------+     +---------------------------+
| Stack                     |     | Stack-0 | Stack-1 | Stack-2|
+---------------------------+     +---------------------------+
| 1 program counter         |     | PC-0    | PC-1    | PC-2   |
+---------------------------+     +---------------------------+

Every thread shares the code, global data, heap, and file descriptors. Each thread gets its own stack and register set. This makes communication between threads trivial (just read shared memory) but also dangerous (data races).

Creating a Thread in C

/* thread_hello.c */
#include <stdio.h>
#include <pthread.h>

void *greet(void *arg) {
    int id = *(int *)arg;
    printf("Hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t;
    int id = 42;

    if (pthread_create(&t, NULL, greet, &id) != 0) {
        perror("pthread_create");
        return 1;
    }

    pthread_join(t, NULL);
    printf("Thread finished\n");
    return 0;
}

Compile with:

gcc -o thread_hello thread_hello.c -pthread

The -pthread flag links the pthreads library and defines the right macros.

pthread_create takes four arguments:

+----------+-------------------------------------+
| Argument | Meaning                             |
+----------+-------------------------------------+
| &t       | Where to store the thread ID        |
| NULL     | Thread attributes (NULL = defaults) |
| greet    | The function to run                 |
| &id      | Argument passed to that function    |
+----------+-------------------------------------+

The thread function signature is always void *(*)(void *) -- it takes a void * and returns a void *.

Passing Arguments Safely

A common bug: passing a pointer to a stack variable that changes before the thread reads it.

/* broken_args.c -- DO NOT DO THIS */
#include <stdio.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;   /* race: main may have changed *arg */
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];
    for (int i = 0; i < 5; i++) {
        pthread_create(&threads[i], NULL, print_id, &i);  /* BUG */
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Caution: The loop variable i is shared across all threads. By the time a thread reads *arg, i may already be 3 or 5. You might see "Thread 5" printed five times.

The fix: give each thread its own copy.

/* fixed_args.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;
    free(arg);
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];
    for (int i = 0; i < 5; i++) {
        int *p = malloc(sizeof(int));
        *p = i;
        pthread_create(&threads[i], NULL, print_id, p);
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Each thread gets its own heap-allocated integer. The thread frees it after reading.

Try It: Modify broken_args.c to use an array int ids[5] instead of malloc. Set ids[i] = i before creating each thread. Does this fix the bug? Why or why not?

Return Values

A thread function returns void *. You retrieve it through pthread_join.

/* thread_return.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *compute_square(void *arg) {
    int val = *(int *)arg;
    int *result = malloc(sizeof(int));
    *result = val * val;
    return result;
}

int main(void) {
    pthread_t t;
    int input = 7;
    void *retval;

    pthread_create(&t, NULL, compute_square, &input);
    pthread_join(t, &retval);

    printf("7 squared = %d\n", *(int *)retval);
    free(retval);
    return 0;
}

Caution: Never return a pointer to a local variable from the thread function. The thread's stack is destroyed after it exits. Return heap-allocated memory or cast an integer to void *.

Joinable vs Detached Threads

By default, threads are joinable. If you never join them, you leak resources (similar to zombie processes). Detached threads clean up automatically when they exit.

/* detached.c */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *background_work(void *arg) {
    (void)arg;
    sleep(1);
    printf("Background work done\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, background_work, NULL);
    pthread_detach(t);    /* cannot join after this */

    printf("Main continues immediately\n");
    sleep(2);  /* give detached thread time to finish */
    return 0;
}

You can also create a thread as detached from the start:

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, func, arg);
pthread_attr_destroy(&attr);

Thread-Local Storage

Sometimes each thread needs its own copy of a variable. Two approaches in C:

1. The __thread keyword (GCC extension, also C11 _Thread_local):

/* tls_keyword.c */
#include <stdio.h>
#include <pthread.h>

__thread int counter = 0;

void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < 1000; i++)
        counter++;
    printf("Thread %d: counter = %d\n", id, counter);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Main: counter = %d\n", counter);
    return 0;
}

Each thread sees counter = 1000. Main sees counter = 0. No synchronization needed.

2. pthread_key_create / pthread_getspecific / pthread_setspecific:

/* tls_key.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static pthread_key_t key;

void destructor(void *val) {
    free(val);
}

void *worker(void *arg) {
    int *p = malloc(sizeof(int));
    *p = *(int *)arg;
    pthread_setspecific(key, p);

    int *my_val = pthread_getspecific(key);
    printf("Thread-local value: %d\n", *my_val);
    return NULL;
}

int main(void) {
    pthread_key_create(&key, destructor);

    pthread_t t1, t2;
    int a = 10, b = 20;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_key_delete(key);
    return 0;
}

The destructor runs automatically when a thread exits.

Thread Safety: What Breaks

When two threads touch the same data without synchronization, you get a data race.

/* data_race.c */
#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        shared_counter++;   /* NOT atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000, Got: %d\n", shared_counter);
    return 0;
}

Run this several times. You will almost never see 2000000. The increment shared_counter++ is three CPU instructions (load, add, store). Two threads interleave them:

Thread A: load counter (0)
Thread B: load counter (0)
Thread A: add 1 -> 1
Thread B: add 1 -> 1
Thread A: store 1
Thread B: store 1          <-- one increment lost

Caution: Data races in C are undefined behavior per C11. The compiler is free to assume they do not happen, leading to bizarre optimizations.

Rust: std::thread::spawn

Rust threads use OS threads, just like pthreads. The API is safer.

// thread_hello.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a spawned thread");
    });

    handle.join().unwrap();
    println!("Thread finished");
}

No void * casting. No manual memory management. The closure captures its environment.

Move Closures for Safe Data Passing

Rust forces you to either borrow or move data into the thread closure. Since the compiler cannot prove that borrowed data would outlive the thread, you must use move.

// thread_move.rs
use std::thread;

fn main() {
    let mut handles = vec![];

    for i in 0..5 {
        let handle = thread::spawn(move || {
            println!("Thread {}", i);
        });
        handles.push(handle);
    }

    for h in handles {
        h.join().unwrap();
    }
}

Each closure gets its own copy of i (integers implement Copy). There is no equivalent of the C bug where all threads share a pointer to the same loop variable.

Rust Note: Rust's thread::spawn requires the closure to be 'static -- it cannot borrow stack-local data from the parent. This prevents the entire class of dangling-pointer bugs that plague pthreads.

Returning Values from Rust Threads

The JoinHandle<T> carries the return value.

// thread_return.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| -> i32 {
        7 * 7
    });

    let result = handle.join().unwrap();
    println!("7 squared = {}", result);
}

No malloc, no void * cast, no free. The value is moved out of the thread safely.

Thread-Local Storage in Rust

// thread_local.rs
use std::cell::RefCell;
use std::thread;

thread_local! {
    static COUNTER: RefCell<u32> = RefCell::new(0);
}

fn main() {
    let mut handles = vec![];

    for id in 0..3 {
        let h = thread::spawn(move || {
            COUNTER.with(|c| {
                for _ in 0..1000 {
                    *c.borrow_mut() += 1;
                }
                println!("Thread {}: counter = {}", id, *c.borrow());
            });
        });
        handles.push(h);
    }

    for h in handles {
        h.join().unwrap();
    }

    COUNTER.with(|c| {
        println!("Main: counter = {}", *c.borrow());
    });
}

Each thread sees its own COUNTER. The thread_local! macro initializes lazily per thread.

Comparing C and Rust Thread APIs

+--------------------+-------------------------------+---------------------------+
| Operation          | C (pthreads)                  | Rust (std::thread)        |
+--------------------+-------------------------------+---------------------------+
| Create             | pthread_create(&t, NULL, f, a)| thread::spawn(closure)    |
| Join               | pthread_join(t, &retval)      | handle.join().unwrap()    |
| Detach             | pthread_detach(t)             | drop(handle) (implicit)   |
| Pass args          | void* cast                    | move closure              |
| Return values      | void* cast                    | JoinHandle<T>             |
| Thread-local       | __thread / pthread_key        | thread_local! macro       |
| Data race protect  | programmer discipline         | compiler-enforced         |
+--------------------+-------------------------------+---------------------------+

Driver Prep: Linux kernel threads use kthread_create and kthread_run, which follow a similar create-join pattern. The kernel has its own synchronization primitives (spinlock_t, mutex, rcu) but the mental model is the same: shared data needs protection.

Knowledge Check

  1. What happens if you pass &i (where i is a loop variable) to five pthread_create calls without copying i?
  2. Why must you compile with -pthread and not just -lpthread?
  3. In Rust, why does thread::spawn require a 'static closure?

Common Pitfalls

  • Forgetting -pthread -- the program may compile but crash at runtime or behave strangely.
  • Returning a pointer to a local variable from a thread function -- the stack is gone after the thread exits.
  • Not joining and not detaching -- resource leak, just like a zombie process.
  • Passing a shared pointer to multiple threads without synchronization -- data race, undefined behavior.
  • Calling pthread_join on a detached thread -- undefined behavior.
  • Assuming printf output never interleaves -- each call is atomic per POSIX, but output from separate calls in different threads can interleave arbitrarily.

Mutexes, Condition Variables, and Synchronization

The previous chapter showed that two threads incrementing a shared counter lose updates. This chapter fixes that with mutexes, condition variables, and read-write locks. We start with C's pthreads primitives, then show how Rust wraps the data inside the lock itself.

The Race Condition, Concretely

Here is the broken counter again for reference:

/* race.c */
#include <stdio.h>
#include <pthread.h>

static int counter = 0;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected 2000000, got %d\n", counter);
    return 0;
}

The CPU executes counter++ as three steps: load, increment, store. Two threads interleaving these steps lose updates.

Time   Thread A              Thread B              counter
----   --------              --------              -------
 1     load counter (100)                           100
 2                           load counter (100)     100
 3     add 1 -> 101                                 100
 4                           add 1 -> 101           100
 5     store 101                                    101
 6                           store 101              101  <-- lost update

Mutex: The Fix

A mutex (mutual exclusion) ensures only one thread enters the critical section at a time.

/* mutex_counter.c */
#include <stdio.h>
#include <pthread.h>

static int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&lock);
    printf("Expected 2000000, got %d\n", counter);
    return 0;
}

Now the output is always 2000000. The mutex serializes access to counter.

The lifecycle of a mutex:

PTHREAD_MUTEX_INITIALIZER  or  pthread_mutex_init(&m, NULL)
        |
        v
   pthread_mutex_lock(&m)    <-- blocks if another thread holds it
        |
        v
   [ critical section ]
        |
        v
   pthread_mutex_unlock(&m)
        |
        v
   pthread_mutex_destroy(&m)

Try It: Remove the pthread_mutex_lock / unlock calls and run the program 10 times. How much variance do you see in the output?

Dynamic Initialization

For mutexes allocated on the heap or inside a struct, use pthread_mutex_init:

/* mutex_dynamic.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct {
    int value;
    pthread_mutex_t lock;
} SafeCounter;

SafeCounter *safe_counter_new(void) {
    SafeCounter *sc = malloc(sizeof(SafeCounter));
    sc->value = 0;
    pthread_mutex_init(&sc->lock, NULL);
    return sc;
}

void safe_counter_inc(SafeCounter *sc) {
    pthread_mutex_lock(&sc->lock);
    sc->value++;
    pthread_mutex_unlock(&sc->lock);
}

void safe_counter_free(SafeCounter *sc) {
    pthread_mutex_destroy(&sc->lock);
    free(sc);
}

int main(void) {
    SafeCounter *sc = safe_counter_new();
    safe_counter_inc(sc);
    safe_counter_inc(sc);
    printf("Counter: %d\n", sc->value);
    safe_counter_free(sc);
    return 0;
}

Deadlock

Deadlock occurs when two threads each hold a lock the other needs.

Thread A                    Thread B
--------                    --------
lock(mutex_1)               lock(mutex_2)
  ...                         ...
lock(mutex_2)  <-- blocked  lock(mutex_1)  <-- blocked
  DEADLOCK                    DEADLOCK

Prevention rules:

  1. Lock ordering -- always acquire locks in the same global order.
  2. Try-lock -- use pthread_mutex_trylock and back off if it fails.
  3. Avoid holding multiple locks whenever possible.

Rule 1 in action -- both workers acquire the locks in the same global order:

/* deadlock_fixed.c */
#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void *worker1(void *arg) {
    (void)arg;
    /* Always lock A before B */
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    printf("Worker 1 has both locks\n");
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

void *worker2(void *arg) {
    (void)arg;
    /* Same order: A before B */
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    printf("Worker 2 has both locks\n");
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker1, NULL);
    pthread_create(&t2, NULL, worker2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Caution: Deadlocks are silent -- the program just hangs. Use pthread_mutex_timedlock in debug builds to detect them.

Condition Variables

A condition variable lets a thread sleep until some condition is true, without busy-waiting.

Classic pattern: producer-consumer queue.

/* condvar.c */
#include <stdio.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_SIZE 5

static int queue[QUEUE_SIZE];
static int count = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&mtx);
        while (count == QUEUE_SIZE)         /* MUST be while, not if */
            pthread_cond_wait(&not_full, &mtx);
        queue[count++] = i;
        printf("Produced %d (count=%d)\n", i, count);
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&mtx);
        while (count == 0)                  /* MUST be while, not if */
            pthread_cond_wait(&not_empty, &mtx);
        int val = queue[--count];
        printf("Consumed %d (count=%d)\n", val, count);
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    pthread_create(&prod, NULL, producer, NULL);
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    pthread_mutex_destroy(&mtx);
    pthread_cond_destroy(&not_empty);
    pthread_cond_destroy(&not_full);
    return 0;
}

Caution: Always check the condition in a while loop, not an if. Spurious wakeups are allowed by POSIX. The thread may wake up even though no one signaled the condvar.

The flow:

pthread_cond_wait(&cond, &mtx):
  1. Atomically: unlock mtx + sleep on cond
  2. When woken: re-lock mtx
  3. Return (caller re-checks condition in while loop)

pthread_cond_signal(&cond):
  Wake ONE waiting thread

pthread_cond_broadcast(&cond):
  Wake ALL waiting threads

Read-Write Locks

When reads vastly outnumber writes, a read-write lock allows multiple simultaneous readers.

/* rwlock.c */
#include <stdio.h>
#include <pthread.h>

static int shared_data = 0;
static pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER;

void *reader(void *arg) {
    int id = *(int *)arg;
    pthread_rwlock_rdlock(&rwl);
    printf("Reader %d sees %d\n", id, shared_data);
    pthread_rwlock_unlock(&rwl);
    return NULL;
}

void *writer(void *arg) {
    (void)arg;
    pthread_rwlock_wrlock(&rwl);
    shared_data = 42;
    printf("Writer set data to 42\n");
    pthread_rwlock_unlock(&rwl);
    return NULL;
}

int main(void) {
    pthread_t r1, r2, w;
    int id1 = 1, id2 = 2;

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r1, NULL, reader, &id1);
    pthread_create(&r2, NULL, reader, &id2);

    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);

    pthread_rwlock_destroy(&rwl);
    return 0;
}

Rust: Mutex -- Data Inside the Lock

In C, the mutex and the data it protects are separate. You can forget to lock. In Rust, the data lives inside the Mutex<T>. You cannot access the data without locking.

// mutex_counter.rs
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let h = thread::spawn(move || {
            for _ in 0..1_000_000 {
                let mut num = counter.lock().unwrap();
                *num += 1;
            }   // MutexGuard dropped here -> unlock
        });
        handles.push(h);
    }

    for h in handles {
        h.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}

Rust Note: Mutex::lock() returns a MutexGuard<T>. This guard implements Deref and DerefMut so you use it like a reference. When the guard is dropped, the mutex is automatically unlocked. You literally cannot forget to unlock.

Rust: RwLock

// rwlock.rs
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let data = Arc::new(RwLock::new(0));
    let mut handles = vec![];

    // spawn readers
    for id in 0..3 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let val = data.read().unwrap();
            println!("Reader {} sees {}", id, *val);
        }));
    }

    // spawn writer
    {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let mut val = data.write().unwrap();
            *val = 42;
            println!("Writer set data to 42");
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
}

Rust: Condvar

// condvar.rs
use std::sync::{Arc, Mutex, Condvar};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));

    let pair_clone = Arc::clone(&pair);
    let producer = thread::spawn(move || {
        let (lock, cvar) = &*pair_clone;
        let mut ready = lock.lock().unwrap();
        *ready = true;
        println!("Producer: data is ready");
        cvar.notify_one();
    });

    let (lock, cvar) = &*pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
    println!("Consumer: got the signal, ready = {}", *ready);

    producer.join().unwrap();
}

The Condvar::wait method takes the MutexGuard, releases the lock, sleeps, reacquires the lock, and returns a new guard. Same semantics as pthread_cond_wait, but type-safe.

Rust: Channels (mpsc)

Message passing avoids shared state entirely. Rust provides multi-producer, single-consumer channels.

// channel.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for i in 0..5 {
            tx.send(i * i).unwrap();
        }
    });

    for val in rx {
        println!("Received: {}", val);
    }

    producer.join().unwrap();
}

When the tx (sender) is dropped, the rx iterator ends. Clean, simple, no locks.

For multiple producers, clone the sender:

// multi_producer.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut handles = vec![];

    for id in 0..3 {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(format!("Hello from thread {}", id)).unwrap();
        }));
    }
    drop(tx);  // drop original sender so rx iterator terminates

    for msg in rx {
        println!("{}", msg);
    }

    for h in handles {
        h.join().unwrap();
    }
}

Driver Prep: The Linux kernel uses similar patterns: wait_event / wake_up for condition variables, spinlock_t for short critical sections, and completion for one-shot signaling. Message-passing patterns appear in kernel workqueues.

Why Rust's Mutex Is Better Than C's

C:   mutex and data are separate
     - You can access data without locking
     - You can lock the wrong mutex
     - You can forget to unlock

Rust: data is INSIDE the Mutex<T>
     - You MUST lock to access data
     - The lock guard auto-unlocks on drop
     - The compiler enforces Send + Sync bounds

Try It: In the Rust mutex_counter.rs example, try removing Arc::clone and just moving counter into both closures. What error does the compiler give? Why?

Knowledge Check

  1. Why must the condition in a condition variable be checked in a while loop, not an if?
  2. What is the difference between pthread_cond_signal and pthread_cond_broadcast?
  3. In Rust, what prevents you from accessing data protected by a Mutex<T> without locking it?

Common Pitfalls

  • Forgetting to unlock -- in C, every lock must have a matching unlock, even on error paths. Use cleanup handlers or RAII wrappers.
  • Locking inside a loop body when you meant to lock outside it -- performance disaster from lock contention.
  • Deadlock from inconsistent lock ordering -- establish a global order and document it.
  • Using if instead of while with condition variables -- spurious wakeups cause logic bugs.
  • Holding a lock while doing I/O -- blocks all other threads waiting on that lock. Keep critical sections short.
  • Poisoned mutex in Rust -- if a thread panics while holding a MutexGuard, the mutex is poisoned. Call .unwrap() or handle the PoisonError.

Rust Threads, Channels, and Async

Rust's concurrency model is built on two pillars: the type system prevents data races at compile time, and the ecosystem gives you both OS threads and async I/O. This chapter digs into Send, Sync, scoped threads, channels, Arc<Mutex<T>>, and introduces async/await with tokio.

Send and Sync

Two marker traits control what can cross thread boundaries.

  • Send: A type is Send if it can be transferred to another thread. Most types are Send. Raw pointers are not.
  • Sync: A type is Sync if it can be shared (via &T) between threads. A type is Sync if &T is Send.

+---------------------+--------+--------+
| Type                | Send?  | Sync?  |
+---------------------+--------+--------+
| i32, String, Vec<T> | Yes    | Yes    |
| Mutex<T>            | Yes    | Yes    |
| Rc<T>               | No     | No     |
| Arc<T>              | Yes    | Yes    |
| Cell<T>             | Yes    | No     |
| *mut T              | No     | No     |
+---------------------+--------+--------+

If you try to send an Rc<T> to another thread, the compiler stops you:

// send_error.rs -- WILL NOT COMPILE
use std::rc::Rc;
use std::thread;

fn main() {
    let data = Rc::new(42);
    thread::spawn(move || {
        println!("{}", data);
    });
}

error: `Rc<i32>` cannot be sent between threads safely

Rust Note: These traits are automatically derived by the compiler. You almost never implement them manually. They exist so the compiler can reason about thread safety without runtime checks.

Channels: mpsc in Depth

Rust's standard library provides std::sync::mpsc -- multi-producer, single-consumer channels.

// channel_types.rs
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // Unbounded channel (infinite buffer)
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let messages = vec!["hello", "from", "the", "thread"];
        for msg in messages {
            tx.send(msg).unwrap();
            thread::sleep(Duration::from_millis(200));
        }
    });

    // recv() blocks until a message arrives
    // When the sender drops, recv() returns Err
    loop {
        match rx.recv() {
            Ok(msg) => println!("Got: {}", msg),
            Err(_) => {
                println!("Channel closed");
                break;
            }
        }
    }
}

For bounded channels (backpressure):

// sync_channel.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    // Buffer holds at most 2 messages
    let (tx, rx) = mpsc::sync_channel(2);

    let producer = thread::spawn(move || {
        for i in 0..5 {
            println!("Sending {}", i);
            tx.send(i).unwrap();  // blocks if buffer full
            println!("Sent {}", i);
        }
    });

    for val in rx {
        println!("Received: {}", val);
    }

    producer.join().unwrap();
}

Try It: Change the buffer size to 0. This creates a rendezvous channel where send blocks until the receiver calls recv. Run it and observe the interleaving.

Arc<Mutex> for Shared Mutable State

When multiple threads need to read and write the same data, combine Arc (atomic reference counting) with Mutex (mutual exclusion).

// arc_mutex.rs
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let mut handles = vec![];

    for i in 0..3 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let mut vec = data.lock().unwrap();
            vec.push(i * 10);
            println!("Thread {} pushed {}", i, i * 10);
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    println!("Final: {:?}", *data.lock().unwrap());
}

The ownership diagram:

main thread       thread 0       thread 1       thread 2
    |                |               |              |
    v                v               v              v
   Arc ------> [strong count = 4] <------- Arc clones
                     |
                     v
               Mutex<Vec<i32>>
                     |
                     v
               Vec [1, 2, 3, ...]

Each Arc::clone increments the atomic reference count. When the last Arc is dropped, the Mutex and Vec are freed.

Scoped Threads

std::thread::scope (stable since Rust 1.63) lets threads borrow from the parent stack. No 'static requirement, no Arc needed.

// scoped.rs
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3, 4, 5];

    thread::scope(|s| {
        s.spawn(|| {
            let sum: i32 = data.iter().sum();
            println!("Sum: {}", sum);
        });

        s.spawn(|| {
            let len = data.len();
            println!("Length: {}", len);
        });
    });
    // All scoped threads are joined here automatically

    // We can mutate data again -- threads are done
    data.push(6);
    println!("After scope: {:?}", data);
}

Rust Note: Scoped threads solve the problem of needing Arc just to share a reference. The scope guarantees all threads finish before the borrow ends. This is similar to OpenMP parallel regions.

With scoped threads you can even have one thread borrow mutably while others read different data:

// scoped_mut.rs
use std::thread;

fn main() {
    let mut a = 10;
    let b = 20;

    thread::scope(|s| {
        s.spawn(|| {
            a += b;  // mutable borrow of a
        });

        s.spawn(|| {
            println!("b = {}", b);  // shared borrow of b only
        });
    });

    println!("a = {}", a);
}

Introduction to Async: Why and When

Threads are great for CPU-bound work. But for I/O-bound work (network servers, file I/O), OS threads are heavy: each needs a kernel stack (8-16KB on Linux), a user-space stack (8MB of virtual address space by default), and a share of scheduler overhead. A thread-per-connection server handling 10,000 connections needs 10,000 threads.

Async I/O uses cooperative multitasking: tasks yield when they would block, and a runtime multiplexes many tasks onto a few OS threads.

Threads (preemptive):            Async (cooperative):

Thread 1 [==BLOCK=====RUN==]    Task 1 [==yield--RUN==yield--]
Thread 2 [=RUN===BLOCK==RUN]    Task 2 [--RUN====yield--RUN==]
Thread 3 [BLOCK=======RUN==]    Task 3 [--yield--RUN========-]
                                         ^
3 OS threads                    1 OS thread, 3 tasks

Rule of thumb:

  • CPU-bound (number crunching, compression): use threads
  • I/O-bound (network, disk): use async
  • Mixed: use async with spawn_blocking for CPU work

The Future Trait

An async function returns a Future. A Future is a state machine that can be polled.

#![allow(unused)]
fn main() {
// This is conceptual -- you don't implement Future manually for most code
trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

enum Poll<T> {
    Ready(T),
    Pending,
}
}

When you write async fn, the compiler transforms your function into a state machine that implements Future. The .await points are where the state machine yields.

Tokio Basics

Tokio is the most widely used async runtime for Rust. Add it to Cargo.toml:

[dependencies]
tokio = { version = "1", features = ["full"] }

A minimal async program:

// tokio_hello.rs
#[tokio::main]
async fn main() {
    println!("Hello from async main");

    let result = compute().await;
    println!("Result: {}", result);
}

async fn compute() -> i32 {
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    42
}

The #[tokio::main] macro sets up the runtime. Without it, async fn main would return a Future that nobody polls.

Spawning Async Tasks

// tokio_spawn.rs
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    let handle1 = tokio::spawn(async {
        sleep(Duration::from_millis(100)).await;
        println!("Task 1 done");
        1
    });

    let handle2 = tokio::spawn(async {
        sleep(Duration::from_millis(50)).await;
        println!("Task 2 done");
        2
    });

    let r1 = handle1.await.unwrap();
    let r2 = handle2.await.unwrap();
    println!("Results: {} + {} = {}", r1, r2, r1 + r2);
}

Tasks run concurrently on the thread pool. tokio::spawn is like thread::spawn but for async tasks.

select!: Racing Tasks

// tokio_select.rs
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    tokio::select! {
        _ = sleep(Duration::from_secs(1)) => {
            println!("1 second elapsed");
        }
        _ = sleep(Duration::from_millis(500)) => {
            println!("500ms elapsed first");
        }
    }
}

select! waits for the first future to complete and cancels the rest. Useful for timeouts, shutdown signals, and multiplexing.

Async TCP Echo Server

Here is a complete async echo server -- the kind of thing that would need threads-per-connection in C:

// echo_server.rs
use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    println!("Listening on 127.0.0.1:8080");

    loop {
        let (mut socket, addr) = listener.accept().await?;
        println!("New connection from {}", addr);

        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            loop {
                let n = match socket.read(&mut buf).await {
                    Ok(0) => {
                        println!("{} disconnected", addr);
                        return;
                    }
                    Ok(n) => n,
                    Err(e) => {
                        eprintln!("Read error from {}: {}", addr, e);
                        return;
                    }
                };

                if let Err(e) = socket.write_all(&buf[..n]).await {
                    eprintln!("Write error to {}: {}", addr, e);
                    return;
                }
            }
        });
    }
}

Try It: Run the echo server, then connect with nc 127.0.0.1 8080 from multiple terminals. Each connection is handled by a lightweight task, not an OS thread.

C Comparison: Threaded Echo Server

For contrast, here is the same server in C using one thread per connection:

/* echo_server_threaded.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>

void *handle_client(void *arg) {
    int fd = *(int *)arg;
    free(arg);
    char buf[1024];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        write(fd, buf, n);
    }
    close(fd);
    return NULL;
}

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(8080),
        .sin_addr.s_addr = INADDR_ANY
    };
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 128);
    printf("Listening on port 8080\n");

    while (1) {
        int *client = malloc(sizeof(int));
        *client = accept(srv, NULL, NULL);
        pthread_t t;
        pthread_create(&t, NULL, handle_client, client);
        pthread_detach(t);
    }
}

This works but creates an OS thread per connection. At 10,000 connections, you have 10,000 threads. The async version uses a small thread pool regardless of connection count.

Driver Prep: Kernel drivers do not use async/await, but they use a similar concept: workqueues and tasklets defer work without creating new threads. The kernel's io_uring interface is the closest thing to async I/O at the syscall level.

Threads vs Async: Decision Guide

+------------------+-------------------+--------------------+
| Factor           | OS Threads        | Async Tasks        |
+------------------+-------------------+--------------------+
| Scheduling       | Preemptive (OS)   | Cooperative (user) |
| Stack size       | ~8KB kernel stack | Few hundred bytes  |
| Creation cost    | Moderate          | Very cheap         |
| Best for         | CPU-bound work    | I/O-bound work     |
| Max concurrency  | ~thousands        | ~millions          |
| Blocking calls   | OK                | MUST NOT block     |
| Debugging        | Easier            | Harder (state mc.) |
+------------------+-------------------+--------------------+

Caution: Never call blocking functions (like std::thread::sleep or synchronous file I/O) inside an async task. Use tokio::time::sleep, tokio::fs, or tokio::task::spawn_blocking instead. Blocking an async task blocks the entire runtime thread.

Knowledge Check

  1. What is the difference between Send and Sync?
  2. Why does Rc<T> fail to compile when sent to another thread?
  3. When should you use tokio::task::spawn_blocking instead of tokio::spawn?

Common Pitfalls

  • Using Rc instead of Arc across threads -- compile error, but confusing for beginners.
  • Forgetting move on closures passed to thread::spawn -- the closure borrows from the stack, which the thread may outlive.
  • Holding a std::sync::MutexGuard across an .await point -- the guard is not Send, so the multi-threaded runtime rejects it at compile time; on a current-thread runtime it can deadlock. Use tokio::sync::Mutex if you must hold a lock across await.
  • Calling .await outside an async function -- futures are lazy; they do nothing until polled.
  • Mixing std::sync::Mutex with async code -- it works if the critical section is short and never crosses an await, but tokio::sync::Mutex is safer for async contexts.
  • Not dropping the original sender when using mpsc::channel with cloned senders -- the receiver never terminates.

Pipes and FIFOs

Pipes are the oldest IPC mechanism on Unix. When you type ls | grep foo | wc -l in a shell, three processes are connected by two pipes. This chapter covers unnamed pipes, named pipes (FIFOs), dup2 for I/O redirection, and building a mini shell pipeline.

pipe(): The Basics

pipe() creates two file descriptors: one for reading, one for writing. Data written to the write end comes out the read end, in order, like a one-way queue.

/* pipe_basic.c */
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }

    /* fd[0] = read end, fd[1] = write end */
    const char *msg = "Hello through a pipe!\n";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);  /* close write end so read sees EOF */

    char buf[128];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("Read: %s", buf);
    close(fd[0]);

    return 0;
}

The two descriptors are connected through a kernel buffer:

  fd[0] ----READ----<  KERNEL BUFFER  <----WRITE---- fd[1]
                       (4096-65536 bytes)

Parent-Child Communication

The real power of pipes comes with fork(). The parent and child share the pipe file descriptors.

/* pipe_fork.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    pipe(fd);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: write to pipe */
        close(fd[0]);  /* close unused read end */
        const char *msg = "Message from child\n";
        write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(0);
    } else {
        /* Parent: read from pipe */
        close(fd[1]);  /* close unused write end */
        char buf[128];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        buf[n] = '\0';
        printf("Parent received: %s", buf);
        close(fd[0]);
        wait(NULL);
    }

    return 0;
}

Caution: Always close the unused ends of the pipe in each process. If the child does not close fd[0], and the parent does not close fd[1], read() may block forever because the kernel thinks a writer still exists.

The flow after fork:

Before fork:
  Process: fd[0]=read, fd[1]=write

After fork:
  Parent:  fd[0]=read,  fd[1]=CLOSE
  Child:   fd[0]=CLOSE, fd[1]=write

  Child writes --> kernel buffer --> Parent reads

dup2: Redirecting stdin/stdout

dup2(oldfd, newfd) makes newfd a copy of oldfd, closing newfd first if open. This is how shells redirect I/O.

/* dup2_example.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    pipe(fd);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: redirect stdout to pipe write end */
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);  /* stdout now writes to pipe */
        close(fd[1]);                /* original fd no longer needed */

        execlp("echo", "echo", "Hello from echo", NULL);
        _exit(1);
    }

    /* Parent: read from pipe */
    close(fd[1]);
    char buf[256];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("Captured: %s", buf);
    close(fd[0]);
    wait(NULL);

    return 0;
}

After dup2(fd[1], STDOUT_FILENO):

Before dup2:                 After dup2:
  fd[1]  -> pipe_write         fd[1]  -> pipe_write (closed next)
  stdout -> terminal           stdout -> pipe_write

Implementing a Shell Pipeline

Let us implement ls -la /tmp | grep log | wc -l with pipes and fork.

/* pipeline.c -- ls -la /tmp | grep log | wc -l */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int pipe1[2], pipe2[2];
    pipe(pipe1);
    pipe(pipe2);

    /* First child: ls -la /tmp */
    pid_t p1 = fork();
    if (p1 == 0) {
        close(pipe1[0]);
        close(pipe2[0]);
        close(pipe2[1]);
        dup2(pipe1[1], STDOUT_FILENO);
        close(pipe1[1]);
        execlp("ls", "ls", "-la", "/tmp", NULL);
        _exit(1);
    }

    /* Second child: grep log */
    pid_t p2 = fork();
    if (p2 == 0) {
        close(pipe1[1]);
        close(pipe2[0]);
        dup2(pipe1[0], STDIN_FILENO);
        close(pipe1[0]);
        dup2(pipe2[1], STDOUT_FILENO);
        close(pipe2[1]);
        execlp("grep", "grep", "log", NULL);
        _exit(1);
    }

    /* Third child: wc -l */
    pid_t p3 = fork();
    if (p3 == 0) {
        close(pipe1[0]);
        close(pipe1[1]);
        close(pipe2[1]);
        dup2(pipe2[0], STDIN_FILENO);
        close(pipe2[0]);
        execlp("wc", "wc", "-l", NULL);
        _exit(1);
    }

    /* Parent: close all pipe ends and wait */
    close(pipe1[0]);
    close(pipe1[1]);
    close(pipe2[0]);
    close(pipe2[1]);
    wait(NULL);
    wait(NULL);
    wait(NULL);

    return 0;
}

The data flow:

ls -la /tmp --pipe1--> grep log --pipe2--> wc -l --> stdout

Try It: Modify the pipeline to run cat /etc/passwd | grep root | head -1. Remember to create two pipes and three child processes.

Pipe Capacity and Blocking

Linux pipes have a default capacity of 65536 bytes (16 pages). You can query and change it with fcntl:

/* pipe_capacity.c */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    int fd[2];
    pipe(fd);

    int capacity = fcntl(fd[0], F_GETPIPE_SZ);
    printf("Default pipe capacity: %d bytes\n", capacity);

    /* Increase capacity (requires CAP_SYS_RESOURCE for > 1MB) */
    fcntl(fd[0], F_SETPIPE_SZ, 1048576);
    capacity = fcntl(fd[0], F_GETPIPE_SZ);
    printf("New pipe capacity: %d bytes\n", capacity);

    close(fd[0]);
    close(fd[1]);
    return 0;
}

Blocking behavior:

  • write() to a full pipe blocks until space is available (or returns EAGAIN if O_NONBLOCK is set).
  • read() from an empty pipe blocks until data arrives.
  • read() returns 0 when all write ends are closed (EOF).
  • write() to a pipe with no readers sends SIGPIPE to the writer.

Caution: SIGPIPE kills the process by default. In servers, set signal(SIGPIPE, SIG_IGN) and check the return value of write() for EPIPE instead.

Named Pipes (FIFOs)

Unnamed pipes only work between related processes (parent-child). FIFOs are special files on the filesystem that unrelated processes can open.

/* fifo_writer.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

int main(void) {
    const char *fifo_path = "/tmp/myfifo";
    if (mkfifo(fifo_path, 0666) == -1 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    printf("Opening FIFO for writing (blocks until a reader opens)...\n");
    int fd = open(fifo_path, O_WRONLY);
    const char *msg = "Hello through a FIFO!\n";
    write(fd, msg, strlen(msg));
    close(fd);

    return 0;
}
/* fifo_reader.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    const char *fifo_path = "/tmp/myfifo";

    printf("Opening FIFO for reading (blocks until a writer opens)...\n");
    int fd = open(fifo_path, O_RDONLY);

    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("Received: %s", buf);
    close(fd);

    unlink(fifo_path);  /* clean up */
    return 0;
}

Run the writer and reader in two terminals. The open() call blocks until both ends are connected.

Terminal 1: ./fifo_writer     Terminal 2: ./fifo_reader
  "Opening FIFO..."            "Opening FIFO..."
  (blocks)                     (connects)
  (writes, exits)              "Received: Hello through a FIFO!"

Rust: Pipes with std::process

Rust's standard library makes pipe-based communication easy through Command and Stdio:

// rust_pipe.rs
use std::process::{Command, Stdio};
use std::io::Read;

fn main() {
    let mut child = Command::new("echo")
        .arg("Hello from echo")
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to spawn echo");

    let mut output = String::new();
    child.stdout.take().unwrap().read_to_string(&mut output).unwrap();
    child.wait().unwrap();

    println!("Captured: {}", output.trim());
}

Rust: Shell Pipeline

// rust_pipeline.rs
use std::process::{Command, Stdio};
use std::io::Read;

fn main() {
    // ls -la /tmp | grep log | wc -l
    let ls = Command::new("ls")
        .args(["-la", "/tmp"])
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start ls");

    let grep = Command::new("grep")
        .arg("log")
        .stdin(Stdio::from(ls.stdout.unwrap()))
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start grep");

    let mut wc = Command::new("wc")
        .arg("-l")
        .stdin(Stdio::from(grep.stdout.unwrap()))
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start wc");

    let mut output = String::new();
    wc.stdout.take().unwrap().read_to_string(&mut output).unwrap();
    wc.wait().unwrap();

    println!("Lines matching 'log': {}", output.trim());
}

Rust Note: Stdio::from() transfers ownership of the pipe file descriptor. Rust's type system ensures you cannot accidentally use the same stdout twice. In C, you manually close file descriptors and hope you did not make a mistake.

Rust: Low-Level Pipes with nix

For direct pipe() and dup2() access:

// rust_nix_pipe.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "unistd"] }
use nix::unistd::{pipe, fork, ForkResult, write, read};
use std::os::fd::AsRawFd;

fn main() {
    // pipe() returns two OwnedFds; dropping one closes it exactly once
    let (read_fd, write_fd) = pipe().expect("pipe failed");

    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Child => {
            drop(read_fd);                 // close unused read end
            let msg = b"Hello from child via nix\n";
            write(&write_fd, msg).unwrap();
            drop(write_fd);
            std::process::exit(0);
        }
        ForkResult::Parent { child: _ } => {
            drop(write_fd);                // close unused write end
            let mut buf = [0u8; 128];
            let n = read(read_fd.as_raw_fd(), &mut buf).unwrap();
            drop(read_fd);
            print!("Parent got: {}", std::str::from_utf8(&buf[..n]).unwrap());
            nix::sys::wait::wait().ok();
        }
    }
}

Driver Prep: Linux kernel modules use struct pipe_inode_info internally. The concept of pipes extends to kernel-space communication: relay channels and trace_pipe use the same ring-buffer idea for high-throughput kernel-to-user data transfer.

Knowledge Check

  1. What happens if you write() to a pipe whose read end has been closed by all processes?
  2. Why must you close unused pipe ends after fork()?
  3. What is the difference between an unnamed pipe and a FIFO?

Common Pitfalls

  • Not closing unused pipe ends -- leads to deadlock because read() never sees EOF.
  • Forgetting to handle SIGPIPE -- a write to a pipe with no readers kills the process.
  • Using pipes for large data transfers -- pipes are limited to kernel buffer size. For bulk data, use shared memory or files.
  • Opening a FIFO with O_RDWR -- technically works but defeats the blocking semantics and is not portable.
  • Race condition with mkfifo -- if the path already exists, mkfifo fails with EEXIST. Either treat EEXIST as success (the FIFO is already there) or unlink the path first.
  • Assuming pipe writes are atomic -- writes up to PIPE_BUF (4096 on Linux) are atomic. Larger writes may be interleaved with other writers.

Shared Memory

Shared memory is the fastest IPC mechanism. Two processes map the same physical memory into their address spaces. There is no copying through the kernel -- a write by one process is immediately visible to the other (subject only to CPU cache coherence and memory ordering). The cost: you must synchronize access yourself.

POSIX Shared Memory in C

Three steps: create/open, set the size, map it.

/* shm_writer.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_NAME "/my_shm"
#define SHM_SIZE 4096

int main(void) {
    /* Create shared memory object */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (fd == -1) {
        perror("shm_open");
        return 1;
    }

    /* Set its size */
    if (ftruncate(fd, SHM_SIZE) == -1) {
        perror("ftruncate");
        return 1;
    }

    /* Map it into our address space */
    void *ptr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    close(fd);  /* fd no longer needed after mmap */

    /* Write data */
    const char *msg = "Hello from shared memory!";
    memcpy(ptr, msg, strlen(msg) + 1);
    printf("Writer: wrote '%s'\n", msg);

    /* Keep running so reader can access */
    printf("Writer: press Enter to clean up...\n");
    getchar();

    munmap(ptr, SHM_SIZE);
    shm_unlink(SHM_NAME);
    return 0;
}
/* shm_reader.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_NAME "/my_shm"
#define SHM_SIZE 4096

int main(void) {
    int fd = shm_open(SHM_NAME, O_RDONLY, 0);
    if (fd == -1) {
        perror("shm_open");
        return 1;
    }

    void *ptr = mmap(NULL, SHM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    close(fd);

    printf("Reader: got '%s'\n", (char *)ptr);

    munmap(ptr, SHM_SIZE);
    return 0;
}

Compile both with -lrt (on glibc older than 2.34, shm_open lives in librt; on newer systems it is in libc and -lrt is harmless):

gcc -o shm_writer shm_writer.c -lrt
gcc -o shm_reader shm_reader.c -lrt

Run the writer first, then the reader in another terminal.

The memory layout:

Process A (writer)               Process B (reader)
+------------------+            +------------------+
| Virtual Memory   |            | Virtual Memory   |
|                  |            |                  |
|  mmap region  ------+    +------  mmap region   |
|                  |   |    |   |                  |
+------------------+   |    |   +------------------+
                       v    v
                  +------------+
                  | Physical   |
                  | Memory     |
                  | (shared)   |
                  +------------+

The shm_open / mmap API

+------------------------------------------+----------------------------------------------------------+
| Function                                 | Purpose                                                  |
+------------------------------------------+----------------------------------------------------------+
| shm_open(name, flags, mode)              | Create or open a shared memory object (under /dev/shm/)  |
| ftruncate(fd, size)                      | Set the size of the shared memory object                 |
| mmap(addr, len, prot, flags, fd, offset) | Map the object into the process address space            |
| munmap(addr, len)                        | Unmap the region                                         |
| shm_unlink(name)                         | Remove the shared memory object                          |
+------------------------------------------+----------------------------------------------------------+

Caution: shm_unlink removes the name from the filesystem, but the memory stays mapped until all processes call munmap or exit. If you forget shm_unlink, the object lingers in /dev/shm/ (a tmpfs, so it survives process exit but not a reboot) until someone unlinks it. Check with ls /dev/shm/.

Try It: Run shm_writer, then look at /dev/shm/my_shm with ls -la /dev/shm/. You will see a file. Run shm_reader, then press Enter in the writer to clean up. Verify the file is gone.

Sharing Structured Data

You can share any fixed-size structure. Use fixed-width types (and identical struct definitions on both sides) so the layout is the same in every process.

/* shm_struct.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/wait.h>

typedef struct {
    int32_t counter;
    char    message[64];
} SharedData;

int main(void) {
    const char *name = "/struct_shm";
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(SharedData));

    SharedData *data = mmap(NULL, sizeof(SharedData),
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);

    data->counter = 0;
    strcpy(data->message, "initialized");

    pid_t pid = fork();
    if (pid == 0) {
        /* Child increments counter */
        for (int i = 0; i < 100000; i++)
            data->counter++;    /* WARNING: no synchronization! */
        strcpy(data->message, "child was here");
        _exit(0);
    }

    /* Parent also increments */
    for (int i = 0; i < 100000; i++)
        data->counter++;        /* WARNING: race condition! */

    wait(NULL);
    printf("Counter: %d (expected 200000)\n", data->counter);
    printf("Message: %s\n", data->message);

    munmap(data, sizeof(SharedData));
    shm_unlink(name);
    return 0;
}

The counter will be wrong due to the race condition. We need synchronization.

Process-Shared Mutex

A regular pthread_mutex_t only works within a single process. For cross-process synchronization, use PTHREAD_PROCESS_SHARED.

/* shm_mutex.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/wait.h>

typedef struct {
    pthread_mutex_t lock;
    int counter;
} SharedData;

int main(void) {
    const char *name = "/mutex_shm";
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(SharedData));

    SharedData *data = mmap(NULL, sizeof(SharedData),
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);

    /* Initialize process-shared mutex */
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&data->lock, &attr);
    pthread_mutexattr_destroy(&attr);

    data->counter = 0;

    pid_t pid = fork();
    if (pid == 0) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&data->lock);
            data->counter++;
            pthread_mutex_unlock(&data->lock);
        }
        _exit(0);
    }

    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&data->lock);
        data->counter++;
        pthread_mutex_unlock(&data->lock);
    }

    wait(NULL);
    printf("Counter: %d (expected 200000)\n", data->counter);

    pthread_mutex_destroy(&data->lock);
    munmap(data, sizeof(SharedData));
    shm_unlink(name);
    return 0;
}

Compile with:

gcc -o shm_mutex shm_mutex.c -lrt -pthread

Now the counter is always 200000. The key line is pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED).

Caution: The mutex must be stored in the shared memory region itself, not on the stack or heap of either process. Both processes must access the same pthread_mutex_t object.

Rust: Shared Memory with memmap2

Rust does not have a standard-library shared memory API. The memmap2 crate provides a safe wrapper around mmap.

Add to Cargo.toml:

[dependencies]
memmap2 = "0.9"
nix = { version = "0.29", features = ["mman", "fs"] }
// shm_writer.rs
use nix::fcntl::OFlag;
use nix::sys::mman::{shm_open, shm_unlink};
use nix::sys::stat::Mode;
use nix::unistd::ftruncate;
use memmap2::MmapMut;
use std::fs::File;

const SHM_NAME: &str = "/rust_shm";
const SHM_SIZE: usize = 4096;

fn main() {
    // Create shared memory object; shm_open returns an OwnedFd
    let fd = shm_open(
        SHM_NAME,
        OFlag::O_CREAT | OFlag::O_RDWR,
        Mode::S_IRUSR | Mode::S_IWUSR,
    )
    .expect("shm_open failed");

    ftruncate(&fd, SHM_SIZE as i64).expect("ftruncate failed");

    // Transfer ownership of the fd into a File -- exactly one close, no unsafe
    let file = File::from(fd);
    let mut mmap = unsafe {
        MmapMut::map_mut(&file).expect("mmap failed")
    };

    let msg = b"Hello from Rust shared memory!";
    mmap[..msg.len()].copy_from_slice(msg);

    println!("Writer: wrote message. Press Enter to clean up.");
    let mut buf = String::new();
    std::io::stdin().read_line(&mut buf).unwrap();

    drop(mmap);
    shm_unlink(SHM_NAME).ok();
}
// shm_reader.rs
use nix::fcntl::OFlag;
use nix::sys::mman::shm_open;
use nix::sys::stat::Mode;
use memmap2::Mmap;
use std::fs::File;

const SHM_NAME: &str = "/rust_shm";
const SHM_SIZE: usize = 4096;

fn main() {
    let fd = shm_open(SHM_NAME, OFlag::O_RDONLY, Mode::empty())
        .expect("shm_open failed -- is the writer running?");

    // Transfer ownership of the fd into a File -- exactly one close
    let file = File::from(fd);
    let mmap = unsafe {
        Mmap::map(&file).expect("mmap failed")
    };

    // Find the null terminator or use a fixed length
    let end = mmap.iter().position(|&b| b == 0).unwrap_or(SHM_SIZE);
    let msg = std::str::from_utf8(&mmap[..end]).unwrap();
    println!("Reader: got '{}'", msg);
}

Rust Note: mmap is inherently unsafe in Rust because another process can modify the mapped memory at any time, violating Rust's aliasing rules. The unsafe blocks here acknowledge that you are opting into shared-memory semantics, where the compiler cannot enforce data-race freedom.

When to Use Shared Memory

+------------------+-----------+----------+----------+
| Factor           | Pipe      | Socket   | Shm      |
+------------------+-----------+----------+----------+
| Speed            | Medium    | Medium   | Fastest  |
| Kernel copies    | 2 (w+r)   | 2 (w+r)  | 0        |
| Sync needed      | Built-in  | Built-in | Manual   |
| Unrelated procs  | No (FIFO) | Yes      | Yes      |
| Structured data  | Serialize | Serialize| Direct   |
| Complexity       | Low       | Medium   | High     |
+------------------+-----------+----------+----------+

Use shared memory when:

  • You need the absolute lowest latency (high-frequency trading, real-time audio).
  • You are transferring large amounts of data between processes.
  • The data is a fixed-size structure that both processes understand.

Do not use it when:

  • You need communication between machines (use sockets).
  • The data is small and infrequent (use pipes or message queues).
  • You cannot afford the complexity of manual synchronization.

Anonymous Shared Memory with mmap

You do not always need shm_open. For parent-child sharing, use MAP_SHARED | MAP_ANONYMOUS:

/* anon_shm.c */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *shared = mmap(NULL, sizeof(int),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS,
                       -1, 0);
    if (shared == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    *shared = 0;

    pid_t pid = fork();
    if (pid == 0) {
        *shared = 42;
        _exit(0);
    }

    wait(NULL);
    printf("Child set shared value to %d\n", *shared);

    munmap(shared, sizeof(int));
    return 0;
}

No filesystem name needed. The mapping is inherited by fork() and shared between parent and child.

Driver Prep: Linux kernel drivers use shared memory extensively. mmap on a device file maps kernel buffers into user space (e.g., framebuffer devices, DMA buffers). On the kernel side, a driver implements the mmap file operation, typically by calling remap_pfn_range to wire physical pages into the process's address space.

Memory-Mapped Files

mmap can also map regular files, giving you shared, persistent storage:

/* mmap_file.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    const char *path = "/tmp/mmap_test.dat";
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    if (ftruncate(fd, 4096) == -1) {
        perror("ftruncate");
        return 1;
    }

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    close(fd);

    strcpy(map, "Persisted via mmap");
    msync(map, 4096, MS_SYNC);  /* flush to disk */
    printf("Wrote to file via mmap\n");

    munmap(map, 4096);

    /* Verify by reading the file normally */
    fd = open(path, O_RDONLY);
    char buf[64];
    read(fd, buf, sizeof(buf));
    close(fd);
    printf("Read back: %s\n", buf);

    unlink(path);
    return 0;
}

Try It: Modify mmap_file.c to map an existing file (like /etc/hostname) as read-only and print its contents without using read(). Hint: use PROT_READ and MAP_PRIVATE.

Knowledge Check

  1. What is the difference between MAP_SHARED and MAP_PRIVATE?
  2. Why must a process-shared mutex be stored in the shared memory region itself?
  3. What does msync do, and when would you need it?

Common Pitfalls

  • Forgetting ftruncate -- the shared memory object starts at size 0. Accessing unmapped memory causes SIGBUS.
  • Using MAP_PRIVATE when you want sharing -- MAP_PRIVATE creates a copy-on-write mapping. Changes are not visible to other processes.
  • Not calling shm_unlink -- the shared memory object persists in /dev/shm/ until you remove it.
  • Assuming memory ordering -- on architectures with weak memory ordering (ARM, RISC-V), you need memory barriers or atomics even with shared memory. x86 is relatively forgiving but do not rely on it.
  • Mapping too much memory -- mmap reserves virtual address space but physical memory is allocated on demand (page faults). Still, do not map terabytes casually.
  • Storing pointers in shared memory -- pointers are process-local. Store offsets instead.

Message Queues and Semaphores

Message queues give you structured, prioritized communication between processes without the byte-stream nature of pipes. Semaphores give you lightweight synchronization without the ownership rules of a mutex. This chapter covers POSIX message queues and POSIX semaphores, then compares all the IPC mechanisms.

POSIX Message Queues

A message queue is a kernel-managed list of messages. Each message has a body and a priority. Higher-priority messages are delivered first.

/* mq_sender.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>

#define QUEUE_NAME "/my_queue"
#define MAX_MSG_SIZE 256
#define MAX_MSGS 10

int main(void) {
    struct mq_attr attr = {
        .mq_flags = 0,
        .mq_maxmsg = MAX_MSGS,
        .mq_msgsize = MAX_MSG_SIZE,
        .mq_curmsgs = 0
    };

    mqd_t mq = mq_open(QUEUE_NAME, O_CREAT | O_WRONLY, 0666, &attr);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    const char *messages[] = {
        "Low priority message",
        "Medium priority message",
        "High priority message"
    };
    unsigned int priorities[] = {1, 5, 10};

    for (int i = 0; i < 3; i++) {
        if (mq_send(mq, messages[i], strlen(messages[i]) + 1,
                    priorities[i]) == -1) {
            perror("mq_send");
            return 1;
        }
        printf("Sent (priority %u): %s\n", priorities[i], messages[i]);
    }

    mq_close(mq);
    return 0;
}
/* mq_receiver.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>

#define QUEUE_NAME "/my_queue"
#define MAX_MSG_SIZE 256

int main(void) {
    mqd_t mq = mq_open(QUEUE_NAME, O_RDONLY);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    char buf[MAX_MSG_SIZE + 1];
    unsigned int priority;

    /* Receive 3 messages -- highest priority comes first */
    for (int i = 0; i < 3; i++) {
        ssize_t n = mq_receive(mq, buf, sizeof(buf), &priority);
        if (n == -1) {
            perror("mq_receive");
            return 1;
        }
        buf[n] = '\0';
        printf("Received (priority %u): %s\n", priority, buf);
    }

    mq_close(mq);
    mq_unlink(QUEUE_NAME);
    return 0;
}

Compile with:

gcc -o mq_sender mq_sender.c -lrt
gcc -o mq_receiver mq_receiver.c -lrt

Run the sender first, then the receiver. Notice the output order is by descending priority:

Received (priority 10): High priority message
Received (priority 5): Medium priority message
Received (priority 1): Low priority message

The message queue API:

mq_open(name, flags, mode, attr)  -- create or open
mq_send(mq, msg, len, priority)   -- send a message
mq_receive(mq, buf, len, &prio)   -- receive highest-priority message
mq_close(mq)                      -- close the descriptor
mq_unlink(name)                   -- remove the queue
mq_getattr(mq, &attr)             -- query attributes
mq_notify(mq, &sigevent)          -- register for async notification

Caution: The mq_receive buffer must be at least mq_msgsize bytes (as set in mq_attr). If it is smaller, mq_receive fails with EMSGSIZE. This is a common mistake.

Try It: Modify the sender to send 5 messages with the same priority. Verify that they arrive in FIFO order (first-in, first-out within the same priority level).

Message Queue Limits

Linux imposes system-wide limits on message queues:

/proc/sys/fs/mqueue/msg_max      -- max messages per queue (default 10)
/proc/sys/fs/mqueue/msgsize_max  -- max message size (default 8192)
/proc/sys/fs/mqueue/queues_max   -- max number of queues (default 256)

You can view and modify these:

cat /proc/sys/fs/mqueue/msg_max

Non-blocking and Timed Operations

/* mq_nonblock.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>
#include <errno.h>
#include <time.h>
#include <string.h>

#define QUEUE_NAME "/nb_queue"
#define MAX_MSG_SIZE 256

int main(void) {
    struct mq_attr attr = {
        .mq_flags = 0,
        .mq_maxmsg = 10,
        .mq_msgsize = MAX_MSG_SIZE,
        .mq_curmsgs = 0
    };

    mqd_t mq = mq_open(QUEUE_NAME, O_CREAT | O_RDWR | O_NONBLOCK,
                        0666, &attr);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    /* Non-blocking receive on empty queue */
    char buf[MAX_MSG_SIZE + 1];
    unsigned int prio;
    if (mq_receive(mq, buf, sizeof(buf), &prio) == -1) {
        if (errno == EAGAIN)
            printf("No messages available (non-blocking)\n");
    }

    /* Send a message */
    const char *msg = "test message";
    mq_send(mq, msg, strlen(msg) + 1, 0);

    /* Timed receive: would wait up to 2 seconds -- but note this queue was
       opened O_NONBLOCK, so an empty queue returns EAGAIN immediately.
       Here a message is already waiting, so it succeeds either way. */
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 2;

    ssize_t n = mq_timedreceive(mq, buf, sizeof(buf), &prio, &ts);
    if (n > 0) {
        buf[n] = '\0';
        printf("Timed receive got: %s\n", buf);
    }

    mq_close(mq);
    mq_unlink(QUEUE_NAME);
    return 0;
}

POSIX Semaphores

A semaphore is a counter that supports two atomic operations: wait (decrement) and post (increment). When the counter is zero, sem_wait blocks.

There are two kinds:

  • Named semaphores -- created with sem_open, accessible by unrelated processes via a filesystem name.
  • Unnamed semaphores -- created with sem_init, live in shared memory or within a single process.

Named Semaphore

/* sem_named.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEM_NAME "/my_sem"

int main(void) {
    /* Create semaphore with initial value 1 (binary semaphore = mutex) */
    sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child */
        sem_wait(sem);
        printf("Child: entered critical section\n");
        sleep(1);
        printf("Child: leaving critical section\n");
        sem_post(sem);
        sem_close(sem);
        _exit(0);
    }

    /* Parent */
    sleep(1);  /* give the child a head start (sleep(0) would return immediately) */
    sem_wait(sem);
    printf("Parent: entered critical section\n");
    printf("Parent: leaving critical section\n");
    sem_post(sem);

    wait(NULL);
    sem_close(sem);
    sem_unlink(SEM_NAME);
    return 0;
}

Compile with:

gcc -o sem_named sem_named.c -pthread

Unnamed Semaphore in Shared Memory

/* sem_unnamed.c */
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Allocate semaphore in shared memory */
    sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Initialize: pshared=1 (process-shared), value=1 */
    sem_init(sem, 1, 1);

    pid_t pid = fork();
    if (pid == 0) {
        sem_wait(sem);
        printf("Child: in critical section\n");
        sleep(1);
        printf("Child: done\n");
        sem_post(sem);
        _exit(0);
    }

    sem_wait(sem);
    printf("Parent: in critical section\n");
    printf("Parent: done\n");
    sem_post(sem);

    wait(NULL);
    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return 0;
}

Counting Semaphores: Resource Pools

A counting semaphore tracks the number of available resources.

/* sem_pool.c */
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define POOL_SIZE 3
#define NUM_WORKERS 8

static sem_t pool;

void *worker(void *arg) {
    int id = *(int *)arg;

    sem_wait(&pool);
    printf("Worker %d: acquired resource (entering pool)\n", id);
    sleep(1);  /* simulate work with the resource */
    printf("Worker %d: releasing resource\n", id);
    sem_post(&pool);

    return NULL;
}

int main(void) {
    sem_init(&pool, 0, POOL_SIZE);  /* 3 resources available */

    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&pool);
    return 0;
}

Output shows at most 3 workers in the pool at any time:

Worker 0: acquired resource (entering pool)
Worker 1: acquired resource (entering pool)
Worker 2: acquired resource (entering pool)
Worker 0: releasing resource
Worker 3: acquired resource (entering pool)
...

The semaphore value diagram:

sem value:  3 -> 2 -> 1 -> 0 -> 0 -> 1 -> 0 -> ...
event:        W0   W1   W2   W3    W0   W3
              acq  acq  acq  blocks rel  acq (wakes)

Driver Prep: The Linux kernel has its own counting semaphore (struct semaphore) for resource management. The kernel's down() and up() correspond to sem_wait() and sem_post(). Modern kernel code prefers mutexes for binary locking and completions for signaling.

Semaphore vs Mutex

+------------------+-------------------+--------------------+
| Feature          | Mutex             | Semaphore          |
+------------------+-------------------+--------------------+
| Value range      | 0 or 1 (locked/   | 0 to N             |
|                  | unlocked)         |                    |
| Ownership        | Yes (only owner   | No (any thread     |
|                  | can unlock)       | can post)          |
| Use case         | Mutual exclusion  | Resource counting  |
| Priority inherit | Yes (on Linux)    | No                 |
| Cross-process    | With PSHARED attr | Named or in shm    |
+------------------+-------------------+--------------------+

Rust: Message Passing with Channels

Rust does not wrap POSIX message queues in the standard library. Instead, it provides channels (covered in Ch39-40) which serve the same purpose within a single process. For cross-process message queues, use the posixmq crate:

// rust_mq.rs
// Cargo.toml: posixmq = "1"
// (builder method names below follow the posixmq crate's OpenOptions API --
//  check the crate docs for your version)

fn main() {
    let name = "/rust_mq";

    // Create (or open) the queue: up to 10 messages of up to 256 bytes each
    let mq = posixmq::OpenOptions::readwrite()
        .max_msg_len(256)
        .capacity(10)
        .create()
        .open(name)
        .expect("Failed to open message queue");

    // Send messages with priorities
    mq.send(0, b"Low priority").unwrap();
    mq.send(5, b"Medium priority").unwrap();
    mq.send(10, b"High priority").unwrap();

    // Receive (highest priority first)
    let mut buf = vec![0u8; 256];
    for _ in 0..3 {
        let (priority, len) = mq.receive(&mut buf).unwrap();
        let msg = std::str::from_utf8(&buf[..len]).unwrap();
        println!("Received (priority {}): {}", priority, msg);
    }

    posixmq::remove_queue(name).ok();
}

Rust: Semaphore Alternatives

Rust's standard library has no semaphore type. Use tokio's Semaphore for async code or build one from Mutex and Condvar:

// semaphore.rs
use std::sync::{Arc, Mutex, Condvar};
use std::thread;
use std::time::Duration;

struct Semaphore {
    count: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(initial: usize) -> Self {
        Semaphore {
            count: Mutex::new(initial),
            cvar: Condvar::new(),
        }
    }

    fn acquire(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cvar.wait(count).unwrap();
        }
        *count -= 1;
    }

    fn release(&self) {
        let mut count = self.count.lock().unwrap();
        *count += 1;
        self.cvar.notify_one();
    }
}

fn main() {
    let sem = Arc::new(Semaphore::new(3));
    let mut handles = vec![];

    for id in 0..8 {
        let sem = Arc::clone(&sem);
        handles.push(thread::spawn(move || {
            sem.acquire();
            println!("Worker {}: acquired resource", id);
            thread::sleep(Duration::from_secs(1));
            println!("Worker {}: releasing", id);
            sem.release();
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
}

Rust Note: Rust's philosophy favors channels over semaphores for most use cases. "Do not communicate by sharing memory; share memory by communicating." Channels are easier to reason about and less prone to bugs.

IPC Decision Table

+------------------+-------+--------+--------+--------+--------+
| Feature          | Pipe  | FIFO   | Shm    | MsgQ   | Socket |
+------------------+-------+--------+--------+--------+--------+
| Unrelated procs  | No    | Yes    | Yes    | Yes    | Yes    |
| Network capable  | No    | No     | No     | No     | Yes    |
| Message boundary | No    | No     | N/A    | Yes    | DGRAM  |
| Priority         | No    | No     | N/A    | Yes    | No     |
| Speed            | Med   | Med    | Fast   | Med    | Med    |
| Kernel copies    | 2     | 2      | 0      | 2      | 2      |
| Bidirectional    | No    | No     | Yes    | No*    | Yes    |
| Max data size    | 64KB  | 64KB   | RAM    | 8KB**  | Large  |
| Persistence      | No    | File   | /dev/  | /dev/  | No     |
|                  |       |        | shm    | mqueue |        |
+------------------+-------+--------+--------+--------+--------+
  * Two queues needed for bidirectional
  ** Default, configurable

Try It: Write a producer-consumer pair using POSIX message queues. The producer sends 10 numbered messages with alternating priorities (odd numbers get priority 1, even get priority 5). The consumer prints them and observes the ordering.

Knowledge Check

  1. What is the difference between a named semaphore and an unnamed semaphore?
  2. Why does mq_receive require a buffer of at least mq_msgsize bytes?
  3. In what situation would you choose a message queue over a pipe?

Common Pitfalls

  • mq_receive buffer too small -- fails with EMSGSIZE even if the actual message is short. The buffer must be mq_msgsize or larger.
  • Forgetting mq_unlink or sem_unlink -- the objects persist in /dev/mqueue/ and /dev/shm/ until explicitly removed.
  • Using sem_init with pshared=0 across processes -- the semaphore only works within one process. Set pshared=1 for cross-process use.
  • Deadlock with semaphores -- if sem_wait is called more times than sem_post, the semaphore blocks forever.
  • Ignoring EINTR -- sem_wait and mq_receive can be interrupted by signals. Always check for EINTR and retry.
  • Message queue full -- mq_send blocks (or returns EAGAIN in non-blocking mode) when the queue is at capacity.

Unix Domain Sockets

Unix domain sockets are the Swiss Army knife of Linux IPC. They use the familiar socket API (socket, bind, listen, accept, connect) but communicate within the same machine, without network overhead. They support both stream and datagram modes, can pass file descriptors between processes, and can verify the identity of the peer. If you only learn one IPC mechanism, make it this one.

Creating a Unix Domain Socket

The key difference from network sockets: AF_UNIX instead of AF_INET, and struct sockaddr_un instead of struct sockaddr_in.

/* uds_server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCKET_PATH "/tmp/my_uds.sock"

int main(void) {
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv == -1) {
        perror("socket");
        return 1;
    }

    /* Remove any leftover socket file */
    unlink(SOCKET_PATH);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");
        return 1;
    }

    listen(srv, 5);
    printf("Server listening on %s\n", SOCKET_PATH);

    int client = accept(srv, NULL, NULL);
    if (client == -1) {
        perror("accept");
        return 1;
    }

    char buf[256];
    ssize_t n = read(client, buf, sizeof(buf) - 1);
    if (n == -1) {
        perror("read");
        return 1;
    }
    buf[n] = '\0';
    printf("Server received: %s\n", buf);

    const char *reply = "Hello from server";
    write(client, reply, strlen(reply));

    close(client);
    close(srv);
    unlink(SOCKET_PATH);
    return 0;
}
/* uds_client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCKET_PATH "/tmp/my_uds.sock"

int main(void) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("connect");
        return 1;
    }

    const char *msg = "Hello from client";
    write(fd, msg, strlen(msg));

    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n == -1) {
        perror("read");
        return 1;
    }
    buf[n] = '\0';
    printf("Client received: %s\n", buf);

    close(fd);
    return 0;
}

Run the server in one terminal, the client in another. The socket appears as a file:

$ ls -la /tmp/my_uds.sock
srwxrwxr-x 1 user user 0 ... /tmp/my_uds.sock

The s at the start of the permissions indicates a socket file.
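
You can also list live Unix domain sockets -- both filesystem-backed and abstract ones -- with ss:

```shell
# -x: unix sockets, -l: listening only; abstract names appear with a leading '@'
ss -xl
```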

SOCK_STREAM vs SOCK_DGRAM

SOCK_STREAM (like TCP):
  - Connection-oriented
  - Reliable, ordered byte stream
  - Must listen/accept/connect

SOCK_DGRAM (like UDP):
  - Connectionless
  - Message boundaries preserved
  - No listen/accept needed
  - Reliable (unlike UDP -- no network to drop packets)

A datagram example:

/* uds_dgram_server.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SERVER_PATH "/tmp/uds_dgram_srv.sock"

int main(void) {
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    unlink(SERVER_PATH);

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SERVER_PATH, sizeof(addr.sun_path) - 1);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    printf("Datagram server waiting on %s\n", SERVER_PATH);

    char buf[256];
    struct sockaddr_un client_addr;
    socklen_t len = sizeof(client_addr);

    ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                         (struct sockaddr *)&client_addr, &len);
    buf[n] = '\0';
    printf("Server got: %s\n", buf);

    /* Send reply back to client */
    const char *reply = "ACK";
    sendto(fd, reply, strlen(reply), 0,
           (struct sockaddr *)&client_addr, len);

    close(fd);
    unlink(SERVER_PATH);
    return 0;
}
/* uds_dgram_client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SERVER_PATH "/tmp/uds_dgram_srv.sock"
#define CLIENT_PATH "/tmp/uds_dgram_cli.sock"

int main(void) {
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    /* Client must bind too, so server can reply */
    unlink(CLIENT_PATH);
    struct sockaddr_un client_addr = {0};
    client_addr.sun_family = AF_UNIX;
    strncpy(client_addr.sun_path, CLIENT_PATH,
            sizeof(client_addr.sun_path) - 1);
    bind(fd, (struct sockaddr *)&client_addr, sizeof(client_addr));

    struct sockaddr_un server_addr = {0};
    server_addr.sun_family = AF_UNIX;
    strncpy(server_addr.sun_path, SERVER_PATH,
            sizeof(server_addr.sun_path) - 1);

    const char *msg = "Hello datagram";
    sendto(fd, msg, strlen(msg), 0,
           (struct sockaddr *)&server_addr, sizeof(server_addr));

    char buf[256];
    ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0, NULL, NULL);
    buf[n] = '\0';
    printf("Client got reply: %s\n", buf);

    close(fd);
    unlink(CLIENT_PATH);
    return 0;
}

Try It: Modify the datagram server to loop and handle multiple messages from different clients. Each client should bind to a unique path (e.g., /tmp/client_PID.sock).

Abstract Socket Namespace

Linux supports an abstract namespace that does not create a filesystem entry. Set sun_path[0] = '\0':

/* uds_abstract.c */
#include <stdio.h>
#include <stddef.h>   /* offsetof */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>

int main(void) {
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* Abstract: first byte is \0, rest is the name */
    const char *name = "\0my_abstract_socket";
    memcpy(addr.sun_path, name, 20);

    socklen_t addr_len = offsetof(struct sockaddr_un, sun_path) + 20;
    bind(srv, (struct sockaddr *)&addr, addr_len);
    listen(srv, 1);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: connect */
        close(srv);
        int cli = socket(AF_UNIX, SOCK_STREAM, 0);
        connect(cli, (struct sockaddr *)&addr, addr_len);
        write(cli, "abstract!", 9);
        close(cli);
        _exit(0);
    }

    int client = accept(srv, NULL, NULL);
    char buf[64];
    ssize_t n = read(client, buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("Received via abstract socket: %s\n", buf);

    close(client);
    close(srv);
    wait(NULL);
    return 0;
}

Advantages of abstract sockets:

  • No filesystem cleanup needed (no unlink required).
  • No permission issues with the socket file.
  • Automatically vanishes when all file descriptors are closed.

Caution: Abstract sockets are Linux-specific. They do not exist on macOS or FreeBSD. The address length matters -- you must pass the exact length, not sizeof(addr), because the name may contain null bytes.

Passing File Descriptors (SCM_RIGHTS)

This is the killer feature. One process can send an open file descriptor to another process over a Unix domain socket. The kernel creates a new file descriptor in the receiver's file descriptor table pointing to the same underlying file.

/* fd_sender.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCKET_PATH "/tmp/fd_pass.sock"

int send_fd(int sock, int fd_to_send) {
    char buf[1] = {'F'};
    struct iovec iov = { .iov_base = buf, .iov_len = 1 };

    /* Ancillary data buffer */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } cmsg_buf;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsg_buf.buf;
    msg.msg_controllen = sizeof(cmsg_buf.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

int main(void) {
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    unlink(SOCKET_PATH);

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 1);

    printf("Sender: waiting for connection...\n");
    int client = accept(srv, NULL, NULL);

    /* Open a file and send the fd to the other process */
    int file_fd = open("/etc/hostname", O_RDONLY);
    if (file_fd == -1) {
        perror("open");
        return 1;
    }

    printf("Sender: sending fd %d for /etc/hostname\n", file_fd);
    send_fd(client, file_fd);

    close(file_fd);
    close(client);
    close(srv);
    unlink(SOCKET_PATH);
    return 0;
}
/* fd_receiver.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCKET_PATH "/tmp/fd_pass.sock"

int recv_fd(int sock) {
    char buf[1];
    struct iovec iov = { .iov_base = buf, .iov_len = 1 };

    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } cmsg_buf;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsg_buf.buf;
    msg.msg_controllen = sizeof(cmsg_buf.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET
             && cmsg->cmsg_type == SCM_RIGHTS) {
        int fd;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
    }
    return -1;
}

int main(void) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));

    int received_fd = recv_fd(fd);
    printf("Receiver: got fd %d\n", received_fd);

    /* Read from the received fd */
    char buf[256];
    ssize_t n = read(received_fd, buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("Receiver: read from passed fd: %s", buf);

    close(received_fd);
    close(fd);
    return 0;
}

Run the sender first, then the receiver. The receiver reads /etc/hostname using a file descriptor it never opened -- the sender passed it over the socket.

FD passing flow:

Sender process:                   Receiver process:
  fd 3 -> /etc/hostname
      |
      +-- sendmsg(SCM_RIGHTS) --> recvmsg() --> fd 4 -> /etc/hostname
                                                        (same file, new fd #)

Caution: The received file descriptor number will be different from the sender's. The kernel allocates the lowest available fd number in the receiver's table. The underlying file description (offset, flags) is shared.

Passing Credentials (SCM_CREDENTIALS)

Unix domain sockets can also verify the peer's PID, UID, and GID.

/* cred_server.c */
#define _GNU_SOURCE   /* expose struct ucred and SCM_CREDENTIALS */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCKET_PATH "/tmp/cred_check.sock"

int main(void) {
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    unlink(SOCKET_PATH);

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 1);

    int client = accept(srv, NULL, NULL);

    /* Enable credential passing */
    int optval = 1;
    setsockopt(client, SOL_SOCKET, SO_PASSCRED, &optval, sizeof(optval));

    char buf[1];
    struct iovec iov = { .iov_base = buf, .iov_len = 1 };

    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } cmsg_buf;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsg_buf.buf;
    msg.msg_controllen = sizeof(cmsg_buf.buf);

    recvmsg(client, &msg, 0);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET
             && cmsg->cmsg_type == SCM_CREDENTIALS) {
        struct ucred cred;
        memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
        printf("Peer PID: %d\n", cred.pid);
        printf("Peer UID: %d\n", cred.uid);
        printf("Peer GID: %d\n", cred.gid);
    }

    close(client);
    close(srv);
    unlink(SOCKET_PATH);
    return 0;
}

This is how D-Bus, systemd, and many daemons authenticate their clients without passwords.

Driver Prep: Unix domain sockets are used heavily in the Linux ecosystem. systemd socket activation passes pre-opened sockets to services. D-Bus uses Unix domain sockets for desktop IPC. Container runtimes pass file descriptors for namespace setup. Understanding sendmsg/recvmsg with ancillary data is essential for systems programming.

Rust: UnixStream and UnixListener

Rust's standard library includes Unix domain socket support:

// uds_server.rs
use std::os::unix::net::UnixListener;
use std::io::{Read, Write};

fn main() {
    let path = "/tmp/rust_uds.sock";
    let _ = std::fs::remove_file(path);

    let listener = UnixListener::bind(path).expect("bind failed");
    println!("Server listening on {}", path);

    let (mut stream, _addr) = listener.accept().expect("accept failed");

    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf).expect("read failed");
    let msg = std::str::from_utf8(&buf[..n]).unwrap();
    println!("Server received: {}", msg);

    stream.write_all(b"Hello from Rust server").expect("write failed");

    std::fs::remove_file(path).ok();
}
// uds_client.rs
use std::os::unix::net::UnixStream;
use std::io::{Read, Write};

fn main() {
    let path = "/tmp/rust_uds.sock";
    let mut stream = UnixStream::connect(path).expect("connect failed");

    stream.write_all(b"Hello from Rust client").expect("write failed");

    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf).expect("read failed");
    let msg = std::str::from_utf8(&buf[..n]).unwrap();
    println!("Client received: {}", msg);
}

Rust: Datagram Sockets

// uds_dgram.rs
use std::os::unix::net::UnixDatagram;

fn main() {
    let server_path = "/tmp/rust_dgram_srv.sock";
    let client_path = "/tmp/rust_dgram_cli.sock";

    let _ = std::fs::remove_file(server_path);
    let _ = std::fs::remove_file(client_path);

    let server = UnixDatagram::bind(server_path).unwrap();
    let client = UnixDatagram::bind(client_path).unwrap();

    client.send_to(b"Hello datagram", server_path).unwrap();

    let mut buf = [0u8; 256];
    let (n, addr) = server.recv_from(&mut buf).unwrap();
    println!("Server got: {}", std::str::from_utf8(&buf[..n]).unwrap());

    server.send_to(b"ACK", addr.as_pathname().unwrap()).unwrap();

    let n = client.recv(&mut buf).unwrap();
    println!("Client got: {}", std::str::from_utf8(&buf[..n]).unwrap());

    std::fs::remove_file(server_path).ok();
    std::fs::remove_file(client_path).ok();
}

For async Unix domain sockets, tokio provides tokio::net::UnixListener and tokio::net::UnixStream with the same API as the sync versions but using .await. See Ch40 for async patterns.

Rust Note: Rust's std::os::unix::net types do not support SCM_RIGHTS directly. For file descriptor passing in Rust, use the nix crate's sendmsg/recvmsg with ControlMessage::ScmRights, or the passfd crate.

Why Unix Domain Sockets Are the Best IPC

+---------------------------+------------------------------------+
| Feature                   | Unix Domain Sockets                |
+---------------------------+------------------------------------+
| Bidirectional             | Yes (SOCK_STREAM)                  |
| Message boundaries        | Yes (SOCK_DGRAM)                   |
| Unrelated processes       | Yes                                |
| File descriptor passing   | Yes (SCM_RIGHTS)                   |
| Credential checking       | Yes (SCM_CREDENTIALS)              |
| Familiar API              | Same as TCP/UDP sockets            |
| Performance               | Faster than TCP loopback           |
| Backpressure              | Yes (kernel buffer limits)         |
| Async-compatible          | Yes (epoll, tokio, etc.)           |
| Easy to upgrade to TCP    | Change AF_UNIX to AF_INET          |
+---------------------------+------------------------------------+

Knowledge Check

  1. What is the difference between a filesystem-path socket and an abstract socket?
  2. How does SCM_RIGHTS work at the kernel level?
  3. Why must a datagram client also bind to a path if it wants to receive replies?

Common Pitfalls

  • Forgetting to unlink the socket file -- the next bind will fail with EADDRINUSE. Always unlink before bind.
  • Using sizeof(addr) for abstract socket addresses -- abstract names can contain null bytes. Pass the exact computed length.
  • Not setting SO_PASSCRED before receiving credentials -- the kernel does not attach credential data by default.
  • Assuming SOCK_DGRAM is unreliable -- unlike UDP, Unix datagram sockets are reliable on the same machine. Messages are never dropped (but the sender blocks if the receiver's buffer is full).
  • Permission issues on the socket file -- the socket file inherits the umask. Use chmod or fchmod if other users need access.
  • Buffer overflow in sun_path -- the path field is only 108 bytes on Linux. Use abstract sockets for long names.

The Socket API

Networking on Linux starts with sockets. A socket is a file descriptor that represents one end of a network conversation. Every networked program you have ever used -- web browsers, SSH clients, game servers -- builds on the same handful of system calls: socket(), bind(), listen(), accept(), connect().

This chapter walks through each call, the address structures that feed them, and the DNS resolution machinery that maps hostnames to addresses.

The Two Workflows

Before any code, understand the two fundamental patterns.

  CLIENT                              SERVER
  ------                              ------
  socket()                            socket()
     |                                   |
  connect() -----> [network] ----->   bind()
     |                                   |
  write()/read()                      listen()
     |                                   |
  close()                             accept() ---> new fd
                                         |
                                      read()/write()
                                         |
                                      close()

The client creates a socket and immediately connects. The server creates a socket, binds it to an address, starts listening, and accepts incoming connections.

Address Structures

Every socket call that touches an address needs a struct sockaddr. In practice you never use the generic one directly. You fill in a protocol-specific structure and cast it.

  struct sockaddr (generic, 16 bytes)
  +--------+---------------------------+
  | family |       14 bytes of data    |
  +--------+---------------------------+

  struct sockaddr_in (IPv4)
  +--------+--------+------------------+
  | AF_INET| port   |  4-byte IPv4 addr|  + 8 bytes padding
  +--------+--------+------------------+

  struct sockaddr_in6 (IPv6)
  +--------+--------+------+-----------+----------+
  |AF_INET6| port   |flow  | 16-byte IPv6 addr    | + scope_id
  +--------+--------+------+-----------+----------+

Creating a Socket in C

/* create_socket.c -- create a TCP socket and print its fd */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }
    printf("TCP socket fd = %d\n", fd);

    int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (udp_fd < 0) {
        perror("socket");
        close(fd);
        return 1;
    }
    printf("UDP socket fd = %d\n", udp_fd);

    close(fd);
    close(udp_fd);
    return 0;
}

The three arguments: address family (AF_INET for IPv4, AF_INET6 for IPv6), socket type (SOCK_STREAM for TCP, SOCK_DGRAM for UDP), and protocol (0 lets the kernel pick the obvious one).

Caution: A socket fd is just a number. If you forget to close it, you leak a file descriptor. In a long-running server, this eventually hits the per-process fd limit and new connections silently fail.

Filling an Address: inet_pton

inet_pton converts a human-readable address string into binary form. inet_ntop goes the other direction.

/* addr_convert.c -- convert addresses between text and binary */
#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct sockaddr_in addr;
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);   /* host-to-network byte order */

    if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1) {
        fprintf(stderr, "bad address\n");
        return 1;
    }

    /* Convert back to string */
    char buf[INET_ADDRSTRLEN];
    const char *result = inet_ntop(AF_INET, &addr.sin_addr,
                                   buf, sizeof(buf));
    if (!result) {
        perror("inet_ntop");
        return 1;
    }
    printf("Address: %s  Port: %d\n", buf, ntohs(addr.sin_port));

    /* IPv6 example */
    struct sockaddr_in6 addr6;
    addr6.sin6_family = AF_INET6;
    addr6.sin6_port = htons(9090);
    inet_pton(AF_INET6, "::1", &addr6.sin6_addr);

    char buf6[INET6_ADDRSTRLEN];
    inet_ntop(AF_INET6, &addr6.sin6_addr, buf6, sizeof(buf6));
    printf("IPv6 Address: %s  Port: %d\n", buf6, ntohs(addr6.sin6_port));

    return 0;
}

htons and ntohs convert between host byte order and network byte order (big-endian). Every multi-byte field in a sockaddr must be in network byte order.

Caution: Forgetting htons() on the port is a classic bug. Store 8080 (0x1F90) unswapped on a little-endian machine and the network interprets the bytes as port 36895 (0x901F). Your server binds to the wrong port and you spend an hour debugging.

DNS Resolution: getaddrinfo

Hard-coding IP addresses is fragile. getaddrinfo resolves hostnames and service names, returning a linked list of address structures ready to pass to connect() or bind().

/* resolve.c -- resolve a hostname to IP addresses */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;    /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int status = getaddrinfo(argv[1], "http", &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return 1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        char ipstr[INET6_ADDRSTRLEN];
        void *addr;
        const char *ipver;

        if (p->ai_family == AF_INET) {
            struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
            addr = &ipv4->sin_addr;
            ipver = "IPv4";
        } else {
            struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
            addr = &ipv6->sin6_addr;
            ipver = "IPv6";
        }

        inet_ntop(p->ai_family, addr, ipstr, sizeof(ipstr));
        printf("  %s: %s\n", ipver, ipstr);
    }

    freeaddrinfo(res);
    return 0;
}

getaddrinfo is thread-safe, handles both IPv4 and IPv6, and replaces the older gethostbyname. Always use it.

Try It: Compile resolve.c and run it with localhost, then google.com, then a hostname that does not exist. Observe the error from gai_strerror.
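When the input is already a numeric address, the AI_NUMERICHOST flag tells getaddrinfo to parse only and never touch DNS -- useful when a server must not block on resolution. A small sketch:

```c
/* numeric_resolve.c -- getaddrinfo with AI_NUMERICHOST: no DNS lookup */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST | AI_NUMERICSERV;  /* parse only */

    int status = getaddrinfo("127.0.0.1", "8080", &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return 1;
    }

    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    char buf[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
    printf("%s:%d\n", buf, ntohs(sin->sin_port));
    freeaddrinfo(res);

    /* A hostname now fails fast instead of hitting DNS */
    status = getaddrinfo("example.com", "8080", &hints, &res);
    printf("hostname with AI_NUMERICHOST: %s\n",
           status == 0 ? "resolved" : gai_strerror(status));
    return 0;
}
```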

A Complete TCP Client in C

/* tcp_client.c -- connect to a server, send a message, read reply */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s host port\n", argv[0]);
        return 1;
    }

    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int status = getaddrinfo(argv[1], argv[2], &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return 1;
    }

    int sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (sockfd < 0) {
        perror("socket");
        freeaddrinfo(res);
        return 1;
    }

    if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect");
        close(sockfd);
        freeaddrinfo(res);
        return 1;
    }
    freeaddrinfo(res);

    const char *msg = "Hello, server!\n";
    write(sockfd, msg, strlen(msg));

    char buf[1024];
    ssize_t n = read(sockfd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("Server replied: %s", buf);
    }

    close(sockfd);
    return 0;
}

The flow: resolve address, create socket, connect, write, read, close. Every real-world client follows this skeleton.

A Complete TCP Server in C

/* tcp_server.c -- accept one connection, echo, exit */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); return 1; }

    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7878);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(listen_fd);
        return 1;
    }

    if (listen(listen_fd, 5) < 0) {
        perror("listen");
        close(listen_fd);
        return 1;
    }
    printf("Listening on port 7878...\n");

    struct sockaddr_in client_addr;
    socklen_t client_len = sizeof(client_addr);
    int conn_fd = accept(listen_fd, (struct sockaddr *)&client_addr,
                         &client_len);
    if (conn_fd < 0) {
        perror("accept");
        close(listen_fd);
        return 1;
    }

    char client_ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));
    printf("Connection from %s:%d\n", client_ip, ntohs(client_addr.sin_port));

    char buf[1024];
    ssize_t n = read(conn_fd, buf, sizeof(buf));
    if (n > 0) {
        write(conn_fd, buf, n);   /* echo back */
    }

    close(conn_fd);
    close(listen_fd);
    return 0;
}

SO_REUSEADDR lets you restart the server immediately after stopping it. Without it, the kernel holds the port in TIME_WAIT state for up to 60 seconds.

Try It: Run the server in one terminal and the client in another (./tcp_client 127.0.0.1 7878). Then modify the server to handle multiple connections in a loop instead of exiting after the first one.

TCP vs UDP

+-------------+--------------------------+------------------------+
| Feature     | TCP (SOCK_STREAM)        | UDP (SOCK_DGRAM)       |
+-------------+--------------------------+------------------------+
| Connection  | Yes (connect/accept)     | No (sendto/recvfrom)   |
| Reliability | Guaranteed delivery      | Best-effort            |
| Ordering    | Preserved                | Not guaranteed         |
| Framing     | Byte stream              | Message boundaries     |
| Overhead    | Higher (handshake, ACKs) | Lower                  |
| Typical use | HTTP, SSH, databases     | DNS, gaming, streaming |
+-------------+--------------------------+------------------------+

Rust: std::net

Rust's standard library wraps the socket API into safe, high-level types. No raw sockaddr structs, no casts, no byte-order functions to remember.

// tcp_client.rs -- connect, send, receive
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:7878")?;
    stream.write_all(b"Hello, server!\n")?;

    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    print!("Server replied: {}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}

One line to connect. One line to write. One line to read. The ? operator propagates errors without crashing.

// tcp_server.rs -- accept one connection, echo
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    println!("Listening on port 7878...");

    let (mut stream, addr) = listener.accept()?;
    println!("Connection from {}", addr);

    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    stream.write_all(&buf[..n])?;

    Ok(())
}

Rust Note: TcpListener::bind handles socket(), bind(), and listen() in a single call. The address is parsed from a string automatically. SO_REUSEADDR is set by default on most platforms.

Rust: UDP

// udp_example.rs -- send and receive a datagram
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;  // OS picks port
    socket.send_to(b"ping", "127.0.0.1:9000")?;

    let mut buf = [0u8; 1024];
    let (n, src) = socket.recv_from(&mut buf)?;
    println!("Got {} bytes from {}: {}", n, src,
             String::from_utf8_lossy(&buf[..n]));
    Ok(())
}

Rust: The nix Crate for Low-Level Control

When you need setsockopt, raw sockaddr manipulation, or socket options that std::net does not expose, use the nix crate.

// nix_socket.rs -- create a socket with nix for low-level control
// Cargo.toml: nix = { version = "0.29", features = ["net"] }
use nix::sys::socket::{
    socket, bind, listen, accept, AddressFamily, SockType, SockFlag,
    SockaddrIn,
};
use std::io::{Read, Write};
use std::os::fd::{AsRawFd, FromRawFd};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fd = socket(
        AddressFamily::Inet,
        SockType::Stream,
        SockFlag::empty(),
        None,
    )?;

    let addr = SockaddrIn::new(0, 0, 0, 0, 7879);  // 0.0.0.0:7879
    bind(fd.as_raw_fd(), &addr)?;
    listen(&fd, nix::sys::socket::Backlog::new(5)?)?;

    println!("nix: listening on port 7879");
    let conn_fd = accept(fd.as_raw_fd())?;
    println!("nix: accepted connection");

    // Wrap in std File for Read/Write traits
    let mut stream = unsafe { std::net::TcpStream::from_raw_fd(conn_fd) };
    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf)?;
    stream.write_all(&buf[..n])?;

    Ok(())
}

Driver Prep: In kernel modules, you will encounter struct socket and sock_create_kern(), which mirror the userspace socket API. Understanding the syscall interface here maps directly to the kernel's internal socket layer.

Data Flow Through the Stack

  Application:   write(fd, buf, len)
       |
  +---------+
  | TCP/UDP |   segmentation, checksums, sequence numbers
  +---------+
       |
  +---------+
  |   IP    |   routing, fragmentation, TTL
  +---------+
       |
  +---------+
  | Driver  |   DMA to NIC hardware
  +---------+
       |
     [wire]

Every write() to a socket sends data down this stack. Every read() pulls data up.

Knowledge Check

  1. What is the difference between SOCK_STREAM and SOCK_DGRAM? Which transport protocol does each imply?
  2. Why must you call htons() on the port number before storing it in sockaddr_in?
  3. What does getaddrinfo return, and why is it preferred over gethostbyname?

Common Pitfalls

  • Forgetting htons/htonl -- your address or port is silently wrong.
  • Not checking return values -- connect() can fail for dozens of reasons.
  • Not calling freeaddrinfo -- leaks the linked list returned by getaddrinfo.
  • Using INADDR_ANY without htonl -- works by accident on little-endian (0 is 0 in any byte order), but INADDR_LOOPBACK will not.
  • Assuming read() returns a complete message -- TCP is a byte stream. One write() can arrive as multiple read() calls.
  • Binding to a specific address when you want all interfaces -- use INADDR_ANY (0.0.0.0) to accept connections on every interface.

TCP Client-Server Programming

The previous chapter showed the individual socket calls. Now we wire them into real programs: an echo server that handles multiple clients, a matching client, graceful shutdown, and protocol framing. By the end, you will have a working chat server.

The Echo Server in C

This server accepts connections in a loop, forks a child process for each client, and echoes back everything it receives.

/* echo_server.c -- fork-per-connection echo server */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>

static volatile sig_atomic_t running = 1;

static void handle_sigterm(int sig)
{
    (void)sig;
    running = 0;
}

static void reap_children(int sig)
{
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

static void handle_client(int fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        ssize_t written = 0;
        while (written < n) {
            ssize_t w = write(fd, buf + written, n - written);
            if (w <= 0) return;
            written += w;
        }
    }
    close(fd);
}

int main(void)
{
    struct sigaction sa_term;
    memset(&sa_term, 0, sizeof(sa_term));
    sa_term.sa_handler = handle_sigterm;
    sigaction(SIGTERM, &sa_term, NULL);
    sigaction(SIGINT, &sa_term, NULL);

    struct sigaction sa_chld;
    memset(&sa_chld, 0, sizeof(sa_chld));
    sa_chld.sa_handler = reap_children;
    sa_chld.sa_flags   = SA_RESTART;
    sigaction(SIGCHLD, &sa_chld, NULL);

    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); return 1; }

    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7878);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }
    if (listen(listen_fd, 128) < 0) {
        perror("listen"); return 1;
    }
    printf("Echo server listening on port 7878\n");

    while (running) {
        struct sockaddr_in client;
        socklen_t clen = sizeof(client);
        int conn_fd = accept(listen_fd, (struct sockaddr *)&client, &clen);
        if (conn_fd < 0) {
            if (errno == EINTR) continue;
            perror("accept");
            break;
        }

        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
        printf("New connection from %s:%d\n", ip, ntohs(client.sin_port));

        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            close(conn_fd);
        } else if (pid == 0) {
            /* Child: handle client */
            close(listen_fd);
            handle_client(conn_fd);
            _exit(0);
        } else {
            /* Parent: close the connected fd, keep listening */
            close(conn_fd);
        }
    }

    printf("\nShutting down...\n");
    close(listen_fd);
    return 0;
}

The SIGCHLD handler reaps zombie children. accept() fails with errno == EINTR when a signal interrupts it; the loop retries. The child closes the listening socket; the parent closes the connected socket.

Caution: fork() duplicates the entire process. A thousand simultaneous connections means a thousand processes. We will fix this shortly.

The Matching Client

/* echo_client.c -- send lines from stdin, print echoed replies */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>

int main(int argc, char *argv[])
{
    const char *host = argc > 1 ? argv[1] : "127.0.0.1";
    const char *port = argc > 2 ? argv[2] : "7878";

    struct addrinfo hints = {0}, *res;
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int s = getaddrinfo(host, port, &hints, &res);
    if (s != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        return 1;
    }

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) { perror("socket"); return 1; }

    if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect"); return 1;
    }
    freeaddrinfo(res);

    printf("Connected. Type lines to echo (Ctrl-D to quit):\n");

    char line[1024];
    while (fgets(line, sizeof(line), stdin)) {
        write(fd, line, strlen(line));

        char buf[1024];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0) break;
        buf[n] = '\0';
        printf("echo: %s", buf);
    }

    close(fd);
    return 0;
}

Try It: Start the server, then open three separate terminals each running the client. Verify that all three sessions echo independently.

Thread-Per-Connection Alternative

Threads share the same address space, so they are cheaper than processes. Replace the fork() block with pthread_create.

/* echo_server_threaded.c -- thread-per-connection echo server */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void *client_thread(void *arg)
{
    int fd = *(int *)arg;
    free(arg);

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        ssize_t w = 0;
        while (w < n) {
            ssize_t ret = write(fd, buf + w, n - w);
            if (ret <= 0) goto done;
            w += ret;
        }
    }
done:
    close(fd);
    return NULL;
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); return 1; }

    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7878);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }
    listen(listen_fd, 128);
    printf("Threaded echo server on port 7878\n");

    for (;;) {
        struct sockaddr_in client;
        socklen_t clen = sizeof(client);
        int conn_fd = accept(listen_fd, (struct sockaddr *)&client, &clen);
        if (conn_fd < 0) { perror("accept"); continue; }

        int *fdp = malloc(sizeof(int));
        *fdp = conn_fd;

        pthread_t tid;
        if (pthread_create(&tid, NULL, client_thread, fdp) != 0) {
            perror("pthread_create");
            close(conn_fd);
            free(fdp);
        } else {
            pthread_detach(tid);
        }
    }

    close(listen_fd);
    return 0;
}

Compile with gcc -pthread echo_server_threaded.c -o echo_server_threaded.

Caution: We heap-allocate the fd so each thread gets its own copy. Passing &conn_fd directly is a race: the main loop may overwrite conn_fd before the thread reads it.

Protocol Framing

TCP is a byte stream. If the client sends two messages quickly, the server might receive them glued together in one read() call. You need a protocol to know where one message ends and the next begins.

Two common approaches: (1) length-prefix -- send 4 bytes of length then the payload; (2) delimiter -- terminate each message with \n. Length-prefix is more robust.

Length-Prefix Framing in C

/* framed_send.c -- send a length-prefixed message */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Write exactly n bytes */
static int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0) return -1;
        p += w;
        n -= w;
    }
    return 0;
}

/* Read exactly n bytes */
static int read_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return -1;
        p += r;
        n -= r;
    }
    return 0;
}

int send_message(int fd, const char *msg, size_t len)
{
    uint32_t net_len = htonl((uint32_t)len);
    if (write_all(fd, &net_len, 4) < 0) return -1;
    if (write_all(fd, msg, len) < 0)     return -1;
    return 0;
}

int recv_message(int fd, char *buf, size_t bufsize, size_t *out_len)
{
    uint32_t net_len;
    if (read_all(fd, &net_len, 4) < 0) return -1;
    uint32_t len = ntohl(net_len);
    if (len > bufsize) return -1;   /* message too large */
    if (read_all(fd, buf, len) < 0) return -1;
    *out_len = len;
    return 0;
}

Caution: Always validate the length prefix. A malicious client could send 0xFFFFFFFF and trick you into allocating 4 GB of memory. Set a maximum message size.

Try It: Write a small main() that connects to the echo server, sends a framed message, then reads the framed response. Verify that even rapid sends are correctly separated.

Rust TCP Server

// echo_server.rs -- multi-threaded echo server in Rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle_client(mut stream: TcpStream) {
    let peer = stream.peer_addr().unwrap();
    println!("Connection from {}", peer);

    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => {
                if stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
            Err(_) => break,
        }
    }
    println!("{} disconnected", peer);
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    println!("Echo server listening on port 7878");

    for stream in listener.incoming() {
        match stream {
            Ok(s) => {
                thread::spawn(move || handle_client(s));
            }
            Err(e) => eprintln!("accept error: {}", e),
        }
    }
    Ok(())
}

Rust Note: TcpStream is Send, so moving it into a thread::spawn closure is safe. The compiler ensures no two threads can access the same stream. No malloc for the fd pointer, no detach -- ownership transfer handles everything.

Rust: Length-Prefix Framing

// framed.rs -- length-prefix framing over TCP
use std::io::{self, Read, Write};
use std::net::TcpStream;

fn send_message(stream: &mut TcpStream, msg: &[u8]) -> io::Result<()> {
    let len = (msg.len() as u32).to_be_bytes();
    stream.write_all(&len)?;
    stream.write_all(msg)?;
    Ok(())
}

fn recv_message(stream: &mut TcpStream) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;

    if len > 1_000_000 {
        return Err(io::Error::new(io::ErrorKind::InvalidData,
                                  "message too large"));
    }

    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:7878")?;
    send_message(&mut stream, b"Hello, framed world!")?;

    let reply = recv_message(&mut stream)?;
    println!("Got: {}", String::from_utf8_lossy(&reply));
    Ok(())
}

read_exact loops internally until exactly N bytes are read. This eliminates the manual read_all loop from C.

A Complete Chat Server in C

This is the culmination: a multi-client chat server where messages from one client are broadcast to all others.

/* chat_server.c -- simple broadcast chat (thread-per-connection) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_CLIENTS 64
#define BUF_SIZE    1024

static pthread_mutex_t clients_lock = PTHREAD_MUTEX_INITIALIZER;
static int client_fds[MAX_CLIENTS];
static int client_count = 0;

static void add_client(int fd)
{
    pthread_mutex_lock(&clients_lock);
    if (client_count < MAX_CLIENTS) {
        client_fds[client_count++] = fd;
    }
    pthread_mutex_unlock(&clients_lock);
}

static void remove_client(int fd)
{
    pthread_mutex_lock(&clients_lock);
    for (int i = 0; i < client_count; i++) {
        if (client_fds[i] == fd) {
            client_fds[i] = client_fds[--client_count];
            break;
        }
    }
    pthread_mutex_unlock(&clients_lock);
}

static void broadcast(int sender_fd, const char *msg, size_t len)
{
    pthread_mutex_lock(&clients_lock);
    for (int i = 0; i < client_count; i++) {
        if (client_fds[i] != sender_fd) {
            write(client_fds[i], msg, len);
        }
    }
    pthread_mutex_unlock(&clients_lock);
}

static void *client_thread(void *arg)
{
    int fd = *(int *)arg;
    free(arg);
    add_client(fd);

    char buf[BUF_SIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        broadcast(fd, buf, n);
    }

    remove_client(fd);
    close(fd);
    return NULL;
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);

    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);
    printf("Chat server on port 9000 (use: nc 127.0.0.1 9000)\n");

    for (;;) {
        struct sockaddr_in cl;
        socklen_t len = sizeof(cl);
        int conn = accept(listen_fd, (struct sockaddr *)&cl, &len);
        if (conn < 0) continue;

        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &cl.sin_addr, ip, sizeof(ip));
        printf("%s:%d joined\n", ip, ntohs(cl.sin_port));

        int *fdp = malloc(sizeof(int));
        *fdp = conn;
        pthread_t tid;
        pthread_create(&tid, NULL, client_thread, fdp);
        pthread_detach(tid);
    }
}

Test with multiple nc 127.0.0.1 9000 sessions. Type in one and watch it appear in the others.

Rust Chat Server

// chat_server.rs -- broadcast chat server
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream, SocketAddr};
use std::sync::{Arc, Mutex};
use std::thread;

type ClientList = Arc<Mutex<Vec<(SocketAddr, TcpStream)>>>;

fn handle_client(stream: TcpStream, clients: ClientList) {
    let peer = stream.peer_addr().unwrap();
    println!("{} joined", peer);

    { clients.lock().unwrap().push((peer, stream.try_clone().unwrap())); }

    let reader = BufReader::new(stream);
    for line in reader.lines().flatten() {
        let full = format!("{}: {}\n", peer, line);
        let list = clients.lock().unwrap();
        for (_, s) in list.iter().filter(|(a, _)| *a != peer) {
            // Write through &TcpStream (std implements Write for &TcpStream)
            let _ = (&*s).write_all(full.as_bytes());
        }
    }

    { clients.lock().unwrap().retain(|(a, _)| *a != peer); }
    println!("{} left", peer);
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9000")?;
    let clients: ClientList = Arc::new(Mutex::new(Vec::new()));
    println!("Chat server on port 9000");

    for stream in listener.incoming() {
        let stream = stream?;
        let clients = Arc::clone(&clients);
        thread::spawn(move || handle_client(stream, clients));
    }
    Ok(())
}

Rust Note: Arc<Mutex<Vec<(SocketAddr, TcpStream)>>> is the standard pattern for shared mutable state across threads. The compiler refuses to compile the program if you try to share without proper synchronization. No equivalent compile-time guarantee exists in C.

Driver Prep: Kernel network drivers process packets without the luxury of threads-per-connection. The patterns in chapters 48 and 49 (poll, epoll) are what drivers and high-performance servers use instead.

Graceful Shutdown Pattern

The volatile sig_atomic_t running flag (shown in the fork server above) is the standard approach. The signal handler sets it to 0; the main loop checks it. Close the listening socket to unblock accept(), then wait for in-flight clients to finish before exiting.
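
A minimal sketch of that pattern, assuming a global listening-socket fd (the names g_running, g_listen_fd, and on_shutdown are ours, not from the earlier listing):

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t g_running   = 1;
static int                   g_listen_fd = -1;

/* Signal handler: set the flag and close the listening socket.
 * close() is async-signal-safe; with no SA_RESTART, a blocked
 * accept() also returns early with errno == EINTR. */
static void on_shutdown(int sig)
{
    (void)sig;
    g_running = 0;
    if (g_listen_fd >= 0)
        close(g_listen_fd);
}

/* In main():
 *   struct sigaction sa = {0};
 *   sa.sa_handler = on_shutdown;
 *   sigaction(SIGINT, &sa, NULL);
 *
 *   while (g_running) {
 *       int conn = accept(g_listen_fd, NULL, NULL);
 *       if (conn < 0) continue;      // EINTR or closed socket
 *       ...
 *   }
 *   // after the loop: wait for in-flight clients, then exit
 */
```

The `while (g_running)` condition is what actually ends the loop; the `continue` after a failed accept() simply routes control back to that check.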

Knowledge Check

  1. Why does the fork-based server close conn_fd in the parent and listen_fd in the child?
  2. What happens if you omit the mutex around the client list in the chat server?
  3. How does length-prefix framing solve the TCP message-boundary problem?

Common Pitfalls

  • Not handling partial writes -- write() can return fewer bytes than requested. Always loop.
  • Not handling partial reads -- same issue on the receive side. read() returns whatever is available, not a complete message.
  • Zombie processes -- forgetting SIGCHLD handler with fork-per-connection fills the process table.
  • Thread stack overflow -- each thread allocates a stack (typically 2-8 MB). Thousands of threads consume gigabytes of memory.
  • Broadcasting while holding the lock too long -- a slow client's write() can block, stalling all other broadcasts. Consider non-blocking I/O or per-client queues.
  • Forgetting SO_REUSEADDR -- restarting the server gives "Address already in use" for up to 60 seconds.
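
The first two pitfalls have a standard fix: a helper that loops until the whole buffer is transferred. A sketch for the write side (write_all is our name for it):

```c
#include <errno.h>
#include <unistd.h>

/* write_all -- keep calling write() until all len bytes are sent.
 * Returns 0 on success, -1 on error (errno set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In the chat server, broadcast() could call write_all(client_fds[i], msg, len) instead of the bare write().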

UDP and Datagram Sockets

UDP is the other transport protocol on IP. It provides no connections, no guaranteed delivery, no ordering. You send a datagram, and it either arrives whole or not at all. This simplicity makes UDP fast and the right choice for DNS lookups, live video, gaming, and service discovery.

This chapter covers sendto/recvfrom, multicast, broadcast, and builds a practical service discovery protocol.

UDP vs TCP Recap

  TCP (SOCK_STREAM)                  UDP (SOCK_DGRAM)
  +----------------------------------+----------------------------------+
  | 3-way handshake                  | No handshake                     |
  | Guaranteed delivery (retransmit) | Fire and forget                  |
  | Ordered byte stream              | Independent datagrams            |
  | Flow control, congestion control | None built-in                    |
  | 20-byte header (min) per segment | 8-byte header                    |
  +----------------------------------+----------------------------------+

A UDP Echo Server in C

/* udp_echo_server.c -- receive datagrams, echo them back */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5000);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }
    printf("UDP echo server on port 5000\n");

    for (;;) {
        char buf[65535];
        struct sockaddr_in client;
        socklen_t clen = sizeof(client);

        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&client, &clen);
        if (n < 0) { perror("recvfrom"); continue; }

        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
        printf("From %s:%d (%zd bytes)\n", ip, ntohs(client.sin_port), n);

        /* Echo back to sender */
        sendto(fd, buf, n, 0, (struct sockaddr *)&client, clen);
    }

    close(fd);
    return 0;
}

Notice: no listen(), no accept(). A single socket handles all clients. recvfrom tells you who sent the datagram; sendto sends a reply directly to that address.

A UDP Client in C

/* udp_client.c -- send a message, wait for reply */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in server = {0};
    server.sin_family = AF_INET;
    server.sin_port   = htons(5000);
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);

    const char *msg = "Hello, UDP!";
    sendto(fd, msg, strlen(msg), 0,
           (struct sockaddr *)&server, sizeof(server));

    char buf[1024];
    struct sockaddr_in from;
    socklen_t flen = sizeof(from);
    ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                         (struct sockaddr *)&from, &flen);
    if (n > 0) {
        buf[n] = '\0';
        printf("Reply: %s\n", buf);
    }

    close(fd);
    return 0;
}

Caution: recvfrom blocks forever if no reply comes. In production, set a timeout with setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, ...) or use poll() before reading.

Try It: Start the UDP server, run the client, then kill the server and run the client again. Observe that sendto succeeds even though nobody is listening -- UDP does not detect that the remote end is unreachable (unless the network returns an ICMP port-unreachable, which may or may not arrive).

When UDP Is Appropriate

  • DNS -- a single question-answer exchange. Retransmit if no reply in 2 seconds.
  • Gaming -- player position updates arrive 60 times per second. A lost packet is stale by the time it would be retransmitted.
  • Live video/audio -- a dropped frame is better than a delayed frame.
  • Service discovery -- "who's on the network?" is a broadcast/multicast question, and TCP cannot broadcast.
  • IoT sensors -- tiny devices with limited memory cannot afford TCP's state machine.

Handling Packet Loss at the Application Layer

UDP gives you no retransmission. If reliability matters, build it yourself.

  Sender                          Receiver
    |                                |
    |--- [seq=1] data ------------->|
    |                                |--- [ack=1] ---------->|
    |--- [seq=2] data ------------->|
    |          (lost)                |
    |--- (timeout, resend seq=2) -->|
    |                                |--- [ack=2] ---------->|

The minimum reliable protocol over UDP:

  1. Attach a sequence number to each datagram.
  2. The receiver acknowledges each sequence number.
  3. The sender retransmits if no acknowledgment arrives within a timeout.
  4. The receiver discards duplicates.

/* reliable_header.h -- minimal reliability over UDP */
#ifndef RELIABLE_HEADER_H
#define RELIABLE_HEADER_H

#include <stdint.h>

struct reliable_hdr {
    uint32_t seq;       /* sequence number (network byte order) */
    uint32_t ack;       /* acknowledgment number */
    uint16_t flags;     /* 0x01 = DATA, 0x02 = ACK */
    uint16_t len;       /* payload length */
};

#define FLAG_DATA 0x01
#define FLAG_ACK  0x02

#endif
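
A sender sketch built on this header: a stop-and-wait loop with a bounded retry count. The function name send_reliable and the retry limit are illustrative choices, and the sketch assumes SO_RCVTIMEO is already set on the socket so recvfrom() returns on timeout:

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Same layout as reliable_header.h above */
struct reliable_hdr {
    uint32_t seq;
    uint32_t ack;
    uint16_t flags;
    uint16_t len;
};
#define FLAG_DATA 0x01
#define FLAG_ACK  0x02

/* send_reliable -- send one datagram, wait for its ACK,
 * retransmit on timeout, give up after 5 attempts. */
int send_reliable(int sock, uint32_t seq,
                  const void *payload, uint16_t len,
                  const struct sockaddr *dst, socklen_t dlen)
{
    unsigned char pkt[sizeof(struct reliable_hdr) + 512];
    if (len > 512)
        return -1;

    struct reliable_hdr hdr = {
        .seq   = htonl(seq),
        .ack   = 0,
        .flags = htons(FLAG_DATA),
        .len   = htons(len),
    };
    memcpy(pkt, &hdr, sizeof(hdr));
    memcpy(pkt + sizeof(hdr), payload, len);

    for (int attempt = 0; attempt < 5; attempt++) {
        sendto(sock, pkt, sizeof(hdr) + len, 0, dst, dlen);

        struct reliable_hdr reply;
        ssize_t n = recvfrom(sock, &reply, sizeof(reply), 0, NULL, NULL);
        if (n == (ssize_t)sizeof(reply) &&
            (ntohs(reply.flags) & FLAG_ACK) &&
            ntohl(reply.ack) == seq)
            return 0;                /* acknowledged */
        /* timeout, short read, or wrong ACK: retransmit */
    }
    return -1;                       /* gave up */
}
```

The receiver side mirrors this: on each DATA packet it sends back a header with FLAG_ACK and ack set to the received seq, and drops any seq it has already processed.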

Driver Prep: Many industrial and automotive protocols (CAN bus, some PROFINET variants) run on UDP or raw frames and implement their own reliability layer. This pattern shows up everywhere below TCP.

Broadcast

Broadcast sends a datagram to every host on the local subnet. The destination address is 255.255.255.255 (limited broadcast) or the subnet broadcast address (e.g., 192.168.1.255 for a /24 network).
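
The directed broadcast address is simply the interface address with all host bits set: ip | ~netmask. A sketch (broadcast_addr is our name):

```c
#include <stdint.h>
#include <arpa/inet.h>

/* broadcast_addr -- directed broadcast address for an address/netmask
 * pair, both in network byte order (bitwise ops work per byte, so
 * byte order does not matter here). */
uint32_t broadcast_addr(uint32_t ip, uint32_t mask)
{
    /* Set every host bit: 192.168.1.42 / 255.255.255.0 -> 192.168.1.255 */
    return ip | ~mask;
}
```

For 192.168.1.42 with netmask 255.255.255.0 this yields 192.168.1.255, which a sender can target instead of the limited broadcast address 255.255.255.255.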

/* broadcast_sender.c -- send a broadcast message */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Must enable broadcast on the socket */
    int broadcast = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST,
                   &broadcast, sizeof(broadcast)) < 0) {
        perror("setsockopt"); return 1;
    }

    struct sockaddr_in dest = {0};
    dest.sin_family      = AF_INET;
    dest.sin_port        = htons(5001);
    inet_pton(AF_INET, "255.255.255.255", &dest.sin_addr);

    const char *msg = "DISCOVER";
    sendto(fd, msg, strlen(msg), 0,
           (struct sockaddr *)&dest, sizeof(dest));
    printf("Broadcast sent\n");

    close(fd);
    return 0;
}

Caution: Broadcasting generates traffic that every host on the subnet must process. Do it sparingly. On large networks, prefer multicast.

Multicast

Multicast sends datagrams to a group address (224.0.0.0 - 239.255.255.255). Only hosts that join the group receive the traffic. The network infrastructure (IGMP) handles group membership.

/* mcast_receiver.c -- join a multicast group and print messages */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Allow multiple receivers on same port */
    int reuse = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5002);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    /* Join multicast group 239.1.1.1 */
    struct ip_mreq mreq;
    inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);

    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0) {
        perror("setsockopt IP_ADD_MEMBERSHIP"); return 1;
    }
    printf("Joined multicast group 239.1.1.1, listening on port 5002\n");

    for (;;) {
        char buf[1024];
        ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        buf[n] = '\0';
        printf("Received: %s\n", buf);
    }

    close(fd);
    return 0;
}

/* mcast_sender.c -- send to a multicast group */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Set TTL for multicast (1 = local subnet only) */
    unsigned char ttl = 1;
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    struct sockaddr_in dest = {0};
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(5002);
    inet_pton(AF_INET, "239.1.1.1", &dest.sin_addr);

    const char *msg = "Hello, multicast group!";
    sendto(fd, msg, strlen(msg), 0,
           (struct sockaddr *)&dest, sizeof(dest));
    printf("Sent to multicast group 239.1.1.1\n");

    close(fd);
    return 0;
}

Try It: Start two or more mcast_receiver processes, then run mcast_sender. All receivers should print the message. Then stop one receiver and verify the others still work.

A Simple Discovery Protocol

Combine broadcast and timed responses to build a LAN service discovery mechanism.

/* discover_server.c -- respond to discovery broadcasts */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int reuse = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5003);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    printf("Discovery responder on port 5003\n");

    for (;;) {
        char buf[256];
        struct sockaddr_in client;
        socklen_t clen = sizeof(client);

        ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                             (struct sockaddr *)&client, &clen);
        if (n < 0) continue;
        buf[n] = '\0';

        if (strcmp(buf, "DISCOVER") == 0) {
            const char *reply = "SERVICE:echo:7878";
            sendto(fd, reply, strlen(reply), 0,
                   (struct sockaddr *)&client, clen);
        }
    }
}

/* discover_client.c -- broadcast DISCOVER, collect responses */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int broadcast = 1;
    setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof(broadcast));

    /* Set 2-second receive timeout */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    struct sockaddr_in dest = {0};
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(5003);
    inet_pton(AF_INET, "255.255.255.255", &dest.sin_addr);

    sendto(fd, "DISCOVER", 8, 0,
           (struct sockaddr *)&dest, sizeof(dest));
    printf("Sent DISCOVER broadcast, waiting for replies...\n");

    for (;;) {
        char buf[256];
        struct sockaddr_in from;
        socklen_t flen = sizeof(from);
        ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                             (struct sockaddr *)&from, &flen);
        if (n < 0) break;    /* timeout or error */
        buf[n] = '\0';

        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &from.sin_addr, ip, sizeof(ip));
        printf("Found: %s at %s\n", buf, ip);
    }

    printf("Discovery complete.\n");
    close(fd);
    return 0;
}

  Discovery flow:

  Client                          Network                     Server(s)
    |                                |                           |
    |-- DISCOVER (broadcast) ------->| ------->                  |
    |                                |         [server receives] |
    |<------- SERVICE:echo:7878 -----|<------                    |
    |                                |                           |
    |  (timeout: 2 seconds)          |                           |
    |  [done]                        |                           |

Rust: UdpSocket

// udp_echo_server.rs -- UDP echo server
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:5000")?;
    println!("UDP echo server on port 5000");

    let mut buf = [0u8; 65535];
    loop {
        let (n, src) = socket.recv_from(&mut buf)?;
        println!("From {} ({} bytes)", src, n);
        socket.send_to(&buf[..n], src)?;
    }
}

// udp_client.rs -- send a datagram, receive reply
use std::net::UdpSocket;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;  // OS picks port
    socket.set_read_timeout(Some(Duration::from_secs(3)))?;

    socket.send_to(b"Hello, UDP!", "127.0.0.1:5000")?;

    let mut buf = [0u8; 1024];
    match socket.recv_from(&mut buf) {
        Ok((n, src)) => {
            println!("Reply from {}: {}",
                     src, String::from_utf8_lossy(&buf[..n]));
        }
        Err(e) => eprintln!("No reply: {}", e),
    }
    Ok(())
}

Rust Note: UdpSocket::bind("0.0.0.0:0") binds to a random available port. The address string is parsed via the ToSocketAddrs trait, which also handles DNS resolution. The set_read_timeout method replaces the C setsockopt dance.

Rust: Multicast

// mcast_receiver.rs -- join multicast group and receive
use std::net::{UdpSocket, Ipv4Addr};

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:5002")?;

    let multiaddr: Ipv4Addr = "239.1.1.1".parse().unwrap();
    let interface = Ipv4Addr::UNSPECIFIED;
    socket.join_multicast_v4(&multiaddr, &interface)?;

    println!("Joined multicast group 239.1.1.1 on port 5002");

    let mut buf = [0u8; 1024];
    loop {
        let (n, src) = socket.recv_from(&mut buf)?;
        println!("From {}: {}", src,
                 String::from_utf8_lossy(&buf[..n]));
    }
}

// mcast_sender.rs -- send to multicast group
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_multicast_ttl_v4(1)?;

    socket.send_to(b"Hello, multicast group!", "239.1.1.1:5002")?;
    println!("Sent to multicast group");
    Ok(())
}

Rust: Discovery Protocol

// discover_client.rs -- broadcast discovery and collect replies
use std::net::UdpSocket;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.set_read_timeout(Some(Duration::from_secs(2)))?;

    socket.send_to(b"DISCOVER", "255.255.255.255:5003")?;
    println!("Sent DISCOVER, waiting for replies...");

    let mut buf = [0u8; 256];
    loop {
        match socket.recv_from(&mut buf) {
            Ok((n, src)) => {
                let msg = String::from_utf8_lossy(&buf[..n]);
                println!("Found: {} at {}", msg, src);
            }
            Err(_) => break,
        }
    }
    println!("Discovery complete.");
    Ok(())
}

Maximum Datagram Size

  +------------------ Ethernet MTU: 1500 bytes -------------------+
  | IP header (20 B) | UDP header (8 B) | payload (up to 1472 B)  |
  +------------------+------------------+-------------------------+

  Larger datagrams are fragmented by IP.
  Any lost fragment = entire datagram lost.
  Safe payload size for LAN: 1472 bytes
  Safe payload size for internet: ~512 bytes (conservative)
  Maximum theoretical UDP payload: 65,507 bytes

Caution: Sending 64 KB datagrams over the internet is asking for trouble. IP fragmentation dramatically increases the chance of packet loss because losing any single fragment kills the entire datagram. Stay under the path MTU.

Knowledge Check

  1. Why does a UDP server not need listen() or accept()?
  2. What socket option must be enabled before calling sendto with a broadcast address?
  3. How does multicast differ from broadcast in terms of network traffic?

Common Pitfalls

  • Assuming delivery -- UDP does not guarantee anything. Always plan for lost packets.
  • Assuming ordering -- datagrams can arrive out of order, especially across the internet.
  • Forgetting SO_BROADCAST -- sendto with a broadcast address fails with EACCES without it.
  • Large datagrams -- IP fragmentation silently destroys reliability. Keep payloads small.
  • No timeout on recvfrom -- blocks forever if no packet arrives. Always set SO_RCVTIMEO or use poll().
  • Multicast on loopback only -- by default, multicast may not leave the loopback interface. Check your routing table if receivers on other hosts do not get packets.

Multiplexing with select and poll

Blocking I/O is simple: call read(), wait for data, process it. But a server with 100 clients cannot call read() on all 100 sockets at the same time. It blocks on the first one and ignores the other 99. Fork-per-connection and thread-per-connection solve this, but they are expensive. I/O multiplexing lets a single thread monitor many file descriptors and act only on the ones that are ready.

This chapter covers select() and poll(), their APIs, and their limitations.

The Problem

  Thread blocked on fd 3:       fds 4, 5, 6 have data waiting
  +---+                         +---+---+---+
  | 3 | <-- read() blocks       | 4 | 5 | 6 |  data piling up
  +---+                         +---+---+---+

  With multiplexing:
  +---+---+---+---+
  | 3 | 4 | 5 | 6 |  <-- "which of these are ready?"
  +---+---+---+---+
       |
       v
  "fd 4 and fd 6 are ready to read"
       |
       v
  read(4, ...)    read(6, ...)   <-- no blocking

select() in C

select() watches three sets of file descriptors: readable, writable, and exceptional. It blocks until at least one fd is ready or a timeout expires.

/* select_server.c -- single-threaded multi-client echo with select() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7878);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);
    printf("select server on port 7878 (max %d fds)\n", FD_SETSIZE);

    fd_set master_set;
    FD_ZERO(&master_set);
    FD_SET(listen_fd, &master_set);
    int max_fd = listen_fd;

    for (;;) {
        fd_set read_set = master_set;   /* select modifies the set */

        int ready = select(max_fd + 1, &read_set, NULL, NULL, NULL);
        if (ready < 0) { perror("select"); break; }

        for (int fd = 0; fd <= max_fd; fd++) {
            if (!FD_ISSET(fd, &read_set))
                continue;

            if (fd == listen_fd) {
                /* New connection */
                struct sockaddr_in client;
                socklen_t clen = sizeof(client);
                int conn = accept(listen_fd,
                                  (struct sockaddr *)&client, &clen);
                if (conn < 0) { perror("accept"); continue; }
                if (conn >= FD_SETSIZE) {
                    fprintf(stderr, "fd %d exceeds FD_SETSIZE\n", conn);
                    close(conn);
                    continue;
                }

                FD_SET(conn, &master_set);
                if (conn > max_fd) max_fd = conn;

                char ip[INET_ADDRSTRLEN];
                inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
                printf("+ %s:%d (fd %d)\n",
                       ip, ntohs(client.sin_port), conn);
            } else {
                /* Data from existing client */
                char buf[1024];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n <= 0) {
                    printf("- fd %d disconnected\n", fd);
                    close(fd);
                    FD_CLR(fd, &master_set);
                } else {
                    write(fd, buf, n);
                }
            }
        }
    }

    close(listen_fd);
    return 0;
}

The fd_set API

  Macro/Function              Purpose
  --------------------------  -------------------------------------
  FD_ZERO(&set)               Clear all bits
  FD_SET(fd, &set)            Add fd to the set
  FD_CLR(fd, &set)            Remove fd from the set
  FD_ISSET(fd, &set)          Test whether fd is in the set
  select(nfds, r, w, e, t)    Block until an fd is ready or timeout

The first argument to select() is the highest fd number plus one. The kernel scans from 0 to nfds-1.

Caution: FD_SETSIZE is typically 1024 on Linux. If your server opens fd 1024 or higher, FD_SET writes out of bounds, corrupting memory silently. This is undefined behavior, not a clean error. For servers that may handle more than ~1000 connections, use poll() or epoll instead.

Try It: Connect 5 clients to the select server using nc 127.0.0.1 7878. Type in different terminals and verify they all echo independently with no threads.

select() with Timeout

/* select_timeout.c -- wait for stdin with a 3-second timeout */
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    printf("Type something within 3 seconds...\n");

    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);

    struct timeval tv;
    tv.tv_sec  = 3;
    tv.tv_usec = 0;

    int ret = select(STDIN_FILENO + 1, &fds, NULL, NULL, &tv);
    if (ret > 0 && FD_ISSET(STDIN_FILENO, &fds)) {
        char buf[256];
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
        buf[n] = '\0';
        printf("You typed: %s", buf);
    } else if (ret == 0) {
        printf("Timeout!\n");
    } else {
        perror("select");
    }
    return 0;
}

Caution: On Linux, select() modifies the timeval struct to reflect remaining time. Do not reuse it across calls without re-initializing. This behavior is Linux-specific and not portable.
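
One way to stay safe is to re-create the timeval on every call. A small wrapper sketch (wait_readable is our name, not a standard function):

```c
#include <sys/select.h>
#include <unistd.h>

/* wait_readable -- select() on a single fd with a FRESH timeout
 * each call, so Linux's in-place modification of the struct cannot
 * leak into the next iteration.
 * Returns >0 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long usec)
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);

    struct timeval tv = { .tv_sec = 0, .tv_usec = usec };  /* fresh */
    return select(fd + 1, &fds, NULL, NULL, &tv);
}
```

Called in a loop, each invocation gets the full timeout regardless of what the previous select() did to its timeval.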

poll() in C

poll() fixes the fd limit problem. Instead of a fixed-size bitmask, it takes an array of struct pollfd. You can monitor as many fds as the system allows.

/* poll_server.c -- single-threaded multi-client echo with poll() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_FDS 4096

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7879);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);
    printf("poll server on port 7879\n");

    struct pollfd fds[MAX_FDS];
    int nfds = 0;

    /* First entry: the listening socket */
    fds[0].fd     = listen_fd;
    fds[0].events = POLLIN;
    nfds = 1;

    for (;;) {
        int ready = poll(fds, nfds, -1);  /* -1 = block forever */
        if (ready < 0) { perror("poll"); break; }

        /* Check listening socket first */
        if (fds[0].revents & POLLIN) {
            struct sockaddr_in client;
            socklen_t clen = sizeof(client);
            int conn = accept(listen_fd,
                              (struct sockaddr *)&client, &clen);
            if (conn >= 0 && nfds < MAX_FDS) {
                fds[nfds].fd     = conn;
                fds[nfds].events = POLLIN;
                nfds++;

                char ip[INET_ADDRSTRLEN];
                inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
                printf("+ %s:%d (fd %d, slot %d)\n",
                       ip, ntohs(client.sin_port), conn, nfds - 1);
            } else {
                if (conn >= 0) close(conn);  /* too many fds */
            }
        }

        /* Check client sockets */
        for (int i = 1; i < nfds; i++) {
            if (fds[i].revents & (POLLIN | POLLERR | POLLHUP)) {
                char buf[1024];
                ssize_t n = read(fds[i].fd, buf, sizeof(buf));
                if (n <= 0) {
                    printf("- fd %d disconnected\n", fds[i].fd);
                    close(fds[i].fd);
                    /* Swap with last entry to compact array */
                    fds[i] = fds[--nfds];
                    i--;  /* re-check this slot */
                } else {
                    write(fds[i].fd, buf, n);
                }
            }
        }
    }

    close(listen_fd);
    return 0;
}

struct pollfd

struct pollfd {
    int   fd;       /* file descriptor */
    short events;   /* requested events (input) */
    short revents;  /* returned events (output) */
};

  Flag      Meaning
  --------  ------------------------------
  POLLIN    Data available to read
  POLLOUT   Writing will not block
  POLLERR   Error condition (output only)
  POLLHUP   Hang up (output only)
  POLLNVAL  Invalid fd (output only)

Caution: POLLERR and POLLHUP are always monitored even if you do not set them in events. When they fire, you must handle them -- typically by closing the fd.

select vs poll

  select()                              poll()
  +------------------------------------+------------------------------------+
  | Fixed fd limit (FD_SETSIZE=1024)   | No fd limit (array of pollfd)      |
  | Bitmask modified on each call      | revents field written, events kept |
  | Must rebuild fd_set each iteration | Array persists between calls       |
  | O(max_fd) scanning                 | O(nfds) scanning                   |
  | Portable (POSIX, Windows)          | POSIX only (not native Windows)    |
  +------------------------------------+------------------------------------+

  Both share the fundamental limitation:
  The kernel scans the ENTIRE fd list on every call, even if only one fd is ready.
  At 10,000 fds, both spend most of their time scanning fds that have no events.

Monitoring for Writability

Sometimes you need to know when a socket is ready for writing -- for example, after a connect() in non-blocking mode, or when an output buffer was full.

/* poll_write.c -- detect when a non-blocking connect() completes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <poll.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Set non-blocking */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(80);
    inet_pton(AF_INET, "93.184.216.34", &addr.sin_addr); /* example.com */

    int ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    if (ret < 0 && errno != EINPROGRESS) {
        perror("connect"); return 1;
    }

    /* Wait for connection to complete */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready = poll(&pfd, 1, 5000);  /* 5-second timeout */

    if (ready > 0 && (pfd.revents & POLLOUT)) {
        int err = 0;
        socklen_t elen = sizeof(err);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen);
        if (err == 0) {
            printf("Connected!\n");
        } else {
            printf("Connection failed: %s\n", strerror(err));
        }
    } else {
        printf("Timeout or error\n");
    }

    close(fd);
    return 0;
}

Driver Prep: The kernel's internal poll mechanism (struct file_operations.poll) works on the same principle. When you write a character device driver, you implement a poll callback so that userspace select()/poll() works on your device fd.

Rust: Using nix for select and poll

The Rust standard library does not expose select() or poll() directly. The nix crate provides safe wrappers.

poll with nix

// poll_server.rs -- multi-client echo with nix::poll
// Cargo.toml: nix = { version = "0.26", features = ["poll", "net"] }
// (nix 0.27+ changed this API to use borrowed fds; this example uses the
//  older raw-fd API for brevity)
use nix::poll::{poll, PollFd, PollFlags};
use std::collections::HashMap;
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::AsRawFd;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7879")?;
    listener.set_nonblocking(true)?;
    println!("Rust poll server on port 7879");

    let mut poll_fds: Vec<PollFd> = vec![
        PollFd::new(listener.as_raw_fd(), PollFlags::POLLIN),
    ];
    let mut clients: HashMap<i32, TcpStream> = HashMap::new();

    loop {
        let _ready = poll(&mut poll_fds, -1)
            .expect("poll failed");

        let mut new_fds: Vec<PollFd> = Vec::new();
        let mut remove_fds: Vec<i32> = Vec::new();

        for pfd in &poll_fds {
            let revents = pfd.revents().unwrap_or(PollFlags::empty());
            let fd = pfd.as_raw_fd();

            if fd == listener.as_raw_fd() {
                if revents.contains(PollFlags::POLLIN) {
                    // Accept all pending connections
                    loop {
                        match listener.accept() {
                            Ok((stream, addr)) => {
                                println!("+ {}", addr);
                                stream.set_nonblocking(true).ok();
                                let raw = stream.as_raw_fd();
                                new_fds.push(
                                    PollFd::new(raw, PollFlags::POLLIN)
                                );
                                clients.insert(raw, stream);
                            }
                            Err(_) => break,
                        }
                    }
                }
            } else if revents.intersects(
                PollFlags::POLLIN | PollFlags::POLLERR | PollFlags::POLLHUP
            ) {
                let mut buf = [0u8; 1024];
                if let Some(stream) = clients.get_mut(&fd) {
                    match stream.read(&mut buf) {
                        Ok(0) | Err(_) => {
                            println!("- fd {}", fd);
                            remove_fds.push(fd);
                        }
                        Ok(n) => {
                            let _ = stream.write_all(&buf[..n]);
                        }
                    }
                }
            }
        }

        // Remove disconnected clients
        for fd in &remove_fds {
            clients.remove(fd);
            poll_fds.retain(|p| p.as_raw_fd() != *fd);
        }

        // Add new connections
        poll_fds.extend(new_fds);
    }
}

Rust Note: Rust's ownership model prevents the common C bug of using a closed fd. Once the TcpStream is removed from the HashMap, it is dropped, and the fd is closed. No dangling fd in the poll set -- the retain call removes the stale entry.

select with nix

// select_demo.rs -- wait for stdin with timeout using nix::select
// Cargo.toml: nix = { version = "0.26", features = ["select"] }
// (nix 0.27+ changed FdSet to use borrowed fds; this uses the raw-fd API)
use nix::sys::select::{select, FdSet};
use nix::sys::time::TimeVal;
use std::io::Read;
use std::os::fd::AsRawFd;

fn main() {
    println!("Type something within 3 seconds...");

    let stdin_fd = std::io::stdin().as_raw_fd();
    let mut read_fds = FdSet::new();
    read_fds.insert(stdin_fd);

    let mut timeout = TimeVal::new(3, 0);

    match select(
        stdin_fd + 1,
        Some(&mut read_fds),
        None,
        None,
        Some(&mut timeout),
    ) {
        Ok(n) if n > 0 => {
            let mut buf = [0u8; 256];
            let n = std::io::stdin().read(&mut buf).unwrap();
            print!("You typed: {}", String::from_utf8_lossy(&buf[..n]));
        }
        Ok(_) => println!("Timeout!"),
        Err(e) => eprintln!("select error: {}", e),
    }
}

When to Use What

  Connections     Recommendation
  -----------     -------------------------------------------
  < 10            select() is fine, simple and portable
  10 - 1000       poll() removes the fd limit
  > 1000          epoll (next chapter) -- O(1) notification

Both select and poll have the same fundamental scaling problem: on every call, the kernel walks the entire list of file descriptors to check which are ready. With 10,000 fds, this linear scan dominates the server's CPU time. The next chapter introduces epoll, which solves this.

Knowledge Check

  1. What is FD_SETSIZE and why is it dangerous to exceed it with select()?
  2. How does poll() avoid the fd limit problem of select()?
  3. Why do both select() and poll() have O(n) per-call overhead?

Common Pitfalls

  • Not re-initializing fd_set -- select() modifies the set in place. You must copy the master set before each call.
  • Exceeding FD_SETSIZE -- silent memory corruption. No error, no warning, just data corruption and crashes.
  • Forgetting to handle POLLERR/POLLHUP -- the fd keeps being reported as ready, but reads return an error or EOF. If you never close and remove it, the loop spins on the same fd forever.
  • Not compacting the pollfd array -- leaving closed fds in the array with fd = -1 works (poll ignores them) but wastes scanning time.
  • Assuming select() timeout is preserved -- on Linux, timeval is updated to reflect remaining time. Reuse without reinitializing gives shorter and shorter timeouts until you are busy-polling.
  • Using select() for high-fd-count servers -- it was designed in 1983 for a handful of file descriptors. Use poll() or epoll instead.

epoll: Scalable Event-Driven I/O

select and poll scan every file descriptor on every call. With 50,000 connections, most of them idle, you spend most of your CPU time checking fds that have nothing to report. epoll fixes this by maintaining a ready list inside the kernel. Only fds that actually have events appear in the results. This is O(1) with respect to the total number of monitored fds and O(k) with respect to the number of ready fds.

This chapter builds a complete single-threaded event loop from scratch, covers level-triggered vs edge-triggered semantics, and connects to the Rust ecosystem through the nix and mio crates.

The Three epoll Calls

  epoll_create1(flags)          --> returns an epoll fd
  epoll_ctl(epfd, op, fd, ev)   --> add/modify/remove a watched fd
  epoll_wait(epfd, events, max, timeout) --> wait for ready fds

  Kernel                          Userspace
  +------------------+
  | epoll instance   |
  |  interest list:  |            epoll_ctl(ADD, fd=5)
  |   [fd=5, fd=9]   | <--------- epoll_ctl(ADD, fd=9)
  |                  |
  |  ready list:     |            epoll_wait() blocks...
  |   [fd=5]         | ---------> returns: fd=5 has EPOLLIN
  +------------------+

Only fd 5 is ready. The kernel does not scan fd 9 at all. With 50,000 fds and 3 ready, epoll_wait returns immediately with just those 3.

A Complete epoll Echo Server in C

/* epoll_server.c -- single-threaded echo server using epoll */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_EVENTS 64
#define BUF_SIZE   4096

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); return 1; }

    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(7878);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }
    if (listen(listen_fd, 128) < 0) {
        perror("listen"); return 1;
    }
    set_nonblocking(listen_fd);

    /* Create epoll instance */
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }

    struct epoll_event ev;
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    printf("epoll echo server on port 7878\n");

    for (;;) {
        int nready = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (nready < 0) {
            if (errno == EINTR) continue;
            perror("epoll_wait");
            break;
        }

        for (int i = 0; i < nready; i++) {
            int fd = events[i].data.fd;

            if (fd == listen_fd) {
                /* Accept all pending connections */
                for (;;) {
                    struct sockaddr_in client;
                    socklen_t clen = sizeof(client);
                    int conn = accept(listen_fd,
                                      (struct sockaddr *)&client, &clen);
                    if (conn < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                            break;  /* no more pending */
                        perror("accept");
                        break;
                    }
                    set_nonblocking(conn);

                    ev.events  = EPOLLIN | EPOLLET;  /* edge-triggered */
                    ev.data.fd = conn;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);

                    char ip[INET_ADDRSTRLEN];
                    inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
                    printf("+ %s:%d (fd %d)\n",
                           ip, ntohs(client.sin_port), conn);
                }
            } else {
                /* Data from a client (edge-triggered: drain fully) */
                char buf[BUF_SIZE];
                for (;;) {
                    ssize_t n = read(fd, buf, sizeof(buf));
                    if (n < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                            break;  /* no more data right now */
                        /* Real error */
                        close(fd);
                        break;
                    }
                    if (n == 0) {
                        /* Client disconnected */
                        printf("- fd %d\n", fd);
                        close(fd);
                        break;
                    }
                    /* Echo back */
                    ssize_t written = 0;
                    while (written < n) {
                        ssize_t w = write(fd, buf + written, n - written);
                        if (w < 0) {
                            if (errno == EAGAIN) break;
                            close(fd);
                            goto next_event;
                        }
                        written += w;
                    }
                }
                next_event: ;
            }
        }
    }

    close(epfd);
    close(listen_fd);
    return 0;
}

Compile and run: gcc -o epoll_server epoll_server.c && ./epoll_server. Test with multiple nc 127.0.0.1 7878 sessions.

struct epoll_event and the data Union

struct epoll_event {
    uint32_t     events;   /* EPOLLIN, EPOLLOUT, EPOLLET, ... */
    epoll_data_t data;     /* user data returned with event   */
};

typedef union epoll_data {
    void    *ptr;     /* pointer to your own struct */
    int      fd;      /* file descriptor */
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

The data field is your tag. The kernel passes it back to you untouched in epoll_wait. Most simple servers use data.fd. Complex servers store a pointer to a connection struct:

struct connection {
    int fd;
    char read_buf[4096];
    size_t read_len;
    /* ... */
};

struct connection *conn = malloc(sizeof(*conn));
conn->fd = accepted_fd;

struct epoll_event ev;
ev.events  = EPOLLIN | EPOLLET;
ev.data.ptr = conn;
epoll_ctl(epfd, EPOLL_CTL_ADD, accepted_fd, &ev);

Then in the event loop: struct connection *c = events[i].data.ptr;

Level-Triggered vs Edge-Triggered

This is the most important distinction in epoll.

  Level-triggered (default):
  "Notify me AS LONG AS the fd is ready"

  Edge-triggered (EPOLLET):
  "Notify me ONCE WHEN the fd BECOMES ready"

  Data arrives:    [####]  (400 bytes)

  Level-triggered:
    epoll_wait -> EPOLLIN   (data available)
    read(100 bytes)         (still 300 bytes left)
    epoll_wait -> EPOLLIN   (still ready -- data remains)
    read(300 bytes)
    epoll_wait -> blocks    (no more data)

  Edge-triggered:
    epoll_wait -> EPOLLIN   (data just arrived)
    read(100 bytes)         (still 300 bytes left)
    epoll_wait -> BLOCKS    (no NEW data arrived -- edge already fired)
    *** 300 bytes stuck in the buffer forever ***

Caution: With edge-triggered mode, you MUST read until EAGAIN on every notification. If you stop reading early, the remaining data is stranded. The kernel will not notify you again until NEW data arrives.

The Edge-Triggered + Non-Blocking Pattern

This is the canonical pattern that all high-performance servers use:

  1. Set the fd to O_NONBLOCK
  2. Register with EPOLLET
  3. On EPOLLIN, loop read() until it returns EAGAIN
  4. On EPOLLOUT, loop write() until it returns EAGAIN

  for (;;) {
      n = read(fd, buf, sizeof(buf));
      if (n > 0) {
          process(buf, n);
          continue;
      }
      if (n < 0 && errno == EAGAIN) {
          break;   // <-- all data consumed, wait for next edge
      }
      if (n == 0) {
          close(fd);  // client disconnected
          break;
      }
      // n < 0 && errno != EAGAIN: real error
      close(fd);
      break;
  }

Why Edge-Triggered?

Level-triggered is simpler and less error-prone. So why bother with edge-triggered?

Thundering herd. If multiple threads each have their own epoll_wait on the same epoll fd (a common pattern), level-triggered wakes ALL of them when data arrives. Only one can read() successfully; the rest wake up for nothing. Edge-triggered fires only once, waking a single thread.

Efficiency. Level-triggered can cause redundant wake-ups. If you know you are going to drain the entire buffer anyway, edge-triggered avoids the kernel re-checking readiness on the next epoll_wait.

In practice, most applications start with level-triggered and switch to edge-triggered only when they need the performance.

EPOLLONESHOT

For multi-threaded servers where multiple threads call epoll_wait, EPOLLONESHOT disables the fd after one event fires. You must re-arm it with EPOLL_CTL_MOD after processing. This guarantees exactly one thread handles a given fd at a time.

/* Register with EPOLLONESHOT */
ev.events  = EPOLLIN | EPOLLET | EPOLLONESHOT;
ev.data.fd = conn_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);

/* After processing, re-arm */
ev.events  = EPOLLIN | EPOLLET | EPOLLONESHOT;
ev.data.fd = conn_fd;
epoll_ctl(epfd, EPOLL_CTL_MOD, conn_fd, &ev);

The Reactor Pattern

The event loop in the epoll server is an instance of the reactor pattern:

  +----------------------------+
  |       Event Loop           |
  |  +----------------------+  |
  |  |    epoll_wait()      |  |
  |  +----------+-----------+  |
  |             |              |
  |    +--------+--------+    |
  |    |                 |    |
  |  accept           read    |
  |  handler          handler |
  |    |                 |    |
  |  register         process |
  |  new fd           + reply |
  +----------------------------+

The reactor:

  1. Waits for events (demultiplexing)
  2. Dispatches each event to a handler
  3. Handlers are non-blocking and complete quickly
  4. Returns to step 1

This single-threaded design handles thousands of connections with one thread, no locks, and no per-connection threads or context switches.

In production, you use data.ptr to store per-connection state (read buffers, write queues, protocol state machines). The epoll echo server above uses data.fd for simplicity, but real servers like nginx, Redis, and memcached all use the pointer variant with handler dispatch. This is the skeleton every event-driven C server builds on.

Try It: Modify the epoll echo server to use data.ptr with a struct connection that includes a write buffer. When write() returns EAGAIN, store the remaining data and register for EPOLLOUT. When the fd becomes writable, flush the buffer and switch back to EPOLLIN.

Rust: epoll via the nix Crate

// epoll_server.rs -- event loop using nix::sys::epoll
// Cargo.toml: nix = { version = "0.29", features = ["epoll", "net", "fs"] }
use nix::sys::epoll::*;
use std::collections::HashMap;
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::{AsRawFd, RawFd};

fn set_nonblocking(stream: &TcpStream) {
    stream.set_nonblocking(true).expect("set_nonblocking");
}

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    listener.set_nonblocking(true)?;
    println!("Rust epoll server on port 7878");

    let epfd = Epoll::new(EpollCreateFlags::empty())
        .expect("epoll_create");

    let listen_fd = listener.as_raw_fd();
    epfd.add(
        &listener,
        EpollEvent::new(EpollFlags::EPOLLIN, listen_fd as u64),
    ).expect("epoll_add listener");

    let mut clients: HashMap<RawFd, TcpStream> = HashMap::new();
    let mut events = vec![EpollEvent::empty(); 64];

    loop {
        let n = epfd.wait(&mut events, -1)
            .expect("epoll_wait");

        for i in 0..n {
            let fd = events[i].data() as RawFd;

            if fd == listen_fd {
                loop {
                    match listener.accept() {
                        Ok((stream, addr)) => {
                            println!("+ {}", addr);
                            set_nonblocking(&stream);
                            let raw = stream.as_raw_fd();
                            epfd.add(
                                &stream,
                                EpollEvent::new(
                                    EpollFlags::EPOLLIN | EpollFlags::EPOLLET,
                                    raw as u64,
                                ),
                            ).expect("epoll_add client");
                            clients.insert(raw, stream);
                        }
                        Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                            break;
                        }
                        Err(e) => {
                            eprintln!("accept: {}", e);
                            break;
                        }
                    }
                }
            } else if clients.contains_key(&fd) {
                let mut buf = [0u8; 4096];
                let mut closed = false;
                let stream = clients.get_mut(&fd).unwrap();
                loop {
                    match stream.read(&mut buf) {
                        Ok(0) => {
                            println!("- fd {}", fd);
                            closed = true;
                            break;
                        }
                        Ok(n) => {
                            let _ = stream.write_all(&buf[..n]);
                        }
                        Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                            break;
                        }
                        Err(_) => {
                            closed = true;
                            break;
                        }
                    }
                }
                // The mutable borrow of `clients` (via `stream`) ends here,
                // so it is now safe to remove the entry.
                if closed {
                    clients.remove(&fd);
                }
            }
        }
    }
}

Rust Note: When a TcpStream is removed from the HashMap, it is dropped, which closes the fd. The kernel automatically removes a closed fd from the epoll interest list. No explicit EPOLL_CTL_DEL needed.

Rust: mio for Portable Event Loops

epoll is Linux-only. kqueue is the equivalent on macOS/BSD. The mio crate abstracts over both, providing a single API.

// mio_server.rs -- portable event loop with mio
// Cargo.toml: mio = { version = "1", features = ["net", "os-poll"] }
use mio::net::{TcpListener, TcpStream};
use mio::{Events, Interest, Poll, Token};
use std::collections::HashMap;
use std::io::{self, Read, Write};

const LISTENER: Token = Token(0);

fn main() -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    let addr = "0.0.0.0:7878".parse().unwrap();
    let mut listener = TcpListener::bind(addr)?;
    poll.registry().register(
        &mut listener, LISTENER, Interest::READABLE)?;

    let mut clients: HashMap<Token, TcpStream> = HashMap::new();
    let mut next_token = 1usize;

    println!("mio server on port 7878");

    loop {
        poll.poll(&mut events, None)?;

        for event in events.iter() {
            match event.token() {
                LISTENER => {
                    loop {
                        match listener.accept() {
                            Ok((mut stream, addr)) => {
                                let token = Token(next_token);
                                next_token += 1;
                                poll.registry().register(
                                    &mut stream,
                                    token,
                                    Interest::READABLE,
                                )?;
                                println!("+ {} (token {})", addr, token.0);
                                clients.insert(token, stream);
                            }
                            Err(ref e)
                                if e.kind() == io::ErrorKind::WouldBlock =>
                            {
                                break;
                            }
                            Err(e) => return Err(e),
                        }
                    }
                }
                token => {
                    let done = if let Some(stream) = clients.get_mut(&token) {
                        let mut buf = [0u8; 4096];
                        let mut closed = false;
                        loop {
                            match stream.read(&mut buf) {
                                Ok(0) => { closed = true; break; }
                                Ok(n) => {
                                    let _ = stream.write_all(&buf[..n]);
                                }
                                Err(ref e)
                                    if e.kind() == io::ErrorKind::WouldBlock =>
                                {
                                    break;
                                }
                                Err(_) => { closed = true; break; }
                            }
                        }
                        closed
                    } else {
                        false
                    };

                    if done {
                        if let Some(mut stream) = clients.remove(&token) {
                            poll.registry().deregister(&mut stream)?;
                            println!("- token {}", token.0);
                        }
                    }
                }
            }
        }
    }
}

Connection to tokio

tokio is Rust's most popular async runtime. Under the hood, it is an epoll (Linux) / kqueue (macOS) event loop built on mio. When you write:

// Conceptual -- requires a tokio runtime (e.g. #[tokio::main])
async fn handle(mut stream: tokio::net::TcpStream) {
    let (mut reader, mut writer) = stream.split();
    tokio::io::copy(&mut reader, &mut writer).await.unwrap();
}

...the .await suspends the task. tokio's reactor (an epoll event loop) resumes it when data arrives. There is no thread per connection, no manual epoll_ctl -- the async/await syntax hides the event loop plumbing you built in this chapter.

  Your code (this chapter):          tokio (same thing, hidden):
  +---------------------------+      +---------------------------+
  | epoll_wait()              |      | runtime.block_on(...)     |
  | -> fd ready               |      | -> task wakes up          |
  | -> call handler(fd)       |      | -> resume .await          |
  | -> handler does read/write|      | -> async fn does I/O      |
  | -> back to epoll_wait     |      | -> task yields at .await  |
  +---------------------------+      +---------------------------+

Driver Prep: The Linux kernel uses a similar event-driven model internally. The waitqueue mechanism wakes sleeping tasks when events occur. Kernel threads and work queues are the kernel's equivalent of the reactor pattern. Understanding epoll deeply prepares you for kernel-level event handling.

Performance Comparison

  10,000 idle connections, 100 active per second:

  select:   scans 10,000 fds per call       ~10,000 operations/call
  poll:     scans 10,000 pollfds per call    ~10,000 operations/call
  epoll:    returns only ~100 ready fds      ~100 operations/call

  At 100,000 connections: select/poll grind to a halt.
  epoll: barely notices.

This is why nginx, Redis, Node.js (libuv), and every modern event-driven server uses epoll on Linux.

Knowledge Check

  1. What is the difference between epoll_create1 and the older epoll_create?
  2. In edge-triggered mode, what happens if you read only part of the available data?
  3. Why does the reactor pattern avoid the need for mutexes?

Common Pitfalls

  • Edge-triggered without draining -- the most common epoll bug. Read until EAGAIN or you will lose data.
  • Forgetting O_NONBLOCK -- edge-triggered epoll with blocking fds deadlocks. A read() call blocks when there is no data, but you will never get another notification because the edge already fired.
  • Stale pointers in data.ptr -- if you free a connection struct but forget to remove the fd from epoll, the next event delivers a dangling pointer. Use-after-free.
  • EPOLL_CTL_DEL on a closed fd -- closing the fd automatically removes it from epoll (if it is the last reference). Calling EPOLL_CTL_DEL after close() returns EBADF. Close last, or skip the explicit delete.
  • Using EPOLLONESHOT without re-arming -- the fd goes silent forever. Every event handler must call EPOLL_CTL_MOD to re-enable.
  • Assuming portability -- epoll is Linux-only. Use kqueue on BSD/macOS, IOCP on Windows, or a library like mio or libuv for cross-platform code.

C Optimization Techniques

Performance matters in systems programming. This chapter covers the tools and techniques that separate amateur C from production C: compiler flags, profiling, cache-aware data layout, and branch prediction hints. The rule is always the same: measure first, optimize second.

Compiler Optimization Levels

GCC and Clang accept -O flags that control how aggressively the compiler transforms your code.

  Flag    Effect
  ----    ------------------------------------------------
  -O0     No optimization. Fastest compile, debuggable.
  -O1     Basic optimizations. Smaller code.
  -O2     Most optimizations. Good default for release.
  -O3     Aggressive: vectorization, inlining, unrolling.
  -Os     Optimize for size (like -O2 minus bloat).
  -Og     Optimize for debugging experience.

Let's see the difference on a trivial loop.

/* opt_levels.c */
#include <stdio.h>
#include <time.h>

static long sum_array(const int *arr, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += arr[i];
    }
    return total;
}

int main(void) {
    enum { N = 100000000 };
    static int data[N];

    for (int i = 0; i < N; i++)
        data[i] = i & 0xFF;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    long result = sum_array(data, N);

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    printf("sum = %ld, time = %.4f s\n", result, elapsed);
    return 0;
}

Compile and run at different levels:

$ gcc -O0 -o opt0 opt_levels.c && ./opt0
sum = 12750000000, time = 0.2510 s

$ gcc -O2 -o opt2 opt_levels.c && ./opt2
sum = 12750000000, time = 0.0380 s

$ gcc -O3 -o opt3 opt_levels.c && ./opt3
sum = 12750000000, time = 0.0120 s

At -O3, the compiler auto-vectorizes the loop using SIMD instructions.

Caution: -O3 can change behavior of code that relies on undefined behavior. If your program works at -O0 but breaks at -O2, you have a bug, not a compiler problem.

Looking at What the Compiler Did

Use -S to see assembly, or objdump -d on the binary:

$ gcc -O2 -S -o opt2.s opt_levels.c
$ grep -A5 'sum_array' opt2.s

The Compiler Explorer (godbolt.org) is invaluable for comparing output across flags and compilers. Use it.

Profile-Guided Optimization (PGO)

PGO lets the compiler observe real execution patterns, then recompile with that data.

# Step 1: Instrument
$ gcc -O2 -fprofile-generate -o opt_pgo_gen opt_levels.c

# Step 2: Run with representative input
$ ./opt_pgo_gen

# Step 3: Recompile using the profile
$ gcc -O2 -fprofile-use -o opt_pgo opt_levels.c

PGO helps the compiler make better inlining, branching, and layout decisions. Typical improvement: 5-20% on real workloads.

Profiling with perf

Never guess where time is spent. Use perf.

$ gcc -O2 -g -o opt2 opt_levels.c
$ perf stat ./opt2

This gives you cycle counts, cache misses, branch mispredictions, and IPC (instructions per cycle).

For function-level profiling:

$ perf record -g ./opt2
$ perf report

perf report shows a call-graph breakdown. Look for the hottest functions first.

Profiling with gprof

$ gcc -O2 -pg -o opt_gprof opt_levels.c
$ ./opt_gprof
$ gprof opt_gprof gmon.out | head -30

gprof adds instrumentation overhead but gives call counts and cumulative time.

Profiling with Valgrind/Callgrind

$ gcc -O2 -g -o opt2 opt_levels.c
$ valgrind --tool=callgrind ./opt2
$ callgrind_annotate callgrind.out.* | head -40

Callgrind simulates the CPU cache hierarchy. It's slow (20-50x), but gives exact instruction counts and cache miss data.

Try It: Compile opt_levels.c at -O0 and -O3. Run perf stat on both. Compare the "instructions" and "cache-misses" lines.

Cache-Friendly Data Layout

Modern CPUs are fast. Memory is slow. A cache miss costs 100+ cycles. Data layout determines cache behavior.

Array of Structs (AoS) vs Struct of Arrays (SoA)

AoS (Array of Structs):
+------+------+------+------+------+------+------+------+
| x[0] | y[0] | z[0] | w[0] | x[1] | y[1] | z[1] | w[1] | ...
+------+------+------+------+------+------+------+------+

SoA (Struct of Arrays):
+------+------+------+------+------+------+------+------+
| x[0] | x[1] | x[2] | x[3] | y[0] | y[1] | y[2] | y[3] | ...
+------+------+------+------+------+------+------+------+

If you iterate over all elements but only touch x, SoA wins because cache lines contain only x values.

/* cache_layout.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

/* Array of Structs */
struct Particle_AoS {
    float x, y, z;
    float vx, vy, vz;
    float mass;
    float pad; /* 32 bytes total */
};

/* Struct of Arrays */
struct Particles_SoA {
    float *x, *y, *z;
    float *vx, *vy, *vz;
    float *mass;
};

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    /* AoS test */
    struct Particle_AoS *aos = malloc(N * sizeof(*aos));
    for (int i = 0; i < N; i++) {
        aos[i].x = (float)i;
        aos[i].mass = 1.0f;
    }

    double t0 = now();
    float sum_aos = 0;
    for (int i = 0; i < N; i++)
        sum_aos += aos[i].x * aos[i].mass;
    double t1 = now();
    printf("AoS: sum=%.0f  time=%.4f s\n", sum_aos, t1 - t0);

    /* SoA test */
    struct Particles_SoA soa;
    soa.x    = malloc(N * sizeof(float));
    soa.mass = malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) {
        soa.x[i] = (float)i;
        soa.mass[i] = 1.0f;
    }

    t0 = now();
    float sum_soa = 0;
    for (int i = 0; i < N; i++)
        sum_soa += soa.x[i] * soa.mass[i];
    t1 = now();
    printf("SoA: sum=%.0f  time=%.4f s\n", sum_soa, t1 - t0);

    free(aos);
    free(soa.x);
    free(soa.mass);
    return 0;
}
$ gcc -O2 -o cache_layout cache_layout.c -lm
$ ./cache_layout
AoS: sum=...  time=0.0280 s
SoA: sum=...  time=0.0090 s

SoA wins because only 8 bytes per element touch the cache (x + mass), not 32.

Driver Prep: Kernel DMA buffer layout affects device performance. The same AoS-vs-SoA trade-off applies to descriptor rings in network drivers.

Branch Prediction Hints

CPUs predict branches. Mispredictions cost ~15 cycles. You can hint the compiler with __builtin_expect.

/* branch_hints.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static long process(const int *data, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(data[i] < 0)) {
            /* Error path: rarely taken */
            sum -= data[i];
        } else {
            sum += data[i];
        }
    }
    return sum;
}

int main(void) {
    enum { N = 100000000 };
    int *data = malloc(N * sizeof(int));

    /* 99.9% non-negative values */
    for (int i = 0; i < N; i++)
        data[i] = (i % 1000 == 0) ? -1 : i & 0xFF;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long result = process(data, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("sum=%ld  time=%.4f s\n", result, elapsed);

    free(data);
    return 0;
}

The Linux kernel defines likely() and unlikely() macros everywhere. Use them on error-checking branches.

The restrict Keyword

restrict tells the compiler that two pointers don't alias (don't point to overlapping memory). This enables vectorization.

/* restrict_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                int n) {
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

void add_arrays_no_restrict(float *dst,
                            const float *a,
                            const float *b,
                            int n) {
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

int main(void) {
    enum { N = 50000000 };
    float *a = malloc(N * sizeof(float));
    float *b = malloc(N * sizeof(float));
    float *c = malloc(N * sizeof(float));

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = (float)(N - i);
    }

    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    add_arrays(c, a, b, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("restrict:    %.4f s\n",
           (t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    add_arrays_no_restrict(c, a, b, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("no restrict: %.4f s\n",
           (t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);

    free(a); free(b); free(c);
    return 0;
}

Without restrict, the compiler must assume dst might overlap a or b, preventing SIMD optimization.

Caution: If you lie to the compiler with restrict and the pointers actually alias, you get undefined behavior. The compiler will generate wrong code.

Loop Unrolling

The compiler can unroll loops at -O2/-O3, but you can also do it manually or with pragmas:

/* unroll.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long sum_unrolled(const int *arr, int n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;

    /* Process 4 elements per iteration */
    for (; i + 3 < n; i += 4) {
        s0 += arr[i];
        s1 += arr[i + 1];
        s2 += arr[i + 2];
        s3 += arr[i + 3];
    }
    /* Handle remainder */
    for (; i < n; i++)
        s0 += arr[i];

    return s0 + s1 + s2 + s3;
}

long sum_simple(const int *arr, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += arr[i];
    return total;
}

int main(void) {
    enum { N = 100000000 };
    int *data = malloc(N * sizeof(int));
    for (int i = 0; i < N; i++)
        data[i] = i & 0xFF;

    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    long r1 = sum_simple(data, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("simple:   sum=%ld  %.4f s\n", r1,
           (t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    long r2 = sum_unrolled(data, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("unrolled: sum=%ld  %.4f s\n", r2,
           (t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);

    free(data);
    return 0;
}

Manual unrolling with multiple accumulators (s0-s3) breaks data dependencies and lets the CPU pipeline fill.

GCC also supports:

#pragma GCC unroll 8
for (int i = 0; i < n; i++)
    total += arr[i];

Rust Optimization

Rust uses LLVM. The same optimization principles apply.

# Debug build (like -O0)
$ cargo build

# Release build (opt-level 3 by default; LTO off unless enabled)
$ cargo build --release

In Cargo.toml:

[profile.release]
opt-level = 3
lto = true
codegen-units = 1

// src/main.rs — cache-friendly iteration
use std::time::Instant;

const N: usize = 10_000_000;

struct ParticlesAoS {
    data: Vec<(f32, f32, f32, f32)>, // x, y, z, mass
}

struct ParticlesSoA {
    x: Vec<f32>,
    mass: Vec<f32>,
}

fn main() {
    // AoS
    let aos = ParticlesAoS {
        data: (0..N).map(|i| (i as f32, 0.0, 0.0, 1.0)).collect(),
    };

    let t0 = Instant::now();
    let sum_aos: f32 = aos.data.iter().map(|(x, _, _, m)| x * m).sum();
    let d_aos = t0.elapsed();

    // SoA
    let soa = ParticlesSoA {
        x: (0..N).map(|i| i as f32).collect(),
        mass: vec![1.0; N],
    };

    let t0 = Instant::now();
    let sum_soa: f32 = soa.x.iter().zip(&soa.mass).map(|(x, m)| x * m).sum();
    let d_soa = t0.elapsed();

    println!("AoS: sum={sum_aos:.0} time={d_aos:?}");
    println!("SoA: sum={sum_soa:.0} time={d_soa:?}");
}

Rust Note: In release builds, Rust iterators compile to the same tight loops as C for loops. No overhead. The compiler auto-vectorizes them just like it would a C loop.

Rust: Profiling

Use cargo flamegraph or perf directly:

$ cargo build --release
$ perf stat ./target/release/myapp
$ perf record -g ./target/release/myapp
$ perf report

For Criterion-based benchmarks:

// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_sum(c: &mut Criterion) {
    let data: Vec<i32> = (0..1_000_000).map(|i| (i & 0xFF) as i32).collect();
    c.bench_function("sum", |b| {
        b.iter(|| {
            let sum: i64 = data.iter().map(|&x| x as i64).sum();
            black_box(sum)
        })
    });
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);

black_box prevents the compiler from optimizing away the computation.

Rust Note: Rust has no direct equivalent of restrict. The borrow checker ensures &mut references are unique, which gives LLVM the same aliasing information automatically.

Measuring: The Golden Rule

Rule: If you didn't measure it, you don't know it's slow.
Rule: If you didn't measure it after, you don't know you fixed it.

Micro-benchmark checklist

  1. Warm up the cache first (run the function once before timing).
  2. Use clock_gettime(CLOCK_MONOTONIC) in C, Instant::now() in Rust.
  3. Run multiple iterations and take the median, not the mean.
  4. Disable CPU frequency scaling during benchmarks.
  5. Use volatile or black_box to prevent dead-code elimination.

/* prevent_dce.c — prevent dead code elimination */
#include <stdio.h>
#include <time.h>

/* Force the compiler to keep the result */
static void do_not_optimize(void *p) {
    __asm__ volatile("" : : "g"(p) : "memory");
}

int main(void) {
    struct timespec t0, t1;
    long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 100000000; i++)
        sum += i;
    do_not_optimize(&sum);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("sum=%ld  time=%.4f s\n", sum, elapsed);
    return 0;
}

Try It: Compile prevent_dce.c with -O3 but without the do_not_optimize call. What happens to the loop? Check the assembly.

Optimization Decision Flowchart

Is it fast enough?
  |
  +-- YES --> Stop. Ship it.
  |
  +-- NO --> Profile it.
              |
              Where is the time?
              |
              +-- CPU bound --> Check -O level, restrict, SIMD
              |
              +-- Memory bound --> Check data layout, cache misses
              |
              +-- I/O bound --> Check syscall count, buffering
              |
              +-- Branch misses --> Check branch patterns, likely/unlikely

Quick Knowledge Check

  1. Why is -O0 useful even though it produces slow code?
  2. What does restrict promise the compiler, and what happens if you lie?
  3. When does SoA beat AoS?

Common Pitfalls

  • Optimizing without profiling. You will optimize the wrong thing.
  • Benchmarking at -O0. You're measuring unoptimized, register-spilling code, not your algorithm.
  • Forgetting warm-up. Cold caches give misleading first-run numbers.
  • Using gettimeofday for benchmarks. It's not monotonic. Use clock_gettime(CLOCK_MONOTONIC).
  • Assuming -O3 is always better than -O2. Aggressive inlining can blow up the instruction cache.
  • restrict on aliased pointers. Undefined behavior, silently wrong.
  • Optimizing for one CPU. -march=native won't run on other machines.

Memory Pools and Arena Allocators

malloc and free are general-purpose. General-purpose means slow for specific patterns. When you allocate thousands of short-lived objects of the same size, or build a parse tree that you discard all at once, custom allocators win by an order of magnitude. This chapter builds both an arena allocator and a pool allocator from scratch.

Why malloc Is Sometimes Too Slow

malloc must handle any size, any order of free, and thread safety. That flexibility costs:

  • Metadata overhead per allocation (typically 16-32 bytes).
  • Fragmentation from interleaved alloc/free.
  • Lock contention in multi-threaded programs.
  • System calls (brk/mmap) when the free list is empty.

For patterns like "allocate many, free all at once" or "allocate fixed-size blocks rapidly," we can do much better.

Arena Allocator: Bump and Reset

An arena is a contiguous block of memory. Allocation bumps a pointer forward. Freeing individual objects is not supported -- you free everything at once.

Arena layout:

+---------------------------------------------------+
| used memory          | free space                  |
+---------------------------------------------------+
^                      ^                             ^
base                   offset                        base + capacity

Arena in C

/* arena.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t capacity) {
    Arena a;
    a.base = (uint8_t *)malloc(capacity);
    if (!a.base) {
        fprintf(stderr, "arena: malloc failed\n");
        exit(1);
    }
    a.capacity = capacity;
    a.offset = 0;
    return a;
}

void *arena_alloc(Arena *a, size_t size, size_t align) {
    /* Align the offset up (align must be a power of two) */
    size_t aligned = (a->offset + align - 1) & ~(align - 1);
    if (aligned + size > a->capacity) {
        fprintf(stderr, "arena: out of memory (%zu requested, %zu free)\n",
                size, a->capacity - a->offset);
        return NULL;
    }
    void *ptr = a->base + aligned;
    a->offset = aligned + size;
    return ptr;
}

void arena_reset(Arena *a) {
    a->offset = 0;
}

void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->offset = 0;
}

/* ---- demo ---- */

typedef struct {
    int x, y;
    char label[24];
} Point;

int main(void) {
    Arena arena = arena_create(1024 * 1024); /* 1 MB */

    /* Allocate 1000 Points -- no individual free needed */
    Point **points = arena_alloc(&arena, 1000 * sizeof(Point *),
                                 _Alignof(Point *));

    for (int i = 0; i < 1000; i++) {
        points[i] = arena_alloc(&arena, sizeof(Point), _Alignof(Point));
        points[i]->x = i;
        points[i]->y = i * 2;
        snprintf(points[i]->label, sizeof(points[i]->label), "pt_%d", i);
    }

    printf("Point 42: (%d, %d) \"%s\"\n",
           points[42]->x, points[42]->y, points[42]->label);
    printf("Arena used: %zu / %zu bytes\n", arena.offset, arena.capacity);

    /* Free everything at once */
    arena_reset(&arena);
    printf("After reset: %zu bytes used\n", arena.offset);

    arena_destroy(&arena);
    return 0;
}
$ gcc -O2 -o arena arena.c && ./arena
Point 42: (42, 84) "pt_42"
Arena used: 40000 / 1048576 bytes
After reset: 0 bytes used

That's the entire allocator: 20 lines of logic. No free list, no metadata per object, no fragmentation.

Try It: Add a function arena_alloc_string(Arena *a, const char *s) that copies a string into the arena and returns a pointer to it. Hint: use strlen + arena_alloc + memcpy.

Arena Performance vs malloc

/* arena_bench.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t cap) {
    Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->offset + 7) & ~(size_t)7;
    void *p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    enum { N = 1000000 };

    /* Benchmark malloc */
    void **ptrs = malloc(N * sizeof(void *));
    double t0 = now();
    for (int i = 0; i < N; i++)
        ptrs[i] = malloc(64);
    double t1 = now();
    printf("malloc: %.4f s\n", t1 - t0);
    for (int i = 0; i < N; i++)
        free(ptrs[i]);
    free(ptrs);

    /* Benchmark arena */
    Arena a = arena_create((size_t)N * 72);
    t0 = now();
    for (int i = 0; i < N; i++)
        arena_alloc(&a, 64);
    t1 = now();
    printf("arena:  %.4f s\n", t1 - t0);
    free(a.base);

    return 0;
}

Typical result: arena allocation is 5-20x faster than malloc for small objects.

Pool Allocator: Fixed-Size Blocks

A pool allocator manages blocks of identical size. Freed blocks go onto a free list for reuse.

Pool layout (block size = 32 bytes):

+--------+--------+--------+--------+--------+
| block0 | block1 | block2 | block3 | block4 | ...
+--------+--------+--------+--------+--------+

Free list (embedded in unused blocks):

block2 -> block0 -> block4 -> NULL

The trick: when a block is free, we store the free-list pointer inside the block itself. No extra metadata.

/* pool.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *memory;
    void    *free_list;
    size_t   block_size;
    size_t   block_count;
} Pool;

Pool pool_create(size_t block_size, size_t block_count) {
    /* Block must be large enough to hold a pointer */
    if (block_size < sizeof(void *))
        block_size = sizeof(void *);

    Pool p;
    p.block_size  = block_size;
    p.block_count = block_count;
    p.memory      = (uint8_t *)malloc(block_size * block_count);
    if (!p.memory) {
        fprintf(stderr, "pool: malloc failed\n");
        exit(1);
    }

    /* Build the free list */
    p.free_list = NULL;
    for (size_t i = 0; i < block_count; i++) {
        void *block = p.memory + i * block_size;
        *(void **)block = p.free_list;
        p.free_list = block;
    }

    return p;
}

void *pool_alloc(Pool *p) {
    if (!p->free_list) {
        fprintf(stderr, "pool: exhausted\n");
        return NULL;
    }
    void *block = p->free_list;
    p->free_list = *(void **)block;
    return block;
}

void pool_free(Pool *p, void *block) {
    *(void **)block = p->free_list;
    p->free_list = block;
}

void pool_destroy(Pool *p) {
    free(p->memory);
    p->memory = NULL;
    p->free_list = NULL;
}

/* ---- demo ---- */

typedef struct {
    int id;
    double value;
} Record;

int main(void) {
    Pool pool = pool_create(sizeof(Record), 1024);

    Record *r1 = pool_alloc(&pool);
    Record *r2 = pool_alloc(&pool);
    Record *r3 = pool_alloc(&pool);

    r1->id = 1; r1->value = 3.14;
    r2->id = 2; r2->value = 2.72;
    r3->id = 3; r3->value = 1.41;

    printf("r1: id=%d val=%.2f\n", r1->id, r1->value);
    printf("r2: id=%d val=%.2f\n", r2->id, r2->value);

    /* Return r2 to pool */
    pool_free(&pool, r2);

    /* Reuse that block */
    Record *r4 = pool_alloc(&pool);
    r4->id = 4; r4->value = 9.81;
    printf("r4: id=%d val=%.2f (reused r2's block)\n", r4->id, r4->value);
    printf("r4 == r2 address? %s\n", (r4 == r2) ? "yes" : "no");

    pool_destroy(&pool);
    return 0;
}
$ gcc -O2 -o pool pool.c && ./pool
r1: id=1 val=3.14
r2: id=2 val=2.72
r4: id=4 val=9.81 (reused r2's block)
r4 == r2 address? yes

Caution: Using a pool-allocated block after pool_free is use-after-free. The pool won't detect it. You'll corrupt the free list.

Rust: bumpalo and typed-arena

Rust has crate-level arena allocators that integrate with the borrow checker.

bumpalo (bump allocator)

// Cargo.toml: bumpalo = "3"
use bumpalo::Bump;

fn main() {
    let arena = Bump::new();

    // Allocate values -- they live as long as `arena`
    let x = arena.alloc(42_i32);
    let y = arena.alloc(3.14_f64);
    let s = arena.alloc_str("hello from the arena");

    println!("x = {x}, y = {y}, s = \"{s}\"");
    println!("Arena used: {} bytes", arena.allocated_bytes());

    // Everything freed when `arena` drops
}

typed-arena (single-type arena)

// Cargo.toml: typed-arena = "2"
use typed_arena::Arena;

struct Node {
    value: i32,
    label: String,
}

fn main() {
    let arena = Arena::new();

    let nodes: Vec<&Node> = (0..1000)
        .map(|i| {
            arena.alloc(Node {
                value: i,
                label: format!("node_{i}"),
            })
        })
        .collect();

    println!("Node 42: value={}, label=\"{}\"",
             nodes[42].value, nodes[42].label);
}

Rust Note: Rust arenas return references (&T) with the arena's lifetime. The borrow checker prevents use-after-free at compile time. This is the biggest difference from C arenas, where dangling pointers are your problem.

When to Use Each Allocator

+------------------+----------------------------+---------------------------+
| Pattern          | Allocator                  | Why                       |
+------------------+----------------------------+---------------------------+
| Parse a request, | Arena                      | Alloc many, free all at   |
| process, discard |                            | once. Zero fragmentation. |
+------------------+----------------------------+---------------------------+
| Game loop:       | Arena (per-frame)          | Reset at frame boundary.  |
| alloc per frame  |                            | No GC pauses.             |
+------------------+----------------------------+---------------------------+
| Connection pool: | Pool                       | Fixed-size blocks. Fast   |
| reuse objects    |                            | alloc/free. Reuse memory. |
+------------------+----------------------------+---------------------------+
| Mixed sizes,     | malloc/free (or jemalloc)  | General-purpose is fine   |
| long lifetimes   |                            | when patterns are random. |
+------------------+----------------------------+---------------------------+

Real-World Patterns

Protocol Parser with Arena

A network server receives a packet, parses headers and fields into an arena, processes the request, then resets the arena for the next packet.

/* parse_loop.c — sketch of arena-based packet parsing */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t cap) {
    Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->offset + 7) & ~(size_t)7;
    void *p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

void arena_reset(Arena *a) { a->offset = 0; }

typedef struct {
    char *method;   /* "GET", "POST", etc. */
    char *path;     /* "/index.html" */
    int   num_headers;
} HttpRequest;

/* Parse a (fake) request into the arena */
HttpRequest *parse_request(Arena *a, const char *raw) {
    HttpRequest *req = arena_alloc(a, sizeof(HttpRequest));

    /* Copy method: everything up to the first space */
    const char *sp1 = strchr(raw, ' ');
    size_t mlen = (size_t)(sp1 - raw);
    req->method = arena_alloc(a, mlen + 1);
    memcpy(req->method, raw, mlen);
    req->method[mlen] = '\0';

    /* Copy path: between the first and second spaces */
    const char *sp2 = strchr(sp1 + 1, ' ');
    size_t plen = (size_t)(sp2 - sp1 - 1);
    req->path = arena_alloc(a, plen + 1);
    memcpy(req->path, sp1 + 1, plen);
    req->path[plen] = '\0';

    req->num_headers = 3; /* placeholder */
    return req;
}

int main(void) {
    Arena arena = arena_create(4096);

    /* Simulate processing 3 requests */
    const char *requests[] = {
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    };

    for (int i = 0; i < 3; i++) {
        arena_reset(&arena); /* free previous request's data */

        HttpRequest *req = parse_request(&arena, requests[i]);
        printf("Request %d: method=%s path=%s headers=%d (arena=%zu bytes)\n",
               i, req->method, req->path, req->num_headers, arena.offset);
    }

    free(arena.base);
    return 0;
}
$ gcc -O2 -o parse_loop parse_loop.c && ./parse_loop
Request 0: method=GET path=/index.html headers=3 (arena=44 bytes)
Request 1: method=POST path=/api/data headers=3 (arena=42 bytes)
Request 2: method=GET path=/style.css headers=3 (arena=43 bytes)

Rust: Per-Request Arena

use bumpalo::Bump;

struct Request<'a> {
    method: &'a str,
    path: &'a str,
}

fn parse_request<'a>(arena: &'a Bump, raw: &str) -> Request<'a> {
    let parts: Vec<&str> = raw.splitn(3, ' ').collect();
    Request {
        method: arena.alloc_str(parts[0]),
        path: arena.alloc_str(parts[1]),
    }
}

fn main() {
    let requests = [
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    ];

    let mut arena = Bump::new();
    for (i, raw) in requests.iter().enumerate() {
        arena.reset();
        let req = parse_request(&arena, raw);
        println!("Request {i}: {} {} (arena={} bytes)",
                 req.method, req.path, arena.allocated_bytes());
    }
}

Driver Prep: Kernel memory allocation uses slab allocators (kmem_cache), which are essentially pool allocators for fixed-size kernel objects. The concepts here map directly to kmem_cache_create / kmem_cache_alloc / kmem_cache_free in the kernel.

Growing an Arena

The simple arena above has a fixed capacity. A production arena grows by chaining blocks:

+--------+     +--------+     +--------+
| block1 | --> | block2 | --> | block3 |
| 4 KB   |     | 8 KB   |     | 16 KB  |   (double each time)
+--------+     +--------+     +--------+

Reset means: keep block1, free the rest. This gives good amortized performance without wasting memory on small workloads.

Try It: Extend the C arena to support growing. When arena_alloc runs out of space, allocate a new block (double the previous size), link it, and continue. arena_reset should free all blocks except the first.

Quick Knowledge Check

  1. Why can an arena allocator skip tracking individual frees?
  2. How does a pool allocator store its free list without extra metadata?
  3. When is malloc the right choice over an arena or pool?

Common Pitfalls

  • Using arena memory after reset. All pointers are invalidated. Same as use-after-free.
  • Pool block too small. Must be at least sizeof(void*) to hold the free-list pointer.
  • Forgetting alignment. Bumping by size without aligning causes bus errors on strict-alignment architectures (ARM).
  • Arena for long-lived objects. If you can't reset, the arena just grows forever. Use a pool or malloc.
  • Thread safety. Neither allocator above is thread-safe. Add a mutex or use per-thread arenas.

Zero-Copy Techniques and Atomics

Copying data is the enemy of performance. Every memcpy wastes CPU cycles and pollutes the cache. This chapter covers zero-copy I/O on Linux and atomic operations for lock-free data structures -- two techniques that separate fast systems code from everything else.

Zero-Copy I/O: Why Copies Hurt

A naive file-to-socket transfer does four copies:

Traditional copy path:

Disk --> Kernel Buffer --> User Buffer --> Kernel Buffer --> NIC
   copy #1 (DMA)    copy #2 (read)   copy #3 (write)   copy #4 (DMA)
               (+ a context switch on each syscall)

With sendfile, user space is never involved and the CPU makes at most one copy; the rest is DMA:

sendfile path:

Disk --> Kernel Buffer ---------> NIC
           (DMA)       (DMA or single copy)
           No user-space involvement

sendfile

/* sendfile_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }

    int filefd = open(argv[1], O_RDONLY);
    if (filefd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(filefd, &st);

    /* Create a TCP server socket */
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(9000),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 1);
    printf("Listening on :9000, waiting for connection...\n");

    int client = accept(srv, NULL, NULL);
    if (client < 0) { perror("accept"); return 1; }

    /* Zero-copy transfer */
    off_t offset = 0;
    ssize_t sent = sendfile(client, filefd, &offset, st.st_size);
    printf("Sent %zd bytes via sendfile\n", sent);

    close(client);
    close(srv);
    close(filefd);
    return 0;
}
$ gcc -O2 -o sendfile_demo sendfile_demo.c
$ ./sendfile_demo /etc/passwd &
$ nc localhost 9000 | wc -c

splice

splice moves data between two file descriptors via a kernel pipe buffer. No user-space copy.

/* splice_demo.c */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int infd  = open("/etc/hosts", O_RDONLY);
    int outfd = open("/tmp/hosts_copy", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (infd < 0 || outfd < 0) { perror("open"); return 1; }

    int pipefd[2];
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    /* Move data: file -> pipe -> file, no user-space buffer */
    ssize_t total = 0;
    ssize_t n;
    while ((n = splice(infd, NULL, pipefd[1], NULL, 65536,
                       SPLICE_F_MOVE)) > 0) {
        ssize_t w = splice(pipefd[0], NULL, outfd, NULL, n,
                           SPLICE_F_MOVE);
        if (w < 0) { perror("splice out"); return 1; }
        total += w;
    }

    printf("Copied %zd bytes via splice\n", total);

    close(infd); close(outfd);
    close(pipefd[0]); close(pipefd[1]);
    return 0;
}
$ gcc -O2 -o splice_demo splice_demo.c && ./splice_demo
Copied 221 bytes via splice

Parse In-Place: Avoiding memcpy in Protocols

Instead of copying fields out of a packet buffer, point into the buffer directly:

/* parse_inplace.c */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* A simple fixed-format message */
struct __attribute__((packed)) Message {
    uint16_t type;
    uint16_t length;
    uint32_t sequence;
    char     payload[];  /* flexible array member (C99) */
};

void process_buffer(const uint8_t *buf, size_t len) {
    /* Parse in-place: cast, don't copy */
    const struct Message *msg = (const struct Message *)buf;

    printf("Type:     %u\n", ntohs(msg->type));
    printf("Length:   %u\n", ntohs(msg->length));
    printf("Sequence: %u\n", ntohl(msg->sequence));
    printf("Payload:  %.*s\n",
           (int)(len - sizeof(struct Message)),
           msg->payload);
}

int main(void) {
    /* Simulate a received buffer */
    uint8_t buf[64];
    struct Message *msg = (struct Message *)buf;
    msg->type     = htons(1);
    msg->length   = htons(13);
    msg->sequence = htonl(42);
    memcpy(msg->payload, "Hello World!", 13);

    process_buffer(buf, sizeof(struct Message) + 13);
    return 0;
}

Caution: In-place parsing requires careful attention to alignment and endianness. On architectures that require aligned access (ARM, SPARC), casting an unaligned buffer to a struct pointer is undefined behavior.

Rust: Zero-Copy Patterns

// zero_copy_parse.rs
use std::convert::TryInto;

fn parse_u16_be(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes(buf[offset..offset + 2].try_into().unwrap())
}

fn parse_u32_be(buf: &[u8], offset: usize) -> u32 {
    u32::from_be_bytes(buf[offset..offset + 4].try_into().unwrap())
}

fn main() {
    // Simulate a received buffer
    let buf: Vec<u8> = vec![
        0x00, 0x01,             // type = 1
        0x00, 0x0D,             // length = 13
        0x00, 0x00, 0x00, 0x2A, // sequence = 42
        b'H', b'e', b'l', b'l', b'o', b' ',
        b'W', b'o', b'r', b'l', b'd', b'!', 0x00,
    ];

    let msg_type = parse_u16_be(&buf, 0);
    let length   = parse_u16_be(&buf, 2);
    let sequence = parse_u32_be(&buf, 4);
    let payload  = std::str::from_utf8(&buf[8..8 + length as usize - 1])
                       .unwrap();

    println!("Type: {msg_type}, Length: {length}, Seq: {sequence}");
    println!("Payload: {payload}");
}

Rust Note: Rust doesn't allow arbitrary pointer casts to structs. Instead, you parse fields explicitly with from_be_bytes. The zerocopy and bytemuck crates provide safe zero-copy deserialization for types that meet alignment and validity requirements.

Atomic Operations

When threads share data without locks, you need atomics. An atomic operation completes indivisibly -- no other thread can see a half-written value.

C11 Atomics: stdatomic.h

/* atomics_c.c */
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_long counter = 0;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        atomic_fetch_add(&counter, 1);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];

    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, increment, NULL);

    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);

    printf("Counter: %ld (expected 4000000)\n",
           atomic_load(&counter));
    return 0;
}
$ gcc -O2 -o atomics_c atomics_c.c -lpthread && ./atomics_c
Counter: 4000000 (expected 4000000)

Without atomic_, the result would be less than 4000000 due to data races.

Rust: std::sync::atomic

// atomics_rust.rs
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicI64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000_000 {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    println!("Counter: {} (expected 4000000)",
             counter.load(Ordering::Relaxed));
}

Memory Ordering

Atomics alone aren't enough. You must specify how operations are ordered relative to other memory accesses.

Ordering Strength (weakest to strongest):

Relaxed    No ordering guarantees. Only atomicity.
Acquire    Reads after this acquire see writes before the matching release.
Release    Writes before this release are visible after the matching acquire.
AcqRel     Both acquire and release.
SeqCst     Total global order. All threads agree on the sequence.

Acquire-Release Pattern

Thread A (producer):             Thread B (consumer):

data = 42;                      while (!ready.load(Acquire)) { }
ready.store(true, Release);     assert(data == 42);  // guaranteed!
           |                                ^
           +--- Release syncs with Acquire--+

Without proper ordering, Thread B might see ready == true but data == 0 because the CPU reordered the stores.

/* acquire_release.c */
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

static int data = 0;
static atomic_int ready = 0;

void *producer(void *arg) {
    (void)arg;
    data = 42;                                    /* non-atomic write */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin */
    }
    assert(data == 42); /* guaranteed by acquire-release */
    printf("data = %d (correct!)\n", data);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
$ gcc -O2 -o acqrel acquire_release.c -lpthread && ./acqrel
data = 42 (correct!)

Rust Acquire-Release

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// UnsafeCell is not Sync, so Rust won't let us share it between
// threads as-is. We wrap it and assert thread safety ourselves --
// the acquire-release protocol below is what makes this sound.
struct Shared {
    ready: AtomicBool,
    data: UnsafeCell<i32>,
}
unsafe impl Sync for Shared {}

fn main() {
    let shared = Arc::new(Shared {
        ready: AtomicBool::new(false),
        data: UnsafeCell::new(0),
    });

    let s1 = Arc::clone(&shared);
    let producer = thread::spawn(move || {
        // SAFETY: no other thread reads data until ready is published
        unsafe { *s1.data.get() = 42; }
        s1.ready.store(true, Ordering::Release);
    });

    let s2 = Arc::clone(&shared);
    let consumer = thread::spawn(move || {
        while !s2.ready.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // SAFETY: the Acquire load synchronizes with the Release store,
        // so the write to data is guaranteed visible here
        let val = unsafe { *s2.data.get() };
        assert_eq!(val, 42);
        println!("data = {val} (correct!)");
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}

Rust Note: Rust requires unsafe to share non-atomic data between threads without a Mutex. The language forces you to acknowledge the danger explicitly. In practice, prefer Mutex or channels unless you've proven atomics are necessary.

Compare-and-Swap (CAS)

CAS is the fundamental lock-free primitive: "If the value is X, change it to Y. Otherwise, tell me the current value."

/* cas_demo.c */
#include <stdio.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int value = 0;

bool try_set(int expected_old, int new_val) {
    /* Atomically: if value == expected_old, set to new_val */
    return atomic_compare_exchange_strong(&value, &expected_old, new_val);
}

int main(void) {
    atomic_store(&value, 10);

    /* Try to change 10 -> 20 */
    if (try_set(10, 20))
        printf("CAS succeeded: 10 -> 20\n");
    else
        printf("CAS failed\n");

    /* Try to change 10 -> 30 (will fail, value is now 20) */
    if (try_set(10, 30))
        printf("CAS succeeded: 10 -> 30\n");
    else
        printf("CAS failed: value is actually %d\n",
               atomic_load(&value));

    return 0;
}
$ gcc -O2 -o cas_demo cas_demo.c && ./cas_demo
CAS succeeded: 10 -> 20
CAS failed: value is actually 20

Rust CAS

use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let value = AtomicI32::new(10);

    // Try to change 10 -> 20
    match value.compare_exchange(10, 20,
                                 Ordering::SeqCst,
                                 Ordering::SeqCst) {
        Ok(old)  => println!("CAS succeeded: {old} -> 20"),
        Err(cur) => println!("CAS failed: value is {cur}"),
    }

    // Try to change 10 -> 30 (will fail)
    match value.compare_exchange(10, 30,
                                 Ordering::SeqCst,
                                 Ordering::SeqCst) {
        Ok(old)  => println!("CAS succeeded: {old} -> 30"),
        Err(cur) => println!("CAS failed: value is {cur}"),
    }
}

Lock-Free Stack (Sketch)

A lock-free stack using CAS:

/* lockfree_stack.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct {
    _Atomic(Node *) top;
} Stack;

void stack_init(Stack *s) {
    atomic_store(&s->top, NULL);
}

void stack_push(Stack *s, int value) {
    Node *node = malloc(sizeof(Node));
    if (!node) abort();  /* allocation failure */
    node->value = value;

    Node *old_top;
    do {
        old_top = atomic_load(&s->top);
        node->next = old_top;
    } while (!atomic_compare_exchange_weak(&s->top, &old_top, node));
}

int stack_pop(Stack *s, int *out) {
    Node *old_top;
    Node *new_top;
    do {
        old_top = atomic_load(&s->top);
        if (!old_top) return 0; /* empty */
        new_top = old_top->next;
    } while (!atomic_compare_exchange_weak(&s->top, &old_top, new_top));

    *out = old_top->value;
    free(old_top); /* caution: ABA problem in real code */
    return 1;
}

int main(void) {
    Stack s;
    stack_init(&s);

    stack_push(&s, 10);
    stack_push(&s, 20);
    stack_push(&s, 30);

    int val;
    while (stack_pop(&s, &val))
        printf("popped: %d\n", val);

    return 0;
}
$ gcc -O2 -o lockfree lockfree_stack.c && ./lockfree
popped: 30
popped: 20
popped: 10

Caution: This stack has the ABA problem: if thread A pops node X, thread B pops X and Y then pushes X back, thread A's CAS succeeds but the stack is corrupted. Real lock-free structures use tagged pointers or hazard pointers.

When to Use Atomics vs Mutex

+-------------------+----------------------------------+
| Use Atomics       | Use Mutex                        |
+-------------------+----------------------------------+
| Single counter    | Multiple fields updated together |
| Single flag       | Complex invariants               |
| Hot path, simple  | Readability matters              |
| Statistics        | Anything non-trivial             |
+-------------------+----------------------------------+

Rule of thumb: if you can't explain why your lock-free code is correct in one paragraph, use a mutex.

Driver Prep: The Linux kernel uses atomic operations extensively: atomic_t, atomic_inc(), atomic_dec_and_test(). Memory barriers (smp_mb(), smp_wmb(), smp_rmb()) map to the orderings above. DMA descriptor rings are often lock-free structures using these primitives.

Try It: Modify atomics_c.c to use memory_order_relaxed instead of the default memory_order_seq_cst. Does it still produce the correct count? Why?

Quick Knowledge Check

  1. How many copies does sendfile eliminate compared to read+write?
  2. What does memory_order_acquire guarantee that memory_order_relaxed does not?
  3. What is the ABA problem in lock-free programming?

Common Pitfalls

  • Using memcpy where a pointer would do. Profile first, but default to referencing data in-place.
  • Relaxed ordering everywhere. It works for counters but breaks for publish/subscribe patterns.
  • Forgetting volatile doesn't mean atomic. volatile only stops the compiler from caching or eliding accesses; it prevents neither CPU reordering nor torn reads and writes. It's not a substitute for atomics.
  • Lock-free code without formal reasoning. Lock-free is harder to get right than locks. Only use it when profiling proves the lock is the bottleneck.
  • sendfile on non-regular files. It doesn't work with pipes or sockets as the source -- use splice for those.
  • CAS loops without backoff. Under high contention, spinning CAS wastes CPU. Add exponential backoff or yield.

The /proc and /sys Filesystems

Linux exposes the kernel's internal state as files. Want to know how much memory a process uses? Read a file. Want to check CPU topology? Read a file. Want to toggle a GPIO pin? Write to a file. This chapter explores /proc and /sys -- the "everything is a file" philosophy at its most powerful.

/proc: Process and Kernel Information

/proc is a virtual filesystem. Nothing is stored on disk. Every read generates the content on the fly from kernel data structures.

/proc/
  |- 1/                  <-- init process
  |   |- status          <-- process state
  |   |- maps            <-- memory mappings
  |   |- fd/             <-- open file descriptors
  |   +- cmdline         <-- command line
  |- self/               <-- symlink to current process
  |- cpuinfo             <-- CPU details
  |- meminfo             <-- memory statistics
  |- uptime              <-- seconds since boot
  +- loadavg             <-- load averages

Reading /proc/self/maps

Every process can inspect its own memory layout.

/* proc_maps.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int global_var = 42;

int main(void) {
    int stack_var = 99;
    int *heap_var = malloc(sizeof(int));
    *heap_var = 7;

    printf("Addresses:\n");
    printf("  main()      = %p  (text)\n",  (void *)main);
    printf("  global_var  = %p  (data)\n",  (void *)&global_var);
    printf("  heap_var    = %p  (heap)\n",  (void *)heap_var);
    printf("  stack_var   = %p  (stack)\n", (void *)&stack_var);

    printf("\n--- /proc/self/maps ---\n");

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        /* Show only interesting segments */
        if (strstr(line, "proc_maps") ||   /* our binary */
            strstr(line, "[heap]")    ||
            strstr(line, "[stack]")   ||
            strstr(line, "[vdso]")) {
            printf("  %s", line);
        }
    }

    fclose(f);
    free(heap_var);
    return 0;
}
$ gcc -O0 -o proc_maps proc_maps.c && ./proc_maps
Addresses:
  main()      = 0x5599a3b00189  (text)
  global_var  = 0x5599a3b03010  (data)
  heap_var    = 0x5599a4e482a0  (heap)
  stack_var   = 0x7ffc1a2b3c44  (stack)

--- /proc/self/maps ---
  5599a3b00000-5599a3b01000 r-xp ... proc_maps
  5599a4e48000-5599a4e69000 rw-p ... [heap]
  7ffc1a294000-7ffc1a2b5000 rw-p ... [stack]
  7ffc1a2fd000-7ffc1a301000 r-xp ... [vdso]

The maps file shows: address range, permissions (r/w/x/p), offset, device, inode, and pathname.

Try It: Run the program and find which region contains each address. Can you identify the text, data, heap, and stack regions in the maps output?

Reading /proc/[pid]/status

/* proc_status.c */
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/status", getpid());

    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "Name:", 5) == 0 ||
            strncmp(line, "Pid:", 4) == 0 ||
            strncmp(line, "PPid:", 5) == 0 ||
            strncmp(line, "VmSize:", 7) == 0 ||
            strncmp(line, "VmRSS:", 6) == 0 ||
            strncmp(line, "Threads:", 8) == 0) {
            printf("%s", line);
        }
    }

    fclose(f);
    return 0;
}
$ gcc -o proc_status proc_status.c && ./proc_status
Name:   proc_status
Pid:    12345
PPid:   11000
VmSize:     2104 kB
VmRSS:       768 kB
Threads:        1

Key fields:

  • VmSize: Total virtual memory.
  • VmRSS: Resident Set Size -- how much physical memory is actually used.
  • Threads: Number of threads in the process.

Reading /proc/cpuinfo and /proc/meminfo

/* sysinfo.c */
#include <stdio.h>
#include <string.h>

static void print_matching_lines(const char *path, const char *prefix) {
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return; }

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, prefix, strlen(prefix)) == 0)
            printf("%s", line);
    }
    fclose(f);
}

int main(void) {
    printf("=== CPU ===\n");
    print_matching_lines("/proc/cpuinfo", "model name");
    print_matching_lines("/proc/cpuinfo", "cpu cores");

    printf("\n=== Memory ===\n");
    print_matching_lines("/proc/meminfo", "MemTotal");
    print_matching_lines("/proc/meminfo", "MemFree");
    print_matching_lines("/proc/meminfo", "MemAvailable");
    print_matching_lines("/proc/meminfo", "SwapTotal");

    return 0;
}
$ gcc -o sysinfo sysinfo.c && ./sysinfo
=== CPU ===
model name      : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
cpu cores       : 8
=== Memory ===
MemTotal:       16384000 kB
MemFree:         4200000 kB
MemAvailable:   12000000 kB
SwapTotal:       8192000 kB

Rust: Reading /proc

// proc_reader.rs
use std::fs;
use std::io::{self, BufRead};

fn read_proc_field(path: &str, prefix: &str) -> io::Result<Vec<String>> {
    let file = fs::File::open(path)?;
    let reader = io::BufReader::new(file);

    let matches: Vec<String> = reader
        .lines()
        .filter_map(|line| {
            let line = line.ok()?;
            if line.starts_with(prefix) {
                Some(line)
            } else {
                None
            }
        })
        .collect();

    Ok(matches)
}

fn main() -> io::Result<()> {
    // Read our own memory maps
    let pid = std::process::id();
    let maps = fs::read_to_string(format!("/proc/{pid}/maps"))?;
    println!("=== Memory Maps (first 5 lines) ===");
    for line in maps.lines().take(5) {
        println!("  {line}");
    }

    // Read system info
    println!("\n=== CPU ===");
    for line in read_proc_field("/proc/cpuinfo", "model name")?.iter().take(1) {
        println!("  {line}");
    }

    println!("\n=== Memory ===");
    for line in &read_proc_field("/proc/meminfo", "MemTotal")? {
        println!("  {line}");
    }
    for line in &read_proc_field("/proc/meminfo", "MemAvailable")? {
        println!("  {line}");
    }

    Ok(())
}

/sys: The Device Model

/sys exposes the kernel's device model. It's organized by bus, class, and device.

/sys/
  |- class/
  |   |- net/             <-- network interfaces
  |   |   |- eth0/
  |   |   +- lo/
  |   |- block/           <-- block devices
  |   +- tty/             <-- terminals
  |- bus/
  |   |- pci/
  |   |- usb/
  |   +- platform/
  |- devices/             <-- device hierarchy
  +- kernel/              <-- kernel parameters

Reading Network Interface Info via sysfs

/* sysfs_net.c */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

static int read_sysfs_str(const char *path, char *buf, size_t len) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    if (!fgets(buf, len, f)) {
        fclose(f);
        return -1;
    }
    /* Remove trailing newline */
    buf[strcspn(buf, "\n")] = '\0';
    fclose(f);
    return 0;
}

int main(void) {
    DIR *d = opendir("/sys/class/net");
    if (!d) { perror("opendir"); return 1; }

    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;

        char path[256], buf[64];
        printf("Interface: %s\n", entry->d_name);

        /* Read MTU */
        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/mtu", entry->d_name);
        if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
            printf("  MTU:       %s\n", buf);

        /* Read operstate (up/down) */
        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/operstate", entry->d_name);
        if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
            printf("  State:     %s\n", buf);

        /* Read MAC address */
        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/address", entry->d_name);
        if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
            printf("  MAC:       %s\n", buf);

        /* Read speed (may fail for loopback) */
        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/speed", entry->d_name);
        if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
            printf("  Speed:     %s Mbps\n", buf);

        printf("\n");
    }

    closedir(d);
    return 0;
}
$ gcc -o sysfs_net sysfs_net.c && ./sysfs_net
Interface: eth0
  MTU:       1500
  State:     up
  MAC:       00:11:22:33:44:55
  Speed:     1000 Mbps

Interface: lo
  MTU:       65536
  State:     unknown
  MAC:       00:00:00:00:00:00

Writing to sysfs

Some sysfs attributes are writable. This is how you configure hardware from user space.

/* sysfs_write.c — set network interface MTU */
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <interface> <mtu>\n", argv[0]);
        return 1;
    }

    char path[256];
    snprintf(path, sizeof(path),
             "/sys/class/net/%s/mtu", argv[1]);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror("fopen (need root?)");
        return 1;
    }

    fprintf(f, "%s\n", argv[2]);
    fclose(f);

    printf("Set %s MTU to %s\n", argv[1], argv[2]);
    return 0;
}
$ gcc -o sysfs_write sysfs_write.c
$ sudo ./sysfs_write eth0 9000
Set eth0 MTU to 9000

Caution: Writing to /sys files can change hardware behavior. Setting a wrong MTU, disabling a device, or modifying power settings can cause system instability. Always check what an attribute does before writing.

GPIO via sysfs (Legacy Interface)

The classic sysfs GPIO interface demonstrates read/write device control. Note that modern Linux prefers the libgpiod character device interface, but sysfs remains common in embedded systems.

/* gpio_sysfs.c — toggle a GPIO pin (legacy interface) */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

static int write_file(const char *path, const char *value) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) { perror(path); return -1; }
    if (write(fd, value, strlen(value)) < 0)
        perror("write");
    close(fd);
    return 0;
}

int main(void) {
    int gpio_num = 17;  /* Example: Raspberry Pi GPIO 17 */

    char buf[64];

    /* Export the GPIO */
    snprintf(buf, sizeof(buf), "%d", gpio_num);
    write_file("/sys/class/gpio/export", buf);

    /* Set direction to output */
    snprintf(buf, sizeof(buf), "/sys/class/gpio/gpio%d/direction", gpio_num);
    write_file(buf, "out");

    /* Toggle the pin */
    snprintf(buf, sizeof(buf), "/sys/class/gpio/gpio%d/value", gpio_num);
    for (int i = 0; i < 10; i++) {
        write_file(buf, (i & 1) ? "1" : "0");
        usleep(500000);  /* 500 ms */
    }

    /* Unexport */
    snprintf(buf, sizeof(buf), "%d", gpio_num);
    write_file("/sys/class/gpio/unexport", buf);

    printf("Done toggling GPIO %d\n", gpio_num);
    return 0;
}

Driver Prep: When you write a kernel driver, you create the sysfs attributes that user-space programs read and write. The device_attribute structure and sysfs_create_file() are the kernel-side API. Everything you're reading here was created by a driver.

Rust: Reading sysfs

// sysfs_reader.rs
use std::fs;
use std::path::Path;

fn read_sysfs(path: &str) -> Option<String> {
    fs::read_to_string(path)
        .ok()
        .map(|s| s.trim().to_string())
}

fn main() {
    let net_dir = Path::new("/sys/class/net");

    let entries = fs::read_dir(net_dir).expect("cannot read /sys/class/net");

    for entry in entries {
        let entry = entry.unwrap();
        let name = entry.file_name();
        let name = name.to_str().unwrap();

        println!("Interface: {name}");

        let base = format!("/sys/class/net/{name}");

        if let Some(mtu) = read_sysfs(&format!("{base}/mtu")) {
            println!("  MTU:   {mtu}");
        }
        if let Some(state) = read_sysfs(&format!("{base}/operstate")) {
            println!("  State: {state}");
        }
        if let Some(mac) = read_sysfs(&format!("{base}/address")) {
            println!("  MAC:   {mac}");
        }

        println!();
    }
}

Udev Rules

Udev runs in user space and reacts to kernel device events. Rules in /etc/udev/rules.d/ can:

  • Set device permissions.
  • Create stable symlinks (/dev/mydevice).
  • Run scripts when devices appear.

Example rule (/etc/udev/rules.d/99-usb-serial.rules):

# When a USB serial adapter is plugged in, create /dev/myserial
SUBSYSTEM=="tty", ATTRS{idVendor}=="1a86", ATTRS{idProduct}=="7523", \
    SYMLINK+="myserial", MODE="0666"

Test rules without reboot:

$ sudo udevadm trigger
$ sudo udevadm test /sys/class/tty/ttyUSB0

"Everything Is a File" in Practice

The /proc and /sys filesystems are the ultimate expression of Unix's "everything is a file" design:

+-------------------+-----------------------------------+------------+
| What              | File path                         | Operation  |
+-------------------+-----------------------------------+------------+
| Process memory    | /proc/[pid]/maps                  | read       |
| Kernel version    | /proc/version                     | read       |
| System uptime     | /proc/uptime                      | read       |
| Network MTU       | /sys/class/net/eth0/mtu           | read/write |
| CPU frequency     | /sys/devices/.../scaling_cur_freq | read/write |
| Disk scheduler    | /sys/block/sda/queue/scheduler    | read/write |
| LED brightness    | /sys/class/leds/.../brightness    | read/write |
+-------------------+-----------------------------------+------------+

This means shell scripts, Python, C, Rust -- any language that can read files can control hardware.

Try It: Write a C program that reads /proc/uptime and prints how long the system has been running in hours, minutes, and seconds.

Quick Knowledge Check

  1. What is the difference between /proc and /sys?
  2. Why is VmRSS more useful than VmSize for understanding memory usage?
  3. How does a udev rule differ from directly writing to /sys?

Common Pitfalls

  • Parsing /proc with fixed offsets. Fields can change between kernel versions. Always search for the label.
  • Caching /proc data. It's generated on read. Old data is immediately stale.
  • Writing to /sys without root. Most writable attributes require CAP_SYS_ADMIN or root.
  • Assuming sysfs paths are stable. Hardware topology can change. Use udev rules for stable names.
  • Blocking on /proc reads. Some /proc files (like /proc/kmsg) block. Use non-blocking I/O or poll.
  • String parsing errors. /proc values often have trailing newlines or varying whitespace. Always trim() / strcspn().

ioctl and Device Interaction

When read, write, and lseek aren't enough, there's ioctl. It's the Swiss Army knife of device control -- a single syscall that can do anything the driver author wanted. This chapter explains how ioctl works, how request numbers are encoded, and how to use it to talk to devices from user space.

What ioctl Is

int ioctl(int fd, unsigned long request, ...);

ioctl sends a command to a device driver through an open file descriptor. The request number encodes what to do, and the optional third argument carries data (usually a pointer to a struct).

User space                     Kernel space
+----------+                   +------------------+
| program  |  ioctl(fd, cmd)   | driver           |
|          | ----------------> | .unlocked_ioctl  |
|          | <---------------- | return result    |
+----------+                   +------------------+

ioctl exists because devices have operations that don't fit the read/write model: setting baud rates, querying screen dimensions, ejecting disks, configuring network interfaces.

ioctl Request Number Encoding

On Linux, ioctl numbers are 32-bit values with structure:

Bits:  31..30   29..16    15..8      7..0
       +------+---------+----------+--------+
       | dir  |  size   |   type   | number |
       +------+---------+----------+--------+

dir:    _IOC_NONE (0), _IOC_WRITE (1), _IOC_READ (2), _IOC_READ|_IOC_WRITE (3)
size:   Size of the data argument (14 bits)
type:   Magic number identifying the driver (8 bits)
number: Command number within the driver (8 bits)

The kernel provides macros to build these:

#include <linux/ioctl.h>  /* or <sys/ioctl.h> */

_IO(type, number)              /* No data transfer */
_IOR(type, number, datatype)   /* Read from driver to user */
_IOW(type, number, datatype)   /* Write from user to driver */
_IOWR(type, number, datatype)  /* Both directions */

Example: _IOR('T', 1, struct winsize) means "read from driver, magic type 'T', command 1, data is a struct winsize."

Getting Terminal Size: TIOCGWINSZ

The most common ioctl in everyday programming.

/* term_size.c */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    struct winsize ws;

    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) < 0) {
        perror("ioctl TIOCGWINSZ");
        return 1;
    }

    printf("Terminal size:\n");
    printf("  Rows:    %d\n", ws.ws_row);
    printf("  Columns: %d\n", ws.ws_col);
    printf("  X pixels: %d\n", ws.ws_xpixel);
    printf("  Y pixels: %d\n", ws.ws_ypixel);

    return 0;
}
$ gcc -o term_size term_size.c && ./term_size
Terminal size:
  Rows:    40
  Columns: 120
  X pixels: 0
  Y pixels: 0

The name decodes as: Terminal I/O Control, Get WINdow SiZe.

Setting Terminal Attributes

/* term_raw.c — put terminal in raw mode, then restore */
#include <stdio.h>
#include <unistd.h>
#include <termios.h>

int main(void) {
    struct termios orig, raw;

    /* Save original settings */
    if (tcgetattr(STDIN_FILENO, &orig) < 0) {
        perror("tcgetattr");
        return 1;
    }

    raw = orig;

    /* Disable canonical mode and echo */
    raw.c_lflag &= ~(ICANON | ECHO);
    raw.c_cc[VMIN]  = 1;  /* read at least 1 byte */
    raw.c_cc[VTIME] = 0;  /* no timeout */

    if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) < 0) {
        perror("tcsetattr");
        return 1;
    }

    printf("Raw mode. Press 'q' to quit. Typed characters show as hex.\r\n");

    char c;
    while (read(STDIN_FILENO, &c, 1) == 1) {
        if (c == 'q') break;
        printf("0x%02x\r\n", (unsigned char)c);
    }

    /* Restore original settings */
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig);
    printf("\nRestored normal mode.\n");

    return 0;
}
$ gcc -o term_raw term_raw.c && ./term_raw
Raw mode. Press 'q' to quit. Typed characters show as hex.

Under the hood, tcgetattr and tcsetattr call ioctl with TCGETS and TCSETS requests.

Try It: Run term_raw and press arrow keys. You'll see escape sequences (0x1b 0x5b 0x41 for Up). This is how terminal applications detect special keys.

Block Device ioctls

/* blk_size.c — get block device size */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <stdint.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <block_device>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Get size in bytes */
    uint64_t size_bytes;
    if (ioctl(fd, BLKGETSIZE64, &size_bytes) < 0) {
        perror("ioctl BLKGETSIZE64");
        close(fd);
        return 1;
    }

    /* Get sector size */
    int sector_size;
    if (ioctl(fd, BLKSSZGET, &sector_size) < 0) {
        perror("ioctl BLKSSZGET");
        close(fd);
        return 1;
    }

    /* Get read-only status */
    int readonly;
    if (ioctl(fd, BLKROGET, &readonly) < 0) {
        perror("ioctl BLKROGET");
        close(fd);
        return 1;
    }

    printf("Device:      %s\n", argv[1]);
    printf("Size:        %llu bytes (%.2f GB)\n",
           (unsigned long long)size_bytes, size_bytes / 1e9);
    printf("Sector size: %d bytes\n", sector_size);
    printf("Read-only:   %s\n", readonly ? "yes" : "no");

    close(fd);
    return 0;
}
$ gcc -o blk_size blk_size.c
$ sudo ./blk_size /dev/sda
Device:      /dev/sda
Size:        500107862016 bytes (500.11 GB)
Sector size: 512 bytes
Read-only:   no

Defining Custom ioctl Numbers

When writing a user-space program that talks to a custom driver, you define matching ioctl numbers.

/* custom_ioctl.h — shared between driver and user-space */
#ifndef CUSTOM_IOCTL_H
#define CUSTOM_IOCTL_H

#include <linux/ioctl.h>

#define MYDEV_MAGIC 'M'

struct mydev_config {
    int  speed;
    int  mode;
    char name[32];
};

/* Commands */
#define MYDEV_GET_CONFIG  _IOR(MYDEV_MAGIC, 0, struct mydev_config)
#define MYDEV_SET_CONFIG  _IOW(MYDEV_MAGIC, 1, struct mydev_config)
#define MYDEV_RESET       _IO(MYDEV_MAGIC, 2)
#define MYDEV_TRANSFER    _IOWR(MYDEV_MAGIC, 3, struct mydev_config)

#endif /* CUSTOM_IOCTL_H */

User-space program using the custom ioctls:

/* user_ioctl.c — user-space side of custom device control */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <string.h>

/* In real code, include custom_ioctl.h */
#include <linux/ioctl.h>

#define MYDEV_MAGIC 'M'

struct mydev_config {
    int  speed;
    int  mode;
    char name[32];
};

#define MYDEV_GET_CONFIG  _IOR(MYDEV_MAGIC, 0, struct mydev_config)
#define MYDEV_SET_CONFIG  _IOW(MYDEV_MAGIC, 1, struct mydev_config)
#define MYDEV_RESET       _IO(MYDEV_MAGIC, 2)

int main(void) {
    int fd = open("/dev/mydevice", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mydevice");
        printf("(This demo needs a matching kernel module loaded.)\n");
        return 1;
    }

    /* Read current config */
    struct mydev_config cfg;
    if (ioctl(fd, MYDEV_GET_CONFIG, &cfg) < 0) {
        perror("ioctl GET_CONFIG");
        close(fd);
        return 1;
    }
    printf("Current: speed=%d mode=%d name=%s\n",
           cfg.speed, cfg.mode, cfg.name);

    /* Modify and write back */
    cfg.speed = 115200;
    cfg.mode  = 3;
    strncpy(cfg.name, "fast_mode", sizeof(cfg.name) - 1);
    cfg.name[sizeof(cfg.name) - 1] = '\0';  /* strncpy may not terminate */

    if (ioctl(fd, MYDEV_SET_CONFIG, &cfg) < 0) {
        perror("ioctl SET_CONFIG");
        close(fd);
        return 1;
    }
    printf("Config updated.\n");

    close(fd);
    return 0;
}

Driver Prep: On the kernel side, the driver's file_operations.unlocked_ioctl function receives these commands. It uses copy_from_user() and copy_to_user() to safely transfer the struct between user and kernel space. This is exactly how real hardware drivers are controlled.

Decoding ioctl Numbers

You can decode any ioctl number:

/* decode_ioctl.c */
#include <stdio.h>
#include <sys/ioctl.h>

int main(void) {
    unsigned long cmd = TIOCGWINSZ;

    printf("TIOCGWINSZ = 0x%lx\n", cmd);
    printf("  Direction: %lu\n", (cmd >> 30) & 3);
    printf("  Size:      %lu bytes\n", (cmd >> 16) & 0x3FFF);
    printf("  Type:      '%c' (0x%02lx)\n",
           (char)((cmd >> 8) & 0xFF), (cmd >> 8) & 0xFF);
    printf("  Number:    %lu\n", cmd & 0xFF);

    return 0;
}
$ gcc -o decode_ioctl decode_ioctl.c && ./decode_ioctl
TIOCGWINSZ = 0x5413
  Direction: 0
  Size:      0 bytes
  Type:      'T' (0x54)
  Number:    19

Note: TIOCGWINSZ is an older ioctl that predates the modern encoding scheme, so the direction and size fields may be zero.

The ioctl vs sysfs Debate

+------------------+----------------------------+---------------------------+
| Aspect           | ioctl                      | sysfs                     |
+------------------+----------------------------+---------------------------+
| Interface        | Binary struct              | Text string               |
| Discovery        | Need header file           | ls /sys/...               |
| Scripting        | Requires C/compiled code   | cat/echo from shell       |
| Performance      | One syscall                | open+read/write+close     |
| Complex data     | Handles structs natively   | Must serialize to text    |
| Debugging        | Opaque without docs        | Self-documenting filenames|
+------------------+----------------------------+---------------------------+

Modern practice: use sysfs for simple attributes (enable/disable, speed, status), use ioctl for complex operations (DMA transfers, firmware upload, bulk configuration).

Rust: ioctl with the nix Crate

The nix crate provides type-safe ioctl wrappers.

// Cargo.toml: nix = { version = "0.27", features = ["ioctl", "term"] }
use nix::libc;
use nix::sys::termios;
use std::os::unix::io::AsRawFd;
use std::io;

// Terminal size ioctl
nix::ioctl_read_bad!(tiocgwinsz, libc::TIOCGWINSZ, libc::winsize);

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let fd = stdout.as_raw_fd();

    // Get terminal size
    let mut ws = libc::winsize {
        ws_row: 0,
        ws_col: 0,
        ws_xpixel: 0,
        ws_ypixel: 0,
    };

    unsafe {
        tiocgwinsz(fd, &mut ws).expect("TIOCGWINSZ failed");
    }

    println!("Terminal: {} rows x {} cols", ws.ws_row, ws.ws_col);

    // Get terminal attributes using nix's safe wrapper
    let attrs = termios::tcgetattr(&stdout).expect("tcgetattr failed");
    println!("Input flags:  {:?}", attrs.input_flags);
    println!("Output flags: {:?}", attrs.output_flags);
    println!("Local flags:  {:?}", attrs.local_flags);

    Ok(())
}

Defining Custom ioctls in Rust

use nix::libc;

// Define the same custom ioctls from the C example
const MYDEV_MAGIC: u8 = b'M';

#[repr(C)]
struct MydevConfig {
    speed: i32,
    mode: i32,
    name: [u8; 32],
}

nix::ioctl_read!(mydev_get_config, MYDEV_MAGIC, 0, MydevConfig);
nix::ioctl_write_ptr!(mydev_set_config, MYDEV_MAGIC, 1, MydevConfig);
nix::ioctl_none!(mydev_reset, MYDEV_MAGIC, 2);

fn main() {
    // These would be called as:
    // unsafe { mydev_get_config(fd, &mut cfg) }
    // unsafe { mydev_set_config(fd, &cfg) }
    // unsafe { mydev_reset(fd) }
    println!("Custom ioctl macros defined successfully.");
    println!("(Need /dev/mydevice to actually use them.)");
}

Rust Note: The nix crate's ioctl macros generate unsafe functions because ioctls inherently bypass Rust's type system -- you're passing raw memory to a kernel driver. The unsafe block explicitly marks this trust boundary.

Practical Example: Watchdog Timer

The Linux watchdog (/dev/watchdog) is controlled entirely via ioctls.

/* watchdog_info.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void) {
    int fd = open("/dev/watchdog", O_RDWR);
    if (fd < 0) {
        perror("open /dev/watchdog (need root or watchdog group)");
        return 1;
    }

    /* Get watchdog info */
    struct watchdog_info info;
    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == 0) {
        printf("Watchdog: %s\n", info.identity);
        printf("Firmware: %u\n", info.firmware_version);
        printf("Options:  0x%08x\n", info.options);
    }

    /* Get timeout */
    int timeout;
    if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == 0)
        printf("Timeout:  %d seconds\n", timeout);

    /* Magic close: write 'V' before closing to prevent reboot */
    write(fd, "V", 1);
    close(fd);

    return 0;
}

Caution: Opening /dev/watchdog starts the watchdog timer. If you don't periodically write to it (or close with the magic 'V' character), the system will reboot. Do not run this on a production system without understanding the consequences.

Try It: Use strace to trace the ioctls of a familiar command: strace -e ioctl ls -l /dev/tty. How many different ioctl requests do you see?

Quick Knowledge Check

  1. What do the four fields in an ioctl number encode?
  2. What is the difference between _IOR and _IOW?
  3. Why does the nix crate mark ioctl functions as unsafe?

Common Pitfalls

  • Wrong ioctl number. A mismatched magic type or command number returns ENOTTY ("inappropriate ioctl for device"). Despite the confusing name, this is the standard error.
  • Wrong data size. If the struct in _IOR doesn't match the kernel's expected size, the ioctl fails or corrupts memory.
  • Missing O_RDWR. Some ioctls require the fd to be opened read-write, even if the ioctl only reads data.
  • Forgetting copy_from_user on the kernel side. Accessing user pointers directly from kernel code is a security vulnerability (and crashes on SMAP-enabled CPUs).
  • Platform differences. ioctl numbers can differ between architectures (32-bit vs 64-bit). Always use the macros, never hardcode numbers.
  • ioctl on the wrong fd. TIOCGWINSZ works on a terminal fd, not a regular file fd. Check what your fd actually points to.

Netlink Sockets

Netlink is Linux's primary mechanism for communication between the kernel and user-space processes. Unlike ioctl, netlink uses a proper socket interface with structured messages, multicast groups, and asynchronous notifications. This chapter shows how to read the routing table, monitor network events, and build a simple network monitor.

Netlink is an AF_NETLINK socket family. Instead of connecting to a remote host, you connect to the kernel.

User space                        Kernel
+------------------+              +-------------------+
| netlink socket   | <----------> | netlink subsystem |
| AF_NETLINK       |   messages   | (routing, link,   |
| SOCK_DGRAM       |              |  firewall, ...)   |
+------------------+              +-------------------+

Key properties:

  • Message-based (like UDP, not like TCP streams).
  • Supports multicast -- subscribe to kernel event groups.
  • Bidirectional -- query state or receive notifications.
  • Replaces many ioctl-based network configuration interfaces.

Every netlink message starts with struct nlmsghdr:

struct nlmsghdr {
    __u32 nlmsg_len;    /* Total message length (including header) */
    __u16 nlmsg_type;   /* Message type */
    __u16 nlmsg_flags;  /* Flags: NLM_F_REQUEST, NLM_F_DUMP, etc. */
    __u32 nlmsg_seq;    /* Sequence number (for matching replies) */
    __u32 nlmsg_pid;    /* Sending process PID */
};
Message layout:

+------------------+-------------------+------------------+
| nlmsghdr         | payload           | padding          |
| (16 bytes)       | (variable)        | (to 4-byte align)|
+------------------+-------------------+------------------+
|<-------- nlmsg_len -------->|

For route messages, the payload is struct rtmsg followed by route attributes. For link messages, it's struct ifinfomsg followed by link attributes.

Protocol Families

+------------------------+--------------------------------------+
| Protocol               | Purpose                              |
+------------------------+--------------------------------------+
| NETLINK_ROUTE          | Routing, addresses, links, neighbors |
| NETLINK_GENERIC        | Generic netlink (extensible)         |
| NETLINK_NETFILTER      | Firewall (nftables, conntrack)       |
| NETLINK_KOBJECT_UEVENT | Device hotplug events                |
| NETLINK_AUDIT          | Kernel audit subsystem               |
+------------------------+--------------------------------------+

NETLINK_ROUTE is by far the most common.

Reading the Routing Table

/* netlink_routes.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <arpa/inet.h>

#define BUFSIZE 8192

struct nl_request {
    struct nlmsghdr hdr;
    struct rtmsg    msg;
};

static void parse_route(struct nlmsghdr *nlh) {
    struct rtmsg *rtm = NLMSG_DATA(nlh);

    /* Only show main table IPv4 routes */
    if (rtm->rtm_family != AF_INET)
        return;
    if (rtm->rtm_table != RT_TABLE_MAIN)
        return;

    char dst[INET_ADDRSTRLEN] = "0.0.0.0";
    char gw[INET_ADDRSTRLEN]  = "*";
    int  oif = 0;

    struct rtattr *rta = RTM_RTA(rtm);
    int rta_len = RTM_PAYLOAD(nlh);

    while (RTA_OK(rta, rta_len)) {
        switch (rta->rta_type) {
        case RTA_DST:
            inet_ntop(AF_INET, RTA_DATA(rta), dst, sizeof(dst));
            break;
        case RTA_GATEWAY:
            inet_ntop(AF_INET, RTA_DATA(rta), gw, sizeof(gw));
            break;
        case RTA_OIF:
            oif = *(int *)RTA_DATA(rta);
            break;
        }
        rta = RTA_NEXT(rta, rta_len);
    }

    printf("  %-18s via %-15s dev index %d  /%d\n",
           dst, gw, oif, rtm->rtm_dst_len);
}

int main(void) {
    int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
    if (sock < 0) { perror("socket"); return 1; }

    /* Bind to netlink */
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_pid    = getpid(),
    };
    if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("bind");
        close(sock);
        return 1;
    }

    /* Request a dump of the routing table */
    struct nl_request req = {
        .hdr = {
            .nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg)),
            .nlmsg_type  = RTM_GETROUTE,
            .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
            .nlmsg_seq   = 1,
            .nlmsg_pid   = getpid(),
        },
        .msg = {
            .rtm_family = AF_INET,
            .rtm_table  = RT_TABLE_MAIN,
        },
    };

    if (send(sock, &req, req.hdr.nlmsg_len, 0) < 0) {
        perror("send");
        close(sock);
        return 1;
    }

    /* Read the response */
    printf("IPv4 Routing Table:\n");
    printf("  %-18s %-17s %-15s %s\n",
           "Destination", "Gateway", "Dev Index", "Prefix");

    char buf[BUFSIZE];
    int done = 0;
    while (!done) {
        ssize_t len = recv(sock, buf, sizeof(buf), 0);
        if (len < 0) { perror("recv"); break; }

        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        while (NLMSG_OK(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE) {
                done = 1;
                break;
            }
            if (nlh->nlmsg_type == NLMSG_ERROR) {
                fprintf(stderr, "Netlink error\n");
                done = 1;
                break;
            }
            parse_route(nlh);
            nlh = NLMSG_NEXT(nlh, len);
        }
    }

    close(sock);
    return 0;
}
$ gcc -O2 -o netlink_routes netlink_routes.c && ./netlink_routes
IPv4 Routing Table:
  Destination        Gateway           Dev Index       Prefix
  0.0.0.0            via 192.168.1.1     dev index 2  /0
  192.168.1.0        via *               dev index 2  /24
  172.17.0.0         via *               dev index 3  /16

Try It: Modify the program to also show IPv6 routes. Change rtm_family to AF_INET6 and use inet_ntop(AF_INET6, ...) with INET6_ADDRSTRLEN.

Monitoring Network Events

Netlink supports multicast. Subscribe to groups to receive real-time notifications when links go up/down or addresses change.

/* netlink_monitor.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>

#define BUFSIZE 8192

static const char *msg_type_str(int type) {
    switch (type) {
    case RTM_NEWLINK: return "NEW_LINK";
    case RTM_DELLINK: return "DEL_LINK";
    case RTM_NEWADDR: return "NEW_ADDR";
    case RTM_DELADDR: return "DEL_ADDR";
    case RTM_NEWROUTE: return "NEW_ROUTE";
    case RTM_DELROUTE: return "DEL_ROUTE";
    default: return "UNKNOWN";
    }
}

int main(void) {
    int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_pid    = getpid(),
        .nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE,
    };

    if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("bind");
        close(sock);
        return 1;
    }

    printf("Monitoring network events (Ctrl+C to stop)...\n");

    char buf[BUFSIZE];
    while (1) {
        ssize_t len = recv(sock, buf, sizeof(buf), 0);
        if (len < 0) { perror("recv"); break; }

        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        while (NLMSG_OK(nlh, len)) {
            printf("[%s] ", msg_type_str(nlh->nlmsg_type));

            if (nlh->nlmsg_type == RTM_NEWLINK ||
                nlh->nlmsg_type == RTM_DELLINK) {
                struct ifinfomsg *ifi = NLMSG_DATA(nlh);
                char ifname[IF_NAMESIZE];
                if (!if_indextoname(ifi->ifi_index, ifname))
                    snprintf(ifname, sizeof(ifname), "?");
                printf("Interface: %s (index %d), flags=0x%x %s\n",
                       ifname, ifi->ifi_index, ifi->ifi_flags,
                       (ifi->ifi_flags & IFF_UP) ? "UP" : "DOWN");
            } else {
                printf("type=%d len=%d\n",
                       nlh->nlmsg_type, nlh->nlmsg_len);
            }

            nlh = NLMSG_NEXT(nlh, len);
        }
    }

    close(sock);
    return 0;
}
$ gcc -O2 -o netlink_monitor netlink_monitor.c && sudo ./netlink_monitor
Monitoring network events (Ctrl+C to stop)...
[NEW_LINK] Interface: eth0 (index 2), flags=0x1003 UP
[DEL_LINK] Interface: eth0 (index 2), flags=0x1002 DOWN

In another terminal, toggle an interface:

$ sudo ip link set eth0 down
$ sudo ip link set eth0 up
+------------------+----------------------------+---------------------------+
| Feature          | Netlink                    | ioctl                     |
+------------------+----------------------------+---------------------------+
| Async events     | Yes (multicast groups)     | No (must poll)            |
| Bulk queries     | Yes (NLM_F_DUMP)           | One item at a time        |
| Extensibility    | Attributes (TLV format)    | Fixed struct size         |
| Atomicity        | Can batch operations       | One operation per call    |
| Modern tools     | ip, iw use netlink         | ifconfig uses ioctl       |
| Complexity       | Higher (message parsing)   | Simpler (struct + call)   |
+------------------+----------------------------+---------------------------+

The ip command uses netlink. The old ifconfig command uses ioctl. Netlink is the modern, preferred interface.

Caution: Netlink messages must be properly aligned (NLMSG_ALIGN). Sending a message with wrong length or alignment can cause the kernel to reject it silently or return EINVAL.

Building a Link Watcher

This combines everything into a useful tool that watches for interface changes and prints their state.

/* link_watch.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>

#define BUFSIZE 8192

int main(void) {
    int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = RTMGRP_LINK,
    };

    if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("bind");
        close(sock);
        return 1;
    }

    printf("%-20s %-10s %-8s %s\n",
           "Time", "Interface", "Event", "State");
    printf("%-20s %-10s %-8s %s\n",
           "----", "---------", "-----", "-----");

    char buf[BUFSIZE];
    while (1) {
        ssize_t len = recv(sock, buf, sizeof(buf), 0);
        if (len < 0) break;

        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        while (NLMSG_OK(nlh, len)) {
            if (nlh->nlmsg_type == RTM_NEWLINK) {
                struct ifinfomsg *ifi = NLMSG_DATA(nlh);
                char ifname[IF_NAMESIZE];
                if (!if_indextoname(ifi->ifi_index, ifname))
                    snprintf(ifname, sizeof(ifname), "???");

                /* Get timestamp */
                time_t now = time(NULL);
                struct tm *tm = localtime(&now);
                char timebuf[20];
                strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S", tm);

                const char *state;
                if (ifi->ifi_flags & IFF_RUNNING)
                    state = "RUNNING";
                else if (ifi->ifi_flags & IFF_UP)
                    state = "UP (no carrier)";
                else
                    state = "DOWN";

                printf("%-20s %-10s %-8s %s\n",
                       timebuf, ifname, "CHANGE", state);
                fflush(stdout);
            }
            nlh = NLMSG_NEXT(nlh, len);
        }
    }

    close(sock);
    return 0;
}

Rust: Netlink Sockets

In Rust you can mirror the C version with raw libc calls, or use the netlink-sys and netlink-packet-route crates for structured message handling. First, the raw approach:

// Cargo.toml: libc = "0.2"
// (Structured alternatives: netlink-sys, netlink-packet-core, netlink-packet-route)

use std::io;

fn main() -> io::Result<()> {
    // Low-level: use raw socket like the C version
    let sock = unsafe {
        libc::socket(libc::AF_NETLINK, libc::SOCK_DGRAM, libc::NETLINK_ROUTE)
    };
    if sock < 0 {
        return Err(io::Error::last_os_error());
    }

    // Bind with RTMGRP_LINK group
    let mut sa: libc::sockaddr_nl = unsafe { std::mem::zeroed() };
    sa.nl_family = libc::AF_NETLINK as u16;
    sa.nl_groups = 1; // RTMGRP_LINK

    let ret = unsafe {
        libc::bind(
            sock,
            &sa as *const _ as *const libc::sockaddr,
            std::mem::size_of::<libc::sockaddr_nl>() as u32,
        )
    };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }

    println!("Monitoring link events (Ctrl+C to stop)...");

    let mut buf = [0u8; 8192];
    loop {
        let len = unsafe {
            libc::recv(sock, buf.as_mut_ptr() as *mut _, buf.len(), 0)
        };
        if len < 0 {
            return Err(io::Error::last_os_error());
        }
        println!("Received {} bytes of netlink data", len);
    }
}

For a higher-level approach, use the rtnetlink crate:

// Cargo.toml: rtnetlink = "0.14", futures = "0.3",
//             tokio = { version = "1", features = ["full"] }
use rtnetlink::new_connection;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);

    // List all links
    let mut links = handle.link().get().execute();

    use futures::stream::StreamExt;
    while let Some(msg) = links.next().await {
        match msg {
            Ok(link) => {
                let index = link.header.index;
                println!("Link index {}: {:?}", index, link.attributes);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
        }
    }

    Ok(())
}

Rust Note: The rtnetlink crate is async and uses tokio. It provides a much higher-level API than raw netlink sockets, with proper message parsing and type safety. For production code, this is strongly preferred over raw socket manipulation.

Generic Netlink

Generic netlink allows kernel modules and user-space programs to define custom message families without allocating a dedicated protocol number.

Flow:
1. Kernel module registers a generic netlink family ("my_family")
2. User-space resolves the family name to an ID via the controller
3. Communication proceeds using that dynamic ID

Tools like nl80211 (Wi-Fi configuration) and taskstats use generic netlink.

$ genl-ctrl-list   # (from libnl-utils)
0x0010 nlctrl version 2
0x0015 devlink version 1
0x001b nl80211 version 1
...

Driver Prep: Kernel modules that need a user-space communication channel often use generic netlink. When you write kernel modules, you'll use genl_register_family() to create a netlink family, and user-space programs will talk to your module via generic netlink sockets. This is the modern alternative to creating a custom character device for every module.

Try It: Run netlink_monitor in one terminal. In another terminal, run sudo ip addr add 10.99.99.1/24 dev lo and sudo ip addr del 10.99.99.1/24 dev lo. Watch the NEW_ADDR and DEL_ADDR events appear.

Quick Knowledge Check

  1. What advantages does netlink have over ioctl for network configuration?
  2. What does nl_groups in sockaddr_nl control?
  3. Why does netlink use NLMSG_ALIGN and NLMSG_NEXT macros instead of simple pointer arithmetic?

Common Pitfalls

  • Forgetting NLM_F_DUMP for bulk queries. Without it, you get one entry instead of the full table.
  • Not checking NLMSG_DONE. The kernel sends multi-part responses. You must loop until you see NLMSG_DONE.
  • Buffer too small. Netlink dumps can be large. Use at least 8KB buffers, or better, 32KB.
  • Wrong nl_pid. Set it to getpid() or 0 (let the kernel assign). Using a conflicting PID causes EADDRINUSE.
  • Ignoring NLMSG_ERROR. The kernel reports errors as netlink messages. Always check for error responses.
  • Assuming message order. Multicast events can arrive between dump responses. Use sequence numbers to match requests with replies.

Preparing for Kernel Space

Everything in this book has been user-space code. But every concept -- pointers, bit manipulation, function pointers, state machines, memory layout -- was chosen because it maps directly to kernel programming. This chapter connects the dots: what changes when you cross into kernel space, and how your user-space skills translate.

What Changes When You Cross the Boundary

User space                          Kernel space
+----------------------------------+----------------------------------+
| libc available                   | No libc                          |
| malloc/free                      | kmalloc/kfree (with GFP flags)   |
| printf                           | printk                           |
| Segfaults caught by kernel       | Bugs crash the whole system      |
| Virtual address space per process| Shared address space, all memory |
| Floating point available         | No floating point (usually)      |
| Large stack (8 MB default)       | Tiny stack (8-16 KB)             |
| User can be preempted freely     | Must think about preemption      |
| Errors return -1 and set errno   | Functions return negative errno  |
+----------------------------------+----------------------------------+

The kernel is freestanding C. No standard library, no heap by default, no safety net. Every technique we've practiced -- careful memory management, understanding alignment, defensive error handling -- becomes critical.

The Kernel's C Dialect

Kernel C is C11 (or later) with extensions and restrictions.

No Standard Library

You do not get #include <stdio.h>. Instead:

/* Kernel equivalents */
#include <linux/kernel.h>    /* printk, container_of */
#include <linux/slab.h>      /* kmalloc, kfree */
#include <linux/string.h>    /* memcpy, strcmp (kernel versions) */
#include <linux/types.h>     /* u8, u16, u32, u64, etc. */

printk replaces printf:

/* User space */
printf("value = %d\n", x);

/* Kernel space */
printk(KERN_INFO "value = %d\n", x);
/* or modern style: */
pr_info("value = %d\n", x);

No Floating Point

The kernel does not save/restore FPU state on context switches between kernel threads. Using floating point in kernel code silently corrupts user-space FPU registers.

/* WRONG in kernel code: */
double ratio = bytes / 1024.0;  /* will corrupt user FPU state */

/* CORRECT: use integer math */
unsigned long ratio = bytes / 1024;
unsigned long remainder = bytes % 1024;

If you absolutely need floating point (rare), you must wrap it:

kernel_fpu_begin();
/* ... floating point operations ... */
kernel_fpu_end();

Limited Stack

The kernel stack is tiny -- 8 KB on 32-bit x86, 16 KB on x86-64. Allocating large arrays on the stack will overflow it; at best the machine panics, at worst adjacent memory is silently corrupted.

/* WRONG in kernel code: */
char buffer[8192];  /* might overflow the entire kernel stack */

/* CORRECT: allocate on the heap */
char *buffer = kmalloc(8192, GFP_KERNEL);
if (!buffer)
    return -ENOMEM;
/* ... use buffer ... */
kfree(buffer);

GFP Flags

kmalloc takes a flags argument that specifies allocation context:

/* Can sleep (normal context, not in interrupt) */
ptr = kmalloc(size, GFP_KERNEL);

/* Cannot sleep (interrupt context, spinlock held) */
ptr = kmalloc(size, GFP_ATOMIC);

/* For DMA-able memory */
ptr = kmalloc(size, GFP_DMA);

Using GFP_KERNEL in interrupt context can deadlock the system. Using GFP_ATOMIC where GFP_KERNEL would do needlessly drains the emergency memory reserves. Getting this right is essential.

Caution: A wrong GFP flag is one of the most common kernel bugs. If you hold a spinlock or are in an interrupt handler, you must use GFP_ATOMIC. If you use GFP_KERNEL in that context, the allocator may sleep, and sleeping while holding a spinlock deadlocks the CPU.

Module Basics (Conceptual)

A kernel module is a .ko file loaded at runtime. The minimal structure:

/* hello_module.c -- conceptual, not compilable without kernel headers */
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Hello world kernel module");

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");
    return 0;  /* 0 = success */
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
$ make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
$ sudo insmod hello.ko
$ dmesg | tail -1
[12345.678] hello: module loaded
$ sudo rmmod hello

The __init and __exit macros let the kernel free the init code after loading and skip the exit code for built-in (non-modular) drivers.
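The make command above expects a kbuild Makefile next to the source. A minimal one — assuming the source file is named hello_module.c and the module should come out as hello.ko, matching the transcript above — looks like:

```makefile
# Kbuild Makefile for an out-of-tree module.
# obj-m lists modules to build; hello-y lists the objects that form hello.ko.
obj-m   := hello.o
hello-y := hello_module.o
```

kbuild handles the rest: compiler flags, symbol versioning, and the final link into a .ko.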

How Your User-Space Skills Map to Kernel Code

list_head: The Kernel's Real Linked List

In Chapter 16, we built linked lists in C. The kernel uses struct list_head -- an intrusive doubly-linked list that is embedded inside the data structure.

/* User space (from Ch16): */
struct node {
    int data;
    struct node *next;
};

/* Kernel: */
#include <linux/list.h>

struct my_item {
    int data;
    struct list_head list;  /* embedded list node */
};

/* Usage: */
LIST_HEAD(my_list);

struct my_item *item = kmalloc(sizeof(*item), GFP_KERNEL);
item->data = 42;
list_add(&item->list, &my_list);

/* Iterate: */
struct my_item *pos;
list_for_each_entry(pos, &my_list, list) {
    pr_info("data = %d\n", pos->data);
}

The container_of macro (which you may have implemented in Ch16) converts a list_head pointer back to the containing structure. This is the same technique, used everywhere in the kernel.

struct my_item layout (x86-64):

+-----------+-----------+---------------------+
| data (4B) | pad (4B)  | list_head (16B)     |
|           |           | .next    | .prev    |
+-----------+-----------+----------+----------+
^                       ^
|                       |
item                    &item->list

container_of(&item->list, struct my_item, list) == item

Function Pointer vtables Become file_operations

In Chapter 18, we built vtables from function pointers. The kernel uses exactly the same pattern for its driver interfaces.

/* User space (from Ch18): */
struct Shape {
    double (*area)(void *self);
    void   (*draw)(void *self);
};

/* Kernel: file_operations for a character device */
#include <linux/fs.h>

static int     mydev_open(struct inode *i, struct file *f) { return 0; }
static ssize_t mydev_read(struct file *f, char __user *buf,
                          size_t len, loff_t *off) { return 0; }
static ssize_t mydev_write(struct file *f, const char __user *buf,
                           size_t len, loff_t *off) { return len; }
static long    mydev_ioctl(struct file *f, unsigned int cmd,
                           unsigned long arg) { return 0; }
static int     mydev_release(struct inode *i, struct file *f) { return 0; }

static const struct file_operations mydev_fops = {
    .owner          = THIS_MODULE,
    .open           = mydev_open,
    .read           = mydev_read,
    .write          = mydev_write,
    .unlocked_ioctl = mydev_ioctl,
    .release        = mydev_release,
};

This is the same struct-of-function-pointers pattern. The kernel dispatches open(), read(), write(), ioctl() through these pointers. Every character device, block device, and network device uses this pattern.

Similarly, platform drivers:

/* Kernel: platform driver operations */
#include <linux/platform_device.h>

static int  mydrv_probe(struct platform_device *pdev)  { return 0; }
static int  mydrv_remove(struct platform_device *pdev) { return 0; }

static struct platform_driver mydrv = {
    .probe  = mydrv_probe,
    .remove = mydrv_remove,
    .driver = {
        .name = "my_device",
    },
};

State Machines Become Driver Lifecycle

In Chapter 19, we built explicit state machines. Kernel drivers are state machines:

Driver Lifecycle State Machine:

  [UNLOADED]
      |
      v  module_init()
  [LOADED]
      |
      v  probe()
  [BOUND TO DEVICE]
      |
      +----> suspend()  --> [SUSPENDED]
      |                         |
      |      resume()   <-------+
      |
      v  remove()
  [UNBOUND]
      |
      v  module_exit()
  [UNLOADED]

Every transition has a corresponding callback in the driver structure. The patterns you practiced -- clear states, explicit transitions, error handling at each step -- are exactly what kernel drivers require.

Bit Manipulation Becomes Register Access

In Part III (Chapters 11-13), we covered bitwise operations, masks, and bit fields. In kernel drivers, you use these to read and write hardware registers.

/* User space (from Part III): */
#define BIT(n)          (1UL << (n))
#define SET_BIT(val, n) ((val) | BIT(n))
#define CLR_BIT(val, n) ((val) & ~BIT(n))

/* Kernel: register access */
#include <linux/io.h>

#define REG_CONTROL   0x00
#define REG_STATUS    0x04
#define CTRL_ENABLE   BIT(0)
#define CTRL_IRQ_EN   BIT(1)
#define STATUS_BUSY   BIT(7)

static void __iomem *base;  /* memory-mapped register base */

/* Enable the device */
u32 val = readl(base + REG_CONTROL);
val |= CTRL_ENABLE | CTRL_IRQ_EN;
writel(val, base + REG_CONTROL);

/* Wait for not busy */
while (readl(base + REG_STATUS) & STATUS_BUSY)
    cpu_relax();

readl and writel are memory-mapped I/O accessors that handle memory barriers and prevent compiler reordering. The bit manipulation is identical to what you learned.

Error Handling in the Kernel

The kernel returns negative errno values, not -1 with a separate errno variable:

/* User space: */
int fd = open(path, O_RDONLY);
if (fd < 0) {
    perror("open");  /* uses errno */
}

/* Kernel: */
static int mydrv_probe(struct platform_device *pdev)
{
    void *buf = kmalloc(1024, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;  /* return the negative errno directly */

    int irq = platform_get_irq(pdev, 0);
    if (irq < 0)
        return irq;  /* pass through the error */

    /* ... */
    return 0;  /* success */
}

The goto-based cleanup pattern from earlier chapters is the standard kernel idiom:

static int mydrv_probe(struct platform_device *pdev)
{
    int ret;

    void *buf = kmalloc(1024, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    ret = register_something();
    if (ret)
        goto err_free_buf;

    ret = setup_irq();
    if (ret)
        goto err_unregister;

    return 0;

err_unregister:
    unregister_something();
err_free_buf:
    kfree(buf);
    return ret;
}

This pattern appears in virtually every kernel driver probe function.

Concurrency in the Kernel

The kernel is massively concurrent: multiple CPUs, interrupts, softirqs, workqueues. Everything from Chapters on threads and synchronization applies, but with kernel primitives:

+---------------------+----------------------------+
| User space          | Kernel                     |
+---------------------+----------------------------+
| pthread_mutex_t     | struct mutex               |
| pthread_spinlock_t  | spinlock_t                 |
| sem_t               | struct semaphore           |
| atomic_int          | atomic_t, atomic_long_t    |
| pthread_cond_t      | wait_queue_head_t          |
| read-write lock     | rwlock_t, struct rw_semaphore |
+---------------------+----------------------------+

The key difference: in interrupt context, you cannot sleep, so you must use spinlocks rather than mutexes.

Rust in the Kernel

The Linux kernel has experimental Rust support. Kernel Rust has the same restrictions as kernel C: no standard library, no heap unless explicitly allocated, no floating point.

// Conceptual kernel module in Rust (requires Rust-for-Linux)
use kernel::prelude::*;

module! {
    type: MyModule,
    name: "my_module",
    license: "GPL",
}

struct MyModule;

impl kernel::Module for MyModule {
    fn init(_module: &'static ThisModule) -> Result<Self> {
        pr_info!("Hello from Rust kernel module!\n");
        Ok(MyModule)
    }
}

impl Drop for MyModule {
    fn drop(&mut self) {
        pr_info!("Goodbye from Rust kernel module!\n");
    }
}

Rust kernel modules use:

  • kernel:: crate instead of std::
  • pr_info! instead of println!
  • Result<T> with kernel error types
  • Box backed by kmalloc
  • The borrow checker prevents most use-after-free and data race bugs at compile time

Rust Note: Rust in the kernel is not a replacement for C. It's an additional language option for new drivers and subsystems. Existing kernel C code will not be rewritten. Understanding C kernel programming is still essential even if you plan to write Rust kernel modules.

The Complete Mapping

Here is how every major topic from this book connects to kernel programming:

+------------------------------+---------------------------------------+
| Book Chapter / Topic         | Kernel Equivalent                     |
+------------------------------+---------------------------------------+
| Pointers (Ch6-7)             | __user pointers, void __iomem *       |
| Structs (Ch8-9)              | Every kernel data structure            |
| Bit manipulation (Ch11-13)   | Register access, flag fields           |
| Linked lists (Ch16)          | struct list_head, hlist_head           |
| Function pointers (Ch18)     | file_operations, driver ops            |
| State machines (Ch19)        | Driver probe/remove/suspend/resume     |
| Opaque types (Ch20)          | struct device, struct file (internals) |
| Build system (Ch24-27)       | Kbuild, Kconfig, make menuconfig       |
| File descriptors (Ch28-31)   | struct file, VFS layer                 |
| Processes (Ch32-34)          | kthread, workqueue                     |
| Signals (Ch35-37)            | Kernel signal delivery                 |
| Memory mapping (Ch38-40)     | ioremap, DMA mapping                   |
| Threads (Ch41-43)            | kthread, per-cpu variables             |
| Synchronization (Ch44-46)    | spinlock, mutex, RCU                   |
| Networking (Ch47-49)         | sk_buff, net_device, socket layer      |
| Optimization (Ch50)          | Cache-aligned structs, likely/unlikely |
| Arenas/pools (Ch51)          | Slab allocator (kmem_cache)            |
| Atomics (Ch52)               | atomic_t, memory barriers              |
| /proc and /sys (Ch53)        | Creating procfs/sysfs entries           |
| ioctl (Ch54)                 | Implementing file_operations.ioctl     |
| Netlink (Ch55)               | genl_register_family()                 |
+------------------------------+---------------------------------------+

What to Study Next

  1. Linux Device Drivers (LDD3) -- the classic reference. Some APIs have changed, but the concepts are timeless.

  2. The kernel source itself -- drivers/ contains thousands of real examples. Start with simple ones like drivers/misc/.

  3. QEMU + buildroot -- build a minimal Linux system and test your modules in a VM. No risk of crashing your real machine.

  4. Kernel documentation -- Documentation/ in the kernel tree. Especially driver-api/ and core-api/.

  5. Rust for Linux -- if you want to write kernel modules in Rust, follow the rust-for-linux project.

  6. Write a character device driver -- your first kernel project should be a simple character device that implements open, read, write, and ioctl. You already know every concept required.

Driver Prep: This is it. You've learned the user-space foundations. Every concept in this book -- from pointers to atomics, from bit manipulation to state machines -- was chosen because it's essential in kernel and driver code. You're ready.

Try It: Download the kernel source. Navigate to drivers/misc/dummy-irq.c or drivers/misc/eeprom/at24.c. Read the code. You should recognize the patterns: module init/exit, probe/remove, file_operations, error handling with goto, bit manipulation for registers. If you can read and understand a real kernel driver, you've succeeded.

Quick Knowledge Check

  1. Why can't you use printf in kernel code?
  2. What happens if you use GFP_KERNEL inside an interrupt handler?
  3. How does container_of work, and why is it essential for kernel linked lists?

Common Pitfalls

  • Using malloc in kernel code. There is no libc. Use kmalloc.
  • Large stack allocations. The kernel stack is 8-16 KB. Allocate large buffers with kmalloc.
  • Sleeping in atomic context. If you hold a spinlock or are in an interrupt handler, you must not call anything that might sleep (kmalloc(GFP_KERNEL), mutex_lock(), copy_from_user() -- yes, even that can sleep).
  • Forgetting to free on error paths. The goto cleanup pattern exists for a reason. Every resource acquired must be released in reverse order.
  • Accessing user pointers directly. Always use copy_from_user() / copy_to_user(). Direct access crashes on SMAP-enabled CPUs and is a security vulnerability.
  • No error checking. Every kernel function that can fail must be checked. The kernel does not tolerate ignored errors the way user space sometimes does.

Appendix A: C Standard Library Quick Reference

This appendix covers the most important C standard library functions for systems programming, organized by header. Each entry lists the signature, purpose, and common pitfalls.

stdio.h -- Standard I/O

+----------+----------------------------------------------------------------+----------------------------------+
| Function | Signature                                                      | Purpose                          |
+----------+----------------------------------------------------------------+----------------------------------+
| printf   | int printf(const char *fmt, ...)                               | Print formatted output to stdout |
| fprintf  | int fprintf(FILE *stream, const char *fmt, ...)                | Print to any stream              |
| snprintf | int snprintf(char *buf, size_t n, const char *fmt, ...)        | Safe formatted string            |
| scanf    | int scanf(const char *fmt, ...)                                | Read formatted input from stdin  |
| fopen    | FILE *fopen(const char *path, const char *mode)                | Open a file stream               |
| fclose   | int fclose(FILE *stream)                                       | Close a file stream              |
| fread    | size_t fread(void *ptr, size_t size, size_t n, FILE *s)        | Binary read                      |
| fwrite   | size_t fwrite(const void *ptr, size_t size, size_t n, FILE *s) | Binary write                     |
| fgets    | char *fgets(char *s, int n, FILE *stream)                      | Read a line (safe)               |
| fputs    | int fputs(const char *s, FILE *stream)                         | Write a string                   |
| fseek    | int fseek(FILE *stream, long offset, int whence)               | Seek in stream                   |
| ftell    | long ftell(FILE *stream)                                       | Current position                 |
| fflush   | int fflush(FILE *stream)                                       | Flush output buffer              |
| perror   | void perror(const char *s)                                     | Print errno message              |
| feof     | int feof(FILE *stream)                                         | Check end-of-file                |
| ferror   | int ferror(FILE *stream)                                       | Check stream error               |
+----------+----------------------------------------------------------------+----------------------------------+

Pitfalls:

  • sprintf has no bounds checking. Always use snprintf.
  • gets was removed in C11. Never use it. Use fgets.
  • scanf("%s", buf) has no bounds checking. Use scanf("%63s", buf) with a width specifier.
  • feof returns true only after a read attempt fails. Don't use it as a loop condition before reading.
  • fclose on an already-closed FILE* is undefined behavior.

stdlib.h -- General Utilities

+----------+----------------------------------------------------------------+--------------------------+
| Function | Signature                                                      | Purpose                  |
+----------+----------------------------------------------------------------+--------------------------+
| malloc   | void *malloc(size_t size)                                      | Allocate memory          |
| calloc   | void *calloc(size_t n, size_t size)                            | Allocate zeroed memory   |
| realloc  | void *realloc(void *ptr, size_t size)                          | Resize allocation        |
| free     | void free(void *ptr)                                           | Free memory              |
| exit     | void exit(int status)                                          | Terminate process        |
| atexit   | int atexit(void (*fn)(void))                                   | Register exit handler    |
| atoi     | int atoi(const char *s)                                        | String to int (unsafe)   |
| strtol   | long strtol(const char *s, char **end, int base)               | String to long (safe)    |
| strtoul  | unsigned long strtoul(const char *s, char **end, int base)     | String to unsigned long  |
| strtod   | double strtod(const char *s, char **end)                       | String to double         |
| abs      | int abs(int n)                                                 | Absolute value           |
| rand     | int rand(void)                                                 | Pseudo-random integer    |
| srand    | void srand(unsigned seed)                                      | Seed random generator    |
| qsort    | void qsort(void *base, size_t n, size_t size, int (*cmp)(...)) | Sort array               |
| bsearch  | void *bsearch(const void *key, const void *base, ...)          | Binary search            |
| getenv   | char *getenv(const char *name)                                 | Get environment variable |
| system   | int system(const char *cmd)                                    | Run shell command        |
+----------+----------------------------------------------------------------+--------------------------+

Pitfalls:

  • atoi returns 0 on error, which is indistinguishable from converting "0". Use strtol instead.
  • realloc(ptr, 0) behavior is implementation-defined. Don't rely on it.
  • realloc on failure returns NULL but doesn't free the original. Save the original pointer: tmp = realloc(ptr, n); if (tmp) ptr = tmp;
  • rand() is not cryptographically secure. Use getrandom() for security.
  • system() is a shell injection risk. Avoid in production code.

string.h -- String and Memory Operations

+----------+--------------------------------------------------------+----------------------------+
| Function | Signature                                              | Purpose                    |
+----------+--------------------------------------------------------+----------------------------+
| strlen   | size_t strlen(const char *s)                           | String length              |
| strcpy   | char *strcpy(char *dst, const char *src)               | Copy string (unsafe)       |
| strncpy  | char *strncpy(char *dst, const char *src, size_t n)    | Copy with limit            |
| strcat   | char *strcat(char *dst, const char *src)               | Concatenate (unsafe)       |
| strncat  | char *strncat(char *dst, const char *src, size_t n)    | Concatenate with limit     |
| strcmp   | int strcmp(const char *a, const char *b)               | Compare strings            |
| strncmp  | int strncmp(const char *a, const char *b, size_t n)    | Compare n bytes            |
| strchr   | char *strchr(const char *s, int c)                     | Find char in string        |
| strrchr  | char *strrchr(const char *s, int c)                    | Find char (from end)       |
| strstr   | char *strstr(const char *haystack, const char *needle) | Find substring             |
| strtok   | char *strtok(char *s, const char *delim)               | Tokenize (modifies string) |
| memcpy   | void *memcpy(void *dst, const void *src, size_t n)     | Copy memory (no overlap)   |
| memmove  | void *memmove(void *dst, const void *src, size_t n)    | Copy memory (overlap ok)   |
| memset   | void *memset(void *s, int c, size_t n)                 | Fill memory                |
| memcmp   | int memcmp(const void *a, const void *b, size_t n)     | Compare memory             |
| strerror | char *strerror(int errnum)                             | Error number to string     |
+----------+--------------------------------------------------------+----------------------------+

Pitfalls:

  • strncpy does NOT null-terminate if src is longer than n. Always: strncpy(dst, src, n-1); dst[n-1] = '\0';
  • memcpy with overlapping regions is undefined behavior. Use memmove.
  • strtok uses a static buffer and is not thread-safe. Use strtok_r.
  • strcmp returns 0 for equal strings (not 1). The return value is the difference, not a boolean.

math.h -- Mathematics

+-------------+-------------------------------------+--------------------------+
| Function    | Signature                           | Purpose                  |
+-------------+-------------------------------------+--------------------------+
| sqrt        | double sqrt(double x)               | Square root              |
| pow         | double pow(double base, double exp) | Power                    |
| fabs        | double fabs(double x)               | Absolute value           |
| ceil        | double ceil(double x)               | Round up                 |
| floor       | double floor(double x)              | Round down               |
| round       | double round(double x)              | Round to nearest         |
| log         | double log(double x)                | Natural logarithm        |
| log10       | double log10(double x)              | Base-10 logarithm        |
| sin/cos/tan | double sin(double x)                | Trig functions (radians) |
+-------------+-------------------------------------+--------------------------+

Pitfalls:

  • Link with -lm on Linux. Math functions are in libm, not libc.
  • sqrt(-1) returns NaN, not an error. Check isnan().

ctype.h -- Character Classification

+-------------------------+------------------+---------------------+
| Function                | Purpose          | Example             |
+-------------------------+------------------+---------------------+
| isalpha(c)              | Letter?          | isalpha('A') = true |
| isdigit(c)              | Digit?           | isdigit('5') = true |
| isalnum(c)              | Letter or digit? |                     |
| isspace(c)              | Whitespace?      | isspace(' ') = true |
| isupper(c) / islower(c) | Case check       |                     |
| toupper(c) / tolower(c) | Case conversion  |                     |
+-------------------------+------------------+---------------------+

Pitfalls:

  • The argument must be representable as unsigned char or be EOF. Passing a signed char with a negative value is undefined behavior. Cast first: isalpha((unsigned char)c).

errno.h -- Error Numbers

Common errno values:

+-------+--------+---------------------------+
| Value | Name   | Meaning                   |
+-------+--------+---------------------------+
| 1     | EPERM  | Operation not permitted   |
| 2     | ENOENT | No such file or directory |
| 5     | EIO    | I/O error                 |
| 9     | EBADF  | Bad file descriptor       |
| 11    | EAGAIN | Try again (non-blocking)  |
| 12    | ENOMEM | Out of memory             |
| 13    | EACCES | Permission denied         |
| 17    | EEXIST | File exists               |
| 22    | EINVAL | Invalid argument          |
| 28    | ENOSPC | No space left on device   |
| 32    | EPIPE  | Broken pipe               |
+-------+--------+---------------------------+

Pitfalls:

  • errno is only valid immediately after a failed call. Successful calls may or may not reset it.
  • errno is thread-local in glibc (actually a macro that expands to (*__errno_location())).

signal.h -- Signal Handling

+-------------+----------------------------------------------------------+-------------------------+
| Function    | Signature                                                | Purpose                 |
+-------------+----------------------------------------------------------+-------------------------+
| signal      | void (*signal(int sig, void (*handler)(int)))(int)      | Set signal handler      |
| raise       | int raise(int sig)                                       | Send signal to self     |
| kill        | int kill(pid_t pid, int sig)                             | Send signal to process  |
| sigaction   | int sigaction(int sig, const struct sigaction *act, ...) | Set handler (preferred) |
| sigprocmask | int sigprocmask(int how, const sigset_t *set, ...)       | Block/unblock signals   |
+-------------+----------------------------------------------------------+-------------------------+

Pitfalls:

  • signal() behavior varies across systems. Use sigaction() instead.
  • Signal handlers must only call async-signal-safe functions. No printf, no malloc.

time.h -- Time Functions

+---------------+------------------------------------------------------------+------------------------+
| Function      | Signature                                                  | Purpose                |
+---------------+------------------------------------------------------------+------------------------+
| time          | time_t time(time_t *tloc)                                  | Current time (seconds) |
| clock         | clock_t clock(void)                                        | CPU time used          |
| difftime      | double difftime(time_t t1, time_t t0)                      | Time difference        |
| localtime     | struct tm *localtime(const time_t *t)                      | Convert to local time  |
| gmtime        | struct tm *gmtime(const time_t *t)                         | Convert to UTC         |
| strftime      | size_t strftime(char *s, size_t max, const char *fmt, ...) | Format time string     |
| clock_gettime | int clock_gettime(clockid_t id, struct timespec *tp)       | High-resolution time   |
+---------------+------------------------------------------------------------+------------------------+

Pitfalls:

  • localtime returns a pointer to a static buffer (not thread-safe). Use localtime_r.
  • clock() measures CPU time, not wall time. Use clock_gettime(CLOCK_MONOTONIC) for benchmarks.

unistd.h -- POSIX System Interface

+----------+------------------------------------------------------+----------------------+
| Function | Signature                                            | Purpose              |
+----------+------------------------------------------------------+----------------------+
| read     | ssize_t read(int fd, void *buf, size_t count)        | Read from fd         |
| write    | ssize_t write(int fd, const void *buf, size_t count) | Write to fd          |
| close    | int close(int fd)                                    | Close fd             |
| lseek    | off_t lseek(int fd, off_t offset, int whence)        | Seek in fd           |
| fork     | pid_t fork(void)                                     | Create child process |
| exec*    | int execvp(const char *file, char *const argv[])     | Replace process      |
| pipe     | int pipe(int pipefd[2])                              | Create pipe          |
| dup2     | int dup2(int oldfd, int newfd)                       | Duplicate fd         |
| getpid   | pid_t getpid(void)                                   | Current PID          |
| getcwd   | char *getcwd(char *buf, size_t size)                 | Current directory    |
| sleep    | unsigned sleep(unsigned seconds)                     | Sleep                |
| usleep   | int usleep(useconds_t usec)                          | Sleep (microseconds) |
| unlink   | int unlink(const char *path)                         | Delete file          |
| access   | int access(const char *path, int mode)               | Check file access    |
+----------+------------------------------------------------------+----------------------+

Pitfalls:

  • read and write may transfer fewer bytes than requested (short reads/writes). Always loop.
  • close can fail. Check the return value, especially for files opened with O_SYNC.
  • fork + exec is POSIX, not C standard. Use posix_spawn when possible.

sys/types.h -- System Data Types

+---------------+---------------------------------+
| Type          | Purpose                         |
+---------------+---------------------------------+
| pid_t         | Process ID                      |
| uid_t / gid_t | User/group ID                   |
| off_t         | File offset                     |
| size_t        | Unsigned size                   |
| ssize_t       | Signed size (for error returns) |
| mode_t        | File permissions                |
| dev_t         | Device number                   |
| ino_t         | Inode number                    |
| time_t        | Time in seconds                 |
+---------------+---------------------------------+

Pitfalls:

  • size_t is unsigned. Subtracting two size_t values can wrap to a huge number. Use ssize_t or cast carefully.
  • off_t is 32-bit by default on 32-bit systems. Use _FILE_OFFSET_BITS=64 for large file support.

Appendix B: Rust std for C Programmers

This appendix maps C standard library functions and patterns to their Rust equivalents. If you know the C function, find the Rust way to do the same thing.

I/O: stdio.h -> std::io, std::fs

+-------------------------+------------------------------+----------------------+
| C                       | Rust                         | Notes                |
+-------------------------+------------------------------+----------------------+
| printf("x=%d\n", x)     | println!("x={x}")            | Format macros        |
| fprintf(stderr, ...)    | eprintln!(...)               | Stderr output        |
| sprintf(buf, ...)       | format!(...)                 | Returns String       |
| snprintf(buf, n, ...)   | write!(buf, ...)             | Into any Write impl  |
| fopen(path, "r")        | File::open(path)             | Returns Result<File> |
| fopen(path, "w")        | File::create(path)           | Truncates existing   |
| fclose(f)               | (automatic via Drop)         | RAII closes the file |
| fread(buf, 1, n, f)     | f.read(&mut buf)             | Read trait           |
| fwrite(buf, 1, n, f)    | f.write_all(&buf)            | Write trait          |
| fgets(buf, n, f)        | reader.read_line(&mut s)     | BufRead trait        |
| fseek(f, off, SEEK_SET) | f.seek(SeekFrom::Start(off)) | Seek trait           |
| fflush(f)               | f.flush()                    | Write trait          |
| perror("msg")           | eprintln!("msg: {e}")        | Where e is the error |
+-------------------------+------------------------------+----------------------+

// Reading a file in Rust (C equivalent: fopen + fread + fclose)
use std::fs;
let contents = fs::read_to_string("file.txt").expect("read failed");

// Line-by-line reading (C equivalent: fgets loop)
use std::io::{self, BufRead};
let file = fs::File::open("file.txt").expect("open failed");
for line in io::BufReader::new(file).lines() {
    let line = line.expect("read line failed");
    println!("{line}");
}

Memory: stdlib.h -> Box, Vec, ownership

+-----------------------+--------------------------------+-------------------------+
| C                     | Rust                           | Notes                   |
+-----------------------+--------------------------------+-------------------------+
| malloc(n)             | Box::new(value)                | Single heap object      |
| malloc(n * sizeof(T)) | Vec::with_capacity(n)          | Dynamic array           |
| calloc(n, size)       | vec![0; n]                     | Zeroed allocation       |
| realloc(p, new_size)  | v.reserve(additional)          | Vec grows automatically |
| free(p)               | (automatic via Drop)           | RAII frees memory       |
| memcpy(dst, src, n)   | dst.copy_from_slice(src)       | Slices                  |
| memmove(dst, src, n)  | slice.copy_within(range, dest) | Overlapping             |
| memset(p, 0, n)       | buf.fill(0)                    | Fill slice              |
| memcmp(a, b, n)       | a == b or a.cmp(&b)            | Slice comparison        |
+-----------------------+--------------------------------+-------------------------+

// Heap allocation (C: malloc + use + free)
let boxed = Box::new(42);          // heap-allocated i32
println!("{boxed}");                // auto-deref
// freed automatically when boxed goes out of scope

// Dynamic array (C: malloc + realloc pattern)
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
// no manual realloc or free needed

Strings: string.h -> String, &str

+---------------------+---------------------------+------------------------------+
| C                   | Rust                      | Notes                        |
+---------------------+---------------------------+------------------------------+
| strlen(s)           | s.len()                   | O(1) in Rust (stored length) |
| strcpy(dst, src)    | let dst = src.to_string() | New allocation               |
| strcat(dst, src)    | dst.push_str(src)         | Append to String             |
| strcmp(a, b)        | a == b                    | Direct comparison            |
| strchr(s, c)        | s.find(c)                 | Returns Option<usize>        |
| strstr(hay, needle) | hay.find(needle)          | Returns Option<usize>        |
| strtok(s, delim)    | s.split(delim)            | Returns iterator             |
| strtol(s, NULL, 10) | s.parse::<i64>()          | Returns Result               |
| atoi(s)             | s.parse::<i32>().unwrap() | Panics on failure            |
+---------------------+---------------------------+------------------------------+

// String operations (C equivalents in comments)
let s = String::from("hello");     // like strdup("hello")
let len = s.len();                 // like strlen(s)
let upper = s.to_uppercase();      // no C equivalent in string.h

// Splitting (C: strtok loop)
let csv = "a,b,c,d";
let parts: Vec<&str> = csv.split(',').collect();
// parts = ["a", "b", "c", "d"]

// Parsing numbers (C: strtol)
let n: i64 = "42".parse().expect("not a number");

The &str vs String distinction

C:      const char *  (pointer to existing string)  <-->  &str
        char *buf     (owned, mutable buffer)        <-->  String

Rust rule: use &str for parameters, String for owned data.

Math: math.h -> std::f64, num traits

+------------+--------------------------+------------------------+
| C          | Rust                     | Notes                  |
+------------+--------------------------+------------------------+
| sqrt(x)    | x.sqrt() or f64::sqrt(x) | Method on f64          |
| pow(x, y)  | x.powi(n) / x.powf(y)    | Integer/float exponent |
| fabs(x)    | x.abs()                  | Method                 |
| ceil(x)    | x.ceil()                 | Method                 |
| floor(x)   | x.floor()                | Method                 |
| round(x)   | x.round()                | Method                 |
| log(x)     | x.ln()                   | Natural log            |
| log10(x)   | x.log10()                | Base-10 log            |
| sin(x)     | x.sin()                  | Radians                |
| INFINITY   | f64::INFINITY            | Associated constant    |
| NAN        | f64::NAN                 | Associated constant    |
| isnan(x)   | x.is_nan()               | Method                 |
+------------+--------------------------+------------------------+

No -lm flag needed. Math is built into the primitive types.

Process: unistd.h, stdlib.h -> std::process

+----------------+------------------------------------------------+------------------------+
| C              | Rust                                           | Notes                  |
+----------------+------------------------------------------------+------------------------+
| fork()         | (no direct equivalent)                         | Use Command::new()     |
| exec*()        | Command::new(prog).exec()                      | Via std::os::unix      |
| system(cmd)    | Command::new("sh").arg("-c").arg(cmd).status() |                        |
| exit(n)        | std::process::exit(n)                          |                        |
| getpid()       | std::process::id()                             | Returns u32            |
| getenv("PATH") | std::env::var("PATH")                          | Returns Result<String> |
| pipe()         | Command::new(...).stdin(Stdio::piped())        |                        |
| waitpid()      | child.wait()                                   | Returns ExitStatus     |
+----------------+------------------------------------------------+------------------------+

// Running a child process (C: fork + exec + wait)
use std::process::Command;

let output = Command::new("ls")
    .arg("-la")
    .output()
    .expect("failed to execute");

println!("stdout: {}", String::from_utf8_lossy(&output.stdout));
println!("exit: {}", output.status);

Networking: sys/socket.h -> std::net

+---------------------------------+--------------------------+---------------------------------+
| C                               | Rust                     | Notes                           |
+---------------------------------+--------------------------+---------------------------------+
| socket(AF_INET, SOCK_STREAM, 0) | TcpListener::bind(addr)  | Combined                        |
| connect()                       | TcpStream::connect(addr) |                                 |
| bind() + listen()               | TcpListener::bind(addr)  |                                 |
| accept()                        | listener.accept()        | Returns (TcpStream, SocketAddr) |
| send(fd, buf, n, 0)             | stream.write_all(&buf)   | Write trait                     |
| recv(fd, buf, n, 0)             | stream.read(&mut buf)    | Read trait                      |
| close(fd)                       | (automatic via Drop)     |                                 |
| inet_ntop()                     | addr.to_string()         | Display trait                   |
| htons(port)                     | (automatic)              | Rust handles byte order         |
+---------------------------------+--------------------------+---------------------------------+

// TCP echo server (C equivalent: socket + bind + listen + accept loop)
use std::net::TcpListener;
use std::io::{Read, Write};

let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
for stream in listener.incoming() {
    let mut stream = stream.unwrap();
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf).unwrap();
    stream.write_all(&buf[..n]).unwrap();
}

Error Handling: errno -> Result<T, E>

+-------------------------+-----------------------+
| C pattern               | Rust equivalent       |
+-------------------------+-----------------------+
| Return -1 and set errno | Return Err(io::Error) |
| Check if (ret < 0)      | match or ? operator   |
| perror("msg")           | eprintln!("msg: {e}") |
| strerror(errno)         | e.to_string()         |
+-------------------------+-----------------------+

// The ? operator replaces C error-checking boilerplate
use std::io;
use std::fs;

fn read_config() -> io::Result<String> {
    let contents = fs::read_to_string("/etc/myapp.conf")?;  // returns Err on failure
    Ok(contents)
}

Key Traits Every C Programmer Should Know

Read and Write

use std::io::{self, Read, Write};

// Anything that implements Read can be read from:
// File, TcpStream, &[u8], Stdin, ...
fn process_input(mut reader: impl Read) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}

Iterator

Replaces C for-loops over arrays and linked lists.

// C: for (int i = 0; i < n; i++) sum += arr[i];
let arr = [1, 2, 3];
let sum: i32 = arr.iter().sum();

// C: filter + transform loop
let even_squares: Vec<i32> = (0..10)
    .filter(|x| x % 2 == 0)
    .map(|x| x * x)
    .collect();

Display and Debug

use std::fmt;

// Display: for user-facing output (like a custom printf format)
struct MyType { value: i32 }

impl fmt::Display for MyType {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "MyType({})", self.value)
    }
}

// Debug: for developer output (derive it)
#[derive(Debug)]
struct Point { x: f64, y: f64 }

let p = Point { x: 1.0, y: 2.0 };
println!("{p:?}");  // Point { x: 1.0, y: 2.0 }

From and Into

// From<T>: conversion from one type to another
// Replaces C's explicit casting and conversion functions
let s: String = String::from("hello");
let n: i64 = i64::from(42i32);

// Into<T>: the reverse direction (auto-derived from From)
fn takes_string(s: String) { /* ... */ }
takes_string("hello".into());  // &str -> String via Into

Clone and Copy

// Copy: bitwise copy (like C assignment for simple types)
// Applies to: integers, floats, bool, char, references
let a: i32 = 5;
let b = a;  // copy, both valid

// Clone: explicit deep copy (like manual malloc + memcpy)
let s1 = String::from("hello");
let s2 = s1.clone();  // explicit deep copy
// Both s1 and s2 are valid

Concurrency: pthreads -> std::thread, std::sync

+------------------+------------------------+--------------------------+
| C (pthreads)     | Rust                   | Notes                    |
+------------------+------------------------+--------------------------+
| pthread_create() | thread::spawn(closure) | Closure captures data    |
| pthread_join()   | handle.join()          | Returns Result           |
| pthread_mutex_t  | Mutex<T>               | Data inside the mutex    |
| pthread_rwlock_t | RwLock<T>              |                          |
| pthread_cond_t   | Condvar                |                          |
| sem_t            | (use Mutex + Condvar)  | No direct equivalent     |
| atomic_int       | AtomicI32              | std::sync::atomic        |
| (shared data)    | Arc<T>                 | Thread-safe ref counting |
+------------------+------------------------+--------------------------+

use std::sync::{Arc, Mutex};
use std::thread;

let counter = Arc::new(Mutex::new(0));

let handles: Vec<_> = (0..4).map(|_| {
    let c = Arc::clone(&counter);
    thread::spawn(move || {
        let mut num = c.lock().unwrap();
        *num += 1;
    })
}).collect();

for h in handles { h.join().unwrap(); }
println!("Result: {}", *counter.lock().unwrap());

Time: time.h -> std::time

+--------------------------------+---------------------------------------+------------------------+
| C                              | Rust                                  | Notes                  |
+--------------------------------+---------------------------------------+------------------------+
| time(NULL)                     | SystemTime::now()                     | Wall clock             |
| clock_gettime(CLOCK_MONOTONIC) | Instant::now()                        | For benchmarks         |
| sleep(n)                       | thread::sleep(Duration::from_secs(n)) |                        |
| difftime(t1, t0)               | t1.duration_since(t0)                 | Returns Duration       |
| strftime(...)                  | (use chrono crate)                    | No built-in formatting |
+--------------------------------+---------------------------------------+------------------------+

use std::time::Instant;

let start = Instant::now();
// ... work ...
let elapsed = start.elapsed();
println!("Took {elapsed:?}");

Collections: Manual C -> std::collections

+---------------------------+------------------------------+--------------------+
| C pattern                 | Rust type                    | Notes              |
+---------------------------+------------------------------+--------------------+
| T array[N]                | [T; N]                       | Fixed-size array   |
| T *arr + len              | Vec<T>                       | Dynamic array      |
| Hash table (hand-rolled)  | HashMap<K, V>                |                    |
| Binary tree (hand-rolled) | BTreeMap<K, V>               | Sorted             |
| Linked list (hand-rolled) | LinkedList<T>                | (rarely used)      |
| Bit set (manual)          | HashSet<T> or bitflags crate |                    |
| Ring buffer (manual)      | VecDeque<T>                  | Double-ended queue |
+---------------------------+------------------------------+--------------------+

Quick Conversion Cheatsheet

C type          -->  Rust type
int                  i32
unsigned int         u32
long                 i64 (on 64-bit Linux)
size_t               usize
ssize_t              isize
char                 u8 (byte) or char (Unicode)
char *               &str (borrowed) or String (owned)
void *               *const u8 / *mut u8 or &[u8]
NULL                 None (in Option<T>)
FILE *               File (in std::fs)
bool (C99)           bool
_Bool                bool

Appendix C: Linux System Call Reference

This appendix provides a categorized reference of the key system calls covered in this book. For each syscall: the signature, purpose, common errno values, and the chapter where it's covered in detail.

File I/O

open

#include <fcntl.h>
int open(const char *pathname, int flags, ... /* mode_t mode */);

Opens or creates a file. Returns a file descriptor.

Flags: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC, O_APPEND, O_NONBLOCK, O_CLOEXEC.

Errno: ENOENT (not found), EACCES (permission), EEXIST (with O_EXCL), EMFILE (too many open fds).

Chapter: 28 (File Descriptors)

read / write

#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);

Read/write bytes. May transfer fewer than requested (short read/write).

Errno: EBADF (bad fd), EINTR (interrupted by signal), EAGAIN (non-blocking and would block), EIO (hardware error).

Chapter: 28-29 (File I/O)

close

#include <unistd.h>
int close(int fd);

Closes a file descriptor. Releases kernel resources.

Errno: EBADF (not a valid fd), EIO (error flushing data).

Chapter: 28

lseek

#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);

Repositions file offset. whence: SEEK_SET, SEEK_CUR, SEEK_END.

Errno: EBADF, EINVAL (invalid whence), ESPIPE (fd is a pipe).

Chapter: 29

stat / fstat / lstat

#include <sys/stat.h>
int stat(const char *path, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *path, struct stat *buf); /* doesn't follow symlinks */

Get file metadata: size, permissions, timestamps, inode.

Errno: ENOENT, EACCES, EBADF (fstat).

Chapter: 30

dup2

#include <unistd.h>
int dup2(int oldfd, int newfd);

Duplicates oldfd to newfd, closing newfd first if open.

Errno: EBADF, EMFILE.

Chapter: 31

sendfile

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Zero-copy transfer between file descriptors (in_fd must be mmappable).

Errno: EAGAIN, EBADF, EINVAL (fd types not supported).

Chapter: 52

mmap / munmap

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);

Map files or devices into memory. Also used for anonymous memory allocation.

Prot: PROT_READ, PROT_WRITE, PROT_EXEC. Flags: MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, MAP_FIXED.

Errno: EACCES, ENOMEM, EINVAL (bad alignment or length).

Chapter: 39

Process Management

fork

#include <unistd.h>
pid_t fork(void);

Creates a child process. Returns 0 in child, child PID in parent, -1 on error.

Errno: ENOMEM, EAGAIN (process limit).

Chapter: 32

exec family

#include <unistd.h>
int execve(const char *path, char *const argv[], char *const envp[]);
int execvp(const char *file, char *const argv[]);
int execlp(const char *file, const char *arg, ... /* (char *)NULL */);

Replace current process image. Does not return on success.

Errno: ENOENT, EACCES, ENOEXEC (bad format).

Chapter: 33

wait / waitpid

#include <sys/wait.h>
pid_t wait(int *wstatus);
pid_t waitpid(pid_t pid, int *wstatus, int options);

Wait for child process state change. Use WIFEXITED, WEXITSTATUS macros to interpret status.

Options: WNOHANG (don't block), WUNTRACED.

Errno: ECHILD (no children), EINTR.

Chapter: 33

_exit / exit_group

#include <unistd.h>
void _exit(int status);

Terminate process immediately (no atexit handlers, no stdio flush).

Chapter: 32

getpid / getppid

#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);

Get current process ID or parent process ID. Never fails.

Chapter: 32

Signal Handling

sigaction

#include <signal.h>
int sigaction(int signum, const struct sigaction *act,
              struct sigaction *oldact);

Set signal handler with full control over flags and signal mask.

Errno: EINVAL (invalid signal or attempt to handle SIGKILL/SIGSTOP).

Chapter: 35-36

kill

#include <signal.h>
int kill(pid_t pid, int sig);

Send signal to process or process group.

Errno: ESRCH (no such process), EPERM (permission).

Chapter: 35

sigprocmask

#include <signal.h>
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);

Block or unblock signals. how: SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK.

Chapter: 36

signalfd

#include <sys/signalfd.h>
int signalfd(int fd, const sigset_t *mask, int flags);

Receive signals via a file descriptor (pollable).

Errno: EINVAL, ENOMEM.

Chapter: 37

Memory Management

brk / sbrk

#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);

Adjust the program break (heap end). Rarely used directly; malloc calls these internally.

Chapter: 38

mprotect

#include <sys/mman.h>
int mprotect(void *addr, size_t len, int prot);

Change protection on a memory region.

Errno: EACCES, EINVAL, ENOMEM.

Chapter: 39

madvise

#include <sys/mman.h>
int madvise(void *addr, size_t length, int advice);

Advise kernel on memory usage patterns. MADV_SEQUENTIAL, MADV_DONTNEED, MADV_HUGEPAGE.

Chapter: 39

IPC (Inter-Process Communication)

pipe / pipe2

#include <unistd.h>
int pipe(int pipefd[2]);
int pipe2(int pipefd[2], int flags);  /* O_CLOEXEC, O_NONBLOCK */

Create a unidirectional data channel. pipefd[0] for reading, pipefd[1] for writing.

Errno: EMFILE, ENFILE.

Chapter: 34

socketpair

#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sv[2]);

Create a pair of connected sockets (bidirectional pipe).

Chapter: 34

shmget / shmat / shmdt

#include <sys/shm.h>
int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

System V shared memory. Prefer mmap(MAP_SHARED) for new code.

Chapter: 40

Networking

socket

#include <sys/socket.h>
int socket(int domain, int type, int protocol);

Create a socket. Returns a file descriptor.

Domain: AF_INET, AF_INET6, AF_UNIX, AF_NETLINK. Type: SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW.

Errno: EAFNOSUPPORT, EMFILE, ENOMEM.

Chapter: 47

bind

#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Assign address to socket.

Errno: EADDRINUSE, EACCES (privileged port), EINVAL.

Chapter: 47

listen / accept

int listen(int sockfd, int backlog);
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

Listen for connections and accept them.

Errno: EAGAIN (non-blocking), EMFILE, EINTR.

Chapter: 47-48

connect

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Connect to a remote address.

Errno: ECONNREFUSED, ETIMEDOUT, ENETUNREACH, EINPROGRESS (non-blocking).

Chapter: 47

send / recv / sendto / recvfrom

ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src, socklen_t *addrlen);

Send/receive data on sockets. sendto/recvfrom for UDP.

Flags: MSG_DONTWAIT, MSG_PEEK, MSG_NOSIGNAL.

Chapter: 47-48

epoll_create1 / epoll_ctl / epoll_wait

#include <sys/epoll.h>
int epoll_create1(int flags);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Scalable I/O event notification. op: EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL.

Chapter: 49

Device Interaction

ioctl

#include <sys/ioctl.h>
int ioctl(int fd, unsigned long request, ...);

Device-specific control operations.

Errno: ENOTTY (wrong device type), EINVAL (bad request), EFAULT (bad pointer).

Chapter: 54

clone (underlying syscall for pthread_create)

#include <sched.h>
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...);

Create a new process/thread with fine-grained sharing control.

Flags: CLONE_VM, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD.

Chapter: 41

futex

#include <linux/futex.h>
long futex(uint32_t *uaddr, int futex_op, uint32_t val, ...);

Fast user-space locking primitive. Used by pthread mutex implementation.

Operations: FUTEX_WAIT, FUTEX_WAKE.

Chapter: 44

Miscellaneous

getrandom

#include <sys/random.h>
ssize_t getrandom(void *buf, size_t buflen, unsigned int flags);

Cryptographically secure random bytes.

Flags: GRND_NONBLOCK, GRND_RANDOM.

Chapter: Referenced in security discussions

prctl

#include <sys/prctl.h>
int prctl(int option, unsigned long arg2, ...);

Process-specific operations: set name, enable seccomp, control dumpability.

seccomp

#include <linux/seccomp.h>
int seccomp(unsigned int operation, unsigned int flags, void *args);

Restrict available system calls for sandboxing.

Syscall Invocation

All of the above are libc wrapper functions around raw system calls. You can also invoke a system call directly by number:

#include <sys/syscall.h>
#include <unistd.h>

/* Direct syscall (bypassing libc) */
long result = syscall(SYS_write, STDOUT_FILENO, "hello\n", 6);

On x86-64, this becomes:

mov  rax, 1          ; syscall number for write
mov  rdi, 1          ; fd = stdout
lea  rsi, [msg]      ; buffer
mov  rdx, 6          ; count
syscall

Registers: rax = syscall number, rdi = arg1, rsi = arg2, rdx = arg3, r10 = arg4, r8 = arg5, r9 = arg6. Return value in rax.

Appendix D: GDB Quick Reference

GDB is the GNU Debugger. It's the primary tool for finding segfaults, inspecting data structures, and understanding what your C and Rust programs actually do at runtime.

Starting GDB

# Compile with debug symbols
$ gcc -g -O0 -o myapp myapp.c
$ rustc -g -o myapp myapp.rs

# Start GDB
$ gdb ./myapp

# Start with arguments
$ gdb --args ./myapp arg1 arg2

# Attach to running process
$ gdb -p <pid>

# Analyze a core dump
$ gdb ./myapp core

# Start in TUI (text user interface) mode
$ gdb -tui ./myapp

Essential Commands

Running

| Command | Short | Action |
|---|---|---|
| run | r | Start the program |
| run arg1 arg2 | r arg1 arg2 | Start with arguments |
| continue | c | Continue after breakpoint |
| kill | | Kill the running program |
| quit | q | Exit GDB |

Breakpoints

| Command | Short | Action |
|---|---|---|
| break main | b main | Break at function |
| break file.c:42 | b file.c:42 | Break at line |
| break *0x4005a0 | b *0x4005a0 | Break at address |
| break func if x > 10 | | Conditional breakpoint |
| tbreak main | tb main | Temporary (one-shot) breakpoint |
| info breakpoints | i b | List breakpoints |
| delete 1 | d 1 | Delete breakpoint #1 |
| delete | d | Delete all breakpoints |
| disable 1 | dis 1 | Disable breakpoint #1 |
| enable 1 | en 1 | Enable breakpoint #1 |

Stepping

| Command | Short | Action |
|---|---|---|
| next | n | Step over (next line) |
| step | s | Step into function |
| finish | fin | Run until current function returns |
| until 50 | u 50 | Run until line 50 |
| nexti | ni | Step one instruction (over calls) |
| stepi | si | Step one instruction (into calls) |

Examining Variables

| Command | Action |
|---|---|
| print x | Print variable x |
| print *ptr | Dereference pointer |
| print arr[5] | Array element |
| print sizeof(x) | Size of variable |
| print/x val | Print in hex |
| print/t val | Print in binary |
| print/d val | Print as decimal |
| print/c val | Print as character |
| print (struct foo *)ptr | Cast and print |
| display x | Print x every time we stop |
| undisplay 1 | Remove display #1 |
| info locals | All local variables |
| info args | Function arguments |
| ptype var | Show type of variable |
| whatis var | Short type description |

Examining Memory

x/FMT ADDRESS

Format: x/[count][format][size]
  count:  number of items
  format: x (hex), d (decimal), u (unsigned), o (octal),
          t (binary), c (char), s (string), i (instruction)
  size:   b (byte), h (halfword=2), w (word=4), g (giant=8)

| Command | Action |
|---|---|
| x/16xb ptr | 16 bytes in hex |
| x/4xw ptr | 4 words (32-bit) in hex |
| x/s str | Print as C string |
| x/10i $pc | 10 instructions at PC |
| x/gx &var | 8-byte value in hex |

Example session:

(gdb) x/32xb buffer
0x7fffffffe400: 0x48 0x65 0x6c 0x6c 0x6f 0x00 0x00 0x00
0x7fffffffe408: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe410: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe418: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Watchpoints

Watchpoints stop when a variable's value changes.

| Command | Action |
|---|---|
| watch x | Break when x changes |
| watch *0x601040 | Break when memory at address changes |
| rwatch x | Break when x is read |
| awatch x | Break on read or write |
| info watchpoints | List watchpoints |

Watchpoints are hardware-assisted (limited number, typically 4) or software-assisted (very slow).

Backtrace and Stack

| Command | Short | Action |
|---|---|---|
| backtrace | bt | Show call stack |
| backtrace full | bt full | Stack with local vars |
| frame 3 | f 3 | Select stack frame #3 |
| up | | Move up one frame |
| down | | Move down one frame |
| info frame | i f | Detailed frame info |
| info stack | i s | Stack summary |

Threads

| Command | Action |
|---|---|
| info threads | List all threads |
| thread 2 | Switch to thread 2 |
| thread apply all bt | Backtrace all threads |
| thread apply all print var | Print var in all threads |
| set scheduler-locking on | Only run current thread |
| set scheduler-locking off | Run all threads |

TUI Mode

TUI (Text User Interface) shows source code alongside the command line.

# Start in TUI mode
$ gdb -tui ./myapp

# Toggle TUI in running session
(gdb) tui enable
(gdb) tui disable

# Or press Ctrl+X then A to toggle

# Switch layouts
(gdb) layout src      # source code
(gdb) layout asm      # assembly
(gdb) layout split    # source + assembly
(gdb) layout regs     # registers

Refresh the screen if it gets corrupted: Ctrl+L

Common Workflows

Finding a Segfault

$ gcc -g -O0 -o buggy buggy.c
$ gdb ./buggy
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400556 in process_data (ptr=0x0) at buggy.c:15
15          *ptr = 42;
(gdb) bt
#0  0x0000000000400556 in process_data (ptr=0x0) at buggy.c:15
#1  0x0000000000400589 in main () at buggy.c:22
(gdb) print ptr
$1 = (int *) 0x0
(gdb) frame 1
#1  0x0000000000400589 in main () at buggy.c:22
(gdb) info locals
data = 0x0

The backtrace tells you: process_data was called with a NULL pointer from main at line 22.

Inspecting a Linked List

(gdb) print *head
$1 = {value = 10, next = 0x602040}
(gdb) print *head->next
$2 = {value = 20, next = 0x602060}
(gdb) print *head->next->next
$3 = {value = 30, next = 0x0}

Or use a loop:

(gdb) set $node = head
(gdb) while $node
 > print $node->value
 > set $node = $node->next
 > end
$4 = 10
$5 = 20
$6 = 30

Debugging Multi-Threaded Programs

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f... "main" in main () at server.c:45
  2    Thread 0x7f... "worker" in handle_client () at server.c:28
  3    Thread 0x7f... "worker" in recv () from /lib/...

(gdb) thread 2
(gdb) bt
#0  handle_client (arg=0x602010) at server.c:28
#1  0x00007ffff... in start_thread ()

(gdb) thread apply all bt

Finding Memory Corruption

Use watchpoints to find what's overwriting a variable:

(gdb) break main
(gdb) run
(gdb) watch my_variable
(gdb) continue
Hardware watchpoint 2: my_variable

Old value = 42
New value = 0
0x000000000040066a in corrupt_function () at bug.c:33

Examining struct Layout

(gdb) ptype struct sockaddr_in
type = struct sockaddr_in {
    sa_family_t sin_family;
    in_port_t sin_port;
    struct in_addr sin_addr;
    unsigned char sin_zero[8];
}
(gdb) print sizeof(struct sockaddr_in)
$1 = 16

Rust-Specific GDB Tips

Compiling Rust for GDB

# Debug build (default, includes symbols)
$ cargo build
$ gdb ./target/debug/myapp

# Release with debug info
# In Cargo.toml:
# [profile.release]
# debug = true
$ cargo build --release
$ gdb ./target/release/myapp

Rust Type Names in GDB

Rust types appear with their full path:

(gdb) ptype v
type = alloc::vec::Vec<i32, alloc::alloc::Global>

Printing Rust Types

(gdb) print v
$1 = Vec(size=3) = {1, 2, 3}

(gdb) print s
$2 = "hello world"

(gdb) print opt
$3 = core::option::Option<i32>::Some(42)

(gdb) print result
$4 = core::result::Result<i32, ...>::Ok(10)

GDB has pretty-printers for common Rust types (Vec, String, Option, Result, HashMap) when Rust's GDB extensions are installed.

Rust Mangled Names

Rust function names are mangled. Use:

(gdb) break myapp::main
(gdb) break myapp::module::function_name

Or with tab completion:

(gdb) break myapp::<TAB><TAB>

Printing Enum Variants

(gdb) print my_enum
$1 = myapp::State::Running(42)

Unwinding Through Panics

(gdb) break rust_panic
(gdb) run
(gdb) bt
#0  std::panicking::begin_panic ()
#1  myapp::risky_function () at src/main.rs:15
#2  myapp::main () at src/main.rs:8

GDB Configuration

Put frequently used settings in ~/.gdbinit:

# ~/.gdbinit
set print pretty on
set print array on
set pagination off
set history save on
set history filename ~/.gdb_history
set history size 10000
set disassembly-flavor intel

# Rust pretty-printers (path varies by installation)
# python
# import gdb
# gdb.execute('source /path/to/rust-gdb-pretty-printers.py')
# end

Quick Reference Card

+------------------+-------+-------------------------------+
| Action           | Short | Full Command                  |
+------------------+-------+-------------------------------+
| Run              | r     | run [args]                    |
| Break            | b     | break location                |
| Continue         | c     | continue                      |
| Step over        | n     | next                          |
| Step into        | s     | step                          |
| Finish function  | fin   | finish                        |
| Print variable   | p     | print expression              |
| Examine memory   |       | x/FMT address                 |
| Backtrace        | bt    | backtrace                     |
| List source      | l     | list                          |
| Info breakpoints | i b   | info breakpoints              |
| Info locals      | i lo  | info locals                   |
| Info threads     | i th  | info threads                  |
| Quit             | q     | quit                          |
+------------------+-------+-------------------------------+

Appendix E: Glossary

Definitions of key terms used throughout this book.


ABI (Application Binary Interface) The low-level contract between separately compiled code: calling conventions, register usage, struct layout, name mangling. If two object files follow the same ABI, they can be linked together. C has a stable ABI on most platforms; Rust does not (use extern "C" for FFI).

Alignment The requirement that data sit at memory addresses that are multiples of a certain value (typically the type's size). A uint32_t usually requires 4-byte alignment. Misaligned access is undefined behavior in C; at the hardware level it faults on some architectures and merely costs performance on others.

Arena Allocator A memory allocator that bumps a pointer forward for each allocation and frees all allocations at once. Zero fragmentation, zero per-object overhead, very fast. Useful for parsers, request handlers, and game loops. See Chapter 51.

Async (Asynchronous I/O) A programming model where I/O operations don't block the calling thread. Instead, the program registers interest in events and is notified when they complete. Linux provides epoll, io_uring, and aio for async I/O. Rust's async/await builds on epoll via runtimes like tokio.

Atomic Operation An operation that completes indivisibly -- no other thread can see it half-done. Used for lock-free synchronization. C provides <stdatomic.h>; Rust provides std::sync::atomic. See Chapter 52.

Borrow Checker Rust's compile-time system that enforces ownership rules: one mutable reference or any number of shared references, but not both simultaneously. Prevents data races and use-after-free at compile time.

Callback A function pointer passed to another function, to be called later. Used extensively in C APIs (signal handlers, thread start routines, comparators for qsort). In Rust, closures serve the same purpose with type safety.

Condition Variable A synchronization primitive that allows threads to wait until a condition becomes true. Always used with a mutex. C: pthread_cond_t. Rust: std::sync::Condvar. See Chapter 45.

Container_of A C macro (used extensively in the Linux kernel) that takes a pointer to a struct member and returns a pointer to the containing struct. Relies on offsetof. Essential for intrusive data structures like list_head.

Copy Semantics In C, assignment copies all bytes of a struct (shallow copy). In Rust, types that implement the Copy trait are copied on assignment; other types are moved (ownership transfer). See Chapter 9.

Daemon A background process with no controlling terminal. Created by forking, calling setsid, and closing stdin/stdout/stderr. System services (sshd, nginx, systemd) are daemons. See Chapter 34.

Data Race Two or more threads access the same memory location concurrently, at least one is a write, and there's no synchronization. Undefined behavior in both C and Rust. Rust's type system prevents data races at compile time.

Deadlock A situation where two or more threads are each waiting for the other to release a resource. Thread A holds lock 1 and waits for lock 2; Thread B holds lock 2 and waits for lock 1. Neither can proceed.

DMA (Direct Memory Access) Hardware that transfers data between devices and memory without CPU involvement. Kernel drivers set up DMA buffers and descriptors; the hardware reads/writes directly. Requires careful cache management and memory alignment.

Endianness Byte order within multi-byte values. Big-endian: most significant byte first (network byte order). Little-endian: least significant byte first (x86, ARM default). Use htons/ntohs for conversion. See Chapter 11.

epoll Linux's scalable I/O event notification mechanism. Monitors many file descriptors efficiently with O(1) per event. Three syscalls: epoll_create1, epoll_ctl, epoll_wait. See Chapter 49.

errno A thread-local integer set by system calls and library functions on failure. Check it immediately after a call returns an error. Reset it before calling functions that may or may not set it. See Appendix A.

File Descriptor An integer index into the kernel's per-process table of open files, sockets, pipes, and devices. 0 = stdin, 1 = stdout, 2 = stderr. The fundamental I/O abstraction in Unix. See Chapter 28.

Fork The fork() syscall creates a new process by duplicating the calling process. The child gets a copy of the parent's memory (via copy-on-write). Returns 0 in the child, the child's PID in the parent. See Chapter 32.

Futex (Fast Userspace Mutex) A Linux-specific synchronization primitive. Uncontended operations stay in user space (fast). Contended operations trap to the kernel to sleep. The building block for pthread_mutex_t. See Chapter 44.

GFP Flags Kernel memory allocation flags (GFP_KERNEL, GFP_ATOMIC, etc.) that tell the allocator what context the allocation occurs in. Using the wrong flag (e.g., GFP_KERNEL in interrupt context) causes deadlocks.

Inode The kernel data structure that represents a file on disk. Contains metadata (permissions, timestamps, size, block pointers) but not the filename. Multiple filenames (hard links) can point to the same inode.

ioctl (I/O Control) A catch-all system call for device-specific operations that don't fit the read/write model. Takes a file descriptor, a request number, and an optional argument. See Chapter 54.

IPC (Inter-Process Communication) Mechanisms for processes to exchange data: pipes, FIFOs, Unix domain sockets, shared memory, message queues, signals, netlink. See Chapters 34, 40.

Lifetime (Rust) A compile-time annotation that tracks how long a reference is valid. Written as 'a in type signatures. The borrow checker uses lifetimes to prevent dangling references. No runtime cost.

mmap (Memory Map) The mmap() syscall maps files or devices into a process's address space. Also used for anonymous memory allocation and shared memory between processes. See Chapter 39.

Move Semantics In Rust, assigning a value to a new variable transfers ownership. The original variable becomes invalid. Prevents double-free and use-after-free. Types that implement Copy are exempt (they're bitwise copied instead).

Mutex (Mutual Exclusion) A synchronization primitive that ensures only one thread accesses a critical section at a time. C: pthread_mutex_t. Rust: std::sync::Mutex<T> (data is inside the mutex, enforced by the type system). See Chapter 44.

Netlink A socket-based IPC mechanism between the Linux kernel and user-space processes. Used for network configuration, device events, and other kernel communication. See Chapter 55.

Opaque Type A type whose internal layout is hidden from users. In C, declared as a forward struct declaration with access only through functions. In Rust, achieved with module visibility. See Chapter 20.

Ownership (Rust) Rust's core memory management concept: every value has exactly one owner. When the owner goes out of scope, the value is dropped (freed). Ownership can be transferred (moved) or temporarily lent (borrowed).

POSIX (Portable Operating System Interface) The IEEE standard defining the API for Unix-like operating systems. Covers system calls, shell utilities, and C library functions. Linux is "mostly POSIX-compliant."

Race Condition A bug where the program's behavior depends on the timing of events (typically thread scheduling). Includes data races but also higher-level logic races (TOCTOU: time-of-check-to-time-of-use).

RAII (Resource Acquisition Is Initialization) A pattern where resources (memory, files, locks) are tied to object lifetime. Acquired in the constructor, released in the destructor. C++ and Rust use this heavily. C requires manual cleanup (often with goto chains).

Reactor Pattern An event-driven design where a central event loop waits for I/O events and dispatches them to handlers. Used by epoll-based servers, tokio, and most high-performance network servers. See Chapter 49.

Semaphore A synchronization primitive with a counter. wait() decrements (blocking if zero); post() increments (waking a waiter). Binary semaphore acts like a mutex. Counting semaphore limits concurrent access. C: sem_t. See Chapter 45.

Signal An asynchronous notification sent to a process. Examples: SIGTERM (terminate), SIGSEGV (segfault), SIGINT (Ctrl+C), SIGCHLD (child exited). Handled by signal handlers or signalfd. See Chapters 35-37.

Slab Allocator The Linux kernel's pool allocator for fixed-size objects. Uses kmem_cache structures. Pre-allocates pages, divides them into same-size slots, and maintains free lists. The kernel equivalent of the pool allocator in Chapter 51.

Socket An endpoint for network communication. Identified by a file descriptor. Types: stream (TCP), datagram (UDP), raw, Unix domain. See Chapters 47-49.

Spinlock A lock that busy-waits (spins) instead of sleeping. Appropriate when the expected hold time is very short and sleeping would be more expensive than spinning. Used in the kernel for interrupt-context synchronization.

Syscall (System Call) The interface between user-space programs and the kernel. Invoked via the syscall instruction (x86-64). Each has a number (e.g., write = 1 on x86-64). libc wraps syscalls in C functions. See Appendix C.

Thread-Safe Code that can be called simultaneously from multiple threads without data corruption. Achieved through synchronization (mutexes, atomics) or by avoiding shared mutable state.

TOCTOU (Time-of-Check-to-Time-of-Use) A race condition where the state checked by a program changes before the program acts on it. Example: checking file permissions with access(), then opening the file -- an attacker can swap the file in between.

Trait (Rust) An interface definition. Like a C vtable or a Java interface, but resolved at compile time (static dispatch) or runtime (dynamic dispatch with dyn Trait). Key traits: Read, Write, Iterator, Display, Debug, Clone, Copy.

Undefined Behavior (UB) Code whose behavior is not defined by the language standard. The compiler may do anything: crash, produce wrong results, appear to work, or format your hard drive. Common C causes: null dereference, buffer overflow, signed integer overflow, use-after-free, data races.

Volatile A C qualifier that prevents the compiler from optimizing away or reordering accesses to a variable. Used for memory-mapped I/O registers. Does NOT provide atomicity or prevent CPU reordering. Not the same as atomic.

VFS (Virtual File System) The kernel's abstraction layer that provides a uniform file interface over different filesystems (ext4, procfs, sysfs, tmpfs). All file operations go through VFS, which dispatches to the specific filesystem's file_operations.

Vtable (Virtual Function Table) A table of function pointers used for runtime polymorphism. In C, built manually as a struct of function pointers. In Rust, generated automatically for dyn Trait objects. In the kernel, file_operations and platform_driver are vtables.

Zero-Copy Transferring data without copying it between buffers. Techniques: sendfile, splice, mmap, io_uring, and in-place parsing. Eliminates CPU and cache overhead of memcpy. See Chapter 52.