C, Rust, and Linux Systems Programming
Learn C and Rust together while becoming a Linux systems programmer.
This book teaches C and Rust side by side — C first at each concept, then the Rust equivalent — while building you into a capable Linux user-space systems programmer. You'll go from writing your first hello world in both languages to building event-driven servers, manipulating bits in hardware registers, and talking directly to the kernel.
Who This Book Is For
You understand programming concepts — recursion, iteration, functions, data structures — but you haven't written C or Rust. You want to become a Linux systems programmer and you're preparing for protocol implementations and device driver work.
What You'll Learn
- C and Rust fundamentals — types, control flow, functions, structs, enums
- Pointers, memory, and ownership — from raw pointers to Rust's borrow checker
- Bit-level programming — bitwise operations, masks, alignment, endianness, volatile access
- Advanced patterns — data structures, generics, function pointers, state machines, error handling
- The build pipeline — compilation stages, Make, CMake, Cargo, libraries, cross-compilation
- Linux system programming — file descriptors, processes, signals, threads, IPC, networking
- Performance — optimization, memory pools, zero-copy, atomics
- The user-kernel boundary — /proc, /sys, ioctl, netlink, preparing for kernel space
How to Read This Book
Each chapter follows the same rhythm:
- Brief setup — what and why (not a lecture)
- C code — annotated, runnable, minimal
- Try it — modify something, see what happens
- Rust equivalent — what's the same, what's different
- Diagrams where they help
- Knowledge check — test your understanding
- Pitfalls — short, punchy list
- Move on
Every code snippet compiles and runs. Learn by doing.
Relationship to "How Programs Really Run"
That book explains the machine — CPU architecture, memory hierarchy, ELF format, virtual memory. It assumes C/Rust knowledge.
This book teaches C and Rust and focuses on programming the machine — system calls, IPC, signals, networking, bit-level manipulation.
They're complementary: read "How Programs Really Run" to understand what's underneath, read this book to learn to program with it.
Hello from C, Hello from Rust
Every systems programmer's journey starts the same way: make the machine say something. In this chapter you will write, compile, and run your first program in both C and Rust, and you will see how the two languages differ before a single line of logic appears.
Your First C Program
Create a file called hello.c:
/* hello.c -- the smallest useful C program */
#include <stdio.h>
int main(void)
{
printf("Hello from C!\n");
return 0;
}
Compile and run it:
$ gcc -Wall -o hello hello.c
$ ./hello
Hello from C!
Let us walk through every piece.
#include <stdio.h> -- This is a preprocessor directive. Before the compiler ever
sees your code, a separate tool (the C preprocessor) pastes the entire contents of
stdio.h into your file. That header declares printf and hundreds of other I/O
functions. Without it, the compiler does not know what printf is.
int main(void) -- The entry point. The operating system's C runtime calls main
after setting up the process. It returns int because the OS expects an exit code.
void in the parameter list means "no arguments" (in C, empty parentheses mean
"unspecified arguments", which is different).
printf("Hello from C!\n") -- Writes a string to standard output. The \n is a
newline character. printf is a variadic function; it accepts a format string followed
by zero or more arguments. We will use it heavily.
return 0; -- Exit code 0 means success. Any non-zero value signals an error.
The shell stores this value in $?.
$ ./hello
Hello from C!
$ echo $?
0
The gcc flags you should always use
| Flag | Purpose |
|---|---|
| `-Wall` | Enable most warnings |
| `-Wextra` | Enable even more warnings |
| `-std=c17` | Use the C17 standard |
| `-pedantic` | Reject non-standard extensions |
| `-o name` | Name the output binary |
A solid default:
$ gcc -Wall -Wextra -std=c17 -pedantic -o hello hello.c
Driver Prep: Kernel modules are compiled with an even stricter set of warnings. Getting comfortable with `-Wall -Wextra` now saves pain later.
Try It: Change the `return 0;` to `return 42;`. Recompile, run, then check `echo $?`. What do you see?
Your First Rust Program
Create a file called hello.rs:
// hello.rs -- the smallest useful Rust program
fn main() {
    println!("Hello from Rust!");
}
Compile and run it:
$ rustc hello.rs
$ ./hello
Hello from Rust!
fn main() -- Rust's entry point. No return type is written because main
implicitly returns () (the unit type, similar to void). No header includes, no
preprocessor. The compiler already knows about println!.
println!("Hello from Rust!") -- The ! marks this as a macro, not a function.
Macros in Rust are expanded at compile time. println! handles formatting, type
checking of arguments, and writes to stdout with an appended newline.
There is no explicit return 0. Rust's main returns exit code 0 on success
automatically. If you want to return a custom exit code:
// hello_exit.rs -- returning a custom exit code
use std::process::ExitCode;

fn main() -> ExitCode {
    println!("Hello from Rust!");
    ExitCode::from(0)
}
Rust Note: Rust does not have a preprocessor. There are no `#include` directives. Modules, `use` statements, and the compiler's built-in knowledge of the standard library replace that entire mechanism.
The Compilation Model
C and Rust compile your source code down to native machine code, but the journey is different.
C compilation pipeline
+-------------+
hello.c ----->| Preprocessor|----> hello.i (expanded source)
+-------------+
|
+-------------+
| Compiler |----> hello.s (assembly)
+-------------+
|
+-------------+
| Assembler |----> hello.o (object file)
+-------------+
|
+-------------+
| Linker |----> hello (executable)
+-------------+
You can see each stage:
$ gcc -E hello.c -o hello.i # preprocess only
$ gcc -S hello.c -o hello.s # compile to assembly
$ gcc -c hello.c -o hello.o # assemble to object file
$ gcc hello.o -o hello # link
Rust compilation pipeline
+-----------+
hello.rs ----->| rustc |----> hello (executable)
| (frontend |
| + LLVM |
| backend) |
+-----------+
rustc handles everything in one invocation. Internally it parses, type-checks,
performs borrow checking, generates LLVM IR, and invokes LLVM to produce machine code.
There is no separate preprocessor or linker step visible to the user (though a linker
is invoked behind the scenes).
Try It: Run `gcc -S hello.c` and open `hello.s`. Find the `call` instruction that invokes `printf`. On x86-64 Linux it will look something like `call printf@PLT`.
Cargo: Rust's Build System
For anything beyond a single file, Rust programmers use Cargo.
$ cargo new hello_project
Created binary (application) `hello_project` package
$ cd hello_project
$ tree .
.
├── Cargo.toml
└── src
└── main.rs
src/main.rs already contains:
fn main() {
    println!("Hello, world!");
}
Build and run:
$ cargo build
Compiling hello_project v0.1.0
Finished dev [unoptimized + debuginfo] target(s)
$ cargo run
Hello, world!
| Cargo command | Purpose |
|---|---|
| `cargo new name` | Create a new project |
| `cargo build` | Compile (debug mode) |
| `cargo build --release` | Compile with optimizations |
| `cargo run` | Build and run |
| `cargo check` | Type-check without producing a binary |
C has no official build system. Projects use Makefiles, CMake, Meson, or plain shell scripts. Here is a minimal Makefile for our hello program:
# Makefile
CC = gcc
CFLAGS = -Wall -Wextra -std=c17 -pedantic
hello: hello.c
	$(CC) $(CFLAGS) -o hello hello.c

clean:
	rm -f hello
$ make
gcc -Wall -Wextra -std=c17 -pedantic -o hello hello.c
$ make clean
rm -f hello
Driver Prep: The Linux kernel uses its own Kbuild Makefile system. Understanding basic Make targets (`all`, `clean`, `modules`) is essential for kernel module work.
printf vs println!
The two are deceptively similar but work very differently under the hood.
C: printf
/* printf_demo.c */
#include <stdio.h>
int main(void)
{
int x = 42;
double pi = 3.14159;
char ch = 'A';
printf("integer: %d\n", x);
printf("float: %.2f\n", pi);
printf("char: %c\n", ch);
printf("hex: 0x%08x\n", x);
return 0;
}
$ gcc -Wall -o printf_demo printf_demo.c && ./printf_demo
integer: 42
float: 3.14
char: A
hex: 0x0000002a
printf format specifiers: %d (int), %f (double), %c (char), %s (string),
%x (hex), %p (pointer), %zu (size_t). Use the wrong one and you get undefined
behavior -- the compiler may warn you, but it is not required to.
Caution: Passing the wrong type to `printf` is undefined behavior. For example, `printf("%d\n", 3.14)` may print garbage. The compiler cannot always catch this because `printf` is a variadic function with no type information in its signature.
Rust: println!
// println_demo.rs
fn main() {
    let x: i32 = 42;
    let pi: f64 = 3.14159;
    let ch: char = 'A';
    println!("integer: {}", x);
    println!("float: {:.2}", pi);
    println!("char: {}", ch);
    println!("hex: {:#010x}", x);
}
$ rustc println_demo.rs && ./println_demo
integer: 42
float: 3.14
char: A
hex: 0x0000002a
println! uses {} as the default placeholder. Formatting traits (Display, Debug)
determine how a type is printed. The compiler checks at compile time that every argument
matches a placeholder and implements the required trait.
Rust Note: You cannot pass the wrong type to `println!`. It is a compile-time error, not undefined behavior. The macro expands into code that the type checker validates before any binary is produced.
Return Codes and Error Signaling
Both languages use the process exit code to signal success or failure to the OS.
/* exit_codes.c */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "Usage: %s <name>\n", argv[0]);
return EXIT_FAILURE; /* defined as 1 in stdlib.h */
}
printf("Hello, %s!\n", argv[1]);
return EXIT_SUCCESS; /* defined as 0 */
}
$ gcc -Wall -o exit_codes exit_codes.c
$ ./exit_codes
Usage: ./exit_codes <name>
$ echo $?
1
$ ./exit_codes Alice
Hello, Alice!
$ echo $?
0
The Rust equivalent:
// exit_codes.rs
use std::env;
use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        eprintln!("Usage: {} <name>", args[0]);
        process::exit(1);
    }
    println!("Hello, {}!", args[1]);
}
$ rustc exit_codes.rs
$ ./exit_codes
Usage: ./exit_codes <name>
$ echo $?
1
$ ./exit_codes Alice
Hello, Alice!
$ echo $?
0
eprintln! writes to stderr, just like fprintf(stderr, ...) in C.
The Edit-Compile-Run Cycle
Both languages follow the same workflow:
+------+ +---------+ +-----+
| Edit | ---> | Compile | ---> | Run |
+------+ +---------+ +-----+
^ |
| (fix bugs) |
+------------------------------+
In C the cycle is: edit hello.c, run gcc, run ./hello.
In Rust the cycle is: edit src/main.rs, run cargo run (which compiles and runs).
Rust's cargo check lets you skip code generation entirely when you only want to see
if your code type-checks. This is faster than a full build and useful during
development.
$ cargo check
Checking hello_project v0.1.0
Finished dev [unoptimized + debuginfo] target(s)
Multiple Source Files
A real C project splits code across files. Here is a minimal two-file example.
/* greet.h */
#ifndef GREET_H
#define GREET_H
void greet(const char *name);
#endif
/* greet.c */
#include <stdio.h>
#include "greet.h"
void greet(const char *name)
{
printf("Hello, %s!\n", name);
}
/* main.c */
#include "greet.h"
int main(void)
{
greet("world");
return 0;
}
$ gcc -Wall -c greet.c -o greet.o
$ gcc -Wall -c main.c -o main.o
$ gcc greet.o main.o -o hello
$ ./hello
Hello, world!
In Rust, you create a module:
// src/greet.rs
pub fn greet(name: &str) {
    println!("Hello, {}!", name);
}
// src/main.rs
mod greet;

fn main() {
    greet::greet("world");
}
$ cargo run
Hello, world!
No header files. No include guards. No separate compilation step. Cargo handles it.
Driver Prep: Kernel modules in C use header files extensively. The kernel headers (`linux/module.h`, `linux/kernel.h`, etc.) declare the interfaces you will call. Understanding `#include` and header guards is not optional.
Quick Knowledge Check
- What does `return 0;` in C's `main` tell the operating system?
- Why does `println!` have an exclamation mark?
- What gcc flag enables most compiler warnings?
Common Pitfalls
- Forgetting `\n` in `printf`. Output may not appear until the buffer flushes. `println!` adds the newline automatically.
- Empty parentheses in C. `int main()` means "unspecified parameters", not "no parameters". Write `int main(void)` to mean "no parameters".
- Using `rustc` for multi-file projects. Use `cargo` instead; `rustc` is best reserved for single-file experiments.
- Ignoring compiler warnings. Both `gcc -Wall` and `rustc` produce warnings for a reason. Treat warnings as errors during learning (`-Werror` in gcc, `#![deny(warnings)]` in Rust).
- Mixing up `printf` format specifiers. `%d` for `int`, `%ld` for `long`, `%zu` for `size_t`. Getting them wrong is undefined behavior in C.
Types and Variables
Every value in a running program occupies bytes in memory. C and Rust both force you to think about types, but Rust does it with stricter rules and stronger guarantees. This chapter maps the C type system onto Rust's so you can translate between them without hesitation.
Integer Types in C
C gives you a menu of integer types whose exact sizes are platform-dependent.
/* int_types.c */
#include <stdio.h>
#include <stdint.h>
int main(void)
{
char c = 'A';
short s = 32000;
int i = 2000000000;
long l = 2000000000L;
long long ll = 9000000000000000000LL;
unsigned int u = 4000000000U;
printf("char: %d (size: %zu)\n", c, sizeof(c));
printf("short: %d (size: %zu)\n", s, sizeof(s));
printf("int: %d (size: %zu)\n", i, sizeof(i));
printf("long: %ld (size: %zu)\n", l, sizeof(l));
printf("long long: %lld (size: %zu)\n", ll, sizeof(ll));
printf("unsigned: %u (size: %zu)\n", u, sizeof(u));
return 0;
}
$ gcc -Wall -o int_types int_types.c && ./int_types
char: 65 (size: 1)
short: 32000 (size: 2)
int: 2000000000 (size: 4)
long: 2000000000 (size: 8)
long long: 9000000000000000000 (size: 8)
unsigned: 4000000000 (size: 4)
The C standard only guarantees minimum sizes. int is at least 16 bits, long at
least 32, long long at least 64. For exact-width integers, use <stdint.h>:
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t,
uint64_t. For object sizes, use size_t (unsigned, pointer-width).
Driver Prep: The Linux kernel uses fixed-width types extensively: `u8`, `u16`, `u32`, `u64`, `s8`, `s16`, `s32`, `s64`. These map directly to the `<stdint.h>` types. Always prefer fixed-width types in systems code.
Integer Types in Rust
Rust makes the bit width part of the type name. No ambiguity.
// int_types.rs
fn main() {
    let a: i8 = -128;
    let b: u8 = 255;
    let c: i16 = -32_768;
    let d: u16 = 65_535;
    let e: i32 = -2_147_483_647;
    let f: u32 = 4_294_967_295;
    let g: i64 = -9_223_372_036_854_775_807;
    let h: u64 = 18_446_744_073_709_551_615;
    let s: usize = 1024;
    println!("i8: {}", a);
    println!("u8: {}", b);
    println!("i32: {}", e);
    println!("u64: {}", h);
    println!("usize: {} (size: {} bytes)", s, std::mem::size_of::<usize>());
}
Note the underscores in numeric literals (65_535). Rust allows them for readability.
Type size comparison
+------------+----------+------------------+
| C type | Rust | Size (bytes) |
+------------+----------+------------------+
| char | i8 / u8 | 1 |
| short | i16 | 2 |
| int | i32 | 4 |
| long | i64* | 8 (on LP64) |
| long long | i64 | 8 |
| size_t | usize | pointer-width |
| ptrdiff_t | isize | pointer-width |
+------------+----------+------------------+
* C long is 4 bytes on Windows (LLP64), 8 on Linux (LP64).
Rust i64 is always 8 bytes.
Rust Note: Rust has no type whose size varies by platform except `usize` and `isize`, which are always pointer-width. Everything else is fixed. This eliminates an entire class of portability bugs.
Try It: In both C and Rust, print `sizeof(long)` / `size_of::<i64>()` and confirm the sizes on your machine.
Floating-Point Types
C provides float (32-bit) and double (64-bit). Rust provides f32 and f64.
/* floats.c */
#include <stdio.h>
int main(void)
{
float f = 3.14f;
double d = 3.141592653589793;
printf("float: %.7f (size: %zu)\n", f, sizeof(f));
printf("double: %.15f (size: %zu)\n", d, sizeof(d));
return 0;
}
// floats.rs
fn main() {
    let f: f32 = 3.14;
    let d: f64 = 3.141592653589793;
    println!("f32: {:.7} (size: {})", f, std::mem::size_of::<f32>());
    println!("f64: {:.15} (size: {})", d, std::mem::size_of::<f64>());
}
Both follow IEEE 754. The behavior is identical at the bit level.
Characters
This is where C and Rust diverge sharply.
C: char is one byte
/* chars_c.c */
#include <stdio.h>
int main(void)
{
char c = 'A';
printf("char: %c, value: %d, size: %zu\n", c, c, sizeof(c));
return 0;
}
C's char is a single byte. It holds ASCII values (0-127). Whether char is signed
or unsigned is implementation-defined.
Rust: char is four bytes (a Unicode scalar value)
// chars_rust.rs
fn main() {
    let c: char = 'A';
    let heart: char = '\u{2764}';
    let kanji: char = '\u{6F22}';
    println!("char: {}, value: {}, size: {}", c, c as u32, std::mem::size_of::<char>());
    println!("heart: {}, value: U+{:04X}", heart, heart as u32);
    println!("kanji: {}, value: U+{:04X}", kanji, kanji as u32);
}
Rust Note: Rust's `char` is a Unicode scalar value and always occupies 4 bytes. This is fundamentally different from C's 1-byte `char`. In Rust, `u8` is the equivalent of C's `char` when you want a raw byte.
C char 'A':
+----+
| 41 | 1 byte
+----+
Rust char 'A':
+----+----+----+----+
| 41 | 00 | 00 | 00 | 4 bytes (little-endian, Unicode scalar)
+----+----+----+----+
Booleans
C: Booleans are integers
/* bools_c.c */
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
bool a = true;
bool b = false;
int n = 42;
printf("true: %d (size: %zu)\n", a, sizeof(a));
printf("false: %d (size: %zu)\n", b, sizeof(b));
if (n) {
printf("%d is truthy in C\n", n);
}
return 0;
}
In C, true is 1, false is 0, and any integer can be used where a boolean is
expected. Zero is false; everything else is true.
Rust: bool is a distinct type
// bools_rust.rs
fn main() {
    let a: bool = true;
    let b: bool = false;
    let n: i32 = 42;
    println!("true: {} (size: {})", a, std::mem::size_of::<bool>());
    println!("false: {} (size: {})", b, std::mem::size_of::<bool>());
    // if n { } // ERROR: expected `bool`, found `i32`
    if n != 0 {
        println!("{} is non-zero", n);
    }
}
Caution: C's implicit integer-to-boolean conversion is a source of bugs. `if (x = 0)` (assignment, not comparison) evaluates to false and silently succeeds. Rust rejects this at compile time.
Constants
C: const and #define
/* constants_c.c */
#include <stdio.h>
#define MAX_BUFFER 1024
static const int MAX_RETRIES = 5;
int main(void)
{
printf("buffer size: %d\n", MAX_BUFFER);
printf("max retries: %d\n", MAX_RETRIES);
return 0;
}
#define performs textual substitution -- no type, no scope, no address.
const creates a typed, scoped value.
Rust: const and static
// constants_rust.rs
const MAX_BUFFER: usize = 1024; // compile-time constant, inlined
static MAX_RETRIES: i32 = 5;    // fixed address in memory

fn main() {
    println!("buffer size: {}", MAX_BUFFER);
    println!("max retries: {}", MAX_RETRIES);
}
| Keyword | Compile-time? | Has address? | Mutable? |
|---|---|---|---|
| C `#define` | Preprocessor | No | N/A |
| C `const` | No | Yes | No |
| Rust `const` | Yes | No (inlined) | No |
| Rust `static` | No | Yes | No* |
(*) static mut exists but is unsafe to access.
Rust Note: Rust's `const` is evaluated at compile time and inlined at every use site. Rust's `static` lives at a fixed address for the entire program lifetime.
Variable Declaration and Mutability
C: mutable by default
/* mutability_c.c */
#include <stdio.h>
int main(void)
{
int x = 10;
x = 20; /* fine -- variables are mutable by default */
const int y = 30;
/* y = 40; */ /* error: assignment of read-only variable */
printf("x = %d, y = %d\n", x, y);
return 0;
}
Rust: immutable by default
// mutability_rust.rs
fn main() {
    let x = 10;
    // x = 20; // error: cannot assign twice to immutable variable
    let mut y = 30;
    y = 40; // fine -- declared with `mut`
    println!("x = {}, y = {}", x, y);
}
This is the opposite default. In C, you opt into immutability with const. In Rust,
you opt into mutability with mut.
Try It: In Rust, try reassigning an immutable variable. Read the compiler error message. Rust error messages are famously helpful -- get used to reading them.
Type Casting and Coercion
C: implicit and explicit casts
/* casting_c.c */
#include <stdio.h>
int main(void)
{
int i = 42;
double d = i; /* implicit: int -> double */
int j = (int)3.99; /* explicit: double -> int (truncation!) */
char c = 300; /* implicit: int -> char (overflow!) */
printf("d = %f\n", d);
printf("j = %d\n", j);
printf("c = %d\n", c);
return 0;
}
Caution: C silently narrows values. `char c = 300` keeps only the low byte (44 on typical platforms) without error. The compiler may warn with `-Wall`, but it compiles.
Rust: explicit only
// casting_rust.rs
fn main() {
    let i: i32 = 42;
    let d: f64 = i as f64;         // explicit: i32 -> f64
    let j: i32 = 3.99_f64 as i32;  // explicit: f64 -> i32 (truncates to 3)
    let c: u8 = 300_u16 as u8;     // explicit: wraps to 44
    println!("d = {}", d);
    println!("j = {}", j);
    println!("c = {}", c);
}
Rust requires as for every numeric conversion. No implicit narrowing or widening.
sizeof in C, size_of in Rust
/* sizeof_demo.c */
#include <stdio.h>
int main(void)
{
printf("char: %zu bytes\n", sizeof(char));
printf("int: %zu bytes\n", sizeof(int));
printf("long: %zu bytes\n", sizeof(long));
printf("double: %zu bytes\n", sizeof(double));
printf("void*: %zu bytes\n", sizeof(void *));
return 0;
}
// sizeof_demo.rs
use std::mem::size_of;

fn main() {
    println!("i8: {} bytes", size_of::<i8>());
    println!("i32: {} bytes", size_of::<i32>());
    println!("i64: {} bytes", size_of::<i64>());
    println!("f64: {} bytes", size_of::<f64>());
    println!("bool: {} bytes", size_of::<bool>());
    println!("char: {} bytes", size_of::<char>());
    println!("usize: {} bytes", size_of::<usize>());
}
Complete type size reference (64-bit Linux)
+---------------+-------+---------------+-------+
| C type | Bytes | Rust type | Bytes |
+---------------+-------+---------------+-------+
| char | 1 | i8 / u8 | 1 |
| short | 2 | i16 / u16 | 2 |
| int | 4 | i32 / u32 | 4 |
| long | 8 | i64 / u64 | 8 |
| long long | 8 | i64 / u64 | 8 |
| (none) | 16 | i128 / u128 | 16 |
| float | 4 | f32 | 4 |
| double | 8 | f64 | 8 |
| _Bool | 1 | bool | 1 |
| char | 1 | char | 4 |
| void* | 8 | *const T | 8 |
| size_t | 8 | usize | 8 |
+---------------+-------+---------------+-------+
Integer Overflow
C: undefined behavior for signed, wrapping for unsigned
/* overflow_c.c */
#include <stdio.h>
#include <limits.h>
int main(void)
{
unsigned int u = UINT_MAX;
printf("UINT_MAX: %u\n", u);
printf("UINT_MAX + 1: %u\n", u + 1); /* wraps to 0 -- defined behavior */
int s = INT_MAX;
printf("INT_MAX: %d\n", s);
/* s + 1 is UNDEFINED BEHAVIOR for signed integers */
printf("INT_MAX + 1: %d\n", s + 1);
return 0;
}
Caution: Signed integer overflow in C is undefined behavior. The compiler is allowed to assume it never happens, and it may optimize your code in surprising ways based on that assumption.
Rust: panics in debug, wraps in release
// overflow_rust.rs
fn main() {
    let u: u32 = u32::MAX;

    // Use wrapping_add for explicit wrapping:
    let v = u.wrapping_add(1);
    println!("u32::MAX wrapping_add(1) = {}", v);

    // Use checked_add to detect overflow:
    match u.checked_add(1) {
        Some(val) => println!("result: {}", val),
        None => println!("overflow detected!"),
    }

    // Use saturating_add to clamp at max:
    let w = u.saturating_add(1);
    println!("u32::MAX saturating_add(1) = {}", w);
}
$ rustc overflow_rust.rs && ./overflow_rust
u32::MAX wrapping_add(1) = 0
overflow detected!
u32::MAX saturating_add(1) = 4294967295
Rust Note: Rust gives you four explicit choices for overflow: `wrapping_*`, `checked_*`, `saturating_*`, and `overflowing_*`. In debug builds, the standard `+` operator panics on overflow. In release builds, it wraps. There is no undefined behavior.
Quick Knowledge Check
- What is the size of `char` in C versus `char` in Rust?
- What happens when you add 1 to `INT_MAX` in C? In Rust (debug mode)?
- How do you declare a mutable variable in Rust?
Common Pitfalls
- Assuming `int` is always 32 bits. The C standard only guarantees at least 16. Use `int32_t` when you need exactly 32 bits.
- Forgetting that C's `char` signedness is implementation-defined. On ARM, `char` is typically unsigned. On x86, it is signed. Use `signed char` or `unsigned char` to be explicit.
- Using `%d` to print `size_t`. Use `%zu`. The wrong format specifier is undefined behavior.
- Implicit narrowing in C. Assigning a `long` to an `int` silently truncates. Rust forces you to write `as i32`.
- Forgetting `mut` in Rust. Variables are immutable by default. The compiler error is clear, but it catches newcomers off guard.
Control Flow
Programs need to make decisions and repeat work. C and Rust share the same fundamental constructs but differ in important ways around type safety, exhaustiveness, and what counts as a boolean. This chapter covers every branching and looping construct you will use in systems programming.
if / else
C
/* if_else.c */
#include <stdio.h>
int main(void)
{
int temp = 37;
if (temp > 100) {
printf("boiling\n");
} else if (temp > 0) {
printf("liquid\n");
} else {
printf("frozen\n");
}
return 0;
}
C's if condition is any expression. Zero is false, non-zero is true. Parentheses
around the condition are required.
Rust
// if_else.rs
fn main() {
    let temp = 37;
    if temp > 100 {
        println!("boiling");
    } else if temp > 0 {
        println!("liquid");
    } else {
        println!("frozen");
    }
}
No parentheses around the condition (optional but idiomatic to omit). The condition
must be of type bool. You cannot write if temp { ... } when temp is an integer.
if as an expression in Rust
// if_expression.rs
fn main() {
    let temp = 37;
    let state = if temp > 100 {
        "boiling"
    } else if temp > 0 {
        "liquid"
    } else {
        "frozen"
    };
    println!("water is {}", state);
}
Both arms must return the same type.
Rust Note: Because `if` is an expression, Rust has no need for a ternary operator. `let x = if cond { a } else { b };` replaces C's `x = cond ? a : b;`.
Truthiness: 0 Is False in C
C: integers as booleans
/* truthiness.c */
#include <stdio.h>
int main(void)
{
int x = 0;
int y = 42;
int *p = NULL;
if (x) printf("x is truthy\n");
else printf("x is falsy\n");
if (y) printf("y is truthy\n");
else printf("y is falsy\n");
if (p) printf("p is non-null\n");
else printf("p is null\n");
return 0;
}
In C: 0, 0.0, NULL, and '\0' are all false. Everything else is true.
Rust: only bool is bool
// truthiness_rust.rs
fn main() {
    let x: i32 = 0;
    // if x { } // ERROR: expected `bool`, found `i32`
    if x == 0 {
        println!("x is zero");
    }
    let p: Option<i32> = None;
    if p.is_none() {
        println!("p is None");
    }
}
Caution: In C, `if (x = 0)` assigns 0 to `x` and evaluates to false. This is a common bug that compilers warn about but do not reject. In Rust, `if x = 0` is a type error because assignment returns `()`, not `bool`.
Try It: In C, write `if (x = 5)` (single equals) inside an if statement. Compile with `-Wall`. Read the warning. Then try the same in Rust.
The Ternary Operator (C Only)
/* ternary.c */
#include <stdio.h>
int main(void)
{
int x = 7;
const char *parity = (x % 2 == 0) ? "even" : "odd";
printf("%d is %s\n", x, parity);
int sign = (x > 0) ? 1 : (x < 0) ? -1 : 0;
printf("sign of %d is %d\n", x, sign);
return 0;
}
Rust replacement -- if/else expressions:
// ternary_rust.rs
fn main() {
    let x = 7;
    let parity = if x % 2 == 0 { "even" } else { "odd" };
    println!("{} is {}", x, parity);
}
while Loops
/* while_loop.c */
#include <stdio.h>
int main(void)
{
int i = 0;
while (i < 5) {
printf("%d ", i);
i++;
}
printf("\n");
return 0;
}
// while_loop.rs
fn main() {
    let mut i = 0;
    while i < 5 {
        print!("{} ", i);
        i += 1;
    }
    println!();
}
Rust has no ++ or -- operators. Use i += 1 and i -= 1.
do-while (C Only)
/* do_while.c */
#include <stdio.h>
int main(void)
{
int i = 10;
do {
printf("%d ", i);
i++;
} while (i < 5); /* condition is false, but body ran once */
printf("\n");
return 0;
}
Rust has no do-while. The idiomatic replacement uses loop:
// do_while_rust.rs
fn main() {
    let mut i = 10;
    loop {
        print!("{} ", i);
        i += 1;
        if i >= 5 {
            break;
        }
    }
    println!();
}
for Loops
C
/* for_loop.c */
#include <stdio.h>
int main(void)
{
for (int i = 0; i < 5; i++) {
printf("%d ", i);
}
printf("\n");
int nums[] = {10, 20, 30, 40, 50};
size_t len = sizeof(nums) / sizeof(nums[0]);
for (size_t i = 0; i < len; i++) {
printf("%d ", nums[i]);
}
printf("\n");
return 0;
}
Rust
// for_loop.rs
fn main() {
    for i in 0..5 {
        print!("{} ", i);
    }
    println!();

    let nums = [10, 20, 30, 40, 50];
    for n in &nums {
        print!("{} ", n);
    }
    println!();

    for (i, n) in nums.iter().enumerate() {
        print!("[{}]={} ", i, n);
    }
    println!();
}
C for loop anatomy:
for (init; condition; update) { body }
Rust for loop anatomy:
for variable in iterator { body }
Try It: In Rust, change `0..5` to `0..=5` (inclusive range). What is the difference in output?
loop: Rust's Infinite Loop
// loop_demo.rs
fn main() {
    let mut count = 0;
    let result = loop {
        count += 1;
        if count == 10 {
            break count * 2; // loop can return a value via break
        }
    };
    println!("result = {}", result);
}
In C, you write while (1) or for (;;):
/* infinite_loop.c */
#include <stdio.h>
int main(void)
{
int count = 0;
int result;
for (;;) {
count++;
if (count == 10) {
result = count * 2;
break;
}
}
printf("result = %d\n", result);
return 0;
}
Driver Prep: Kernel code is full of infinite loops. The main kernel thread never returns. Device polling loops use `while (1)` with `break` on status changes. Rust's `loop` maps directly to this pattern.
break and continue
Both languages support break (exit the loop) and continue (skip to next
iteration). The semantics are identical.
/* break_continue.c */
#include <stdio.h>
int main(void)
{
for (int i = 0; i < 10; i++) {
if (i == 3) continue;
if (i == 7) break;
printf("%d ", i);
}
printf("\n");
return 0;
}
// break_continue.rs
fn main() {
    for i in 0..10 {
        if i == 3 { continue; }
        if i == 7 { break; }
        print!("{} ", i);
    }
    println!();
}
Both print: 0 1 2 4 5 6
Loop Labels (Rust)
Rust allows labeling loops and breaking/continuing to an outer loop by name.
// loop_labels.rs
fn main() {
    'outer: for i in 0..5 {
        for j in 0..5 {
            if i + j == 6 {
                println!("breaking outer at i={}, j={}", i, j);
                break 'outer;
            }
            if j == 3 {
                continue 'outer;
            }
            print!("({},{}) ", i, j);
        }
    }
    println!("done");
}
C has no loop labels. The typical workaround is a flag variable:
/* break_outer.c */
#include <stdio.h>
int main(void)
{
int done = 0;
for (int i = 0; i < 5 && !done; i++) {
for (int j = 0; j < 5; j++) {
if (i + j == 6) {
printf("breaking at i=%d, j=%d\n", i, j);
done = 1;
break;
}
}
}
printf("done\n");
return 0;
}
switch (C) vs match (Rust)
This is where the languages diverge most in control flow.
C: switch
/* switch_demo.c */
#include <stdio.h>
int main(void)
{
int day = 3;
switch (day) {
case 1: printf("Monday\n"); break;
case 2: printf("Tuesday\n"); break;
case 3: printf("Wednesday\n"); break;
case 4: printf("Thursday\n"); break;
case 5: printf("Friday\n"); break;
case 6: printf("Saturday\n"); break;
case 7: printf("Sunday\n"); break;
default: printf("Invalid\n"); break;
}
return 0;
}
Caution: Forgetting `break` in a C `switch` causes fallthrough -- execution continues into the next case. This is a legendary source of bugs.
Rust: match
// match_demo.rs
fn main() {
    let day = 3;
    let name = match day {
        1 => "Monday",
        2 => "Tuesday",
        3 => "Wednesday",
        4 => "Thursday",
        5 => "Friday",
        6 | 7 => "Weekend",
        _ => "Invalid",
    };
    println!("day {} is {}", day, name);
}
Key differences: no fallthrough, exhaustive (compiler rejects non-exhaustive matches),
and match is an expression that returns a value.
Pattern matching with ranges and guards
// match_patterns.rs
fn main() {
    let score = 85;
    let grade = match score {
        90..=100 => "A",
        80..=89 => "B",
        70..=79 => "C",
        60..=69 => "D",
        0..=59 => "F",
        _ => "Invalid",
    };
    println!("score {} = grade {}", score, grade);

    let temp = 37;
    let status = match temp {
        t if t > 100 => "boiling",
        t if t == 37 => "body temperature",
        t if t > 0 => "cool",
        _ => "freezing",
    };
    println!("{}C is {}", temp, status);
}
Destructuring in match
// match_destructure.rs
fn main() {
    let point = (3, -5);
    match point {
        (0, 0) => println!("origin"),
        (x, 0) => println!("on x-axis at x={}", x),
        (0, y) => println!("on y-axis at y={}", y),
        (x, y) => println!("point at ({}, {})", x, y),
    }
}
Rust Note: Rust's match can destructure tuples, structs, and enums, bind variables, use guards, and combine patterns. It is one of Rust's most distinctive features.
Combining Constructs: FizzBuzz
C
/* fizzbuzz.c */
#include <stdio.h>
int main(void)
{
for (int i = 1; i <= 20; i++) {
if (i % 15 == 0) printf("FizzBuzz\n");
else if (i % 3 == 0) printf("Fizz\n");
else if (i % 5 == 0) printf("Buzz\n");
else printf("%d\n", i);
}
return 0;
}
Rust
// fizzbuzz.rs
fn main() {
    for i in 1..=20 {
        match (i % 3, i % 5) {
            (0, 0) => println!("FizzBuzz"),
            (0, _) => println!("Fizz"),
            (_, 0) => println!("Buzz"),
            _ => println!("{}", i),
        }
    }
}
The Rust version uses tuple matching to handle all four cases cleanly.
Quick Knowledge Check
- What does if (x = 5) do in C? What happens in Rust?
- Can you use an integer as a condition in a Rust if statement?
- What happens if you omit the _ wildcard in a Rust match on an i32?
Common Pitfalls
- Missing break in C switch. Every case falls through without it. Use -Wimplicit-fallthrough to catch this.
- Using = instead of == in C conditions. if (x = 5) assigns 5 to x and always evaluates to true. Use -Wall to get a warning.
- Non-exhaustive match in Rust. The compiler will reject it. Always include a _ wildcard or cover every variant.
- Off-by-one in ranges. C's for (i = 0; i < n; i++) corresponds to Rust's 0..n (exclusive). Use 0..=n for inclusive.
- No ++/-- in Rust. Use += 1 and -= 1. This is deliberate to avoid the confusion between prefix and postfix increment.
- Forgetting that Rust's for consumes the iterator. Use &collection to borrow instead of consuming.
Functions
Functions are the fundamental unit of code organization in both C and Rust. But the two languages differ in how they declare them, how they pass arguments, and how they organize code across files. This chapter covers all of it.
Declaring and Defining Functions in C
C distinguishes between a function declaration (prototype) and its definition (body).
/* functions_basic.c */
#include <stdio.h>
/* Declaration (prototype) */
int add(int a, int b);
int main(void)
{
int result = add(3, 4);
printf("3 + 4 = %d\n", result);
return 0;
}
/* Definition */
int add(int a, int b)
{
return a + b;
}
$ gcc -Wall -o functions_basic functions_basic.c && ./functions_basic
3 + 4 = 7
The declaration must appear before the first call. The definition can appear anywhere.
In C89, calling an undeclared function was allowed -- the compiler assumed int return.
Modern C (C99+) requires a declaration.
Caution: In older C code, you may see functions called without declarations. This is dangerous because the compiler cannot check argument types. Always use -Wall -Wextra.
Defining Functions in Rust
Rust has no separation between declaration and definition. A function is defined once and can be called from anywhere in the same module, regardless of order.
// functions_basic.rs
fn main() {
    let result = add(3, 4);
    println!("3 + 4 = {}", result);
}

fn add(a: i32, b: i32) -> i32 {
    a + b // no semicolon: this is the return expression
}
The return type is specified with ->. If omitted, the function returns () (unit).
The last expression without a semicolon is the return value.
Parameter Passing: By Value
Both C and Rust pass arguments by value by default.
/* pass_by_value.c */
#include <stdio.h>
void increment(int x)
{
x = x + 1;
printf("inside: x = %d\n", x);
}
int main(void)
{
int a = 10;
increment(a);
printf("outside: a = %d\n", a);
return 0;
}
// pass_by_value.rs
fn increment(mut x: i32) {
    x += 1;
    println!("inside: x = {}", x);
}

fn main() {
    let a = 10;
    increment(a);
    println!("outside: a = {}", a);
}
Both print inside: 11, outside: 10. The function receives a copy.
For non-Copy types like String, Rust's pass-by-value transfers ownership:
// move_demo.rs
fn take_string(s: String) {
    println!("got: {}", s);
}

fn main() {
    let msg = String::from("hello");
    take_string(msg);
    // println!("{}", msg); // ERROR: value used after move
}
Value flow (move):
main: msg ----[ownership transferred]----> take_string: s
msg is now invalid s is valid, then dropped
Parameter Passing: By Pointer (C)
To modify the caller's variable, C passes a pointer.
/* pass_by_pointer.c */
#include <stdio.h>
void increment(int *x)
{
*x = *x + 1;
}
int main(void)
{
int a = 10;
increment(&a);
printf("a = %d\n", a); /* 11 */
return 0;
}
Memory layout during the call:
main's stack frame increment's stack frame
+----------+ +----------+
| a = 10 | <------------ | x = &a |
+----------+ pointer +----------+
addr: 0x100 *x dereferences to 0x100
Caution: C does not prevent you from passing NULL. Dereferencing a null pointer is undefined behavior and typically causes a segfault.
Parameter Passing: By Reference (Rust)
Rust uses references (& for shared, &mut for exclusive) instead of raw pointers.
// references.rs
fn print_value(x: &i32) {
    println!("value = {}", x);
}

fn increment(x: &mut i32) {
    *x += 1;
}

fn main() {
    let mut a = 10;
    print_value(&a);
    increment(&mut a);
    println!("a = {}", a); // 11
}
Rust borrowing rules:
1. You can have MANY shared references (&T) at the same time
2. You can have ONE mutable reference (&mut T) at a time
3. You cannot have both at the same time
These rules are enforced at compile time.
Rust Note: References in Rust are always valid. They cannot be null, they cannot dangle, and the borrow checker ensures no data races. This is fundamentally safer than C pointers.
Try It: In Rust, try creating a &mut a while a &a is still in use. Read the compiler error.
Multiple Return Values
C: returning a struct
/* multi_return_c.c */
#include <stdio.h>
typedef struct {
int quot;
int rem;
} divmod_result;
divmod_result divmod(int a, int b)
{
divmod_result r;
r.quot = a / b;
r.rem = a % b;
return r;
}
int main(void)
{
divmod_result r = divmod(17, 5);
printf("17 / 5 = %d remainder %d\n", r.quot, r.rem);
return 0;
}
Alternative: out-parameters via pointers.
/* out_params.c */
#include <stdio.h>
void divmod(int a, int b, int *quot, int *rem)
{
*quot = a / b;
*rem = a % b;
}
int main(void)
{
int q, r;
divmod(17, 5, &q, &r);
printf("17 / 5 = %d remainder %d\n", q, r);
return 0;
}
Rust: returning a tuple
// multi_return_rust.rs
fn divmod(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

fn main() {
    let (quot, rem) = divmod(17, 5);
    println!("17 / 5 = {} remainder {}", quot, rem);
}
Tuples are first-class. No struct or out-parameter boilerplate needed.
Try It: Write a function that returns (min, max, sum) for a slice of integers in both C (using a struct) and Rust (using a tuple).
Function Pointers
C
/* fn_pointer.c */
#include <stdio.h>
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
void apply(int (*op)(int, int), int x, int y)
{
printf("result = %d\n", op(x, y));
}
int main(void)
{
apply(add, 3, 4);
apply(mul, 3, 4);
return 0;
}
Rust
// fn_pointer.rs
fn add(a: i32, b: i32) -> i32 { a + b }
fn mul(a: i32, b: i32) -> i32 { a * b }

fn apply(op: fn(i32, i32) -> i32, x: i32, y: i32) {
    println!("result = {}", op(x, y));
}

fn main() {
    apply(add, 3, 4);
    apply(mul, 3, 4);
}
Rust also supports closures that capture their environment:
// closures.rs
fn apply(op: &dyn Fn(i32, i32) -> i32, x: i32, y: i32) {
    println!("result = {}", op(x, y));
}

fn main() {
    let offset = 10;
    let add_with_offset = |a, b| a + b + offset;
    apply(&add_with_offset, 3, 4); // result = 17
}
Driver Prep: The Linux kernel makes heavy use of function pointers for abstraction. Every device driver fills in a struct of function pointers (struct file_operations, struct net_device_ops). Understanding function pointers is essential for driver work.
Forward Declarations and Header Files (C)
In real C projects, declarations go in headers (.h), definitions in sources (.c).
/* math_ops.h */
#ifndef MATH_OPS_H
#define MATH_OPS_H
int add(int a, int b);
int mul(int a, int b);
#endif
/* math_ops.c */
#include "math_ops.h"
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
/* main.c */
#include <stdio.h>
#include "math_ops.h"
int main(void)
{
printf("add: %d\n", add(3, 4));
printf("mul: %d\n", mul(3, 4));
return 0;
}
$ gcc -Wall -c math_ops.c -o math_ops.o
$ gcc -Wall -c main.c -o main.o
$ gcc math_ops.o main.o -o math_demo
$ ./math_demo
add: 7
mul: 12
C compilation flow (multi-file):
math_ops.h
|
v
math_ops.c ---[gcc -c]---> math_ops.o ---+
+--> [linker] --> math_demo
main.c -------[gcc -c]---> main.o -------+
^
|
math_ops.h (included)
Modules in Rust
Rust replaces header files with a module system.
// src/math_ops.rs
pub fn add(a: i32, b: i32) -> i32 { a + b }
pub fn mul(a: i32, b: i32) -> i32 { a * b }
// src/main.rs
mod math_ops;

fn main() {
    println!("add: {}", math_ops::add(3, 4));
    println!("mul: {}", math_ops::mul(3, 4));
}
No header files. No include guards. The pub keyword controls visibility.
Rust module system:
src/main.rs --[mod math_ops]--> src/math_ops.rs
pub fn add(...)
pub fn mul(...)
Rust Note: Rust's module system enforces encapsulation at compile time. Items without pub are genuinely inaccessible from outside the module. In C, header files are documentation, not enforcement.
Static and Private Functions
C: static limits visibility to the current file
/* helpers.c */
static int helper(int x) { return x * 2; }
int public_function(int x) { return helper(x) + 1; }
Rust: omit pub
// helpers.rs
fn helper(x: i32) -> i32 { x * 2 }
pub fn public_function(x: i32) -> i32 { helper(x) + 1 }
Functions are private by default in Rust. No keyword needed.
Recursion
/* factorial_c.c */
#include <stdio.h>
unsigned long factorial(unsigned int n)
{
if (n <= 1) return 1;
return n * factorial(n - 1);
}
int main(void)
{
for (unsigned int i = 0; i <= 10; i++) {
printf("%2u! = %lu\n", i, factorial(i));
}
return 0;
}
// factorial_rust.rs
fn factorial(n: u64) -> u64 {
    if n <= 1 { 1 } else { n * factorial(n - 1) }
}

fn main() {
    for i in 0..=10 {
        println!("{:2}! = {}", i, factorial(i));
    }
}
Caution: Neither C nor Rust guarantees tail-call optimization. Deep recursion can overflow the stack. Prefer iterative solutions when depth is unbounded.
Quick Knowledge Check
- In C, what is the difference between a function declaration and a definition?
- What does pub do in Rust?
- Why can you not use a String after passing it by value to a Rust function?
Common Pitfalls
- Forgetting the forward declaration in C. The compiler may assume int return type, leading to subtle bugs.
- Passing NULL where a pointer is expected in C. No compile-time protection. Check for NULL defensively.
- Confusing & and &mut in Rust. If you need to modify the argument, the function must take &mut T, and the caller must pass &mut value.
- Forgetting that Rust strings are UTF-8. You cannot index a String with an integer. Use .chars() to iterate over characters.
- Returning a pointer to a local variable in C. The stack frame is gone after return. The pointer dangles. Rust prevents this at compile time.
- Overusing return in Rust. Idiomatic style omits return for the last expression. Use return only for early exits.
Structs, Enums, and Unions
Primitive types only get you so far. Real programs model real things: a network packet has a source, a destination, and a payload. A device register has named bit fields. This chapter covers the composite types that make systems programming possible.
C Structs
A struct groups related values under one name.
/* struct_basic.c */
#include <stdio.h>
#include <math.h>
typedef struct {
double x;
double y;
} Point;
double distance(Point a, Point b)
{
double dx = a.x - b.x;
double dy = a.y - b.y;
return sqrt(dx * dx + dy * dy);
}
int main(void)
{
Point a = { .x = 0.0, .y = 0.0 };
Point b = { .x = 3.0, .y = 4.0 };
printf("distance = %f\n", distance(a, b));
return 0;
}
$ gcc -Wall -std=c17 -o struct_basic struct_basic.c -lm && ./struct_basic
distance = 5.000000
typedef lets you write Point instead of struct Point everywhere. The .x
syntax in the initializer is a C99 designated initializer.
Driver Prep: The Linux kernel uses structs constantly: struct file, struct inode, struct task_struct, struct sk_buff. Understanding struct layout and passing is foundational.
Rust Structs
Named-field struct
// struct_basic.rs
struct Point {
    x: f64,
    y: f64,
}

fn distance(a: &Point, b: &Point) -> f64 {
    let dx = a.x - b.x;
    let dy = a.y - b.y;
    (dx * dx + dy * dy).sqrt()
}

fn main() {
    let a = Point { x: 0.0, y: 0.0 };
    let b = Point { x: 3.0, y: 4.0 };
    println!("distance = {}", distance(&a, &b));
}
No typedef needed. The struct name is the type name directly.
Tuple struct and unit struct
// tuple_struct.rs
struct Color(u8, u8, u8);
struct Meters(f64);
struct Marker; // unit struct, zero-sized

fn main() {
    let red = Color(255, 0, 0);
    println!("R={}, G={}, B={}", red.0, red.1, red.2);

    let height = Meters(1.82);
    println!("height = {} m", height.0);

    let _m = Marker;
    println!("size of Marker = {}", std::mem::size_of::<Marker>());
}
Tuple structs are useful for the "newtype" pattern -- wrapping a value in a distinct type for type safety. Unit structs take no memory at runtime.
Methods (impl blocks in Rust)
Rust attaches methods to structs via impl. C has no methods; you pass the struct
to a function manually.
C: functions that take a struct pointer
/* rect_c.c */
#include <stdio.h>
typedef struct {
double width;
double height;
} Rect;
double rect_area(const Rect *r)
{
return r->width * r->height;
}
int main(void)
{
Rect r = { .width = 5.0, .height = 3.0 };
printf("area = %f\n", rect_area(&r));
return 0;
}
Rust: methods with self
// rect_rust.rs
struct Rect {
    width: f64,
    height: f64,
}

impl Rect {
    fn new(width: f64, height: f64) -> Rect {
        Rect { width, height }
    }

    fn area(&self) -> f64 {
        self.width * self.height
    }
}

fn main() {
    let r = Rect::new(5.0, 3.0);
    println!("area = {}", r.area());
}
C struct "method" call: rect_area(&r)
Rust method call: r.area()
Under the hood, both pass a pointer to the struct.
&self == const Rect*
&mut self == Rect*
Try It: Add a scale method to the Rust Rect that takes &mut self and a factor: f64, and multiplies both width and height by the factor.
C Enums
In C, enums are just named integer constants.
/* enum_c.c */
#include <stdio.h>
enum Direction { NORTH = 0, SOUTH = 1, EAST = 2, WEST = 3 };
const char *direction_name(enum Direction d)
{
switch (d) {
case NORTH: return "North";
case SOUTH: return "South";
case EAST: return "East";
case WEST: return "West";
default: return "Unknown";
}
}
int main(void)
{
enum Direction d = EAST;
printf("direction = %s (%d)\n", direction_name(d), d);
/* C allows any integer -- no type safety */
enum Direction invalid = 99;
printf("invalid = %s (%d)\n", direction_name(invalid), invalid);
return 0;
}
Caution: C enums provide no type safety. You can assign any integer to an enum variable. The default case is your only defense.
Rust Enums: Algebraic Data Types
Rust enums are fundamentally more powerful. Each variant can carry data.
Simple enum
// enum_simple.rs
#[derive(Debug)]
enum Direction {
    North,
    South,
    East,
    West,
}

fn direction_name(d: &Direction) -> &str {
    match d {
        Direction::North => "North",
        Direction::South => "South",
        Direction::East => "East",
        Direction::West => "West",
    }
}

fn main() {
    let d = Direction::East;
    println!("direction = {} ({:?})", direction_name(&d), d);
    // let invalid: Direction = 99; // does NOT compile
}
The match is exhaustive. Add a fifth variant and the compiler forces you to handle
it everywhere.
Enums with data
// enum_data.rs
#[derive(Debug)]
enum Shape {
    Circle(f64),
    Rectangle(f64, f64),
    Triangle { base: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rectangle(w, h) => w * h,
        Shape::Triangle { base, height } => 0.5 * base * height,
    }
}

fn main() {
    let shapes = vec![
        Shape::Circle(5.0),
        Shape::Rectangle(4.0, 6.0),
        Shape::Triangle { base: 3.0, height: 8.0 },
    ];
    for s in &shapes {
        println!("{:?} -> area = {:.2}", s, area(s));
    }
}
$ rustc enum_data.rs && ./enum_data
Circle(5.0) -> area = 78.54
Rectangle(4.0, 6.0) -> area = 24.00
Triangle { base: 3.0, height: 8.0 } -> area = 12.00
This is impossible in C with plain enums. You would need a struct with a tag and union.
Option and Result
Rust's standard library uses enums for two critical types.
Option replaces null pointers:
// option_demo.rs
fn find_first_negative(nums: &[i32]) -> Option<usize> {
    for (i, &n) in nums.iter().enumerate() {
        if n < 0 {
            return Some(i);
        }
    }
    None
}

fn main() {
    let data = [10, 20, -5, 30];
    match find_first_negative(&data) {
        Some(idx) => println!("first negative at index {}", idx),
        None => println!("no negatives found"),
    }
}
Result replaces error codes:
// result_demo.rs
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    match parse_and_double("21") {
        Ok(val) => println!("success: {}", val),
        Err(e) => println!("error: {}", e),
    }
    match parse_and_double("abc") {
        Ok(val) => println!("success: {}", val),
        Err(e) => println!("error: {}", e),
    }
}
Rust Note: Option<T> is enum { Some(T), None }. Result<T, E> is enum { Ok(T), Err(E) }. These are ordinary enums with generics. The power comes from match and the ? operator.
C Unions
A union stores different types in the same memory. Only one field is valid at a time.
/* union_c.c */
#include <stdio.h>
#include <string.h>
typedef struct {
enum { INT_VAL, FLOAT_VAL, STR_VAL } tag;
union {
int i;
double f;
char s[32];
} data;
} Value;
void print_value(const Value *v)
{
switch (v->tag) {
case INT_VAL: printf("int: %d\n", v->data.i); break;
case FLOAT_VAL: printf("float: %f\n", v->data.f); break;
case STR_VAL: printf("str: %s\n", v->data.s); break;
}
}
int main(void)
{
Value a = { .tag = INT_VAL, .data.i = 42 };
Value b = { .tag = FLOAT_VAL, .data.f = 3.14 };
Value c = { .tag = STR_VAL };
strncpy(c.data.s, "hello", sizeof(c.data.s) - 1);
print_value(&a);
print_value(&b);
print_value(&c);
return 0;
}
Union memory layout:
+------+------+------+------+------+------+------+------+
| shared memory (32 bytes) |
+------+------+------+------+------+------+------+------+
When tag == INT_VAL: first 4 bytes hold int
When tag == FLOAT_VAL: first 8 bytes hold double
When tag == STR_VAL: all 32 bytes hold char[32]
sizeof(union) = size of largest member = 32
Caution: Reading a union member other than the one last written reinterprets the raw bytes -- the result may be garbage or, if the bytes are invalid for that type, undefined behavior. The tag field is a convention, not an enforcement. There is no runtime check.
Driver Prep: Type punning through unions is common in low-level code -- reading hardware registers, parsing binary protocols. The Linux kernel uses unions in structures like union sigval and union nf_inet_addr.
Rust's Safe Alternative to Unions
Rust enums with data are tagged unions with the tag built in and enforced by the compiler:
// tagged_union_rust.rs
enum Value {
    Int(i32),
    Float(f64),
    Str(String),
}

fn print_value(v: &Value) {
    match v {
        Value::Int(i) => println!("int: {}", i),
        Value::Float(f) => println!("float: {}", f),
        Value::Str(s) => println!("str: {}", s),
    }
}

fn main() {
    let values = vec![
        Value::Int(42),
        Value::Float(3.14),
        Value::Str(String::from("hello")),
    ];
    for v in &values {
        print_value(v);
    }
}
For low-level type punning, Rust has raw union types (access requires unsafe):
// raw_union.rs
union FloatBits {
    f: f32,
    u: u32,
}

fn main() {
    let fb = FloatBits { f: 1.0 };
    let bits = unsafe { fb.u };
    println!("float 1.0 as bits: 0x{:08X}", bits);
}
Rust Note: Raw Rust unions exist primarily for C interop (FFI). In pure Rust code, prefer enums. The unsafe block signals that the programmer is taking responsibility for correctness.
Memory Layout Comparison
/* layout_c.c */
#include <stdio.h>
#include <stddef.h>
typedef struct {
char a; /* 1 byte */
int b; /* 4 bytes */
char c; /* 1 byte */
double d; /* 8 bytes */
} Example;
int main(void)
{
printf("sizeof(Example) = %zu\n", sizeof(Example));
printf("offset of a = %zu\n", offsetof(Example, a));
printf("offset of b = %zu\n", offsetof(Example, b));
printf("offset of c = %zu\n", offsetof(Example, c));
printf("offset of d = %zu\n", offsetof(Example, d));
return 0;
}
C struct layout (with padding):
Byte: 0 1 2 3 4 5 6 7
+----+----+----+----+----+----+----+----+
| a | pad| pad| pad| b | b | b | b |
+----+----+----+----+----+----+----+----+
Byte: 8 9 10 11 12 13 14 15
+----+----+----+----+----+----+----+----+
| c | pad| pad| pad| pad| pad| pad| pad|
+----+----+----+----+----+----+----+----+
Byte: 16 17 18 19 20 21 22 23
+----+----+----+----+----+----+----+----+
| d | d | d | d | d | d | d | d |
+----+----+----+----+----+----+----+----+
Total: 24 bytes (10 bytes of padding!)
Rust reorders fields to minimize padding:
// layout_rust.rs
use std::mem;

struct Example {
    a: u8,
    b: i32,
    c: u8,
    d: f64,
}

fn main() {
    println!("size of Example = {}", mem::size_of::<Example>());
}
Rust struct layout (fields reordered by compiler):
Byte: 0 1 2 3 4 5 6 7
+----+----+----+----+----+----+----+----+
| d | d | d | d | d | d | d | d |
+----+----+----+----+----+----+----+----+
Byte: 8 9 10 11 12 13 14 15
+----+----+----+----+----+----+----+----+
| b | b | b | b | a | c | pad| pad|
+----+----+----+----+----+----+----+----+
Total: 16 bytes (2 bytes of padding)
To force C-compatible layout, use #[repr(C)]:
// repr_c.rs
#[repr(C)]
struct Example {
    a: u8,
    b: i32,
    c: u8,
    d: f64,
}

fn main() {
    println!("size (#[repr(C)]) = {}", std::mem::size_of::<Example>());
    // prints 24, same as C
}
Driver Prep: When passing structs to the kernel or hardware, you must control the layout. Use #[repr(C)] in Rust. In C, use __attribute__((packed)) if you need to eliminate padding entirely.
Try It: Reorder the fields in the C struct to minimize padding manually. What is the smallest sizeof you can achieve?
Enum Memory Layout
// enum_size.rs
use std::mem;

enum Color { Red, Green, Blue }

fn main() {
    println!("size of Color = {}", mem::size_of::<Color>());
    println!("size of Option<u8> = {}", mem::size_of::<Option<u8>>());
    println!("size of Option<Box<i32>> = {}", mem::size_of::<Option<Box<i32>>>());
}
$ rustc enum_size.rs && ./enum_size
size of Color = 1
size of Option<u8> = 2
size of Option<Box<i32>> = 8
Rust uses the smallest discriminant that fits. Color needs only 1 byte. A C enum
is typically 4 bytes (int-sized).
Option<Box<i32>> is the same size as Box<i32> -- Rust uses "niche optimization":
since Box can never be null, the null bit pattern represents None.
Option<Box<i32>> layout:
Some(ptr): | non-zero pointer value (8 bytes) |
None: | 0x0000000000000000 (8 bytes) |
No extra tag byte needed.
Quick Knowledge Check
- What is the difference between a C union and a Rust enum with data?
- Why does Rust reorder struct fields by default?
- What does #[repr(C)] do?
Common Pitfalls
- Reading the wrong union member in C. Undefined behavior. No runtime check. Use a tag field and validate it in every access.
- Forgetting padding in C structs. sizeof(struct) may be larger than the sum of field sizes. Use offsetof to check.
- Assuming C enum values are contiguous. You can assign arbitrary values: enum E { A = 0, B = 100 }. Do not use them as array indices without bounds checks.
- Forgetting pub on Rust struct fields. The struct may be public, but fields are private by default.
- Using #[repr(C)] everywhere in Rust. Only use it when you need C-compatible layout (FFI, memory-mapped I/O). Otherwise let the compiler optimize.
- Ignoring niche optimization. Option<&T> is the same size as &T. Do not wrap references in custom tagged enums when Option already does it for free.
Pointers in C
Pointers are the single most important concept in C. They are how C talks to hardware, manages memory, and builds every non-trivial data structure. If you do not understand pointers, you cannot write a device driver, a kernel module, or any serious systems code.
The Address-Of Operator (&)
Every variable lives at a memory address. The & operator gives you that address.
/* addr.c */
#include <stdio.h>
int main(void)
{
int x = 42;
printf("value of x: %d\n", x);
printf("address of x: %p\n", (void *)&x);
return 0;
}
Compile and run:
$ gcc -o addr addr.c && ./addr
value of x: 42
address of x: 0x7ffd3a2b1c4c
The exact address changes every run (ASLR). The point: &x yields a number
that identifies where x sits in memory.
Declaring and Dereferencing Pointers
A pointer variable stores an address. The * in a declaration says "this
variable holds an address." The * in an expression says "follow this address."
/* deref.c */
#include <stdio.h>
int main(void)
{
int x = 10;
int *p = &x; /* p holds the address of x */
printf("x = %d\n", x);
printf("*p = %d\n", *p); /* dereference: follow the address */
*p = 99; /* write through the pointer */
printf("x = %d\n", x); /* x changed */
return 0;
}
Output:
x = 10
*p = 10
x = 99
ASCII memory layout:
Stack
+--------+--------+
| name | value |
+--------+--------+
| x | 99 | <-- address 0x1000 (example)
+--------+--------+
| p | 0x1000 | <-- p stores address of x
+--------+--------+
Driver Prep: In kernel code, hardware registers are accessed through pointers to specific physical addresses. volatile unsigned int *reg = (volatile unsigned int *)0x40021000; is real embedded C.
NULL Pointers
A pointer that points to nothing should be set to NULL. Dereferencing NULL
is undefined behavior -- on most systems, a segfault.
/* null.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = NULL;
if (p == NULL) {
printf("p is NULL, not dereferencing\n");
}
/* Uncomment the next line to crash: */
/* printf("%d\n", *p); */
return 0;
}
Caution: Dereferencing a NULL pointer is undefined behavior. NULL-pointer dereferences are one of the most common kernel crash vectors. Always check before you dereference.
Pointer Arithmetic
When you add 1 to a pointer, it advances by sizeof(*p) bytes, not one byte.
This is how C walks through arrays.
/* arith.c */
#include <stdio.h>
int main(void)
{
int arr[] = {10, 20, 30, 40, 50};
int *p = arr; /* arr decays to pointer to first element */
for (int i = 0; i < 5; i++) {
printf("p + %d = %p, value = %d\n", i, (void *)(p + i), *(p + i));
}
return 0;
}
p + 0 = 0x7ffc..., value = 10
p + 1 = 0x7ffc..., value = 20
p + 2 = 0x7ffc..., value = 30
p + 3 = 0x7ffc..., value = 40
p + 4 = 0x7ffc..., value = 50
Each step moves 4 bytes (sizeof(int)), not 1.
Memory (each cell = 4 bytes for int)
+----+----+----+----+----+
| 10 | 20 | 30 | 40 | 50 |
+----+----+----+----+----+
p+0 p+1 p+2 p+3 p+4
Try It: Change the array to char arr[] and print addresses. Notice that each step now moves 1 byte instead of 4. Pointer arithmetic is always in units of the pointed-to type.
Arrays Decay to Pointers
In most expressions, an array name becomes a pointer to its first element. This is called "decay."
/* decay.c */
#include <stdio.h>
void print_first(int *p)
{
printf("first element via pointer: %d\n", *p);
}
int main(void)
{
int arr[] = {100, 200, 300};
print_first(arr); /* arr decays to &arr[0] */
/* These are equivalent: */
printf("arr[1] = %d\n", arr[1]);
printf("*(arr+1) = %d\n", *(arr + 1));
return 0;
}
The key exception: sizeof(arr) gives the full array size, not the pointer
size. Once passed to a function, the size information is lost.
/* sizeof_decay.c */
#include <stdio.h>
void show_size(int *p)
{
/* This prints the size of the pointer, not the array */
printf("inside function: sizeof(p) = %zu\n", sizeof(p));
}
int main(void)
{
int arr[5] = {0};
printf("in main: sizeof(arr) = %zu\n", sizeof(arr)); /* 20 */
show_size(arr); /* 8 on 64-bit */
return 0;
}
Caution: This is why C functions that take arrays always need a separate length parameter. Forgetting this is the root cause of most buffer overflows.
Pointers to Structs and the -> Operator
When you have a pointer to a struct, you access members with -> instead of ..
It is equivalent to (*p).member but far more readable.
/* structptr.c */
#include <stdio.h>
struct point {
int x;
int y;
};
void move_right(struct point *p, int dx)
{
p->x += dx; /* same as (*p).x += dx */
}
int main(void)
{
struct point pt = {3, 7};
printf("before: (%d, %d)\n", pt.x, pt.y);
move_right(&pt, 10);
printf("after: (%d, %d)\n", pt.x, pt.y);
return 0;
}
Output:
before: (3, 7)
after: (13, 7)
Driver Prep: Kernel data structures (file_operations, net_device, device_driver) are all accessed through struct pointers. You will write code like dev->irq and filp->private_data constantly.
Double Pointers (Pointer to Pointer)
A double pointer stores the address of another pointer. This is used when a function needs to change which address a pointer holds.
/* doubleptr.c */
#include <stdio.h>
#include <stdlib.h>
void allocate(int **pp, int value)
{
*pp = malloc(sizeof(int));
if (*pp == NULL) {
perror("malloc");
exit(1);
}
**pp = value;
}
int main(void)
{
int *p = NULL;
allocate(&p, 42);
printf("*p = %d\n", *p);
free(p);
return 0;
}
Stack Heap
+------+---------+ +------+
| p | 0x9000 -|------>| 42 |
+------+---------+ +------+
0x9000
Inside allocate():
+------+---------+
| pp | &p -|----> p (on caller's stack)
+------+---------+
Common uses of double pointers:
- Functions that allocate memory and return it via a parameter
- Arrays of strings (char **argv)
- Linked list head modification
void * -- The Generic Pointer
void * can point to any type. You cannot dereference it directly; you must
cast it first. This is C's mechanism for generic programming.
/* voidptr.c */
#include <stdio.h>
void print_bytes(const void *data, int len)
{
const unsigned char *bytes = (const unsigned char *)data;
for (int i = 0; i < len; i++) {
printf("%02x ", bytes[i]);
}
printf("\n");
}
int main(void)
{
int x = 0x12345678;
float f = 3.14f;
printf("int bytes: ");
print_bytes(&x, sizeof(x));
printf("float bytes: ");
print_bytes(&f, sizeof(f));
return 0;
}
malloc returns void *. The kernel's kmalloc does the same. Every callback
mechanism in C uses void * for user data.
Rust Note: Rust does not have void *. Generics and trait objects (dyn Trait) replace it with type safety. In unsafe Rust, *const u8 or *mut u8 serve a similar role.
Common Pointer Bugs
Dangling Pointer
A pointer to memory that has been freed or gone out of scope.
/* dangling.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>
int *bad_function(void)
{
int local = 42;
return &local; /* WARNING: returning address of local variable */
}
int main(void)
{
int *p = bad_function();
/* p is now dangling -- local no longer exists */
printf("%d\n", *p); /* undefined behavior */
return 0;
}
$ gcc -Wall -o dangling dangling.c
dangling.c: warning: function returns address of local variable
Always heed compiler warnings. They catch this.
Wild Pointer
An uninitialized pointer contains garbage. Dereferencing it is undefined.
/* wild.c -- DO NOT DO THIS */
#include <stdio.h>
int main(void)
{
int *p; /* uninitialized -- points to random address */
/* *p = 10; */ /* undefined behavior, likely segfault */
printf("p = %p\n", (void *)p); /* garbage address */
return 0;
}
Always initialize pointers to NULL or a valid address.
Use-After-Free
/* use_after_free.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = malloc(sizeof(int));
*p = 42;
free(p);
/* p still holds the old address but the memory is freed */
/* *p = 99; */ /* undefined behavior */
/* Good practice: set to NULL after free */
p = NULL;
return 0;
}
Off-By-One
/* offbyone.c -- DO NOT DO THIS */
#include <stdio.h>
int main(void)
{
int arr[5] = {1, 2, 3, 4, 5};
/* Bug: accessing arr[5], which is one past the end */
for (int i = 0; i <= 5; i++) { /* should be i < 5 */
printf("%d ", arr[i]);
}
printf("\n");
return 0;
}
Caution: Off-by-one errors in pointer/array access are the most common source of buffer overflows and security vulnerabilities in C code. The Linux kernel has had hundreds of CVEs from this class of bug.
Pointers and const
The placement of const matters. Learn to read declarations right-to-left.
/* constptr.c */
#include <stdio.h>
int main(void)
{
int x = 10, y = 20;
const int *p1 = &x; /* pointer to const int: cannot change *p1 */
/* *p1 = 99; */ /* ERROR */
p1 = &y; /* OK: can change where p1 points */
int *const p2 = &x; /* const pointer to int: cannot change p2 */
*p2 = 99; /* OK: can change the value */
/* p2 = &y; */ /* ERROR */
const int *const p3 = &x; /* const pointer to const int: nothing changes */
/* *p3 = 99; */ /* ERROR */
/* p3 = &y; */ /* ERROR */
printf("x = %d\n", x);
return 0;
}
Read declarations right-to-left:
const int *p --> p is a pointer to a const int
int *const p --> p is a const pointer to an int
const int *const p --> p is a const pointer to a const int
Rust Note: Rust's &T is like const int * (read-only). Rust's &mut T is like int * (read-write). There is no equivalent of int *const because Rust bindings are immutable by default (let vs let mut).
Function Pointers
Functions have addresses too. A function pointer lets you call a function indirectly -- the basis of callbacks.
/* funcptr.c */
#include <stdio.h>
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
void apply(int (*op)(int, int), int x, int y)
{
printf("result = %d\n", op(x, y));
}
int main(void)
{
apply(add, 3, 4); /* result = 7 */
apply(mul, 3, 4); /* result = 12 */
/* Array of function pointers */
int (*ops[])(int, int) = {add, mul};
for (int i = 0; i < 2; i++) {
printf("ops[%d](5, 6) = %d\n", i, ops[i](5, 6));
}
return 0;
}
Driver Prep: The Linux kernel's struct file_operations is a struct of function pointers. Every device driver fills one in. Understanding function pointers is non-negotiable for kernel work.
Putting It Together: A Tiny Stack
/* stack.c */
#include <stdio.h>
#include <stdlib.h>
#define STACK_MAX 16
struct stack {
int data[STACK_MAX];
int top;
};
void stack_init(struct stack *s)
{
s->top = 0;
}
int stack_push(struct stack *s, int value)
{
if (s->top >= STACK_MAX)
return -1; /* full */
s->data[s->top++] = value;
return 0;
}
int stack_pop(struct stack *s, int *out)
{
if (s->top <= 0)
return -1; /* empty */
*out = s->data[--s->top];
return 0;
}
int main(void)
{
struct stack s;
stack_init(&s);
stack_push(&s, 10);
stack_push(&s, 20);
stack_push(&s, 30);
int val;
while (stack_pop(&s, &val) == 0) {
printf("popped: %d\n", val);
}
return 0;
}
Output:
popped: 30
popped: 20
popped: 10
Notice: every function takes struct stack *. The caller owns the struct; the
functions borrow it via pointer. This is the C pattern that Rust formalizes
with borrowing.
Try It: Add a stack_peek function that returns the top value without removing it. Use a pointer parameter for the output, just like stack_pop.
Knowledge Check
- What does *(arr + 3) mean if arr is an int array?
- Why must you pass array length separately in C?
- What is the difference between const int *p and int *const p?
Common Pitfalls
- Forgetting to check for NULL after malloc -- crashes in production.
- Returning a pointer to a local variable -- instant dangling pointer.
- Confusing *p++ precedence -- it increments p, not *p. Use (*p)++.
- Casting away const -- the compiler lets you, the program breaks at runtime.
- Not setting freed pointers to NULL -- use-after-free becomes silent corruption.
- sizeof on a decayed pointer -- gives pointer size, not array size.
- Pointer arithmetic on void * -- not standard C (GCC allows it as an extension, treating it as char *).
References and Borrowing in Rust
Rust replaces C pointers with references that the compiler can reason about. You get the power of indirection without the bugs. This chapter shows how Rust's borrowing system works, what you give up compared to C, and what you gain.
Shared References: &T
A shared reference lets you read data without owning it. Multiple shared references can exist at the same time.
// shared_ref.rs
fn print_value(r: &i32) {
    println!("value = {}", r);
    // *r = 99; // ERROR: cannot assign through a shared reference
}
fn main() {
    let x = 42;
    let r1 = &x;
    let r2 = &x; // multiple shared references: OK
    print_value(r1);
    print_value(r2);
    println!("x is still {}", x);
}
$ rustc shared_ref.rs && ./shared_ref
value = 42
value = 42
x is still 42
Compare to C:
/* shared_ref.c */
#include <stdio.h>
void print_value(const int *r)
{
printf("value = %d\n", *r);
/* *r = 99; */ /* ERROR: r points to const int */
}
int main(void)
{
int x = 42;
const int *r1 = &x;
const int *r2 = &x;
print_value(r1);
print_value(r2);
printf("x is still %d\n", x);
return 0;
}
The surface similarity is real: &T in Rust behaves like const T * in C. But
Rust enforces it at a deeper level -- you cannot cast away the constness.
Mutable References: &mut T
A mutable reference gives exclusive read-write access. Only one &mut T can
exist for a given value at a time, and no shared references may coexist with it.
// mut_ref.rs
fn increment(r: &mut i32) {
    *r += 1;
}
fn main() {
    let mut x = 10;
    increment(&mut x);
    increment(&mut x);
    println!("x = {}", x); // 12
}
$ rustc mut_ref.rs && ./mut_ref
x = 12
The C equivalent:
/* mut_ref.c */
#include <stdio.h>
void increment(int *r)
{
(*r)++;
}
int main(void)
{
int x = 10;
increment(&x);
increment(&x);
printf("x = %d\n", x); /* 12 */
return 0;
}
In C, any int * is mutable. There is no compiler-enforced exclusivity.
The Borrowing Rules
Rust enforces exactly two rules at compile time:
- One mutable reference, OR any number of shared references -- never both.
- References must always be valid -- no dangling references.
// borrow_rules.rs -- This will NOT compile
fn main() {
    let mut x = 10;
    let r1 = &x;     // shared borrow
    let r2 = &mut x; // ERROR: cannot borrow x as mutable
                     // because it is also borrowed as immutable
    println!("{} {}", r1, r2);
}
error[E0502]: cannot borrow `x` as mutable because it is
also borrowed as immutable
This is the key innovation. The compiler prevents data races and aliased mutation at compile time.
Borrowing Rules Visualized
===========================
Allowed: Allowed:
+---+ &T +---+ +---+ &mut T +---+
| A |-------->| x | | A |--------->| x |
+---+ +---+ +---+ +---+
+---+ &T ^ (only one)
| B |----------/
+---+
FORBIDDEN:
+---+ &T +---+
| A |-------->| x |
+---+ +---+
+---+ &mut T ^
| B |---------/ <-- compile error
+---+
Rust Note: This is the fundamental difference from C. In C, you can have a const int * and an int * to the same address at the same time. The compiler cannot catch the resulting bugs.
Why This Prevents Data Races
A data race requires three conditions simultaneously:
- Two or more threads access the same memory
- At least one access is a write
- No synchronization
Rust's borrowing rules make condition 2 impossible when condition 1 is true. If multiple references exist, none can write. If a mutable reference exists, no other reference exists.
// no_data_race.rs
use std::thread;
fn main() {
    let mut data = vec![1, 2, 3];
    // This would fail to compile:
    // let r = &data;
    // thread::spawn(move || {
    //     data.push(4); // cannot move `data` while borrowed
    // });
    // println!("{:?}", r);
    // Instead, you must choose: move or borrow, not both
    thread::spawn(move || {
        data.push(4);
        println!("{:?}", data);
    }).join().unwrap();
}
$ rustc no_data_race.rs && ./no_data_race
[1, 2, 3, 4]
In C, the equivalent code compiles without complaint and races silently.
Non-Lexical Lifetimes (NLL)
Rust's borrow checker is smart. A borrow ends at its last use, not at the end of the scope. This is called Non-Lexical Lifetimes.
// nll.rs
fn main() {
    let mut x = 10;
    let r1 = &x;
    println!("r1 = {}", r1); // r1's borrow ends here (last use)
    let r2 = &mut x;         // OK: r1 is no longer active
    *r2 += 5;
    println!("r2 = {}", r2);
}
$ rustc nll.rs && ./nll
r1 = 10
r2 = 15
Without NLL (older Rust), this would not compile. The borrow checker has gotten significantly smarter over time.
References to Structs
Like C's -> operator, Rust auto-dereferences through references when calling
methods or accessing fields.
// struct_ref.rs
struct Point {
    x: i32,
    y: i32,
}
fn move_right(p: &mut Point, dx: i32) {
    p.x += dx; // no -> needed, Rust auto-dereferences
}
fn show(p: &Point) {
    println!("({}, {})", p.x, p.y);
}
fn main() {
    let mut pt = Point { x: 3, y: 7 };
    show(&pt);
    move_right(&mut pt, 10);
    show(&pt);
}
$ rustc struct_ref.rs && ./struct_ref
(3, 7)
(13, 7)
Compare to the C version from Chapter 6: the logic is identical, but Rust
distinguishes &Point (read-only) from &mut Point (read-write) at the type
level.
Reborrowing
When you pass a &mut T to a function, Rust implicitly creates a shorter-lived
mutable borrow. The original reference is "frozen" until the reborrow ends.
// reborrow.rs
fn add_one(val: &mut i32) {
    *val += 1;
}
fn add_two(val: &mut i32) {
    add_one(val); // reborrow: val is implicitly &mut *val
    add_one(val); // works again after first reborrow ends
}
fn main() {
    let mut x = 0;
    add_two(&mut x);
    println!("x = {}", x); // 2
}
$ rustc reborrow.rs && ./reborrow
x = 2
This is why you can call multiple &mut functions in sequence -- each reborrow
is temporary.
Dangling References: Impossible in Safe Rust
In C, returning a pointer to a local variable is a dangling pointer bug. In Rust, the compiler rejects it outright.
// dangling.rs -- This will NOT compile
fn bad() -> &i32 {
    let x = 42;
    &x // ERROR: `x` does not live long enough
}
fn main() {
    let r = bad();
    println!("{}", r);
}
error[E0106]: missing lifetime specifier
error[E0515]: cannot return reference to local variable `x`
Compare to C, where this compiles with only a warning:
/* dangling.c */
#include <stdio.h>
int *bad(void)
{
int x = 42;
return &x; /* warning: returning address of local variable */
}
int main(void)
{
int *r = bad();
printf("%d\n", *r); /* undefined behavior */
return 0;
}
Caution: The C version "works" on many systems because the stack frame has not been overwritten yet. This makes the bug hard to detect. Rust eliminates the entire class of bug.
Slices: References to Contiguous Data
A slice &[T] is a reference plus a length. It is Rust's answer to C's
"pointer plus separate length parameter."
// slice.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for val in data {
        total += val;
    }
    total
}
fn main() {
    let arr = [10, 20, 30, 40, 50];
    println!("sum of all: {}", sum(&arr));
    println!("sum of [1..4]: {}", sum(&arr[1..4])); // 20+30+40
}
$ rustc slice.rs && ./slice
sum of all: 150
sum of [1..4]: 90
Slice layout in memory:
&arr[1..4]
+----------+--------+
| pointer | length | <-- fat pointer (2 words)
| &arr[1] | 3 |
+----------+--------+
|
v
+----+----+----+----+----+
| 10 | 20 | 30 | 40 | 50 | <-- underlying array
+----+----+----+----+----+
[1] [2] [3]
Try It: Create a function fn largest(data: &[i32]) -> i32 that finds the maximum value in a slice. Test it with different sub-slices of an array.
What You Give Up, What You Gain
Compared to C pointers, Rust references give up:
- Pointer arithmetic -- no p + 3 on references. Use indexing or iterators.
- NULL -- references cannot be null. Use Option<&T> instead.
- Aliased mutation -- you cannot have multiple mutable paths to the same data.
- Casting -- no implicit type-punning through references.
What you gain:
- No dangling references -- compile-time guarantee.
- No data races -- compile-time guarantee.
- No null dereferences -- no null to dereference.
- No buffer overflows -- slices carry their length.
When You Need unsafe
Sometimes you genuinely need raw pointer behavior. Rust's unsafe blocks let
you opt out of borrow checking for specific operations.
// raw_ptr.rs
fn main() {
    let mut x = 42;
    // Create raw pointers (safe -- creating is fine)
    let r1 = &x as *const i32;
    let r2 = &mut x as *mut i32;
    // Dereference raw pointers (unsafe -- you take responsibility)
    unsafe {
        println!("r1 = {}", *r1);
        *r2 = 99;
        println!("r2 = {}", *r2);
    }
}
$ rustc raw_ptr.rs && ./raw_ptr
r1 = 42
r2 = 99
Driver Prep: The Rust-for-Linux project uses unsafe blocks to interact with kernel C APIs. The idea is to build safe abstractions on top of unsafe foundations. The unsafe surface area is small and auditable.
Pattern: Option Instead of NULL
Rust uses Option<&T> where C uses "pointer or NULL."
// option_ref.rs
fn find(data: &[i32], target: i32) -> Option<&i32> {
    for val in data {
        if *val == target {
            return Some(val);
        }
    }
    None
}
fn main() {
    let arr = [10, 20, 30, 40];
    match find(&arr, 30) {
        Some(val) => println!("found: {}", val),
        None => println!("not found"),
    }
    match find(&arr, 99) {
        Some(val) => println!("found: {}", val),
        None => println!("not found"),
    }
}
$ rustc option_ref.rs && ./option_ref
found: 30
not found
The C equivalent:
/* find.c */
#include <stdio.h>
const int *find(const int *data, int len, int target)
{
for (int i = 0; i < len; i++) {
if (data[i] == target)
return &data[i];
}
return NULL;
}
int main(void)
{
int arr[] = {10, 20, 30, 40};
const int *result = find(arr, 4, 30);
if (result)
printf("found: %d\n", *result);
else
printf("not found\n");
return 0;
}
The Rust version forces you to handle None. The C version lets you forget to
check for NULL.
Putting It Together: A Borrowing Exercise
// borrow_exercise.rs
struct Stats {
    count: usize,
    sum: f64,
}
fn compute_stats(data: &[f64]) -> Stats {
    let count = data.len();
    let sum: f64 = data.iter().sum();
    Stats { count, sum }
}
fn print_stats(s: &Stats) {
    println!("count = {}", s.count);
    println!("sum = {:.2}", s.sum);
    if s.count > 0 {
        println!("mean = {:.2}", s.sum / s.count as f64);
    }
}
fn normalize(data: &mut [f64], factor: f64) {
    for val in data.iter_mut() {
        *val /= factor;
    }
}
fn main() {
    let mut data = vec![10.0, 20.0, 30.0, 40.0, 50.0];
    let stats = compute_stats(&data); // shared borrow
    print_stats(&stats);              // shared borrow of stats
    normalize(&mut data, stats.sum);  // mutable borrow of data
    println!("\nnormalized: {:?}", data);
}
$ rustc borrow_exercise.rs && ./borrow_exercise
count = 5
sum = 150.00
mean = 30.00
normalized: [0.06666666666666667, 0.13333333333333333, 0.2, ...]
Try It: Modify the program to normalize by the mean instead of the sum. Make sure you compute the stats before the mutable borrow.
Knowledge Check
- What happens if you try to hold &x and &mut x at the same time?
- How does Rust represent "this function might return no value"?
- What is a "fat pointer" in the context of slices?
Common Pitfalls
- Fighting the borrow checker -- restructure your code, do not reach for unsafe.
- Holding borrows across .push() -- pushing to a Vec might reallocate, invalidating references.
- Forgetting &mut -- writing increment(&x) when you mean increment(&mut x).
- Confusing &T with T -- a reference is not a copy; you must dereference to get the value.
- Using unsafe to "shut up the compiler" -- if the compiler says no, you probably have a real bug.
- Indexing instead of iterating -- for val in &data is safer than for i in 0..data.len().
Arrays, Slices, and Strings
Arrays and strings are the data structures that break the most programs. In C, they are raw memory with no guardrails. In Rust, they carry their length and check bounds. This chapter covers both approaches and shows why buffer overflows keep making headlines.
C Arrays: Fixed Size on the Stack
A C array is a contiguous block of elements. The size must be a compile-time constant (in standard C89/C99 with some caveats).
/* c_array.c */
#include <stdio.h>
int main(void)
{
int arr[5] = {10, 20, 30, 40, 50};
printf("sizeof(arr) = %zu bytes\n", sizeof(arr)); /* 20 */
printf("elements = %zu\n", sizeof(arr) / sizeof(arr[0])); /* 5 */
for (int i = 0; i < 5; i++) {
printf("arr[%d] = %d\n", i, arr[i]);
}
return 0;
}
Stack layout:
+----+----+----+----+----+
| 10 | 20 | 30 | 40 | 50 |
+----+----+----+----+----+
arr[0] arr[4]
Total: 5 * sizeof(int) = 20 bytes
No length is stored anywhere. You, the programmer, must track it.
Variable-Length Arrays (VLAs)
C99 added VLAs where the size comes from a runtime value. They live on the stack and can blow it up.
/* vla.c */
#include <stdio.h>
void fill(int n)
{
int arr[n]; /* VLA: size determined at runtime */
for (int i = 0; i < n; i++) {
arr[i] = i * i;
}
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}
int main(void)
{
fill(5);
fill(10);
return 0;
}
Caution: VLAs are banned in the Linux kernel (enforced with the -Wvla flag). A large n overflows the kernel stack (typically 8 KB or 16 KB). Use kmalloc or fixed-size arrays instead.
Heap Arrays in C
For dynamic sizes, allocate on the heap with malloc.
/* heap_array.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int n = 5;
int *arr = malloc(n * sizeof(int));
if (arr == NULL) {
perror("malloc");
return 1;
}
for (int i = 0; i < n; i++) {
arr[i] = (i + 1) * 100;
}
for (int i = 0; i < n; i++) {
printf("arr[%d] = %d\n", i, arr[i]);
}
free(arr);
return 0;
}
Stack Heap
+-----+----------+ +-----+-----+-----+-----+-----+
| arr | 0x8000 -|--------->| 100 | 200 | 300 | 400 | 500 |
+-----+----------+ +-----+-----+-----+-----+-----+
| n | 5 |
+-----+----------+
Rust Arrays: [T; N]
Rust arrays have their size baked into the type. [i32; 5] is a different type
from [i32; 3].
// rust_array.rs
fn main() {
    let arr: [i32; 5] = [10, 20, 30, 40, 50];
    println!("length = {}", arr.len());
    for (i, val) in arr.iter().enumerate() {
        println!("arr[{}] = {}", i, val);
    }
    // Bounds checking at runtime:
    // let bad = arr[10]; // panics: index out of bounds
}
$ rustc rust_array.rs && ./rust_array
length = 5
arr[0] = 10
arr[1] = 20
arr[2] = 30
arr[3] = 40
arr[4] = 50
The length is part of the type. No separate variable needed.
Rust Vec<T>: The Growable Array
Vec<T> is Rust's heap-allocated, growable array. It replaces C's
malloc/realloc pattern.
// vec_demo.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();
    v.push(10);
    v.push(20);
    v.push(30);
    println!("length = {}", v.len());
    println!("capacity = {}", v.capacity());
    for val in &v {
        println!("{}", val);
    }
    v.pop(); // removes last element
    println!("after pop: {:?}", v);
}
$ rustc vec_demo.rs && ./vec_demo
length = 3
capacity = 4
10
20
30
after pop: [10, 20]
Vec<T> layout:
Stack (Vec struct) Heap (buffer)
+----------+---------+ +----+----+----+----+
| pointer | 0x5000 -|--->| 10 | 20 | 30 | |
+----------+---------+ +----+----+----+----+
| length | 3 | [0] [1] [2] unused
+----------+---------+
| capacity | 4 |
+----------+---------+
When you push beyond capacity, Vec allocates a new, larger buffer, copies the
data, and frees the old one. This is automatic realloc.
Slices: &[T]
C has no concept of a slice. When you pass an array to a function in C, you pass a pointer and pray the caller also passed the correct length.
Rust slices bundle pointer and length together.
// slices.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for &val in data {
        total += val;
    }
    total
}
fn main() {
    let arr = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];
    // Slice from array
    println!("sum(arr) = {}", sum(&arr));
    println!("sum(arr[1..4]) = {}", sum(&arr[1..4])); // 2+3+4
    // Slice from Vec
    println!("sum(v) = {}", sum(&v));
    println!("sum(v[..2]) = {}", sum(&v[..2])); // 10+20
}
$ rustc slices.rs && ./slices
sum(arr) = 15
sum(arr[1..4]) = 9
sum(v) = 60
sum(v[..2]) = 30
The C equivalent requires explicit length passing:
/* sum.c */
#include <stdio.h>
int sum(const int *data, int len)
{
int total = 0;
for (int i = 0; i < len; i++) {
total += data[i];
}
return total;
}
int main(void)
{
int arr[] = {1, 2, 3, 4, 5};
printf("sum = %d\n", sum(arr, 5));
printf("sum[1..4] = %d\n", sum(arr + 1, 3));
return 0;
}
Rust Note: Slices perform bounds checking on every index access. This costs a branch instruction but prevents buffer overflows. In hot loops, the optimizer often eliminates the check.
Try It: Write a Rust function fn max_value(data: &[i32]) -> Option<i32> that returns None for an empty slice and Some(max) otherwise. Compare how much simpler it is than the C equivalent.
C Strings: Null-Terminated char *
C strings are arrays of char terminated by a zero byte ('\0'). There is no
stored length.
/* cstring.c */
#include <stdio.h>
#include <string.h>
int main(void)
{
char greeting[] = "Hello";
printf("string: %s\n", greeting);
printf("strlen: %zu\n", strlen(greeting)); /* 5 */
printf("sizeof: %zu\n", sizeof(greeting)); /* 6 (includes \0) */
/* Print each byte */
for (int i = 0; i <= (int)strlen(greeting); i++) {
printf(" [%d] = '%c' (%d)\n", i, greeting[i], greeting[i]);
}
return 0;
}
string: Hello
strlen: 5
sizeof: 6
[0] = 'H' (72)
[1] = 'e' (101)
[2] = 'l' (108)
[3] = 'l' (108)
[4] = 'o' (111)
[5] = '' (0) <-- null terminator
C string in memory:
+---+---+---+---+---+----+
| H | e | l | l | o | \0 |
+---+---+---+---+---+----+
greeting[0] greeting[5]
The Dangerous String Functions
strcpy -- No Bounds Checking
/* strcpy_bad.c -- DO NOT DO THIS in production */
#include <stdio.h>
#include <string.h>
int main(void)
{
char buf[8];
char *input = "This string is way too long for buf";
strcpy(buf, input); /* BUFFER OVERFLOW */
printf("%s\n", buf);
return 0;
}
Caution: strcpy writes until it hits \0 in the source. It has no idea how big the destination is. This is the cause of thousands of CVEs.
strncpy -- Better, But Tricky
/* strncpy_demo.c */
#include <stdio.h>
#include <string.h>
int main(void)
{
char buf[8];
strncpy(buf, "Hello, World!", sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0'; /* strncpy may not null-terminate! */
printf("buf = '%s'\n", buf); /* "Hello, " (truncated) */
return 0;
}
Caution: strncpy does NOT guarantee null-termination if the source is longer than the buffer. Always set the last byte to \0 manually.
snprintf -- The Safe Choice
/* snprintf_demo.c */
#include <stdio.h>
int main(void)
{
char buf[16];
int written = snprintf(buf, sizeof(buf), "Count: %d", 42);
printf("buf = '%s'\n", buf);
printf("would have written %d chars\n", written);
/* If written >= sizeof(buf), truncation occurred */
if (written >= (int)sizeof(buf)) {
printf("WARNING: output truncated\n");
}
return 0;
}
snprintf always null-terminates (if size > 0) and tells you how many
characters it wanted to write. Use it for all string formatting in C.
Driver Prep: The Linux kernel uses scnprintf, a variant that returns the number of characters actually written (not the would-have-been count). Never use sprintf in kernel code.
Rust Strings: String and &str
Rust has two main string types:
- String -- owned, heap-allocated, growable (like Vec<u8> with a UTF-8 guarantee)
- &str -- borrowed string slice (like &[u8] but guaranteed UTF-8)
// rust_strings.rs
fn greet(name: &str) {
    println!("Hello, {}!", name);
}
fn main() {
    // String literal -> &str (stored in binary, read-only)
    let s1: &str = "world";
    greet(s1);
    // Owned String on the heap
    let s2: String = String::from("Rust");
    greet(&s2); // &String auto-coerces to &str
    // Building strings
    let mut s3 = String::new();
    s3.push_str("Hello");
    s3.push(' ');
    s3.push_str("World");
    println!("{}", s3);
    // Length is always known
    println!("len = {}", s3.len());             // bytes
    println!("chars = {}", s3.chars().count()); // unicode scalar values
}
$ rustc rust_strings.rs && ./rust_strings
Hello, world!
Hello, Rust!
Hello World
len = 11
chars = 11
String layout:
Stack (String struct) Heap
+----------+---------+ +---+---+---+---+---+---+---+---+---+---+---+
| pointer | 0x7000 -|--->| H | e | l | l | o | | W | o | r | l | d |
+----------+---------+ +---+---+---+---+---+---+---+---+---+---+---+
| length | 11 | UTF-8 bytes, NO null terminator
+----------+---------+
| capacity | 16 |
+----------+---------+
&str layout:
+----------+---------+
| pointer | 0x7000 | Points into String or binary data
+----------+---------+
| length | 11 | Fat pointer, always knows its length
+----------+---------+
Rust Note: Rust strings are always valid UTF-8. You cannot put arbitrary bytes in a String. For raw bytes, use Vec<u8> or &[u8]. For OS-interface strings, use OsString and OsStr.
Buffer Overflows: Why They Happen
Buffer overflows happen when code writes past the end of a buffer. In C, this is trivially easy:
/* overflow.c */
#include <stdio.h>
#include <string.h>
int main(void)
{
char password[8] = "secret";
char buffer[8];
printf("Enter name: ");
/* gets() has been removed from the C standard.
scanf without width limit is equally dangerous: */
scanf("%s", buffer); /* no length limit! */
printf("buffer = '%s'\n", buffer);
printf("password = '%s'\n", password);
return 0;
}
If the user types more than 7 characters, buffer overflows into password
(or whatever is adjacent on the stack). This is how stack-smashing attacks work.
The Rust equivalent simply cannot overflow:
// no_overflow.rs
use std::io;
fn main() {
    let mut buffer = String::new();
    println!("Enter name:");
    io::stdin().read_line(&mut buffer).unwrap();
    // String grows as needed -- cannot overflow
    println!("buffer = '{}'", buffer.trim());
}
Side by Side: Processing CSV Lines
A practical example showing the difference in safety.
C Version
/* csv_c.c */
#include <stdio.h>
#include <string.h>
void parse_line(const char *line)
{
char buf[256];
strncpy(buf, line, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0';
char *token = strtok(buf, ",");
int col = 0;
while (token != NULL) {
printf(" col %d: '%s'\n", col, token);
token = strtok(NULL, ",");
col++;
}
}
int main(void)
{
const char *lines[] = {
"Alice,30,Engineer",
"Bob,25,Designer",
"Carol,35,Manager",
};
for (int i = 0; i < 3; i++) {
printf("Line %d:\n", i);
parse_line(lines[i]);
}
return 0;
}
Rust Version
// csv_rust.rs
fn parse_line(line: &str) {
    for (col, token) in line.split(',').enumerate() {
        println!("  col {}: '{}'", col, token);
    }
}
fn main() {
    let lines = [
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    ];
    for (i, line) in lines.iter().enumerate() {
        println!("Line {}:", i);
        parse_line(line);
    }
}
The Rust version has no fixed-size buffer, no null terminator management, no
strtok with its hidden static state, and no possible overflow.
Try It: Extend the C version to handle lines longer than 256 characters. Notice how much code you need. Then notice that the Rust version already handles any length.
Knowledge Check
- What is the difference between strlen(s) and sizeof(s) for a char array?
- Why is strcpy dangerous? What should you use instead?
- How does a Rust &str differ from a C const char *?
Common Pitfalls
- Forgetting the null terminator -- C strings need +1 byte. char buf[5] holds at most 4 characters.
- Using strlen in a loop condition -- it traverses the string on every call. Cache the length.
- strncpy does not null-terminate -- if the source is longer than n, the destination has no \0.
- Mixing up bytes and characters -- UTF-8 characters can be 1-4 bytes. strlen counts bytes.
- Array decay in sizeof -- sizeof(arr) inside the declaring function gives the array size; inside a called function, it gives pointer size.
- Off-by-one in loop bounds -- i <= n when you mean i < n.
- Not checking the snprintf return value -- it tells you if truncation occurred; ignoring it means silent data loss.
Dynamic Memory: malloc/free vs Box/Vec
Stack memory is fast but limited in size and lifetime. When you need memory that outlives a function call or whose size is not known at compile time, you allocate it on the heap. C gives you raw tools; Rust gives you safe abstractions over the same tools.
malloc and free
malloc requests bytes from the heap. free returns them. Everything between
is your responsibility.
/* malloc_basic.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = malloc(sizeof(int));
if (p == NULL) {
perror("malloc");
return 1;
}
*p = 42;
printf("*p = %d\n", *p);
free(p);
p = NULL; /* good practice: prevent use-after-free */
return 0;
}
Before malloc: After malloc:
Stack Stack Heap
+---+------+ +---+--------+ +----+
| p | NULL | | p | 0x5000-|->| 42 |
+---+------+ +---+--------+ +----+
After free:
Stack Heap
+---+------+ +----+
| p | NULL | | ?? | <-- memory returned to allocator
+---+------+ +----+
calloc and realloc
calloc allocates and zero-initializes. realloc resizes an existing
allocation.
/* calloc_realloc.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
/* calloc: allocate 5 ints, all zeroed */
int *arr = calloc(5, sizeof(int));
if (arr == NULL) {
perror("calloc");
return 1;
}
for (int i = 0; i < 5; i++) {
arr[i] = (i + 1) * 10;
}
/* realloc: grow to 10 ints */
int *tmp = realloc(arr, 10 * sizeof(int));
if (tmp == NULL) {
perror("realloc");
free(arr);
return 1;
}
arr = tmp;
/* New elements are uninitialized (not zeroed!) */
for (int i = 5; i < 10; i++) {
arr[i] = (i + 1) * 10;
}
for (int i = 0; i < 10; i++) {
printf("arr[%d] = %d\n", i, arr[i]);
}
free(arr);
return 0;
}
Caution: Never do arr = realloc(arr, new_size). If realloc fails, it returns NULL and you lose the original pointer -- a memory leak. Always use a temporary variable.
Memory Leaks
A memory leak occurs when allocated memory is never freed. The process holds onto memory it can never use again.
/* leak.c -- DO NOT DO THIS */
#include <stdio.h>
#include <stdlib.h>
void leaky(void)
{
int *p = malloc(1024);
if (p == NULL) return;
*p = 42;
/* forgot to free(p) -- leaked 1024 bytes */
}
int main(void)
{
for (int i = 0; i < 1000; i++) {
leaky(); /* leaks 1 MB total */
}
printf("Done (but leaked ~1 MB)\n");
return 0;
}
Run with Valgrind to detect:
$ gcc -g -o leak leak.c
$ valgrind --leak-check=full ./leak
...
==12345== LEAK SUMMARY:
==12345== definitely lost: 1,024,000 bytes in 1,000 blocks
Double-Free
Freeing the same pointer twice is undefined behavior. It can corrupt the allocator's internal data structures, leading to crashes or exploits.
/* double_free.c -- DO NOT DO THIS */
#include <stdlib.h>
int main(void)
{
int *p = malloc(sizeof(int));
*p = 42;
free(p);
/* free(p); */ /* UNDEFINED BEHAVIOR: double-free */
p = NULL; /* Setting to NULL after free prevents double-free */
free(p); /* free(NULL) is safe -- it does nothing */
return 0;
}
Caution: Double-free bugs are a major source of security vulnerabilities. Attackers can exploit heap corruption caused by double-free to execute arbitrary code.
Rust's Box<T>
Box<T> allocates a single value on the heap. When the Box goes out of
scope, the memory is freed automatically.
// box_demo.rs
fn main() {
    let b = Box::new(42);
    println!("*b = {}", *b);
    // b goes out of scope here -> memory freed automatically
    // No free() needed. No leak possible. No double-free possible.
}
Stack Heap
+---+---------+ +----+
| b | 0x5000 -|------>| 42 |
+---+---------+ +----+
When b drops:
- Heap memory at 0x5000 is freed
- b is gone from the stack
Rust's Vec<T>
Vec<T> is a growable heap array. It replaces C's malloc/realloc/free pattern.
// vec_grow.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();
    println!("len={}, cap={}", v.len(), v.capacity());
    for i in 0..10 {
        v.push(i * 10);
        println!("pushed {}: len={}, cap={}", i * 10, v.len(), v.capacity());
    }
    println!("\ncontents: {:?}", v);
}
$ rustc vec_grow.rs && ./vec_grow
len=0, cap=0
pushed 0: len=1, cap=4
pushed 10: len=2, cap=4
pushed 20: len=3, cap=4
pushed 30: len=4, cap=4
pushed 40: len=5, cap=8
pushed 50: len=6, cap=8
pushed 60: len=7, cap=8
pushed 70: len=8, cap=8
pushed 80: len=9, cap=16
pushed 90: len=10, cap=16
contents: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
Vec doubles its capacity when full (the exact growth factor is an
implementation detail). Compare this to manually calling realloc in C.
RAII and the Drop Trait
RAII (Resource Acquisition Is Initialization) means: tie resource cleanup to
scope exit. In Rust, every type can implement the Drop trait to run cleanup
code when a value goes out of scope.
// drop_demo.rs
struct Resource {
    name: String,
}
impl Resource {
    fn new(name: &str) -> Self {
        println!("[{}] acquired", name);
        Resource {
            name: String::from(name),
        }
    }
}
impl Drop for Resource {
    fn drop(&mut self) {
        println!("[{}] released", self.name);
    }
}
fn main() {
    let _a = Resource::new("A");
    {
        let _b = Resource::new("B");
        let _c = Resource::new("C");
        println!("-- end of inner scope --");
    } // B and C dropped here (reverse order)
    println!("-- end of main --");
} // A dropped here
$ rustc drop_demo.rs && ./drop_demo
[A] acquired
[B] acquired
[C] acquired
-- end of inner scope --
[C] released
[B] released
-- end of main --
[A] released
Drop order is reverse of creation order, just like C++ destructors.
Driver Prep: The Rust-for-Linux project uses RAII extensively. When a device driver struct is dropped, it automatically unregisters the device, frees DMA buffers, and releases IRQs. This eliminates an entire class of kernel resource leaks.
C Equivalent of RAII: goto cleanup
C does not have destructors. The standard pattern is goto cleanup:
/* goto_cleanup.c */
#include <stdio.h>
#include <stdlib.h>
int process_data(int n)
{
    int ret = -1;

    int *buf1 = malloc(n * sizeof(int));
    if (buf1 == NULL) goto out;

    int *buf2 = malloc(n * sizeof(int));
    if (buf2 == NULL) goto free_buf1;

    /* Do work with buf1 and buf2 */
    for (int i = 0; i < n; i++) {
        buf1[i] = i;
        buf2[i] = i * 2;
    }
    printf("Processed %d elements\n", n);
    ret = 0;
    free(buf2);
free_buf1:
    free(buf1);
out:
    return ret;
}

int main(void)
{
    if (process_data(10) != 0) {
        fprintf(stderr, "processing failed\n");
        return 1;
    }
    return 0;
}
Driver Prep: This goto cleanup pattern is the single most common pattern in Linux kernel code. Every probe() function looks like this. Rust's RAII replaces it entirely.
Detecting Memory Bugs
Valgrind
$ gcc -g -o program program.c
$ valgrind --leak-check=full --show-leak-kinds=all ./program
Valgrind instruments every memory access at runtime. It catches:
- Memory leaks
- Use-after-free
- Double-free
- Buffer overflows (heap)
- Uninitialized reads
AddressSanitizer (ASan)
$ gcc -g -fsanitize=address -o program program.c
$ ./program
ASan is faster than Valgrind (roughly 2x slowdown versus 20x). It catches most of the same bugs plus stack buffer overflows, though uninitialized reads are MemorySanitizer's job rather than ASan's. Both GCC and Clang support it.
/* asan_demo.c -- compile with -fsanitize=address */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *p = malloc(5 * sizeof(int));
    p[5] = 99; /* heap-buffer-overflow */
    free(p);
    return 0;
}
$ gcc -g -fsanitize=address -o asan_demo asan_demo.c && ./asan_demo
=================================================================
==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
WRITE of size 4 at 0x... thread T0
#0 0x... in main asan_demo.c:8
Try It: Compile the leak example above with -fsanitize=address and compare the output to Valgrind. ASan's reports are often more readable.
Side-by-Side: Linked List
This is the definitive comparison. A singly-linked list shows every difference between C and Rust memory management.
C Linked List
/* linked_list_c.c */
#include <stdio.h>
#include <stdlib.h>
struct node {
    int value;
    struct node *next;
};

struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(struct node));
    if (n == NULL) {
        perror("malloc");
        exit(1);
    }
    n->value = value;
    n->next = head;
    return n;
}

void list_print(const struct node *head)
{
    const struct node *cur = head;
    while (cur != NULL) {
        printf("%d -> ", cur->value);
        cur = cur->next;
    }
    printf("NULL\n");
}

void list_free(struct node *head)
{
    struct node *cur = head;
    while (cur != NULL) {
        struct node *next = cur->next;
        free(cur);
        cur = next;
    }
}

int main(void)
{
    struct node *list = NULL;
    list = list_push(list, 10);
    list = list_push(list, 20);
    list = list_push(list, 30);
    list_print(list); /* 30 -> 20 -> 10 -> NULL */
    list_free(list);
    return 0;
}
Three things that can go wrong in the C version:
- Forget list_free -- memory leak
- Use list after list_free -- use-after-free
- Free a node twice -- double-free
Rust Linked List
// linked_list_rust.rs
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

impl List {
    fn push(self, value: i32) -> List {
        Cons(value, Box::new(self))
    }

    fn print(&self) {
        let mut current = self;
        loop {
            match current {
                Cons(val, next) => {
                    print!("{} -> ", val);
                    current = next;
                }
                Nil => {
                    println!("Nil");
                    break;
                }
            }
        }
    }
}

fn main() {
    let list = Nil
        .push(10)
        .push(20)
        .push(30);
    list.print(); // 30 -> 20 -> 10 -> Nil
    // list goes out of scope here.
    // Each Box is dropped recursively. No leak. No double-free.
}
$ rustc linked_list_rust.rs && ./linked_list_rust
30 -> 20 -> 10 -> Nil
Memory layout:
C version:
head -> [30|*]----> [20|*]----> [10|NULL]
malloc'd malloc'd malloc'd
(must free (must free (must free
manually) manually) manually)
Rust version:
list = Cons(30, Box::new(
Cons(20, Box::new(
Cons(10, Box::new(Nil))))))
Stack Heap Heap Heap
+------+ +------+ +------+ +------+
| list |------>| 30 | | 20 | | 10 |
+------+ | Box -|------>| Box -|------>| Nil |
+------+ +------+ +------+
(auto-drop) (auto-drop) (auto-drop)
No list_free function needed. When list goes out of scope, each Box drops
its contents, which triggers the next drop, recursively freeing the entire
chain.
Rust Note: For long lists, recursive drop can overflow the stack. In production, you would implement Drop manually with an iterative loop. The standard library's LinkedList<T> handles this.
Comparing Memory Management Styles
+-------------------+-------------------------+------------------------+
| Operation | C | Rust |
+-------------------+-------------------------+------------------------+
| Allocate one | malloc(sizeof(T)) | Box::new(val) |
| Allocate array | malloc(n * sizeof(T)) | Vec::with_capacity(n) |
| Zero-allocate | calloc(n, sizeof(T)) | vec![0; n] |
| Resize | realloc(p, new_size) | v.reserve(additional) |
| Free | free(p) | automatic (Drop) |
| Detect leaks | valgrind, ASan | not needed* |
| Detect use-after | valgrind, ASan | compile error |
| Detect double-free| valgrind, ASan | compile error |
+-------------------+-------------------------+------------------------+
* Rust can still leak via mem::forget or Rc cycles, but accidental
leaks from forgetting free() are impossible.
std::mem::drop and Early Cleanup
Sometimes you want to free memory before a scope ends. Rust's drop() function
consumes a value, triggering its destructor.
// early_drop.rs
fn main() {
    let data = vec![1, 2, 3, 4, 5];
    println!("data = {:?}", data);
    drop(data); // free heap memory now
    // println!("{:?}", data); // ERROR: use of moved value
    println!("data has been freed");
}
This is safe because drop takes ownership (moves the value). After the move,
the compiler prevents any further access.
Try It: Create a struct that holds a large Vec<u8> (say, 10 MB). Print its size, then drop it, then allocate another. Observe that you cannot accidentally use the first after dropping.
Knowledge Check
- What happens if realloc fails and you wrote p = realloc(p, new_size)?
- What is the Rust equivalent of C's goto cleanup pattern?
- Why can Rust guarantee no double-free at compile time?
Common Pitfalls
- Forgetting to check malloc's return -- it can return NULL on allocation failure.
- Using realloc incorrectly -- always assign to a temporary first.
- Mixing allocators -- do not free() memory from a custom allocator, or vice versa.
- Forgetting to free in every code path -- C's goto cleanup exists because error paths leak memory.
- Leaking Rc cycles in Rust -- Rc<T> does not use a garbage collector. Cycles leak. Use Weak<T> to break them.
- Calling mem::forget casually -- it prevents Drop from running. Use it only when you know what you are doing.
- Heap fragmentation -- many small allocations can fragment memory. Use pool allocators or arena allocation for high-frequency allocation patterns.
Ownership and Lifetimes
This is the chapter that makes Rust click. Ownership is how Rust manages memory
without a garbage collector and without manual free. Lifetimes are how the
compiler proves your references are always valid. Together, they replace the
entire class of memory bugs that plague C programs.
The Three Rules of Ownership
- Every value has exactly one owner.
- When the owner goes out of scope, the value is dropped (freed).
- Ownership can be transferred (moved), not duplicated (by default).
// ownership_basic.rs
fn main() {
    let s1 = String::from("hello"); // s1 owns the String
    let s2 = s1;                    // ownership moves to s2
    // println!("{}", s1);          // ERROR: s1 no longer valid
    println!("{}", s2);             // OK: s2 is the owner now
}
Before move:
Stack Heap
+------+---------+ +---+---+---+---+---+
| s1 | ptr --|----| h | e | l | l | o |
| | len=5 | +---+---+---+---+---+
| | cap=5 |
+------+---------+
After move (let s2 = s1):
Stack Heap
+------+---------+
| s1 | invalid | (no longer accessible)
+------+---------+
+------+---------+ +---+---+---+---+---+
| s2 | ptr --|----| h | e | l | l | o |
| | len=5 | +---+---+---+---+---+
| | cap=5 |
+------+---------+
There is still only one pointer to the heap data. When s2 goes out of scope,
the heap memory is freed exactly once.
Compare to C, where this "just works" but is dangerous:
/* ownership_c.c -- C has no concept of ownership */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *s1 = malloc(6);
    strcpy(s1, "hello");
    char *s2 = s1; /* both pointers to same memory */
    free(s1);
    /* s2 is now dangling -- use-after-free if accessed */
    /* free(s2); */ /* double-free if called */
    return 0;
}
Rust Note: C trusts the programmer to track who "owns" each allocation mentally. Rust makes ownership explicit in the type system and enforces it at compile time. There is zero runtime cost.
Move Semantics
Passing a value to a function moves it. The caller loses access.
// move_fn.rs
fn take_ownership(s: String) {
    println!("got: {}", s);
} // s is dropped here

fn main() {
    let s = String::from("hello");
    take_ownership(s);
    // println!("{}", s); // ERROR: value moved
}
To let the function use the value without taking ownership, pass a reference (borrowing, covered in Chapter 7).
// borrow_fn.rs
fn borrow(s: &String) {
    println!("borrowed: {}", s);
} // nothing dropped -- we only had a reference

fn main() {
    let s = String::from("hello");
    borrow(&s);
    println!("still mine: {}", s); // OK
}
Copy vs Clone
Copy Types
Simple, stack-only types implement the Copy trait. Assignment copies the bits
instead of moving.
// copy_demo.rs
fn main() {
    let x: i32 = 42;
    let y = x; // copy, not move -- i32 implements Copy
    println!("x = {}, y = {}", x, y); // both valid

    let a: f64 = 3.14;
    let b = a; // copy
    println!("a = {}, b = {}", a, b); // both valid
}
Types that implement Copy: all integer types, f32, f64, bool, char,
tuples of Copy types, fixed-size arrays of Copy types.
Types that do NOT implement Copy: String, Vec<T>, Box<T>, anything
that owns heap memory.
Clone
Clone is explicit duplication. You call .clone() to make a deep copy.
// clone_demo.rs
fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // deep copy: new heap allocation
    println!("s1 = {}", s1); // OK -- s1 still valid
    println!("s2 = {}", s2);
}
After clone:
Stack Heap
+------+---------+ +---+---+---+---+---+
| s1 | ptr --|----| h | e | l | l | o | <-- allocation 1
| | len=5 | +---+---+---+---+---+
+------+---------+
+------+---------+ +---+---+---+---+---+
| s2 | ptr --|----| h | e | l | l | o | <-- allocation 2
| | len=5 | +---+---+---+---+---+
+------+---------+
Two separate heap allocations. No aliasing.
Try It: Try to assign a Vec<i32> without cloning. Observe the move error. Then add .clone() and see that both vectors work independently.
Lifetimes: What 'a Means
A lifetime is the scope during which a reference is valid. Usually, the compiler infers lifetimes automatically. When it cannot, you annotate them.
// lifetime_basic.rs
fn longer<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() >= s2.len() { s1 } else { s2 }
}

fn main() {
    let s1 = String::from("long string");
    let result;
    {
        let s2 = String::from("hi");
        result = longer(&s1, &s2);
        println!("longer: {}", result);
    }
    // println!("{}", result); // Would fail if s2's data were used
}
The 'a annotation says: "the returned reference is valid only while both
input references are" -- that is, no longer than the shorter-lived of the
two. This lets the compiler verify that the returned reference does not
outlive its data.
Lifetime visualization:
fn longer<'a>(s1: &'a str, s2: &'a str) -> &'a str
^^ ^^ ^^
| | |
+------- all the same 'a -----+
|
The returned reference is valid for
the INTERSECTION of s1 and s2's lifetimes.
Timeline:
|---- s1 valid -----------------------------|
| |---- s2 valid ----| |
| |---- 'a ---------| |
| |-- result valid --| |
Why the Compiler Needs Lifetimes
Without lifetimes, the compiler cannot tell if this function is safe:
// lifetime_needed.rs -- what if there were no annotations?
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &byte) in bytes.iter().enumerate() {
        if byte == b' ' {
            return &s[0..i];
        }
    }
    s
}

fn main() {
    let sentence = String::from("hello world");
    let word = first_word(&sentence);
    println!("first word: {}", word);
}
This compiles without explicit lifetime annotations because of lifetime elision rules. The compiler infers that the output lifetime matches the input. Here are the three elision rules:
- Each reference parameter gets its own lifetime.
- If there is exactly one input lifetime, it is assigned to all output lifetimes.
- If one parameter is &self or &mut self, its lifetime is assigned to outputs.
When these rules are insufficient, you must annotate manually.
Structs with References
If a struct holds a reference, it needs a lifetime parameter.
// struct_lifetime.rs
struct Excerpt<'a> {
    text: &'a str,
}

impl<'a> Excerpt<'a> {
    fn new(text: &'a str) -> Self {
        Excerpt { text }
    }

    fn display(&self) {
        println!("Excerpt: {}", self.text);
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let excerpt = Excerpt::new(&novel[..16]);
    excerpt.display();
}
$ rustc struct_lifetime.rs && ./struct_lifetime
Excerpt: Call me Ishmael.
The lifetime 'a guarantees that the Excerpt cannot outlive the string it
references. In C, you would just store a const char * with no such guarantee.
/* struct_lifetime.c -- C version: no safety */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct excerpt {
    const char *text; /* dangling pointer? who knows */
};

int main(void)
{
    struct excerpt e;
    {
        char *novel = strdup("Call me Ishmael. Some years ago...");
        e.text = novel;
        free(novel); /* e.text is now dangling */
    }
    /* printf("%s\n", e.text); */ /* undefined behavior */
    return 0;
}
Caution: The C version compiles without warnings. The dangling pointer is invisible to the compiler. Rust rejects the equivalent code outright.
The Borrow Checker in Action
The borrow checker enforces ownership and lifetime rules at compile time. Here is a classic example it catches:
// borrow_checker.rs -- This will NOT compile
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];     // immutable borrow
    v.push(4);             // mutable borrow (push might reallocate!)
    println!("{}", first); // ERROR: first might be dangling
}
error[E0502]: cannot borrow `v` as mutable because it is also
borrowed as immutable
Why? push might reallocate the underlying buffer, invalidating first.
The borrow checker catches this at compile time. In C, this is a silent bug:
/* borrow_bug.c -- C version: silent bug */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *arr = malloc(3 * sizeof(int));
    arr[0] = 1; arr[1] = 2; arr[2] = 3;
    int *first = &arr[0]; /* pointer into arr */
    /* realloc might move the buffer */
    int *tmp = realloc(arr, 100 * sizeof(int));
    if (tmp) arr = tmp;
    /* first might point to freed memory now */
    printf("%d\n", *first); /* undefined behavior */
    free(arr);
    return 0;
}
Rc<T>: Reference-Counted Shared Ownership
Sometimes multiple parts of your program need to own the same data. Rc<T>
(Reference Counted) tracks how many owners exist and frees the data when the
count reaches zero.
// rc_demo.rs
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared data"));
    println!("ref count after a: {}", Rc::strong_count(&a));

    let b = Rc::clone(&a); // increment ref count, not deep copy
    println!("ref count after b: {}", Rc::strong_count(&a));

    {
        let c = Rc::clone(&a);
        println!("ref count after c: {}", Rc::strong_count(&a));
    } // c dropped, ref count decremented

    println!("ref count after c dropped: {}", Rc::strong_count(&a));
    println!("data: {}", a);
}
$ rustc rc_demo.rs && ./rc_demo
ref count after a: 1
ref count after b: 2
ref count after c: 3
ref count after c dropped: 2
data: shared data
Rc layout:
Stack Heap (Rc control block + data)
+---+---------+ +----------------+-------------------+
| a | ptr --|----->| strong_count=2 | "shared data" |
+---+---------+ | weak_count=0 | |
+---+---------+ +----------------+-------------------+
| b | ptr --|------^
+---+---------+
When all Rc's drop, strong_count hits 0 -> data freed.
Caution: Rc<T> is single-threaded only. It does not use atomic operations, so sending it to another thread is a compile error (Rc<T> is not Send).
Arc<T>: Atomic Reference Counting
Arc<T> is the thread-safe version of Rc<T>. It uses atomic operations to
update the reference count.
// arc_demo.rs
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4, 5]);
    let mut handles = vec![];

    for i in 0..3 {
        let data_clone = Arc::clone(&data);
        let handle = thread::spawn(move || {
            let sum: i32 = data_clone.iter().sum();
            println!("thread {}: sum = {}", i, sum);
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }
    println!("ref count: {}", Arc::strong_count(&data));
}
$ rustc arc_demo.rs && ./arc_demo
thread 0: sum = 15
thread 1: sum = 15
thread 2: sum = 15
ref count: 1
Driver Prep: In Rust-for-Linux, Arc<T> is used for shared device state. The kernel's own struct kref is the C equivalent -- a manual reference counter with explicit kref_get and kref_put calls.
When to Use What
+------------------+--------------------------------------------+
| Ownership Model | Use When |
+------------------+--------------------------------------------+
| T (owned) | Single owner, value moves with assignment |
| &T | Read-only access, no ownership change |
| &mut T | Exclusive read-write access, temporary |
| Box<T> | Single owner, heap allocation needed |
| Rc<T> | Multiple owners, single-threaded |
| Arc<T> | Multiple owners, multi-threaded |
| Rc<RefCell<T>> | Multiple owners + interior mutability (ST) |
| Arc<Mutex<T>> | Multiple owners + interior mutability (MT) |
+------------------+--------------------------------------------+
'static Lifetime
The 'static lifetime means the reference is valid for the entire program
duration. String literals have 'static lifetime because they are embedded
in the binary.
// static_lifetime.rs
fn get_greeting() -> &'static str {
    "Hello, world!" // string literal: lives forever
}

fn main() {
    let s = get_greeting();
    println!("{}", s);
}
'static does NOT mean "allocated forever." It means "valid for the rest of
the program." Leaked memory is also 'static, but that is usually a bug.
When to Use unsafe
unsafe does not turn off the borrow checker. It unlocks five specific powers:
- Dereference raw pointers (*const T, *mut T)
- Call unsafe functions
- Access mutable statics
- Implement unsafe traits
- Access fields of union types
// unsafe_demo.rs
fn main() {
    let mut x = 42;

    // Creating raw pointers is safe
    let r1 = &x as *const i32;
    let r2 = &mut x as *mut i32;

    // Dereferencing raw pointers requires unsafe
    unsafe {
        println!("r1 = {}", *r1);
        *r2 = 99;
        println!("r2 = {}", *r2);
    }

    // Calling C functions via FFI
    unsafe {
        let pid = libc_getpid();
        println!("pid = {}", pid);
    }
}

// Declaring an external C function
extern "C" {
    #[link_name = "getpid"]
    fn libc_getpid() -> i32;
}
$ rustc unsafe_demo.rs && ./unsafe_demo
r1 = 42
r2 = 99
pid = 12345
Driver Prep: Rust-for-Linux kernel modules use unsafe at the boundary between Rust and the C kernel API. The goal is to wrap unsafe operations in safe abstractions so that driver authors rarely need unsafe in their own code.
C Trusts the Programmer, Rust Trusts the Compiler
This is the philosophical divide. C gives you full control and assumes you know what you are doing. Rust restricts what you can express and proves correctness at compile time.
/* trust_programmer.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *p = malloc(sizeof(int));
    *p = 42;
    int *alias = p; /* two pointers, same memory */
    free(p);        /* free through one */
    *alias = 99;    /* use through other -- UB */
    /* C: compiles, runs, "works" until it doesn't */
    return 0;
}
// trust_compiler.rs
fn main() {
    let p = Box::new(42);
    // let alias = p;      // ownership moves, p is invalidated
    // println!("{}", *p); // compile error: use of moved value
    // Rust: does not compile. Bug caught before it exists.
}
The cost of Rust's approach: a learning curve and occasional fights with the borrow checker. The benefit: entire categories of bugs are impossible.
Knowledge Check
- What is the difference between Copy and Clone?
- What does the lifetime 'a in fn foo<'a>(x: &'a str) -> &'a str mean?
- Why is Rc<T> not safe to use across threads?
Common Pitfalls
- Cloning everything to avoid the borrow checker -- this works but defeats the purpose. Restructure your code instead.
- Confusing move in closures with ownership transfer -- move captures variables by value, taking ownership.
- Forgetting that Rc cycles leak -- use Weak<T> references to break cycles.
- Overusing 'static -- not every reference needs to live forever. Use the narrowest lifetime that works.
- Putting lifetimes on everything -- trust the elision rules first. Annotate only when the compiler asks.
- Using unsafe to bypass the borrow checker -- if the borrow checker rejects your code, you almost certainly have a real bug. unsafe does not fix logic errors.
- Thinking unsafe means "no rules" -- you still must uphold Rust's safety invariants. unsafe means "I, the programmer, guarantee these invariants hold."
Binary, Hex, and Bitwise Operations
Systems code lives at the bit level. Device registers, protocol headers, permission flags -- all of them demand that you read, set, and clear individual bits. This chapter gives you the vocabulary and the muscle memory for that work, first in C, then in Rust.
Number Representations
A byte is eight bits. How you display those bits is a matter of base.
Base 10 (decimal): 42
Base 2 (binary): 00101010
Base 8 (octal): 052
Base 16 (hex): 0x2A
Hex is the lingua franca of systems programming because one hex digit maps to exactly four bits. Two hex digits map to one byte. Clean, compact, no ambiguity.
Hex digit: 0 1 2 3 4 5 6 7 8 9 A B C D E F
Binary: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
C Literals
#include <stdio.h>
int main(void) {
    int dec = 42;
    int oct = 052;        /* leading zero = octal */
    int hex = 0x2A;
    int bin = 0b00101010; /* C23 / GCC extension */
    printf("dec=%d oct=%o hex=0x%X bin(manual)=00101010\n", dec, oct, hex);
    printf("All equal? %d\n", (dec == oct) && (oct == hex) && (hex == bin));
    return 0;
}
Compile: gcc -std=c2x -o numrep numrep.c && ./numrep
Caution: A leading zero makes a literal octal in C. Writing int x = 010; gives you 8, not 10. This has caused real bugs in real codebases.
Rust Literals
fn main() {
    let dec = 42;
    let oct = 0o52; // explicit 0o prefix -- no silent octal trap
    let hex = 0x2A;
    let bin = 0b00101010;
    println!("dec={dec} oct={oct} hex=0x{hex:X} bin=0b{bin:08b}");
    println!("All equal? {}", dec == oct && oct == hex && hex == bin);
}
Rust Note: Rust requires 0o for octal. There is no silent leading-zero trap. Rust also lets you use underscores as visual separators: 0b0010_1010, 1_000_000.
Try It: Print the value 0xDEAD_BEEF in decimal, octal, and binary in both C and Rust. How many bits does it need?
Bitwise Operators
Six operators manipulate bits directly. They work on integer types only.
Operator C Rust Meaning
-------------------------------------------------------
AND & & 1 only if both bits are 1
OR | | 1 if either bit is 1
XOR ^ ^ 1 if bits differ
NOT ~ ! flip every bit (C: ~x, Rust: !x)
Left shift << << shift bits left, fill with 0
Right shift >> >> shift bits right (see below)
AND -- Masking
AND keeps only the bits that are 1 in both operands. Use it to extract bits.
#include <stdio.h>
int main(void) {
    unsigned char val = 0b11010110;
    unsigned char mask = 0b00001111; /* keep low nibble */
    unsigned char result = val & mask;
    printf("0x%02X & 0x%02X = 0x%02X\n", val, mask, result);
    /* output: 0xD6 & 0x0F = 0x06 */
    return 0;
}
1 1 0 1 0 1 1 0 val (0xD6)
AND 0 0 0 0 1 1 1 1 mask (0x0F)
= 0 0 0 0 0 1 1 0 result(0x06)
OR -- Setting Bits
OR forces bits to 1. Use it to set flags.
#include <stdio.h>
int main(void) {
    unsigned char flags = 0b00000010; /* bit 1 already set */
    unsigned char bit4 = 0b00010000;  /* want to set bit 4 */
    flags = flags | bit4;
    printf("flags = 0x%02X\n", flags); /* 0x12 */
    return 0;
}
XOR -- Toggling Bits
XOR flips bits where the mask is 1, leaves others untouched.
#include <stdio.h>
int main(void) {
    unsigned char val = 0b11001100;
    unsigned char tog = 0b00001111;
    val ^= tog;
    printf("After toggle: 0x%02X\n", val); /* 0xC3 = 0b11000011 */
    return 0;
}
NOT -- Inverting All Bits
#include <stdio.h>
int main(void) {
    unsigned char val = 0b00001111;
    unsigned char inv = ~val;
    printf("~0x%02X = 0x%02X\n", val, inv); /* ~0x0F = 0xF0 */
    return 0;
}
Rust Note: Rust uses ! for bitwise NOT (not ~). On a boolean, ! is logical NOT; on an integer, it is bitwise NOT. The operand's type determines which.
Rust Equivalents -- All Operators at Once
fn main() {
    let a: u8 = 0b1100_1010;
    let b: u8 = 0b0011_1100;
    println!("AND: {:08b}", a & b);  // 00001000
    println!("OR:  {:08b}", a | b);  // 11111110
    println!("XOR: {:08b}", a ^ b);  // 11110110
    println!("NOT: {:08b}", !a);     // 00110101
    println!("SHL: {:08b}", a << 2); // 00101000 (top bits lost)
    println!("SHR: {:08b}", a >> 2); // 00110010
}
Shifts: Arithmetic vs Logical
Left shift always fills with zeros. Right shift is where trouble lurks.
Logical right shift: fills the vacated high bits with 0. Arithmetic right shift: fills the vacated high bits with the sign bit.
Logical >> 2: 1100 0000 -> 0011 0000 (zero-filled)
Arithmetic >> 2: 1100 0000 -> 1111 0000 (sign-extended)
In C, the behavior of >> on signed integers is implementation-defined. Most
compilers do arithmetic shift, but it is not guaranteed.
#include <stdio.h>
int main(void) {
    int signed_val = -128; /* 0xFFFFFF80 in 32-bit */
    unsigned int unsigned_val = 0xFF000000u;
    printf("signed   >> 4 = 0x%08X\n", signed_val >> 4);   /* likely 0xFFFFFFF8 */
    printf("unsigned >> 4 = 0x%08X\n", unsigned_val >> 4); /* always 0x0FF00000 */
    return 0;
}
Caution: Never right-shift a negative value in portable C code. Use unsigned types for bit manipulation. Always.
In Rust, the rules are explicit:
fn main() {
    let s: i8 = -128; // 0x80
    let u: u8 = 0x80;
    // Rust: >> is arithmetic on signed, logical on unsigned. Always.
    println!("signed   >> 2 = {}", s >> 2); // -32 (arithmetic)
    println!("unsigned >> 2 = {}", u >> 2); // 32 (logical)
}
Rust Note: Rust defines shift behavior precisely: arithmetic for signed, logical for unsigned. In debug mode, shifting by >= the bit width panics. In release mode, it wraps. No undefined behavior either way.
Common Bit Patterns
These are the bread and butter of driver and kernel code.
Check if Bit N is Set
#include <stdio.h>
#include <stdint.h>
int bit_is_set(uint32_t val, int n) {
    return (val >> n) & 1;
}

int main(void) {
    uint32_t reg = 0xA5; /* 1010 0101 */
    for (int i = 7; i >= 0; i--)
        printf("%d", bit_is_set(reg, i));
    printf("\n");
    return 0;
}
Set Bit N
val |= (1u << n);
Clear Bit N
val &= ~(1u << n);
Toggle Bit N
val ^= (1u << n);
All Four in Rust
fn main() {
    let mut val: u32 = 0b1010_0101;
    let n = 3;

    let is_set = (val >> n) & 1 == 1;
    println!("Bit {n} set? {is_set}");

    val |= 1 << n; // set bit 3
    println!("After set:    0b{val:08b}");

    val &= !(1u32 << n); // clear bit 3
    println!("After clear:  0b{val:08b}");

    val ^= 1 << n; // toggle bit 3
    println!("After toggle: 0b{val:08b}");
}
Driver Prep: Every hardware register you touch in a driver uses exactly these four operations. A typical register write looks like: reg |= ENABLE_BIT; writel(reg, base + OFFSET);
Powers of Two
Bit shifts and powers of two are the same thing.
1 << 0 = 1 = 2^0
1 << 1 = 2 = 2^1
1 << 4 = 16 = 2^4
1 << 10 = 1024 = 2^10
1 << 20 = 1048576 = 2^20 (1 MiB)
A classic trick: check if a number is a power of two.
#include <stdio.h>
#include <stdbool.h>
bool is_power_of_two(unsigned int x) {
    return x != 0 && (x & (x - 1)) == 0;
}

int main(void) {
    unsigned int tests[] = {0, 1, 2, 3, 4, 15, 16, 255, 256};
    int n = sizeof(tests) / sizeof(tests[0]);
    for (int i = 0; i < n; i++)
        printf("%3u -> %s\n", tests[i], is_power_of_two(tests[i]) ? "yes" : "no");
    return 0;
}
Why does x & (x - 1) work?
x = 0001 0000 (16, a power of 2)
x - 1 = 0000 1111
x & (x-1) = 0000 0000 => zero, so it IS a power of 2
x = 0001 0100 (20, NOT a power of 2)
x - 1 = 0001 0011
x & (x-1) = 0001 0000 => non-zero, so it is NOT
fn is_power_of_two(x: u32) -> bool {
    x != 0 && (x & (x - 1)) == 0
}

fn main() {
    for x in [0, 1, 2, 3, 4, 15, 16, 255, 256] {
        println!("{x:>3} -> {}", if is_power_of_two(x) { "yes" } else { "no" });
    }
}
Rust Note: Rust provides u32::is_power_of_two() in the standard library. But knowing the bit trick matters -- you will see x & (x - 1) in kernel code.
Counting Set Bits (Population Count)
How many bits are 1 in a value? This operation is called popcount.
#include <stdio.h>
#include <stdint.h>
int popcount_naive(uint32_t x) {
    int count = 0;
    while (x) {
        count += x & 1;
        x >>= 1;
    }
    return count;
}

/* Brian Kernighan's trick: x & (x-1) clears the lowest set bit */
int popcount_kernighan(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

int main(void) {
    uint32_t val = 0xDEADBEEF;
    printf("popcount(0x%X) = %d (naive)\n", val, popcount_naive(val));
    printf("popcount(0x%X) = %d (kernighan)\n", val, popcount_kernighan(val));
    /* GCC/Clang built-in -- compiles to a single POPCNT instruction */
    printf("popcount(0x%X) = %d (builtin)\n", val, __builtin_popcount(val));
    return 0;
}
fn main() {
    let val: u32 = 0xDEAD_BEEF;
    println!("popcount(0x{val:X}) = {}", val.count_ones());
    // Also available: count_zeros, leading_zeros, trailing_zeros
    println!("leading zeros:  {}", val.leading_zeros());
    println!("trailing zeros: {}", val.trailing_zeros());
}
Try It: Write a function that returns the position of the highest set bit in a u32. Test it with 0x00000001 (should return 0) and 0x80000000 (should return 31).
Extracting and Inserting Bit Fields
Registers often pack multiple values into a single word.
31 24 23 16 15 8 7 0
+----------+---------+--------+---------+
| field_d |field_c |field_b | field_a |
+----------+---------+--------+---------+
Extract field_b (bits 8..15):
#include <stdio.h>
#include <stdint.h>
int main(void) {
    uint32_t reg = 0xAABBCCDD;

    /* Extract bits [15:8] */
    uint32_t field_b = (reg >> 8) & 0xFF;
    printf("field_b = 0x%02X\n", field_b); /* 0xCC */

    /* Insert new value 0x42 into bits [15:8] */
    reg &= ~(0xFFu << 8); /* clear the field */
    reg |= (0x42u << 8);  /* set new value */
    printf("reg = 0x%08X\n", reg); /* 0xAABB42DD */
    return 0;
}
fn main() {
    let mut reg: u32 = 0xAABB_CCDD;

    // Extract bits [15:8]
    let field_b = (reg >> 8) & 0xFF;
    println!("field_b = 0x{field_b:02X}"); // 0xCC

    // Insert 0x42 into bits [15:8]
    reg &= !(0xFFu32 << 8);
    reg |= 0x42u32 << 8;
    println!("reg = 0x{reg:08X}"); // 0xAABB42DD
}
Driver Prep: The pattern (reg >> SHIFT) & MASK to read and reg = (reg & ~(MASK << SHIFT)) | (val << SHIFT) to write is the single most common operation in Linux driver code. Burn it into memory.
No Implicit Conversions in Rust
In C, bitwise operators can silently promote or truncate types:
#include <stdio.h>
int main(void) {
    unsigned char a = 0xFF;
    /* ~a promotes a to int first, result is NOT 0x00 */
    printf("~a = 0x%08X\n", ~a); /* 0xFFFFFF00 -- surprise! */
    return 0;
}
Caution: In C, ~ on a char promotes to int first. The result is int-sized (typically 32 bits), not 8. This causes subtle bugs in mask comparisons.
Rust does not promote:
fn main() {
    let a: u8 = 0xFF;
    let b: u8 = !a;
    println!("!0xFF = 0x{b:02X}"); // 0x00 -- exactly 8 bits, no surprise
}
Quick Knowledge Check
- What does 0x1F & 0x0F evaluate to? Work it out in binary before running code.
- You have a 32-bit register value. Bits [7:4] contain a 4-bit version number. Write the C expression to extract it.
- Why is x & (x - 1) == 0 not a correct power-of-two test when x is 0?
Common Pitfalls
- Shifting by the type width. 1 << 32 on a 32-bit int is undefined in C. Use 1u << 31 as the maximum, or 1ULL << 32 for 64-bit.
- Signed operands in bit ops. ~(-1) is well-defined but confusing. Use unsigned.
- Forgetting the u suffix. 1 << 31 in C is signed overflow (UB on a 32-bit int). Write 1u << 31.
- Comparing after NOT. ~(unsigned char)0xFF is int, not unsigned char. Cast or mask the result.
- Rust shift panics. 1u32 << 32 panics in debug builds. Design around it.
Bit Masks and Bit Fields
Hardware registers and protocol headers cram multiple values into single words. You need two skills: defining named masks for individual bits, and using C's bit field syntax for struct-level packing. This chapter covers both, along with the Rust alternatives.
Defining Flags with #define
The simplest approach: one #define per bit.
#include <stdio.h>
#include <stdint.h>
/* Permission flags -- each is a single bit */
#define PERM_READ (1u << 0) /* 0x01 */
#define PERM_WRITE (1u << 1) /* 0x02 */
#define PERM_EXEC (1u << 2) /* 0x04 */
#define PERM_SETUID (1u << 3) /* 0x08 */
void print_perms(uint8_t flags) {
printf("Permissions:");
if (flags & PERM_READ) printf(" READ");
if (flags & PERM_WRITE) printf(" WRITE");
if (flags & PERM_EXEC) printf(" EXEC");
if (flags & PERM_SETUID) printf(" SETUID");
printf("\n");
}
int main(void) {
uint8_t file_perms = PERM_READ | PERM_WRITE;
print_perms(file_perms);
/* Add execute */
file_perms |= PERM_EXEC;
print_perms(file_perms);
/* Remove write */
file_perms &= ~PERM_WRITE;
print_perms(file_perms);
/* Check a specific flag */
if (file_perms & PERM_EXEC)
printf("File is executable\n");
return 0;
}
The pattern is always the same:
Set flag: flags |= FLAG;
Clear flag: flags &= ~FLAG;
Toggle flag: flags ^= FLAG;
Test flag: if (flags & FLAG)
Using enum for Flags
Some codebases use enum instead of #define. The effect is similar, but enums are
visible in debuggers.
#include <stdio.h>
#include <stdint.h>
typedef enum {
OPT_VERBOSE = (1u << 0),
OPT_DEBUG = (1u << 1),
OPT_FORCE = (1u << 2),
OPT_DRY_RUN = (1u << 3),
} options_t;
int main(void) {
uint32_t opts = OPT_VERBOSE | OPT_DEBUG;
if (opts & OPT_VERBOSE)
printf("Verbose mode on\n");
if (opts & OPT_DEBUG)
printf("Debug mode on\n");
if (!(opts & OPT_FORCE))
printf("Force mode off\n");
return 0;
}
Caution: In C, enum values are int-sized. Combining them with | may produce a value outside the enum's defined range, which is technically valid but may trigger compiler warnings. Use an unsigned integer type for the combined flags variable.
Combining and Testing Multiple Flags
Test whether all of several flags are set:
#include <stdio.h>
#include <stdint.h>
#define FLAG_A (1u << 0)
#define FLAG_B (1u << 1)
#define FLAG_C (1u << 2)
int main(void) {
uint32_t flags = FLAG_A | FLAG_C;
uint32_t required = FLAG_A | FLAG_B;
/* Test if ALL required flags are set */
if ((flags & required) == required)
printf("All required flags set\n");
else
printf("Missing some required flags\n");
/* Test if ANY of the required flags are set */
if (flags & required)
printf("At least one required flag set\n");
return 0;
}
Caution: if (flags & required) tests whether any bit matches. if ((flags & required) == required) tests whether all bits match. Confusing the two is a classic bug.
C Bit Fields
C lets you declare struct members with explicit bit widths.
#include <stdio.h>
struct status_reg {
unsigned int enabled : 1;
unsigned int mode : 3; /* 0-7 */
unsigned int priority : 4; /* 0-15 */
unsigned int error : 1;
unsigned int reserved : 23;
};
int main(void) {
struct status_reg sr = {0};
sr.enabled = 1;
sr.mode = 5;
sr.priority = 12;
printf("enabled=%u mode=%u priority=%u error=%u\n",
sr.enabled, sr.mode, sr.priority, sr.error);
printf("sizeof(struct status_reg) = %zu\n", sizeof(struct status_reg));
return 0;
}
The layout in memory (assuming little-endian, no padding):
Bit:     31          9   8   7       4   3     1   0
        +--------------+---+----------+---------+----+
        | reserved(23) |err| pri (4)  | mode(3) | en |
        +--------------+---+----------+---------+----+
When to Use Bit Fields
Good uses:
- Modeling hardware registers in documentation or test code
- Compact storage of boolean flags
- Quick prototyping
Bad uses:
- Anything that crosses a machine boundary (network, file, IPC)
- Portable code that must work on multiple compilers/architectures
Why Bit Fields Are Dangerous for Portable Code
The C standard leaves almost everything about bit fields implementation-defined:
- Allocation order (MSB-first or LSB-first) is compiler-dependent
- Whether a bit field can straddle a storage-unit boundary is compiler-dependent
- Signedness of plain int bit fields is compiler-dependent
- Padding between bit fields is compiler-dependent
Caution: Two different compilers (or the same compiler on two architectures) can lay out the same bit field struct differently. Never use bit fields for data that leaves the current process -- use explicit shifts and masks instead.
Try It: Compile the status_reg example on your machine. Copy the struct into a uint32_t via memcpy and print the raw hex value. Does bit 0 correspond to enabled? Try on a different compiler or with -m32 if available.
Register Definitions with Bit Fields and Masks
Real driver code uses both approaches. Bit fields for readability during development, explicit masks for the actual hardware access.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
/* Mask-based definitions (portable, used for real HW access) */
#define CTRL_ENABLE_BIT (1u << 0)
#define CTRL_MODE_MASK (0x7u << 1)
#define CTRL_MODE_SHIFT 1
#define CTRL_PRIO_MASK (0xFu << 4)
#define CTRL_PRIO_SHIFT 4
#define CTRL_ERR_BIT (1u << 8)
/* Helper macros */
#define CTRL_SET_MODE(reg, m) \
(((reg) & ~CTRL_MODE_MASK) | (((m) & 0x7u) << CTRL_MODE_SHIFT))
#define CTRL_GET_MODE(reg) \
(((reg) & CTRL_MODE_MASK) >> CTRL_MODE_SHIFT)
int main(void) {
uint32_t ctrl = 0;
ctrl |= CTRL_ENABLE_BIT; /* enable */
ctrl = CTRL_SET_MODE(ctrl, 5); /* mode = 5 */
ctrl |= (12u << CTRL_PRIO_SHIFT); /* priority = 12 */
printf("ctrl = 0x%08X\n", ctrl);
printf("mode = %u\n", CTRL_GET_MODE(ctrl));
printf("enabled = %u\n", (ctrl & CTRL_ENABLE_BIT) ? 1 : 0);
return 0;
}
Driver Prep: Linux kernel drivers follow this exact pattern. Look at any register header file in drivers/ -- you will see _MASK, _SHIFT, and helper macros everywhere. The kernel avoids bit fields for hardware registers.
Rust: Manual Masks
The same mask-and-shift approach works in Rust, but with stronger types.
const CTRL_ENABLE_BIT: u32 = 1 << 0;
const CTRL_MODE_MASK: u32 = 0x7 << 1;
const CTRL_MODE_SHIFT: u32 = 1;
const CTRL_PRIO_MASK: u32 = 0xF << 4;
const CTRL_PRIO_SHIFT: u32 = 4;
const CTRL_ERR_BIT: u32 = 1 << 8;

fn ctrl_set_mode(reg: u32, mode: u32) -> u32 {
    (reg & !CTRL_MODE_MASK) | ((mode & 0x7) << CTRL_MODE_SHIFT)
}

fn ctrl_get_mode(reg: u32) -> u32 {
    (reg & CTRL_MODE_MASK) >> CTRL_MODE_SHIFT
}

fn main() {
    let mut ctrl: u32 = 0;
    ctrl |= CTRL_ENABLE_BIT;
    ctrl = ctrl_set_mode(ctrl, 5);
    ctrl |= 12 << CTRL_PRIO_SHIFT;
    println!("ctrl = 0x{ctrl:08X}");
    println!("mode = {}", ctrl_get_mode(ctrl));
    println!("enabled = {}", (ctrl & CTRL_ENABLE_BIT) != 0);
}
Rust: The bitflags Crate
For flag-style bitmasks (not multi-bit fields), the bitflags crate is the Rust
community standard. Add bitflags = "2" to Cargo.toml.
// In Cargo.toml: bitflags = "2"
use bitflags::bitflags;

bitflags! {
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct Permissions: u8 {
        const READ   = 0b0000_0001;
        const WRITE  = 0b0000_0010;
        const EXEC   = 0b0000_0100;
        const SETUID = 0b0000_1000;
    }
}

fn main() {
    let mut perms = Permissions::READ | Permissions::WRITE;
    println!("{:?}", perms); // Permissions(READ | WRITE)
    perms.insert(Permissions::EXEC);
    println!("{:?}", perms); // Permissions(READ | WRITE | EXEC)
    perms.remove(Permissions::WRITE);
    println!("{:?}", perms); // Permissions(READ | EXEC)
    if perms.contains(Permissions::EXEC) {
        println!("Executable");
    }
    // Test multiple flags at once
    let required = Permissions::READ | Permissions::EXEC;
    println!("Has required? {}", perms.contains(required));
    // Raw bits access
    println!("raw bits = 0b{:08b}", perms.bits());
}
Rust Note: bitflags gives you type safety -- you cannot accidentally OR a Permissions with an unrelated flag type. The raw bits are always accessible via .bits() when you need to pass them to hardware or system calls.
Protocol Header Parsing with Bit Masks
Real-world example: parsing an IPv4 header's first byte.
Byte 0 of IPv4 header:
+---+---+---+---+---+---+---+---+
| Version (4b) | IHL (4b) |
+---+---+---+---+---+---+---+---+
Bits: 7 6 5 4 3 2 1 0
#include <stdio.h>
#include <stdint.h>
int main(void) {
/* Simulated first byte of an IPv4 header: version=4, IHL=5 */
uint8_t byte0 = 0x45;
uint8_t version = (byte0 >> 4) & 0x0F;
uint8_t ihl = byte0 & 0x0F;
printf("Version: %u\n", version); /* 4 */
printf("IHL: %u (header = %u bytes)\n", ihl, ihl * 4); /* 5 (20 bytes) */
/* Construct a byte from fields */
uint8_t built = ((4u & 0x0F) << 4) | (5u & 0x0F);
printf("Built: 0x%02X\n", built); /* 0x45 */
return 0;
}
fn main() {
    let byte0: u8 = 0x45;
    let version = (byte0 >> 4) & 0x0F;
    let ihl = byte0 & 0x0F;
    println!("Version: {version}");
    println!("IHL: {ihl} (header = {} bytes)", ihl as u32 * 4);
    let built: u8 = ((4 & 0x0F) << 4) | (5 & 0x0F);
    println!("Built: 0x{built:02X}");
}
A Larger Example: TCP Flags
TCP flags live in a single byte. Let us define, combine, and test them.
#include <stdio.h>
#include <stdint.h>
#define TCP_FIN (1u << 0)
#define TCP_SYN (1u << 1)
#define TCP_RST (1u << 2)
#define TCP_PSH (1u << 3)
#define TCP_ACK (1u << 4)
#define TCP_URG (1u << 5)
void print_tcp_flags(uint8_t flags) {
const char *names[] = {"FIN","SYN","RST","PSH","ACK","URG"};
printf("Flags:");
for (int i = 0; i < 6; i++) {
if (flags & (1u << i))
printf(" %s", names[i]);
}
printf("\n");
}
int main(void) {
/* SYN packet */
uint8_t syn = TCP_SYN;
print_tcp_flags(syn);
/* SYN-ACK response */
uint8_t syn_ack = TCP_SYN | TCP_ACK;
print_tcp_flags(syn_ack);
/* Is this a SYN without ACK? */
if ((syn_ack & (TCP_SYN | TCP_ACK)) == TCP_SYN)
printf("Pure SYN\n");
else
printf("Not a pure SYN\n");
return 0;
}
use std::fmt;

#[derive(Clone, Copy)]
struct TcpFlags(u8);

impl TcpFlags {
    const FIN: u8 = 1 << 0;
    const SYN: u8 = 1 << 1;
    const RST: u8 = 1 << 2;
    const PSH: u8 = 1 << 3;
    const ACK: u8 = 1 << 4;
    const URG: u8 = 1 << 5;

    fn new(bits: u8) -> Self {
        TcpFlags(bits)
    }

    fn has(self, flag: u8) -> bool {
        (self.0 & flag) != 0
    }
}

impl fmt::Display for TcpFlags {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let names = [
            (Self::FIN, "FIN"),
            (Self::SYN, "SYN"),
            (Self::RST, "RST"),
            (Self::PSH, "PSH"),
            (Self::ACK, "ACK"),
            (Self::URG, "URG"),
        ];
        let mut first = true;
        for (bit, name) in &names {
            if self.0 & bit != 0 {
                if !first {
                    write!(f, " | ")?;
                }
                write!(f, "{name}")?;
                first = false;
            }
        }
        Ok(())
    }
}

fn main() {
    let syn = TcpFlags::new(TcpFlags::SYN);
    println!("Flags: {syn}");
    let syn_ack = TcpFlags::new(TcpFlags::SYN | TcpFlags::ACK);
    println!("Flags: {syn_ack}");
    let is_pure_syn = syn_ack.has(TcpFlags::SYN) && !syn_ack.has(TcpFlags::ACK);
    println!("Pure SYN? {is_pure_syn}");
}
Try It: Add ECE and CWR flags (bits 6 and 7) to both the C and Rust versions. Create a flags byte with SYN | ECE | CWR -- this represents a SYN packet with ECN support.
Quick Knowledge Check
- You have uint32_t flags = 0; and want bits 3, 5, and 7 set. Write one expression using the defined flag constants.
- What is the difference between if (flags & MASK) and if ((flags & MASK) == MASK)?
- Why does the Linux kernel avoid C bit fields for hardware register definitions?
Common Pitfalls
- Forgetting ~ on clear. flags &= FLAG does not clear FLAG. You need flags &= ~FLAG.
- Using == instead of & to test. if (flags == FLAG) only matches if FLAG is the only bit set. Use if (flags & FLAG).
- Bit field portability. Layout is compiler-defined. Never serialize bit fields.
- Missing parentheses in macros. #define FLAG 1 << 3 without parens causes precedence bugs: FLAG + 1 becomes 1 << 3 + 1, which is 1 << 4, not (1 << 3) + 1. Always write (1u << 3).
- Rust ! vs C ~. Rust's bitwise NOT is !, not ~. Writing ~mask in Rust is a compile error.
Alignment, Padding, and Packing
The compiler silently inserts invisible bytes into your structs. This chapter shows you exactly where, why, and how to control it. You need this knowledge for network protocols, file formats, hardware registers, and shared memory -- any time data must match an exact byte layout.
Why the Compiler Inserts Padding
CPUs access memory most efficiently when data falls on natural boundaries. A 4-byte
int is fastest to read when its address is a multiple of 4. A 2-byte short wants
a multiple of 2. The compiler enforces this by inserting padding bytes between
struct members.
#include <stdio.h>
#include <stddef.h>
struct example {
char a; /* 1 byte */
int b; /* 4 bytes */
char c; /* 1 byte */
};
int main(void) {
printf("sizeof(struct example) = %zu\n", sizeof(struct example));
printf("offsetof(a) = %zu\n", offsetof(struct example, a));
printf("offsetof(b) = %zu\n", offsetof(struct example, b));
printf("offsetof(c) = %zu\n", offsetof(struct example, c));
return 0;
}
Typical output on a 64-bit system:
sizeof(struct example) = 12
offsetof(a) = 0
offsetof(b) = 4
offsetof(c) = 8
The layout with padding:
Offset: 0 1 2 3 4 5 6 7 8 9 10 11
+----+----+----+----+----+----+----+----+----+----+----+----+
| a | pad| pad| pad| b (4 bytes) | c | pad| pad| pad|
+----+----+----+----+----+----+----+----+----+----+----+----+
Three bytes of padding after a to align b on a 4-byte boundary. Three bytes of
trailing padding after c so that an array of these structs keeps b aligned.
The offsetof Macro
offsetof(type, member) from <stddef.h> tells you the exact byte offset of any
member. It is the essential tool for verifying layout.
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
struct packet {
uint8_t version;
uint16_t length;
uint32_t sequence;
uint8_t flags;
};
int main(void) {
printf("Field Offset Size\n");
printf("version %zu %zu\n", offsetof(struct packet, version), sizeof(uint8_t));
printf("length %zu %zu\n", offsetof(struct packet, length), sizeof(uint16_t));
printf("sequence %zu %zu\n", offsetof(struct packet, sequence), sizeof(uint32_t));
printf("flags %zu %zu\n", offsetof(struct packet, flags), sizeof(uint8_t));
printf("total size %zu\n", sizeof(struct packet));
return 0;
}
Likely output:
Field Offset Size
version 0 1
length 2 2
sequence 4 4
flags 8 1
total size 12
One byte of padding after version, three bytes of trailing padding after flags.
Reordering Fields to Minimize Padding
Simply reordering members from largest to smallest eliminates most internal padding.
#include <stdio.h>
#include <stddef.h>
struct bad_order {
char a; /* 1 byte + 7 padding */
double b; /* 8 bytes */
char c; /* 1 byte + 7 padding */
}; /* total: 24 bytes */
struct good_order {
double b; /* 8 bytes */
char a; /* 1 byte */
char c; /* 1 byte + 6 padding */
}; /* total: 16 bytes */
int main(void) {
printf("bad_order: %zu bytes\n", sizeof(struct bad_order));
printf("good_order: %zu bytes\n", sizeof(struct good_order));
return 0;
}
bad_order layout (24 bytes):
+---+-------+--------+---+-------+
| a | pad7 | b (8) | c | pad7 |
+---+-------+--------+---+-------+
good_order layout (16 bytes):
+--------+---+---+------+
| b (8) | a | c | pad6 |
+--------+---+---+------+
Driver Prep: In kernel code, struct layout matters for cache performance. Hot fields are grouped together. The pahole tool shows struct layouts including padding holes. Run pahole my_object.o on compiled code to see real layouts.
Packed Structs in C
When you need exact byte layout -- network packets, file headers, hardware registers -- you must eliminate padding entirely.
__attribute__((packed)) (GCC/Clang)
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
struct __attribute__((packed)) wire_header {
uint8_t version;
uint16_t length;
uint32_t sequence;
uint8_t flags;
};
int main(void) {
printf("sizeof = %zu\n", sizeof(struct wire_header)); /* 8, not 12 */
printf("offsetof(length) = %zu\n", offsetof(struct wire_header, length)); /* 1 */
printf("offsetof(sequence) = %zu\n", offsetof(struct wire_header, sequence)); /* 3 */
printf("offsetof(flags) = %zu\n", offsetof(struct wire_header, flags)); /* 7 */
return 0;
}
Packed layout:
Offset: 0 1 2 3 4 5 6 7
+----+----+----+----+----+----+----+----+
|ver | length | sequence |flag|
+----+----+----+----+----+----+----+----+
#pragma pack
MSVC and GCC both support #pragma pack. It affects all structs until reset.
#include <stdio.h>
#include <stdint.h>
#pragma pack(push, 1)
struct wire_header {
uint8_t version;
uint16_t length;
uint32_t sequence;
uint8_t flags;
};
#pragma pack(pop)
int main(void) {
printf("sizeof = %zu\n", sizeof(struct wire_header)); /* 8 */
return 0;
}
Caution: Always use push/pop with #pragma pack. Forgetting pop silently packs every subsequent struct in the translation unit, causing baffling bugs.
Performance Cost of Unaligned Access
On x86, unaligned access works but can be slower, especially when it crosses a cache-line boundary. On ARM and RISC-V, it can trap or silently produce wrong results depending on the configuration.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#define ITERATIONS 100000000
int main(void) {
/* Aligned access */
uint32_t aligned_val = 0;
uint32_t *aligned_ptr = &aligned_val;
clock_t start = clock();
for (int i = 0; i < ITERATIONS; i++)
*aligned_ptr = *aligned_ptr + 1;
clock_t aligned_time = clock() - start;
/* Unaligned access via packed struct */
struct __attribute__((packed)) {
uint8_t pad;
uint32_t val;
} unaligned = {0, 0};
start = clock();
for (int i = 0; i < ITERATIONS; i++)
unaligned.val = unaligned.val + 1;
clock_t unaligned_time = clock() - start;
printf("Aligned: %ld ticks\n", (long)aligned_time);
printf("Unaligned: %ld ticks\n", (long)unaligned_time);
return 0;
}
Try It: Compile with -O0 and -O2 and compare the results. The compiler may generate special unaligned-access instructions at higher optimization levels. Also try on an ARM machine if you have one -- the difference may be dramatic.
Rust: repr(C)
By default, Rust makes no guarantees about struct layout. The compiler is free to
reorder fields, add padding, or change layout between compilations. To get a
predictable C-compatible layout, use #[repr(C)].
use std::mem;

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("size = {}", mem::size_of::<Example>());
    println!("align = {}", mem::align_of::<Example>());
    // offset_of! was stabilized in Rust 1.77
    println!("offset of a = {}", mem::offset_of!(Example, a));
    println!("offset of b = {}", mem::offset_of!(Example, b));
    println!("offset of c = {}", mem::offset_of!(Example, c));
}
Output matches the C version: size 12, offsets 0/4/8.
Rust Note: Without #[repr(C)], Rust's default repr(Rust) may reorder fields to minimize padding. This is an optimization -- but it means you cannot predict the layout. Always use repr(C) for FFI or hardware-facing structs.
Rust: repr(packed)
use std::mem;

#[repr(C, packed)]
struct WireHeader {
    version: u8,
    length: u16,
    sequence: u32,
    flags: u8,
}

fn main() {
    println!("size = {}", mem::size_of::<WireHeader>()); // 8
    println!("offset version  = {}", mem::offset_of!(WireHeader, version));
    println!("offset length   = {}", mem::offset_of!(WireHeader, length));
    println!("offset sequence = {}", mem::offset_of!(WireHeader, sequence));
    println!("offset flags    = {}", mem::offset_of!(WireHeader, flags));
}
Caution: In Rust, taking a reference to a field of a packed struct is undefined behavior if the field is not naturally aligned. The compiler refuses to create &header.sequence if it might be unaligned. You must use addr_of!(header.sequence).read_unaligned() or copy the field to a local first.
use std::ptr::addr_of;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    let p = Packed { a: 1, b: 0xDEADBEEF };
    // This would be UB: let r = &p.b;
    // Safe way:
    let b_val = unsafe { addr_of!(p.b).read_unaligned() };
    println!("b = 0x{b_val:08X}");
}
Rust: repr(align(N))
Force a minimum alignment, useful for cache-line alignment.
use std::mem;

#[repr(C, align(64))]
struct CacheAligned {
    counter: u64,
    data: [u8; 32],
}

fn main() {
    println!("size = {}", mem::size_of::<CacheAligned>());   // 64
    println!("align = {}", mem::align_of::<CacheAligned>()); // 64
    let obj = CacheAligned { counter: 0, data: [0; 32] };
    let addr = &obj as *const CacheAligned as usize;
    println!("address = 0x{addr:X}");
    println!("aligned to 64? {}", addr % 64 == 0);
}
Driver Prep: Cache-line alignment prevents false sharing in concurrent code. When two threads write to different fields that share a cache line, the CPU bounces the line between cores. Aligning to 64 bytes (a typical cache-line size) avoids this. The Linux kernel uses ____cacheline_aligned for this purpose.
Verifying Layout at Compile Time
In C, use _Static_assert (C11):
#include <stdint.h>
#include <stddef.h>
struct __attribute__((packed)) wire_msg {
uint8_t type;
uint16_t length;
uint32_t payload;
};
_Static_assert(sizeof(struct wire_msg) == 7, "wire_msg must be 7 bytes");
_Static_assert(offsetof(struct wire_msg, payload) == 3, "payload at offset 3");
int main(void) {
return 0;
}
In Rust, use const assertions:
#[repr(C, packed)]
struct WireMsg {
    msg_type: u8,
    length: u16,
    payload: u32,
}

const _: () = assert!(std::mem::size_of::<WireMsg>() == 7);

fn main() {
    println!("Layout verified at compile time.");
}
A Real-World Example: ELF Header
The ELF file format begins with a fixed-layout header. Here is a partial version:
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
struct __attribute__((packed)) elf_ident {
uint8_t magic[4]; /* 0x7F 'E' 'L' 'F' */
uint8_t class; /* 1=32-bit, 2=64-bit */
uint8_t data; /* 1=LE, 2=BE */
uint8_t version;
uint8_t osabi;
uint8_t pad[8];
};
_Static_assert(sizeof(struct elf_ident) == 16, "ELF ident must be 16 bytes");
int main(void) {
struct elf_ident ident;
memset(&ident, 0, sizeof(ident));
ident.magic[0] = 0x7F;
ident.magic[1] = 'E';
ident.magic[2] = 'L';
ident.magic[3] = 'F';
ident.class = 2; /* 64-bit */
ident.data = 1; /* little-endian */
ident.version = 1;
printf("ELF ident: ");
uint8_t *bytes = (uint8_t *)&ident;
for (size_t i = 0; i < sizeof(ident); i++)
printf("%02X ", bytes[i]);
printf("\n");
return 0;
}
use std::mem;

#[repr(C, packed)]
struct ElfIdent {
    magic: [u8; 4],
    class: u8,
    data: u8,
    version: u8,
    osabi: u8,
    pad: [u8; 8],
}

const _: () = assert!(mem::size_of::<ElfIdent>() == 16);

fn main() {
    let ident = ElfIdent {
        magic: [0x7F, b'E', b'L', b'F'],
        class: 2,   // 64-bit
        data: 1,    // little-endian
        version: 1,
        osabi: 0,
        pad: [0; 8],
    };
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            &ident as *const ElfIdent as *const u8,
            mem::size_of::<ElfIdent>(),
        )
    };
    print!("ELF ident: ");
    for b in bytes {
        print!("{b:02X} ");
    }
    println!();
}
Try It: Read the first 16 bytes of /bin/ls (or any ELF binary) into this struct and verify the magic number. In C, use fread. In Rust, use std::fs::read and slice the first 16 bytes.
Quick Knowledge Check
- A struct has fields u8, u32, u8 with repr(C). What is its size and why?
- What happens on ARM if you read a u32 from an odd address without packed access?
- Why does Rust refuse to let you create &packed_struct.unaligned_field?
Common Pitfalls
- Assuming struct size equals the sum of field sizes. Padding exists. Always verify with sizeof / size_of.
- Forgetting trailing padding. The struct's total size is rounded up to its alignment so that arrays work.
- Using packed structs everywhere. Pack only when the wire format demands it. Unpacked structs are faster.
- Taking references to packed fields in Rust. This is UB. Use read_unaligned.
- Forgetting repr(C). Default Rust layout is unspecified. Without repr(C), your struct will not match the C equivalent.
- Not asserting layout. Always add static assertions for struct size when the layout must be exact. Catch mistakes at compile time, not in production.
Endianness and Byte Order
A 32-bit integer is four bytes. Which byte goes first? The answer depends on the machine -- and it matters every time data crosses a boundary: network sockets, file formats, shared memory between architectures, or talking to hardware.
Little-Endian vs Big-Endian
The value 0x01020304 stored in memory:
Little-endian (x86, ARM default, RISC-V):
Address: 0x00 0x01 0x02 0x03
Content: 0x04 0x03 0x02 0x01
LSB MSB (least significant byte first)
Big-endian (network byte order, SPARC, some ARM modes):
Address: 0x00 0x01 0x02 0x03
Content: 0x01 0x02 0x03 0x04
MSB LSB (most significant byte first)
The term comes from Gulliver's Travels -- which end of the egg do you crack first? In systems programming, you crack whichever end the spec says.
Why It Matters
Within a single machine, endianness is invisible. The CPU loads and stores multi-byte values in its native order, and everything just works.
Problems appear when bytes cross boundaries:
- Network protocols. TCP/IP headers are big-endian (by convention, "network byte order"). An x86 machine must swap bytes before sending and after receiving.
- File formats. Some use big-endian (Java .class files), some use little-endian (most Windows formats), some specify per-file (TIFF).
- Hardware registers. PCI is little-endian. Some SoC peripherals are big-endian. The driver must know.
Seeing It In C: The Union Trick
#include <stdio.h>
#include <stdint.h>
union endian_check {
uint32_t word;
uint8_t bytes[4];
};
int main(void) {
union endian_check ec;
ec.word = 0x01020304;
printf("word = 0x%08X\n", ec.word);
printf("bytes: [0]=0x%02X [1]=0x%02X [2]=0x%02X [3]=0x%02X\n",
ec.bytes[0], ec.bytes[1], ec.bytes[2], ec.bytes[3]);
if (ec.bytes[0] == 0x04)
printf("Little-endian\n");
else if (ec.bytes[0] == 0x01)
printf("Big-endian\n");
else
printf("Mixed endian (?)\n");
return 0;
}
On x86 you will see:
word = 0x01020304
bytes: [0]=0x04 [1]=0x03 [2]=0x02 [3]=0x01
Little-endian
Caution: In C++, reading a union member other than the one last written is undefined behavior. In C, type punning through a union has been explicitly permitted since C99 (clarified by a later Technical Corrigendum and carried into C11): the bytes are simply reinterpreted. For code that must compile as both C and C++, use memcpy instead.
Detecting Endianness at Runtime
The union trick above is one approach. Here is another, using a pointer cast:
#include <stdio.h>
#include <stdint.h>
int is_little_endian(void) {
uint16_t val = 1;
uint8_t *byte = (uint8_t *)&val;
return byte[0] == 1;
}
int main(void) {
printf("This machine is %s-endian\n",
is_little_endian() ? "little" : "big");
return 0;
}
In Rust:
fn is_little_endian() -> bool {
    let val: u16 = 1;
    let bytes = val.to_ne_bytes(); // native endian
    bytes[0] == 1
}

fn main() {
    if is_little_endian() {
        println!("Little-endian");
    } else {
        println!("Big-endian");
    }
}
Rust Note: In practice you rarely need runtime detection. Rust's byte conversion methods (to_be_bytes, to_le_bytes, etc.) handle the conversion for you regardless of the host platform.
The C Conversion Functions: htons, htonl, ntohs, ntohl
POSIX provides four functions to convert between host and network byte order.
htons -- host to network, short (16-bit)
htonl -- host to network, long (32-bit)
ntohs -- network to host, short (16-bit)
ntohl -- network to host, long (32-bit)
On a little-endian machine, these swap bytes. On a big-endian machine, they are no-ops.
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
int main(void) {
uint16_t host_port = 8080;
uint16_t net_port = htons(host_port);
printf("Host order: 0x%04X\n", host_port); /* 0x1F90 */
printf("Net order: 0x%04X\n", net_port); /* 0x901F on LE */
uint32_t host_addr = 0xC0A80001; /* 192.168.0.1 */
uint32_t net_addr = htonl(host_addr);
printf("Host order: 0x%08X\n", host_addr);
printf("Net order: 0x%08X\n", net_addr);
/* Round-trip */
printf("Back: 0x%08X\n", ntohl(net_addr));
return 0;
}
Compile: gcc -o endian endian.c && ./endian
What About 64-bit?
POSIX does not define htonll. You can build your own or use compiler builtins:
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
uint64_t htonll(uint64_t val) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
return __builtin_bswap64(val);
#else
return val;
#endif
}
uint64_t ntohll(uint64_t val) {
return htonll(val); /* same operation -- swap is its own inverse */
}
int main(void) {
uint64_t host_val = 0x0102030405060708ULL;
uint64_t net_val = htonll(host_val);
printf("Host: 0x%016llX\n", (unsigned long long)host_val);
printf("Net:  0x%016llX\n", (unsigned long long)net_val);
printf("Back: 0x%016llX\n", (unsigned long long)ntohll(net_val));
return 0;
}
Rust: to_be_bytes / to_le_bytes / from_be_bytes
Rust takes a different approach: no separate functions, just methods on integer types.
fn main() {
    let port: u16 = 8080;
    // Convert to big-endian (network order) bytes
    let net_bytes = port.to_be_bytes();
    println!("port {port} as network bytes: {:02X} {:02X}",
             net_bytes[0], net_bytes[1]);
    // Convert back
    let recovered = u16::from_be_bytes(net_bytes);
    println!("recovered: {recovered}");
    // 32-bit example
    let addr: u32 = 0xC0A80001; // 192.168.0.1
    let net = addr.to_be_bytes();
    println!("IP as network bytes: {}.{}.{}.{}", net[0], net[1], net[2], net[3]);
    // 64-bit -- just works, no special function needed
    let val: u64 = 0x0102030405060708;
    let be = val.to_be_bytes();
    println!("64-bit big-endian: {:02X?}", be);
    let back = u64::from_be_bytes(be);
    println!("recovered: 0x{back:016X}");
}
The full set of methods:
.to_be_bytes() -- convert to big-endian byte array
.to_le_bytes() -- convert to little-endian byte array
.to_ne_bytes() -- convert to native-endian byte array
from_be_bytes() -- construct from big-endian bytes
from_le_bytes() -- construct from little-endian bytes
from_ne_bytes() -- construct from native-endian bytes
Rust Note: These methods return and consume fixed-size arrays ([u8; 2], [u8; 4], [u8; 8]), not slices. This means the conversion is zero-cost when the compiler can see both the conversion and the use -- it often just emits a bswap instruction or nothing at all.
Wire Format Patterns
When parsing a network packet, always convert from network byte order to host order. When building a packet, always convert from host to network order.
Parsing a Simple Packet Header in C
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
struct __attribute__((packed)) msg_header {
uint16_t msg_type;
uint16_t msg_length;
uint32_t sequence;
};
void parse_header(const uint8_t *data) {
struct msg_header hdr;
memcpy(&hdr, data, sizeof(hdr));
/* Convert from network byte order */
uint16_t type = ntohs(hdr.msg_type);
uint16_t len = ntohs(hdr.msg_length);
uint32_t seq = ntohl(hdr.sequence);
printf("Type: %u Length: %u Seq: %u\n", type, len, seq);
}
void build_header(uint8_t *buf, uint16_t type, uint16_t len, uint32_t seq) {
struct msg_header hdr;
hdr.msg_type = htons(type);
hdr.msg_length = htons(len);
hdr.sequence = htonl(seq);
memcpy(buf, &hdr, sizeof(hdr));
}
int main(void) {
uint8_t wire[8];
build_header(wire, 1, 128, 42);
printf("Wire bytes: ");
for (int i = 0; i < 8; i++)
printf("%02X ", wire[i]);
printf("\n");
parse_header(wire);
return 0;
}
Same Pattern in Rust
fn parse_header(data: &[u8]) {
    if data.len() < 8 {
        eprintln!("Short packet");
        return;
    }
    let msg_type = u16::from_be_bytes([data[0], data[1]]);
    let msg_len = u16::from_be_bytes([data[2], data[3]]);
    let sequence = u32::from_be_bytes([data[4], data[5], data[6], data[7]]);
    println!("Type: {msg_type} Length: {msg_len} Seq: {sequence}");
}

fn build_header(msg_type: u16, msg_len: u16, sequence: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[0..2].copy_from_slice(&msg_type.to_be_bytes());
    buf[2..4].copy_from_slice(&msg_len.to_be_bytes());
    buf[4..8].copy_from_slice(&sequence.to_be_bytes());
    buf
}

fn main() {
    let wire = build_header(1, 128, 42);
    print!("Wire bytes: ");
    for b in &wire {
        print!("{b:02X} ");
    }
    println!();
    parse_header(&wire);
}
Try It: Extend both programs to include a uint64_t timestamp field in the header. Use htonll/ntohll in C and to_be_bytes/from_be_bytes in Rust.
Byte Swapping Internals
What does a byte swap actually do?
Original (LE): 0x04 0x03 0x02 0x01
Swapped (BE): 0x01 0x02 0x03 0x04
A manual 32-bit swap:
#include <stdio.h>
#include <stdint.h>
uint32_t swap32(uint32_t x) {
return ((x & 0x000000FFu) << 24)
| ((x & 0x0000FF00u) << 8)
| ((x & 0x00FF0000u) >> 8)
| ((x & 0xFF000000u) >> 24);
}
int main(void) {
uint32_t val = 0x01020304;
uint32_t swapped = swap32(val);
printf("0x%08X -> 0x%08X\n", val, swapped);
return 0;
}
Modern compilers recognize this pattern and emit a single bswap instruction. You
can also use the builtins directly:
/* GCC/Clang */
uint16_t s = __builtin_bswap16(0x0102); /* 0x0201 */
uint32_t w = __builtin_bswap32(0x01020304);
uint64_t d = __builtin_bswap64(0x0102030405060708ULL);
In Rust:
fn main() {
    let val: u32 = 0x01020304;
    let swapped = val.swap_bytes();
    println!("0x{val:08X} -> 0x{swapped:08X}");
    // Also available on u16, u64, u128, i32, etc.
    let s: u16 = 0x0102;
    println!("0x{s:04X} -> 0x{:04X}", s.swap_bytes());
}
Endianness in Structs: A Complete Example
Suppose a sensor sends data in big-endian format. Here is how you would parse it.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
struct __attribute__((packed)) sensor_reading {
uint8_t sensor_id;
uint16_t temperature; /* big-endian, units: 0.1 deg C */
uint32_t timestamp; /* big-endian, Unix epoch */
};
void decode_reading(const uint8_t *raw) {
struct sensor_reading r;
memcpy(&r, raw, sizeof(r));
uint8_t id = r.sensor_id;
uint16_t temp = ntohs(r.temperature);
uint32_t ts = ntohl(r.timestamp);
printf("Sensor %u: %.1f C at t=%u\n", id, temp / 10.0, ts);
}
int main(void) {
/* Simulated wire data: sensor 3, 25.6 C (256 = 0x0100), ts=1000 */
uint8_t wire[] = {
0x03, /* sensor_id */
0x01, 0x00, /* temperature = 256 (big-endian) */
0x00, 0x00, 0x03, 0xE8 /* timestamp = 1000 (big-endian) */
};
decode_reading(wire);
return 0;
}
fn decode_reading(raw: &[u8]) {
    if raw.len() < 7 {
        eprintln!("Short reading");
        return;
    }
    let id = raw[0];
    let temp = u16::from_be_bytes([raw[1], raw[2]]);
    let ts = u32::from_be_bytes([raw[3], raw[4], raw[5], raw[6]]);
    println!("Sensor {id}: {:.1} C at t={ts}", temp as f64 / 10.0);
}
fn main() {
    let wire: &[u8] = &[
        0x03,                   // sensor_id
        0x01, 0x00,             // temperature = 256 (big-endian)
        0x00, 0x00, 0x03, 0xE8, // timestamp = 1000 (big-endian)
    ];
    decode_reading(wire);
}
Driver Prep: PCI and PCIe are little-endian by specification. When your driver reads a register on an x86 host, no swapping is needed. But some SoC buses are big-endian, and the kernel provides ioread32be/iowrite32be for those. Always check the hardware manual for the device's byte order.
Mixed Endianness in the Wild
Some formats mix endianness. The classic example: the ELF file format. The ELF identification bytes specify the endianness of the rest of the file.
Byte 5 of ELF header (e_ident[EI_DATA]):
1 = ELFDATA2LSB (little-endian)
2 = ELFDATA2MSB (big-endian)
Your parser must read this byte first, then decide how to interpret all subsequent multi-byte fields.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
uint16_t read_u16(const uint8_t *p, int big_endian) {
if (big_endian)
return ((uint16_t)p[0] << 8) | p[1];
else
return ((uint16_t)p[1] << 8) | p[0];
}
uint32_t read_u32(const uint8_t *p, int big_endian) {
if (big_endian)
return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
| ((uint32_t)p[2] << 8) | p[3];
else
return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
| ((uint32_t)p[1] << 8) | p[0];
}
int main(void) {
uint8_t be_data[] = {0x00, 0x01, 0x00, 0x02};
uint8_t le_data[] = {0x01, 0x00, 0x02, 0x00};
printf("BE u16: %u\n", read_u16(be_data, 1)); /* 1 */
printf("LE u16: %u\n", read_u16(le_data, 0)); /* 1 */
printf("BE u32: %u\n", read_u32(be_data, 1)); /* 65538 */
printf("LE u32: %u\n", read_u32(le_data, 0)); /* 131073 */
return 0;
}
fn read_u16(p: &[u8], big_endian: bool) -> u16 {
    if big_endian {
        u16::from_be_bytes([p[0], p[1]])
    } else {
        u16::from_le_bytes([p[0], p[1]])
    }
}
fn read_u32(p: &[u8], big_endian: bool) -> u32 {
    if big_endian {
        u32::from_be_bytes([p[0], p[1], p[2], p[3]])
    } else {
        u32::from_le_bytes([p[0], p[1], p[2], p[3]])
    }
}
fn main() {
    let be_data = [0x00u8, 0x01, 0x00, 0x02];
    let le_data = [0x01u8, 0x00, 0x02, 0x00];
    println!("BE u16: {}", read_u16(&be_data, true));
    println!("LE u16: {}", read_u16(&le_data, false));
    println!("BE u32: {}", read_u32(&be_data, true));
    println!("LE u32: {}", read_u32(&le_data, false));
}
Quick Knowledge Check
- On a little-endian machine, uint32_t x = 1; -- what is the value of the byte at ((uint8_t *)&x)[0]? What about [3]?
- A protocol spec says the port field is "in network byte order." You receive bytes 0x1F 0x90. What is the port number?
- You see htonl(INADDR_ANY) in network code. INADDR_ANY is 0. Does htonl change it? Why or why not?
Common Pitfalls
- Forgetting to convert. The most common network bug: sending a uint32_t without htonl. It works on big-endian machines, fails on little-endian, and the test server was big-endian.
- Double-converting. Calling htonl on a value that is already in network order swaps it back. Convert exactly once.
- Assuming endianness. Your code might run on ARM big-endian someday. Always use explicit conversions for portable code.
- Casting instead of memcpy. *(uint32_t *)buf is an alignment violation if buf is not 4-byte aligned. Use memcpy or from_be_bytes.
- No 64-bit POSIX function. htonll is not standard. Roll your own or use __builtin_bswap64.
Volatile, Type Punning, and Hardware Access Patterns
When your code talks directly to hardware, two things break: the compiler's assumptions
about memory, and the type system's assumptions about data. This chapter covers the
volatile keyword, type punning, strict aliasing, and the register access patterns
used in embedded and driver code.
The Problem: The Compiler Is Too Smart
Compilers optimize aggressively. They assume that if no code writes to a variable, its value does not change. They assume that writing to a variable that is never read afterward is dead code. Both assumptions are wrong when hardware is involved.
#include <stdio.h>
#include <stdint.h>
/* Simulating a hardware status register */
static uint32_t fake_hw_register = 0;
void wait_for_ready_broken(void) {
uint32_t *status = &fake_hw_register;
/* BUG: compiler sees *status never changes in this loop */
/* At -O2, this becomes an infinite loop or is removed entirely */
while ((*status & 0x01) == 0) {
/* spin */
}
}
int main(void) {
printf("This function has a bug -- see the source.\n");
/* Do NOT call wait_for_ready_broken -- it will hang */
return 0;
}
At -O2, the compiler loads *status once, sees it is zero, and generates an
infinite loop -- or removes the loop as dead code. The compiler does not know that
hardware can change the value behind its back.
The volatile Keyword
volatile tells the compiler: do not optimize away accesses to this variable.
Read it every time the code says to read it. Write it every time the code says to
write it. In the order the code specifies.
#include <stdio.h>
#include <stdint.h>
static volatile uint32_t fake_hw_status = 0;
void wait_for_ready(void) {
/* volatile forces a real memory read on every iteration */
while ((fake_hw_status & 0x01) == 0) {
/* spin -- compiler MUST re-read fake_hw_status each time */
}
}
int main(void) {
printf("volatile prevents the compiler from caching the read.\n");
/* Still do not call wait_for_ready in this demo -- there is no */
/* other thread or hardware changing the value. */
return 0;
}
What volatile Does and Does NOT Do
volatile guarantees:
- Every read in the source produces a load instruction
- Every write in the source produces a store instruction
- Reads and writes to the same volatile variable are not reordered relative to each other
volatile does NOT guarantee:
- Atomicity -- a 64-bit volatile read on a 32-bit CPU may tear
- Memory ordering between different variables (use memory barriers for that)
- Thread safety (use _Atomic or stdatomic.h for threads)
Caution: volatile is NOT a substitute for atomic operations in multithreaded code. In C, use _Atomic. In Rust, use std::sync::atomic. volatile is for hardware registers and memory-mapped I/O only.
Memory-Mapped Hardware Registers
Real hardware appears as addresses in the CPU's memory map. Reading or writing those addresses talks to the device.
Physical memory map (simplified):
0x0000_0000 - 0x3FFF_FFFF RAM
0x4000_0000 - 0x4000_00FF UART registers
0x4000_0100 - 0x4000_01FF GPIO registers
0x4000_0200 - 0x4000_02FF Timer registers
A typical register block for a UART:
Offset Register Access
0x00 DATA R/W (read = receive, write = transmit)
0x04 STATUS R (bit 0 = TX ready, bit 1 = RX data available)
0x08 CONTROL R/W (bit 0 = enable, bit 1 = interrupt enable)
0x0C BAUD_DIV R/W (baud rate divisor)
Accessing Registers in C
#include <stdio.h>
#include <stdint.h>
/* In real code, UART_BASE comes from device tree or platform header */
/* Here we simulate with a static array */
static uint32_t simulated_uart[4] = {0, 0x03, 0, 0};
#define UART_BASE ((volatile uint32_t *)simulated_uart)
#define UART_DATA (UART_BASE[0])
#define UART_STATUS (UART_BASE[1])
#define UART_CONTROL (UART_BASE[2])
#define UART_BAUD (UART_BASE[3])
#define STATUS_TX_READY (1u << 0)
#define STATUS_RX_AVAIL (1u << 1)
#define CTRL_ENABLE (1u << 0)
#define CTRL_IRQ_EN (1u << 1)
void uart_init(uint32_t baud_divisor) {
UART_BAUD = baud_divisor;
UART_CONTROL = CTRL_ENABLE;
}
void uart_send(uint8_t byte) {
while (!(UART_STATUS & STATUS_TX_READY)) {
/* spin -- volatile ensures re-read */
}
UART_DATA = byte;
}
int main(void) {
uart_init(26); /* e.g., 115200 baud */
/* STATUS already has TX_READY set in our simulation */
uart_send('H');
printf("Sent 'H' (0x%02X) to simulated UART\n",
(unsigned)simulated_uart[0]);
printf("CONTROL = 0x%08X\n", (unsigned)simulated_uart[2]);
printf("BAUD = %u\n", (unsigned)simulated_uart[3]);
return 0;
}
Driver Prep: In the Linux kernel, you never access physical addresses directly. The kernel provides ioremap() to map physical addresses into kernel virtual space, and readl()/writel() to perform volatile MMIO reads/writes with proper barriers. The pattern is: void __iomem *base = ioremap(phys, size); then val = readl(base + OFFSET);.
Type Punning in C
Type punning means reinterpreting the bytes of one type as another. There are three ways to do it in C, and two of them are problematic.
Method 1: Pointer Cast (Dangerous)
#include <stdio.h>
#include <stdint.h>
int main(void) {
float f = 3.14f;
uint32_t *p = (uint32_t *)&f; /* strict aliasing violation! */
printf("float 3.14 as uint32: 0x%08X\n", *p);
return 0;
}
This compiles and works on most compilers with default settings. But it violates the strict aliasing rule and is technically undefined behavior.
Method 2: Union (Common, Practical)
#include <stdio.h>
#include <stdint.h>
union float_bits {
float f;
uint32_t u;
};
int main(void) {
union float_bits fb;
fb.f = 3.14f;
printf("float 3.14 as uint32: 0x%08X\n", fb.u);
/* Inspect the IEEE 754 parts */
uint32_t sign = (fb.u >> 31) & 1;
uint32_t exponent = (fb.u >> 23) & 0xFF;
uint32_t mantissa = fb.u & 0x7FFFFF;
printf("sign=%u exp=%u mantissa=0x%06X\n", sign, exponent, mantissa);
return 0;
}
Caution: Union type-punning is well-defined in C11 (6.5.2.3) but NOT in C++. If you write code that must compile as both C and C++, use memcpy.
Method 3: memcpy (Always Correct)
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(void) {
float f = 3.14f;
uint32_t u;
memcpy(&u, &f, sizeof(u));
printf("float 3.14 as uint32: 0x%08X\n", u);
/* Round-trip */
float f2;
memcpy(&f2, &u, sizeof(f2));
printf("back to float: %f\n", f2);
return 0;
}
memcpy is the only method that is correct under all standards, all compilers, and
all optimization levels. Modern compilers optimize small memcpy calls into register
moves -- there is no performance penalty.
The Strict Aliasing Rule
The strict aliasing rule (C11 6.5 paragraph 7) says: you may only access an object through a pointer to a compatible type, a character type, or a signed/unsigned variant of its declared type.
#include <stdio.h>
#include <stdint.h>
/* This violates strict aliasing: */
void bad_example(void) {
int x = 42;
float *fp = (float *)&x; /* int* -> float*: VIOLATION */
/* Reading *fp is undefined behavior */
printf("%f\n", *fp); /* compiler may return garbage at -O2 */
}
/* This is fine -- char* can alias anything: */
void ok_example(void) {
int x = 42;
unsigned char *cp = (unsigned char *)&x;
for (size_t i = 0; i < sizeof(x); i++)
printf("%02X ", cp[i]);
printf("\n");
}
int main(void) {
ok_example();
/* bad_example(); -- do not rely on this */
return 0;
}
GCC's -fstrict-aliasing (enabled at -O2 and above) lets the compiler assume the
rule is followed. Violations cause real, baffling, optimization-dependent bugs.
$ gcc -O0 -o alias alias.c # might "work"
$ gcc -O2 -o alias alias.c # might break -- UB
$ gcc -O2 -fno-strict-aliasing -o alias alias.c # disables the optimization
Caution: The Linux kernel compiles with -fno-strict-aliasing because kernel code routinely casts between pointer types. This is a pragmatic choice -- not a license to ignore aliasing in your own code.
Rust: read_volatile / write_volatile
Rust has no volatile keyword. Instead, it provides two functions in std::ptr:
use std::ptr;

fn main() {
    let mut hw_reg: u32 = 0;
    // Volatile write
    unsafe {
        ptr::write_volatile(&mut hw_reg as *mut u32, 0xDEAD_BEEF);
    }
    // Volatile read
    let val = unsafe { ptr::read_volatile(&hw_reg as *const u32) };
    println!("Register value: 0x{val:08X}");
}
Rust Note: read_volatile and write_volatile are unsafe because they take raw pointers. The volatility is a property of the access, not the variable. This is more precise than C's model, where volatility is part of the type.
Modeling Hardware Registers in Rust
In idiomatic Rust, you wrap register access in a struct that encapsulates the unsafe volatile operations.
use std::ptr;

/// A read-write hardware register at a fixed memory address.
struct Register {
    addr: *mut u32,
}

impl Register {
    /// # Safety
    /// `addr` must point to a valid, mapped hardware register.
    unsafe fn new(addr: *mut u32) -> Self {
        Register { addr }
    }

    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }

    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }

    fn set_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old | mask);
    }

    fn clear_bits(&self, mask: u32) {
        let old = self.read();
        self.write(old & !mask);
    }

    fn read_field(&self, mask: u32, shift: u32) -> u32 {
        (self.read() & mask) >> shift
    }

    fn write_field(&self, mask: u32, shift: u32, val: u32) {
        let old = self.read() & !mask;
        self.write(old | ((val << shift) & mask));
    }
}

// Demonstration using a simulated register
fn main() {
    let mut simulated_reg: u32 = 0;
    let reg = unsafe { Register::new(&mut simulated_reg as *mut u32) };
    reg.write(0x0000_0000);
    reg.set_bits(0x01);          // enable
    reg.write_field(0x0E, 1, 5); // set mode field [3:1] = 5
    println!("Register = 0x{:08X}", reg.read());
    println!("Mode = {}", reg.read_field(0x0E, 1));
}
Read-Only and Write-Only Registers
Some registers must not be written (status registers), and some must not be read (command/data FIFOs where reading has side effects). Encode this in the type system.
use std::ptr;
use std::marker::PhantomData;

struct ReadOnly;
struct WriteOnly;
struct ReadWrite;

struct Reg<MODE> {
    addr: *mut u32,
    _mode: PhantomData<MODE>,
}

impl<MODE> Reg<MODE> {
    unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr, _mode: PhantomData }
    }
}

impl Reg<ReadOnly> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    // No write method -- compile error if you try
}

impl Reg<WriteOnly> {
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
    // No read method
}

impl Reg<ReadWrite> {
    fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
    fn write(&self, val: u32) {
        unsafe { ptr::write_volatile(self.addr, val) }
    }
}

fn main() {
    let mut status_mem: u32 = 0x42;
    let mut data_mem: u32 = 0;
    let mut ctrl_mem: u32 = 0;
    let status: Reg<ReadOnly> = unsafe { Reg::new(&mut status_mem) };
    let data: Reg<WriteOnly> = unsafe { Reg::new(&mut data_mem) };
    let ctrl: Reg<ReadWrite> = unsafe { Reg::new(&mut ctrl_mem) };
    println!("Status = 0x{:02X}", status.read());
    // status.write(0);  // COMPILE ERROR -- ReadOnly has no write()
    data.write(0xFF);
    // data.read();      // COMPILE ERROR -- WriteOnly has no read()
    ctrl.write(0x01);
    println!("Ctrl = 0x{:02X}", ctrl.read());
}
Driver Prep: The Rust embedded ecosystem (cortex-m, svd2rust) generates register access code with exactly this pattern. The SVD file from the chip vendor describes which registers are read-only, write-only, or read-write, and the generated code enforces it at compile time.
Type Punning in Rust
Rust does not have unions in the C sense: a union type exists, but reading its fields is unsafe, so it is rarely used for punning. The usual approaches are transmute or safe byte-level methods.
Using transmute
fn main() {
    let f: f32 = 3.14;
    let bits: u32 = unsafe { std::mem::transmute(f) };
    println!("f32 3.14 as u32: 0x{bits:08X}");
    let sign = (bits >> 31) & 1;
    let exponent = (bits >> 23) & 0xFF;
    let mantissa = bits & 0x7F_FFFF;
    println!("sign={sign} exp={exponent} mantissa=0x{mantissa:06X}");
    // Round-trip
    let f2: f32 = unsafe { std::mem::transmute(bits) };
    println!("back to f32: {f2}");
}
Caution: transmute is extremely unsafe. The source and destination types must have the same size (checked at compile time), but the compiler cannot verify that the bit pattern is valid for the destination type. Prefer safer alternatives when they exist.
Using to_bits / from_bits (Preferred)
fn main() {
    let f: f32 = 3.14;
    let bits = f.to_bits();
    println!("f32 3.14 as u32: 0x{bits:08X}");
    let f2 = f32::from_bits(bits);
    println!("back to f32: {f2}");
    // For f64 <-> u64:
    let d: f64 = 2.718281828;
    let dbits = d.to_bits();
    println!("f64 as u64: 0x{dbits:016X}");
}
to_bits() and from_bits() are safe, stable, and produce the same code as
transmute. Always prefer them for float/integer conversions.
The Register Access Pattern for Embedded/Driver Code
Putting it all together: a complete register block definition as used in real embedded Rust.
use std::ptr;

/// UART register block starting at a base address.
struct Uart {
    base: *mut u8,
}

impl Uart {
    const DATA_OFF: usize = 0x00;
    const STATUS_OFF: usize = 0x04;
    const CONTROL_OFF: usize = 0x08;
    const BAUD_OFF: usize = 0x0C;

    const STATUS_TX_READY: u32 = 1 << 0;
    const STATUS_RX_AVAIL: u32 = 1 << 1;
    const CTRL_ENABLE: u32 = 1 << 0;

    /// # Safety
    /// `base` must point to a mapped UART register block.
    unsafe fn new(base: *mut u8) -> Self {
        Uart { base }
    }

    fn read_reg(&self, offset: usize) -> u32 {
        unsafe { ptr::read_volatile(self.base.add(offset) as *const u32) }
    }

    fn write_reg(&self, offset: usize, val: u32) {
        unsafe {
            ptr::write_volatile(self.base.add(offset) as *mut u32, val);
        }
    }

    fn init(&self, baud_divisor: u32) {
        self.write_reg(Self::BAUD_OFF, baud_divisor);
        self.write_reg(Self::CONTROL_OFF, Self::CTRL_ENABLE);
    }

    fn send_byte(&self, byte: u8) {
        while (self.read_reg(Self::STATUS_OFF) & Self::STATUS_TX_READY) == 0 {
            // spin
        }
        self.write_reg(Self::DATA_OFF, byte as u32);
    }

    fn try_recv(&self) -> Option<u8> {
        if (self.read_reg(Self::STATUS_OFF) & Self::STATUS_RX_AVAIL) != 0 {
            Some(self.read_reg(Self::DATA_OFF) as u8)
        } else {
            None
        }
    }
}

fn main() {
    // Simulate a register block in memory
    let mut regs = [0u32; 4];
    regs[1] = 0x01; // STATUS: TX_READY set
    let uart = unsafe { Uart::new(regs.as_mut_ptr() as *mut u8) };
    uart.init(26);
    uart.send_byte(b'R');
    println!("DATA    = 0x{:08X}", regs[0]); // 'R' = 0x52
    println!("CONTROL = 0x{:08X}", regs[2]); // ENABLE = 0x01
    println!("BAUD    = {}", regs[3]);       // 26
}
This pattern -- base pointer plus offsets, volatile reads/writes, bit masks for fields -- is the foundation of every hardware driver, whether you write it in C or Rust.
Try It: Add an interrupt-enable bit to the CONTROL register (bit 1). Write a method enable_interrupts(&self) that sets bit 1 without clearing bit 0. This is the read-modify-write pattern that every driver uses.
Quick Knowledge Check
- What happens if you remove volatile from a hardware status register poll loop and compile at -O2?
- In C, why is memcpy preferred over pointer casts for type punning?
- Why does Rust make read_volatile/write_volatile unsafe, when C just uses a type qualifier?
Common Pitfalls
- Using volatile for thread synchronization. It does not provide atomicity or memory ordering between threads. Use atomics.
- Forgetting volatile on MMIO. The compiler will optimize your register writes away. One missing volatile can make a device non-functional.
- Read-modify-write races. reg |= BIT is read, modify, write. If an interrupt fires between the read and write, the change is lost. Use spin locks in kernel code.
- Strict aliasing violations. Pointer casts between unrelated types are UB at -O2. Use memcpy.
- transmute misuse in Rust. If the bit pattern is invalid for the target type (e.g., transmuting 2u8 to bool), it is instant UB. Prefer to_bits() or TryFrom.
- Assuming volatile ordering across variables. volatile orders accesses to the same variable only. Use compiler/kernel barrier macros for cross-variable ordering.
Data Structures in C and Rust
Every systems program is a data structure program. This chapter builds the classics by hand in C -- linked lists, hash tables, trees, stacks, queues -- then shows how Rust's standard library replaces most of that labor.
The Textbook Linked List
The singly linked list is the "hello world" of dynamic data structures. A node
holds data plus a pointer to the next node. The list ends when next is NULL.
/* slist.c -- singly linked list in C */
#include <stdio.h>
#include <stdlib.h>
struct node {
int data;
struct node *next;
};
/* Prepend a new node to the front of the list. */
struct node *list_push(struct node *head, int value)
{
struct node *n = malloc(sizeof(*n));
if (!n) {
perror("malloc");
exit(1);
}
n->data = value;
n->next = head;
return n;
}
/* Print every element. */
void list_print(const struct node *head)
{
for (const struct node *cur = head; cur; cur = cur->next)
printf("%d -> ", cur->data);
printf("NULL\n");
}
/* Free every node. */
void list_free(struct node *head)
{
while (head) {
struct node *tmp = head;
head = head->next;
free(tmp);
}
}
int main(void)
{
struct node *list = NULL;
for (int i = 1; i <= 5; i++)
list = list_push(list, i);
list_print(list); /* 5 -> 4 -> 3 -> 2 -> 1 -> NULL */
list_free(list);
return 0;
}
Memory layout after pushing 3, 2, 1:
head
|
v
+---+---+ +---+---+ +---+---+
| 1 | *-+--->| 2 | *-+--->| 3 | / |
+---+---+ +---+---+ +---+---+
data next data next data next (NULL)
Caution: Every malloc must pair with exactly one free. Forget one -- memory leak. Free twice -- undefined behavior and likely a crash.
Try It: Add a list_find function that returns a pointer to the first node whose data equals a given value, or NULL if not found. Then add a list_remove function that unlinks and frees that node.
The Kernel Way: Intrusive Lists
The textbook list above embeds data inside the list node. The Linux kernel flips this: it embeds a list node inside the data struct. This is called an intrusive list.
Textbook: Kernel (intrusive):
struct node { struct task_info {
int data; <--+ int pid;
struct node *next; | char name[16];
}; | struct list_head tasks; <-- just prev/next
| };
|
data lives data owns the link
inside node node lives inside data
The kernel's struct list_head is simply:
struct list_head {
struct list_head *next;
struct list_head *prev;
};
It is a doubly linked, circular list. The magic happens when you need to get
back from the embedded list_head to the enclosing struct.
container_of and offsetof
offsetof(type, member) returns the byte offset of member within type.
container_of subtracts that offset from a pointer to the member to recover
the parent struct.
/* container_of.c -- demonstrate container_of */
#include <stdio.h>
#include <stddef.h>
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
struct list_head {
struct list_head *next;
struct list_head *prev;
};
struct task_info {
int pid;
char name[16];
struct list_head link;
};
int main(void)
{
struct task_info t = { .pid = 42, .name = "init" };
/* Given only a pointer to the embedded link ... */
struct list_head *lh = &t.link;
/* ... recover the enclosing task_info. */
struct task_info *owner = container_of(lh, struct task_info, link);
printf("pid = %d, name = %s\n", owner->pid, owner->name);
printf("offsetof(task_info, link) = %zu\n",
offsetof(struct task_info, link));
return 0;
}
struct task_info layout (typical 64-bit; link needs 8-byte alignment):
 byte 0  byte 4      byte 20  byte 24     byte 32
+------+-----------+--------+-----------+-----------+
| pid  | name[16]  |  pad   | link.next | link.prev |
+------+-----------+--------+-----------+-----------+
^                           ^
|                           |
owner                       lh points here
owner = lh - offsetof(..., link)
Driver Prep: The kernel uses container_of thousands of times. When you write a driver, your struct my_device embeds a struct list_head for the subsystem's device list, and you use container_of to recover it.
Hash Table with Chaining
A hash table maps keys to buckets. Collisions are handled by chaining -- each bucket is a linked list.
/* hashtable.c -- simple chained hash table */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NBUCKETS 16
struct entry {
char *key;
int value;
struct entry *next;
};
struct hashtable {
struct entry *buckets[NBUCKETS];
};
static unsigned hash(const char *s)
{
unsigned h = 5381;
while (*s)
h = h * 33 + (unsigned char)*s++;
return h % NBUCKETS;
}
void ht_insert(struct hashtable *ht, const char *key, int value)
{
unsigned idx = hash(key);
struct entry *e = malloc(sizeof(*e));
if (!e) { perror("malloc"); exit(1); }
e->key = strdup(key);
e->value = value;
e->next = ht->buckets[idx];
ht->buckets[idx] = e;
}
int ht_lookup(struct hashtable *ht, const char *key, int *out)
{
unsigned idx = hash(key);
for (struct entry *e = ht->buckets[idx]; e; e = e->next) {
if (strcmp(e->key, key) == 0) {
*out = e->value;
return 1; /* found */
}
}
return 0; /* not found */
}
void ht_free(struct hashtable *ht)
{
for (int i = 0; i < NBUCKETS; i++) {
struct entry *e = ht->buckets[i];
while (e) {
struct entry *tmp = e;
e = e->next;
free(tmp->key);
free(tmp);
}
}
}
int main(void)
{
struct hashtable ht = {0};
ht_insert(&ht, "alice", 100);
ht_insert(&ht, "bob", 200);
ht_insert(&ht, "carol", 300);
int val;
if (ht_lookup(&ht, "bob", &val))
printf("bob => %d\n", val);
if (!ht_lookup(&ht, "dave", &val))
printf("dave not found\n");
ht_free(&ht);
return 0;
}
buckets[0] -> NULL
buckets[1] -> NULL
buckets[2] -> [alice|100] -> NULL
...
buckets[7] -> [carol|300] -> [bob|200] -> NULL
...
buckets[15] -> NULL
Try It: Add an ht_delete function that removes a key-value pair and frees its memory. Watch out for the head-of-list special case.
Binary Search Tree
/* bst.c -- binary search tree */
#include <stdio.h>
#include <stdlib.h>
struct bst_node {
int data;
struct bst_node *left;
struct bst_node *right;
};
struct bst_node *bst_insert(struct bst_node *root, int value)
{
if (!root) {
struct bst_node *n = malloc(sizeof(*n));
if (!n) { perror("malloc"); exit(1); }
n->data = value;
n->left = NULL;
n->right = NULL;
return n;
}
if (value < root->data)
root->left = bst_insert(root->left, value);
else if (value > root->data)
root->right = bst_insert(root->right, value);
return root;
}
void bst_inorder(const struct bst_node *root)
{
if (!root) return;
bst_inorder(root->left);
printf("%d ", root->data);
bst_inorder(root->right);
}
void bst_free(struct bst_node *root)
{
if (!root) return;
bst_free(root->left);
bst_free(root->right);
free(root);
}
int main(void)
{
struct bst_node *tree = NULL;
int vals[] = {5, 3, 7, 1, 4, 6, 8};
for (int i = 0; i < 7; i++)
tree = bst_insert(tree, vals[i]);
printf("In-order: ");
bst_inorder(tree); /* 1 3 4 5 6 7 8 */
printf("\n");
bst_free(tree);
return 0;
}
5
/ \
3 7
/ \ / \
1 4 6 8
Stack and Queue from Scratch
A stack is last-in-first-out. A queue is first-in-first-out. Both can be built on a linked list or on a contiguous array.
/* stack_queue.c -- array-based stack and queue */
#include <stdio.h>
#include <stdlib.h>
/* ---- Stack (LIFO) ---- */
struct stack {
int *data;
int top;
int cap;
};
struct stack stack_new(int cap)
{
struct stack s;
s.data = malloc(cap * sizeof(int));
if (!s.data) { perror("malloc"); exit(1); }
s.top = 0;
s.cap = cap;
return s;
}
void stack_push(struct stack *s, int val)
{
if (s->top >= s->cap) {
fprintf(stderr, "stack overflow\n");
exit(1);
}
s->data[s->top++] = val;
}
int stack_pop(struct stack *s)
{
if (s->top == 0) {
fprintf(stderr, "stack underflow\n");
exit(1);
}
return s->data[--s->top];
}
void stack_free(struct stack *s) { free(s->data); }
/* ---- Queue (FIFO, circular buffer) ---- */
struct queue {
int *data;
int head;
int tail;
int count;
int cap;
};
struct queue queue_new(int cap)
{
struct queue q;
q.data = malloc(cap * sizeof(int));
if (!q.data) { perror("malloc"); exit(1); }
q.head = 0;
q.tail = 0;
q.count = 0;
q.cap = cap;
return q;
}
void queue_enqueue(struct queue *q, int val)
{
if (q->count >= q->cap) {
fprintf(stderr, "queue full\n");
exit(1);
}
q->data[q->tail] = val;
q->tail = (q->tail + 1) % q->cap;
q->count++;
}
int queue_dequeue(struct queue *q)
{
if (q->count == 0) {
fprintf(stderr, "queue empty\n");
exit(1);
}
int val = q->data[q->head];
q->head = (q->head + 1) % q->cap;
q->count--;
return val;
}
void queue_free(struct queue *q) { free(q->data); }
int main(void)
{
/* Stack demo */
struct stack s = stack_new(8);
stack_push(&s, 10);
stack_push(&s, 20);
stack_push(&s, 30);
printf("stack pop: %d\n", stack_pop(&s)); /* 30 */
printf("stack pop: %d\n", stack_pop(&s)); /* 20 */
stack_free(&s);
/* Queue demo */
struct queue q = queue_new(8);
queue_enqueue(&q, 10);
queue_enqueue(&q, 20);
queue_enqueue(&q, 30);
printf("queue deq: %d\n", queue_dequeue(&q)); /* 10 */
printf("queue deq: %d\n", queue_dequeue(&q)); /* 20 */
queue_free(&q);
return 0;
}
Stack (after push 10, 20, 30):
top=3
|
v
[10][20][30][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7
Queue (circular buffer, after enqueue 10, 20, 30):
head=0 tail=3
| |
v v
[10][20][30][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7
Rust: The Standard Library Does the Heavy Lifting
Rust ships Vec, VecDeque, HashMap, BTreeMap, LinkedList, and more in
std::collections. You rarely build these from scratch.
// rust_collections.rs
use std::collections::{VecDeque, HashMap, BTreeMap};

fn main() {
    // Vec -- growable array, also works as a stack
    let mut stack: Vec<i32> = Vec::new();
    stack.push(10);
    stack.push(20);
    stack.push(30);
    println!("stack pop: {:?}", stack.pop()); // Some(30)
    println!("stack pop: {:?}", stack.pop()); // Some(20)

    // VecDeque -- double-ended queue (ring buffer internally)
    let mut queue: VecDeque<i32> = VecDeque::new();
    queue.push_back(10);
    queue.push_back(20);
    queue.push_back(30);
    println!("queue deq: {:?}", queue.pop_front()); // Some(10)
    println!("queue deq: {:?}", queue.pop_front()); // Some(20)

    // HashMap
    let mut map = HashMap::new();
    map.insert("alice", 100);
    map.insert("bob", 200);
    map.insert("carol", 300);
    if let Some(val) = map.get("bob") {
        println!("bob => {}", val);
    }

    // BTreeMap -- sorted by keys
    let mut btree = BTreeMap::new();
    btree.insert(5, "five");
    btree.insert(3, "three");
    btree.insert(7, "seven");
    for (k, v) in &btree {
        println!("{}: {}", k, v); // printed in sorted order
    }
}
Rust Note: Rust's Vec is not a linked list -- it is a contiguous, growable array. This is almost always what you want. LinkedList exists in std::collections but is rarely the right choice because of poor cache locality.
Side-by-Side: C Linked List vs Rust Vec
| Operation | C (manual linked list) | Rust (Vec<T>) |
|---|---|---|
| Create | node *head = NULL; | let mut v = Vec::new(); |
| Prepend | allocate node, fix pointers | v.insert(0, val); |
| Append | walk to end, allocate, link | v.push(val); |
| Index access | walk the list -- O(n) | v[i] -- O(1) |
| Remove by idx | walk, relink, free -- O(n) | v.remove(i); -- O(n) |
| Memory layout | scattered heap allocations | one contiguous buffer |
| Cache behavior | terrible (pointer chasing) | excellent (sequential) |
| Safety | dangling pointers, double free | borrow checker enforced |
For almost all user-space programming, Vec beats a linked list. The linked
list wins only when you need O(1) insertion/removal in the middle and you
already hold a pointer to the node.
When You'd Still Write Your Own
- Embedded/no-alloc environments: You cannot use Vec without an allocator. You write intrusive lists on a pre-allocated pool.
- Kernel modules: The kernel has its own allocator and its own list macros. You use struct list_head, not std::collections.
- Lock-free data structures: Standard collections are not lock-free. You hand-roll with atomics.
- Performance-critical hot paths: When the profiler says the standard container is the bottleneck (rare, but it happens).
Driver Prep: In kernel space, you will use struct list_head, struct hlist_head (hash list), and struct rb_root (red-black tree). All are intrusive. Learn the pattern now; the kernel macros are the same idea.
Knowledge Check
- What does container_of(ptr, type, member) compute, and why does the kernel need it?
- Why is a Vec (contiguous array) usually faster than a linked list for iteration, even though both are O(n)?
- In the circular-buffer queue implementation, what happens if you forget the modulo operation on head and tail?
Common Pitfalls
- Forgetting to free every node in a linked list -- walk the list and free each node; do not just free the head.
- Off-by-one in circular buffers -- the modulo wrap must use the capacity, not the count.
- Using a linked list when a Vec would do -- cache misses from pointer chasing dominate on modern hardware.
- Dangling pointers after removal -- in C, after you free a node, any pointer still referencing it is undefined behavior.
- Hash function quality -- a bad hash clusters entries into a few buckets, turning O(1) average into O(n).
Generic Programming: void* to Generics
C has no generics in the language. Instead it has three escape hatches:
void*, macros, and _Generic. This chapter shows all three, then shows how
Rust does it properly with monomorphized generics and trait bounds.
The void* Pattern
void* is C's universal pointer -- it can point to any type. The cost is
total loss of type information. The compiler cannot check that you cast
correctly.
/* void_swap.c -- generic swap with void* */
#include <stdio.h>
#include <string.h>
void generic_swap(void *a, void *b, size_t size)
{
unsigned char tmp[size]; /* VLA -- C99 */
memcpy(tmp, a, size);
memcpy(a, b, size);
memcpy(b, tmp, size);
}
int main(void)
{
int x = 10, y = 20;
generic_swap(&x, &y, sizeof(int));
printf("x=%d y=%d\n", x, y); /* x=20 y=10 */
double p = 3.14, q = 2.72;
generic_swap(&p, &q, sizeof(double));
printf("p=%.2f q=%.2f\n", p, q); /* p=2.72 q=3.14 */
return 0;
}
This works, but nothing stops you from passing the wrong size or casting the
result to the wrong type. The bug compiles cleanly and corrupts memory at
runtime.
Caution: void* erases the type. The compiler will not warn if you pass sizeof(int) for a double*. The resulting memory corruption is silent and undefined.
qsort: The Classic void* API
The C standard library's qsort is the canonical example. Its signature:
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
Every parameter is a void*, a size_t, or a void*-based callback. The comparison
function receives const void* arguments and must cast them to the real type.
/* qsort_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int cmp_int(const void *a, const void *b)
{
int ia = *(const int *)a;
int ib = *(const int *)b;
return (ia > ib) - (ia < ib);
}
int cmp_str(const void *a, const void *b)
{
const char *sa = *(const char **)a;
const char *sb = *(const char **)b;
return strcmp(sa, sb);
}
int main(void)
{
int nums[] = {5, 3, 8, 1, 4};
qsort(nums, 5, sizeof(int), cmp_int);
for (int i = 0; i < 5; i++)
printf("%d ", nums[i]);
printf("\n"); /* 1 3 4 5 8 */
const char *words[] = {"cherry", "apple", "banana"};
qsort(words, 3, sizeof(char *), cmp_str);
for (int i = 0; i < 3; i++)
printf("%s ", words[i]);
printf("\n"); /* apple banana cherry */
return 0;
}
Try It: Write a cmp_int_desc comparator that sorts integers in descending order. Pass it to qsort and verify the output.
typedef for Clarity
Raw function-pointer types are hard to read. typedef helps.
/* typedef_demo.c */
#include <stdio.h>
/* Without typedef: hard to parse */
int (*get_operation_raw(char op))(int, int);
/* With typedef: much clearer */
typedef int (*binop_fn)(int, int);
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
binop_fn get_operation(char op)
{
switch (op) {
case '+': return add;
case '*': return mul;
default: return NULL;
}
}
int main(void)
{
binop_fn f = get_operation('+');
if (f)
printf("3 + 4 = %d\n", f(3, 4)); /* 7 */
return 0;
}
Macro-Based Generics
When void* is too dangerous, C programmers reach for macros. The preprocessor
does text substitution before the compiler sees the code, so a macro can
"generate" type-specific functions.
/* macro_generic.c -- type-safe min/max via macros */
#include <stdio.h>
#define DEFINE_MIN(TYPE, NAME) \
static inline TYPE NAME(TYPE a, TYPE b) { return a < b ? a : b; }
DEFINE_MIN(int, min_int)
DEFINE_MIN(double, min_double)
int main(void)
{
printf("min_int(3,7) = %d\n", min_int(3, 7));
printf("min_double(3.1,2.7) = %f\n", min_double(3.1, 2.7));
return 0;
}
The kernel takes this further. list_for_each_entry iterates over an
intrusive list and hands you typed pointers -- no casting needed.
/* Simplified kernel-style iteration macro */
#include <stdio.h>
#include <stddef.h>
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
struct list_head {
struct list_head *next;
struct list_head *prev;
};
/* Initialize a list head to point to itself (empty circular list) */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)
static inline void list_add(struct list_head *new_node,
struct list_head *head)
{
new_node->next = head->next;
new_node->prev = head;
head->next->prev = new_node;
head->next = new_node;
}
#define list_for_each_entry(pos, head, member) \
for (pos = container_of((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = container_of(pos->member.next, typeof(*pos), member))
struct task {
int pid;
struct list_head link;
};
int main(void)
{
LIST_HEAD(tasks);
struct task t1 = { .pid = 1 };
struct task t2 = { .pid = 2 };
struct task t3 = { .pid = 3 };
list_add(&t1.link, &tasks);
list_add(&t2.link, &tasks);
list_add(&t3.link, &tasks);
struct task *pos;
list_for_each_entry(pos, &tasks, link)
printf("pid = %d\n", pos->pid);
return 0;
}
Circular doubly linked list after adding t1, t2, t3:
tasks (sentinel)
|
+--next--> t3.link --next--> t2.link --next--> t1.link --+
+--prev--- t1.link <--prev-- t2.link <--prev-- t3.link <-+
Driver Prep: list_for_each_entry and container_of are the two macros you will use most often in kernel code. Master them now.
_Generic in C11
C11 introduced _Generic, a compile-time type dispatch. It selects an
expression based on the type of its controlling argument.
/* generic_c11.c -- _Generic dispatch (requires C11) */
#include <stdio.h>
#include <stdlib.h> /* abs, labs */
#include <math.h>   /* fabsf, fabs */
#define abs_val(x) _Generic((x), \
int: abs, \
long: labs, \
float: fabsf, \
double: fabs \
)(x)
#define print_val(x) _Generic((x), \
int: print_int, \
double: print_double, \
char *: print_str \
)(x)
void print_int(int x) { printf("int: %d\n", x); }
void print_double(double x) { printf("double: %.2f\n", x); }
void print_str(char *s) { printf("string: %s\n", s); }
int main(void)
{
printf("abs(-3) = %d\n", abs_val(-3));
printf("abs(-3.5) = %.1f\n", abs_val(-3.5));
print_val(42);
print_val(3.14);
print_val("hello");
return 0;
}
_Generic is limited: it dispatches on a fixed set of types you enumerate.
You cannot write open-ended generic code with it. It is best used for
type-overloaded convenience macros.
The typedef + Function Pointer + Macro Trinity
In practice, "generic" C code combines all three techniques:
- typedef names the function-pointer type so humans can read it.
- Function pointers supply type-specific operations at runtime.
- Macros stamp out boilerplate and provide type-safe wrappers.
/* trinity.c -- combining typedef, fn pointers, and macros */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* 1. typedef for clarity */
typedef int (*cmp_fn)(const void *, const void *);
/* 2. "Generic" sorted array with function pointer for comparison */
struct sorted_array {
void *data;
size_t elem_size;
size_t len;
size_t cap;
cmp_fn cmp;
};
struct sorted_array sa_new(size_t elem_size, size_t cap, cmp_fn cmp)
{
struct sorted_array sa;
sa.data = malloc(elem_size * cap);
sa.elem_size = elem_size;
sa.len = 0;
sa.cap = cap;
sa.cmp = cmp;
return sa;
}
void sa_insert(struct sorted_array *sa, const void *elem)
{
if (sa->len >= sa->cap) {
fprintf(stderr, "sorted_array full\n");
return;
}
/* Find insertion point (linear scan for simplicity) */
size_t i;
char *base = (char *)sa->data;
for (i = 0; i < sa->len; i++) {
if (sa->cmp(elem, base + i * sa->elem_size) < 0)
break;
}
/* Shift elements right */
memmove(base + (i + 1) * sa->elem_size,
base + i * sa->elem_size,
(sa->len - i) * sa->elem_size);
memcpy(base + i * sa->elem_size, elem, sa->elem_size);
sa->len++;
}
/* 3. Macro for type-safe access */
#define SA_GET(sa, type, index) (((type *)(sa)->data)[index])
void sa_free(struct sorted_array *sa) { free(sa->data); }
int cmp_int(const void *a, const void *b)
{
int ia = *(const int *)a;
int ib = *(const int *)b;
return (ia > ib) - (ia < ib);
}
int main(void)
{
struct sorted_array sa = sa_new(sizeof(int), 16, cmp_int);
int vals[] = {5, 3, 8, 1, 7};
for (int i = 0; i < 5; i++)
sa_insert(&sa, &vals[i]);
printf("Sorted: ");
for (size_t i = 0; i < sa.len; i++)
printf("%d ", SA_GET(&sa, int, i));
printf("\n"); /* 1 3 5 7 8 */
sa_free(&sa);
return 0;
}
Caution: The SA_GET macro trusts you to pass the right type. Get it wrong and you read garbage. C has no defense against this.
Rust: Real Generics
Rust generics are monomorphized -- the compiler generates a separate copy of the function for each concrete type used. You get type safety and zero runtime cost.
// generics.rs -- Rust generics with trait bounds
fn min_val<T: PartialOrd>(a: T, b: T) -> T {
    if a < b { a } else { b }
}

fn print_sorted<T: Ord + std::fmt::Debug>(mut items: Vec<T>) {
    items.sort();
    println!("{:?}", items);
}

fn main() {
    println!("min(3, 7) = {}", min_val(3, 7));
    println!("min(3.1, 2.7) = {}", min_val(3.1, 2.7));
    print_sorted(vec![5, 3, 8, 1, 7]);               // [1, 3, 5, 7, 8]
    print_sorted(vec!["cherry", "apple", "banana"]);
}
Rust Note: Monomorphization means min_val::<i32> and min_val::<f64> are two separate compiled functions. There is no void*, no casting, no runtime dispatch. The compiler catches type errors at compile time.
Trait Bounds and Where Clauses
Trait bounds constrain what a generic type must support. A where clause is
just a cleaner way to write the same constraints.
// trait_bounds.rs
use std::fmt::Display;
use std::ops::Add;

// Inline bounds
fn sum_and_print<T: Add<Output = T> + Display + Copy>(a: T, b: T) {
    let result = a + b;
    println!("{} + {} = {}", a, b, result);
}

// Equivalent with where clause (clearer for many bounds)
fn describe<T>(item: &T)
where
    T: Display + PartialOrd + Copy,
{
    println!("Value: {}", item);
}

// Generic over any ordered element type
fn largest<T>(list: &[T]) -> &T
where
    T: PartialOrd,
{
    let mut max = &list[0];
    for item in &list[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    sum_and_print(3, 4);       // 3 + 4 = 7
    sum_and_print(1.5, 2.5);   // 1.5 + 2.5 = 4
    describe(&42);
    describe(&"hello");
    let nums = vec![5, 3, 8, 1, 7];
    println!("largest = {}", largest(&nums)); // 8
}
Generic Structs
In C, you fake generic containers with void* and macros. In Rust, you
declare a generic struct directly.
// generic_struct.rs
struct SortedVec<T: Ord> {
    data: Vec<T>,
}

impl<T: Ord> SortedVec<T> {
    fn new() -> Self {
        SortedVec { data: Vec::new() }
    }

    fn insert(&mut self, value: T) {
        let pos = self.data.binary_search(&value).unwrap_or_else(|e| e);
        self.data.insert(pos, value);
    }

    fn contains(&self, value: &T) -> bool {
        self.data.binary_search(value).is_ok()
    }

    fn iter(&self) -> impl Iterator<Item = &T> {
        self.data.iter()
    }
}

fn main() {
    let mut sv = SortedVec::new();
    sv.insert(5);
    sv.insert(3);
    sv.insert(8);
    sv.insert(1);
    sv.insert(7);
    print!("Sorted: ");
    for v in sv.iter() {
        print!("{} ", v);
    }
    println!();                                  // 1 3 5 7 8
    println!("contains 3? {}", sv.contains(&3)); // true
    println!("contains 6? {}", sv.contains(&6)); // false
}
Why the Kernel Chose Macros Over void*
The kernel avoids void* for linked lists. Here is why:
void* approach:                        Macro approach:

struct list_node {                     struct list_head {
    void *data;         /* cast! */        struct list_head *next;
    struct list_node *next;                struct list_head *prev;
};                                     };
                                       /* embed list_head in your struct;
                                          container_of recovers the parent */
Requires:                              Requires:
- runtime cast on every access         - typeof / offsetof at compile time
- extra indirection (pointer           - zero extra indirection
  to data, not data itself)            - type-safe iteration macros
- no type checking
The macro approach gives:
- No extra allocations: the list node lives inside the data struct.
- No casts at use sites: list_for_each_entry hands you a typed pointer.
- No double indirection: one fewer pointer dereference per access.
Try It: Rewrite the generic_swap function from the beginning of this chapter using a macro instead of void*. The macro version should work for any type without requiring a size parameter. (Hint: use typeof.)
Knowledge Check
- What goes wrong if you pass sizeof(int) to generic_swap but the pointers actually point to double values?
- In _Generic, what happens if the controlling expression's type does not match any of the listed types and there is no default case?
- What does "monomorphization" mean in Rust, and how does it differ from C++'s template instantiation?
Common Pitfalls
- Casting void* to the wrong type -- the compiler says nothing, the program corrupts memory silently.
- Macro hygiene -- macro arguments evaluated multiple times cause bugs: MIN(x++, y) increments x twice.
- _Generic does not support user-defined types easily -- you must enumerate every type explicitly.
- Forgetting trait bounds in Rust -- the compiler will tell you exactly which trait is missing, but the error messages can be long.
- Over-constraining generics -- requiring more traits than necessary reduces reusability.
Function Pointers and Callbacks
A function pointer stores the address of a function. Combined with a context pointer, it becomes a callback -- the mechanism C uses for polymorphism, event handling, and plugin architectures. Rust replaces the pattern with closures and traits.
C Function Pointer Syntax
The syntax is notoriously hard to read. The parentheses around *fn are
mandatory -- without them you declare a function returning a pointer, not a
pointer to a function.
/* fnptr_basic.c */
#include <stdio.h>
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
int main(void)
{
/* Declare a function pointer */
int (*op)(int, int);
op = add;
printf("add(3,4) = %d\n", op(3, 4)); /* 7 */
op = mul;
printf("mul(3,4) = %d\n", op(3, 4)); /* 12 */
/* You can also call through the pointer explicitly */
printf("(*op)(5,6) = %d\n", (*op)(5, 6)); /* 30 */
return 0;
}
Memory layout:
op (8 bytes on x86-64)
+------------------+
| address of add() | or | address of mul() |
+------------------+
|
v
.text section: machine code for the function
typedef for Readability
Always typedef function pointer types. Compare:
/* Without typedef */
int (*get_op(char c))(int, int);
/* With typedef */
typedef int (*binop_fn)(int, int);
binop_fn get_op(char c);
The second is instantly readable. The first requires right-left parsing.
/* fnptr_typedef.c */
#include <stdio.h>
typedef int (*binop_fn)(int, int);
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }
binop_fn get_op(char c)
{
switch (c) {
case '+': return add;
case '-': return sub;
case '*': return mul;
default: return NULL;
}
}
int main(void)
{
char ops[] = "+-*";
for (int i = 0; i < 3; i++) {
binop_fn f = get_op(ops[i]);
if (f)
printf("10 %c 3 = %d\n", ops[i], f(10, 3));
}
return 0;
}
Try It: Add a division operator to
get_op. Handle division by zero inside thedivfunction by returning 0 and printing a warning.
Passing Functions as Arguments
The C standard library uses function pointers extensively.
qsort
/* qsort_fp.c */
#include <stdio.h>
#include <stdlib.h>
int ascending(const void *a, const void *b)
{
int ia = *(const int *)a, ib = *(const int *)b;
return (ia > ib) - (ia < ib); /* (ia - ib) could overflow; this cannot */
}
int descending(const void *a, const void *b)
{
int ia = *(const int *)a, ib = *(const int *)b;
return (ib > ia) - (ib < ia);
}
void print_array(const int *arr, int n)
{
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
}
int main(void)
{
int nums[] = {5, 1, 4, 2, 3};
qsort(nums, 5, sizeof(int), ascending);
printf("ascending: "); print_array(nums, 5);
qsort(nums, 5, sizeof(int), descending);
printf("descending: "); print_array(nums, 5);
return 0;
}
signal
/* signal_fp.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
void handle_sigint(int sig)
{
/* Async-signal-safe: only write() is safe here */
const char msg[] = "\nCaught SIGINT\n";
write(STDOUT_FILENO, msg, sizeof(msg) - 1);
_exit(0);
}
int main(void)
{
signal(SIGINT, handle_sigint);
printf("Press Ctrl-C...\n");
while (1)
pause(); /* sleep until a signal arrives */
return 0;
}
pthread_create
/* pthread_fp.c */
#include <stdio.h>
#include <pthread.h>
void *worker(void *arg)
{
int id = *(int *)arg;
printf("Thread %d running\n", id);
return NULL;
}
int main(void)
{
pthread_t threads[3];
int ids[3] = {1, 2, 3};
for (int i = 0; i < 3; i++)
pthread_create(&threads[i], NULL, worker, &ids[i]);
for (int i = 0; i < 3; i++)
pthread_join(threads[i], NULL);
return 0;
}
Compile with: gcc -pthread pthread_fp.c -o pthread_fp
Callback Pattern: Function Pointer + void* Context
A callback alone is rarely enough. You usually need to pass some context --
user-defined data the callback can access. In C, this is done with a void*.
/* callback_ctx.c -- callback with context */
#include <stdio.h>
typedef void (*event_handler)(int event_id, void *ctx);
struct logger_ctx {
const char *prefix;
int count;
};
void log_event(int event_id, void *ctx)
{
struct logger_ctx *log = (struct logger_ctx *)ctx;
log->count++;
printf("[%s] event %d (total: %d)\n", log->prefix, event_id, log->count);
}
struct event_source {
event_handler handler;
void *ctx;
};
void event_source_fire(struct event_source *src, int event_id)
{
if (src->handler)
src->handler(event_id, src->ctx);
}
int main(void)
{
struct logger_ctx my_log = { .prefix = "app", .count = 0 };
struct event_source src = {
.handler = log_event,
.ctx = &my_log,
};
event_source_fire(&src, 100);
event_source_fire(&src, 200);
event_source_fire(&src, 300);
return 0;
}
event_source
+-------------+-----------+
| handler: * | ctx: * |
+------+------+-----+-----+
| |
v v
log_event() logger_ctx { prefix, count }
Caution: The void* ctx pointer is completely unchecked. If you register the wrong context type for a given handler, the cast inside the handler will silently interpret garbage. This is a common source of bugs in C codebases.
Vtables: Struct of Function Pointers
When an object needs multiple operations, you group the function pointers into
a struct. This is C's version of a vtable -- the same pattern the kernel uses
for file_operations, inode_operations, and dozens of other interfaces.
/* vtable.c -- polymorphism via struct of function pointers */
#include <stdio.h>
#include <math.h>
/* The "interface" */
struct shape_ops {
double (*area)(const void *self);
double (*perimeter)(const void *self);
void (*describe)(const void *self);
};
/* A "class": circle */
struct circle {
const struct shape_ops *ops; /* vtable pointer */
double radius;
};
double circle_area(const void *self)
{
const struct circle *c = self;
return M_PI * c->radius * c->radius;
}
double circle_perimeter(const void *self)
{
const struct circle *c = self;
return 2.0 * M_PI * c->radius;
}
void circle_describe(const void *self)
{
const struct circle *c = self;
printf("Circle(r=%.1f)\n", c->radius);
}
static const struct shape_ops circle_ops = {
.area = circle_area,
.perimeter = circle_perimeter,
.describe = circle_describe,
};
/* A "class": rectangle */
struct rectangle {
const struct shape_ops *ops;
double width, height;
};
double rect_area(const void *self)
{
const struct rectangle *r = self;
return r->width * r->height;
}
double rect_perimeter(const void *self)
{
const struct rectangle *r = self;
return 2.0 * (r->width + r->height);
}
void rect_describe(const void *self)
{
const struct rectangle *r = self;
printf("Rectangle(%.1f x %.1f)\n", r->width, r->height);
}
static const struct shape_ops rect_ops = {
.area = rect_area,
.perimeter = rect_perimeter,
.describe = rect_describe,
};
/* Polymorphic function -- works with any "shape" */
void print_shape_info(const void *shape)
{
/* The first field of every "shape" is the ops pointer */
const struct shape_ops *ops = *(const struct shape_ops **)shape;
ops->describe(shape);
printf(" area = %.2f\n", ops->area(shape));
printf(" perimeter = %.2f\n", ops->perimeter(shape));
}
int main(void)
{
struct circle c = { .ops = &circle_ops, .radius = 5.0 };
struct rectangle r = { .ops = &rect_ops, .width = 4.0, .height = 6.0 };
print_shape_info(&c);
print_shape_info(&r);
return 0;
}
Compile with: gcc -lm vtable.c -o vtable
circle layout: rectangle layout:
+----------+----------+ +----------+-------+--------+
| ops: * | radius | | ops: * | width | height |
+----+-----+----------+ +----+-----+-------+--------+
| |
v v
circle_ops { rect_ops {
.area = circle_area .area = rect_area
.perimeter = circle_peri .perimeter = rect_perimeter
.describe = circle_desc .describe = rect_describe
} }
Driver Prep: The kernel's struct file_operations is exactly this pattern. When you write a character device driver, you fill in a file_operations struct with pointers to your read, write, open, release, and ioctl functions. The VFS calls them through the vtable.
Rust: fn Pointers
Rust has bare function pointers, spelled as fn(args) -> ret. They are used
less often than closures but exist for FFI and when no state capture is needed.
// fn_pointer.rs
fn add(a: i32, b: i32) -> i32 { a + b }
fn mul(a: i32, b: i32) -> i32 { a * b }

fn apply(f: fn(i32, i32) -> i32, x: i32, y: i32) -> i32 {
    f(x, y)
}

fn main() {
    println!("add(3,4) = {}", apply(add, 3, 4));
    println!("mul(3,4) = {}", apply(mul, 3, 4));
    // Store in a variable
    let op: fn(i32, i32) -> i32 = add;
    println!("op(5,6) = {}", op(5, 6));
}
Rust: Closures and the Fn Traits
Closures capture variables from their environment. They implement one or more of three traits:
| Trait | Captures by | Can call | Analogy |
|---|---|---|---|
| Fn | shared reference | many times | const void *ctx |
| FnMut | mutable reference | many times | void *ctx (mutating) |
| FnOnce | move (ownership) | exactly once | consuming the context |
// closures.rs
fn apply_fn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 { f(x) }
fn apply_fn_mut(f: &mut dyn FnMut(i32) -> i32, x: i32) -> i32 { f(x) }
fn apply_fn_once(f: impl FnOnce(i32) -> String, x: i32) -> String { f(x) }

fn main() {
    // Fn -- captures `offset` by shared reference
    let offset = 10;
    let add_offset = |x: i32| x + offset;
    println!("Fn: {}", apply_fn(&add_offset, 5)); // 15

    // FnMut -- captures `count` by mutable reference
    let mut count = 0;
    let mut counter = |x: i32| -> i32 { count += 1; x + count };
    println!("FnMut: {}", apply_fn_mut(&mut counter, 5)); // 6
    println!("FnMut: {}", apply_fn_mut(&mut counter, 5)); // 7

    // FnOnce -- moves `name` into the closure
    let name = String::from("event");
    let describe = move |x: i32| -> String { format!("{} #{}", name, x) };
    println!("FnOnce: {}", apply_fn_once(describe, 42));
    // `describe` is consumed -- cannot call it again
}
Rust Note: Every closure has a unique, anonymous type. You cannot name it. When you need to store closures in a struct, use Box<dyn Fn(...)> for trait objects or generics with impl Fn(...) bounds.
How Closures Capture
By default, closures borrow variables with the least permissions needed. Use
move to force ownership transfer -- required when the closure outlives the
current scope (e.g., sending to another thread).
Without move: With move:
stack frame: stack frame:
+---+ +---+
| x | = 5 | x | = 5 (copied into closure)
+---+ +---+
| y | = "hello" (on heap) | y | = MOVED
+---+ +---+
|
| closure captures &y closure owns y's data
v directly in its anonymous struct
Rust Traits as Vtables
Rust's dyn Trait is the direct equivalent of C's struct-of-function-pointers.
// trait_vtable.rs
use std::f64::consts::PI;

trait Shape {
    fn area(&self) -> f64;
    fn perimeter(&self) -> f64;
    fn describe(&self);
}

struct Circle {
    radius: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 { PI * self.radius * self.radius }
    fn perimeter(&self) -> f64 { 2.0 * PI * self.radius }
    fn describe(&self) { println!("Circle(r={:.1})", self.radius); }
}

struct Rectangle {
    width: f64,
    height: f64,
}

impl Shape for Rectangle {
    fn area(&self) -> f64 { self.width * self.height }
    fn perimeter(&self) -> f64 { 2.0 * (self.width + self.height) }
    fn describe(&self) { println!("Rectangle({:.1} x {:.1})", self.width, self.height); }
}

fn print_shape_info(shape: &dyn Shape) {
    shape.describe();
    println!("  area      = {:.2}", shape.area());
    println!("  perimeter = {:.2}", shape.perimeter());
}

fn main() {
    let c = Circle { radius: 5.0 };
    let r = Rectangle { width: 4.0, height: 6.0 };
    print_shape_info(&c);
    print_shape_info(&r);

    // Store heterogeneous shapes in a Vec
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { radius: 3.0 }),
        Box::new(Rectangle { width: 2.0, height: 8.0 }),
    ];
    for s in &shapes {
        print_shape_info(s.as_ref());
    }
}
Rust Note: &dyn Shape is a fat pointer: it stores a pointer to the data AND a pointer to the vtable. This is exactly the same layout as the C pattern where every struct starts with const struct shape_ops *ops. The difference: Rust enforces it at compile time.
&dyn Shape (fat pointer):
+----------+----------+
| data_ptr | vtbl_ptr |
+----+-----+----+-----+
| |
v v
Circle { Shape vtable for Circle {
radius area: -> Circle::area
} perimeter: -> Circle::perimeter
describe: -> Circle::describe
drop: -> Circle::drop
}
Try It: Add a Triangle struct that implements Shape. Add it to the shapes vector and verify polymorphic dispatch works.
Knowledge Check
- In C, why must you use parentheses in int (*fn)(int, int) -- what happens if you write int *fn(int, int) instead?
- Why does the C callback pattern need both a function pointer and a void* context, while a Rust closure needs only one value?
- How does dyn Trait in Rust achieve the same result as a C vtable?
Common Pitfalls
- Calling a NULL function pointer -- undefined behavior. Always check before calling.
- Mismatched callback signatures -- C will not always warn if the function pointer type does not match the actual function.
- Context lifetime -- if the void* context points to a local variable that goes out of scope, the callback reads dangling memory.
- Confusing fn and Fn in Rust -- fn is a bare function pointer; Fn is a trait that closures implement. They are not interchangeable.
- Forgetting move on closures passed to threads or stored in structs -- the closure captures references to locals that will be dropped.
State Machines with Function Pointers
A state machine has a finite set of states, a set of events, and transition rules. When an event arrives, the machine moves from its current state to a new one and optionally performs an action. This chapter builds state machines in C with function-pointer dispatch tables, then rebuilds them in Rust with enums and pattern matching.
Why State Machines Matter
Drivers, protocol parsers, network stacks, and user-interface logic are all state machines. A TCP connection goes through LISTEN, SYN_SENT, ESTABLISHED, FIN_WAIT, and more. A UART driver handles IDLE, RECEIVING, ERROR. If you write systems code, you write state machines.
Dispatch Tables: Array of Function Pointers
The simplest state machine is an array of function pointers indexed by state. Each function handles the current state and returns the next state.
/* dispatch_table.c -- state machine via function pointer array */
#include <stdio.h>
typedef enum {
STATE_IDLE,
STATE_RUNNING,
STATE_STOPPED,
STATE_COUNT /* number of states */
} state_t;
typedef enum {
EVENT_START,
EVENT_STOP,
EVENT_RESET,
EVENT_COUNT
} event_t;
typedef state_t (*handler_fn)(void);
state_t on_idle_start(void)
{
printf(" IDLE -> start -> RUNNING\n");
return STATE_RUNNING;
}
state_t on_idle_stop(void)
{
printf(" IDLE -> stop -> (stay IDLE)\n");
return STATE_IDLE;
}
state_t on_idle_reset(void)
{
printf(" IDLE -> reset -> (stay IDLE)\n");
return STATE_IDLE;
}
state_t on_running_start(void)
{
printf(" RUNNING -> start -> (stay RUNNING)\n");
return STATE_RUNNING;
}
state_t on_running_stop(void)
{
printf(" RUNNING -> stop -> STOPPED\n");
return STATE_STOPPED;
}
state_t on_running_reset(void)
{
printf(" RUNNING -> reset -> IDLE\n");
return STATE_IDLE;
}
state_t on_stopped_start(void)
{
printf(" STOPPED -> start -> RUNNING\n");
return STATE_RUNNING;
}
state_t on_stopped_stop(void)
{
printf(" STOPPED -> stop -> (stay STOPPED)\n");
return STATE_STOPPED;
}
state_t on_stopped_reset(void)
{
printf(" STOPPED -> reset -> IDLE\n");
return STATE_IDLE;
}
/* 2D dispatch table: [state][event] -> handler */
static handler_fn dispatch[STATE_COUNT][EVENT_COUNT] = {
[STATE_IDLE] = { on_idle_start, on_idle_stop, on_idle_reset },
[STATE_RUNNING] = { on_running_start, on_running_stop, on_running_reset },
[STATE_STOPPED] = { on_stopped_start, on_stopped_stop, on_stopped_reset },
};
static const char *state_names[] = { "IDLE", "RUNNING", "STOPPED" };
int main(void)
{
state_t current = STATE_IDLE;
event_t events[] = {
EVENT_START, EVENT_STOP, EVENT_RESET, EVENT_START, EVENT_STOP
};
for (int i = 0; i < 5; i++) {
printf("State: %s\n", state_names[current]);
current = dispatch[current][events[i]]();
}
printf("Final state: %s\n", state_names[current]);
return 0;
}
Dispatch table layout:
EVENT_START EVENT_STOP EVENT_RESET
+----------------+----------------+----------------+
IDLE | on_idle_start | on_idle_stop | on_idle_reset |
+----------------+----------------+----------------+
RUNNING | on_run_start | on_run_stop | on_run_reset |
+----------------+----------------+----------------+
STOPPED | on_stop_start | on_stop_stop | on_stop_reset |
+----------------+----------------+----------------+
Lookup: dispatch[current_state][event]() -> next_state
Try It: Add a STATE_ERROR and an EVENT_ERROR that transitions from any state to STATE_ERROR. Only EVENT_RESET can leave STATE_ERROR.
Protocol Parser State Machine
A real use case: parsing a simple key=value protocol. Messages look like
KEY=VALUE\n. The parser walks through states as it reads each character.
/* parser_sm.c -- protocol parser state machine */
#include <stdio.h>
#include <string.h>
typedef enum {
PS_KEY, /* reading the key */
PS_VALUE, /* reading the value */
PS_DONE, /* line complete */
PS_ERROR /* malformed input */
} parse_state_t;
struct parser {
parse_state_t state;
char key[64];
char value[256];
int key_len;
int val_len;
};
void parser_init(struct parser *p)
{
p->state = PS_KEY;
p->key_len = 0;
p->val_len = 0;
p->key[0] = '\0';
p->value[0] = '\0';
}
parse_state_t feed_key(struct parser *p, char c)
{
if (c == '=') {
p->key[p->key_len] = '\0';
return PS_VALUE;
}
if (c == '\n' || c == '\r')
return PS_ERROR; /* newline before '=' */
if (p->key_len < 63) {
p->key[p->key_len++] = c;
}
return PS_KEY;
}
parse_state_t feed_value(struct parser *p, char c)
{
if (c == '\n') {
p->value[p->val_len] = '\0';
return PS_DONE;
}
if (p->val_len < 255) {
p->value[p->val_len++] = c;
}
return PS_VALUE;
}
parse_state_t feed_done(struct parser *p, char c)
{
(void)c;
(void)p;
return PS_DONE; /* ignore further input */
}
parse_state_t feed_error(struct parser *p, char c)
{
(void)c;
(void)p;
return PS_ERROR;
}
typedef parse_state_t (*feed_fn)(struct parser *, char);
static feed_fn state_handlers[] = {
[PS_KEY] = feed_key,
[PS_VALUE] = feed_value,
[PS_DONE] = feed_done,
[PS_ERROR] = feed_error,
};
void parser_feed(struct parser *p, char c)
{
p->state = state_handlers[p->state](p, c);
}
int main(void)
{
const char *input = "host=192.168.1.1\n";
struct parser p;
parser_init(&p);
for (int i = 0; input[i] != '\0'; i++)
parser_feed(&p, input[i]);
if (p.state == PS_DONE)
printf("Parsed: key='%s', value='%s'\n", p.key, p.value);
else
printf("Parse error\n");
return 0;
}
State transitions for "host=192.168.1.1\n":
'h' -> PS_KEY
'o' -> PS_KEY
's' -> PS_KEY
't' -> PS_KEY
'=' -> PS_VALUE (key complete: "host")
'1' -> PS_VALUE
'9' -> PS_VALUE
'2' -> PS_VALUE
...
'1' -> PS_VALUE
'\n' -> PS_DONE (value complete: "192.168.1.1")
Driver Prep: Many driver protocols (I2C, SPI, USB) use state machines for packet parsing. The function-pointer-per-state pattern keeps the code modular: each state handler is a small, testable function.
Event-Driven Design
In event-driven systems, a main loop reads events and dispatches them to the current state handler. This decouples event sources from state logic.
/* event_driven.c -- event loop with state machine */
#include <stdio.h>
#include <string.h>
typedef enum { ST_OFF, ST_ON, ST_COUNT } state_t;
typedef enum { EV_PRESS, EV_TIMER, EV_COUNT } event_t;
typedef struct {
state_t state;
int press_count;
} context_t;
typedef state_t (*handler_fn)(context_t *ctx);
state_t off_press(context_t *ctx)
{
ctx->press_count++;
printf(" [OFF] Button pressed (#%d) -> ON\n", ctx->press_count);
return ST_ON;
}
state_t off_timer(context_t *ctx)
{
(void)ctx;
printf(" [OFF] Timer tick -> (stay OFF)\n");
return ST_OFF;
}
state_t on_press(context_t *ctx)
{
ctx->press_count++;
printf(" [ON] Button pressed (#%d) -> OFF\n", ctx->press_count);
return ST_OFF;
}
state_t on_timer(context_t *ctx)
{
(void)ctx;
printf(" [ON] Timer tick -> (stay ON)\n");
return ST_ON;
}
static handler_fn dispatch[ST_COUNT][EV_COUNT] = {
[ST_OFF] = { off_press, off_timer },
[ST_ON] = { on_press, on_timer },
};
void process_event(context_t *ctx, event_t ev)
{
ctx->state = dispatch[ctx->state][ev](ctx);
}
int main(void)
{
context_t ctx = { .state = ST_OFF, .press_count = 0 };
/* Simulate an event stream */
event_t events[] = { EV_TIMER, EV_PRESS, EV_TIMER, EV_PRESS, EV_PRESS };
for (int i = 0; i < 5; i++)
process_event(&ctx, events[i]);
printf("Final press count: %d\n", ctx.press_count);
return 0;
}
Try It: Add a ST_BLINK state entered by pressing the button while ON. A timer tick in BLINK goes back to ON. A press in BLINK goes to OFF.
Rust: enum + match
Rust's enums with data (algebraic data types) and exhaustive pattern matching are a natural fit for state machines. The compiler verifies you handle every state.
// state_machine.rs -- state machine with enum + match

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Running,
    Stopped,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Start,
    Stop,
    Reset,
}

fn transition(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Start) => {
            println!("  IDLE -> Start -> RUNNING");
            State::Running
        }
        (State::Running, Event::Stop) => {
            println!("  RUNNING -> Stop -> STOPPED");
            State::Stopped
        }
        (State::Running, Event::Reset) => {
            println!("  RUNNING -> Reset -> IDLE");
            State::Idle
        }
        (State::Stopped, Event::Start) => {
            println!("  STOPPED -> Start -> RUNNING");
            State::Running
        }
        (State::Stopped, Event::Reset) => {
            println!("  STOPPED -> Reset -> IDLE");
            State::Idle
        }
        (s, e) => {
            println!("  {:?} -> {:?} -> (no change)", s, e);
            s
        }
    }
}

fn main() {
    let mut state = State::Idle;
    let events = [
        Event::Start,
        Event::Stop,
        Event::Reset,
        Event::Start,
        Event::Stop,
    ];
    for &event in &events {
        println!("State: {:?}", state);
        state = transition(state, event);
    }
    println!("Final state: {:?}", state);
}
Rust Note: When a match has no wildcard arm, adding a new variant to the State enum and forgetting to handle it means the compiler refuses to build. (The example above uses a catch-all (s, e) arm, which trades some of that checking for brevity.) In C, adding a new state to the enum but forgetting to add a row to the dispatch table compiles fine and crashes at runtime with a NULL function pointer call.
Protocol Parser in Rust
The same key=value parser, but using Rust enums.
// parser_sm.rs -- protocol parser with enum states

#[derive(Debug)]
enum ParseState {
    Key,
    Value,
    Done,
    Error(String),
}

struct Parser {
    state: ParseState,
    key: String,
    value: String,
}

impl Parser {
    fn new() -> Self {
        Parser {
            state: ParseState::Key,
            key: String::new(),
            value: String::new(),
        }
    }

    fn feed(&mut self, c: char) {
        self.state = match &self.state {
            ParseState::Key => {
                if c == '=' {
                    ParseState::Value
                } else if c == '\n' || c == '\r' {
                    ParseState::Error("newline before '='".into())
                } else {
                    self.key.push(c);
                    ParseState::Key
                }
            }
            ParseState::Value => {
                if c == '\n' {
                    ParseState::Done
                } else {
                    self.value.push(c);
                    ParseState::Value
                }
            }
            ParseState::Done => ParseState::Done,
            ParseState::Error(msg) => ParseState::Error(msg.clone()),
        };
    }

    fn result(&self) -> Option<(&str, &str)> {
        match &self.state {
            ParseState::Done => Some((&self.key, &self.value)),
            _ => None,
        }
    }
}

fn main() {
    let input = "host=192.168.1.1\n";
    let mut parser = Parser::new();
    for c in input.chars() {
        parser.feed(c);
    }
    match parser.result() {
        Some((k, v)) => println!("Parsed: key='{}', value='{}'", k, v),
        None => println!("Parse error: {:?}", parser.state),
    }
}
Notice that the Rust Error variant carries a message. Rust enums can hold
data per variant -- C enums cannot.
Try It: Extend the Rust parser to handle multiple key=value pairs separated by newlines. After each Done state, reset to Key and collect results into a Vec<(String, String)>.
Why This Matters for Drivers
Device drivers are inherently state machines. A block device goes through initialization, ready, busy, and error states. A network driver manages link negotiation states. An interrupt handler transitions a UART between idle, receiving, and transmitting.
The C function-pointer dispatch table maps directly to how the kernel structures driver state machines. The Rust enum + match approach is how the kernel's Rust abstractions will express the same logic with compile-time exhaustiveness checking.
Typical driver state machine:
init_hw() ready_for_io()
PROBE ----------> INIT --------------> READY
| | ^
| error | | io_complete()
v v |
ERROR <--- error --- BUSY
|
| reset()
v
PROBE (retry)
Driver Prep: When you write a driver, draw the state diagram first. Enumerate every state and every event. Then implement it as a dispatch table (C) or enum + match (Rust). Missing transitions become obvious on paper before they become bugs in production.
Knowledge Check
- What advantage does a 2D dispatch table [state][event] have over a large switch statement with nested switches?
- The Rust parser matches on &self.state and clones the error message. How would std::mem::replace let it take ownership of the current state and move the message out instead?
- How does Rust's exhaustive match checking prevent a class of bugs that C dispatch tables are vulnerable to?
Common Pitfalls
- Uninitialized dispatch table entries -- in C, a missing entry is a NULL pointer. Calling it is undefined behavior. Always initialize every cell or add a bounds check.
- Forgetting to handle "no transition" -- some state/event combinations should be no-ops. Make this explicit, not accidental.
- State explosion -- too many states and events make the table unwieldy. Decompose into hierarchical state machines if the table exceeds about 5x5.
- Side effects in transition logic -- keep handlers pure when possible. Separate "compute next state" from "perform action" for testability.
- Rust: forgetting std::mem::replace -- if you match on &self.state you cannot move data out of variants. The take-and-replace idiom is standard for owned state machines.
Opaque Types and Encapsulation
Good APIs hide their guts. In C, the technique is called the "opaque pointer" or "handle" pattern — you forward-declare a struct in a header and only define it in the implementation file. In Rust, the compiler enforces privacy by default. This chapter shows both approaches and why the difference matters for large codebases.
The Problem: Leaking Implementation Details
When you put a full struct definition in a header, every file that includes it can reach into the struct's fields. Change a field name and you recompile the world. Worse, callers start depending on layout details you never promised.
+-------------------------------+
| widget.h |
| struct widget { |
| int x; <-- exposed |
| int y; <-- exposed |
| char *name; <-- exposed |
| }; |
+-------------------------------+
| |
file_a.c file_b.c
w->x = 5; free(w->name); <-- both reach inside
Every consumer is now coupled to the exact layout. This is fragile.
C: The Opaque Pointer Pattern
The fix in C is simple: forward-declare the struct in the header, define it only in
the .c file, and expose functions that operate on pointers to it.
The Header (widget.h)
/* widget.h -- public interface only */
#ifndef WIDGET_H
#define WIDGET_H
#include <stddef.h>
/* Forward declaration -- callers never see the fields */
typedef struct widget widget_t;
/* Constructor / destructor */
widget_t *widget_create(const char *name, int x, int y);
void widget_destroy(widget_t *w);
/* Accessors */
const char *widget_name(const widget_t *w);
int widget_x(const widget_t *w);
int widget_y(const widget_t *w);
/* Mutator */
void widget_move(widget_t *w, int dx, int dy);
#endif /* WIDGET_H */
The Implementation (widget.c)
/* widget.c -- only this file knows the struct layout */
#include "widget.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct widget {
char *name;
int x;
int y;
};
widget_t *widget_create(const char *name, int x, int y)
{
widget_t *w = malloc(sizeof(*w));
if (!w)
return NULL;
w->name = strdup(name);
if (!w->name) {
free(w);
return NULL;
}
w->x = x;
w->y = y;
return w;
}
void widget_destroy(widget_t *w)
{
if (!w)
return;
free(w->name);
free(w);
}
const char *widget_name(const widget_t *w)
{
return w->name;
}
int widget_x(const widget_t *w)
{
return w->x;
}
int widget_y(const widget_t *w)
{
return w->y;
}
void widget_move(widget_t *w, int dx, int dy)
{
w->x += dx;
w->y += dy;
}
A Caller (main.c)
/* main.c */
#include <stdio.h>
#include "widget.h"
int main(void)
{
widget_t *w = widget_create("button", 10, 20);
if (!w) {
fprintf(stderr, "widget_create failed\n");
return 1;
}
printf("Widget '%s' at (%d, %d)\n",
widget_name(w), widget_x(w), widget_y(w));
widget_move(w, 5, -3);
printf("After move: (%d, %d)\n", widget_x(w), widget_y(w));
/* w->x = 99; <-- compile error: incomplete type */
widget_destroy(w);
return 0;
}
Compile and run:
gcc -Wall -c widget.c -o widget.o
gcc -Wall main.c widget.o -o main
./main
Output:
Widget 'button' at (10, 20)
After move: (15, 17)
The caller cannot access w->x directly because the compiler only sees the
forward declaration — the struct is an incomplete type in main.c.
Caution: The opaque pointer pattern in C relies entirely on programmer discipline. Nothing stops someone from copying the struct definition into their own file. It is a convention, not a guarantee.
Handles You Already Know
The C standard library and POSIX use this pattern everywhere:
+------------------+-----------------------------+
| Handle | Hidden struct |
+------------------+-----------------------------+
| FILE * | struct _IO_FILE (glibc) |
| DIR * | struct __dirstream |
| pthread_t | unsigned long (or struct) |
| sqlite3 * | struct sqlite3 |
+------------------+-----------------------------+
You call fopen() and get a FILE *. You never allocate a FILE yourself.
You never inspect its fields. You pass it to fread, fwrite, fclose. That
is the handle pattern.
Driver Prep: The Linux kernel uses opaque handles constantly. A struct file * is passed through the VFS layer. Driver authors implement operations behind function pointers without exposing internal state to userspace.
Try It: Add a widget_rename function that changes the widget's name. Make sure the old name is freed. Only modify widget.c and widget.h — the caller should not need to know how names are stored.
Rust: Privacy by Default
Rust flips the default. Struct fields are private unless you say pub.
// widget.rs (or lib.rs in a library crate)

pub struct Widget {
    name: String, // private -- only this module can touch it
    x: i32,       // private
    y: i32,       // private
}

impl Widget {
    /// Constructor -- the only way to create a Widget from outside
    pub fn new(name: &str, x: i32, y: i32) -> Self {
        Widget {
            name: name.to_string(),
            x,
            y,
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn x(&self) -> i32 {
        self.x
    }

    pub fn y(&self) -> i32 {
        self.y
    }

    pub fn move_by(&mut self, dx: i32, dy: i32) {
        self.x += dx;
        self.y += dy;
    }
}

fn main() {
    let mut w = Widget::new("button", 10, 20);
    println!("Widget '{}' at ({}, {})", w.name(), w.x(), w.y());
    w.move_by(5, -3);
    println!("After move: ({}, {})", w.x(), w.y());
    // w.x = 99; // compile error from outside this module: field `x` is private
}
Compile and run:
rustc widget.rs && ./widget
Output:
Widget 'button' at (10, 20)
After move: (15, 17)
Rust Note: In Rust, privacy is enforced at the module level, not the file level. All code in the same module can access private fields. But code outside the module cannot, even within the same crate, unless fields are marked pub.
Module Visibility in Detail
Rust gives you fine-grained control:
mod engine {
    pub struct Motor {
        pub horsepower: u32,         // anyone can read/write
        pub(crate) serial: u64,      // only this crate
        pub(super) temperature: f64, // only the parent module
        rpm: u32,                    // only this module
    }

    impl Motor {
        pub fn new(hp: u32) -> Self {
            Motor {
                horsepower: hp,
                serial: 12345,
                temperature: 90.0,
                rpm: 0,
            }
        }

        pub fn start(&mut self) {
            self.rpm = 800;
        }

        pub fn rpm(&self) -> u32 {
            self.rpm
        }
    }
}

fn main() {
    let mut m = engine::Motor::new(250);
    m.start();
    println!("HP: {}, RPM: {}", m.horsepower, m.rpm());
    // m.rpm = 9000;  // error: field `rpm` is private
    // m.serial = 0;  // allowed here (pub(crate), same crate) -- but an
    //                // error from any other crate
}
The visibility ladder:
+---------------------+--------------------------------+
| Visibility | Who can access |
+---------------------+--------------------------------+
| (none) / private | current module only |
| pub(self) | same as private (explicit) |
| pub(super) | parent module |
| pub(crate) | anywhere in the same crate |
| pub | anyone, including other crates |
+---------------------+--------------------------------+
The Newtype Pattern
Sometimes you want a distinct type that wraps a primitive. In C you use typedef,
but it creates only an alias — not a separate type.
C: typedef is a Weak Alias
/* c_newtype.c */
#include <stdio.h>
typedef int user_id;
typedef int product_id;
void print_user(user_id uid)
{
printf("User: %d\n", uid);
}
int main(void)
{
user_id u = 42;
product_id p = 99;
print_user(u); /* correct */
print_user(p); /* compiles fine -- oops! */
return 0;
}
The compiler treats user_id and product_id as the same type. No warning.
No error. Just a bug waiting to happen.
Rust: Newtype Is a Real Type
// newtype.rs

struct UserId(i32);
struct ProductId(i32);

fn print_user(uid: &UserId) {
    println!("User: {}", uid.0);
}

fn main() {
    let u = UserId(42);
    let _p = ProductId(99);
    print_user(&u); // ok
    // print_user(&_p); // compile error: expected `&UserId`, found `&ProductId`
}
Compile and run:
rustc newtype.rs && ./newtype
Output:
User: 42
The newtype wrapper has zero runtime overhead — it is the same size as the inner value. But the compiler treats them as distinct types.
Memory layout (both are identical at runtime):
UserId(42) ProductId(99)
+----------+ +----------+
| 42 (i32) | | 99 (i32) |
+----------+ +----------+
4 bytes 4 bytes
But the type system sees them as DIFFERENT types.
Try It: Add Meters and Feet newtypes in Rust. Write a function add_meters(Meters, Meters) -> Meters. Verify that passing a Feet value is a compile error.
C: Opaque Handle with Function Pointers (vtable-style)
A more advanced C pattern combines opaque types with function pointers. This is
how the Linux kernel implements polymorphism (e.g., struct file_operations).
/* stream.h */
#ifndef STREAM_H
#define STREAM_H
#include <stddef.h>
typedef struct stream stream_t;
/* Operations table -- like a vtable */
typedef struct {
int (*read)(stream_t *s, void *buf, size_t len);
int (*write)(stream_t *s, const void *buf, size_t len);
void (*close)(stream_t *s);
} stream_ops_t;
stream_t *stream_create(const stream_ops_t *ops, void *private_data);
void stream_destroy(stream_t *s);
int stream_read(stream_t *s, void *buf, size_t len);
int stream_write(stream_t *s, const void *buf, size_t len);
#endif
/* stream.c */
#include "stream.h"
#include <stdlib.h>
struct stream {
const stream_ops_t *ops;
void *private_data;
};
stream_t *stream_create(const stream_ops_t *ops, void *private_data)
{
stream_t *s = malloc(sizeof(*s));
if (!s)
return NULL;
s->ops = ops;
s->private_data = private_data;
return s;
}
void stream_destroy(stream_t *s)
{
if (s && s->ops->close)
s->ops->close(s);
free(s);
}
int stream_read(stream_t *s, void *buf, size_t len)
{
if (!s || !s->ops->read)
return -1;
return s->ops->read(s, buf, len);
}
int stream_write(stream_t *s, const void *buf, size_t len)
{
if (!s || !s->ops->write)
return -1;
return s->ops->write(s, buf, len);
}
This is the same architecture as struct file_operations in the Linux kernel.
The caller never sees the internal struct. The operations table provides
polymorphism.
Driver Prep: When you write a Linux driver, you fill in a struct file_operations with function pointers for read, write, open, release, and ioctl. The kernel calls your functions through these pointers. The opaque-handle-plus-vtable pattern is the foundation of the entire VFS layer.
Rust: Traits as the Clean Equivalent
// stream_trait.rs

use std::io;

trait Stream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
}

struct MemoryStream {
    data: Vec<u8>,
    pos: usize,
}

impl MemoryStream {
    fn new(data: Vec<u8>) -> Self {
        MemoryStream { data, pos: 0 }
    }
}

impl Stream for MemoryStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = &self.data[self.pos..];
        let n = buf.len().min(remaining.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }

    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
}

fn read_all(stream: &mut dyn Stream) -> io::Result<Vec<u8>> {
    let mut result = Vec::new();
    let mut buf = [0u8; 64];
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        result.extend_from_slice(&buf[..n]);
    }
    Ok(result)
}

fn main() {
    let mut ms = MemoryStream::new(b"Hello, opaque world!".to_vec());
    let data = read_all(&mut ms).unwrap();
    println!("{}", String::from_utf8_lossy(&data));
}
Compile and run:
rustc stream_trait.rs && ./stream_trait
The trait object &mut dyn Stream is Rust's version of the vtable pattern.
The compiler generates a vtable automatically. No manual function-pointer
tables needed.
Side-by-Side Comparison
+--------------------------+----------------------------+
| C (opaque pointer) | Rust (private fields) |
+--------------------------+----------------------------+
| Forward-declare struct | Fields private by default |
| Define in .c file only | Define in module |
| Expose via pointer | Expose via pub methods |
| Convention-based | Compiler-enforced |
| Caller can cast around | Caller cannot bypass |
| typedef = weak alias | Newtype = real distinct type|
| Function ptr table | Trait object (dyn Trait) |
+--------------------------+----------------------------+
Knowledge Check
- In C, what makes a struct "opaque" to callers? What compiler concept prevents the caller from accessing fields?
- In Rust, what is the default visibility of a struct field? How do you make it accessible outside the module?
- Why is a Rust newtype safer than a C typedef for preventing argument mix-ups?
Common Pitfalls
- Forgetting the destructor in C opaque types. The caller cannot free the internals because they cannot see them. You must provide a destroy function.
- Leaking the definition by putting the struct in a "private" header that someone else includes anyway. In C, there is no enforcement.
- Making all fields pub in Rust "just to get it compiling." This throws away the safety you get for free.
- Returning mutable references to private fields from Rust methods. This leaks internal state just as badly as making the field public.
- Confusing typedef with a newtype. In C, typedef int foo does not create a new type. The compiler still treats it as int.
Error Handling: errno to Result
Every syscall can fail. Every allocation can return NULL. How a language handles
errors defines how reliable the software built with it can be. C gives you
conventions. Rust gives you a type system. This chapter covers both, from the
humble return code to the ? operator.
C: The Return Code Convention
The simplest C error-handling pattern: return 0 for success, non-zero for failure.
/* retcode.c */
#include <stdio.h>
int parse_positive_int(const char *s, int *out)
{
int val = 0;
if (!s || !out)
return -1; /* invalid argument */
for (const char *p = s; *p; p++) {
if (*p < '0' || *p > '9')
return -2; /* not a digit */
val = val * 10 + (*p - '0');
}
*out = val;
return 0; /* success */
}
int main(void)
{
int val;
int rc;
rc = parse_positive_int("42", &val);
if (rc == 0)
printf("Parsed: %d\n", val);
else
printf("Error: %d\n", rc);
rc = parse_positive_int("12ab", &val);
if (rc == 0)
printf("Parsed: %d\n", val);
else
printf("Error: %d\n", rc);
return 0;
}
Compile and run:
gcc -Wall -o retcode retcode.c && ./retcode
Output:
Parsed: 42
Error: -2
This works for small programs. But notice: the caller must remember to check the return value. The compiler will not warn if they forget.
C: errno -- The Global Error Variable
POSIX functions return -1 (or NULL) on failure and set errno to indicate
what went wrong.
/* errno_demo.c */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
int fd = open("/nonexistent/file.txt", O_RDONLY);
if (fd == -1) {
printf("errno = %d\n", errno);
printf("strerror: %s\n", strerror(errno));
perror("open"); /* prints: open: No such file or directory */
}
errno = 0; /* manual reset */
FILE *f = fopen("/etc/shadow", "r");
if (!f) {
perror("fopen /etc/shadow");
}
return 0;
}
Compile and run:
gcc -Wall -o errno_demo errno_demo.c && ./errno_demo
The flow:
open("/nonexistent/file.txt", O_RDONLY)
|
+-- kernel returns error
+-- glibc sets errno = ENOENT (2)
+-- returns -1 to caller
+-- caller checks return value, reads errno for detail
Caution: errno is thread-local in modern C (C11 / POSIX), but it is still fragile. Any function call between the failing call and reading errno can overwrite it. Always check errno immediately after the call that failed.
C: The -1 / NULL / errno Pattern
Most POSIX and standard library functions follow one of these patterns:
+---------------------------+------------------+------------------+
| Function returns | On success | On failure |
+---------------------------+------------------+------------------+
| int (file descriptor) | >= 0 | -1, sets errno |
| pointer | valid pointer | NULL, sets errno |
| ssize_t (byte count) | >= 0 | -1, sets errno |
| int (status) | 0 | -1, sets errno |
+---------------------------+------------------+------------------+
Not all functions follow this. getchar() returns EOF on failure.
pthread_create returns the error number directly (not through errno). You
must read the man page for every function you call.
Caution: Some functions (like strtol) have ambiguous return values. A return of 0 might mean "parsed zero" or "parse failed." You must set errno = 0 before calling and check it afterward.
C: The goto cleanup Pattern
When a function acquires multiple resources, you need to release them all on
any error path. The goto pattern is standard practice in C -- and is used
extensively in the Linux kernel.
/* goto_cleanup.c */
#include <stdio.h>
#include <stdlib.h>
int process_file(const char *path)
{
int ret = -1;
FILE *f = NULL;
char *buf = NULL;
f = fopen(path, "r");
if (!f) {
perror("fopen");
goto cleanup;
}
buf = malloc(4096);
if (!buf) {
perror("malloc");
goto cleanup;
}
if (!fgets(buf, 4096, f)) {
if (ferror(f)) {
perror("fgets");
goto cleanup;
}
buf[0] = '\0';
}
printf("First line: %s", buf);
ret = 0; /* success */
cleanup:
free(buf); /* free(NULL) is safe */
if (f)
fclose(f);
return ret;
}
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
return 1;
}
return process_file(argv[1]) == 0 ? 0 : 1;
}
Compile and run:
gcc -Wall -o goto_cleanup goto_cleanup.c && echo "hello world" > /tmp/test.txt && ./goto_cleanup /tmp/test.txt
The goto-cleanup pattern:
function entry
|
allocate resource A ---fail---> goto cleanup
|
allocate resource B ---fail---> goto cleanup
|
do work -------------fail---> goto cleanup
|
success: ret = 0
|
cleanup:
free B (if allocated)
free A (if allocated)
return ret
Driver Prep: The Linux kernel style guide explicitly endorses goto for error handling. You will see this pattern in virtually every kernel function that acquires resources.
Try It: Modify process_file to also malloc a second buffer for processed output. Add the proper cleanup. What happens if you forget to free the second buffer on the error path?
Rust: Result<T, E>
Rust replaces all of the above with a single type:
enum Result<T, E> {
    Ok(T),
    Err(E),
}
You cannot ignore it. If a function returns Result, the compiler warns you if
you do not handle it.
// result_basic.rs

use std::fs;
use std::io;

fn read_first_line(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?;
    let first = contents.lines().next().unwrap_or("").to_string();
    Ok(first)
}

fn main() {
    match read_first_line("/tmp/test.txt") {
        Ok(line) => println!("First line: {}", line),
        Err(e) => eprintln!("Error: {}", e),
    }
    match read_first_line("/nonexistent/file.txt") {
        Ok(line) => println!("First line: {}", line),
        Err(e) => eprintln!("Error: {}", e),
    }
}
Compile and run:
rustc result_basic.rs && ./result_basic
The ? Operator
The ? operator is Rust's answer to the goto-cleanup pattern. It does three things:
- If the Result is Ok(val), it unwraps val and continues.
- If the Result is Err(e), it converts e into the function's error type and returns early.
- All cleanup happens automatically via Drop.
// question_mark.rs

use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

fn first_line_length(path: &str) -> Result<usize, io::Error> {
    let file = File::open(path)?; // returns Err if open fails
    let reader = BufReader::new(file);
    let mut line = String::new();
    reader.take(4096).read_line(&mut line)?; // returns Err if read fails
    Ok(line.trim_end().len())
}

fn main() {
    match first_line_length("/tmp/test.txt") {
        Ok(len) => println!("Length: {}", len),
        Err(e) => eprintln!("Error: {}", e),
    }
}
Compare the flow to C's goto:
C (goto cleanup) Rust (? operator)
---------------- ------------------
f = fopen(path) let file = File::open(path)?;
if (!f) goto cleanup; // auto-returns Err on failure
buf = malloc(4096); let mut line = String::new();
if (!buf) goto cleanup; // String manages its own memory
fgets(buf, 4096, f); reader.read_line(&mut line)?;
if error goto cleanup; // auto-returns, auto-cleans up
cleanup: // no cleanup block needed --
free(buf); // Drop runs automatically
fclose(f);
Option -- Nullable Without NULL
C uses NULL for "no value." Rust uses Option<T>:
// option_demo.rs

fn find_first_negative(nums: &[i32]) -> Option<usize> {
    for (i, &n) in nums.iter().enumerate() {
        if n < 0 {
            return Some(i);
        }
    }
    None
}

fn main() {
    let data = [3, 7, -2, 5, -8];
    match find_first_negative(&data) {
        Some(idx) => println!("First negative at index {}", idx),
        None => println!("No negatives found"),
    }
    if let Some(idx) = find_first_negative(&data) {
        println!("Value: {}", data[idx]);
    }
    let empty: &[i32] = &[];
    println!("In empty: {:?}", find_first_negative(empty));
}
Compile and run:
rustc option_demo.rs && ./option_demo
Rust Note: Option<T> has zero overhead for pointer types. The compiler uses the null representation internally, so Option<&T> is the same size as a raw pointer. This is called the "null pointer optimization."
Custom Error Types
Real programs have multiple error sources. In Rust, you define an enum that
implements std::error::Error:
// custom_error.rs

use std::fmt;
use std::io;
use std::num::ParseIntError;

#[derive(Debug)]
enum AppError {
    Io(io::Error),
    Parse(ParseIntError),
    InvalidArg(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "I/O error: {}", e),
            AppError::Parse(e) => write!(f, "Parse error: {}", e),
            AppError::InvalidArg(msg) => write!(f, "Invalid argument: {}", msg),
        }
    }
}

impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError::Io(e)
    }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Parse(e)
    }
}

fn read_number_from_file(path: &str) -> Result<i64, AppError> {
    if path.is_empty() {
        return Err(AppError::InvalidArg("empty path".to_string()));
    }
    let contents = std::fs::read_to_string(path)?; // io::Error -> AppError
    let num: i64 = contents.trim().parse()?;       // ParseIntError -> AppError
    Ok(num)
}

fn main() {
    match read_number_from_file("/tmp/number.txt") {
        Ok(n) => println!("Number: {}", n),
        Err(e) => eprintln!("Error: {}", e),
    }
    match read_number_from_file("") {
        Ok(n) => println!("Number: {}", n),
        Err(e) => eprintln!("Error: {}", e),
    }
}
Compile and run:
echo "42" > /tmp/number.txt
rustc custom_error.rs && ./custom_error
The From implementations let the ? operator automatically convert different
error types into your unified AppError.
Why C Error Handling Leads to Bugs
Consider this real-world pattern:
/* bug_demo.c -- spot the bugs */
#include <stdio.h>
#include <stdlib.h>
char *load_config(const char *path)
{
FILE *f = fopen(path, "r");
if (!f)
return NULL;
char *buf = malloc(1024);
if (!buf)
return NULL; /* BUG: f is leaked! */
if (!fgets(buf, 1024, f)) {
free(buf);
return NULL; /* BUG: f is leaked again! */
}
fclose(f);
return buf;
}
int main(void)
{
char *cfg = load_config("/tmp/test.txt");
if (cfg) {
printf("Config: %s", cfg);
free(cfg);
}
return 0;
}
Two resource leaks in a 15-line function. This is everywhere in C codebases. The language simply does not help.
Try It: Fix the bugs in load_config using the goto-cleanup pattern. Then write the equivalent in Rust and observe that the resource leaks are impossible.
Using thiserror and anyhow (Cargo ecosystem)
For real Rust projects, two crates simplify error handling enormously.
thiserror -- for library code, auto-generates Display and From:
// In Cargo.toml: thiserror = "1"
use thiserror::Error;

#[derive(Error, Debug)]
enum DataError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
    #[error("parse error: {0}")]
    Parse(#[from] std::num::ParseIntError),
    #[error("value {0} out of range {1}..{2}")]
    OutOfRange(i64, i64, i64),
}
anyhow -- for application code, wraps any error into a single type:
// In Cargo.toml: anyhow = "1"
use anyhow::{Context, Result};

fn load_config(path: &str) -> Result<String> {
    let contents = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read config from {}", path))?;
    Ok(contents)
}

fn main() -> Result<()> {
    let cfg = load_config("/tmp/test.txt")?;
    println!("Config: {}", cfg.trim());
    Ok(())
}
Error Handling Decision Tree
Are you writing a library?
|
YES --> Use thiserror (or manual impl)
| - Callers need to match on specific errors
|
NO --> Are you writing an application?
|
YES --> Use anyhow
| - Just print errors and bail
| - Use .context() for readable messages
|
(learning / small scripts) --> Use Box<dyn Error>
Mapping C errno to Rust
When calling C functions from Rust (via FFI), convert errno to a Rust error
using std::io::Error:
// errno_to_rust.rs

fn main() {
    let result = std::fs::File::open("/nonexistent/path");
    match result {
        Ok(_) => println!("opened"),
        Err(e) => {
            println!("Error: {}", e);
            println!("OS error code: {:?}", e.raw_os_error());
            println!("Error kind: {:?}", e.kind());
        }
    }
}
Output:
Error: No such file or directory (os error 2)
OS error code: Some(2)
Error kind: NotFound
io::Error wraps errno values and maps them to the ErrorKind enum. This
is how Rust bridges the C world.
Knowledge Check
- What happens in C if you call two POSIX functions in a row and only check `errno` after the second one?
- In Rust, what does the `?` operator do when it encounters an `Err` value?
- Why does Rust not need a goto-cleanup pattern for resource management?
Common Pitfalls
- Forgetting to check return values in C. The compiler will happily let you ignore the return value of `read()`, `write()`, or `close()`. Bugs hide here for years.
- Checking `errno` after an intervening call. Even `printf` can overwrite `errno`. Read it immediately after the failing call.
- Using `unwrap()` in Rust production code. It panics on `Err`. Use `?` or `match` instead. Reserve `unwrap()` for cases where failure is truly impossible.
- Ignoring `#[must_use]` warnings. Rust marks `Result` as `#[must_use]`. If you see a warning about an unused `Result`, you are ignoring an error.
- Confusing Option with Result. Use `Option` for "might not exist" and `Result` for "might fail." Do not use `Result<T, ()>` when you mean `Option<T>`.
The Preprocessor and Macros
Before the C compiler sees your code, the preprocessor runs a text-substitution pass over it. This is powerful, dangerous, and entirely unlike anything in most modern languages. Rust replaces the preprocessor with a hygienic macro system and feature flags. This chapter covers both.
The C Preprocessor: A Text Rewriting Engine
The preprocessor operates on text, not on syntax trees. It knows nothing about
types, scopes, or semantics. Every directive starts with #.
Source file (.c)
|
v
[Preprocessor] -- #include, #define, #ifdef
|
v
Translation unit (expanded text)
|
v
[Compiler] -- parsing, type checking, codegen
|
v
Object file (.o)
#include and Include Guards
#include literally copies the contents of another file into the current
position. No module system, no namespacing -- just text insertion. Without
protection, including the same header twice causes redefinition errors:
/* sensor.h */
#ifndef SENSOR_H
#define SENSOR_H
struct sensor {
int id;
float value;
};
int sensor_read(struct sensor *s);
#endif /* SENSOR_H */
Many compilers also support #pragma once as a non-standard shortcut.
#define: Constants and Macros
Simple Constants
/* constants.c */
#include <stdio.h>
#define MAX_SENSORS 16
#define PI 3.14159265358979
#define VERSION "1.0.3"
int main(void)
{
printf("Max sensors: %d\n", MAX_SENSORS);
printf("Pi: %.5f\n", PI);
printf("Version: %s\n", VERSION);
return 0;
}
The preprocessor replaces every occurrence of MAX_SENSORS with the literal
text 16. No type checking. No scoping.
Caution: `#define` constants have no type. `MAX_SENSORS` is not an `int` -- it is the text `16`. Prefer `enum` or `static const` for typed constants in modern C.
Function-Like Macros
/* macros.c */
#include <stdio.h>
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define SQUARE(x) ((x) * (x))
int main(void)
{
printf("min(3, 7) = %d\n", MIN(3, 7));
printf("max(3, 7) = %d\n", MAX(3, 7));
printf("square(5) = %d\n", SQUARE(5));
printf("square(2+3) = %d\n", SQUARE(2 + 3)); /* 25, correct with parens */
return 0;
}
Compile and run:
gcc -Wall -o macros macros.c && ./macros
Every parameter is wrapped in parentheses and the whole expression is wrapped in outer parentheses. Without this, operator precedence causes silent bugs.
Caution: Macro arguments are evaluated each time they appear. Consider `SQUARE(i++)` -- this expands to `((i++) * (i++))`, which increments `i` twice and invokes undefined behavior. This is the most infamous macro pitfall in C. Never pass expressions with side effects to C macros.
Useful Kernel Patterns
The Linux kernel defines several macros that every systems programmer should know:
/* kernel_patterns.c */
#include <stdio.h>
#include <stddef.h>
/* Array size -- works only on true arrays, not pointers */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
/* Build-time assertion (simplified) */
#define BUILD_BUG_ON(cond) \
((void)sizeof(char[1 - 2 * !!(cond)]))
/* Container-of: get the parent struct from a member pointer */
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
struct device {
int id;
char name[32];
};
int main(void)
{
int data[] = {10, 20, 30, 40, 50};
printf("Array size: %zu\n", ARRAY_SIZE(data));
BUILD_BUG_ON(sizeof(int) != 4); /* passes on most platforms */
/* BUILD_BUG_ON(sizeof(int) == 4); -- would fail to compile */
struct device dev = { .id = 42, .name = "sensor" };
char *name_ptr = dev.name;
struct device *dev_ptr = container_of(name_ptr, struct device, name);
printf("Device ID via container_of: %d\n", dev_ptr->id);
return 0;
}
Compile and run:
gcc -Wall -o kernel_patterns kernel_patterns.c && ./kernel_patterns
Output:
Array size: 5
Device ID via container_of: 42
Driver Prep: `container_of` is used on nearly every page of the Linux kernel source. When you have a pointer to a member of a struct (like a `list_head`), this macro recovers the pointer to the enclosing struct.
Stringification and Token Pasting
The preprocessor has two special operators: # turns a macro argument into a
string, and ## pastes two tokens together.
/* stringify.c */
#include <stdio.h>
#define STRINGIFY(x) #x
#define TO_STRING(x) STRINGIFY(x)
#define CONCAT(a, b) a##b
#define DEBUG_VAR(var) printf(#var " = %d\n", var)
#define VERSION_MAJOR 2
#define VERSION_MINOR 7
int main(void)
{
int count = 42;
DEBUG_VAR(count); /* expands to: printf("count" " = %d\n", count); */
int xy = 100;
printf("CONCAT(x, y) = %d\n", CONCAT(x, y)); /* becomes: xy */
/* Two-level stringification for expanding macros */
printf("Version: %s.%s\n",
TO_STRING(VERSION_MAJOR),
TO_STRING(VERSION_MINOR));
return 0;
}
Compile and run:
gcc -Wall -o stringify stringify.c && ./stringify
Output:
count = 42
CONCAT(x, y) = 100
Version: 2.7
Note the two-level TO_STRING / STRINGIFY trick. If you write
STRINGIFY(VERSION_MAJOR), you get the string "VERSION_MAJOR". The extra
indirection forces macro expansion first.
Variadic Macros
/* variadic_macro.c */
#include <stdio.h>
#define LOG(fmt, ...) \
fprintf(stderr, "[LOG] " fmt "\n", ##__VA_ARGS__)
int main(void)
{
LOG("Starting up");
LOG("Sensor %d: value = %.2f", 3, 27.5);
LOG("Shutting down with code %d", 0);
return 0;
}
Compile and run:
gcc -Wall -o variadic_macro variadic_macro.c && ./variadic_macro
The ##__VA_ARGS__ is a GCC extension that removes the trailing comma when no
variadic arguments are passed.
X-Macros: Code Generation
X-macros generate repetitive code from a single list definition.
/* x_macro.c */
#include <stdio.h>
#define ERROR_LIST \
X(ERR_NONE, "no error") \
X(ERR_IO, "I/O error") \
X(ERR_PARSE, "parse error") \
X(ERR_OVERFLOW, "overflow") \
X(ERR_TIMEOUT, "timeout")
/* Generate the enum */
#define X(code, str) code,
typedef enum {
ERROR_LIST
} error_code_t;
#undef X
/* Generate the string table */
#define X(code, str) [code] = str,
static const char *error_strings[] = {
ERROR_LIST
};
#undef X
const char *error_to_string(error_code_t e)
{
if (e < 0 || (size_t)e >= sizeof(error_strings) / sizeof(error_strings[0]))
return "unknown error";
return error_strings[e];
}
int main(void)
{
for (int i = ERR_NONE; i <= ERR_TIMEOUT; i++) {
printf("%d: %s\n", i, error_to_string(i));
}
return 0;
}
Compile and run:
gcc -Wall -o x_macro x_macro.c && ./x_macro
Output:
0: no error
1: I/O error
2: parse error
3: overflow
4: timeout
The error codes and their string representations are defined in one place. You can never add a code and forget its string, or vice versa.
Conditional Compilation
/* conditional.c */
#include <stdio.h>
#ifdef __linux__
#define PLATFORM "Linux"
#elif defined(_WIN32)
#define PLATFORM "Windows"
#elif defined(__APPLE__)
#define PLATFORM "macOS"
#else
#define PLATFORM "Unknown"
#endif
#ifndef NDEBUG
#define DBG(fmt, ...) fprintf(stderr, "DBG: " fmt "\n", ##__VA_ARGS__)
#else
#define DBG(fmt, ...) ((void)0)
#endif
int main(void)
{
printf("Platform: %s\n", PLATFORM);
DBG("This only prints in debug mode");
DBG("x = %d", 42);
return 0;
}
Compile and run:
gcc -Wall -o conditional conditional.c && ./conditional
gcc -Wall -DNDEBUG -o conditional_rel conditional.c && ./conditional_rel
In the release build (-DNDEBUG), the DBG macro expands to nothing.
Try It: Add a `#define VERBOSE` flag. When defined, make the LOG macro also print the file name and line number using `__FILE__` and `__LINE__`.
Rust: macro_rules! -- Pattern-Matching Macros
Rust macros operate on the syntax tree, not on raw text. They are hygienic: they cannot accidentally capture variables from the surrounding scope.
// rust_macros.rs
macro_rules! min {
    ($a:expr, $b:expr) => {{
        let a = $a;
        let b = $b;
        if a < b { a } else { b }
    }};
}

macro_rules! debug_var {
    ($var:expr) => {
        eprintln!("{} = {:?}", stringify!($var), $var);
    };
}

macro_rules! make_vec {
    ( $( $elem:expr ),* $(,)? ) => {{
        let mut v = Vec::new();
        $( v.push($elem); )*
        v
    }};
}

fn main() {
    let x = 10;
    let y = 3;
    println!("min({}, {}) = {}", x, y, min!(x, y));

    // Safe with side effects -- each argument evaluated once
    let mut counter = 0;
    let result = min!({ counter += 1; counter }, 5);
    println!("result = {}, counter = {}", result, counter); // counter is exactly 1, not 2

    debug_var!(x + y);
    debug_var!("hello");

    let v = make_vec![1, 2, 3, 4, 5];
    println!("vec: {:?}", v);
}
Compile and run:
rustc rust_macros.rs && ./rust_macros
Output:
min(10, 3) = 3
result = 1, counter = 1
x + y = 13
"hello" = "hello"
vec: [1, 2, 3, 4, 5]
Rust Note: The `min!` macro evaluates each argument exactly once by binding it to a local variable, so the `SQUARE(i++)` bug from C is impossible. Hygiene is the related guarantee: the macro's internal names (`a` and `b` above) can never collide with or capture variables at the call site.
Rust: cfg Attributes for Conditional Compilation
Rust replaces #ifdef with the cfg attribute system:
// cfg_demo.rs
#[cfg(target_os = "linux")]
fn platform() -> &'static str { "Linux" }

#[cfg(target_os = "windows")]
fn platform() -> &'static str { "Windows" }

#[cfg(target_os = "macos")]
fn platform() -> &'static str { "macOS" }

#[cfg(not(any(target_os = "linux", target_os = "windows", target_os = "macos")))]
fn platform() -> &'static str { "Unknown" }

fn main() {
    println!("Platform: {}", platform());
    if cfg!(debug_assertions) {
        println!("Debug mode is ON");
    } else {
        println!("Release mode");
    }
}
Compile and run:
rustc cfg_demo.rs && ./cfg_demo
The cfg! macro evaluates at compile time. Dead branches are eliminated
entirely.
Rust: Feature Flags in Cargo
Cargo supports feature flags for conditional compilation:
# Cargo.toml
[package]
name = "myapp"
version = "0.1.0"
edition = "2021"
[features]
default = ["json"]
json = ["dep:serde_json"]
verbose_logging = []
[dependencies]
serde_json = { version = "1", optional = true }
// src/main.rs (Cargo project)
#[cfg(feature = "json")]
fn parse_config(data: &str) {
    let v: serde_json::Value = serde_json::from_str(data).unwrap();
    println!("Parsed JSON: {}", v);
}

#[cfg(not(feature = "json"))]
fn parse_config(_data: &str) {
    println!("JSON support not compiled in");
}

fn main() {
    parse_config(r#"{"key": "value"}"#);
}
Build with different features:
cargo run # default features (json)
cargo run --no-default-features # no json
cargo run --features verbose_logging # default + verbose
Procedural Macros: A Brief Overview
Rust also has procedural macros: Rust functions that transform token streams at
compile time. The three kinds are derive macros (#[derive(Debug, Serialize)]),
attribute macros (#[route("GET", "/users")]), and function-like macros
(sql!(SELECT * FROM users)). They are defined in a separate crate. Derive
macros are by far the most common.
Side-by-Side: C Preprocessor vs Rust Macros
+----------------------------+----------------------------------+
| C Preprocessor | Rust Macros |
+----------------------------+----------------------------------+
| Text substitution | Syntax tree transformation |
| No hygiene | Hygienic -- no name capture |
| Arguments re-evaluated | Arguments evaluated once |
| No type safety | Type-checked after expansion |
| #ifdef for platforms | #[cfg()] attributes |
| #define constants | const / static |
| Include guards needed | Module system handles it |
| Errors point to expanded | Errors point to macro call site |
| code (unreadable) | (usually readable) |
+----------------------------+----------------------------------+
Debugging Macro Expansions
In C, use gcc -E to see the preprocessed output:
gcc -E macros.c | tail -20
In Rust, use cargo expand (requires the cargo-expand tool):
cargo install cargo-expand
cargo expand
Try It: Write a C macro `CLAMP(x, lo, hi)` that clamps a value to a range. Then write the Rust equivalent using `macro_rules!`. Verify that the Rust version is safe with side effects by passing `{ counter += 1; counter }` as an argument.
Knowledge Check
- What happens if you write `#define SQUARE(x) x * x` without parentheses and then call `SQUARE(2 + 3)`?
- Why does `SQUARE(i++)` cause undefined behavior in C but not in a Rust macro?
- What is the difference between `cfg!(target_os = "linux")` and `#[cfg(target_os = "linux")]` in Rust?
Common Pitfalls
- Missing parentheses in C macros. Always wrap every parameter and the entire expression: `#define M(x) ((x) + 1)`.
- Multi-statement macros without do-while. Use `do { ... } while (0)` for macros that expand to multiple statements, or they break `if`/`else` chains.
- Macro arguments with side effects. Never pass `i++` or function calls to C macros unless you know the argument is used exactly once.
- Include guard name collisions. Using a common name like `UTILS_H` in two different libraries causes silent header suppression.
- Over-using macros when functions work. Modern C compilers inline small functions automatically. Use `static inline` instead of function-like macros when possible.
- Overcomplicating Rust macros. If a function does the job, use a function. Macros are for cases where you need syntax flexibility (variadic arguments, code generation, compile-time string manipulation).
Inline Assembly
Sometimes you need to drop below the language and talk directly to the CPU.
Reading hardware registers, executing specific instructions the compiler does
not emit, or inserting precise memory barriers -- these require inline assembly.
This chapter shows how to embed assembly in both C (GCC extended asm) and
Rust (the asm! macro), with real examples on x86-64.
When You Need Inline Assembly
Most code never needs it. But these situations demand it:
- Reading CPU-specific registers (cycle counter, model info, control registers)
- Memory and compiler barriers (preventing reordering in lock-free code)
- Specific SIMD instructions that the compiler will not generate on its own
- Hardware I/O (`in`/`out` instructions for port-mapped I/O)
- Atomic operations not provided by the language or library
- System calls (the raw `syscall` instruction)
Driver Prep: Kernel code and device drivers use inline assembly for all of the above. The Linux kernel's `arch/x86/include/asm/` directory is full of inline assembly wrappers. Understanding the constraint system is essential.
C: GCC Extended Assembly Syntax
The basic form:
asm volatile (
"assembly template"
: output operands
: input operands
: clobber list
);
Example: Reading the CPU Cycle Counter (RDTSC)
The rdtsc instruction reads the Time Stamp Counter into EDX:EAX (high 32
bits in EDX, low 32 bits in EAX).
/* rdtsc.c */
#include <stdio.h>
#include <stdint.h>
static inline uint64_t read_tsc(void)
{
uint32_t lo, hi;
asm volatile (
"rdtsc"
: "=a" (lo), /* output: EAX -> lo */
"=d" (hi) /* output: EDX -> hi */
: /* no inputs */
: /* no extra clobbers */
);
return ((uint64_t)hi << 32) | lo;
}
int main(void)
{
uint64_t start = read_tsc();
/* Do some work */
volatile int sum = 0;
for (int i = 0; i < 1000000; i++)
sum += i;
uint64_t end = read_tsc();
printf("Cycles: %lu\n", end - start);
printf("Sum: %d\n", sum);
return 0;
}
Compile and run:
gcc -O2 -Wall -o rdtsc rdtsc.c && ./rdtsc
The constraint letters tell GCC which registers to use:
+------------+---------------------------+
| Constraint | Meaning |
+------------+---------------------------+
| "=a" | output in EAX |
| "=d" | output in EDX |
| "=r" | output in any GPR |
| "=m" | output in memory |
| "r" | input in any GPR |
| "i" | immediate constant |
| "m" | input in memory |
+------------+---------------------------+
| Modifiers |
+------------+---------------------------+
| "=" | write-only output |
| "+" | read-write operand |
| "&" | early-clobber output |
+------------+---------------------------+
Example: CPUID
The cpuid instruction returns CPU identification data. It reads EAX as a
"leaf" selector and writes results to EAX, EBX, ECX, EDX.
/* cpuid.c */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
void get_cpu_vendor(char *vendor)
{
uint32_t ebx, ecx, edx;
asm volatile (
"cpuid"
: "=b" (ebx),
"=c" (ecx),
"=d" (edx)
: "a" (0) /* input: leaf 0 */
:
);
/* Vendor string is in EBX:EDX:ECX (yes, that order) */
memcpy(vendor + 0, &ebx, 4);
memcpy(vendor + 4, &edx, 4);
memcpy(vendor + 8, &ecx, 4);
vendor[12] = '\0';
}
int main(void)
{
char vendor[13];
get_cpu_vendor(vendor);
printf("CPU Vendor: %s\n", vendor);
return 0;
}
Compile and run:
gcc -O2 -Wall -o cpuid cpuid.c && ./cpuid
The volatile Keyword
asm volatile tells the compiler: "Do not optimize this away, do not move it,
do not assume it has no side effects." Without volatile, the compiler may
remove the asm block, reorder it, or merge duplicates. Always use volatile
unless the asm block is a pure function with no side effects (rare).
Caution: Even `volatile` does not prevent CPU-level reordering. For that you need memory barriers (fence instructions). `volatile` only constrains the compiler's optimizer.
Memory Barriers
On modern out-of-order CPUs, the processor can reorder memory operations. Compilers can also reorder loads and stores. Barriers prevent both.
/* barriers.c */
#include <stdio.h>
#include <stdint.h>
/* Compiler barrier -- prevents compiler reordering, not CPU reordering */
#define compiler_barrier() asm volatile ("" ::: "memory")
/* Full memory fence -- prevents both compiler and CPU reordering */
#define full_fence() asm volatile ("mfence" ::: "memory")
/* Store fence -- all prior stores complete before any later store */
#define store_fence() asm volatile ("sfence" ::: "memory")
/* Load fence -- all prior loads complete before any later load */
#define load_fence() asm volatile ("lfence" ::: "memory")
volatile int shared_flag = 0;
volatile int shared_data = 0;
void producer(void)
{
shared_data = 42; /* write data first */
store_fence(); /* ensure data is visible before flag */
shared_flag = 1; /* signal that data is ready */
}
void consumer(void)
{
while (!shared_flag) /* wait for flag */
;
load_fence(); /* ensure we read data after flag */
printf("Data: %d\n", shared_data); /* guaranteed to see 42 */
}
int main(void)
{
/* Single-threaded demo -- the barriers matter in multi-threaded code */
producer();
consumer();
return 0;
}
Compile and run:
gcc -O2 -Wall -o barriers barriers.c && ./barriers
The "memory" clobber tells the compiler that the asm block may read or write
any memory, so it must not reorder loads/stores across it.
Without barrier: With barrier:
store data = 42 store data = 42
store flag = 1 sfence
(CPU may reorder these!) store flag = 1
(stores are ordered)
Driver Prep: The Linux kernel defines `mb()`, `rmb()`, `wmb()` (full, read, write memory barriers) and `smp_mb()`, `smp_rmb()`, `smp_wmb()` (SMP variants). These are thin wrappers around inline assembly fence instructions.
C: Inline Assembly for a System Call
On x86-64 Linux, system calls use the syscall instruction:
/* raw_syscall.c */
#include <stdio.h>
#include <stdint.h>
/* write(fd, buf, count) -- syscall number 1 on x86-64 */
static long raw_write(int fd, const void *buf, unsigned long count)
{
long ret;
asm volatile (
"syscall"
: "=a" (ret) /* output: return value in RAX */
: "a" (1), /* input: syscall number in RAX */
"D" ((long)fd), /* input: arg1 in RDI */
"S" ((long)buf), /* input: arg2 in RSI */
"d" ((long)count) /* input: arg3 in RDX */
: "rcx", "r11", "memory" /* clobbers: syscall destroys RCX, R11 */
);
return ret;
}
int main(void)
{
const char msg[] = "Hello from raw syscall!\n";
long written = raw_write(1, msg, sizeof(msg) - 1);
printf("Bytes written: %ld\n", written);
return 0;
}
Compile and run:
gcc -O2 -Wall -o raw_syscall raw_syscall.c && ./raw_syscall
Output:
Hello from raw syscall!
Bytes written: 24
The x86-64 Linux syscall convention: RAX = syscall number and return value, arguments in RDI, RSI, RDX, R10, R8, R9. The CPU clobbers RCX and R11.
Try It: Implement a `raw_exit(int code)` function using the `syscall` instruction (syscall number 60 on x86-64). Call it instead of `return 0` and verify the exit code with `echo $?`.
Rust: The asm! Macro
Rust stabilized inline assembly in Rust 1.59. The syntax is different from GCC's but the concept is the same.
Example: Reading the CPU Cycle Counter
// rdtsc_rust.rs
use std::arch::asm;

fn read_tsc() -> u64 {
    let lo: u32;
    let hi: u32;
    unsafe {
        asm!(
            "rdtsc",
            out("eax") lo,
            out("edx") hi,
            options(nostack, nomem),
        );
    }
    ((hi as u64) << 32) | (lo as u64)
}

fn main() {
    let start = read_tsc();
    let mut sum: u64 = 0;
    for i in 0..1_000_000u64 {
        sum = sum.wrapping_add(i);
    }
    let end = read_tsc();
    println!("Cycles: {}", end - start);
    println!("Sum: {}", sum);
}
Compile and run:
rustc -O rdtsc_rust.rs && ./rdtsc_rust
Key differences from GCC syntax:
+---------------------------+--------------------------------+
| GCC (C) | Rust asm! |
+---------------------------+--------------------------------+
| "=a" (lo) | out("eax") lo |
| "=d" (hi) | out("edx") hi |
| : "memory" | options(nomem) if no memory |
| | access, otherwise omit |
| asm volatile | asm! is volatile by default |
| "r" (input) | in(reg) input |
+---------------------------+--------------------------------+
Example: CPUID in Rust
// cpuid_rust.rs
use std::arch::asm;

fn cpu_vendor() -> String {
    let ebx_out: u64;
    let ecx: u32;
    let edx: u32;
    unsafe {
        // LLVM reserves rbx, so it cannot be a direct asm! operand.
        // Save rbx, run cpuid, then swap the result out and restore it.
        asm!(
            "mov {tmp}, rbx",
            "cpuid",
            "xchg {tmp}, rbx",
            tmp = out(reg) ebx_out,
            inout("eax") 0u32 => _,
            out("ecx") ecx,
            out("edx") edx,
            options(nostack),
        );
    }
    let ebx = ebx_out as u32;
    let mut vendor = [0u8; 12];
    vendor[0..4].copy_from_slice(&ebx.to_le_bytes());
    vendor[4..8].copy_from_slice(&edx.to_le_bytes());
    vendor[8..12].copy_from_slice(&ecx.to_le_bytes());
    String::from_utf8_lossy(&vendor).to_string()
}

fn main() {
    println!("CPU Vendor: {}", cpu_vendor());
}
Compile and run:
rustc -O cpuid_rust.rs && ./cpuid_rust
Rust asm! Operand and Option Reference
+-------------------+------------------------------------------+
| Operand | Meaning |
+-------------------+------------------------------------------+
| in(reg) expr | Input in any general-purpose register |
| in("eax") expr | Input in a specific register |
| out(reg) var | Output to any GPR |
| out("edx") var | Output to a specific register |
| inout(reg) var | Read-write operand |
| inout("eax") x=>y | Input x, output y, same register |
| out(reg) _ | Clobbered register (output discarded) |
+-------------------+------------------------------------------+
+-------------------+------------------------------------------+
| Option | Meaning |
+-------------------+------------------------------------------+
| nomem | Asm does not read/write memory |
| nostack | Asm does not use the stack |
| pure | No side effects (allows optimization) |
| preserves_flags | Does not modify CPU flags (EFLAGS) |
| att_syntax | Use AT&T syntax instead of Intel |
+-------------------+------------------------------------------+
By default, asm! blocks are treated as volatile (they will not be removed or
reordered). Adding pure and nomem together allows the compiler to optimize
like a regular function call.
Rust: A Raw System Call
// raw_syscall_rust.rs
use std::arch::asm;

/// Perform a raw write() system call on x86-64 Linux.
unsafe fn raw_write(fd: u64, buf: *const u8, count: u64) -> i64 {
    let ret: i64;
    asm!(
        "syscall",
        inlateout("rax") 1i64 => ret,  // syscall number in, return value out
        in("rdi") fd,                  // arg1: file descriptor
        in("rsi") buf as u64,          // arg2: buffer pointer
        in("rdx") count,               // arg3: byte count
        out("rcx") _,                  // clobbered by syscall
        out("r11") _,                  // clobbered by syscall
        options(nostack),
    );
    ret
}

fn main() {
    let msg = b"Hello from Rust raw syscall!\n";
    let written = unsafe { raw_write(1, msg.as_ptr(), msg.len() as u64) };
    println!("Bytes written: {}", written);
}
Compile and run:
rustc -O raw_syscall_rust.rs && ./raw_syscall_rust
Note the `inlateout("rax") 1i64 => ret` operand: `rax` holds the syscall
number on entry and the return value on exit, so it must be declared as a
single read-write operand rather than a separate `in` and `out`. The `late`
part (versus plain `inout`) tells the compiler the output is written only
after all inputs have been consumed, which lets it allocate registers more
freely.
Rust Note: In practice, use `std::sync::atomic` with proper `Ordering` values (`SeqCst`, `Acquire`, `Release`) instead of raw fence instructions for memory barriers. The atomic types generate the correct barriers automatically. Inline assembly for barriers is only needed when interfacing with hardware or writing the lowest levels of a synchronization library.
SIMD: A Practical Example
Let us use SSSE3 to sum an array of four 32-bit integers using 128-bit SIMD registers.
/* simd_sum.c */
#include <stdio.h>
#include <stdint.h>
int32_t simd_sum_4(const int32_t vals[4])
{
int32_t result;
asm volatile (
"movdqu (%1), %%xmm0\n\t" /* load 4 ints into xmm0 */
"phaddd %%xmm0, %%xmm0\n\t" /* horizontal add pairs */
"phaddd %%xmm0, %%xmm0\n\t" /* horizontal add again */
"movd %%xmm0, %0\n\t" /* extract low 32 bits */
: "=r" (result)
: "r" (vals)
: "xmm0", "memory"
);
return result;
}
int main(void)
{
int32_t data[4] = {10, 20, 30, 40};
printf("SIMD sum: %d\n", simd_sum_4(data));
printf("Expected: %d\n", 10 + 20 + 30 + 40);
return 0;
}
Compile and run (requires SSSE3):
gcc -O2 -Wall -mssse3 -o simd_sum simd_sum.c && ./simd_sum
The equivalent in Rust:
// simd_sum_rust.rs
use std::arch::asm;

fn simd_sum_4(vals: &[i32; 4]) -> i32 {
    let result: i32;
    unsafe {
        asm!(
            "movdqu ({ptr}), %xmm0",
            "phaddd %xmm0, %xmm0",
            "phaddd %xmm0, %xmm0",
            "movd %xmm0, {out}",
            ptr = in(reg) vals.as_ptr(),
            out = out(reg) result,
            out("xmm0") _,
            options(att_syntax, nostack),
        );
    }
    result
}

fn main() {
    let data = [10i32, 20, 30, 40];
    println!("SIMD sum: {}", simd_sum_4(&data));
    println!("Expected: {}", 10 + 20 + 30 + 40);
}
Compile and run:
RUSTFLAGS="-C target-feature=+ssse3" rustc -O simd_sum_rust.rs && ./simd_sum_rust
Rust Note: For SIMD in production Rust code, prefer the `std::arch` intrinsics (like `_mm_hadd_epi32`) or the portable `std::simd` module (nightly). Use inline assembly only when the specific instruction you need has no intrinsic wrapper.
Try It: Modify the SIMD example to sum 8 integers using two `movdqu` loads and a `paddd` to combine them before the horizontal adds.
Safety Considerations
Inline assembly bypasses every safety guarantee both languages provide.
+----------------------------------+-----------------------------------+
| Risk | Mitigation |
+----------------------------------+-----------------------------------+
| Wrong register constraints | Test on multiple opt levels |
| Missing clobber declaration | List ALL modified registers |
| Stack misalignment | Use nostack or align manually |
| Forgetting "memory" clobber | Add if asm touches any memory |
| Platform-specific code | Guard with #ifdef / #[cfg()] |
| Compiler upgrades break asm | Minimize asm surface area |
+----------------------------------+-----------------------------------+
Caution: Incorrect constraints are silent killers. The assembler will not warn you. The program will appear to work, then break under different optimization levels or with different surrounding code. Always test with `-O0`, `-O2`, and `-O3`.
Wrap each asm block in a small, well-named inline function. Keep the asm to the one or two instructions you actually need, and let the compiler handle everything else. The compiler is better at register allocation and instruction scheduling than you are.
Knowledge Check
- Why is `asm volatile` used instead of plain `asm` for reading hardware counters?
- What happens if you forget to list a register in the clobber list that your assembly modifies?
- In Rust's `asm!` macro, what is the difference between `out("rax")` and `lateout("rax")`?
Common Pitfalls
- Missing `volatile`. Without it, the compiler may eliminate or move your asm block. Use `volatile` for anything with side effects.
- Incomplete clobber lists. If your assembly modifies a register and you did not declare it, the compiler may store a value there that gets silently corrupted.
- Forgetting the `"memory"` clobber. If your assembly reads or writes memory through a pointer, you must include `"memory"` in the clobber list (or use `nomem` / omit it appropriately in Rust).
- Assuming AT&T vs Intel syntax. GCC uses AT&T syntax by default (`src, dst`). Rust's `asm!` uses Intel syntax by default (`dst, src`). Use the `att_syntax` option in Rust if you prefer AT&T.
- Writing complex logic in assembly. Keep asm blocks to one or two instructions. Let the compiler handle the rest.
- Not guarding platform-specific code. Wrap all inline assembly in `#ifdef __x86_64__` (C) or `#[cfg(target_arch = "x86_64")]` (Rust) so the code does not break on ARM or other architectures.
From Source to Binary
When you type gcc main.c -o main, four distinct stages run in sequence.
Understanding each stage turns opaque compiler errors into something you can
reason about -- and makes debugging linker failures, ABI mismatches, and
cross-compilation issues far less painful.
The Four Stages
Source (.c)
|
v
[Preprocessor] --> Expanded source (.i)
|
v
[Compiler] --> Assembly (.s)
|
v
[Assembler] --> Object file (.o)
|
v
[Linker] --> Executable (ELF)
Each stage is a separate program. GCC orchestrates them, but you can stop at any point and inspect the output.
Stage 1: Preprocessing
The preprocessor handles #include, #define, #ifdef, and macro expansion.
It produces pure C with no directives left.
/* version.h */
#ifndef VERSION_H
#define VERSION_H
#define APP_VERSION "1.0.3"
#define MAX_RETRIES 5
#endif
/* stage1.c */
#include <stdio.h>
#include "version.h"
#ifdef DEBUG
#define LOG(msg) fprintf(stderr, "DEBUG: %s\n", msg)
#else
#define LOG(msg) ((void)0)
#endif
int main(void) {
LOG("starting up");
printf("App version: %s\n", APP_VERSION);
printf("Max retries: %d\n", MAX_RETRIES);
return 0;
}
Stop after preprocessing:
gcc -E stage1.c -o stage1.i
Open stage1.i -- it will be thousands of lines long because <stdio.h> gets
fully expanded. Scroll to the bottom and you will see your code with all macros
replaced:
int main(void) {
((void)0);
printf("App version: %s\n", "1.0.3");
printf("Max retries: %d\n", 5);
return 0;
}
The string "1.0.3" is inlined. LOG became ((void)0) because DEBUG
was not defined. Now try:
gcc -E -DDEBUG stage1.c -o stage1_debug.i
The LOG call now expands to an actual fprintf.
Try It: Add a `#define PLATFORM "linux"` to `version.h` and use it in `main`. Run `gcc -E` and confirm the string appears in the `.i` file.
Stage 2: Compilation (to Assembly)
The compiler translates the preprocessed C into assembly for the target architecture. On x86-64:
gcc -S stage1.c -o stage1.s
/* arith.c */
int add(int a, int b) {
return a + b;
}
int square(int x) {
return x * x;
}
gcc -S -O0 arith.c -o arith.s
The output (simplified, x86-64):
add:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -4(%rbp), %edx
movl -8(%rbp), %eax
addl %edx, %eax
popq %rbp
ret
square:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
imull %eax, %eax
popq %rbp
ret
Now try with optimization:
gcc -S -O2 arith.c -o arith_opt.s
The optimized output is dramatically shorter -- the compiler may skip the frame pointer entirely and use registers directly.
Try It: Compile `arith.c` with `-O0`, `-O1`, `-O2`, and `-O3`. Compare the assembly output with `diff`. Notice how the compiler eliminates unnecessary memory operations at higher levels.
Stage 3: Assembly (to Object Code)
The assembler translates assembly into machine code, producing an ELF object file:
gcc -c arith.c -o arith.o
Inspect it:
file arith.o
# arith.o: ELF 64-bit LSB relocatable, x86-64, ...
objdump -d arith.o
The object file contains machine instructions, but addresses are not yet resolved. Function calls to external symbols are placeholders.
/* caller.c */
#include <stdio.h>
extern int add(int a, int b);
extern int square(int x);
int main(void) {
printf("add(3,4) = %d\n", add(3, 4));
printf("square(5) = %d\n", square(5));
return 0;
}
gcc -c caller.c -o caller.o
objdump -d caller.o
In the disassembly, calls to add, square, and printf show placeholder
addresses (often all zeros). These are relocations -- the linker fills
them in later.
Stage 4: Linking
The linker combines object files, resolves symbols, and produces the final executable:
gcc caller.o arith.o -o program
./program
Output:
add(3,4) = 7
square(5) = 25
Symbols and the Symbol Table
Every object file carries a symbol table. View it with nm:
nm arith.o
0000000000000000 T add
0000000000000014 T square
T means the symbol is in the text (code) section and is globally visible.
nm caller.o
U add
0000000000000000 T main
U printf
U square
U means undefined -- these symbols must be provided by another object file
or library at link time.
Relocations
View relocations with readelf:
readelf -r caller.o
Each relocation entry says: "At offset X in section Y, insert the address of symbol Z." The linker processes every relocation in every object file.
+------------------+ +------------------+
| caller.o | | arith.o |
| | | |
| main | | add [T] |
| calls add [U] |---->| square [T] |
| calls square[U] |---->| |
| calls printf[U] |--+ +------------------+
+------------------+ |
| +------------------+
+->| libc.so |
| printf [T] |
+------------------+
Caution: If you see "undefined reference to ..." at link time, it means the linker cannot find a symbol. Check that you are passing all required object files and libraries. Order matters with static libraries -- the linker processes files left to right.
Examining the Final Executable
file program
# program: ELF 64-bit LSB executable, x86-64, ...
readelf -h program # ELF header
readelf -l program # program headers (segments)
readelf -S program # section headers
objdump -d program # full disassembly
Key sections in an ELF binary:
+-------------------+
| .text | Executable code
+-------------------+
| .rodata | Read-only data (string literals)
+-------------------+
| .data | Initialized global/static variables
+-------------------+
| .bss | Uninitialized global/static variables
+-------------------+
| .symtab | Symbol table
+-------------------+
| .strtab | String table for symbols
+-------------------+
| .rel.text | Relocations (in .o files)
+-------------------+
Driver Prep: Kernel modules are ELF relocatable objects (.ko files). The kernel's module loader performs its own linking at insmod time, resolving symbols against the running kernel's symbol table. Understanding relocations now pays off directly when debugging module load failures.
Rust's Compilation Model
Rust does not follow the same four-stage pipeline. Instead:
Source (.rs)
|
v
[rustc frontend] --> HIR --> MIR
|
v
[LLVM backend] --> Object files (.o) or LLVM IR (.ll)
|
v
[Linker] --> Executable (ELF)
The Rust compiler (rustc) handles preprocessing-like tasks (macro expansion,
conditional compilation with cfg) internally. There is no separate
preprocessor.
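As a taste of how cfg replaces the preprocessor, here is a minimal sketch (names are illustrative) that mirrors the earlier -DDEBUG example. With plain rustc (opt-level 0), debug_assertions is on; with rustc -O -C debug-assertions=off, the other function is compiled instead.

```rust
// cfg_demo.rs -- a minimal sketch of cfg-based conditional compilation,
// Rust's answer to C's #ifdef DEBUG. Names here are illustrative.
#[cfg(debug_assertions)]
fn build_kind() -> &'static str {
    "debug" // selected when debug assertions are enabled
}

#[cfg(not(debug_assertions))]
fn build_kind() -> &'static str {
    "release" // selected when they are disabled
}

fn main() {
    println!("build kind: {}", build_kind());
}
```

Unlike C's textual #ifdef, cfg operates on items the compiler has already parsed, so a disabled branch must still be syntactically valid Rust.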
A Rust Example
// arith.rs
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn square(x: i32) -> i32 {
    x * x
}

fn main() {
    println!("add(3,4) = {}", add(3, 4));
    println!("square(5) = {}", square(5));
}
rustc arith.rs -o arith_rust
./arith_rust
Viewing Intermediate Representations
Emit LLVM IR:
rustc --emit=llvm-ir arith.rs
This produces arith.ll -- LLVM's intermediate representation, which is
portable across architectures.
Emit assembly:
rustc --emit=asm arith.rs
Emit object file only (no linking):
rustc --emit=obj arith.rs
Inspect the resulting object file the same way:
nm arith.o
objdump -d arith.o
Rust Note: Rust mangles symbol names by default. You will see names like _ZN5arith3add17h...E rather than plain add. Use #[no_mangle] and extern "C" when you need C-compatible symbol names. We cover this in Chapter 26.
Crates and Incremental Compilation
Rust's unit of compilation is the crate, not the individual .rs file.
A crate can contain many modules spread across multiple files, but rustc
compiles the entire crate as one unit.
Cargo enables incremental compilation: when you change one function,
only the affected parts of the crate are recompiled. Incremental data is
cached in target/debug/incremental/.
cargo build # first build -- compiles everything
# edit one function
cargo build # incremental -- only recompiles the changed parts
Compare with C, where each .c file is compiled independently into a .o
file, and the build system (Make) decides which files to recompile based on
timestamps.
C model: Rust/Cargo model:
file1.c --> file1.o +------------------+
file2.c --> file2.o vs | entire crate |---> crate .rlib
file3.c --> file3.o | (all .rs files) |
\ | / +------------------+
\ | /
v v v
[ linker ]
[executable]
Comparing Object Files from C and Rust
Let us compile equivalent functions in both languages and compare:
/* cfunc.c */
#include <stdint.h>
int32_t multiply(int32_t a, int32_t b) {
return a * b;
}
// rfunc.rs
#[no_mangle]
pub extern "C" fn multiply(a: i32, b: i32) -> i32 {
    a * b
}
gcc -c -O2 cfunc.c -o cfunc.o
rustc --crate-type=staticlib --emit=obj -C opt-level=2 rfunc.rs -o rfunc.o
objdump -d cfunc.o
objdump -d rfunc.o
At -O2, both produce nearly identical machine code for this simple function:
multiply:
movl %edi, %eax
imull %esi, %eax
ret
The LLVM backend (used by Rust) and GCC's backend produce equivalent output for straightforward arithmetic. Differences appear with more complex code -- different inlining decisions, vectorization strategies, and so on.
Try It: Write a function that sums an array of integers in both C and Rust. Compile with -O2 / -C opt-level=2 and compare the assembly. Does one auto-vectorize and the other not?
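One possible C half of that experiment (a sketch; any equivalent loop works):

```c
/* sum.c -- a candidate sum function for the vectorization experiment.
 * Compile with gcc -O2 -S sum.c and look for packed adds (paddd/vpaddd). */
#include <stddef.h>
#include <stdint.h>

int64_t sum(const int32_t *arr, size_t len) {
    int64_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += arr[i];
    }
    return total;
}
```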
Practical: Walking Through All Four Stages
Here is a complete C program that we will take through every stage manually:
/* pipeline.c */
#include <stdio.h>
#define GREETING "Hello from the pipeline"
static int helper(int n) {
return n * 2 + 1;
}
int main(void) {
int result = helper(21);
printf("%s: result = %d\n", GREETING, result);
return 0;
}
Run each stage explicitly:
# Stage 1: Preprocess
gcc -E pipeline.c -o pipeline.i
wc -l pipeline.i # thousands of lines
# Stage 2: Compile to assembly
gcc -S pipeline.i -o pipeline.s
wc -l pipeline.s # tens of lines
# Stage 3: Assemble to object
gcc -c pipeline.s -o pipeline.o
nm pipeline.o # main is T, printf is U, helper may be t (static)
# Stage 4: Link
gcc pipeline.o -o pipeline
./pipeline
# Hello from the pipeline: result = 43
Notice that helper might appear as t (lowercase) in nm output -- the
lowercase means it is a local symbol (because of static). Local symbols
are not visible to the linker from other object files.
The static Keyword and Symbol Visibility
/* visibility.c */
static int internal_func(void) { /* local to this file */
return 42;
}
int public_func(void) { /* visible to linker */
return internal_func();
}
gcc -c visibility.c -o visibility.o
nm visibility.o
0000000000000000 t internal_func
0000000000000014 T public_func
Lowercase t = local. Uppercase T = global.
In Rust, the equivalent is pub vs non-pub:
// visibility.rs
fn internal_func() -> i32 {
    42
}

pub fn public_func() -> i32 {
    internal_func()
}
Non-pub functions are not exported from the crate. When generating a C-ABI
library, only #[no_mangle] pub extern "C" functions appear as global symbols.
Knowledge Check
- What does the preprocessor do with #include <stdio.h>? What does the resulting .i file contain?
- An object file contains a call to printf but its address is all zeros. What mechanism resolves this to the real address?
- In nm output, what is the difference between T and U?
Common Pitfalls
- Forgetting to link all object files. If main.o calls add defined in arith.o, you must pass both to the linker.
- Confusing compilation errors with linker errors. "undefined reference" is a linker error, not a compiler error. The code compiled fine; the symbol is just missing at link time.
- Assuming identical assembly from C and Rust. Different compilers (GCC vs Clang/LLVM) make different optimization choices. Close does not mean identical.
- Ignoring static visibility. A static function in one .c file cannot be called from another. This is intentional encapsulation, not a bug.
- Stripping debug binaries during development. Keep symbols during development; strip only for release.
Make, CMake, and Cargo
No serious project compiles files by hand. Build systems track dependencies, recompile only what changed, and manage flags across platforms. This chapter covers the three build tools you will encounter most: Make for C, CMake for portable C/C++, and Cargo for Rust.
Make: The Foundation
Make reads a Makefile and builds targets based on dependency rules. The
core idea is simple: if a target is older than its dependencies, run the
recipe to rebuild it.
Anatomy of a Rule
target: dependencies
recipe
The recipe line must start with a tab character, not spaces.
A Minimal Makefile
Given this project structure:
project/
main.c
mathlib.c
mathlib.h
Makefile
/* mathlib.h */
#ifndef MATHLIB_H
#define MATHLIB_H
int add(int a, int b);
int multiply(int a, int b);
#endif
/* mathlib.c */
#include "mathlib.h"
int add(int a, int b) {
return a + b;
}
int multiply(int a, int b) {
return a * b;
}
/* main.c */
#include <stdio.h>
#include "mathlib.h"
int main(void) {
printf("add(3,4) = %d\n", add(3, 4));
printf("multiply(3,4) = %d\n", multiply(3, 4));
return 0;
}
# Makefile
CC = gcc
CFLAGS = -Wall -Wextra -std=c11
LDFLAGS =
SRCS = main.c mathlib.c
OBJS = $(SRCS:.c=.o)
TARGET = calculator
.PHONY: all clean
all: $(TARGET)
$(TARGET): $(OBJS)
$(CC) $(LDFLAGS) -o $@ $^
%.o: %.c mathlib.h
$(CC) $(CFLAGS) -c -o $@ $<
clean:
rm -f $(OBJS) $(TARGET)
How it works:
- $@ is the target name.
- $^ is all dependencies.
- $< is the first dependency.
- %.o: %.c is a pattern rule: any .o depends on its corresponding .c.
- .PHONY tells Make that all and clean are not real files.
make # builds calculator
make clean # removes build artifacts
Variables and Overrides
Override variables from the command line:
make CC=clang CFLAGS="-Wall -O2"
Automatic Dependency Generation
Manually listing header dependencies is fragile. Use GCC's -MMD flag:
# Makefile with auto-deps
CC = gcc
CFLAGS = -Wall -Wextra -std=c11 -MMD -MP
LDFLAGS =
SRCS = main.c mathlib.c
OBJS = $(SRCS:.c=.o)
DEPS = $(OBJS:.o=.d)
TARGET = calculator
.PHONY: all clean
all: $(TARGET)
$(TARGET): $(OBJS)
$(CC) $(LDFLAGS) -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $<
-include $(DEPS)
clean:
rm -f $(OBJS) $(DEPS) $(TARGET)
-MMD generates .d files listing each .c file's header dependencies.
-include $(DEPS) pulls them in silently (the - suppresses errors on
first build when .d files do not exist).
Try It: Add a new file utils.c / utils.h to the project. Update the SRCS variable and verify that make rebuilds correctly when you modify utils.h.
CMake: Portable Build Generation
Make works well for single-platform projects, but CMake generates build files for Make, Ninja, Visual Studio, Xcode, and more. CMake is the standard for cross-platform C and C++ projects.
CMakeLists.txt Basics
# CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(Calculator VERSION 1.0 LANGUAGES C)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
add_executable(calculator
main.c
mathlib.c
)
target_compile_options(calculator PRIVATE -Wall -Wextra)
Out-of-Tree Build
CMake strongly recommends building outside the source directory:
mkdir build && cd build
cmake ..
make
./calculator
The source directory stays clean. All generated files live in build/.
Libraries in CMake
Split the math library into its own target:
# CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(Calculator VERSION 1.0 LANGUAGES C)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
# Build mathlib as a static library
add_library(mathlib STATIC mathlib.c)
target_include_directories(mathlib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
# Build the executable and link against mathlib
add_executable(calculator main.c)
target_link_libraries(calculator PRIVATE mathlib)
target_compile_options(calculator PRIVATE -Wall -Wextra)
Change STATIC to SHARED to build a shared library instead.
Finding External Libraries
find_package(Threads REQUIRED)
target_link_libraries(calculator PRIVATE Threads::Threads)
find_package(ZLIB REQUIRED)
target_link_libraries(calculator PRIVATE ZLIB::ZLIB)
find_package searches standard system paths and produces imported targets
you can link against.
CMake Build Types
cmake -DCMAKE_BUILD_TYPE=Debug .. # -g, no optimization
cmake -DCMAKE_BUILD_TYPE=Release .. # -O3, NDEBUG defined
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .. # -O2 -g
Try It: Create a CMake project with a SHARED library. Build it, then run ldd calculator to see that it links against the shared library.
Cargo: Rust's Build System and Package Manager
Cargo is to Rust what Make + a package manager is to C, but integrated into one tool. Every Rust project starts with:
cargo new myproject
cd myproject
This creates:
myproject/
Cargo.toml
src/
main.rs
Cargo.toml
[package]
name = "myproject"
version = "0.1.0"
edition = "2021"
[dependencies]
Add a dependency:
[dependencies]
serde = { version = "1.0", features = ["derive"] }
clap = "4"
Run:
cargo build # downloads deps, compiles everything
cargo run # build + run
cargo test # build + run tests
cargo check # type-check only, no codegen (fast)
A Complete Cargo Project
// src/mathlib.rs
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(3, 4), 7);
    }

    #[test]
    fn test_multiply() {
        assert_eq!(multiply(3, 4), 12);
    }
}
// src/main.rs
mod mathlib;

fn main() {
    println!("add(3,4) = {}", mathlib::add(3, 4));
    println!("multiply(3,4) = {}", mathlib::multiply(3, 4));
}
cargo run
cargo test
Build Profiles
Cargo has two primary built-in profiles, dev and release:
# These are the defaults -- you can override them in Cargo.toml
[profile.dev]
opt-level = 0
debug = true
overflow-checks = true
[profile.release]
opt-level = 3
debug = false
overflow-checks = false
lto = false
cargo build # uses dev profile
cargo build --release # uses release profile
The release binary goes to target/release/ instead of target/debug/.
Custom profiles are possible:
[profile.release-with-debug]
inherits = "release"
debug = true
cargo build --profile release-with-debug
Rust Note: Unlike C, where you pass -O2 or -g to the compiler directly, Cargo manages optimization and debug info through profiles. This centralizes build configuration and makes it reproducible.
Workspaces
Large Rust projects split into multiple crates within a workspace:
# Cargo.toml (workspace root)
[workspace]
members = [
"mathlib",
"calculator",
]
# mathlib/Cargo.toml
[package]
name = "mathlib"
version = "0.1.0"
edition = "2021"
# calculator/Cargo.toml
[package]
name = "calculator"
version = "0.1.0"
edition = "2021"
[dependencies]
mathlib = { path = "../mathlib" }
cargo build # builds all workspace members
cargo test -p mathlib # test only the mathlib crate
Features
Cargo features enable conditional compilation:
# mathlib/Cargo.toml
[package]
name = "mathlib"
version = "0.1.0"
edition = "2021"
[features]
default = []
advanced = []
// mathlib/src/lib.rs
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(feature = "advanced")]
pub fn power(base: i32, exp: u32) -> i32 {
    (0..exp).fold(1, |acc, _| acc * base)
}
cargo build # power() not compiled
cargo build --features advanced # power() included
Comparing the Three
+------------------+-----------+----------------+----------------+
| Feature | Make | CMake | Cargo |
+------------------+-----------+----------------+----------------+
| Config file | Makefile | CMakeLists.txt | Cargo.toml |
| Language | Make DSL | CMake DSL | TOML + Rust |
| Dep management | Manual | find_package | crates.io |
| Cross-platform | Weak | Strong | Strong |
| Incremental | File-time | File-time | Crate-internal |
| Parallel build | make -j | Inherited | Built-in |
+------------------+-----------+----------------+----------------+
Integrating C and Rust
Real projects often mix C and Rust. Three crates make this practical.
The cc Crate: Compiling C from Cargo
# Cargo.toml
[package]
name = "c-from-rust"
version = "0.1.0"
edition = "2021"
[build-dependencies]
cc = "1"
/* csrc/helper.c */
#include <stdint.h>
int32_t c_add(int32_t a, int32_t b) {
return a + b;
}
// build.rs
fn main() {
    cc::Build::new()
        .file("csrc/helper.c")
        .compile("helper");
}
// src/main.rs
extern "C" {
    fn c_add(a: i32, b: i32) -> i32;
}

fn main() {
    let result = unsafe { c_add(10, 20) };
    println!("c_add(10, 20) = {}", result);
}
cargo run
# c_add(10, 20) = 30
The cc crate compiles the C file, produces a static library, and tells
Cargo to link it. The build.rs script runs before compilation of the
main crate.
bindgen and cbindgen
Writing extern "C" blocks by hand is error-prone. The bindgen crate
reads a C header and auto-generates Rust FFI declarations. Add
bindgen = "0.70" to [build-dependencies], call
bindgen::Builder::default().header("mylib.h").generate() in build.rs,
and use include!(concat!(env!("OUT_DIR"), "/bindings.rs")) in your Rust
source. It handles structs, enums, typedefs, and function declarations.
The cbindgen crate does the reverse -- it reads Rust source with
#[no_mangle] pub extern "C" functions and generates a C header file
automatically. Add cbindgen = "0.27" to [build-dependencies] and call
cbindgen::generate(crate_dir) from build.rs. The generated header
contains proper C declarations matching your Rust exports.
Driver Prep: The Linux kernel build system uses a highly customized Kbuild system built on Make. Rust-for-Linux integrates with Kbuild to compile Rust kernel modules alongside C. Understanding both Make and Cargo is essential for this workflow.
Knowledge Check
- In a Makefile, what does $< expand to? What about $@ and $^?
- Why does CMake recommend out-of-tree builds?
- How does cargo build --release differ from cargo build in terms of optimization and debug info?
Common Pitfalls
- Spaces instead of tabs in Makefiles. Make requires literal tab characters for recipe lines. Many editors silently convert tabs to spaces.
- Forgetting -fPIC for shared libraries in CMake. Use set(CMAKE_POSITION_INDEPENDENT_CODE ON) or let CMake handle it with add_library(... SHARED ...).
- Not running cargo clean when switching profiles. Stale artifacts in target/ can cause confusing behavior.
- Linking order with static libraries in Make. The linker processes left to right. If main.o depends on libmath.a, write gcc main.o -lmath, not gcc -lmath main.o.
- Forgetting build.rs in cc/bindgen workflows. The build script must exist and be referenced correctly in Cargo.toml.
Static and Shared Libraries
Libraries let you package compiled code for reuse without distributing source. The distinction between static and shared libraries affects binary size, load time, memory usage, and update strategy. This chapter covers both, plus how to bridge C and Rust libraries across the language boundary.
Static Libraries (.a)
A static library is an archive of object files. At link time, the linker copies the needed object code directly into the final executable.
Creating a Static Library in C
/* vec2.h */
#ifndef VEC2_H
#define VEC2_H
typedef struct {
double x;
double y;
} Vec2;
Vec2 vec2_add(Vec2 a, Vec2 b);
Vec2 vec2_scale(Vec2 v, double s);
double vec2_dot(Vec2 a, Vec2 b);
#endif
/* vec2.c */
#include "vec2.h"
Vec2 vec2_add(Vec2 a, Vec2 b) {
return (Vec2){ a.x + b.x, a.y + b.y };
}
Vec2 vec2_scale(Vec2 v, double s) {
return (Vec2){ v.x * s, v.y * s };
}
double vec2_dot(Vec2 a, Vec2 b) {
return a.x * b.x + a.y * b.y;
}
Build the static library:
gcc -c -O2 vec2.c -o vec2.o
ar rcs libvec2.a vec2.o
- ar is the archiver.
- r inserts files into the archive (replacing if they exist).
- c creates the archive if it does not exist.
- s writes an index (equivalent to running ranlib).
Inspect it:
ar t libvec2.a # list contents
nm libvec2.a # list symbols
Linking Against a Static Library
/* main.c */
#include <stdio.h>
#include "vec2.h"
int main(void) {
Vec2 a = {1.0, 2.0};
Vec2 b = {3.0, 4.0};
Vec2 sum = vec2_add(a, b);
printf("sum = (%.1f, %.1f)\n", sum.x, sum.y);
double d = vec2_dot(a, b);
printf("dot = %.1f\n", d);
Vec2 scaled = vec2_scale(a, 3.0);
printf("scaled = (%.1f, %.1f)\n", scaled.x, scaled.y);
return 0;
}
gcc -O2 main.c -L. -lvec2 -o vectest
./vectest
- -L. tells the linker to search the current directory for libraries.
- -lvec2 tells it to look for libvec2.a (or libvec2.so).
The resulting binary is self-contained -- it does not need libvec2.a at
runtime.
+-------------------+ +-------------------+
| main.o | | libvec2.a |
| main() [T] | | vec2.o: |
| vec2_add [U] ---+------>| vec2_add [T] |
| vec2_dot [U] ---+------>| vec2_dot [T] |
+-------------------+ +-------------------+
\ /
\ /
v v
+---------------------------+
| vectest (executable) |
| main() |
| vec2_add() (copied in) |
| vec2_dot() (copied in) |
+---------------------------+
Caution: Static linking copies code into every executable that uses it. If ten programs link libvec2.a, each gets its own copy. Security patches to the library require recompiling all ten programs.
Shared Libraries (.so)
A shared library is loaded at runtime. Multiple programs can share a single copy in memory.
Creating a Shared Library
gcc -c -O2 -fPIC vec2.c -o vec2_pic.o
gcc -shared -o libvec2.so vec2_pic.o
- -fPIC generates position-independent code, required for shared libs.
- -shared tells the linker to produce a shared object.
Linking Against a Shared Library
gcc -O2 main.c -L. -lvec2 -o vectest_shared
But running it may fail:
./vectest_shared
# error: libvec2.so: cannot open shared object file
The dynamic linker does not search the current directory by default. Solutions:
# Option 1: Set LD_LIBRARY_PATH
LD_LIBRARY_PATH=. ./vectest_shared
# Option 2: Install to a system path
sudo cp libvec2.so /usr/local/lib/
sudo ldconfig
# Option 3: Embed the path at link time
gcc -O2 main.c -L. -lvec2 -Wl,-rpath,'$ORIGIN' -o vectest_shared
The -Wl,-rpath,'$ORIGIN' approach embeds a relative search path in the
binary itself. $ORIGIN expands to the directory containing the executable.
Runtime vs. Compile Time
+-----------------------+
| Compile/Link Time |
|-----------------------|
| gcc finds libvec2.so |
| records dependency |
| does NOT copy code |
+-----------------------+
|
v
+-----------------------+
| Runtime |
|-----------------------|
| ld.so loads .so |
| maps into memory |
| resolves symbols |
+-----------------------+
Check what shared libraries an executable needs:
ldd vectest_shared
Soname Versioning
Shared libraries use a versioning scheme:
libvec2.so.1.2.3 # real name (major.minor.patch)
libvec2.so.1 # soname (major version)
libvec2.so # linker name (symlink)
gcc -shared -Wl,-soname,libvec2.so.1 -o libvec2.so.1.0.0 vec2_pic.o
ln -s libvec2.so.1.0.0 libvec2.so.1
ln -s libvec2.so.1 libvec2.so
The executable records the soname (libvec2.so.1), not the full version.
This means you can update libvec2.so.1.0.0 to libvec2.so.1.1.0 without
relinking executables, as long as the ABI is compatible.
readelf -d vectest_shared | grep NEEDED
# 0x0000000000000001 (NEEDED) Shared library: [libvec2.so.1]
The ldconfig command manages the soname symlinks system-wide. Run
sudo ldconfig after installing a new library to update the cache.
dlopen / dlsym: Runtime Loading
Sometimes you need to load a library at runtime -- for plugins, optional features, or late binding.
Define a plugin with a clean ABI:
/* plugin_api.h */
#ifndef PLUGIN_API_H
#define PLUGIN_API_H
int plugin_init(void);
int plugin_process(int input);
void plugin_cleanup(void);
#endif
/* my_plugin.c */
#include <stdio.h>
#include "plugin_api.h"
int plugin_init(void) {
printf("[plugin] initialized\n");
return 0;
}
int plugin_process(int input) {
return input * 3 + 1;
}
void plugin_cleanup(void) {
printf("[plugin] cleaned up\n");
}
gcc -shared -fPIC -o my_plugin.so my_plugin.c
/* host.c */
#include <stdio.h>
#include <dlfcn.h>
typedef int (*init_fn)(void);
typedef int (*process_fn)(int);
typedef void (*cleanup_fn)(void);
int main(int argc, char *argv[]) {
const char *plugin_path = (argc > 1) ? argv[1] : "./my_plugin.so";
void *handle = dlopen(plugin_path, RTLD_LAZY);
if (!handle) {
fprintf(stderr, "dlopen: %s\n", dlerror());
return 1;
}
init_fn init = (init_fn)dlsym(handle, "plugin_init");
process_fn process = (process_fn)dlsym(handle, "plugin_process");
cleanup_fn cleanup = (cleanup_fn)dlsym(handle, "plugin_cleanup");
if (!init || !process || !cleanup) {
fprintf(stderr, "dlsym: %s\n", dlerror());
dlclose(handle);
return 1;
}
init();
printf("process(10) = %d\n", process(10));
cleanup();
dlclose(handle);
return 0;
}
gcc -o host host.c -ldl
./host ./my_plugin.so
Output:
[plugin] initialized
process(10) = 31
[plugin] cleaned up
Link with -ldl to get dlopen / dlsym / dlclose.
Driver Prep: The Linux kernel's module system is conceptually similar to dlopen. When you run insmod mydriver.ko, the kernel loads the module's ELF object, resolves symbols against the kernel's exported symbol table, and calls the module's init function.
Rust Library Types
Rust supports several library output types, configured in Cargo.toml:
[lib]
crate-type = ["rlib"] # default: Rust-native library
# crate-type = ["staticlib"] # C-compatible static library (.a)
# crate-type = ["cdylib"] # C-compatible shared library (.so)
# crate-type = ["dylib"] # Rust-native shared library
| Type | File | Use case |
|---|---|---|
rlib | .rlib | Dependency for other Rust crates |
staticlib | .a | Link into a C/C++ program |
cdylib | .so | Shared lib callable from C |
dylib | .so | Shared lib for other Rust code |
Building a Rust Static Library for C
# Cargo.toml
[package]
name = "rustmath"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["staticlib"]
// src/lib.rs
use std::os::raw::c_int;

#[no_mangle]
pub extern "C" fn rust_add(a: c_int, b: c_int) -> c_int {
    a + b
}

#[no_mangle]
pub extern "C" fn rust_factorial(n: c_int) -> c_int {
    if n <= 1 { 1 } else { n * rust_factorial(n - 1) }
}
cargo build --release
ls target/release/librustmath.a
Now call it from C:
/* use_rustlib.c */
#include <stdio.h>
#include <stdint.h>
/* Declarations matching Rust's extern "C" functions */
int32_t rust_add(int32_t a, int32_t b);
int32_t rust_factorial(int32_t n);
int main(void) {
printf("rust_add(10, 20) = %d\n", rust_add(10, 20));
printf("rust_factorial(6) = %d\n", rust_factorial(6));
return 0;
}
gcc -O2 use_rustlib.c -L target/release -lrustmath -lpthread -ldl -lm -o use_rustlib
./use_rustlib
The extra -lpthread -ldl -lm flags are needed because Rust's standard
library depends on them.
Rust Note: When producing a staticlib, Rust statically links its own standard library into the .a file. This makes the archive self-contained but larger. A cdylib likewise bundles the standard library into the .so by default; only a Rust-native dylib links the standard library dynamically.
Building a Rust Shared Library for C
Change the crate type to ["cdylib"] and rebuild. This produces a
librustmath.so that can be dynamically linked from C the same way.
Writing a C Library Callable from Rust
The reverse direction: wrap an existing C library for use in Rust.
/* cstack.h */
#ifndef CSTACK_H
#define CSTACK_H
#include <stdint.h>
#include <stdbool.h>
#define STACK_CAPACITY 64
typedef struct {
int32_t data[STACK_CAPACITY];
int32_t top;
} Stack;
void stack_init(Stack *s);
bool stack_push(Stack *s, int32_t value);
bool stack_pop(Stack *s, int32_t *out);
int32_t stack_size(const Stack *s);
#endif
/* cstack.c */
#include "cstack.h"
void stack_init(Stack *s) {
s->top = -1;
}
bool stack_push(Stack *s, int32_t value) {
if (s->top >= STACK_CAPACITY - 1) return false;
s->data[++(s->top)] = value;
return true;
}
bool stack_pop(Stack *s, int32_t *out) {
if (s->top < 0) return false;
*out = s->data[(s->top)--];
return true;
}
int32_t stack_size(const Stack *s) {
return s->top + 1;
}
Use it from Rust with the cc crate:
# Cargo.toml
[package]
name = "use-cstack"
version = "0.1.0"
edition = "2021"
[build-dependencies]
cc = "1"
// build.rs
fn main() {
    cc::Build::new()
        .file("cstack.c")
        .compile("cstack");
}
// src/main.rs
use std::os::raw::c_int;

const STACK_CAPACITY: usize = 64;

#[repr(C)]
struct Stack {
    data: [c_int; STACK_CAPACITY],
    top: c_int,
}

extern "C" {
    fn stack_init(s: *mut Stack);
    fn stack_push(s: *mut Stack, value: c_int) -> bool;
    fn stack_pop(s: *mut Stack, out: *mut c_int) -> bool;
    fn stack_size(s: *const Stack) -> c_int;
}

fn main() {
    unsafe {
        let mut s = std::mem::MaybeUninit::<Stack>::uninit();
        stack_init(s.as_mut_ptr());
        let mut s = s.assume_init();

        stack_push(&mut s, 10);
        stack_push(&mut s, 20);
        stack_push(&mut s, 30);
        println!("size = {}", stack_size(&s));

        let mut val: c_int = 0;
        while stack_pop(&mut s, &mut val) {
            println!("popped: {}", val);
        }
    }
}
cargo run
Output:
size = 3
popped: 30
popped: 20
popped: 10
Caution: When defining #[repr(C)] structs in Rust to match C structs, you must get the field order, types, and sizes exactly right. A mismatch causes silent memory corruption. Use bindgen to generate these automatically for anything non-trivial.
ABI Compatibility
ABI (Application Binary Interface) defines how functions pass arguments, return values, and lay out structs at the machine level. On x86-64 Linux, the System V AMD64 ABI passes the first six integer arguments in registers RDI, RSI, RDX, RCX, R8, R9. Return values go in RAX.
+--------+--------+--------+--------+--------+--------+
| Arg 1 | Arg 2 | Arg 3 | Arg 4 | Arg 5 | Arg 6 |
| RDI | RSI | RDX | RCX | R8 | R9 |
+--------+--------+--------+--------+--------+--------+
| Remaining args go on the stack, right to left |
+------------------------------------------------------+
When Rust uses extern "C", it follows this exact convention.
Rust Note: Rust's native ABI is not stable and can change between compiler versions. Always use extern "C" when crossing language boundaries. The #[no_mangle] attribute prevents Rust from mangling the symbol name, making it findable by C code.
Knowledge Check
- What is the difference between ar rcs libfoo.a foo.o and gcc -shared -o libfoo.so foo.o?
- An executable built against libvec2.so.1 fails to run after you update the library. What might have changed?
- Why must you compile with -fPIC before creating a shared library?
Common Pitfalls
- Forgetting -fPIC. Without position-independent code, the shared library cannot be loaded at arbitrary addresses. The linker will error.
- Library search order confusion. The linker prefers .so over .a when both exist. Use -static or pass the .a path directly to force static linking.
- Missing transitive dependencies. If libA.so depends on libB.so, you may need to link both explicitly. Use pkg-config or CMake's target_link_libraries to manage this.
- Forgetting -ldl for dlopen. On glibc systems, dlopen and dlsym live in libdl. Link with -ldl.
- ABI mismatch between C and Rust structs. If you define a struct in both languages, the layout must match exactly. Use #[repr(C)] in Rust and verify with offsetof / std::mem::offset_of!.
- Stripping symbols from a shared library. Stripping all symbols from a .so makes it useless. Use strip --strip-unneeded to keep only the dynamic symbols.
Cross-Compilation and Targets
Cross-compilation means building code on one machine (the host) to run on a different machine (the target). You compile on your x86-64 laptop and produce a binary for an ARM Raspberry Pi, a RISC-V board, or an embedded microcontroller. This is essential for embedded systems, driver development, and any scenario where the target cannot compile its own code.
Why Cross-Compile?
The target machine may be too slow, too resource-constrained, or not yet booted. You cannot compile a kernel for a board that has no operating system running. Embedded ARM devices, IoT sensors, and custom hardware all require cross-compilation from a development workstation.
+---------------------+ +---------------------+
| Host (x86-64) | | Target (aarch64) |
| - gcc / rustc | build | - no compiler |
| - full OS | -------> | - runs the binary |
| - cross-toolchain | | - limited resources|
+---------------------+ +---------------------+
The Target Triple
Both GCC and LLVM/Rust use a target triple (sometimes a quadruple) to identify the target platform:
<arch>-<vendor>-<os>-<abi>
Examples:
x86_64-unknown-linux-gnu Your typical desktop Linux
aarch64-unknown-linux-gnu 64-bit ARM Linux
arm-unknown-linux-gnueabihf 32-bit ARM Linux, hard float
riscv64gc-unknown-linux-gnu 64-bit RISC-V Linux
x86_64-unknown-linux-musl x86-64 Linux with musl libc
aarch64-unknown-none Bare-metal ARM (no OS)
thumbv7em-none-eabihf ARM Cortex-M4/M7, no OS
Each component:
| Field | Meaning |
|---|---|
| `arch` | CPU architecture (`x86_64`, `aarch64`, `arm`, `riscv64`) |
| `vendor` | Who made it (`unknown`, `apple`, `pc`) |
| `os` | Operating system (`linux`, `windows`, `none`) |
| `abi` | ABI / libc (`gnu`, `musl`, `eabi`, `eabihf`) |
The triple determines what instruction set the compiler emits, what system call conventions to use, and what C library to link against.
Cross-Compilation in C
Installing a Cross Toolchain
On Debian/Ubuntu, install cross-compilation tools:
sudo apt install gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu
This gives you aarch64-linux-gnu-gcc, aarch64-linux-gnu-ld,
aarch64-linux-gnu-objdump, and friends.
For 32-bit ARM:
sudo apt install gcc-arm-linux-gnueabihf
For RISC-V:
sudo apt install gcc-riscv64-linux-gnu
A Complete Cross-Compilation Example
/* hello_cross.c */
#include <stdio.h>
int main(void) {
printf("Hello from cross-compiled code!\n");
printf("sizeof(void*) = %zu\n", sizeof(void *));
printf("sizeof(long) = %zu\n", sizeof(long));
return 0;
}
Compile for aarch64:
aarch64-linux-gnu-gcc -O2 hello_cross.c -o hello_aarch64
Inspect the result:
file hello_aarch64
# hello_aarch64: ELF 64-bit LSB executable, ARM aarch64, ...
objdump -d hello_aarch64 | head -30
# You'll see ARM instructions, not x86
You cannot run it directly on x86-64:
./hello_aarch64
# bash: ./hello_aarch64: cannot execute binary file: Exec format error
But you can run it with QEMU user-mode emulation:
sudo apt install qemu-user qemu-user-static
qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello_aarch64
Output:
Hello from cross-compiled code!
sizeof(void*) = 8
sizeof(long) = 8
Try It: Install `gcc-arm-linux-gnueabihf` and cross-compile the same program for 32-bit ARM. Use `file` to confirm it is an ARM executable. Run it with `qemu-arm`. Check what `sizeof(void*)` reports -- it should be 4.
The Sysroot
A sysroot is a directory containing the target's headers and libraries. When
you install gcc-aarch64-linux-gnu, the sysroot is typically at
/usr/aarch64-linux-gnu/.
/usr/aarch64-linux-gnu/
include/ # target's C headers
lib/ # target's C library, crt*.o, etc.
The cross-compiler knows its sysroot. You can override it:
aarch64-linux-gnu-gcc --sysroot=/path/to/my/sysroot -O2 hello_cross.c -o hello
This is essential when building for custom Linux distributions or embedded systems with non-standard libraries.
Cross-compilation data flow:
hello_cross.c
|
v
aarch64-linux-gnu-gcc
|
+-- uses headers from /usr/aarch64-linux-gnu/include/
+-- links against /usr/aarch64-linux-gnu/lib/libc.so
|
v
hello_aarch64 (ELF for aarch64)
Cross-Compiling with a Makefile
Modify the Makefile to accept a CROSS_COMPILE prefix:
# Makefile
CROSS_COMPILE ?=
CC = $(CROSS_COMPILE)gcc
AR = $(CROSS_COMPILE)ar
STRIP = $(CROSS_COMPILE)strip
CFLAGS = -Wall -Wextra -O2
LDFLAGS =
SRCS = hello_cross.c
OBJS = $(SRCS:.c=.o)
TARGET = hello
.PHONY: all clean
all: $(TARGET)
$(TARGET): $(OBJS)
$(CC) $(LDFLAGS) -o $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $<
clean:
rm -f $(OBJS) $(TARGET)
make # native build
make CROSS_COMPILE=aarch64-linux-gnu- # cross-compile for ARM64
make CROSS_COMPILE=arm-linux-gnueabihf- # cross-compile for ARM32
Driver Prep: The Linux kernel uses exactly this pattern. The kernel Makefile accepts `CROSS_COMPILE` and `ARCH` variables: `make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-`. This is how you build a kernel for a Raspberry Pi on your laptop.
Cross-Compilation in Rust
Rust makes cross-compilation substantially easier than C. The Rust compiler
uses LLVM, which can emit code for many targets from a single compiler binary.
You do not need a separate rustc for each target.
Adding a Target
List installed targets:
rustup target list --installed
Add a new target:
rustup target add aarch64-unknown-linux-gnu
rustup target add arm-unknown-linux-gnueabihf
rustup target add x86_64-unknown-linux-musl
A Simple Cross-Compile
// src/main.rs
fn main() {
    println!("Hello from Rust cross-compilation!");
    println!("Target arch: {}", std::env::consts::ARCH);
    println!("Target OS: {}", std::env::consts::OS);
    println!("Pointer size: {} bytes", std::mem::size_of::<*const u8>());
}
cargo build --target aarch64-unknown-linux-gnu
This compiles Rust code for aarch64 but fails at linking because Cargo does not know where the aarch64 linker is. You need to tell it.
Configuring the Linker
Create or edit .cargo/config.toml:
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"
[target.arm-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc"
[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc"
Now:
cargo build --target aarch64-unknown-linux-gnu
file target/aarch64-unknown-linux-gnu/debug/myproject
# ELF 64-bit LSB pie executable, ARM aarch64, ...
Run with QEMU:
qemu-aarch64 -L /usr/aarch64-linux-gnu target/aarch64-unknown-linux-gnu/debug/myproject
Output:
Hello from Rust cross-compilation!
Target arch: aarch64
Target OS: linux
Pointer size: 8 bytes
Static Linking with musl
For maximum portability, link statically against musl libc. The resulting binary has zero runtime dependencies:
rustup target add x86_64-unknown-linux-musl
cargo build --target x86_64-unknown-linux-musl --release
file target/x86_64-unknown-linux-musl/release/myproject
# ELF 64-bit LSB executable, x86-64, statically linked, ...
ldd target/x86_64-unknown-linux-musl/release/myproject
# not a dynamic executable
This binary runs on any x86-64 Linux system regardless of the installed glibc version.
Rust Note: Rust's musl target produces fully static binaries by default. In C, achieving the same requires `musl-gcc` or `musl-cross-make` and careful management of all dependencies. Rust makes this trivial.
Conditional Compilation for Target Architecture
// src/main.rs
fn main() {
    #[cfg(target_arch = "x86_64")]
    println!("Running on x86-64");
    #[cfg(target_arch = "aarch64")]
    println!("Running on ARM64");
    #[cfg(target_arch = "arm")]
    println!("Running on 32-bit ARM");

    #[cfg(target_os = "linux")]
    println!("Operating system: Linux");
    #[cfg(target_os = "none")]
    println!("No OS (bare metal)");

    #[cfg(target_pointer_width = "64")]
    println!("64-bit pointers");
    #[cfg(target_pointer_width = "32")]
    println!("32-bit pointers");
}
The C equivalent uses preprocessor macros:
/* arch_detect.c */
#include <stdio.h>
int main(void) {
#if defined(__x86_64__)
printf("Running on x86-64\n");
#elif defined(__aarch64__)
printf("Running on ARM64\n");
#elif defined(__arm__)
printf("Running on 32-bit ARM\n");
#elif defined(__riscv)
printf("Running on RISC-V\n");
#else
printf("Unknown architecture\n");
#endif
#if defined(__linux__)
printf("Operating system: Linux\n");
#endif
printf("Pointer size: %zu bytes\n", sizeof(void *));
return 0;
}
gcc arch_detect.c -o arch_native
./arch_native
# Running on x86-64
# Operating system: Linux
# Pointer size: 8 bytes
aarch64-linux-gnu-gcc arch_detect.c -o arch_arm64
qemu-aarch64 -L /usr/aarch64-linux-gnu ./arch_arm64
# Running on ARM64
# Operating system: Linux
# Pointer size: 8 bytes
Cross-Compiling a C Library for ARM
Cross-compile a static library for aarch64 using the same tools:
aarch64-linux-gnu-gcc -c -O2 sensor.c -o sensor_arm64.o
aarch64-linux-gnu-ar rcs libsensor_arm64.a sensor_arm64.o
file sensor_arm64.o
# sensor_arm64.o: ELF 64-bit LSB relocatable, ARM aarch64
When using the cc crate in a Rust project, it automatically detects the
Cargo target triple and invokes the correct cross-compiler. If you run
cargo build --target aarch64-unknown-linux-gnu, the cc crate calls
aarch64-linux-gnu-gcc instead of gcc.
Caution: Struct layout across architectures can differ. Fields may have different alignment requirements on ARM vs x86. Always use `#[repr(C)]` in Rust and fixed-width types (`int16_t`, `uint32_t`) in C to ensure consistent layout across platforms.
Checking Available Targets
GCC
GCC cross-compilers are separate binaries. List what is installed:
ls /usr/bin/*-gcc 2>/dev/null
# /usr/bin/aarch64-linux-gnu-gcc
# /usr/bin/arm-linux-gnueabihf-gcc
# /usr/bin/riscv64-linux-gnu-gcc
Rust
Rust shows all supported targets:
rustc --print target-list | wc -l
# Over 200 targets
rustc --print target-list | grep linux
# aarch64-unknown-linux-gnu
# arm-unknown-linux-gnueabihf
# riscv64gc-unknown-linux-gnu
# x86_64-unknown-linux-gnu
# x86_64-unknown-linux-musl
# ... many more
Get detailed target info:
rustc --print cfg --target aarch64-unknown-linux-gnu
This prints all cfg attributes that are true for that target, which
determines what code #[cfg(...)] includes or excludes.
Bare-Metal Cross-Compilation
For embedded targets with no OS, the approach changes. There is no libc,
no printf, no standard file I/O.
C for Bare Metal
/* bare.c -- for a bare-metal ARM target */
#include <stdint.h>
/* Memory-mapped UART register (hypothetical) */
#define UART0_DR (*(volatile uint32_t *)0x09000000)
void uart_putc(char c) {
UART0_DR = (uint32_t)c;
}
void uart_puts(const char *s) {
while (*s) {
uart_putc(*s++);
}
}
void _start(void) {
uart_puts("Hello, bare metal!\n");
while (1) {} /* hang */
}
aarch64-linux-gnu-gcc -ffreestanding -nostdlib -T linker.ld bare.c -o bare.elf
- `-ffreestanding` tells the compiler not to assume a hosted environment.
- `-nostdlib` tells the linker not to link the standard library.
- `-T linker.ld` provides a custom linker script.
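The `-T linker.ld` flag expects a script you supply. A minimal sketch of what such a script might look like -- the load address `0x40000000` and section layout here are illustrative assumptions, not values for any specific board:

```ld
ENTRY(_start)

SECTIONS
{
    . = 0x40000000;           /* assumed load address -- board-specific */
    .text   : { *(.text*) }
    .rodata : { *(.rodata*) }
    .data   : { *(.data*) }
    .bss    : { *(.bss*) *(COMMON) }
}
```

Real boards document their memory map; the linker script must match it exactly or the binary will load at the wrong address.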
Rust for Bare Metal
// src/main.rs
#![no_std]
#![no_main]

use core::panic::PanicInfo;

const UART0_DR: *mut u32 = 0x0900_0000 as *mut u32;

fn uart_putc(c: u8) {
    unsafe {
        core::ptr::write_volatile(UART0_DR, c as u32);
    }
}

fn uart_puts(s: &str) {
    for b in s.bytes() {
        uart_putc(b);
    }
}

#[no_mangle]
pub extern "C" fn _start() -> ! {
    uart_puts("Hello from bare-metal Rust!\n");
    loop {}
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
cargo build --target aarch64-unknown-none
The aarch64-unknown-none target means: aarch64 architecture, no vendor,
no operating system. Rust's core library is available (basic types,
iterators, Option, Result), but std is not (no heap, no file I/O,
no threads).
Driver Prep: Kernel modules operate in a similar environment to bare metal. There is no standard library, no heap by default, and you interact with hardware through memory-mapped registers. Cross-compilation to `aarch64-unknown-linux-gnu` is how you build kernel modules for ARM64 boards from your x86 workstation.
Summary Diagram: Cross-Compilation Workflow
Development Machine (x86-64)
+--------------------------------------------------+
| |
| Source Code (.c / .rs) |
| | |
| v |
| Cross-Compiler |
| (aarch64-linux-gnu-gcc / rustc --target ...) |
| | |
| +-- Sysroot: target headers + libs |
| | |
| v |
| Cross-Compiled Binary (ELF aarch64) |
| | |
+-------+------------------------------------------+
|
| scp / flash / JTAG / TFTP
v
Target Machine (aarch64)
+--------------------------------------------------+
| |
| Runs the binary natively |
| |
+--------------------------------------------------+
Knowledge Check
- What does the "gnu" part of `aarch64-unknown-linux-gnu` specify? What would "musl" mean instead?
- You cross-compile a Rust program for `aarch64-unknown-linux-gnu` but linking fails. What is the most likely missing piece?
- Why can you not just copy a dynamically-linked x86-64 binary to an aarch64 machine and run it?
Common Pitfalls
- Forgetting to install the cross-linker for Rust. `rustc` can emit aarch64 code, but it needs `aarch64-linux-gnu-gcc` (or equivalent) to link. Configure this in `.cargo/config.toml`.
- Mixing host and target libraries. If your Makefile picks up `/usr/lib` instead of the sysroot's `lib/`, you get x86 libraries linked into an ARM binary. The result may link but will crash at runtime.
- Assuming identical struct layout across targets. Padding and alignment differ between 32-bit and 64-bit architectures. Use fixed-width types and `#pragma pack` or `#[repr(C, packed)]` when layout must be exact.
- Not testing with QEMU. Before deploying to real hardware, test cross-compiled binaries with `qemu-user`. It catches most issues without needing the physical device.
- Forgetting endianness in wire protocols. If you serialize a struct to bytes on one architecture and deserialize on another, byte order mismatches will corrupt every multi-byte field.
File Descriptors
On Linux, everything is a file. A regular file, a terminal, a pipe, a network socket, even a device -- they are all accessed through the same interface: the file descriptor. This chapter shows you that interface from both sides of the C/Rust divide.
The File Descriptor Table
Every process has a small integer table managed by the kernel. Each entry points
to an open file description (an in-kernel structure). When you call open(),
the kernel picks the lowest available integer, fills the slot, and returns that
integer to you.
Process File-Descriptor Table          Kernel Open-File Descriptions
+-----+------------------+             +----------------------------------+
|  0  | ──────────────────────────────> | struct file (terminal /dev/pts/0)|
+-----+------------------+             +----------------------------------+
|  1  | ──────────────────────────────> | struct file (terminal /dev/pts/0)|
+-----+------------------+             +----------------------------------+
|  2  | ──────────────────────────────> | struct file (terminal /dev/pts/0)|
+-----+------------------+             +----------------------------------+
|  3  | ──────────────────────────────> | struct file (/tmp/data.txt)      |
+-----+------------------+             +----------------------------------+
|  4  | (unused)         |
+-----+------------------+
| ... |                  |
+-----+------------------+
File descriptors 0, 1, and 2 are pre-opened by the shell before your program starts:
| fd | Symbolic Name | C Macro | Purpose |
|---|---|---|---|
| 0 | standard input | STDIN_FILENO | keyboard / pipe |
| 1 | standard output | STDOUT_FILENO | terminal / pipe |
| 2 | standard error | STDERR_FILENO | terminal / pipe |
Driver Prep: In kernel modules you will work with `struct file` directly. Understanding the user-space side now makes the kernel side feel familiar.
Opening a File in C
/* open_file.c -- open, write, read, close */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
int main(void)
{
const char *path = "/tmp/fd_demo.txt";
/* O_WRONLY -- write only
O_CREAT -- create if missing
O_TRUNC -- truncate to zero length if exists
0644 -- rw-r--r-- permissions */
int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd == -1) {
perror("open");
return 1;
}
printf("opened %s as fd %d\n", path, fd);
const char *msg = "hello from file descriptor land\n";
ssize_t nw = write(fd, msg, strlen(msg));
if (nw == -1) {
perror("write");
close(fd);
return 1;
}
printf("wrote %zd bytes\n", nw);
if (close(fd) == -1) {
perror("close");
return 1;
}
/* Now reopen for reading */
fd = open(path, O_RDONLY);
if (fd == -1) {
perror("open (read)");
return 1;
}
char buf[128];
ssize_t nr = read(fd, buf, sizeof(buf) - 1);
if (nr == -1) {
perror("read");
close(fd);
return 1;
}
buf[nr] = '\0';
printf("read back: %s", buf);
close(fd);
return 0;
}
Compile and run:
$ gcc -Wall -o open_file open_file.c && ./open_file
opened /tmp/fd_demo.txt as fd 3
wrote 32 bytes
read back: hello from file descriptor land
Notice the fd is 3 -- the first slot after stdin/stdout/stderr.
Caution: Always check the return value of `open()`. A return of `-1` means failure, and `errno` tells you why. Forgetting this check is the single most common file-handling bug in C.
The Open Flags
Here are the flags you will use constantly:
| Flag | Meaning |
|---|---|
| `O_RDONLY` | Open for reading only |
| `O_WRONLY` | Open for writing only |
| `O_RDWR` | Open for reading and writing |
| `O_CREAT` | Create the file if it does not exist |
| `O_TRUNC` | Truncate existing file to zero length |
| `O_APPEND` | Writes always go to end of file |
| `O_EXCL` | Fail if file already exists (with O_CREAT) |
| `O_CLOEXEC` | Close fd automatically on exec() |
`O_CREAT` requires a third argument to `open()` specifying the permission bits. Without it, the mode is garbage -- whatever value happens to sit where the missing variadic argument would have been.
Caution: Forgetting the mode argument when using `O_CREAT` is undefined behavior. The compiler will not warn you because `open()` uses variadic arguments.
Partial Reads and Writes
read() and write() are not guaranteed to transfer the full amount you
requested. A read() of 4096 bytes might return 17 if only 17 bytes are
available. A write() on a non-blocking socket might write half your buffer.
A robust write loop:
/* write_all.c -- handle partial writes */
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <stdio.h>
ssize_t write_all(int fd, const void *buf, size_t count)
{
const char *p = buf;
size_t remaining = count;
while (remaining > 0) {
ssize_t n = write(fd, p, remaining);
if (n == -1) {
if (errno == EINTR)
continue; /* interrupted by signal, retry */
return -1;
}
p += n;
remaining -= (size_t)n;
}
return (ssize_t)count;
}
int main(void)
{
int fd = open("/tmp/robust.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd == -1) { perror("open"); return 1; }
const char *data = "robust write completed\n";
if (write_all(fd, data, strlen(data)) == -1) {
perror("write_all");
close(fd);
return 1;
}
close(fd);
printf("done\n");
return 0;
}
Try It: Modify `write_all` to also handle `EAGAIN` (would block on non-blocking descriptors). What should you do -- retry immediately or sleep?
The Rust Equivalent
Rust wraps file descriptors in std::fs::File. The Read and Write traits
provide read() and write(). Dropping a File closes it automatically.
// open_file.rs -- open, write, read, close via drop
use std::fs::{File, OpenOptions};
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    let path = "/tmp/fd_demo_rs.txt";

    // Create and write
    {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(path)?;
        let msg = b"hello from Rust file descriptor land\n";
        f.write_all(msg)?;
        println!("wrote {} bytes", msg.len());
    } // f is dropped here -- close() called automatically

    // Reopen and read
    {
        let mut f = File::open(path)?;
        let mut contents = String::new();
        f.read_to_string(&mut contents)?;
        print!("read back: {}", contents);
    }
    Ok(())
}
Compile and run:
$ rustc open_file.rs && ./open_file
wrote 37 bytes
read back: hello from Rust file descriptor land
Rust Note: `write_all()` already handles partial writes internally. You never need to write a retry loop in Rust -- the standard library does it for you. The `?` operator propagates errors cleanly.
Accessing the Raw File Descriptor in Rust
Sometimes you need the raw integer, for example when calling a Linux-specific ioctl. Rust provides traits for this:
// raw_fd.rs -- access the underlying file descriptor number
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let f = File::open("/tmp/fd_demo_rs.txt")?;
    let raw: i32 = f.as_raw_fd();
    println!("raw fd = {}", raw);
    // f still owns the fd -- it will close on drop
    Ok(())
}
$ rustc raw_fd.rs && ./raw_fd
raw fd = 3
There is also FromRawFd for wrapping an existing fd, and IntoRawFd for
giving up ownership. Use these when bridging C libraries.
use std::fs::File;
use std::os::unix::io::FromRawFd;

// SAFETY: fd must be a valid, open file descriptor that we now own.
let f = unsafe { File::from_raw_fd(raw_fd) };
Caution: `from_raw_fd` is `unsafe` because Rust cannot verify the fd is valid or that nobody else will close it. Double-close is undefined behavior at the OS level.
Duplicating File Descriptors: dup and dup2
dup(fd) duplicates a file descriptor, returning the lowest available number.
dup2(oldfd, newfd) duplicates oldfd onto newfd, closing newfd first if
it was open.
This is how shells implement redirection. ls > output.txt is roughly:
fd = open("output.txt", ...)
dup2(fd, STDOUT_FILENO) // stdout now points to output.txt
close(fd) // don't need the extra fd
exec("ls", ...)
/* dup_demo.c -- redirect stdout to a file */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
int fd = open("/tmp/dup_out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd == -1) { perror("open"); return 1; }
/* Save original stdout */
int saved_stdout = dup(STDOUT_FILENO);
if (saved_stdout == -1) { perror("dup"); return 1; }
/* Redirect stdout to the file */
if (dup2(fd, STDOUT_FILENO) == -1) { perror("dup2"); return 1; }
close(fd); /* fd is no longer needed; stdout points to the file */
/* This printf goes to /tmp/dup_out.txt */
printf("this line goes to the file\n");
fflush(stdout);
/* Restore stdout */
dup2(saved_stdout, STDOUT_FILENO);
close(saved_stdout);
/* This printf goes to the terminal */
printf("this line goes to the terminal\n");
return 0;
}
$ gcc -Wall -o dup_demo dup_demo.c && ./dup_demo
this line goes to the terminal
$ cat /tmp/dup_out.txt
this line goes to the file
Before dup2: After dup2(fd, 1): After close(fd):
fd 1 ──> terminal fd 1 ──> file fd 1 ──> file
fd 3 ──> file fd 3 ──> file fd 3 ── (closed)
dup2 in Rust
Rust has no safe wrapper for dup2 in the standard library. Use the libc
crate or the nix crate:
// dup2_demo.rs -- redirect stdout using libc::dup2
// Cargo.toml needs: libc = "0.2"
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open("/tmp/dup_out_rs.txt")?;

    let saved = unsafe { libc::dup(1) };
    if saved == -1 {
        return Err(std::io::Error::last_os_error());
    }
    if unsafe { libc::dup2(file.as_raw_fd(), 1) } == -1 {
        return Err(std::io::Error::last_os_error());
    }

    println!("this line goes to the file");

    unsafe {
        libc::dup2(saved, 1);
        libc::close(saved);
    }
    println!("this line goes to the terminal");
    Ok(())
}
Rust Note: The `nix` crate provides safe wrappers: `nix::unistd::dup2()`. For production code, prefer `nix` over raw `libc` calls.
Error Handling at Every Syscall
In C, every system call can fail. The pattern is always the same:
int result = some_syscall(...);
if (result == -1) {
perror("some_syscall");
// handle error: cleanup, return, exit
}
The global variable errno is set on failure. perror() prints a
human-readable message. strerror(errno) gives you the string directly.
Common errors:
| errno | Meaning |
|---|---|
| `ENOENT` | No such file or directory |
| `EACCES` | Permission denied |
| `EEXIST` | File already exists (with O_EXCL) |
| `EMFILE` | Too many open files (per-process) |
| `ENFILE` | Too many open files (system-wide) |
| `EINTR` | Interrupted by signal |
| `EBADF` | Bad file descriptor |
In Rust, std::io::Error wraps all of this. The kind() method maps to
ErrorKind variants, and raw_os_error() gives you the raw errno.
use std::fs::File;
use std::io::ErrorKind;

fn main() {
    match File::open("/nonexistent/path") {
        Ok(_) => println!("opened"),
        Err(e) => {
            println!("error kind: {:?}", e.kind());
            println!("os error: {:?}", e.raw_os_error());
            println!("message: {}", e);
            if e.kind() == ErrorKind::NotFound {
                println!("file does not exist");
            }
        }
    }
}
$ rustc error_demo.rs && ./error_demo
error kind: NotFound
os error: Some(2)
message: No such file or directory (os error 2)
file does not exist
O_CLOEXEC and File Descriptor Leaks
When you fork() and then exec(), all open file descriptors are inherited by
the child process unless they are marked close-on-exec. This is a common source
of file descriptor leaks and security bugs.
/* cloexec.c -- demonstrate O_CLOEXEC */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
/* Without O_CLOEXEC: fd leaks to child after exec */
int fd_leak = open("/tmp/fd_demo.txt", O_RDONLY);
/* With O_CLOEXEC: fd is automatically closed on exec */
int fd_safe = open("/tmp/fd_demo.txt", O_RDONLY | O_CLOEXEC);
printf("fd_leak = %d, fd_safe = %d\n", fd_leak, fd_safe);
/* In a fork+exec scenario, fd_leak would be visible to the child
process, but fd_safe would not. */
close(fd_leak);
close(fd_safe);
return 0;
}
Driver Prep: Kernel drivers deal with `struct file` directly. The `release` callback in `struct file_operations` is called when the last fd referring to a file is closed. Understanding reference counting of descriptors here prepares you for that.
lseek: Moving the File Offset
Every open file description has a current offset. read() and write()
advance it. lseek() repositions it.
/* lseek_demo.c -- seek within a file */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
int main(void)
{
int fd = open("/tmp/seek_demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
if (fd == -1) { perror("open"); return 1; }
write(fd, "ABCDEFGHIJ", 10);
/* Seek back to offset 3 */
off_t pos = lseek(fd, 3, SEEK_SET);
printf("position after lseek: %ld\n", (long)pos);
/* Overwrite from position 3 */
write(fd, "xyz", 3);
/* Seek to beginning and read everything */
lseek(fd, 0, SEEK_SET);
char buf[32];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("contents: %s\n", buf);
close(fd);
return 0;
}
$ gcc -Wall -o lseek_demo lseek_demo.c && ./lseek_demo
position after lseek: 3
contents: ABCxyzGHIJ
SEEK_SET -- offset from beginning. SEEK_CUR -- offset from current
position. SEEK_END -- offset from end of file.
In Rust, use the `Seek` trait:
// seek_demo.rs
use std::fs::OpenOptions;
use std::io::{Read, Write, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let mut f = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open("/tmp/seek_demo_rs.txt")?;
    f.write_all(b"ABCDEFGHIJ")?;

    // Seek to offset 3
    let pos = f.seek(SeekFrom::Start(3))?;
    println!("position after seek: {}", pos);
    f.write_all(b"xyz")?;

    // Seek to start and read
    f.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    f.read_to_string(&mut contents)?;
    println!("contents: {}", contents);
    Ok(())
}
Knowledge Check
- What file descriptor number does `open()` return if stdin, stdout, and stderr are all open and no other files are open?
- What happens if you call `open()` with `O_CREAT` but forget the third argument (the mode)?
- After `dup2(fd, STDOUT_FILENO)`, which file descriptor should you close -- `fd`, `STDOUT_FILENO`, or both?
Common Pitfalls
- Forgetting to check return values. Every syscall can fail. Every one.
- Forgetting the mode with `O_CREAT`. The file gets garbage permissions.
- Not handling partial reads/writes. `read()` returning less than requested is normal, not an error.
- Leaking file descriptors. Every `open()` must have a matching `close()`. In Rust, `Drop` handles this, but in C it is your job.
- Using fd after close. Just like use-after-free, using a closed fd is a bug. The number might be reassigned to a different file.
- Ignoring `EINTR`. Signals can interrupt blocking syscalls. Always retry on `EINTR`.
- Forgetting `O_CLOEXEC`. File descriptors leak across `exec()` by default. Always use `O_CLOEXEC` unless you specifically want inheritance.
Buffered vs Unbuffered I/O
Every write() system call crosses the user-kernel boundary. That crossing is
expensive -- hundreds of nanoseconds at minimum. Buffered I/O collects small
writes into a large buffer and flushes them in one syscall. This chapter shows
you both layers and when to use each.
The Two Layers
Your Program
|
v
+------------------------------+
| stdio (fopen, fprintf, ...) | <-- buffered (user-space)
| internal buffer: 4096+ bytes|
+------------------------------+
| fflush() or buffer full
v
+------------------------------+
| syscalls (open, write, ...) | <-- unbuffered (kernel boundary)
+------------------------------+
|
v
Kernel page cache / disk
The unbuffered layer (open, read, write, close) is what we covered in
Chapter 28. The buffered layer (fopen, fread, fwrite, fprintf,
fclose) wraps the unbuffered layer with a user-space buffer.
Buffered I/O in C: stdio
/* stdio_demo.c -- buffered I/O with fopen, fprintf, fclose */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE *fp = fopen("/tmp/buffered.txt", "w");
if (!fp) {
perror("fopen");
return 1;
}
/* fprintf writes to an internal buffer, not directly to disk */
fprintf(fp, "line one\n");
fprintf(fp, "line two\n");
fprintf(fp, "value: %d\n", 42);
/* fclose flushes the buffer and then calls close() */
if (fclose(fp) != 0) {
perror("fclose");
return 1;
}
/* Read it back */
fp = fopen("/tmp/buffered.txt", "r");
if (!fp) {
perror("fopen");
return 1;
}
char line[256];
while (fgets(line, sizeof(line), fp)) {
printf("read: %s", line);
}
fclose(fp);
return 0;
}
$ gcc -Wall -o stdio_demo stdio_demo.c && ./stdio_demo
read: line one
read: line two
read: value: 42
fread and fwrite
For binary data or bulk transfers, use fread and fwrite instead of
fprintf and fgets:
/* fread_fwrite.c -- binary I/O */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int data[] = {10, 20, 30, 40, 50};
size_t count = sizeof(data) / sizeof(data[0]);
/* Write binary data */
FILE *fp = fopen("/tmp/binary.dat", "wb");
if (!fp) { perror("fopen"); return 1; }
size_t written = fwrite(data, sizeof(int), count, fp);
printf("wrote %zu integers\n", written);
fclose(fp);
/* Read it back */
int buf[5] = {0};
fp = fopen("/tmp/binary.dat", "rb");
if (!fp) { perror("fopen"); return 1; }
size_t nread = fread(buf, sizeof(int), count, fp);
printf("read %zu integers:", nread);
for (size_t i = 0; i < nread; i++) {
printf(" %d", buf[i]);
}
printf("\n");
fclose(fp);
return 0;
}
$ gcc -Wall -o fread_fwrite fread_fwrite.c && ./fread_fwrite
wrote 5 integers
read 5 integers: 10 20 30 40 50
Caution: `fwrite` of raw structs is not portable across architectures due to endianness and padding differences. For files that must be portable, serialize field by field.
Buffer Modes: Full, Line, None
stdio supports three buffering modes, set with setvbuf():
| Mode | Constant | Behavior |
|---|---|---|
| Full buffering | _IOFBF | Flush when buffer is full |
| Line buffering | _IOLBF | Flush on newline or when full |
| No buffering | _IONBF | Every write goes to kernel immediately |
Default behavior:
- stderr is unbuffered (`_IONBF`) -- errors appear immediately
- stdout is line-buffered when connected to a terminal, full-buffered when connected to a pipe or file
- Files opened with `fopen` are full-buffered
/* setvbuf_demo.c -- control buffering mode */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
FILE *fp = fopen("/tmp/setvbuf_demo.txt", "w");
if (!fp) { perror("fopen"); return 1; }
/* Set line buffering with a 1024-byte buffer */
char mybuf[1024];
if (setvbuf(fp, mybuf, _IOLBF, sizeof(mybuf)) != 0) {
perror("setvbuf");
fclose(fp);
return 1;
}
fprintf(fp, "this flushes on newline\n"); /* flushed now */
fprintf(fp, "no newline yet..."); /* still in buffer */
fprintf(fp, " now!\n"); /* flushed now */
fclose(fp);
/* Verify */
fp = fopen("/tmp/setvbuf_demo.txt", "r");
char line[256];
while (fgets(line, sizeof(line), fp))
printf("%s", line);
fclose(fp);
return 0;
}
To disable buffering entirely:
setvbuf(fp, NULL, _IONBF, 0);
Try It: Write a program that prints to stdout without a newline, then sleeps for 3 seconds, then prints a newline. Run it piped to `cat` vs directly on the terminal. Observe the difference in when output appears.
When to Flush: fflush
fflush(fp) forces the buffer to be written to the kernel. Common situations
where you need it:
- Before a `fork()` -- otherwise the child inherits the buffer and you get double output
- Before reading from the same file you are writing to
- Before a crash-sensitive section -- data in the buffer is lost on crash
- Before switching between stdio and raw fd operations on the same file
/* fflush_demo.c -- explicit flush */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
printf("prompt: ");
fflush(stdout); /* force output before blocking on read */
char buf[64];
if (fgets(buf, sizeof(buf), stdin)) {
printf("you typed: %s", buf);
}
return 0;
}
fflush(NULL) flushes all open output streams. Useful before fork().
Caution: fflush(stdin) is undefined behavior in the C standard, even though some implementations (like glibc on Linux) define it to discard input. Do not rely on it.
The Rust Equivalent: BufReader and BufWriter
Rust separates buffering from the file type. You wrap any reader in
BufReader and any writer in BufWriter.
// buffered_io.rs -- BufWriter and BufReader
use std::fs::File;
use std::io::{BufWriter, BufReader, BufRead, Write};
fn main() -> std::io::Result<()> {
    let path = "/tmp/buffered_rs.txt";
    // Buffered writing
    {
        let file = File::create(path)?;
        let mut writer = BufWriter::new(file);
        writeln!(writer, "line one")?;
        writeln!(writer, "line two")?;
        writeln!(writer, "value: {}", 42)?;
        // BufWriter flushes on drop, but explicit flush
        // lets you catch errors
        writer.flush()?;
    }
    // Buffered reading
    {
        let file = File::open(path)?;
        let reader = BufReader::new(file);
        for line in reader.lines() {
            let line = line?;
            println!("read: {}", line);
        }
    }
    Ok(())
}
$ rustc buffered_io.rs && ./buffered_io
read: line one
read: line two
read: value: 42
Rust Note: BufWriter flushes its buffer when dropped. However, any error during that flush is silently ignored. Always call .flush() explicitly before dropping if you need to detect write failures.
The BufRead Trait
BufReader implements the BufRead trait, which gives you lines(),
read_line(), and read_until():
// bufread_demo.rs
use std::io::{self, BufRead};
fn main() {
    let stdin = io::stdin();
    let handle = stdin.lock(); // locked handle implements BufRead
    println!("Type lines (Ctrl-D to stop):");
    for (i, line) in handle.lines().enumerate() {
        match line {
            Ok(text) => println!("  line {}: {}", i + 1, text),
            Err(e) => {
                eprintln!("error: {}", e);
                break;
            }
        }
    }
}
The lines() iterator strips trailing newlines and yields io::Result<String>
for each line.
write! and writeln! Macros
Rust's write! and writeln! macros work on any type implementing the
Write trait -- not just stdout:
// write_macro.rs
use std::io::Write;
use std::fs::File;
fn main() -> std::io::Result<()> {
    let mut f = File::create("/tmp/write_macro.txt")?;
    write!(f, "no newline")?;
    writeln!(f, " -- now with newline")?;
    writeln!(f, "pi is approximately {:.4}", std::f64::consts::PI)?;
    // Also works with Vec<u8> as an in-memory buffer
    let mut buf: Vec<u8> = Vec::new();
    writeln!(buf, "hello into a vector")?;
    println!("buf contains: {:?}", String::from_utf8(buf).unwrap());
    Ok(())
}
$ rustc write_macro.rs && ./write_macro
buf contains: "hello into a vector\n"
Do Not Mix Buffered and Unbuffered I/O
Using both write() and fprintf() on the same file descriptor leads to
interleaved, corrupted output because the stdio buffer and the kernel see
different states.
/* bad_mix.c -- DO NOT DO THIS */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int main(void)
{
/* stdout is fd 1, and printf uses a buffer on fd 1 */
printf("buffered line"); /* sits in stdio buffer */
const char *msg = "unbuffered line\n";
write(1, msg, strlen(msg)); /* goes directly to kernel */
printf(" -- surprise!\n"); /* still in buffer, flushed later */
return 0;
}
$ gcc -Wall -o bad_mix bad_mix.c && ./bad_mix | cat
unbuffered line
buffered line -- surprise!
The unbuffered write() bypasses the stdio buffer and reaches the output
first. The buffered printf output appears later when the buffer flushes.
Caution: Never mix write()/read() and fprintf()/fread() on the same file descriptor. Pick one layer and stick with it.
Performance: Buffered vs Unbuffered
Let us measure the difference. Writing one million single-byte writes:
/* perf_test.c -- compare buffered vs unbuffered single-byte writes */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#define N 1000000
static double now(void)
{
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec + ts.tv_nsec * 1e-9;
}
int main(void)
{
/* Unbuffered: 1 million write() syscalls */
int fd = open("/tmp/perf_unbuf.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
double t0 = now();
for (int i = 0; i < N; i++) {
write(fd, "x", 1);
}
double t1 = now();
close(fd);
printf("unbuffered: %.3f seconds (%d syscalls)\n", t1 - t0, N);
/* Buffered: stdio batches into ~4096-byte chunks */
FILE *fp = fopen("/tmp/perf_buf.txt", "w");
t0 = now();
for (int i = 0; i < N; i++) {
fputc('x', fp);
}
fclose(fp);
t1 = now();
printf("buffered: %.3f seconds (~%d syscalls)\n", t1 - t0, N / 4096);
return 0;
}
$ gcc -O2 -Wall -o perf_test perf_test.c && ./perf_test
unbuffered: 1.247 seconds (1000000 syscalls)
buffered: 0.018 seconds (~244 syscalls)
The buffered version is roughly 70x faster. Each fputc copies one byte into
the stdio buffer. Only when the buffer fills (~4096 bytes) does a write()
syscall happen.
Unbuffered: write("x") write("x") write("x") ... (1,000,000 syscalls)
| | |
v v v
kernel kernel kernel
Buffered: fputc -> [....buffer fills....] -> write(4096 bytes) -> kernel
fputc -> [....buffer fills....] -> write(4096 bytes) -> kernel
... (~244 syscalls total)
Rust Performance Comparison
// perf_test.rs -- buffered vs unbuffered in Rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::time::Instant;
const N: usize = 1_000_000;
fn main() -> std::io::Result<()> {
    // Unbuffered: each write_all is a syscall
    {
        let mut f = File::create("/tmp/perf_unbuf_rs.txt")?;
        let t0 = Instant::now();
        for _ in 0..N {
            f.write_all(b"x")?;
        }
        let elapsed = t0.elapsed();
        println!("unbuffered: {:.3} seconds", elapsed.as_secs_f64());
    }
    // Buffered: BufWriter batches writes
    {
        let f = File::create("/tmp/perf_buf_rs.txt")?;
        let mut writer = BufWriter::new(f);
        let t0 = Instant::now();
        for _ in 0..N {
            writer.write_all(b"x")?;
        }
        writer.flush()?;
        let elapsed = t0.elapsed();
        println!("buffered: {:.3} seconds", elapsed.as_secs_f64());
    }
    Ok(())
}
Rust Note: File::write_all does not buffer -- each call goes directly to the kernel. Always wrap File in BufWriter when doing many small writes. This is one of the most common Rust I/O performance mistakes.
Custom Buffer Sizes
The default BufWriter buffer is 8 KiB. For large sequential writes (like
copying a multi-gigabyte file), a larger buffer can help:
use std::fs::File;
use std::io::BufWriter;

let f = File::create("/tmp/large_output.bin")?;
let writer = BufWriter::with_capacity(64 * 1024, f); // 64 KiB buffer
In C, setvbuf does the same:
FILE *fp = fopen("/tmp/large_output.bin", "w");
char *buf = malloc(64 * 1024);
setvbuf(fp, buf, _IOFBF, 64 * 1024);
/* ... use fp ... */
fclose(fp);
free(buf);
Driver Prep: In kernel space there is no stdio. Drivers use raw copy_to_user/copy_from_user for transferring data between kernel and user buffers. Understanding why buffering matters at the user level helps you design efficient kernel interfaces.
Quick Knowledge Check
- Why is stderr unbuffered by default?
- You call printf("hello") (no newline) and then your program crashes. Does "hello" appear on the terminal? What if stdout is connected to a pipe?
- In Rust, what happens if BufWriter::flush() is never called and the BufWriter is simply dropped?
Common Pitfalls
- Forgetting to flush before fork(). Both parent and child inherit the buffer contents. When both eventually flush, you get duplicate output.
- Assuming printf output appears immediately. It does on a terminal (line-buffered), but not when piped to another process (full-buffered).
- Using fflush(stdin). Undefined behavior in the C standard.
- Dropping BufWriter without an explicit flush(). The implicit flush on drop silently discards errors. Always flush explicitly when error handling matters.
- Using unbuffered I/O for many small writes. The syscall overhead dominates. Always buffer.
- Mixing buffered and unbuffered I/O on the same fd. Output arrives in unpredictable order.
- Not setting binary mode on Windows. On Windows, fopen without "b" translates \n to \r\n. On Linux this is not an issue, but portable code should use "wb"/"rb" for binary files.
File Metadata and Directories
Files are not just data blobs. The kernel stores metadata about every file: its size, owner, permissions, timestamps, and more. This chapter teaches you how to query and modify that metadata, and how to navigate the directory tree from both C and Rust.
The stat() Family
The stat() system call fills a struct stat with information about a file.
There are three variants:
| Function | Operates on | Follows symlinks? |
|---|---|---|
| stat() | a path (string) | Yes |
| lstat() | a path (string) | No |
| fstat() | an open fd (int) | N/A |
/* stat_demo.c -- query file metadata */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <pwd.h>
#include <grp.h>
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "usage: %s <path>\n", argv[0]);
return 1;
}
struct stat st;
if (stat(argv[1], &st) == -1) {
perror("stat");
return 1;
}
printf("file: %s\n", argv[1]);
printf("inode: %lu\n", (unsigned long)st.st_ino);
printf("size: %ld bytes\n", (long)st.st_size);
printf("blocks: %ld (512-byte units)\n", (long)st.st_blocks);
printf("hard links: %lu\n", (unsigned long)st.st_nlink);
struct passwd *pw = getpwuid(st.st_uid);
struct group *gr = getgrgid(st.st_gid);
printf("owner: %s (uid %u)\n", pw ? pw->pw_name : "?", (unsigned)st.st_uid);
printf("group: %s (gid %u)\n", gr ? gr->gr_name : "?", (unsigned)st.st_gid);
printf("permissions: %o\n", st.st_mode & 07777);
if (S_ISREG(st.st_mode)) printf("type: regular file\n");
else if (S_ISDIR(st.st_mode)) printf("type: directory\n");
else if (S_ISLNK(st.st_mode)) printf("type: symlink\n");
else if (S_ISCHR(st.st_mode)) printf("type: char device\n");
else if (S_ISBLK(st.st_mode)) printf("type: block device\n");
else if (S_ISFIFO(st.st_mode)) printf("type: FIFO/pipe\n");
else if (S_ISSOCK(st.st_mode)) printf("type: socket\n");
char timebuf[64];
struct tm *tm = localtime(&st.st_mtime);
strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S", tm);
printf("modified: %s\n", timebuf);
return 0;
}
$ gcc -Wall -o stat_demo stat_demo.c && ./stat_demo /etc/passwd
file: /etc/passwd
inode: 1048594
size: 2773 bytes
blocks: 8 (512-byte units)
hard links: 1
owner: root (uid 0)
group: root (gid 0)
permissions: 644
type: regular file
modified: 2025-01-15 10:22:33
The struct stat Layout
struct stat
+------------------+--------------------------------------------+
| st_dev | device ID of filesystem |
| st_ino | inode number (unique within filesystem) |
| st_mode | file type + permissions (see below) |
| st_nlink | number of hard links |
| st_uid | owner user ID |
| st_gid | owner group ID |
| st_rdev | device ID (for char/block devices) |
| st_size | file size in bytes |
| st_blksize | optimal I/O block size |
| st_blocks | number of 512-byte blocks allocated |
| st_atim | last access time |
| st_mtim | last modification time |
| st_ctim | last status change time |
+------------------+--------------------------------------------+
st_mode bit layout (16 bits):
+------+------+------+------+------+------+------+
| type |setuid|setgid|sticky| user |group |other |
| 4bit |  1   |  1   |  1   | rwx  | rwx  | rwx  |
+------+------+------+------+------+------+------+
Driver Prep: When you implement a character device driver, the kernel populates some of these fields for you. Your driver's getattr callback can override them. Understanding what each field means is essential.
Checking File Type with Macros
The S_IS* macros decode the file type from st_mode:
if (S_ISREG(st.st_mode)) { /* regular file */ }
if (S_ISDIR(st.st_mode)) { /* directory */ }
if (S_ISLNK(st.st_mode)) { /* symbolic link -- use lstat! */ }
if (S_ISCHR(st.st_mode)) { /* character device */ }
if (S_ISBLK(st.st_mode)) { /* block device */ }
if (S_ISFIFO(st.st_mode)) { /* FIFO (named pipe) */ }
if (S_ISSOCK(st.st_mode)) { /* socket */ }
Caution: stat() follows symbolic links. If you call stat() on a symlink, you get the metadata of the target file. Use lstat() to get the metadata of the symlink itself.
Rust: std::fs::metadata
// metadata_demo.rs -- file metadata in Rust
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::os::unix::fs::PermissionsExt;
fn main() -> std::io::Result<()> {
    let path = "/etc/passwd";
    let meta = fs::metadata(path)?;
    println!("file: {}", path);
    println!("inode: {}", meta.ino());
    println!("size: {} bytes", meta.len());
    println!("blocks: {}", meta.blocks());
    println!("hard links: {}", meta.nlink());
    println!("uid: {}", meta.uid());
    println!("gid: {}", meta.gid());
    println!("permissions: {:o}", meta.permissions().mode() & 0o7777);
    if meta.is_file() { println!("type: regular file"); }
    if meta.is_dir() { println!("type: directory"); }
    if meta.is_symlink() { println!("type: symlink"); }
    if let Ok(modified) = meta.modified() {
        println!("modified: {:?}", modified);
    }
    Ok(())
}
For symlink metadata (equivalent to lstat), use fs::symlink_metadata():
let meta = fs::symlink_metadata("/some/symlink")?;
println!("is symlink: {}", meta.is_symlink());
Rust Note: MetadataExt is Unix-specific (imported from std::os::unix::fs). The cross-platform Metadata type only exposes len(), is_file(), is_dir(), and timestamps. For inode, uid, gid, and other Unix-specific fields, you need the extension trait.
Changing Permissions and Ownership
/* chmod_demo.c -- change file permissions */
#include <stdio.h>
#include <sys/stat.h>
int main(void)
{
const char *path = "/tmp/chmod_test.txt";
FILE *fp = fopen(path, "w");
if (!fp) { perror("fopen"); return 1; }
fprintf(fp, "secret data\n");
fclose(fp);
if (chmod(path, 0400) == -1) {
perror("chmod");
return 1;
}
struct stat st;
stat(path, &st);
printf("permissions: %o\n", st.st_mode & 07777);
chmod(path, 0644); /* restore */
return 0;
}
For changing ownership, chown(path, uid, gid) works the same way. Only root
(or a process with CAP_CHOWN) can change file ownership to another user.
In Rust:
use std::fs;
use std::os::unix::fs::PermissionsExt;
fn main() -> std::io::Result<()> {
    let path = "/tmp/chmod_test_rs.txt";
    fs::write(path, "secret data\n")?;
    let perms = fs::Permissions::from_mode(0o400);
    fs::set_permissions(path, perms)?;
    let meta = fs::metadata(path)?;
    println!("permissions: {:o}", meta.permissions().mode() & 0o7777);
    fs::set_permissions(path, fs::Permissions::from_mode(0o644))?;
    Ok(())
}
Directory Operations in C
Reading a Directory
/* readdir_demo.c -- list directory contents */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>
int main(int argc, char *argv[])
{
const char *dirpath = argc > 1 ? argv[1] : ".";
DIR *dp = opendir(dirpath);
if (!dp) {
perror("opendir");
return 1;
}
struct dirent *entry;
while ((entry = readdir(dp)) != NULL) {
if (strcmp(entry->d_name, ".") == 0 ||
strcmp(entry->d_name, "..") == 0)
continue;
char fullpath[4096];
snprintf(fullpath, sizeof(fullpath), "%s/%s", dirpath, entry->d_name);
struct stat st;
if (lstat(fullpath, &st) == -1) {
perror(fullpath);
continue;
}
char type = '-';
if (S_ISDIR(st.st_mode)) type = 'd';
else if (S_ISLNK(st.st_mode)) type = 'l';
else if (S_ISCHR(st.st_mode)) type = 'c';
else if (S_ISBLK(st.st_mode)) type = 'b';
printf("%c %8ld %s\n", type, (long)st.st_size, entry->d_name);
}
closedir(dp);
return 0;
}
$ gcc -Wall -o readdir_demo readdir_demo.c && ./readdir_demo /tmp
- 32 fd_demo.txt
- 20 buffered.txt
- 0 chmod_test.txt
Creating and Removing Directories
/* mkdir_rmdir.c -- create and remove directories */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
int main(void)
{
if (mkdir("/tmp/mydir", 0755) == -1)
perror("mkdir");
if (mkdir("/tmp/mydir/sub", 0755) == -1)
perror("mkdir sub");
printf("created /tmp/mydir/sub\n");
/* rmdir only works on empty directories */
rmdir("/tmp/mydir/sub");
rmdir("/tmp/mydir");
printf("removed both directories\n");
return 0;
}
Caution: rmdir() fails with ENOTEMPTY if the directory is not empty. To remove a directory tree, you must remove its contents first (recursively).
File Operations: unlink, rename
/* unlink_rename.c */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
FILE *fp = fopen("/tmp/old_name.txt", "w");
fprintf(fp, "I will be renamed\n");
fclose(fp);
fp = fopen("/tmp/to_delete.txt", "w");
fprintf(fp, "I will be deleted\n");
fclose(fp);
/* rename() -- atomic on the same filesystem */
if (rename("/tmp/old_name.txt", "/tmp/new_name.txt") == -1) {
perror("rename");
return 1;
}
printf("renamed old_name.txt -> new_name.txt\n");
/* unlink() -- remove a hard link (deletes file if last link) */
if (unlink("/tmp/to_delete.txt") == -1) {
perror("unlink");
return 1;
}
printf("deleted to_delete.txt\n");
unlink("/tmp/new_name.txt");
return 0;
}
Directory Operations in Rust
// readdir_demo.rs -- list directory contents
use std::fs;
fn main() -> std::io::Result<()> {
    let dirpath = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for entry in fs::read_dir(&dirpath)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        let file_type = entry.file_type()?;
        let type_char = if file_type.is_dir() {
            'd'
        } else if file_type.is_symlink() {
            'l'
        } else {
            '-'
        };
        println!("{} {:>8} {}", type_char, meta.len(),
                 entry.file_name().to_string_lossy());
    }
    Ok(())
}
Creating, Removing, Renaming
// dir_ops.rs -- mkdir, rmdir, rename, remove
use std::fs;
fn main() -> std::io::Result<()> {
    fs::create_dir_all("/tmp/rustdir/sub")?;
    println!("created /tmp/rustdir/sub");
    fs::write("/tmp/rustdir/sub/hello.txt", "hello\n")?;
    fs::rename("/tmp/rustdir/sub/hello.txt", "/tmp/rustdir/sub/world.txt")?;
    println!("renamed hello.txt -> world.txt");
    fs::remove_file("/tmp/rustdir/sub/world.txt")?;
    println!("removed world.txt");
    fs::remove_dir_all("/tmp/rustdir")?;
    println!("removed /tmp/rustdir and all contents");
    Ok(())
}
Rust Note: fs::create_dir_all is like mkdir -p -- it creates parent directories as needed. fs::remove_dir_all is like rm -rf -- it removes everything recursively. In C you must walk the tree yourself.
Symbolic Links and Hard Links
Hard link:
name_a ──> inode 12345 <── name_b
(both names point to the same inode; same data blocks)
Symbolic link:
symlink ──> "path/to/target" (just stores a path string)
|
+-- readlink() returns "path/to/target"
+-- stat() follows to the target
+-- lstat() returns info about the symlink itself
/* links_demo.c -- create hard and symbolic links */
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
int main(void)
{
FILE *fp = fopen("/tmp/original.txt", "w");
fprintf(fp, "original content\n");
fclose(fp);
link("/tmp/original.txt", "/tmp/hardlink.txt");
symlink("/tmp/original.txt", "/tmp/symlink.txt");
struct stat st;
stat("/tmp/original.txt", &st);
printf("original inode: %lu, nlink: %lu\n",
(unsigned long)st.st_ino, (unsigned long)st.st_nlink);
stat("/tmp/hardlink.txt", &st);
printf("hardlink inode: %lu (same!)\n", (unsigned long)st.st_ino);
lstat("/tmp/symlink.txt", &st);
printf("symlink inode: %lu (different)\n", (unsigned long)st.st_ino);
char target[256];
ssize_t n = readlink("/tmp/symlink.txt", target, sizeof(target) - 1);
if (n != -1) {
target[n] = '\0';
printf("symlink target: %s\n", target);
}
unlink("/tmp/hardlink.txt");
unlink("/tmp/symlink.txt");
unlink("/tmp/original.txt");
return 0;
}
$ gcc -Wall -o links_demo links_demo.c && ./links_demo
original inode: 2359301, nlink: 2
hardlink inode: 2359301 (same!)
symlink inode: 2359447 (different)
symlink target: /tmp/original.txt
In Rust:
// links_demo.rs
use std::fs;
use std::os::unix::fs as unix_fs;
use std::os::unix::fs::MetadataExt;
fn main() -> std::io::Result<()> {
    fs::write("/tmp/original_rs.txt", "original content\n")?;
    fs::hard_link("/tmp/original_rs.txt", "/tmp/hardlink_rs.txt")?;
    unix_fs::symlink("/tmp/original_rs.txt", "/tmp/symlink_rs.txt")?;
    let orig = fs::metadata("/tmp/original_rs.txt")?;
    let hard = fs::metadata("/tmp/hardlink_rs.txt")?;
    let sym = fs::symlink_metadata("/tmp/symlink_rs.txt")?;
    println!("original inode: {}, nlink: {}", orig.ino(), orig.nlink());
    println!("hardlink inode: {} (same!)", hard.ino());
    println!("symlink inode: {} (different)", sym.ino());
    let target = fs::read_link("/tmp/symlink_rs.txt")?;
    println!("symlink target: {}", target.display());
    fs::remove_file("/tmp/hardlink_rs.txt")?;
    fs::remove_file("/tmp/symlink_rs.txt")?;
    fs::remove_file("/tmp/original_rs.txt")?;
    Ok(())
}
Working with Paths in Rust
Rust provides Path and PathBuf for safe path manipulation:
// path_demo.rs
use std::path::{Path, PathBuf};
fn main() {
    let p = Path::new("/home/user/documents/report.txt");
    println!("file name: {:?}", p.file_name());
    println!("stem: {:?}", p.file_stem());
    println!("extension: {:?}", p.extension());
    println!("parent: {:?}", p.parent());
    println!("is absolute: {}", p.is_absolute());
    let mut pb = PathBuf::from("/home/user");
    pb.push("documents");
    pb.push("report.txt");
    println!("built path: {}", pb.display());
    let full = Path::new("/var/log").join("syslog");
    println!("joined: {}", full.display());
}
$ rustc path_demo.rs && ./path_demo
file name: Some("report.txt")
stem: Some("report")
extension: Some("txt")
parent: Some("/home/user/documents")
is absolute: true
built path: /home/user/documents/report.txt
joined: /var/log/syslog
Try It: Write a program (C or Rust) that recursively walks a directory tree, printing each file's path and size. In C, you will need a recursive function that calls opendir/readdir/stat. In Rust, consider writing a recursive function or use the walkdir crate.
Quick Knowledge Check
- What is the difference between stat() and lstat() when called on a symbolic link?
- What does st_nlink equal for a regular file with no extra hard links?
- Why does rmdir() fail on a non-empty directory, and what must you do to remove a full directory tree in C?
Common Pitfalls
- Using stat() on symlinks when you meant lstat(). You silently get the target's metadata.
- Buffer overflow in path construction. In C, building paths with sprintf without length checks is a classic vulnerability. Use snprintf.
- TOCTOU races. Checking a file's existence with stat() and then opening it is a race condition. Another process can change the file between your check and your open. Use O_CREAT | O_EXCL for atomic creation.
- Forgetting closedir(). Leaks a file descriptor just like forgetting close().
- Assuming d_type in struct dirent. Not all filesystems populate d_type. Always fall back to stat() if d_type == DT_UNKNOWN.
- Hard-linking across filesystems. Hard links only work within a single filesystem. Use symbolic links for cross-filesystem references.
Memory-Mapped I/O
Instead of copying data between kernel and user space with read() and
write(), you can map a file directly into your process's address space.
The kernel handles paging data in and out transparently. This is mmap() --
one of the most powerful system calls on Linux.
How mmap Works
Traditional I/O:
User space Kernel space Disk
+--------+ +------------+ +------+
| buffer | <--copy-- | page cache | <--DMA-- | file |
+--------+ +------------+ +------+
read() copies data from kernel to user buffer
Memory-mapped I/O:
User space
+--------+
| mapped | <-- page fault --> kernel loads page from disk
| region | directly into this address range
+--------+
No copy -- your pointer IS the data
When you access a mapped page for the first time, a page fault occurs. The kernel loads the data from disk into a physical page and maps it into your address space. Subsequent accesses hit that page directly -- no syscall overhead at all.
mmap in C
/* mmap_read.c -- read a file via mmap */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "usage: %s <file>\n", argv[0]);
return 1;
}
int fd = open(argv[1], O_RDONLY);
if (fd == -1) { perror("open"); return 1; }
struct stat st;
if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }
if (st.st_size == 0) {
printf("(empty file)\n");
close(fd);
return 0;
}
void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
close(fd);
return 1;
}
/* We can close fd now -- the mapping keeps the file open internally */
close(fd);
const char *data = (const char *)addr;
printf("first 80 chars:\n");
size_t len = (size_t)st.st_size < 80 ? (size_t)st.st_size : 80;
fwrite(data, 1, len, stdout);
printf("\n");
size_t lines = 0;
for (off_t i = 0; i < st.st_size; i++) {
if (data[i] == '\n') lines++;
}
printf("total lines: %zu\n", lines);
munmap(addr, st.st_size);
return 0;
}
$ gcc -Wall -o mmap_read mmap_read.c && ./mmap_read /etc/passwd
first 80 chars:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/n
total lines: 42
The mmap Arguments
void *mmap(
void *addr, /* suggested address (NULL = let kernel choose) */
size_t length, /* how many bytes to map */
int prot, /* protection: PROT_READ, PROT_WRITE, PROT_EXEC */
int flags, /* MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, ... */
int fd, /* file descriptor (-1 with MAP_ANONYMOUS) */
off_t offset /* offset within file (must be page-aligned) */
);
| Flag | Meaning |
|---|---|
| MAP_PRIVATE | Copy-on-write: writes go to private copy, not file |
| MAP_SHARED | Writes go through to the file (visible to others) |
| MAP_ANONYMOUS | No file backing; memory initialized to zero |
| MAP_FIXED | Use exact address (dangerous if misused) |
| Protection | Meaning |
|---|---|
| PROT_READ | Pages can be read |
| PROT_WRITE | Pages can be written |
| PROT_EXEC | Pages can be executed |
| PROT_NONE | No access (guard pages) |
Caution: MAP_FIXED will silently overwrite any existing mapping at that address, including your heap or stack. Almost never use it in application code. On Linux 4.17 and later, MAP_FIXED_NOREPLACE fails with EEXIST instead of clobbering.
MAP_SHARED vs MAP_PRIVATE
MAP_PRIVATE (copy-on-write):
Process A Process B
+--------+ +--------+
| page 1 |--+ +--| page 1 | Both point to same physical pages
| page 2 |--+--+--| page 2 | (read-only until a write)
+--------+ +--------+
When A writes to page 1:
+--------+ +--------+
| page 1'| (new) | page 1 | A gets a private copy
| page 2 |--+--+--| page 2 | page 2 still shared
+--------+ +--------+
MAP_SHARED:
Process A Process B
+--------+ +--------+
| page 1 |--+--+--| page 1 | Same physical pages, writable
| page 2 |--+--+--| page 2 | Writes by A visible to B
+--------+ +--------+
Writing with mmap
/* mmap_write.c -- modify a file via mmap */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
int main(void)
{
const char *path = "/tmp/mmap_write_demo.txt";
int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
if (fd == -1) { perror("open"); return 1; }
const char *initial = "Hello, World! This is memory-mapped.\n";
size_t len = strlen(initial);
write(fd, initial, len);
void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
close(fd);
char *data = (char *)addr;
printf("before: %s", data);
memcpy(data, "HOWDY", 5);
printf("after: %s", data);
/* Ensure changes reach disk */
msync(addr, len, MS_SYNC);
munmap(addr, len);
/* Verify by reading normally */
fd = open(path, O_RDONLY);
char buf[128];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
if (n < 0) n = 0;
buf[n] = '\0';
printf("verify: %s", buf);
close(fd);
return 0;
}
$ gcc -Wall -o mmap_write mmap_write.c && ./mmap_write
before: Hello, World! This is memory-mapped.
after: HOWDY, World! This is memory-mapped.
verify: HOWDY, World! This is memory-mapped.
Caution: You cannot extend a file by writing past its end via mmap. The mapping size is fixed at mmap() time. To grow a file, use ftruncate() first, then remap.
msync: Flushing to Disk
msync() ensures that modifications to a MAP_SHARED mapping are written
back to the underlying file on disk.
| Flag | Meaning |
|---|---|
| MS_SYNC | Block until write is complete |
| MS_ASYNC | Initiate write, return immediately |
| MS_INVALIDATE | Invalidate other mappings (force re-read) |
Without msync, the kernel will eventually flush dirty pages, but the
timing is unpredictable. For data integrity, always msync before considering
data durable.
Anonymous mmap: Shared Memory Without a File
MAP_ANONYMOUS creates a mapping not backed by any file. The memory is
initialized to zero. Combined with MAP_SHARED, it survives across fork()
and allows parent-child communication.
/* anon_mmap.c -- shared memory between parent and child */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
int *shared = mmap(NULL, sizeof(int),
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS,
-1, 0);
if (shared == MAP_FAILED) { perror("mmap"); return 1; }
*shared = 0;
pid_t pid = fork();
if (pid == -1) { perror("fork"); return 1; }
if (pid == 0) {
*shared = 42;
printf("child set *shared = %d\n", *shared);
_exit(0);
}
waitpid(pid, NULL, 0);
printf("parent reads *shared = %d\n", *shared);
munmap(shared, sizeof(int));
return 0;
}
$ gcc -Wall -o anon_mmap anon_mmap.c && ./anon_mmap
child set *shared = 42
parent reads *shared = 42
Driver Prep: Kernel drivers often use remap_pfn_range() to map device memory or DMA buffers into user space. The user-space side calls mmap() on the device file. Understanding MAP_SHARED here is essential preparation.
Large File Processing with mmap and madvise
mmap is ideal for processing large files. The kernel pages data in on demand and can evict pages under memory pressure.
/* mmap_large.c -- count bytes in a large file via mmap */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "usage: %s <file>\n", argv[0]);
return 1;
}
int fd = open(argv[1], O_RDONLY);
if (fd == -1) { perror("open"); return 1; }
struct stat st;
if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }
if (st.st_size == 0) { printf("empty file\n"); close(fd); return 0; }
const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
close(fd);
/* Advise kernel we will read sequentially */
madvise((void *)data, st.st_size, MADV_SEQUENTIAL);
unsigned char target = 'e';
size_t count = 0;
for (off_t i = 0; i < st.st_size; i++) {
if ((unsigned char)data[i] == target) count++;
}
printf("'%c' appears %zu times in %s (%ld bytes)\n",
target, count, argv[1], (long)st.st_size);
munmap((void *)data, st.st_size);
return 0;
}
madvise() hints to the kernel how you plan to access the data:
| Hint | Meaning |
|---|---|
| MADV_SEQUENTIAL | Will read sequentially; prefetch aggressively |
| MADV_RANDOM | Will read randomly; do not prefetch |
| MADV_WILLNEED | Will need these pages soon; start loading |
| MADV_DONTNEED | Done with these pages; can be reclaimed |
Try It: Map /usr/share/dict/words (if available) and count how many words start with the letter 'z'. Compare the speed against read() in a loop.
Rust: The memmap2 Crate
The Rust standard library does not include mmap. The memmap2 crate provides
a safe wrapper. Add to Cargo.toml:
[dependencies]
memmap2 = "0.9"
// mmap_read.rs -- read a file via mmap in Rust
use memmap2::Mmap;
use std::fs::File;
fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: mmap_read <file>");
    let file = File::open(&path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    // Mmap implements Deref<Target = [u8]>, so we can use it as a byte slice
    println!("file size: {} bytes", mmap.len());
    let preview = std::cmp::min(80, mmap.len());
    let text = String::from_utf8_lossy(&mmap[..preview]);
    println!("first {} bytes:\n{}", preview, text);
    let lines = mmap.iter().filter(|&&b| b == b'\n').count();
    println!("total lines: {}", lines);
    // mmap is automatically unmapped when dropped
    Ok(())
}
Rust Note: Mmap::map() is unsafe because the file could be modified by another process or truncated while you hold the mapping, causing undefined behavior (SIGBUS). This is the same risk as in C -- mmap is inherently a shared-memory interface.
Writable mmap in Rust
// mmap_write.rs -- modify a file via mmap in Rust
use memmap2::MmapMut;
use std::fs::OpenOptions;
fn main() -> std::io::Result<()> {
    let path = "/tmp/mmap_write_rs.txt";
    std::fs::write(path, b"Hello, World! Memory-mapped Rust.\n")?;
    let file = OpenOptions::new().read(true).write(true).open(path)?;
    let mut mmap = unsafe { MmapMut::map_mut(&file)? };
    println!("before: {}", String::from_utf8_lossy(&mmap[..]));
    mmap[..5].copy_from_slice(b"HOWDY");
    mmap.flush()?;
    println!("after: {}", String::from_utf8_lossy(&mmap[..]));
    let contents = std::fs::read_to_string(path)?;
    print!("verify: {}", contents);
    Ok(())
}
mprotect: Guard Pages
mprotect() changes the protection on an existing mapping. One use case is
guard pages -- regions marked PROT_NONE that cause a segfault on access,
used to detect stack overflows or buffer overruns.
/* guard_page.c -- use mprotect to create a guard page */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
static void handler(int sig, siginfo_t *info, void *ctx)
{
(void)ctx;
printf("caught %s at address %p\n",
sig == SIGSEGV ? "SIGSEGV" : "SIGBUS",
info->si_addr);
_exit(1);
}
int main(void)
{
long page_size = sysconf(_SC_PAGESIZE);
void *region = mmap(NULL, 2 * page_size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (region == MAP_FAILED) { perror("mmap"); return 1; }
void *guard = (char *)region + page_size;
if (mprotect(guard, page_size, PROT_NONE) == -1) {
perror("mprotect");
return 1;
}
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = handler;
sa.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
char *usable = (char *)region;
usable[0] = 'A';
printf("wrote to usable page OK\n");
printf("about to touch guard page...\n");
char *bad = (char *)guard;
bad[0] = 'B'; /* triggers SIGSEGV */
munmap(region, 2 * page_size);
return 0;
}
$ gcc -Wall -o guard_page guard_page.c && ./guard_page
wrote to usable page OK
about to touch guard page...
caught SIGSEGV at address 0x7f8a12341000
Memory layout:
+-------------------+-------------------+
| usable page | guard page |
| PROT_READ | | PROT_NONE |
| PROT_WRITE | (any access = |
| | SIGSEGV) |
+-------------------+-------------------+
^ ^
region region + page_size
mmap for Device Register Access (Preview)
In embedded and driver work, hardware registers live at fixed physical
addresses. User-space programs can access them by mapping /dev/mem:
/* devmem_preview.c -- concept only, requires root */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>
int main(void)
{
off_t phys_addr = 0xFE200000; /* hypothetical GPIO base */
size_t page_size = sysconf(_SC_PAGESIZE);
int fd = open("/dev/mem", O_RDWR | O_SYNC);
if (fd == -1) { perror("open /dev/mem (need root)"); return 1; }
void *map = mmap(NULL, page_size,
PROT_READ | PROT_WRITE, MAP_SHARED,
fd, phys_addr);
if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
volatile uint32_t *regs = (volatile uint32_t *)map;
uint32_t val = regs[0];
printf("register at 0x%lx = 0x%08x\n", (long)phys_addr, val);
munmap(map, page_size);
close(fd);
return 0;
}
Caution: Writing to the wrong physical address via /dev/mem can crash your system, corrupt data, or damage hardware. Production systems use proper kernel drivers with request_mem_region() and ioremap().
The volatile keyword is critical. Without it, the compiler may optimize
away reads and writes to hardware registers. Hardware registers are
side-effectful -- reading a status register may clear an interrupt flag.
Driver Prep: Device drivers map hardware registers into user space via the driver's mmap file_operation, which calls remap_pfn_range(). The user-space pattern is always the same: open the device fd, mmap it, read/write through pointers.
Comparing I/O Methods
Method Copies Syscalls/access Best for
----------- ------ --------------- --------
read()/write() 1-2 1 per call Small files, streaming
stdio (fread) 2 ~1 per buffer General purpose
mmap 0 0 (after fault) Large files, random access,
shared memory, device regs
mmap wins on: zero-copy access, automatic caching, cheap random access, and inter-process shared memory. It loses on: overhead for small files (minimum one page), harder error handling (SIGBUS), inability to grow without remapping, and inability to work with pipes, sockets, or non-seekable files.
Quick Knowledge Check
- What is the difference between MAP_SHARED and MAP_PRIVATE when you write to a mapped page?
- Why must you call msync() if you need to guarantee data has reached disk?
- What signal does the kernel deliver if you access an mmap'd region after the file has been truncated shorter than the mapping?
Common Pitfalls
- Mapping a zero-length file. mmap with length 0 returns an error. Always check st_size before mapping.
- Forgetting munmap(). Leaked mappings consume virtual address space. In long-running processes this eventually causes mmap to fail.
- Ignoring SIGBUS. If the file is truncated while mapped, accessing beyond the new end delivers SIGBUS, not SIGSEGV.
- Using MAP_FIXED casually. It silently overwrites existing mappings.
- Writing past the mapping size. The mapping covers exactly the bytes you requested. Writing beyond it is a segfault.
- Missing volatile on device registers. The compiler will optimize away your hardware accesses without it.
- Forgetting O_SYNC for device memory. Without it, the kernel may use caching that reorders stores to device registers.
Creating Processes: fork, exec, wait
Every program you run from a shell starts as a clone. The kernel duplicates the running process, then the clone replaces itself with a new program. This fork-exec pattern is the foundation of Unix process creation, and understanding it is non-negotiable for systems work.
fork(): Duplicating a Process
fork() creates an almost-exact copy of the calling process. The parent gets
the child's PID as the return value; the child gets zero.
/* fork_basic.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main(void)
{
printf("Before fork: PID = %d\n", getpid());
pid_t pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid == 0) {
/* Child process */
printf("Child: PID = %d, parent PID = %d\n", getpid(), getppid());
} else {
/* Parent process */
printf("Parent: PID = %d, child PID = %d\n", getpid(), pid);
}
return 0;
}
Compile and run:
$ gcc -o fork_basic fork_basic.c
$ ./fork_basic
Before fork: PID = 1234
Parent: PID = 1234, child PID = 1235
Child: PID = 1235, parent PID = 1234
The "Before fork" line prints once. After fork(), two processes execute the
same code. The return value tells each process which role it plays.
fork()
|
+-----+-----+
| |
Parent Child
pid > 0 pid == 0
(original) (copy)
Caution: After fork(), the child inherits copies of all open file descriptors. If both parent and child write to the same fd without coordination, output will interleave unpredictably.
What the Child Inherits
The child gets copies of:
- Memory (stack, heap, data, text segments -- copy-on-write)
- Open file descriptors
- Signal dispositions
- Environment variables
- Current working directory
- umask
The child gets its own:
- PID
- Parent PID (set to the forking process)
- Pending signals (cleared)
- File locks (not inherited)
Parent Memory Child Memory (after fork)
+------------------+ +------------------+
| text (shared) | | text (shared) |
| data | | data (COW copy) |
| heap | | heap (COW copy) |
| stack | | stack (COW copy) |
| fd table [0,1,2] | | fd table [0,1,2] |
+------------------+ +------------------+
\ /
\-----> kernel <-----/
(same open file descriptions)
Try It: Add a variable int x = 42; before fork(). In the child, set x = 99; and print it. In the parent, sleep one second, then print x. Confirm the parent still sees 42.
exec(): Replacing the Process Image
fork() gives you a clone. exec() replaces that clone with a different
program entirely. The exec family includes: execl, execlp, execle,
execv, execvp, execvpe. The differences are how you pass arguments and
whether PATH is searched.
/* exec_basic.c */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
printf("About to exec 'ls -la /tmp'\n");
/* execlp searches PATH for the binary */
execlp("ls", "ls", "-la", "/tmp", (char *)NULL);
/* If exec returns, it failed */
perror("execlp");
return 1;
}
After a successful exec(), the calling process's code, data, and stack are
replaced. The PID stays the same. Open file descriptors without FD_CLOEXEC
remain open.
Caution: If exec() returns at all, it has failed. Always follow an exec() call with error handling.
The fork-exec Pattern
This is the standard Unix way to run a new program:
/* fork_exec.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void)
{
pid_t pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid == 0) {
/* Child: replace self with 'date' */
execlp("date", "date", "+%Y-%m-%d %H:%M:%S", (char *)NULL);
perror("execlp");
_exit(127); /* Use _exit in child after failed exec */
}
/* Parent: wait for child to finish */
int status;
waitpid(pid, &status, 0);
if (WIFEXITED(status)) {
printf("Child exited with status %d\n", WEXITSTATUS(status));
}
return 0;
}
Caution: In the child after a failed exec(), use _exit() instead of exit(). The exit() function flushes stdio buffers -- which are copies of the parent's buffers. This can cause duplicated output.
wait() and waitpid(): Reaping Children
When a child process terminates, the kernel keeps a small record of its exit status until the parent retrieves it. Until then, the child is a zombie.
/* wait_status.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void)
{
pid_t pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid == 0) {
printf("Child running, PID = %d\n", getpid());
exit(42);
}
int status;
pid_t waited = waitpid(pid, &status, 0);
if (waited < 0) {
perror("waitpid");
return 1;
}
if (WIFEXITED(status)) {
printf("Child %d exited normally, status = %d\n",
waited, WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
printf("Child %d killed by signal %d\n",
waited, WTERMSIG(status));
} else if (WIFSTOPPED(status)) {
printf("Child %d stopped by signal %d\n",
waited, WSTOPSIG(status));
}
return 0;
}
The status macros decode the packed integer:
| Macro | Meaning |
|---|---|
| WIFEXITED(s) | True if child exited normally |
| WEXITSTATUS(s) | Exit code (0-255) |
| WIFSIGNALED(s) | True if killed by signal |
| WTERMSIG(s) | Signal that killed it |
| WIFSTOPPED(s) | True if stopped (traced) |
| WSTOPSIG(s) | Signal that stopped it |
The Zombie Problem
A zombie is a process that has exited but whose parent has not called wait().
It occupies a slot in the process table.
/* zombie.c -- creates a zombie for 30 seconds */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(void)
{
pid_t pid = fork();
if (pid == 0) {
printf("Child exiting immediately\n");
exit(0);
}
printf("Parent sleeping 30s -- child %d is a zombie\n", pid);
printf("Run: ps aux | grep Z\n");
sleep(30);
/* Never reaps the child -- zombie persists until parent exits */
return 0;
}
Try It: Run the zombie program. In another terminal, run ps aux | grep Z to see the zombie entry. Note the Z+ in the STAT column.
To avoid zombies in long-running servers, either:
- Call waitpid() periodically or from a SIGCHLD handler.
- Set SIGCHLD to SIG_IGN (POSIX behavior: the kernel auto-reaps children).
- Double-fork: the child forks again and the middle process exits immediately, so init adopts and reaps the grandchild.
Rust: std::process::Command
Rust's standard library wraps fork-exec-wait into a safe, ergonomic API.
// command_basic.rs
use std::process::Command;

fn main() {
    let output = Command::new("date")
        .arg("+%Y-%m-%d %H:%M:%S")
        .output()
        .expect("failed to execute 'date'");
    println!("stdout: {}", String::from_utf8_lossy(&output.stdout));
    println!("status: {}", output.status);
}
Command::new does not fork immediately. Calling .output() forks, execs, and waits, returning the captured stdout, captured stderr, and the exit status. For streaming output:
// command_stream.rs
use std::process::{Command, Stdio};
use std::io::{BufRead, BufReader};

fn main() {
    let mut child = Command::new("ls")
        .arg("-la")
        .arg("/tmp")
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn");

    if let Some(stdout) = child.stdout.take() {
        let reader = BufReader::new(stdout);
        for line in reader.lines() {
            let line = line.expect("read error");
            println!("LINE: {}", line);
        }
    }

    let status = child.wait().expect("wait failed");
    println!("Exit status: {}", status);
}
Rust Note: Command handles the fork-exec-wait dance, fd cleanup, and error propagation. You never touch raw PIDs. One caveat: dropping a Child handle does not wait for the process. If you spawn() a child, you must call wait() (or use output()/status()) yourself, or it will linger as a zombie until your process exits.
Rust: Raw fork with the nix Crate
When you need the full power of fork(), the nix crate provides it:
// fork_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "signal"] }
use nix::unistd::{fork, ForkResult, getpid, getppid};
use nix::sys::wait::waitpid;
use std::process::exit;

fn main() {
    println!("Before fork: PID = {}", getpid());
    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Parent { child } => {
            println!("Parent: PID = {}, child = {}", getpid(), child);
            let status = waitpid(child, None).expect("waitpid failed");
            println!("Child exited: {:?}", status);
        }
        ForkResult::Child => {
            println!("Child: PID = {}, parent = {}", getpid(), getppid());
            exit(0);
        }
    }
}
Rust Note: fork() is unsafe in Rust because the child inherits a copy of the entire address space, but only the calling thread survives in it. A mutex held by any other thread at the moment of the fork stays locked forever in the child, with no thread left to unlock it. Prefer Command unless you need pre-exec setup.
A Minimal Shell in 50 Lines
Putting it all together -- a shell that reads commands and runs them:
/* minishell.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#define MAX_ARGS 64
#define MAX_LINE 1024
int main(void)
{
char line[MAX_LINE];
char *args[MAX_ARGS];
for (;;) {
printf("mini$ ");
fflush(stdout);
if (!fgets(line, sizeof(line), stdin))
break;
/* Strip newline */
line[strcspn(line, "\n")] = '\0';
if (line[0] == '\0')
continue;
/* Built-in: exit */
if (strcmp(line, "exit") == 0)
break;
/* Tokenize */
int argc = 0;
char *tok = strtok(line, " \t");
while (tok && argc < MAX_ARGS - 1) {
args[argc++] = tok;
tok = strtok(NULL, " \t");
}
args[argc] = NULL;
/* Fork-exec */
pid_t pid = fork();
if (pid < 0) {
perror("fork");
continue;
}
if (pid == 0) {
execvp(args[0], args);
perror(args[0]);
_exit(127);
}
int status;
waitpid(pid, &status, 0);
if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
printf("[exit %d]\n", WEXITSTATUS(status));
}
printf("\n");
return 0;
}
Try It: Extend the mini shell to support the cd built-in command. Hint: cd must be handled by the parent process, since chdir() in a child only affects the child.
Driver Prep: Kernel modules do not use fork/exec -- the kernel spawns kernel threads with kthread_create(). But user-space driver helpers, udev rules, and firmware loaders all rely on fork-exec. Understanding this pattern is essential for writing device manager daemons.
Knowledge Check
- What value does fork() return to the child process? What does the parent receive?
- Why should you call _exit() instead of exit() in a child process after a failed exec()?
- What is a zombie process, and how do you prevent zombies in a long-running server?
Common Pitfalls
- Forgetting to wait: Every fork() needs a corresponding wait() or SIGCHLD handler. Otherwise: zombies.
- Using exit() after failed exec in child: Flushes the parent's buffered stdio. Use _exit().
- Assuming execution order: After fork(), the scheduler decides who runs first. Do not assume parent runs before child or vice versa.
- Fork bombs: A loop that calls fork() unconditionally will exhaust the process table. Always guard fork with proper termination logic.
- Ignoring exec failure: If exec() returns, it failed. Handle it.
- Sharing file descriptors carelessly: Parent and child share the same open file descriptions. Close fds you do not need in each process.
Process Groups, Sessions, and Daemons
Unix organizes processes into groups and sessions. This hierarchy controls which processes receive signals from the terminal, how job control works, and how daemons detach from everything. If you write a server, a driver helper, or anything that outlives a login session, you need this.
The Process Hierarchy
Session (SID)
|
+-- Process Group (PGID) -- foreground job
| +-- Process (PID)
| +-- Process (PID)
|
+-- Process Group (PGID) -- background job
| +-- Process (PID)
|
+-- Process Group (PGID) -- background job
+-- Process (PID)
+-- Process (PID)
A session is a collection of process groups, typically one per login. A
process group is a collection of processes, typically one per pipeline. The
session leader is the process that called setsid() -- usually the shell.
Process Groups
Every process belongs to a process group. The group is identified by the PID of its leader.
/* pgid_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void)
{
printf("Parent: PID=%d PGID=%d SID=%d\n",
getpid(), getpgrp(), getsid(0));
pid_t pid = fork();
if (pid == 0) {
printf("Child before setpgid: PID=%d PGID=%d\n",
getpid(), getpgrp());
/* Put child in its own process group */
setpgid(0, 0);
printf("Child after setpgid: PID=%d PGID=%d\n",
getpid(), getpgrp());
_exit(0);
}
waitpid(pid, NULL, 0);
return 0;
}
$ gcc -o pgid_demo pgid_demo.c && ./pgid_demo
Parent: PID=5000 PGID=5000 SID=4900
Child before setpgid: PID=5001 PGID=5000
Child after setpgid: PID=5001 PGID=5001
setpgid(0, 0) means "set my PGID to my own PID" -- making the calling process
a new group leader.
Why Process Groups Matter
When you press Ctrl+C in a terminal, the kernel sends SIGINT to the entire
foreground process group, not just one process. A pipeline like
cat file | grep pattern | wc -l runs as three processes in one group, so
Ctrl+C kills them all.
Terminal (controlling terminal)
|
| SIGINT (Ctrl+C)
v
Foreground Process Group
+-- cat (receives SIGINT)
+-- grep (receives SIGINT)
+-- wc (receives SIGINT)
Sessions and Controlling Terminals
A session is created by setsid(). The calling process becomes the session
leader and is disconnected from any controlling terminal.
/* session_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void)
{
printf("Original: PID=%d PGID=%d SID=%d\n",
getpid(), getpgrp(), getsid(0));
pid_t pid = fork();
if (pid == 0) {
/* setsid() fails if caller is already a group leader,
so we fork first to guarantee we are not */
pid_t new_sid = setsid();
if (new_sid < 0) {
perror("setsid");
_exit(1);
}
printf("Child: PID=%d PGID=%d SID=%d\n",
getpid(), getpgrp(), getsid(0));
/* Now PID == PGID == SID -- session leader */
_exit(0);
}
waitpid(pid, NULL, 0);
return 0;
}
Caution: setsid() fails if the calling process is already a process group leader (PID == PGID). The standard trick: fork first, then call setsid() in the child.
Job Control Basics
The shell manages foreground and background jobs by manipulating process groups and the terminal's foreground group.
| Action | Shell command | What happens |
|---|---|---|
| Run foreground | ./prog | Shell sets prog's PGID as terminal foreground group |
| Run background | ./prog & | Shell keeps its own PGID as foreground group |
| Suspend | Ctrl+Z | Kernel sends SIGTSTP to foreground group |
| Resume foreground | fg | Shell calls tcsetpgrp() + sends SIGCONT |
| Resume background | bg | Shell sends SIGCONT without changing foreground group |
/* fg_group.c -- show foreground process group */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
pid_t fg = tcgetpgrp(STDIN_FILENO);
printf("Foreground PGID: %d\n", fg);
printf("My PID: %d\n", getpid());
printf("My PGID: %d\n", getpgrp());
return 0;
}
Try It: Run fg_group normally, then run it with ./fg_group &. Compare the foreground PGID to your PGID in each case.
The Classic Daemon Recipe
A daemon is a process that runs in the background, detached from any terminal. The traditional recipe:
/* daemon_classic.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
static void daemonize(void)
{
/* Step 1: Fork and let parent exit */
pid_t pid = fork();
if (pid < 0) { perror("fork"); exit(1); }
if (pid > 0) _exit(0); /* Parent exits */
/* Step 2: Create new session */
if (setsid() < 0) { perror("setsid"); exit(1); }
/* Step 3: Ignore SIGHUP, fork again to prevent
acquiring a controlling terminal */
signal(SIGHUP, SIG_IGN);
pid = fork();
if (pid < 0) { perror("fork"); exit(1); }
if (pid > 0) _exit(0); /* First child exits */
/* Step 4: Change working directory */
chdir("/");
/* Step 5: Reset umask */
umask(0);
/* Step 6: Close all open file descriptors */
for (int fd = sysconf(_SC_OPEN_MAX); fd >= 0; fd--)
close(fd);
/* Step 7: Redirect stdin/stdout/stderr to /dev/null */
open("/dev/null", O_RDWR); /* stdin = fd 0 */
dup(0); /* stdout = fd 1 */
dup(0); /* stderr = fd 2 */
}
int main(void)
{
daemonize();
/* Daemon work loop */
FILE *log = fopen("/tmp/daemon_demo.log", "a");
if (!log) _exit(1);
for (int i = 0; i < 10; i++) {
fprintf(log, "Daemon tick %d, PID=%d\n", i, getpid());
fflush(log);
sleep(2);
}
fclose(log);
return 0;
}
The double-fork pattern:
Shell
|
+-- fork() --> Parent exits (shell gets exit status)
|
+-- setsid() --> New session, no controlling terminal
|
+-- fork() --> First child exits
|
+-- Daemon (not session leader,
cannot acquire controlling terminal)
Caution: The double fork is critical. A session leader that opens a terminal device can acquire it as a controlling terminal. The second fork ensures the daemon is not a session leader.
The Modern Way: systemd
On modern Linux systems, systemd manages daemons. You do not need to daemonize manually. Instead, write a simple foreground program and let systemd handle it.
A systemd service file:
# /etc/systemd/system/myservice.service
[Unit]
Description=My Demo Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/myservice
Restart=on-failure
User=nobody
[Install]
WantedBy=multi-user.target
Your program just runs in the foreground:
/* myservice.c -- systemd-friendly daemon */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
static volatile sig_atomic_t running = 1;
static void handle_term(int sig)
{
(void)sig;
running = 0;
}
int main(void)
{
signal(SIGTERM, handle_term);
while (running) {
printf("Service tick, PID=%d\n", getpid());
fflush(stdout);
sleep(5);
}
printf("Service shutting down\n");
return 0;
}
systemd captures stdout to the journal. No syslog gymnastics needed.
Driver Prep: Kernel drivers do not daemonize -- they are loaded into kernel space. But user-space driver companions (firmware loaders, device managers, monitoring daemons) absolutely do. The udevd daemon is a perfect example: it manages device nodes and runs in the background from boot.
Rust: Daemon Patterns with nix
// daemon_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "signal", "fs"] }
use nix::unistd::{fork, ForkResult, setsid, chdir, close, dup2};
use nix::sys::stat::{umask, Mode};
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;
use std::process::exit;
use std::io::Write;
use std::thread;
use std::time::Duration;

fn daemonize() {
    // First fork
    match unsafe { fork() }.expect("first fork failed") {
        ForkResult::Parent { .. } => exit(0),
        ForkResult::Child => {}
    }
    // New session
    setsid().expect("setsid failed");
    // Second fork
    match unsafe { fork() }.expect("second fork failed") {
        ForkResult::Parent { .. } => exit(0),
        ForkResult::Child => {}
    }
    // Change directory
    chdir("/").expect("chdir failed");
    // Reset umask
    umask(Mode::empty());
    // Redirect std fds to /dev/null
    let devnull = OpenOptions::new()
        .read(true)
        .write(true)
        .open("/dev/null")
        .expect("open /dev/null");
    let fd = devnull.as_raw_fd();
    dup2(fd, 0).ok();
    dup2(fd, 1).ok();
    dup2(fd, 2).ok();
    if fd > 2 {
        close(fd).ok();
    }
}

fn main() {
    daemonize();
    let mut log = OpenOptions::new()
        .create(true)
        .append(true)
        .open("/tmp/rust_daemon.log")
        .expect("open log");
    for i in 0..10 {
        writeln!(log, "Rust daemon tick {}, PID={}", i, std::process::id())
            .expect("write log");
        log.flush().expect("flush");
        thread::sleep(Duration::from_secs(2));
    }
}
Rust Note: In practice, most Rust services run as simple foreground processes under systemd. The daemonize crate exists for cases where you truly need the classic pattern, but it is increasingly rare.
Rust: Process Groups with nix
// pgid_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["process"] }
use nix::unistd::{fork, ForkResult, getpid, getpgrp, setpgid, Pid};
use nix::sys::wait::waitpid;
use std::process::exit;

fn main() {
    println!("Parent: PID={} PGID={}", getpid(), getpgrp());
    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Parent { child } => {
            waitpid(child, None).expect("waitpid");
        }
        ForkResult::Child => {
            println!("Child before: PID={} PGID={}", getpid(), getpgrp());
            setpgid(Pid::from_raw(0), Pid::from_raw(0))
                .expect("setpgid");
            println!("Child after: PID={} PGID={}", getpid(), getpgrp());
            exit(0);
        }
    }
}
A Process Hierarchy Inspector
This utility prints the session, process group, and parent for the current process:
/* proc_info.c */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
printf("PID: %d\n", getpid());
printf("Parent PID: %d\n", getppid());
printf("Process Group ID: %d\n", getpgrp());
printf("Session ID: %d\n", getsid(0));
printf("Foreground PGID: %d\n", tcgetpgrp(STDIN_FILENO));
printf("Is session leader: %s\n",
getpid() == getsid(0) ? "yes" : "no");
printf("Is group leader: %s\n",
getpid() == getpgrp() ? "yes" : "no");
return 0;
}
Try It: Run proc_info from a shell. Then run cat | ./proc_info (pipe into it). Compare the Session ID and Foreground PGID. Why does the PGID change in the piped case?
Sending Signals to Process Groups
The kill() system call with a negative PID sends a signal to an entire
process group:
/* kill_group.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
int main(void)
{
/* Create 3 children in the same new process group */
pid_t first_child = 0;
for (int i = 0; i < 3; i++) {
pid_t pid = fork();
if (pid == 0) {
if (i == 0) setpgid(0, 0);
else setpgid(0, first_child);
printf("Child %d: PID=%d PGID=%d\n", i, getpid(), getpgrp());
pause(); /* Wait for signal */
_exit(0);
}
if (i == 0) first_child = pid;
setpgid(pid, first_child);
}
sleep(1);
printf("Parent: sending SIGTERM to group %d\n", first_child);
kill(-first_child, SIGTERM); /* Negative PID = process group */
for (int i = 0; i < 3; i++) {
int status;
pid_t w = wait(&status);
if (WIFSIGNALED(status))
printf("Reaped %d, killed by signal %d\n", w, WTERMSIG(status));
}
return 0;
}
kill(-PGID, SIGTERM)
|
+----------+---------+
| | |
Child 0 Child 1 Child 2
(all share the same PGID)
Knowledge Check
- What does setsid() do, and why must the caller not be a process group leader?
- Why does the classic daemon recipe fork twice?
- When you press Ctrl+C in a terminal, which processes receive SIGINT?
Common Pitfalls
- Calling setsid() as group leader: It fails with EPERM. Fork first.
- Single fork for daemons: The process remains a session leader and can accidentally acquire a controlling terminal.
- Forgetting to close file descriptors: Inherited fds can hold locks, keep files open, or leak information. Close them all.
- Not redirecting stdio: A daemon whose stdout or stderr still points at a vanished terminal or broken pipe will see write errors -- or die from SIGPIPE. Redirect all three to /dev/null or a log.
- Manual daemonization under systemd: If systemd starts your service, do not daemonize. systemd expects Type=simple services to stay in the foreground.
- Ignoring SIGHUP in daemons: When the session leader exits or the controlling terminal disconnects, SIGHUP goes to the foreground process group. Daemons must handle or ignore it.
Environment and Configuration
Every Unix process inherits a block of key-value strings from its parent. This environment block controls program behavior without code changes. Understanding how it works -- and how to combine it with command-line arguments and configuration files -- is essential for writing well-behaved Unix tools.
The Environment Block
The kernel passes the environment to a new process on the stack, right after the
argument strings. Each entry is a KEY=VALUE string.
/* print_env.c */
#include <stdio.h>
extern char **environ; /* Global pointer to environment array */
int main(void)
{
for (char **ep = environ; *ep != NULL; ep++) {
printf("%s\n", *ep);
}
return 0;
}
$ gcc -o print_env print_env.c
$ ./print_env | head -5
SHELL=/bin/bash
HOME=/home/user
PATH=/usr/local/bin:/usr/bin:/bin
LANG=en_US.UTF-8
TERM=xterm-256color
The layout in memory:
Stack (high address)
+----------------------------+
| environment strings |
| "HOME=/home/user\0" |
| "PATH=/usr/bin:/bin\0" |
| ... |
+----------------------------+
| environ[0] -> "HOME=..." |
| environ[1] -> "PATH=..." |
| environ[N] -> NULL |
+----------------------------+
| argv strings |
| argv[0] -> "./print_env" |
| argv[1] -> NULL |
+----------------------------+
| argc = 1 |
+----------------------------+
(stack grows down)
Reading and Writing the Environment
/* env_ops.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
/* Read */
const char *home = getenv("HOME");
if (home)
printf("HOME = %s\n", home);
else
printf("HOME not set\n");
/* Write -- adds or overwrites */
setenv("MY_APP_DEBUG", "1", 1); /* 1 = overwrite if exists */
printf("MY_APP_DEBUG = %s\n", getenv("MY_APP_DEBUG"));
/* Write without overwrite */
setenv("MY_APP_DEBUG", "2", 0); /* 0 = do not overwrite */
printf("MY_APP_DEBUG = %s\n", getenv("MY_APP_DEBUG")); /* Still "1" */
/* Remove */
unsetenv("MY_APP_DEBUG");
printf("After unsetenv: %s\n",
getenv("MY_APP_DEBUG") ? getenv("MY_APP_DEBUG") : "(null)");
return 0;
}
Caution: putenv() inserts a pointer to your string directly into the environment. If that string is on the stack, it becomes a dangling pointer when the function returns. Prefer setenv(), which copies the string.
Caution: The environment functions are not thread-safe. Calling setenv() or unsetenv() in one thread while another thread calls getenv() is undefined behavior.
PATH Resolution and exec
When you call execlp() or execvp() (the "p" variants), the C library searches
the directories listed in PATH for the binary.
/* path_search.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
const char *path = getenv("PATH");
if (!path) {
printf("PATH not set\n");
return 1;
}
printf("PATH directories:\n");
/* strtok modifies the string, so copy it */
char *copy = strdup(path);
char *dir = strtok(copy, ":");
int i = 0;
while (dir) {
printf(" [%d] %s\n", i++, dir);
dir = strtok(NULL, ":");
}
free(copy);
return 0;
}
The search order matters. If /usr/local/bin appears before /usr/bin, a
binary in /usr/local/bin shadows the system version.
Caution: A PATH that includes . (the current directory) or an empty component (like :/usr/bin -- note the leading colon) is a security risk. An attacker can place a malicious binary in the current directory.
Command-Line Parsing: getopt
getopt() is the traditional Unix way to parse command-line options.
/* getopt_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int verbose = 0;
int count = 1;
const char *output = NULL;
int opt;
while ((opt = getopt(argc, argv, "vc:o:")) != -1) {
switch (opt) {
case 'v':
verbose = 1;
break;
case 'c':
count = atoi(optarg);
break;
case 'o':
output = optarg;
break;
default:
fprintf(stderr, "Usage: %s [-v] [-c count] [-o output] [files...]\n",
argv[0]);
return 1;
}
}
printf("verbose=%d count=%d output=%s\n",
verbose, count, output ? output : "(none)");
/* Remaining arguments (non-option) */
for (int i = optind; i < argc; i++)
printf("arg: %s\n", argv[i]);
return 0;
}
$ ./getopt_demo -v -c 5 -o result.txt file1.txt file2.txt
verbose=1 count=5 output=result.txt
arg: file1.txt
arg: file2.txt
The option string "vc:o:" means: -v takes no argument, -c and -o each
require one (indicated by the colon).
Long Options: getopt_long
For modern tools, long options like --verbose are expected.
/* getopt_long_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
int main(int argc, char *argv[])
{
int verbose = 0;
int count = 1;
const char *output = NULL;
static struct option long_options[] = {
{"verbose", no_argument, NULL, 'v'},
{"count", required_argument, NULL, 'c'},
{"output", required_argument, NULL, 'o'},
{"help", no_argument, NULL, 'h'},
{NULL, 0, NULL, 0 }
};
int opt;
while ((opt = getopt_long(argc, argv, "vc:o:h", long_options, NULL)) != -1) {
switch (opt) {
case 'v': verbose = 1; break;
case 'c': count = atoi(optarg); break;
case 'o': output = optarg; break;
case 'h':
printf("Usage: %s [--verbose] [--count N] [--output FILE]\n",
argv[0]);
return 0;
default:
return 1;
}
}
printf("verbose=%d count=%d output=%s\n",
verbose, count, output ? output : "(none)");
return 0;
}
$ ./getopt_long_demo --verbose --count 10 --output data.csv
verbose=1 count=10 output=data.csv
Rust: std::env
Rust's standard library provides safe environment access.
// env_demo.rs
use std::env;

fn main() {
    // Read
    match env::var("HOME") {
        Ok(val) => println!("HOME = {}", val),
        Err(_) => println!("HOME not set"),
    }

    // Set
    env::set_var("MY_APP_DEBUG", "1");
    println!("MY_APP_DEBUG = {}", env::var("MY_APP_DEBUG").unwrap());

    // Remove
    env::remove_var("MY_APP_DEBUG");

    // Iterate all
    println!("\nAll environment variables:");
    for (key, value) in env::vars() {
        println!("  {}={}", key, value);
    }

    // PATH directories
    if let Some(path) = env::var_os("PATH") {
        println!("\nPATH directories:");
        for dir in env::split_paths(&path) {
            println!("  {}", dir.display());
        }
    }
}
Rust Note:
env::set_var() and env::remove_var() are unsafe to call in multi-threaded programs, and the Rust 2024 edition marks them unsafe for exactly this reason. The Rust team recognized the same thread-safety issue that plagues C's setenv(). Prefer reading the environment at startup and storing values in your own data structures.
Rust: Command-Line Parsing with clap
The clap crate is the standard Rust approach to argument parsing.
// clap_demo.rs
// Cargo.toml:
// [dependencies]
// clap = { version = "4", features = ["derive"] }
use clap::Parser;

/// A well-behaved Unix tool
#[derive(Parser, Debug)]
#[command(name = "mytool", version, about = "Does useful things")]
struct Args {
    /// Enable verbose output
    #[arg(short, long)]
    verbose: bool,

    /// Number of iterations
    #[arg(short, long, default_value_t = 1)]
    count: u32,

    /// Output file path
    #[arg(short, long)]
    output: Option<String>,

    /// Input files
    files: Vec<String>,
}

fn main() {
    let args = Args::parse();
    println!("verbose={} count={} output={:?}",
             args.verbose, args.count, args.output);
    for f in &args.files {
        println!("file: {}", f);
    }
}
$ cargo run -- --verbose --count 5 -o result.txt input1.dat input2.dat
verbose=true count=5 output=Some("result.txt")
file: input1.dat
file: input2.dat
The --help flag auto-generates usage text from the struct annotations.
Rust Note:
clap with derive macros generates the help text, validation, and parsing code at compile time. The C equivalent requires writing all of this by hand or using a library like argp.
Configuration File Patterns
A well-behaved Unix tool checks configuration in this order (later overrides earlier):
1. Compiled-in defaults
2. System config: /etc/myapp/config
3. User config: ~/.config/myapp/config (XDG_CONFIG_HOME)
4. Environment: MYAPP_DEBUG=1
5. Command-line: --debug
A minimal config file parser in C:
/* config_parse.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_LINE 256
struct config {
int port;
int verbose;
char logfile[256];
};
static void config_defaults(struct config *cfg)
{
cfg->port = 8080;
cfg->verbose = 0;
strncpy(cfg->logfile, "/var/log/myapp.log", sizeof(cfg->logfile) - 1);
}
static int config_load(struct config *cfg, const char *path)
{
FILE *f = fopen(path, "r");
if (!f) return -1;
char line[MAX_LINE];
while (fgets(line, sizeof(line), f)) {
/* Skip comments and empty lines */
if (line[0] == '#' || line[0] == '\n')
continue;
char key[128], value[128];
if (sscanf(line, "%127[^=]=%127[^\n]", key, value) == 2) {
if (strcmp(key, "port") == 0)
cfg->port = atoi(value);
else if (strcmp(key, "verbose") == 0)
cfg->verbose = atoi(value);
else if (strcmp(key, "logfile") == 0)
strncpy(cfg->logfile, value, sizeof(cfg->logfile) - 1);
}
}
fclose(f);
return 0;
}
int main(int argc, char *argv[])
{
struct config cfg;
config_defaults(&cfg);
/* Try system config, then user config */
config_load(&cfg, "/etc/myapp.conf");
char user_conf[512];
const char *home = getenv("HOME");
if (home) {
snprintf(user_conf, sizeof(user_conf), "%s/.myapp.conf", home);
config_load(&cfg, user_conf);
}
/* Environment overrides */
const char *env_port = getenv("MYAPP_PORT");
if (env_port) cfg.port = atoi(env_port);
printf("port=%d verbose=%d logfile=%s\n",
cfg.port, cfg.verbose, cfg.logfile);
return 0;
}
The /etc Convention
System-wide configuration lives under /etc. Per-application patterns:
| Path | Purpose |
|---|---|
/etc/myapp.conf | Single config file |
/etc/myapp/ | Config directory |
/etc/myapp/conf.d/ | Drop-in overrides (processed alphabetically) |
/etc/default/myapp | Default environment for init scripts |
Putting It Together: A Well-Behaved Unix Tool
Here is a complete C program that follows all conventions:
/* wellbehaved.c -- a well-behaved Unix tool */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
static struct {
int verbose;
int count;
const char *output;
} config = {
.verbose = 0,
.count = 1,
.output = NULL,
};
static void usage(const char *prog)
{
fprintf(stderr,
"Usage: %s [OPTIONS] [FILE...]\n"
"\n"
"Options:\n"
" -v, --verbose Enable verbose output\n"
" -c, --count=N Number of iterations (default: 1)\n"
" -o, --output=FILE Output file\n"
" -h, --help Show this help\n"
"\n"
"Environment:\n"
" WELLBEHAVED_VERBOSE Set to 1 for verbose mode\n"
" WELLBEHAVED_COUNT Default iteration count\n",
prog);
}
int main(int argc, char *argv[])
{
/* 1. Environment */
const char *env_v = getenv("WELLBEHAVED_VERBOSE");
if (env_v && strcmp(env_v, "1") == 0)
config.verbose = 1;
const char *env_c = getenv("WELLBEHAVED_COUNT");
if (env_c) config.count = atoi(env_c);
/* 2. Command line (overrides environment) */
static struct option long_opts[] = {
{"verbose", no_argument, NULL, 'v'},
{"count", required_argument, NULL, 'c'},
{"output", required_argument, NULL, 'o'},
{"help", no_argument, NULL, 'h'},
{NULL, 0, NULL, 0}
};
int opt;
while ((opt = getopt_long(argc, argv, "vc:o:h", long_opts, NULL)) != -1) {
switch (opt) {
case 'v': config.verbose = 1; break;
case 'c': config.count = atoi(optarg); break;
case 'o': config.output = optarg; break;
case 'h': usage(argv[0]); return 0;
default: usage(argv[0]); return 1;
}
}
/* 3. Act on stdin if no files given (Unix filter convention) */
if (optind >= argc) {
if (config.verbose)
fprintf(stderr, "Reading from stdin...\n");
/* Process stdin here */
}
/* 4. Process each file argument */
for (int i = optind; i < argc; i++) {
if (config.verbose)
fprintf(stderr, "Processing: %s\n", argv[i]);
FILE *f = fopen(argv[i], "r");
if (!f) {
fprintf(stderr, "%s: %s: %s\n", argv[0], argv[i], strerror(errno));
continue; /* Keep going -- do not abort on one bad file */
}
/* Process file here */
fclose(f);
}
/* 5. Diagnostic output to stderr, data output to stdout */
if (config.verbose)
fprintf(stderr, "Done. Processed %d iteration(s).\n", config.count);
return 0;
}
Key conventions this follows:
- Diagnostic messages go to stderr, data to stdout
- Works as a filter (reads stdin when no files given)
- Continues on error (does not abort for one bad file)
- Documents environment variables in
--help - Uses exit code 0 for success, nonzero for failure
Try It: Write the Rust equivalent of
wellbehaved.c using clap and std::env. Make it read from stdin when no files are given, using std::io::stdin().
Driver Prep: Kernel modules receive configuration through module parameters (
module_param() macro) and device tree entries, not environment variables. But user-space tools that load, configure, and test drivers rely heavily on environment and command-line patterns. Tools like modprobe read /etc/modprobe.d/ for configuration.
Knowledge Check
-
What is the order of precedence when a program checks compiled-in defaults, environment variables, and command-line arguments?
-
Why is
putenv() dangerous compared to setenv()? -
What does a leading colon or dot in
PATH mean, and why is it a security risk?
Common Pitfalls
-
Not checking getenv() return value: It returns
NULL if the variable is not set. Passing NULL to strcmp() or printf("%s", ...) is undefined behavior. -
Modifying the string returned by getenv(): The returned pointer may point into the environment block. Modifying it has undefined behavior. Copy it first.
-
Thread-unsafe environment access:
setenv() and getenv() are not thread-safe. Read everything you need at startup. -
Hardcoding paths: Use environment variables (
HOME, XDG_CONFIG_HOME) or /etc conventions. Never assume a home directory path. -
Ignoring stdin: Unix tools that accept files should also work as filters. If no files are given, read from stdin.
-
Error messages to stdout: Diagnostic output must go to stderr so it does not corrupt piped data.
Signal Fundamentals
Signals are asynchronous notifications delivered by the kernel to a process. They interrupt whatever the process is doing -- right now, at any instruction boundary. They are Unix's oldest form of inter-process communication, and they are everywhere: Ctrl+C, child process death, illegal memory access, broken pipes. You cannot write robust systems code without understanding them.
What Signals Are
A signal is a small integer sent from the kernel (or another process) to a target process. When a signal arrives, the process can:
- Run a handler function (custom code).
- Accept the default action (terminate, core dump, ignore, or stop).
- Block the signal temporarily (it stays pending).
Kernel / Other Process
|
| signal (e.g., SIGINT)
v
+-----------------+
| Target Process |
| |
| Normal code | <-- interrupted
| ... |
| Handler runs | <-- if installed
| ... |
| Normal code | <-- resumes
+-----------------+
Common Signals
| Signal | Number | Default Action | Trigger |
|---|---|---|---|
SIGHUP | 1 | Terminate | Terminal hangup |
SIGINT | 2 | Terminate | Ctrl+C |
SIGQUIT | 3 | Core dump | Ctrl+\ |
SIGILL | 4 | Core dump | Illegal instruction |
SIGABRT | 6 | Core dump | abort() |
SIGFPE | 8 | Core dump | Divide by zero (integer) |
SIGKILL | 9 | Terminate | Uncatchable kill |
SIGSEGV | 11 | Core dump | Bad memory access |
SIGPIPE | 13 | Terminate | Write to broken pipe |
SIGALRM | 14 | Terminate | alarm() timer |
SIGTERM | 15 | Terminate | Polite termination request |
SIGCHLD | 17 | Ignore | Child process stopped/exited |
SIGCONT | 18 | Continue | Resume stopped process |
SIGSTOP | 19 | Stop | Uncatchable stop |
SIGTSTP | 20 | Stop | Ctrl+Z |
SIGUSR1 | 10 | Terminate | User-defined |
SIGUSR2 | 12 | Terminate | User-defined |
Caution:
SIGKILL (9) and SIGSTOP (19) cannot be caught, blocked, or ignored. The kernel enforces this. Do not waste time trying to handle them.
Default Actions
There are four possible default actions:
- Terminate: Process exits.
- Core dump: Process exits and writes a core file (if enabled).
- Ignore: Signal is silently discarded.
- Stop: Process is suspended (like Ctrl+Z).
SIGCHLD and SIGURG default to ignore. Most signals default to terminate.
Sending Signals
From the shell:
$ kill -TERM 1234 # Send SIGTERM to PID 1234
$ kill -9 1234 # Send SIGKILL (uncatchable)
$ kill -SIGUSR1 1234 # Send SIGUSR1
$ kill -0 1234 # Test if process exists (no signal sent)
From C:
/* send_signal.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
int main(void)
{
pid_t pid = fork();
if (pid == 0) {
printf("Child %d: waiting for signal...\n", getpid());
pause(); /* Suspend until any signal arrives */
printf("Child: this line never reached (default SIGTERM kills)\n");
_exit(0);
}
sleep(1);
printf("Parent: sending SIGTERM to child %d\n", pid);
kill(pid, SIGTERM);
int status;
waitpid(pid, &status, 0);
if (WIFSIGNALED(status))
printf("Child killed by signal %d\n", WTERMSIG(status));
return 0;
}
The signal() Function (And Why You Should Not Use It)
The original signal() function -- it predates both BSD and POSIX, and is the only handler-installation API in standard C -- installs a handler:
/* signal_old.c -- for demonstration only */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
static void handler(int sig)
{
/* UNSAFE: printf is not async-signal-safe -- demo only */
printf("Caught signal %d\n", sig);
}
int main(void)
{
signal(SIGINT, handler);
printf("PID %d: Press Ctrl+C and watch the handler run\n", getpid());
for (int i = 0; i < 30; i++) {
printf("tick %d\n", i);
sleep(1);
}
return 0;
}
Caution:
signal() has portability problems. On some systems it resets the handler to SIG_DFL after each delivery (System V behavior). On others it does not (BSD behavior). The behavior of signal() is implementation-defined by POSIX. Always use sigaction() instead (covered in the next chapter).
A Signal Demo: Handling SIGINT and SIGTERM
Here is a proper pattern using a flag (still using signal() for simplicity --
we will fix this with sigaction() in the next chapter):
/* graceful_shutdown.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
static volatile sig_atomic_t got_signal = 0;
static void handler(int sig)
{
got_signal = sig;
}
int main(void)
{
signal(SIGINT, handler);
signal(SIGTERM, handler);
printf("PID %d: Running... (Ctrl+C to stop)\n", getpid());
while (!got_signal) {
/* Main work loop */
printf("Working...\n");
sleep(2);
}
printf("Received signal %d, shutting down gracefully.\n", got_signal);
/* Cleanup code here */
return 0;
}
The key type is volatile sig_atomic_t -- an integer type guaranteed to be
read and written atomically with respect to signal delivery.
Main thread: Signal delivery:
while (!got_signal) { handler(SIGINT) {
work(); got_signal = SIGINT;
sleep(2); }
}
| |
+---- reads got_signal ---+
Try It: Modify
graceful_shutdown.c to count how many times Ctrl+C is pressed. After 3 presses, exit. Print the count in the main loop.
SIGCHLD: Child Process Notifications
When a child process exits, the kernel sends SIGCHLD to the parent. This is
how servers avoid blocking on waitpid():
/* sigchld_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <errno.h>
static void sigchld_handler(int sig)
{
(void)sig;
int saved_errno = errno;
/* Reap all dead children (non-blocking) */
while (waitpid(-1, NULL, WNOHANG) > 0)
;
errno = saved_errno;
}
int main(void)
{
signal(SIGCHLD, sigchld_handler);
/* Spawn 3 children that exit at different times */
for (int i = 0; i < 3; i++) {
pid_t pid = fork();
if (pid == 0) {
sleep(i + 1);
printf("Child %d (PID %d) exiting\n", i, getpid());
_exit(0);
}
printf("Spawned child %d: PID %d\n", i, pid);
}
/* Parent does other work */
for (int i = 0; i < 5; i++) {
printf("Parent working (tick %d)...\n", i);
sleep(2);
}
return 0;
}
Caution: The
SIGCHLD handler must call waitpid() in a loop with WNOHANG. Multiple children can exit before the handler runs, but signals are not queued (standard signals, at least). One SIGCHLD delivery might represent multiple dead children.
SIGPIPE: Broken Pipes
When you write to a pipe or socket whose read end is closed, the kernel sends
SIGPIPE. The default action kills the process.
/* sigpipe_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
int main(void)
{
/* Ignore SIGPIPE -- check write() return instead */
signal(SIGPIPE, SIG_IGN);
int pipefd[2];
pipe(pipefd);
/* Close the read end immediately */
close(pipefd[0]);
/* Write to the broken pipe */
const char *msg = "Hello, pipe!\n";
ssize_t n = write(pipefd[1], msg, strlen(msg));
if (n < 0) {
printf("Write failed: %s (errno=%d)\n", strerror(errno), errno);
/* errno == EPIPE */
} else {
printf("Wrote %zd bytes\n", n);
}
close(pipefd[1]);
return 0;
}
Driver Prep: Kernel drivers handle signals indirectly. When a user-space process receives a signal while blocked in a system call, the kernel returns
EINTR (or ERESTARTSYS internally). Driver code must check for signal_pending(current) and return -ERESTARTSYS so the VFS layer can restart or abort the system call.
Rust: The signal-hook Crate
Rust has no built-in signal handling in std. The signal-hook crate provides
safe abstractions.
// signal_hook_demo.rs
// Cargo.toml:
// [dependencies]
// signal-hook = "0.3"
use signal_hook::consts::{SIGINT, SIGTERM};
use signal_hook::flag;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

fn main() {
    let shutdown = Arc::new(AtomicBool::new(false));

    // flag::register sets the flag to true when the signal arrives
    flag::register(SIGINT, Arc::clone(&shutdown)).expect("register SIGINT");
    flag::register(SIGTERM, Arc::clone(&shutdown)).expect("register SIGTERM");

    println!("PID {}: Running... (Ctrl+C to stop)", std::process::id());
    while !shutdown.load(Ordering::Relaxed) {
        println!("Working...");
        thread::sleep(Duration::from_secs(2));
    }
    println!("Signal received, shutting down.");
}
Rust Note:
signal-hook uses atomic flags internally, avoiding the async-signal-safety pitfalls of C handlers. The flag::register function installs a minimal handler that just sets an AtomicBool. No unsafe code is needed in user code.
Rust: Using nix for Signals
The nix crate wraps the POSIX signal API:
// nix_signal_demo.rs
// Cargo.toml:
// [dependencies]
// nix = { version = "0.29", features = ["signal", "process"] }
// libc = "0.2"
use nix::sys::signal::{self, Signal, SigHandler};
use nix::unistd::Pid;
use std::thread;
use std::time::Duration;

extern "C" fn handler(sig: libc::c_int) {
    // Minimal handler -- only async-signal-safe operations
    // We just use write() to fd 1 (not println!)
    let msg = b"Signal caught!\n";
    unsafe {
        libc::write(1, msg.as_ptr() as *const libc::c_void, msg.len());
    }
    let _ = sig;
}

fn main() {
    // Install handler for SIGUSR1
    unsafe {
        signal::signal(Signal::SIGUSR1, SigHandler::Handler(handler))
            .expect("signal");
    }
    let pid = nix::unistd::getpid();
    println!("PID {}: send SIGUSR1 to me", pid);
    println!("  kill -USR1 {}", pid);

    // Also send it to ourselves
    thread::sleep(Duration::from_secs(1));
    signal::kill(Pid::this(), Signal::SIGUSR1).expect("kill");
    thread::sleep(Duration::from_secs(1));
    println!("Done.");
}
Listing Signals on Your System
/* list_signals.c */
#include <stdio.h>
#include <string.h>
#include <signal.h>
int main(void)
{
for (int i = 1; i < NSIG; i++) {
const char *name = strsignal(i);
if (name)
printf("%2d %s\n", i, name);
}
return 0;
}
$ gcc -o list_signals list_signals.c && ./list_signals
1 Hangup
2 Interrupt
3 Quit
...
Try It: Run
kill -l in your shell to see the full signal list. Compare it with the output of list_signals. Note the real-time signals at the end (32+).
Signal Delivery Flow
Event occurs (Ctrl+C, child dies, bad memory access, kill())
|
v
Kernel sets signal as "pending" for target process
|
v
Process is scheduled to run (or already running)
|
v
Kernel checks: is signal blocked?
| |
YES NO
| |
v v
Signal stays Check disposition:
pending SIG_DFL / SIG_IGN / handler
|
+---------+----------+
| | |
SIG_DFL SIG_IGN handler()
| | |
Default Discard Run handler,
action then resume
Knowledge Check
-
Name two signals that cannot be caught or ignored.
-
What is the default action for
SIGCHLD? Why is this important for servers that fork child processes? -
Why is
volatile sig_atomic_t required for variables shared between a signal handler and the main program?
Common Pitfalls
-
Ignoring SIGPIPE: Network servers must ignore
SIGPIPE or they will die when a client disconnects mid-write. Use signal(SIGPIPE, SIG_IGN) and check write() return values. -
Not saving/restoring errno in handlers: Signal handlers can clobber
errno. Save it on entry, restore on exit. -
Assuming signals are queued: Standard signals (1-31) are not queued. If two
SIGCHLD signals arrive before the handler runs, you get one delivery. Always loop in waitpid() with WNOHANG. -
Using printf in handlers: It is not async-signal-safe. Use
write() to a file descriptor if you must produce output. -
Forgetting that sleep() is interrupted:
sleep(), read(), write(), and other blocking calls return early with EINTR when a signal is caught. Always retry or handle the short return. -
Catching SIGSEGV to "handle" crashes: You can catch it, but you cannot safely resume. The faulting instruction will re-execute and fault again unless you fix the underlying memory issue (which you almost certainly cannot do portably).
Signal Handlers and Masks
The previous chapter introduced signals. This chapter is about controlling them
properly: installing handlers with sigaction(), restricting what runs inside
a handler, and using signal masks to create critical sections where signals
are deferred.
sigaction(): The Proper Way
sigaction() replaces signal() with well-defined, portable behavior. It does
not reset the handler after delivery, it lets you control which signals are
blocked during handler execution, and it provides additional flags.
/* sigaction_basic.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static volatile sig_atomic_t got_int = 0;
static void handler(int sig)
{
(void)sig;
got_int = 1;
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGINT, &sa, NULL) < 0) {
perror("sigaction");
return 1;
}
printf("PID %d: Press Ctrl+C\n", getpid());
while (!got_int) {
printf("Waiting...\n");
sleep(2);
}
printf("Caught SIGINT, exiting gracefully.\n");
return 0;
}
The struct sigaction fields:
| Field | Purpose |
|---|---|
sa_handler | Pointer to handler function, SIG_DFL, or SIG_IGN |
sa_mask | Additional signals to block during handler execution |
sa_flags | Behavior flags (see below) |
sa_sigaction | Extended handler (used with SA_SIGINFO) |
Common flags:
| Flag | Effect |
|---|---|
SA_RESTART | Auto-restart interrupted system calls |
SA_NOCLDSTOP | Do not deliver SIGCHLD when child stops (only on exit) |
SA_SIGINFO | Use sa_sigaction instead of sa_handler |
SA_RESETHAND | Reset to SIG_DFL after one delivery (like old signal()) |
SA_RESTART: Restarting Interrupted System Calls
Without SA_RESTART, a caught signal causes blocking calls like read() to
return -1 with errno == EINTR. With it, the kernel restarts the call
automatically.
/* sa_restart.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
static void handler(int sig)
{
(void)sig;
const char msg[] = "[signal caught]\n";
write(STDERR_FILENO, msg, sizeof(msg) - 1);
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
/* Try with and without SA_RESTART */
sa.sa_flags = SA_RESTART; /* Comment this out to see EINTR */
sigaction(SIGINT, &sa, NULL);
printf("PID %d: Type something (Ctrl+C to test):\n", getpid());
char buf[256];
ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
if (n < 0) {
if (errno == EINTR)
printf("read() interrupted by signal (EINTR)\n");
else
perror("read");
} else {
buf[n] = '\0';
printf("Read: %s", buf);
}
return 0;
}
Try It: Compile and run with
SA_RESTART. Press Ctrl+C, then type something. The read completes normally. Now remove SA_RESTART, recompile, and press Ctrl+C. The read returns EINTR.
Async-Signal-Safe Functions: The Short List
Inside a signal handler, you can only call async-signal-safe functions. These are functions guaranteed to work correctly even when interrupting arbitrary code.
The POSIX-mandated safe list (selected):
_exit write read open
close signal sigaction sigprocmask
sigaddset sigdelset sigemptyset sigfillset
kill raise alarm pause
fork execve waitpid getpid
Not safe (most common traps):
printf fprintf malloc free
syslog strerror localtime gmtime
pthread_* exit atexit
Caution: Calling
printf() from a signal handler is undefined behavior. It can deadlock if the signal interrupts printf() in the main program (both try to acquire the stdio lock). Use write() with a fixed-size buffer for any handler output.
/* safe_handler_output.c */
#include <signal.h>
#include <unistd.h>
#include <string.h>
static void handler(int sig)
{
/* Only async-signal-safe calls here */
const char msg[] = "Caught SIGINT\n";
write(STDOUT_FILENO, msg, sizeof(msg) - 1);
(void)sig;
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGINT, &sa, NULL);
pause(); /* Wait for signal */
return 0;
}
Why printf in a Handler Is Undefined Behavior
Here is the scenario:
Main program Signal handler
| |
printf("Working...\n") |
| |
+-- acquires stdio lock |
| |
+-- SIGINT arrives here! |
| |
| printf("Caught!\n")
| |
| +-- tries to acquire stdio lock
| |
| +-- DEADLOCK (same thread holds lock)
The program hangs forever. This is not theoretical -- it happens in production.
sig_atomic_t: The Shared Flag
The only safe way to communicate between a handler and the main program is
through variables of type volatile sig_atomic_t.
/* sig_atomic_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static volatile sig_atomic_t signal_count = 0;
static void handler(int sig)
{
(void)sig;
signal_count++;
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGINT, &sa, NULL);
printf("PID %d: Press Ctrl+C multiple times. Ctrl+\\ to quit.\n", getpid());
while (1) {
pause();
printf("Signal count: %d\n", (int)signal_count);
}
return 0;
}
Caution:
sig_atomic_t is guaranteed to be atomically readable/writable, but it is NOT a general-purpose atomic type. It is typically just an int. Only use it for simple flags and counters in signal handlers. For anything more complex, use signal masks or the self-pipe trick (next chapter).
Signal Masks: sigprocmask
A signal mask is a set of signals that are blocked (deferred) for the calling thread. Blocked signals stay pending until unblocked.
/* sigmask_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static void handler(int sig)
{
const char msg[] = "SIGINT delivered!\n";
write(STDOUT_FILENO, msg, strlen(msg));
(void)sig;
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGINT, &sa, NULL);
sigset_t block_set, old_set;
sigemptyset(&block_set);
sigaddset(&block_set, SIGINT);
/* Block SIGINT */
sigprocmask(SIG_BLOCK, &block_set, &old_set);
printf("SIGINT blocked. Press Ctrl+C now (within 5 seconds).\n");
sleep(5);
/* Check if SIGINT is pending */
sigset_t pending;
sigpending(&pending);
if (sigismember(&pending, SIGINT))
printf("SIGINT is pending (was sent while blocked).\n");
/* Unblock SIGINT -- pending signal will be delivered now */
printf("Unblocking SIGINT...\n");
sigprocmask(SIG_SETMASK, &old_set, NULL);
printf("After unblock.\n");
return 0;
}
$ ./sigmask_demo
SIGINT blocked. Press Ctrl+C now (within 5 seconds).
^C <-- pressed Ctrl+C during sleep
SIGINT is pending (was sent while blocked).
Unblocking SIGINT...
SIGINT delivered! <-- handler runs when unblocked
After unblock.
Signal mask operations:
| Operation | Meaning |
|---|---|
SIG_BLOCK | Add signals to the mask |
SIG_UNBLOCK | Remove signals from the mask |
SIG_SETMASK | Replace the entire mask |
Signal Mask (per-thread):
+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |...|
+---+---+---+---+---+---+---+---+---+
0 1 0 0 0 0 0 0 ...
^
|
SIGINT blocked (bit set = blocked)
Pending Signals:
+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |...|
+---+---+---+---+---+---+---+---+---+
0 1 0 0 0 0 0 0 ...
^
|
SIGINT pending (received while blocked)
The Critical Section Pattern
Block signals around code that must not be interrupted:
/* critical_section.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static volatile sig_atomic_t got_signal = 0;
static void handler(int sig)
{
(void)sig;
got_signal = 1;
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGINT, &sa, NULL);
sigset_t block, old;
sigemptyset(&block);
sigaddset(&block, SIGINT);
/* --- Critical section: SIGINT deferred --- */
sigprocmask(SIG_BLOCK, &block, &old);
printf("Updating shared data structure...\n");
sleep(3); /* Simulate long update */
printf("Update complete.\n");
sigprocmask(SIG_SETMASK, &old, NULL);
/* --- End critical section: pending SIGINT delivered here --- */
if (got_signal)
printf("Signal was deferred and delivered after critical section.\n");
return 0;
}
This pattern is essential for data structures that must be consistent. If a signal handler touches the same data, blocking the signal prevents corruption.
Blocking Signals During Handler Execution
The sa_mask field of struct sigaction specifies additional signals to block
while the handler is running. The caught signal is always blocked by default
(unless SA_NODEFER is set).
/* sa_mask_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static void int_handler(int sig)
{
(void)sig;
const char msg[] = "SIGINT handler start\n";
write(1, msg, strlen(msg));
sleep(3); /* SIGTERM blocked during this time via sa_mask */
const char msg2[] = "SIGINT handler end\n";
write(1, msg2, strlen(msg2));
}
static void term_handler(int sig)
{
(void)sig;
const char msg[] = "SIGTERM handler\n";
write(1, msg, strlen(msg));
}
int main(void)
{
struct sigaction sa_int, sa_term;
memset(&sa_term, 0, sizeof(sa_term));
sa_term.sa_handler = term_handler;
sigemptyset(&sa_term.sa_mask);
sa_term.sa_flags = 0;
sigaction(SIGTERM, &sa_term, NULL);
memset(&sa_int, 0, sizeof(sa_int));
sa_int.sa_handler = int_handler;
sigemptyset(&sa_int.sa_mask);
sigaddset(&sa_int.sa_mask, SIGTERM); /* Block SIGTERM during SIGINT handler */
sa_int.sa_flags = 0;
sigaction(SIGINT, &sa_int, NULL);
printf("PID %d: Press Ctrl+C, then quickly send SIGTERM\n", getpid());
printf(" kill -TERM %d\n", getpid());
for (;;) pause();
return 0;
}
Try It: Run the program. Press Ctrl+C. While "SIGINT handler start" is displayed, send
kill -TERM <pid> from another terminal. Notice that SIGTERM is delivered only after the SIGINT handler finishes.
SA_SIGINFO: Extended Signal Information
With SA_SIGINFO, the handler receives a siginfo_t struct with details about
who sent the signal and why.
/* siginfo_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
static void handler(int sig, siginfo_t *info, void *ucontext)
{
(void)ucontext;
char buf[128];
int len = snprintf(buf, sizeof(buf),
"Signal %d from PID %d (uid %d)\n",
sig, info->si_pid, info->si_uid);
write(STDOUT_FILENO, buf, len);
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_SIGINFO;
sigaction(SIGUSR1, &sa, NULL);
printf("PID %d: send SIGUSR1 to me\n", getpid());
printf(" kill -USR1 %d\n", getpid());
pause();
return 0;
}
Caution:
snprintf is technically not on the async-signal-safe list. For production code, format the output manually using write() and integer-to-string conversion. The example above is simplified for clarity.
Rust: Safe Signal Handling with signal-hook
The signal-hook crate provides multiple safe patterns:
// signal_hook_iterator.rs
// Cargo.toml:
// [dependencies]
// signal-hook = "0.3"
use signal_hook::consts::{SIGINT, SIGTERM, SIGUSR1};
use signal_hook::iterator::Signals;
use std::thread;
use std::time::Duration;

fn main() {
    let mut signals = Signals::new(&[SIGINT, SIGTERM, SIGUSR1])
        .expect("register signals");

    // Spawn a thread to handle signals
    let handle = thread::spawn(move || {
        for sig in signals.forever() {
            match sig {
                SIGINT => {
                    println!("Received SIGINT (Ctrl+C)");
                    println!("Shutting down...");
                    return;
                }
                SIGTERM => {
                    println!("Received SIGTERM");
                    println!("Shutting down...");
                    return;
                }
                SIGUSR1 => {
                    println!("Received SIGUSR1 -- reloading config");
                }
                _ => unreachable!(),
            }
        }
    });

    println!("PID {}: Running... (Ctrl+C to stop)", std::process::id());
    println!("  kill -USR1 {} to reload", std::process::id());

    // Main work loop
    loop {
        if handle.is_finished() {
            break;
        }
        println!("Working...");
        thread::sleep(Duration::from_secs(2));
    }
    handle.join().expect("signal thread panicked");
    println!("Clean shutdown complete.");
}
Rust Note:
signal-hook's Signals::forever() uses a self-pipe internally. The signal handler writes a byte to a pipe; the iterator reads from it. This converts asynchronous signals into synchronous iteration, completely avoiding async-signal-safety issues. All the complex, unsafe C patterns become a simple for loop.
Rust Note: The nix crate also exposes sigprocmask via nix::sys::signal::sigprocmask(). The API mirrors C: create a SigSet, add signals to it, and call sigprocmask(SigmaskHow::SIG_BLOCK, Some(&set), None). The old mask can be captured through the third argument for later restoration.
Driver Prep: Kernel signal handling is different -- kernel code does not receive signals. Instead, the kernel checks signal_pending(current) when returning from a blocking operation. If a signal is pending, the kernel returns -ERESTARTSYS to allow the system call to be restarted. Driver authors must handle this return code in any sleeping function.
Knowledge Check
- What happens if you call printf() inside a signal handler that interrupts another printf() call in the main program?
- What does SA_RESTART do, and when would you want to omit it?
- How do you temporarily block a signal, do some work, then unblock it so any pending instances are delivered?
Common Pitfalls
- Using signal() instead of sigaction(): signal() has undefined reset behavior across platforms. Always use sigaction().
- Calling malloc/free in handlers: Both use global state (the heap free list) and can deadlock or corrupt memory.
- Not blocking signals during data structure updates: If a handler accesses shared data, block the signal during modifications in the main code.
- Forgetting SA_RESTART: Without it, every read(), write(), accept(), and select() must check for EINTR and retry manually.
- Using complex types in handlers: Only volatile sig_atomic_t is safe for communication between handlers and the main program.
- Not saving errno: Signal handlers can be called between a system call failing and the program reading errno. Save and restore it.
Advanced Signals: signalfd and the Self-Pipe Trick
Standard signal handlers are awkward. They interrupt your code at arbitrary points, restrict you to a tiny set of safe functions, and make shared state management painful. This chapter covers techniques that convert signals from asynchronous interrupts into synchronous events you can handle in a normal event loop -- alongside sockets, timers, and other file descriptors.
The Problem with Async Signal Handlers
Consider a server using select() or epoll() to multiplex I/O. A signal
handler fires between iterations, setting a flag. But select() is blocking --
it will not check the flag until a file descriptor event wakes it up, which
might not happen for seconds or minutes.
Event loop:
while (running) {
n = select(...) <-- blocks here
handle_fd_events()
check_signal_flag() <-- too late if no fd events
}
Signal arrives during select():
- Handler sets flag
- select() returns EINTR (if no SA_RESTART)
- OR select() keeps sleeping (with SA_RESTART)
We need signals to appear as file descriptor events.
Real-Time Signals (SIGRTMIN to SIGRTMAX)
Standard signals (1-31) have a critical limitation: they are not queued. If two SIGCHLD signals arrive before the handler runs, you get one delivery.
Real-time signals (SIGRTMIN through SIGRTMAX, typically 34-64) fix this:
- They are queued: each send results in one delivery.
- They carry data (an integer or pointer via sigqueue()).
- They are delivered in order (lowest signal number first).
/* rt_signal.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>
static void handler(int sig, siginfo_t *info, void *ctx)
{
(void)ctx;
char buf[128];
int len = snprintf(buf, sizeof(buf),
"RT signal %d, value=%d, from PID %d\n",
sig, info->si_value.sival_int, info->si_pid);
write(STDOUT_FILENO, buf, len);
}
int main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_SIGINFO;
/* Install handler for SIGRTMIN */
sigaction(SIGRTMIN, &sa, NULL);
printf("PID %d: installing handler for signal %d (SIGRTMIN)\n",
getpid(), SIGRTMIN);
printf("RT signal range: %d - %d\n", SIGRTMIN, SIGRTMAX);
pid_t pid = fork();
if (pid == 0) {
/* Child sends 3 queued signals with different values */
union sigval val;
for (int i = 1; i <= 3; i++) {
val.sival_int = i * 100;
sigqueue(getppid(), SIGRTMIN, val);
}
_exit(0);
}
/* Parent: block briefly to let all signals queue up */
sigset_t block;
sigemptyset(&block);
sigaddset(&block, SIGRTMIN);
sigprocmask(SIG_BLOCK, &block, NULL);
waitpid(pid, NULL, 0);
sleep(1);
/* Unblock: all 3 queued signals should now deliver */
printf("Unblocking RT signal...\n");
sigprocmask(SIG_UNBLOCK, &block, NULL);
sleep(1);
return 0;
}
$ ./rt_signal
PID 5000: installing handler for signal 34 (SIGRTMIN)
RT signal range: 34 - 64
Unblocking RT signal...
RT signal 34, value=100, from PID 5001
RT signal 34, value=200, from PID 5001
RT signal 34, value=300, from PID 5001
All three deliveries happen. With a standard signal, only one would.
signalfd(): Signals as File Descriptor Events
Linux provides signalfd(), which creates a file descriptor that becomes
readable when a signal is pending. This is the cleanest integration with
event loops.
/* signalfd_demo.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <string.h>
#include <stdlib.h>
#include <poll.h>
int main(void)
{
/* Block SIGINT and SIGTERM so they go to signalfd */
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGTERM);
sigprocmask(SIG_BLOCK, &mask, NULL);
/* Create signalfd */
int sfd = signalfd(-1, &mask, 0);
if (sfd < 0) {
perror("signalfd");
return 1;
}
printf("PID %d: Press Ctrl+C or send SIGTERM\n", getpid());
/* Event loop using poll */
struct pollfd pfd = { .fd = sfd, .events = POLLIN };
for (;;) {
int ret = poll(&pfd, 1, 5000); /* 5 second timeout */
if (ret < 0) {
perror("poll");
break;
}
if (ret == 0) {
printf("Tick (no signals)...\n");
continue;
}
/* Read the signal info */
struct signalfd_siginfo si;
ssize_t n = read(sfd, &si, sizeof(si));
if (n != sizeof(si)) {
perror("read signalfd");
break;
}
printf("Received signal %d from PID %u\n",
si.ssi_signo, si.ssi_pid);
if (si.ssi_signo == SIGINT || si.ssi_signo == SIGTERM) {
printf("Shutting down.\n");
break;
}
}
close(sfd);
return 0;
}
The flow:
1. Block signals with sigprocmask()
2. Create signalfd() with same mask
3. Signals arrive -> kernel queues them on the fd
4. poll()/epoll() reports fd as readable
5. read() from fd returns struct signalfd_siginfo
6. Handle signal synchronously in your event loop
+----------+ +----------+ +-----------+
| Kernel | --> | signalfd | --> | poll/epoll|
| signal | | (fd) | | event |
| delivery | | | | loop |
+----------+ +----------+ +-----------+
Caution: You must block the signals with sigprocmask() before creating the signalfd. If a signal is not blocked, it is delivered according to its current disposition (default action or installed handler) instead of being queued on the fd.
The Self-Pipe Trick
Before signalfd() existed (or on non-Linux systems), the self-pipe trick
was the standard solution. Create a pipe. In the signal handler, write one byte
to the pipe. In the event loop, include the pipe's read end in your
select()/poll() set.
/* self_pipe.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <poll.h>
#include <errno.h>
static int pipe_fds[2];
static void handler(int sig)
{
/* Write one byte -- the signal number */
unsigned char s = (unsigned char)sig;
write(pipe_fds[1], &s, 1);
}
static void make_nonblocking(int fd)
{
int flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
int main(void)
{
if (pipe(pipe_fds) < 0) {
perror("pipe");
return 1;
}
make_nonblocking(pipe_fds[0]);
make_nonblocking(pipe_fds[1]);
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sigaction(SIGINT, &sa, NULL);
sigaction(SIGTERM, &sa, NULL);
printf("PID %d: Press Ctrl+C or send SIGTERM\n", getpid());
struct pollfd pfds[2] = {
{ .fd = STDIN_FILENO, .events = POLLIN },
{ .fd = pipe_fds[0], .events = POLLIN },
};
for (;;) {
int ret = poll(pfds, 2, 5000);
if (ret < 0 && errno == EINTR)
continue;
if (ret < 0) {
perror("poll");
break;
}
if (ret == 0) {
printf("Tick...\n");
continue;
}
/* Check for stdin input */
if (pfds[0].revents & POLLIN) {
char buf[256];
ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
if (n > 0) {
buf[n] = '\0';
printf("Input: %s", buf);
}
}
/* Check for signal via pipe */
if (pfds[1].revents & POLLIN) {
unsigned char sig;
while (read(pipe_fds[0], &sig, 1) > 0) {
printf("Signal %d via self-pipe\n", sig);
if (sig == SIGINT || sig == SIGTERM) {
printf("Shutting down.\n");
close(pipe_fds[0]);
close(pipe_fds[1]);
return 0;
}
}
}
}
close(pipe_fds[0]);
close(pipe_fds[1]);
return 0;
}
Why make the pipe non-blocking? If the signal fires rapidly, the write end
could fill up. A blocking write in a signal handler would deadlock the process.
With O_NONBLOCK, the write simply fails with EAGAIN if the pipe is full -- which
is fine because we only need to wake the event loop, and a byte is already
waiting there.
Try It: Modify the self-pipe program to handle SIGUSR1 as a "reload configuration" trigger. When received, print "Reloading config..." in the event loop (not in the handler).
timerfd: Timers as File Descriptors
While not a signal mechanism, timerfd solves the same integration problem for
timers. Instead of SIGALRM, you get a readable file descriptor.
/* timerfd_demo.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/timerfd.h>
#include <stdint.h>
#include <poll.h>
int main(void)
{
int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
if (tfd < 0) {
perror("timerfd_create");
return 1;
}
/* Fire every 2 seconds, first fire in 1 second */
struct itimerspec ts = {
.it_interval = { .tv_sec = 2, .tv_nsec = 0 },
.it_value = { .tv_sec = 1, .tv_nsec = 0 },
};
timerfd_settime(tfd, 0, &ts, NULL);
printf("Timer started. Reading 5 ticks...\n");
for (int i = 0; i < 5; i++) {
uint64_t expirations;
ssize_t n = read(tfd, &expirations, sizeof(expirations));
if (n != sizeof(expirations)) {
perror("read timerfd");
break;
}
printf("Timer tick %d (expirations: %llu)\n", i, (unsigned long long)expirations);
}
close(tfd);
return 0;
}
Integrating Everything: An Event-Driven Server Skeleton
Here is a skeleton that combines signalfd, timerfd, and socket I/O in one event loop:
/* event_server.c */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/timerfd.h>
#include <poll.h>
#include <string.h>
#include <stdint.h>
int main(void)
{
/* 1. Set up signalfd for SIGINT, SIGTERM */
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGTERM);
sigprocmask(SIG_BLOCK, &mask, NULL);
int sig_fd = signalfd(-1, &mask, 0);
/* 2. Set up timerfd for periodic work */
int tmr_fd = timerfd_create(CLOCK_MONOTONIC, 0);
struct itimerspec ts = {
.it_interval = { .tv_sec = 3, .tv_nsec = 0 },
.it_value = { .tv_sec = 3, .tv_nsec = 0 },
};
timerfd_settime(tmr_fd, 0, &ts, NULL);
/* 3. Event loop */
enum { FD_SIGNAL, FD_TIMER, FD_COUNT };
struct pollfd pfds[FD_COUNT] = {
[FD_SIGNAL] = { .fd = sig_fd, .events = POLLIN },
[FD_TIMER] = { .fd = tmr_fd, .events = POLLIN },
};
printf("PID %d: event server running\n", getpid());
int running = 1;
while (running) {
int n = poll(pfds, FD_COUNT, -1);
if (n < 0) { perror("poll"); break; }
/* Signal event */
if (pfds[FD_SIGNAL].revents & POLLIN) {
struct signalfd_siginfo si;
read(sig_fd, &si, sizeof(si));
printf("Signal %d received. Shutting down.\n", si.ssi_signo);
running = 0;
}
/* Timer event */
if (pfds[FD_TIMER].revents & POLLIN) {
uint64_t exp;
read(tmr_fd, &exp, sizeof(exp));
printf("Timer tick (expirations: %llu)\n", (unsigned long long)exp);
}
}
close(sig_fd);
close(tmr_fd);
return 0;
}
Event Loop Architecture:
+----------+ +----------+ +----------+
| signalfd | | timerfd | | socket |
| (signals)| | (timers) | | (network)|
+----+-----+ +----+-----+ +----+-----+
| | |
v v v
+--------------------------------------+
| poll() / epoll_wait() |
+--------------------------------------+
|
v
Handle event synchronously
(no async-signal-safety concerns)
Driver Prep: Kernel drivers use wait_event() and wake_up() for event notification, not signalfd or timerfd. But user-space driver frameworks (like DPDK, SPDK, or UIO helpers) often build event loops with these exact Linux fd types. The pattern of multiplexing heterogeneous event sources into one loop translates directly.
Rust: signalfd via nix
// signalfd_nix.rs
// Cargo.toml: nix = { version = "0.29", features = ["signal", "poll"] }
use nix::poll::{poll, PollFd, PollFlags, PollTimeout};
use nix::sys::signal::{self, SigSet, Signal, SigmaskHow};
use nix::sys::signalfd::{SfdFlags, SignalFd};
use std::os::unix::io::AsFd;

fn main() {
    // Block signals
    let mut mask = SigSet::empty();
    mask.add(Signal::SIGINT);
    mask.add(Signal::SIGTERM);
    signal::sigprocmask(SigmaskHow::SIG_BLOCK, Some(&mask), None)
        .expect("sigprocmask");

    // Create signalfd
    let mut sfd = SignalFd::with_flags(&mask, SfdFlags::empty())
        .expect("signalfd");

    println!("PID {}: Press Ctrl+C or send SIGTERM", std::process::id());

    loop {
        let poll_fd = PollFd::new(sfd.as_fd(), PollFlags::POLLIN);
        let ret = poll(&mut [poll_fd], PollTimeout::from(5000u16))
            .expect("poll");
        if ret == 0 {
            println!("Tick...");
            continue;
        }
        if let Some(info) = sfd.read_signal().expect("read_signal") {
            println!("Received signal {} from PID {}",
                     info.ssi_signo, info.ssi_pid);
            let sig = info.ssi_signo as i32;
            if sig == Signal::SIGINT as i32 || sig == Signal::SIGTERM as i32 {
                println!("Shutting down.");
                break;
            }
        }
    }
}
Rust Note: For production async Rust, signal-hook integrates with mio (the I/O reactor behind tokio) via signal-hook-mio. Tokio also provides a built-in tokio::signal module: you simply await a signal future. The ecosystem has converged on treating signals as just another async event.
Comparison: Signal Handling Approaches
| Approach | Portability | Complexity | Event Loop Integration |
|---|---|---|---|
| signal() / sigaction() handler | POSIX | Low | Poor |
| Self-pipe trick | POSIX | Medium | Good |
| signalfd() | Linux only | Low | Excellent |
| signal-hook (Rust) | Cross-platform | Low | Excellent |
When to Use What
- Simple CLI tools: sigaction() with a volatile sig_atomic_t flag. Nothing more needed.
- Event-driven servers on Linux: signalfd() with epoll(). Clean, efficient, no race conditions.
- Portable servers: Self-pipe trick. Works everywhere, adds one extra fd.
- Rust programs: signal-hook crate. It picks the right backend automatically.
Try It: Write a program that uses signalfd and timerfd together in one poll() loop. The timer fires every second and prints a count. SIGUSR1 resets the count to zero. SIGINT exits cleanly.
Knowledge Check
- Why must you block signals with sigprocmask() before creating a signalfd?
- What advantage do real-time signals have over standard signals?
- In the self-pipe trick, why must both ends of the pipe be set to non-blocking mode?
Common Pitfalls
- Forgetting to block signals before signalfd: The signal gets delivered to the default handler instead of the fd. The fd never becomes readable.
- Not draining the self-pipe: If you only read one byte but multiple signals arrived, the pipe stays readable. Always read in a loop until EAGAIN.
- Blocking write in signal handler: If the self-pipe fills up and the write end is blocking, the handler blocks forever. Always use O_NONBLOCK.
- Mixing signalfd and handlers: If you have both a signalfd and a sigaction handler for the same signal, behavior is undefined. Pick one.
- Ignoring timerfd expirations count: read() on a timerfd returns a uint64_t with the number of expirations since last read. If your process was delayed, this count can be greater than 1.
- Using signalfd in multithreaded programs: Signal masks are per-thread. Block the signals in all threads, then read the signalfd from one thread only.
Threads and pthreads
Threads let you run multiple execution paths inside a single process, sharing the same address space. They are lighter than fork() because there is no page-table copy, no duplicated file descriptors, no COW overhead. This chapter covers POSIX threads in C and std::thread in Rust.
Why Threads?
Process with one thread: Process with three threads:
+---------------------------+ +---------------------------+
| Code | Data | Heap | | Code | Data | Heap |
| | | | | | (shared) |
+---------------------------+ +---------------------------+
| Stack | | Stack-0 | Stack-1 | Stack-2|
+---------------------------+ +---------------------------+
| 1 program counter | | PC-0 | PC-1 | PC-2 |
+---------------------------+ +---------------------------+
Every thread shares the code, global data, heap, and file descriptors. Each thread gets its own stack and register set. This makes communication between threads trivial (just read shared memory) but also dangerous (data races).
Creating a Thread in C
/* thread_hello.c */
#include <stdio.h>
#include <pthread.h>
void *greet(void *arg) {
int id = *(int *)arg;
printf("Hello from thread %d\n", id);
return NULL;
}
int main(void) {
pthread_t t;
int id = 42;
if (pthread_create(&t, NULL, greet, &id) != 0) {
perror("pthread_create");
return 1;
}
pthread_join(t, NULL);
printf("Thread finished\n");
return 0;
}
Compile with:
gcc -o thread_hello thread_hello.c -pthread
The -pthread flag links the pthreads library and defines the right macros.
pthread_create takes four arguments:
| Argument | Meaning |
|---|---|
| &t | Where to store the thread ID |
| NULL | Thread attributes (NULL = defaults) |
| greet | The function to run |
| &id | Argument passed to that function |
The thread function signature is always void *(*)(void *) -- it takes a void * and returns a void *.
Passing Arguments Safely
A common bug: passing a pointer to a stack variable that changes before the thread reads it.
/* broken_args.c -- DO NOT DO THIS */
#include <stdio.h>
#include <pthread.h>
void *print_id(void *arg) {
int id = *(int *)arg; /* race: main may have changed *arg */
printf("Thread %d\n", id);
return NULL;
}
int main(void) {
pthread_t threads[5];
for (int i = 0; i < 5; i++) {
pthread_create(&threads[i], NULL, print_id, &i); /* BUG */
}
for (int i = 0; i < 5; i++)
pthread_join(threads[i], NULL);
return 0;
}
Caution: The loop variable i is shared across all threads. By the time a thread reads *arg, i may already be 3 or 5. You might see "Thread 5" printed five times.
The fix: give each thread its own copy.
/* fixed_args.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *print_id(void *arg) {
int id = *(int *)arg;
free(arg);
printf("Thread %d\n", id);
return NULL;
}
int main(void) {
pthread_t threads[5];
for (int i = 0; i < 5; i++) {
int *p = malloc(sizeof(int));
*p = i;
pthread_create(&threads[i], NULL, print_id, p);
}
for (int i = 0; i < 5; i++)
pthread_join(threads[i], NULL);
return 0;
}
Each thread gets its own heap-allocated integer. The thread frees it after reading.
Try It: Modify broken_args.c to use an array int ids[5] instead of malloc. Set ids[i] = i before creating each thread. Does this fix the bug? Why or why not?
Return Values
A thread function returns void *. You retrieve it through pthread_join.
/* thread_return.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *compute_square(void *arg) {
int val = *(int *)arg;
int *result = malloc(sizeof(int));
*result = val * val;
return result;
}
int main(void) {
pthread_t t;
int input = 7;
void *retval;
pthread_create(&t, NULL, compute_square, &input);
pthread_join(t, &retval);
printf("7 squared = %d\n", *(int *)retval);
free(retval);
return 0;
}
Caution: Never return a pointer to a local variable from the thread function. The thread's stack is destroyed after it exits. Return heap-allocated memory or cast an integer to void *.
Joinable vs Detached Threads
By default, threads are joinable. If you never join them, you leak resources (similar to zombie processes). Detached threads clean up automatically when they exit.
/* detached.c */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
void *background_work(void *arg) {
(void)arg;
sleep(1);
printf("Background work done\n");
return NULL;
}
int main(void) {
pthread_t t;
pthread_create(&t, NULL, background_work, NULL);
pthread_detach(t); /* cannot join after this */
printf("Main continues immediately\n");
sleep(2); /* give detached thread time to finish */
return 0;
}
You can also create a thread as detached from the start:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, func, arg);
pthread_attr_destroy(&attr);
Thread-Local Storage
Sometimes each thread needs its own copy of a variable. Two common approaches in C:
1. The __thread keyword (GCC extension, also C11 _Thread_local):
/* tls_keyword.c */
#include <stdio.h>
#include <pthread.h>
__thread int counter = 0;
void *worker(void *arg) {
int id = *(int *)arg;
for (int i = 0; i < 1000; i++)
counter++;
printf("Thread %d: counter = %d\n", id, counter);
return NULL;
}
int main(void) {
pthread_t t1, t2;
int id1 = 1, id2 = 2;
pthread_create(&t1, NULL, worker, &id1);
pthread_create(&t2, NULL, worker, &id2);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("Main: counter = %d\n", counter);
return 0;
}
Each thread sees counter = 1000. Main sees counter = 0. No synchronization needed.
2. pthread_key_create / pthread_getspecific / pthread_setspecific:
/* tls_key.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
static pthread_key_t key;
void destructor(void *val) {
free(val);
}
void *worker(void *arg) {
int *p = malloc(sizeof(int));
*p = *(int *)arg;
pthread_setspecific(key, p);
int *my_val = pthread_getspecific(key);
printf("Thread-local value: %d\n", *my_val);
return NULL;
}
int main(void) {
pthread_key_create(&key, destructor);
pthread_t t1, t2;
int a = 10, b = 20;
pthread_create(&t1, NULL, worker, &a);
pthread_create(&t2, NULL, worker, &b);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
pthread_key_delete(key);
return 0;
}
The destructor runs automatically when a thread exits.
Thread Safety: What Breaks
When two threads touch the same data without synchronization, you get a data race.
/* data_race.c */
#include <stdio.h>
#include <pthread.h>
int shared_counter = 0;
void *increment(void *arg) {
(void)arg;
for (int i = 0; i < 1000000; i++)
shared_counter++; /* NOT atomic */
return NULL;
}
int main(void) {
pthread_t t1, t2;
pthread_create(&t1, NULL, increment, NULL);
pthread_create(&t2, NULL, increment, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("Expected: 2000000, Got: %d\n", shared_counter);
return 0;
}
Run this several times. You will almost never see 2000000. The increment shared_counter++ is three CPU instructions (load, add, store). Two threads interleave them:
Thread A: load counter (0)
Thread B: load counter (0)
Thread A: add 1 -> 1
Thread B: add 1 -> 1
Thread A: store 1
Thread B: store 1 <-- one increment lost
Caution: Data races in C are undefined behavior per C11. The compiler is free to assume they do not happen, leading to bizarre optimizations.
Rust: std::thread::spawn
Rust threads use OS threads, just like pthreads. The API is safer.
// thread_hello.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a spawned thread");
    });
    handle.join().unwrap();
    println!("Thread finished");
}
No void * casting. No manual memory management. The closure captures its environment.
Move Closures for Safe Data Passing
Rust forces you to either borrow or move data into the thread closure. Since the compiler cannot prove the borrow outlives the thread, you must use move.
// thread_move.rs
use std::thread;

fn main() {
    let mut handles = vec![];
    for i in 0..5 {
        let handle = thread::spawn(move || {
            println!("Thread {}", i);
        });
        handles.push(handle);
    }
    for h in handles {
        h.join().unwrap();
    }
}
Each closure gets its own copy of i (integers implement Copy). There is no equivalent of the C bug where all threads share a pointer to the same loop variable.
Rust Note: Rust's thread::spawn requires the closure to be 'static -- it cannot borrow stack-local data from the parent. This prevents the entire class of dangling-pointer bugs that plague pthreads.
Returning Values from Rust Threads
The JoinHandle<T> carries the return value.
// thread_return.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| -> i32 { 7 * 7 });
    let result = handle.join().unwrap();
    println!("7 squared = {}", result);
}
No malloc, no void * cast, no free. The value is moved out of the thread safely.
Thread-Local Storage in Rust
// thread_local.rs
use std::cell::RefCell;
use std::thread;

thread_local! {
    static COUNTER: RefCell<u32> = RefCell::new(0);
}

fn main() {
    let mut handles = vec![];
    for id in 0..3 {
        let h = thread::spawn(move || {
            COUNTER.with(|c| {
                for _ in 0..1000 {
                    *c.borrow_mut() += 1;
                }
                println!("Thread {}: counter = {}", id, *c.borrow());
            });
        });
        handles.push(h);
    }
    for h in handles {
        h.join().unwrap();
    }
    COUNTER.with(|c| {
        println!("Main: counter = {}", *c.borrow());
    });
}
Each thread sees its own COUNTER. The thread_local! macro initializes lazily per thread.
Comparing C and Rust Thread APIs
+--------------------+-------------------------------+---------------------------+
| Operation | C (pthreads) | Rust (std::thread) |
+--------------------+-------------------------------+---------------------------+
| Create | pthread_create(&t, NULL, f, a)| thread::spawn(closure) |
| Join | pthread_join(t, &retval) | handle.join().unwrap() |
| Detach | pthread_detach(t) | drop(handle) (implicit) |
| Pass args | void* cast | move closure |
| Return values | void* cast | JoinHandle<T> |
| Thread-local | __thread / pthread_key | thread_local! macro |
| Data race protect | programmer discipline | compiler-enforced |
+--------------------+-------------------------------+---------------------------+
Driver Prep: Linux kernel threads use kthread_create and kthread_run, which follow a similar create-join pattern. The kernel has its own synchronization primitives (spinlock_t, mutex, rcu) but the mental model is the same: shared data needs protection.
Knowledge Check
- What happens if you pass &i (where i is a loop variable) to five pthread_create calls without copying i?
- Why must you compile with -pthread and not just -lpthread?
- In Rust, why does thread::spawn require a 'static closure?
Common Pitfalls
- Forgetting -pthread -- the program may compile but crash at runtime or behave strangely.
- Returning a pointer to a local variable from a thread function -- the stack is gone after the thread exits.
- Not joining and not detaching -- resource leak, just like a zombie process.
- Passing a shared pointer to multiple threads without synchronization -- data race, undefined behavior.
- Calling pthread_join on a detached thread -- undefined behavior.
- Assuming printf is thread-safe in all cases -- it is, by POSIX, but output may interleave at the line level.
Mutexes, Condition Variables, and Synchronization
The previous chapter showed that two threads incrementing a shared counter lose updates. This chapter fixes that with mutexes, condition variables, and read-write locks. We start with C's pthreads primitives, then show how Rust wraps the data inside the lock itself.
The Race Condition, Concretely
Here is the broken counter again for reference:
/* race.c */
#include <stdio.h>
#include <pthread.h>
static int counter = 0;
void *increment(void *arg) {
(void)arg;
for (int i = 0; i < 1000000; i++)
counter++;
return NULL;
}
int main(void) {
pthread_t t1, t2;
pthread_create(&t1, NULL, increment, NULL);
pthread_create(&t2, NULL, increment, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("Expected 2000000, got %d\n", counter);
return 0;
}
The CPU executes counter++ as three steps: load, increment, store. Two threads interleaving these steps lose updates.
Time Thread A Thread B counter
---- -------- -------- -------
1 load counter (100) 100
2 load counter (100) 100
3 add 1 -> 101 100
4 add 1 -> 101 100
5 store 101 101
6 store 101 101 <-- lost update
Mutex: The Fix
A mutex (mutual exclusion) ensures only one thread enters the critical section at a time.
/* mutex_counter.c */
#include <stdio.h>
#include <pthread.h>
static int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void *increment(void *arg) {
(void)arg;
for (int i = 0; i < 1000000; i++) {
pthread_mutex_lock(&lock);
counter++;
pthread_mutex_unlock(&lock);
}
return NULL;
}
int main(void) {
pthread_t t1, t2;
pthread_create(&t1, NULL, increment, NULL);
pthread_create(&t2, NULL, increment, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
pthread_mutex_destroy(&lock);
printf("Expected 2000000, got %d\n", counter);
return 0;
}
Now the output is always 2000000. The mutex serializes access to counter.
The lifecycle of a mutex:
PTHREAD_MUTEX_INITIALIZER or pthread_mutex_init(&m, NULL)
|
v
pthread_mutex_lock(&m) <-- blocks if another thread holds it
|
v
[ critical section ]
|
v
pthread_mutex_unlock(&m)
|
v
pthread_mutex_destroy(&m)
Try It: Remove the pthread_mutex_lock/unlock calls and run the program 10 times. How much variance do you see in the output?
Dynamic Initialization
For mutexes allocated on the heap or inside a struct, use pthread_mutex_init:
/* mutex_dynamic.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
typedef struct {
int value;
pthread_mutex_t lock;
} SafeCounter;
SafeCounter *safe_counter_new(void) {
SafeCounter *sc = malloc(sizeof(SafeCounter));
sc->value = 0;
pthread_mutex_init(&sc->lock, NULL);
return sc;
}
void safe_counter_inc(SafeCounter *sc) {
pthread_mutex_lock(&sc->lock);
sc->value++;
pthread_mutex_unlock(&sc->lock);
}
void safe_counter_free(SafeCounter *sc) {
pthread_mutex_destroy(&sc->lock);
free(sc);
}
int main(void) {
SafeCounter *sc = safe_counter_new();
safe_counter_inc(sc);
safe_counter_inc(sc);
printf("Counter: %d\n", sc->value);
safe_counter_free(sc);
return 0;
}
Deadlock
Deadlock occurs when two threads each hold a lock the other needs.
Thread A Thread B
-------- --------
lock(mutex_1) lock(mutex_2)
... ...
lock(mutex_2) <-- blocked lock(mutex_1) <-- blocked
DEADLOCK DEADLOCK
Prevention rules:
- Lock ordering -- always acquire locks in the same global order.
- Try-lock -- use pthread_mutex_trylock and back off if it fails.
- Avoid holding multiple locks whenever possible.
/* deadlock_fixed.c */
#include <stdio.h>
#include <pthread.h>
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
void *worker1(void *arg) {
(void)arg;
/* Always lock A before B */
pthread_mutex_lock(&lock_a);
pthread_mutex_lock(&lock_b);
printf("Worker 1 has both locks\n");
pthread_mutex_unlock(&lock_b);
pthread_mutex_unlock(&lock_a);
return NULL;
}
void *worker2(void *arg) {
(void)arg;
/* Same order: A before B */
pthread_mutex_lock(&lock_a);
pthread_mutex_lock(&lock_b);
printf("Worker 2 has both locks\n");
pthread_mutex_unlock(&lock_b);
pthread_mutex_unlock(&lock_a);
return NULL;
}
int main(void) {
pthread_t t1, t2;
pthread_create(&t1, NULL, worker1, NULL);
pthread_create(&t2, NULL, worker2, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
return 0;
}
Caution: Deadlocks are silent -- the program just hangs. Use pthread_mutex_timedlock in debug builds to detect them.
Condition Variables
A condition variable lets a thread sleep until some condition is true, without busy-waiting.
Classic pattern: producer-consumer queue.
/* condvar.c */
#include <stdio.h>
#include <pthread.h>
#include <stdbool.h>
#define QUEUE_SIZE 5
static int queue[QUEUE_SIZE];
static int count = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
void *producer(void *arg) {
(void)arg;
for (int i = 0; i < 20; i++) {
pthread_mutex_lock(&mtx);
while (count == QUEUE_SIZE) /* MUST be while, not if */
pthread_cond_wait(&not_full, &mtx);
queue[count++] = i;
printf("Produced %d (count=%d)\n", i, count);
pthread_cond_signal(&not_empty);
pthread_mutex_unlock(&mtx);
}
return NULL;
}
void *consumer(void *arg) {
(void)arg;
for (int i = 0; i < 20; i++) {
pthread_mutex_lock(&mtx);
while (count == 0) /* MUST be while, not if */
pthread_cond_wait(&not_empty, &mtx);
int val = queue[--count];
printf("Consumed %d (count=%d)\n", val, count);
pthread_cond_signal(&not_full);
pthread_mutex_unlock(&mtx);
}
return NULL;
}
int main(void) {
pthread_t prod, cons;
pthread_create(&prod, NULL, producer, NULL);
pthread_create(&cons, NULL, consumer, NULL);
pthread_join(prod, NULL);
pthread_join(cons, NULL);
pthread_mutex_destroy(&mtx);
pthread_cond_destroy(&not_empty);
pthread_cond_destroy(&not_full);
return 0;
}
Caution: Always check the condition in a while loop, not an if. Spurious wakeups are allowed by POSIX: the thread may wake up even though no one signaled the condvar.
The flow:
pthread_cond_wait(&cond, &mtx):
1. Atomically: unlock mtx + sleep on cond
2. When woken: re-lock mtx
3. Return (caller re-checks condition in while loop)
pthread_cond_signal(&cond):
Wake ONE waiting thread
pthread_cond_broadcast(&cond):
Wake ALL waiting threads
Read-Write Locks
When reads vastly outnumber writes, a read-write lock allows multiple simultaneous readers.
/* rwlock.c */
#include <stdio.h>
#include <pthread.h>
static int shared_data = 0;
static pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER;
void *reader(void *arg) {
int id = *(int *)arg;
pthread_rwlock_rdlock(&rwl);
printf("Reader %d sees %d\n", id, shared_data);
pthread_rwlock_unlock(&rwl);
return NULL;
}
void *writer(void *arg) {
(void)arg;
pthread_rwlock_wrlock(&rwl);
shared_data = 42;
printf("Writer set data to 42\n");
pthread_rwlock_unlock(&rwl);
return NULL;
}
int main(void) {
pthread_t r1, r2, w;
int id1 = 1, id2 = 2;
pthread_create(&w, NULL, writer, NULL);
pthread_create(&r1, NULL, reader, &id1);
pthread_create(&r2, NULL, reader, &id2);
pthread_join(w, NULL);
pthread_join(r1, NULL);
pthread_join(r2, NULL);
pthread_rwlock_destroy(&rwl);
return 0;
}
Rust: Mutex -- Data Inside the Lock
In C, the mutex and the data it protects are separate. You can forget to lock. In Rust, the data lives inside the Mutex<T>. You cannot access the data without locking.
// mutex_counter.rs
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];
    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let h = thread::spawn(move || {
            for _ in 0..1_000_000 {
                let mut num = counter.lock().unwrap();
                *num += 1;
                // MutexGuard dropped here -> unlock
            }
        });
        handles.push(h);
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("Result: {}", *counter.lock().unwrap());
}
Rust Note: Mutex::lock() returns a MutexGuard<T>. This guard implements Deref and DerefMut, so you use it like a reference. When the guard is dropped, the mutex is automatically unlocked. You literally cannot forget to unlock.
Rust: RwLock
// rwlock.rs
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let data = Arc::new(RwLock::new(0));
    let mut handles = vec![];
    // spawn readers
    for id in 0..3 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let val = data.read().unwrap();
            println!("Reader {} sees {}", id, *val);
        }));
    }
    // spawn writer
    {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let mut val = data.write().unwrap();
            *val = 42;
            println!("Writer set data to 42");
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}
Rust: Condvar
// condvar.rs
use std::sync::{Arc, Mutex, Condvar};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair_clone = Arc::clone(&pair);
    let producer = thread::spawn(move || {
        let (lock, cvar) = &*pair_clone;
        let mut ready = lock.lock().unwrap();
        *ready = true;
        println!("Producer: data is ready");
        cvar.notify_one();
    });
    let (lock, cvar) = &*pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
    println!("Consumer: got the signal, ready = {}", *ready);
    producer.join().unwrap();
}
The Condvar::wait method takes the MutexGuard, releases the lock, sleeps, reacquires the lock, and returns a new guard. Same semantics as pthread_cond_wait, but type-safe.
Rust: Channels (mpsc)
Message passing avoids shared state entirely. Rust provides multi-producer, single-consumer channels.
// channel.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for i in 0..5 {
            tx.send(i * i).unwrap();
        }
    });
    for val in rx {
        println!("Received: {}", val);
    }
    producer.join().unwrap();
}
When the tx (sender) is dropped, the rx iterator ends. Clean, simple, no locks.
For multiple producers, clone the sender:
// multi_producer.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut handles = vec![];
    for id in 0..3 {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(format!("Hello from thread {}", id)).unwrap();
        }));
    }
    drop(tx); // drop original sender so rx iterator terminates
    for msg in rx {
        println!("{}", msg);
    }
    for h in handles {
        h.join().unwrap();
    }
}
Driver Prep: The Linux kernel uses similar patterns: wait_event/wake_up for condition variables, spinlock_t for short critical sections, and completion for one-shot signaling. Message-passing patterns appear in kernel workqueues.
Why Rust's Mutex Is Better Than C's
C: mutex and data are separate
- You can access data without locking
- You can lock the wrong mutex
- You can forget to unlock
Rust: data is INSIDE the Mutex<T>
- You MUST lock to access data
- The lock guard auto-unlocks on drop
- The compiler enforces Send + Sync bounds
Try It: In the Rust mutex_counter.rs example, try removing Arc::clone and just moving counter into both closures. What error does the compiler give? Why?
Knowledge Check
- Why must the condition in a condition variable be checked in a while loop, not an if?
- What is the difference between pthread_cond_signal and pthread_cond_broadcast?
- In Rust, what prevents you from accessing data protected by a Mutex<T> without locking it?
Common Pitfalls
- Forgetting to unlock -- in C, every lock must have a matching unlock, even on error paths. Use cleanup handlers or RAII wrappers.
- Locking inside a loop body when you meant to lock outside it -- performance disaster from lock contention.
- Deadlock from inconsistent lock ordering -- establish a global order and document it.
- Using if instead of while with condition variables -- spurious wakeups cause logic bugs.
- Holding a lock while doing I/O -- blocks all other threads waiting on that lock. Keep critical sections short.
- Poisoned mutex in Rust -- if a thread panics while holding a MutexGuard, the mutex is poisoned. Call .unwrap() or handle the PoisonError.
Rust Threads, Channels, and Async
Rust's concurrency model is built on two pillars: the type system prevents data races at compile time, and the ecosystem gives you both OS threads and async I/O. This chapter digs into Send, Sync, scoped threads, channels, Arc<Mutex<T>>, and introduces async/await with tokio.
Send and Sync
Two marker traits control what can cross thread boundaries.
- Send: A type is Send if it can be transferred to another thread. Most types are Send. Raw pointers are not.
- Sync: A type is Sync if it can be shared (via &T) between threads. A type is Sync if &T is Send.
+---------------------+--------+--------+
| Type | Send? | Sync? |
+---------------------+--------+--------+
| i32, String, Vec<T> | Yes | Yes |
| Mutex<T> | Yes | Yes |
| Rc<T> | No | No |
| Arc<T> | Yes | Yes |
| Cell<T> | Yes | No |
| *mut T | No | No |
+---------------------+--------+--------+
If you try to send an Rc<T> to another thread, the compiler stops you:
// send_error.rs -- WILL NOT COMPILE
use std::rc::Rc;
use std::thread;

fn main() {
    let data = Rc::new(42);
    thread::spawn(move || {
        println!("{}", data);
    });
}
error: `Rc<i32>` cannot be sent between threads safely
Rust Note: These traits are automatically derived by the compiler. You almost never implement them manually. They exist so the compiler can reason about thread safety without runtime checks.
Channels: mpsc in Depth
Rust's standard library provides std::sync::mpsc -- multi-producer, single-consumer channels.
// channel_types.rs
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // Unbounded channel (infinite buffer)
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let messages = vec!["hello", "from", "the", "thread"];
        for msg in messages {
            tx.send(msg).unwrap();
            thread::sleep(Duration::from_millis(200));
        }
    });
    // recv() blocks until a message arrives
    // When the sender drops, recv() returns Err
    loop {
        match rx.recv() {
            Ok(msg) => println!("Got: {}", msg),
            Err(_) => {
                println!("Channel closed");
                break;
            }
        }
    }
}
For bounded channels (backpressure):
// sync_channel.rs
use std::sync::mpsc;
use std::thread;

fn main() {
    // Buffer holds at most 2 messages
    let (tx, rx) = mpsc::sync_channel(2);
    let producer = thread::spawn(move || {
        for i in 0..5 {
            println!("Sending {}", i);
            tx.send(i).unwrap(); // blocks if buffer full
            println!("Sent {}", i);
        }
    });
    for val in rx {
        println!("Received: {}", val);
    }
    producer.join().unwrap();
}
Try It: Change the buffer size to 0. This creates a rendezvous channel where send blocks until the receiver calls recv. Run it and observe the interleaving.
Arc<Mutex> for Shared Mutable State
When multiple threads need to read and write the same data, combine Arc (atomic reference counting) with Mutex (mutual exclusion).
// arc_mutex.rs
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let mut handles = vec![];
    for i in 0..3 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            let mut vec = data.lock().unwrap();
            vec.push(i * 10);
            println!("Thread {} pushed {}", i, i * 10);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("Final: {:?}", *data.lock().unwrap());
}
The ownership diagram:
main thread thread 0 thread 1
| | |
v v v
Arc -------> [strong count = 3] <-------- Arc
|
v
Mutex<Vec<i32>>
|
v
Vec [1, 2, 3, ...]
Each Arc::clone increments the atomic reference count. When the last Arc is dropped, the Mutex and Vec are freed.
Scoped Threads
std::thread::scope (stable since Rust 1.63) lets threads borrow from the parent stack. No 'static requirement, no Arc needed.
// scoped.rs
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    thread::scope(|s| {
        s.spawn(|| {
            let sum: i32 = data.iter().sum();
            println!("Sum: {}", sum);
        });
        s.spawn(|| {
            let len = data.len();
            println!("Length: {}", len);
        });
    });
    // All scoped threads are joined here automatically
    // We can mutate data again -- threads are done
    data.push(6);
    println!("After scope: {:?}", data);
}
Rust Note: Scoped threads solve the problem of needing Arc just to share a reference. The scope guarantees all threads finish before the borrow ends. This is similar to OpenMP parallel regions.
With scoped threads you can even have one thread borrow mutably while others read different data:
// scoped_mut.rs
use std::thread;

fn main() {
    let mut a = 10;
    let b = 20;
    thread::scope(|s| {
        s.spawn(|| {
            a += b; // mutable borrow of a
        });
        s.spawn(|| {
            println!("b = {}", b); // shared borrow of b only
        });
    });
    println!("a = {}", a);
}
Introduction to Async: Why and When
Threads are great for CPU-bound work. But for I/O-bound work (network servers, file I/O), OS threads are heavy: each one costs ~8KB of kernel stack plus scheduling overhead. A server handling 10,000 connections needs 10,000 threads.
Async I/O uses cooperative multitasking: tasks yield when they would block, and a runtime multiplexes many tasks onto a few OS threads.
Threads (preemptive): Async (cooperative):
Thread 1 [==BLOCK=====RUN==] Task 1 [==yield--RUN==yield--]
Thread 2 [=RUN===BLOCK==RUN] Task 2 [--RUN====yield--RUN==]
Thread 3 [BLOCK=======RUN==] Task 3 [--yield--RUN========-]
^
3 OS threads 1 OS thread, 3 tasks
Rule of thumb:
- CPU-bound (number crunching, compression): use threads
- I/O-bound (network, disk): use async
- Mixed: use async with spawn_blocking for CPU work
The Future Trait
An async function returns a Future. A Future is a state machine that can be polled.
// This is conceptual -- you don't implement Future manually for most code
trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

enum Poll<T> {
    Ready(T),
    Pending,
}
When you write async fn, the compiler transforms your function into a state machine that implements Future. The .await points are where the state machine yields.
Tokio Basics
Tokio is the most widely used async runtime for Rust. Add it to Cargo.toml:
[dependencies]
tokio = { version = "1", features = ["full"] }
A minimal async program:
// tokio_hello.rs
#[tokio::main]
async fn main() {
    println!("Hello from async main");
    let result = compute().await;
    println!("Result: {}", result);
}

async fn compute() -> i32 {
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    42
}
The #[tokio::main] macro sets up the runtime. Without it, async fn main would return a Future that nobody polls.
Spawning Async Tasks
// tokio_spawn.rs
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    let handle1 = tokio::spawn(async {
        sleep(Duration::from_millis(100)).await;
        println!("Task 1 done");
        1
    });
    let handle2 = tokio::spawn(async {
        sleep(Duration::from_millis(50)).await;
        println!("Task 2 done");
        2
    });
    let r1 = handle1.await.unwrap();
    let r2 = handle2.await.unwrap();
    println!("Results: {} + {} = {}", r1, r2, r1 + r2);
}
Tasks run concurrently on the thread pool. tokio::spawn is like thread::spawn but for async tasks.
select!: Racing Tasks
// tokio_select.rs
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    tokio::select! {
        _ = sleep(Duration::from_secs(1)) => {
            println!("1 second elapsed");
        }
        _ = sleep(Duration::from_millis(500)) => {
            println!("500ms elapsed first");
        }
    }
}
select! waits for the first future to complete and cancels the rest. Useful for timeouts, shutdown signals, and multiplexing.
Async TCP Echo Server
Here is a complete async echo server -- the kind of thing that would need threads-per-connection in C:
// echo_server.rs
use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    println!("Listening on 127.0.0.1:8080");
    loop {
        let (mut socket, addr) = listener.accept().await?;
        println!("New connection from {}", addr);
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            loop {
                let n = match socket.read(&mut buf).await {
                    Ok(0) => {
                        println!("{} disconnected", addr);
                        return;
                    }
                    Ok(n) => n,
                    Err(e) => {
                        eprintln!("Read error from {}: {}", addr, e);
                        return;
                    }
                };
                if let Err(e) = socket.write_all(&buf[..n]).await {
                    eprintln!("Write error to {}: {}", addr, e);
                    return;
                }
            }
        });
    }
}
Try It: Run the echo server, then connect with nc 127.0.0.1 8080 from multiple terminals. Each connection is handled by a lightweight task, not an OS thread.
C Comparison: Threaded Echo Server
For contrast, here is the same server in C using one thread per connection:
/* echo_server_threaded.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
void *handle_client(void *arg) {
int fd = *(int *)arg;
free(arg);
char buf[1024];
ssize_t n;
while ((n = read(fd, buf, sizeof(buf))) > 0) {
write(fd, buf, n);
}
close(fd);
return NULL;
}
int main(void) {
int srv = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(8080),
.sin_addr.s_addr = INADDR_ANY
};
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, 128);
printf("Listening on port 8080\n");
while (1) {
int *client = malloc(sizeof(int));
*client = accept(srv, NULL, NULL);
pthread_t t;
pthread_create(&t, NULL, handle_client, client);
pthread_detach(t);
}
}
This works but creates an OS thread per connection. At 10,000 connections, you have 10,000 threads. The async version uses a small thread pool regardless of connection count.
Driver Prep: Kernel drivers do not use async/await, but they use a similar concept: workqueues and tasklets defer work without creating new threads. The kernel's io_uring interface is the closest thing to async I/O at the syscall level.
Threads vs Async: Decision Guide
+------------------+-------------------+--------------------+
| Factor | OS Threads | Async Tasks |
+------------------+-------------------+--------------------+
| Scheduling | Preemptive (OS) | Cooperative (user) |
| Stack size | ~8KB kernel stack | Few hundred bytes |
| Creation cost | Moderate | Very cheap |
| Best for | CPU-bound work | I/O-bound work |
| Max concurrency | ~thousands | ~millions |
| Blocking calls | OK | MUST NOT block |
| Debugging | Easier | Harder (state mc.) |
+------------------+-------------------+--------------------+
Caution: Never call blocking functions (like std::thread::sleep or synchronous file I/O) inside an async task. Use tokio::time::sleep, tokio::fs, or tokio::task::spawn_blocking instead. Blocking an async task blocks the entire runtime thread.
Knowledge Check
- What is the difference between Send and Sync?
- Why does Rc<T> fail to compile when sent to another thread?
- When should you use tokio::task::spawn_blocking instead of tokio::spawn?
Common Pitfalls
- Using Rc instead of Arc across threads -- compile error, but confusing for beginners.
- Forgetting move on closures passed to thread::spawn -- the closure borrows from the stack, which the thread may outlive.
- Holding a MutexGuard across an .await point -- this blocks the async runtime. Use tokio::sync::Mutex if you must hold a lock across await.
- Creating a future and never .awaiting it -- futures are lazy; they do nothing until polled. (Calling .await outside an async function is a compile error.)
- Mixing std::sync::Mutex with async code -- it works if the critical section is short and never crosses an await, but tokio::sync::Mutex is safer for async contexts.
- Not dropping the original sender when using mpsc::channel with cloned senders -- the receiver never terminates.
Pipes and FIFOs
Pipes are the oldest IPC mechanism on Unix. When you type ls | grep foo | wc -l in a shell, three processes are connected by two pipes. This chapter covers unnamed pipes, named pipes (FIFOs), dup2 for I/O redirection, and building a mini shell pipeline.
pipe(): The Basics
pipe() creates two file descriptors: one for reading, one for writing. Data written to the write end comes out the read end, in order, like a one-way queue.
/* pipe_basic.c */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int main(void) {
int fd[2];
if (pipe(fd) == -1) {
perror("pipe");
return 1;
}
/* fd[0] = read end, fd[1] = write end */
const char *msg = "Hello through a pipe!\n";
write(fd[1], msg, strlen(msg));
close(fd[1]); /* close write end so read sees EOF */
char buf[128];
ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Read: %s", buf);
close(fd[0]);
return 0;
}
pipe() returns:
fd[0] ----READ----< KERNEL BUFFER <----WRITE---- fd[1]
(4096-65536 bytes)
Parent-Child Communication
The real power of pipes comes with fork(). The parent and child share the pipe file descriptors.
/* pipe_fork.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
int main(void) {
int fd[2];
pipe(fd);
pid_t pid = fork();
if (pid == -1) {
perror("fork");
return 1;
}
if (pid == 0) {
/* Child: write to pipe */
close(fd[0]); /* close unused read end */
const char *msg = "Message from child\n";
write(fd[1], msg, strlen(msg));
close(fd[1]);
_exit(0);
} else {
/* Parent: read from pipe */
close(fd[1]); /* close unused write end */
char buf[128];
ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Parent received: %s", buf);
close(fd[0]);
wait(NULL);
}
return 0;
}
Caution: Always close the unused ends of the pipe in each process. If the child does not close fd[0] and the parent does not close fd[1], read() may block forever because the kernel thinks a writer still exists.
The flow after fork:
Before fork:
Process: fd[0]=read, fd[1]=write
After fork:
Parent: fd[0]=read, fd[1]=CLOSE
Child: fd[0]=CLOSE, fd[1]=write
Child writes --> kernel buffer --> Parent reads
dup2: Redirecting stdin/stdout
dup2(oldfd, newfd) makes newfd a copy of oldfd, closing newfd first if open. This is how shells redirect I/O.
/* dup2_example.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
int fd[2];
pipe(fd);
pid_t pid = fork();
if (pid == 0) {
/* Child: redirect stdout to pipe write end */
close(fd[0]);
dup2(fd[1], STDOUT_FILENO); /* stdout now writes to pipe */
close(fd[1]); /* original fd no longer needed */
execlp("echo", "echo", "Hello from echo", NULL);
_exit(1);
}
/* Parent: read from pipe */
close(fd[1]);
char buf[256];
ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Captured: %s", buf);
close(fd[0]);
wait(NULL);
return 0;
}
After dup2(fd[1], STDOUT_FILENO):
Before dup2: After dup2:
fd[1] -> pipe_write fd[1] -> pipe_write (closed next)
stdout -> terminal stdout -> pipe_write
Implementing a Shell Pipeline
Let us implement ls -la /tmp | grep log | wc -l with pipes and fork.
/* pipeline.c -- ls -la /tmp | grep log | wc -l */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
int pipe1[2], pipe2[2];
pipe(pipe1);
pipe(pipe2);
/* First child: ls -la /tmp */
pid_t p1 = fork();
if (p1 == 0) {
close(pipe1[0]);
close(pipe2[0]);
close(pipe2[1]);
dup2(pipe1[1], STDOUT_FILENO);
close(pipe1[1]);
execlp("ls", "ls", "-la", "/tmp", NULL);
_exit(1);
}
/* Second child: grep log */
pid_t p2 = fork();
if (p2 == 0) {
close(pipe1[1]);
close(pipe2[0]);
dup2(pipe1[0], STDIN_FILENO);
close(pipe1[0]);
dup2(pipe2[1], STDOUT_FILENO);
close(pipe2[1]);
execlp("grep", "grep", "log", NULL);
_exit(1);
}
/* Third child: wc -l */
pid_t p3 = fork();
if (p3 == 0) {
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[1]);
dup2(pipe2[0], STDIN_FILENO);
close(pipe2[0]);
execlp("wc", "wc", "-l", NULL);
_exit(1);
}
/* Parent: close all pipe ends and wait */
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
wait(NULL);
wait(NULL);
wait(NULL);
return 0;
}
The data flow:
ls -la /tmp --pipe1--> grep log --pipe2--> wc -l --> stdout
Try It: Modify the pipeline to run cat /etc/passwd | grep root | head -1. Remember to create two pipes and three child processes.
Pipe Capacity and Blocking
Linux pipes have a default capacity of 65536 bytes (16 pages). You can query and change it with fcntl:
/* pipe_capacity.c */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main(void) {
int fd[2];
pipe(fd);
int capacity = fcntl(fd[0], F_GETPIPE_SZ);
printf("Default pipe capacity: %d bytes\n", capacity);
/* Increase capacity (requires CAP_SYS_RESOURCE for > 1MB) */
fcntl(fd[0], F_SETPIPE_SZ, 1048576);
capacity = fcntl(fd[0], F_GETPIPE_SZ);
printf("New pipe capacity: %d bytes\n", capacity);
close(fd[0]);
close(fd[1]);
return 0;
}
Blocking behavior:
- write() to a full pipe blocks until space is available (or returns EAGAIN if O_NONBLOCK is set).
- read() from an empty pipe blocks until data arrives.
- read() returns 0 when all write ends are closed (EOF).
- write() to a pipe with no readers sends SIGPIPE to the writer.
Caution: SIGPIPE kills the process by default. In servers, set signal(SIGPIPE, SIG_IGN) and check the return value of write() for EPIPE instead.
Named Pipes (FIFOs)
Unnamed pipes only work between related processes (parent-child). FIFOs are special files on the filesystem that unrelated processes can open.
/* fifo_writer.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>
int main(void) {
const char *fifo_path = "/tmp/myfifo";
mkfifo(fifo_path, 0666); /* create the FIFO */
printf("Opening FIFO for writing (blocks until a reader opens)...\n");
int fd = open(fifo_path, O_WRONLY);
const char *msg = "Hello through a FIFO!\n";
write(fd, msg, strlen(msg));
close(fd);
return 0;
}
/* fifo_reader.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void) {
const char *fifo_path = "/tmp/myfifo";
printf("Opening FIFO for reading (blocks until a writer opens)...\n");
int fd = open(fifo_path, O_RDONLY);
char buf[256];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Received: %s", buf);
close(fd);
unlink(fifo_path); /* clean up */
return 0;
}
Run the writer and reader in two terminals. The open() call blocks until both ends are connected.
Terminal 1: ./fifo_writer Terminal 2: ./fifo_reader
"Opening FIFO..." "Opening FIFO..."
(blocks) (connects)
(writes, exits) "Received: Hello through a FIFO!"
Rust: Pipes with std::process
Rust's standard library makes pipe-based communication easy through Command and Stdio:
// rust_pipe.rs
use std::process::{Command, Stdio};
use std::io::Read;

fn main() {
    let mut child = Command::new("echo")
        .arg("Hello from echo")
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to spawn echo");
    let mut output = String::new();
    child.stdout.take().unwrap().read_to_string(&mut output).unwrap();
    child.wait().unwrap();
    println!("Captured: {}", output.trim());
}
Rust: Shell Pipeline
// rust_pipeline.rs
use std::process::{Command, Stdio};
use std::io::Read;

fn main() {
    // ls -la /tmp | grep log | wc -l
    let ls = Command::new("ls")
        .args(["-la", "/tmp"])
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start ls");
    let grep = Command::new("grep")
        .arg("log")
        .stdin(Stdio::from(ls.stdout.unwrap()))
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start grep");
    let mut wc = Command::new("wc")
        .arg("-l")
        .stdin(Stdio::from(grep.stdout.unwrap()))
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start wc");
    let mut output = String::new();
    wc.stdout.take().unwrap().read_to_string(&mut output).unwrap();
    wc.wait().unwrap();
    println!("Lines matching 'log': {}", output.trim());
}
Rust Note: Stdio::from() transfers ownership of the pipe file descriptor. Rust's type system ensures you cannot accidentally use the same stdout twice. In C, you manually close file descriptors and hope you did not make a mistake.
Rust: Low-Level Pipes with nix
For direct pipe() and dup2() access:
// rust_nix_pipe.rs
// Cargo.toml: nix = { version = "0.29", features = ["process", "unistd"] }
use nix::unistd::{fork, pipe, read, write, ForkResult};
use std::os::fd::AsRawFd;

fn main() {
    let (read_fd, write_fd) = pipe().expect("pipe failed");
    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Child => {
            drop(read_fd); // close the child's unused read end
            let msg = b"Hello from child via nix\n";
            write(&write_fd, msg).unwrap();
            drop(write_fd); // dropping the OwnedFd closes it
            std::process::exit(0);
        }
        ForkResult::Parent { child: _ } => {
            drop(write_fd); // close the parent's unused write end
            let mut buf = [0u8; 128];
            let n = read(read_fd.as_raw_fd(), &mut buf).unwrap();
            print!("Parent got: {}", std::str::from_utf8(&buf[..n]).unwrap());
            nix::sys::wait::wait().ok();
        }
    }
}
Driver Prep: Linux kernel modules use struct pipe_inode_info internally. The concept of pipes extends to kernel-space communication: relay channels and trace_pipe use the same ring-buffer idea for high-throughput kernel-to-user data transfer.
Knowledge Check
- What happens if you write() to a pipe whose read end has been closed by all processes?
- Why must you close unused pipe ends after fork()?
- What is the difference between an unnamed pipe and a FIFO?
Common Pitfalls
- Not closing unused pipe ends -- leads to deadlock because read() never sees EOF.
- Forgetting to handle SIGPIPE -- a write to a pipe with no readers kills the process.
- Using pipes for large data transfers -- pipes are limited to kernel buffer size. For bulk data, use shared memory or files.
- Opening a FIFO with O_RDWR -- technically works but defeats the blocking semantics and is not portable.
- Race condition with mkfifo -- if the file already exists, mkfifo returns EEXIST. Check or use unlink first.
- Assuming pipe writes are atomic -- writes up to PIPE_BUF (4096 on Linux) are atomic. Larger writes may be interleaved with other writers.
Shared Memory
Shared memory is the fastest IPC mechanism. Two processes map the same physical memory into their address spaces. There is no copying through the kernel -- a write by one process is instantly visible to the other. The cost: you must synchronize access yourself.
POSIX Shared Memory in C
Three steps: create/open, set the size, map it.
/* shm_writer.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#define SHM_NAME "/my_shm"
#define SHM_SIZE 4096
int main(void) {
/* Create shared memory object */
int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
if (fd == -1) {
perror("shm_open");
return 1;
}
/* Set its size */
if (ftruncate(fd, SHM_SIZE) == -1) {
perror("ftruncate");
return 1;
}
/* Map it into our address space */
void *ptr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
return 1;
}
close(fd); /* fd no longer needed after mmap */
/* Write data */
const char *msg = "Hello from shared memory!";
memcpy(ptr, msg, strlen(msg) + 1);
printf("Writer: wrote '%s'\n", msg);
/* Keep running so reader can access */
printf("Writer: press Enter to clean up...\n");
getchar();
munmap(ptr, SHM_SIZE);
shm_unlink(SHM_NAME);
return 0;
}
/* shm_reader.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#define SHM_NAME "/my_shm"
#define SHM_SIZE 4096
int main(void) {
int fd = shm_open(SHM_NAME, O_RDONLY, 0);
if (fd == -1) {
perror("shm_open");
return 1;
}
void *ptr = mmap(NULL, SHM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
return 1;
}
close(fd);
printf("Reader: got '%s'\n", (char *)ptr);
munmap(ptr, SHM_SIZE);
return 0;
}
Compile both with -lrt (for shm_open; on glibc 2.34 and later the flag is no longer required, but it is harmless):
gcc -o shm_writer shm_writer.c -lrt
gcc -o shm_reader shm_reader.c -lrt
Run the writer first, then the reader in another terminal.
The memory layout:
Process A (writer) Process B (reader)
+------------------+ +------------------+
| Virtual Memory | | Virtual Memory |
| | | |
| mmap region ------+ +------ mmap region |
| | | | | |
+------------------+ | | +------------------+
v v
+------------+
| Physical |
| Memory |
| (shared) |
+------------+
The shm_open / mmap API
| Function | Purpose |
|---|---|
| shm_open(name, flags, mode) | Create or open a shared memory object (lives under /dev/shm/) |
| ftruncate(fd, size) | Set the size of the shared memory object |
| mmap(addr, len, prot, flags, fd, offset) | Map the object into the process address space |
| munmap(addr, len) | Unmap the region |
| shm_unlink(name) | Remove the shared memory object |
Caution: shm_unlink removes the name from the filesystem, but the memory stays mapped until all processes call munmap or exit. If you forget shm_unlink, the shared memory object persists until the next reboot (it lives in /dev/shm/, a tmpfs). Check with ls /dev/shm/.
Try It: Run shm_writer, then look at /dev/shm/my_shm with ls -la /dev/shm/. You will see a file. Run shm_reader, then press Enter in the writer to clean up. Verify the file is gone.
Sharing Structured Data
You can share any fixed-size structure. Use offsetof and fixed-width types for portability.
/* shm_struct.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/wait.h>
typedef struct {
int32_t counter;
char message[64];
} SharedData;
int main(void) {
const char *name = "/struct_shm";
int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
ftruncate(fd, sizeof(SharedData));
SharedData *data = mmap(NULL, sizeof(SharedData),
PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
close(fd);
data->counter = 0;
strcpy(data->message, "initialized");
pid_t pid = fork();
if (pid == 0) {
/* Child increments counter */
for (int i = 0; i < 100000; i++)
data->counter++; /* WARNING: no synchronization! */
strcpy(data->message, "child was here");
_exit(0);
}
/* Parent also increments */
for (int i = 0; i < 100000; i++)
data->counter++; /* WARNING: race condition! */
wait(NULL);
printf("Counter: %d (expected 200000)\n", data->counter);
printf("Message: %s\n", data->message);
munmap(data, sizeof(SharedData));
shm_unlink(name);
return 0;
}
The counter will be wrong due to the race condition. We need synchronization.
Process-Shared Mutex
A regular pthread_mutex_t only works within a single process. For cross-process synchronization, use PTHREAD_PROCESS_SHARED.
/* shm_mutex.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/wait.h>
typedef struct {
pthread_mutex_t lock;
int counter;
} SharedData;
int main(void) {
const char *name = "/mutex_shm";
int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
ftruncate(fd, sizeof(SharedData));
SharedData *data = mmap(NULL, sizeof(SharedData),
PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
close(fd);
/* Initialize process-shared mutex */
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&data->lock, &attr);
pthread_mutexattr_destroy(&attr);
data->counter = 0;
pid_t pid = fork();
if (pid == 0) {
for (int i = 0; i < 100000; i++) {
pthread_mutex_lock(&data->lock);
data->counter++;
pthread_mutex_unlock(&data->lock);
}
_exit(0);
}
for (int i = 0; i < 100000; i++) {
pthread_mutex_lock(&data->lock);
data->counter++;
pthread_mutex_unlock(&data->lock);
}
wait(NULL);
printf("Counter: %d (expected 200000)\n", data->counter);
pthread_mutex_destroy(&data->lock);
munmap(data, sizeof(SharedData));
shm_unlink(name);
return 0;
}
Compile with:
gcc -o shm_mutex shm_mutex.c -lrt -pthread
Now the counter is always 200000. The key line is pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED).
Caution: The mutex must be stored in the shared memory region itself, not on the stack or heap of either process. Both processes must access the same pthread_mutex_t object.
Rust: Shared Memory with memmap2
Rust does not have a standard-library shared memory API. The memmap2 crate provides a safe wrapper around mmap.
Add to Cargo.toml:
[dependencies]
memmap2 = "0.9"
nix = { version = "0.29", features = ["mman", "fs"] }
// shm_writer.rs
use nix::fcntl::OFlag;
use nix::sys::mman::{shm_open, shm_unlink};
use nix::sys::stat::Mode;
use nix::unistd::ftruncate;
use memmap2::MmapMut;
use std::fs::File;

const SHM_NAME: &str = "/rust_shm";
const SHM_SIZE: usize = 4096;

fn main() {
    // Create the shared memory object
    let fd = shm_open(
        SHM_NAME,
        OFlag::O_CREAT | OFlag::O_RDWR,
        Mode::S_IRUSR | Mode::S_IWUSR,
    )
    .expect("shm_open failed");
    ftruncate(&fd, SHM_SIZE as i64).expect("ftruncate failed");

    // Convert the OwnedFd into a File; From transfers ownership,
    // so there is exactly one close (no unsafe from_raw_fd needed)
    let file = File::from(fd);
    let mut mmap = unsafe { MmapMut::map_mut(&file).expect("mmap failed") };

    let msg = b"Hello from Rust shared memory!";
    mmap[..msg.len()].copy_from_slice(msg);

    println!("Writer: wrote message. Press Enter to clean up.");
    let mut buf = String::new();
    std::io::stdin().read_line(&mut buf).unwrap();

    drop(mmap);
    shm_unlink(SHM_NAME).ok();
}
// shm_reader.rs
use nix::fcntl::OFlag;
use nix::sys::mman::shm_open;
use nix::sys::stat::Mode;
use memmap2::Mmap;
use std::fs::File;

const SHM_NAME: &str = "/rust_shm";
const SHM_SIZE: usize = 4096;

fn main() {
    let fd = shm_open(SHM_NAME, OFlag::O_RDONLY, Mode::empty())
        .expect("shm_open failed -- is the writer running?");
    let file = File::from(fd);
    let mmap = unsafe { Mmap::map(&file).expect("mmap failed") };

    // Find the null terminator or fall back to the full region
    let end = mmap.iter().position(|&b| b == 0).unwrap_or(SHM_SIZE);
    let msg = std::str::from_utf8(&mmap[..end]).unwrap();
    println!("Reader: got '{}'", msg);
}
Rust Note:
mmap is inherently unsafe in Rust because another process can modify the mapped memory at any time, violating Rust's aliasing rules. The unsafe blocks here acknowledge that you are opting into shared-memory semantics, where the compiler cannot enforce data-race freedom.
When to Use Shared Memory
+------------------+-----------+----------+----------+
| Factor | Pipe | Socket | Shm |
+------------------+-----------+----------+----------+
| Speed | Medium | Medium | Fastest |
| Kernel copies | 2 (w+r) | 2 (w+r) | 0 |
| Sync needed | Built-in | Built-in | Manual |
| Unrelated procs | No (FIFO) | Yes | Yes |
| Structured data | Serialize | Serialize| Direct |
| Complexity | Low | Medium | High |
+------------------+-----------+----------+----------+
Use shared memory when:
- You need the absolute lowest latency (high-frequency trading, real-time audio).
- You are transferring large amounts of data between processes.
- The data is a fixed-size structure that both processes understand.
Do not use it when:
- You need communication between machines (use sockets).
- The data is small and infrequent (use pipes or message queues).
- You cannot afford the complexity of manual synchronization.
Anonymous Shared Memory with mmap
You do not always need shm_open. For parent-child sharing, use MAP_SHARED | MAP_ANONYMOUS:
/* anon_shm.c */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
int *shared = mmap(NULL, sizeof(int),
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS,
-1, 0);
if (shared == MAP_FAILED) {
perror("mmap");
return 1;
}
*shared = 0;
pid_t pid = fork();
if (pid == 0) {
*shared = 42;
_exit(0);
}
wait(NULL);
printf("Child set shared value to %d\n", *shared);
munmap(shared, sizeof(int));
return 0;
}
No filesystem name needed. The mapping is inherited by fork() and shared between parent and child.
Driver Prep: Linux kernel drivers use shared memory extensively.
mmap in a device driver maps kernel buffers into user space (e.g., framebuffer devices, DMA buffers). The remap_pfn_range function in the kernel is the driver-side equivalent of mmap.
Memory-Mapped Files
mmap can also map regular files, giving you shared, persistent storage:
/* mmap_file.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
int main(void) {
const char *path = "/tmp/mmap_test.dat";
int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
ftruncate(fd, 4096);
char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
close(fd);
strcpy(map, "Persisted via mmap");
msync(map, 4096, MS_SYNC); /* flush to disk */
printf("Wrote to file via mmap\n");
munmap(map, 4096);
/* Verify by reading the file normally */
fd = open(path, O_RDONLY);
char buf[64];
read(fd, buf, sizeof(buf));
close(fd);
printf("Read back: %s\n", buf);
unlink(path);
return 0;
}
Try It: Modify
mmap_file.c to map an existing file (like /etc/hostname) as read-only and print its contents without using read(). Hint: use PROT_READ and MAP_PRIVATE.
Knowledge Check
- What is the difference between MAP_SHARED and MAP_PRIVATE?
- Why must a process-shared mutex be stored in the shared memory region itself?
- What does msync do, and when would you need it?
Common Pitfalls
- Forgetting ftruncate -- the shared memory object starts at size 0. Accessing unmapped memory causes SIGBUS.
- Using MAP_PRIVATE when you want sharing -- MAP_PRIVATE creates a copy-on-write mapping. Changes are not visible to other processes.
- Not calling shm_unlink -- the shared memory object persists in /dev/shm/ until you remove it.
- Assuming memory ordering -- on architectures with weak memory ordering (ARM, RISC-V), you need memory barriers or atomics even with shared memory. x86 is relatively forgiving, but do not rely on it.
- Mapping too much memory -- mmap reserves virtual address space, but physical memory is allocated on demand (page faults). Still, do not map terabytes casually.
- Storing pointers in shared memory -- pointers are process-local. Store offsets instead.
Message Queues and Semaphores
Message queues give you structured, typed communication between processes without the byte-stream nature of pipes. Semaphores give you lightweight synchronization without the overhead of a full mutex. This chapter covers POSIX message queues and POSIX semaphores, then compares all IPC mechanisms.
POSIX Message Queues
A message queue is a kernel-managed list of messages. Each message has a body and a priority. Higher-priority messages are delivered first.
/* mq_sender.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>
#define QUEUE_NAME "/my_queue"
#define MAX_MSG_SIZE 256
#define MAX_MSGS 10
int main(void) {
struct mq_attr attr = {
.mq_flags = 0,
.mq_maxmsg = MAX_MSGS,
.mq_msgsize = MAX_MSG_SIZE,
.mq_curmsgs = 0
};
mqd_t mq = mq_open(QUEUE_NAME, O_CREAT | O_WRONLY, 0666, &attr);
if (mq == (mqd_t)-1) {
perror("mq_open");
return 1;
}
const char *messages[] = {
"Low priority message",
"Medium priority message",
"High priority message"
};
unsigned int priorities[] = {1, 5, 10};
for (int i = 0; i < 3; i++) {
if (mq_send(mq, messages[i], strlen(messages[i]) + 1,
priorities[i]) == -1) {
perror("mq_send");
return 1;
}
printf("Sent (priority %u): %s\n", priorities[i], messages[i]);
}
mq_close(mq);
return 0;
}
/* mq_receiver.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>
#define QUEUE_NAME "/my_queue"
#define MAX_MSG_SIZE 256
int main(void) {
mqd_t mq = mq_open(QUEUE_NAME, O_RDONLY);
if (mq == (mqd_t)-1) {
perror("mq_open");
return 1;
}
char buf[MAX_MSG_SIZE + 1];
unsigned int priority;
/* Receive 3 messages -- highest priority comes first */
for (int i = 0; i < 3; i++) {
ssize_t n = mq_receive(mq, buf, sizeof(buf), &priority);
if (n == -1) {
perror("mq_receive");
return 1;
}
buf[n] = '\0';
printf("Received (priority %u): %s\n", priority, buf);
}
mq_close(mq);
mq_unlink(QUEUE_NAME);
return 0;
}
Compile with:
gcc -o mq_sender mq_sender.c -lrt
gcc -o mq_receiver mq_receiver.c -lrt
Run the sender first, then the receiver. Notice the output order is by descending priority:
Received (priority 10): High priority message
Received (priority 5): Medium priority message
Received (priority 1): Low priority message
The message queue API:
mq_open(name, flags, mode, attr) -- create or open
mq_send(mq, msg, len, priority) -- send a message
mq_receive(mq, buf, len, &prio) -- receive highest-priority message
mq_close(mq) -- close the descriptor
mq_unlink(name) -- remove the queue
mq_getattr(mq, &attr) -- query attributes
mq_notify(mq, &sigevent) -- register for async notification
Caution: The
mq_receive buffer must be at least mq_msgsize bytes (as set in mq_attr). If it is smaller, mq_receive fails with EMSGSIZE. This is a common mistake.
Try It: Modify the sender to send 5 messages with the same priority. Verify that they arrive in FIFO order (first-in, first-out within the same priority level).
Message Queue Limits
Linux imposes system-wide limits on message queues:
/proc/sys/fs/mqueue/msg_max -- max messages per queue (default 10)
/proc/sys/fs/mqueue/msgsize_max -- max message size (default 8192)
/proc/sys/fs/mqueue/queues_max -- max number of queues (default 256)
You can view and modify these (writing requires root):
cat /proc/sys/fs/mqueue/msg_max
echo 50 | sudo tee /proc/sys/fs/mqueue/msg_max
Non-blocking and Timed Operations
/* mq_nonblock.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>
#include <errno.h>
#include <time.h>
#include <string.h>
#define QUEUE_NAME "/nb_queue"
#define MAX_MSG_SIZE 256
int main(void) {
struct mq_attr attr = {
.mq_flags = 0,
.mq_maxmsg = 10,
.mq_msgsize = MAX_MSG_SIZE,
.mq_curmsgs = 0
};
mqd_t mq = mq_open(QUEUE_NAME, O_CREAT | O_RDWR | O_NONBLOCK,
0666, &attr);
if (mq == (mqd_t)-1) {
perror("mq_open");
return 1;
}
/* Non-blocking receive on empty queue */
char buf[MAX_MSG_SIZE + 1];
unsigned int prio;
if (mq_receive(mq, buf, sizeof(buf), &prio) == -1) {
if (errno == EAGAIN)
printf("No messages available (non-blocking)\n");
}
/* Send a message */
const char *msg = "test message";
mq_send(mq, msg, strlen(msg) + 1, 0);
/* Timed receive: wait up to 2 seconds */
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += 2;
ssize_t n = mq_timedreceive(mq, buf, sizeof(buf), &prio, &ts);
if (n > 0) {
buf[n] = '\0';
printf("Timed receive got: %s\n", buf);
}
mq_close(mq);
mq_unlink(QUEUE_NAME);
return 0;
}
POSIX Semaphores
A semaphore is a counter that supports two atomic operations: wait (decrement) and post (increment). When the counter is zero, sem_wait blocks.
There are two kinds:
- Named semaphores -- created with sem_open, accessible by unrelated processes via a filesystem name.
- Unnamed semaphores -- created with sem_init, living in shared memory or within a single process.
Named Semaphore
/* sem_named.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>
#define SEM_NAME "/my_sem"
int main(void) {
/* Create semaphore with initial value 1 (binary semaphore = mutex) */
sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, 1);
if (sem == SEM_FAILED) {
perror("sem_open");
return 1;
}
pid_t pid = fork();
if (pid == 0) {
/* Child */
sem_wait(sem);
printf("Child: entered critical section\n");
sleep(1);
printf("Child: leaving critical section\n");
sem_post(sem);
sem_close(sem);
_exit(0);
}
/* Parent */
sleep(1); /* give the child a head start */
sem_wait(sem);
printf("Parent: entered critical section\n");
printf("Parent: leaving critical section\n");
sem_post(sem);
wait(NULL);
sem_close(sem);
sem_unlink(SEM_NAME);
return 0;
}
Compile with:
gcc -o sem_named sem_named.c -pthread
Unnamed Semaphore in Shared Memory
/* sem_unnamed.c */
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
/* Allocate semaphore in shared memory */
sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (sem == MAP_FAILED) {
perror("mmap");
return 1;
}
/* Initialize: pshared=1 (process-shared), value=1 */
sem_init(sem, 1, 1);
pid_t pid = fork();
if (pid == 0) {
sem_wait(sem);
printf("Child: in critical section\n");
sleep(1);
printf("Child: done\n");
sem_post(sem);
_exit(0);
}
sem_wait(sem);
printf("Parent: in critical section\n");
printf("Parent: done\n");
sem_post(sem);
wait(NULL);
sem_destroy(sem);
munmap(sem, sizeof(sem_t));
return 0;
}
Counting Semaphores: Resource Pools
A counting semaphore tracks the number of available resources.
/* sem_pool.c */
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#define POOL_SIZE 3
#define NUM_WORKERS 8
static sem_t pool;
void *worker(void *arg) {
int id = *(int *)arg;
sem_wait(&pool);
printf("Worker %d: acquired resource (entering pool)\n", id);
sleep(1); /* simulate work with the resource */
printf("Worker %d: releasing resource\n", id);
sem_post(&pool);
return NULL;
}
int main(void) {
sem_init(&pool, 0, POOL_SIZE); /* 3 resources available */
pthread_t threads[NUM_WORKERS];
int ids[NUM_WORKERS];
for (int i = 0; i < NUM_WORKERS; i++) {
ids[i] = i;
pthread_create(&threads[i], NULL, worker, &ids[i]);
}
for (int i = 0; i < NUM_WORKERS; i++)
pthread_join(threads[i], NULL);
sem_destroy(&pool);
return 0;
}
Output shows at most 3 workers in the pool at any time:
Worker 0: acquired resource (entering pool)
Worker 1: acquired resource (entering pool)
Worker 2: acquired resource (entering pool)
Worker 0: releasing resource
Worker 3: acquired resource (entering pool)
...
The semaphore value diagram:
sem value: 3 2 1 0 0 0 1 0 1 ...
| | | | | | | |
W0 W1 W2 W3 W4 W0 W5 W1
acq acq blocks rel acq
Driver Prep: The Linux kernel uses counting semaphores (
struct semaphore) for resource management. The kernel's down() and up() functions correspond to sem_wait() and sem_post(). Modern kernel code prefers mutexes for binary locking and completions for signaling.
Semaphore vs Mutex
+------------------+-------------------+--------------------+
| Feature | Mutex | Semaphore |
+------------------+-------------------+--------------------+
| Value range | 0 or 1 (locked/ | 0 to N |
| | unlocked) | |
| Ownership | Yes (only owner | No (any thread |
| | can unlock) | can post) |
| Use case | Mutual exclusion | Resource counting |
| Priority inherit | Yes (on Linux) | No |
| Cross-process | With PSHARED attr | Named or in shm |
+------------------+-------------------+--------------------+
Rust: Message Passing with Channels
Rust does not wrap POSIX message queues in the standard library. Instead, it provides channels (covered in Ch39-40) which serve the same purpose within a single process. For cross-process message queues, use the posixmq crate:
// rust_mq.rs
// Cargo.toml: posixmq = "1"

fn main() {
    let name = "/rust_mq";

    // Open or create the queue (builder API as of posixmq 1.x)
    let mq = posixmq::OpenOptions::readwrite()
        .create()
        .max_msg_len(256)
        .capacity(10)
        .open(name)
        .expect("Failed to open message queue");

    // Send messages with priorities
    mq.send(0, b"Low priority").unwrap();
    mq.send(5, b"Medium priority").unwrap();
    mq.send(10, b"High priority").unwrap();

    // Receive (highest priority first)
    let mut buf = vec![0u8; 256];
    for _ in 0..3 {
        let (priority, len) = mq.recv(&mut buf).unwrap();
        let msg = std::str::from_utf8(&buf[..len]).unwrap();
        println!("Received (priority {}): {}", priority, msg);
    }

    posixmq::remove_queue(name).ok();
}
Rust: Semaphore Alternatives
Rust's standard library has no semaphore type. Use tokio's Semaphore for async code or build one from Mutex and Condvar:
// semaphore.rs
use std::sync::{Arc, Mutex, Condvar};
use std::thread;
use std::time::Duration;

struct Semaphore {
    count: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(initial: usize) -> Self {
        Semaphore {
            count: Mutex::new(initial),
            cvar: Condvar::new(),
        }
    }

    fn acquire(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cvar.wait(count).unwrap();
        }
        *count -= 1;
    }

    fn release(&self) {
        let mut count = self.count.lock().unwrap();
        *count += 1;
        self.cvar.notify_one();
    }
}

fn main() {
    let sem = Arc::new(Semaphore::new(3));
    let mut handles = vec![];

    for id in 0..8 {
        let sem = Arc::clone(&sem);
        handles.push(thread::spawn(move || {
            sem.acquire();
            println!("Worker {}: acquired resource", id);
            thread::sleep(Duration::from_secs(1));
            println!("Worker {}: releasing", id);
            sem.release();
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
}
Rust Note: Rust's philosophy favors channels over semaphores for most use cases. "Do not communicate by sharing memory; share memory by communicating." Channels are easier to reason about and less prone to bugs.
IPC Decision Table
+------------------+-------+--------+--------+--------+--------+
| Feature | Pipe | FIFO | Shm | MsgQ | Socket |
+------------------+-------+--------+--------+--------+--------+
| Unrelated procs  | No    | Yes    | Yes    | Yes    | Yes    |
| Network capable | No | No | No | No | Yes |
| Message boundary | No | No | N/A | Yes | DGRAM |
| Priority | No | No | N/A | Yes | No |
| Speed | Med | Med | Fast | Med | Med |
| Kernel copies | 2 | 2 | 0 | 2 | 2 |
| Bidirectional | No | No | Yes | No* | Yes |
| Max data size | 64KB | 64KB | RAM | 8KB** | Large |
| Persistence | No | File | /dev/ | /dev/ | No |
| | | | shm | mqueue | |
+------------------+-------+--------+--------+--------+--------+
* Two queues needed for bidirectional
** Default, configurable
Try It: Write a producer-consumer pair using POSIX message queues. The producer sends 10 numbered messages with alternating priorities (odd numbers get priority 1, even get priority 5). The consumer prints them and observes the ordering.
Knowledge Check
- What is the difference between a named semaphore and an unnamed semaphore?
- Why does mq_receive require a buffer of at least mq_msgsize bytes?
- In what situation would you choose a message queue over a pipe?
Common Pitfalls
- mq_receive buffer too small -- fails with EMSGSIZE even if the actual message is short. The buffer must be mq_msgsize bytes or larger.
- Forgetting mq_unlink or sem_unlink -- the objects persist in /dev/mqueue/ and /dev/shm/ until explicitly removed.
- Using sem_init with pshared=0 across processes -- the semaphore only works within one process. Set pshared=1 for cross-process use.
- Deadlock with semaphores -- if sem_wait is called more times than sem_post, the semaphore blocks forever.
- Ignoring EINTR -- sem_wait and mq_receive can be interrupted by signals. Always check for EINTR and retry.
- Message queue full -- mq_send blocks (or returns EAGAIN in non-blocking mode) when the queue is at capacity.
Unix Domain Sockets
Unix domain sockets are the Swiss Army knife of Linux IPC. They use the familiar socket API (socket, bind, listen, accept, connect) but communicate within the same machine, without network overhead. They support both stream and datagram modes, can pass file descriptors between processes, and can verify the identity of the peer. If you only learn one IPC mechanism, make it this one.
Creating a Unix Domain Socket
The key difference from network sockets: AF_UNIX instead of AF_INET, and struct sockaddr_un instead of struct sockaddr_in.
/* uds_server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_PATH "/tmp/my_uds.sock"
int main(void) {
int srv = socket(AF_UNIX, SOCK_STREAM, 0);
if (srv == -1) {
perror("socket");
return 1;
}
/* Remove any leftover socket file */
unlink(SOCKET_PATH);
struct sockaddr_un addr;
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
perror("bind");
return 1;
}
listen(srv, 5);
printf("Server listening on %s\n", SOCKET_PATH);
int client = accept(srv, NULL, NULL);
if (client == -1) {
perror("accept");
return 1;
}
char buf[256];
ssize_t n = read(client, buf, sizeof(buf) - 1);
if (n < 0) {
perror("read");
return 1;
}
buf[n] = '\0';
printf("Server received: %s\n", buf);
const char *reply = "Hello from server";
write(client, reply, strlen(reply));
close(client);
close(srv);
unlink(SOCKET_PATH);
return 0;
}
/* uds_client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_PATH "/tmp/my_uds.sock"
int main(void) {
int fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd == -1) {
perror("socket");
return 1;
}
struct sockaddr_un addr;
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
perror("connect");
return 1;
}
const char *msg = "Hello from client";
write(fd, msg, strlen(msg));
char buf[256];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
if (n < 0) {
perror("read");
return 1;
}
buf[n] = '\0';
printf("Client received: %s\n", buf);
close(fd);
return 0;
}
Run the server in one terminal, the client in another. The socket appears as a file:
$ ls -la /tmp/my_uds.sock
srwxrwxr-x 1 user user 0 ... /tmp/my_uds.sock
The s at the start of the permissions indicates a socket file.
SOCK_STREAM vs SOCK_DGRAM
SOCK_STREAM (like TCP):
- Connection-oriented
- Reliable, ordered byte stream
- Must listen/accept/connect
SOCK_DGRAM (like UDP):
- Connectionless
- Message boundaries preserved
- No listen/accept needed
- Reliable (unlike UDP -- no network to drop packets)
A datagram example:
/* uds_dgram_server.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SERVER_PATH "/tmp/uds_dgram_srv.sock"
int main(void) {
int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
unlink(SERVER_PATH);
struct sockaddr_un addr = {0};
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SERVER_PATH, sizeof(addr.sun_path) - 1);
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
printf("Datagram server waiting on %s\n", SERVER_PATH);
char buf[256];
struct sockaddr_un client_addr;
socklen_t len = sizeof(client_addr);
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
(struct sockaddr *)&client_addr, &len);
buf[n] = '\0';
printf("Server got: %s\n", buf);
/* Send reply back to client */
const char *reply = "ACK";
sendto(fd, reply, strlen(reply), 0,
(struct sockaddr *)&client_addr, len);
close(fd);
unlink(SERVER_PATH);
return 0;
}
/* uds_dgram_client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SERVER_PATH "/tmp/uds_dgram_srv.sock"
#define CLIENT_PATH "/tmp/uds_dgram_cli.sock"
int main(void) {
int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
/* Client must bind too, so server can reply */
unlink(CLIENT_PATH);
struct sockaddr_un client_addr = {0};
client_addr.sun_family = AF_UNIX;
strncpy(client_addr.sun_path, CLIENT_PATH,
sizeof(client_addr.sun_path) - 1);
bind(fd, (struct sockaddr *)&client_addr, sizeof(client_addr));
struct sockaddr_un server_addr = {0};
server_addr.sun_family = AF_UNIX;
strncpy(server_addr.sun_path, SERVER_PATH,
sizeof(server_addr.sun_path) - 1);
const char *msg = "Hello datagram";
sendto(fd, msg, strlen(msg), 0,
(struct sockaddr *)&server_addr, sizeof(server_addr));
char buf[256];
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0, NULL, NULL);
buf[n] = '\0';
printf("Client got reply: %s\n", buf);
close(fd);
unlink(CLIENT_PATH);
return 0;
}
Try It: Modify the datagram server to loop and handle multiple messages from different clients. Each client should bind to a unique path (e.g., /tmp/client_PID.sock).
Abstract Socket Namespace
Linux supports an abstract namespace that does not create a filesystem entry. Set sun_path[0] = '\0':
/* uds_abstract.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <stddef.h> /* offsetof */
int main(void) {
int srv = socket(AF_UNIX, SOCK_STREAM, 0);
struct sockaddr_un addr;
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
/* Abstract: first byte is \0, rest is the name */
const char *name = "\0my_abstract_socket";
memcpy(addr.sun_path, name, 19); /* 1 + strlen("my_abstract_socket") */
socklen_t addr_len = offsetof(struct sockaddr_un, sun_path) + 19;
bind(srv, (struct sockaddr *)&addr, addr_len);
listen(srv, 1);
pid_t pid = fork();
if (pid == 0) {
/* Child: connect */
close(srv);
int cli = socket(AF_UNIX, SOCK_STREAM, 0);
connect(cli, (struct sockaddr *)&addr, addr_len);
write(cli, "abstract!", 9);
close(cli);
_exit(0);
}
int client = accept(srv, NULL, NULL);
char buf[64];
ssize_t n = read(client, buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Received via abstract socket: %s\n", buf);
close(client);
close(srv);
wait(NULL);
return 0;
}
Advantages of abstract sockets:
- No filesystem cleanup needed (no unlink required).
- No permission issues with the socket file.
- The name automatically vanishes when all file descriptors are closed.
Caution: Abstract sockets are Linux-specific. They do not exist on macOS or FreeBSD. The address length matters -- you must pass the exact length, not sizeof(addr), because the name may contain null bytes.
Passing File Descriptors (SCM_RIGHTS)
This is the killer feature. One process can send an open file descriptor to another process over a Unix domain socket. The kernel creates a new file descriptor in the receiver's file descriptor table pointing to the same underlying file.
/* fd_sender.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_PATH "/tmp/fd_pass.sock"
int send_fd(int sock, int fd_to_send) {
char buf[1] = {'F'};
struct iovec iov = { .iov_base = buf, .iov_len = 1 };
/* Ancillary data buffer */
union {
char buf[CMSG_SPACE(sizeof(int))];
struct cmsghdr align;
} cmsg_buf;
struct msghdr msg = {0};
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cmsg_buf.buf;
msg.msg_controllen = sizeof(cmsg_buf.buf);
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));
memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
return sendmsg(sock, &msg, 0);
}
int main(void) {
int srv = socket(AF_UNIX, SOCK_STREAM, 0);
unlink(SOCKET_PATH);
struct sockaddr_un addr = {0};
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, 1);
printf("Sender: waiting for connection...\n");
int client = accept(srv, NULL, NULL);
/* Open a file and send the fd to the other process */
int file_fd = open("/etc/hostname", O_RDONLY);
if (file_fd == -1) {
perror("open");
return 1;
}
printf("Sender: sending fd %d for /etc/hostname\n", file_fd);
send_fd(client, file_fd);
close(file_fd);
close(client);
close(srv);
unlink(SOCKET_PATH);
return 0;
}
/* fd_receiver.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_PATH "/tmp/fd_pass.sock"
int recv_fd(int sock) {
char buf[1];
struct iovec iov = { .iov_base = buf, .iov_len = 1 };
union {
char buf[CMSG_SPACE(sizeof(int))];
struct cmsghdr align;
} cmsg_buf;
struct msghdr msg = {0};
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cmsg_buf.buf;
msg.msg_controllen = sizeof(cmsg_buf.buf);
if (recvmsg(sock, &msg, 0) <= 0)
return -1;
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
if (cmsg && cmsg->cmsg_level == SOL_SOCKET
&& cmsg->cmsg_type == SCM_RIGHTS) {
int fd;
memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
return fd;
}
return -1;
}
int main(void) {
int fd = socket(AF_UNIX, SOCK_STREAM, 0);
struct sockaddr_un addr = {0};
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
connect(fd, (struct sockaddr *)&addr, sizeof(addr));
int received_fd = recv_fd(fd);
printf("Receiver: got fd %d\n", received_fd);
/* Read from the received fd */
char buf[256];
ssize_t n = read(received_fd, buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("Receiver: read from passed fd: %s", buf);
close(received_fd);
close(fd);
return 0;
}
Run the sender first, then the receiver. The receiver reads /etc/hostname using a file descriptor it never opened -- the sender passed it over the socket.
FD passing flow:
Sender process: Receiver process:
fd 3 -> /etc/hostname
|
+-- sendmsg(SCM_RIGHTS) --> recvmsg() --> fd 4 -> /etc/hostname
(same file, new fd #)
Caution: The received file descriptor number will be different from the sender's. The kernel allocates the lowest available fd number in the receiver's table. The underlying file description (offset, flags) is shared.
Passing Credentials (SCM_CREDENTIALS)
Unix domain sockets can also verify the peer's PID, UID, and GID.
/* cred_server.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_PATH "/tmp/cred_check.sock"
int main(void) {
int srv = socket(AF_UNIX, SOCK_STREAM, 0);
unlink(SOCKET_PATH);
struct sockaddr_un addr = {0};
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, 1);
int client = accept(srv, NULL, NULL);
/* Enable credential passing */
int optval = 1;
setsockopt(client, SOL_SOCKET, SO_PASSCRED, &optval, sizeof(optval));
char buf[1];
struct iovec iov = { .iov_base = buf, .iov_len = 1 };
union {
char buf[CMSG_SPACE(sizeof(struct ucred))];
struct cmsghdr align;
} cmsg_buf;
struct msghdr msg = {0};
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cmsg_buf.buf;
msg.msg_controllen = sizeof(cmsg_buf.buf);
recvmsg(client, &msg, 0);
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
if (cmsg && cmsg->cmsg_level == SOL_SOCKET
&& cmsg->cmsg_type == SCM_CREDENTIALS) {
struct ucred cred;
memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
printf("Peer PID: %d\n", cred.pid);
printf("Peer UID: %d\n", cred.uid);
printf("Peer GID: %d\n", cred.gid);
}
close(client);
close(srv);
unlink(SOCKET_PATH);
return 0;
}
This is how D-Bus, systemd, and many daemons authenticate their clients without passwords.
Driver Prep: Unix domain sockets are used heavily in the Linux ecosystem.
systemd socket activation passes pre-opened sockets to services. D-Bus uses Unix domain sockets for desktop IPC. Container runtimes pass file descriptors for namespace setup. Understanding sendmsg/recvmsg with ancillary data is essential for systems programming.
Rust: UnixStream and UnixListener
Rust's standard library includes Unix domain socket support:
// uds_server.rs
use std::os::unix::net::UnixListener;
use std::io::{Read, Write};

fn main() {
    let path = "/tmp/rust_uds.sock";
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path).expect("bind failed");
    println!("Server listening on {}", path);
    let (mut stream, _addr) = listener.accept().expect("accept failed");
    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf).expect("read failed");
    let msg = std::str::from_utf8(&buf[..n]).unwrap();
    println!("Server received: {}", msg);
    stream.write_all(b"Hello from Rust server").expect("write failed");
    std::fs::remove_file(path).ok();
}
// uds_client.rs
use std::os::unix::net::UnixStream;
use std::io::{Read, Write};

fn main() {
    let path = "/tmp/rust_uds.sock";
    let mut stream = UnixStream::connect(path).expect("connect failed");
    stream.write_all(b"Hello from Rust client").expect("write failed");
    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf).expect("read failed");
    let msg = std::str::from_utf8(&buf[..n]).unwrap();
    println!("Client received: {}", msg);
}
Rust: Datagram Sockets
// uds_dgram.rs
use std::os::unix::net::UnixDatagram;

fn main() {
    let server_path = "/tmp/rust_dgram_srv.sock";
    let client_path = "/tmp/rust_dgram_cli.sock";
    let _ = std::fs::remove_file(server_path);
    let _ = std::fs::remove_file(client_path);
    let server = UnixDatagram::bind(server_path).unwrap();
    let client = UnixDatagram::bind(client_path).unwrap();
    client.send_to(b"Hello datagram", server_path).unwrap();
    let mut buf = [0u8; 256];
    let (n, addr) = server.recv_from(&mut buf).unwrap();
    println!("Server got: {}", std::str::from_utf8(&buf[..n]).unwrap());
    server.send_to(b"ACK", addr.as_pathname().unwrap()).unwrap();
    let n = client.recv(&mut buf).unwrap();
    println!("Client got: {}", std::str::from_utf8(&buf[..n]).unwrap());
    std::fs::remove_file(server_path).ok();
    std::fs::remove_file(client_path).ok();
}
For async Unix domain sockets, tokio provides tokio::net::UnixListener and tokio::net::UnixStream with the same API as the sync versions but using .await. See Ch40 for async patterns.
Rust Note: Rust's std::os::unix::net types do not support SCM_RIGHTS directly. For file descriptor passing in Rust, use the nix crate's sendmsg/recvmsg with ControlMessage::ScmRights, or the passfd crate.
Why Unix Domain Sockets Are the Best IPC
+---------------------------+------------------------------------+
| Feature | Unix Domain Sockets |
+---------------------------+------------------------------------+
| Bidirectional | Yes (SOCK_STREAM) |
| Message boundaries | Yes (SOCK_DGRAM) |
| Unrelated processes | Yes |
| File descriptor passing | Yes (SCM_RIGHTS) |
| Credential checking | Yes (SCM_CREDENTIALS) |
| Familiar API | Same as TCP/UDP sockets |
| Performance | Faster than TCP loopback |
| Backpressure | Yes (kernel buffer limits) |
| Async-compatible | Yes (epoll, tokio, etc.) |
| Easy to upgrade to TCP | Change AF_UNIX to AF_INET |
+---------------------------+------------------------------------+
Knowledge Check
- What is the difference between a filesystem-path socket and an abstract socket?
- How does SCM_RIGHTS work at the kernel level?
- Why must a datagram client also bind to a path if it wants to receive replies?
Common Pitfalls
- Forgetting to unlink the socket file -- the next bind will fail with EADDRINUSE. Always unlink before bind.
- Using sizeof(addr) for abstract socket addresses -- abstract names can contain null bytes. Pass the exact computed length.
- Not setting SO_PASSCRED before receiving credentials -- the kernel does not attach credential data by default.
- Assuming SOCK_DGRAM is unreliable -- unlike UDP, Unix datagram sockets are reliable on the same machine. Messages are never dropped (but the sender blocks if the receiver's buffer is full).
- Permission issues on the socket file -- the socket file inherits the umask. Use chmod or fchmod if other users need access.
- Buffer overflow in sun_path -- the path field is only 108 bytes on Linux. Use abstract sockets for long names.
The Socket API
Networking on Linux starts with sockets. A socket is a file descriptor that represents one end of a network conversation. Every networked program you have ever used -- web browsers, SSH clients, game servers -- builds on the same handful of system calls: socket(), bind(), listen(), accept(), connect().
This chapter walks through each call, the address structures that feed them, and the DNS resolution machinery that maps hostnames to addresses.
The Two Workflows
Before any code, understand the two fundamental patterns.
CLIENT SERVER
------ ------
socket() socket()
| |
connect() -----> [network] -----> bind()
| |
write()/read() listen()
| |
close() accept() ---> new fd
|
read()/write()
|
close()
The client creates a socket and immediately connects. The server creates a socket, binds it to an address, starts listening, and accepts incoming connections.
Address Structures
Every socket call that touches an address needs a struct sockaddr. In practice you never use the generic one directly. You fill in a protocol-specific structure and cast it.
struct sockaddr (generic, 16 bytes)
+--------+---------------------------+
| family | 14 bytes of data |
+--------+---------------------------+
struct sockaddr_in (IPv4)
+--------+--------+------------------+
| AF_INET| port | 4-byte IPv4 addr| + 8 bytes padding
+--------+--------+------------------+
struct sockaddr_in6 (IPv6)
+--------+--------+------+-----------+----------+
|AF_INET6| port |flow | 16-byte IPv6 addr | + scope_id
+--------+--------+------+-----------+----------+
Creating a Socket in C
/* create_socket.c -- create a TCP socket and print its fd */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
perror("socket");
return 1;
}
printf("TCP socket fd = %d\n", fd);
int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
if (udp_fd < 0) {
perror("socket");
close(fd);
return 1;
}
printf("UDP socket fd = %d\n", udp_fd);
close(fd);
close(udp_fd);
return 0;
}
The three arguments: address family (AF_INET for IPv4, AF_INET6 for IPv6), socket type (SOCK_STREAM for TCP, SOCK_DGRAM for UDP), and protocol (0 lets the kernel pick the obvious one).
Caution: A socket fd is just a number. If you forget to close it, you leak a file descriptor. In a long-running server, this eventually hits the per-process fd limit and new connections silently fail.
Filling an Address: inet_pton
inet_pton converts a human-readable address string into binary form. inet_ntop goes the other direction.
/* addr_convert.c -- convert addresses between text and binary */
#include <stdio.h>
#include <arpa/inet.h>
int main(void)
{
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons(8080); /* host-to-network byte order */
if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1) {
fprintf(stderr, "bad address\n");
return 1;
}
/* Convert back to string */
char buf[INET_ADDRSTRLEN];
const char *result = inet_ntop(AF_INET, &addr.sin_addr,
buf, sizeof(buf));
if (!result) {
perror("inet_ntop");
return 1;
}
printf("Address: %s Port: %d\n", buf, ntohs(addr.sin_port));
/* IPv6 example */
struct sockaddr_in6 addr6;
addr6.sin6_family = AF_INET6;
addr6.sin6_port = htons(9090);
inet_pton(AF_INET6, "::1", &addr6.sin6_addr);
char buf6[INET6_ADDRSTRLEN];
inet_ntop(AF_INET6, &addr6.sin6_addr, buf6, sizeof(buf6));
printf("IPv6 Address: %s Port: %d\n", buf6, ntohs(addr6.sin6_port));
return 0;
}
htons and ntohs convert between host byte order and network byte order (big-endian). Every multi-byte field in a sockaddr must be in network byte order.
Caution: Forgetting htons() on the port is a classic bug. Port 8080 (0x1F90) stored without conversion on a little-endian machine puts 0x901F on the wire -- port 36895. Your server binds to the wrong port and you spend an hour debugging.
DNS Resolution: getaddrinfo
Hard-coding IP addresses is fragile. getaddrinfo resolves hostnames and service names, returning a linked list of address structures ready to pass to connect() or bind().
/* resolve.c -- resolve a hostname to IP addresses */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "usage: %s hostname\n", argv[0]);
return 1;
}
struct addrinfo hints, *res, *p;
memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_UNSPEC; /* IPv4 or IPv6 */
hints.ai_socktype = SOCK_STREAM; /* TCP */
int status = getaddrinfo(argv[1], "http", &hints, &res);
if (status != 0) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
return 1;
}
for (p = res; p != NULL; p = p->ai_next) {
char ipstr[INET6_ADDRSTRLEN];
void *addr;
const char *ipver;
if (p->ai_family == AF_INET) {
struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
addr = &ipv4->sin_addr;
ipver = "IPv4";
} else {
struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
addr = &ipv6->sin6_addr;
ipver = "IPv6";
}
inet_ntop(p->ai_family, addr, ipstr, sizeof(ipstr));
printf(" %s: %s\n", ipver, ipstr);
}
freeaddrinfo(res);
return 0;
}
getaddrinfo is thread-safe, handles both IPv4 and IPv6, and replaces the older gethostbyname. Always use it.
Try It: Compile resolve.c and run it with localhost, then google.com, then a hostname that does not exist. Observe the error from gai_strerror.
A Complete TCP Client in C
/* tcp_client.c -- connect to a server, send a message, read reply */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
int main(int argc, char *argv[])
{
if (argc != 3) {
fprintf(stderr, "usage: %s host port\n", argv[0]);
return 1;
}
struct addrinfo hints, *res;
memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
int status = getaddrinfo(argv[1], argv[2], &hints, &res);
if (status != 0) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
return 1;
}
int sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
if (sockfd < 0) {
perror("socket");
freeaddrinfo(res);
return 1;
}
if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0) {
perror("connect");
close(sockfd);
freeaddrinfo(res);
return 1;
}
freeaddrinfo(res);
const char *msg = "Hello, server!\n";
write(sockfd, msg, strlen(msg));
char buf[1024];
ssize_t n = read(sockfd, buf, sizeof(buf) - 1);
if (n > 0) {
buf[n] = '\0';
printf("Server replied: %s", buf);
}
close(sockfd);
return 0;
}
The flow: resolve address, create socket, connect, write, read, close. Every real-world client follows this skeleton.
A Complete TCP Server in C
/* tcp_server.c -- accept one connection, echo, exit */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
if (listen_fd < 0) { perror("socket"); return 1; }
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7878);
if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind");
close(listen_fd);
return 1;
}
if (listen(listen_fd, 5) < 0) {
perror("listen");
close(listen_fd);
return 1;
}
printf("Listening on port 7878...\n");
struct sockaddr_in client_addr;
socklen_t client_len = sizeof(client_addr);
int conn_fd = accept(listen_fd, (struct sockaddr *)&client_addr,
&client_len);
if (conn_fd < 0) {
perror("accept");
close(listen_fd);
return 1;
}
char client_ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));
printf("Connection from %s:%d\n", client_ip, ntohs(client_addr.sin_port));
char buf[1024];
ssize_t n = read(conn_fd, buf, sizeof(buf));
if (n > 0) {
write(conn_fd, buf, n); /* echo back */
}
close(conn_fd);
close(listen_fd);
return 0;
}
SO_REUSEADDR lets you restart the server immediately after stopping it. Without it, the kernel holds the port in TIME_WAIT state for up to 60 seconds.
Try It: Run the server in one terminal and the client in another (
./tcp_client 127.0.0.1 7878). Then modify the server to handle multiple connections in a loop instead of exiting after the first one.
TCP vs UDP
| Feature | TCP (SOCK_STREAM) | UDP (SOCK_DGRAM) |
|---|---|---|
| Connection | Yes (connect/accept) | No (sendto/recvfrom) |
| Reliability | Guaranteed delivery | Best-effort |
| Ordering | Preserved | Not guaranteed |
| Framing | Byte stream | Message boundaries |
| Overhead | Higher (handshake, ACKs) | Lower |
| Typical use | HTTP, SSH, databases | DNS, gaming, streaming |
Rust: std::net
Rust's standard library wraps the socket API into safe, high-level types. No raw sockaddr structs, no casts, no byte-order functions to remember.
// tcp_client.rs -- connect, send, receive
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:7878")?;
    stream.write_all(b"Hello, server!\n")?;
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    print!("Server replied: {}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}
One line to connect. One line to write. One line to read. The ? operator propagates errors without crashing.
// tcp_server.rs -- accept one connection, echo
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    println!("Listening on port 7878...");
    let (mut stream, addr) = listener.accept()?;
    println!("Connection from {}", addr);
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    stream.write_all(&buf[..n])?;
    Ok(())
}
Rust Note: TcpListener::bind handles socket(), bind(), and listen() in a single call. The address is parsed from a string automatically. SO_REUSEADDR is set by default on most platforms.
Rust: UDP
// udp_example.rs -- send and receive a datagram
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?; // OS picks port
    socket.send_to(b"ping", "127.0.0.1:9000")?;
    let mut buf = [0u8; 1024];
    let (n, src) = socket.recv_from(&mut buf)?;
    println!("Got {} bytes from {}: {}",
             n, src, String::from_utf8_lossy(&buf[..n]));
    Ok(())
}
Rust: The nix Crate for Low-Level Control
When you need setsockopt, raw sockaddr manipulation, or socket options that std::net does not expose, use the nix crate.
// nix_socket.rs -- create a socket with nix for low-level control
// Cargo.toml: nix = { version = "0.29", features = ["net"] }
use nix::sys::socket::{
    socket, bind, listen, accept,
    AddressFamily, Backlog, SockFlag, SockType, SockaddrIn,
};
use std::io::{Read, Write};
use std::os::fd::{AsRawFd, FromRawFd};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fd = socket(
        AddressFamily::Inet,
        SockType::Stream,
        SockFlag::empty(),
        None,
    )?;
    let addr = SockaddrIn::new(0, 0, 0, 0, 7879); // 0.0.0.0:7879
    bind(fd.as_raw_fd(), &addr)?;
    listen(&fd, Backlog::new(5)?)?;
    println!("nix: listening on port 7879");
    let conn_fd = accept(fd.as_raw_fd())?;
    println!("nix: accepted connection");
    // Wrap the raw fd in a std TcpStream for the Read/Write traits
    let mut stream = unsafe { std::net::TcpStream::from_raw_fd(conn_fd) };
    let mut buf = [0u8; 256];
    let n = stream.read(&mut buf)?;
    stream.write_all(&buf[..n])?;
    Ok(())
}
Driver Prep: In kernel modules, you will encounter struct socket and sock_create_kern(), which mirror the userspace socket API. Understanding the syscall interface here maps directly to the kernel's internal socket layer.
Data Flow Through the Stack
Application: write(fd, buf, len)
|
+---------+
| TCP/UDP | segmentation, checksums, sequence numbers
+---------+
|
+---------+
| IP | routing, fragmentation, TTL
+---------+
|
+---------+
| Driver | DMA to NIC hardware
+---------+
|
[wire]
Every write() to a socket sends data down this stack. Every read() pulls data up.
Knowledge Check
- What is the difference between SOCK_STREAM and SOCK_DGRAM? Which transport protocol does each imply?
- Why must you call htons() on the port number before storing it in sockaddr_in?
- What does getaddrinfo return, and why is it preferred over gethostbyname?
Common Pitfalls
- Forgetting htons/htonl -- your address or port is silently wrong.
- Not checking return values -- connect() can fail for dozens of reasons.
- Not calling freeaddrinfo -- leaks the linked list returned by getaddrinfo.
- Using INADDR_ANY without htonl -- works by accident on little-endian (0 is 0 in any byte order), but INADDR_LOOPBACK will not.
- Assuming read() returns a complete message -- TCP is a byte stream. One write() can arrive as multiple read() calls.
- Binding to a specific address when you want all interfaces -- use INADDR_ANY (0.0.0.0) to accept connections on every interface.
TCP Client-Server Programming
The previous chapter showed the individual socket calls. Now we wire them into real programs: an echo server that handles multiple clients, a matching client, graceful shutdown, and protocol framing. By the end, you will have a working chat server.
The Echo Server in C
This server accepts connections in a loop, forks a child process for each client, and echoes back everything it receives.
/* echo_server.c -- fork-per-connection echo server */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
static volatile sig_atomic_t running = 1;
static void handle_sigterm(int sig)
{
(void)sig;
running = 0;
}
static void reap_children(int sig)
{
(void)sig;
while (waitpid(-1, NULL, WNOHANG) > 0)
;
}
static void handle_client(int fd)
{
char buf[4096];
ssize_t n;
while ((n = read(fd, buf, sizeof(buf))) > 0) {
ssize_t written = 0;
while (written < n) {
ssize_t w = write(fd, buf + written, n - written);
if (w <= 0) return;
written += w;
}
}
close(fd);
}
int main(void)
{
struct sigaction sa_term;
memset(&sa_term, 0, sizeof(sa_term));
sa_term.sa_handler = handle_sigterm;
sigaction(SIGTERM, &sa_term, NULL);
sigaction(SIGINT, &sa_term, NULL);
struct sigaction sa_chld;
memset(&sa_chld, 0, sizeof(sa_chld));
sa_chld.sa_handler = reap_children;
sa_chld.sa_flags = SA_RESTART;
sigaction(SIGCHLD, &sa_chld, NULL);
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
if (listen_fd < 0) { perror("socket"); return 1; }
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7878);
if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind"); return 1;
}
if (listen(listen_fd, 128) < 0) {
perror("listen"); return 1;
}
printf("Echo server listening on port 7878\n");
while (running) {
struct sockaddr_in client;
socklen_t clen = sizeof(client);
int conn_fd = accept(listen_fd, (struct sockaddr *)&client, &clen);
if (conn_fd < 0) {
if (errno == EINTR) continue;
perror("accept");
break;
}
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
printf("New connection from %s:%d\n", ip, ntohs(client.sin_port));
pid_t pid = fork();
if (pid < 0) {
perror("fork");
close(conn_fd);
} else if (pid == 0) {
/* Child: handle client */
close(listen_fd);
handle_client(conn_fd);
_exit(0);
} else {
/* Parent: close the connected fd, keep listening */
close(conn_fd);
}
}
printf("\nShutting down...\n");
close(listen_fd);
return 0;
}
SIGCHLD handler reaps zombie processes. accept() can return EINTR when interrupted; the loop retries. The child closes the listening socket; the parent closes the connected socket.
Caution: fork() duplicates the entire process. A thousand simultaneous connections means a thousand processes. We will fix this shortly.
The Matching Client
/* echo_client.c -- send lines from stdin, print echoed replies */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>
int main(int argc, char *argv[])
{
const char *host = argc > 1 ? argv[1] : "127.0.0.1";
const char *port = argc > 2 ? argv[2] : "7878";
struct addrinfo hints = {0}, *res;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
int s = getaddrinfo(host, port, &hints, &res);
if (s != 0) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
return 1;
}
int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
if (fd < 0) { perror("socket"); return 1; }
if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
perror("connect"); return 1;
}
freeaddrinfo(res);
printf("Connected. Type lines to echo (Ctrl-D to quit):\n");
char line[1024];
while (fgets(line, sizeof(line), stdin)) {
write(fd, line, strlen(line));
char buf[1024];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
if (n <= 0) break;
buf[n] = '\0';
printf("echo: %s", buf);
}
close(fd);
return 0;
}
Try It: Start the server, then open three separate terminals each running the client. Verify that all three sessions echo independently.
Thread-Per-Connection Alternative
Threads share the same address space, so they are cheaper than processes. Replace the fork() block with pthread_create.
/* echo_server_threaded.c -- thread-per-connection echo server */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
static void *client_thread(void *arg)
{
int fd = *(int *)arg;
free(arg);
char buf[4096];
ssize_t n;
while ((n = read(fd, buf, sizeof(buf))) > 0) {
ssize_t w = 0;
while (w < n) {
ssize_t ret = write(fd, buf + w, n - w);
if (ret <= 0) goto done;
w += ret;
}
}
done:
close(fd);
return NULL;
}
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
if (listen_fd < 0) { perror("socket"); return 1; }
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7878);
if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind"); return 1;
}
listen(listen_fd, 128);
printf("Threaded echo server on port 7878\n");
for (;;) {
struct sockaddr_in client;
socklen_t clen = sizeof(client);
int conn_fd = accept(listen_fd, (struct sockaddr *)&client, &clen);
if (conn_fd < 0) { perror("accept"); continue; }
int *fdp = malloc(sizeof(int));
*fdp = conn_fd;
pthread_t tid;
if (pthread_create(&tid, NULL, client_thread, fdp) != 0) {
perror("pthread_create");
close(conn_fd);
free(fdp);
} else {
pthread_detach(tid);
}
}
close(listen_fd);
return 0;
}
Compile with gcc -pthread echo_server_threaded.c -o echo_server_threaded.
Caution: We heap-allocate the fd so each thread gets its own copy. Passing &conn_fd directly is a race: the main loop may overwrite conn_fd before the thread reads it.
Protocol Framing
TCP is a byte stream. If the client sends two messages quickly, the server might receive them glued together in one read() call. You need a protocol to know where one message ends and the next begins.
Two common approaches: (1) length-prefix -- send 4 bytes of length then the payload; (2) delimiter -- terminate each message with \n. Length-prefix is more robust.
Length-Prefix Framing in C
/* framed_send.c -- send a length-prefixed message */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
/* Write exactly n bytes */
static int write_all(int fd, const void *buf, size_t n)
{
const char *p = buf;
while (n > 0) {
ssize_t w = write(fd, p, n);
if (w <= 0) return -1;
p += w;
n -= w;
}
return 0;
}
/* Read exactly n bytes */
static int read_all(int fd, void *buf, size_t n)
{
char *p = buf;
while (n > 0) {
ssize_t r = read(fd, p, n);
if (r <= 0) return -1;
p += r;
n -= r;
}
return 0;
}
int send_message(int fd, const char *msg, size_t len)
{
uint32_t net_len = htonl((uint32_t)len);
if (write_all(fd, &net_len, 4) < 0) return -1;
if (write_all(fd, msg, len) < 0) return -1;
return 0;
}
int recv_message(int fd, char *buf, size_t bufsize, size_t *out_len)
{
uint32_t net_len;
if (read_all(fd, &net_len, 4) < 0) return -1;
uint32_t len = ntohl(net_len);
if (len > bufsize) return -1; /* message too large */
if (read_all(fd, buf, len) < 0) return -1;
*out_len = len;
return 0;
}
Caution: Always validate the length prefix. A malicious client could send 0xFFFFFFFF and trick you into allocating 4 GB of memory. Set a maximum message size.
Try It: Write a small main() that connects to the echo server, sends a framed message, then reads the framed response. Verify that even rapid sends are correctly separated.
Rust TCP Server
// echo_server.rs -- multi-threaded echo server in Rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle_client(mut stream: TcpStream) {
    let peer = stream.peer_addr().unwrap();
    println!("Connection from {}", peer);
    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => {
                if stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
            Err(_) => break,
        }
    }
    println!("{} disconnected", peer);
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    println!("Echo server listening on port 7878");
    for stream in listener.incoming() {
        match stream {
            Ok(s) => {
                thread::spawn(move || handle_client(s));
            }
            Err(e) => eprintln!("accept error: {}", e),
        }
    }
    Ok(())
}
Rust Note: TcpStream is Send, so moving it into a thread::spawn closure is safe. The compiler ensures no two threads can access the same stream. No malloc for the fd pointer, no detach -- ownership transfer handles everything.
Rust: Length-Prefix Framing
// framed.rs -- length-prefix framing over TCP
use std::io::{self, Read, Write};
use std::net::TcpStream;

fn send_message(stream: &mut TcpStream, msg: &[u8]) -> io::Result<()> {
    let len = (msg.len() as u32).to_be_bytes();
    stream.write_all(&len)?;
    stream.write_all(msg)?;
    Ok(())
}

fn recv_message(stream: &mut TcpStream) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > 1_000_000 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }
    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:7878")?;
    send_message(&mut stream, b"Hello, framed world!")?;
    let reply = recv_message(&mut stream)?;
    println!("Got: {}", String::from_utf8_lossy(&reply));
    Ok(())
}
read_exact loops internally until exactly N bytes are read. This eliminates the manual read_all loop from C.
A Complete Chat Server in C
This is the culmination: a multi-client chat server where messages from one client are broadcast to all others.
/* chat_server.c -- simple broadcast chat (thread-per-connection) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MAX_CLIENTS 64
#define BUF_SIZE 1024
static pthread_mutex_t clients_lock = PTHREAD_MUTEX_INITIALIZER;
static int client_fds[MAX_CLIENTS];
static int client_count = 0;
static void add_client(int fd)
{
pthread_mutex_lock(&clients_lock);
if (client_count < MAX_CLIENTS) {
client_fds[client_count++] = fd;
}
pthread_mutex_unlock(&clients_lock);
}
static void remove_client(int fd)
{
pthread_mutex_lock(&clients_lock);
for (int i = 0; i < client_count; i++) {
if (client_fds[i] == fd) {
client_fds[i] = client_fds[--client_count];
break;
}
}
pthread_mutex_unlock(&clients_lock);
}
static void broadcast(int sender_fd, const char *msg, size_t len)
{
pthread_mutex_lock(&clients_lock);
for (int i = 0; i < client_count; i++) {
if (client_fds[i] != sender_fd) {
write(client_fds[i], msg, len);
}
}
pthread_mutex_unlock(&clients_lock);
}
static void *client_thread(void *arg)
{
int fd = *(int *)arg;
free(arg);
add_client(fd);
char buf[BUF_SIZE];
ssize_t n;
while ((n = read(fd, buf, sizeof(buf))) > 0) {
broadcast(fd, buf, n);
}
remove_client(fd);
close(fd);
return NULL;
}
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(9000);
bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
listen(listen_fd, 128);
printf("Chat server on port 9000 (use: nc 127.0.0.1 9000)\n");
for (;;) {
struct sockaddr_in cl;
socklen_t len = sizeof(cl);
int conn = accept(listen_fd, (struct sockaddr *)&cl, &len);
if (conn < 0) continue;
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &cl.sin_addr, ip, sizeof(ip));
printf("%s:%d joined\n", ip, ntohs(cl.sin_port));
int *fdp = malloc(sizeof(int));
*fdp = conn;
pthread_t tid;
pthread_create(&tid, NULL, client_thread, fdp);
pthread_detach(tid);
}
}
Test with multiple nc 127.0.0.1 9000 sessions. Type in one and watch it appear in the others.
Rust Chat Server
// chat_server.rs -- broadcast chat server
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::{Arc, Mutex};
use std::thread;

type ClientList = Arc<Mutex<Vec<(SocketAddr, TcpStream)>>>;

fn handle_client(stream: TcpStream, clients: ClientList) {
    let peer = stream.peer_addr().unwrap();
    println!("{} joined", peer);
    {
        clients.lock().unwrap().push((peer, stream.try_clone().unwrap()));
    }
    let reader = BufReader::new(stream);
    for line in reader.lines().flatten() {
        let full = format!("{}: {}\n", peer, line);
        let mut list = clients.lock().unwrap();
        for (_, s) in list.iter_mut().filter(|(a, _)| *a != peer) {
            let _ = s.write_all(full.as_bytes());
        }
    }
    clients.lock().unwrap().retain(|(a, _)| *a != peer);
    println!("{} left", peer);
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9000")?;
    let clients: ClientList = Arc::new(Mutex::new(Vec::new()));
    println!("Chat server on port 9000");
    for stream in listener.incoming() {
        let stream = stream?;
        let clients = Arc::clone(&clients);
        thread::spawn(move || handle_client(stream, clients));
    }
    Ok(())
}
Rust Note: Arc<Mutex<Vec<TcpStream>>> is the standard pattern for shared mutable state across threads. The compiler refuses to compile the program if you try to share without proper synchronization. No equivalent compile-time guarantee exists in C.
Driver Prep: Kernel network drivers process packets without the luxury of threads-per-connection. The patterns in chapters 48 and 49 (poll, epoll) are what drivers and high-performance servers use instead.
Graceful Shutdown Pattern
The volatile sig_atomic_t running flag (shown in the fork server above) is the standard approach. The signal handler sets it to 0; the main loop checks it. Close the listening socket to unblock accept(), then wait for in-flight clients to finish before exiting.
Knowledge Check
- Why does the fork-based server close conn_fd in the parent and listen_fd in the child?
- What happens if you omit the mutex around the client list in the chat server?
- How does length-prefix framing solve the TCP message-boundary problem?
Common Pitfalls
- Not handling partial writes -- `write()` can return fewer bytes than requested. Always loop.
- Not handling partial reads -- same issue on the receive side. `read()` returns whatever is available, not a complete message.
- Zombie processes -- forgetting a `SIGCHLD` handler with fork-per-connection fills the process table.
- Thread stack overflow -- each thread allocates a stack (typically 2-8 MB). Thousands of threads consume gigabytes of memory.
- Broadcasting while holding the lock too long -- a slow client's `write()` can block, stalling all other broadcasts. Consider non-blocking I/O or per-client queues.
- Forgetting `SO_REUSEADDR` -- restarting the server gives "Address already in use" for up to 60 seconds.
UDP and Datagram Sockets
UDP is the other transport protocol on IP. It provides no connections, no guaranteed delivery, no ordering. You send a datagram, and it either arrives whole or not at all. This simplicity makes UDP fast and the right choice for DNS lookups, live video, gaming, and service discovery.
This chapter covers sendto/recvfrom, multicast, broadcast, and builds a practical service discovery protocol.
UDP vs TCP Recap
TCP (SOCK_STREAM) UDP (SOCK_DGRAM)
+----------------------------------+----------------------------------+
| 3-way handshake | No handshake |
| Guaranteed delivery (retransmit) | Fire and forget |
| Ordered byte stream | Independent datagrams |
| Flow control, congestion control | None built-in |
| 20-60 byte header per segment    | 8-byte header                    |
+----------------------------------+----------------------------------+
A UDP Echo Server in C
/* udp_echo_server.c -- receive datagrams, echo them back */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0) { perror("socket"); return 1; }
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(5000);
if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind"); return 1;
}
printf("UDP echo server on port 5000\n");
for (;;) {
char buf[65535];
struct sockaddr_in client;
socklen_t clen = sizeof(client);
ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
(struct sockaddr *)&client, &clen);
if (n < 0) { perror("recvfrom"); continue; }
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
printf("From %s:%d (%zd bytes)\n", ip, ntohs(client.sin_port), n);
/* Echo back to sender */
sendto(fd, buf, n, 0, (struct sockaddr *)&client, clen);
}
close(fd);
return 0;
}
Notice: no listen(), no accept(). A single socket handles all clients. recvfrom tells you who sent the datagram; sendto sends a reply directly to that address.
A UDP Client in C
/* udp_client.c -- send a message, wait for reply */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0) { perror("socket"); return 1; }
struct sockaddr_in server = {0};
server.sin_family = AF_INET;
server.sin_port = htons(5000);
inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);
const char *msg = "Hello, UDP!";
sendto(fd, msg, strlen(msg), 0,
(struct sockaddr *)&server, sizeof(server));
char buf[1024];
struct sockaddr_in from;
socklen_t flen = sizeof(from);
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
(struct sockaddr *)&from, &flen);
if (n > 0) {
buf[n] = '\0';
printf("Reply: %s\n", buf);
}
close(fd);
return 0;
}
Caution:
`recvfrom` blocks forever if no reply comes. In production, set a timeout with `setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, ...)` or use `poll()` before reading.
Try It: Start the UDP server, run the client, then kill the server and run the client again. Observe that
`sendto` succeeds even though nobody is listening -- UDP does not detect that the remote end is unreachable (unless the network returns an ICMP port-unreachable, which may or may not arrive).
When UDP Is Appropriate
- DNS -- a single question-answer exchange. Retransmit if no reply in 2 seconds.
- Gaming -- player position updates arrive 60 times per second. A lost packet is stale by the time it would be retransmitted.
- Live video/audio -- a dropped frame is better than a delayed frame.
- Service discovery -- "who's on the network?" is a broadcast/multicast question, and TCP cannot broadcast.
- IoT sensors -- tiny devices with limited memory cannot afford TCP's state machine.
Handling Packet Loss at the Application Layer
UDP gives you no retransmission. If reliability matters, build it yourself.
Sender Receiver
| |
|--- [seq=1] data ------------->|
| |--- [ack=1] ---------->|
|--- [seq=2] data ------------->|
| (lost) |
|--- (timeout, resend seq=2) -->|
| |--- [ack=2] ---------->|
The minimum reliable protocol over UDP:
- Attach a sequence number to each datagram.
- The receiver acknowledges each sequence number.
- The sender retransmits if no acknowledgment arrives within a timeout.
- The receiver discards duplicates.
/* reliable_header.h -- minimal reliability over UDP */
#ifndef RELIABLE_HEADER_H
#define RELIABLE_HEADER_H
#include <stdint.h>
struct reliable_hdr {
uint32_t seq; /* sequence number (network byte order) */
uint32_t ack; /* acknowledgment number */
uint16_t flags; /* 0x01 = DATA, 0x02 = ACK */
uint16_t len; /* payload length */
};
#define FLAG_DATA 0x01
#define FLAG_ACK 0x02
#endif
Driver Prep: Many industrial and automotive protocols (CAN bus, some PROFINET variants) run on UDP or raw frames and implement their own reliability layer. This pattern shows up everywhere below TCP.
Broadcast
Broadcast sends a datagram to every host on the local subnet. The destination address is 255.255.255.255 (limited broadcast) or the subnet broadcast address (e.g., 192.168.1.255 for a /24 network).
/* broadcast_sender.c -- send a broadcast message */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0) { perror("socket"); return 1; }
/* Must enable broadcast on the socket */
int broadcast = 1;
if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST,
&broadcast, sizeof(broadcast)) < 0) {
perror("setsockopt"); return 1;
}
struct sockaddr_in dest = {0};
dest.sin_family = AF_INET;
dest.sin_port = htons(5001);
inet_pton(AF_INET, "255.255.255.255", &dest.sin_addr);
const char *msg = "DISCOVER";
sendto(fd, msg, strlen(msg), 0,
(struct sockaddr *)&dest, sizeof(dest));
printf("Broadcast sent\n");
close(fd);
return 0;
}
Caution: Broadcasting generates traffic that every host on the subnet must process. Do it sparingly. On large networks, prefer multicast.
Multicast
Multicast sends datagrams to a group address (224.0.0.0 - 239.255.255.255). Only hosts that join the group receive the traffic. The network infrastructure (IGMP) handles group membership.
/* mcast_receiver.c -- join a multicast group and print messages */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0) { perror("socket"); return 1; }
/* Allow multiple receivers on same port */
int reuse = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(5002);
if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind"); return 1;
}
/* Join multicast group 239.1.1.1 */
struct ip_mreq mreq;
inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);
mreq.imr_interface.s_addr = htonl(INADDR_ANY);
if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
&mreq, sizeof(mreq)) < 0) {
perror("setsockopt IP_ADD_MEMBERSHIP"); return 1;
}
printf("Joined multicast group 239.1.1.1, listening on port 5002\n");
for (;;) {
char buf[1024];
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0, NULL, NULL);
if (n < 0) { perror("recvfrom"); break; }
buf[n] = '\0';
printf("Received: %s\n", buf);
}
close(fd);
return 0;
}
/* mcast_sender.c -- send to a multicast group */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0) { perror("socket"); return 1; }
/* Set TTL for multicast (1 = local subnet only) */
unsigned char ttl = 1;
setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
struct sockaddr_in dest = {0};
dest.sin_family = AF_INET;
dest.sin_port = htons(5002);
inet_pton(AF_INET, "239.1.1.1", &dest.sin_addr);
const char *msg = "Hello, multicast group!";
sendto(fd, msg, strlen(msg), 0,
(struct sockaddr *)&dest, sizeof(dest));
printf("Sent to multicast group 239.1.1.1\n");
close(fd);
return 0;
}
Try It: Start two or more
`mcast_receiver` processes, then run `mcast_sender`. All receivers should print the message. Then stop one receiver and verify the others still work.
A Simple Discovery Protocol
Combine broadcast and timed responses to build a LAN service discovery mechanism.
/* discover_server.c -- respond to discovery broadcasts */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
int reuse = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(5003);
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
printf("Discovery responder on port 5003\n");
for (;;) {
char buf[256];
struct sockaddr_in client;
socklen_t clen = sizeof(client);
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
(struct sockaddr *)&client, &clen);
if (n < 0) continue;
buf[n] = '\0';
if (strcmp(buf, "DISCOVER") == 0) {
const char *reply = "SERVICE:echo:7878";
sendto(fd, reply, strlen(reply), 0,
(struct sockaddr *)&client, clen);
}
}
}
/* discover_client.c -- broadcast DISCOVER, collect responses */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_DGRAM, 0);
int broadcast = 1;
setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof(broadcast));
/* Set 2-second receive timeout */
struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
struct sockaddr_in dest = {0};
dest.sin_family = AF_INET;
dest.sin_port = htons(5003);
inet_pton(AF_INET, "255.255.255.255", &dest.sin_addr);
sendto(fd, "DISCOVER", 8, 0,
(struct sockaddr *)&dest, sizeof(dest));
printf("Sent DISCOVER broadcast, waiting for replies...\n");
for (;;) {
char buf[256];
struct sockaddr_in from;
socklen_t flen = sizeof(from);
ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
(struct sockaddr *)&from, &flen);
if (n < 0) break; /* timeout or error */
buf[n] = '\0';
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &from.sin_addr, ip, sizeof(ip));
printf("Found: %s at %s\n", buf, ip);
}
printf("Discovery complete.\n");
close(fd);
return 0;
}
Discovery flow:
Client Network Server(s)
| | |
|-- DISCOVER (broadcast) ------->| -------> |
| | [server receives] |
|<------- SERVICE:echo:7878 -----|<------ |
| | |
| (timeout: 2 seconds) | |
| [done] | |
Rust: UdpSocket
// udp_echo_server.rs -- UDP echo server
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:5000")?;
    println!("UDP echo server on port 5000");
    let mut buf = [0u8; 65535];
    loop {
        let (n, src) = socket.recv_from(&mut buf)?;
        println!("From {} ({} bytes)", src, n);
        socket.send_to(&buf[..n], src)?;
    }
}
// udp_client.rs -- send a datagram, receive reply
use std::net::UdpSocket;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?; // OS picks port
    socket.set_read_timeout(Some(Duration::from_secs(3)))?;
    socket.send_to(b"Hello, UDP!", "127.0.0.1:5000")?;
    let mut buf = [0u8; 1024];
    match socket.recv_from(&mut buf) {
        Ok((n, src)) => {
            println!("Reply from {}: {}", src, String::from_utf8_lossy(&buf[..n]));
        }
        Err(e) => eprintln!("No reply: {}", e),
    }
    Ok(())
}
Rust Note:
`UdpSocket::bind("0.0.0.0:0")` binds to an OS-assigned ephemeral port. The address string is parsed via the `ToSocketAddrs` trait, which also handles DNS resolution. The `set_read_timeout` method replaces the C `setsockopt` dance.
Rust: Multicast
// mcast_receiver.rs -- join multicast group and receive
use std::net::{UdpSocket, Ipv4Addr};

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:5002")?;
    let multiaddr: Ipv4Addr = "239.1.1.1".parse().unwrap();
    let interface = Ipv4Addr::UNSPECIFIED;
    socket.join_multicast_v4(&multiaddr, &interface)?;
    println!("Joined multicast group 239.1.1.1 on port 5002");
    let mut buf = [0u8; 1024];
    loop {
        let (n, src) = socket.recv_from(&mut buf)?;
        println!("From {}: {}", src, String::from_utf8_lossy(&buf[..n]));
    }
}
// mcast_sender.rs -- send to multicast group
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_multicast_ttl_v4(1)?;
    socket.send_to(b"Hello, multicast group!", "239.1.1.1:5002")?;
    println!("Sent to multicast group");
    Ok(())
}
Rust: Discovery Protocol
// discover_client.rs -- broadcast discovery and collect replies
use std::net::UdpSocket;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.set_read_timeout(Some(Duration::from_secs(2)))?;
    socket.send_to(b"DISCOVER", "255.255.255.255:5003")?;
    println!("Sent DISCOVER, waiting for replies...");
    let mut buf = [0u8; 256];
    loop {
        match socket.recv_from(&mut buf) {
            Ok((n, src)) => {
                let msg = String::from_utf8_lossy(&buf[..n]);
                println!("Found: {} at {}", msg, src);
            }
            Err(_) => break,
        }
    }
    println!("Discovery complete.");
    Ok(())
}
Maximum Datagram Size
+-- Ethernet MTU: 1500 bytes ---+
| IP header (20B) | UDP (8B) | payload (up to 1472B) |
+------------------+-----------+-----------------------+
Larger datagrams are fragmented by IP.
Any lost fragment = entire datagram lost.
Safe payload size for LAN: 1472 bytes
Safe payload size for internet: ~512 bytes (conservative)
Maximum theoretical UDP payload: 65,507 bytes
Caution: Sending 64 KB datagrams over the internet is asking for trouble. IP fragmentation dramatically increases the chance of packet loss because losing any single fragment kills the entire datagram. Stay under the path MTU.
Knowledge Check
- Why does a UDP server not need `listen()` or `accept()`?
- What socket option must be enabled before calling `sendto` with a broadcast address?
- How does multicast differ from broadcast in terms of network traffic?
Common Pitfalls
- Assuming delivery -- UDP does not guarantee anything. Always plan for lost packets.
- Assuming ordering -- datagrams can arrive out of order, especially across the internet.
- Forgetting `SO_BROADCAST` -- `sendto` with a broadcast address fails with `EACCES` without it.
- Large datagrams -- IP fragmentation silently destroys reliability. Keep payloads small.
- No timeout on `recvfrom` -- blocks forever if no packet arrives. Always set `SO_RCVTIMEO` or use `poll()`.
- Multicast on loopback only -- by default, multicast may not leave the loopback interface. Check your routing table if receivers on other hosts do not get packets.
Multiplexing with select and poll
Blocking I/O is simple: call read(), wait for data, process it. But a server with 100 clients cannot call read() on all 100 sockets at the same time. It blocks on the first one and ignores the other 99. Fork-per-connection and thread-per-connection solve this, but they are expensive. I/O multiplexing lets a single thread monitor many file descriptors and act only on the ones that are ready.
This chapter covers select() and poll(), their APIs, and their limitations.
The Problem
Thread blocked on fd 3: fds 4, 5, 6 have data waiting
+---+ +---+---+---+
| 3 | <-- read() blocks | 4 | 5 | 6 | data piling up
+---+ +---+---+---+
With multiplexing:
+---+---+---+---+
| 3 | 4 | 5 | 6 | <-- "which of these are ready?"
+---+---+---+---+
|
v
"fd 4 and fd 6 are ready to read"
|
v
read(4, ...) read(6, ...) <-- no blocking
select() in C
select() watches three sets of file descriptors: readable, writable, and exceptional. It blocks until at least one fd is ready or a timeout expires.
/* select_server.c -- single-threaded multi-client echo with select() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7878);
bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
listen(listen_fd, 128);
printf("select server on port 7878 (max %d fds)\n", FD_SETSIZE);
fd_set master_set;
FD_ZERO(&master_set);
FD_SET(listen_fd, &master_set);
int max_fd = listen_fd;
for (;;) {
fd_set read_set = master_set; /* select modifies the set */
int ready = select(max_fd + 1, &read_set, NULL, NULL, NULL);
if (ready < 0) { perror("select"); break; }
for (int fd = 0; fd <= max_fd; fd++) {
if (!FD_ISSET(fd, &read_set))
continue;
if (fd == listen_fd) {
/* New connection */
struct sockaddr_in client;
socklen_t clen = sizeof(client);
int conn = accept(listen_fd,
(struct sockaddr *)&client, &clen);
if (conn < 0) { perror("accept"); continue; }
if (conn >= FD_SETSIZE) {
fprintf(stderr, "fd %d exceeds FD_SETSIZE\n", conn);
close(conn);
continue;
}
FD_SET(conn, &master_set);
if (conn > max_fd) max_fd = conn;
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
printf("+ %s:%d (fd %d)\n",
ip, ntohs(client.sin_port), conn);
} else {
/* Data from existing client */
char buf[1024];
ssize_t n = read(fd, buf, sizeof(buf));
if (n <= 0) {
printf("- fd %d disconnected\n", fd);
close(fd);
FD_CLR(fd, &master_set);
} else {
write(fd, buf, n);
}
}
}
}
close(listen_fd);
return 0;
}
The fd_set API
| Macro/Function | Purpose |
|---|---|
| `FD_ZERO(&set)` | Clear all bits |
| `FD_SET(fd, &set)` | Add fd to set |
| `FD_CLR(fd, &set)` | Remove fd from set |
| `FD_ISSET(fd, &set)` | Test if fd is in set |
| `select(nfds, r, w, e, t)` | Block until fd(s) ready or timeout |
The first argument to select() is the highest fd number plus one. The kernel scans from 0 to nfds-1.
Caution:
`FD_SETSIZE` is typically 1024 on Linux. If your server opens fd 1024 or higher, `FD_SET` writes out of bounds, corrupting memory silently. This is undefined behavior, not a clean error. For servers that may handle more than ~1000 connections, use `poll()` or `epoll` instead.
Try It: Connect 5 clients to the select server using
`nc 127.0.0.1 7878`. Type in different terminals and verify they all echo independently with no threads.
select() with Timeout
/* select_timeout.c -- wait for stdin with a 3-second timeout */
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>
int main(void)
{
printf("Type something within 3 seconds...\n");
fd_set fds;
FD_ZERO(&fds);
FD_SET(STDIN_FILENO, &fds);
struct timeval tv;
tv.tv_sec = 3;
tv.tv_usec = 0;
int ret = select(STDIN_FILENO + 1, &fds, NULL, NULL, &tv);
if (ret > 0 && FD_ISSET(STDIN_FILENO, &fds)) {
char buf[256];
ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("You typed: %s", buf);
} else if (ret == 0) {
printf("Timeout!\n");
} else {
perror("select");
}
return 0;
}
Caution: On Linux,
`select()` modifies the `timeval` struct to reflect remaining time. Do not reuse it across calls without re-initializing. This behavior is Linux-specific and not portable.
poll() in C
poll() fixes the fd limit problem. Instead of a fixed-size bitmask, it takes an array of struct pollfd. You can monitor as many fds as the system allows.
/* poll_server.c -- single-threaded multi-client echo with poll() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MAX_FDS 4096
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7879);
bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
listen(listen_fd, 128);
printf("poll server on port 7879\n");
struct pollfd fds[MAX_FDS];
int nfds = 0;
/* First entry: the listening socket */
fds[0].fd = listen_fd;
fds[0].events = POLLIN;
nfds = 1;
for (;;) {
int ready = poll(fds, nfds, -1); /* -1 = block forever */
if (ready < 0) { perror("poll"); break; }
/* Check listening socket first */
if (fds[0].revents & POLLIN) {
struct sockaddr_in client;
socklen_t clen = sizeof(client);
int conn = accept(listen_fd,
(struct sockaddr *)&client, &clen);
if (conn >= 0 && nfds < MAX_FDS) {
fds[nfds].fd = conn;
fds[nfds].events = POLLIN;
nfds++;
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
printf("+ %s:%d (fd %d, slot %d)\n",
ip, ntohs(client.sin_port), conn, nfds - 1);
} else {
if (conn >= 0) close(conn); /* too many fds */
}
}
/* Check client sockets */
for (int i = 1; i < nfds; i++) {
if (fds[i].revents & (POLLIN | POLLERR | POLLHUP)) {
char buf[1024];
ssize_t n = read(fds[i].fd, buf, sizeof(buf));
if (n <= 0) {
printf("- fd %d disconnected\n", fds[i].fd);
close(fds[i].fd);
/* Swap with last entry to compact array */
fds[i] = fds[--nfds];
i--; /* re-check this slot */
} else {
write(fds[i].fd, buf, n);
}
}
}
}
close(listen_fd);
return 0;
}
struct pollfd
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events (input) */
short revents; /* returned events (output) */
};
| Flag | Meaning |
|---|---|
| `POLLIN` | Data available to read |
| `POLLOUT` | Writing will not block |
| `POLLERR` | Error condition (output only) |
| `POLLHUP` | Hang up (output only) |
| `POLLNVAL` | Invalid fd (output only) |
Caution:
`POLLERR` and `POLLHUP` are always monitored even if you do not set them in `events`. When they fire, you must handle them -- typically by closing the fd.
select vs poll
select() poll()
+------------------------------------+------------------------------------+
| Fixed fd limit (FD_SETSIZE=1024) | No fd limit (array of pollfd) |
| Bitmask modified on each call | revents field written, events kept |
| Must rebuild fd_set each iteration | Array persists between calls |
| O(max_fd) scanning | O(nfds) scanning |
| Portable (POSIX, Windows) | POSIX only (not native Windows) |
+------------------------------------+------------------------------------+
Both share the fundamental limitation:
The kernel scans the ENTIRE fd list on every call, even if only one fd is ready.
At 10,000 fds, both spend most of their time scanning fds that have no events.
Monitoring for Writability
Sometimes you need to know when a socket is ready for writing -- for example, after a connect() in non-blocking mode, or when an output buffer was full.
/* poll_write.c -- detect when a non-blocking connect() completes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <poll.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(void)
{
int fd = socket(AF_INET, SOCK_STREAM, 0);
/* Set non-blocking */
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_port = htons(80);
inet_pton(AF_INET, "93.184.216.34", &addr.sin_addr); /* example.com */
int ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
if (ret < 0 && errno != EINPROGRESS) {
perror("connect"); return 1;
}
/* Wait for connection to complete */
struct pollfd pfd = { .fd = fd, .events = POLLOUT };
int ready = poll(&pfd, 1, 5000); /* 5-second timeout */
if (ready > 0 && (pfd.revents & POLLOUT)) {
int err = 0;
socklen_t elen = sizeof(err);
getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen);
if (err == 0) {
printf("Connected!\n");
} else {
printf("Connection failed: %s\n", strerror(err));
}
} else {
printf("Timeout or error\n");
}
close(fd);
return 0;
}
Driver Prep: The kernel's internal
`poll` mechanism (`struct file_operations.poll`) works on the same principle. When you write a character device driver, you implement a `poll` callback so that userspace `select()`/`poll()` works on your device fd.
Rust: Using nix for select and poll
The Rust standard library does not expose select() or poll() directly. The nix crate provides safe wrappers.
poll with nix
// poll_server.rs -- multi-client echo with nix::poll
// Cargo.toml: nix = { version = "0.26", features = ["poll", "net"] }
// (nix 0.27+ changed PollFd::new to take a BorrowedFd instead of a RawFd)
use nix::poll::{poll, PollFd, PollFlags};
use std::collections::HashMap;
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::AsRawFd;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7879")?;
    listener.set_nonblocking(true)?;
    println!("Rust poll server on port 7879");
    let mut poll_fds: Vec<PollFd> = vec![
        PollFd::new(listener.as_raw_fd(), PollFlags::POLLIN),
    ];
    let mut clients: HashMap<i32, TcpStream> = HashMap::new();
    loop {
        let _ready = poll(&mut poll_fds, -1).expect("poll failed");
        let mut new_fds: Vec<PollFd> = Vec::new();
        let mut remove_fds: Vec<i32> = Vec::new();
        for pfd in &poll_fds {
            let revents = pfd.revents().unwrap_or(PollFlags::empty());
            let fd = pfd.as_raw_fd();
            if fd == listener.as_raw_fd() {
                if revents.contains(PollFlags::POLLIN) {
                    // Accept all pending connections
                    loop {
                        match listener.accept() {
                            Ok((stream, addr)) => {
                                println!("+ {}", addr);
                                stream.set_nonblocking(true).ok();
                                let raw = stream.as_raw_fd();
                                new_fds.push(PollFd::new(raw, PollFlags::POLLIN));
                                clients.insert(raw, stream);
                            }
                            Err(_) => break,
                        }
                    }
                }
            } else if revents.intersects(
                PollFlags::POLLIN | PollFlags::POLLERR | PollFlags::POLLHUP,
            ) {
                let mut buf = [0u8; 1024];
                if let Some(stream) = clients.get_mut(&fd) {
                    match stream.read(&mut buf) {
                        Ok(0) | Err(_) => {
                            println!("- fd {}", fd);
                            remove_fds.push(fd);
                        }
                        Ok(n) => {
                            let _ = stream.write_all(&buf[..n]);
                        }
                    }
                }
            }
        }
        // Remove disconnected clients
        for fd in &remove_fds {
            clients.remove(fd);
            poll_fds.retain(|p| p.as_raw_fd() != *fd);
        }
        // Add new connections
        poll_fds.extend(new_fds);
    }
}
Rust Note: Rust's ownership model prevents the common C bug of using a closed fd. Once the
`TcpStream` is removed from the HashMap, it is dropped, and the fd is closed. No dangling fd in the poll set -- the `retain` call removes the stale entry.
select with nix
// select_demo.rs -- wait for stdin with timeout using nix::select
// Cargo.toml: nix = { version = "0.26", features = ["select"] }
// (nix 0.27+ changed FdSet::insert to take a BorrowedFd instead of a RawFd)
use nix::sys::select::{select, FdSet};
use nix::sys::time::TimeVal;
use std::io::Read;
use std::os::fd::AsRawFd;

fn main() {
    println!("Type something within 3 seconds...");
    let stdin_fd = std::io::stdin().as_raw_fd();
    let mut read_fds = FdSet::new();
    read_fds.insert(stdin_fd);
    let mut timeout = TimeVal::new(3, 0);
    match select(
        stdin_fd + 1,
        Some(&mut read_fds),
        None,
        None,
        Some(&mut timeout),
    ) {
        Ok(n) if n > 0 => {
            let mut buf = [0u8; 256];
            let n = std::io::stdin().read(&mut buf).unwrap();
            print!("You typed: {}", String::from_utf8_lossy(&buf[..n]));
        }
        Ok(_) => println!("Timeout!"),
        Err(e) => eprintln!("select error: {}", e),
    }
}
When to Use What
Connections Recommendation
----------- -------------------------------------------
< 10 select() is fine, simple and portable
10 - 1000 poll() removes the fd limit
> 1000 epoll (next chapter) -- O(1) notification
Both select and poll have the same fundamental scaling problem: on every call, the kernel walks the entire list of file descriptors to check which are ready. With 10,000 fds, this linear scan dominates the server's CPU time. The next chapter introduces epoll, which solves this.
Knowledge Check
- What is `FD_SETSIZE` and why is it dangerous to exceed it with `select()`?
- How does `poll()` avoid the fd limit problem of `select()`?
- Why do both `select()` and `poll()` have O(n) per-call overhead?
Common Pitfalls
- Not re-initializing `fd_set` -- `select()` modifies the set in place. You must copy the master set before each call.
- Exceeding `FD_SETSIZE` -- silent memory corruption. No error, no warning, just data corruption and crashes.
- Forgetting to handle `POLLERR`/`POLLHUP` -- the fd is signaled but reading from it yields an error. Infinite busy-loop if not handled.
- Not compacting the `pollfd` array -- leaving closed fds in the array with `fd = -1` works (poll ignores them) but wastes scanning time.
- Assuming `select()` timeout is preserved -- on Linux, `timeval` is updated to reflect remaining time. Reuse without reinitializing gives shorter and shorter timeouts until you are busy-polling.
- Using `select()` for high-fd-count servers -- it was designed in 1983 for a handful of file descriptors. Use `poll()` or `epoll` instead.
epoll: Scalable Event-Driven I/O
select and poll scan every file descriptor on every call. With 50,000 connections, most of them idle, you spend most of your CPU time checking fds that have nothing to report. epoll fixes this by maintaining a ready list inside the kernel. Only fds that actually have events appear in the results. This is O(1) with respect to the total number of monitored fds and O(k) with respect to the number of ready fds.
This chapter builds a complete single-threaded event loop from scratch, covers level-triggered vs edge-triggered semantics, and connects to the Rust ecosystem through the nix and mio crates.
The Three epoll Calls
epoll_create1(flags) --> returns an epoll fd
epoll_ctl(epfd, op, fd, ev) --> add/modify/remove a watched fd
epoll_wait(epfd, events, max, timeout) --> wait for ready fds
Kernel Userspace
+------------------+
| epoll instance |
| interest list: | epoll_ctl(ADD, fd=5)
| [fd=5, fd=9] | <--------- epoll_ctl(ADD, fd=9)
| |
| ready list: | epoll_wait() blocks...
| [fd=5] | ---------> returns: fd=5 has EPOLLIN
+------------------+
Only fd 5 is ready. The kernel does not scan fd 9 at all. With 50,000 fds and 3 ready, epoll_wait returns immediately with just those 3.
A Complete epoll Echo Server in C
/* epoll_server.c -- single-threaded echo server using epoll */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MAX_EVENTS 64
#define BUF_SIZE 4096
static int set_nonblocking(int fd)
{
int flags = fcntl(fd, F_GETFL, 0);
if (flags < 0) return -1;
return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
int main(void)
{
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
if (listen_fd < 0) { perror("socket"); return 1; }
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(7878);
if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind"); return 1;
}
if (listen(listen_fd, 128) < 0) {
perror("listen"); return 1;
}
set_nonblocking(listen_fd);
/* Create epoll instance */
int epfd = epoll_create1(0);
if (epfd < 0) { perror("epoll_create1"); return 1; }
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = listen_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
struct epoll_event events[MAX_EVENTS];
printf("epoll echo server on port 7878\n");
for (;;) {
int nready = epoll_wait(epfd, events, MAX_EVENTS, -1);
if (nready < 0) {
if (errno == EINTR) continue;
perror("epoll_wait");
break;
}
for (int i = 0; i < nready; i++) {
int fd = events[i].data.fd;
if (fd == listen_fd) {
/* Accept all pending connections */
for (;;) {
struct sockaddr_in client;
socklen_t clen = sizeof(client);
int conn = accept(listen_fd,
(struct sockaddr *)&client, &clen);
if (conn < 0) {
if (errno == EAGAIN || errno == EWOULDBLOCK)
break; /* no more pending */
perror("accept");
break;
}
set_nonblocking(conn);
ev.events = EPOLLIN | EPOLLET; /* edge-triggered */
ev.data.fd = conn;
epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
printf("+ %s:%d (fd %d)\n",
ip, ntohs(client.sin_port), conn);
}
} else {
/* Data from a client (edge-triggered: drain fully) */
char buf[BUF_SIZE];
for (;;) {
ssize_t n = read(fd, buf, sizeof(buf));
if (n < 0) {
if (errno == EAGAIN || errno == EWOULDBLOCK)
break; /* no more data right now */
/* Real error */
close(fd);
break;
}
if (n == 0) {
/* Client disconnected */
printf("- fd %d\n", fd);
close(fd);
break;
}
/* Echo back */
ssize_t written = 0;
while (written < n) {
ssize_t w = write(fd, buf + written, n - written);
if (w < 0) {
if (errno == EAGAIN) break;
close(fd);
goto next_event;
}
written += w;
}
}
next_event: ;
}
}
}
close(epfd);
close(listen_fd);
return 0;
}
Compile and run: gcc -o epoll_server epoll_server.c && ./epoll_server. Test with multiple nc 127.0.0.1 7878 sessions.
struct epoll_event and the data Union
struct epoll_event {
uint32_t events; /* EPOLLIN, EPOLLOUT, EPOLLET, ... */
epoll_data_t data; /* user data returned with event */
};
typedef union epoll_data {
void *ptr; /* pointer to your own struct */
int fd; /* file descriptor */
uint32_t u32;
uint64_t u64;
} epoll_data_t;
The data field is your tag. The kernel passes it back to you untouched in epoll_wait. Most simple servers use data.fd. Complex servers store a pointer to a connection struct:
struct connection {
int fd;
char read_buf[4096];
size_t read_len;
/* ... */
};
struct connection *conn = malloc(sizeof(*conn));
conn->fd = accepted_fd;
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;
ev.data.ptr = conn;
epoll_ctl(epfd, EPOLL_CTL_ADD, accepted_fd, &ev);
Then in the event loop: struct connection *c = events[i].data.ptr;
Level-Triggered vs Edge-Triggered
This is the most important distinction in epoll.
Level-triggered (default):
"Notify me AS LONG AS the fd is ready"
Edge-triggered (EPOLLET):
"Notify me ONCE WHEN the fd BECOMES ready"
Data arrives: [####]
Level-triggered:
epoll_wait -> EPOLLIN (data available)
read(100 bytes) (still 300 bytes left)
epoll_wait -> EPOLLIN (still ready -- data remains)
read(300 bytes)
epoll_wait -> blocks (no more data)
Edge-triggered:
epoll_wait -> EPOLLIN (data just arrived)
read(100 bytes) (still 300 bytes left)
epoll_wait -> BLOCKS (no NEW data arrived -- edge already fired)
*** 300 bytes stuck in the buffer forever ***
Caution: With edge-triggered mode, you MUST read until EAGAIN on every notification. If you stop reading early, the remaining data is stranded. The kernel will not notify you again until NEW data arrives.
The Edge-Triggered + Non-Blocking Pattern
This is the canonical pattern that all high-performance servers use:
- Set the fd to O_NONBLOCK
- Register with EPOLLET
- On EPOLLIN, loop read() until it returns EAGAIN
- On EPOLLOUT, loop write() until it returns EAGAIN
while (true) {
n = read(fd, buf, sizeof(buf));
if (n > 0) {
process(buf, n);
continue;
}
if (n < 0 && errno == EAGAIN) {
break; // <-- all data consumed, wait for next edge
}
if (n == 0) {
close(fd); // client disconnected
break;
}
// n < 0 && errno != EAGAIN: real error
close(fd);
break;
}
Why Edge-Triggered?
Level-triggered is simpler and less error-prone. So why bother with edge-triggered?
Thundering herd. If multiple threads each have their own epoll_wait on the same epoll fd (a common pattern), level-triggered wakes ALL of them when data arrives. Only one can read() successfully; the rest wake up for nothing. Edge-triggered fires only once, waking a single thread.
Efficiency. Level-triggered can cause redundant wake-ups. If you know you are going to drain the entire buffer anyway, edge-triggered avoids the kernel re-checking readiness on the next epoll_wait.
In practice, most applications start with level-triggered and switch to edge-triggered only when they need the performance.
EPOLLONESHOT
For multi-threaded servers where multiple threads call epoll_wait, EPOLLONESHOT disables the fd after one event fires. You must re-arm it with EPOLL_CTL_MOD after processing. This guarantees exactly one thread handles a given fd at a time.
/* Register with EPOLLONESHOT */
ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
ev.data.fd = conn_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);
/* After processing, re-arm */
ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
ev.data.fd = conn_fd;
epoll_ctl(epfd, EPOLL_CTL_MOD, conn_fd, &ev);
The Reactor Pattern
The event loop in the epoll server is an instance of the reactor pattern:
+----------------------------+
| Event Loop |
| +----------------------+ |
| | epoll_wait() | |
| +----------+-----------+ |
| | |
| +--------+--------+ |
| | | |
| accept read |
| handler handler |
| | | |
| register process |
| new fd + reply |
+----------------------------+
The reactor:
- Waits for events (demultiplexing)
- Dispatches each event to a handler
- Handlers are non-blocking and complete quickly
- Returns to step 1
This single-threaded design handles thousands of connections with one thread, zero locks, zero context switches.
In production, you use data.ptr to store per-connection state (read buffers, write queues, protocol state machines). The epoll echo server above uses data.fd for simplicity, but real servers like nginx, Redis, and memcached all use the pointer variant with handler dispatch. This is the skeleton every event-driven C server builds on.
Try It: Modify the epoll echo server to use data.ptr with a struct connection that includes a write buffer. When write() returns EAGAIN, store the remaining data and register for EPOLLOUT. When the fd becomes writable, flush the buffer and switch back to EPOLLIN.
Rust: epoll via the nix Crate
// epoll_server.rs -- event loop using nix::sys::epoll
// Cargo.toml: nix = { version = "0.29", features = ["epoll", "net", "fs"] }
use nix::sys::epoll::*;
use std::collections::HashMap;
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::{AsRawFd, RawFd};

fn set_nonblocking(stream: &TcpStream) {
    stream.set_nonblocking(true).expect("set_nonblocking");
}

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:7878")?;
    listener.set_nonblocking(true)?;
    println!("Rust epoll server on port 7878");

    let epfd = Epoll::new(EpollCreateFlags::empty()).expect("epoll_create");
    let listen_fd = listener.as_raw_fd();
    epfd.add(
        &listener,
        EpollEvent::new(EpollFlags::EPOLLIN, listen_fd as u64),
    ).expect("epoll_add listener");

    let mut clients: HashMap<RawFd, TcpStream> = HashMap::new();
    let mut events = vec![EpollEvent::empty(); 64];

    loop {
        let n = epfd.wait(&mut events, -1).expect("epoll_wait");
        for i in 0..n {
            let fd = events[i].data() as RawFd;
            if fd == listen_fd {
                loop {
                    match listener.accept() {
                        Ok((stream, addr)) => {
                            println!("+ {}", addr);
                            set_nonblocking(&stream);
                            let raw = stream.as_raw_fd();
                            epfd.add(
                                &stream,
                                EpollEvent::new(
                                    EpollFlags::EPOLLIN | EpollFlags::EPOLLET,
                                    raw as u64,
                                ),
                            ).expect("epoll_add client");
                            clients.insert(raw, stream);
                        }
                        Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
                        Err(e) => {
                            eprintln!("accept: {}", e);
                            break;
                        }
                    }
                }
            } else {
                // Read with get_mut, then remove after the borrow ends,
                // so the mutable borrows do not overlap.
                let mut closed = false;
                if let Some(stream) = clients.get_mut(&fd) {
                    let mut buf = [0u8; 4096];
                    loop {
                        match stream.read(&mut buf) {
                            Ok(0) => {
                                println!("- fd {}", fd);
                                closed = true;
                                break;
                            }
                            Ok(n) => {
                                let _ = stream.write_all(&buf[..n]);
                            }
                            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
                            Err(_) => {
                                closed = true;
                                break;
                            }
                        }
                    }
                }
                if closed {
                    clients.remove(&fd);
                }
            }
        }
    }
}
Rust Note: When a TcpStream is removed from the HashMap, it is dropped, which closes the fd. The kernel automatically removes a closed fd from the epoll interest list. No explicit EPOLL_CTL_DEL needed.
Rust: mio for Portable Event Loops
epoll is Linux-only. kqueue is the equivalent on macOS/BSD. The mio crate abstracts over both, providing a single API.
// mio_server.rs -- portable event loop with mio
// Cargo.toml: mio = { version = "1", features = ["net", "os-poll"] }
use mio::net::{TcpListener, TcpStream};
use mio::{Events, Interest, Poll, Token};
use std::collections::HashMap;
use std::io::{self, Read, Write};

const LISTENER: Token = Token(0);

fn main() -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    let addr = "0.0.0.0:7878".parse().unwrap();
    let mut listener = TcpListener::bind(addr)?;
    poll.registry().register(&mut listener, LISTENER, Interest::READABLE)?;

    let mut clients: HashMap<Token, TcpStream> = HashMap::new();
    let mut next_token = 1usize;
    println!("mio server on port 7878");

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            match event.token() {
                LISTENER => loop {
                    match listener.accept() {
                        Ok((mut stream, addr)) => {
                            let token = Token(next_token);
                            next_token += 1;
                            poll.registry().register(
                                &mut stream, token, Interest::READABLE,
                            )?;
                            println!("+ {} (token {})", addr, token.0);
                            clients.insert(token, stream);
                        }
                        Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
                        Err(e) => return Err(e),
                    }
                },
                token => {
                    let done = if let Some(stream) = clients.get_mut(&token) {
                        let mut buf = [0u8; 4096];
                        let mut closed = false;
                        loop {
                            match stream.read(&mut buf) {
                                Ok(0) => { closed = true; break; }
                                Ok(n) => { let _ = stream.write_all(&buf[..n]); }
                                Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
                                Err(_) => { closed = true; break; }
                            }
                        }
                        closed
                    } else {
                        false
                    };
                    if done {
                        if let Some(mut stream) = clients.remove(&token) {
                            poll.registry().deregister(&mut stream)?;
                            println!("- token {}", token.0);
                        }
                    }
                }
            }
        }
    }
}
Connection to tokio
tokio is Rust's most popular async runtime. Under the hood, it is an epoll (Linux) / kqueue (macOS) event loop built on mio. When you write:
// Conceptual -- requires tokio runtime setup
async fn handle(mut stream: tokio::net::TcpStream) {
    let (mut reader, mut writer) = stream.split();
    tokio::io::copy(&mut reader, &mut writer).await.unwrap();
}
...the .await suspends the task. tokio's reactor (an epoll event loop) resumes it when data arrives. There is no thread per connection, no manual epoll_ctl -- the async/await syntax hides the event loop plumbing you built in this chapter.
Your code (this chapter): tokio (same thing, hidden):
+---------------------------+ +---------------------------+
| epoll_wait() | | runtime.block_on(...) |
| -> fd ready | | -> task wakes up |
| -> call handler(fd) | | -> resume .await |
| -> handler does read/write| | -> async fn does I/O |
| -> back to epoll_wait | | -> task yields at .await |
+---------------------------+ +---------------------------+
Driver Prep: The Linux kernel uses a similar event-driven model internally. The wait queue mechanism wakes sleeping tasks when events occur. Kernel threads and work queues are the kernel's equivalent of the reactor pattern. Understanding epoll deeply prepares you for kernel-level event handling.
Performance Comparison
10,000 idle connections, 100 active per second:
select: scans 10,000 fds per call ~10,000 operations/call
poll: scans 10,000 pollfds per call ~10,000 operations/call
epoll: returns only ~100 ready fds ~100 operations/call
At 100,000 connections: select/poll grind to a halt.
epoll: barely notices.
This is why nginx, Redis, Node.js (libuv), and every modern event-driven server uses epoll on Linux.
Knowledge Check
- What is the difference between epoll_create1 and the older epoll_create?
- In edge-triggered mode, what happens if you read only part of the available data?
- Why does the reactor pattern avoid the need for mutexes?
Common Pitfalls
- Edge-triggered without draining -- the most common epoll bug. Read until EAGAIN or the remaining data sits stranded until new data arrives.
- Forgetting O_NONBLOCK -- edge-triggered epoll with blocking fds hangs. A read() call blocks when there is no data, and you will never get another notification because the edge already fired.
- Stale pointers in data.ptr -- if you free a connection struct but forget to remove the fd from epoll, the next event delivers a dangling pointer. Use-after-free.
- EPOLL_CTL_DEL on a closed fd -- closing the fd automatically removes it from epoll (if it is the last reference). Calling EPOLL_CTL_DEL after close() returns EBADF. Close last, or skip the explicit delete.
- Using EPOLLONESHOT without re-arming -- the fd goes silent forever. Every event handler must call EPOLL_CTL_MOD to re-enable.
- Assuming portability -- epoll is Linux-only. Use kqueue on BSD/macOS, IOCP on Windows, or a library like mio or libuv for cross-platform code.
C Optimization Techniques
Performance matters in systems programming. This chapter covers the tools and techniques that separate amateur C from production C: compiler flags, profiling, cache-aware data layout, and branch prediction hints. The rule is always the same: measure first, optimize second.
Compiler Optimization Levels
GCC and Clang accept -O flags that control how aggressively the compiler
transforms your code.
| Flag | Effect |
|------|--------|
| -O0  | No optimization. Fastest compile, debuggable. |
| -O1  | Basic optimizations. Smaller code. |
| -O2  | Most optimizations. Good default for release. |
| -O3  | Aggressive: vectorization, inlining, unrolling. |
| -Os  | Optimize for size (like -O2 minus bloat). |
| -Og  | Optimize for debugging experience. |
Let's see the difference on a trivial loop.
/* opt_levels.c */
#include <stdio.h>
#include <time.h>
static long sum_array(const int *arr, int n) {
long total = 0;
for (int i = 0; i < n; i++) {
total += arr[i];
}
return total;
}
int main(void) {
enum { N = 100000000 };
static int data[N];
for (int i = 0; i < N; i++)
data[i] = i & 0xFF;
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
long result = sum_array(data, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
double elapsed = (t1.tv_sec - t0.tv_sec)
+ (t1.tv_nsec - t0.tv_nsec) / 1e9;
printf("sum = %ld, time = %.4f s\n", result, elapsed);
return 0;
}
Compile and run at different levels:
$ gcc -O0 -o opt0 opt_levels.c && ./opt0
sum = 12750000000, time = 0.2510 s
$ gcc -O2 -o opt2 opt_levels.c && ./opt2
sum = 12750000000, time = 0.0380 s
$ gcc -O3 -o opt3 opt_levels.c && ./opt3
sum = 12750000000, time = 0.0120 s
At -O3, the compiler auto-vectorizes the loop using SIMD instructions.
Caution: -O3 can change the behavior of code that relies on undefined behavior. If your program works at -O0 but breaks at -O2, you have a bug, not a compiler problem.
Looking at What the Compiler Did
Use -S to see assembly, or objdump -d on the binary:
$ gcc -O2 -S -o opt2.s opt_levels.c
$ grep -A5 'sum_array' opt2.s
The Compiler Explorer (godbolt.org) is invaluable for comparing output across flags and compilers. Use it.
Profile-Guided Optimization (PGO)
PGO lets the compiler observe real execution patterns, then recompile with that data.
# Step 1: Instrument
$ gcc -O2 -fprofile-generate -o opt_pgo_gen opt_levels.c
# Step 2: Run with representative input
$ ./opt_pgo_gen
# Step 3: Recompile using the profile
$ gcc -O2 -fprofile-use -o opt_pgo opt_levels.c
PGO helps the compiler make better inlining, branching, and layout decisions. Typical improvement: 5-20% on real workloads.
Profiling with perf
Never guess where time is spent. Use perf.
$ gcc -O2 -g -o opt2 opt_levels.c
$ perf stat ./opt2
This gives you cycle counts, cache misses, branch mispredictions, and IPC (instructions per cycle).
For function-level profiling:
$ perf record -g ./opt2
$ perf report
perf report shows a call-graph breakdown. Look for the hottest functions
first.
Profiling with gprof
$ gcc -O2 -pg -o opt_gprof opt_levels.c
$ ./opt_gprof
$ gprof opt_gprof gmon.out | head -30
gprof adds instrumentation overhead but gives call counts and cumulative time.
Profiling with Valgrind/Callgrind
$ gcc -O2 -g -o opt2 opt_levels.c
$ valgrind --tool=callgrind ./opt2
$ callgrind_annotate callgrind.out.* | head -40
Callgrind simulates the CPU cache hierarchy. It's slow (20-50x), but gives exact instruction counts and cache miss data.
Try It: Compile opt_levels.c at -O0 and -O3. Run perf stat on both. Compare the "instructions" and "cache-misses" lines.
Cache-Friendly Data Layout
Modern CPUs are fast. Memory is slow. A cache miss costs 100+ cycles. Data layout determines cache behavior.
Array of Structs (AoS) vs Struct of Arrays (SoA)
AoS (Array of Structs):
+------+------+------+------+------+------+------+------+
| x[0] | y[0] | z[0] | w[0] | x[1] | y[1] | z[1] | w[1] | ...
+------+------+------+------+------+------+------+------+
SoA (Struct of Arrays):
+------+------+------+------+------+------+------+------+
| x[0] | x[1] | x[2] | x[3] | y[0] | y[1] | y[2] | y[3] | ...
+------+------+------+------+------+------+------+------+
If you iterate over all elements but only touch x, SoA wins because cache
lines contain only x values.
/* cache_layout.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 10000000
/* Array of Structs */
struct Particle_AoS {
float x, y, z;
float vx, vy, vz;
float mass;
float pad; /* 32 bytes total */
};
/* Struct of Arrays */
struct Particles_SoA {
float *x, *y, *z;
float *vx, *vy, *vz;
float *mass;
};
static double now(void) {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec + ts.tv_nsec / 1e9;
}
int main(void) {
/* AoS test */
struct Particle_AoS *aos = malloc(N * sizeof(*aos));
for (int i = 0; i < N; i++) {
aos[i].x = (float)i;
aos[i].mass = 1.0f;
}
double t0 = now();
float sum_aos = 0;
for (int i = 0; i < N; i++)
sum_aos += aos[i].x * aos[i].mass;
double t1 = now();
printf("AoS: sum=%.0f time=%.4f s\n", sum_aos, t1 - t0);
/* SoA test */
struct Particles_SoA soa;
soa.x = malloc(N * sizeof(float));
soa.mass = malloc(N * sizeof(float));
for (int i = 0; i < N; i++) {
soa.x[i] = (float)i;
soa.mass[i] = 1.0f;
}
t0 = now();
float sum_soa = 0;
for (int i = 0; i < N; i++)
sum_soa += soa.x[i] * soa.mass[i];
t1 = now();
printf("SoA: sum=%.0f time=%.4f s\n", sum_soa, t1 - t0);
free(aos);
free(soa.x);
free(soa.mass);
return 0;
}
$ gcc -O2 -o cache_layout cache_layout.c -lm
$ ./cache_layout
AoS: sum=... time=0.0280 s
SoA: sum=... time=0.0090 s
SoA wins because only 8 bytes per element touch the cache (x + mass), not 32.
Driver Prep: Kernel DMA buffer layout affects device performance. The same AoS-vs-SoA trade-off applies to descriptor rings in network drivers.
Branch Prediction Hints
CPUs predict branches. Mispredictions cost ~15 cycles. You can hint the
compiler with __builtin_expect.
/* branch_hints.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
static long process(const int *data, int n) {
long sum = 0;
for (int i = 0; i < n; i++) {
if (unlikely(data[i] < 0)) {
/* Error path: rarely taken */
sum -= data[i];
} else {
sum += data[i];
}
}
return sum;
}
int main(void) {
enum { N = 100000000 };
int *data = malloc(N * sizeof(int));
/* 99.9% positive values */
for (int i = 0; i < N; i++)
data[i] = (i % 1000 == 0) ? -1 : i & 0xFF;
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
long result = process(data, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
double elapsed = (t1.tv_sec - t0.tv_sec)
+ (t1.tv_nsec - t0.tv_nsec) / 1e9;
printf("sum=%ld time=%.4f s\n", result, elapsed);
free(data);
return 0;
}
The Linux kernel defines likely() and unlikely() macros everywhere. Use
them on error-checking branches.
The restrict Keyword
restrict tells the compiler that two pointers don't alias (don't point to
overlapping memory). This enables vectorization.
/* restrict_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
void add_arrays(float *restrict dst,
const float *restrict a,
const float *restrict b,
int n) {
for (int i = 0; i < n; i++)
dst[i] = a[i] + b[i];
}
void add_arrays_no_restrict(float *dst,
const float *a,
const float *b,
int n) {
for (int i = 0; i < n; i++)
dst[i] = a[i] + b[i];
}
int main(void) {
enum { N = 50000000 };
float *a = malloc(N * sizeof(float));
float *b = malloc(N * sizeof(float));
float *c = malloc(N * sizeof(float));
for (int i = 0; i < N; i++) {
a[i] = (float)i;
b[i] = (float)(N - i);
}
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
add_arrays(c, a, b, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
printf("restrict: %.4f s\n",
(t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);
clock_gettime(CLOCK_MONOTONIC, &t0);
add_arrays_no_restrict(c, a, b, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
printf("no restrict: %.4f s\n",
(t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);
free(a); free(b); free(c);
return 0;
}
Without restrict, the compiler must assume dst might overlap a or b,
preventing SIMD optimization.
Caution: If you lie to the compiler with
restrictand the pointers actually alias, you get undefined behavior. The compiler will generate wrong code.
Loop Unrolling
The compiler can unroll loops at -O2/-O3, but you can also do it manually
or with pragmas:
/* unroll.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
long sum_unrolled(const int *arr, int n) {
long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
int i = 0;
/* Process 4 elements per iteration */
for (; i + 3 < n; i += 4) {
s0 += arr[i];
s1 += arr[i + 1];
s2 += arr[i + 2];
s3 += arr[i + 3];
}
/* Handle remainder */
for (; i < n; i++)
s0 += arr[i];
return s0 + s1 + s2 + s3;
}
long sum_simple(const int *arr, int n) {
long total = 0;
for (int i = 0; i < n; i++)
total += arr[i];
return total;
}
int main(void) {
enum { N = 100000000 };
int *data = malloc(N * sizeof(int));
for (int i = 0; i < N; i++)
data[i] = i & 0xFF;
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
long r1 = sum_simple(data, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
printf("simple: sum=%ld %.4f s\n", r1,
(t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);
clock_gettime(CLOCK_MONOTONIC, &t0);
long r2 = sum_unrolled(data, N);
clock_gettime(CLOCK_MONOTONIC, &t1);
printf("unrolled: sum=%ld %.4f s\n", r2,
(t1.tv_sec-t0.tv_sec) + (t1.tv_nsec-t0.tv_nsec)/1e9);
free(data);
return 0;
}
Manual unrolling with multiple accumulators (s0-s3) breaks data
dependencies and lets the CPU pipeline fill.
GCC also supports:
#pragma GCC unroll 8
for (int i = 0; i < n; i++)
total += arr[i];
Rust Optimization
Rust uses LLVM. The same optimization principles apply.
# Debug build (like -O0)
$ cargo build
# Release build (like -O2 + LTO)
$ cargo build --release
In Cargo.toml:
[profile.release]
opt-level = 3
lto = true
codegen-units = 1
// src/main.rs — cache-friendly iteration
use std::time::Instant;

const N: usize = 10_000_000;

struct ParticlesAoS {
    data: Vec<(f32, f32, f32, f32)>, // x, y, z, mass
}

struct ParticlesSoA {
    x: Vec<f32>,
    mass: Vec<f32>,
}

fn main() {
    // AoS
    let aos = ParticlesAoS {
        data: (0..N).map(|i| (i as f32, 0.0, 0.0, 1.0)).collect(),
    };
    let t0 = Instant::now();
    let sum_aos: f32 = aos.data.iter().map(|(x, _, _, m)| x * m).sum();
    let d_aos = t0.elapsed();

    // SoA
    let soa = ParticlesSoA {
        x: (0..N).map(|i| i as f32).collect(),
        mass: vec![1.0; N],
    };
    let t0 = Instant::now();
    let sum_soa: f32 = soa.x.iter().zip(&soa.mass).map(|(x, m)| x * m).sum();
    let d_soa = t0.elapsed();

    println!("AoS: sum={sum_aos:.0} time={d_aos:?}");
    println!("SoA: sum={sum_soa:.0} time={d_soa:?}");
}
Rust Note: Rust iterators compile to the same tight loops as C for loops at -O2. No overhead. The compiler auto-vectorizes them just like it would a C loop.
Rust: Profiling
Use cargo flamegraph or perf directly:
$ cargo build --release
$ perf stat ./target/release/myapp
$ perf record -g ./target/release/myapp
$ perf report
For Criterion-based benchmarks:
// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_sum(c: &mut Criterion) {
    let data: Vec<i32> = (0..1_000_000).map(|i| (i & 0xFF) as i32).collect();
    c.bench_function("sum", |b| {
        b.iter(|| {
            let sum: i64 = data.iter().map(|&x| x as i64).sum();
            black_box(sum)
        })
    });
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);
black_box prevents the compiler from optimizing away the computation.
Rust Note: Rust has no direct equivalent of restrict. The borrow checker ensures &mut references are unique, which gives LLVM the same aliasing information automatically.
Measuring: The Golden Rule
Rule: If you didn't measure it, you don't know it's slow.
Rule: If you didn't measure it after, you don't know you fixed it.
Micro-benchmark checklist
- Warm up the cache first (run the function once before timing).
- Use clock_gettime(CLOCK_MONOTONIC) in C, Instant::now() in Rust.
- Run multiple iterations and take the median, not the mean.
- Disable CPU frequency scaling during benchmarks.
- Use volatile or black_box to prevent dead-code elimination.
/* prevent_dce.c — prevent dead code elimination */
#include <stdio.h>
#include <time.h>
/* Force the compiler to keep the result */
static void do_not_optimize(void *p) {
__asm__ volatile("" : : "g"(p) : "memory");
}
int main(void) {
struct timespec t0, t1;
long sum = 0;
clock_gettime(CLOCK_MONOTONIC, &t0);
for (int i = 0; i < 100000000; i++)
sum += i;
do_not_optimize(&sum);
clock_gettime(CLOCK_MONOTONIC, &t1);
double elapsed = (t1.tv_sec - t0.tv_sec)
+ (t1.tv_nsec - t0.tv_nsec) / 1e9;
printf("sum=%ld time=%.4f s\n", sum, elapsed);
return 0;
}
Try It: Compile prevent_dce.c with -O3 but without the do_not_optimize call. What happens to the loop? Check the assembly.
Optimization Decision Flowchart
Is it fast enough?
|
+-- YES --> Stop. Ship it.
|
+-- NO --> Profile it.
|
Where is the time?
|
+-- CPU bound --> Check -O level, restrict, SIMD
|
+-- Memory bound --> Check data layout, cache misses
|
+-- I/O bound --> Check syscall count, buffering
|
+-- Branch misses --> Check branch patterns, likely/unlikely
Quick Knowledge Check
- Why is -O0 useful even though it produces slow code?
- What does restrict promise the compiler, and what happens if you lie?
- When does SoA beat AoS?
Common Pitfalls
- Optimizing without profiling. You will optimize the wrong thing.
- Benchmarking at -O0. That measures unoptimized code, not your algorithm.
- Forgetting warm-up. Cold caches give misleading first-run numbers.
- Using gettimeofday for benchmarks. It's not monotonic. Use clock_gettime(CLOCK_MONOTONIC).
- Assuming -O3 is always better than -O2. Aggressive inlining can blow up the instruction cache.
- restrict on aliased pointers. Undefined behavior, silently wrong.
- Optimizing for one CPU. -march=native binaries may not run on other machines.
Memory Pools and Arena Allocators
malloc and free are general-purpose. General-purpose means slow for
specific patterns. When you allocate thousands of short-lived objects of the
same size, or build a parse tree that you discard all at once, custom
allocators win by an order of magnitude. This chapter builds both an arena
allocator and a pool allocator from scratch.
Why malloc Is Sometimes Too Slow
malloc must handle any size, any order of free, and thread safety. That
flexibility costs:
- Metadata overhead per allocation (typically 16-32 bytes).
- Fragmentation from interleaved alloc/free.
- Lock contention in multi-threaded programs.
- System calls (brk/mmap) when the free list is empty.
For patterns like "allocate many, free all at once" or "allocate fixed-size blocks rapidly," we can do much better.
Arena Allocator: Bump and Reset
An arena is a contiguous block of memory. Allocation bumps a pointer forward. Freeing individual objects is not supported -- you free everything at once.
Arena layout:
+---------------------------------------------------+
| used memory | free space |
+---------------------------------------------------+
^ ^ ^
base offset base + capacity
Arena in C
/* arena.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t capacity) {
Arena a;
a.base = (uint8_t *)malloc(capacity);
if (!a.base) {
fprintf(stderr, "arena: malloc failed\n");
exit(1);
}
a.capacity = capacity;
a.offset = 0;
return a;
}
void *arena_alloc(Arena *a, size_t size, size_t align) {
/* Align the current offset; align must be a power of two */
size_t aligned = (a->offset + align - 1) & ~(align - 1);
if (aligned + size > a->capacity) {
fprintf(stderr, "arena: out of memory (%zu requested, %zu free)\n",
size, a->capacity - a->offset);
return NULL;
}
void *ptr = a->base + aligned;
a->offset = aligned + size;
return ptr;
}
void arena_reset(Arena *a) {
a->offset = 0;
}
void arena_destroy(Arena *a) {
free(a->base);
a->base = NULL;
a->capacity = 0;
a->offset = 0;
}
/* ---- demo ---- */
typedef struct {
int x, y;
char label[24];
} Point;
int main(void) {
Arena arena = arena_create(1024 * 1024); /* 1 MB */
/* Allocate 1000 Points -- no individual free needed */
Point **points = arena_alloc(&arena, 1000 * sizeof(Point *),
_Alignof(Point *));
for (int i = 0; i < 1000; i++) {
points[i] = arena_alloc(&arena, sizeof(Point), _Alignof(Point));
points[i]->x = i;
points[i]->y = i * 2;
snprintf(points[i]->label, sizeof(points[i]->label), "pt_%d", i);
}
printf("Point 42: (%d, %d) \"%s\"\n",
points[42]->x, points[42]->y, points[42]->label);
printf("Arena used: %zu / %zu bytes\n", arena.offset, arena.capacity);
/* Free everything at once */
arena_reset(&arena);
printf("After reset: %zu bytes used\n", arena.offset);
arena_destroy(&arena);
return 0;
}
$ gcc -O2 -o arena arena.c && ./arena
Point 42: (42, 84) "pt_42"
Arena used: 40000 / 1048576 bytes
After reset: 0 bytes used
That's the entire allocator: 20 lines of logic. No free list, no metadata per object, no fragmentation.
Try It: Add a function arena_alloc_string(Arena *a, const char *s) that copies a string into the arena and returns a pointer to it. Hint: use strlen + arena_alloc + memcpy.
Arena Performance vs malloc
/* arena_bench.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t cap) {
Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
return a;
}
void *arena_alloc(Arena *a, size_t size) {
size_t aligned = (a->offset + 7) & ~(size_t)7;
void *p = a->base + aligned;
a->offset = aligned + size;
return p;
}
static double now(void) {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec + ts.tv_nsec / 1e9;
}
int main(void) {
enum { N = 1000000 };
/* Benchmark malloc */
void **ptrs = malloc(N * sizeof(void *));
double t0 = now();
for (int i = 0; i < N; i++)
ptrs[i] = malloc(64);
double t1 = now();
printf("malloc: %.4f s\n", t1 - t0);
for (int i = 0; i < N; i++)
free(ptrs[i]);
free(ptrs);
/* Benchmark arena */
Arena a = arena_create((size_t)N * 72);
t0 = now();
for (int i = 0; i < N; i++)
arena_alloc(&a, 64);
t1 = now();
printf("arena: %.4f s\n", t1 - t0);
free(a.base);
return 0;
}
Typical result: arena allocation is 5-20x faster than malloc for small objects.
Pool Allocator: Fixed-Size Blocks
A pool allocator manages blocks of identical size. Freed blocks go onto a free list for reuse.
Pool layout (block size = 32 bytes):
+--------+--------+--------+--------+--------+
| block0 | block1 | block2 | block3 | block4 | ...
+--------+--------+--------+--------+--------+
Free list (embedded in unused blocks):
block2 -> block0 -> block4 -> NULL
The trick: when a block is free, we store the free-list pointer inside the block itself. No extra metadata.
/* pool.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *memory;
void *free_list;
size_t block_size;
size_t block_count;
} Pool;
Pool pool_create(size_t block_size, size_t block_count) {
/* Block must be large enough to hold a pointer */
if (block_size < sizeof(void *))
block_size = sizeof(void *);
Pool p;
p.block_size = block_size;
p.block_count = block_count;
p.memory = (uint8_t *)malloc(block_size * block_count);
if (!p.memory) {
fprintf(stderr, "pool: malloc failed\n");
exit(1);
}
/* Build the free list */
p.free_list = NULL;
for (size_t i = 0; i < block_count; i++) {
void *block = p.memory + i * block_size;
*(void **)block = p.free_list;
p.free_list = block;
}
return p;
}
void *pool_alloc(Pool *p) {
if (!p->free_list) {
fprintf(stderr, "pool: exhausted\n");
return NULL;
}
void *block = p->free_list;
p->free_list = *(void **)block;
return block;
}
void pool_free(Pool *p, void *block) {
*(void **)block = p->free_list;
p->free_list = block;
}
void pool_destroy(Pool *p) {
free(p->memory);
p->memory = NULL;
p->free_list = NULL;
}
/* ---- demo ---- */
typedef struct {
int id;
double value;
} Record;
int main(void) {
Pool pool = pool_create(sizeof(Record), 1024);
Record *r1 = pool_alloc(&pool);
Record *r2 = pool_alloc(&pool);
Record *r3 = pool_alloc(&pool);
r1->id = 1; r1->value = 3.14;
r2->id = 2; r2->value = 2.72;
r3->id = 3; r3->value = 1.41;
printf("r1: id=%d val=%.2f\n", r1->id, r1->value);
printf("r2: id=%d val=%.2f\n", r2->id, r2->value);
/* Return r2 to pool */
pool_free(&pool, r2);
/* Reuse that block */
Record *r4 = pool_alloc(&pool);
r4->id = 4; r4->value = 9.81;
printf("r4: id=%d val=%.2f (reused r2's block)\n", r4->id, r4->value);
printf("r4 == r2 address? %s\n", (r4 == r2) ? "yes" : "no");
pool_destroy(&pool);
return 0;
}
$ gcc -O2 -o pool pool.c && ./pool
r1: id=1 val=3.14
r2: id=2 val=2.72
r4: id=4 val=9.81 (reused r2's block)
r4 == r2 address? yes
Caution: Using a pool-allocated block after pool_free is use-after-free. The pool won't detect it. You'll corrupt the free list.
Rust: bumpalo and typed-arena
Rust has crate-level arena allocators that integrate with the borrow checker.
bumpalo (bump allocator)
// Cargo.toml: bumpalo = "3"
use bumpalo::Bump;

fn main() {
    let arena = Bump::new();

    // Allocate values -- they live as long as `arena`
    let x = arena.alloc(42_i32);
    let y = arena.alloc(3.14_f64);
    let s = arena.alloc_str("hello from the arena");

    println!("x = {x}, y = {y}, s = \"{s}\"");
    println!("Arena used: {} bytes", arena.allocated_bytes());

    // Everything freed when `arena` drops
}
typed-arena (single-type arena)
// Cargo.toml: typed-arena = "2"
use typed_arena::Arena;

struct Node {
    value: i32,
    label: String,
}

fn main() {
    let arena = Arena::new();
    let nodes: Vec<&Node> = (0..1000)
        .map(|i| {
            arena.alloc(Node {
                value: i,
                label: format!("node_{i}"),
            })
        })
        .collect();
    println!("Node 42: value={}, label=\"{}\"",
             nodes[42].value, nodes[42].label);
}
Rust Note: Rust arenas return references (&T) with the arena's lifetime. The borrow checker prevents use-after-free at compile time. This is the biggest difference from C arenas, where dangling pointers are your problem.
When to Use Each Allocator
+------------------+----------------------------+---------------------------+
| Pattern | Allocator | Why |
+------------------+----------------------------+---------------------------+
| Parse a request, | Arena | Alloc many, free all at |
| process, discard | | once. Zero fragmentation. |
+------------------+----------------------------+---------------------------+
| Game loop: | Arena (per-frame) | Reset at frame boundary. |
| alloc per frame | | No GC pauses. |
+------------------+----------------------------+---------------------------+
| Connection pool: | Pool | Fixed-size blocks. Fast |
| reuse objects | | alloc/free. Reuse memory. |
+------------------+----------------------------+---------------------------+
| Mixed sizes, | malloc/free (or jemalloc) | General-purpose is fine |
| long lifetimes | | when patterns are random. |
+------------------+----------------------------+---------------------------+
Real-World Patterns
Protocol Parser with Arena
A network server receives a packet, parses headers and fields into an arena, processes the request, then resets the arena for the next packet.
/* parse_loop.c — sketch of arena-based packet parsing */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t cap) {
Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
return a;
}
void *arena_alloc(Arena *a, size_t size) {
size_t aligned = (a->offset + 7) & ~(size_t)7; /* round up to 8 */
if (aligned + size > a->capacity)
return NULL; /* out of space */
void *p = a->base + aligned;
a->offset = aligned + size;
return p;
}
void arena_reset(Arena *a) { a->offset = 0; }
typedef struct {
char *method; /* "GET", "POST", etc. */
char *path; /* "/index.html" */
int num_headers;
} HttpRequest;
/* Parse a (fake) request into the arena */
HttpRequest *parse_request(Arena *a, const char *raw) {
HttpRequest *req = arena_alloc(a, sizeof(HttpRequest));
/* Copy method (first space-delimited token) */
size_t mlen = strcspn(raw, " ");
req->method = arena_alloc(a, 8);
if (mlen > 7) mlen = 7; /* fit the fixed 8-byte buffer */
memcpy(req->method, raw, mlen);
req->method[mlen] = '\0';
/* Copy path (second token) */
const char *rest = raw + strcspn(raw, " ") + 1;
size_t plen = strcspn(rest, " ");
if (plen > 255) plen = 255;
req->path = arena_alloc(a, 256);
memcpy(req->path, rest, plen);
req->path[plen] = '\0';
req->num_headers = 3; /* placeholder */
return req;
}
int main(void) {
Arena arena = arena_create(4096);
/* Simulate processing 3 requests */
const char *requests[] = {
"GET /index.html HTTP/1.1",
"POST /api/data HTTP/1.1",
"GET /style.css HTTP/1.1",
};
for (int i = 0; i < 3; i++) {
arena_reset(&arena); /* free previous request's data */
HttpRequest *req = parse_request(&arena, requests[i]);
printf("Request %d: method=%s path=%s headers=%d (arena=%zu bytes)\n",
i, req->method, req->path, req->num_headers, arena.offset);
}
free(arena.base);
return 0;
}
$ gcc -O2 -o parse_loop parse_loop.c && ./parse_loop
Request 0: method=GET path=/index.html headers=3 (arena=288 bytes)
Request 1: method=POST path=/api/data headers=3 (arena=288 bytes)
Request 2: method=GET path=/style.css headers=3 (arena=288 bytes)
Rust: Per-Request Arena
use bumpalo::Bump;

struct Request<'a> {
    method: &'a str,
    path: &'a str,
}

fn parse_request<'a>(arena: &'a Bump, raw: &str) -> Request<'a> {
    let parts: Vec<&str> = raw.splitn(3, ' ').collect();
    Request {
        method: arena.alloc_str(parts[0]),
        path: arena.alloc_str(parts[1]),
    }
}

fn main() {
    let requests = [
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    ];
    let mut arena = Bump::new(); // `reset` needs a mutable arena
    for (i, raw) in requests.iter().enumerate() {
        arena.reset();
        let req = parse_request(&arena, raw);
        println!("Request {i}: {} {} (arena={} bytes)",
                 req.method, req.path, arena.allocated_bytes());
    }
}
Driver Prep: Kernel memory allocation uses slab allocators (kmem_cache), which are essentially pool allocators for fixed-size kernel objects. The concepts here map directly to kmem_cache_create / kmem_cache_alloc / kmem_cache_free in the kernel.
Growing an Arena
The simple arena above has a fixed capacity. A production arena grows by chaining blocks:
+--------+ +--------+ +--------+
| block1 | --> | block2 | --> | block3 |
| 4 KB | | 8 KB | | 16 KB | (double each time)
+--------+ +--------+ +--------+
Reset means: keep block1, free the rest. This gives good amortized performance without wasting memory on small workloads.
Try It: Extend the C arena to support growing. When arena_alloc runs out of space, allocate a new block (double the previous size), link it, and continue. arena_reset should free all blocks except the first.
Quick Knowledge Check
- Why can an arena allocator skip tracking individual frees?
- How does a pool allocator store its free list without extra metadata?
- When is malloc the right choice over an arena or pool?
Common Pitfalls
- Using arena memory after reset. All pointers are invalidated. Same as use-after-free.
- Pool block too small. Must be at least sizeof(void *) to hold the free-list pointer.
- Forgetting alignment. Bumping by size without aligning causes bus errors on strict-alignment architectures (ARM).
- Arena for long-lived objects. If you can't reset, the arena just grows forever. Use a pool or malloc.
- Thread safety. Neither allocator above is thread-safe. Add a mutex or use per-thread arenas.
Zero-Copy Techniques and Atomics
Copying data is the enemy of performance. Every memcpy wastes CPU cycles and
pollutes the cache. This chapter covers zero-copy I/O on Linux and atomic
operations for lock-free data structures -- two techniques that separate
fast systems code from everything else.
Zero-Copy I/O: Why Copies Hurt
A naive file-to-socket transfer (read() into a user buffer, then write()) does four copies:
Traditional copy path:
Disk --> Kernel Buffer --> User Buffer --> Kernel Buffer --> NIC
     copy #1 (DMA)   copy #2 (CPU)   copy #3 (CPU)   copy #4 (DMA)
           (+ context switch)  (+ context switch)
With sendfile, the kernel does it in zero or one copy:
sendfile path:
Disk --> Kernel Buffer ---------> NIC
(DMA) (DMA or single copy)
No user-space involvement
sendfile
/* sendfile_demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
return 1;
}
int filefd = open(argv[1], O_RDONLY);
if (filefd < 0) { perror("open"); return 1; }
struct stat st;
fstat(filefd, &st);
/* Create a TCP server socket */
int srv = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(9000),
.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
};
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, 1);
printf("Listening on :9000, waiting for connection...\n");
int client = accept(srv, NULL, NULL);
if (client < 0) { perror("accept"); return 1; }
/* Zero-copy transfer */
off_t offset = 0;
ssize_t sent = sendfile(client, filefd, &offset, st.st_size);
printf("Sent %zd bytes via sendfile\n", sent);
close(client);
close(srv);
close(filefd);
return 0;
}
$ gcc -O2 -o sendfile_demo sendfile_demo.c
$ ./sendfile_demo /etc/passwd &
$ nc localhost 9000 | wc -c
splice
splice moves data between two file descriptors via a kernel pipe buffer.
No user-space copy.
/* splice_demo.c */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void) {
int infd = open("/etc/hosts", O_RDONLY);
int outfd = open("/tmp/hosts_copy", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (infd < 0 || outfd < 0) { perror("open"); return 1; }
int pipefd[2];
if (pipe(pipefd) < 0) { perror("pipe"); return 1; }
/* Move data: file -> pipe -> file, no user-space buffer */
ssize_t total = 0;
ssize_t n;
while ((n = splice(infd, NULL, pipefd[1], NULL, 65536,
SPLICE_F_MOVE)) > 0) {
ssize_t w = splice(pipefd[0], NULL, outfd, NULL, n,
SPLICE_F_MOVE);
if (w < 0) { perror("splice out"); return 1; }
total += w;
}
printf("Copied %zd bytes via splice\n", total);
close(infd); close(outfd);
close(pipefd[0]); close(pipefd[1]);
return 0;
}
$ gcc -O2 -o splice_demo splice_demo.c && ./splice_demo
Copied 221 bytes via splice
Parse In-Place: Avoiding memcpy in Protocols
Instead of copying fields out of a packet buffer, point into the buffer directly:
/* parse_inplace.c */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
/* A simple fixed-format message */
struct __attribute__((packed)) Message {
uint16_t type;
uint16_t length;
uint32_t sequence;
char payload[]; /* flexible array member (C99) */
};
void process_buffer(const uint8_t *buf, size_t len) {
/* Parse in-place: cast, don't copy */
const struct Message *msg = (const struct Message *)buf;
printf("Type: %u\n", ntohs(msg->type));
printf("Length: %u\n", ntohs(msg->length));
printf("Sequence: %u\n", ntohl(msg->sequence));
printf("Payload: %.*s\n",
(int)(len - sizeof(struct Message)),
msg->payload);
}
int main(void) {
/* Simulate a received buffer */
uint8_t buf[64];
struct Message *msg = (struct Message *)buf;
msg->type = htons(1);
msg->length = htons(13);
msg->sequence = htonl(42);
memcpy(msg->payload, "Hello World!", 13);
process_buffer(buf, sizeof(struct Message) + 13);
return 0;
}
Caution: In-place parsing requires careful attention to alignment and endianness. On architectures that require aligned access (ARM, SPARC), casting an unaligned buffer to a struct pointer is undefined behavior.
Rust: Zero-Copy Patterns
// zero_copy_parse.rs
use std::convert::TryInto;

fn parse_u16_be(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes(buf[offset..offset + 2].try_into().unwrap())
}

fn parse_u32_be(buf: &[u8], offset: usize) -> u32 {
    u32::from_be_bytes(buf[offset..offset + 4].try_into().unwrap())
}

fn main() {
    // Simulate a received buffer
    let buf: Vec<u8> = vec![
        0x00, 0x01,             // type = 1
        0x00, 0x0D,             // length = 13
        0x00, 0x00, 0x00, 0x2A, // sequence = 42
        b'H', b'e', b'l', b'l', b'o', b' ',
        b'W', b'o', b'r', b'l', b'd', b'!', 0x00,
    ];
    let msg_type = parse_u16_be(&buf, 0);
    let length = parse_u16_be(&buf, 2);
    let sequence = parse_u32_be(&buf, 4);
    let payload = std::str::from_utf8(&buf[8..8 + length as usize - 1])
        .unwrap();
    println!("Type: {msg_type}, Length: {length}, Seq: {sequence}");
    println!("Payload: {payload}");
}
Rust Note: Rust doesn't allow arbitrary pointer casts to structs. Instead, you parse fields explicitly with from_be_bytes. The zerocopy and bytemuck crates provide safe zero-copy deserialization for types that meet alignment and validity requirements.
Atomic Operations
When threads share data without locks, you need atomics. An atomic operation completes indivisibly -- no other thread can see a half-written value.
C: stdatomic.h (C11 Atomics)
/* atomics_c.c */
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
static atomic_long counter = 0;
void *increment(void *arg) {
(void)arg;
for (int i = 0; i < 1000000; i++) {
atomic_fetch_add(&counter, 1);
}
return NULL;
}
int main(void) {
pthread_t threads[4];
for (int i = 0; i < 4; i++)
pthread_create(&threads[i], NULL, increment, NULL);
for (int i = 0; i < 4; i++)
pthread_join(threads[i], NULL);
printf("Counter: %ld (expected 4000000)\n",
atomic_load(&counter));
return 0;
}
$ gcc -O2 -o atomics_c atomics_c.c -lpthread && ./atomics_c
Counter: 4000000 (expected 4000000)
Without atomic_, the result would be less than 4000000 due to data races.
Rust: std::sync::atomic
// atomics_rust.rs
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicI64::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000_000 {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("Counter: {} (expected 4000000)",
             counter.load(Ordering::Relaxed));
}
Memory Ordering
Atomics alone aren't enough. You must specify how operations are ordered relative to other memory accesses.
Ordering Strength (weakest to strongest):
Relaxed No ordering guarantees. Only atomicity.
Acquire Reads after this acquire see writes before the matching release.
Release Writes before this release are visible after the matching acquire.
AcqRel Both acquire and release.
SeqCst Total global order. All threads agree on the sequence.
Acquire-Release Pattern
Thread A (producer): Thread B (consumer):
data = 42; while (!ready.load(Acquire)) { }
ready.store(true, Release); assert(data == 42); // guaranteed!
| ^
+--- Release syncs with Acquire--+
Without proper ordering, Thread B might see ready == true but data == 0
because the CPU reordered the stores.
/* acquire_release.c */
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>
static int data = 0;
static atomic_int ready = 0;
void *producer(void *arg) {
(void)arg;
data = 42; /* non-atomic write */
atomic_store_explicit(&ready, 1, memory_order_release);
return NULL;
}
void *consumer(void *arg) {
(void)arg;
while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
/* spin */
}
assert(data == 42); /* guaranteed by acquire-release */
printf("data = %d (correct!)\n", data);
return NULL;
}
int main(void) {
pthread_t t1, t2;
pthread_create(&t1, NULL, producer, NULL);
pthread_create(&t2, NULL, consumer, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
return 0;
}
$ gcc -O2 -o acqrel acquire_release.c -lpthread && ./acqrel
data = 42 (correct!)
Rust Acquire-Release
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

struct Shared {
    ready: AtomicBool,
    data: UnsafeCell<i32>, // non-atomic shared data, mirrors the C pattern
}

// SAFETY: all access to `data` is synchronized through
// acquire/release operations on `ready`.
unsafe impl Sync for Shared {}

fn main() {
    let shared = Arc::new(Shared {
        ready: AtomicBool::new(false),
        data: UnsafeCell::new(0),
    });

    let s1 = Arc::clone(&shared);
    let producer = thread::spawn(move || {
        unsafe { *s1.data.get() = 42; } // non-atomic write
        s1.ready.store(true, Ordering::Release);
    });

    let s2 = Arc::clone(&shared);
    let consumer = thread::spawn(move || {
        while !s2.ready.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        let val = unsafe { *s2.data.get() };
        assert_eq!(val, 42); // guaranteed by acquire-release
        println!("data = {val} (correct!)");
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
Rust Note: Rust requires unsafe to share non-atomic data between threads without a Mutex. The language forces you to acknowledge the danger explicitly. In practice, prefer Mutex or channels unless you've proven atomics are necessary.
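For comparison, here is a sketch (not from the listing above) of the same producer/consumer handoff using a standard-library channel. Ownership transfer replaces the flag and the unsafe block entirely:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // send/recv provide the same happens-before guarantee as the
    // release store / acquire load pair -- with no unsafe code.
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        tx.send(42).unwrap(); // value moves into the channel
    });
    let val: i32 = rx.recv().unwrap(); // blocks until the value arrives
    assert_eq!(val, 42);
    println!("data = {val} (via channel, no unsafe)");
    producer.join().unwrap();
}
```

If the consumer only ever needs the value once, this is both simpler and harder to get wrong than hand-rolled acquire-release.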
Compare-and-Swap (CAS)
CAS is the fundamental lock-free primitive: "If the value is X, change it to Y. Otherwise, tell me the current value."
/* cas_demo.c */
#include <stdio.h>
#include <stdatomic.h>
#include <stdbool.h>
static atomic_int value = 0;
bool try_set(int expected_old, int new_val) {
/* Atomically: if value == expected_old, set to new_val */
return atomic_compare_exchange_strong(&value, &expected_old, new_val);
}
int main(void) {
atomic_store(&value, 10);
/* Try to change 10 -> 20 */
if (try_set(10, 20))
printf("CAS succeeded: 10 -> 20\n");
else
printf("CAS failed\n");
/* Try to change 10 -> 30 (will fail, value is now 20) */
if (try_set(10, 30))
printf("CAS succeeded: 10 -> 30\n");
else
printf("CAS failed: value is actually %d\n",
atomic_load(&value));
return 0;
}
$ gcc -O2 -o cas_demo cas_demo.c && ./cas_demo
CAS succeeded: 10 -> 20
CAS failed: value is actually 20
Rust CAS
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let value = AtomicI32::new(10);
    // Try to change 10 -> 20
    match value.compare_exchange(10, 20, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(old) => println!("CAS succeeded: {old} -> 20"),
        Err(cur) => println!("CAS failed: value is {cur}"),
    }
    // Try to change 10 -> 30 (will fail)
    match value.compare_exchange(10, 30, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(old) => println!("CAS succeeded: {old} -> 30"),
        Err(cur) => println!("CAS failed: value is {cur}"),
    }
}
Lock-Free Stack (Sketch)
A lock-free stack using CAS:
/* lockfree_stack.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
typedef struct Node {
int value;
struct Node *next;
} Node;
typedef struct {
_Atomic(Node *) top;
} Stack;
void stack_init(Stack *s) {
atomic_store(&s->top, NULL);
}
void stack_push(Stack *s, int value) {
Node *node = malloc(sizeof(Node));
node->value = value;
Node *old_top;
do {
old_top = atomic_load(&s->top);
node->next = old_top;
} while (!atomic_compare_exchange_weak(&s->top, &old_top, node));
}
int stack_pop(Stack *s, int *out) {
Node *old_top;
Node *new_top;
do {
old_top = atomic_load(&s->top);
if (!old_top) return 0; /* empty */
new_top = old_top->next;
} while (!atomic_compare_exchange_weak(&s->top, &old_top, new_top));
*out = old_top->value;
free(old_top); /* caution: ABA problem in real code */
return 1;
}
int main(void) {
Stack s;
stack_init(&s);
stack_push(&s, 10);
stack_push(&s, 20);
stack_push(&s, 30);
int val;
while (stack_pop(&s, &val))
printf("popped: %d\n", val);
return 0;
}
$ gcc -O2 -o lockfree lockfree_stack.c && ./lockfree
popped: 30
popped: 20
popped: 10
Caution: This stack has the ABA problem: if thread A pops node X, thread B pops X and Y then pushes X back, thread A's CAS succeeds but the stack is corrupted. Real lock-free structures use tagged pointers or hazard pointers.
When to Use Atomics vs Mutex
+-------------------+----------------------------------+
| Use Atomics | Use Mutex |
+-------------------+----------------------------------+
| Single counter | Multiple fields updated together |
| Single flag | Complex invariants |
| Hot path, simple | Readability matters |
| Statistics | Anything non-trivial |
+-------------------+----------------------------------+
Rule of thumb: if you can't explain why your lock-free code is correct in one paragraph, use a mutex.
Driver Prep: The Linux kernel uses atomic operations extensively:
atomic_t,atomic_inc(),atomic_dec_and_test(). Memory barriers (smp_mb(),smp_wmb(),smp_rmb()) map to the orderings above. DMA descriptor rings are often lock-free structures using these primitives.
Try It: Modify atomics_c.c to use memory_order_relaxed instead of the default memory_order_seq_cst. Does it still produce the correct count? Why?
Quick Knowledge Check
- How many copies does sendfile eliminate compared to read + write?
- What does memory_order_acquire guarantee that memory_order_relaxed does not?
- What is the ABA problem in lock-free programming?
Common Pitfalls
- Using memcpy where a pointer would do. Profile first, but default to referencing data in-place.
- Relaxed ordering everywhere. It works for counters but breaks for publish/subscribe patterns.
- Forgetting volatile doesn't mean atomic. volatile constrains the compiler but not the CPU, and provides no atomicity. It's not a substitute for atomics.
- Lock-free code without formal reasoning. Lock-free is harder to get right than locks. Only use it when profiling proves the lock is the bottleneck.
- sendfile on non-regular files. The source must be a regular file -- use splice for pipes and sockets.
- CAS loops without backoff. Under high contention, spinning CAS wastes CPU. Add exponential backoff or yield.
The /proc and /sys Filesystems
Linux exposes the kernel's internal state as files. Want to know how much
memory a process uses? Read a file. Want to check CPU topology? Read a file.
Want to toggle a GPIO pin? Write to a file. This chapter explores /proc and
/sys -- the "everything is a file" philosophy at its most powerful.
/proc: Process and Kernel Information
/proc is a virtual filesystem. Nothing is stored on disk. Every read
generates the content on the fly from kernel data structures.
/proc/
|- 1/ <-- init process
| |- status <-- process state
| |- maps <-- memory mappings
| |- fd/ <-- open file descriptors
| +- cmdline <-- command line
|- self/ <-- symlink to current process
|- cpuinfo <-- CPU details
|- meminfo <-- memory statistics
|- uptime <-- seconds since boot
+- loadavg <-- load averages
Reading /proc/self/maps
Every process can inspect its own memory layout.
/* proc_maps.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int global_var = 42;
int main(void) {
int stack_var = 99;
int *heap_var = malloc(sizeof(int));
*heap_var = 7;
printf("Addresses:\n");
printf(" main() = %p (text)\n", (void *)main);
printf(" global_var = %p (data)\n", (void *)&global_var);
printf(" heap_var = %p (heap)\n", (void *)heap_var);
printf(" stack_var = %p (stack)\n", (void *)&stack_var);
printf("\n--- /proc/self/maps ---\n");
FILE *f = fopen("/proc/self/maps", "r");
if (!f) { perror("fopen"); return 1; }
char line[512];
while (fgets(line, sizeof(line), f)) {
/* Show only interesting segments */
if (strstr(line, "proc_maps") || /* our binary */
strstr(line, "[heap]") ||
strstr(line, "[stack]") ||
strstr(line, "[vdso]")) {
printf(" %s", line);
}
}
fclose(f);
free(heap_var);
return 0;
}
$ gcc -O0 -o proc_maps proc_maps.c && ./proc_maps
Addresses:
main() = 0x5599a3b00189 (text)
global_var = 0x5599a3b03010 (data)
heap_var = 0x5599a4e482a0 (heap)
stack_var = 0x7ffc1a2b3c44 (stack)
--- /proc/self/maps ---
5599a3b00000-5599a3b01000 r-xp ... proc_maps
5599a4e48000-5599a4e69000 rw-p ... [heap]
7ffc1a294000-7ffc1a2b5000 rw-p ... [stack]
7ffc1a2fd000-7ffc1a301000 r-xp ... [vdso]
The maps file shows: address range, permissions (r/w/x/p), offset, device,
inode, and pathname.
Try It: Run the program and find which region contains each address. Can you identify the text, data, heap, and stack regions in the maps output?
Reading /proc/[pid]/status
/* proc_status.c */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int main(void) {
char path[64];
snprintf(path, sizeof(path), "/proc/%d/status", getpid());
FILE *f = fopen(path, "r");
if (!f) { perror("fopen"); return 1; }
char line[256];
while (fgets(line, sizeof(line), f)) {
if (strncmp(line, "Name:", 5) == 0 ||
strncmp(line, "Pid:", 4) == 0 ||
strncmp(line, "PPid:", 5) == 0 ||
strncmp(line, "VmSize:", 7) == 0 ||
strncmp(line, "VmRSS:", 6) == 0 ||
strncmp(line, "Threads:", 8) == 0) {
printf("%s", line);
}
}
fclose(f);
return 0;
}
$ gcc -o proc_status proc_status.c && ./proc_status
Name: proc_status
Pid: 12345
PPid: 11000
VmSize: 2104 kB
VmRSS: 768 kB
Threads: 1
Key fields:
- VmSize: Total virtual memory.
- VmRSS: Resident Set Size -- how much physical memory is actually used.
- Threads: Number of threads in the process.
Reading /proc/cpuinfo and /proc/meminfo
/* sysinfo.c */
#include <stdio.h>
#include <string.h>
static void print_matching_lines(const char *path, const char *prefix) {
FILE *f = fopen(path, "r");
if (!f) { perror(path); return; }
char line[256];
while (fgets(line, sizeof(line), f)) {
if (strncmp(line, prefix, strlen(prefix)) == 0)
printf("%s", line);
}
fclose(f);
}
int main(void) {
printf("=== CPU ===\n");
print_matching_lines("/proc/cpuinfo", "model name");
print_matching_lines("/proc/cpuinfo", "cpu cores");
printf("\n=== Memory ===\n");
print_matching_lines("/proc/meminfo", "MemTotal");
print_matching_lines("/proc/meminfo", "MemFree");
print_matching_lines("/proc/meminfo", "MemAvailable");
print_matching_lines("/proc/meminfo", "SwapTotal");
return 0;
}
$ gcc -o sysinfo sysinfo.c && ./sysinfo
=== CPU ===
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
cpu cores : 8
=== Memory ===
MemTotal: 16384000 kB
MemFree: 4200000 kB
MemAvailable: 12000000 kB
SwapTotal: 8192000 kB
Rust: Reading /proc
// proc_reader.rs
use std::fs;
use std::io::{self, BufRead};

fn read_proc_field(path: &str, prefix: &str) -> io::Result<Vec<String>> {
    let file = fs::File::open(path)?;
    let reader = io::BufReader::new(file);
    let matches: Vec<String> = reader
        .lines()
        .filter_map(|line| {
            let line = line.ok()?;
            if line.starts_with(prefix) { Some(line) } else { None }
        })
        .collect();
    Ok(matches)
}

fn main() -> io::Result<()> {
    // Read our own memory maps
    let pid = std::process::id();
    let maps = fs::read_to_string(format!("/proc/{pid}/maps"))?;
    println!("=== Memory Maps (first 5 lines) ===");
    for line in maps.lines().take(5) {
        println!("  {line}");
    }
    // Read system info
    println!("\n=== CPU ===");
    for line in read_proc_field("/proc/cpuinfo", "model name")?.iter().take(1) {
        println!("  {line}");
    }
    println!("\n=== Memory ===");
    for line in &read_proc_field("/proc/meminfo", "MemTotal")? {
        println!("  {line}");
    }
    for line in &read_proc_field("/proc/meminfo", "MemAvailable")? {
        println!("  {line}");
    }
    Ok(())
}
/sys: The Device Model
/sys exposes the kernel's device model. It's organized by bus, class, and
device.
/sys/
|- class/
| |- net/ <-- network interfaces
| | |- eth0/
| | +- lo/
| |- block/ <-- block devices
| +- tty/ <-- terminals
|- bus/
| |- pci/
| |- usb/
| +- platform/
|- devices/ <-- device hierarchy
+- kernel/ <-- kernel parameters
Reading Network Interface Info via sysfs
/* sysfs_net.c */
#include <stdio.h>
#include <string.h>
#include <dirent.h>
static int read_sysfs_str(const char *path, char *buf, size_t len) {
FILE *f = fopen(path, "r");
if (!f) return -1;
if (!fgets(buf, len, f)) {
fclose(f);
return -1;
}
/* Remove trailing newline */
buf[strcspn(buf, "\n")] = '\0';
fclose(f);
return 0;
}
int main(void) {
DIR *d = opendir("/sys/class/net");
if (!d) { perror("opendir"); return 1; }
struct dirent *entry;
while ((entry = readdir(d)) != NULL) {
if (entry->d_name[0] == '.')
continue;
char path[256], buf[64];
printf("Interface: %s\n", entry->d_name);
/* Read MTU */
snprintf(path, sizeof(path),
"/sys/class/net/%s/mtu", entry->d_name);
if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
printf(" MTU: %s\n", buf);
/* Read operstate (up/down) */
snprintf(path, sizeof(path),
"/sys/class/net/%s/operstate", entry->d_name);
if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
printf(" State: %s\n", buf);
/* Read MAC address */
snprintf(path, sizeof(path),
"/sys/class/net/%s/address", entry->d_name);
if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
printf(" MAC: %s\n", buf);
/* Read speed (may fail for loopback) */
snprintf(path, sizeof(path),
"/sys/class/net/%s/speed", entry->d_name);
if (read_sysfs_str(path, buf, sizeof(buf)) == 0)
printf(" Speed: %s Mbps\n", buf);
printf("\n");
}
closedir(d);
return 0;
}
$ gcc -o sysfs_net sysfs_net.c && ./sysfs_net
Interface: eth0
MTU: 1500
State: up
MAC: 00:11:22:33:44:55
Speed: 1000 Mbps
Interface: lo
MTU: 65536
State: unknown
MAC: 00:00:00:00:00:00
Writing to sysfs
Some sysfs attributes are writable. This is how you configure hardware from user space.
/* sysfs_write.c — set network interface MTU */
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc != 3) {
fprintf(stderr, "Usage: %s <interface> <mtu>\n", argv[0]);
return 1;
}
char path[256];
snprintf(path, sizeof(path),
"/sys/class/net/%s/mtu", argv[1]);
FILE *f = fopen(path, "w");
if (!f) {
perror("fopen (need root?)");
return 1;
}
fprintf(f, "%s\n", argv[2]);
fclose(f);
printf("Set %s MTU to %s\n", argv[1], argv[2]);
return 0;
}
$ gcc -o sysfs_write sysfs_write.c
$ sudo ./sysfs_write eth0 9000
Set eth0 MTU to 9000
Caution: Writing to /sys files can change hardware behavior. Setting a wrong MTU, disabling a device, or modifying power settings can cause system instability. Always check what an attribute does before writing.
GPIO via sysfs (Legacy Interface)
The classic sysfs GPIO interface demonstrates read/write device control. Note
that modern Linux prefers the libgpiod character device interface, but sysfs
remains common in embedded systems.
/* gpio_sysfs.c — toggle a GPIO pin (legacy interface) */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
static int write_file(const char *path, const char *value) {
int fd = open(path, O_WRONLY);
if (fd < 0) { perror(path); return -1; }
write(fd, value, strlen(value));
close(fd);
return 0;
}
int main(void) {
int gpio_num = 17; /* Example: Raspberry Pi GPIO 17 */
char buf[64];
/* Export the GPIO */
snprintf(buf, sizeof(buf), "%d", gpio_num);
write_file("/sys/class/gpio/export", buf);
/* Set direction to output */
snprintf(buf, sizeof(buf), "/sys/class/gpio/gpio%d/direction", gpio_num);
write_file(buf, "out");
/* Toggle the pin */
snprintf(buf, sizeof(buf), "/sys/class/gpio/gpio%d/value", gpio_num);
for (int i = 0; i < 10; i++) {
write_file(buf, (i & 1) ? "1" : "0");
usleep(500000); /* 500 ms */
}
/* Unexport */
snprintf(buf, sizeof(buf), "%d", gpio_num);
write_file("/sys/class/gpio/unexport", buf);
printf("Done toggling GPIO %d\n", gpio_num);
return 0;
}
Driver Prep: When you write a kernel driver, you create the sysfs attributes that user-space programs read and write. The device_attribute structure and sysfs_create_file() are the kernel-side API. Everything you're reading here was created by a driver.
Rust: Reading sysfs
// sysfs_reader.rs
use std::fs;
use std::path::Path;

fn read_sysfs(path: &str) -> Option<String> {
    fs::read_to_string(path)
        .ok()
        .map(|s| s.trim().to_string())
}

fn main() {
    let net_dir = Path::new("/sys/class/net");
    let entries = fs::read_dir(net_dir).expect("cannot read /sys/class/net");
    for entry in entries {
        let entry = entry.unwrap();
        let name = entry.file_name();
        let name = name.to_str().unwrap();
        println!("Interface: {name}");
        let base = format!("/sys/class/net/{name}");
        if let Some(mtu) = read_sysfs(&format!("{base}/mtu")) {
            println!("  MTU: {mtu}");
        }
        if let Some(state) = read_sysfs(&format!("{base}/operstate")) {
            println!("  State: {state}");
        }
        if let Some(mac) = read_sysfs(&format!("{base}/address")) {
            println!("  MAC: {mac}");
        }
        println!();
    }
}
Udev Rules
Udev runs in user space and reacts to kernel device events. Rules in
/etc/udev/rules.d/ can:
- Set device permissions.
- Create stable symlinks (/dev/mydevice).
- Run scripts when devices appear.
Example rule (/etc/udev/rules.d/99-usb-serial.rules):
# When a USB serial adapter is plugged in, create /dev/myserial
SUBSYSTEM=="tty", ATTRS{idVendor}=="1a86", ATTRS{idProduct}=="7523", \
SYMLINK+="myserial", MODE="0666"
Test rules without reboot:
$ sudo udevadm trigger
$ sudo udevadm test /sys/class/tty/ttyUSB0
"Everything Is a File" in Practice
The /proc and /sys filesystems are the ultimate expression of Unix's "everything is a file" design:
+-------------------+---------------------------+--------------------------+
| What | File path | Operation |
+-------------------+---------------------------+--------------------------+
| Process memory | /proc/[pid]/maps | read |
| Kernel version | /proc/version | read |
| System uptime | /proc/uptime | read |
| Network MTU | /sys/class/net/eth0/mtu | read/write |
| CPU frequency | /sys/devices/.../scaling_ | read/write |
| | cur_freq | |
| Disk scheduler | /sys/block/sda/queue/ | read/write |
| | scheduler | |
| LED brightness | /sys/class/leds/.../ | read/write |
| | brightness | |
+-------------------+---------------------------+--------------------------+
This means shell scripts, Python, C, Rust -- any language that can read files can control hardware.
Try It: Write a C program that reads
/proc/uptime and prints how long the system has been running in hours, minutes, and seconds.
Quick Knowledge Check
- What is the difference between /proc and /sys?
- Why is VmRSS more useful than VmSize for understanding memory usage?
- How does a udev rule differ from directly writing to /sys?
Common Pitfalls
- Parsing /proc with fixed offsets. Fields can change between kernel versions. Always search for the label.
- Caching /proc data. It's generated on read. Old data is immediately stale.
- Writing to /sys without root. Most writable attributes require CAP_SYS_ADMIN or root.
- Assuming sysfs paths are stable. Hardware topology can change. Use udev rules for stable names.
- Blocking on /proc reads. Some /proc files (like /proc/kmsg) block. Use non-blocking I/O or poll.
- String parsing errors. /proc values often have trailing newlines or varying whitespace. Always trim() / strcspn().
ioctl and Device Interaction
When read, write, and lseek aren't enough, there's ioctl. It's the
Swiss Army knife of device control -- a single syscall that can do anything
the driver author wanted. This chapter explains how ioctl works, how
request numbers are encoded, and how to use it to talk to devices from user
space.
What ioctl Is
int ioctl(int fd, unsigned long request, ...);
ioctl sends a command to a device driver through an open file descriptor.
The request number encodes what to do, and the optional third argument
carries data (usually a pointer to a struct).
User space Kernel space
+----------+ +------------------+
| program | ioctl(fd, cmd) | driver |
| | ----------------> | .unlocked_ioctl |
| | <---------------- | return result |
+----------+ +------------------+
ioctl exists because devices have operations that don't fit the read/write
model: setting baud rates, querying screen dimensions, ejecting disks,
configuring network interfaces.
ioctl Request Number Encoding
On Linux, ioctl numbers are 32-bit values with structure:
Bits: 31..30 29..16 15..8 7..0
+------+---------+----------+--------+
| dir | size | type | number |
+------+---------+----------+--------+
dir: _IOC_NONE (0), _IOC_WRITE (1), _IOC_READ (2), _IOC_READ|_IOC_WRITE (3)
size: Size of the data argument (14 bits)
type: Magic number identifying the driver (8 bits)
number: Command number within the driver (8 bits)
The kernel provides macros to build these:
#include <linux/ioctl.h> /* or <sys/ioctl.h> */
_IO(type, number) /* No data transfer */
_IOR(type, number, datatype) /* Read from driver to user */
_IOW(type, number, datatype) /* Write from user to driver */
_IOWR(type, number, datatype) /* Both directions */
Example: _IOR('T', 1, struct winsize) means "read from driver, magic type
'T', command 1, data is a struct winsize."
Getting Terminal Size: TIOCGWINSZ
The most common ioctl in everyday programming.
/* term_size.c */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
int main(void) {
struct winsize ws;
if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) < 0) {
perror("ioctl TIOCGWINSZ");
return 1;
}
printf("Terminal size:\n");
printf(" Rows: %d\n", ws.ws_row);
printf(" Columns: %d\n", ws.ws_col);
printf(" X pixels: %d\n", ws.ws_xpixel);
printf(" Y pixels: %d\n", ws.ws_ypixel);
return 0;
}
$ gcc -o term_size term_size.c && ./term_size
Terminal size:
Rows: 40
Columns: 120
X pixels: 0
Y pixels: 0
The name decodes as: Terminal I/O Control, Get WINdow SiZe.
Setting Terminal Attributes
/* term_raw.c — put terminal in raw mode, then restore */
#include <stdio.h>
#include <unistd.h>
#include <termios.h>
int main(void) {
struct termios orig, raw;
/* Save original settings */
if (tcgetattr(STDIN_FILENO, &orig) < 0) {
perror("tcgetattr");
return 1;
}
raw = orig;
/* Disable canonical mode and echo */
raw.c_lflag &= ~(ICANON | ECHO);
raw.c_cc[VMIN] = 1; /* read at least 1 byte */
raw.c_cc[VTIME] = 0; /* no timeout */
if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) < 0) {
perror("tcsetattr");
return 1;
}
printf("Raw mode. Press 'q' to quit. Typed characters show as hex.\r\n");
char c;
while (read(STDIN_FILENO, &c, 1) == 1) {
if (c == 'q') break;
printf("0x%02x\r\n", (unsigned char)c);
}
/* Restore original settings */
tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig);
printf("\nRestored normal mode.\n");
return 0;
}
$ gcc -o term_raw term_raw.c && ./term_raw
Raw mode. Press 'q' to quit. Typed characters show as hex.
Under the hood, tcgetattr and tcsetattr call ioctl with TCGETS and
TCSETS requests.
Try It: Run
term_rawand press arrow keys. You'll see escape sequences (0x1b 0x5b 0x41 for Up). This is how terminal applications detect special keys.
Block Device ioctls
/* blk_size.c — get block device size */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <stdint.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "Usage: %s <block_device>\n", argv[0]);
return 1;
}
int fd = open(argv[1], O_RDONLY);
if (fd < 0) { perror("open"); return 1; }
/* Get size in bytes */
uint64_t size_bytes;
if (ioctl(fd, BLKGETSIZE64, &size_bytes) < 0) {
perror("ioctl BLKGETSIZE64");
close(fd);
return 1;
}
/* Get sector size */
int sector_size;
if (ioctl(fd, BLKSSZGET, &sector_size) < 0) {
perror("ioctl BLKSSZGET");
close(fd);
return 1;
}
/* Get read-only status */
int readonly;
if (ioctl(fd, BLKROGET, &readonly) < 0) {
perror("ioctl BLKROGET");
close(fd);
return 1;
}
printf("Device: %s\n", argv[1]);
printf("Size: %llu bytes (%.2f GB)\n",
(unsigned long long)size_bytes, size_bytes / 1e9);
printf("Sector size: %d bytes\n", sector_size);
printf("Read-only: %s\n", readonly ? "yes" : "no");
close(fd);
return 0;
}
$ gcc -o blk_size blk_size.c
$ sudo ./blk_size /dev/sda
Device: /dev/sda
Size: 500107862016 bytes (500.11 GB)
Sector size: 512 bytes
Read-only: no
Defining Custom ioctl Numbers
When writing a user-space program that talks to a custom driver, you define matching ioctl numbers.
/* custom_ioctl.h — shared between driver and user-space */
#ifndef CUSTOM_IOCTL_H
#define CUSTOM_IOCTL_H
#include <linux/ioctl.h>
#define MYDEV_MAGIC 'M'
struct mydev_config {
int speed;
int mode;
char name[32];
};
/* Commands */
#define MYDEV_GET_CONFIG _IOR(MYDEV_MAGIC, 0, struct mydev_config)
#define MYDEV_SET_CONFIG _IOW(MYDEV_MAGIC, 1, struct mydev_config)
#define MYDEV_RESET _IO(MYDEV_MAGIC, 2)
#define MYDEV_TRANSFER _IOWR(MYDEV_MAGIC, 3, struct mydev_config)
#endif /* CUSTOM_IOCTL_H */
User-space program using the custom ioctls:
/* user_ioctl.c — user-space side of custom device control */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <string.h>
/* In real code, include custom_ioctl.h */
#include <linux/ioctl.h>
#define MYDEV_MAGIC 'M'
struct mydev_config {
int speed;
int mode;
char name[32];
};
#define MYDEV_GET_CONFIG _IOR(MYDEV_MAGIC, 0, struct mydev_config)
#define MYDEV_SET_CONFIG _IOW(MYDEV_MAGIC, 1, struct mydev_config)
#define MYDEV_RESET _IO(MYDEV_MAGIC, 2)
int main(void) {
int fd = open("/dev/mydevice", O_RDWR);
if (fd < 0) {
perror("open /dev/mydevice");
printf("(This demo needs a matching kernel module loaded.)\n");
return 1;
}
/* Read current config */
struct mydev_config cfg;
if (ioctl(fd, MYDEV_GET_CONFIG, &cfg) < 0) {
perror("ioctl GET_CONFIG");
close(fd);
return 1;
}
printf("Current: speed=%d mode=%d name=%s\n",
cfg.speed, cfg.mode, cfg.name);
/* Modify and write back */
cfg.speed = 115200;
cfg.mode = 3;
strncpy(cfg.name, "fast_mode", sizeof(cfg.name));
if (ioctl(fd, MYDEV_SET_CONFIG, &cfg) < 0) {
perror("ioctl SET_CONFIG");
close(fd);
return 1;
}
printf("Config updated.\n");
close(fd);
return 0;
}
Driver Prep: On the kernel side, the driver's
file_operations.unlocked_ioctlfunction receives these commands. It usescopy_from_user()andcopy_to_user()to safely transfer the struct between user and kernel space. This is exactly how real hardware drivers are controlled.
Decoding ioctl Numbers
You can decode any ioctl number:
/* decode_ioctl.c */
#include <stdio.h>
#include <sys/ioctl.h>
int main(void) {
unsigned long cmd = TIOCGWINSZ;
printf("TIOCGWINSZ = 0x%lx\n", cmd);
printf(" Direction: %lu\n", (cmd >> 30) & 3);
printf(" Size: %lu bytes\n", (cmd >> 16) & 0x3FFF);
printf(" Type: '%c' (0x%02lx)\n",
(char)((cmd >> 8) & 0xFF), (cmd >> 8) & 0xFF);
printf(" Number: %lu\n", cmd & 0xFF);
return 0;
}
$ gcc -o decode_ioctl decode_ioctl.c && ./decode_ioctl
TIOCGWINSZ = 0x5413
Direction: 0
Size: 0 bytes
Type: 'T' (0x54)
Number: 19
Note: TIOCGWINSZ is an older ioctl that predates the modern encoding scheme,
so the direction and size fields may be zero.
The ioctl vs sysfs Debate
+------------------+----------------------------+---------------------------+
| Aspect | ioctl | sysfs |
+------------------+----------------------------+---------------------------+
| Interface | Binary struct | Text string |
| Discovery | Need header file | ls /sys/... |
| Scripting | Requires C/compiled code | cat/echo from shell |
| Performance | One syscall | open+read/write+close |
| Complex data | Handles structs natively | Must serialize to text |
| Debugging | Opaque without docs | Self-documenting filenames|
+------------------+----------------------------+---------------------------+
Modern practice: use sysfs for simple attributes (enable/disable, speed, status), use ioctl for complex operations (DMA transfers, firmware upload, bulk configuration).
Rust: ioctl with the nix Crate
The nix crate provides type-safe ioctl wrappers.
// Cargo.toml: nix = { version = "0.27", features = ["ioctl", "term"] }
use nix::libc;
use nix::sys::termios;
use std::os::unix::io::AsRawFd;
use std::io;

// Terminal size ioctl
nix::ioctl_read_bad!(tiocgwinsz, libc::TIOCGWINSZ, libc::winsize);

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let fd = stdout.as_raw_fd();

    // Get terminal size
    let mut ws = libc::winsize {
        ws_row: 0,
        ws_col: 0,
        ws_xpixel: 0,
        ws_ypixel: 0,
    };
    unsafe {
        tiocgwinsz(fd, &mut ws).expect("TIOCGWINSZ failed");
    }
    println!("Terminal: {} rows x {} cols", ws.ws_row, ws.ws_col);

    // Get terminal attributes using nix's safe wrapper
    let attrs = termios::tcgetattr(fd).expect("tcgetattr failed");
    println!("Input flags: {:?}", attrs.input_flags);
    println!("Output flags: {:?}", attrs.output_flags);
    println!("Local flags: {:?}", attrs.local_flags);
    Ok(())
}
Defining Custom ioctls in Rust
use nix::libc;

// Define the same custom ioctls from the C example
const MYDEV_MAGIC: u8 = b'M';

#[repr(C)]
struct MydevConfig {
    speed: i32,
    mode: i32,
    name: [u8; 32],
}

nix::ioctl_read!(mydev_get_config, MYDEV_MAGIC, 0, MydevConfig);
nix::ioctl_write_ptr!(mydev_set_config, MYDEV_MAGIC, 1, MydevConfig);
nix::ioctl_none!(mydev_reset, MYDEV_MAGIC, 2);

fn main() {
    // These would be called as:
    // unsafe { mydev_get_config(fd, &mut cfg) }
    // unsafe { mydev_set_config(fd, &cfg) }
    // unsafe { mydev_reset(fd) }
    println!("Custom ioctl macros defined successfully.");
    println!("(Need /dev/mydevice to actually use them.)");
}
Rust Note: The nix crate's ioctl macros generate unsafe functions because ioctls inherently bypass Rust's type system -- you're passing raw memory to a kernel driver. The unsafe block explicitly marks this trust boundary.
Practical Example: Watchdog Timer
The Linux watchdog (/dev/watchdog) is controlled entirely via ioctls.
/* watchdog_info.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>
int main(void) {
int fd = open("/dev/watchdog", O_RDWR);
if (fd < 0) {
perror("open /dev/watchdog (need root or watchdog group)");
return 1;
}
/* Get watchdog info */
struct watchdog_info info;
if (ioctl(fd, WDIOC_GETSUPPORT, &info) == 0) {
printf("Watchdog: %s\n", info.identity);
printf("Firmware: %u\n", info.firmware_version);
printf("Options: 0x%08x\n", info.options);
}
/* Get timeout */
int timeout;
if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == 0)
printf("Timeout: %d seconds\n", timeout);
/* Magic close: write 'V' before closing to prevent reboot */
write(fd, "V", 1);
close(fd);
return 0;
}
Caution: Opening /dev/watchdog starts the watchdog timer. If you don't periodically write to it (or close it with the magic 'V' character), the system will reboot. Do not run this on a production system without understanding the consequences.
Try It: Use strace to trace the ioctls of a familiar command: strace -e ioctl ls -l /dev/tty. How many different ioctl requests do you see?
Quick Knowledge Check
- What do the four fields in an ioctl number encode?
- What is the difference between _IOR and _IOW?
- Why does the nix crate mark ioctl functions as unsafe?
Common Pitfalls
- Wrong ioctl number. A mismatched magic type or command number returns ENOTTY ("inappropriate ioctl for device"). Despite the confusing name, this is the standard error.
- Wrong data size. If the struct in _IOR doesn't match the kernel's expected size, the ioctl fails or corrupts memory.
- Missing O_RDWR. Some ioctls require the fd to be opened read-write, even if the ioctl only reads data.
- Forgetting copy_from_user on the kernel side. Accessing user pointers directly from kernel code is a security vulnerability (and crashes on SMAP-enabled CPUs).
- Platform differences. ioctl numbers can differ between architectures (32-bit vs 64-bit). Always use the macros, never hardcode numbers.
- ioctl on the wrong fd. TIOCGWINSZ works on a terminal fd, not a regular file fd. Check what your fd actually points to.
Netlink Sockets
Netlink is Linux's primary mechanism for communication between the kernel and
user-space processes. Unlike ioctl, netlink uses a proper socket interface
with structured messages, multicast groups, and asynchronous notifications.
This chapter shows how to read the routing table, monitor network events, and
build a simple network monitor.
What Netlink Is
Netlink is an AF_NETLINK socket family. Instead of connecting to a remote
host, you connect to the kernel.
User space Kernel
+------------------+ +------------------+
| netlink socket | <----------> | netlink subsystem |
| AF_NETLINK | messages | (routing, link, |
| SOCK_DGRAM | | firewall, ...) |
+------------------+ +------------------+
Key properties:
- Message-based (like UDP, not like TCP streams).
- Supports multicast -- subscribe to kernel event groups.
- Bidirectional -- query state or receive notifications.
- Replaces many ioctl-based network configuration interfaces.
Netlink Message Format
Every netlink message starts with struct nlmsghdr:
struct nlmsghdr {
__u32 nlmsg_len; /* Total message length (including header) */
__u16 nlmsg_type; /* Message type */
__u16 nlmsg_flags; /* Flags: NLM_F_REQUEST, NLM_F_DUMP, etc. */
__u32 nlmsg_seq; /* Sequence number (for matching replies) */
__u32 nlmsg_pid; /* Sending process PID */
};
Message layout:
+------------------+-------------------+------------------+
| nlmsghdr | payload | padding |
| (16 bytes) | (variable) | (to 4-byte align)|
+------------------+-------------------+------------------+
|<-------- nlmsg_len -------->|
For route messages, the payload is struct rtmsg followed by route attributes.
For link messages, it's struct ifinfomsg followed by link attributes.
Protocol Families
| Protocol | Purpose |
|---|---|
NETLINK_ROUTE | Routing, addresses, links, neighbors |
NETLINK_GENERIC | Generic netlink (extensible) |
NETLINK_NETFILTER | Firewall (nftables, conntrack) |
NETLINK_KOBJECT_UEVENT | Device hotplug events |
NETLINK_AUDIT | Kernel audit subsystem |
NETLINK_ROUTE is by far the most common.
Reading the Routing Table
/* netlink_routes.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <arpa/inet.h>
#define BUFSIZE 8192
struct nl_request {
struct nlmsghdr hdr;
struct rtmsg msg;
};
static void parse_route(struct nlmsghdr *nlh) {
struct rtmsg *rtm = NLMSG_DATA(nlh);
/* Only show main table IPv4 routes */
if (rtm->rtm_family != AF_INET)
return;
if (rtm->rtm_table != RT_TABLE_MAIN)
return;
char dst[INET_ADDRSTRLEN] = "0.0.0.0";
char gw[INET_ADDRSTRLEN] = "*";
int oif = 0;
struct rtattr *rta = RTM_RTA(rtm);
int rta_len = RTM_PAYLOAD(nlh);
while (RTA_OK(rta, rta_len)) {
switch (rta->rta_type) {
case RTA_DST:
inet_ntop(AF_INET, RTA_DATA(rta), dst, sizeof(dst));
break;
case RTA_GATEWAY:
inet_ntop(AF_INET, RTA_DATA(rta), gw, sizeof(gw));
break;
case RTA_OIF:
oif = *(int *)RTA_DATA(rta);
break;
}
rta = RTA_NEXT(rta, rta_len);
}
printf(" %-18s via %-15s dev index %d /%d\n",
dst, gw, oif, rtm->rtm_dst_len);
}
int main(void) {
int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
if (sock < 0) { perror("socket"); return 1; }
/* Bind to netlink */
struct sockaddr_nl sa = {
.nl_family = AF_NETLINK,
.nl_pid = getpid(),
};
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("bind");
close(sock);
return 1;
}
/* Request a dump of the routing table */
struct nl_request req = {
.hdr = {
.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
.nlmsg_type = RTM_GETROUTE,
.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
.nlmsg_seq = 1,
.nlmsg_pid = getpid(),
},
.msg = {
.rtm_family = AF_INET,
.rtm_table = RT_TABLE_MAIN,
},
};
if (send(sock, &req, req.hdr.nlmsg_len, 0) < 0) {
perror("send");
close(sock);
return 1;
}
/* Read the response */
printf("IPv4 Routing Table:\n");
printf(" %-18s %-17s %-15s %s\n",
"Destination", "Gateway", "Dev Index", "Prefix");
char buf[BUFSIZE];
int done = 0;
while (!done) {
ssize_t len = recv(sock, buf, sizeof(buf), 0);
if (len < 0) { perror("recv"); break; }
struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
while (NLMSG_OK(nlh, len)) {
if (nlh->nlmsg_type == NLMSG_DONE) {
done = 1;
break;
}
if (nlh->nlmsg_type == NLMSG_ERROR) {
fprintf(stderr, "Netlink error\n");
done = 1;
break;
}
parse_route(nlh);
nlh = NLMSG_NEXT(nlh, len);
}
}
close(sock);
return 0;
}
$ gcc -O2 -o netlink_routes netlink_routes.c && ./netlink_routes
IPv4 Routing Table:
Destination Gateway Dev Index Prefix
0.0.0.0            via 192.168.1.1     dev index 2 /0
192.168.1.0        via *               dev index 2 /24
172.17.0.0         via *               dev index 3 /16
Try It: Modify the program to also show IPv6 routes. Change rtm_family to AF_INET6 and use inet_ntop(AF_INET6, ...) with INET6_ADDRSTRLEN.
Monitoring Network Events
Netlink supports multicast. Subscribe to groups to receive real-time notifications when links go up/down or addresses change.
/* netlink_monitor.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>
#define BUFSIZE 8192
static const char *msg_type_str(int type) {
switch (type) {
case RTM_NEWLINK: return "NEW_LINK";
case RTM_DELLINK: return "DEL_LINK";
case RTM_NEWADDR: return "NEW_ADDR";
case RTM_DELADDR: return "DEL_ADDR";
case RTM_NEWROUTE: return "NEW_ROUTE";
case RTM_DELROUTE: return "DEL_ROUTE";
default: return "UNKNOWN";
}
}
int main(void) {
int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
if (sock < 0) { perror("socket"); return 1; }
struct sockaddr_nl sa = {
.nl_family = AF_NETLINK,
.nl_pid = getpid(),
.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE,
};
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("bind");
close(sock);
return 1;
}
printf("Monitoring network events (Ctrl+C to stop)...\n");
char buf[BUFSIZE];
while (1) {
ssize_t len = recv(sock, buf, sizeof(buf), 0);
if (len < 0) { perror("recv"); break; }
struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
while (NLMSG_OK(nlh, len)) {
printf("[%s] ", msg_type_str(nlh->nlmsg_type));
if (nlh->nlmsg_type == RTM_NEWLINK ||
nlh->nlmsg_type == RTM_DELLINK) {
struct ifinfomsg *ifi = NLMSG_DATA(nlh);
char ifname[IF_NAMESIZE];
if_indextoname(ifi->ifi_index, ifname);
printf("Interface: %s (index %d), flags=0x%x %s\n",
ifname, ifi->ifi_index, ifi->ifi_flags,
(ifi->ifi_flags & IFF_UP) ? "UP" : "DOWN");
} else {
printf("type=%d len=%d\n",
nlh->nlmsg_type, nlh->nlmsg_len);
}
nlh = NLMSG_NEXT(nlh, len);
}
}
close(sock);
return 0;
}
$ gcc -O2 -o netlink_monitor netlink_monitor.c && sudo ./netlink_monitor
Monitoring network events (Ctrl+C to stop)...
[NEW_LINK] Interface: eth0 (index 2), flags=0x1003 UP
[DEL_LINK] Interface: eth0 (index 2), flags=0x1002 DOWN
In another terminal, toggle an interface:
$ sudo ip link set eth0 down
$ sudo ip link set eth0 up
Netlink vs ioctl for Network Configuration
+------------------+----------------------------+---------------------------+
| Feature | Netlink | ioctl |
+------------------+----------------------------+---------------------------+
| Async events | Yes (multicast groups) | No (must poll) |
| Bulk queries | Yes (NLM_F_DUMP) | One item at a time |
| Extensibility | Attributes (TLV format) | Fixed struct size |
| Atomicity | Can batch operations | One operation per call |
| Modern tools | ip, iw use netlink | ifconfig uses ioctl |
| Complexity | Higher (message parsing) | Simpler (struct + call) |
+------------------+----------------------------+---------------------------+
The ip command uses netlink. The old ifconfig command uses ioctl. Netlink
is the modern, preferred interface.
Caution: Netlink messages must be properly aligned (NLMSG_ALIGN). Sending a message with wrong length or alignment can cause the kernel to reject it silently or return EINVAL.
Building a Simple Link Monitor (Complete Example)
This combines everything into a useful tool that watches for interface changes and prints their state.
/* link_watch.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>
#define BUFSIZE 8192
int main(void) {
int sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
if (sock < 0) { perror("socket"); return 1; }
struct sockaddr_nl sa = {
.nl_family = AF_NETLINK,
.nl_groups = RTMGRP_LINK,
};
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("bind");
close(sock);
return 1;
}
printf("%-20s %-10s %-8s %s\n",
"Time", "Interface", "Event", "State");
printf("%-20s %-10s %-8s %s\n",
"----", "---------", "-----", "-----");
char buf[BUFSIZE];
while (1) {
ssize_t len = recv(sock, buf, sizeof(buf), 0);
if (len < 0) break;
struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
while (NLMSG_OK(nlh, len)) {
if (nlh->nlmsg_type == RTM_NEWLINK) {
struct ifinfomsg *ifi = NLMSG_DATA(nlh);
char ifname[IF_NAMESIZE] = "???";
if_indextoname(ifi->ifi_index, ifname);
/* Get timestamp */
time_t now = time(NULL);
struct tm *tm = localtime(&now);
char timebuf[20];
strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S", tm);
const char *state;
if (ifi->ifi_flags & IFF_RUNNING)
state = "RUNNING";
else if (ifi->ifi_flags & IFF_UP)
state = "UP (no carrier)";
else
state = "DOWN";
printf("%-20s %-10s %-8s %s\n",
timebuf, ifname, "CHANGE", state);
fflush(stdout);
}
nlh = NLMSG_NEXT(nlh, len);
}
}
close(sock);
return 0;
}
Rust: Netlink with netlink-packet-route
The netlink-packet-route and netlink-sys crates provide structured
netlink access.
// Cargo.toml: libc = "0.2"
// (netlink-sys and netlink-packet-route wrap this plumbing; here we use
// a raw socket directly, mirroring the C version.)
use std::io;

fn main() -> io::Result<()> {
    // Low-level: use a raw socket like the C version
    let sock = unsafe {
        libc::socket(libc::AF_NETLINK, libc::SOCK_DGRAM, libc::NETLINK_ROUTE)
    };
    if sock < 0 {
        return Err(io::Error::last_os_error());
    }

    // Bind with the RTMGRP_LINK multicast group
    let mut sa: libc::sockaddr_nl = unsafe { std::mem::zeroed() };
    sa.nl_family = libc::AF_NETLINK as u16;
    sa.nl_groups = 1; // RTMGRP_LINK
    let ret = unsafe {
        libc::bind(
            sock,
            &sa as *const _ as *const libc::sockaddr,
            std::mem::size_of::<libc::sockaddr_nl>() as u32,
        )
    };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }

    println!("Monitoring link events (Ctrl+C to stop)...");
    let mut buf = [0u8; 8192];
    loop {
        let len = unsafe {
            libc::recv(sock, buf.as_mut_ptr() as *mut _, buf.len(), 0)
        };
        if len < 0 {
            return Err(io::Error::last_os_error());
        }
        println!("Received {} bytes of netlink data", len);
    }
}
For a higher-level approach, use the rtnetlink crate:
// Cargo.toml: rtnetlink = "0.13", futures = "0.3",
//             tokio = { version = "1", features = ["full"] }
use futures::stream::StreamExt;
use rtnetlink::new_connection;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);

    // List all links
    let mut links = handle.link().get().execute();
    while let Some(msg) = links.next().await {
        match msg {
            Ok(link) => {
                let index = link.header.index;
                println!("Link index {}: {:?}", index, link.attributes);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
        }
    }
    Ok(())
}
Rust Note: The rtnetlink crate is async and uses tokio. It provides a much higher-level API than raw netlink sockets, with proper message parsing and type safety. For production code, this is strongly preferred over raw socket manipulation.
NETLINK_GENERIC: The Extension Point
Generic netlink allows kernel modules and user-space programs to define custom message families without allocating a dedicated protocol number.
Flow:
1. Kernel module registers a generic netlink family ("my_family")
2. User-space resolves the family name to an ID via the controller
3. Communication proceeds using that dynamic ID
Tools like nl80211 (Wi-Fi configuration) and taskstats use generic
netlink.
$ genl-ctrl-list # (from libnl-utils)
0x0010 nlctrl version 2
0x0015 devlink version 1
0x001b nl80211 version 1
...
Driver Prep: Kernel modules that need a user-space communication channel often use generic netlink. When you write kernel modules, you'll use
genl_register_family()to create a netlink family, and user-space programs will talk to your module via generic netlink sockets. This is the modern alternative to creating a custom character device for every module.
Try It: Run
netlink_monitorin one terminal. In another terminal, runsudo ip addr add 10.99.99.1/24 dev loandsudo ip addr del 10.99.99.1/24 dev lo. Watch the NEW_ADDR and DEL_ADDR events appear.
Quick Knowledge Check
- What advantages does netlink have over ioctl for network configuration?
- What does nl_groups in sockaddr_nl control?
- Why does netlink use NLMSG_ALIGN and NLMSG_NEXT macros instead of simple pointer arithmetic?
Common Pitfalls
- Forgetting NLM_F_DUMP for bulk queries. Without it, you get one entry instead of the full table.
- Not checking NLMSG_DONE. The kernel sends multi-part responses. You must loop until you see NLMSG_DONE.
- Buffer too small. Netlink dumps can be large. Use at least 8 KB buffers, or better, 32 KB.
- Wrong nl_pid. Set it to getpid() or 0 (let the kernel assign). Using a conflicting PID causes EADDRINUSE.
- Ignoring NLMSG_ERROR. The kernel reports errors as netlink messages. Always check for error responses.
- Assuming message order. Multicast events can arrive between dump responses. Use sequence numbers to match requests with replies.
Preparing for Kernel Space
Everything in this book has been user-space code. But every concept -- pointers, bit manipulation, function pointers, state machines, memory layout -- was chosen because it maps directly to kernel programming. This chapter connects the dots: what changes when you cross into kernel space, and how your user-space skills translate.
What Changes When You Cross the Boundary
User space Kernel space
+----------------------------------+----------------------------------+
| libc available | No libc |
| malloc/free | kmalloc/kfree (with GFP flags) |
| printf | printk |
| Segfaults caught by kernel | Bugs crash the whole system |
| Virtual address space per process| Shared address space, all memory |
| Floating point available | No floating point (usually) |
| Large stack (8 MB default) | Tiny stack (8-16 KB) |
| User can be preempted freely | Must think about preemption |
| Errors return -1 and set errno | Functions return negative errno |
+----------------------------------+----------------------------------+
The kernel is freestanding C. No standard library, no heap by default, no safety net. Every technique we've practiced -- careful memory management, understanding alignment, defensive error handling -- becomes critical.
The Kernel's C Dialect
Kernel C is C11 (or later) with extensions and restrictions.
No Standard Library
You do not get #include <stdio.h>. Instead:
/* Kernel equivalents */
#include <linux/kernel.h> /* printk, container_of */
#include <linux/slab.h> /* kmalloc, kfree */
#include <linux/string.h> /* memcpy, strcmp (kernel versions) */
#include <linux/types.h> /* u8, u16, u32, u64, etc. */
printk replaces printf:
/* User space */
printf("value = %d\n", x);
/* Kernel space */
printk(KERN_INFO "value = %d\n", x);
/* or modern style: */
pr_info("value = %d\n", x);
No Floating Point
The kernel does not save/restore FPU state on context switches between kernel threads. Using floating point in kernel code silently corrupts user-space FPU registers.
/* WRONG in kernel code: */
double ratio = bytes / 1024.0; /* will corrupt user FPU state */
/* CORRECT: use integer math */
unsigned long ratio = bytes / 1024;
unsigned long remainder = bytes % 1024;
If you absolutely need floating point (rare), you must wrap it:
kernel_fpu_begin();
/* ... floating point operations ... */
kernel_fpu_end();
Limited Stack
The kernel stack is small and fixed: 16 KB on modern x86-64 kernels (8 KB, two pages, on older 32-bit x86). Allocating large arrays on the stack will overflow it -- historically with no guard page, just silent corruption.
/* WRONG in kernel code: */
char buffer[8192]; /* might overflow the entire kernel stack */
/* CORRECT: allocate on the heap */
char *buffer = kmalloc(8192, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
/* ... use buffer ... */
kfree(buffer);
GFP Flags
kmalloc takes a flags argument that specifies allocation context:
/* Can sleep (normal context, not in interrupt) */
ptr = kmalloc(size, GFP_KERNEL);
/* Cannot sleep (interrupt context, spinlock held) */
ptr = kmalloc(size, GFP_ATOMIC);
/* For DMA-able memory */
ptr = kmalloc(size, GFP_DMA);
Using GFP_KERNEL in interrupt context will deadlock the system. Using
GFP_ATOMIC wastes emergency memory reserves. Getting this right is
essential.
Caution: A wrong GFP flag is one of the most common kernel bugs. If you hold a spinlock or are in an interrupt handler, you must use
GFP_ATOMIC. If you useGFP_KERNELin that context, the allocator may sleep, and sleeping while holding a spinlock deadlocks the CPU.
Module Basics (Conceptual)
A kernel module is a .ko file loaded at runtime. The minimal structure:
/* hello_module.c -- conceptual, not compilable without kernel headers */
#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Hello world kernel module");
static int __init hello_init(void)
{
pr_info("hello: module loaded\n");
return 0; /* 0 = success */
}
static void __exit hello_exit(void)
{
pr_info("hello: module unloaded\n");
}
module_init(hello_init);
module_exit(hello_exit);
$ make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
$ sudo insmod hello.ko
$ dmesg | tail -1
[12345.678] hello: module loaded
$ sudo rmmod hello
The __init and __exit macros let the kernel free the init code after
loading and skip the exit code for built-in (non-modular) drivers.
How Your User-Space Skills Map to Kernel Code
list_head: The Kernel's Real Linked List
In Chapter 16, we built linked lists in C. The kernel uses struct list_head
-- an intrusive doubly-linked list that is embedded inside the data structure.
/* User space (from Ch16): */
struct node {
int data;
struct node *next;
};
/* Kernel: */
#include <linux/list.h>
struct my_item {
int data;
struct list_head list; /* embedded list node */
};
/* Usage: */
LIST_HEAD(my_list);
struct my_item *item = kmalloc(sizeof(*item), GFP_KERNEL);
item->data = 42;
list_add(&item->list, &my_list);
/* Iterate: */
struct my_item *pos;
list_for_each_entry(pos, &my_list, list) {
pr_info("data = %d\n", pos->data);
}
The container_of macro (which you may have implemented in Ch16) converts a
list_head pointer back to the containing structure. This is the same
technique, used everywhere in the kernel.
struct my_item layout:
+-----------+------------------+
| data (4B) | list_head (16B) |
| | .prev | .next |
+-----------+--------+---------+
^ ^
| |
item &item->list
container_of(&item->list, struct my_item, list) == item
Function Pointer vtables Become file_operations
In Chapter 18, we built vtables from function pointers. The kernel uses exactly the same pattern for its driver interfaces.
/* User space (from Ch18): */
struct Shape {
double (*area)(void *self);
void (*draw)(void *self);
};
/* Kernel: file_operations for a character device */
#include <linux/fs.h>
static int mydev_open(struct inode *i, struct file *f) { return 0; }
static ssize_t mydev_read(struct file *f, char __user *buf,
size_t len, loff_t *off) { return 0; }
static ssize_t mydev_write(struct file *f, const char __user *buf,
size_t len, loff_t *off) { return len; }
static long mydev_ioctl(struct file *f, unsigned int cmd,
unsigned long arg) { return 0; }
static int mydev_release(struct inode *i, struct file *f) { return 0; }
static const struct file_operations mydev_fops = {
.owner = THIS_MODULE,
.open = mydev_open,
.read = mydev_read,
.write = mydev_write,
.unlocked_ioctl = mydev_ioctl,
.release = mydev_release,
};
This is the same struct-of-function-pointers pattern. The kernel dispatches
open(), read(), write(), ioctl() through these pointers. Every
character device, block device, and network device uses this pattern.
Similarly, platform drivers:
/* Kernel: platform driver operations */
#include <linux/platform_device.h>
static int mydrv_probe(struct platform_device *pdev) { return 0; }
static int mydrv_remove(struct platform_device *pdev) { return 0; }
static struct platform_driver mydrv = {
.probe = mydrv_probe,
.remove = mydrv_remove,
.driver = {
.name = "my_device",
},
};
State Machines Become Driver Lifecycle
In Chapter 19, we built explicit state machines. Kernel drivers are state machines:
Driver Lifecycle State Machine:
[UNLOADED]
|
v module_init()
[LOADED]
|
v probe()
[BOUND TO DEVICE]
|
+----> suspend() --> [SUSPENDED]
| |
| resume() <-------+
|
v remove()
[UNBOUND]
|
v module_exit()
[UNLOADED]
Every transition has a corresponding callback in the driver structure. The patterns you practiced -- clear states, explicit transitions, error handling at each step -- are exactly what kernel drivers require.
Bit Manipulation Becomes Register Access
In Part III (Chapters 11-13), we covered bitwise operations, masks, and bit fields. In kernel drivers, you use these to read and write hardware registers.
/* User space (from Part III): */
#define BIT(n) (1UL << (n))
#define SET_BIT(val, n) ((val) | BIT(n))
#define CLR_BIT(val, n) ((val) & ~BIT(n))
/* Kernel: register access */
#include <linux/io.h>
#define REG_CONTROL 0x00
#define REG_STATUS 0x04
#define CTRL_ENABLE BIT(0)
#define CTRL_IRQ_EN BIT(1)
#define STATUS_BUSY BIT(7)
static void __iomem *base; /* memory-mapped register base */
/* Enable the device */
u32 val = readl(base + REG_CONTROL);
val |= CTRL_ENABLE | CTRL_IRQ_EN;
writel(val, base + REG_CONTROL);
/* Wait for not busy */
while (readl(base + REG_STATUS) & STATUS_BUSY)
cpu_relax();
readl and writel are memory-mapped I/O accessors that handle memory
barriers and prevent compiler reordering. The bit manipulation is identical
to what you learned.
Error Handling in the Kernel
The kernel returns negative errno values, not -1 with a separate errno
variable:
/* User space: */
int fd = open(path, O_RDONLY);
if (fd < 0) {
perror("open"); /* uses errno */
}
/* Kernel: */
static int mydrv_probe(struct platform_device *pdev)
{
void *buf = kmalloc(1024, GFP_KERNEL);
if (!buf)
return -ENOMEM; /* return the negative errno directly */
int irq = platform_get_irq(pdev, 0);
if (irq < 0)
return irq; /* pass through the error */
/* ... */
return 0; /* success */
}
The goto-based cleanup pattern from earlier chapters is the standard kernel idiom:
static int mydrv_probe(struct platform_device *pdev)
{
int ret;
void *buf = kmalloc(1024, GFP_KERNEL);
if (!buf)
return -ENOMEM;
ret = register_something();
if (ret)
goto err_free_buf;
ret = setup_irq();
if (ret)
goto err_unregister;
return 0;
err_unregister:
unregister_something();
err_free_buf:
kfree(buf);
return ret;
}
This pattern appears in virtually every kernel driver probe function.
Concurrency in the Kernel
The kernel is massively concurrent: multiple CPUs, interrupts, softirqs, workqueues. Everything from the chapters on threads and synchronization applies, but with kernel primitives:
+---------------------+-------------------------------+
| User space          | Kernel                        |
+---------------------+-------------------------------+
| pthread_mutex_t     | struct mutex                  |
| pthread_spinlock_t  | spinlock_t                    |
| sem_t               | struct semaphore              |
| atomic_int          | atomic_t, atomic_long_t       |
| pthread_cond_t      | wait_queue_head_t             |
| read-write lock     | rwlock_t, struct rw_semaphore |
+---------------------+-------------------------------+
The key difference: in interrupt context, you cannot sleep, so you must use spinlocks rather than mutexes.
Rust in the Kernel
The Linux kernel has experimental Rust support. Kernel Rust has the same restrictions as kernel C: no standard library, no heap unless explicitly allocated, no floating point.
// Conceptual kernel module in Rust (requires Rust-for-Linux)
use kernel::prelude::*;

module! {
    type: MyModule,
    name: "my_module",
    license: "GPL",
}

struct MyModule;

impl kernel::Module for MyModule {
    fn init(_module: &'static ThisModule) -> Result<Self> {
        pr_info!("Hello from Rust kernel module!\n");
        Ok(MyModule)
    }
}

impl Drop for MyModule {
    fn drop(&mut self) {
        pr_info!("Goodbye from Rust kernel module!\n");
    }
}
Rust kernel modules use:
- The kernel:: crate instead of std::
- pr_info! instead of println!
- Result<T> with kernel error types
- Box backed by kmalloc
- The borrow checker, which prevents most use-after-free and data race bugs at compile time
Rust Note: Rust in the kernel is not a replacement for C. It's an additional language option for new drivers and subsystems. Existing kernel C code will not be rewritten. Understanding C kernel programming is still essential even if you plan to write Rust kernel modules.
The Complete Mapping
Here is how every major topic from this book connects to kernel programming:
+------------------------------+-----------------------------------------+
| Book Chapter / Topic         | Kernel Equivalent                       |
+------------------------------+-----------------------------------------+
| Pointers (Ch6-7)             | __user pointers, void __iomem *         |
| Structs (Ch8-9)              | Every kernel data structure             |
| Bit manipulation (Ch11-13)   | Register access, flag fields            |
| Linked lists (Ch16)          | struct list_head, hlist_head            |
| Function pointers (Ch18)     | file_operations, driver ops             |
| State machines (Ch19)        | Driver probe/remove/suspend/resume      |
| Opaque types (Ch20)          | struct device, struct file (internals)  |
| Build system (Ch24-27)       | Kbuild, Kconfig, make menuconfig        |
| File descriptors (Ch28-31)   | struct file, VFS layer                  |
| Processes (Ch32-34)          | kthread, workqueue                      |
| Signals (Ch35-37)            | Kernel signal delivery                  |
| Memory mapping (Ch38-40)     | ioremap, DMA mapping                    |
| Threads (Ch41-43)            | kthread, per-cpu variables              |
| Synchronization (Ch44-46)    | spinlock, mutex, RCU                    |
| Networking (Ch47-49)         | sk_buff, net_device, socket layer       |
| Optimization (Ch50)          | Cache-aligned structs, likely/unlikely  |
| Arenas/pools (Ch51)          | Slab allocator (kmem_cache)             |
| Atomics (Ch52)               | atomic_t, memory barriers               |
| /proc and /sys (Ch53)        | Creating procfs/sysfs entries           |
| ioctl (Ch54)                 | Implementing .unlocked_ioctl            |
| Netlink (Ch55)               | genl_register_family()                  |
+------------------------------+-----------------------------------------+
What to Study Next
- Linux Device Drivers (LDD3) -- the classic reference. Some APIs have changed, but the concepts are timeless.
- The kernel source itself -- drivers/ contains thousands of real examples. Start with simple ones like drivers/misc/.
- QEMU + buildroot -- build a minimal Linux system and test your modules in a VM. No risk of crashing your real machine.
- Kernel documentation -- Documentation/ in the kernel tree. Especially driver-api/ and core-api/.
- Rust for Linux -- if you want to write kernel modules in Rust, follow the rust-for-linux project.
- Write a character device driver -- your first kernel project should be a simple character device that implements open, read, write, and ioctl. You already know every concept required.
Driver Prep: This is it. You've learned the user-space foundations. Every concept in this book -- from pointers to atomics, from bit manipulation to state machines -- was chosen because it's essential in kernel and driver code. You're ready.
Try It: Download the kernel source. Navigate to drivers/misc/dummy-irq.c or drivers/misc/eeprom/at24.c. Read the code. You should recognize the patterns: module init/exit, probe/remove, file_operations, error handling with goto, bit manipulation for registers. If you can read and understand a real kernel driver, you've succeeded.
Quick Knowledge Check
- Why can't you use printf in kernel code?
- What happens if you use GFP_KERNEL inside an interrupt handler?
- How does container_of work, and why is it essential for kernel linked lists?
Common Pitfalls
- Using malloc in kernel code. There is no libc. Use kmalloc.
- Large stack allocations. The kernel stack is 8-16 KB. Allocate large buffers with kmalloc.
- Sleeping in atomic context. If you hold a spinlock or are in an interrupt handler, you must not call anything that might sleep (kmalloc(GFP_KERNEL), mutex_lock(), copy_from_user() -- yes, even that can sleep).
- Forgetting to free on error paths. The goto cleanup pattern exists for a reason. Every resource acquired must be released in reverse order.
- Accessing user pointers directly. Always use copy_from_user() / copy_to_user(). Direct access crashes on SMAP-enabled CPUs and is a security vulnerability.
- No error checking. Every kernel function that can fail must be checked. The kernel does not tolerate ignored errors the way user space sometimes does.
Appendix A: C Standard Library Quick Reference
This appendix covers the most important C standard library functions for systems programming, organized by header. Each entry lists the signature, purpose, and common pitfalls.
stdio.h -- Standard I/O
| Function | Signature | Purpose |
|---|---|---|
printf | int printf(const char *fmt, ...) | Print formatted output to stdout |
fprintf | int fprintf(FILE *stream, const char *fmt, ...) | Print to any stream |
snprintf | int snprintf(char *buf, size_t n, const char *fmt, ...) | Safe formatted string |
scanf | int scanf(const char *fmt, ...) | Read formatted input from stdin |
fopen | FILE *fopen(const char *path, const char *mode) | Open a file stream |
fclose | int fclose(FILE *stream) | Close a file stream |
fread | size_t fread(void *ptr, size_t size, size_t n, FILE *s) | Binary read |
fwrite | size_t fwrite(const void *ptr, size_t size, size_t n, FILE *s) | Binary write |
fgets | char *fgets(char *s, int n, FILE *stream) | Read a line (safe) |
fputs | int fputs(const char *s, FILE *stream) | Write a string |
fseek | int fseek(FILE *stream, long offset, int whence) | Seek in stream |
ftell | long ftell(FILE *stream) | Current position |
fflush | int fflush(FILE *stream) | Flush output buffer |
perror | void perror(const char *s) | Print errno message |
feof | int feof(FILE *stream) | Check end-of-file |
ferror | int ferror(FILE *stream) | Check stream error |
Pitfalls:
- sprintf has no bounds checking. Always use snprintf.
- gets was removed in C11. Never use it. Use fgets.
- scanf("%s", buf) has no bounds checking. Use scanf("%63s", buf) with a width specifier.
- feof returns true only after a read attempt fails. Don't use it as a loop condition before reading.
- fclose on an already-closed FILE * is undefined behavior.
stdlib.h -- General Utilities
| Function | Signature | Purpose |
|---|---|---|
malloc | void *malloc(size_t size) | Allocate memory |
calloc | void *calloc(size_t n, size_t size) | Allocate zeroed memory |
realloc | void *realloc(void *ptr, size_t size) | Resize allocation |
free | void free(void *ptr) | Free memory |
exit | void exit(int status) | Terminate process |
atexit | int atexit(void (*fn)(void)) | Register exit handler |
atoi | int atoi(const char *s) | String to int (unsafe) |
strtol | long strtol(const char *s, char **end, int base) | String to long (safe) |
strtoul | unsigned long strtoul(const char *s, char **end, int base) | String to unsigned long |
strtod | double strtod(const char *s, char **end) | String to double |
abs | int abs(int n) | Absolute value |
rand | int rand(void) | Pseudo-random integer |
srand | void srand(unsigned seed) | Seed random generator |
qsort | void qsort(void *base, size_t n, size_t size, int (*cmp)(const void *, const void *)) | Sort array |
bsearch | void *bsearch(const void *key, const void *base, ...) | Binary search |
getenv | char *getenv(const char *name) | Get environment variable |
system | int system(const char *cmd) | Run shell command |
Pitfalls:
- atoi returns 0 on error, which is indistinguishable from converting "0". Use strtol instead.
- realloc(ptr, 0) behavior is implementation-defined. Don't rely on it.
- realloc on failure returns NULL but doesn't free the original. Save the original pointer: tmp = realloc(ptr, n); if (tmp) ptr = tmp;
- rand() is not cryptographically secure. Use getrandom() for security.
- system() is a shell injection risk. Avoid it in production code.
string.h -- String and Memory Operations
| Function | Signature | Purpose |
|---|---|---|
strlen | size_t strlen(const char *s) | String length |
strcpy | char *strcpy(char *dst, const char *src) | Copy string (unsafe) |
strncpy | char *strncpy(char *dst, const char *src, size_t n) | Copy with limit |
strcat | char *strcat(char *dst, const char *src) | Concatenate (unsafe) |
strncat | char *strncat(char *dst, const char *src, size_t n) | Concatenate with limit |
strcmp | int strcmp(const char *a, const char *b) | Compare strings |
strncmp | int strncmp(const char *a, const char *b, size_t n) | Compare n bytes |
strchr | char *strchr(const char *s, int c) | Find char in string |
strrchr | char *strrchr(const char *s, int c) | Find char (from end) |
strstr | char *strstr(const char *haystack, const char *needle) | Find substring |
strtok | char *strtok(char *s, const char *delim) | Tokenize (modifies string) |
memcpy | void *memcpy(void *dst, const void *src, size_t n) | Copy memory (no overlap) |
memmove | void *memmove(void *dst, const void *src, size_t n) | Copy memory (overlap ok) |
memset | void *memset(void *s, int c, size_t n) | Fill memory |
memcmp | int memcmp(const void *a, const void *b, size_t n) | Compare memory |
strerror | char *strerror(int errnum) | Error number to string |
Pitfalls:
- strncpy does NOT null-terminate if src is longer than n. Always: strncpy(dst, src, n-1); dst[n-1] = '\0';
- memcpy with overlapping regions is undefined behavior. Use memmove.
- strtok keeps static state between calls and is not thread-safe. Use strtok_r.
- strcmp returns 0 for equal strings (not 1). The return value is the difference, not a boolean.
math.h -- Mathematics
| Function | Signature | Purpose |
|---|---|---|
sqrt | double sqrt(double x) | Square root |
pow | double pow(double base, double exp) | Power |
fabs | double fabs(double x) | Absolute value |
ceil | double ceil(double x) | Round up |
floor | double floor(double x) | Round down |
round | double round(double x) | Round to nearest |
log | double log(double x) | Natural logarithm |
log10 | double log10(double x) | Base-10 logarithm |
sin/cos/tan | double sin(double x) | Trig functions (radians) |
Pitfalls:
- Link with -lm on Linux. Math functions are in libm, not libc.
- sqrt(-1) returns NaN, not an error. Check with isnan().
ctype.h -- Character Classification
| Function | Purpose | Example |
|---|---|---|
isalpha(c) | Letter? | isalpha('A') = true |
isdigit(c) | Digit? | isdigit('5') = true |
isalnum(c) | Letter or digit? | |
isspace(c) | Whitespace? | isspace(' ') = true |
isupper(c) / islower(c) | Case check | |
toupper(c) / tolower(c) | Case conversion |
Pitfalls:
- The argument must be an unsigned char value or EOF. Passing a signed char with a negative value is undefined behavior. Cast first: isalpha((unsigned char)c).
errno.h -- Error Numbers
Common errno values:
| Value | Name | Meaning |
|---|---|---|
| 1 | EPERM | Operation not permitted |
| 2 | ENOENT | No such file or directory |
| 5 | EIO | I/O error |
| 9 | EBADF | Bad file descriptor |
| 11 | EAGAIN | Try again (non-blocking) |
| 12 | ENOMEM | Out of memory |
| 13 | EACCES | Permission denied |
| 17 | EEXIST | File exists |
| 22 | EINVAL | Invalid argument |
| 28 | ENOSPC | No space left on device |
| 32 | EPIPE | Broken pipe |
Pitfalls:
- errno is only valid immediately after a failed call. Even successful calls may modify it.
- errno is thread-local in glibc (actually a macro that expands to (*__errno_location())).
signal.h -- Signal Handling
| Function | Signature | Purpose |
|---|---|---|
signal | void (*signal(int sig, void (*handler)(int)))(int) | Set signal handler |
raise | int raise(int sig) | Send signal to self |
kill | int kill(pid_t pid, int sig) | Send signal to process |
sigaction | int sigaction(int sig, const struct sigaction *act, ...) | Set handler (preferred) |
sigprocmask | int sigprocmask(int how, const sigset_t *set, ...) | Block/unblock signals |
Pitfalls:
- signal() behavior varies across systems. Use sigaction() instead.
- Signal handlers must only call async-signal-safe functions. No printf, no malloc.
time.h -- Time Functions
| Function | Signature | Purpose |
|---|---|---|
time | time_t time(time_t *tloc) | Current time (seconds) |
clock | clock_t clock(void) | CPU time used |
difftime | double difftime(time_t t1, time_t t0) | Time difference |
localtime | struct tm *localtime(const time_t *t) | Convert to local time |
gmtime | struct tm *gmtime(const time_t *t) | Convert to UTC |
strftime | size_t strftime(char *s, size_t max, const char *fmt, ...) | Format time string |
clock_gettime | int clock_gettime(clockid_t id, struct timespec *tp) | High-resolution time |
Pitfalls:
- localtime returns a pointer to a static buffer (not thread-safe). Use localtime_r.
- clock() measures CPU time, not wall time. Use clock_gettime(CLOCK_MONOTONIC) for benchmarks.
unistd.h -- POSIX System Interface
| Function | Signature | Purpose |
|---|---|---|
read | ssize_t read(int fd, void *buf, size_t count) | Read from fd |
write | ssize_t write(int fd, const void *buf, size_t count) | Write to fd |
close | int close(int fd) | Close fd |
lseek | off_t lseek(int fd, off_t offset, int whence) | Seek in fd |
fork | pid_t fork(void) | Create child process |
exec* | int execvp(const char *file, char *const argv[]) | Replace process |
pipe | int pipe(int pipefd[2]) | Create pipe |
dup2 | int dup2(int oldfd, int newfd) | Duplicate fd |
getpid | pid_t getpid(void) | Current PID |
getcwd | char *getcwd(char *buf, size_t size) | Current directory |
sleep | unsigned sleep(unsigned seconds) | Sleep |
usleep | int usleep(useconds_t usec) | Sleep (microseconds) |
unlink | int unlink(const char *path) | Delete file |
access | int access(const char *path, int mode) | Check file access |
Pitfalls:
- read and write may transfer fewer bytes than requested (short reads/writes). Always loop.
- close can fail. Check the return value, especially for files opened with O_SYNC.
- fork + exec is POSIX, not standard C. Use posix_spawn when possible.
sys/types.h -- System Data Types
| Type | Purpose |
|---|---|
pid_t | Process ID |
uid_t / gid_t | User/group ID |
off_t | File offset |
size_t | Unsigned size |
ssize_t | Signed size (for error returns) |
mode_t | File permissions |
dev_t | Device number |
ino_t | Inode number |
time_t | Time in seconds |
Pitfalls:
- size_t is unsigned. Subtracting two size_t values can wrap to a huge number. Use ssize_t or cast carefully.
- off_t is 32-bit by default on 32-bit systems. Define _FILE_OFFSET_BITS=64 for large file support.
Appendix B: Rust std for C Programmers
This appendix maps C standard library functions and patterns to their Rust equivalents. If you know the C function, find the Rust way to do the same thing.
I/O: stdio.h -> std::io, std::fs
| C | Rust | Notes |
|---|---|---|
printf("x=%d\n", x) | println!("x={x}") | Format macros |
fprintf(stderr, ...) | eprintln!(...) | Stderr output |
sprintf(buf, ...) | format!(...) | Returns String |
snprintf(buf, n, ...) | write!(buf, ...) | Into any Write impl |
fopen(path, "r") | File::open(path) | Returns Result<File> |
fopen(path, "w") | File::create(path) | Truncates existing |
fclose(f) | (automatic via Drop) | RAII closes the file |
fread(buf, 1, n, f) | f.read(&mut buf) | Read trait |
fwrite(buf, 1, n, f) | f.write_all(&buf) | Write trait |
fgets(buf, n, f) | reader.read_line(&mut s) | BufRead trait |
fseek(f, off, SEEK_SET) | f.seek(SeekFrom::Start(off)) | Seek trait |
fflush(f) | f.flush() | Write trait |
perror("msg") | eprintln!("msg: {e}") | Where e is the error |
// Reading a file in Rust (C equivalent: fopen + fread + fclose)
use std::fs;

let contents = fs::read_to_string("file.txt").expect("read failed");

// Line-by-line reading (C equivalent: fgets loop)
use std::io::{self, BufRead};

let file = fs::File::open("file.txt").expect("open failed");
for line in io::BufReader::new(file).lines() {
    let line = line.expect("read line failed");
    println!("{line}");
}
Memory: stdlib.h -> Box, Vec, ownership
| C | Rust | Notes |
|---|---|---|
malloc(n) | Box::new(value) | Single heap object |
malloc(n * sizeof(T)) | Vec::with_capacity(n) | Dynamic array |
calloc(n, size) | vec![0; n] | Zeroed allocation |
realloc(p, new_size) | v.reserve(additional) | Vec grows automatically |
free(p) | (automatic via Drop) | RAII frees memory |
memcpy(dst, src, n) | dst.copy_from_slice(src) | Slices |
memmove(dst, src, n) | slice.copy_within(range, dest) | Overlapping |
memset(p, 0, n) | buf.fill(0) | Fill slice |
memcmp(a, b, n) | a == b or a.cmp(&b) | Slice comparison |
// Heap allocation (C: malloc + use + free)
let boxed = Box::new(42);   // heap-allocated i32
println!("{boxed}");        // auto-deref
// freed automatically when boxed goes out of scope

// Dynamic array (C: malloc + realloc pattern)
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
// no manual realloc or free needed
Strings: string.h -> String, &str
| C | Rust | Notes |
|---|---|---|
strlen(s) | s.len() | O(1) in Rust (stored length) |
strcpy(dst, src) | let dst = src.to_string() | New allocation |
strcat(dst, src) | dst.push_str(src) | Append to String |
strcmp(a, b) | a == b | Direct comparison |
strchr(s, c) | s.find(c) | Returns Option<usize> |
strstr(hay, needle) | hay.find(needle) | Returns Option<usize> |
strtok(s, delim) | s.split(delim) | Returns iterator |
strtol(s, NULL, 10) | s.parse::<i64>() | Returns Result |
atoi(s) | s.parse::<i32>().unwrap() | Panics on failure |
// String operations (C equivalents in comments)
let s = String::from("hello");   // like strdup("hello")
let len = s.len();               // like strlen(s)
let upper = s.to_uppercase();    // no C equivalent in string.h

// Splitting (C: strtok loop)
let csv = "a,b,c,d";
let parts: Vec<&str> = csv.split(',').collect();
// parts = ["a", "b", "c", "d"]

// Parsing numbers (C: strtol)
let n: i64 = "42".parse().expect("not a number");
The &str vs String distinction
C: const char * (pointer to existing string) <--> &str
char *buf (owned, mutable buffer) <--> String
Rust rule: use &str for parameters, String for owned data.
Math: math.h -> std::f64, num traits
| C | Rust | Notes |
|---|---|---|
sqrt(x) | x.sqrt() or f64::sqrt(x) | Method on f64 |
pow(x, y) | x.powi(n) / x.powf(y) | Integer/float exponent |
fabs(x) | x.abs() | Method |
ceil(x) | x.ceil() | Method |
floor(x) | x.floor() | Method |
round(x) | x.round() | Method |
log(x) | x.ln() | Natural log |
log10(x) | x.log10() | Base-10 log |
sin(x) | x.sin() | Radians |
INFINITY | f64::INFINITY | Associated constant |
NAN | f64::NAN | Associated constant |
isnan(x) | x.is_nan() | Method |
No -lm flag needed. Math is built into the primitive types.
Process: unistd.h, stdlib.h -> std::process
| C | Rust | Notes |
|---|---|---|
fork() | (no direct equivalent) | Use Command::new() |
exec*() | Command::new(prog).exec() | Via std::os::unix |
system(cmd) | Command::new("sh").arg("-c").arg(cmd).status() | |
exit(n) | std::process::exit(n) | |
getpid() | std::process::id() | Returns u32 |
getenv("PATH") | std::env::var("PATH") | Returns Result<String> |
pipe() | Command::new(...).stdin(Stdio::piped()) | |
waitpid() | child.wait() | Returns ExitStatus |
// Running a child process (C: fork + exec + wait)
use std::process::Command;

let output = Command::new("ls")
    .arg("-la")
    .output()
    .expect("failed to execute");
println!("stdout: {}", String::from_utf8_lossy(&output.stdout));
println!("exit: {}", output.status);
Networking: sys/socket.h -> std::net
| C | Rust | Notes |
|---|---|---|
socket(AF_INET, SOCK_STREAM, 0) | TcpListener::bind(addr) | Combined |
connect() | TcpStream::connect(addr) | |
bind() + listen() | TcpListener::bind(addr) | |
accept() | listener.accept() | Returns (TcpStream, SocketAddr) |
send(fd, buf, n, 0) | stream.write_all(&buf) | Write trait |
recv(fd, buf, n, 0) | stream.read(&mut buf) | Read trait |
close(fd) | (automatic via Drop) | |
inet_ntop() | addr.to_string() | Display trait |
htons(port) | (automatic) | Rust handles byte order |
// TCP echo server (C equivalent: socket + bind + listen + accept loop)
use std::net::TcpListener;
use std::io::{Read, Write};

let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
for stream in listener.incoming() {
    let mut stream = stream.unwrap();
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf).unwrap();
    stream.write_all(&buf[..n]).unwrap();
}
Error Handling: errno -> Result<T, E>
| C pattern | Rust equivalent |
|---|---|
Return -1 and set errno | Return Err(io::Error) |
Check if (ret < 0) | match or ? operator |
perror("msg") | eprintln!("msg: {e}") |
strerror(errno) | e.to_string() |
// The ? operator replaces C error-checking boilerplate
use std::fs;
use std::io;

fn read_config() -> io::Result<String> {
    let contents = fs::read_to_string("/etc/myapp.conf")?;  // returns Err on failure
    Ok(contents)
}
Key Traits Every C Programmer Should Know
Read and Write
use std::io::{self, Read, Write};

// Anything that implements Read can be read from:
// File, TcpStream, &[u8], Stdin, ...
fn process_input(mut reader: impl Read) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}
Iterator
Replaces C for-loops over arrays and linked lists.
// C: for (int i = 0; i < n; i++) sum += arr[i];
let arr = [1, 2, 3, 4];
let sum: i32 = arr.iter().sum();

// C: filter + transform loop
let even_squares: Vec<i32> = (0..10)
    .filter(|x| x % 2 == 0)
    .map(|x| x * x)
    .collect();
Display and Debug
use std::fmt;

// Display: for user-facing output (like a custom printf format)
struct MyType { value: i32 }

impl fmt::Display for MyType {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "MyType({})", self.value)
    }
}

// Debug: for developer output (derive it)
#[derive(Debug)]
struct Point { x: f64, y: f64 }

let p = Point { x: 1.0, y: 2.0 };
println!("{p:?}");   // Point { x: 1.0, y: 2.0 }
From and Into
// From<T>: conversion from one type to another
// Replaces C's explicit casting and conversion functions
let s: String = String::from("hello");
let n: i64 = i64::from(42i32);

// Into<T>: the reverse direction (auto-derived from From)
fn takes_string(s: String) { /* ... */ }
takes_string("hello".into());   // &str -> String via Into
Clone and Copy
// Copy: bitwise copy (like C assignment for simple types)
// Applies to: integers, floats, bool, char, references
let a: i32 = 5;
let b = a;   // copy, both valid

// Clone: explicit deep copy (like manual malloc + memcpy)
let s1 = String::from("hello");
let s2 = s1.clone();   // explicit deep copy
// Both s1 and s2 are valid
Concurrency: pthreads -> std::thread, std::sync
| C (pthreads) | Rust | Notes |
|---|---|---|
pthread_create() | thread::spawn(closure) | Closure captures data |
pthread_join() | handle.join() | Returns Result |
pthread_mutex_t | Mutex<T> | Data inside the mutex |
pthread_rwlock_t | RwLock<T> | |
pthread_cond_t | Condvar | |
sem_t | (use Mutex + Condvar) | No direct equivalent |
atomic_int | AtomicI32 | std::sync::atomic |
| (shared data) | Arc<T> | Thread-safe ref counting |
use std::sync::{Arc, Mutex};
use std::thread;

let counter = Arc::new(Mutex::new(0));
let handles: Vec<_> = (0..4).map(|_| {
    let c = Arc::clone(&counter);
    thread::spawn(move || {
        let mut num = c.lock().unwrap();
        *num += 1;
    })
}).collect();

for h in handles {
    h.join().unwrap();
}
println!("Result: {}", *counter.lock().unwrap());
Time: time.h -> std::time
| C | Rust | Notes |
|---|---|---|
time(NULL) | SystemTime::now() | Wall clock |
clock_gettime(CLOCK_MONOTONIC) | Instant::now() | For benchmarks |
sleep(n) | thread::sleep(Duration::from_secs(n)) | |
difftime(t1, t0) | t1.duration_since(t0) | Returns Duration |
strftime(...) | (use chrono crate) | No built-in formatting |
use std::time::Instant;

let start = Instant::now();
// ... work ...
let elapsed = start.elapsed();
println!("Took {elapsed:?}");
Collections: Manual C -> std::collections
| C pattern | Rust type | Notes |
|---|---|---|
T array[N] | [T; N] | Fixed-size array |
T *arr + len | Vec<T> | Dynamic array |
| Hash table (hand-rolled) | HashMap<K, V> | |
| Binary tree (hand-rolled) | BTreeMap<K, V> | Sorted |
| Linked list (hand-rolled) | LinkedList<T> | (rarely used) |
| Bit set (manual) | HashSet<T> or bitflags crate | |
| Ring buffer (manual) | VecDeque<T> | Double-ended queue |
Quick Conversion Cheatsheet
C type --> Rust type
int i32
unsigned int u32
long i64 (on 64-bit Linux)
size_t usize
ssize_t isize
char u8 (byte) or char (Unicode)
char * &str (borrowed) or String (owned)
void * *const u8 / *mut u8 or &[u8]
NULL None (in Option<T>)
FILE * File (in std::fs)
bool (C99) bool
_Bool bool
Appendix C: Linux System Call Reference
This appendix provides a categorized reference of the key system calls covered in this book. For each syscall: the signature, purpose, common errno values, and the chapter where it's covered in detail.
File I/O
open
#include <fcntl.h>
int open(const char *pathname, int flags, ... /* mode_t mode */);
Opens or creates a file. Returns a file descriptor.
Flags: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC, O_APPEND,
O_NONBLOCK, O_CLOEXEC.
Errno: ENOENT (not found), EACCES (permission), EEXIST (with
O_EXCL), EMFILE (too many open fds).
Chapter: 28 (File Descriptors)
read / write
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
Read/write bytes. May transfer fewer than requested (short read/write).
Errno: EBADF (bad fd), EINTR (interrupted by signal), EAGAIN
(non-blocking and would block), EIO (hardware error).
Chapter: 28-29 (File I/O)
close
#include <unistd.h>
int close(int fd);
Closes a file descriptor. Releases kernel resources.
Errno: EBADF (not a valid fd), EIO (error flushing data).
Chapter: 28
lseek
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
Repositions file offset. whence: SEEK_SET, SEEK_CUR, SEEK_END.
Errno: EBADF, EINVAL (invalid whence), ESPIPE (fd is a pipe).
Chapter: 29
stat / fstat / lstat
#include <sys/stat.h>
int stat(const char *path, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *path, struct stat *buf); /* doesn't follow symlinks */
Get file metadata: size, permissions, timestamps, inode.
Errno: ENOENT, EACCES, EBADF (fstat).
Chapter: 30
dup2
#include <unistd.h>
int dup2(int oldfd, int newfd);
Duplicates oldfd to newfd, closing newfd first if open.
Errno: EBADF, EMFILE.
Chapter: 31
sendfile
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
Zero-copy transfer between file descriptors (in_fd must be mmappable).
Errno: EAGAIN, EBADF, EINVAL (fd types not supported).
Chapter: 52
mmap / munmap
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
Map files or devices into memory. Also used for anonymous memory allocation.
Prot: PROT_READ, PROT_WRITE, PROT_EXEC.
Flags: MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, MAP_FIXED.
Errno: EACCES, ENOMEM, EINVAL (bad alignment or length).
Chapter: 39
Process Management
fork
#include <unistd.h>
pid_t fork(void);
Creates a child process. Returns 0 in child, child PID in parent, -1 on error.
Errno: ENOMEM, EAGAIN (process limit).
Chapter: 32
exec family
#include <unistd.h>
int execve(const char *path, char *const argv[], char *const envp[]);
int execvp(const char *file, char *const argv[]);
int execlp(const char *file, const char *arg, ... /* (char *)NULL */);
Replace current process image. Does not return on success.
Errno: ENOENT, EACCES, ENOEXEC (bad format).
Chapter: 33
wait / waitpid
#include <sys/wait.h>
pid_t wait(int *wstatus);
pid_t waitpid(pid_t pid, int *wstatus, int options);
Wait for child process state change. Use WIFEXITED, WEXITSTATUS macros
to interpret status.
Options: WNOHANG (don't block), WUNTRACED.
Errno: ECHILD (no children), EINTR.
Chapter: 33
_exit / exit_group
#include <unistd.h>
void _exit(int status);
Terminate process immediately (no atexit handlers, no stdio flush).
Chapter: 32
getpid / getppid
#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);
Get current process ID or parent process ID. Never fails.
Chapter: 32
Signal Handling
sigaction
#include <signal.h>
int sigaction(int signum, const struct sigaction *act,
struct sigaction *oldact);
Set signal handler with full control over flags and signal mask.
Errno: EINVAL (invalid signal or attempt to handle SIGKILL/SIGSTOP).
Chapter: 35-36
kill
#include <signal.h>
int kill(pid_t pid, int sig);
Send signal to process or process group.
Errno: ESRCH (no such process), EPERM (permission).
Chapter: 35
sigprocmask
#include <signal.h>
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
Block or unblock signals. how: SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK.
Chapter: 36
signalfd
#include <sys/signalfd.h>
int signalfd(int fd, const sigset_t *mask, int flags);
Receive signals via a file descriptor (pollable).
Errno: EINVAL, ENOMEM.
Chapter: 37
Memory Management
brk / sbrk
#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);
Adjust the program break (heap end). Rarely used directly; malloc calls
these internally.
Chapter: 38
mprotect
#include <sys/mman.h>
int mprotect(void *addr, size_t len, int prot);
Change protection on a memory region.
Errno: EACCES, EINVAL, ENOMEM.
Chapter: 39
madvise
#include <sys/mman.h>
int madvise(void *addr, size_t length, int advice);
Advise kernel on memory usage patterns. MADV_SEQUENTIAL, MADV_DONTNEED,
MADV_HUGEPAGE.
Chapter: 39
IPC (Inter-Process Communication)
pipe / pipe2
#include <unistd.h>
int pipe(int pipefd[2]);
int pipe2(int pipefd[2], int flags); /* O_CLOEXEC, O_NONBLOCK */
Create a unidirectional data channel. pipefd[0] for reading, pipefd[1]
for writing.
Errno: EMFILE, ENFILE.
Chapter: 34
socketpair
#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sv[2]);
Create a pair of connected sockets (bidirectional pipe).
Chapter: 34
shmget / shmat / shmdt
#include <sys/shm.h>
int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);
System V shared memory. Prefer mmap(MAP_SHARED) for new code.
Chapter: 40
Networking
socket
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
Create a socket. Returns a file descriptor.
Domain: AF_INET, AF_INET6, AF_UNIX, AF_NETLINK.
Type: SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW.
Errno: EAFNOSUPPORT, EMFILE, ENOMEM.
Chapter: 47
bind
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
Assign address to socket.
Errno: EADDRINUSE, EACCES (privileged port), EINVAL.
Chapter: 47
listen / accept
int listen(int sockfd, int backlog);
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
Listen for connections and accept them.
Errno: EAGAIN (non-blocking), EMFILE, EINTR.
Chapter: 47-48
connect
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
Connect to a remote address.
Errno: ECONNREFUSED, ETIMEDOUT, ENETUNREACH, EINPROGRESS
(non-blocking).
Chapter: 47
send / recv / sendto / recvfrom
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
const struct sockaddr *dest, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
struct sockaddr *src, socklen_t *addrlen);
Send/receive data on sockets. sendto/recvfrom for UDP.
Flags: MSG_DONTWAIT, MSG_PEEK, MSG_NOSIGNAL.
Chapter: 47-48
epoll_create1 / epoll_ctl / epoll_wait
#include <sys/epoll.h>
int epoll_create1(int flags);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Scalable I/O event notification. op: EPOLL_CTL_ADD, EPOLL_CTL_MOD,
EPOLL_CTL_DEL.
Chapter: 49
Device Interaction
ioctl
#include <sys/ioctl.h>
int ioctl(int fd, unsigned long request, ...);
Device-specific control operations.
Errno: ENOTTY (wrong device type), EINVAL (bad request), EFAULT
(bad pointer).
Chapter: 54
Thread-Related
clone (underlying syscall for pthread_create)
#include <sched.h>
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...);
Create a new process/thread with fine-grained sharing control.
Flags: CLONE_VM, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD.
Chapter: 41
futex
#include <linux/futex.h>   /* FUTEX_* constants */
#include <sys/syscall.h>
long syscall(SYS_futex, uint32_t *uaddr, int futex_op, uint32_t val, ...);
Fast user-space locking primitive; glibc provides no wrapper, so it is
invoked through syscall(). The building block for the pthread mutex
implementation.
Operations: FUTEX_WAIT, FUTEX_WAKE.
Chapter: 44
Miscellaneous
getrandom
#include <sys/random.h>
ssize_t getrandom(void *buf, size_t buflen, unsigned int flags);
Cryptographically secure random bytes.
Flags: GRND_NONBLOCK, GRND_RANDOM.
Chapter: Referenced in security discussions
prctl
#include <sys/prctl.h>
int prctl(int option, unsigned long arg2, ...);
Process-specific operations: set name, enable seccomp, control dumpability.
seccomp
#include <linux/seccomp.h>
int seccomp(unsigned int operation, unsigned int flags, void *args);
Restrict available system calls for sandboxing.
Syscall Invocation
All the above are libc wrappers. The actual syscall instruction:
#include <sys/syscall.h>
#include <unistd.h>
/* Direct syscall (bypassing libc) */
long result = syscall(SYS_write, STDOUT_FILENO, "hello\n", 6);
On x86-64, this becomes:
mov rax, 1 ; syscall number for write
mov rdi, 1 ; fd = stdout
lea rsi, [msg] ; buffer
mov rdx, 6 ; count
syscall
Registers: rax = syscall number, rdi = arg1, rsi = arg2, rdx = arg3,
r10 = arg4, r8 = arg5, r9 = arg6. Return value in rax.
Appendix D: GDB Quick Reference
GDB is the GNU Debugger. It's the primary tool for finding segfaults, inspecting data structures, and understanding what your C and Rust programs actually do at runtime.
Starting GDB
# Compile with debug symbols
$ gcc -g -O0 -o myapp myapp.c
$ rustc -g -o myapp myapp.rs
# Start GDB
$ gdb ./myapp
# Start with arguments
$ gdb --args ./myapp arg1 arg2
# Attach to running process
$ gdb -p <pid>
# Analyze a core dump
$ gdb ./myapp core
# Start in TUI (text user interface) mode
$ gdb -tui ./myapp
Essential Commands
Running
| Command | Short | Action |
|---|---|---|
| run | r | Start the program |
| run arg1 arg2 | r arg1 arg2 | Start with arguments |
| continue | c | Continue after breakpoint |
| kill | | Kill the running program |
| quit | q | Exit GDB |
Breakpoints
| Command | Short | Action |
|---|---|---|
| break main | b main | Break at function |
| break file.c:42 | b file.c:42 | Break at line |
| break *0x4005a0 | b *0x4005a0 | Break at address |
| break func if x > 10 | | Conditional breakpoint |
| tbreak main | tb main | Temporary (one-shot) breakpoint |
| info breakpoints | i b | List breakpoints |
| delete 1 | d 1 | Delete breakpoint #1 |
| delete | d | Delete all breakpoints |
| disable 1 | dis 1 | Disable breakpoint #1 |
| enable 1 | en 1 | Enable breakpoint #1 |
Stepping
| Command | Short | Action |
|---|---|---|
| next | n | Step over (next line) |
| step | s | Step into function |
| finish | fin | Run until current function returns |
| until 50 | u 50 | Run until line 50 |
| nexti | ni | Step one instruction (over calls) |
| stepi | si | Step one instruction (into calls) |
Examining Variables
| Command | Action |
|---|---|
| print x | Print variable x |
| print *ptr | Dereference pointer |
| print arr[5] | Array element |
| print sizeof(x) | Size of variable |
| print/x val | Print in hex |
| print/t val | Print in binary |
| print/d val | Print as decimal |
| print/c val | Print as character |
| print (struct foo *)ptr | Cast and print |
| display x | Print x every time we stop |
| undisplay 1 | Remove display #1 |
| info locals | All local variables |
| info args | Function arguments |
| ptype var | Show type of variable |
| whatis var | Short type description |
Examining Memory
x/FMT ADDRESS
Format: x/[count][format][size]
count: number of items
format: x (hex), d (decimal), u (unsigned), o (octal),
t (binary), c (char), s (string), i (instruction)
size: b (byte), h (halfword=2), w (word=4), g (giant=8)
| Command | Action |
|---|---|
| x/16xb ptr | 16 bytes in hex |
| x/4xw ptr | 4 words (32-bit) in hex |
| x/s str | Print as C string |
| x/10i $pc | 10 instructions at PC |
| x/gx &var | 8-byte value in hex |
Example session:
(gdb) x/32xb buffer
0x7fffffffe400: 0x48 0x65 0x6c 0x6c 0x6f 0x00 0x00 0x00
0x7fffffffe408: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe410: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe418: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Watchpoints
Watchpoints stop when a variable's value changes.
| Command | Action |
|---|---|
| watch x | Break when x changes |
| watch *0x601040 | Break when memory at address changes |
| rwatch x | Break when x is read |
| awatch x | Break on read or write |
| info watchpoints | List watchpoints |
Watchpoints are hardware-assisted (limited number, typically 4) or software-assisted (very slow).
Backtrace and Stack
| Command | Short | Action |
|---|---|---|
| backtrace | bt | Show call stack |
| backtrace full | bt full | Stack with local vars |
| frame 3 | f 3 | Select stack frame #3 |
| up | | Move up one frame |
| down | | Move down one frame |
| info frame | i f | Detailed frame info |
| info stack | i s | Stack summary |
Threads
| Command | Action |
|---|---|
| info threads | List all threads |
| thread 2 | Switch to thread 2 |
| thread apply all bt | Backtrace all threads |
| thread apply all print var | Print var in all threads |
| set scheduler-locking on | Only run current thread |
| set scheduler-locking off | Run all threads |
TUI Mode
TUI (Text User Interface) shows source code alongside the command line.
# Start in TUI mode
$ gdb -tui ./myapp
# Toggle TUI in running session
(gdb) tui enable
(gdb) tui disable
# Or press Ctrl+X then A to toggle
# Switch layouts
(gdb) layout src # source code
(gdb) layout asm # assembly
(gdb) layout split # source + assembly
(gdb) layout regs # registers
Refresh the screen if it gets corrupted: Ctrl+L
Common Workflows
Finding a Segfault
$ gcc -g -O0 -o buggy buggy.c
$ gdb ./buggy
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400556 in process_data (ptr=0x0) at buggy.c:15
15 *ptr = 42;
(gdb) bt
#0 0x0000000000400556 in process_data (ptr=0x0) at buggy.c:15
#1 0x0000000000400589 in main () at buggy.c:22
(gdb) print ptr
$1 = (int *) 0x0
(gdb) frame 1
#1 0x0000000000400589 in main () at buggy.c:22
(gdb) info locals
data = 0x0
The backtrace tells you: process_data was called with a NULL pointer from
main at line 22.
Inspecting a Linked List
(gdb) print *head
$1 = {value = 10, next = 0x602040}
(gdb) print *head->next
$2 = {value = 20, next = 0x602060}
(gdb) print *head->next->next
$3 = {value = 30, next = 0x0}
Or use a loop:
(gdb) set $node = head
(gdb) while $node
> print $node->value
> set $node = $node->next
> end
$4 = 10
$5 = 20
$6 = 30
Debugging Multi-Threaded Programs
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f... "main" in main () at server.c:45
2 Thread 0x7f... "worker" in handle_client () at server.c:28
3 Thread 0x7f... "worker" in recv () from /lib/...
(gdb) thread 2
(gdb) bt
#0 handle_client (arg=0x602010) at server.c:28
#1 0x00007ffff... in start_thread ()
(gdb) thread apply all bt
Finding Memory Corruption
Use watchpoints to find what's overwriting a variable:
(gdb) break main
(gdb) run
(gdb) watch my_variable
(gdb) continue
Hardware watchpoint 2: my_variable
Old value = 42
New value = 0
0x000000000040066a in corrupt_function () at bug.c:33
Examining struct Layout
(gdb) ptype struct sockaddr_in
type = struct sockaddr_in {
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
unsigned char sin_zero[8];
}
(gdb) print sizeof(struct sockaddr_in)
$1 = 16
Rust-Specific GDB Tips
Compiling Rust for GDB
# Debug build (default, includes symbols)
$ cargo build
$ gdb ./target/debug/myapp
# Release with debug info
# In Cargo.toml:
# [profile.release]
# debug = true
$ cargo build --release
$ gdb ./target/release/myapp
Rust Type Names in GDB
Rust types appear with their full path:
(gdb) ptype v
type = alloc::vec::Vec<i32, alloc::alloc::Global>
Printing Rust Types
(gdb) print v
$1 = Vec(size=3) = {1, 2, 3}
(gdb) print s
$2 = "hello world"
(gdb) print opt
$3 = core::option::Option<i32>::Some(42)
(gdb) print result
$4 = core::result::Result<i32, ...>::Ok(10)
GDB has pretty-printers for common Rust types (Vec, String, Option, Result, HashMap); the easiest way to get them is the rust-gdb wrapper script that ships with the Rust toolchain.
Rust Mangled Names
Rust function names are mangled. Use:
(gdb) break myapp::main
(gdb) break myapp::module::function_name
Or with tab completion:
(gdb) break myapp::<TAB><TAB>
Printing Enum Variants
(gdb) print my_enum
$1 = myapp::State::Running(42)
Unwinding Through Panics
(gdb) break rust_panic
(gdb) run
(gdb) bt
#0 std::panicking::begin_panic ()
#1 myapp::risky_function () at src/main.rs:15
#2 myapp::main () at src/main.rs:8
GDB Configuration
Put frequently used settings in ~/.gdbinit:
# ~/.gdbinit
set print pretty on
set print array on
set pagination off
set history save on
set history filename ~/.gdb_history
set history size 10000
set disassembly-flavor intel
# Rust pretty-printers (path varies by installation)
# python
# import gdb
# gdb.execute('source /path/to/rust-gdb-pretty-printers.py')
# end
Quick Reference Card
+------------------+-------+-------------------------------+
| Action | Short | Full Command |
+------------------+-------+-------------------------------+
| Run | r | run [args] |
| Break | b | break location |
| Continue | c | continue |
| Step over | n | next |
| Step into | s | step |
| Finish function | fin | finish |
| Print variable | p | print expression |
| Examine memory | | x/FMT address |
| Backtrace | bt | backtrace |
| List source | l | list |
| Info breakpoints | i b | info breakpoints |
| Info locals | i lo | info locals |
| Info threads | i th | info threads |
| Quit | q | quit |
+------------------+-------+-------------------------------+
Appendix E: Glossary
Definitions of key terms used throughout this book.
ABI (Application Binary Interface)
The low-level contract between compiled code: calling conventions, register
usage, struct layout, name mangling. If two object files follow the same ABI,
they can be linked together. C has a stable ABI on most platforms; Rust does
not (use extern "C" for FFI).
Alignment
The requirement that data sit at memory addresses that are multiples of a
certain value (typically the type's size). A uint32_t usually requires
4-byte alignment. Misaligned access is undefined behavior in C; in practice
it faults on some architectures and costs performance on others.
Arena Allocator
A memory allocator that bumps a pointer forward for each allocation and frees
all allocations at once. Zero fragmentation, zero per-object overhead, very
fast. Useful for parsers, request handlers, and game loops. See Chapter 51.
Async (Asynchronous I/O)
A programming model where I/O operations don't block the calling thread.
Instead, the program registers interest in events and is notified when they
complete. Linux provides epoll, io_uring, and aio for async I/O. Rust's
async/await builds on epoll via runtimes like tokio.
Atomic Operation
An operation that completes indivisibly -- no other thread can see it
half-done. Used for lock-free synchronization. C provides <stdatomic.h>;
Rust provides std::sync::atomic. See Chapter 52.
Borrow Checker
Rust's compile-time system that enforces ownership rules: one mutable
reference or any number of shared references, but not both simultaneously.
Prevents data races and use-after-free at compile time.
Callback
A function pointer passed to another function, to be called later. Used
extensively in C APIs (signal handlers, thread start routines, comparators
for qsort). In Rust, closures serve the same purpose with type safety.
Condition Variable
A synchronization primitive that allows threads to wait until a condition
becomes true. Always used with a mutex. C: pthread_cond_t. Rust:
std::sync::Condvar. See Chapter 45.
Container_of
A C macro (used extensively in the Linux kernel) that takes a pointer to a
struct member and returns a pointer to the containing struct. Relies on
offsetof. Essential for intrusive data structures like list_head.
Copy Semantics
In C, assignment copies all bytes of a struct (shallow copy). In Rust, types
that implement the Copy trait are copied on assignment; other types are
moved (ownership transfer). See Chapter 9.
Daemon
A background process with no controlling terminal. Created by forking, calling
setsid, and closing stdin/stdout/stderr. System services (sshd, nginx,
systemd) are daemons. See Chapter 34.
Data Race
Two or more threads access the same memory location concurrently, at least
one access is a write, and there is no synchronization. Undefined behavior in
both C and Rust. Rust's type system prevents data races at compile time.
Deadlock
A situation where two or more threads each wait for another to release a
resource. Thread A holds lock 1 and waits for lock 2; Thread B holds lock 2
and waits for lock 1. Neither can proceed.
DMA (Direct Memory Access)
Hardware that transfers data between devices and memory without CPU
involvement. Kernel drivers set up DMA buffers and descriptors; the hardware
reads/writes directly. Requires careful cache management and memory alignment.
Endianness
Byte order within multi-byte values. Big-endian: most significant byte first
(network byte order). Little-endian: least significant byte first (x86, ARM
default). Use htons/ntohs for conversion. See Chapter 11.
epoll
Linux's scalable I/O event notification mechanism. Monitors many file
descriptors efficiently with O(1) per event. Three syscalls: epoll_create1,
epoll_ctl, epoll_wait. See Chapter 49.
errno
A thread-local integer set by system calls and library functions on failure.
Check it immediately after a call reports an error. Reset it to zero before
calling functions that may or may not set it. See Appendix A.
File Descriptor
An integer index into the kernel's per-process table of open files, sockets,
pipes, and devices. 0 = stdin, 1 = stdout, 2 = stderr. The fundamental I/O
abstraction in Unix. See Chapter 28.
Fork
The fork() syscall creates a new process by duplicating the calling process.
The child gets a copy of the parent's memory (via copy-on-write). Returns 0
in the child, the child's PID in the parent. See Chapter 32.
Futex (Fast Userspace Mutex)
A Linux-specific synchronization primitive. Uncontended operations stay in user
space (fast). Contended operations trap to the kernel to sleep. The building
block for pthread_mutex_t. See Chapter 44.
GFP Flags
Kernel memory allocation flags (GFP_KERNEL, GFP_ATOMIC, etc.) that tell
the allocator what context the allocation occurs in. Using the wrong flag
(e.g., GFP_KERNEL in interrupt context) causes deadlocks.
Inode
The kernel data structure that represents a file on disk. Contains metadata
(permissions, timestamps, size, block pointers) but not the filename.
Multiple filenames (hard links) can point to the same inode.
ioctl (I/O Control)
A catch-all system call for device-specific operations that don't fit the
read/write model. Takes a file descriptor, a request number, and an
optional argument. See Chapter 54.
IPC (Inter-Process Communication)
Mechanisms for processes to exchange data: pipes, FIFOs, Unix domain sockets,
shared memory, message queues, signals, netlink. See Chapters 34, 40.
Lifetime (Rust)
A compile-time annotation that tracks how long a reference is valid. Written as
'a in type signatures. The borrow checker uses lifetimes to prevent dangling
references. No runtime cost.
mmap (Memory Map)
The mmap() syscall maps files or devices into a process's address space.
Also used for anonymous memory allocation and shared memory between processes.
See Chapter 39.
Move Semantics
In Rust, assigning a value to a new variable transfers ownership. The original
variable becomes invalid. Prevents double-free and use-after-free. Types that
implement Copy are exempt (they're bitwise copied instead).
Mutex (Mutual Exclusion)
A synchronization primitive that ensures only one thread accesses a critical
section at a time. C: pthread_mutex_t. Rust: std::sync::Mutex<T> (data
is inside the mutex, enforced by the type system). See Chapter 44.
Netlink
A socket-based IPC mechanism between the Linux kernel and user-space
processes. Used for network configuration, device events, and other kernel
communication. See Chapter 55.
Opaque Type
A type whose internal layout is hidden from users. In C, declared as a
forward struct declaration, with access only through functions that take a
pointer to it. In Rust, achieved with module visibility. See Chapter 20.
Ownership (Rust)
Rust's core memory management concept: every value has exactly one owner.
When the owner goes out of scope, the value is dropped (freed). Ownership can
be transferred (moved) or temporarily lent (borrowed).
POSIX (Portable Operating System Interface)
The IEEE standard defining the API for Unix-like operating systems. Covers
system calls, shell utilities, and C library functions. Linux is "mostly
POSIX-compliant."
Race Condition
A bug where the program's behavior depends on the timing of events (typically
thread scheduling). Includes data races but also higher-level logic races
(TOCTOU: time-of-check-to-time-of-use).
RAII (Resource Acquisition Is Initialization)
A pattern where resources (memory, files, locks) are tied to object lifetime.
Acquired in the constructor, released in the destructor. C++ and Rust use this
heavily. C requires manual cleanup (often with goto chains).
Reactor Pattern
An event-driven design where a central event loop waits for I/O events and
dispatches them to handlers. Used by epoll-based servers, tokio, and most
high-performance network servers. See Chapter 49.
Semaphore
A synchronization primitive with a counter. wait() decrements (blocking if
zero); post() increments (waking a waiter). Binary semaphore acts like a
mutex. Counting semaphore limits concurrent access. C: sem_t. See Chapter 45.
Signal
An asynchronous notification sent to a process. Examples: SIGTERM (terminate),
SIGSEGV (segfault), SIGINT (Ctrl+C), SIGCHLD (child exited). Handled
by signal handlers or signalfd. See Chapters 35-37.
Slab Allocator
The Linux kernel's pool allocator for fixed-size objects. Uses kmem_cache
structures. Pre-allocates pages, divides them into same-size slots, and
maintains free lists. The kernel equivalent of the pool allocator in Chapter 51.
Socket
An endpoint for network communication. Identified by a file descriptor.
Types: stream (TCP), datagram (UDP), raw, Unix domain. See Chapters 47-49.
Spinlock
A lock that busy-waits (spins) instead of sleeping. Appropriate when the
expected hold time is very short and sleeping would cost more than spinning.
Used in the kernel for interrupt-context synchronization.
Syscall (System Call)
The interface between user-space programs and the kernel. Invoked via the
syscall instruction (x86-64). Each has a number (e.g., write = 1 on
x86-64). libc wraps syscalls in C functions. See Appendix C.
Thread-Safe
Code that can be called simultaneously from multiple threads without data
corruption. Achieved through synchronization (mutexes, atomics) or by
avoiding shared mutable state.
TOCTOU (Time-of-Check-to-Time-of-Use)
A race condition where the state checked by a program changes before the
program acts on it. Example: checking file permissions with access(), then
opening the file -- an attacker can swap the file in between.
Trait (Rust)
An interface definition. Like a C vtable or a Java interface, but resolved at
compile time (static dispatch) or runtime (dynamic dispatch with dyn Trait).
Key traits: Read, Write, Iterator, Display, Debug, Clone, Copy.
Undefined Behavior (UB)
Code whose behavior is not defined by the language standard. The compiler may
do anything: crash, produce wrong results, appear to work, or format your
hard drive. Common C causes: null dereference, buffer overflow, signed
integer overflow, use-after-free, data races.
Volatile
A C qualifier that prevents the compiler from optimizing away or reordering
accesses to a variable. Used for memory-mapped I/O registers. Does NOT provide
atomicity or prevent CPU reordering. Not the same as atomic.
VFS (Virtual File System)
The kernel's abstraction layer that provides a uniform file interface over
different filesystems (ext4, procfs, sysfs, tmpfs). All file operations go
through VFS, which dispatches to the specific filesystem's file_operations.
Vtable (Virtual Function Table)
A table of function pointers used for runtime polymorphism. In C, built
manually as a struct of function pointers. In Rust, generated automatically
for dyn Trait objects. In the kernel, file_operations and
platform_driver are vtables.
Zero-Copy
Transferring data without copying it between buffers. Techniques: sendfile,
splice, mmap, io_uring, and in-place parsing. Eliminates CPU and cache
overhead of memcpy. See Chapter 52.