Structs, Enums, and Unions

Primitive types only get you so far. Real programs model real things: a network packet has a source, a destination, and a payload. A device register has named bit fields. This chapter covers the composite types that make systems programming possible.

C Structs

A struct groups related values under one name.

/* struct_basic.c */
#include <stdio.h>
#include <math.h>

typedef struct {
    double x;
    double y;
} Point;

double distance(Point a, Point b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return sqrt(dx * dx + dy * dy);
}

int main(void)
{
    Point a = { .x = 0.0, .y = 0.0 };
    Point b = { .x = 3.0, .y = 4.0 };
    printf("distance = %f\n", distance(a, b));
    return 0;
}
$ gcc -Wall -std=c17 -o struct_basic struct_basic.c -lm && ./struct_basic
distance = 5.000000

typedef lets you write Point instead of struct Point everywhere. The .x syntax in the initializer is a C99 designated initializer.

Driver Prep: The Linux kernel uses structs constantly: struct file, struct inode, struct task_struct, struct sk_buff. Understanding struct layout and passing is foundational.

Rust Structs

Named-field struct

// struct_basic.rs
struct Point {
    x: f64,
    y: f64,
}

fn distance(a: &Point, b: &Point) -> f64 {
    let dx = a.x - b.x;
    let dy = a.y - b.y;
    (dx * dx + dy * dy).sqrt()
}

fn main() {
    let a = Point { x: 0.0, y: 0.0 };
    let b = Point { x: 3.0, y: 4.0 };
    println!("distance = {}", distance(&a, &b));
}

No typedef needed. The struct name is the type name directly.

Tuple struct and unit struct

// tuple_struct.rs
struct Color(u8, u8, u8);
struct Meters(f64);
struct Marker;    // unit struct, zero-sized

fn main() {
    let red = Color(255, 0, 0);
    println!("R={}, G={}, B={}", red.0, red.1, red.2);

    let height = Meters(1.82);
    println!("height = {} m", height.0);

    let _m = Marker;
    println!("size of Marker = {}", std::mem::size_of::<Marker>());
}

Tuple structs are useful for the "newtype" pattern -- wrapping a value in a distinct type for type safety. Unit structs take no memory at runtime.
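
The newtype idea can be sketched as follows. Feet, to_meters, and climb are hypothetical names introduced only for illustration; the point is that two wrappers around the same f64 cannot be mixed up by accident.

```rust
// newtype_demo.rs -- illustrative sketch; Feet, to_meters and climb are hypothetical
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Feet(f64);

impl Feet {
    // Explicit conversion; one foot is defined as exactly 0.3048 m.
    fn to_meters(self) -> Meters {
        Meters(self.0 * 0.3048)
    }
}

// Accepts only Meters; passing a Feet or a bare f64 is a compile error.
fn climb(current: Meters, gain: Meters) -> Meters {
    Meters(current.0 + gain.0)
}

fn main() {
    let start = Meters(100.0);
    let gain = Feet(1000.0);

    // climb(start, gain);   // does NOT compile: expected Meters, found Feet
    let top = climb(start, gain.to_meters());
    println!("top = {:.1} m", top.0);

    // In practice the wrapper adds no storage on top of the f64 it holds.
    println!("size of Meters = {}", std::mem::size_of::<Meters>());
}
```

The wrapper exists only at compile time: the type checker sees two distinct types, while the generated code works with a plain f64.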

Methods (impl blocks in Rust)

Rust attaches methods to structs via impl. C has no methods; you pass the struct to a function manually.

C: functions that take a struct pointer

/* rect_c.c */
#include <stdio.h>

typedef struct {
    double width;
    double height;
} Rect;

double rect_area(const Rect *r)
{
    return r->width * r->height;
}

int main(void)
{
    Rect r = { .width = 5.0, .height = 3.0 };
    printf("area = %f\n", rect_area(&r));
    return 0;
}

Rust: methods with self

// rect_rust.rs
struct Rect {
    width: f64,
    height: f64,
}

impl Rect {
    fn area(&self) -> f64 {
        self.width * self.height
    }

    fn new(width: f64, height: f64) -> Rect {
        Rect { width, height }
    }
}

fn main() {
    let r = Rect::new(5.0, 3.0);
    println!("area = {}", r.area());
}
C struct "method" call:      rect_area(&r)
Rust method call:            r.area()

Under the hood, both pass a pointer to the struct.
  &self   == const Rect*
  &mut self == Rect*
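
A small sketch of how the receiver types line up with the pointer analogy above. Counter and its methods are hypothetical names, not part of the Rect example, and the by-value self receiver is included only for contrast.

```rust
// receivers.rs -- illustrative sketch; Counter is a hypothetical type
struct Counter {
    count: u32,
}

impl Counter {
    // &self: shared borrow, read-only (compare: const Counter * in C)
    fn get(&self) -> u32 {
        self.count
    }

    // &mut self: exclusive borrow, may modify (compare: Counter * in C)
    fn increment(&mut self) {
        self.count += 1;
    }

    // self: takes ownership; the Counter cannot be used afterwards
    fn into_inner(self) -> u32 {
        self.count
    }
}

fn main() {
    let mut c = Counter { count: 0 }; // must be `mut` to call increment
    c.increment();
    println!("count = {}", c.get());

    // Method syntax is sugar for an ordinary function call:
    Counter::increment(&mut c);
    println!("count = {}", Counter::get(&c));

    let final_count = c.into_inner(); // c is moved here
    println!("final = {}", final_count);
}
```

The explicit form Counter::increment(&mut c) is the closest Rust spelling of the C call style, with the borrow written out at the call site.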

Try It: Add a scale method to the Rust Rect that takes &mut self and a factor: f64, and multiplies both width and height by the factor.

C Enums

In C, enums are just named integer constants.

/* enum_c.c */
#include <stdio.h>

enum Direction { NORTH = 0, SOUTH = 1, EAST = 2, WEST = 3 };

const char *direction_name(enum Direction d)
{
    switch (d) {
    case NORTH: return "North";
    case SOUTH: return "South";
    case EAST:  return "East";
    case WEST:  return "West";
    default:    return "Unknown";
    }
}

int main(void)
{
    enum Direction d = EAST;
    printf("direction = %s (%d)\n", direction_name(d), d);

    /* C allows any integer -- no type safety */
    enum Direction invalid = 99;
    printf("invalid = %s (%d)\n", direction_name(invalid), invalid);

    return 0;
}

Caution: C enums provide no type safety. You can assign any integer to an enum variable. The default case is your only defense.

Rust Enums: Algebraic Data Types

Rust enums are fundamentally more powerful. Each variant can carry data.

Simple enum

// enum_simple.rs
#[derive(Debug)]
enum Direction {
    North, South, East, West,
}

fn direction_name(d: &Direction) -> &'static str {
    match d {
        Direction::North => "North",
        Direction::South => "South",
        Direction::East  => "East",
        Direction::West  => "West",
    }
}

fn main() {
    let d = Direction::East;
    println!("direction = {} ({:?})", direction_name(&d), d);
    // let invalid: Direction = 99;  // does NOT compile
}

The match is exhaustive. Add a fifth variant and the compiler forces you to handle it everywhere.
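
When a raw integer does arrive from outside (a file, a socket, a C library), it has to be checked before it can become a Direction. One way to sketch that, assuming the same numbering as the C enum earlier in the chapter, is a TryFrom implementation. The choice of i32 as the error type is an illustrative assumption.

```rust
// enum_from_int.rs -- illustrative sketch; the 0..=3 numbering mirrors the C enum
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum Direction {
    North, South, East, West,
}

impl TryFrom<i32> for Direction {
    type Error = i32; // hand the rejected value back to the caller

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Direction::North),
            1 => Ok(Direction::South),
            2 => Ok(Direction::East),
            3 => Ok(Direction::West),
            other => Err(other),
        }
    }
}

fn main() {
    for raw in [2, 99] {
        match Direction::try_from(raw) {
            Ok(d)    => println!("{} -> {:?}", raw, d),
            Err(bad) => println!("{} is not a valid Direction", bad),
        }
    }
}
```

The invalid value is rejected at the boundary, so code that receives a Direction never has to consider a fifth, unnamed case.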

Enums with data

// enum_data.rs
#[derive(Debug)]
enum Shape {
    Circle(f64),
    Rectangle(f64, f64),
    Triangle { base: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(r)                 => std::f64::consts::PI * r * r,
        Shape::Rectangle(w, h)           => w * h,
        Shape::Triangle { base, height } => 0.5 * base * height,
    }
}

fn main() {
    let shapes = vec![
        Shape::Circle(5.0),
        Shape::Rectangle(4.0, 6.0),
        Shape::Triangle { base: 3.0, height: 8.0 },
    ];

    for s in &shapes {
        println!("{:?} -> area = {:.2}", s, area(s));
    }
}
$ rustc enum_data.rs && ./enum_data
Circle(5.0) -> area = 78.54
Rectangle(4.0, 6.0) -> area = 24.00
Triangle { base: 3.0, height: 8.0 } -> area = 12.00

This is impossible in C with plain enums. You would need a struct that pairs a tag with a union.

Option and Result

Rust's standard library uses enums for two critical types.

Option replaces null pointers:

// option_demo.rs
fn find_first_negative(nums: &[i32]) -> Option<usize> {
    for (i, &n) in nums.iter().enumerate() {
        if n < 0 { return Some(i); }
    }
    None
}

fn main() {
    let data = [10, 20, -5, 30];
    match find_first_negative(&data) {
        Some(idx) => println!("first negative at index {}", idx),
        None      => println!("no negatives found"),
    }
}

Result replaces error codes:

// result_demo.rs
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    match parse_and_double("21") {
        Ok(val)  => println!("success: {}", val),
        Err(e)   => println!("error: {}", e),
    }
    match parse_and_double("abc") {
        Ok(val)  => println!("success: {}", val),
        Err(e)   => println!("error: {}", e),
    }
}

Rust Note: Option<T> is enum { Some(T), None }. Result<T, E> is enum { Ok(T), Err(E) }. These are ordinary enums with generics. The power comes from match and the ? operator.
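
To see what the ? operator saves, here is a sketch that writes the same function twice. The names sum_pair and sum_pair_explicit are hypothetical. The explicit version is only an approximation: the real expansion also converts the error with From, which makes no difference here because the error types already match.

```rust
// question_mark.rs -- illustrative sketch; both function names are hypothetical
use std::num::ParseIntError;

// Version using `?`
fn sum_pair(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.parse::<i32>()?;
    let y = b.parse::<i32>()?;
    Ok(x + y)
}

// Roughly what each `?` expands to
fn sum_pair_explicit(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e), // early return on the first failure
    };
    let y = match b.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(x + y)
}

fn main() {
    println!("{:?}", sum_pair("40", "2"));
    println!("{:?}", sum_pair_explicit("40", "2"));
    println!("failed: {}", sum_pair("40", "two").is_err());
    println!("failed: {}", sum_pair_explicit("40", "two").is_err());
}
```

Both versions return early on the first parse failure; the ? form simply keeps the success path readable.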

C Unions

A union stores different types in the same memory. Only one field is valid at a time.

/* union_c.c */
#include <stdio.h>
#include <string.h>

typedef struct {
    enum { INT_VAL, FLOAT_VAL, STR_VAL } tag;
    union {
        int    i;
        double f;
        char   s[32];
    } data;
} Value;

void print_value(const Value *v)
{
    switch (v->tag) {
    case INT_VAL:   printf("int: %d\n", v->data.i);   break;
    case FLOAT_VAL: printf("float: %f\n", v->data.f); break;
    case STR_VAL:   printf("str: %s\n", v->data.s);   break;
    }
}

int main(void)
{
    Value a = { .tag = INT_VAL,   .data.i = 42 };
    Value b = { .tag = FLOAT_VAL, .data.f = 3.14 };
    Value c = { .tag = STR_VAL };
    strncpy(c.data.s, "hello", sizeof(c.data.s) - 1);
    c.data.s[sizeof(c.data.s) - 1] = '\0';   /* strncpy does not guarantee termination */

    print_value(&a);
    print_value(&b);
    print_value(&c);

    return 0;
}
Union memory layout:

  +------+------+------+------+------+------+------+------+
  |               shared memory (32 bytes)                |
  +------+------+------+------+------+------+------+------+

  When tag == INT_VAL:    first 4 bytes hold int
  When tag == FLOAT_VAL:  first 8 bytes hold double
  When tag == STR_VAL:    all 32 bytes hold char[32]

  sizeof(union) = size of largest member, rounded up to the union's alignment = 32

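Caution: The tag field is a convention, not an enforcement. There is no runtime check. If you read a member other than the one last written, C reinterprets the stored bytes as the new type, so the value is usually garbage and may even be a trap representation. In C++ the same read is undefined behavior.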

Driver Prep: Type punning through unions is common in low-level code -- reading hardware registers, parsing binary protocols. The Linux kernel uses unions in structures like union sigval and union nf_inet_addr.

Rust's Safe Alternative to Unions

Rust enums with data are tagged unions with the tag built in and enforced by the compiler:

// tagged_union_rust.rs
enum Value {
    Int(i32),
    Float(f64),
    Str(String),
}

fn print_value(v: &Value) {
    match v {
        Value::Int(i)   => println!("int: {}", i),
        Value::Float(f) => println!("float: {}", f),
        Value::Str(s)   => println!("str: {}", s),
    }
}

fn main() {
    let values = vec![
        Value::Int(42),
        Value::Float(3.14),
        Value::Str(String::from("hello")),
    ];
    for v in &values { print_value(v); }
}

For low-level type punning, Rust has raw union types (reading a field requires unsafe):

// raw_union.rs
union FloatBits {
    f: f32,
    u: u32,
}

fn main() {
    let fb = FloatBits { f: 1.0 };
    let bits = unsafe { fb.u };
    println!("float 1.0 as bits: 0x{:08X}", bits);
}

Rust Note: Raw Rust unions exist primarily for C interop (FFI). In pure Rust code, prefer enums. The unsafe block signals that the programmer is taking responsibility for correctness.
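
For the specific case of inspecting a float's bits, the standard library already offers safe conversions, so the union is not needed. The sketch below compares the two approaches; bits_via_union and bits_via_std are hypothetical helper names.

```rust
// float_bits_safe.rs -- illustrative sketch; helper names are hypothetical
union FloatBits {
    f: f32,
    u: u32,
}

// Union-based punning: needs unsafe for the read.
fn bits_via_union(x: f32) -> u32 {
    let fb = FloatBits { f: x };
    // Sound here because every 32-bit pattern is a valid u32.
    unsafe { fb.u }
}

// Standard-library conversion: no unsafe required.
fn bits_via_std(x: f32) -> u32 {
    x.to_bits()
}

fn main() {
    println!("union: 0x{:08X}", bits_via_union(1.0)); // 0x3F800000
    println!("std:   0x{:08X}", bits_via_std(1.0));   // 0x3F800000

    // The reverse direction and a byte-level view are available too.
    let back = f32::from_bits(0x3F80_0000);
    println!("round trip: {}", back);
    println!("bytes (LE): {:02X?}", 1.0f32.to_le_bytes());
}
```

Both helpers return the same value; the standard-library version just moves the responsibility for correctness from the programmer to the library.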

Memory Layout Comparison

/* layout_c.c */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    char   a;    /* 1 byte  */
    int    b;    /* 4 bytes */
    char   c;    /* 1 byte  */
    double d;    /* 8 bytes */
} Example;

int main(void)
{
    printf("sizeof(Example) = %zu\n", sizeof(Example));
    printf("offset of a = %zu\n", offsetof(Example, a));
    printf("offset of b = %zu\n", offsetof(Example, b));
    printf("offset of c = %zu\n", offsetof(Example, c));
    printf("offset of d = %zu\n", offsetof(Example, d));
    return 0;
}
C struct layout (with padding):

  Byte:  0    1    2    3    4    5    6    7
        +----+----+----+----+----+----+----+----+
        | a  | pad| pad| pad| b  | b  | b  | b  |
        +----+----+----+----+----+----+----+----+
  Byte:  8    9   10   11   12   13   14   15
        +----+----+----+----+----+----+----+----+
        | c  | pad| pad| pad| pad| pad| pad| pad|
        +----+----+----+----+----+----+----+----+
  Byte: 16   17   18   19   20   21   22   23
        +----+----+----+----+----+----+----+----+
        | d  | d  | d  | d  | d  | d  | d  | d  |
        +----+----+----+----+----+----+----+----+

  Total: 24 bytes (10 bytes of padding!)

Rust's default layout makes no promise about field order, so the compiler is free to reorder fields to reduce padding:

// layout_rust.rs
use std::mem;

struct Example { a: u8, b: i32, c: u8, d: f64 }

fn main() {
    println!("size of Example = {}", mem::size_of::<Example>());
}
Rust struct layout (fields reordered by compiler; the exact order is not guaranteed):

  Byte:  0    1    2    3    4    5    6    7
        +----+----+----+----+----+----+----+----+
        | d  | d  | d  | d  | d  | d  | d  | d  |
        +----+----+----+----+----+----+----+----+
  Byte:  8    9   10   11   12   13   14   15
        +----+----+----+----+----+----+----+----+
        | b  | b  | b  | b  | a  | c  | pad| pad|
        +----+----+----+----+----+----+----+----+

  Total: 16 bytes (2 bytes of padding)

To force C-compatible layout, use #[repr(C)]:

// repr_c.rs
#[repr(C)]
struct Example { a: u8, b: i32, c: u8, d: f64 }

fn main() {
    println!("size (#[repr(C)]) = {}", std::mem::size_of::<Example>());
    // prints 24, same as C
}
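
To check that #[repr(C)] reproduces the C offsets and not just the total size, the field addresses can be compared with the address of the struct. This is a sketch: field_offsets is a hypothetical helper, and the numbers in the comments assume a typical 64-bit target such as x86-64. Recent toolchains also provide std::mem::offset_of! for the same purpose.

```rust
// repr_c_offsets.rs -- illustrative sketch; field_offsets is a hypothetical helper
#[repr(C)]
struct Example { a: u8, b: i32, c: u8, d: f64 }

// Returns the byte offsets of (a, b, c, d) by subtracting the address of
// the struct from the address of each field.
fn field_offsets(e: &Example) -> (usize, usize, usize, usize) {
    let base = e as *const Example as usize;
    (
        &e.a as *const u8  as usize - base,
        &e.b as *const i32 as usize - base,
        &e.c as *const u8  as usize - base,
        &e.d as *const f64 as usize - base,
    )
}

fn main() {
    let e = Example { a: 1, b: 2, c: 3, d: 4.0 };
    let (a, b, c, d) = field_offsets(&e);

    // On x86-64 this matches the C diagram: a=0 b=4 c=8 d=16
    println!("offsets: a={} b={} c={} d={}", a, b, c, d);
    println!("size = {}, align = {}",
             std::mem::size_of::<Example>(),
             std::mem::align_of::<Example>());
}
```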

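Driver Prep: When passing structs to the kernel or hardware, you must control the layout. Use #[repr(C)] in Rust. In C, use __attribute__((packed)) (a GCC/Clang extension) if you need to eliminate padding entirely, keeping in mind that packed fields may be misaligned, which makes direct access slower on some architectures and a fault on others.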
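
Rust has a counterpart to the packed attribute. The following is a minimal sketch, assuming the same four fields as the earlier Example; the struct name Packed is hypothetical.

```rust
// repr_packed.rs -- illustrative sketch; Packed is a hypothetical name
#[repr(C, packed)]
struct Packed { a: u8, b: i32, c: u8, d: f64 }

fn main() {
    // 1 + 4 + 1 + 8 = 14 bytes, with alignment 1 and no padding.
    println!("size = {}, align = {}",
             std::mem::size_of::<Packed>(),
             std::mem::align_of::<Packed>());

    let p = Packed { a: 1, b: 2, c: 3, d: 4.0 };

    // Copy the fields out by value. Taking a reference such as &p.b is
    // rejected by the compiler because the field may be misaligned.
    let (a, b, c, d) = (p.a, p.b, p.c, p.d);
    println!("a={} b={} c={} d={}", a, b, c, d);
}
```

The compiler's refusal to hand out references to misaligned fields is the Rust-side reflection of the access cost that packing carries in C.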

Try It: Reorder the fields in the C struct to minimize padding manually. What is the smallest sizeof you can achieve?

Enum Memory Layout

// enum_size.rs
use std::mem;

enum Color { Red, Green, Blue }

fn main() {
    println!("size of Color = {}", mem::size_of::<Color>());
    println!("size of Option<u8> = {}", mem::size_of::<Option<u8>>());
    println!("size of Option<Box<i32>> = {}", mem::size_of::<Option<Box<i32>>>());
}
$ rustc enum_size.rs && ./enum_size
size of Color = 1
size of Option<u8> = 2
size of Option<Box<i32>> = 8

Rust uses the smallest discriminant that fits. Color needs only 1 byte. A C enum is typically 4 bytes (int-sized).
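
If the discriminant size matters, for instance when the value crosses an FFI boundary, it can be pinned with a repr attribute. A minimal sketch follows; Compact and CLike are hypothetical names, and the 4-byte result for #[repr(C)] assumes a platform where C enums are int-sized.

```rust
// enum_repr.rs -- illustrative sketch; Compact and CLike are hypothetical names
#[allow(dead_code)]
#[repr(u8)]
enum Compact { Red = 0, Green = 1, Blue = 2 }

#[allow(dead_code)]
#[repr(C)]
enum CLike { Red, Green, Blue }

fn main() {
    // Discriminant stored as a u8: 1 byte.
    println!("size of Compact = {}", std::mem::size_of::<Compact>());

    // Same size the platform's C compiler would pick (4 on common targets).
    println!("size of CLike   = {}", std::mem::size_of::<CLike>());

    // Enum to integer is an explicit cast; the reverse direction
    // has no built-in cast and must be validated by hand.
    println!("Green as u8     = {}", Compact::Green as u8);
}
```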

Option<Box<i32>> is the same size as Box<i32> (8 bytes on a 64-bit target) -- Rust uses "niche optimization": since a Box can never be null, the all-zero bit pattern represents None.

Option<Box<i32>> layout:

  Some(ptr):  | non-zero pointer value (8 bytes) |
  None:       | 0x0000000000000000     (8 bytes) |

  No extra tag byte needed.
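
The same effect can be observed for other types that have a forbidden bit pattern. The sketch below prints a few size pairs; the Option<u32> line is included as a contrast, and its size is what current compilers produce rather than a documented guarantee.

```rust
// niche_demo.rs -- illustrative sketch
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // References and Box are never null, so None can reuse the null pattern.
    println!("&u8 = {}, Option<&u8> = {}",
             size_of::<&u8>(), size_of::<Option<&u8>>());
    println!("Box<i32> = {}, Option<Box<i32>> = {}",
             size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>());

    // NonZeroU32 forbids 0, so None can reuse 0.
    println!("NonZeroU32 = {}, Option<NonZeroU32> = {}",
             size_of::<NonZeroU32>(), size_of::<Option<NonZeroU32>>());

    // u32 uses every bit pattern, so Option<u32> needs a separate tag.
    println!("u32 = {}, Option<u32> = {}",
             size_of::<u32>(), size_of::<Option<u32>>());
}
```

Only the last pair differs in size: with no spare bit pattern to borrow, the compiler has to store the tag alongside the value.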

Quick Knowledge Check

  1. What is the difference between a C union and a Rust enum with data?
  2. Why does Rust reorder struct fields by default?
  3. What does #[repr(C)] do?

Common Pitfalls

  • Reading the wrong union member in C. The bytes are silently reinterpreted as the wrong type (and in C++ the read is undefined behavior). No runtime check. Use a tag field and validate it on every access.
  • Forgetting padding in C structs. sizeof(struct) may be larger than the sum of field sizes. Use offsetof to check.
  • Assuming C enum values are contiguous. You can assign arbitrary values: enum E { A = 0, B = 100 }. Do not use them as array indices without bounds checks.
  • Forgetting pub on Rust struct fields. The struct may be public, but fields are private by default.
  • Using #[repr(C)] everywhere in Rust. Only use it when you need C-compatible layout (FFI, memory-mapped I/O). Otherwise let the compiler optimize.
  • Ignoring niche optimization. Option<&T> is the same size as &T. Do not wrap references in custom tagged enums when Option already does it for free.
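
The field-visibility pitfall from the list above can be sketched like this. The module name geometry and the accessor height() are hypothetical; the commented-out lines show what the compiler rejects.

```rust
// field_privacy.rs -- illustrative sketch; module and accessor names are hypothetical
mod geometry {
    pub struct Rect {
        pub width: f64, // readable and writable from outside the module
        height: f64,    // private: only code inside `geometry` can touch it
    }

    impl Rect {
        pub fn new(width: f64, height: f64) -> Rect {
            Rect { width, height }
        }

        pub fn height(&self) -> f64 {
            self.height
        }
    }
}

fn main() {
    // Does NOT compile: field `height` is private.
    // let r = geometry::Rect { width: 5.0, height: 3.0 };

    let r = geometry::Rect::new(5.0, 3.0);
    println!("width  = {}", r.width);

    // println!("{}", r.height);  // does NOT compile: private field
    println!("height = {}", r.height());
}
```

A private field also blocks struct-literal construction from outside the module, which is why a constructor such as new is needed here.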