Data Structure Layout in Memory

Type this right now

// save as layout.c — compile: gcc -o layout layout.c
#include <stdio.h>
#include <stddef.h>

struct Bad {
    char a;     // 1 byte
    int b;      // 4 bytes
    char c;     // 1 byte
};

struct Good {
    int b;      // 4 bytes
    char a;     // 1 byte
    char c;     // 1 byte
};

int main() {
    printf("struct Bad:  sizeof = %zu\n", sizeof(struct Bad));
    printf("  offset of a: %zu\n", offsetof(struct Bad, a));
    printf("  offset of b: %zu\n", offsetof(struct Bad, b));
    printf("  offset of c: %zu\n", offsetof(struct Bad, c));

    printf("\nstruct Good: sizeof = %zu\n", sizeof(struct Good));
    printf("  offset of b: %zu\n", offsetof(struct Good, b));
    printf("  offset of a: %zu\n", offsetof(struct Good, a));
    printf("  offset of c: %zu\n", offsetof(struct Good, c));

    return 0;
}
$ gcc -o layout layout.c && ./layout
struct Bad:  sizeof = 12
  offset of a: 0
  offset of b: 4
  offset of c: 8

struct Good: sizeof = 8
  offset of b: 0
  offset of a: 4
  offset of c: 5

Same three fields. Different order. 4 bytes smaller. If you have a million of these structs, that's 4 MB wasted on invisible padding bytes. The compiler doesn't reorder C struct fields — it lays them out exactly as you declared them. It's on you.


Why alignment matters

Modern CPUs fetch memory in aligned chunks, and most ABIs require each value to sit at an address that is a multiple of its alignment. On typical platforms a 4-byte int must start at an address divisible by 4, and an 8-byte double at an address divisible by 8.

    Memory addresses:
    0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
    │           │           │            │            │
    └── 4-byte ─┘           └── 4-byte  ─┘

    int at address 0: ✓ aligned (0 % 4 == 0)
    int at address 4: ✓ aligned (4 % 4 == 0)
    int at address 1: ✗ misaligned! (1 % 4 ≠ 0)

What happens on misaligned access?

  • x86-64: works, but can be slower (an access that straddles a cache line needs two reads instead of one)
  • ARM (older): hardware exception — your program crashes
  • RISC-V: implementation-defined — may work, may trap
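
When bytes arrive at arbitrary offsets (a file, a network buffer), one portable way to sidestep all three behaviors is to assemble the value from individual bytes instead of casting a pointer. A minimal sketch in Rust: read_u32_le is a hypothetical helper name, and little-endian byte order is an assumption made for this example.

```rust
/// Reads a little-endian u32 starting at `offset`, whatever its alignment.
/// Returns None if fewer than 4 bytes are available (hypothetical helper).
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    // Byte-wise assembly: the source never asks for a 4-byte load
    // from a misaligned address.
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    // The u32 starts at offset 1, which is not a multiple of 4.
    let buf = [0xAA, 0x78, 0x56, 0x34, 0x12];
    println!("{:x?}", read_u32_le(&buf, 1));
    println!("{:?}", read_u32_le(&buf, 2)); // None: only 3 bytes remain
}
```

The compiler is free to turn the byte-wise version into a single load on targets where that is safe, so this style usually costs nothing on x86-64.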

The C compiler adds padding bytes between fields to ensure every field is properly aligned. The contents of these padding bytes are unspecified, and they waste space.


struct Bad: the layout problem

struct Bad {
    char a;     // 1 byte, alignment 1
    int b;      // 4 bytes, alignment 4
    char c;     // 1 byte, alignment 1
};
    Byte:  0     1     2     3     4     5     6     7     8     9    10    11
          ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
          │  a  │ pad │ pad │ pad │  b  │  b  │  b  │  b  │  c  │ pad │ pad │ pad │
          │     │     │     │     │     │     │     │     │     │     │     │     │
          └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
          ▲                       ▲                       ▲
          │                       │                       │
          a at offset 0           b at offset 4           c at offset 8
          (align 1: OK)           (align 4: 4%4=0 ✓)     (align 1: OK)

    Total: 12 bytes. But actual data is only 6 bytes.
    Waste: 6 bytes of padding (50%!)

    Why padding after c? The struct's alignment is max(1,4,1) = 4.
    sizeof must be a multiple of 4 so arrays of structs stay aligned.
    8 + 1 = 9, round up to 12.

struct Good: reordered fields

struct Good {
    int b;      // 4 bytes, alignment 4
    char a;     // 1 byte, alignment 1
    char c;     // 1 byte, alignment 1
};
    Byte:  0     1     2     3     4     5     6     7
          ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
          │  b  │  b  │  b  │  b  │  a  │  c  │ pad │ pad │
          │     │     │     │     │     │     │     │     │
          └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
          ▲                       ▲     ▲
          │                       │     │
          b at offset 0           a     c
          (align 4: 0%4=0 ✓)

    Total: 8 bytes. Same data, 4 bytes smaller.
    The trick: put larger-aligned fields first.

The golden rule in C: sort fields from largest alignment to smallest. Put 8-byte-aligned types first (double, pointers, and long on most 64-bit Unix systems), then int, then short, then char. This minimizes internal padding.
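
The padding rules used for struct Bad and struct Good can be written down as a small calculator. This is a sketch, not what any particular compiler does internally: round_up and c_layout are hypothetical names, and each field is described by an assumed (size, alignment) pair.

```rust
/// Rounds `offset` up to the next multiple of `align`.
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

/// Computes C-style field offsets and the total size.
/// Each field is given as (size, alignment).
fn c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::new();
    let mut offset: usize = 0;
    let mut struct_align: usize = 1;
    for &(size, align) in fields {
        offset = round_up(offset, align); // padding before the field
        offsets.push(offset);
        offset += size;
        struct_align = struct_align.max(align);
    }
    // Tail padding: the size must be a multiple of the struct's alignment
    // so that every element of an array stays aligned.
    (offsets, round_up(offset, struct_align))
}

fn main() {
    let bad: [(usize, usize); 3] = [(1, 1), (4, 4), (1, 1)]; // char, int, char
    let good: [(usize, usize); 3] = [(4, 4), (1, 1), (1, 1)]; // int, char, char
    println!("Bad:  {:?}", c_layout(&bad)); // ([0, 4, 8], 12)
    println!("Good: {:?}", c_layout(&good)); // ([0, 4, 5], 8)

    // The golden rule: sort by alignment, largest first.
    let mut sorted = bad;
    sorted.sort_by(|x, y| y.1.cmp(&x.1));
    println!("Bad, sorted: {:?}", c_layout(&sorted)); // ([0, 4, 5], 8)
}
```

Feeding it the fields of another struct is a quick way to check a by-hand calculation before compiling the real thing.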

🧠 What do you think happens?

struct Mystery {
    char a;
    double b;
    char c;
    int d;
};

What's sizeof(struct Mystery)? Work it out by hand before compiling. (Hint: double has alignment 8. The struct's alignment is also 8.)


Rust: the compiler reorders for you

Rust's default struct layout (repr(Rust)) allows the compiler to reorder fields for optimal packing:

struct Bad {
    a: u8,      // 1 byte
    b: u32,     // 4 bytes
    c: u8,      // 1 byte
}

fn main() {
    println!("Size of Bad: {}", std::mem::size_of::<Bad>());
    println!("Align of Bad: {}", std::mem::align_of::<Bad>());
}
$ rustc layout.rs && ./layout
Size of Bad: 8
Align of Bad: 4

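Even though the fields are declared in the "bad" order, Rust produces an 8-byte struct. The compiler silently moved b ahead of a and c. The exact order under repr(Rust) is unspecified and can change between compiler versions, so code must not depend on it.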

    C layout (struct Bad):            Rust layout (same fields):
    ┌───┬───────┬───┬───────┐        ┌───────────┬───┬───┬─────┐
    │ a │padding│ b │c+pad  │        │     b     │ a │ c │ pad │
    └───┴───────┴───┴───────┘        └───────────┴───┴───┴─────┘
         12 bytes                          8 bytes

    Same fields. Same alignment rules. Smaller struct.

#[repr(C)]: when you need C-compatible layout

Sometimes you need the fields in a specific order:

  • FFI (Foreign Function Interface) — calling C from Rust or vice versa
  • Hardware register mappings — the bytes must match the hardware's expectation
  • Network protocols — the bytes go on the wire in a specific order
  • Memory-mapped I/O — addresses are fixed
#[repr(C)]
struct CCompatible {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("repr(C) size: {}", std::mem::size_of::<CCompatible>());   // 12
    println!("repr(Rust) size: {}", std::mem::size_of::<Bad>());        // 8 (Bad is the struct from the previous example)
}

#[repr(C)] tells the compiler: "Lay out fields in declaration order, with C-style padding rules. Do not reorder." Now the Rust struct has the same layout as the C struct, byte for byte.
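
One way to guard an FFI struct against accidental layout drift is to check its size, alignment, and offsets at compile time. A minimal sketch, assuming Rust 1.77 or later (for std::mem::offset_of!) and a C side that matches struct Bad on a platform where int is 4 bytes:

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C)]
struct CCompatible {
    a: u8,
    b: u32,
    c: u8,
}

// Evaluated at compile time: the build fails if the layout ever changes.
const _: () = assert!(size_of::<CCompatible>() == 12);
const _: () = assert!(align_of::<CCompatible>() == 4);
const _: () = assert!(offset_of!(CCompatible, b) == 4);
const _: () = assert!(offset_of!(CCompatible, c) == 8);

fn main() {
    let v = CCompatible { a: 1, b: 2, c: 3 };
    println!("a={} b={} c={}", v.a, v.b, v.c);
    println!("size = {}", size_of::<CCompatible>());
}
```

The expected numbers here are copied from the C output earlier in the chapter; in a real project they would come from the C header you are binding to.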

💡 Fun Fact: The Linux kernel's structures are defined in C with precise layouts that match hardware expectations. Any Rust code in the kernel that interacts with these structures must use #[repr(C)] to guarantee layout compatibility. Getting this wrong means reading the wrong field at the wrong offset — silent data corruption.


Rust enum layout

Rust enums are tagged unions: a discriminant (tag) that identifies the variant, plus the data for the active variant.

enum Message {
    Quit,                       // No data
    Move { x: i32, y: i32 },   // 8 bytes of data
    Write(String),              // 24 bytes of data (ptr + len + cap)
}

fn main() {
    println!("Size of Message: {}", std::mem::size_of::<Message>());
    println!("Size of String: {}", std::mem::size_of::<String>());
}
Size of Message: 32
Size of String: 24
    Enum layout (conceptual):

    ┌──────────────┬──────────────────────────────────┐
    │ Discriminant │           Payload                 │
    │   (tag)      │  (large enough for biggest variant)│
    ├──────────────┼──────────────────────────────────┤
    │    0 (Quit)  │  (unused — 24 bytes of nothing)  │
    │    1 (Move)  │  x: i32, y: i32, (16B padding)  │
    │    2 (Write) │  String (ptr, len, cap) = 24B    │
    └──────────────┴──────────────────────────────────┘

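    Total = discriminant (padded to the payload's alignment) + size(largest variant)
          = 8 + 24 = 32 bytes

The size of the enum is the size of the discriminant, padded to the payload's alignment, plus the size of the largest variant. Every variant uses the same amount of space, even Quit, which has no data. This is the cost of a tagged union. The default enum layout is not guaranteed, though: the compiler is allowed to hide the tag in unused bit patterns of a payload field, so a given toolchain may report a smaller size for Message.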
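
To see the tag-plus-payload arithmetic with a layout that is actually pinned down, the same variants can be declared with #[repr(C)], which places a C-style integer tag in front of a union of the variants. A sketch, assuming a 64-bit target where String is 24 bytes; MessageC is a name introduced here for illustration.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
}

// Same variants, but repr(C) fixes the layout: tag first, then the payload.
#[allow(dead_code)]
#[repr(C)]
enum MessageC {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
}

fn main() {
    // The tag (a C int here) is padded up to the payload's alignment.
    let tag_slot = align_of::<String>();
    // The payload is as large as the biggest variant.
    let payload = size_of::<String>();
    println!("tag slot + payload: {}", tag_slot + payload);
    println!("repr(C) enum:       {}", size_of::<MessageC>());
    // Default layout is unspecified and may be smaller than the repr(C) one.
    println!("default enum:       {}", size_of::<Message>());
}
```

The first two lines should agree; the third depends on how aggressively your compiler version packs the default layout.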


Niche optimization: zero-cost Option

Here's where Rust gets clever. Option<&T> is the same size as &T:

fn main() {
    println!("Size of &i32:           {}", std::mem::size_of::<&i32>());
    println!("Size of Option<&i32>:   {}", std::mem::size_of::<Option<&i32>>());
    println!();
    println!("Size of Box<i32>:       {}", std::mem::size_of::<Box<i32>>());
    println!("Size of Option<Box<i32>>: {}", std::mem::size_of::<Option<Box<i32>>>());
}
Size of &i32:           8
Size of Option<&i32>:   8   ← SAME SIZE! No extra discriminant byte.

Size of Box<i32>:       8
Size of Option<Box<i32>>: 8   ← Also the same!

How? Niche optimization. A reference (&T) can never be null. So the compiler uses the null bit pattern (all zeros) to represent None. No extra tag needed.

    Option<&i32> layout:

    Some(&val):  ┌────────────────────────────────────┐
                 │  0x00007FFF12340000 (valid pointer) │
                 └────────────────────────────────────┘

    None:        ┌────────────────────────────────────┐
                 │  0x0000000000000000 (null = None)   │
                 └────────────────────────────────────┘

    Same 8 bytes. The "impossible" value (null) serves as the discriminant.

This works for any type that has an "impossible" bit pattern:

use std::num::NonZeroU32;

fn main() {
    println!("Size of u32:              {}", std::mem::size_of::<u32>());
    println!("Size of Option<NonZeroU32>: {}", std::mem::size_of::<Option<NonZeroU32>>());
    // Both are 4 bytes! Value 0 represents None.
}

In C, you'd represent an "optional pointer" as NULL — but nothing stops you from accidentally dereferencing it. In Rust, Option<&T> has the same representation as a C pointer (8 bytes, with null for "absent"), but the compiler forces you to check for None before accessing the value.
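
A small sketch of what that looks like in practice. first_even is a hypothetical function written for this example: it returns an optional reference, and the caller cannot reach the i32 without handling None.

```rust
use std::mem::size_of;

/// Returns a reference to the first even number, or None if there is none.
fn first_even(xs: &[i32]) -> Option<&i32> {
    xs.iter().find(|x| **x % 2 == 0)
}

fn main() {
    let data = [1, 3, 4, 7];
    // The match is mandatory: there is no way to "just dereference" an Option.
    match first_even(&data) {
        Some(n) => println!("first even: {}", n),
        None => println!("no even number"),
    }
    // The safety check costs no extra space.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
}
```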


Examining layout: the tools

C

#include <stdio.h>
#include <stddef.h>
#include <stdalign.h>

struct Example {
    char a;
    double b;
    int c;
    char d;
};

int main() {
    printf("sizeof:  %zu\n", sizeof(struct Example));
    printf("alignof: %zu\n", alignof(struct Example));
    printf("offsetof a: %zu\n", offsetof(struct Example, a));
    printf("offsetof b: %zu\n", offsetof(struct Example, b));
    printf("offsetof c: %zu\n", offsetof(struct Example, c));
    printf("offsetof d: %zu\n", offsetof(struct Example, d));
    return 0;
}
sizeof:  24
alignof: 8
offsetof a: 0
offsetof b: 8
offsetof c: 16
offsetof d: 20
    Byte: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
         ┌──┬─────────────────────┬──────────────────────┬───────────┬──┬────────┐
         │a │      padding        │         b            │     c     │d │ padding│
         │  │  (7 bytes!)         │    (8 bytes)         │ (4 bytes) │  │ (3 B)  │
         └──┴─────────────────────┴──────────────────────┴───────────┴──┴────────┘

Rust

use std::mem;

struct Example {
    a: u8,
    b: f64,
    c: u32,
    d: u8,
}

fn main() {
    println!("size_of:  {}", mem::size_of::<Example>());
    println!("align_of: {}", mem::align_of::<Example>());

    // One way to see field offsets: create an instance and subtract addresses.
    let e = Example { a: 0, b: 0.0, c: 0, d: 0 };
    let base = &e as *const _ as usize;
    println!("offset of a: {}", &e.a as *const _ as usize - base);
    println!("offset of b: {}", &e.b as *const _ as usize - base);
    println!("offset of c: {}", &e.c as *const _ as usize - base);
    println!("offset of d: {}", &e.d as *const _ as usize - base);
}
size_of:  16      ← Smaller! Rust reordered fields.
align_of: 8
offset of a: 12   ← a and d packed together after c
offset of b: 0    ← b (8 bytes, align 8) placed first
offset of c: 8    ← c (4 bytes, align 4) placed second
offset of d: 13
    Rust layout (compiler reordered):
    Byte: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
         ┌──────────────────────┬───────────┬──┬──┬──────┐
         │         b            │     c     │a │d │ pad  │
         │    (8 bytes)         │ (4 bytes) │  │  │(2 B) │
         └──────────────────────┴───────────┴──┴──┴──────┘

    C layout: 24 bytes. Rust layout: 16 bytes. Same fields.
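
On Rust 1.77 and later, std::mem::offset_of! reports field offsets directly, with no instance and no pointer arithmetic. The sketch below prints both layouts side by side; ExampleC is a name introduced here, and the repr(C) numbers in the comment assume a typical 64-bit target where f64 has alignment 8.

```rust
use std::mem::{offset_of, size_of};

#[allow(dead_code)]
struct Example { a: u8, b: f64, c: u32, d: u8 }

#[allow(dead_code)]
#[repr(C)]
struct ExampleC { a: u8, b: f64, c: u32, d: u8 }

fn main() {
    // Default layout: offsets are whatever this compiler version chose.
    println!(
        "default: size {} a@{} b@{} c@{} d@{}",
        size_of::<Example>(),
        offset_of!(Example, a), offset_of!(Example, b),
        offset_of!(Example, c), offset_of!(Example, d)
    );
    // repr(C): declaration order with C padding rules,
    // so size 24 with a@0 b@8 c@16 d@20 under the assumption above.
    println!(
        "repr(C): size {} a@{} b@{} c@{} d@{}",
        size_of::<ExampleC>(),
        offset_of!(ExampleC, a), offset_of!(ExampleC, b),
        offset_of!(ExampleC, c), offset_of!(ExampleC, d)
    );
}
```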

Packed structs: removing all padding

Sometimes you want no padding at all — typically for wire protocols or file formats:

// C: using __attribute__((packed))
struct __attribute__((packed)) Packed {
    char a;
    int b;
    char c;
};
// sizeof = 6. No padding. But misaligned access to b!

// Rust: using repr(packed)
#[repr(packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u8,
}
// size_of = 6. No padding. Accessing b requires care.

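Packed structs are dangerous: a multi-byte field such as b can land at an odd offset, and a plain load from it may cause a hardware exception on some architectures, or a slower misaligned access on x86. Rust restricts ordinary references to such fields. The safe routes are to copy the field to a local variable first, or to read it through a raw pointer with read_unaligned inside an unsafe block.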
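
Both access routes can be sketched as follows. read_b and read_b_raw are hypothetical helper names, and the struct uses repr(C, packed) rather than plain repr(packed) so that the field order is fixed as well.

```rust
use std::ptr::addr_of;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u8,
}

/// Copies the field out by value; the compiler emits an unaligned-safe load.
fn read_b(p: &Packed) -> u32 {
    p.b
}

/// Same result through a raw pointer, without ever forming a `&u32`.
fn read_b_raw(p: &Packed) -> u32 {
    // SAFETY: the pointer is derived from a live `Packed`, so it is valid
    // for reads; read_unaligned does not require alignment.
    unsafe { addr_of!(p.b).read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0xDEAD_BEEF, c: 2 };
    println!("size = {}", std::mem::size_of::<Packed>());
    println!("b = {:#x} / {:#x}", read_b(&p), read_b_raw(&p));
    // Copy single-byte fields to locals too, to keep the pattern uniform.
    let (a, c) = (p.a, p.c);
    println!("a = {}, c = {}", a, c);
}
```

Copying by value is usually the simpler choice; the raw-pointer form is mainly useful when you need a pointer to pass along rather than the value itself.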

🧠 What do you think happens?

You have a #[repr(packed)] struct in Rust and try to take &packed_struct.b where b is a u32 at offset 1. Does it compile? Does it crash? What does the compiler warn you about?


🔧 Task: Compare C and Rust layouts

Step 1: Create this struct in both C and Rust:

// C version
struct Record {
    char type_flag;      // 1 byte
    double value;        // 8 bytes
    short count;         // 2 bytes
    char active;         // 1 byte
    int id;              // 4 bytes
};

// Rust version
struct Record {
    type_flag: u8,
    value: f64,
    count: i16,
    active: u8,
    id: i32,
}

Step 2: Print sizeof / size_of and all field offsets in both languages.

Step 3: Draw the byte-level layout diagram for each, marking padding bytes.

Step 4: Reorder the C struct fields to minimize padding. Verify the new size matches (or approaches) Rust's automatically optimized layout.

Step 5: Add #[repr(C)] to the Rust struct and confirm the size matches your C struct.

Expected results:

    C (original order):   sizeof = 24
    C (reordered):        sizeof = 16
    Rust (default):       size_of = 16
    Rust (repr(C)):       size_of = 24 (matches C original)

The compiler is a better struct packer than most humans — but only if you let it (Rust default) or think about it (C manual ordering).