Data Structure Layout in Memory
Type this right now
// save as layout.c — compile: gcc -o layout layout.c
#include <stdio.h>
#include <stddef.h>
struct Bad {
    char a; // 1 byte
    int b;  // 4 bytes
    char c; // 1 byte
};

struct Good {
    int b;  // 4 bytes
    char a; // 1 byte
    char c; // 1 byte
};

int main(void) {
    printf("struct Bad: sizeof = %zu\n", sizeof(struct Bad));
    printf(" offset of a: %zu\n", offsetof(struct Bad, a));
    printf(" offset of b: %zu\n", offsetof(struct Bad, b));
    printf(" offset of c: %zu\n", offsetof(struct Bad, c));
    printf("\nstruct Good: sizeof = %zu\n", sizeof(struct Good));
    printf(" offset of b: %zu\n", offsetof(struct Good, b));
    printf(" offset of a: %zu\n", offsetof(struct Good, a));
    printf(" offset of c: %zu\n", offsetof(struct Good, c));
    return 0;
}
$ gcc -o layout layout.c && ./layout
struct Bad: sizeof = 12
offset of a: 0
offset of b: 4
offset of c: 8
struct Good: sizeof = 8
offset of b: 0
offset of a: 4
offset of c: 5
Same three fields. Different order. 4 bytes smaller. If you have a million of these structs, that's 4 MB wasted on invisible padding bytes. The compiler doesn't reorder C struct fields — it lays them out exactly as you declared them. It's on you.
Why alignment matters
Modern CPUs don't read arbitrary bytes from memory. They read in aligned chunks. A 4-byte
int must start at an address divisible by 4. An 8-byte double must start at an address
divisible by 8.
Memory addresses:
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 │           │           │           │           │
 └── 4-byte ─┘           └── 4-byte ─┘
int at address 0: ✓ aligned (0 % 4 == 0)
int at address 4: ✓ aligned (4 % 4 == 0)
int at address 1: ✗ misaligned! (1 % 4 ≠ 0)
What happens on misaligned access?
- x86-64: works, and is usually cheap, but an access that straddles a cache-line boundary needs two reads instead of one
- ARM (older): hardware exception — your program crashes
- RISC-V: implementation-defined — may work, may trap
The C compiler adds padding bytes between fields to ensure every field is properly aligned. These padding bytes contain garbage and waste space.
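The "divisible by" rule is easy to check yourself. Here is a minimal sketch in Rust; `is_aligned` is a hypothetical helper written for this example, not a library function, and the alignments printed are whatever your target reports (4 for `u32` and 8 for `f64` on x86-64 Linux).

```rust
use std::mem::align_of;

/// Returns true when `addr` is a multiple of `align`.
fn is_aligned(addr: usize, align: usize) -> bool {
    addr % align == 0
}

fn main() {
    // Alignment requirements as reported by the compiler for this target.
    println!("align_of u8:  {}", align_of::<u8>());
    println!("align_of u32: {}", align_of::<u32>());
    println!("align_of f64: {}", align_of::<f64>());

    // The three cases from the diagram, using a 4-byte alignment.
    for addr in [0usize, 4, 1] {
        println!("address {addr}: aligned for 4 bytes? {}", is_aligned(addr, 4));
    }

    // A real variable: the compiler places it at a suitably aligned address.
    let x: u32 = 7;
    let addr = &x as *const u32 as usize;
    println!("&x is aligned: {}", is_aligned(addr, align_of::<u32>()));
}
```

Addresses 0 and 4 pass the check and address 1 fails it, matching the diagram.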
struct Bad: the layout problem
struct Bad {
    char a; // 1 byte, alignment 1
    int b;  // 4 bytes, alignment 4
    char c; // 1 byte, alignment 1
};
Byte:    0     1     2     3     4     5     6     7     8     9    10    11
      ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
      │  a  │ pad │ pad │ pad │  b  │  b  │  b  │  b  │  c  │ pad │ pad │ pad │
      │     │     │     │     │     │     │     │     │     │     │     │     │
      └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
         ▲                       ▲                       ▲
         │                       │                       │
   a at offset 0           b at offset 4           c at offset 8
   (align 1: OK)        (align 4: 4%4=0 ✓)         (align 1: OK)
Total: 12 bytes. But actual data is only 6 bytes.
Waste: 6 bytes of padding (50%!)
Why padding after c? The struct's alignment is max(1,4,1) = 4.
sizeof must be a multiple of 4 so arrays of structs stay aligned.
8 + 1 = 9, round up to 12.
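The rule just applied by hand (pad each field up to its own alignment, then round the total up to the struct's alignment) fits in a few lines. The sketch below simulates it in Rust; `round_up` and `c_layout` are hypothetical helper names made up for this example, and fields are passed as `(size, align)` pairs in declaration order.

```rust
/// Rounds `offset` up to the next multiple of `align`.
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

/// Simulates C-style layout for fields given as (size, align) pairs
/// in declaration order. Returns the field offsets and the total size.
fn c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::new();
    let mut cursor = 0;
    let mut struct_align = 1;
    for &(size, align) in fields {
        cursor = round_up(cursor, align); // padding before the field
        offsets.push(cursor);
        cursor += size;
        struct_align = struct_align.max(align);
    }
    // Tail padding so that arrays of the struct stay aligned.
    (offsets, round_up(cursor, struct_align))
}

fn main() {
    // struct Bad { char a; int b; char c; }
    let (offsets, size) = c_layout(&[(1, 1), (4, 4), (1, 1)]);
    println!("Bad: offsets {:?}, size {}", offsets, size);
    // prints: Bad: offsets [0, 4, 8], size 12
}
```

Feed it the same fields in a different order to see how the total changes.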
struct Good: reordered fields
struct Good {
    int b;  // 4 bytes, alignment 4
    char a; // 1 byte, alignment 1
    char c; // 1 byte, alignment 1
};
Byte:    0     1     2     3     4     5     6     7
      ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
      │  b  │  b  │  b  │  b  │  a  │  c  │ pad │ pad │
      │     │     │     │     │     │     │     │     │
      └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
         ▲                       ▲     ▲
         │                       │     │
   b at offset 0                 a     c
   (align 4: 0%4=0 ✓)
Total: 8 bytes. Same data, 4 bytes smaller.
The trick: put larger-aligned fields first.
The golden rule in C: sort fields from largest alignment to smallest. double first (along with
long and pointers, which are also 8 bytes on 64-bit Linux), then int, then short, then char.
This minimizes internal padding.
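To see the golden rule pay off, here is a small sketch with four fields instead of three. It is written in Rust with #[repr(C)], which keeps declaration order and so behaves like a C struct. `Unsorted` and `Sorted` are hypothetical names, and the offsets in the comments assume a target where f64 is 8-byte aligned, such as x86-64 Linux.

```rust
use std::mem::size_of;

// #[repr(C)] keeps declaration order, so these behave like C structs.
#[repr(C)]
#[allow(dead_code)]
struct Unsorted {
    a: u8,  // offset 0, then 7 bytes of padding
    b: f64, // offset 8
    c: u16, // offset 16, then 2 bytes of padding
    d: u32, // offset 20
}

#[repr(C)]
#[allow(dead_code)]
struct Sorted {
    b: f64, // offset 0
    d: u32, // offset 8
    c: u16, // offset 12
    a: u8,  // offset 14, then 1 byte of tail padding
}

fn main() {
    println!("Unsorted: {} bytes", size_of::<Unsorted>()); // 24 on x86-64 Linux
    println!("Sorted:   {} bytes", size_of::<Sorted>());   // 16 on x86-64 Linux
}
```

The 15 bytes of data are the same in both; only the padding changes.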
🧠 What do you think happens?
struct Mystery { char a; double b; char c; int d; };
What's sizeof(struct Mystery)? Work it out by hand before compiling. (Hint: double has alignment 8. The struct's alignment is also 8.)
Rust: the compiler reorders for you
Rust's default struct layout (repr(Rust)) allows the compiler to reorder fields for
optimal packing:
struct Bad {
    a: u8,  // 1 byte
    b: u32, // 4 bytes
    c: u8,  // 1 byte
}

fn main() {
    println!("Size of Bad: {}", std::mem::size_of::<Bad>());
    println!("Align of Bad: {}", std::mem::align_of::<Bad>());
}
$ rustc layout.rs && ./layout
Size of Bad: 8
Align of Bad: 4
Even though the fields are declared in the "bad" order, Rust produces an 8-byte struct. The
compiler silently reordered b before a and c.
C layout (struct Bad): Rust layout (same fields):
┌───┬───────┬───┬───────┐ ┌───────────┬───┬───┬─────┐
│ a │padding│ b │c+pad │ │ b │ a │ c │ pad │
└───┴───────┴───┴───────┘ └───────────┴───┴───┴─────┘
12 bytes 8 bytes
Same fields. Same alignment rules. Smaller struct.
#[repr(C)]: when you need C-compatible layout
Sometimes you need the fields in a specific order:
- FFI (Foreign Function Interface) — calling C from Rust or vice versa
- Hardware register mappings — the bytes must match the hardware's expectation
- Network protocols — the bytes go on the wire in a specific order
- Memory-mapped I/O — addresses are fixed
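#[repr(C)]
struct CCompatible {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields with the default layout, repeated here so the comparison compiles.
struct Bad {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("repr(C) size: {}", std::mem::size_of::<CCompatible>()); // 12
    println!("repr(Rust) size: {}", std::mem::size_of::<Bad>()); // 8
}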
#[repr(C)] tells the compiler: "Lay out fields in declaration order, with C-style padding
rules. Do not reorder." Now the Rust struct has the same layout as the C struct, byte for byte.
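When a layout has to match something outside your program, it helps to have the compiler check it on every build. The sketch below is one way to do that, assuming Rust 1.77 or newer (for `std::mem::offset_of!`) and a target where u32 is 4-byte aligned. The struct mirrors CCompatible from above.

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C)]
#[allow(dead_code)]
struct CCompatible {
    a: u8,
    b: u32,
    c: u8,
}

// Compile-time layout checks: the build fails if any of these is false.
const _: () = assert!(size_of::<CCompatible>() == 12);
const _: () = assert!(align_of::<CCompatible>() == 4);
const _: () = assert!(offset_of!(CCompatible, a) == 0);
const _: () = assert!(offset_of!(CCompatible, b) == 4);
const _: () = assert!(offset_of!(CCompatible, c) == 8);

fn main() {
    println!("CCompatible has the same offsets as the C struct Bad");
}
```

If someone later reorders the fields or drops the #[repr(C)], the assertions can stop the build before a wrong offset ever reaches the C side.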
💡 Fun Fact: The Linux kernel's structures are defined in C with precise layouts that match hardware expectations. Any Rust code in the kernel that interacts with these structures must use #[repr(C)] to guarantee layout compatibility. Getting this wrong means reading the wrong field at the wrong offset: silent data corruption.
Rust enum layout
Rust enums are tagged unions: a discriminant (tag) that identifies the variant, plus the data for the active variant.
enum Message {
    Quit,                    // No data
    Move { x: i32, y: i32 }, // 8 bytes of data
    Write(String),           // 24 bytes of data (ptr + len + cap)
}

fn main() {
    println!("Size of Message: {}", std::mem::size_of::<Message>());
    println!("Size of String: {}", std::mem::size_of::<String>());
}
Size of Message: 32
Size of String: 24
Enum layout (conceptual):
┌──────────────┬──────────────────────────────────┐
│ Discriminant │ Payload │
│ (tag) │ (large enough for biggest variant)│
├──────────────┼──────────────────────────────────┤
│ 0 (Quit) │ (unused — 24 bytes of nothing) │
│ 1 (Move) │ x: i32, y: i32, (16B padding) │
│ 2 (Write) │ String (ptr, len, cap) = 24B │
└──────────────┴──────────────────────────────────┘
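Total = tag (1 byte, padded to the payload's 8-byte alignment) + size(largest variant)
= 8 + 24 = 32 bytes
With a separate tag, the size of the enum is the tag plus the largest variant, rounded up to
the enum's alignment. Every variant uses the same amount of space, even Quit, which has no
data. This is the cost of a tagged union. The default enum layout is unspecified, though:
newer rustc releases can hide the tag in bit patterns that String never uses, and may report
24 bytes for Message instead of 32.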
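You can watch the tag and its padding appear with two tiny enums. This is a sketch with hypothetical names (`Small`, `Wide`); the payload types u8 and u64 were picked because every bit pattern is a valid value, so the compiler has nowhere to hide the tag. The 16-byte figure assumes u64 is 8-byte aligned, as on x86-64.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Small {
    Empty,
    Byte(u8), // 1 byte of payload + 1 byte of tag
}

#[allow(dead_code)]
enum Wide {
    Empty,
    Word(u64), // 8 bytes of payload + a tag padded out to u64's alignment
}

fn main() {
    println!("Small: {} bytes", size_of::<Small>()); // 2
    println!("Wide:  {} bytes", size_of::<Wide>());  // 16 on x86-64
}
```

In Wide the tag is still a single byte of information, but alignment forces it to occupy a full 8-byte slot.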
Niche optimization: zero-cost Option
Here's where Rust gets clever. Option<&T> is the same size as &T:
fn main() {
    println!("Size of &i32: {}", std::mem::size_of::<&i32>());
    println!("Size of Option<&i32>: {}", std::mem::size_of::<Option<&i32>>());
    println!();
    println!("Size of Box<i32>: {}", std::mem::size_of::<Box<i32>>());
    println!("Size of Option<Box<i32>>: {}", std::mem::size_of::<Option<Box<i32>>>());
}
Size of &i32: 8
Size of Option<&i32>: 8 ← SAME SIZE! No extra discriminant byte.
Size of Box<i32>: 8
Size of Option<Box<i32>>: 8 ← Also the same!
How? Niche optimization. A reference (&T) can never be null. So the compiler uses the
null bit pattern (all zeros) to represent None. No extra tag needed.
Option<&i32> layout:
Some(&val): ┌────────────────────────────────────┐
│ 0x00007FFF12340000 (valid pointer) │
└────────────────────────────────────┘
None: ┌────────────────────────────────────┐
│ 0x0000000000000000 (null = None) │
└────────────────────────────────────┘
Same 8 bytes. The "impossible" value (null) serves as the discriminant.
This works for any type that has an "impossible" bit pattern:
use std::num::NonZeroU32;

fn main() {
    println!("Size of u32: {}", std::mem::size_of::<u32>());
    println!("Size of Option<NonZeroU32>: {}", std::mem::size_of::<Option<NonZeroU32>>());
    // Both are 4 bytes! Value 0 represents None.
}
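If you want to see the niche with your own eyes, you can reinterpret the option's bytes as a plain u32. This is only a sketch for poking at the representation, not something to do in real code. `raw_bits` is a hypothetical helper, and it relies on the standard library's documented guarantee that Option<NonZeroU32> has the same size as u32.

```rust
use std::mem::transmute;
use std::num::NonZeroU32;

/// Reinterprets the option's 4 bytes as a plain u32.
fn raw_bits(value: Option<NonZeroU32>) -> u32 {
    // SAFETY: Option<NonZeroU32> has the same size as u32,
    // and every 4-byte pattern is a valid u32.
    unsafe { transmute::<Option<NonZeroU32>, u32>(value) }
}

fn main() {
    println!("None     -> {:#010x}", raw_bits(None));
    println!("Some(42) -> {:#010x}", raw_bits(NonZeroU32::new(42)));
    // prints:
    // None     -> 0x00000000
    // Some(42) -> 0x0000002a
}
```

No tag byte anywhere: Some(42) is just 42, and None is the one value a NonZeroU32 can never hold.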
In C, you'd represent an "optional pointer" as NULL — but nothing stops you from
accidentally dereferencing it. In Rust, Option<&T> has the same representation as a C
pointer (8 bytes, with null for "absent"), but the compiler forces you to check for None
before accessing the value.
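Here is what "forces you to check" looks like in practice. This is a minimal sketch; `first_even` is a hypothetical function invented for the example.

```rust
use std::mem::size_of;

/// Returns a reference to the first even number, or None if there is none.
fn first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    let data = [3, 7, 8, 11];

    // `first_even(&data) + 1` would not compile: an Option<&i32> is not a number.
    // The None case has to be handled before the value can be used.
    match first_even(&data) {
        Some(n) => println!("found {n}"),
        None => println!("no even number"),
    }

    // The safety check costs no extra memory compared to a raw pointer.
    println!(
        "same size as a raw pointer: {}",
        size_of::<Option<&i32>>() == size_of::<*const i32>()
    );
}
```

The equivalent C function would return NULL for "not found", and nothing would stop the caller from dereferencing it.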
Examining layout: the tools
C
#include <stdio.h>
#include <stddef.h>
#include <stdalign.h>
struct Example {
    char a;
    double b;
    int c;
    char d;
};

int main(void) {
    printf("sizeof: %zu\n", sizeof(struct Example));
    printf("alignof: %zu\n", alignof(struct Example));
    printf("offsetof a: %zu\n", offsetof(struct Example, a));
    printf("offsetof b: %zu\n", offsetof(struct Example, b));
    printf("offsetof c: %zu\n", offsetof(struct Example, c));
    printf("offsetof d: %zu\n", offsetof(struct Example, d));
    return 0;
}
sizeof: 24
alignof: 8
offsetof a: 0
offsetof b: 8
offsetof c: 16
offsetof d: 20
Byte: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
┌──┬─────────────────────┬──────────────────────┬───────────┬──┬────────┐
│a │ padding │ b │ c │d │ padding│
│ │ (7 bytes!) │ (8 bytes) │ (4 bytes) │ │(3 bytes│
└──┴─────────────────────┴──────────────────────┴───────────┴──┴────────┘
Rust
use std::mem;

struct Example {
    a: u8,
    b: f64,
    c: u32,
    d: u8,
}

fn main() {
    println!("size_of: {}", mem::size_of::<Example>());
    println!("align_of: {}", mem::align_of::<Example>());

    // To see field offsets, we need a trick: create an instance.
    let e = Example { a: 0, b: 0.0, c: 0, d: 0 };
    let base = &e as *const _ as usize;
    println!("offset of a: {}", &e.a as *const _ as usize - base);
    println!("offset of b: {}", &e.b as *const _ as usize - base);
    println!("offset of c: {}", &e.c as *const _ as usize - base);
    println!("offset of d: {}", &e.d as *const _ as usize - base);
}
size_of: 16 ← Smaller! Rust reordered fields.
align_of: 8
offset of a: 12 ← a and d packed together after c
offset of b: 0 ← b (8 bytes, align 8) placed first
offset of c: 8 ← c (4 bytes, align 4) placed second
offset of d: 13
Rust layout (compiler reordered):
Byte: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
┌──────────────────────┬───────────┬──┬──┬──────┐
│ b │ c │a │d │ pad │
│ (8 bytes) │ (4 bytes) │ │ │(2 B) │
└──────────────────────┴───────────┴──┴──┴──────┘
C layout: 24 bytes. Rust layout: 16 bytes. Same fields.
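On Rust 1.77 or newer, `std::mem::offset_of!` reports field offsets without creating an instance. The sketch below puts the default layout next to a #[repr(C)] twin of the same struct; `ExampleC` is a hypothetical name. The default-layout numbers are whatever your compiler chooses, since that layout is unspecified, while the repr(C) offsets in the comment assume x86-64 Linux.

```rust
use std::mem::{offset_of, size_of};

#[allow(dead_code)]
struct Example {
    a: u8,
    b: f64,
    c: u32,
    d: u8,
}

#[repr(C)]
#[allow(dead_code)]
struct ExampleC {
    a: u8,
    b: f64,
    c: u32,
    d: u8,
}

fn main() {
    println!(
        "default: size {}, offsets a={} b={} c={} d={}",
        size_of::<Example>(),
        offset_of!(Example, a),
        offset_of!(Example, b),
        offset_of!(Example, c),
        offset_of!(Example, d),
    );
    // On x86-64 Linux the repr(C) line shows size 24 and offsets 0, 8, 16, 20.
    println!(
        "repr(C): size {}, offsets a={} b={} c={} d={}",
        size_of::<ExampleC>(),
        offset_of!(ExampleC, a),
        offset_of!(ExampleC, b),
        offset_of!(ExampleC, c),
        offset_of!(ExampleC, d),
    );
}
```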
Packed structs: removing all padding
Sometimes you want no padding at all — typically for wire protocols or file formats:
// C: using __attribute__((packed))
struct __attribute__((packed)) Packed {
    char a;
    int b;
    char c;
};
// sizeof = 6. No padding. But misaligned access to b!
// Rust: using repr(packed)
#[repr(packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u8,
}
// size_of = 6. No padding. Accessing b requires care.
Packed structs are dangerous: accessing b at an odd offset may cause a hardware exception
on some architectures, or slow misaligned access on x86. Rust refuses to compile a reference
to a misaligned packed field, even inside unsafe, so you either copy the field to a local
variable first or read it through a raw pointer with read_unaligned.
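Here is a sketch of the two ways to read such a field without ever forming a reference to it. `read_b` and `read_b_raw` are hypothetical helper names, and the struct uses #[repr(C, packed)] rather than plain #[repr(packed)] so that the field order is fixed as well; that is an assumption that suits the wire-format use case.

```rust
use std::mem::size_of;
use std::ptr;

#[repr(C, packed)]
#[allow(dead_code)]
struct Packed {
    a: u8,
    b: u32,
    c: u8,
}

/// Safe: copying a Copy field out by value lets the compiler emit an unaligned load.
fn read_b(p: &Packed) -> u32 {
    p.b
}

/// Same result through a raw pointer; no reference to the field is ever created.
fn read_b_raw(p: &Packed) -> u32 {
    // SAFETY: the pointer comes from a live `Packed`, and read_unaligned
    // does not require the address to be aligned.
    unsafe { ptr::addr_of!(p.b).read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0xDEAD_BEEF, c: 2 };
    println!("size_of: {}", size_of::<Packed>()); // 6

    // println!("{}", p.b) is rejected because the macro borrows its arguments,
    // so the field is copied into a local first.
    let b = read_b(&p);
    println!("b = {:#x}, raw = {:#x}", b, read_b_raw(&p));
}
```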
🧠 What do you think happens?
You have a #[repr(packed)] struct in Rust and try to take &packed_struct.b where b is a u32 at offset 1. Does it compile? Does it crash? What does the compiler warn you about?
🔧 Task: Compare C and Rust layouts
Step 1: Create this struct in both C and Rust:
// C version
struct Record {
    char type_flag; // 1 byte
    double value;   // 8 bytes
    short count;    // 2 bytes
    char active;    // 1 byte
    int id;         // 4 bytes
};
// Rust version
struct Record {
    type_flag: u8,
    value: f64,
    count: i16,
    active: u8,
    id: i32,
}
Step 2: Print sizeof / size_of and all field offsets in both languages.
Step 3: Draw the byte-level layout diagram for each, marking padding bytes.
Step 4: Reorder the C struct fields to minimize padding. Verify the new size matches (or approaches) Rust's automatically optimized layout.
Step 5: Add #[repr(C)] to the Rust struct and confirm the size matches your C struct.
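Expected results (x86-64 Linux):
C (original order): sizeof = 24
C (reordered): sizeof = 16
Rust (default): size_of = 16
Rust (repr(C)): size_of = 24 (matches C original)
Why 24? type_flag at offset 0, 7 bytes of padding, value at 8, count at 16, active at 18, 1 byte of padding, id at 20.
Why 16? The fields hold 1 + 8 + 2 + 1 + 4 = 16 bytes of data, and sorted by alignment they need no padding at all.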
The compiler is a better struct packer than most humans — but only if you let it (Rust default) or think about it (C manual ordering).