Alignment, Padding, and Packing
The compiler silently inserts invisible bytes into your structs. This chapter shows you exactly where, why, and how to control it. You need this knowledge for network protocols, file formats, hardware registers, and shared memory -- any time data must match an exact byte layout.
Why the Compiler Inserts Padding
CPUs access memory most efficiently when data falls on natural boundaries. A 4-byte
int is fastest to read when its address is a multiple of 4. A 2-byte short wants
a multiple of 2. The compiler enforces this by inserting padding bytes between
struct members and, when needed, after the last member.
#include <stdio.h>
#include <stddef.h>
struct example {
    char a;   /* 1 byte */
    int  b;   /* 4 bytes */
    char c;   /* 1 byte */
};
int main(void) {
    printf("sizeof(struct example) = %zu\n", sizeof(struct example));
    printf("offsetof(a) = %zu\n", offsetof(struct example, a));
    printf("offsetof(b) = %zu\n", offsetof(struct example, b));
    printf("offsetof(c) = %zu\n", offsetof(struct example, c));
    return 0;
}
Typical output on a 64-bit system:
sizeof(struct example) = 12
offsetof(a) = 0
offsetof(b) = 4
offsetof(c) = 8
The layout with padding:
Offset:  0    1    2    3    4    5    6    7    8    9    10   11
       +----+----+----+----+----+----+----+----+----+----+----+----+
       | a  |pad |pad |pad |    b (4 bytes)    | c  |pad |pad |pad |
       +----+----+----+----+----+----+----+----+----+----+----+----+
Three bytes of padding after a to align b on a 4-byte boundary. Three bytes of
trailing padding after c so that an array of these structs keeps b aligned.
The offsetof Macro
offsetof(type, member) from <stddef.h> tells you the exact byte offset of any
member. It is the essential tool for verifying layout.
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
struct packet {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};
int main(void) {
    printf("%-10s %6s %4s\n", "Field", "Offset", "Size");
    printf("%-10s %6zu %4zu\n", "version", offsetof(struct packet, version), sizeof(uint8_t));
    printf("%-10s %6zu %4zu\n", "length", offsetof(struct packet, length), sizeof(uint16_t));
    printf("%-10s %6zu %4zu\n", "sequence", offsetof(struct packet, sequence), sizeof(uint32_t));
    printf("%-10s %6zu %4zu\n", "flags", offsetof(struct packet, flags), sizeof(uint8_t));
    printf("total size %zu\n", sizeof(struct packet));
    return 0;
}
Likely output:
Field      Offset Size
version         0    1
length          2    2
sequence        4    4
flags           8    1
total size 12
One byte of padding after version, three bytes of trailing padding after flags.
Reordering Fields to Minimize Padding
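Ordering members from largest alignment to smallest usually eliminates internal padding, leaving at most a few bytes of trailing padding. The sizes below assume a typical 64-bit platform where double is 8 bytes with 8-byte alignment.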
#include <stdio.h>
#include <stddef.h>
struct bad_order {
    char   a;   /* 1 byte + 7 padding */
    double b;   /* 8 bytes */
    char   c;   /* 1 byte + 7 padding */
};              /* total: 24 bytes */
struct good_order {
    double b;   /* 8 bytes */
    char   a;   /* 1 byte */
    char   c;   /* 1 byte + 6 padding */
};              /* total: 16 bytes */
int main(void) {
    printf("bad_order:  %zu bytes\n", sizeof(struct bad_order));
    printf("good_order: %zu bytes\n", sizeof(struct good_order));
    return 0;
}
bad_order layout (24 bytes):
+---+-------+--------+---+-------+
| a | pad 7 | b (8)  | c | pad 7 |
+---+-------+--------+---+-------+
good_order layout (16 bytes):
+--------+---+---+-------+
| b (8)  | a | c | pad 6 |
+--------+---+---+-------+
Driver Prep: In kernel code, struct layout matters for cache performance. Frequently accessed (hot) fields are grouped together so they share a cache line. The
pahole tool shows struct layouts, including padding holes. Run pahole my_object.o on an object file compiled with debug information (-g) to see real layouts.
Packed Structs in C
When you need exact byte layout -- network packets, file headers, hardware registers -- you must eliminate padding entirely.
__attribute__((packed)) (GCC/Clang)
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
struct __attribute__((packed)) wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};
int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header));                         /* 8, not 12 */
    printf("offsetof(length) = %zu\n", offsetof(struct wire_header, length));     /* 1 */
    printf("offsetof(sequence) = %zu\n", offsetof(struct wire_header, sequence)); /* 3 */
    printf("offsetof(flags) = %zu\n", offsetof(struct wire_header, flags));       /* 7 */
    return 0;
}
Packed layout:
Offset:  0    1    2    3    4    5    6    7
       +----+----+----+----+----+----+----+----+
       |ver |  length |     sequence      |flag|
       +----+----+----+----+----+----+----+----+
#pragma pack
MSVC, GCC, and Clang all support #pragma pack. It applies to every struct defined after it until the packing value is reset.
#include <stdio.h>
#include <stdint.h>
#pragma pack(push, 1)
struct wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};
#pragma pack(pop)
int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header)); /* 8 */
    return 0;
}
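Caution: Always use push/pop with #pragma pack. Forgetting pop silently packs every subsequent struct in the translation unit, including those declared in headers included later, so their layout no longer matches the same structs in other translation units. The resulting bugs are baffling.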
Performance Cost of Unaligned Access
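On x86, unaligned loads and stores work but can be slower, especially when a value straddles a cache-line boundary. On ARM and RISC-V, the outcome depends on the core and its configuration: an unaligned access may be handled in hardware, trapped and emulated in software at a large cost, raise an alignment fault, or (on some older ARM cores) silently return wrong data.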
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#define ITERATIONS 100000000
/* packed removes the padding, so val sits at offset 1. aligned(4) makes the
   struct itself start on a 4-byte boundary, which guarantees val is misaligned. */
struct __attribute__((packed, aligned(4))) misaligned {
    uint8_t  pad;
    uint32_t val;
};
int main(void) {
    /* volatile forces a real load and store on every iteration, so the
       optimizer cannot delete the loops or keep the value in a register. */
    volatile uint32_t aligned_val = 0;
    clock_t start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        aligned_val = aligned_val + 1;
    clock_t aligned_time = clock() - start;
    /* Unaligned access via packed struct */
    volatile struct misaligned unaligned = {0, 0};
    start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        unaligned.val = unaligned.val + 1;
    clock_t unaligned_time = clock() - start;
    printf("Aligned:   %ld ticks\n", (long)aligned_time);
    printf("Unaligned: %ld ticks\n", (long)unaligned_time);
    return 0;
}
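Try It: Compile with -O0 and -O2 and compare the results. At -O2 the optimizer may collapse or delete a loop whose result is never used, so inspect the generated assembly (gcc -S) to confirm what is actually being measured. On targets that forbid unaligned access, the compiler may also read the packed field with a byte-by-byte sequence instead of a single load. Also try on an ARM machine if you have one -- the difference may be dramatic.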
Rust: repr(C)
By default, Rust makes no guarantees about struct layout. The compiler is free to
reorder fields, add padding, or change layout between compilations. To get a
predictable C-compatible layout, use #[repr(C)].
use std::mem;
#[repr(C)]
struct Example {
    a: u8,
    b: u32,
    c: u8,
}
fn main() {
    println!("size = {}", mem::size_of::<Example>());
    println!("align = {}", mem::align_of::<Example>());
    // offset_of! was stabilized in Rust 1.77
    println!("offset of a = {}", mem::offset_of!(Example, a));
    println!("offset of b = {}", mem::offset_of!(Example, b));
    println!("offset of c = {}", mem::offset_of!(Example, c));
}
Output matches the C version: size 12, offsets 0/4/8.
Rust Note: Without #[repr(C)], Rust's default repr(Rust) may reorder fields to minimize padding. This is an optimization -- but it means you cannot predict the layout. Always use repr(C) for FFI or hardware-facing structs.
Rust: repr(packed)
use std::mem;
#[repr(C, packed)]
struct WireHeader {
    version: u8,
    length: u16,
    sequence: u32,
    flags: u8,
}
fn main() {
    println!("size = {}", mem::size_of::<WireHeader>()); // 8
    println!("offset version = {}", mem::offset_of!(WireHeader, version));
    println!("offset length = {}", mem::offset_of!(WireHeader, length));
    println!("offset sequence = {}", mem::offset_of!(WireHeader, sequence));
    println!("offset flags = {}", mem::offset_of!(WireHeader, flags));
}
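Caution: In Rust, creating a reference to a field in a packed struct is undefined behavior if the field is not naturally aligned, so the compiler refuses to create &header.sequence when it might be unaligned. Use addr_of!(header.sequence).read_unaligned(), or copy the field by value first (let seq = header.sequence;). Watch out for macros such as println!, which take references internally: write println!("{}", { header.sequence }) to force a copy.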
use std::ptr::addr_of;
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}
fn main() {
    let p = Packed { a: 1, b: 0xDEADBEEF };
    // This would be UB, so the compiler rejects it:
    // let r = &p.b;
    // Safe way: read through a raw pointer with an unaligned load.
    let b_val = unsafe { addr_of!(p.b).read_unaligned() };
    println!("b = 0x{b_val:08X}");
}
Rust: repr(align(N))
Force a minimum alignment, useful for cache-line alignment.
use std::mem;
#[repr(C, align(64))]
struct CacheAligned {
    counter: u64,
    data: [u8; 32],
}
fn main() {
    println!("size = {}", mem::size_of::<CacheAligned>()); // 64
    println!("align = {}", mem::align_of::<CacheAligned>()); // 64
    let obj = CacheAligned { counter: 0, data: [0; 32] };
    let addr = &obj as *const CacheAligned as usize;
    println!("address = 0x{addr:X}");
    println!("aligned to 64? {}", addr % 64 == 0);
}
Driver Prep: Cache-line alignment prevents false sharing in concurrent code. When two threads write to different fields that share a cache line, the CPU bounces the line between cores. Placing each independently written field or per-thread structure on its own 64-byte-aligned line (64 bytes is a typical cache-line size) avoids this. The Linux kernel uses
____cacheline_aligned for this purpose.
Verifying Layout at Compile Time
In C, use _Static_assert (C11):
#include <stdint.h>
#include <stddef.h>
struct __attribute__((packed)) wire_msg {
    uint8_t  type;
    uint16_t length;
    uint32_t payload;
};
_Static_assert(sizeof(struct wire_msg) == 7, "wire_msg must be 7 bytes");
_Static_assert(offsetof(struct wire_msg, payload) == 3, "payload at offset 3");
int main(void) {
    return 0;
}
In Rust, use const assertions:
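#[repr(C, packed)]
struct WireMsg {
    msg_type: u8,
    length: u16,
    payload: u32,
}
// Evaluated at compile time: a wrong layout stops the build.
const _: () = assert!(std::mem::size_of::<WireMsg>() == 7);
const _: () = assert!(std::mem::offset_of!(WireMsg, payload) == 3);
fn main() {
    println!("Layout verified at compile time.");
}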
A Real-World Example: ELF Header
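The ELF file format begins with a fixed-layout header. Here are its first 16 bytes, the e_ident identification block. Every field is a single byte, so this struct has no padding even without packed; the attribute simply makes the intent explicit.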
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
struct __attribute__((packed)) elf_ident {
    uint8_t magic[4];   /* 0x7F 'E' 'L' 'F' */
    uint8_t class;      /* 1=32-bit, 2=64-bit */
    uint8_t data;       /* 1=LE, 2=BE */
    uint8_t version;
    uint8_t osabi;
    uint8_t pad[8];     /* EI_ABIVERSION plus 7 padding bytes */
};
_Static_assert(sizeof(struct elf_ident) == 16, "ELF ident must be 16 bytes");
int main(void) {
    struct elf_ident ident;
    memset(&ident, 0, sizeof(ident));
    ident.magic[0] = 0x7F;
    ident.magic[1] = 'E';
    ident.magic[2] = 'L';
    ident.magic[3] = 'F';
    ident.class = 2;    /* 64-bit */
    ident.data = 1;     /* little-endian */
    ident.version = 1;
    printf("ELF ident: ");
    uint8_t *bytes = (uint8_t *)&ident;
    for (size_t i = 0; i < sizeof(ident); i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}
use std::mem;
#[repr(C, packed)]
struct ElfIdent {
    magic: [u8; 4],
    class: u8,
    data: u8,
    version: u8,
    osabi: u8,
    pad: [u8; 8], // EI_ABIVERSION plus 7 padding bytes
}
const _: () = assert!(mem::size_of::<ElfIdent>() == 16);
fn main() {
    let ident = ElfIdent {
        magic: [0x7F, b'E', b'L', b'F'],
        class: 2, // 64-bit
        data: 1,  // little-endian
        version: 1,
        osabi: 0,
        pad: [0; 8],
    };
    // View the struct as raw bytes. Sound here because the struct has no
    // padding bytes and every byte is initialized.
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            &ident as *const ElfIdent as *const u8,
            mem::size_of::<ElfIdent>(),
        )
    };
    print!("ELF ident: ");
    for b in bytes {
        print!("{b:02X} ");
    }
    println!();
}
Try It: Read the first 16 bytes of
/bin/ls (or any ELF binary) into this struct and verify the magic number. In C, use fread. In Rust, use std::fs::read and slice the first 16 bytes.
Quick Knowledge Check
- A struct has fields u8, u32, u8 with repr(C). What is its size and why?
- What happens on ARM if you read a u32 from an odd address without packed access?
- Why does Rust refuse to let you create &packed_struct.unaligned_field?
Common Pitfalls
- Assuming struct size equals the sum of field sizes. Padding exists. Always verify with sizeof / size_of.
- Forgetting trailing padding. The struct's total size is rounded up to a multiple of its alignment so that arrays work.
- Using packed structs everywhere. Pack only when the wire format demands it. Unpacked structs are usually faster, and their fields are safe to reference.
- Taking references to packed fields in Rust. This is UB, and the compiler rejects it. Use read_unaligned or copy the field by value.
- Forgetting repr(C). Default Rust layout is unspecified. Without repr(C), your struct is not guaranteed to match the C equivalent.
- Not asserting layout. Always add static assertions for struct size when the layout must be exact. Catch mistakes at compile time, not in production.