Alignment, Padding, and Packing

The compiler silently inserts invisible bytes into your structs. This chapter shows you exactly where, why, and how to control it. You need this knowledge for network protocols, file formats, hardware registers, and shared memory -- any time data must match an exact byte layout.

Why the Compiler Inserts Padding

CPUs access memory most efficiently when data falls on natural boundaries. A 4-byte int is fastest to read when its address is a multiple of 4. A 2-byte short wants a multiple of 2. The compiler enforces this by inserting padding bytes between struct members and, where needed, after the last one.

#include <stdio.h>
#include <stddef.h>

struct example {
    char   a;    /* 1 byte */
    int    b;    /* 4 bytes */
    char   c;    /* 1 byte */
};

int main(void) {
    printf("sizeof(struct example) = %zu\n", sizeof(struct example));
    printf("offsetof(a) = %zu\n", offsetof(struct example, a));
    printf("offsetof(b) = %zu\n", offsetof(struct example, b));
    printf("offsetof(c) = %zu\n", offsetof(struct example, c));
    return 0;
}

Typical output on a 64-bit system:

sizeof(struct example) = 12
offsetof(a) = 0
offsetof(b) = 4
offsetof(c) = 8

The layout with padding:

Offset:  0    1    2    3    4    5    6    7    8    9   10   11
       +----+----+----+----+----+----+----+----+----+----+----+----+
       | a  | pad| pad| pad|    b (4 bytes)    | c  | pad| pad| pad|
       +----+----+----+----+----+----+----+----+----+----+----+----+

Three bytes of padding follow a so that b lands on a 4-byte boundary. Three bytes of trailing padding follow c so that the struct's size is a multiple of its alignment (4), which keeps b aligned in every element of an array of these structs.

The offsetof Macro

offsetof(type, member) from <stddef.h> tells you the exact byte offset of any member. It is the essential tool for verifying layout.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct packet {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};

int main(void) {
    printf("Field        Offset  Size\n");
    printf("version      %zu       %zu\n", offsetof(struct packet, version),  sizeof(uint8_t));
    printf("length       %zu       %zu\n", offsetof(struct packet, length),   sizeof(uint16_t));
    printf("sequence     %zu       %zu\n", offsetof(struct packet, sequence), sizeof(uint32_t));
    printf("flags        %zu       %zu\n", offsetof(struct packet, flags),    sizeof(uint8_t));
    printf("total size   %zu\n", sizeof(struct packet));
    return 0;
}

Likely output:

Field        Offset  Size
version      0       1
length       2       2
sequence     4       4
flags        8       1
total size   12

One byte of padding after version, three bytes of trailing padding after flags.

Reordering Fields to Minimize Padding

Simply reordering members from largest to smallest eliminates most internal padding.

#include <stdio.h>
#include <stddef.h>

struct bad_order {
    char   a;   /* 1 byte + 7 padding */
    double b;   /* 8 bytes */
    char   c;   /* 1 byte + 7 padding */
};   /* total: 24 bytes */

struct good_order {
    double b;   /* 8 bytes */
    char   a;   /* 1 byte */
    char   c;   /* 1 byte + 6 padding */
};   /* total: 16 bytes */

int main(void) {
    printf("bad_order:  %zu bytes\n", sizeof(struct bad_order));
    printf("good_order: %zu bytes\n", sizeof(struct good_order));
    return 0;
}

bad_order layout (24 bytes):
+---+-------+--------+---+-------+
| a |  pad7 | b (8)  | c |  pad7 |
+---+-------+--------+---+-------+

good_order layout (16 bytes):
+--------+---+---+------+
| b (8)  | a | c | pad6 |
+--------+---+---+------+

Driver Prep: In kernel code, struct layout also matters for cache performance: fields that are accessed together (hot fields) are grouped so that they share a cache line. The pahole tool prints struct layouts, including padding holes. It reads debug information, so compile with -g and then run pahole my_object.o to see the real layout.

Packed Structs in C

When you need exact byte layout -- network packets, file headers, hardware registers -- you must eliminate padding entirely.

__attribute__((packed)) (GCC/Clang)

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};

int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header));  /* 8, not 12 */
    printf("offsetof(length)   = %zu\n", offsetof(struct wire_header, length));   /* 1 */
    printf("offsetof(sequence) = %zu\n", offsetof(struct wire_header, sequence)); /* 3 */
    printf("offsetof(flags)    = %zu\n", offsetof(struct wire_header, flags));    /* 7 */
    return 0;
}

Packed layout:

Offset:  0    1    2    3    4    5    6    7
       +----+----+----+----+----+----+----+----+
       |ver |  length |     sequence      |flag|
       +----+----+----+----+----+----+----+----+

#pragma pack

MSVC, GCC, and Clang all support #pragma pack. It applies to every struct defined after it until the setting is changed or popped.

#include <stdio.h>
#include <stdint.h>

#pragma pack(push, 1)
struct wire_header {
    uint8_t  version;
    uint16_t length;
    uint32_t sequence;
    uint8_t  flags;
};
#pragma pack(pop)

int main(void) {
    printf("sizeof = %zu\n", sizeof(struct wire_header));  /* 8 */
    return 0;
}

Caution: Always use push/pop with #pragma pack. Forgetting pop silently packs every subsequent struct in the translation unit, causing baffling bugs.

Performance Cost of Unaligned Access

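On x86, unaligned access works but can be slower, most noticeably when an access straddles a cache-line or page boundary. On ARM and RISC-V, depending on the core and how it is configured, an unaligned access can trap, be emulated slowly in software, or silently produce wrong results.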

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ITERATIONS 100000000

int main(void) {
    /* Aligned access; volatile keeps the optimizer from deleting the loop */
    volatile uint32_t aligned_val = 0;
    volatile uint32_t *aligned_ptr = &aligned_val;

    clock_t start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        *aligned_ptr = *aligned_ptr + 1;
    clock_t aligned_time = clock() - start;

    /* Unaligned access via packed struct (also volatile, for a fair comparison) */
    volatile struct __attribute__((packed)) {
        uint8_t  pad;
        uint32_t val;
    } unaligned = {0, 0};

    start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        unaligned.val = unaligned.val + 1;
    clock_t unaligned_time = clock() - start;

    printf("Aligned:   %ld ticks\n", (long)aligned_time);
    printf("Unaligned: %ld ticks\n", (long)unaligned_time);
    return 0;
}

Try It: Compile with -O0 and -O2 and compare the results. The compiler may generate special unaligned-access instructions at higher optimization levels. Also try on an ARM machine if you have one -- the difference may be dramatic.

Rust: repr(C)

By default, Rust makes no guarantees about struct layout. The compiler is free to reorder fields, add padding, or change layout between compilations. To get a predictable C-compatible layout, use #[repr(C)].

use std::mem;

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("size  = {}", mem::size_of::<Example>());
    println!("align = {}", mem::align_of::<Example>());

    let ex = Example { a: 1, b: 2, c: 3 };
    let ptr = &ex as *const Example as *const u8;

    // Use offset_of! (stabilized in Rust 1.77)
    println!("offset of a = {}", mem::offset_of!(Example, a));
    println!("offset of b = {}", mem::offset_of!(Example, b));
    println!("offset of c = {}", mem::offset_of!(Example, c));
}

Output matches the C version: size 12, alignment 4, offsets 0/4/8.

Rust Note: Without #[repr(C)], Rust's default repr(Rust) may reorder fields to minimize padding. This is an optimization -- but it means you cannot predict the layout. Always use repr(C) for FFI or hardware-facing structs.

Rust: repr(packed)

use std::mem;

#[repr(C, packed)]
struct WireHeader {
    version:  u8,
    length:   u16,
    sequence: u32,
    flags:    u8,
}

fn main() {
    println!("size = {}", mem::size_of::<WireHeader>());  // 8

    println!("offset version  = {}", mem::offset_of!(WireHeader, version));
    println!("offset length   = {}", mem::offset_of!(WireHeader, length));
    println!("offset sequence = {}", mem::offset_of!(WireHeader, sequence));
    println!("offset flags    = {}", mem::offset_of!(WireHeader, flags));
}

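Caution: In Rust, a reference must always be properly aligned, so a reference to a misaligned field of a packed struct would be undefined behavior. Current compilers therefore reject &header.sequence with a hard error (E0793) whenever the field's type needs more alignment than the packing guarantees. Instead, copy the field by value (let seq = header.sequence;) or read it through a raw pointer with addr_of!(header.sequence).read_unaligned().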

use std::ptr::addr_of;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    let p = Packed { a: 1, b: 0xDEADBEEF };

    // Does not compile (error E0793): let r = &p.b;
    // Read through a raw pointer instead:
    let b_val = unsafe { addr_of!(p.b).read_unaligned() };
    println!("b = 0x{b_val:08X}");
}

Rust: repr(align(N))

Force a minimum alignment, useful for cache-line alignment.

use std::mem;

#[repr(C, align(64))]
struct CacheAligned {
    counter: u64,
    data: [u8; 32],
}

fn main() {
    println!("size  = {}", mem::size_of::<CacheAligned>());   // 64
    println!("align = {}", mem::align_of::<CacheAligned>());  // 64

    let obj = CacheAligned { counter: 0, data: [0; 32] };
    let addr = &obj as *const CacheAligned as usize;
    println!("address = 0x{addr:X}");
    println!("aligned to 64? {}", addr % 64 == 0);
}

Driver Prep: Cache-line alignment prevents false sharing in concurrent code. When two threads write to different fields that share a cache line, the CPU bounces the line between cores. Aligning to 64 bytes (typical cache line) avoids this. The Linux kernel uses ____cacheline_aligned for this purpose.

Verifying Layout at Compile Time

In C, use _Static_assert (C11):

#include <stdint.h>
#include <stddef.h>

struct __attribute__((packed)) wire_msg {
    uint8_t  type;
    uint16_t length;
    uint32_t payload;
};

_Static_assert(sizeof(struct wire_msg) == 7, "wire_msg must be 7 bytes");
_Static_assert(offsetof(struct wire_msg, payload) == 3, "payload at offset 3");

int main(void) {
    return 0;
}

In Rust, use const assertions:

#[repr(C, packed)]
struct WireMsg {
    msg_type: u8,
    length:   u16,
    payload:  u32,
}

const _: () = assert!(std::mem::size_of::<WireMsg>() == 7);

fn main() {
    println!("Layout verified at compile time.");
}

A Real-World Example: ELF Header

The ELF file format begins with a fixed-layout header. Here is a partial version:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct __attribute__((packed)) elf_ident {
    uint8_t  magic[4];   /* 0x7F 'E' 'L' 'F' */
    uint8_t  class;      /* 1=32-bit, 2=64-bit */
    uint8_t  data;       /* 1=LE, 2=BE */
    uint8_t  version;
    uint8_t  osabi;
    uint8_t  pad[8];     /* ABI version byte + 7 reserved bytes */
};

_Static_assert(sizeof(struct elf_ident) == 16, "ELF ident must be 16 bytes");

int main(void) {
    struct elf_ident ident;
    memset(&ident, 0, sizeof(ident));
    ident.magic[0] = 0x7F;
    ident.magic[1] = 'E';
    ident.magic[2] = 'L';
    ident.magic[3] = 'F';
    ident.class = 2;  /* 64-bit */
    ident.data  = 1;  /* little-endian */
    ident.version = 1;

    printf("ELF ident: ");
    uint8_t *bytes = (uint8_t *)&ident;
    for (size_t i = 0; i < sizeof(ident); i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}

The same header in Rust:

use std::mem;

#[repr(C, packed)]
struct ElfIdent {
    magic:   [u8; 4],
    class:   u8,
    data:    u8,
    version: u8,
    osabi:   u8,
    pad:     [u8; 8],
}

const _: () = assert!(mem::size_of::<ElfIdent>() == 16);

fn main() {
    let ident = ElfIdent {
        magic:   [0x7F, b'E', b'L', b'F'],
        class:   2,   // 64-bit
        data:    1,   // little-endian
        version: 1,
        osabi:   0,
        pad:     [0; 8],
    };

    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            &ident as *const ElfIdent as *const u8,
            mem::size_of::<ElfIdent>(),
        )
    };

    print!("ELF ident: ");
    for b in bytes {
        print!("{b:02X} ");
    }
    println!();
}

Try It: Read the first 16 bytes of /bin/ls (or any ELF binary) into this struct and verify the magic number. In C, use fread. In Rust, use std::fs::read and slice the first 16 bytes.

Quick Knowledge Check

  1. A struct has fields u8, u32, u8 with repr(C). What is its size and why?
  2. What happens on ARM if you read a u32 from an odd address without packed access?
  3. Why does Rust refuse to let you create &packed_struct.unaligned_field?

Common Pitfalls

  • Assuming struct size equals sum of field sizes. Padding exists. Always verify with sizeof/size_of.
  • Forgetting trailing padding. The struct's total size is rounded up to its alignment so that arrays work.
  • Using packed structs everywhere. Pack only when the wire format demands it. Unpacked structs are faster.
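  • Taking references to packed fields in Rust. A misaligned reference is UB, and current compilers reject it outright. Copy the field by value or use addr_of! with read_unaligned.
  • Forgetting repr(C). Default Rust layout is unspecified. Without repr(C), your struct is not guaranteed to match the C equivalent.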
  • Not asserting layout. Always add static assertions for struct size when the layout must be exact. Catch mistakes at compile time, not in production.