The Allocator: What malloc Really Does

Type this right now

$ ltrace -e malloc,free ./your_program 2>&1 | head -20

Don't have a program handy? Use this:

// save as allocs.c — compile: gcc -o allocs allocs.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main() {
    char *a = malloc(32);
    char *b = malloc(128);
    char *c = malloc(256 * 1024);  // 256 KB — big allocation!

    printf("a = %p (32 bytes)\n", (void *)a);
    printf("b = %p (128 bytes)\n", (void *)b);
    printf("c = %p (256 KB)\n", (void *)c);

    free(a);
    free(b);
    free(c);
    return 0;
}
$ ltrace -e malloc,free ./allocs
malloc(32)                         = 0x55a000
malloc(128)                        = 0x55a030
malloc(262144)                     = 0x7f4a00000000
free(0x55a000)
free(0x55a030)
free(0x7f4a00000000)

Notice something? a and b are close together (in the heap/brk region). But c is in a completely different address range. That's because the allocator uses two different strategies depending on the allocation size.

Now watch the system calls:

$ strace -e brk,mmap,munmap ./allocs 2>&1 | grep -E "^(brk|mmap|munmap)"
brk(NULL)                         = 0x55a000
brk(0x57b000)                     = 0x57b000     ← extend heap
mmap(NULL, 266240, ..., MAP_PRIVATE|MAP_ANONYMOUS, ...) = 0x7f4a00000000  ← big alloc
munmap(0x7f4a00000000, 266240)    = 0             ← big free

malloc called brk for the small ones. mmap for the big one. Two paths to the kernel.


malloc is NOT a system call

This surprises people. malloc() is a library function — it runs entirely in user space. It maintains its own data structures, its own free lists, its own bookkeeping. It only calls the kernel when it needs more memory from the OS.

    Your code
    ─────────
    ptr = malloc(32);
         │
         ▼
    ┌───────────────────────────────────────────────┐
    │         C Library (glibc, musl, etc.)         │
    │                                               │
    │  malloc() implementation:                     │
    │    1. Check free list — is there a free chunk │
    │       that fits?                              │
    │       YES → carve it out, return pointer      │
    │       NO  → ask the kernel for more memory    │
    │             (brk or mmap)                     │
    │                                               │
    │  free() implementation:                       │
    │    1. Mark chunk as free                      │
    │    2. Add to free list                        │
    │    3. Maybe coalesce with neighbors           │
    │    4. Maybe return memory to kernel           │
    └───────────────────────────────────────────────┘
         │
         │ Only when needed:
         ▼
    ┌───────────────────────────────────────────────┐
    │                    Kernel                     │
    │   brk()  — extend the heap (contiguous)       │
    │   mmap() — map pages anywhere                 │
    └───────────────────────────────────────────────┘

Two ways to get memory from the kernel

brk / sbrk: the heap

The classic heap is a contiguous region that grows upward. brk() moves the "program break" — the boundary between allocated and unallocated heap space.

    Before malloc:

    0x555555570000 ┌──────────────────┐
                   │ .data, .bss      │
    0x555555580000 ├──────────────────┤ ◄── program break (brk)
                   │                  │
                   │ (unallocated)    │
                   │                  │
                   └──────────────────┘

    After malloc(32) + malloc(128):

    0x555555570000 ┌──────────────────┐
                   │ .data, .bss      │
    0x555555580000 ├──────────────────┤ ◄── old brk
                   │ [chunk: 32 bytes]│
                   │ [chunk: 128 bytes│
    0x55555559B000 ├──────────────────┤ ◄── new brk (moved up)
                   │ (unallocated)    │
                   └──────────────────┘

brk is fast (just moving a pointer in kernel data structures) but only works for contiguous growth.

mmap anonymous: pages anywhere

For large allocations (typically >128 KB), the allocator uses mmap with MAP_ANONYMOUS to get pages at an arbitrary virtual address:

    mmap region (somewhere in the address space):

    0x7f4a00000000 ┌──────────────────────┐
                   │                      │
                   │  256 KB (+ header,   │  ◄── mmap'd directly
                   │  rounded to 65 pages)│
                   │                      │
    0x7f4a00041000 └──────────────────────┘

Advantages of mmap for large allocations:

  • Can be returned to the OS immediately via munmap(). With brk, you can only shrink the heap from the top — a free block in the middle stays allocated.
  • No fragmentation of the brk region — large blocks don't interfere with small ones.

Inside a heap chunk

When you malloc(32), the allocator actually allocates more than 32 bytes. Every chunk has a header containing metadata:

    What malloc returns          What's actually in memory
    ──────────────────           ────────────────────────────

    chunk start ──────────────►  ┌─────────────────────────────┐
                                 │ prev_size (8 bytes)         │ ◄── only used if
                                 │                             │     previous chunk is free
                                 ├─────────────────────────────┤
                                 │ size + flags (8 bytes)      │ ◄── chunk header
                                 │ bits: [size | A | M | P]    │
    ptr points HERE ──────────►  ├─────────────────────────────┤
                                 │                             │
                                 │ User data (32 bytes)        │ ◄── what you requested
                                 │                             │
                                 ├─────────────────────────────┤
                                 │ (alignment padding)         │
                                 └─────────────────────────────┘

    Total chunk size: 48 bytes for a 32-byte request
    Overhead: 16 bytes (prev_size + size header)

The flags in the size field:

  • P (PREV_INUSE): is the previous chunk allocated?
  • M (IS_MMAPPED): was this chunk allocated via mmap?
  • A (NON_MAIN_ARENA): does this belong to a non-main arena? (threading)

The malloc pointer you get back is 16 bytes past the start of the actual chunk. When you call free(ptr), the allocator subtracts 16 to find the header.


The free list

When you free() a chunk, the allocator doesn't give the memory back to the kernel (usually). It adds the chunk to a free list — a linked list of available chunks, threaded through the chunks themselves.

    Heap after several malloc + free operations:

    ┌──────────────────────────────────────────────────────────┐
    │                                                          │
    │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐        │
    │  │ALLOC   │  │ FREE   │  │ALLOC   │  │ FREE   │        │
    │  │ 64 B   │  │ 128 B  │  │ 32 B   │  │ 256 B  │        │
    │  │        │  │  ┌──┐  │  │        │  │  ┌──┐  │        │
    │  │        │  │  │FD├──┼──┼────────┼──┼─►│FD├──┼──► ... │
    │  │        │  │  │BK│◄─┼──┼────────┼──┼──│BK│  │        │
    │  │        │  │  └──┘  │  │        │  │  └──┘  │        │
    │  └────────┘  └────────┘  └────────┘  └────────┘        │
    │                                                          │
    └──────────────────────────────────────────────────────────┘

    FD = forward pointer (next free chunk)
    BK = backward pointer (previous free chunk)

    The free list is a doubly-linked list INSIDE the free chunks.
    No extra memory needed — the user data area stores the pointers.

malloc(32): the full sequence

    1. malloc(32) called

    2. Add header overhead: 32 + 16 (header) = 48 bytes
       Round up to 16-byte alignment: 48 bytes (already aligned)

    3. Check fastbin for 48-byte chunks
       ├── Found? → unlink from fastbin, return user pointer
       └── Not found? → continue

    4. Check small bin for 48-byte chunks
       ├── Found? → unlink from bin, return user pointer
       └── Not found? → continue

    5. Check unsorted bin — scan for any chunk that fits
       ├── Exact match? → return it
       ├── Larger chunk? → split it, return the piece, put remainder in bin
       └── Nothing? → continue

    6. Check larger bins for a bigger chunk to split
       ├── Found? → split, return piece, put remainder back
       └── Not found? → continue

    7. Ask the kernel for more memory
       brk() to extend the heap by at least 128 KB
       Carve a 48-byte chunk from the new space
       Return user pointer

free(ptr): what happens

    1. free(ptr) called

    2. Subtract 16 bytes to find chunk header
       Read the size field → this chunk is 48 bytes

    3. Was this chunk mmap'd? (IS_MMAPPED flag)
       ├── Yes → munmap() it immediately — pages go back to the kernel. Done.
       └── No → continue: it belongs to the heap

    4. Is previous chunk free? (check PREV_INUSE bit)
       ├── Yes → coalesce: merge with previous chunk, update size
       └── No → skip

    5. Is next chunk free? (check the PREV_INUSE bit of the chunk after it)
       ├── Yes → coalesce: merge with next chunk, update size
       └── No → clear the next chunk's PREV_INUSE bit

    6. Add the (possibly coalesced) chunk to the appropriate free list
       Very small chunk → fastbin (coalescing is skipped for these)
       Everything else → unsorted bin, sorted into small/large bins
       during a later malloc

Coalescing is critical. Without it, you'd end up with many small free chunks that can't satisfy larger requests, even though the total free space is large. This is external fragmentation.


Why double-free is catastrophic

int *p = malloc(32);
free(p);
free(p);  // DOUBLE FREE — disaster

    After first free(p):

    Free list:  HEAD → [chunk at p] → ...
                         │
                         └─ FD points to next free chunk

    After second free(p):

    Free list:  HEAD → [chunk at p] → [chunk at p] → ...
                         │                  │
                         └──────────────────┘
                         (circular! chunk points to itself)

    Now malloc(32) returns p.
    Then malloc(32) returns p AGAIN.
    Two different parts of your program think they own the same memory.
    They overwrite each other's data. Heap corruption. Potential code execution.

This is why double-free is a security vulnerability, not just a bug. (Modern glibc detects the simplest case, freeing the same pointer twice in a row, and aborts; variants with other allocations in between can still slip through.)

🧠 What do you think happens?

You malloc(32), write to it, free() it, then immediately malloc(32) again. Do you get the same pointer back? Why or why not? (Try it.)


Thread safety: arenas

In a multithreaded program, multiple threads call malloc() simultaneously. A global lock would be a bottleneck. The solution: arenas.

    ┌────────────────────────────────────────────────────┐
    │                  Process                            │
    │                                                    │
    │   Thread 1          Thread 2          Thread 3     │
    │      │                 │                 │         │
    │      ▼                 ▼                 ▼         │
    │  ┌────────┐       ┌────────┐       ┌────────┐     │
    │  │Arena 0 │       │Arena 1 │       │Arena 2 │     │
    │  │(main)  │       │        │       │        │     │
    │  │ lock   │       │ lock   │       │ lock   │     │
    │  │ heap   │       │ heap   │       │ heap   │     │
    │  │ bins   │       │ bins   │       │ bins   │     │
    │  └────────┘       └────────┘       └────────┘     │
    │                                                    │
    │  Each arena has its own lock and free lists.       │
    │  Threads pick an arena (usually round-robin).      │
    │  Contention is reduced: threads rarely fight       │
    │  for the same lock.                                │
    └────────────────────────────────────────────────────┘

glibc creates up to 8 * num_cores arenas on 64-bit systems (2 * num_cores on 32-bit). Each thread is assigned to an arena and sticks with it (usually). The main arena uses brk(); secondary arenas use mmap().


Rust: same allocator underneath

Rust's default allocator is the system allocator — it literally calls malloc and free from your platform's C library.

fn main() {
    // Box::new calls the global allocator (malloc underneath)
    let x = Box::new(42);
    println!("x = {} at {:p}", x, &*x);
    // x dropped here → allocator calls free()

    // Vec grows by calling realloc or malloc+copy
    let mut v = Vec::new();
    for i in 0..10 {
        v.push(i);
        println!("len={}, cap={}, ptr={:p}", v.len(), v.capacity(), v.as_ptr());
    }
}
len=1, cap=4, ptr=0x55a020       ← initial allocation
len=2, cap=4, ptr=0x55a020
len=3, cap=4, ptr=0x55a020
len=4, cap=4, ptr=0x55a020
len=5, cap=8, ptr=0x55a040       ← doubled! Allocated new buffer, freed old
len=6, cap=8, ptr=0x55a040
len=7, cap=8, ptr=0x55a040
len=8, cap=8, ptr=0x55a040
len=9, cap=16, ptr=0x55a060      ← doubled again!
len=10, cap=16, ptr=0x55a060

You can swap in a different allocator with #[global_allocator]:

use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;  // Explicitly use the system allocator (the default)

// Or use jemalloc, mimalloc, etc. for better multithreaded performance

fn main() {}

💡 Fun Fact: Firefox switched from the system allocator to jemalloc and saw measurable performance improvements. The allocator you choose matters — it affects fragmentation, multithreaded scaling, and memory overhead. For most programs, the default is fine. For high-performance servers, it's worth benchmarking alternatives.


🔧 Task: Watch the allocator in action

Step 1: ltrace — see malloc/free calls:

$ ltrace -e malloc,calloc,realloc,free ./allocs 2>&1 | head -20

Step 2: strace — see when it talks to the kernel:

$ strace -e brk,mmap,munmap ./allocs 2>&1 | head -20

Step 3: Write a program that does many small allocations, then many large ones, and observe the difference:

// save as alloc_patterns.c — compile: gcc -o alloc_patterns alloc_patterns.c
#include <stdio.h>
#include <stdlib.h>

int main() {
    printf("=== Small allocations (brk region) ===\n");
    void *ptrs[10];
    for (int i = 0; i < 10; i++) {
        ptrs[i] = malloc(64);
        printf("  malloc(64) = %p\n", ptrs[i]);
    }

    printf("\n=== Large allocation (mmap region) ===\n");
    void *big = malloc(1024 * 1024);  // 1 MB
    printf("  malloc(1MB) = %p\n", big);

    // Free in reverse order — watch coalescing
    printf("\n=== Freeing ===\n");
    free(big);
    for (int i = 9; i >= 0; i--) {
        free(ptrs[i]);
    }
    printf("Done.\n");
    return 0;
}

Run with strace -e brk,mmap,munmap and observe: the small allocations are satisfied by extending the heap with brk() (typically just one or two calls, since the heap grows in large steps). The large allocation triggers mmap(). free(big) triggers munmap(). The small free()s trigger no system calls at all — the memory stays in the process's free list.