Memory Pools and Arena Allocators

malloc and free are general-purpose. General-purpose means slow for specific patterns. When you allocate thousands of short-lived objects of the same size, or build a parse tree that you discard all at once, custom allocators win by an order of magnitude. This chapter builds both an arena allocator and a pool allocator from scratch.

Why malloc Is Sometimes Too Slow

malloc must handle any size, any order of free, and thread safety. That flexibility costs:

  • Metadata overhead per allocation (typically 16-32 bytes).
  • Fragmentation from interleaved alloc/free.
  • Lock contention in multi-threaded programs.
  • System calls (brk/mmap) when the free list is empty.

For patterns like "allocate many, free all at once" or "allocate fixed-size blocks rapidly," we can do much better.

Arena Allocator: Bump and Reset

An arena is a contiguous block of memory. Allocation bumps a pointer forward. Freeing individual objects is not supported -- you free everything at once.

Arena layout:

+---------------------------------------------------+
| used memory          | free space                  |
+---------------------------------------------------+
^                      ^                             ^
base                   offset                        base + capacity

Arena in C

/* arena.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t capacity) {
    Arena a;
    a.base = (uint8_t *)malloc(capacity);
    if (!a.base) {
        fprintf(stderr, "arena: malloc failed\n");
        exit(1);
    }
    a.capacity = capacity;
    a.offset = 0;
    return a;
}

void *arena_alloc(Arena *a, size_t size, size_t align) {
    /* Align the current offset */
    size_t aligned = (a->offset + align - 1) & ~(align - 1);
    if (aligned + size > a->capacity) {
        fprintf(stderr, "arena: out of memory (%zu requested, %zu free)\n",
                size, a->capacity - a->offset);
        return NULL;
    }
    void *ptr = a->base + aligned;
    a->offset = aligned + size;
    return ptr;
}

void arena_reset(Arena *a) {
    a->offset = 0;
}

void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->offset = 0;
}

/* ---- demo ---- */

typedef struct {
    int x, y;
    char label[24];
} Point;

int main(void) {
    Arena arena = arena_create(1024 * 1024); /* 1 MB */

    /* Allocate 1000 Points -- no individual free needed */
    Point **points = arena_alloc(&arena, 1000 * sizeof(Point *),
                                 _Alignof(Point *));

    for (int i = 0; i < 1000; i++) {
        points[i] = arena_alloc(&arena, sizeof(Point), _Alignof(Point));
        points[i]->x = i;
        points[i]->y = i * 2;
        snprintf(points[i]->label, sizeof(points[i]->label), "pt_%d", i);
    }

    printf("Point 42: (%d, %d) \"%s\"\n",
           points[42]->x, points[42]->y, points[42]->label);
    printf("Arena used: %zu / %zu bytes\n", arena.offset, arena.capacity);

    /* Free everything at once */
    arena_reset(&arena);
    printf("After reset: %zu bytes used\n", arena.offset);

    arena_destroy(&arena);
    return 0;
}
$ gcc -O2 -o arena arena.c && ./arena
Point 42: (42, 84) "pt_42"
Arena used: 40000 / 1048576 bytes
After reset: 0 bytes used

That's the entire allocator: 20 lines of logic. No free list, no metadata per object, no fragmentation.

Try It: Add a function arena_alloc_string(Arena *a, const char *s) that copies a string into the arena and returns a pointer to it. Hint: use strlen + arena_alloc + memcpy.

Arena Performance vs malloc

/* arena_bench.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t cap) {
    Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->offset + 7) & ~(size_t)7;
    void *p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    enum { N = 1000000 };

    /* Benchmark malloc */
    void **ptrs = malloc(N * sizeof(void *));
    double t0 = now();
    for (int i = 0; i < N; i++)
        ptrs[i] = malloc(64);
    double t1 = now();
    printf("malloc: %.4f s\n", t1 - t0);
    for (int i = 0; i < N; i++)
        free(ptrs[i]);
    free(ptrs);

    /* Benchmark arena */
    Arena a = arena_create((size_t)N * 72);
    t0 = now();
    for (int i = 0; i < N; i++)
        arena_alloc(&a, 64);
    t1 = now();
    printf("arena:  %.4f s\n", t1 - t0);
    free(a.base);

    return 0;
}

Typical result: arena allocation is 5-20x faster than malloc for small objects.

Pool Allocator: Fixed-Size Blocks

A pool allocator manages blocks of identical size. Freed blocks go onto a free list for reuse.

Pool layout (block size = 32 bytes):

+--------+--------+--------+--------+--------+
| block0 | block1 | block2 | block3 | block4 | ...
+--------+--------+--------+--------+--------+

Free list (embedded in unused blocks):

block2 -> block0 -> block4 -> NULL

The trick: when a block is free, we store the free-list pointer inside the block itself. No extra metadata.

/* pool.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *memory;
    void    *free_list;
    size_t   block_size;
    size_t   block_count;
} Pool;

Pool pool_create(size_t block_size, size_t block_count) {
    /* Block must be large enough to hold a pointer */
    if (block_size < sizeof(void *))
        block_size = sizeof(void *);

    Pool p;
    p.block_size  = block_size;
    p.block_count = block_count;
    p.memory      = (uint8_t *)malloc(block_size * block_count);
    if (!p.memory) {
        fprintf(stderr, "pool: malloc failed\n");
        exit(1);
    }

    /* Build the free list */
    p.free_list = NULL;
    for (size_t i = 0; i < block_count; i++) {
        void *block = p.memory + i * block_size;
        *(void **)block = p.free_list;
        p.free_list = block;
    }

    return p;
}

void *pool_alloc(Pool *p) {
    if (!p->free_list) {
        fprintf(stderr, "pool: exhausted\n");
        return NULL;
    }
    void *block = p->free_list;
    p->free_list = *(void **)block;
    return block;
}

void pool_free(Pool *p, void *block) {
    *(void **)block = p->free_list;
    p->free_list = block;
}

void pool_destroy(Pool *p) {
    free(p->memory);
    p->memory = NULL;
    p->free_list = NULL;
}

/* ---- demo ---- */

typedef struct {
    int id;
    double value;
} Record;

int main(void) {
    Pool pool = pool_create(sizeof(Record), 1024);

    Record *r1 = pool_alloc(&pool);
    Record *r2 = pool_alloc(&pool);
    Record *r3 = pool_alloc(&pool);

    r1->id = 1; r1->value = 3.14;
    r2->id = 2; r2->value = 2.72;
    r3->id = 3; r3->value = 1.41;

    printf("r1: id=%d val=%.2f\n", r1->id, r1->value);
    printf("r2: id=%d val=%.2f\n", r2->id, r2->value);

    /* Return r2 to pool */
    pool_free(&pool, r2);

    /* Reuse that block */
    Record *r4 = pool_alloc(&pool);
    r4->id = 4; r4->value = 9.81;
    printf("r4: id=%d val=%.2f (reused r2's block)\n", r4->id, r4->value);
    printf("r4 == r2 address? %s\n", (r4 == r2) ? "yes" : "no");

    pool_destroy(&pool);
    return 0;
}
$ gcc -O2 -o pool pool.c && ./pool
r1: id=1 val=3.14
r2: id=2 val=2.72
r4: id=4 val=9.81 (reused r2's block)
r4 == r2 address? yes

Caution: Using a pool-allocated block after pool_free is use-after-free. The pool won't detect it. You'll corrupt the free list.

Rust: bumpalo and typed-arena

Rust has crate-level arena allocators that integrate with the borrow checker.

bumpalo (bump allocator)

// Cargo.toml: bumpalo = "3"
use bumpalo::Bump;

fn main() {
    let arena = Bump::new();

    // Allocate values -- they live as long as `arena`
    let x = arena.alloc(42_i32);
    let y = arena.alloc(3.14_f64);
    let s = arena.alloc_str("hello from the arena");

    println!("x = {x}, y = {y}, s = \"{s}\"");
    println!("Arena used: {} bytes", arena.allocated_bytes());

    // Everything freed when `arena` drops
}

typed-arena (single-type arena)

// Cargo.toml: typed-arena = "2"
use typed_arena::Arena;

struct Node {
    value: i32,
    label: String,
}

fn main() {
    let arena = Arena::new();

    let nodes: Vec<&Node> = (0..1000)
        .map(|i| {
            arena.alloc(Node {
                value: i,
                label: format!("node_{i}"),
            })
        })
        .collect();

    println!("Node 42: value={}, label=\"{}\"",
             nodes[42].value, nodes[42].label);
}

Rust Note: Rust arenas return references (&T) with the arena's lifetime. The borrow checker prevents use-after-free at compile time. This is the biggest difference from C arenas, where dangling pointers are your problem.

When to Use Each Allocator

+------------------+----------------------------+---------------------------+
| Pattern          | Allocator                  | Why                       |
+------------------+----------------------------+---------------------------+
| Parse a request, | Arena                      | Alloc many, free all at   |
| process, discard |                            | once. Zero fragmentation. |
+------------------+----------------------------+---------------------------+
| Game loop:       | Arena (per-frame)          | Reset at frame boundary.  |
| alloc per frame  |                            | No GC pauses.             |
+------------------+----------------------------+---------------------------+
| Connection pool: | Pool                       | Fixed-size blocks. Fast   |
| reuse objects    |                            | alloc/free. Reuse memory. |
+------------------+----------------------------+---------------------------+
| Mixed sizes,     | malloc/free (or jemalloc)  | General-purpose is fine   |
| long lifetimes   |                            | when patterns are random. |
+------------------+----------------------------+---------------------------+

Real-World Patterns

Protocol Parser with Arena

A network server receives a packet, parses headers and fields into an arena, processes the request, then resets the arena for the next packet.

/* parse_loop.c — sketch of arena-based packet parsing */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   offset;
} Arena;

Arena arena_create(size_t cap) {
    Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->offset + 7) & ~(size_t)7;
    void *p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

void arena_reset(Arena *a) { a->offset = 0; }

typedef struct {
    char *method;   /* "GET", "POST", etc. */
    char *path;     /* "/index.html" */
    int   num_headers;
} HttpRequest;

/* Parse a (fake) request into the arena */
HttpRequest *parse_request(Arena *a, const char *raw) {
    HttpRequest *req = arena_alloc(a, sizeof(HttpRequest));

    /* Copy method */
    req->method = arena_alloc(a, 8);
    strncpy(req->method, raw, 3);
    req->method[3] = '\0';

    /* Copy path */
    req->path = arena_alloc(a, 256);
    strncpy(req->path, raw + 4, 11);
    req->path[11] = '\0';

    req->num_headers = 3; /* placeholder */
    return req;
}

int main(void) {
    Arena arena = arena_create(4096);

    /* Simulate processing 3 requests */
    const char *requests[] = {
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    };

    for (int i = 0; i < 3; i++) {
        arena_reset(&arena); /* free previous request's data */

        HttpRequest *req = parse_request(&arena, requests[i]);
        printf("Request %d: method=%s path=%s headers=%d (arena=%zu bytes)\n",
               i, req->method, req->path, req->num_headers, arena.offset);
    }

    free(arena.base);
    return 0;
}
$ gcc -O2 -o parse_loop parse_loop.c && ./parse_loop
Request 0: method=GET path=/index.html headers=3 (arena=288 bytes)
Request 1: method=POS path=/api/data HT headers=3 (arena=288 bytes)
Request 2: method=GET path=/style.css  headers=3 (arena=288 bytes)

Rust: Per-Request Arena

use bumpalo::Bump;

struct Request<'a> {
    method: &'a str,
    path: &'a str,
}

fn parse_request<'a>(arena: &'a Bump, raw: &str) -> Request<'a> {
    let parts: Vec<&str> = raw.splitn(3, ' ').collect();
    Request {
        method: arena.alloc_str(parts[0]),
        path: arena.alloc_str(parts[1]),
    }
}

fn main() {
    let requests = [
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    ];

    let arena = Bump::new();
    for (i, raw) in requests.iter().enumerate() {
        arena.reset();
        let req = parse_request(&arena, raw);
        println!("Request {i}: {} {} (arena={} bytes)",
                 req.method, req.path, arena.allocated_bytes());
    }
}

Driver Prep: Kernel memory allocation uses slab allocators (kmem_cache), which are essentially pool allocators for fixed-size kernel objects. The concepts here map directly to kmem_cache_create / kmem_cache_alloc / kmem_cache_free in the kernel.

Growing an Arena

The simple arena above has a fixed capacity. A production arena grows by chaining blocks:

+--------+     +--------+     +--------+
| block1 | --> | block2 | --> | block3 |
| 4 KB   |     | 8 KB   |     | 16 KB  |   (double each time)
+--------+     +--------+     +--------+

Reset means: keep block1, free the rest. This gives good amortized performance without wasting memory on small workloads.

Try It: Extend the C arena to support growing. When arena_alloc runs out of space, allocate a new block (double the previous size), link it, and continue. arena_reset should free all blocks except the first.

Quick Knowledge Check

  1. Why can an arena allocator skip tracking individual frees?
  2. How does a pool allocator store its free list without extra metadata?
  3. When is malloc the right choice over an arena or pool?

Common Pitfalls

  • Using arena memory after reset. All pointers are invalidated. Same as use-after-free.
  • Pool block too small. Must be at least sizeof(void*) to hold the free-list pointer.
  • Forgetting alignment. Bumping by size without aligning causes bus errors on strict-alignment architectures (ARM).
  • Arena for long-lived objects. If you can't reset, the arena just grows forever. Use a pool or malloc.
  • Thread safety. Neither allocator above is thread-safe. Add a mutex or use per-thread arenas.