Memory Pools and Arena Allocators
malloc and free are general-purpose. General-purpose means slow for
specific patterns. When you allocate thousands of short-lived objects of the
same size, or build a parse tree that you discard all at once, custom
allocators can win by an order of magnitude. This chapter builds both an arena
allocator and a pool allocator from scratch.
Why malloc Is Sometimes Too Slow
malloc must handle any size, any order of free, and thread safety. That
flexibility costs:
- Metadata overhead per allocation (typically 16-32 bytes).
- Fragmentation from interleaved alloc/free.
- Lock contention in multi-threaded programs.
- System calls (brk/mmap) when the free list is empty.
For patterns like "allocate many, free all at once" or "allocate fixed-size blocks rapidly," we can do much better.
Arena Allocator: Bump and Reset
An arena is a contiguous block of memory. Allocation bumps a pointer forward. Freeing individual objects is not supported -- you free everything at once.
Arena layout:
+-----------------------------------------------------+
| used memory           | free space                  |
+-----------------------------------------------------+
^                       ^                             ^
base                  offset                 base + capacity
Arena in C
/* arena.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t capacity) {
Arena a;
a.base = (uint8_t *)malloc(capacity);
if (!a.base) {
fprintf(stderr, "arena: malloc failed\n");
exit(1);
}
a.capacity = capacity;
a.offset = 0;
return a;
}
void *arena_alloc(Arena *a, size_t size, size_t align) {
/* Align the current offset (align must be a power of two) */
size_t aligned = (a->offset + align - 1) & ~(align - 1);
if (aligned + size > a->capacity) {
fprintf(stderr, "arena: out of memory (%zu requested, %zu free)\n",
size, a->capacity - a->offset);
return NULL;
}
void *ptr = a->base + aligned;
a->offset = aligned + size;
return ptr;
}
void arena_reset(Arena *a) {
a->offset = 0;
}
void arena_destroy(Arena *a) {
free(a->base);
a->base = NULL;
a->capacity = 0;
a->offset = 0;
}
/* ---- demo ---- */
typedef struct {
int x, y;
char label[24];
} Point;
int main(void) {
Arena arena = arena_create(1024 * 1024); /* 1 MB */
/* Allocate 1000 Points -- no individual free needed */
Point **points = arena_alloc(&arena, 1000 * sizeof(Point *),
_Alignof(Point *));
for (int i = 0; i < 1000; i++) {
points[i] = arena_alloc(&arena, sizeof(Point), _Alignof(Point));
points[i]->x = i;
points[i]->y = i * 2;
snprintf(points[i]->label, sizeof(points[i]->label), "pt_%d", i);
}
printf("Point 42: (%d, %d) \"%s\"\n",
points[42]->x, points[42]->y, points[42]->label);
printf("Arena used: %zu / %zu bytes\n", arena.offset, arena.capacity);
/* Free everything at once */
arena_reset(&arena);
printf("After reset: %zu bytes used\n", arena.offset);
arena_destroy(&arena);
return 0;
}
$ gcc -O2 -o arena arena.c && ./arena
Point 42: (42, 84) "pt_42"
Arena used: 40000 / 1048576 bytes
After reset: 0 bytes used
That's the entire allocator: 20 lines of logic. No free list, no metadata per object, no fragmentation.
Try It: Add a function arena_alloc_string(Arena *a, const char *s) that copies a string into the arena and returns a pointer to it. Hint: use strlen + arena_alloc + memcpy.
Arena Performance vs malloc
/* arena_bench.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t cap) {
Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
return a;
}
void *arena_alloc(Arena *a, size_t size) {
size_t aligned = (a->offset + 7) & ~(size_t)7; /* fixed 8-byte align; no capacity check (benchmark only) */
void *p = a->base + aligned;
a->offset = aligned + size;
return p;
}
static double now(void) {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec + ts.tv_nsec / 1e9;
}
int main(void) {
enum { N = 1000000 };
/* Benchmark malloc */
void **ptrs = malloc(N * sizeof(void *));
double t0 = now();
for (int i = 0; i < N; i++)
ptrs[i] = malloc(64);
double t1 = now();
printf("malloc: %.4f s\n", t1 - t0);
for (int i = 0; i < N; i++)
free(ptrs[i]);
free(ptrs);
/* Benchmark arena */
Arena a = arena_create((size_t)N * 72);
t0 = now();
for (int i = 0; i < N; i++)
arena_alloc(&a, 64);
t1 = now();
printf("arena: %.4f s\n", t1 - t0);
free(a.base);
return 0;
}
Typical result: arena allocation is 5-20x faster than malloc for small objects.
Pool Allocator: Fixed-Size Blocks
A pool allocator manages blocks of identical size. Freed blocks go onto a free list for reuse.
Pool layout (block size = 32 bytes):
+--------+--------+--------+--------+--------+
| block0 | block1 | block2 | block3 | block4 | ...
+--------+--------+--------+--------+--------+
Free list (embedded in unused blocks):
block2 -> block0 -> block4 -> NULL
The trick: when a block is free, we store the free-list pointer inside the block itself. No extra metadata.
/* pool.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *memory;
void *free_list;
size_t block_size;
size_t block_count;
} Pool;
Pool pool_create(size_t block_size, size_t block_count) {
/* Block must be large enough to hold a pointer */
if (block_size < sizeof(void *))
block_size = sizeof(void *);
Pool p;
p.block_size = block_size;
p.block_count = block_count;
p.memory = (uint8_t *)malloc(block_size * block_count);
if (!p.memory) {
fprintf(stderr, "pool: malloc failed\n");
exit(1);
}
/* Build the free list */
p.free_list = NULL;
for (size_t i = 0; i < block_count; i++) {
void *block = p.memory + i * block_size;
*(void **)block = p.free_list;
p.free_list = block;
}
return p;
}
void *pool_alloc(Pool *p) {
if (!p->free_list) {
fprintf(stderr, "pool: exhausted\n");
return NULL;
}
void *block = p->free_list;
p->free_list = *(void **)block;
return block;
}
void pool_free(Pool *p, void *block) {
*(void **)block = p->free_list;
p->free_list = block;
}
void pool_destroy(Pool *p) {
free(p->memory);
p->memory = NULL;
p->free_list = NULL;
}
/* ---- demo ---- */
typedef struct {
int id;
double value;
} Record;
int main(void) {
Pool pool = pool_create(sizeof(Record), 1024);
Record *r1 = pool_alloc(&pool);
Record *r2 = pool_alloc(&pool);
Record *r3 = pool_alloc(&pool);
r1->id = 1; r1->value = 3.14;
r2->id = 2; r2->value = 2.72;
r3->id = 3; r3->value = 1.41;
printf("r1: id=%d val=%.2f\n", r1->id, r1->value);
printf("r2: id=%d val=%.2f\n", r2->id, r2->value);
/* Return r2 to pool */
pool_free(&pool, r2);
/* Reuse that block */
Record *r4 = pool_alloc(&pool);
r4->id = 4; r4->value = 9.81;
printf("r4: id=%d val=%.2f (reused r2's block)\n", r4->id, r4->value);
printf("r4 == r2 address? %s\n", (r4 == r2) ? "yes" : "no");
pool_destroy(&pool);
return 0;
}
$ gcc -O2 -o pool pool.c && ./pool
r1: id=1 val=3.14
r2: id=2 val=2.72
r4: id=4 val=9.81 (reused r2's block)
r4 == r2 address? yes
Caution: Using a pool-allocated block after pool_free is use-after-free. The pool won't detect it. You'll corrupt the free list.
Rust: bumpalo and typed-arena
Rust has crate-level arena allocators that integrate with the borrow checker.
bumpalo (bump allocator)
// Cargo.toml: bumpalo = "3"
use bumpalo::Bump;

fn main() {
    let arena = Bump::new();

    // Allocate values -- they live as long as `arena`
    let x = arena.alloc(42_i32);
    let y = arena.alloc(3.14_f64);
    let s = arena.alloc_str("hello from the arena");

    println!("x = {x}, y = {y}, s = \"{s}\"");
    println!("Arena used: {} bytes", arena.allocated_bytes());
    // Everything freed when `arena` drops
}
typed-arena (single-type arena)
// Cargo.toml: typed-arena = "2"
use typed_arena::Arena;

struct Node {
    value: i32,
    label: String,
}

fn main() {
    let arena = Arena::new();
    let nodes: Vec<&Node> = (0..1000)
        .map(|i| {
            arena.alloc(Node {
                value: i,
                label: format!("node_{i}"),
            })
        })
        .collect();
    println!("Node 42: value={}, label=\"{}\"",
             nodes[42].value, nodes[42].label);
}
Rust Note: Rust arenas return references (&T) with the arena's lifetime. The borrow checker prevents use-after-free at compile time. This is the biggest difference from C arenas, where dangling pointers are your problem.
When to Use Each Allocator
+------------------+----------------------------+---------------------------+
| Pattern | Allocator | Why |
+------------------+----------------------------+---------------------------+
| Parse a request, | Arena | Alloc many, free all at |
| process, discard | | once. Zero fragmentation. |
+------------------+----------------------------+---------------------------+
| Game loop: | Arena (per-frame) | Reset at frame boundary. |
| alloc per frame | | No GC pauses. |
+------------------+----------------------------+---------------------------+
| Connection pool: | Pool | Fixed-size blocks. Fast |
| reuse objects | | alloc/free. Reuse memory. |
+------------------+----------------------------+---------------------------+
| Mixed sizes, | malloc/free (or jemalloc) | General-purpose is fine |
| long lifetimes | | when patterns are random. |
+------------------+----------------------------+---------------------------+
Real-World Patterns
Protocol Parser with Arena
A network server receives a packet, parses headers and fields into an arena, processes the request, then resets the arena for the next packet.
/* parse_loop.c — sketch of arena-based packet parsing */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint8_t *base;
size_t capacity;
size_t offset;
} Arena;
Arena arena_create(size_t cap) {
Arena a = { .base = malloc(cap), .capacity = cap, .offset = 0 };
return a;
}
void *arena_alloc(Arena *a, size_t size) {
size_t aligned = (a->offset + 7) & ~(size_t)7;
void *p = a->base + aligned;
a->offset = aligned + size;
return p;
}
void arena_reset(Arena *a) { a->offset = 0; }
typedef struct {
char *method; /* "GET", "POST", etc. */
char *path; /* "/index.html" */
int num_headers;
} HttpRequest;
/* Parse a (fake) request line into the arena. Assumes well-formed
   "METHOD path ..." input, so no error handling on strchr. */
HttpRequest *parse_request(Arena *a, const char *raw) {
    HttpRequest *req = arena_alloc(a, sizeof(HttpRequest));
    /* Copy method (everything before the first space) */
    const char *sp = strchr(raw, ' ');
    size_t mlen = (size_t)(sp - raw);
    req->method = arena_alloc(a, 8);
    memcpy(req->method, raw, mlen);
    req->method[mlen] = '\0';
    /* Copy path (between the first and second spaces) */
    const char *sp2 = strchr(sp + 1, ' ');
    size_t plen = (size_t)(sp2 - (sp + 1));
    req->path = arena_alloc(a, 256);
    memcpy(req->path, sp + 1, plen);
    req->path[plen] = '\0';
    req->num_headers = 3; /* placeholder */
    return req;
}
int main(void) {
    Arena arena = arena_create(4096);
    /* Simulate processing 3 requests */
    const char *requests[] = {
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    };
    for (int i = 0; i < 3; i++) {
        arena_reset(&arena); /* free previous request's data */
        HttpRequest *req = parse_request(&arena, requests[i]);
        printf("Request %d: method=%s path=%s headers=%d (arena=%zu bytes)\n",
               i, req->method, req->path, req->num_headers, arena.offset);
    }
    free(arena.base);
    return 0;
}
$ gcc -O2 -o parse_loop parse_loop.c && ./parse_loop
Request 0: method=GET path=/index.html headers=3 (arena=288 bytes)
Request 1: method=POST path=/api/data headers=3 (arena=288 bytes)
Request 2: method=GET path=/style.css headers=3 (arena=288 bytes)
Rust: Per-Request Arena
use bumpalo::Bump;

struct Request<'a> {
    method: &'a str,
    path: &'a str,
}

fn parse_request<'a>(arena: &'a Bump, raw: &str) -> Request<'a> {
    let parts: Vec<&str> = raw.splitn(3, ' ').collect();
    Request {
        method: arena.alloc_str(parts[0]),
        path: arena.alloc_str(parts[1]),
    }
}

fn main() {
    let requests = [
        "GET /index.html HTTP/1.1",
        "POST /api/data HTTP/1.1",
        "GET /style.css HTTP/1.1",
    ];
    // `mut` because Bump::reset takes &mut self
    let mut arena = Bump::new();
    for (i, raw) in requests.iter().enumerate() {
        arena.reset();
        let req = parse_request(&arena, raw);
        println!("Request {i}: {} {} (arena={} bytes)",
                 req.method, req.path, arena.allocated_bytes());
    }
}
Driver Prep: Kernel memory allocation uses slab allocators (kmem_cache), which are essentially pool allocators for fixed-size kernel objects. The concepts here map directly to kmem_cache_create / kmem_cache_alloc / kmem_cache_free in the kernel.
Growing an Arena
The simple arena above has a fixed capacity. A production arena grows by chaining blocks:
+--------+ +--------+ +--------+
| block1 | --> | block2 | --> | block3 |
| 4 KB | | 8 KB | | 16 KB | (double each time)
+--------+ +--------+ +--------+
Reset means: keep block1, free the rest. This gives good amortized performance without wasting memory on small workloads.
Try It: Extend the C arena to support growing. When arena_alloc runs out of space, allocate a new block (double the previous size), link it, and continue. arena_reset should free all blocks except the first.
Quick Knowledge Check
- Why can an arena allocator skip tracking individual frees?
- How does a pool allocator store its free list without extra metadata?
- When is malloc the right choice over an arena or pool?
Common Pitfalls
- Using arena memory after reset. All pointers are invalidated. Same as use-after-free.
- Pool block too small. Must be at least sizeof(void*) to hold the free-list pointer.
- Forgetting alignment. Bumping by size without aligning causes bus errors on strict-alignment architectures (ARM).
- Arena for long-lived objects. If you can't reset, the arena just grows forever. Use a pool or malloc.
- Thread safety. Neither allocator above is thread-safe. Add a mutex or use per-thread arenas.