Data Structures in C and Rust
Every systems program is a data structure program. This chapter builds the classics by hand in C -- linked lists, hash tables, trees, stacks, queues -- then shows how Rust's standard library replaces most of that labor.
The Textbook Linked List
The singly linked list is the "hello world" of dynamic data structures. A node
holds data plus a pointer to the next node. The list ends when next is NULL.
/* slist.c -- singly linked list in C */
#include <stdio.h>
#include <stdlib.h>
struct node {
int data;
struct node *next;
};
/* Prepend a new node to the front of the list. */
struct node *list_push(struct node *head, int value)
{
struct node *n = malloc(sizeof(*n));
if (!n) {
perror("malloc");
exit(1);
}
n->data = value;
n->next = head;
return n;
}
/* Print every element. */
void list_print(const struct node *head)
{
for (const struct node *cur = head; cur; cur = cur->next)
printf("%d -> ", cur->data);
printf("NULL\n");
}
/* Free every node. */
void list_free(struct node *head)
{
while (head) {
struct node *tmp = head;
head = head->next;
free(tmp);
}
}
int main(void)
{
struct node *list = NULL;
for (int i = 1; i <= 5; i++)
list = list_push(list, i);
list_print(list); /* 5 -> 4 -> 3 -> 2 -> 1 -> NULL */
list_free(list);
return 0;
}
Memory layout after pushing 3, 2, 1:
head
|
v
+---+---+ +---+---+ +---+---+
| 1 | *-+--->| 2 | *-+--->| 3 | / |
+---+---+ +---+---+ +---+---+
data next data next data next (NULL)
Caution: Every
mallocmust pair with exactly onefree. Forget one -- memory leak. Free twice -- undefined behavior and likely a crash.
Try It: Add a
list_findfunction that returns a pointer to the first node whosedataequals a given value, or NULL if not found. Then add alist_removefunction that unlinks and frees that node.
The Kernel Way: Intrusive Lists
The textbook list above embeds data inside the list node. The Linux kernel flips this: it embeds a list node inside the data struct. This is called an intrusive list.
Textbook: Kernel (intrusive):
struct node { struct task_info {
int data; <--+ int pid;
struct node *next; | char name[16];
}; | struct list_head tasks; <-- just prev/next
| };
|
data lives data owns the link
inside node node lives inside data
The kernel's struct list_head is simply:
struct list_head {
struct list_head *next;
struct list_head *prev;
};
It is a doubly linked, circular list. The magic happens when you need to get
back from the embedded list_head to the enclosing struct.
container_of and offsetof
offsetof(type, member) returns the byte offset of member within type.
container_of subtracts that offset from a pointer to the member to recover
the parent struct.
/* container_of.c -- demonstrate container_of */
#include <stdio.h>
#include <stddef.h>
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
struct list_head {
struct list_head *next;
struct list_head *prev;
};
struct task_info {
int pid;
char name[16];
struct list_head link;
};
int main(void)
{
struct task_info t = { .pid = 42, .name = "init" };
/* Given only a pointer to the embedded link ... */
struct list_head *lh = &t.link;
/* ... recover the enclosing task_info. */
struct task_info *owner = container_of(lh, struct task_info, link);
printf("pid = %d, name = %s\n", owner->pid, owner->name);
printf("offsetof(task_info, link) = %zu\n",
offsetof(struct task_info, link));
return 0;
}
struct task_info layout:
byte 0 byte 4 byte 20
+------+------+-----------+----------+-----------+
| pid | pad | name[16] | link.next| link.prev |
+------+------+-----------+----------+-----------+
^ ^
| |
owner lh points here
owner = lh - offsetof(..., link)
Driver Prep: The kernel uses
container_ofthousands of times. When you write a driver, yourstruct my_deviceembeds astruct list_headfor the subsystem's device list, and you usecontainer_ofto recover it.
Hash Table with Chaining
A hash table maps keys to buckets. Collisions are handled by chaining -- each bucket is a linked list.
/* hashtable.c -- simple chained hash table */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NBUCKETS 16
struct entry {
char *key;
int value;
struct entry *next;
};
struct hashtable {
struct entry *buckets[NBUCKETS];
};
static unsigned hash(const char *s)
{
unsigned h = 5381;
while (*s)
h = h * 33 + (unsigned char)*s++;
return h % NBUCKETS;
}
void ht_insert(struct hashtable *ht, const char *key, int value)
{
unsigned idx = hash(key);
struct entry *e = malloc(sizeof(*e));
if (!e) { perror("malloc"); exit(1); }
e->key = strdup(key);
e->value = value;
e->next = ht->buckets[idx];
ht->buckets[idx] = e;
}
int ht_lookup(struct hashtable *ht, const char *key, int *out)
{
unsigned idx = hash(key);
for (struct entry *e = ht->buckets[idx]; e; e = e->next) {
if (strcmp(e->key, key) == 0) {
*out = e->value;
return 1; /* found */
}
}
return 0; /* not found */
}
void ht_free(struct hashtable *ht)
{
for (int i = 0; i < NBUCKETS; i++) {
struct entry *e = ht->buckets[i];
while (e) {
struct entry *tmp = e;
e = e->next;
free(tmp->key);
free(tmp);
}
}
}
int main(void)
{
struct hashtable ht = {0};
ht_insert(&ht, "alice", 100);
ht_insert(&ht, "bob", 200);
ht_insert(&ht, "carol", 300);
int val;
if (ht_lookup(&ht, "bob", &val))
printf("bob => %d\n", val);
if (!ht_lookup(&ht, "dave", &val))
printf("dave not found\n");
ht_free(&ht);
return 0;
}
buckets[0] -> NULL
buckets[1] -> NULL
buckets[2] -> [alice|100] -> NULL
...
buckets[7] -> [carol|300] -> [bob|200] -> NULL
...
buckets[15] -> NULL
Try It: Add an
ht_deletefunction that removes a key-value pair and frees its memory. Watch out for the head-of-list special case.
Binary Search Tree
/* bst.c -- binary search tree */
#include <stdio.h>
#include <stdlib.h>
struct bst_node {
int data;
struct bst_node *left;
struct bst_node *right;
};
struct bst_node *bst_insert(struct bst_node *root, int value)
{
if (!root) {
struct bst_node *n = malloc(sizeof(*n));
if (!n) { perror("malloc"); exit(1); }
n->data = value;
n->left = NULL;
n->right = NULL;
return n;
}
if (value < root->data)
root->left = bst_insert(root->left, value);
else if (value > root->data)
root->right = bst_insert(root->right, value);
return root;
}
void bst_inorder(const struct bst_node *root)
{
if (!root) return;
bst_inorder(root->left);
printf("%d ", root->data);
bst_inorder(root->right);
}
void bst_free(struct bst_node *root)
{
if (!root) return;
bst_free(root->left);
bst_free(root->right);
free(root);
}
int main(void)
{
struct bst_node *tree = NULL;
int vals[] = {5, 3, 7, 1, 4, 6, 8};
for (int i = 0; i < 7; i++)
tree = bst_insert(tree, vals[i]);
printf("In-order: ");
bst_inorder(tree); /* 1 2 3 4 5 6 7 8 */
printf("\n");
bst_free(tree);
return 0;
}
5
/ \
3 7
/ \ / \
1 4 6 8
Stack and Queue from Scratch
A stack is last-in-first-out. A queue is first-in-first-out. Both can be built on a linked list or on a contiguous array.
/* stack_queue.c -- array-based stack and queue */
#include <stdio.h>
#include <stdlib.h>
/* ---- Stack (LIFO) ---- */
struct stack {
int *data;
int top;
int cap;
};
struct stack stack_new(int cap)
{
struct stack s;
s.data = malloc(cap * sizeof(int));
if (!s.data) { perror("malloc"); exit(1); }
s.top = 0;
s.cap = cap;
return s;
}
void stack_push(struct stack *s, int val)
{
if (s->top >= s->cap) {
fprintf(stderr, "stack overflow\n");
exit(1);
}
s->data[s->top++] = val;
}
int stack_pop(struct stack *s)
{
if (s->top == 0) {
fprintf(stderr, "stack underflow\n");
exit(1);
}
return s->data[--s->top];
}
void stack_free(struct stack *s) { free(s->data); }
/* ---- Queue (FIFO, circular buffer) ---- */
struct queue {
int *data;
int head;
int tail;
int count;
int cap;
};
struct queue queue_new(int cap)
{
struct queue q;
q.data = malloc(cap * sizeof(int));
if (!q.data) { perror("malloc"); exit(1); }
q.head = 0;
q.tail = 0;
q.count = 0;
q.cap = cap;
return q;
}
void queue_enqueue(struct queue *q, int val)
{
if (q->count >= q->cap) {
fprintf(stderr, "queue full\n");
exit(1);
}
q->data[q->tail] = val;
q->tail = (q->tail + 1) % q->cap;
q->count++;
}
int queue_dequeue(struct queue *q)
{
if (q->count == 0) {
fprintf(stderr, "queue empty\n");
exit(1);
}
int val = q->data[q->head];
q->head = (q->head + 1) % q->cap;
q->count--;
return val;
}
void queue_free(struct queue *q) { free(q->data); }
int main(void)
{
/* Stack demo */
struct stack s = stack_new(8);
stack_push(&s, 10);
stack_push(&s, 20);
stack_push(&s, 30);
printf("stack pop: %d\n", stack_pop(&s)); /* 30 */
printf("stack pop: %d\n", stack_pop(&s)); /* 20 */
stack_free(&s);
/* Queue demo */
struct queue q = queue_new(8);
queue_enqueue(&q, 10);
queue_enqueue(&q, 20);
queue_enqueue(&q, 30);
printf("queue deq: %d\n", queue_dequeue(&q)); /* 10 */
printf("queue deq: %d\n", queue_dequeue(&q)); /* 20 */
queue_free(&q);
return 0;
}
Stack (after push 10, 20, 30):
top=3
|
v
[10][20][30][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7
Queue (circular buffer, after enqueue 10, 20, 30):
head=0 tail=3
| |
v v
[10][20][30][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7
Rust: The Standard Library Does the Heavy Lifting
Rust ships Vec, VecDeque, HashMap, BTreeMap, LinkedList, and more in
std::collections. You rarely build these from scratch.
// rust_collections.rs use std::collections::{VecDeque, HashMap, BTreeMap}; fn main() { // Vec -- growable array, also works as a stack let mut stack: Vec<i32> = Vec::new(); stack.push(10); stack.push(20); stack.push(30); println!("stack pop: {:?}", stack.pop()); // Some(30) println!("stack pop: {:?}", stack.pop()); // Some(20) // VecDeque -- double-ended queue (ring buffer internally) let mut queue: VecDeque<i32> = VecDeque::new(); queue.push_back(10); queue.push_back(20); queue.push_back(30); println!("queue deq: {:?}", queue.pop_front()); // Some(10) println!("queue deq: {:?}", queue.pop_front()); // Some(20) // HashMap let mut map = HashMap::new(); map.insert("alice", 100); map.insert("bob", 200); map.insert("carol", 300); if let Some(val) = map.get("bob") { println!("bob => {}", val); } // BTreeMap -- sorted by keys let mut btree = BTreeMap::new(); btree.insert(5, "five"); btree.insert(3, "three"); btree.insert(7, "seven"); for (k, v) in &btree { println!("{}: {}", k, v); // printed in sorted order } }
Rust Note: Rust's
Vecis not a linked list -- it is a contiguous, growable array. This is almost always what you want.LinkedListexists instd::collectionsbut is rarely the right choice because of poor cache locality.
Side-by-Side: C Linked List vs Rust Vec
| Operation | C (manual linked list) | Rust (Vec<T>) |
|---|---|---|
| Create | node *head = NULL; | let mut v = Vec::new(); |
| Prepend | allocate node, fix pointers | v.insert(0, val); |
| Append | walk to end, allocate, link | v.push(val); |
| Index access | walk the list -- O(n) | v[i] -- O(1) |
| Remove by idx | walk, relink, free -- O(n) | v.remove(i); -- O(n) |
| Memory layout | scattered heap allocations | one contiguous buffer |
| Cache behavior | terrible (pointer chasing) | excellent (sequential) |
| Safety | dangling pointers, double free | borrow checker enforced |
For almost all user-space programming, Vec beats a linked list. The linked
list wins only when you need O(1) insertion/removal in the middle and you
already hold a pointer to the node.
When You'd Still Write Your Own
- Embedded/no-alloc environments: You cannot use
Vecwithout an allocator. You write intrusive lists on a pre-allocated pool. - Kernel modules: The kernel has its own allocator and its own list macros.
You use
struct list_head, notstd::collections. - Lock-free data structures: Standard collections are not lock-free. You hand-roll with atomics.
- Performance-critical hot paths: When the profiler says the standard container is the bottleneck (rare, but it happens).
Driver Prep: In kernel space, you will use
struct list_head,struct hlist_head(hash list), andstruct rb_root(red-black tree). All are intrusive. Learn the pattern now; the kernel macros are the same idea.
Knowledge Check
- What does
container_of(ptr, type, member)compute, and why does the kernel need it? - Why is a
Vec(contiguous array) usually faster than a linked list for iteration, even though both are O(n)? - In the circular-buffer queue implementation, what happens if you forget
the modulo operation on
headandtail?
Common Pitfalls
- Forgetting to free every node in a linked list -- walk the list and free each node; do not just free the head.
- Off-by-one in circular buffers -- the modulo wrap must use the capacity, not the count.
- Using a linked list when a Vec would do -- cache misses from pointer chasing dominate on modern hardware.
- Dangling pointers after removal -- in C, after you
freea node, any pointer still referencing it is undefined behavior. - Hash function quality -- a bad hash clusters entries into a few buckets, turning O(1) average into O(n).