Data Structures in C and Rust

Every systems program is a data structure program. This chapter builds the classics by hand in C -- linked lists, hash tables, trees, stacks, queues -- then shows how Rust's standard library replaces most of that labor.

The Textbook Linked List

The singly linked list is the "hello world" of dynamic data structures. A node holds data plus a pointer to the next node. The list ends when next is NULL.

/* slist.c -- singly linked list in C */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Prepend a new node to the front of the list. */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (!n) {
        perror("malloc");
        exit(1);
    }
    n->data  = value;
    n->next  = head;
    return n;
}

/* Print every element. */
void list_print(const struct node *head)
{
    for (const struct node *cur = head; cur; cur = cur->next)
        printf("%d -> ", cur->data);
    printf("NULL\n");
}

/* Free every node. */
void list_free(struct node *head)
{
    while (head) {
        struct node *tmp = head;
        head = head->next;
        free(tmp);
    }
}

int main(void)
{
    struct node *list = NULL;
    for (int i = 1; i <= 5; i++)
        list = list_push(list, i);

    list_print(list);   /* 5 -> 4 -> 3 -> 2 -> 1 -> NULL */
    list_free(list);
    return 0;
}

Memory layout after pushing 3, 2, 1:

  head
   |
   v
 +---+---+    +---+---+    +---+---+
 | 1 | *-+--->| 2 | *-+--->| 3 | / |
 +---+---+    +---+---+    +---+---+
  data next    data next    data next (NULL)

Caution: Every malloc must pair with exactly one free. Forget one -- memory leak. Free twice -- undefined behavior and likely a crash.

Try It: Add a list_find function that returns a pointer to the first node whose data equals a given value, or NULL if not found. Then add a list_remove function that unlinks and frees that node.

The Kernel Way: Intrusive Lists

The textbook list above embeds data inside the list node. The Linux kernel flips this: it embeds a list node inside the data struct. This is called an intrusive list.

Textbook:                     Kernel (intrusive):

 struct node {                struct task_info {
     int data;       <--+         int pid;
     struct node *next;  |        char name[16];
 };                      |        struct list_head tasks;  <-- just prev/next
                         |    };
                         |
              data lives      data owns the link
              inside node     node lives inside data

The kernel's struct list_head is simply:

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

It is a doubly linked, circular list. The magic happens when you need to get back from the embedded list_head to the enclosing struct.

container_of and offsetof

offsetof(type, member) returns the byte offset of member within type. container_of subtracts that offset from a pointer to the member to recover the parent struct.

/* container_of.c -- demonstrate container_of */
#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

struct task_info {
    int pid;
    char name[16];
    struct list_head link;
};

int main(void)
{
    struct task_info t = { .pid = 42, .name = "init" };

    /* Given only a pointer to the embedded link ... */
    struct list_head *lh = &t.link;

    /* ... recover the enclosing task_info. */
    struct task_info *owner = container_of(lh, struct task_info, link);

    printf("pid = %d, name = %s\n", owner->pid, owner->name);

    printf("offsetof(task_info, link) = %zu\n",
           offsetof(struct task_info, link));
    return 0;
}
struct task_info layout:

 byte 0        byte 4        byte 20
 +------+------+-----------+----------+-----------+
 | pid  | pad  |  name[16] | link.next| link.prev |
 +------+------+-----------+----------+-----------+
 ^                          ^
 |                          |
 owner              lh points here
                    owner = lh - offsetof(..., link)

Driver Prep: The kernel uses container_of thousands of times. When you write a driver, your struct my_device embeds a struct list_head for the subsystem's device list, and you use container_of to recover it.

Hash Table with Chaining

A hash table maps keys to buckets. Collisions are handled by chaining -- each bucket is a linked list.

/* hashtable.c -- simple chained hash table */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16

struct entry {
    char *key;
    int   value;
    struct entry *next;
};

struct hashtable {
    struct entry *buckets[NBUCKETS];
};

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void ht_insert(struct hashtable *ht, const char *key, int value)
{
    unsigned idx = hash(key);
    struct entry *e = malloc(sizeof(*e));
    if (!e) { perror("malloc"); exit(1); }
    e->key   = strdup(key);
    e->value = value;
    e->next  = ht->buckets[idx];
    ht->buckets[idx] = e;
}

int ht_lookup(struct hashtable *ht, const char *key, int *out)
{
    unsigned idx = hash(key);
    for (struct entry *e = ht->buckets[idx]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *out = e->value;
            return 1;   /* found */
        }
    }
    return 0;  /* not found */
}

void ht_free(struct hashtable *ht)
{
    for (int i = 0; i < NBUCKETS; i++) {
        struct entry *e = ht->buckets[i];
        while (e) {
            struct entry *tmp = e;
            e = e->next;
            free(tmp->key);
            free(tmp);
        }
    }
}

int main(void)
{
    struct hashtable ht = {0};

    ht_insert(&ht, "alice", 100);
    ht_insert(&ht, "bob",   200);
    ht_insert(&ht, "carol", 300);

    int val;
    if (ht_lookup(&ht, "bob", &val))
        printf("bob => %d\n", val);

    if (!ht_lookup(&ht, "dave", &val))
        printf("dave not found\n");

    ht_free(&ht);
    return 0;
}
  buckets[0] -> NULL
  buckets[1] -> NULL
  buckets[2] -> [alice|100] -> NULL
  ...
  buckets[7] -> [carol|300] -> [bob|200] -> NULL
  ...
  buckets[15] -> NULL

Try It: Add an ht_delete function that removes a key-value pair and frees its memory. Watch out for the head-of-list special case.

Binary Search Tree

/* bst.c -- binary search tree */
#include <stdio.h>
#include <stdlib.h>

struct bst_node {
    int data;
    struct bst_node *left;
    struct bst_node *right;
};

struct bst_node *bst_insert(struct bst_node *root, int value)
{
    if (!root) {
        struct bst_node *n = malloc(sizeof(*n));
        if (!n) { perror("malloc"); exit(1); }
        n->data  = value;
        n->left  = NULL;
        n->right = NULL;
        return n;
    }
    if (value < root->data)
        root->left  = bst_insert(root->left, value);
    else if (value > root->data)
        root->right = bst_insert(root->right, value);
    return root;
}

void bst_inorder(const struct bst_node *root)
{
    if (!root) return;
    bst_inorder(root->left);
    printf("%d ", root->data);
    bst_inorder(root->right);
}

void bst_free(struct bst_node *root)
{
    if (!root) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

int main(void)
{
    struct bst_node *tree = NULL;
    int vals[] = {5, 3, 7, 1, 4, 6, 8};

    for (int i = 0; i < 7; i++)
        tree = bst_insert(tree, vals[i]);

    printf("In-order: ");
    bst_inorder(tree);   /* 1 2 3 4 5 6 7 8 */
    printf("\n");

    bst_free(tree);
    return 0;
}
         5
        / \
       3   7
      / \ / \
     1  4 6  8

Stack and Queue from Scratch

A stack is last-in-first-out. A queue is first-in-first-out. Both can be built on a linked list or on a contiguous array.

/* stack_queue.c -- array-based stack and queue */
#include <stdio.h>
#include <stdlib.h>

/* ---- Stack (LIFO) ---- */
struct stack {
    int *data;
    int  top;
    int  cap;
};

struct stack stack_new(int cap)
{
    struct stack s;
    s.data = malloc(cap * sizeof(int));
    if (!s.data) { perror("malloc"); exit(1); }
    s.top = 0;
    s.cap = cap;
    return s;
}

void stack_push(struct stack *s, int val)
{
    if (s->top >= s->cap) {
        fprintf(stderr, "stack overflow\n");
        exit(1);
    }
    s->data[s->top++] = val;
}

int stack_pop(struct stack *s)
{
    if (s->top == 0) {
        fprintf(stderr, "stack underflow\n");
        exit(1);
    }
    return s->data[--s->top];
}

void stack_free(struct stack *s) { free(s->data); }

/* ---- Queue (FIFO, circular buffer) ---- */
struct queue {
    int *data;
    int  head;
    int  tail;
    int  count;
    int  cap;
};

struct queue queue_new(int cap)
{
    struct queue q;
    q.data  = malloc(cap * sizeof(int));
    if (!q.data) { perror("malloc"); exit(1); }
    q.head  = 0;
    q.tail  = 0;
    q.count = 0;
    q.cap   = cap;
    return q;
}

void queue_enqueue(struct queue *q, int val)
{
    if (q->count >= q->cap) {
        fprintf(stderr, "queue full\n");
        exit(1);
    }
    q->data[q->tail] = val;
    q->tail = (q->tail + 1) % q->cap;
    q->count++;
}

int queue_dequeue(struct queue *q)
{
    if (q->count == 0) {
        fprintf(stderr, "queue empty\n");
        exit(1);
    }
    int val = q->data[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return val;
}

void queue_free(struct queue *q) { free(q->data); }

int main(void)
{
    /* Stack demo */
    struct stack s = stack_new(8);
    stack_push(&s, 10);
    stack_push(&s, 20);
    stack_push(&s, 30);
    printf("stack pop: %d\n", stack_pop(&s));  /* 30 */
    printf("stack pop: %d\n", stack_pop(&s));  /* 20 */
    stack_free(&s);

    /* Queue demo */
    struct queue q = queue_new(8);
    queue_enqueue(&q, 10);
    queue_enqueue(&q, 20);
    queue_enqueue(&q, 30);
    printf("queue deq: %d\n", queue_dequeue(&q));  /* 10 */
    printf("queue deq: %d\n", queue_dequeue(&q));  /* 20 */
    queue_free(&q);

    return 0;
}
  Stack (after push 10, 20, 30):

  top=3
   |
   v
  [10][20][30][ ][ ][ ][ ][ ]
   0    1   2   3  4  5  6  7

  Queue (circular buffer, after enqueue 10, 20, 30):

  head=0           tail=3
   |                |
   v                v
  [10][20][30][ ][ ][ ][ ][ ]
   0    1   2   3  4  5  6  7

Rust: The Standard Library Does the Heavy Lifting

Rust ships Vec, VecDeque, HashMap, BTreeMap, LinkedList, and more in std::collections. You rarely build these from scratch.

// rust_collections.rs
use std::collections::{VecDeque, HashMap, BTreeMap};

fn main() {
    // Vec -- growable array, also works as a stack
    let mut stack: Vec<i32> = Vec::new();
    stack.push(10);
    stack.push(20);
    stack.push(30);
    println!("stack pop: {:?}", stack.pop());  // Some(30)
    println!("stack pop: {:?}", stack.pop());  // Some(20)

    // VecDeque -- double-ended queue (ring buffer internally)
    let mut queue: VecDeque<i32> = VecDeque::new();
    queue.push_back(10);
    queue.push_back(20);
    queue.push_back(30);
    println!("queue deq: {:?}", queue.pop_front());  // Some(10)
    println!("queue deq: {:?}", queue.pop_front());  // Some(20)

    // HashMap
    let mut map = HashMap::new();
    map.insert("alice", 100);
    map.insert("bob", 200);
    map.insert("carol", 300);
    if let Some(val) = map.get("bob") {
        println!("bob => {}", val);
    }

    // BTreeMap -- sorted by keys
    let mut btree = BTreeMap::new();
    btree.insert(5, "five");
    btree.insert(3, "three");
    btree.insert(7, "seven");
    for (k, v) in &btree {
        println!("{}: {}", k, v);  // printed in sorted order
    }
}

Rust Note: Rust's Vec is not a linked list -- it is a contiguous, growable array. This is almost always what you want. LinkedList exists in std::collections but is rarely the right choice because of poor cache locality.

Side-by-Side: C Linked List vs Rust Vec

OperationC (manual linked list)Rust (Vec<T>)
Createnode *head = NULL;let mut v = Vec::new();
Prependallocate node, fix pointersv.insert(0, val);
Appendwalk to end, allocate, linkv.push(val);
Index accesswalk the list -- O(n)v[i] -- O(1)
Remove by idxwalk, relink, free -- O(n)v.remove(i); -- O(n)
Memory layoutscattered heap allocationsone contiguous buffer
Cache behaviorterrible (pointer chasing)excellent (sequential)
Safetydangling pointers, double freeborrow checker enforced

For almost all user-space programming, Vec beats a linked list. The linked list wins only when you need O(1) insertion/removal in the middle and you already hold a pointer to the node.

When You'd Still Write Your Own

  • Embedded/no-alloc environments: You cannot use Vec without an allocator. You write intrusive lists on a pre-allocated pool.
  • Kernel modules: The kernel has its own allocator and its own list macros. You use struct list_head, not std::collections.
  • Lock-free data structures: Standard collections are not lock-free. You hand-roll with atomics.
  • Performance-critical hot paths: When the profiler says the standard container is the bottleneck (rare, but it happens).

Driver Prep: In kernel space, you will use struct list_head, struct hlist_head (hash list), and struct rb_root (red-black tree). All are intrusive. Learn the pattern now; the kernel macros are the same idea.

Knowledge Check

  1. What does container_of(ptr, type, member) compute, and why does the kernel need it?
  2. Why is a Vec (contiguous array) usually faster than a linked list for iteration, even though both are O(n)?
  3. In the circular-buffer queue implementation, what happens if you forget the modulo operation on head and tail?

Common Pitfalls

  • Forgetting to free every node in a linked list -- walk the list and free each node; do not just free the head.
  • Off-by-one in circular buffers -- the modulo wrap must use the capacity, not the count.
  • Using a linked list when a Vec would do -- cache misses from pointer chasing dominate on modern hardware.
  • Dangling pointers after removal -- in C, after you free a node, any pointer still referencing it is undefined behavior.
  • Hash function quality -- a bad hash clusters entries into a few buckets, turning O(1) average into O(n).