Threads and pthreads

Threads let you run multiple execution paths inside a single process, sharing the same address space. Creating a thread is cheaper than calling fork() because nothing has to be duplicated: no page-table copy, no duplicated file-descriptor table, no copy-on-write (COW) faults afterward. This chapter covers POSIX threads in C and std::thread in Rust.

Why Threads?

Process with one thread:          Process with three threads:

+---------------------------+     +---------------------------+
| Code   | Data  | Heap     |     | Code   | Data  | Heap     |
|        |       |          |     |        | (shared)          |
+---------------------------+     +---------------------------+
| Stack                     |     | Stack-0 | Stack-1 | Stack-2|
+---------------------------+     +---------------------------+
| 1 program counter         |     | PC-0    | PC-1    | PC-2   |
+---------------------------+     +---------------------------+

Every thread shares the code, global data, heap, and file descriptors. Each thread gets its own stack and register set. This makes communication between threads trivial (just read shared memory) but also dangerous (data races).

Creating a Thread in C

/* thread_hello.c */
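#include <stdio.h>
#include <string.h>
#include <pthread.h>

void *greet(void *arg) {
    int id = *(int *)arg;
    printf("Hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t;
    int id = 42;

    /* pthread functions return an error number and do not set errno,
       so report the return value with strerror() instead of perror(). */
    int err = pthread_create(&t, NULL, greet, &id);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return 1;
    }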

    pthread_join(t, NULL);
    printf("Thread finished\n");
    return 0;
}

Compile with:

gcc -o thread_hello thread_hello.c -pthread

The -pthread flag links the pthreads library and defines the right macros.

pthread_create takes four arguments:

Argument	Meaning
`&t`	Where to store the thread ID
`NULL`	Thread attributes (NULL = defaults)
`greet`	The function to run
`&id`	Argument passed to that function

The thread function signature is always void *(*)(void *) -- it takes a void * and returns a void *.

Passing Arguments Safely

A common bug: passing a pointer to a stack variable that changes before the thread reads it.

/* broken_args.c -- DO NOT DO THIS */
#include <stdio.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;   /* race: main may have changed *arg */
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];
    for (int i = 0; i < 5; i++) {
        pthread_create(&threads[i], NULL, print_id, &i);  /* BUG */
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

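Caution: Every thread receives the same pointer, &i, so all five read one shared loop variable. By the time a thread dereferences arg, main may already have advanced i to 3 or to 5, so you might see "Thread 5" printed five times. Once the first loop ends, i no longer exists at all, and a thread that starts late reads a dead variable.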

The fix: give each thread its own copy.

/* fixed_args.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;
    free(arg);
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];
    for (int i = 0; i < 5; i++) {
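        int *p = malloc(sizeof *p);   /* one private copy per thread */
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        *p = i;
        if (pthread_create(&threads[i], NULL, print_id, p) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            free(p);                  /* the thread never started, so free here */
            return 1;
        }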
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Each thread gets its own heap-allocated integer. The thread frees it after reading.

Try It: Modify broken_args.c to pass &ids[i] from an array int ids[5], rather than using malloc as fixed_args.c does. Set ids[i] = i before creating each thread. Does this fix the bug? Why or why not?

Return Values

A thread function returns void *. You retrieve it through pthread_join.

/* thread_return.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *compute_square(void *arg) {
    int val = *(int *)arg;
    int *result = malloc(sizeof(int));
    *result = val * val;
    return result;
}

int main(void) {
    pthread_t t;
    int input = 7;
    void *retval;

    pthread_create(&t, NULL, compute_square, &input);
    pthread_join(t, &retval);

    printf("7 squared = %d\n", *(int *)retval);
    free(retval);
    return 0;
}

Caution: Never return a pointer to a local variable from the thread function. The thread's stack can be reclaimed as soon as the thread exits, so the pointer dangles. Return heap-allocated memory, or, for a small integer result, cast the value itself to void *.

Joinable vs Detached Threads

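By default, threads are joinable. The system keeps a finished joinable thread's exit status and bookkeeping until someone calls pthread_join, so if you never join, you leak those resources (similar to zombie processes). Detached threads clean up automatically when they exit.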

/* detached.c */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *background_work(void *arg) {
    (void)arg;
    sleep(1);
    printf("Background work done\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, background_work, NULL);
    pthread_detach(t);    /* cannot join after this */

    printf("Main continues immediately\n");
    sleep(2);  /* crude: returning from main ends the process, detached threads included */
    return 0;
}

You can also create a thread as detached from the start:

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, func, arg);
pthread_attr_destroy(&attr);

Thread-Local Storage

Sometimes each thread needs its own copy of a variable. There are two approaches in C:

1. The __thread keyword (a GCC extension; C11 standardizes the same idea as _Thread_local):

/* tls_keyword.c */
#include <stdio.h>
#include <pthread.h>

__thread int counter = 0;

void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < 1000; i++)
        counter++;
    printf("Thread %d: counter = %d\n", id, counter);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Main: counter = %d\n", counter);
    return 0;
}

Each thread sees counter = 1000. Main sees counter = 0. No synchronization needed.

2. pthread_key_create / pthread_getspecific / pthread_setspecific:

/* tls_key.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static pthread_key_t key;

void destructor(void *val) {
    free(val);
}

void *worker(void *arg) {
    int *p = malloc(sizeof(int));
    *p = *(int *)arg;
    pthread_setspecific(key, p);

    int *my_val = pthread_getspecific(key);
    printf("Thread-local value: %d\n", *my_val);
    return NULL;
}

int main(void) {
    pthread_key_create(&key, destructor);

    pthread_t t1, t2;
    int a = 10, b = 20;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_key_delete(key);
    return 0;
}

The destructor runs automatically when a thread exits, provided that thread's value for the key is not NULL.

Thread Safety: What Breaks

When two threads touch the same data without synchronization, you get a data race.

/* data_race.c */
#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        shared_counter++;   /* NOT atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000, Got: %d\n", shared_counter);
    return 0;
}

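Run this several times, compiled without optimization. You will almost never see 2000000. The increment shared_counter++ is a read-modify-write: a load, an add, and a store, and it is not atomic even when the compiler emits it as a single instruction. (With optimization enabled, the compiler may fold the whole loop into one addition, which hides the race without removing it.) Two threads interleave the steps: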

Thread A: load counter (0)
Thread B: load counter (0)
Thread A: add 1 -> 1
Thread B: add 1 -> 1
Thread A: store 1
Thread B: store 1          <-- one increment lost

Caution: Data races in C are undefined behavior per C11. The compiler is free to assume they do not happen, leading to bizarre optimizations.

Rust: std::thread::spawn

Rust threads use OS threads, just like pthreads. The API is safer.

// thread_hello.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a spawned thread");
    });

    handle.join().unwrap();
    println!("Thread finished");
}

No void * casting. No manual memory management. The closure captures its environment.

Move Closures for Safe Data Passing

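Rust makes you decide whether the thread closure borrows its data or takes ownership of it. A thread started with thread::spawn may outlive the function that created it, so the compiler cannot prove that borrowed data lives long enough, and you must use move.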

// thread_move.rs
use std::thread;

fn main() {
    let mut handles = vec![];

    for i in 0..5 {
        let handle = thread::spawn(move || {
            println!("Thread {}", i);
        });
        handles.push(handle);
    }

    for h in handles {
        h.join().unwrap();
    }
}

Each closure gets its own copy of i (integers implement Copy). There is no equivalent of the C bug where all threads share a pointer to the same loop variable.

Rust Note: Rust's thread::spawn requires the closure to be 'static -- it cannot borrow stack-local data from the parent. This prevents the entire class of dangling-pointer bugs that plague pthreads.

Returning Values from Rust Threads

The JoinHandle<T> carries the return value.

// thread_return.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| -> i32 {
        7 * 7
    });

    let result = handle.join().unwrap();
    println!("7 squared = {}", result);
}

No malloc, no void * cast, no free. The value is moved out of the thread safely.

Thread-Local Storage in Rust

// thread_local.rs
use std::cell::RefCell;
use std::thread;

thread_local! {
    static COUNTER: RefCell<u32> = RefCell::new(0);
}

fn main() {
    let mut handles = vec![];

    for id in 0..3 {
        let h = thread::spawn(move || {
            COUNTER.with(|c| {
                for _ in 0..1000 {
                    *c.borrow_mut() += 1;
                }
                println!("Thread {}: counter = {}", id, *c.borrow());
            });
        });
        handles.push(h);
    }

    for h in handles {
        h.join().unwrap();
    }

    COUNTER.with(|c| {
        println!("Main: counter = {}", *c.borrow());
    });
}

Each thread sees its own COUNTER. The thread_local! macro initializes lazily per thread.

Comparing C and Rust Thread APIs

+--------------------+-------------------------------+---------------------------+
| Operation          | C (pthreads)                  | Rust (std::thread)        |
+--------------------+-------------------------------+---------------------------+
| Create             | pthread_create(&t, NULL, f, a)| thread::spawn(closure)    |
| Join               | pthread_join(t, &retval)      | handle.join().unwrap()    |
| Detach             | pthread_detach(t)             | drop(handle) (implicit)   |
| Pass args          | void* cast                    | move closure              |
| Return values      | void* cast                    | JoinHandle<T>             |
| Thread-local       | __thread / pthread_key        | thread_local! macro       |
| Data race protect  | programmer discipline         | compiler-enforced         |
+--------------------+-------------------------------+---------------------------+

Driver Prep: Linux kernel threads are started with kthread_create or kthread_run and stopped with kthread_stop, which waits for the thread to exit, so the pattern resembles create and join. The kernel has its own synchronization primitives (spinlock_t, mutex, rcu) but the mental model is the same: shared data needs protection.

Knowledge Check

What happens if you pass &i (where i is a loop variable) to five pthread_create calls without copying i?
Why must you compile with -pthread and not just -lpthread?
In Rust, why does thread::spawn require a 'static closure?

Common Pitfalls

Forgetting -pthread -- depending on the toolchain, the program may fail to link, or may build but crash at runtime or behave strangely.
Returning a pointer to a local variable from a thread function -- the stack is gone after the thread exits.
Not joining and not detaching -- resource leak, just like a zombie process.
Passing a shared pointer to multiple threads without synchronization -- data race, undefined behavior.
Calling pthread_join on a detached thread -- undefined behavior.
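Assuming printf output from different threads stays grouped -- POSIX makes each printf call thread-safe, but the output of separate calls can interleave, so lines from different threads may appear in any order.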

C, Rust, and Linux Systems Programming