Threads and pthreads
Threads let you run multiple execution paths inside a single process, sharing the same address space. Creating a thread is cheaper than calling fork() because nothing has to be copied: no page tables, no file-descriptor table, and no copy-on-write (COW) faults afterward. This chapter covers POSIX threads in C and std::thread in Rust.
Why Threads?
Process with one thread:          Process with three threads:
+---------------------------+     +---------------------------+
|  Code  |  Data  |  Heap   |     |  Code  |  Data  |  Heap   |
|        |        |         |     |        |        |(shared) |
+---------------------------+     +---------------------------+
|          Stack            |     | Stack-0 | Stack-1| Stack-2|
+---------------------------+     +---------------------------+
|    1 program counter      |     |  PC-0   |  PC-1  |  PC-2  |
+---------------------------+     +---------------------------+
Every thread shares the code, global data, heap, and file descriptors. Each thread gets its own stack and register set. This makes communication between threads trivial (just read shared memory) but also dangerous (data races).
Creating a Thread in C
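/* thread_hello.c */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

void *greet(void *arg) {
    int id = *(int *)arg;
    printf("Hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t;
    int id = 42;

    /* pthread functions return an error number; they do not set errno,
       so report the failure with strerror() rather than perror() */
    int rc = pthread_create(&t, NULL, greet, &id);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return 1;
    }

    pthread_join(t, NULL);
    printf("Thread finished\n");
    return 0;
}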
Compile with:
gcc -o thread_hello thread_hello.c -pthread
The -pthread flag links the pthreads library and defines the right macros.
pthread_create takes four arguments:
| Argument | Meaning |
|---|---|
| &t | Where to store the thread ID |
| NULL | Thread attributes (NULL = defaults) |
| greet | The function to run |
| &id | Argument passed to that function |
The thread function signature is always void *(*)(void *) -- it takes a void * and returns a void *.
Passing Arguments Safely
A common bug: passing a pointer to a stack variable that changes before the thread reads it.
/* broken_args.c -- DO NOT DO THIS */
#include <stdio.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;   /* race: main may have changed *arg */
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];

    for (int i = 0; i < 5; i++) {
        pthread_create(&threads[i], NULL, print_id, &i);   /* BUG */
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
Caution: The loop variable i is shared across all threads. By the time a thread reads *arg, i may already be 3 or 5. You might see "Thread 5" printed five times.
The fix: give each thread its own copy.
/* fixed_args.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *print_id(void *arg) {
    int id = *(int *)arg;
    free(arg);              /* the thread owns its copy and releases it */
    printf("Thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[5];

    for (int i = 0; i < 5; i++) {
        int *p = malloc(sizeof *p);
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        *p = i;             /* private copy for thread i */
        pthread_create(&threads[i], NULL, print_id, p);
    }
    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
Each thread gets its own heap-allocated integer. The thread frees it after reading.
Try It: Modify broken_args.c to use an array int ids[5] instead of malloc. Set ids[i] = i before creating each thread. Does this fix the bug? Why or why not?
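One way to approach the exercise is sketched below. It is not the only possible answer. The names ids, seen, record_id, and run_with_id_array are illustrative and do not appear in the original listings. Each thread receives a pointer to its own array element, which is written once before the thread starts and never modified afterward, and the array outlives every thread because the function joins them all before returning.

```c
/* ids_array.c -- sketch: one array slot per thread (all names illustrative) */
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 5

static int ids[NUM_THREADS];   /* slot k is written once, before thread k starts */
static int seen[NUM_THREADS];  /* seen[k] counts how many threads read id k */

static void *record_id(void *arg) {
    int id = *(int *)arg;      /* arg points at this thread's own slot */
    printf("Thread %d\n", id);
    seen[id]++;                /* distinct ids, so each thread touches a distinct element */
    return NULL;
}

/* Starts NUM_THREADS threads and waits for them.
   Returns 0 on success or the pthread error number on failure. */
int run_with_id_array(void) {
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int rc = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        seen[i] = 0;
        rc = pthread_create(&threads[i], NULL, record_id, &ids[i]);
        if (rc != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return rc;
}
```

Calling run_with_id_array() from main prints each of the five ids exactly once, in an unspecified order. Compared with malloc, the array avoids per-thread allocation, but it only works while the array stays alive for as long as the threads run.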
Return Values
A thread function returns void *. You retrieve it through pthread_join.
/* thread_return.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *compute_square(void *arg) {
    int val = *(int *)arg;
    int *result = malloc(sizeof *result);
    if (result == NULL)
        return NULL;        /* caller must check for NULL */
    *result = val * val;
    return result;          /* heap memory outlives the thread's stack */
}

int main(void) {
    pthread_t t;
    int input = 7;
    void *retval;

    pthread_create(&t, NULL, compute_square, &input);
    pthread_join(t, &retval);
    if (retval == NULL) {
        fprintf(stderr, "compute_square failed\n");
        return 1;
    }
    printf("7 squared = %d\n", *(int *)retval);
    free(retval);
    return 0;
}
Caution: Never return a pointer to a local variable from the thread function. The thread's stack is destroyed after it exits. Return heap-allocated memory or cast an integer to void *.
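The integer-cast option can be sketched as follows. The helper names square_small and square_in_thread are hypothetical, and the sketch assumes the platform provides intptr_t (POSIX systems do). The conversion between integer and pointer is implementation-defined, so this suits small values such as status codes rather than arbitrary data.

```c
/* thread_return_int.c -- sketch: return a small integer inside the void * itself */
#include <stdint.h>
#include <pthread.h>

static void *square_small(void *arg) {
    int val = *(int *)arg;
    /* No allocation: the result travels in the pointer value */
    return (void *)(intptr_t)(val * val);
}

/* Runs square_small in a new thread and stores the result in *out.
   Returns 0 on success or the pthread error number on failure. */
int square_in_thread(int input, int *out) {
    pthread_t t;
    void *retval;

    /* &input stays valid because this function joins before returning */
    int rc = pthread_create(&t, NULL, square_small, &input);
    if (rc != 0)
        return rc;

    rc = pthread_join(t, &retval);
    if (rc != 0)
        return rc;

    *out = (int)(intptr_t)retval;   /* undo the cast; nothing to free */
    return 0;
}
```

With this approach there is nothing to free after pthread_join, at the cost of being limited to values that fit in a pointer.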
Joinable vs Detached Threads
By default, threads are joinable. If you never join them, you leak resources (similar to zombie processes). Detached threads clean up automatically when they exit.
/* detached.c */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *background_work(void *arg) {
    (void)arg;
    sleep(1);
    printf("Background work done\n");
    return NULL;
}

int main(void) {
    pthread_t t;

    pthread_create(&t, NULL, background_work, NULL);
    pthread_detach(t);      /* cannot join after this */
    printf("Main continues immediately\n");
    sleep(2);               /* give detached thread time to finish */
    return 0;
}
You can also create a thread as detached from the start:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, func, arg);
pthread_attr_destroy(&attr);
Thread-Local Storage
Sometimes each thread needs its own copy of a variable. Two approaches in C:
1. The __thread keyword (GCC extension, also C11 _Thread_local):
/* tls_keyword.c */
#include <stdio.h>
#include <pthread.h>

__thread int counter = 0;   /* one independent copy per thread */

void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < 1000; i++)
        counter++;
    printf("Thread %d: counter = %d\n", id, counter);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Main: counter = %d\n", counter);
    return 0;
}
Each thread sees counter = 1000. Main sees counter = 0. No synchronization needed.
2. pthread_key_create / pthread_getspecific / pthread_setspecific:
/* tls_key.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static pthread_key_t key;

void destructor(void *val) {
    free(val);
}

void *worker(void *arg) {
    int *p = malloc(sizeof(int));
    *p = *(int *)arg;
    pthread_setspecific(key, p);

    int *my_val = pthread_getspecific(key);
    printf("Thread-local value: %d\n", *my_val);
    return NULL;
}

int main(void) {
    pthread_key_create(&key, destructor);

    pthread_t t1, t2;
    int a = 10, b = 20;

    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(key);
    return 0;
}
The destructor runs automatically when a thread exits, provided that thread's value for the key is non-NULL.
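To make the destructor behavior observable, here is a minimal sketch that counts destructor calls. All names (demo_key, count_and_free, set_value, leave_unset, count_destructor_calls) are illustrative, and the counter uses C11 atomics on the assumption that the compiler supports <stdatomic.h>. One thread stores a value under the key and the other does not, so only one destructor call is expected.

```c
/* tls_destructor_count.c -- sketch; all names here are illustrative */
#include <stdlib.h>
#include <stdatomic.h>
#include <pthread.h>

static pthread_key_t demo_key;
static atomic_int destructor_calls;

static void count_and_free(void *val) {
    atomic_fetch_add(&destructor_calls, 1);
    free(val);
}

static void *set_value(void *arg) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    *p = *(int *)arg;
    if (pthread_setspecific(demo_key, p) != 0)
        free(p);            /* the key never took ownership */
    return NULL;            /* destructor frees p as this thread exits */
}

static void *leave_unset(void *arg) {
    (void)arg;
    return NULL;            /* value for demo_key is still NULL: no destructor call */
}

/* Returns the number of destructor calls observed, or -1 on setup failure. */
int count_destructor_calls(void) {
    pthread_t a, b;
    int value = 10;
    int result = -1;

    atomic_store(&destructor_calls, 0);
    if (pthread_key_create(&demo_key, count_and_free) != 0)
        return -1;

    if (pthread_create(&a, NULL, set_value, &value) == 0) {
        if (pthread_create(&b, NULL, leave_unset, NULL) == 0) {
            pthread_join(b, NULL);
            result = 0;
        }
        pthread_join(a, NULL);
    }
    if (result == 0)
        result = atomic_load(&destructor_calls);

    pthread_key_delete(demo_key);
    return result;
}
```

The count is read only after both threads have been joined, since a thread's destructors finish before pthread_join returns for it.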
Thread Safety: What Breaks
When two threads touch the same data without synchronization, you get a data race.
/* data_race.c */
#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        shared_counter++;   /* NOT atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000, Got: %d\n", shared_counter);
    return 0;
}
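Compile this without optimization and run it several times. You will almost never see 2000000. (With optimization enabled, the compiler may collapse the loop into a single addition, which hides the problem without fixing it.) The increment shared_counter++ is not one indivisible step: it is typically a load, an add, and a store. Two threads can interleave them: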
Thread A: load counter (0)
Thread B: load counter (0)
Thread A: add 1 -> 1
Thread B: add 1 -> 1
Thread A: store 1
Thread B: store 1 <-- one increment lost
Caution: Data races in C are undefined behavior per C11. The compiler is free to assume they do not happen, leading to bizarre optimizations.
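One common way to repair the counter is to guard the increment with a mutex. The sketch below is an illustration under that assumption and not the only fix (C11 atomics would also work). The names safe_counter, counter_lock, increment_locked, and count_with_two_threads are hypothetical.

```c
/* counter_mutex.c -- sketch: the same two-thread counter, protected by a mutex */
#include <pthread.h>

#define ITERATIONS 1000000

static long safe_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_locked(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        safe_counter++;     /* load, add, store now happen as one protected step */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final count,
   or -1 if a thread could not be created. */
long count_with_two_threads(void) {
    pthread_t t1, t2;

    safe_counter = 0;
    if (pthread_create(&t1, NULL, increment_locked, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, increment_locked, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return safe_counter;    /* safe to read: both threads have finished */
}
```

Because only one thread can hold counter_lock at a time, no increment is lost and the result is 2000000 on every run. The price is a lock and unlock per iteration, which makes this version noticeably slower than the racy one.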
Rust: std::thread::spawn
Rust threads use OS threads, just like pthreads. The API is safer.
// thread_hello.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a spawned thread");
    });
    handle.join().unwrap();
    println!("Thread finished");
}
No void * casting. No manual memory management. The closure captures its environment.
Move Closures for Safe Data Passing
A closure passed to thread::spawn must either borrow or move the data it uses. The compiler cannot prove that borrowed data will outlive the new thread, so it rejects the borrow and you must use move to transfer ownership into the closure.
// thread_move.rs
use std::thread;

fn main() {
    let mut handles = vec![];
    for i in 0..5 {
        let handle = thread::spawn(move || {
            println!("Thread {}", i);
        });
        handles.push(handle);
    }
    for h in handles {
        h.join().unwrap();
    }
}
Each closure gets its own copy of i (integers implement Copy). There is no equivalent of the C bug where all threads share a pointer to the same loop variable.
Rust Note: Rust's thread::spawn requires the closure to be 'static -- it cannot borrow stack-local data from the parent. This prevents the entire class of dangling-pointer bugs that plague pthreads.
Returning Values from Rust Threads
The JoinHandle<T> carries the return value.
// thread_return.rs
use std::thread;

fn main() {
    let handle = thread::spawn(|| -> i32 { 7 * 7 });
    let result = handle.join().unwrap();
    println!("7 squared = {}", result);
}
No malloc, no void * cast, no free. The value is moved out of the thread safely.
Thread-Local Storage in Rust
// thread_local.rs
use std::cell::RefCell;
use std::thread;

thread_local! {
    static COUNTER: RefCell<u32> = RefCell::new(0);
}

fn main() {
    let mut handles = vec![];
    for id in 0..3 {
        let h = thread::spawn(move || {
            COUNTER.with(|c| {
                for _ in 0..1000 {
                    *c.borrow_mut() += 1;
                }
                println!("Thread {}: counter = {}", id, *c.borrow());
            });
        });
        handles.push(h);
    }
    for h in handles {
        h.join().unwrap();
    }
    COUNTER.with(|c| {
        println!("Main: counter = {}", *c.borrow());
    });
}
Each thread sees its own COUNTER. The thread_local! macro initializes lazily per thread.
Comparing C and Rust Thread APIs
+--------------------+-------------------------------+---------------------------+
| Operation | C (pthreads) | Rust (std::thread) |
+--------------------+-------------------------------+---------------------------+
| Create | pthread_create(&t, NULL, f, a)| thread::spawn(closure) |
| Join | pthread_join(t, &retval) | handle.join().unwrap() |
| Detach | pthread_detach(t) | drop(handle) (implicit) |
| Pass args | void* cast | move closure |
| Return values | void* cast | JoinHandle<T> |
| Thread-local | __thread / pthread_key | thread_local! macro |
| Data race protect | programmer discipline | compiler-enforced |
+--------------------+-------------------------------+---------------------------+
Driver Prep: Linux kernel threads use kthread_create and kthread_run, which follow a similar create-join pattern. The kernel has its own synchronization primitives (spinlock_t, mutex, rcu) but the mental model is the same: shared data needs protection.
Knowledge Check
- What happens if you pass &i (where i is a loop variable) to five pthread_create calls without copying i?
- Why must you compile with -pthread and not just -lpthread?
- In Rust, why does thread::spawn require a 'static closure?
Common Pitfalls
- Forgetting -pthread -- the program may compile but crash at runtime or behave strangely.
- Returning a pointer to a local variable from a thread function -- the stack is gone after the thread exits.
- Not joining and not detaching -- resource leak, just like a zombie process.
- Passing a shared pointer to multiple threads without synchronization -- data race, undefined behavior.
- Calling pthread_join on a detached thread -- undefined behavior.
- Assuming printf output never interleaves -- each call is thread-safe by POSIX, but lines printed by different threads may still interleave.