Threads and Shared Memory
Type This First
Save this as race.c and compile with gcc -pthread -o race race.c:
#include <stdio.h>
#include <pthread.h>

int counter = 0; // shared global

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++; // looks innocent...
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000\n");
    printf("Got: %d\n", counter);
    return 0;
}
Run it five times. You will almost certainly get a different wrong answer each time.
Threads vs Processes
A process is an isolated world. Its own address space, its own page tables, its own file descriptors. When you fork(), the child gets a copy of everything.
A thread is different. Threads live inside a process. They share the same address space. Same heap. Same globals. Same code. Same file descriptors.
What each thread gets of its own: a stack and a set of registers. That's it.
+--------------------------------------------------+
| Process (PID 42)                                 |
|                                                  |
|  +----------+   +----------+   +----------+      |
|  | Thread 1 |   | Thread 2 |   | Thread 3 |      |
|  |  Stack   |   |  Stack   |   |  Stack   |      |
|  |  regs    |   |  regs    |   |  regs    |      |
|  +----+-----+   +----+-----+   +----+-----+      |
|       |              |              |            |
|       v              v              v            |
|  +--------------------------------------------+  |
|  |          Shared Address Space              |  |
|  |                                            |  |
|  |  .text (code)    -- same code, all threads |  |
|  |  .data (globals) -- same globals           |  |
|  |  heap            -- same heap, all threads |  |
|  |  mmap region     -- same mappings          |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+
This sharing is what makes threads fast. No copying memory. No new page tables. Context-switching between threads in the same process is cheap.
It is also what makes threads dangerous.
The Data Race Problem
Look at counter++ in the program above. In C, that is one statement. But in machine code, it is three operations:
Thread A                      Thread B
--------                      --------
1. READ counter (gets 5)
                              1. READ counter (gets 5)
2. ADD 1 (now 6)
                              2. ADD 1 (now 6)
3. WRITE counter (writes 6)
                              3. WRITE counter (writes 6)
Result: counter = 6, not 7. One increment was LOST.
This is called a lost update. Two threads read the same value, compute the same result, and one write overwrites the other.
What do you think happens?
If you run the race program with 10 threads instead of 2, does the error get bigger or smaller? Why?
The counter++ operation is a read-modify-write. It is NOT atomic. The CPU does not execute it as one indivisible step. The OS can switch threads between any of those three micro-steps.
C Solution: Mutexes (Discipline-Based)
In C, you protect shared data with a mutex (mutual exclusion lock):
#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000\n");
    printf("Got: %d\n", counter);
    return 0;
}
Now the count is always exactly 2,000,000.
But notice the problem: the mutex and the counter are separate things. Nothing connects them. The compiler does not know that counter requires lock. You could forget to lock. You could lock the wrong mutex. You could access counter from a new function and never realize it needs protection.
The relationship between the lock and the data it protects exists only in the programmer's head.
Rust Solution: Mutex<T> Wraps the Data
In Rust, the mutex contains the data. You cannot touch the data without locking:
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            for _ in 0..1_000_000 {
                let mut num = counter.lock().unwrap();
                *num += 1; // lock released here when `num` goes out of scope
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Expected: 2000000");
    println!("Got: {}", *counter.lock().unwrap());
}
Mutex::new(0) wraps the integer. The only way to read or write the integer is to call .lock(), which returns a guard. When the guard is dropped, the lock is released.
You literally cannot access the data without locking. The type system enforces it.
Send and Sync: The Compiler Checks Your Threads
Rust has two marker traits that the compiler checks automatically:
Send: This type is safe to move to another thread.
Sync: This type is safe to share references to between threads.
You never implement these by hand (usually). The compiler derives them from the types you use.
+-------------------+------+------+
| Type              | Send | Sync |
+-------------------+------+------+
| i32, String, Vec  |  Y   |  Y   |
| Mutex<T>          |  Y   |  Y   |  <-- designed for sharing
| Arc<T>            |  Y   |  Y   |  <-- atomic ref count
| Rc<T>             |  N   |  N   |  <-- NOT thread-safe
| Cell<T>           |  Y   |  N   |  <-- interior mutability, not Sync
| *mut T            |  N   |  N   |  <-- raw pointers
+-------------------+------+------+
Rc<T> uses a non-atomic reference count. If two threads increment it simultaneously, you get the same lost-update problem we saw with counter. So Rust marks Rc<T> as !Send (and !Sync). The compiler will refuse to let you move it to another thread.
Arc<T> uses atomic reference counting. It is safe to share. The compiler allows it.
The Compiler Catches Data Races
Try this in Rust — sharing an Rc across threads:
use std::rc::Rc;
use std::thread;

fn main() {
    let data = Rc::new(vec![1, 2, 3]);
    let data_clone = Rc::clone(&data);

    thread::spawn(move || {
        println!("{:?}", data_clone);
    });
}
This does NOT compile:
error[E0277]: `Rc<Vec<i32>>` cannot be sent between threads safely
--> src/main.rs:8:5
|
8 | thread::spawn(move || {
| ^^^^^^^^^^^^^ `Rc<Vec<i32>>` cannot be sent between threads safely
|
= help: the trait `Send` is not implemented for `Rc<Vec<i32>>`
In C, the equivalent code compiles without any warning. It crashes at runtime. Maybe. Or worse — it corrupts memory silently and you only find out in production.
Fun Fact
The Rust compiler's thread-safety checks have no runtime cost. Send and Sync are marker traits with no methods and no runtime representation; they exist only at compile time. Your binary contains no trace of them.
Atomic Types: Lock-Free Concurrency
Sometimes a full mutex is overkill. For simple counters and flags, CPUs provide atomic instructions that complete as one indivisible step.
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU32::new(0));
    let mut handles = vec![];

    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            for _ in 0..1_000_000 {
                counter.fetch_add(1, Ordering::Relaxed);
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Got: {}", counter.load(Ordering::Relaxed));
}
fetch_add compiles to a single lock xadd instruction on x86. No mutex, no blocking, no deadlock possible. Threads can still contend for the cache line, but they never go to sleep waiting for a lock.
C11 has atomics too (<stdatomic.h>), but again — nothing stops you from mixing atomic and non-atomic access to the same variable.
The Spectrum of Concurrency Safety
C Rust
| |
v v
No guardrails ------> Compiler-enforced
Data races compile Data races don't compile
Mutex is separate Mutex wraps data
Human discipline Type system discipline
Bugs found at runtime Bugs found at compile time
(or never found) (before your code ships)
This does not mean Rust concurrency is easy. Deadlocks are still possible. Logic bugs are still possible. But an entire class of bugs — data races — is eliminated at compile time.
Task
- Compile and run the race.c program at the top of this chapter. Run it 10 times. Record the different results.
- Try the Rc example in Rust. Read the compiler error carefully.
- Replace Rc with Arc and vec![1,2,3] with Mutex::new(0). Make the threaded counter work.
- Modify the C version to use 10 threads. How wrong does the count get?
- Try the atomic Rust version. Verify the count is always exactly 2,000,000.
- Bonus: Use time to compare the mutex version vs the atomic version. Which is faster? Why?