Threads and Shared Memory

Type This First

Save this as race.c and compile with gcc -pthread -o race race.c:

#include <stdio.h>
#include <pthread.h>

int counter = 0;  // shared global

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // looks innocent...
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000\n");
    printf("Got:      %d\n", counter);
    return 0;
}

Run it five times. You will almost certainly get a different wrong answer each time.


Threads vs Processes

A process is an isolated world. Its own address space, its own page tables, its own file descriptors. When you fork(), the child gets a copy of everything (in practice, lazily, via copy-on-write).

A thread is different. Threads live inside a process. They share the same address space. Same heap. Same globals. Same code. Same file descriptors.

What each thread gets of its own: a stack, a set of registers, and a little thread-local storage. That's it.

+---------------------------------------------------+
|                  Process (PID 42)                 |
|                                                   |
|   +----------+   +----------+   +----------+      |
|   | Thread 1 |   | Thread 2 |   | Thread 3 |      |
|   |  Stack   |   |  Stack   |   |  Stack   |      |
|   |  regs    |   |  regs    |   |  regs    |      |
|   +----+-----+   +----+-----+   +----+-----+      |
|        |              |              |            |
|        v              v              v            |
|   +---------------------------------------------+ |
|   |            Shared Address Space             | |
|   |                                             | |
|   | .text (code)    -- same code, all threads   | |
|   | .data (globals) -- same globals, all threads| |
|   | heap            -- same heap, all threads   | |
|   | mmap region     -- same mappings            | |
|   +---------------------------------------------+ |
+---------------------------------------------------+

This sharing is what makes threads fast. No copying memory. No new page tables. Context-switching between threads in the same process is cheap.

It is also what makes threads dangerous.


The Data Race Problem

Look at counter++ in the program above. In C, that is one statement. But in machine code, it is typically three separate operations: a read, an add, and a write:

Thread A                    Thread B
--------                    --------
1. READ counter  (gets 5)
                            1. READ counter  (gets 5)
2. ADD 1         (now 6)
                            2. ADD 1         (now 6)
3. WRITE counter (writes 6)
                            3. WRITE counter (writes 6)

Result: counter = 6, not 7. One increment was LOST.

This is called a lost update. Two threads read the same value, compute the same result, and one write overwrites the other.

What do you think happens?

If you run the race program with 10 threads instead of 2, does the error get bigger or smaller? Why?

The counter++ operation is a read-modify-write. It is NOT atomic. The CPU does not execute it as one indivisible step. The OS can switch threads between any of those three micro-steps.


C Solution: Mutexes (Discipline-Based)

In C, you protect shared data with a mutex (mutual exclusion lock):

#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: 2000000\n");
    printf("Got:      %d\n", counter);
    return 0;
}

Now the count is always exactly 2,000,000.

But notice the problem: the mutex and the counter are separate things. Nothing connects them. The compiler does not know that counter requires lock. You could forget to lock. You could lock the wrong mutex. You could access counter from a new function and never realize it needs protection.

The relationship between the lock and the data it protects exists only in the programmer's head.


Rust Solution: Mutex<T> Wraps the Data

In Rust, the mutex contains the data. You cannot touch the data without locking:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            for _ in 0..1_000_000 {
                let mut num = counter.lock().unwrap();
                *num += 1;
                // lock released here when `num` goes out of scope
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Expected: 2000000");
    println!("Got:      {}", *counter.lock().unwrap());
}

Mutex::new(0) wraps the integer. The only way to read or write the integer is to call .lock(), which returns a guard. When the guard is dropped, the lock is released.

You literally cannot access the data without locking. The type system enforces it.


Send and Sync: The Compiler Checks Your Threads

Rust has two marker traits that the compiler checks automatically:

  • Send: This type is safe to move to another thread.
  • Sync: This type is safe to share references between threads.

You almost never implement these by hand. The compiler derives them automatically from the types you use.

+-------------------+------+------+
| Type              | Send | Sync |
+-------------------+------+------+
| i32, String, Vec  |  Y   |  Y   |
| Mutex<T>          |  Y   |  Y   |  <-- designed for sharing
| Arc<T>            |  Y   |  Y   |  <-- atomic ref count
| Rc<T>             |  N   |  N   |  <-- NOT thread-safe
| Cell<T>           |  Y   |  N   |  <-- interior mutability, not Sync
| *mut T            |  N   |  N   |  <-- raw pointers
+-------------------+------+------+

Rc<T> uses a non-atomic reference count. If two threads increment it simultaneously, you get the same lost-update problem we saw with counter. So Rust marks Rc<T> as !Send. The compiler will refuse to let you move it to another thread.

Arc<T> uses atomic reference counting. It is safe to share. The compiler allows it.


The Compiler Catches Data Races

Try this in Rust — sharing an Rc across threads:

use std::rc::Rc;
use std::thread;

fn main() {
    let data = Rc::new(vec![1, 2, 3]);
    let data_clone = Rc::clone(&data);

    thread::spawn(move || {
        println!("{:?}", data_clone);
    });
}

This does NOT compile:

error[E0277]: `Rc<Vec<i32>>` cannot be sent between threads safely
   --> src/main.rs:8:5
    |
8   |     thread::spawn(move || {
    |     ^^^^^^^^^^^^^ `Rc<Vec<i32>>` cannot be sent between threads safely
    |
    = help: the trait `Send` is not implemented for `Rc<Vec<i32>>`

In C, the equivalent code compiles without any warning. It crashes at runtime. Maybe. Or worse — it corrupts memory silently and you only find out in production.

Fun Fact

The Rust compiler's thread-safety checks have no runtime cost. Send and Sync are marker traits with no methods or data; they exist only at compile time. Your binary contains no trace of them.


Atomic Types: Lock-Free Concurrency

Sometimes a full mutex is overkill. For simple counters and flags, CPUs provide atomic instructions that complete as one indivisible step.

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU32::new(0));
    let mut handles = vec![];

    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            for _ in 0..1_000_000 {
                counter.fetch_add(1, Ordering::Relaxed);
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Got: {}", counter.load(Ordering::Relaxed));
}

fetch_add compiles to a single lock xadd instruction on x86. No mutex, no waiting, no deadlock possible.

C11 has atomics too (<stdatomic.h>), but again — nothing stops you from mixing atomic and non-atomic access to the same variable.


The Spectrum of Concurrency Safety

    C                                       Rust
    |                                         |
    v                                         v
  No guardrails          ------>          Compiler-enforced
  Data races compile                      Data races don't compile
  Mutex is separate                       Mutex wraps data
  Human discipline                        Type system discipline
  Bugs found at runtime                   Bugs found at compile time
  (or never found)                        (before your code ships)

This does not mean Rust concurrency is easy. Deadlocks are still possible. Logic bugs are still possible. But an entire class of bugs — data races — is eliminated at compile time.


Task

  1. Compile and run the race.c program at the top of this chapter. Run it 10 times. Record the different results.
  2. Try the Rc example in Rust. Read the compiler error carefully.
  3. Replace Rc with Arc and vec![1,2,3] with Mutex::new(0). Make the threaded counter work.
  4. Modify the C version to use 10 threads. How wrong does the count get?
  5. Try the atomic Rust version. Verify the count is always exactly 2,000,000.
  6. Bonus: Use time to compare the mutex version vs the atomic version. Which is faster? Why?