Page Faults: When Things Get Interesting
Type this right now
// save as lazy.c, compile with: gcc -o lazy lazy.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   // getpid()

int main(void) {
    // Ask for 100 MB. Does the OS actually give us 100 MB of RAM?
    size_t size = 100 * 1024 * 1024;
    char *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    printf("malloc returned: %p\n", (void *)p);
    printf("Now check: ps -o pid,vsz,rss -p %d\n", (int)getpid());
    printf("Press Enter BEFORE touching memory...\n");
    getchar();

    // Touch every page (write one byte per 4 KB page)
    for (size_t i = 0; i < size; i += 4096) {
        p[i] = 'A';
    }
    printf("Memory touched. Press Enter to check again...\n");
    getchar();

    free(p);
    return 0;
}
Run it in one terminal. In another terminal, check RSS before and after:
$ ps -o pid,vsz,rss -p $(pgrep lazy)
PID VSZ RSS
1234 203456 2340 ← Before: VSZ is large, RSS is tiny!
# (press Enter in the first terminal)
$ ps -o pid,vsz,rss -p $(pgrep lazy)
PID VSZ RSS
1234 203456 104820 ← After: RSS jumped ~100 MB
VSZ (virtual size) was always large — the address space was mapped. RSS (resident set size) was tiny until you touched the pages. Physical RAM was allocated one page at a time, on demand, via page faults.
A page fault is NOT an error
This is one of the most misunderstood concepts in systems programming. A page fault is a CPU exception that says: "I tried to translate this virtual address, and the page table says I can't proceed. Kernel, please help."
The kernel then decides what to do:
CPU executes: mov [0x7F4000], eax
│
▼
MMU walks page table → PTE has Present=0 (or wrong permissions)
│
▼
CPU raises Page Fault Exception (#PF)
│
▼
Kernel's page fault handler runs
│
├──► Minor fault? → Map a physical frame, resume
│
├──► Major fault? → Load from disk, map frame, resume
│
└──► Invalid? → Send SIGSEGV → process dies
Four kinds of page faults
1. Minor fault — demand paging
You called malloc(100 MB). The kernel said "sure" and set up virtual address mappings but
left every PTE with Present=0. No physical RAM was allocated.
When you first touch a page:
Your code: p[0] = 'A';
│
▼
Virtual address 0x7F4000 → MMU walks table → Present=0
│
▼
Page Fault (minor) → kernel allocates a physical frame
│ from the free page pool, zeros it,
│ updates the PTE: Present=1, Frame=0x1A200
│
▼
CPU retries the instruction → MMU walks table → Present=1 → success!
p[0] is now 'A' in frame 0x1A200
This happens once per page, then never again (for that page). The process never notices — the
retry is automatic. This is why malloc(1 GB) succeeds on a 4 GB system — physical RAM is only
committed when pages are actually accessed.
🧠 What do you think happens?
You call malloc(1 TB) on a machine with 16 GB of RAM. Suppose the kernel's overcommit policy allows it and malloc returns a valid pointer. You then try to touch every page. At some point, the kernel runs out of physical frames. What happens? (Hint: look up the "OOM killer.")
2. Minor fault — copy-on-write
After fork(), parent and child share pages marked read-only. When either writes:
Child writes to shared page at 0x5000
│
▼
MMU: page is present but marked read-only → Page Fault
│
▼
Kernel: "Ah, this is a copy-on-write page."
│
├── Allocate new physical frame
├── Copy contents from original frame
├── Update child's PTE: point to new frame, mark writable
└── Resume child's instruction
│
▼
Child's write succeeds. Parent's page is unaffected.
Still a minor fault — no disk I/O. Just a memory copy and a PTE update.
3. Major fault — loading from disk
The page was once in RAM but got swapped out to disk (because the system was low on memory). The PTE has Present=0 but contains a swap entry telling the kernel where on disk the data lives.
Access to page at 0x8000 → Present=0, swap entry = disk sector 42501
│
▼
Kernel: "This page was swapped out."
│
├── Allocate a free physical frame
├── Read 4 KB from swap disk into the frame ◄── SLOW! ~5-10 ms
├── Update PTE: Present=1, Frame=new_frame
└── Resume instruction
│
▼
Access succeeds. But it cost milliseconds, not nanoseconds.
Speed comparison:
TLB hit:          ~1 ns
Minor page fault: ~1-10 μs (about 1,000× slower than a TLB hit)
Major page fault: ~5-10 ms (about 5,000,000× slower than a TLB hit, 1,000× slower than a minor fault)
Major faults are the reason your system feels sluggish when it starts swapping. A program that would normally take 1 second can take hours if most of its accesses trigger major faults.
4. Invalid fault — you messed up
The virtual address has no mapping at all. No PTE. No swap entry. Nothing.
Access to 0xDEADBEEF → no mapping exists
│
▼
Kernel: "This address is not valid for this process."
│
▼
Kernel sends SIGSEGV to the process
│
▼
Default action: terminate the process and dump core (the shell reports "Segmentation fault")
This is the one that kills your program. We'll cover it in detail in Chapter 18.
Watching page faults happen
Linux tracks page faults per process. You can see them:
$ /usr/bin/time -v ./lazy 2>&1 | grep -i fault
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 25612
25,612 minor faults for 100 MB makes sense: 100 MB / 4 KB = 25,600 pages (plus a few for the program itself, stack, libraries).
You can also count them with perf stat:
$ perf stat -e page-faults,minor-faults,major-faults ./lazy
25,614 page-faults
25,614 minor-faults
0 major-faults
mmap: the Swiss army knife
mmap() is the system call that creates virtual address mappings. Most of what you've seen so far runs through it:
malloc (large allocs) → calls mmap(MAP_ANONYMOUS | MAP_PRIVATE)
Loading shared libs → dynamic linker calls mmap(MAP_PRIVATE, fd)
Reading files → you call mmap(MAP_PRIVATE, fd)
Shared memory → mmap(MAP_SHARED | MAP_ANONYMOUS)
Copy-on-write fork → kernel manipulates existing mappings
Here's mmap used to read a file:
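// save as mmapread.c, compile with: gcc -o mmapread mmapread.c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return 1; }

    // Map the entire file into our address space
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    close(fd);  // We can close fd; the mapping keeps the file accessible

    // Access the file like a normal array. The mapping is not guaranteed
    // to be NUL-terminated, so cap the print length at the file size.
    int n = st.st_size < 80 ? (int)st.st_size : 80;
    printf("First %d bytes:\n%.*s\n", n, n, data);

    munmap(data, st.st_size);
    return 0;
}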
And in Rust:
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() {
    let file = File::open("/etc/passwd").unwrap();
    let len = file.metadata().unwrap().len() as usize;

    // Using the memmap2 crate (add to Cargo.toml: memmap2 = "0.9")
    // let mmap = unsafe { memmap2::Mmap::map(&file).unwrap() };
    // println!("{}", std::str::from_utf8(&mmap[..80]).unwrap());

    // Or with raw mmap (needs the libc crate in Cargo.toml):
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_READ,
            libc::MAP_PRIVATE,
            file.as_raw_fd(),
            0,
        )
    };
    assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");

    let data = unsafe { std::slice::from_raw_parts(ptr as *const u8, len) };
    // Don't slice past the end of a short file
    let n = len.min(80);
    println!("First {} bytes:\n{}", n, String::from_utf8_lossy(&data[..n]));

    unsafe { libc::munmap(ptr, len); }
}
💡 Fun Fact: When the kernel loads your ELF binary at exec() time, it doesn't read the whole file into RAM. It mmaps the segments. Your .text section is demand-paged: pages of code that are never executed need not be loaded from disk.
The page fault flow in full
Here's the complete picture of what happens when the CPU can't translate an address:
CPU executes instruction that accesses virtual address VA
│
▼
MMU checks TLB ─── Hit? ──► Translate, access physical memory. Done.
│
No (TLB miss)
│
▼
MMU walks 4-level page table
│
├── Present=1 and permissions OK? ──► Load into TLB, access memory. Done.
│
└── Present=0 or permission violation?
│
▼
CPU stores the faulting address in the CR2 register
CPU raises #PF exception (interrupt 14)
CPU switches to kernel mode
│
▼
Kernel page fault handler (arch/x86/mm/fault.c)
│
├── Is VA in a valid VMA? (vm_area_struct)
│ │
│ No ──► Send SIGSEGV (invalid access)
│ │
│ Yes
│ ▼
├── Was it a write to a read-only COW page?
│ │
│ Yes ──► Allocate frame, copy page, update PTE, resume
│ │
│ No
│ ▼
├── Is there a swap entry?
│ │
│ Yes ──► Read from swap (major fault), map frame, resume
│ │
│ No
│ ▼
├── Is it a demand-zero page (anonymous)?
│ │
│ Yes ──► Allocate zeroed frame, map it, resume (minor fault)
│ │
│ No
│ ▼
└── Is it a file-backed mapping?
│
Yes ──► Read from file (major fault, unless the page is already in the page cache), map frame, resume
│
No ──► Send SIGSEGV
🔧 Task: Watch demand paging in /proc/self/status
// save as demand.c, compile with: gcc -o demand demand.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Print the VmSize and VmRSS lines from /proc/<pid>/status
void show_rss(void) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/status", (int)getpid());
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror("fopen");
        return;
    }
    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS", 5) == 0 || strncmp(line, "VmSize", 6) == 0) {
            printf("  %s", line);
        }
    }
    fclose(f);
}

int main(void) {
    const size_t size = 100 * 1024 * 1024;  // 100 MB

    printf("Before malloc:\n");
    show_rss();

    char *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    printf("\nAfter malloc (before touching):\n");
    show_rss();

    // Touch every page
    for (size_t i = 0; i < size; i += 4096) {
        p[i] = 1;
    }
    printf("\nAfter touching every page:\n");
    show_rss();

    free(p);
    printf("\nAfter free:\n");
    show_rss();
    return 0;
}
$ ./demand
Before malloc:
VmSize: 2580 kB
VmRSS: 1024 kB
After malloc (before touching):
VmSize: 105060 kB ← Virtual size jumped 100 MB
VmRSS: 1040 kB ← Physical memory: basically unchanged!
After touching every page:
VmSize: 105060 kB ← Virtual size: same
VmRSS: 103420 kB ← RSS jumped ~100 MB. NOW the RAM is used.
After free:
VmSize: 2580 kB ← Virtual mapping released
VmRSS: 1040 kB ← Physical RAM returned to the OS
Key insight: VmSize reflects the virtual address space. VmRSS reflects physical RAM.
malloc only affects VmSize. Actually touching the memory triggers page faults, which
allocate physical frames and increase VmRSS.
Now run it again under perf stat:
$ perf stat -e minor-faults,major-faults ./demand
25,630 minor-faults ← ~25,600 pages = 100 MB / 4 KB
0 major-faults
Nearly all of those minor faults (about 25,600 of them) were the kernel giving you one physical frame each; the remainder came from program startup. No disk I/O. Just pure demand paging.