Loading: Binary Becomes Process

Type this right now

// save as hello.c, compile: gcc -o hello hello.c
#include <stdio.h>
int main() {
    printf("main is at: %p\n", (void *)main);
    return 0;
}

for i in 1 2 3 4 5; do ./hello; done
main is at: 0x55a3f1c00149
main is at: 0x564e28a00149
main is at: 0x55f84dc00149
main is at: 0x558b7e200149
main is at: 0x563420200149

Every run, main is at a different address. The binary on disk hasn't changed. The loader is moving things around. On purpose.


One syscall to rule them all: execve()

strace -f ./hello 2>&1 | head -15
execve("./hello", ["./hello"], 0x7ffd5c6b4e50 /* 55 vars */) = 0
brk(NULL)                               = 0x55a4b2e33000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2c8a3f1000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 2125824, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2c8a000000
mmap(0x7f2c8a028000, 1531904, PROT_READ|PROT_EXEC, ...) = 0x7f2c8a028000
mmap(0x7f2c8a1f4000, 24576, PROT_READ|PROT_WRITE, ...) = 0x7f2c8a1f4000
...

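That first line, execve(), is where a file on disk becomes a running process. Everything after it is already user space: the dynamic linker checking for /etc/ld.so.preload, opening libc.so.6, and mmap()ing its segments.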


What the kernel does

execve("./hello", ...)
         │
         v
  1. Read first bytes → check magic: 7f 45 4c 46 = ELF? Yes.
         │
         v
  2. Read ELF header → entry point, program header offset
         │
         v
  3. For each LOAD segment:
     └── mmap(vaddr, filesz, perms, MAP_PRIVATE|MAP_FIXED, fd, offset)
         (if memsz > filesz, zero-fill the extra — that's .bss)
         │
         v
  4. INTERP segment found? Load the dynamic linker too.
         │
         v
  5. Set up stack: argc, argv[], envp[], auxv[]
         │
         v
  6. Set instruction pointer → jump to entry point
     (dynamic binary: ld-linux entry; static binary: _start)

The kernel doesn't "run" your program. It prepares the address space and jumps to the entry point. User-space code takes over from there.


ELF segments become virtual memory

ELF file on disk                        Virtual address space
────────────────                        ─────────────────────

┌─────────────────┐
│  LOAD (R)       │ ────mmap()────────► Read-only (ELF header, program headers, notes)
│  ELF header,    │
│  phdrs, notes   │
├─────────────────┤
│  LOAD (R-X)     │ ────mmap()────────► Executable code (.text)
├─────────────────┤
│  LOAD (R)       │ ────mmap()────────► Read-only data (.rodata)
├─────────────────┤
│  LOAD (RW)      │ ────mmap()────────► Read-write data (.data+.bss)
├─────────────────┤
│  .symtab        │     NOT MAPPED      [heap]  ────────►
│  .debug_*       │                     ...
│  section hdrs   │
└─────────────────┘                     [stack] ◄────────

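Each LOAD segment becomes an mmap() call, plus an anonymous zero-filled mapping when .bss runs past the last file-backed page. The segment's permission flags (PF_R, PF_W, PF_X) become mmap()'s protection bits (PROT_READ, PROT_WRITE, PROT_EXEC).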

🧠 What do you think happens? The RW LOAD segment has memsz > filesz. The extra bytes hold .bss and don't exist in the file, so the kernel maps them in memory and fills them with zeros. Uninitialized globals start at zero without wasting disk space.


The dynamic linker

readelf -l hello | grep INTERP -A 1
  INTERP  0x000318 0x0000000000000318 ... 0x00001c R 0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

This tells the kernel which program interpreter to use. The kernel maps the dynamic linker alongside your binary and jumps to the linker's entry point instead of yours. The dynamic linker then:

  1. Reads the DYNAMIC segment of your binary
  2. Opens and mmaps every shared library (libc.so.6, etc.)
  3. Resolves symbols between binary and libraries
  4. Jumps to your binary's _start

Kernel → ld-linux → _start → __libc_start_main → main()
                                                    │
                                                YOUR CODE

PLT and GOT: lazy binding

When your binary calls a libc function such as puts, it goes through:

  • PLT (Procedure Linkage Table) — small code stubs
  • GOT (Global Offset Table) — writable address slots

First call (slow path):

call puts@PLT
       │
       v
  puts@PLT:
    jmp [GOT[puts]] ──► GOT[puts] = address of the next instruction (not puts yet!)
    push index          so the jump just falls through to here
    jmp PLT[0]      ──► PLT[0]: push link_map, jump to the resolver
                                │
                                v
                          _dl_runtime_resolve:
                            1. Look up "puts" in libc
                            2. Patch GOT[puts] = real puts address
                            3. Jump to puts

Second call (fast path):

call puts@PLT
       │
       v
  puts@PLT:
    jmp [GOT[puts]] ──► GOT[puts] = 0x7f...puts (real address!)
                               │
                               v
                          puts() in libc — direct, no resolver

First call resolves the symbol. Every subsequent call is a single indirect jump.

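💡 Fun Fact: This is "lazy binding": symbols are resolved on first use. Force eager binding with LD_BIND_NOW=1 ./hello to resolve everything before main runs. Some distributions already link with -z now by default; if readelf -d hello | grep FLAGS shows BIND_NOW, the GOT is filled before main and the slow path never runs.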


ASLR: why the address changes

Run 1:  base = 0x55a3f1bff000  →  main = base + 0x1149 = 0x55a3f1c00149
Run 2:  base = 0x564e289ff000  →  main = base + 0x1149 = 0x564e28a00149
Run 3:  base = 0x55f84dbff000  →  main = base + 0x1149 = 0x55f84dc00149
                                                           ^^^^^^^^^
                                                           Random!
                                                                    ^^^
                                                                    Always 149

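The last three hex digits are always 149. The base address is always page-aligned (a multiple of 0x1000), so randomization never touches the low 12 bits, and main's offset inside the binary (0x1149 in this build) always leaves 149 there. Only the base changes. This is Address Space Layout Randomization (ASLR).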

The kernel randomizes the base of the executable (if it is a PIE), the stack, the heap, and the mmap region. Shared libraries are mapped into that randomized mmap region, so they move too.

Without ASLR:                    With ASLR:
┌──────────┐ 0x400000           ┌──────────┐ 0x55a3f1c00000
│  code    │                    │  code    │
├──────────┤ 0x600000           ├──────────┤ 0x55a3f1e00000
│  data    │                    │  data    │
│  ...     │                    │  ...     │
│  stack   │ 0x7ffffffde000     │  stack   │ 0x7ffd5c680000
└──────────┘                    └──────────┘
  Same every run.                 Different every run.
  Attacker knows all.             Attacker must guess.

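Without ASLR, an attacker who knows your binary can predict every address. On x86-64 Linux, the executable and mmap bases get 28 random bits by default (tunable through the vm.mmap_rnd_bits sysctl), which gives about 268 million possible base locations.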
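The arithmetic behind that number, assuming (as on x86-64) that the 28 random bits pick a whole 4 KiB page:

```latex
2^{28} = 268{,}435{,}456 \ \text{possible bases}, \qquad
2^{28} \times 2^{12}\ \text{bytes} = 2^{40}\ \text{bytes} = 1\ \text{TiB of spread}
```

A 1 TiB window is consistent with every base in the runs above starting with 0x55 or 0x56.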


PIE: Position Independent Executable

ASLR can only move the executable's own code if that code doesn't depend on fixed addresses. PIE code reaches its own functions and data with rip-relative addressing:

# PIE (works at any base):
lea rax, [rip+0xeac]    # relative to current instruction

# Non-PIE (fixed base only):
mov rax, 0x402004        # hardcoded absolute address

readelf -h hello | grep Type       # DYN = PIE
gcc -no-pie -o hello_nopie hello.c
readelf -h hello_nopie | grep Type # EXEC = fixed address, no code ASLR

The Rust version

// save as hello_addr.rs, compile: rustc hello_addr.rs
fn main() {
    let stack_var = 42;
    println!("main:  {:p}", main as fn() as *const ());
    println!("stack: {:p}", &stack_var as *const i32);
}

for i in 1 2 3 4 5; do ./hello_addr; done

Both code and stack randomized. Same ASLR, same kernel, regardless of language.


Disabling ASLR: proof it's real

setarch $(uname -m) -R ./hello
setarch $(uname -m) -R ./hello
setarch $(uname -m) -R ./hello
main is at: 0x555555555149
main is at: 0x555555555149
main is at: 0x555555555149

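Same address every time. 0x555555554000 is the kernel's fixed base for PIE binaries when randomization is off, and main sits 0x1149 above it. It's also what you see in GDB, which disables ASLR by default.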

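Check the system-wide setting with cat /proc/sys/kernel/randomize_va_space: 0 = off; 1 = randomize the stack, the mmap base (and therefore shared libraries), the vDSO, and PIE code; 2 = also randomize the heap (the usual default).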


The complete picture

  hello (ELF on disk)
         │
         │  execve()
         v
  ┌─ Kernel ─────────────────────────────────────┐
  │  Read ELF header + program headers           │
  │  mmap LOAD segments (with ASLR offset)       │
  │  Load dynamic linker (from INTERP)           │
  │  Set up stack (argc, argv, envp, auxv)       │
  │  Jump to ld-linux entry point                │
  └──────────────────────────────────────────────┘
         │
         v
  ┌─ Dynamic linker (ld-linux) ──────────────────┐
  │  Load shared libraries (libc.so, etc.)       │
  │  Set up PLT/GOT, process relocations         │
  │  Jump to binary's _start                     │
  └──────────────────────────────────────────────┘
         │
         v
  ┌─ C Runtime (_start) ─────────────────────────┐
  │  __libc_start_main → main(argc, argv, envp)  │
  │  exit(return_value)                          │
  └──────────────────────────────────────────────┘
         │
         v
     YOUR CODE RUNS

From execve() to main(), every step is an ELF field read, an mmap() call, or a symbol resolution. Nothing magic.


🔧 Task: Observe ASLR yourself

  1. Save as aslr.c and compile with gcc aslr.c (this produces ./a.out):
    #include <stdio.h>
    int global = 42;
    int main() {
        int local = 7;
        printf("main:   %p\n", (void *)main);
        printf("global: %p\n", (void *)&global);
        printf("local:  %p\n", (void *)&local);
        return 0;
    }
    
  2. Run five times. All three addresses change each run.
  3. Notice: the offset between main and global stays constant (same binary). The offset between local (stack) and main (code) varies — randomized independently.
  4. Disable ASLR: setarch $(uname -m) -R ./a.out — same addresses every time.
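  5. Bonus: run strace ./a.out 2>&1 | grep mmap and count the mmap calls. These come from the dynamic linker mapping libc and some scratch memory; the mappings for a.out itself and for ld-linux were made by the kernel inside execve(), so they never appear as separate syscalls. That's the loader in action.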