Compilation and Linking: Source to Binary
Type this right now
cat > greet.c << 'EOF'
#include <stdio.h>
void greet(const char *name) { printf("Hello, %s!\n", name); }
EOF
cat > main.c << 'EOF'
void greet(const char *name);
int main() { greet("world"); return 0; }
EOF
gcc -c greet.c -o greet.o
gcc -c main.c -o main.o
gcc greet.o main.o -o hello
echo "=== main.o symbols ===" && nm main.o
echo "=== After linking ===" && nm hello | grep -E 'main|greet'
=== main.o symbols ===
U greet
0000000000000000 T main
=== After linking ===
0000000000001149 T greet
0000000000001172 T main
main.o has an undefined symbol greet (the U). After linking, greet has a real address in hello.
That is the linker's core job: resolving references between files.
The C compilation pipeline
hello.c Your source code
│
│ gcc -E Preprocessor (#include, #define, #ifdef)
v
hello.i Pure C, all macros expanded
│
│ gcc -S Compiler (cc1) — C to assembly
v
hello.s Human-readable assembly
│
│ gcc -c Assembler (as) — assembly to machine code
v
hello.o Object file (relocatable ELF)
│
│ gcc Linker (ld) — resolves symbols, assigns addresses
v
a.out Executable (final ELF)
You can stop at any stage. Let's see each output.
Preprocessing (gcc -E): A hello.c with #include <stdio.h> expands to roughly 700 lines (the
exact count depends on your libc). Every header is pasted in and every macro expanded. The output is pure C.
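To make the preprocessor's job concrete, here is a small sketch. The file name shapes.c and the names SQUARE, VERBOSE, and area are made up for illustration and are not part of the chapter's example files. Running gcc -E on it should show the macro call replaced by its expansion and only one branch of the #ifdef surviving.

```c
/* shapes.c: hypothetical file for inspecting preprocessor output. */
#include <stdio.h>

#define SQUARE(x) ((x) * (x))   /* function-like macro, expanded textually */

int area(int side) {
#ifdef VERBOSE
    /* This line survives preprocessing only when built with -DVERBOSE. */
    printf("computing area for side %d\n", side);
#endif
    return SQUARE(side);        /* becomes ((side) * (side)) after gcc -E */
}
```

Try gcc -E shapes.c | tail and then gcc -E -DVERBOSE shapes.c | tail. The second run keeps the printf line; the first drops it before the compiler proper ever sees it.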
Compilation (gcc -S):
main:
endbr64
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
popq %rbp
ret
.LC0:
.string "Hello, ELF!"
Notice puts@PLT — the compiler doesn't know where puts lives. It emits a reference the
linker will resolve.
Assembly (gcc -c):
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
The type is relocatable — machine code with provisional addresses starting at zero.
Linking (gcc):
hello: ELF 64-bit LSB pie executable, x86-64, dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, ...
The linker resolves all symbols, assigns final addresses, produces the executable.
The Rust pipeline
hello.rs Your source code
│
│ rustc Parser → AST → HIR (desugared, type-checked)
v → MIR (borrow-checked, monomorphized)
LLVM IR LLVM's intermediate representation
│
│ LLVM Machine code generation
v
hello.o Object file (same format as C!)
│
│ linker Same system linker (ld / lld)
v
hello Executable (final ELF)
After LLVM, Rust and C produce the same kind of object file. They use the same linker.
💡 Fun Fact: You can mix C and Rust in one binary. Compile C to .o, Rust to .o, link them together. The linker doesn't care what language produced the object file; it only sees symbols, sections, and relocations.
Object files: code with holes
objdump -d -M intel main.o
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # address TBD
f: 48 89 c7 mov rdi,rax
12: e8 00 00 00 00 call 17 <main+0x17> # address TBD
17: b8 00 00 00 00 mov eax,0x0
1c: 5d pop rbp
1d: c3 ret
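See the zero bytes in the instructions at offsets 0x8 and 0x12? Those are holes. The lea needs
the string's address. The call needs greet's address. Neither is known yet. The four-byte holes
themselves start at 0xb and 0x13, right after the opcode bytes.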
Relocation entries tell the linker where to fill in:
readelf -r main.o
Relocation section '.rela.text' at offset 0x1e8 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000013 000600000004 R_X86_64_PLT32 0000000000000000 greet - 4
Two relocations. Two holes. The linker fills them.
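As a rough sketch of what "filling a hole" means, the function below computes a PC-relative value as S + A - P (symbol address, plus addend, minus the address of the hole) and writes it into the section bytes. The name apply_pc32 is hypothetical, and the sketch assumes a little-endian host and that the linker treats the PLT32 relocation for a locally defined greet like a plain PC32, which is the usual outcome but not something the output above proves.

```c
#include <stdint.h>
#include <string.h>

/* Patch one 4-byte PC-relative hole.
 *   S = final address of the target symbol (sym_addr)
 *   A = addend from the relocation entry   (addend)
 *   P = final address of the hole itself   (section_addr + hole_offset)
 * Returns the value written so callers can inspect it. */
int32_t apply_pc32(uint8_t *section, uint64_t section_addr,
                   uint64_t hole_offset, uint64_t sym_addr, int64_t addend)
{
    int64_t place = (int64_t)(section_addr + hole_offset);      /* P */
    int32_t value = (int32_t)((int64_t)sym_addr + addend - place);

    /* memcpy stores host byte order; x86-64 expects little-endian. */
    memcpy(section + hole_offset, &value, sizeof value);
    return value;
}
```

With the addresses from the first nm listing (greet at 0x1149, main at 0x1172) and the relocation at offset 0x13 with addend -4, apply_pc32(text, 0x1172, 0x13, 0x1149, -4) yields -0x40. The CPU adds that displacement to the address of the next instruction, 0x1172 + 0x17 = 0x1189, and lands on 0x1149. The -4 addend exists because the hole sits four bytes before that next instruction.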
Symbol types: the nm command
Symbol Meaning
────── ────────────────────────────────────
T Text (code) — defined in this file
D Data — initialized global variable
B BSS — uninitialized global variable
R Read-only data
U Undefined — referenced but not here
W Weak — can be overridden
main.o defines main (T) but needs greet (U). greet.o defines greet (T) but needs
printf (U). The linker connects all the U's to the T's.
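The table is easier to remember with one file that produces each letter. The sketch below uses made-up names (symbols.c, counter, scratch, limit, hidden, hook, total) and assumes GCC or Clang at -O0, since the optimizer may remove or fold some of these symbols and the weak attribute is a compiler extension.

```c
/* symbols.c: hypothetical file; each definition targets one nm letter. */
#include <stdio.h>

int counter = 42;              /* D: initialized data */
int scratch;                   /* B: zero-filled .bss (older GCC defaults may print C) */
const int limit = 7;           /* R: read-only data */
static int hidden = 1;         /* d: lowercase letter means local to this file */

__attribute__((weak))          /* GCC/Clang extension */
int hook(void) { return 0; }   /* W: weak, a strong definition elsewhere would win */

int total(void) {              /* T: code defined in this file */
    int sum = counter + scratch + limit + hidden + hook();
    printf("%d\n", sum);       /* U: printf is referenced here, defined in libc */
    return sum;
}
```

Compile with gcc -c symbols.c and run nm symbols.o. The order and offsets will vary, but each letter from the table should appear next to the matching name.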
🧠 What do you think happens? What if a symbol is U in every object file and never T? The linker emits: undefined reference to 'greet'. This is the most common linker error: no file provided a definition.
Linking: three jobs
1. Symbol resolution — match undefined references to definitions:
main.o greet.o
┌────────────────┐ ┌────────────────┐
│ T main │ │ T greet │
│ U greet ───────┼────────►│ │
│ │ │ U printf ──────┼──► libc.so
└────────────────┘ └────────────────┘
2. Relocation — fill in placeholder addresses:
Before (main.o): call 0x00000000 # placeholder
After (hello): call 0x00001149 # actual address of greet
3. Section merging — combine sections from all object files:
main.o .text ──┐
├──► final .text
greet.o .text ─┘
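Symbol resolution and section merging can be modeled in a few lines. This is a toy, not how ld is implemented: the struct and function names are invented, each object gets a single text_base where its .text landed after merging, and 0 stands in for "no definition found".

```c
#include <stddef.h>
#include <string.h>

/* One row of a simplified symbol table, as nm would print it. */
struct symbol {
    const char *name;
    char type;                /* 'T' = defined code, 'U' = undefined */
    unsigned long value;      /* offset inside this object's .text */
};

struct object {
    const char *file;
    unsigned long text_base;  /* where merging placed this object's .text */
    const struct symbol *syms;
    size_t nsyms;
};

/* Return the final address of `name`, or 0 if no object defines it
 * (the case a real linker reports as "undefined reference"). */
unsigned long resolve(const struct object *objs, size_t nobjs, const char *name)
{
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            const struct symbol *s = &objs[i].syms[j];
            if (s->type == 'T' && strcmp(s->name, name) == 0)
                return objs[i].text_base + s->value;
        }
    }
    return 0;
}
```

Feeding it tables for greet.o (T greet, U printf) and main.o (T main, U greet) with bases 0x1149 and 0x1172 resolves greet to 0x1149, while printf comes back as 0 because neither object defines it. In the real link, libc.so supplies that definition.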
Static vs dynamic linking
Static linking:                    Dynamic linking (default):
┌──────────────────────┐           ┌──────────────┐
│ hello_static         │           │ hello        │
│   main()             │           │   main()     │
│   greet()            │           │   greet()    │
│   printf()           │           │   printf ────┼──► libc.so.6
│   ... all of libc ...│           └──────────────┘
└──────────────────────┘           ~16 KB, needs .so
~880 KB, no dependencies
gcc -static main.o greet.o -o hello_static
ldd hello_static # "not a dynamic executable"
ldd hello # lists libc.so.6, ld-linux, vdso
Static: everything baked in. Dynamic: smaller binary, resolved at runtime.
Rust's linking story
By default on glibc-based Linux targets, Rust statically links its own standard library but dynamically links the system C library: a hybrid approach.
ldd hello_rs # shows libc.so.6, libgcc_s.so.1, ld-linux
💡 Fun Fact: Fully static Rust: rustup target add x86_64-unknown-linux-musl && rustc --target x86_64-unknown-linux-musl hello.rs. Zero dynamic dependencies. Runs on any x86-64 Linux.
PLT/GOT: dynamic linking at a glance
When your binary calls printf dynamically, it uses two structures:
- PLT (Procedure Linkage Table) — code trampolines
- GOT (Global Offset Table) — writable address slots
Your code PLT GOT
───────── ───── ────
call puts@PLT ──────► puts@PLT: ┌──────────────┐
jmp [GOT[puts]]───────►│ address of │
... │ puts in libc │
└──────────────┘
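With lazy binding, the GOT entry initially points to a resolver. On the first call, the resolver
finds the real puts, patches the GOT, and jumps there. Subsequent calls go directly through the
patched GOT. Some toolchains instead bind eagerly (-z now) and fill the slots at load time. Details in Chapter 14.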
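The resolve-on-first-call pattern can be imitated in plain C with a function pointer. This is only an analogy: got_strlen, resolver, strlen_plt, and resolver_runs are invented names, and the real PLT/GOT machinery lives in generated stubs and the dynamic loader, not in your source code.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

static size_t resolver(const char *s);    /* forward declaration */

/* The "GOT slot": writable, initially aimed at the resolver. */
static strlen_fn got_strlen = resolver;
static int resolver_runs = 0;             /* counts trips down the slow path */

/* Stand-in for the dynamic linker's lookup: find the real function,
 * patch the slot, then complete the call that triggered the lookup. */
static size_t resolver(const char *s)
{
    resolver_runs++;
    got_strlen = strlen;                  /* patch the slot */
    return got_strlen(s);
}

/* The "PLT stub": nothing but an indirect call through the slot. */
size_t strlen_plt(const char *s)
{
    return got_strlen(s);
}
```

The first strlen_plt("hello") goes through resolver and returns 5. Every later call goes straight to strlen, so resolver_runs stays at 1 no matter how many calls follow.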
🔧 Task: Watch symbols resolve across files
- Create math.c:
    int add(int a, int b) { return a + b; }
    int multiply(int a, int b) { return a * b; }
- Create app.c:
    #include <stdio.h>
    int add(int a, int b);
    int multiply(int a, int b);
    int main() { printf("%d %d\n", add(3, 4), multiply(5, 6)); return 0; }
- Compile separately: gcc -c math.c and gcc -c app.c
- Run nm math.o: add and multiply should be T (defined)
- Run nm app.o: add and multiply should be U (undefined)
- Link: gcc math.o app.o -o app
- Run nm app | grep -E 'add|multiply': both are now T with real addresses
- Run ./app and check the output: 7 30
- Bonus: Delete math.o, try linking with just app.o. The error tells you exactly which symbols are missing.