ELF: Dissecting Your Executable
Type this right now
// save as hello.c, compile: gcc -o hello hello.c
#include <stdio.h>
int main(void) {
    printf("Hello, ELF!\n");
    return 0;
}
xxd hello | head -4
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 6010 0000 0000 0000 ..>.....`.......
00000020: 4000 0000 0000 0000 9839 0000 0000 0000 @........9......
00000030: 0000 0000 4000 3800 0d00 4000 1f00 1e00 ....@.8...@.....
That's the first 64 bytes of your compiled program. Every single byte has meaning. By the end of this chapter, you'll be able to read them like a sentence.
The magic bytes
Look at the very first four bytes: 7f 45 4c 46.
- 0x7f: a non-printable byte, chosen so that ELF files can't be confused with text
- 0x45 = E
- 0x4c = L
- 0x46 = F
Every ELF file on your system — every binary, every .so, every .o — starts with \x7fELF.
# Try it yourself — check /bin/ls
xxd /bin/ls | head -1
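The kernel's ELF loader checks these four bytes first. If they don't match, the file is not treated as ELF, and unless another handler claims it (a #! script, for example), execve() fails with ENOEXEC.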
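The magic check is simple enough to sketch yourself. This is only an illustration of the idea, not the kernel's actual code, and looks_like_elf is a made-up helper name:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the four ELF magic bytes, 0 otherwise.
 * looks_like_elf is a hypothetical helper, not a libc or kernel function. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    /* A buffer shorter than four bytes cannot contain the full magic. */
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

Feed it the first bytes of hello and it returns 1. Feed it the start of a shell script and it returns 0.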
💡 Fun Fact: The 0x7f byte picked by the original Unix System V designers is the ASCII DEL character, the highest value in 7-bit ASCII. Because it is a non-printable control character, a text file is very unlikely to begin with \x7fELF by accident.
The ELF header: 64 bytes that describe everything
The ELF header is a fixed-size structure sitting at offset 0. In a 64-bit ELF file, it's exactly 64 bytes (32-bit ELF files use a 52-byte header). Here's the layout:
Offset  Size  Field                    What it means
──────  ────  ───────────────────────  ──────────────────────────────────────
0x00    4     e_ident[EI_MAG0..3]      Magic: 7f 45 4c 46 (\x7fELF)
0x04    1     e_ident[EI_CLASS]        Class: 1=32-bit, 2=64-bit
0x05    1     e_ident[EI_DATA]         Endianness: 1=little, 2=big
0x06    1     e_ident[EI_VERSION]      ELF version (always 1)
0x07    1     e_ident[EI_OSABI]        OS/ABI: 0=UNIX System V
0x08    1     e_ident[EI_ABIVERSION]   ABI version (usually 0)
0x09    7     e_ident[EI_PAD]          Padding (zeros)
0x10    2     e_type                   Type: 1=relocatable, 2=exec, 3=shared object or PIE
0x12    2     e_machine                Machine: 0x3e=x86-64, 0xb7=aarch64
0x14    4     e_version                ELF version (again, always 1)
0x18    8     e_entry                  Entry point address
0x20    8     e_phoff                  Program header table offset
0x28    8     e_shoff                  Section header table offset
0x30    4     e_flags                  Processor-specific flags
0x34    2     e_ehsize                 ELF header size (64 for 64-bit)
0x36    2     e_phentsize              Program header entry size
0x38    2     e_phnum                  Number of program headers
0x3a    2     e_shentsize              Section header entry size
0x3c    2     e_shnum                  Number of section headers
0x3e    2     e_shstrndx               Section header string table index
That's it. 64 bytes. And from them, the kernel knows everything it needs to begin loading your program.
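To see how the offsets in the table turn into values, here is a minimal sketch that decodes a handful of fields from a 64-byte buffer. It assumes a little-endian ELF64 header like the one in the hex dump. The struct elf64_summary and the function parse_elf64_header are hypothetical names invented for this example (the real definitions live in <elf.h> as Elf64_Ehdr).

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble little-endian integers byte by byte, so the result does not
 * depend on the byte order of the machine running this code. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t le64(const unsigned char *p)
{
    return (uint64_t)le32(p) | ((uint64_t)le32(p + 4) << 32);
}

/* Hypothetical summary struct: a subset of the fields in the table. */
struct elf64_summary {
    uint16_t e_type, e_machine;
    uint64_t e_entry, e_phoff, e_shoff;
    uint16_t e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

/* Returns 0 on success, -1 if the buffer is too short or is not a
 * little-endian ELF64 header. */
int parse_elf64_header(const unsigned char *h, size_t len,
                       struct elf64_summary *out)
{
    if (len < 64)
        return -1;
    if (h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
        return -1;
    if (h[0x04] != 2 || h[0x05] != 1)   /* class: 64-bit, data: little-endian */
        return -1;

    out->e_type      = le16(h + 0x10);
    out->e_machine   = le16(h + 0x12);
    out->e_entry     = le64(h + 0x18);
    out->e_phoff     = le64(h + 0x20);
    out->e_shoff     = le64(h + 0x28);
    out->e_phentsize = le16(h + 0x36);
    out->e_phnum     = le16(h + 0x38);
    out->e_shentsize = le16(h + 0x3a);
    out->e_shnum     = le16(h + 0x3c);
    out->e_shstrndx  = le16(h + 0x3e);
    return 0;
}
```

Run against the 64 bytes from the hex dump at the top of the chapter, the bytes 60 10 00 00 00 00 00 00 at offset 0x18 decode to an entry point of 0x1060, and 0d 00 at offset 0x38 decodes to 13 program headers.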
Reading it the easy way: readelf -h
readelf -h hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1060
Start of program headers: 64 (bytes into file)
Start of section headers: 14744 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
Every field maps directly to the hex bytes you saw in xxd. The entry point 0x1060 is the
virtual address where execution begins — not main, but _start, the C runtime startup code that
calls main.
🧠 What do you think happens? Why is the Type DYN (shared object) instead of EXEC? Most modern distributions configure GCC to produce Position-Independent Executables by default. A PIE uses the same ELF type as a shared library, so the binary can be loaded at any base address. This enables ASLR, which we'll cover in Chapter 14.
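One way to picture "any base address": in a DYN file, addresses such as the entry point are offsets from wherever the loader places the image, while in an EXEC file they are already final. The sketch below captures just that arithmetic. The names TYPE_EXEC, TYPE_DYN, and runtime_address are invented for this example, and the base address used afterwards is made up.

```c
#include <stdint.h>

/* e_type values from the field table (hypothetical names; <elf.h> calls
 * them ET_EXEC and ET_DYN). */
enum { TYPE_EXEC = 2, TYPE_DYN = 3 };

/* Where an address from the file ends up at run time.
 * load_base is whatever base the loader picked for this run. */
uint64_t runtime_address(uint16_t e_type, uint64_t load_base, uint64_t vaddr)
{
    if (e_type == TYPE_DYN)
        return load_base + vaddr;   /* position-independent: relative to base */
    return vaddr;                   /* fixed-address executable: used as-is */
}
```

With an illustrative base of 0x555555554000, the entry point 0x1060 of our PIE would land at 0x555555555060. With ASLR enabled the base changes from run to run, so the final address does too.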
The big picture: ELF file layout
┌──────────────────────────────┐  offset 0
│  ELF Header                  │  64 bytes: the "table of contents"
│  (magic, type, entry point)  │
├──────────────────────────────┤  offset 64 (0x40)
│  Program Header Table        │  tells the LOADER how to map segments
│  (array of segment entries)  │
├──────────────────────────────┤
│                              │
│  .text section               │  your compiled machine code
│                              │
├──────────────────────────────┤
│  .rodata section             │  read-only data ("Hello, ELF!\n")
├──────────────────────────────┤
│  .data section               │  initialized global variables
├──────────────────────────────┤
│  .bss section                │  uninitialized globals (zero bytes on disk)
├──────────────────────────────┤
│  .symtab section             │  symbol table
├──────────────────────────────┤
│  .strtab section             │  string table (symbol names)
├──────────────────────────────┤
│  .debug_* sections (if -g)   │  DWARF debug info
├──────────────────────────────┤
│  ... other sections ...      │
├──────────────────────────────┤
│  Section Header Table        │  tells the LINKER about each section
│  (array of section entries)  │
└──────────────────────────────┘
The ELF header points to the program header table (for the loader) and the section header table (for the linker and debugger). Everything else sits between them.
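Both tables are plain arrays, so their position in the file follows from three header fields each: a start offset, an entry size, and an entry count. A small sketch of that arithmetic, with header_table_extent as a hypothetical helper name:

```c
#include <stdint.h>

/* Byte range [start, end) that a header table occupies in the file. */
struct table_extent {
    uint64_t start;
    uint64_t end;   /* exclusive */
};

/* offset/entsize/count correspond to e_phoff/e_phentsize/e_phnum for the
 * program header table, or e_shoff/e_shentsize/e_shnum for the section
 * header table. */
struct table_extent header_table_extent(uint64_t offset, uint16_t entsize,
                                        uint16_t count)
{
    struct table_extent e;
    e.start = offset;
    e.end = offset + (uint64_t)entsize * count;
    return e;
}
```

For the header shown earlier, the program header table starts at offset 64 and holds 13 entries of 56 bytes, so it occupies bytes 64 through 791 of the file.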
Program headers: what the loader sees
readelf -l hello
This shows the segments — the chunks the kernel maps into memory. Each segment has a type, permissions, a file offset, and a virtual address. We'll dissect these in the next chapter.
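The permissions of a segment are stored as three bits in its p_flags field: execute is 0x1, write is 0x2, read is 0x4. Here is a minimal sketch that renders them ls-style. The SEG_* names and segment_perms are made up for this example (<elf.h> calls the bits PF_X, PF_W, and PF_R), and readelf itself prints them in its own R/W/E notation.

```c
#include <stdint.h>

/* Permission bits of p_flags (hypothetical names for PF_X, PF_W, PF_R). */
#define SEG_X 0x1u
#define SEG_W 0x2u
#define SEG_R 0x4u

/* Writes an "rwx"-style string into out, which must hold 4 bytes. */
void segment_perms(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & SEG_R) ? 'r' : '-';
    out[1] = (p_flags & SEG_W) ? 'w' : '-';
    out[2] = (p_flags & SEG_X) ? 'x' : '-';
    out[3] = '\0';
}
```

A code segment typically has flags 0x5, which this renders as r-x. A data segment with flags 0x6 comes out as rw-.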
Section headers: what the linker sees
readelf -S hello
This shows the sections — the fine-grained pieces the compiler and linker work with. You'll
see .text, .data, .rodata, .bss, .symtab, and many more.
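Section names are not stored inside the section headers themselves. Each header holds a byte offset into the section header string table (the section whose index is e_shstrndx), which is just a run of NUL-terminated strings. A rough sketch of that lookup, with strtab_name as a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the NUL-terminated name stored at byte offset off in a string
 * table of the given size, or NULL if off is out of range or the string
 * is not terminated inside the table. */
const char *strtab_name(const char *tab, size_t size, uint32_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}
```

Given a made-up table containing "\0.text\0.data\0", offset 1 yields .text and offset 7 yields .data. The same scheme is used by .strtab for symbol names.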
C vs Rust: why the Rust binary is bigger
Let's compile the same program in Rust:
// save as hello.rs, compile: rustc hello.rs
fn main() {
    println!("Hello, ELF!");
}
$ ls -la hello # C binary
-rwxr-xr-x 1 user user 15960 Feb 19 10:00 hello
$ ls -la hello_rs # Rust binary (renamed for comparison)
-rwxr-xr-x 1 user user 4374432 Feb 19 10:01 hello_rs
The Rust binary is ~270x larger. Why?
readelf -h hello | grep 'Number of section headers'      # C: 31 sections
readelf -h hello_rs | grep 'Number of section headers'   # Rust: ~43 sections
Three reasons:
- Static linking of the standard library. Rust statically links libstd by default, while the C binary dynamically links libc. All the code for println!, formatting, panic handling, and the Rust runtime gets baked in.
- More debug info. The precompiled standard library carries its own debug sections (.debug_info, .debug_abbrev, .debug_line, .debug_str, etc.), and these end up in your binary even without an explicit -g.
- Monomorphized generics. Rust generates specialized code for each concrete type used with generics. println! alone pulls in a substantial amount of formatting machinery.
Strip it down
$ strip hello_rs -o hello_rs_stripped
$ ls -la hello_rs hello_rs_stripped
-rwxr-xr-x 1 user user 4374432 Feb 19 10:01 hello_rs
-rwxr-xr-x 1 user user 311496 Feb 19 10:02 hello_rs_stripped
strip removes symbol tables and debug info — sections the runtime doesn't need. The binary
still runs, but you can no longer debug it with meaningful function names.
$ strip hello -o hello_stripped
$ ls -la hello hello_stripped
-rwxr-xr-x 1 user user 15960 Feb 19 10:00 hello
-rwxr-xr-x 1 user user 14408 Feb 19 10:02 hello_stripped
The C binary barely shrinks — it was already small because it delegates everything to the shared
libc.so.
💡 Fun Fact: Production Rust binaries are typically built with cargo build --release, which enables optimizations. The result can be reduced further with strip, lto = true in Cargo.toml, and opt-level = "z" for size optimization. A release + stripped Rust hello world can get down to ~300 KB.
Finding your function: the symbol table
readelf -s hello | grep main
34: 0000000000001149 35 FUNC GLOBAL DEFAULT 16 main
There it is. main lives at virtual address 0x1149, is 35 bytes long, is a function (FUNC), has global binding (GLOBAL) and default visibility (DEFAULT), and sits in section index 16 (which is .text).
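The FUNC and GLOBAL columns come from a single byte in each symbol table entry, st_info: the high four bits hold the binding and the low four bits hold the type. A minimal sketch of the decoding, where struct sym_info and the three function names are hypothetical and only the most common values are named:

```c
#include <stdint.h>

/* Binding and type packed into the st_info byte of a symbol entry. */
struct sym_info {
    unsigned bind;  /* high 4 bits */
    unsigned type;  /* low 4 bits */
};

struct sym_info decode_st_info(uint8_t st_info)
{
    struct sym_info s;
    s.bind = st_info >> 4;
    s.type = st_info & 0x0f;
    return s;
}

const char *sym_bind_name(unsigned bind)
{
    switch (bind) {
    case 0:  return "LOCAL";
    case 1:  return "GLOBAL";
    case 2:  return "WEAK";
    default: return "OTHER";
    }
}

const char *sym_type_name(unsigned type)
{
    switch (type) {
    case 0:  return "NOTYPE";
    case 1:  return "OBJECT";
    case 2:  return "FUNC";
    case 3:  return "SECTION";
    case 4:  return "FILE";
    default: return "OTHER";
    }
}
```

An st_info of 0x12 decodes to binding 1 and type 2, which is the GLOBAL FUNC combination readelf printed for main.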
Now for Rust:
readelf -s hello_rs | grep 'main'
2156: 0000000000008280 47 FUNC GLOBAL DEFAULT 14 main
5731: 00000000000082b0 103 FUNC LOCAL DEFAULT 14 hello_rs::main
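Rust has two entries: the C-compatible main that the C runtime calls, and hello_rs::main, which is your actual Rust function. The first one is a thin wrapper that hands the second to the Rust runtime, which then calls it. (readelf prints Rust symbols in mangled form by default; the name is shown demangled here.)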
The entry point is NOT main
Remember the entry point from readelf -h? It was 0x1060, but main is at 0x1149. What's
at 0x1060?
readelf -s hello | grep ' _start'
1: 0000000000001060 0 FUNC GLOBAL DEFAULT 16 _start
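_start is the real entry point. It's provided by the C runtime startup object (crt1.o, or Scrt1.o for a PIE). It sets up the initial stack frame and calls __libc_start_main, which initializes the C library and then calls main. When main returns, its return value is passed to exit().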
Kernel jumps here
    │
    v
_start (from crt1.o)
    │
    ├── __libc_start_main()
    │     │
    │     ├── Initialize libc
    │     ├── Call constructors
    │     ├── Call main()         ◄── YOUR CODE
    │     └── Call exit(), which runs destructors
    │
    └── (never reached)
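You can watch the constructor and destructor steps yourself. The sketch below relies on the constructor and destructor function attributes, a GCC and Clang extension rather than standard C. The function names are invented for this example.

```c
#include <stdio.h>

static int ctor_ran = 0;

/* Runs during startup, before main is called. */
__attribute__((constructor))
static void before_main(void)
{
    ctor_ran = 1;
    puts("constructor: runs before main");
}

/* Runs during exit processing, after main has returned. */
__attribute__((destructor))
static void after_main(void)
{
    puts("destructor: runs after main returns");
}

/* Lets main (or any other code) check that the constructor already ran. */
int constructor_has_run(void)
{
    return ctor_ran;
}
```

Add a main that prints a line of its own and calls constructor_has_run(). The call returns 1, and the constructor's message appears before main's output while the destructor's message appears after it.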
Rust's entry point
Rust has its own startup sequence, but it still begins with _start and goes through the system's C runtime:
_start → __libc_start_main → main (Rust shim) → std::rt::lang_start
                                                     │
                                                     └── your fn main()
Same hardware. Same ELF. Same kernel. Different runtime path to reach your code.
🔧 Task: Read the ELF header by hand
- Compile a C program: gcc -o hello hello.c
- Hex dump the first 64 bytes: xxd hello | head -4
- Using the field table from this chapter, identify each field manually:
  - Bytes 0-3: Magic number. What are they?
  - Byte 4: Class. Is it 32-bit or 64-bit?
  - Byte 5: Endianness. Little or big?
  - Bytes 16-17: ELF type. What type is it?
  - Bytes 18-19: Machine. What architecture?
  - Bytes 24-31: Entry point. What address? (Remember: little-endian!)
- Verify your answers with readelf -h hello.
- Repeat with a Rust binary. Do the fields differ?
This is how forensic analysts and reverse engineers read binaries — byte by byte.