ELF: Dissecting Your Executable

Type this right now

// save as hello.c, compile: gcc -o hello hello.c
#include <stdio.h>
int main() {
    printf("Hello, ELF!\n");
    return 0;
}

xxd hello | head -4

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 6010 0000 0000 0000  ..>.....`.......
00000020: 4000 0000 0000 0000 9839 0000 0000 0000  @........9......
00000030: 0000 0000 4000 3800 0d00 4000 1f00 1e00  ....@.8...@.....

That's the first 64 bytes of your compiled program. Every single byte has meaning. By the end of this chapter, you'll be able to read them like a sentence.

The magic bytes

Look at the very first four bytes: 7f 45 4c 46.

0x7f — a non-printable byte, chosen specifically so ELF files can't be confused with text
0x45 = E
0x4c = L
0x46 = F

Every ELF file on your system — every binary, every .so, every .o — starts with \x7fELF.

# Try it yourself — check /bin/ls
xxd /bin/ls | head -1

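The kernel's ELF loader checks these four bytes first. If they don't match, it rejects the file, and unless another binary-format handler (such as the one for #! scripts) recognizes it, execve() fails with ENOEXEC.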

💡 Fun Fact: 0x7f is the ASCII DEL control character, the highest 7-bit ASCII code. Because it is non-printable, ordinary text files essentially never start with it, so it is very unlikely that a file accidentally "looks like" an ELF binary.

The ELF header: 64 bytes that describe everything

The ELF header is a fixed-size structure sitting at offset 0. In a 64-bit (ELF64) file it's exactly 64 bytes; a 32-bit ELF header is 52 bytes. Here's the 64-bit layout:

Offset  Size  Field               What it means
──────  ────  ──────────────────  ──────────────────────────────────────
0x00    4     e_ident[EI_MAG]     Magic: 7f 45 4c 46 (\x7fELF)
0x04    1     e_ident[EI_CLASS]   Class: 1=32-bit, 2=64-bit
0x05    1     e_ident[EI_DATA]    Endianness: 1=little, 2=big
0x06    1     e_ident[EI_VERSION] ELF version (always 1)
0x07    1     e_ident[EI_OSABI]   OS/ABI: 0=UNIX System V
0x08    1     e_ident[EI_ABIVERSION] ABI version (usually 0)
0x09    7     e_ident[EI_PAD]     Padding (zeros)
0x10    2     e_type              Type: 1=relocatable, 2=exec, 3=shared/PIE, 4=core
0x12    2     e_machine           Machine: 0x3e=x86-64, 0xb7=aarch64
0x14    4     e_version           ELF version (again, always 1)
0x18    8     e_entry             Entry point address
0x20    8     e_phoff             Program header table offset
0x28    8     e_shoff             Section header table offset
0x30    4     e_flags             Processor-specific flags
0x34    2     e_ehsize            ELF header size (64 for 64-bit)
0x36    2     e_phentsize         Program header entry size
0x38    2     e_phnum             Number of program headers
0x3a    2     e_shentsize         Section header entry size
0x3c    2     e_shnum             Number of section headers
0x3e    2     e_shstrndx          Section header string table index

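That's it: 64 bytes. They don't contain everything needed to run your program, but they tell the kernel where to find it: the program header table (e_phoff, e_phnum) describes how to map the file into memory, and e_entry says where execution starts.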

Reading it the easy way: readelf -h

readelf -h hello

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1060
  Start of program headers:          64 (bytes into file)
  Start of section headers:          14744 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30

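Every field maps directly to the hex bytes you saw in xxd. The entry point 0x1060 is where your program's own code starts running: not main, but _start, the C runtime startup code that calls main. Because this is a position-independent executable, 0x1060 is an offset from wherever the binary gets loaded, and in a dynamically linked program the dynamic linker (ld.so) runs first and jumps here once it has loaded libc.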

🧠 What do you think happens? Why is the Type DYN (shared object) instead of EXEC? Most modern Linux distributions configure GCC to build Position-Independent Executables by default (pass -no-pie to get an EXEC instead). A PIE can be loaded at any base address, which lets ASLR randomize where the program's own code lives. We'll cover this in Chapter 14.

The big picture: ELF file layout

┌──────────────────────────────┐  offset 0
│         ELF Header           │  64 bytes — the "table of contents"
│  (magic, type, entry point)  │
├──────────────────────────────┤  offset 64 (0x40)
│    Program Header Table      │  tells the LOADER how to map segments
│  (array of segment entries)  │
├──────────────────────────────┤
│                              │
│        .text section         │  your compiled machine code
│                              │
├──────────────────────────────┤
│       .rodata section        │  read-only data ("Hello, ELF!\n")
├──────────────────────────────┤
│        .data section         │  initialized global variables
├──────────────────────────────┤
│         .bss section         │  uninitialized globals (zero bytes on disk)
├──────────────────────────────┤
│      .symtab section         │  symbol table
├──────────────────────────────┤
│      .strtab section         │  string table (symbol names)
├──────────────────────────────┤
│   .debug_* sections (if -g)  │  DWARF debug info
├──────────────────────────────┤
│     ... other sections ...   │
├──────────────────────────────┤
│    Section Header Table      │  tells the LINKER about each section
│  (array of section entries)  │
└──────────────────────────────┘

The ELF header points to the program header table (for the loader) and the section header table (for the linker and debugger). Everything else sits between them.

Program headers: what the loader sees

readelf -l hello

This shows the segments — the chunks the kernel maps into memory. Each segment has a type, permissions, a file offset, and a virtual address. We'll dissect these in the next chapter.

Section headers: what the linker sees

readelf -S hello

This shows the sections — the fine-grained pieces the compiler and linker work with. You'll see .text, .data, .rodata, .bss, .symtab, and many more.

C vs Rust: why the Rust binary is bigger

Let's compile the same program in Rust:

// save as hello_rs.rs, compile: rustc hello_rs.rs (produces ./hello_rs, so the C ./hello isn't overwritten)
fn main() {
    println!("Hello, ELF!");
}

$ ls -la hello         # C binary
-rwxr-xr-x 1 user user   15960 Feb 19 10:00 hello

$ ls -la hello_rs      # Rust binary (renamed for comparison)
-rwxr-xr-x 1 user user 4374432 Feb 19 10:01 hello_rs

The Rust binary is ~270x larger. Why?

readelf -h hello    | grep 'Number of section headers'    # C:    31
readelf -h hello_rs | grep 'Number of section headers'    # Rust: ~43

Three reasons:

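Static linking of the standard library. Rust statically links its own standard library (libstd) by default, while the C binary dynamically links libc. (The Rust binary still uses the system libc dynamically; it's Rust's library that gets copied in.) All the code behind println!, formatting, panic handling, and the Rust runtime gets baked into the executable.
More debug info. Even without -g, the precompiled standard library that ships with the Rust toolchain carries debug info (.debug_info, .debug_abbrev, .debug_line, .debug_str, etc.), and it is linked into your binary along with the code.
Monomorphized generics. Rust generates specialized code for each concrete type a generic function is used with, so generic code from the standard library adds up. println! alone pulls in a substantial amount of formatting machinery.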

Strip it down

$ strip hello_rs -o hello_rs_stripped
$ ls -la hello_rs hello_rs_stripped
-rwxr-xr-x 1 user user 4374432 Feb 19 10:01 hello_rs
-rwxr-xr-x 1 user user  311496 Feb 19 10:02 hello_rs_stripped

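strip removes the static symbol table (.symtab and its .strtab) and debug info (.debug_*), sections the program doesn't need to run. The dynamic symbol table (.dynsym) stays, because the dynamic linker needs it. The binary still runs, but debuggers and profilers can no longer show meaningful names for its internal functions.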

$ strip hello -o hello_stripped
$ ls -la hello hello_stripped
-rwxr-xr-x 1 user user 15960 Feb 19 10:00 hello
-rwxr-xr-x 1 user user 14408 Feb 19 10:02 hello_stripped

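The C binary barely shrinks. It was built without -g, so it had no debug info to remove, and most of its functionality lives in the shared libc.so rather than in the executable; strip only removed the symbol table.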

💡 Fun Fact: Production Rust binaries are typically built with cargo build --release, which enables optimizations. They can be shrunk further through the [profile.release] section of Cargo.toml, for example with strip = true, lto = true, and opt-level = "z" (optimize for size). A release + stripped Rust hello world can get down to ~300 KB.

Finding your function: the symbol table

readelf -s hello | grep -w main

    34: 0000000000001149    35 FUNC    GLOBAL DEFAULT   16 main

There it is. main lives at virtual address 0x1149, is 35 bytes long, is a function (FUNC), has global visibility, and sits in section index 16 (which is .text).

Now for Rust:

readelf -s --demangle hello_rs | grep -w main

  2156: 0000000000008280    47 FUNC    GLOBAL DEFAULT   14 main
  5731: 00000000000082b0   103 FUNC    LOCAL  DEFAULT   14 hello_rs::main

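Rust has two entries: the C-compatible main that the C runtime calls, and hello_rs::main, which is your actual Rust function. The first is a small compiler-generated shim: it calls std::rt::lang_start, passing along a pointer to the second, and the standard library sets up its runtime before calling your code.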

The entry point is NOT main

Remember the entry point from readelf -h? It was 0x1060, but main is at 0x1149. What's at 0x1060?

readelf -s hello | grep ' _start'

     1: 0000000000001060     0 FUNC    GLOBAL DEFAULT   16 _start

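_start is the real entry point of your program's code. It's provided by the C runtime startup file (crt1.o, or Scrt1.o when building a PIE). It picks up argc, argv, and the environment from the stack the kernel prepared, aligns the stack, and calls __libc_start_main, which initializes the C library and then calls main. When main returns, its return value is passed to exit(), so control never comes back to _start.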

ld.so (started by the kernel) jumps here
       │
       v
   _start          (from crt1.o)
       │
       ├── __libc_start_main()
       │       │
       │       ├── Initialize libc
       │       ├── Call constructors
       │       ├── Call main()  ◄── YOUR CODE
       │       └── Call exit(main's return value), which runs the destructors
       │
       └── (never reached)

Rust's entry point

Rust has its own startup sequence, but it still goes through the system's C runtime first: execution begins at the same _start, and __libc_start_main calls Rust's main shim:

_start → __libc_start_main → main (Rust shim) → std::rt::lang_start
                                                       │
                                                       └── your fn main()

Same hardware. Same ELF. Same kernel. Different runtime path to reach your code.

🔧 Task: Read the ELF header by hand

Compile a C program: gcc -o hello hello.c

Hex dump the first 64 bytes: xxd hello | head -4

Using the field table from this chapter, identify each field manually:

Bytes 0-3: Magic number. What are they?

Byte 4: Class. Is it 32-bit or 64-bit?

Byte 5: Endianness. Little or big?

Bytes 16-17: ELF type. What type is it?

Bytes 18-19: Machine. What architecture?

Bytes 24-31: Entry point. What address? (Remember: little-endian!)

Verify your answers with readelf -h hello.

Repeat with a Rust binary. Do the fields differ?

This is how forensic analysts and reverse engineers read binaries — byte by byte.

How Programs Really Run: From CPU to Process Memory