Linux Book

From First Boot to Production

Welcome to Linux Book — a hands-on guide that takes you from your first ls command to confidently managing production Linux systems. This is not a reference manual you skim and shelve. Every chapter puts commands in your hands, explains what happens underneath, and builds real-world muscle memory.

Who This Book Is For

  • Aspiring sysadmins who want to go beyond clicking buttons in a GUI
  • DevOps engineers who need deep Linux fluency, not just copy-pasted scripts
  • Developers who deploy to Linux but never really learned it properly
  • Students and career-changers building a foundation for cloud, security, or infrastructure roles
  • Anyone who's tired of blindly following tutorials without understanding why

You don't need programming experience. You don't need a computer science degree. You need curiosity and a terminal.

What Makes This Book Different

Every tool covered is open source. No proprietary lock-in, no vendor-specific tricks. What you learn here works on any Linux distribution, any cloud provider, any hardware.

Every chapter follows the same rhythm:

  1. "Why This Matters" — A real-world scenario that shows you exactly when you'd need this skill
  2. "Try This Right Now" — Copy-paste commands you can run immediately
  3. Concept deep-dives — Clear explanations with ASCII diagrams you can actually read
  4. "Think About It" — Mid-chapter questions that make you stop and reason
  5. Hands-on blocks — Step-by-step walkthroughs with real command output
  6. "Debug This" — Broken scenarios for you to diagnose (because that's the real job)
  7. "What Just Happened?" — Recap boxes that crystallize each section
  8. "Try This" — End-of-chapter exercises with bonus challenges

Where commands differ between distributions (Debian/Ubuntu vs RHEL/Fedora vs Arch), you'll see Distro Notes calling out the differences. Where commands can destroy data, you'll see safety warnings before you run anything dangerous.

What You'll Master

┌──────────────────────────────────────────────────────────┐
│                        Linux Book                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Part I     The Ground Floor                             │
│             What Linux is, choosing a distro,            │
│             installing it, meeting the shell             │
│                                                          │
│  Part II    Filesystem & "Everything Is a File"          │
│             Hierarchy, permissions, disks, inodes        │
│                                                          │
│  Part III   Users, Processes, Signals & IPC              │
│             Access control, job control, the kernel,     │
│             how Linux boots                              │
│                                                          │
│  Part IV    systemd & Service Management                 │
│             Units, services, journald logging            │
│                                                          │
│  Part V     Bash, Regex & Text Processing                │
│             Shell mastery, scripting, sed, awk,          │
│             cron, automation                             │
│                                                          │
│  Part VI    Essential Tools                              │
│             Vim, tmux, Git for operations                │
│                                                          │
│  Part VII   Networking Fundamentals                      │
│             OSI/TCP-IP, subnetting, DNS, DHCP            │
│                                                          │
│  Part VIII  Linux Networking in Practice                 │
│             Interfaces, firewalls, routing, SSH,         │
│             WireGuard VPN                                │
│                                                          │
│  Part IX    Security, PKI & Cryptography                 │
│             Hardening, TLS/SSL, OpenSSL, ACME,           │
│             SELinux, AppArmor                            │
│                                                          │
│  Part X     Web Servers & Load Balancing                 │
│             HTTP, Nginx, Apache, HAProxy                 │
│                                                          │
│  Part XI    Storage, Backup & Recovery                   │
│             LVM, RAID, NFS, backup strategies,           │
│             disaster recovery                            │
│                                                          │
│  Part XII   Performance & Monitoring                     │
│             top/htop, memory, disk I/O, network,         │
│             resource limits                              │
│                                                          │
│  Part XIII  Package Management & Software                │
│             apt/dnf/pacman, compiling from source,       │
│             shared libraries, custom kernels             │
│                                                          │
│  Part XIV   Containers & Virtualization                  │
│             Cgroups, namespaces, Docker, Podman,         │
│             LXC/LXD, orchestration                       │
│                                                          │
│  Part XV    Configuration Management & DevOps            │
│             IaC concepts, Ansible, CI/CD,                │
│             Prometheus + Grafana                         │
│                                                          │
│  Part XVI   Linux in the Real World                      │
│             Enterprise, embedded, cloud, databases,      │
│             NTP, troubleshooting methodology             │
│                                                          │
│  Appendices Command reference, config files,             │
│             lab setup, glossary, further reading         │
│                                                          │
└──────────────────────────────────────────────────────────┘

How to Use This Book

If you're brand new to Linux: Start at Chapter 1 and go straight through. Each chapter builds on the previous ones.

If you have some Linux experience: Jump to whatever interests you. Each Part is reasonably self-contained, and cross-references point you to prerequisites when needed.

If you're studying for a certification (RHCSA, LFCS, etc.): This book covers the practical skills those exams test. Use the appendices as quick references.

Set up a lab. Appendix C walks you through creating a safe practice environment. You can use a virtual machine, a spare laptop, WSL2 on Windows, or a cheap cloud instance. The important thing is: type the commands. Reading about Linux is like reading about swimming — you have to get in the water.

Conventions

Throughout this book:

  • monospace text indicates commands, file paths, or configuration values
  • Bold text marks important terms on first use
  • Commands prefixed with $ run as a regular user; commands prefixed with # require root
  • Output blocks show real terminal output — what you see should match

Distro Note: Boxes like this highlight where commands or paths differ between Debian/Ubuntu, RHEL/Fedora, and Arch Linux.

Warning: Boxes like this appear before commands that can destroy data or break your system. Read them before you type.

Think About It: Boxes like this pose questions to help you reason through concepts before we explain them.

Let's begin.

What Is Linux and Why It's Everywhere

Why This Matters

Picture this: you open your phone to check the weather, stream music on your commute, tap your card at a payment terminal, and then sit down at work to deploy code to a cloud server. Every single one of those actions just touched Linux. The phone runs Android (built on the Linux kernel). The music service runs on Linux servers. The payment terminal likely runs embedded Linux. The cloud server? Almost certainly Linux.

Linux is not some niche operating system for hobbyists. It powers 96% of the world's top one million web servers, all of the world's top 500 supercomputers, the International Space Station, Tesla's infotainment systems, and the majority of the world's cloud infrastructure. When you learn Linux, you are learning the operating system that runs the modern world.

This chapter answers the most fundamental question: what exactly is Linux, where did it come from, and why should you invest your time mastering it?


Try This Right Now

If you already have access to any Linux system (a server, a VM, WSL2, or even an Android phone with Termux), open a terminal and type:

$ uname -a

You should see something like:

Linux myhost 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

That single line tells you:

  • You are running Linux (the kernel)
  • Your hostname is myhost
  • The kernel version is 6.1.0-18-amd64
  • It was built with SMP (symmetric multiprocessing) and PREEMPT_DYNAMIC (a preemption model that can be selected at boot time)
  • The distribution is Debian
  • The architecture is x86_64 (64-bit Intel/AMD)
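
If the full line is a lot to parse at once, uname can also print each field on its own. The values in the comments below come from the Debian example above; yours will differ:

```shell
# Each field of the long `uname -a` line is available individually.
uname -s   # kernel name, e.g. Linux
uname -n   # hostname, e.g. myhost
uname -r   # kernel release, e.g. 6.1.0-18-amd64
uname -m   # machine architecture, e.g. x86_64
```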

If you do not have a Linux system yet, that is perfectly fine. Chapter 3 walks you through setting one up. For now, read on and come back to run the commands later.


What Linux Actually Is

When people say "Linux," they usually mean one of two things, and the difference matters.

The Linux Kernel

In the strictest sense, Linux is a kernel -- the core piece of software that sits between your hardware and everything else. The kernel's job is enormous and critical:

┌─────────────────────────────────────────────────┐
│              User Applications                   │
│         (Firefox, vim, python, nginx)            │
├─────────────────────────────────────────────────┤
│              System Libraries                    │
│           (glibc, libssl, libpthread)            │
├─────────────────────────────────────────────────┤
│           System Calls Interface                 │
│      (open, read, write, fork, exec, ...)        │
├─────────────────────────────────────────────────┤
│             THE LINUX KERNEL                     │
│  ┌───────────┬──────────┬──────────────────┐     │
│  │ Process   │ Memory   │ Filesystem       │     │
│  │ Scheduler │ Manager  │ Layer (VFS)      │     │
│  ├───────────┼──────────┼──────────────────┤     │
│  │ Network   │ Device   │ Security         │     │
│  │ Stack     │ Drivers  │ (SELinux, etc.)  │     │
│  └───────────┴──────────┴──────────────────┘     │
├─────────────────────────────────────────────────┤
│                  HARDWARE                        │
│       (CPU, RAM, Disk, Network Card, GPU)        │
└─────────────────────────────────────────────────┘

The kernel handles:

  • Process management -- creating, scheduling, and terminating programs
  • Memory management -- allocating RAM to programs, swapping to disk
  • Filesystem access -- reading and writing files on disks
  • Device drivers -- talking to hardware (network cards, USB devices, GPUs)
  • Networking -- implementing TCP/IP, routing packets
  • Security -- enforcing permissions, capabilities, and mandatory access controls

Without the kernel, your applications have no way to talk to the hardware. With a broken kernel, nothing works. The kernel is the foundation.
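
You can watch several of these subsystems at work right now: the kernel publishes its internal state as plain text files under /proc, a mechanism we will return to throughout the book:

```shell
# The kernel exposes its bookkeeping as text files under /proc:
head -3 /proc/meminfo          # the memory manager's accounting
cat /proc/sys/kernel/ostype    # the kernel identifies itself
cat /proc/uptime               # seconds since boot, seconds idle
```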

The Linux Operating System (Distribution)

A kernel alone is not very useful. You need a shell to type commands in, utilities to copy files and manage processes, a package manager to install software, an init system to boot the machine, and thousands of other tools. When all of these are bundled together with the Linux kernel, you get a Linux distribution (or "distro").

┌──────────────────────────────────────────────┐
│          A Linux Distribution                 │
│                                               │
│   Linux Kernel                                │
│   + GNU Core Utilities (ls, cp, grep, etc.)   │
│   + Shell (bash, zsh)                         │
│   + Init System (systemd, OpenRC)             │
│   + Package Manager (apt, dnf, pacman)        │
│   + Libraries (glibc, openssl)                │
│   + Desktop Environment (optional)            │
│   + Default Applications                      │
│   + Configuration & Branding                  │
└──────────────────────────────────────────────┘

When someone says "I run Linux," they almost always mean "I run a Linux distribution." Ubuntu, Fedora, Debian, Arch Linux, Red Hat Enterprise Linux -- these are all distributions. They all share the same kernel (or a version of it) but differ in what tools they bundle, how they manage packages, and what their default configurations look like.

Think About It: If all Linux distributions share the same kernel, why are there hundreds of different distributions? What would be different about each one?


A Brief History: How We Got Here

Understanding where Linux came from helps you understand why it works the way it does.

Unix: The Ancestor (1969)

The story starts at AT&T's Bell Labs in 1969. Ken Thompson and Dennis Ritchie created Unix, an operating system built around a few powerful ideas:

  • Everything is a file -- devices, processes, even network connections can be accessed like files
  • Small, sharp tools -- each program does one thing well
  • Text as a universal interface -- programs communicate through plain text streams
  • Pipelines -- you can chain small tools together to do complex things

These ideas were radical in 1969 and they remain the philosophical backbone of Linux today. When you pipe grep into sort into uniq later in this book, you are using a design philosophy that is over fifty years old -- because it works.
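
Here is that philosophy in a single line you can run on any Linux system. Each tool does one small job, and the pipe glues them together:

```shell
# Which login shells are in use, and how many accounts use each?
# cut extracts field 7 of /etc/passwd, sort groups the values,
# and uniq -c counts each group -- three small tools, one answer.
cut -d: -f7 /etc/passwd | sort | uniq -c
```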

GNU: The Free Software Movement (1983)

By the early 1980s, Unix had become proprietary and expensive. Richard Stallman, a programmer at MIT, decided that software should be free -- not "free" as in price, but "free" as in freedom. In 1983, he launched the GNU Project (a recursive acronym: "GNU's Not Unix") to create a completely free Unix-like operating system.

By 1991, the GNU Project had built almost everything needed for a complete operating system: a compiler (GCC), a shell (Bash), core utilities (ls, cp, mv, grep), text editors (Emacs), and libraries (glibc). They had everything except the most critical piece: a working kernel.

Linux: The Missing Kernel (1991)

On August 25, 1991, a 21-year-old Finnish computer science student named Linus Torvalds posted this message to the comp.os.minix newsgroup:

"Hello everybody out there using minix - I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones."

That "hobby" became the Linux kernel. When combined with the GNU tools, the result was a complete, free, Unix-like operating system. This is why some people call it GNU/Linux -- the kernel is Linux, but much of the userspace came from the GNU Project. In practice, most people just say "Linux."

The Timeline

1969  Unix created at Bell Labs
1983  Richard Stallman launches the GNU Project
1985  Free Software Foundation (FSF) established
1989  GNU General Public License (GPL) v1 released
1991  Linus Torvalds releases Linux kernel 0.01
1992  Linux re-licensed under GPL v2
1993  Debian and Slackware (first major distros) appear
1994  Linux kernel 1.0 released
1996  Linux kernel 2.0 -- SMP support
2003  RHEL (Red Hat Enterprise Linux) launched
2004  Ubuntu 4.10 ("Warty Warthog") released
2007  Android announced (built on Linux kernel)
2008  Linux kernel used in first Android phones
2011  Linux kernel 3.0 released
2015  Linux kernel 4.0 -- live patching
2017  All TOP500 supercomputers run Linux
2019  Microsoft ships WSL2 with a real Linux kernel
2020  Linux kernel 5.x series
2022  Linux kernel 6.0 released
2024  Linux kernel 6.x continues active development

Think About It: Linus Torvalds said his project "won't be big and professional." It now runs on everything from wristwatches to supercomputers. What do you think allowed a hobby project to grow into the dominant operating system?


The GPL and Open Source Philosophy

Linux is released under the GNU General Public License version 2 (GPL v2). This is not just a legal technicality -- it is the reason Linux exists in its current form.

The GPL v2 says:

  1. Freedom to use -- You can run the software for any purpose
  2. Freedom to study -- You can read and modify the source code
  3. Freedom to share -- You can distribute copies
  4. Freedom to improve -- You can distribute your modifications

The critical requirement: if you distribute a modified version of GPL-licensed software, you must also distribute the source code under the same license. This is called copyleft -- it ensures that Linux and its derivatives remain free forever.

This is why:

  • You can download the entire Linux kernel source code right now, for free
  • Android, which is built on Linux, must make its kernel modifications public
  • Companies like Red Hat, Canonical, and SUSE can sell support and services around Linux, but cannot lock up the code itself
  • You can build your own Linux distribution tomorrow if you want to

Open Source vs Free Software

You will hear both terms. They overlap heavily but differ in emphasis:

  • Free Software (Stallman's philosophy): emphasizes user freedom as a moral imperative
  • Open Source (coined 1998): emphasizes the practical benefits of collaborative development

Linux lives comfortably in both camps. For this book, what matters is this: every tool we cover is open source. You can inspect its code, modify it, and learn from it. No black boxes.


Where Linux Runs

The scope of Linux adoption is staggering. Let us look at the numbers.

Servers and Cloud

  • 96.3% of the top one million web servers run Linux
  • 90%+ of public cloud workloads run on Linux
  • All major cloud providers (AWS, Google Cloud, Azure) offer Linux as the primary server OS
  • AWS's own infrastructure runs on a custom Linux distribution (Amazon Linux)
  • Google runs a custom Linux internally across billions of containers

When you visit nearly any website -- Google, Netflix, Amazon, Wikipedia, Reddit -- you are talking to Linux servers.

Supercomputers

  • 100% of the world's top 500 supercomputers run Linux
  • This has been the case since November 2017
  • Before Linux dominated, the list included Unix, Windows, and custom operating systems
  • Linux won because it is free, customizable, and scales from one core to millions

Mobile Devices

  • Android runs on the Linux kernel
  • Android holds roughly 72% of the global mobile OS market
  • That means the majority of smartphones in the world run Linux

Embedded Systems and IoT

  • Smart TVs (Samsung's Tizen, LG's webOS -- both Linux-based)
  • Routers and network equipment
  • Industrial control systems
  • Automotive infotainment (Tesla, Toyota, BMW)
  • Digital signage, ATMs, point-of-sale terminals

Space

  • The International Space Station migrated from Windows to Linux (Debian) in 2013
  • NASA's Mars Ingenuity helicopter runs Linux
  • SpaceX uses Linux on its flight computers

Desktop

Linux has roughly 4% of the desktop market. This is small compared to Windows and macOS, but it represents tens of millions of users. The desktop is also where you will likely do most of your learning.

┌────────────────────────────────────────────┐
│         Where Linux Runs                    │
│                                             │
│  Supercomputers     ████████████████ 100%   │
│  Cloud Servers      ███████████████░  90%+  │
│  Web Servers        ███████████████░  96%   │
│  Mobile (Android)   ███████████░░░░░  72%   │
│  Embedded/IoT       ████████░░░░░░░░  ~50%  │
│  Desktop            █░░░░░░░░░░░░░░░  ~4%   │
│                                             │
└────────────────────────────────────────────┘

Think About It: Linux dominates servers, supercomputers, and phones but has only about 4% of the desktop market. What factors might explain this difference?


Why Learn Linux?

Career Demand

Nearly every job posting for system administration, DevOps engineering, site reliability engineering, cloud engineering, cybersecurity, or backend development lists Linux as a required or preferred skill. This is not a trend -- it has been the case for decades and the demand is growing.

Certifications like the RHCSA (Red Hat Certified System Administrator) and LFCS (Linux Foundation Certified System Administrator) test Linux skills directly, and cloud certifications such as AWS Solutions Architect assume them.

Understanding Over Clicking

On Windows or macOS, you often configure things through graphical menus. You click, toggle, and hope. On Linux, configuration is done through text files and commands. This seems harder at first, but it has profound advantages:

  • Reproducibility -- you can script everything
  • Version control -- you can track every change with Git
  • Remote management -- you can manage a server 10,000 miles away over SSH
  • Automation -- you can configure a thousand servers identically
  • Debugging -- when something breaks, you can read the configuration and logs to understand exactly what happened
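
The first two points can be made concrete with a two-line sketch (the file name and setting here are hypothetical). Because configuration is plain text, a script can check the current state before changing it, which makes running it twice as safe as running it once:

```shell
# Hypothetical example: ensure a setting exists in a config file.
# grep -qx succeeds only if the exact line is already present,
# so repeated runs never add duplicates -- the script is idempotent.
CONF=/tmp/demo.conf
grep -qx 'option=1' "$CONF" 2>/dev/null || echo 'option=1' >> "$CONF"
```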

Control and Transparency

Linux hides nothing from you. If you want to know what your operating system is doing right now:

$ ps aux      # show every running process
$ top         # watch CPU and memory usage live
$ ss -tlnp    # see which programs are listening on network ports
$ dmesg       # read kernel messages
$ journalctl  # read system logs

On proprietary systems, much of this is hidden behind opaque interfaces. On Linux, it is all text, all accessible, all yours.

Freedom

With Linux, you are not locked into any vendor's ecosystem. You can:

  • Choose your desktop environment (or use none)
  • Choose your init system
  • Choose your package manager
  • Build the system from scratch if you want (see: Linux From Scratch, Gentoo)
  • Run it on hardware that proprietary vendors have abandoned
  • Audit every line of code that runs on your machine

This is not just philosophical. If a vendor discontinues a product, Linux keeps running. If a government mandates backdoors in proprietary software, you can verify that your Linux system has none.


Kernel vs Userspace: A Critical Distinction

One concept that will recur throughout this book is the boundary between kernel space and user space.

┌──────────────────────────────────────────────┐
│              USER SPACE                       │
│                                               │
│  Your programs, shells, scripts, daemons      │
│  They CANNOT directly access hardware         │
│  They MUST ask the kernel via system calls    │
│                                               │
├──────────────────────────────────────────────┤
│          SYSTEM CALL BOUNDARY                 │
│     (the controlled gateway)                  │
├──────────────────────────────────────────────┤
│              KERNEL SPACE                     │
│                                               │
│  The kernel, device drivers, kernel modules   │
│  Full access to hardware and memory           │
│  Runs with highest privilege                  │
│                                               │
├──────────────────────────────────────────────┤
│              HARDWARE                         │
└──────────────────────────────────────────────┘

  • Kernel space: The kernel runs with full access to the hardware. A bug here can crash the entire system.
  • User space: Everything else -- your shell, your web browser, your web server. Programs here cannot directly touch hardware. They must make system calls to ask the kernel to do things on their behalf.

This separation is what makes Linux stable. If Firefox crashes, the kernel keeps running. If a Python script goes rogue and tries to access memory it should not, the kernel terminates it. The kernel is the bouncer.

You will encounter this boundary again and again: when we discuss processes (Chapter 10), the kernel (Chapter 13), and containers (Chapters 62-66).
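
You can watch a program cross that boundary with strace, which prints each system call a command makes. It is not part of every base install, so you may need to add it with your package manager first:

```shell
# Trace only the open/read/write system calls that cat makes.
# Each line of strace output is one request crossing into kernel space.
strace -e trace=openat,read,write cat /etc/hostname
```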


Hands-On: Exploring Your (or Any) Linux System

If you have a Linux system available, try these commands. Each one reveals something about the system.

What kernel are you running?

$ uname -r
6.1.0-18-amd64

The kernel version follows a pattern: major.minor.patch. The 6.1.0 tells you this is major version 6, minor version 1, patch 0. The trailing -18-amd64 is added by the distribution (here, Debian's package revision and the architecture).
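
Because the release string is just text, the small tools from the Unix philosophy can pick it apart:

```shell
# Split the kernel release into its parts with cut:
uname -r | cut -d. -f1   # major version
uname -r | cut -d. -f2   # minor version
uname -r | cut -d- -f1   # major.minor.patch, without any distro suffix
```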

What distribution is this?

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
ID=debian
...

This file exists on virtually every modern Linux distribution and gives you standardized information about which distro and version you are running.

How long has the system been running?

$ uptime
 14:32:07 up 47 days,  3:21,  2 users,  load average: 0.15, 0.10, 0.08

This tells you the system has been running for 47 days without a reboot. Linux servers commonly run for months or years between reboots.

What CPU does the system have?

$ lscpu | head -15
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           46 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  4
On-line CPU(s) list:     0-3
Vendor ID:               GenuineIntel
Model name:              Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
...

How much memory is available?

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       2.1Gi       3.4Gi       256Mi       2.2Gi       5.1Gi
Swap:          2.0Gi          0B       2.0Gi

The -h flag means "human-readable" -- you will see this flag on many Linux commands.

How much disk space?

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   12G   35G  26% /

Every one of these commands is a small tool that does one thing well. That is the Unix philosophy at work.


The Linux Development Model

The Linux kernel is the largest collaborative software project in human history. Some numbers to appreciate the scale:

  • Over 30 million lines of code in the kernel source
  • Thousands of contributors per release cycle
  • A new stable release approximately every 9-10 weeks
  • Companies contributing include Intel, Google, Red Hat, Samsung, Microsoft, Meta, Amazon, and hundreds more
  • Linus Torvalds remains the project's Benevolent Dictator For Life (BDFL) with final say on what gets merged

The development process is remarkably open:

  1. Developers submit patches to public mailing lists
  2. Patches are reviewed by maintainers and other developers
  3. Accepted patches flow through subsystem maintainer trees
  4. Linus merges from maintainer trees during the merge window
  5. A series of release candidates (rc1, rc2, ...) are tested
  6. A stable release is tagged

Every email, every patch, every discussion is public. You can read the kernel mailing list archives and see exactly why every change was made.


What Just Happened?

┌──────────────────────────────────────────────────────┐
│                                                      │
│  In this chapter, you learned:                       │
│                                                      │
│  - Linux is a KERNEL, not an operating system.       │
│    A distribution = kernel + userspace tools.        │
│                                                      │
│  - Linux descends from Unix (1969), combines the     │
│    GNU tools (1983) with Linus Torvalds' kernel      │
│    (1991), and is licensed under the GPL v2.         │
│                                                      │
│  - Linux runs everywhere: 100% of supercomputers,    │
│    96% of web servers, 72% of mobile phones          │
│    (Android), the ISS, Mars helicopters, and cars.   │
│                                                      │
│  - The GPL ensures Linux remains free and open.      │
│    You can read, modify, and redistribute the code.  │
│                                                      │
│  - The kernel/userspace boundary is a key concept:   │
│    programs ask the kernel to access hardware via    │
│    system calls.                                     │
│                                                      │
│  - Learning Linux gives you career skills, system    │
│    understanding, and freedom from vendor lock-in.   │
│                                                      │
│  - Commands you met: uname, cat, uptime, lscpu,      │
│    free, df                                          │
│                                                      │
└──────────────────────────────────────────────────────┘

Try This

Exercises

  1. Research exercise: Go to kernel.org and find the current stable kernel version. Compare it to the version shown by uname -r on any Linux system you have access to. Is the system up to date?

  2. Exploration exercise: If you have a Linux system, run cat /etc/os-release and identify: the distribution name, the version number, and the ID. Write these down -- you will need them in Chapter 2.

  3. Reading exercise: Read the original Linus Torvalds newsgroup post from 1991 (search for "linux history linus newsgroup 1991"). What hardware was he targeting? What features was he not planning to support?

  4. Thinking exercise: We said "everything is a file" is a Unix/Linux philosophy. What might it mean for a device (like a hard disk or a keyboard) to be represented as a file? What advantages might this give?

  5. Career exercise: Search a job board for "Linux system administrator" or "DevOps engineer" positions. List five specific Linux skills that appear in multiple job descriptions.

Bonus Challenge

Find three devices in your home that likely run Linux (hint: your router is almost certainly one). For each device, research what Linux-based operating system it uses and what kernel version it likely runs.


What's Next

Now that you understand what Linux is and why it matters, the next question is: which Linux should you use? There are hundreds of distributions, and Chapter 2 will help you navigate this landscape and choose the right one for your goals.

Choosing Your Linux

Why This Matters

You have just accepted a new job as a junior DevOps engineer. On day one, your team lead says: "Spin up two servers -- one Ubuntu 22.04 for the web tier, one Rocky Linux 9 for the database tier." You nod confidently, but inside you are thinking: Why two different distributions? What is the difference? Does it matter?

It does matter. Distributions differ in their package managers, release schedules, default configurations, security models, community support, and commercial backing. Choosing the wrong distribution for a task can mean fighting your tools instead of using them. Choosing the right one means your environment works with you.

This chapter maps the Linux distribution landscape so you can make informed choices -- whether you are setting up a learning environment, a personal desktop, or a production server fleet.


Try This Right Now

If you have any Linux system available, let us immediately figure out what distribution and family you are running:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
ID=ubuntu
ID_LIKE=debian
VERSION_ID="22.04"
PRETTY_NAME="Ubuntu 22.04.3 LTS"

Two key fields:

  • ID tells you the exact distribution (ubuntu)
  • ID_LIKE tells you the family it belongs to (debian)

Now check which package manager is available:

$ which apt dnf pacman zypper 2>/dev/null
/usr/bin/apt

Whatever comes back tells you which distribution family you are in. If you see apt, you are in the Debian family. If dnf, the Red Hat family. If pacman, the Arch family. If zypper, the SUSE family.
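
Those two checks can be combined into a small detection sketch. This script is illustrative, not a standard tool: it sources /etc/os-release (which is written as shell-readable key=value pairs) and maps the family to its package manager:

```shell
#!/bin/sh
# Illustrative sketch: map the distribution family to its package manager.
# ID_LIKE is empty on a family's parent distro (e.g. Debian itself),
# so fall back to ID when it is unset.
. /etc/os-release
family="${ID_LIKE:-$ID}"
case "$family" in
  *debian*)        echo "Debian family: use apt" ;;
  *rhel*|*fedora*) echo "Red Hat family: use dnf" ;;
  *arch*)          echo "Arch family: use pacman" ;;
  *suse*)          echo "SUSE family: use zypper" ;;
  *)               echo "Unrecognized family: $family" ;;
esac
```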


What Is a Distribution?

In Chapter 1, we established that Linux is a kernel. A distribution (distro) takes that kernel and adds everything needed to make a usable operating system:

┌─────────────────────────────────────────────────┐
│            Linux Distribution                    │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Kernel (shared across all distros)        │  │
│  │  (possibly with distro-specific patches)   │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Core Userspace:                           │  │
│  │  - Init system (systemd, OpenRC, runit)    │  │
│  │  - Shell (bash, zsh)                       │  │
│  │  - Core utilities (coreutils)              │  │
│  │  - System libraries (glibc, musl)          │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Package Management:                       │  │
│  │  - Package manager (apt, dnf, pacman)      │  │
│  │  - Software repositories                   │  │
│  │  - Dependency resolution                   │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Extras (varies by distro):                │  │
│  │  - Desktop environment (GNOME, KDE, none)  │  │
│  │  - Default applications                    │  │
│  │  - Configuration tools                     │  │
│  │  - Branding and theming                    │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
└─────────────────────────────────────────────────┘

Every distribution makes choices: which kernel version to ship, which init system to use, how to manage packages, what default software to include, and how frequently to release updates. These choices create the identity and trade-offs of each distribution.


The Major Distribution Families

There are hundreds of Linux distributions, but nearly all descend from a handful of families. Understanding the families is more important than memorizing individual distros.

The Debian Family

Debian (1993)
├── Ubuntu (2004)
│   ├── Linux Mint
│   ├── Pop!_OS
│   ├── Elementary OS
│   └── Kubuntu, Xubuntu, Lubuntu (flavors)
├── Kali Linux (security/pentesting)
├── Raspberry Pi OS
└── Devuan (Debian without systemd)

Debian is the grandfather of this family. Founded in 1993 by Ian Murdock, it is one of the oldest distributions still in active development. Debian prioritizes stability and freedom above all else.

Key characteristics:

  • Package format: .deb
  • Package manager: apt (Advanced Package Tool), dpkg for low-level operations
  • Release model: Point release, roughly every 2 years
  • Philosophy: rock-solid stability, software freedom
  • Releases are named after Toy Story characters (Bookworm, Bullseye, Buster)

Ubuntu, created by Canonical in 2004, is the most popular Debian derivative. It takes Debian's foundation and adds polish, newer software, and commercial support. Ubuntu has its own release cadence:

  • Regular releases every 6 months (April and October)
  • LTS (Long Term Support) releases every 2 years, supported for 5 years
  • LTS versions are what you use on servers (e.g., Ubuntu 22.04 LTS, Ubuntu 24.04 LTS)
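
Because LTS versions encode their release date as YY.MM, the five-year support window can be derived from the version number itself. A throwaway sketch (lts_eol is a made-up helper name, not a real tool):

```shell
# End of standard support for an Ubuntu LTS release: the YY.MM
# version string plus five years (e.g., 22.04 -> April 2027).
# Assumes a modern two-digit year; not safe for hypothetical "08.04".
lts_eol() {
    yy="${1%%.*}"      # "22.04" -> "22"
    mm="${1#*.}"       # "22.04" -> "04"
    echo "20$((yy + 5))-${mm}"
}

lts_eol 22.04    # 2027-04
lts_eol 24.04    # 2029-04
```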

When to choose Debian/Ubuntu:

  • Ubuntu LTS is the most common choice for cloud servers and containers
  • Debian stable is ideal when you want maximum stability and minimal surprises
  • Enormous package repositories (over 59,000 packages in Debian)
  • Most online tutorials and guides target Ubuntu

Distro Note: When this book shows commands using apt, they apply to Debian, Ubuntu, and all their derivatives.

The Red Hat Family

Red Hat Enterprise Linux (RHEL)
├── CentOS Stream (upstream preview of RHEL)
├── Rocky Linux (community RHEL rebuild)
├── AlmaLinux (community RHEL rebuild)
├── Oracle Linux (Oracle's RHEL rebuild)
└── Amazon Linux (AWS's RHEL-like distro)

Fedora (community upstream of RHEL)

Red Hat Enterprise Linux (RHEL) is the dominant distribution in enterprise and corporate environments. Red Hat (now owned by IBM) sells subscriptions that include support, security patches, and certification.

Fedora is the community distribution that serves as a testing ground for technologies that eventually make it into RHEL. Fedora ships newer software and is more cutting-edge.

Key characteristics:

  • Package format: .rpm (RPM Package Manager)
  • Package manager: dnf (Dandified YUM), rpm for low-level operations
  • Release model: RHEL has point releases with 10-year support cycles; Fedora releases every ~6 months
  • Philosophy: enterprise stability, commercial support, certification

The CentOS Story: For years, CentOS was a free, community rebuild of RHEL -- binary-compatible, just without the Red Hat branding and support. In 2020, Red Hat controversially shifted CentOS to "CentOS Stream," an upstream preview of RHEL rather than a downstream rebuild. This broke the free-as-in-RHEL promise, and two community projects emerged to fill the gap:

  • Rocky Linux -- founded by one of the original CentOS creators
  • AlmaLinux -- backed by CloudLinux

Both aim to be 1:1 compatible with RHEL, free of charge.

When to choose RHEL/Fedora family:

  • Enterprise environments that require commercial support and certification
  • When your organization already uses RHEL (consistency matters)
  • When preparing for the RHCSA or RHCE certifications
  • Fedora for desktop users who want cutting-edge software with good stability

Distro Note: When this book shows commands using dnf, they apply to Fedora, RHEL, Rocky Linux, AlmaLinux, and CentOS Stream.

The Arch Family

Arch Linux (2002)
├── Manjaro
├── EndeavourOS
├── Garuda Linux
└── SteamOS 3.0 (Valve's gaming OS)

Arch Linux takes a radically different approach: minimalism and user control. Arch gives you nothing by default -- no desktop environment, no hand-holding, no automatic configuration. You build the system yourself, piece by piece.

Key characteristics:

  • Package format: .pkg.tar.zst
  • Package manager: pacman
  • Release model: Rolling release -- there are no "versions." You install once and continuously update
  • Philosophy: simplicity, user-centricity, cutting-edge software
  • The Arch Wiki is widely regarded as the best Linux documentation on the internet (useful even for non-Arch users)
  • AUR (Arch User Repository) -- a community repository with packages for virtually any software

When to choose Arch:

  • When you want to deeply understand how Linux works (the installation teaches you)
  • When you want the very latest software versions
  • Desktop use by experienced users
  • NOT recommended for production servers (rolling releases are unpredictable for stability)

Distro Note: When this book shows commands using pacman, they apply to Arch Linux and its derivatives like Manjaro and EndeavourOS.

The SUSE Family

SUSE Linux Enterprise (SLE)
├── SUSE Linux Enterprise Server (SLES)
└── SUSE Linux Enterprise Desktop (SLED)

openSUSE
├── openSUSE Leap (point release, based on SLE)
└── openSUSE Tumbleweed (rolling release)

SUSE is another major enterprise distribution, particularly popular in Europe. openSUSE is its community counterpart.

Key characteristics:

  • Package format: .rpm
  • Package manager: zypper
  • YaST (Yet another Setup Tool) -- a comprehensive system administration tool
  • Release model: Leap is point release; Tumbleweed is rolling
  • Strong in European enterprise, particularly Germany

When to choose SUSE:

  • Enterprise environments that already use SUSE infrastructure
  • When you want YaST for administration
  • openSUSE Tumbleweed for a polished rolling-release desktop

Other Notable Distributions

  • Alpine Linux -- tiny (5 MB base image), uses musl libc instead of glibc, and apk package manager. Extremely popular for Docker containers
  • Gentoo -- compiles everything from source for maximum optimization. Uses emerge package manager. Deep learning experience
  • Slackware -- one of the oldest surviving distributions (1993). Minimalist, does not auto-resolve dependencies
  • NixOS -- declarative system configuration. Entire OS defined in a config file. Revolutionary approach to reproducibility
  • Void Linux -- independent distribution using runit init system instead of systemd

Think About It: Why do you think there are so many Linux distributions? In most other software ecosystems, there is one dominant option. What about Linux encourages this diversity?


Package Managers: The Heart of a Distribution

The package manager is arguably the single most important difference between distribution families. It determines how you install, update, and remove software.

What a Package Manager Does

┌─────────────────────────────────────────┐
│         Package Manager                  │
│                                          │
│  1. Downloads software from repositories │
│  2. Resolves dependencies automatically  │
│  3. Installs files to correct locations  │
│  4. Tracks what is installed             │
│  5. Handles upgrades and removals        │
│  6. Verifies package integrity (GPG)     │
│                                          │
└─────────────────────────────────────────┘

Without a package manager, installing software means manually downloading source code, resolving dependencies by hand, compiling, and tracking what you installed. Package managers automate all of this.
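
Item 6 deserves a quick demonstration. Real package managers verify GPG signatures over repository metadata; the underlying idea is the same as plain checksum verification, which you can try yourself right now:

```shell
# Simulate what integrity verification protects against. Package
# managers do the equivalent with GPG-signed repository metadata.
printf 'pretend this is a package\n' > pkg.tar
sha256sum pkg.tar > pkg.tar.sha256    # record the known-good checksum

sha256sum -c pkg.tar.sha256           # prints: pkg.tar: OK

printf 'tampered\n' >> pkg.tar        # simulate corruption or tampering
sha256sum -c pkg.tar.sha256 || echo "checksum mismatch -- do not install"

rm -f pkg.tar pkg.tar.sha256
```

A modified package no longer matches its recorded checksum, so the verification fails before anything gets installed.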

Side-by-Side Comparison

Task                        apt (Debian/Ubuntu)     dnf (Fedora/RHEL)      pacman (Arch)
Update package list         apt update              dnf check-update       pacman -Sy
Upgrade all packages        apt upgrade             dnf upgrade            pacman -Syu
Install a package           apt install nginx       dnf install nginx      pacman -S nginx
Remove a package            apt remove nginx        dnf remove nginx       pacman -R nginx
Search for a package        apt search nginx        dnf search nginx       pacman -Ss nginx
Show package info           apt show nginx          dnf info nginx         pacman -Si nginx
List installed packages     dpkg -l                 rpm -qa                pacman -Q
Which package owns a file   dpkg -S /usr/bin/vim    rpm -qf /usr/bin/vim   pacman -Qo /usr/bin/vim

The syntax differs, but the concepts are identical. Once you understand one package manager deeply, learning another takes hours, not days. Chapter 57 dives much deeper into package management.
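
Because the concepts line up one-to-one, the whole table collapses into a dispatch function. Here is a dry-run sketch (pkg_install is invented for this example; it prints the command it would run rather than executing it):

```shell
# Echo the install command for whichever supported package manager
# is on PATH. Dry-run only -- nothing is actually installed.
pkg_install() {
    if   command -v apt    >/dev/null 2>&1; then echo "sudo apt install -y $1"
    elif command -v dnf    >/dev/null 2>&1; then echo "sudo dnf install -y $1"
    elif command -v pacman >/dev/null 2>&1; then echo "sudo pacman -S --noconfirm $1"
    elif command -v zypper >/dev/null 2>&1; then echo "sudo zypper install -y $1"
    else echo "no supported package manager found" >&2; return 1
    fi
}

pkg_install nginx || true    # on Ubuntu, prints: sudo apt install -y nginx
```

Configuration-management tools like Ansible use exactly this kind of dispatch to stay distribution-agnostic.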

Think About It: Why do you think Linux uses centralized package repositories instead of having users download installers from individual websites (like Windows traditionally does)? What security and reliability advantages does this provide?


Release Models: Rolling vs Point Release

This is one of the most consequential choices a distribution makes.

Point Release (Fixed Release)

Examples: Debian, Ubuntu LTS, RHEL, Rocky Linux, openSUSE Leap

Timeline:
──────────────────────────────────────────────────────────>

  Ubuntu 22.04 LTS        Ubuntu 24.04 LTS
  ┌────────────────┐      ┌────────────────┐
  │ Released       │      │ Released       │
  │ Apr 2022       │      │ Apr 2024       │
  │                │      │                │
  │ Security fixes │      │ Security fixes │
  │ only until     │      │ only until     │
  │ Apr 2027       │      │ Apr 2029       │
  └────────────────┘      └────────────────┘
  • Software versions are frozen at release time
  • Only security patches and critical bug fixes are backported
  • Major software upgrades require upgrading to the next release
  • Pros: predictable, stable, well-tested combinations of software
  • Cons: software versions can feel old; you may need newer features

Point release distributions are the standard for servers. You do not want your production database server spontaneously updating to a new major version of PostgreSQL.

Rolling Release

Examples: Arch Linux, openSUSE Tumbleweed, Gentoo, Void Linux

Timeline:
──────────────────────────────────────────────────────────>
  Arch Linux
  ┌──────────────────────────────────────────────────────┐
  │ Install once, update forever                          │
  │                                                       │
  │ Kernel 6.1 → 6.2 → 6.3 → 6.4 → 6.5 → 6.6 → ...    │
  │ Firefox 110 → 111 → 112 → 113 → 114 → 115 → ...    │
  │ Python 3.11 → 3.12 → ...                            │
  │                                                       │
  │ Every update brings the latest stable version         │
  └──────────────────────────────────────────────────────┘
  • Software is continuously updated to the latest stable versions
  • There are no release "versions" -- you are always on the latest
  • Pros: always up to date, no painful upgrade cycles
  • Cons: updates can introduce breaking changes, less predictable for servers

Semi-Rolling / Hybrid Models

Some distributions blend both approaches:

  • Fedora: releases every ~6 months but moves fast between releases. Not technically rolling, but much more current than RHEL or Debian stable
  • CentOS Stream: continuously updated, slightly ahead of RHEL. A rolling preview of the next RHEL point release
  • Ubuntu non-LTS: new release every 6 months with only 9 months of support. Good for desktops, risky for servers

Choosing for Your Use Case

For Learning (This Book)

Recommended: Ubuntu LTS or Fedora

Either of these will work perfectly for every exercise in this book. Both are well-documented, widely used, and easy to install. Ubuntu LTS is the safer default because more tutorials and guides target it.

If you want to challenge yourself and learn deeply, Arch Linux is an incredible teacher -- the installation process alone teaches you partitioning, bootloaders, and system configuration.

For Desktop Daily Use

Priority                      Recommended Distro
Ease of use, "just works"     Ubuntu, Linux Mint, Fedora
Cutting-edge software         Fedora, Arch, openSUSE Tumbleweed
Gaming                        Arch (or Manjaro), Pop!_OS, Nobara
Privacy-focused               Tails, Whonix, Qubes OS
Old/low-spec hardware         Lubuntu, antiX, Puppy Linux
Beautiful out of the box      Elementary OS, Fedora (GNOME)

For Servers (Production)

Priority                            Recommended Distro
Cloud/web servers                   Ubuntu LTS, Debian stable
Enterprise with support contracts   RHEL, SLES
Free RHEL-compatible                Rocky Linux, AlmaLinux
Container base images               Alpine Linux, Debian slim
Maximum stability, long lifecycle   Debian stable, RHEL

For Specific Roles

Role                      Common Distributions
DevOps / Cloud Engineer   Ubuntu LTS, Amazon Linux, Fedora
System Administrator      RHEL, Rocky Linux, Debian
Security / Pentesting     Kali Linux, Parrot OS
Embedded / IoT            Yocto, Buildroot, Alpine
Scientific Computing      CentOS/Rocky (historically), Ubuntu

Hands-On: Comparing Distributions Side by Side

You do not need to install every distribution to compare them. Let us use Docker to quickly peek inside different distributions.

Note: If you do not have Docker yet, that is fine -- just read through this section. You will install Docker in Chapter 63. This is here so you can come back to it later.

# Peek inside Ubuntu
$ docker run --rm -it ubuntu:22.04 bash
root@abc123:/# cat /etc/os-release | head -5
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
ID=ubuntu

root@abc123:/# apt --version
apt 2.4.11 (amd64)

root@abc123:/# exit
# Peek inside Fedora
$ docker run --rm -it fedora:39 bash
[root@def456 /]# cat /etc/os-release | head -5
NAME="Fedora Linux"
VERSION="39 (Container Image)"
ID=fedora
VERSION_ID=39
PRETTY_NAME="Fedora Linux 39 (Container Image)"

[root@def456 /]# dnf --version
4.18.0
...

[root@def456 /]# exit
# Peek inside Alpine (notice how tiny it is)
$ docker run --rm -it alpine:3.19 sh
/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.0
PRETTY_NAME="Alpine Linux v3.19"

/ # apk --version
apk-tools 2.14.0

/ # exit
# Peek inside Arch
$ docker run --rm -it archlinux:latest bash
[root@ghi789 /]# cat /etc/os-release | head -3
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch

[root@ghi789 /]# pacman --version
 .---.                  Pacman v6.0.2
...

Same kernel underneath. Different userspace. Different tools. Same fundamental concepts.


Debug This

Your colleague says: "I'm setting up a new production server and I want to use Arch Linux because it has the latest packages."

What is wrong with this reasoning? What would you recommend instead, and why?

The analysis

The problem: Rolling-release distributions like Arch are risky for production servers because:

  1. Updates can introduce breaking changes at any time
  2. There is no guaranteed stable API -- a package update today might change a config file format
  3. There are no LTS commitments -- you cannot plan maintenance windows around known release dates
  4. Enterprise support contracts are not available
  5. The Arch philosophy explicitly says it is user-centric, not enterprise-centric

Better recommendation: Use Ubuntu LTS, Debian stable, RHEL, Rocky Linux, or AlmaLinux for production servers. These distributions freeze package versions and only backport security fixes, giving you a predictable, stable platform.

The nuance: Arch is an excellent learning tool and desktop distribution. The latest packages are great for a developer workstation. But "latest" and "stable" are often at odds for production.


Understanding Support Lifecycles

When choosing a distribution for production, the support lifecycle matters enormously.

Distribution Support Lifecycles (approximate):
─────────────────────────────────────────────────────────>
                                                    time

Ubuntu LTS:      ├──── 5 years standard ────┤
                 ├──── 10 years with ESM ────────────┤

Debian stable:   ├──── 3 years full ────┤
                 ├──── 5 years with LTS ──────┤

RHEL:            ├──── 10 years full support ────────────────┤
                 ├──── 13 years with ELS ────────────────────────┤

Fedora:          ├── ~13 months ──┤

Arch:            ├── rolling (forever, but no guarantees) ──────>

For a server that needs to run reliably for years, you want a distribution with a long support lifecycle. This ensures you receive security patches without needing to upgrade to a completely new release.
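
To make that concrete, here is a sketch that reads os-release text and flags releases against a small hand-maintained end-of-life table. Both check_eol and the table entries are illustrative -- always verify dates against your vendor's published lifecycle pages:

```shell
# Flag known end-of-life releases. Reads os-release text on stdin;
# the EOL list below is a tiny illustrative sample, not exhaustive.
check_eol() {
    osrel=$(cat)
    id=$(printf '%s\n' "$osrel"  | sed -n 's/^ID=//p'         | tr -d '"')
    ver=$(printf '%s\n' "$osrel" | sed -n 's/^VERSION_ID=//p' | tr -d '"')
    case "$id:$ver" in
        ubuntu:16.04|ubuntu:18.04|debian:10|centos:8)
            echo "$id $ver: past end of life -- plan a migration" ;;
        *)
            echo "$id $ver: assumed supported (verify vendor dates)" ;;
    esac
}

if [ -r /etc/os-release ]; then
    check_eol < /etc/os-release
fi
```

Run across a fleet (for example, via SSH in a loop), a check like this tells you at a glance which machines have dropped off their patch stream.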

Think About It: Imagine you are managing 500 servers. What happens when your distribution reaches end-of-life? How does this affect your choice of distribution?


What Just Happened?

┌─────────────────────────────────────────────────────┐
│                                                      │
│  In this chapter, you learned:                       │
│                                                      │
│  - A distribution = kernel + package manager +       │
│    system tools + configuration + philosophy.         │
│                                                      │
│  - Major families: Debian (apt), Red Hat (dnf),      │
│    Arch (pacman), SUSE (zypper).                     │
│                                                      │
│  - Point-release distros (Debian, Ubuntu LTS, RHEL)  │
│    freeze versions for stability. Rolling-release     │
│    distros (Arch, Tumbleweed) update continuously.    │
│                                                      │
│  - For servers: Ubuntu LTS, Debian, RHEL, Rocky.     │
│    For desktops: Ubuntu, Fedora, Mint, Arch.          │
│    For containers: Alpine, Debian slim.               │
│    For learning: Ubuntu LTS or Fedora.                │
│                                                      │
│  - Package managers differ in syntax but share        │
│    the same concepts: install, remove, update,        │
│    search, query.                                     │
│                                                      │
│  - Support lifecycles matter for production.          │
│    RHEL: 10+ years. Ubuntu LTS: 5-10 years.          │
│    Arch: rolling with no guarantees.                  │
│                                                      │
└─────────────────────────────────────────────────────┘

Try This

Exercises

  1. Identification exercise: Run cat /etc/os-release on every Linux system you have access to (VM, server, WSL, Docker containers). Record the ID, ID_LIKE, and VERSION_ID for each.

  2. Comparison exercise: Visit the websites of Ubuntu (ubuntu.com), Fedora (fedoraproject.org), and Arch Linux (archlinux.org). For each, find: the latest release version, the supported architectures, and the default desktop environment.

  3. Package manager exercise: Using the comparison table in this chapter, write the equivalent commands for all three package managers to: (a) install the curl package, (b) search for packages related to "postgresql", and (c) remove the curl package.

  4. Research exercise: Go to distrowatch.com and look at the top 10 distributions by page hit ranking. How many of them are Debian-based? How many are independently developed?

  5. Decision exercise: For each scenario below, choose a distribution and justify your choice:

    • A web server that must run for 5 years with minimal maintenance
    • A developer's personal laptop
    • A Raspberry Pi running as a home media server
    • A Docker base image for a microservice
    • A security professional's testing workstation

Bonus Challenge

Install two different distributions in virtual machines (Chapter 3 shows you how) and perform the same task in both -- for example, install Nginx, start it, and verify it is running. Note every difference in commands and paths. This is the fastest way to internalize how distributions differ.


What's Next

You have chosen your distribution (or at least narrowed it down). Now it is time to actually install it. Chapter 3 walks you through multiple installation methods: virtual machines, WSL2 on Windows, bare metal, and cloud instances.

Installing Linux

Why This Matters

You cannot learn Linux from a book alone. You need a running system where you can break things, fix them, break them again, and gradually build confidence. The first real step on that path is installation.

The good news: installing Linux has never been easier. You do not need to wipe your existing operating system. You do not need dedicated hardware. You have at least five different ways to get a working Linux environment, ranging from "takes five minutes, no risk" to "full bare-metal installation." This chapter covers them all.

By the end of this chapter, you will have a working Linux system ready for every exercise in this book.


Try This Right Now

If you are on Windows 10/11, you can have a working Linux system in under five minutes:

# Open PowerShell as Administrator and run:
wsl --install

That single command installs WSL2 with Ubuntu. After a reboot, you will have a full Linux environment. If you already did this, try:

wsl --status

If you are on macOS or already run Linux, skip ahead to the VM section or the bare-metal section.


Installation Options Overview

Here are your options, from simplest to most involved:

┌─────────────────────────────────────────────────────────┐
│              Installation Options                        │
│                                                          │
│  Option              Risk Level    Time     Best For     │
│  ─────────────────────────────────────────────────────   │
│  WSL2 (Windows)      None          5 min    Learning,    │
│                                              dev work    │
│                                                          │
│  Cloud Instance       None          5 min    Server      │
│  (AWS/GCP/etc.)                              skills      │
│                                                          │
│  Virtual Machine      None          30 min   Full        │
│  (VirtualBox/QEMU)                           experience  │
│                                                          │
│  Live USB             None          10 min   Testing     │
│  (no install)                                hardware    │
│                                                          │
│  Dual Boot            Medium        45 min   Daily use   │
│                                              + Windows   │
│                                                          │
│  Bare Metal           High          30 min   Full        │
│  (replace OS)                                commitment  │
│                                                          │
└─────────────────────────────────────────────────────────┘

For this book, any of these options works. A VM or WSL2 is recommended because you can take snapshots, experiment freely, and destroy and rebuild your environment without consequence.


Option 1: WSL2 on Windows

Windows Subsystem for Linux 2 (WSL2) runs a real Linux kernel inside a lightweight virtual machine managed by Windows. It is not emulation -- it is genuine Linux.

Prerequisites

  • Windows 10 version 2004 or later (Build 19041+) or Windows 11
  • Hardware virtualization enabled in BIOS (usually enabled by default)
  • At least 4 GB RAM (8 GB recommended)

Step-by-Step Installation

Step 1: Enable WSL

Open PowerShell as Administrator:

wsl --install

This command:

  • Enables the WSL feature
  • Enables the Virtual Machine Platform
  • Downloads and installs the Linux kernel
  • Sets WSL2 as the default version
  • Installs Ubuntu as the default distribution

Step 2: Reboot

# After the command completes:
Restart-Computer

Step 3: First Launch

After reboot, Ubuntu will launch automatically (or find it in the Start menu). You will be prompted to create a username and password:

Installing, this may take a few minutes...
Please create a default UNIX user account. The username does not need to match your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username: yourname
New password:
Retype new password:
passwd: password updated successfully
Installation successful!

Warning: This username and password are for Linux only. They are separate from your Windows login. Choose a short username with no spaces. You will type this password often -- make it something you can remember.

Step 4: Verify

$ uname -a
Linux DESKTOP-ABC123 5.15.153.1-microsoft-standard-WSL2 #1 SMP x86_64 GNU/Linux

$ cat /etc/os-release | head -3
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"

$ echo "Hello from Linux!"
Hello from Linux!

You now have a working Linux system.

Installing Additional Distributions in WSL2

# List available distributions
wsl --list --online

# Install a specific one
wsl --install -d Debian
wsl --install -d Fedora

# List your installed distributions
wsl --list --verbose

WSL2 Tips

  • Access Windows files from Linux: They are at /mnt/c/, /mnt/d/, etc.
  • Access Linux files from Windows: In Explorer, type \\wsl$ in the address bar
  • Run Linux commands from PowerShell: wsl ls -la /home
  • Run Windows commands from Linux: /mnt/c/Windows/System32/notepad.exe
  • Keep your Linux files in the Linux filesystem (under /home/yourname), not on /mnt/c/. The Linux filesystem is much faster for Linux operations.

WSL2 Limitations

WSL2 is excellent for learning and development, but has some limitations:

  • No systemd by default (though it can be enabled in recent versions)
  • No direct hardware access (USB passthrough requires extra tools)
  • Networking is bridged through Windows
  • Not suitable for testing boot processes, kernel modules, or hardware interaction

For these topics, use a virtual machine instead.

Distro Note: WSL2 defaults to Ubuntu, but you can install Debian, Fedora, openSUSE, Kali, Alpine, and others. Check wsl --list --online for the current list.


Option 2: Virtual Machine with VirtualBox

A virtual machine (VM) gives you a complete, isolated Linux system with full control. You can take snapshots, experiment with boot configurations, and even simulate hardware failures.

Prerequisites

  • Any operating system (Windows, macOS, or Linux) as the host
  • At least 8 GB RAM (to give 2-4 GB to the VM)
  • At least 25 GB free disk space
  • VirtualBox downloaded from virtualbox.org

Note: VirtualBox is open source (GPLv2). Other options include GNOME Boxes (Linux), QEMU/KVM (Linux, covered below), and VMware Workstation Player (free for personal use but not open source).

Step 1: Download the Linux ISO

Go to your chosen distribution's website and download the installation ISO:

For this walkthrough, we use Ubuntu Server 22.04 LTS.

Step 2: Create the Virtual Machine

  1. Open VirtualBox and click New

  2. Configure:

    • Name: ubuntu-lab
    • Type: Linux
    • Version: Ubuntu (64-bit)
    • Memory: 2048 MB (2 GB) minimum, 4096 MB recommended
    • Hard disk: Create a virtual hard disk now
    • Disk size: 25 GB (dynamically allocated)
    • Disk type: VDI (VirtualBox Disk Image)
  3. Before starting, go to Settings:

    • System > Processor: 2 CPUs
    • Network > Adapter 1: NAT (for internet) or Bridged Adapter (for network visibility)
    • Storage: Under "Controller: IDE," click the empty disk icon, then select "Choose a disk file" and point to your downloaded ISO

Step 3: Install Ubuntu Server

Start the VM. You will boot from the ISO into the Ubuntu installer.

  1. Language: English
  2. Keyboard layout: Your keyboard layout
  3. Installation type: Ubuntu Server
  4. Network: The installer auto-detects. Accept defaults
  5. Storage/Partitioning: Use the entire disk (guided). The installer will suggest:
  /dev/sda  25 GB
  ├── /dev/sda1   1 GB   /boot/efi  (EFI System Partition)
  ├── /dev/sda2   1.7 GB /boot      (boot partition)
  └── /dev/sda3   22 GB  / (root)   (main partition, ext4)

Accept the defaults for now. We will cover partitioning in depth in Chapter 7.

  6. Profile setup:

    • Your name: Your Name
    • Server name: ubuntu-lab
    • Username: yourname
    • Password: choose something memorable
  7. SSH Setup: Check "Install OpenSSH server" -- you will need this

  8. Featured snaps: Skip all of these for now

  9. Wait for installation to complete, then select Reboot Now

Step 4: First Boot

After reboot, you will see a login prompt:

ubuntu-lab login: yourname
Password:
Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-91-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Mon Jan 15 10:30:00 UTC 2024

  System load:  0.08              Processes:             112
  Usage of /:   18.2% of 21.5GB  Users logged in:       0
  Memory usage: 12%               IPv4 address for eth0: 10.0.2.15
  Swap usage:   0%

yourname@ubuntu-lab:~$

You are in.

Step 5: Take a Snapshot

Before doing anything else, take a snapshot in VirtualBox. This gives you a known-good state you can revert to at any time:

  1. In VirtualBox Manager, select your VM
  2. Click Snapshots (top right)
  3. Click Take (camera icon)
  4. Name it "fresh-install"

Warning: Always take a snapshot before experimenting with anything potentially destructive. This is your safety net. You can restore to any snapshot in seconds.


Option 3: QEMU/KVM (Linux Host)

If your host system already runs Linux, QEMU/KVM is the preferred virtualization solution. It is fully open source and offers near-native performance.

Install QEMU/KVM

# Debian/Ubuntu
$ sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients \
    bridge-utils virt-manager

# Fedora
$ sudo dnf install @virtualization

# Arch
$ sudo pacman -S qemu-full libvirt virt-manager dnsmasq

Verify KVM Support

$ lscpu | grep Virtualization
Virtualization:                  VT-x

$ ls /dev/kvm
/dev/kvm

If /dev/kvm exists, you have hardware virtualization support.

Create a VM with virt-manager (GUI)

  1. Launch virt-manager
  2. Click Create a new virtual machine
  3. Select Local install media (ISO)
  4. Browse to your downloaded ISO
  5. Allocate 2 GB RAM, 2 CPUs
  6. Create a 25 GB disk
  7. Name your VM and click Finish

Create a VM from the Command Line

# Create a disk image
$ qemu-img create -f qcow2 ubuntu-lab.qcow2 25G

# Boot from ISO to install
$ qemu-system-x86_64 \
    -enable-kvm \
    -m 2048 \
    -smp 2 \
    -hda ubuntu-lab.qcow2 \
    -cdrom ubuntu-22.04-live-server-amd64.iso \
    -boot d \
    -net nic -net user,hostfwd=tcp::2222-:22

# After installation, boot without ISO
$ qemu-system-x86_64 \
    -enable-kvm \
    -m 2048 \
    -smp 2 \
    -hda ubuntu-lab.qcow2 \
    -net nic -net user,hostfwd=tcp::2222-:22

The hostfwd option forwards port 2222 on your host to port 22 on the VM, so you can SSH in:

$ ssh -p 2222 yourname@localhost

Option 4: Cloud Instance

If you have an account with any cloud provider, you can launch a Linux instance in minutes.

AWS (Free Tier)

# Using the AWS CLI
$ aws ec2 run-instances \
    --image-id ami-0c7217cdde317cfec \
    --instance-type t2.micro \
    --key-name your-keypair \
    --security-group-ids sg-xxxxx

# SSH in
$ ssh -i your-key.pem ubuntu@<public-ip>

DigitalOcean, Linode, Vultr

These providers offer $5-10/month VMs with one-click Linux provisioning. Many offer free credits for new accounts. The setup is typically:

  1. Create an account
  2. Create a "Droplet" / "Linode" / "Instance"
  3. Choose Ubuntu 22.04 LTS
  4. Choose the smallest size
  5. Add your SSH key
  6. SSH in

This approach is excellent for practicing remote server administration, which is how you will manage Linux servers in the real world.


Option 5: Live USB (Try Without Installing)

A Live USB lets you boot a full Linux system from a USB drive without touching your hard disk. This is great for testing hardware compatibility before committing to an installation.

Create a Live USB

What you need:

  • A USB drive (4 GB minimum, 8 GB recommended)
  • A Linux ISO (Ubuntu Desktop is best for this)
  • A tool to write the ISO to USB

Warning: Writing an ISO to a USB drive will ERASE ALL DATA on that drive. Back up any files on the USB before proceeding.

On Linux:

# Identify your USB drive (CAREFULLY -- do not pick the wrong device!)
$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 256.0G  0 disk
├─sda1   8:1    0   512M  0 part /boot/efi
└─sda2   8:2    0 255.5G  0 part /
sdb      8:16   1   7.5G  0 disk          <-- This is the USB drive

# Write the ISO (replace /dev/sdb with YOUR USB device)
$ sudo dd if=ubuntu-22.04-desktop-amd64.iso of=/dev/sdb bs=4M status=progress
$ sync

Warning: The dd command is sometimes called "disk destroyer" for good reason. If you specify the wrong of= target, you will overwrite your hard disk. Triple-check the device name with lsblk before running dd.
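Before writing the ISO at all, verify its checksum against the hash published on the distribution's download page (Ubuntu, for example, publishes a SHA256SUMS file alongside each ISO). A sketch -- the filename and hash in the commented example are placeholders:

```shell
# Compare a file's SHA-256 hash against a published value.
verify_iso() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo OK
    else
        echo MISMATCH
    fi
}

# Usage (hash comes from the distro's published SHA256SUMS file):
# verify_iso ubuntu-22.04-desktop-amd64.iso <published-hash>
```

A corrupted download is a common cause of mysterious installer failures; thirty seconds of checksum verification saves hours of debugging.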

On Windows:

Use Rufus (rufus.ie) -- a free, open-source tool:

  1. Download and run Rufus
  2. Select your USB drive
  3. Select the ISO file
  4. Click Start

On macOS:

Use balenaEtcher (balena.io/etcher) -- also free and open source.

Boot from USB

  1. Insert the USB drive
  2. Reboot your computer
  3. Enter the boot menu (usually F12, F2, Esc, or Del during startup -- varies by manufacturer)
  4. Select the USB drive
  5. Choose "Try Ubuntu" (or equivalent) to boot without installing

Option 6: Bare Metal Installation

Installing Linux as the sole operating system on a computer gives you the best performance and full hardware access. This is ideal if you have a spare laptop or desktop.

Warning: Installing Linux on bare metal will ERASE your existing operating system and all data on the target disk (unless you set up dual boot). Back up everything first.

The process is the same as a VM install, but booting from a USB drive on real hardware:

  1. Create a Live USB (see above)
  2. Boot from USB
  3. Choose "Install Ubuntu" instead of "Try Ubuntu"
  4. Follow the installer (same steps as the VM walkthrough above)
  5. When asked about disk partitioning, "Erase disk and install" is the simplest option

Understanding Partitioning Basics

During installation, you encounter disk partitioning. Here is what you need to know now (Chapter 7 goes deep).

What Is a Partition?

A partition divides a physical disk into logical sections. Each partition can have its own filesystem and purpose.

Common Partition Layouts

Simple (recommended for learning):

┌──────────────────────────────────────────────┐
│  /dev/sda                                     │
│                                               │
│  ┌────────┬──────────────────────────────┐    │
│  │ sda1   │ sda2                         │    │
│  │ /boot  │ / (root)                     │    │
│  │ ~1 GB  │ remaining space              │    │
│  │ (EFI)  │ (ext4 filesystem)            │    │
│  └────────┴──────────────────────────────┘    │
└──────────────────────────────────────────────┘

Production server (typical):

┌──────────────────────────────────────────────┐
│  /dev/sda                                     │
│                                               │
│  ┌──────┬──────┬────────┬────────┬────────┐  │
│  │ sda1 │ sda2 │ sda3   │ sda4   │ sda5   │  │
│  │/boot │ /    │ /home  │ /var   │ swap   │  │
│  │512MB │ 20GB │ 50GB   │ 20GB   │ 4GB    │  │
│  │(EFI) │      │        │(logs)  │        │  │
│  └──────┴──────┴────────┴────────┴────────┘  │
└──────────────────────────────────────────────┘

Why separate partitions?

  • /boot or /boot/efi -- keeps boot files on a simple filesystem the bootloader can read
  • / (root) -- the core operating system
  • /home -- user files (separate so you can reinstall the OS without losing your data)
  • /var -- logs and variable data (separate so a full disk from runaway logs does not crash the OS)
  • swap -- virtual memory space (used when RAM is full)

For a learning VM, the simple layout is perfect. Accept the installer's defaults.
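You can see which partition backs any given path with df -- each mount point appears as a separate filesystem. A quick sketch (the paths are examples; on the simple layout they all resolve to /):

```shell
# Show the device, size, and mount point behind each path.
for p in / /home /var/log; do
    df -h --output=source,size,target "$p" | tail -1
done
```

On a production layout with separate partitions, /home and /var/log would each show a different source device.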

Filesystem Types

The installer will ask about filesystem types. The short answer: use ext4. It is the default, well-tested, and reliable.

Filesystem   Use Case
ext4         Default for most Linux installs. Reliable, mature
XFS          Default on RHEL/Rocky. Good for large files
Btrfs        Advanced features (snapshots, compression). Default on Fedora, openSUSE
ZFS          Advanced features, from the Solaris world. Popular on NAS systems

Think About It: Why might you put /var/log on a separate partition from /? What happens if log files fill up all available disk space on the root partition?


Post-Install First Steps

You have a running Linux system. Now what? Run through these steps to make sure everything is working and up to date.

Step 1: Update Your System

The very first thing to do on any new Linux install is update all packages:

# Debian/Ubuntu
$ sudo apt update && sudo apt upgrade -y

# Fedora/RHEL/Rocky
$ sudo dnf upgrade -y

# Arch
$ sudo pacman -Syu

Distro Note: apt update refreshes the package list (what is available). apt upgrade installs newer versions. These are two separate steps on Debian/Ubuntu. On Fedora, dnf upgrade does both. On Arch, pacman -Syu does both.
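Setup scripts that must run on several distributions often detect the package manager first. A minimal sketch, covering only the three families above:

```shell
# Print the update command for whichever package manager is present.
update_cmd() {
    if command -v apt >/dev/null 2>&1; then
        echo "sudo apt update && sudo apt upgrade -y"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf upgrade -y"
    elif command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -Syu"
    else
        echo "unknown"
    fi
}

update_cmd
```

Real provisioning tools (Ansible, for instance) do exactly this kind of detection, just more thoroughly.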

Step 2: Check Networking

# Do you have an IP address?
$ ip addr show
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0

# Can you reach the internet?
$ ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=12.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=11.8 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=12.1 ms

# Does DNS resolution work?
$ ping -c 3 google.com
PING google.com (142.250.80.46) 56(84) bytes of data.
64 bytes from lax17s64-in-f14.1e100.net: icmp_seq=1 ttl=117 time=12.5 ms

If ping 8.8.8.8 works but ping google.com does not, you have a DNS problem. Check /etc/resolv.conf.
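That diagnosis can be captured in a tiny helper. A sketch using getent, which queries the same resolver libraries most programs use, without sending any ping traffic:

```shell
# Print whether a hostname resolves.
check_dns() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "resolves"
    else
        echo "does not resolve -- check /etc/resolv.conf"
    fi
}

check_dns localhost
```

Because getent goes through the system's name service switch (/etc/nsswitch.conf), it reflects what applications actually see, which ping's own resolution sometimes does not.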

Step 3: Install Essential Tools

# Debian/Ubuntu
$ sudo apt install -y curl wget vim git htop tree net-tools

# Fedora/RHEL/Rocky
$ sudo dnf install -y curl wget vim git htop tree net-tools

# Arch
$ sudo pacman -S curl wget vim git htop tree net-tools

These are tools you will use throughout this book.

Step 4: Check Disk Space

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        22G  4.2G   17G  21% /
/dev/sda1       512M  5.3M  507M   2% /boot/efi
tmpfs           994M     0  994M   0% /dev/shm

Make sure you have plenty of space available on /.

Step 5: Verify Your User Can Use sudo

$ sudo whoami
root

If this returns root, your user has sudo privileges. If it returns an error, you need to add your user to the sudo group:

# This requires logging in as root first
# Debian/Ubuntu
$ su -
# usermod -aG sudo yourname

# Fedora/RHEL/Rocky
$ su -
# usermod -aG wheel yourname

Distro Note: The sudo group is called sudo on Debian/Ubuntu and wheel on Fedora/RHEL. Same concept, different name.
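You can check group membership without triggering a password prompt -- id lists the groups a user belongs to. A sketch that accepts either distro's group name:

```shell
# Print "yes" if the user is in the sudo or wheel group, "no" otherwise.
in_admin_group() {
    if id -nG "$1" | tr ' ' '\n' | grep -Eqx 'sudo|wheel'; then
        echo yes
    else
        echo no
    fi
}

in_admin_group "$(whoami)"
```

Note that after usermod -aG, you must log out and back in for the new group to take effect.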

Step 6: Set Your Timezone

# Check current timezone
$ timedatectl
               Local time: Mon 2024-01-15 10:45:00 UTC
           Universal time: Mon 2024-01-15 10:45:00 UTC
                 RTC time: Mon 2024-01-15 10:45:00
                Time zone: Etc/UTC (UTC, +0000)

# List available timezones
$ timedatectl list-timezones | grep America
America/Chicago
America/Denver
America/Los_Angeles
America/New_York
...

# Set your timezone
$ sudo timedatectl set-timezone America/New_York

# Verify
$ date
Mon Jan 15 05:45:00 EST 2024

Verifying Your Installation: A Checklist

Run through this checklist to confirm your system is ready for the rest of this book:

# 1. Kernel is running
$ uname -r
6.1.0-18-amd64

# 2. You know your distribution
$ cat /etc/os-release | grep PRETTY_NAME
PRETTY_NAME="Ubuntu 22.04.3 LTS"

# 3. Networking works
$ curl -s ifconfig.me
203.0.113.42

# 4. Package manager works
$ apt list --installed 2>/dev/null | wc -l    # Debian/Ubuntu
587

# 5. sudo works
$ sudo echo "I have sudo access"
I have sudo access

# 6. Disk space is sufficient
$ df -h / | awk 'NR==2 {print "Available:", $4}'
Available: 17G

# 7. Memory is sufficient
$ free -h | awk '/Mem:/ {print "Available:", $7}'
Available: 1.5Gi

# 8. Essential tools are installed
$ which bash curl vim git ssh
/usr/bin/bash
/usr/bin/curl
/usr/bin/vim
/usr/bin/git
/usr/bin/ssh

If all of these pass, you are ready.
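The non-interactive parts of the checklist fold naturally into a single script. A sketch that skips the network and sudo checks so it can run anywhere:

```shell
#!/bin/sh
# Fail fast if any basic check does not pass.
set -e

uname -r > /dev/null                    # 1. kernel is running
grep -q PRETTY_NAME /etc/os-release     # 2. distribution is identifiable
df -h / > /dev/null                     # 6. root filesystem is mounted
for tool in sh grep awk; do             # 8. core tools are present
    command -v "$tool" > /dev/null
done

echo "basic checks passed"
```

With set -e, the script stops at the first failing command, so a nonzero exit status tells you exactly which check to investigate.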


Debug This

You installed Ubuntu in a VM. You can log in at the console, but when you try to SSH from your host machine, it does not connect:

$ ssh yourname@10.0.2.15
ssh: connect to host 10.0.2.15 port 22: Connection refused

What could be wrong? Work through these possibilities:

The diagnosis:

Check 1: Is SSH installed and running on the VM?

# On the VM console
$ systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
     Active: active (running)

If SSH is not installed: sudo apt install openssh-server
If SSH is not running: sudo systemctl start ssh

Check 2: Is the VM network configured for access from the host?

VirtualBox NAT mode (the default) gives the VM a private IP (10.0.2.x) that the host cannot reach directly. You have two options:

Option A: Port forwarding (with NAT)
In VirtualBox: Settings > Network > Advanced > Port Forwarding
Add a rule: Host Port 2222 -> Guest Port 22
Then connect with: ssh -p 2222 yourname@localhost

Option B: Bridged networking
In VirtualBox: Settings > Network > Attached to: Bridged Adapter
The VM will get an IP on your local network, and you can SSH to it directly.

Check 3: Is a firewall blocking port 22?

$ sudo ufw status
Status: active
...

If the firewall is active and SSH is not allowed:

$ sudo ufw allow ssh

The most common cause for new users is the NAT networking issue. VirtualBox NAT gives the VM internet access but does not allow incoming connections from the host without port forwarding.


What Just Happened?

┌─────────────────────────────────────────────────────┐
│                                                      │
│  In this chapter, you:                               │
│                                                      │
│  - Learned 6 ways to run Linux: WSL2, VirtualBox,    │
│    QEMU/KVM, cloud instance, live USB, bare metal.   │
│                                                      │
│  - Set up WSL2 on Windows with a single command      │
│    (wsl --install).                                   │
│                                                      │
│  - Created a VirtualBox VM with Ubuntu Server,       │
│    walked through the installer step by step.         │
│                                                      │
│  - Learned partitioning basics: /, /boot, /home,     │
│    /var, swap -- and why separate partitions matter.  │
│                                                      │
│  - Completed post-install setup: updated packages,   │
│    verified networking, installed essential tools,    │
│    set timezone, confirmed sudo access.              │
│                                                      │
│  - Ran a verification checklist to confirm your      │
│    system is ready for the rest of this book.         │
│                                                      │
│  Commands you met: wsl, apt update, apt upgrade,     │
│  ip addr, ping, curl, df, free, timedatectl,         │
│  systemctl, ufw, dd, lsblk, sudo                    │
│                                                      │
└─────────────────────────────────────────────────────┘

Try This

Exercises

  1. Installation exercise: Install Linux using at least one method from this chapter. If you are on Windows, start with WSL2. If you want the full experience, create a VirtualBox VM.

  2. Snapshot exercise (VM users): Take a snapshot of your fresh install. Then intentionally break something (delete /etc/hostname, for example). Restore the snapshot and verify everything is back to normal.

  3. Post-install exercise: Run through every step in the "Post-Install First Steps" section. Record the output of each command. Compare with a classmate or colleague -- are your outputs different? Why?

  4. Partitioning exercise: Run lsblk and df -h on your system. Draw a diagram of your disk layout similar to the ASCII diagrams in this chapter. What partitions exist? What filesystem does each use?

  5. Multi-distro exercise: If you have VirtualBox or Docker, install a second distribution (e.g., if you installed Ubuntu, try Fedora). Run through the same post-install steps. Note every command that differs.

Bonus Challenge

Install Arch Linux in a virtual machine following the official Arch Installation Guide on the Arch Wiki. This is a rite of passage for Linux enthusiasts. The Arch installer gives you nothing -- no GUI, no automation. You will manually partition disks, install the base system, configure the bootloader, and set up networking. It is difficult, educational, and deeply satisfying when it works.


What's Next

You have a running Linux system. You are staring at a blinking cursor. What now? Chapter 4 introduces the shell -- the interface between you and the full power of Linux. This is where the real journey begins.

The Shell: Your Superpower

Why This Matters

Imagine this: a web server is down, hundreds of users are affected, and you need to find the problem and fix it -- fast. You SSH into the server. There is no graphical interface. No mouse. No menus. Just a blinking cursor. The only tool you have is the shell.

In this moment, the shell is either your greatest asset or your greatest obstacle. If you know your way around it, you can diagnose the problem in seconds: check running processes, read log files, restart services, inspect network connections -- all without leaving the keyboard. If you do not, you are helpless.

The shell is the single most important skill in this book. Every chapter after this one assumes you are comfortable typing commands, reading output, and navigating the filesystem. Master this chapter, and everything else falls into place.


Try This Right Now

Open your Linux terminal (whether it is WSL2, a VM, SSH to a server, or a native Linux desktop) and type:

$ echo "I am $(whoami) on $(hostname), running $(uname -s) kernel $(uname -r)"

You should see something like:

I am yourname on ubuntu-lab, running Linux kernel 6.1.0-18-amd64

You just used four commands (echo, whoami, hostname, uname) combined into a single line using command substitution ($(...) runs a command and inserts its output). This is the kind of thing the shell makes effortless.
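Command substitution is not limited to one-liners -- you can capture output into variables and nest substitutions (the innermost runs first):

```shell
# Capture command output in variables, then reuse it.
host=$(uname -n)
kern=$(uname -r)
banner="[$host] running kernel $kern"
echo "$banner"

# Substitutions nest -- here pwd runs before basename.
here=$(basename "$(pwd)")
echo "current directory name: $here"
```

You will see this pattern constantly in scripts: run a command, store the result, make a decision based on it.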


Terminal vs Shell vs Console: Clearing the Confusion

These three terms are used loosely, but they mean different things.

┌─────────────────────────────────────────────────────┐
│                                                      │
│  TERMINAL EMULATOR                                   │
│  ┌─────────────────────────────────────────────────┐ │
│  │                                                  │ │
│  │  The window that displays text. It handles       │ │
│  │  fonts, colors, scrolling, and keyboard input.   │ │
│  │                                                  │ │
│  │  Examples: GNOME Terminal, Konsole, Alacritty,   │ │
│  │  iTerm2, Windows Terminal, xterm                 │ │
│  │                                                  │ │
│  │  ┌──────────────────────────────────────────┐    │ │
│  │  │                                           │    │ │
│  │  │  SHELL                                    │    │ │
│  │  │                                           │    │ │
│  │  │  The program that interprets your          │    │ │
│  │  │  commands. It reads what you type,         │    │ │
│  │  │  executes programs, and shows output.      │    │ │
│  │  │                                           │    │ │
│  │  │  Examples: bash, zsh, fish, sh, dash       │    │ │
│  │  │                                           │    │ │
│  │  └──────────────────────────────────────────┘    │ │
│  │                                                  │ │
│  └─────────────────────────────────────────────────┘ │
│                                                      │
│  CONSOLE                                             │
│  The physical or virtual text interface               │
│  (e.g., the screen when no GUI is running,           │
│  or Ctrl+Alt+F1 through F6 on Linux)                 │
│                                                      │
└─────────────────────────────────────────────────────┘
  • Terminal emulator: the application that provides the window. It is just a container.
  • Shell: the program running inside the terminal that actually interprets your commands.
  • Console: historically the physical terminal attached to a machine. Today, it usually refers to the virtual terminals you can switch to with Ctrl+Alt+F1 through F6 on a Linux system.

When you open "Terminal" on your Linux desktop, you are launching a terminal emulator that starts a shell (usually Bash) inside it.


Which Shell Are You Using?

# What shell is running right now?
$ echo $SHELL
/bin/bash

# What version?
$ bash --version
GNU bash, version 5.2.15(1)-release (x86_64-pc-linux-gnu)

# What shells are available on this system?
$ cat /etc/shells
/bin/sh
/bin/bash
/usr/bin/bash
/bin/zsh
/usr/bin/zsh
/bin/dash

The Major Shells

Bash (Bourne Again Shell) -- the default on most Linux distributions. This is what we use throughout this book. It is the lingua franca of Linux.

Zsh (Z Shell) -- the default on macOS since Catalina. Compatible with Bash but adds features like better tab completion, spelling correction, and plugin frameworks (Oh My Zsh). Popular on developer workstations.

Fish (Friendly Interactive Shell) -- designed for usability. Features autosuggestions, syntax highlighting out of the box, and a web-based configuration interface. Not POSIX-compatible, so scripts written for Bash may not work in Fish.

Dash -- a minimal POSIX-compliant shell. Used as /bin/sh on Debian/Ubuntu for speed. Not meant for interactive use.

sh (Bourne Shell) -- the original Unix shell. On modern Linux, /bin/sh is usually a symlink to dash or bash.
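You can check what /bin/sh points to on your own system -- readlink -f follows the symlink chain to the real binary:

```shell
# Resolve the /bin/sh symlink.
# Commonly prints /usr/bin/dash on Debian/Ubuntu, /usr/bin/bash on Fedora.
readlink -f /bin/sh
```

This matters in practice: a script with #!/bin/sh that uses Bash-only features will work on Fedora but break on Debian/Ubuntu.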

For this book, use Bash. It is everywhere, and understanding Bash gives you the foundation to learn any other shell quickly.

Think About It: If Bash is the default shell on most Linux distributions, why do alternative shells like Zsh and Fish exist? What might they offer that Bash does not?


Anatomy of a Command

Every shell command follows the same basic structure:

command   [options]   [arguments]

  │          │            │
  │          │            └── What to act on (files, strings, etc.)
  │          │
  │          └── Modify the command's behavior (flags/switches)
  │
  └── The program to run

Examples:

ls                          # command only (no options, no arguments)
ls -l                       # command + option
ls -l /home                 # command + option + argument
ls -la /home /tmp           # command + multiple options + multiple arguments
cp -r /source /destination  # command + option + two arguments

Options: Short and Long Form

Most commands support both short (single dash, single letter) and long (double dash, word) options:

ls -a                 # short form: show all files (including hidden)
ls --all              # long form: same thing

ls -l                 # short form: long listing format
ls --format=long      # long form: same thing

ls -la                # short forms can be combined: -l + -a
ls -l -a              # equivalent: separate short options
ls --all --format=long  # long options cannot be merged into one word like -la

The -- Convention

A double dash by itself (--) means "end of options, everything after this is an argument":

# What if you have a file named "-l" and want to delete it?
$ rm -l         # ERROR: rm interprets -l as an option
$ rm -- -l      # CORRECT: -- tells rm that -l is a filename
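You can try this safely in a scratch directory -- note that touch needs the same -- treatment to create the file in the first place:

```shell
# Create and delete a file literally named "-l" without rm seeing it as a flag.
dir=$(mktemp -d)
cd "$dir"
touch -- -l           # create the awkwardly named file
ls                    # shows: -l
rm -- -l              # delete it ("rm ./-l" would also work)
ls -A | wc -l         # 0 -- the directory is empty again
```

The ./-l alternative works because a leading ./ stops the name from starting with a dash, so it can never be mistaken for an option.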

Your First Commands

Let us work through the essential commands, one at a time. Type each one.

pwd -- Where Am I?

$ pwd
/home/yourname

pwd stands for Print Working Directory. It tells you your current location in the filesystem. You always have a current location, and it matters because relative paths are relative to this location.

ls -- What Is Here?

# List contents of current directory
$ ls
Desktop  Documents  Downloads  Music  Pictures

# Long format: permissions, owner, size, date
$ ls -l
total 20
drwxr-xr-x 2 yourname yourname 4096 Jan 15 10:30 Desktop
drwxr-xr-x 2 yourname yourname 4096 Jan 15 10:30 Documents
drwxr-xr-x 2 yourname yourname 4096 Jan 15 10:30 Downloads
drwxr-xr-x 2 yourname yourname 4096 Jan 15 10:30 Music
drwxr-xr-x 2 yourname yourname 4096 Jan 15 10:30 Pictures

# Show hidden files (files starting with .)
$ ls -a
.  ..  .bash_history  .bashrc  .profile  Desktop  Documents  ...

# Combine: long format + hidden + human-readable sizes
$ ls -lah
total 44K
drwxr-xr-x 7 yourname yourname 4.0K Jan 15 10:30 .
drwxr-xr-x 3 root     root     4.0K Jan 14 09:00 ..
-rw------- 1 yourname yourname  256 Jan 15 10:45 .bash_history
-rw-r--r-- 1 yourname yourname  220 Jan 14 09:00 .bash_logout
-rw-r--r-- 1 yourname yourname 3.5K Jan 14 09:00 .bashrc
drwxr-xr-x 2 yourname yourname 4.0K Jan 15 10:30 Desktop
...

# List a specific directory
$ ls -l /etc/

Key flags to remember:

  • -l -- long format (permissions, owner, size, date)
  • -a -- all files, including hidden (dotfiles)
  • -h -- human-readable sizes (K, M, G instead of bytes)
  • -t -- sort by modification time (newest first)
  • -r -- reverse sort order
  • -R -- recursive (list subdirectories too)

cd -- Move Around

# Go to a directory
$ cd /var/log
$ pwd
/var/log

# Go to your home directory (three equivalent ways)
$ cd
$ cd ~
$ cd $HOME

# Go to the previous directory
$ cd -
/var/log

# Go up one level
$ cd ..
$ pwd
/var

# Go up two levels
$ cd ../..
$ pwd
/

Important concepts:

  • . means "current directory"
  • .. means "parent directory"
  • ~ means "home directory" (/home/yourname)
  • - means "previous directory" (where you just were)

cat -- Read File Contents

# Display a file's contents
$ cat /etc/hostname
ubuntu-lab

# Display with line numbers
$ cat -n /etc/passwd | head -5
     1  root:x:0:0:root:/root:/bin/bash
     2  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
     3  bin:x:2:2:bin:/bin:/usr/sbin/nologin
     4  sys:x:3:3:sys:/dev:/usr/sbin/nologin
     5  sync:x:4:65534:sync:/bin:/bin/sync

cat stands for "concatenate." Its original purpose was to concatenate multiple files, but it is most commonly used to display a single file.

Think About It: cat dumps the entire file to the screen. What happens if the file is thousands of lines long? What might be a better tool for reading long files?

echo -- Print Text

# Print a simple message
$ echo "Hello, Linux!"
Hello, Linux!

# Print the value of a variable
$ echo $HOME
/home/yourname

$ echo "My shell is $SHELL"
My shell is /bin/bash

# Print without a trailing newline
$ echo -n "no newline here"
no newline here$

# Print with escape characters
$ echo -e "Line one\nLine two\nLine three"
Line one
Line two
Line three

echo is the simplest way to produce output. You will use it constantly in scripts and one-liners.

man -- The Manual Pages

$ man ls

This opens the manual page for ls. The man page tells you:

  • What the command does
  • Every option and flag
  • Examples (sometimes)
  • Related commands

Navigating man pages:

  • Space or f -- next page
  • b -- previous page
  • /pattern -- search for "pattern"
  • n -- next search result
  • N -- previous search result
  • q -- quit

Man page sections:

# Man pages are organized into sections
$ man 1 ls      # Section 1: User commands
$ man 5 passwd  # Section 5: File formats (/etc/passwd format)
$ man 8 mount   # Section 8: System administration commands
Section   Contents
1         User commands (ls, cp, grep)
2         System calls (open, read, fork)
3         Library functions (printf, malloc)
4         Special files (/dev/*)
5         File formats (/etc/passwd, /etc/fstab)
6         Games
7         Miscellaneous (protocols, conventions)
8         System administration commands (mount, iptables)

Getting Help: Three Methods

You will need help constantly. That is normal. Here are three ways to get it.

Method 1: man pages (comprehensive)

$ man grep       # full manual
$ man -k "copy files"  # search man pages by keyword

Method 2: --help flag (quick reference)

$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
...

Almost every command supports --help. The output is shorter than a man page and usually sufficient for quick questions.

Method 3: info pages (detailed, GNU-style)

$ info coreutils   # detailed GNU documentation

info pages are more detailed than man pages for GNU tools but use a different navigation scheme. Most people prefer man pages.

Which to Use?

  • Quick question ("What flag makes ls sort by size?"): ls --help | grep size
  • Learning a new command: man command
  • Deep reference: info command or the online documentation

Tab Completion: Your Fingers Will Thank You

Tab completion is one of the most important productivity features of the shell. Press Tab to auto-complete commands, filenames, and paths.

Hands-On: Try Tab Completion

# Type "cd /e" then press Tab
$ cd /e<TAB>
$ cd /etc/              # auto-completed!

# Type "cd /etc/sys" then press Tab
$ cd /etc/sys<TAB>
$ cd /etc/sysctl.d/     # if only one match, it completes

# Type "cd /etc/s" then press Tab twice
$ cd /etc/s<TAB><TAB>
security/  shadow     shells     skel/      ssh/       ssl/       subgid     subuid     sudoers    sudoers.d/ sysctl.d/  systemd/
# Multiple matches! Shows all possibilities

# Type "cat /etc/hos" then press Tab
$ cat /etc/hos<TAB>
$ cat /etc/hostname     # completed!

# Tab also works for commands
$ sys<TAB><TAB>
systemctl    systemd-analyze    systemd-cat    ...

Rules of tab completion:

  • One match: auto-completes immediately
  • Multiple matches: press Tab twice to see all options
  • No matches: nothing happens (no beep, no output)

In Bash, install bash-completion for even better tab completion (it can complete package names, git branches, systemctl units, and more):

# Debian/Ubuntu
$ sudo apt install bash-completion

# Fedora/RHEL
$ sudo dnf install bash-completion

# Arch
$ sudo pacman -S bash-completion

Distro Note: Most distributions install bash-completion by default. If tab completion already works for the subcommands and arguments of commands like systemctl and git, you already have it.


Command History: The Shell Remembers

The shell keeps a history of every command you type. This is enormously useful.

Basic History Usage

# Show recent commands
$ history
  1  ls -la
  2  cd /etc
  3  cat hostname
  4  ping google.com
  5  sudo apt update
  ...

# Show last 10 commands
$ history 10

# Re-run command number 42
$ !42

# Re-run the last command
$ !!

# Re-run the last command that started with "sudo"
$ !sudo

# Search history interactively: press Ctrl+R, then type
$ (press Ctrl+R)
(reverse-i-search)`ping': ping -c 3 google.com
# Press Enter to run it, or Ctrl+R again for the next match

History Configuration

# Where is history stored?
$ echo $HISTFILE
/home/yourname/.bash_history

# How many commands are kept?
$ echo $HISTSIZE
1000

# You can increase these in ~/.bashrc:
# HISTSIZE=10000
# HISTFILESIZE=20000

Essential Keyboard Shortcuts

These work in Bash and most other shells:

┌───────────────────────────────────────────────────┐
│  Navigation                                        │
│  Ctrl+A      Move cursor to beginning of line      │
│  Ctrl+E      Move cursor to end of line             │
│  Alt+B       Move back one word                     │
│  Alt+F       Move forward one word                  │
│  Ctrl+←      Move back one word (some terminals)    │
│  Ctrl+→      Move forward one word                  │
│                                                     │
│  Editing                                            │
│  Ctrl+U      Delete from cursor to beginning        │
│  Ctrl+K      Delete from cursor to end              │
│  Ctrl+W      Delete the word before cursor          │
│  Alt+D       Delete the word after cursor            │
│  Ctrl+Y      Paste what was last deleted (yank)     │
│                                                     │
│  History                                            │
│  Ctrl+R      Reverse search history                 │
│  Ctrl+P      Previous command (same as Up arrow)    │
│  Ctrl+N      Next command (same as Down arrow)      │
│  ↑ / ↓       Navigate through history               │
│                                                     │
│  Control                                            │
│  Ctrl+C      Cancel current command                 │
│  Ctrl+D      Exit shell (or signal end of input)    │
│  Ctrl+L      Clear the screen (same as `clear`)     │
│  Ctrl+Z      Suspend current process (background)   │
│                                                     │
└───────────────────────────────────────────────────┘

Practice these until they are muscle memory. The difference between a beginner and an experienced Linux user often comes down to how fluently they navigate the command line.


Environment Variables

Environment variables are named values that the shell and programs use for configuration. They are a fundamental part of how Linux works.

Viewing Environment Variables

# See all environment variables
$ env
HOME=/home/yourname
USER=yourname
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=en_US.UTF-8
TERM=xterm-256color
...

# See a specific variable
$ echo $HOME
/home/yourname

$ echo $USER
yourname

$ echo $SHELL
/bin/bash

Setting Variables

# Set a variable (no spaces around =)
$ GREETING="Hello, Linux"
$ echo $GREETING
Hello, Linux

# This variable exists only in the current shell
# To make it available to child processes, export it:
$ export GREETING="Hello, Linux"

# Set and export in one step
$ export MY_APP_PORT=8080
$ echo $MY_APP_PORT
8080

Warning: There are no spaces around the = sign. NAME=value works. NAME = value fails with a confusing error (the shell tries to run NAME as a command with = and value as arguments).
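The difference between a plain variable and an exported one shows up as soon as you start a child process. A quick experiment:

```shell
# Only exported variables reach child processes.
LOCAL_ONLY="parent only"
export SHARED="visible to children"

sh -c 'echo "child sees SHARED=[$SHARED]"'          # value comes through
sh -c 'echo "child sees LOCAL_ONLY=[$LOCAL_ONLY]"'  # empty -- not exported
```

This is why configuration scripts export variables like PATH and EDITOR: every program you launch is a child of your shell and inherits only the exported ones.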

Important Environment Variables

Variable   Purpose                     Example Value
HOME       Your home directory         /home/yourname
USER       Current username            yourname
SHELL      Default shell               /bin/bash
PATH       Where to find commands      /usr/local/bin:/usr/bin:/bin
PWD        Current working directory   /home/yourname
EDITOR     Default text editor         vim or nano
LANG       Language/locale setting     en_US.UTF-8
TERM       Terminal type               xterm-256color
HOSTNAME   Machine name                ubuntu-lab

PATH: How the Shell Finds Commands

When you type ls, how does the shell know where the ls program is? The answer is the PATH variable.

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

PATH is a colon-separated list of directories. When you type a command, the shell searches these directories in order, left to right, looking for an executable file with that name.

You type: ls

Shell searches:
  /usr/local/sbin/ls  -- not found
  /usr/local/bin/ls   -- not found
  /usr/sbin/ls        -- not found
  /usr/bin/ls         -- FOUND! Runs /usr/bin/ls
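You can watch this search order in action by putting two executables with the same name in different directories. Everything below lives under a throwaway /tmp/pathdemo directory created just for the demo:

```shell
#!/bin/bash
# Two scripts named "hello"; the one in the directory listed
# first in PATH is the one that runs.
mkdir -p /tmp/pathdemo/first /tmp/pathdemo/second
printf '#!/bin/sh\necho first\n'  > /tmp/pathdemo/first/hello
printf '#!/bin/sh\necho second\n' > /tmp/pathdemo/second/hello
chmod +x /tmp/pathdemo/first/hello /tmp/pathdemo/second/hello

export PATH="/tmp/pathdemo/first:/tmp/pathdemo/second:$PATH"
hash -r      # forget any cached command locations
hello        # prints "first"
```

Swap the order of the two directories in PATH and the same command prints "second" instead.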

Finding Where a Command Lives

# which: shows the path to the command that would run
$ which ls
/usr/bin/ls

$ which python3
/usr/bin/python3

# type: shows more detail (aliases, builtins, etc.)
$ type ls
ls is aliased to `ls --color=auto'

$ type cd
cd is a shell builtin

$ type python3
python3 is /usr/bin/python3

# whereis: finds the binary, source, and man page
$ whereis ls
ls: /usr/bin/ls /usr/share/man/man1/ls.1.gz

Modifying PATH

# Add a directory to PATH (temporary, current session only)
$ export PATH="$HOME/bin:$PATH"
$ echo $PATH
/home/yourname/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# To make it permanent, add the export line to ~/.bashrc:
$ echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc

Think About It: What happens if you have two different programs both named python3, one in /usr/bin/ and one in /usr/local/bin/? Which one runs when you type python3? How would you change which one runs?


Hands-On: Putting It All Together

Let us chain together everything you have learned in a real scenario.

Scenario: Investigate an Unknown Linux System

You have just SSH'd into a server you have never seen before. Gather information:

# Who am I?
$ whoami
yourname

# What machine is this?
$ hostname
prod-web-03

# What distribution?
$ cat /etc/os-release | head -2
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"

# What kernel?
$ uname -r
5.15.0-91-generic

# How long has it been running?
$ uptime
 14:32:07 up 127 days,  3:21,  1 user,  load average: 0.45, 0.38, 0.35

# How much memory?
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       2.1Gi       312Mi       128Mi       1.4Gi       1.3Gi
Swap:          2.0Gi       128Mi       1.9Gi

# How much disk space?
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   32G   15G  69% /

# What is running? (top 5 CPU consumers)
$ ps aux --sort=-%cpu | head -6
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1234 12.3  5.2 512000 201000 ?       Sl   Jan01 245:32 nginx: worker
mysql     5678  8.7 15.3 1024000 590000 ?      Sl   Jan01 170:45 /usr/sbin/mysqld
root      9012  1.2  0.5  65000  19000 ?       Ss   Jan01  23:10 /usr/sbin/sshd
...

# What ports are open?
$ ss -tlnp
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      511    0.0.0.0:80           0.0.0.0:*         users:(("nginx",pid=1233))
LISTEN 0      511    0.0.0.0:443          0.0.0.0:*         users:(("nginx",pid=1233))
LISTEN 0      151    127.0.0.1:3306       0.0.0.0:*         users:(("mysqld",pid=5678))
LISTEN 0      128    0.0.0.0:22           0.0.0.0:*         users:(("sshd",pid=9012))

In under a minute, using only the commands from this chapter, you have determined that this is an Ubuntu web server running Nginx and MySQL, with 69% disk usage (might need attention), and it has been up for 127 days. This is the power of the shell.
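The whole walkthrough condenses into a small recon script you can paste onto any box. This is a sketch, not a standard tool -- the output will differ from machine to machine, and free may be absent on minimal installs:

```shell
#!/bin/bash
# Quick "whose box is this?" recon using only commands from this chapter.
echo "User:    $(whoami)"
echo "Host:    $(uname -n)"
. /etc/os-release 2>/dev/null && echo "Distro:  $PRETTY_NAME"
echo "Kernel:  $(uname -r)"
echo "Uptime:  $(uptime)"
free -h 2>/dev/null | head -2       # memory summary (if available)
df -h / | tail -1                   # root filesystem usage
```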


Debug This

You are trying to run a program you just downloaded, but the shell cannot find it:

$ my-tool --version
bash: my-tool: command not found

$ ls ~/downloads/my-tool
/home/yourname/downloads/my-tool

The file exists. Why can the shell not find it? Fix it.

Diagnosis

Problem 1: The directory is not in PATH.

The shell only looks in directories listed in $PATH. ~/downloads/ is not in PATH.

Fix options:

# Option A: Run with full path
$ ~/downloads/my-tool --version

# Option B: Run with relative path
$ ./downloads/my-tool --version   # if you are in ~

# Option C: Add the directory to PATH
$ export PATH="$HOME/downloads:$PATH"
$ my-tool --version

Problem 2: The file might not be executable.

$ ls -l ~/downloads/my-tool
-rw-r--r-- 1 yourname yourname 102400 Jan 15 10:00 my-tool

No x in the permissions. Fix:

$ chmod +x ~/downloads/my-tool
$ ~/downloads/my-tool --version

Best practice: Move custom tools to ~/bin/ or /usr/local/bin/ and make sure those directories are in your PATH.
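The ~/bin part of that advice can be automated idempotently, so the snippet is safe to paste into ~/.bashrc even if it has already run once. A sketch:

```shell
#!/bin/bash
# Create ~/bin and prepend it to PATH only if it is not already there.
mkdir -p "$HOME/bin"
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                           # already on PATH
    *) export PATH="$HOME/bin:$PATH" ;;
esac
echo "$PATH" | tr ':' '\n' | grep -Fx "$HOME/bin"   # confirm it is listed
```

The case guard prevents the classic bug of PATH growing a duplicate entry every time the file is sourced.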


The Shell Startup Files

When Bash starts, it reads configuration files that set up your environment. Understanding which files are read and when is important.

Login Shell vs Non-Login Shell

┌──────────────────────────────────────┐
│         Login Shell                  │
│  (SSH, console login, su -)          │
│                                      │
│  Reads:                              │
│  1. /etc/profile                     │
│  2. ~/.bash_profile                  │
│     (or ~/.bash_login                │
│      or ~/.profile)                  │
│                                      │
├──────────────────────────────────────┤
│     Non-Login (Interactive) Shell    │
│  (opening a new terminal window)     │
│                                      │
│  Reads:                              │
│  1. /etc/bash.bashrc  (Debian)       │
│     /etc/bashrc       (RHEL)         │
│  2. ~/.bashrc                        │
│                                      │
└──────────────────────────────────────┘

In practice, most people put their customizations in ~/.bashrc because ~/.bash_profile typically sources ~/.bashrc anyway. Check yours:

$ cat ~/.bash_profile
# If this file exists, it often contains:
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

What Goes in ~/.bashrc?

# Aliases (shortcuts)
alias ll='ls -lah'
alias la='ls -A'
alias ..='cd ..'
alias ...='cd ../..'

# Custom PATH
export PATH="$HOME/bin:$PATH"

# Default editor
export EDITOR=vim

# History settings
HISTSIZE=10000
HISTFILESIZE=20000

# Custom prompt (covered more in Chapter 18)
PS1='\u@\h:\w\$ '

After editing ~/.bashrc, reload it:

$ source ~/.bashrc
# or equivalently
$ . ~/.bashrc

Distro Note: On Debian/Ubuntu, the system-wide bashrc is /etc/bash.bashrc. On Fedora/RHEL, it is /etc/bashrc. Both serve the same purpose.


Combining Commands

The shell has several ways to combine commands. Here is a quick preview (Chapter 18 goes deeper):

# Run commands sequentially (regardless of success/failure)
$ command1 ; command2

# Run command2 only if command1 succeeds
$ command1 && command2

# Run command2 only if command1 fails
$ command1 || command2

# Pipe: send output of command1 as input to command2
$ command1 | command2

# Examples:
$ mkdir mydir && cd mydir        # create dir, then enter it
$ sudo apt update && sudo apt upgrade -y   # update, then upgrade
$ cat /etc/passwd | grep yourname          # find your line in passwd
$ ls /nonexistent || echo "Directory not found"   # handle failure
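All of these operators key off the exit status of the left-hand command (0 means success, anything else means failure). The built-ins true and false make this easy to see:

```shell
#!/bin/bash
# true always exits 0 (success); false always exits non-zero (failure).
true  && echo "runs: left side succeeded"
false || echo "runs: left side failed"
false && echo "skipped: && wants success"
true  || echo "skipped: || wants failure"
```

Only the first two echo commands run; the last two are skipped.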

What Just Happened?

┌─────────────────────────────────────────────────────┐
│                                                      │
│  In this chapter, you learned:                       │
│                                                      │
│  - The terminal is the window. The shell is the      │
│    program inside it. Bash is the default shell.     │
│                                                      │
│  - Command anatomy: command [options] [arguments].   │
│    Options modify behavior, arguments specify        │
│    targets.                                          │
│                                                      │
│  - Essential commands: pwd, ls, cd, cat, echo, man.  │
│    These are your daily tools.                       │
│                                                      │
│  - Tab completion saves keystrokes and prevents      │
│    typos. Use it relentlessly.                       │
│                                                      │
│  - Command history (Ctrl+R, !!, history) lets you    │
│    reuse and search past commands.                   │
│                                                      │
│  - Environment variables (HOME, USER, PATH, SHELL)   │
│    configure the shell and programs.                 │
│                                                      │
│  - PATH tells the shell where to find commands.      │
│    Modify it to add custom tool locations.           │
│                                                      │
│  - Getting help: man, --help, info. man is your      │
│    primary reference.                                │
│                                                      │
│  - Keyboard shortcuts (Ctrl+A, Ctrl+E, Ctrl+R,       │
│    Ctrl+C, Ctrl+L) make you fast.                    │
│                                                      │
│  - ~/.bashrc is where you customize your shell.      │
│                                                      │
│  - Commands can be combined with ;, &&, ||, and |.   │
│                                                      │
└─────────────────────────────────────────────────────┘

Try This

Exercises

  1. Navigation exercise: Starting from your home directory, use cd and ls to explore these directories: /etc, /var/log, /usr/bin, /tmp, /proc. For each, use ls to see what is inside and pwd to confirm your location. Return home with cd (no arguments).

  2. man page exercise: Read the man page for ls (man ls). Find the option that sorts files by size. Find the option that shows file sizes in human-readable format. Run ls with both options on /usr/bin.

  3. History exercise: Use history to find a command you ran earlier. Re-run it using !number. Practice using Ctrl+R to search your history for the word "ls".

  4. PATH exercise: Run echo $PATH and identify each directory listed. Use ls to see what is in /usr/local/bin. How many executables are in /usr/bin? (Hint: ls /usr/bin | wc -l)

  5. Variable exercise: Set a variable MY_NAME to your name. Echo it. Export it. Start a new Bash session (bash) and verify the variable is still set. Exit the sub-shell (exit). Now try without export -- what happens in the sub-shell?

  6. Help exercise: For each of these commands, find out what they do using ONLY man or --help: wc, sort, head, tail, touch. Write a one-sentence description of each.

  7. Keyboard shortcuts exercise: Type a long command (do not press Enter). Practice: move to the beginning with Ctrl+A, move to the end with Ctrl+E, delete the entire line with Ctrl+U, bring it back with Ctrl+Y. Clear the screen with Ctrl+L.

Bonus Challenge

Customize your ~/.bashrc with at least three aliases that will save you time. For example:

alias update='sudo apt update && sudo apt upgrade -y'
alias ports='ss -tlnp'
alias myip='curl -s ifconfig.me && echo'

Source your .bashrc and test each alias. Think about what commands you type most often and create aliases for them.


What's Next

You can navigate, list files, read file contents, get help, and customize your shell. In Chapter 5, we go deeper into the filesystem -- the hierarchy of directories that organizes everything in Linux, and the profound idea that in Linux, everything is a file.

The Linux Filesystem Hierarchy

Why This Matters

You have just joined a team as a junior systems administrator. Your first ticket reads: "Application logs are eating up disk space on the web servers. Clean up and figure out where they're writing." You SSH in and stare at the root of the filesystem. There are twenty-odd directories -- /var, /tmp, /usr, /opt, /etc -- and you have no idea where to look.

This is not a hypothetical situation. Every Linux professional hits this wall on day one. Unlike Windows, where programs scatter files across C:\Program Files, C:\Users, and the Registry, Linux follows a well-defined plan called the Filesystem Hierarchy Standard (FHS). Learn it once, and you will always know where to find configuration files, log files, binaries, libraries, and temporary data -- on any distribution, on any server, anywhere in the world.

By the end of this chapter, the Linux directory tree will feel like your own neighborhood.


Try This Right Now

Open a terminal and run these commands. Do not worry about understanding every detail yet -- we will cover each one shortly.

# See the top-level directory structure
ls /

# Get a tree view (install tree if needed)
tree -L 1 /

# How big is each top-level directory?
du -sh /* 2>/dev/null | sort -rh | head -15

# Where is the 'ls' command actually stored?
which ls
file $(which ls)

You should see something like this from ls /:

bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

Each of those directories has a specific purpose. Let us walk through every single one.


The Root of Everything: /

In Linux, there is one single directory tree. Everything -- every file, every device, every running process -- lives somewhere under / (called "root"). There are no drive letters like C: or D:. If you plug in a USB drive, it appears as a directory under /mnt or /media. If you add a second hard disk, it gets mounted somewhere inside this same tree.

/                         <-- The root of the entire filesystem
├── bin/                  <-- Essential user commands
├── boot/                 <-- Bootloader and kernel
├── dev/                  <-- Device files
├── etc/                  <-- System configuration
├── home/                 <-- User home directories
├── lib/                  <-- Essential shared libraries
├── media/                <-- Removable media mount points
├── mnt/                  <-- Temporary mount points
├── opt/                  <-- Optional/third-party software
├── proc/                 <-- Process and kernel info (virtual)
├── root/                 <-- Root user's home directory
├── run/                  <-- Runtime variable data
├── sbin/                 <-- Essential system binaries
├── srv/                  <-- Service data (web, FTP)
├── sys/                  <-- Kernel and device info (virtual)
├── tmp/                  <-- Temporary files
├── usr/                  <-- User programs and data
└── var/                  <-- Variable data (logs, mail, spool)

This layout is not arbitrary. It is defined by the Filesystem Hierarchy Standard (FHS), maintained by the Linux Foundation. Most major distributions (Debian, Ubuntu, Fedora, RHEL, SUSE, Arch) follow it, with minor variations.

Think About It: Why does Linux use a single unified tree instead of drive letters? What advantages does this give you as a sysadmin managing servers with dozens of disks?


Walking the Tree: Directory by Directory

/bin -- Essential User Binaries

This directory contains the commands that every user needs, even in single-user (rescue) mode. Think of the absolute basics: ls, cp, mv, rm, cat, echo, mkdir, chmod, grep.

ls /bin | head -20

On modern systems (Fedora, Ubuntu 20.04+, Arch), /bin is actually a symbolic link to /usr/bin. This is called the UsrMerge -- we will explain why shortly.

# Check if /bin is a symlink
ls -ld /bin
# Likely output: lrwxrwxrwx 1 root root 7 ... /bin -> usr/bin

Distro Note: On older CentOS 6 / Debian 8 systems, /bin and /usr/bin are separate directories. On modern Fedora, Ubuntu 20.04+, and Arch, they have been merged. If you are working on an older system, the distinction matters for rescue scenarios.

/sbin -- System Binaries

Similar to /bin, but these are commands that typically require root privileges: fdisk, mkfs, iptables, reboot, shutdown, ip.

ls /sbin | head -20

# Try running one as a regular user
fdisk -l
# You'll likely get "Permission denied" or empty output

# Now with sudo
sudo fdisk -l

On merged systems, /sbin is a symlink to /usr/sbin.

/etc -- Configuration Files

This is where the system-wide configuration lives. Every major service stores its config here. The name historically comes from "et cetera" -- it was originally a catch-all, but today it is strictly for configuration.

# List the top-level config files and directories
ls /etc

# Some important files you should know about:
cat /etc/hostname          # System hostname
cat /etc/os-release        # Distribution info
cat /etc/passwd            # User accounts (not actually passwords!)
cat /etc/fstab             # Filesystem mount table
ls /etc/ssh/               # SSH server/client config
ls /etc/systemd/           # systemd unit overrides

Key principle: configuration files in /etc are plain text. You can read, edit, back up, and version-control them. This is a major advantage of Linux -- there is no opaque binary registry.

# Find all config files modified in the last 24 hours
find /etc -type f -mtime -1 2>/dev/null

/home -- User Home Directories

Every regular user gets a directory here. Your personal files, shell configuration (.bashrc, .profile), SSH keys (.ssh/), and application settings all live under your home directory.

ls /home

# Your home directory
echo $HOME
ls -la ~

# Important dot-files
ls -la ~/.bashrc ~/.profile ~/.ssh/ 2>/dev/null

The ~ (tilde) character is a shortcut for your home directory. ~alice means /home/alice.
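One gotcha worth knowing now: tilde expansion only happens when the ~ is unquoted:

```shell
echo ~        # expands to your home directory, e.g. /home/yourname
echo "~"      # quoted: prints a literal ~
echo ~/bin    # expands, e.g. /home/yourname/bin
```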

/root -- Root User's Home

The root user does not live in /home/root. Instead, root's home directory is /root. Why? Because /home might be on a separate partition that has not yet been mounted during rescue operations. The root user needs a home directory that is always available on the root filesystem.

sudo ls -la /root

/var -- Variable Data

This is where data that changes during normal system operation lives. This is where you look when logs are eating disk space (remember our opening scenario?).

# Key subdirectories
ls /var

# System logs -- your first stop for troubleshooting
ls /var/log/
sudo tail -20 /var/log/syslog       # Debian/Ubuntu
sudo tail -20 /var/log/messages     # RHEL/Fedora

# Package manager cache
du -sh /var/cache/apt 2>/dev/null    # Debian/Ubuntu
du -sh /var/cache/dnf 2>/dev/null    # Fedora

# Mail spool
ls /var/mail/ 2>/dev/null

# Variable runtime data
ls /var/run    # Usually a symlink to /run

Distro Note: On Debian/Ubuntu the main system log is /var/log/syslog. On RHEL/Fedora/CentOS it is /var/log/messages. On systems using only journald, use journalctl instead.

Common /var subdirectories:

Directory      Purpose
/var/log/      System and application log files
/var/cache/    Application cache data
/var/lib/      Variable state information (databases, etc.)
/var/spool/    Queued data (print jobs, mail, cron)
/var/tmp/      Temporary files preserved across reboots
/var/run/      Runtime data (PID files, sockets)

/tmp -- Temporary Files

Any user or process can create files here. Files in /tmp are typically cleared on reboot (and on many systems, periodically by systemd-tmpfiles or tmpwatch).

ls /tmp

# Create a temp file
echo "test" > /tmp/mytest.txt
cat /tmp/mytest.txt

# Check if /tmp is a tmpfs (RAM-based filesystem)
df -h /tmp
mount | grep /tmp

Safety Warning: Never store anything important in /tmp. It may be cleared at any time. Also, because it is world-writable, be cautious about security -- other users can see filenames (though not necessarily contents) in /tmp.
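When a script needs scratch space, ask mktemp for a unique, unpredictable name instead of hard-coding one -- a predictable name like /tmp/myapp.log can be pre-created or symlinked by another user. A minimal pattern (the myapp prefix is arbitrary):

```shell
#!/bin/bash
# Create a private temp file with a random suffix, and clean up on exit.
tmpfile=$(mktemp /tmp/myapp.XXXXXX)        # e.g. /tmp/myapp.Kx3b9Q
trap 'rm -f "$tmpfile"' EXIT               # runs however the script exits
echo "scratch data" > "$tmpfile"
ls -l "$tmpfile"
```

The trap guarantees the file is removed even if the script is interrupted partway through.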

/usr -- User Programs (The Big One)

/usr is typically the largest directory on the system. Despite its name, it is not about users -- historically it held user home directories, but today the name is usually back-explained as "Unix System Resources". It contains the bulk of installed software.

ls /usr

du -sh /usr

/usr/
├── bin/          <-- Most user commands (merged with /bin)
├── sbin/         <-- Most admin commands (merged with /sbin)
├── lib/          <-- Libraries for /usr/bin and /usr/sbin
├── lib64/        <-- 64-bit libraries
├── include/      <-- C/C++ header files
├── share/        <-- Architecture-independent data (docs, man pages)
├── local/        <-- Locally installed software (not from package manager)
└── src/          <-- Source code (kernel headers, etc.)

The key subdirectory to know is /usr/local/. When you compile and install software from source (using make install), it goes here by default. This keeps it separate from package-manager-installed software.

# Software from your package manager
which nginx     # /usr/bin/nginx or /usr/sbin/nginx

# Software you compiled yourself
ls /usr/local/bin/

/opt -- Optional Software

Third-party commercial or self-contained software packages often install here. Examples: Google Chrome (/opt/google/chrome), some monitoring agents, vendor-specific tools.

ls /opt 2>/dev/null

# Each application gets its own subdirectory
# /opt/vendor_name/application_name

Think About It: When should software go in /usr/local versus /opt? The convention is: /usr/local for software that follows the traditional Unix layout (bin, lib, share), and /opt for self-contained packages that keep everything in one directory.

/boot -- Boot Files

Contains the Linux kernel, initial RAM disk (initramfs/initrd), and bootloader configuration (GRUB).

ls /boot

# You'll typically see:
# vmlinuz-*        -- The kernel
# initramfs-* or initrd-*  -- Initial RAM filesystem
# grub/ or grub2/  -- GRUB bootloader config

Safety Warning: Do not delete files in /boot unless you know exactly what you are doing. Deleting the wrong kernel image will make your system unbootable.

/dev -- Device Files

This is where the "everything is a file" philosophy shines. Hardware devices are represented as files here. Your hard disk is /dev/sda. Your terminal is /dev/tty. Random numbers come from /dev/urandom.

ls /dev | head -30

# Your disks
ls /dev/sd*  2>/dev/null   # SATA/SCSI disks
ls /dev/nvme* 2>/dev/null  # NVMe drives
ls /dev/vd*  2>/dev/null   # Virtual disks (VMs)

# Special devices
echo "Hello" > /dev/null   # The black hole -- discards everything
head -c 16 /dev/urandom | xxd   # Random bytes

# Your terminal
tty                        # Shows your terminal device
echo "Hello" > $(tty)      # Writes to your own terminal

We will dive deep into /dev in Chapter 8 when we discuss device files and the VFS.

/proc -- Process Information (Virtual)

This directory does not exist on disk. It is a virtual filesystem generated by the kernel in real time. Each numbered directory corresponds to a running process (by PID). It also exposes kernel parameters.

# List running processes
ls /proc | head -20

# Info about process 1 (init/systemd)
sudo cat /proc/1/cmdline | tr '\0' ' ' ; echo
sudo ls /proc/1/fd       # File descriptors

# System information
cat /proc/cpuinfo | head -20
cat /proc/meminfo | head -10
cat /proc/version
cat /proc/uptime

# Kernel tunable parameters
cat /proc/sys/net/ipv4/ip_forward
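There is also a handy shortcut: /proc/self is a symlink that always resolves to whichever process is doing the looking. That leads to a slightly mind-bending demo:

```shell
ls -ld /proc/self          # a symlink pointing at some PID
echo $$                    # your shell's PID: try ls /proc/$$/
cat /proc/self/comm
# prints: cat
# because /proc/self resolves to the *reader* -- the cat process itself
```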

/sys -- System and Device Info (Virtual)

Another virtual filesystem, introduced in Linux 2.6. While /proc focuses on processes, /sys exposes a structured view of the kernel's device model: hardware, drivers, buses, and kernel subsystems.

# Block devices
ls /sys/block/

# CPU information
ls /sys/devices/system/cpu/

# Brightness control (on laptops)
cat /sys/class/backlight/*/brightness 2>/dev/null

# Network interface info
ls /sys/class/net/
cat /sys/class/net/eth0/address 2>/dev/null

/mnt and /media -- Mount Points

/mnt is traditionally used for temporarily mounting filesystems (an NFS share, an extra disk). /media is used by desktop environments for automatically mounting removable media (USB drives, CDs).

ls /mnt
ls /media

# Manually mount an ISO
sudo mkdir -p /mnt/iso
sudo mount -o loop some-image.iso /mnt/iso

/srv -- Service Data

Meant for data served by the system: web server document roots, FTP files, etc. In practice, many administrators use /var/www for web content instead, but the FHS recommends /srv.

/run -- Runtime Data

A tmpfs filesystem that holds runtime information: PID files, sockets, lock files. It is cleared on every boot. On many systems, /var/run is a symlink to /run.

ls /run
# You'll see: systemd/, user/, lock, utmp, etc.

# PID files for running services
cat /run/sshd.pid 2>/dev/null

The "Everything Is a File" Philosophy

This is one of the most important ideas in Unix and Linux. In Linux:

  • A regular file is a file
  • A directory is a file (that contains a list of other files)
  • A hardware device is a file (/dev/sda)
  • A running process's info is a file (/proc/1234/status)
  • A network socket can be a file
  • A pipe between two processes is a file
  • Even kernel parameters are files (/proc/sys/*)

This means you can use the same tools -- cat, echo, read, write -- to interact with hardware, processes, and kernel settings, not just text files.

# Read CPU info like a file
cat /proc/cpuinfo | grep "model name" | head -1

# Write to a kernel parameter like a file
cat /proc/sys/net/ipv4/ip_forward         # Read it
sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'  # Write to it

# Read from a device like a file
sudo head -c 512 /dev/sda | xxd | head    # Read raw disk bytes

Safety Warning: The command head -c 512 /dev/sda reads raw bytes from your disk. It is safe (read-only). But never write to /dev/sda directly unless you want to destroy data. Commands like dd if=/dev/zero of=/dev/sda will wipe your disk instantly with no confirmation.


Hands-On: Exploring the Filesystem

Let us put this knowledge to work. Follow along on your own system.

Exercise 1: Finding Where Things Live

# Where is the bash binary?
which bash
# /usr/bin/bash

# Where does bash keep its config?
ls /etc/bash*
# /etc/bash.bashrc  /etc/bash_completion  etc.

# Where are bash's man pages?
man -w bash
# /usr/share/man/man1/bash.1.gz

Notice the pattern: binary in /usr/bin, config in /etc, docs in /usr/share.
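That split -- binary in /usr/bin, config in /etc, docs in /usr/share -- holds for almost every packaged tool. A small loop makes the pattern easy to spot-check (the command list is arbitrary, and man -w reports nothing when man pages are not installed):

```shell
#!/bin/bash
# Show where the binary and man page of a few commands live.
for cmd in bash ls grep; do
    printf '%-6s binary: %-20s man: %s\n' "$cmd" \
        "$(command -v "$cmd")" \
        "$(man -w "$cmd" 2>/dev/null || echo 'not found')"
done
```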

Exercise 2: Using find to Search the Tree

# Find all files named "sshd_config"
sudo find / -name "sshd_config" 2>/dev/null

# Find all log files larger than 100MB
sudo find /var/log -size +100M 2>/dev/null

# Find config files modified in the last hour
sudo find /etc -type f -mmin -60 2>/dev/null

# Find all SUID binaries (security audit)
sudo find / -perm -4000 -type f 2>/dev/null

Exercise 3: Using tree for Visual Exploration

# Install tree if needed
# Debian/Ubuntu: sudo apt install tree
# Fedora: sudo dnf install tree
# Arch: sudo pacman -S tree

# Two levels deep from root
tree -L 2 / 2>/dev/null | head -50

# Show only directories
tree -d -L 2 /etc | head -40

# Show with file sizes
tree -sh /var/log 2>/dev/null | head -30

Exercise 4: Checking Disk Usage by Directory

# Top-level disk usage
du -sh /* 2>/dev/null | sort -rh

# What's eating space in /var?
sudo du -sh /var/*/ 2>/dev/null | sort -rh

# What's eating space in /var/log?
sudo du -sh /var/log/* 2>/dev/null | sort -rh | head -10

The UsrMerge: Why /bin Points to /usr/bin

On modern distributions, you will notice that /bin, /sbin, /lib, and /lib64 are symlinks:

ls -ld /bin /sbin /lib
# lrwxrwxrwx 1 root root 7 ... /bin -> usr/bin
# lrwxrwxrwx 1 root root 8 ... /sbin -> usr/sbin
# lrwxrwxrwx 1 root root 7 ... /lib -> usr/lib

Historically, the split between /bin and /usr/bin existed because /usr might be on a separate disk that was not available during early boot. The essential boot commands had to live in /bin on the root filesystem. Modern systems mount all filesystems before user space starts (via initramfs), so the split is no longer necessary. Merging simplifies packaging and avoids confusing questions like "should this binary go in /bin or /usr/bin?"
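Checking which layout a given system uses takes one test (-L asks whether a path is a symbolic link):

```shell
#!/bin/bash
# Report whether this system uses the merged /usr layout.
if [ -L /bin ]; then
    echo "/bin -> $(readlink /bin)    (merged layout)"
else
    echo "/bin is a real directory    (unmerged layout)"
fi
```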


Debug This

A colleague tells you: "I installed a custom-compiled application yesterday but I can't find it. I used ./configure && make && sudo make install with default settings. Where did it go?"

Try to answer before reading the solution.

Solution

When you compile from source with default settings, make install puts binaries in /usr/local/bin, libraries in /usr/local/lib, and config files in /usr/local/etc. Check:

ls /usr/local/bin/
ls /usr/local/lib/
ls /usr/local/etc/

If the shell still cannot find the binary after installation, it is likely because /usr/local/bin is missing from the user's PATH (rare but possible). Check with:

echo $PATH | tr ':' '\n' | grep local

What Just Happened?

+------------------------------------------------------------------+
|                    CHAPTER 5 RECAP                                |
+------------------------------------------------------------------+
|                                                                    |
|  - Linux has ONE directory tree, rooted at /                      |
|  - The FHS defines where everything goes                          |
|  - /bin, /sbin    -> Essential commands (often merged to /usr)    |
|  - /etc           -> System-wide configuration (plain text!)      |
|  - /home          -> User home directories                        |
|  - /var           -> Variable data: logs, caches, spool           |
|  - /tmp           -> Temporary files (cleared on reboot)          |
|  - /usr           -> Bulk of installed programs and libraries     |
|  - /usr/local     -> Locally compiled software                    |
|  - /opt           -> Self-contained third-party software          |
|  - /proc, /sys    -> Virtual filesystems (kernel info)            |
|  - /dev           -> Device files (everything is a file!)         |
|  - /boot          -> Kernel and bootloader                        |
|  - /mnt, /media   -> Mount points for extra filesystems           |
|                                                                    |
|  Tools: ls, find, tree, du, which, file, cat, mount              |
|                                                                    |
+------------------------------------------------------------------+

Try This

Exercises

  1. Map your system. Run du -sh /* 2>/dev/null | sort -rh and identify which three directories consume the most space. Can you explain why?

  2. Track down a service. Pick a service running on your system (e.g., sshd or cron). Find its binary (using which), its configuration file (in /etc), its log file (in /var/log), and its PID file (in /run).

  3. Explore /proc. Find your shell's PID with echo $$, then explore /proc/<PID>/. Read status, cmdline, environ, and list fd/. What do you discover?

  4. Find the largest log file. Using find and du, locate the single largest file in /var/log. What service is writing it?

  5. Predict the location. Without searching, predict where each of these lives, then verify:

    • The timezone configuration
    • The system's DNS resolver settings
    • The top command binary
    • The kernel's view of mounted filesystems

Bonus Challenge

Write a shell script called fs-audit.sh that:

  • Lists each top-level directory under / with its size
  • Counts the number of files in /etc
  • Shows the 5 largest files in /var/log
  • Reports whether /tmp is a tmpfs or a regular directory
  • Checks whether /bin is a symlink or a real directory
Here is one possible solution:

#!/bin/bash
echo "=== Filesystem Audit ==="
echo ""
echo "--- Top-level directory sizes ---"
du -sh /* 2>/dev/null | sort -rh
echo ""
echo "--- Number of files in /etc ---"
find /etc -type f 2>/dev/null | wc -l
echo ""
echo "--- 5 largest files in /var/log ---"
sudo find /var/log -type f -exec du -sh {} + 2>/dev/null | sort -rh | head -5
echo ""
echo "--- /tmp filesystem type ---"
df -T /tmp | tail -1
echo ""
echo "--- Is /bin a symlink? ---"
if [ -L /bin ]; then
    echo "Yes: /bin -> $(readlink /bin)"
else
    echo "No, /bin is a real directory"
fi

Next up: Chapter 6 -- Files, Permissions & Ownership. Now that you know where everything lives, you need to understand who can access it and how.

Files, Permissions & Ownership

Why This Matters

It is 2 AM. A deployment just went out and the application is returning 403 Forbidden errors. The developer says "it works on my machine." You SSH into the production server, check the application directory, and discover that the deployment script set the wrong file permissions -- the web server process cannot read the files it needs to serve. You run one chmod command and the site comes back to life.

Permission problems are among the most common issues in Linux administration. They are also among the quickest to fix -- if you understand how they work. This chapter gives you a thorough, hands-on understanding of Linux file types, permission bits, ownership, and the more advanced features like SUID, sticky bits, and ACLs.


Try This Right Now

# Look at permissions on some files
ls -l /etc/passwd
ls -l /etc/shadow
ls -la ~/

# Create a test file and inspect it
touch /tmp/permtest
ls -l /tmp/permtest

# What is your current umask?
umask

# Who are you?
id
whoami
groups

Look at the output of ls -l /etc/passwd:

-rw-r--r-- 1 root root 2847 Jan 15 10:30 /etc/passwd

That string -rw-r--r-- encodes the file type and permissions. By the end of this chapter, you will be able to read and set these in your sleep.


File Types in Linux

Linux has seven file types. The first character in the ls -l output tells you which type you are looking at.

Symbol   Type                Example
-        Regular file        /etc/passwd
d        Directory           /home/alice
l        Symbolic link       /bin -> /usr/bin
b        Block device        /dev/sda
c        Character device    /dev/tty
s        Socket              /run/systemd/journal/socket
p        Named pipe (FIFO)   Created with mkfifo

Let us see each one:

# Regular file
ls -l /etc/hostname
# -rw-r--r-- ...

# Directory
ls -ld /etc
# drwxr-xr-x ...

# Symbolic link
ls -l /bin
# lrwxrwxrwx ... /bin -> usr/bin

# Block device (disk)
ls -l /dev/sda 2>/dev/null || ls -l /dev/vda 2>/dev/null
# brw-rw---- ...

# Character device (terminal)
ls -l /dev/tty
# crw-rw-rw- ...

# Socket
ls -l /run/systemd/journal/dev-log 2>/dev/null
# srw-rw-rw- ...

# Named pipe
mkfifo /tmp/mypipe
ls -l /tmp/mypipe
# prw-r--r-- ...
rm /tmp/mypipe

Think About It: Why does Linux distinguish between block devices and character devices? Think about how you access a hard disk (in blocks of data) versus a keyboard (one character at a time).
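If you would rather not memorize the first-column letters, stat can name the type in words. A small sketch using files created in a scratch directory:

```shell
# Create one example of several file types in a scratch directory
d=$(mktemp -d)
touch "$d/regular"
mkdir "$d/subdir"
ln -s regular "$d/link"
mkfifo "$d/pipe"

# %F prints the file type as words, %n the file name
stat -c '%F : %n' "$d"/*
```

You should see `regular empty file`, `directory`, `symbolic link`, and `fifo` in the output, matching the `-`, `d`, `l`, and `p` letters from the table above.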


Understanding Permission Bits

Every file has three sets of permission bits:

  Owner   Group   Others
  r w x   r w x   r w x

Permission    On a File            On a Directory
r (read)      Read file contents   List directory contents (ls)
w (write)     Modify file          Create/delete files in directory
x (execute)   Run as program       Enter directory (cd)

Let us decode the string -rw-r--r--:

-    rw-    r--    r--
|    |      |      |
|    |      |      +-- Others: read only
|    |      +--------- Group: read only
|    +---------------- Owner: read + write
+--------------------- Type: regular file
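You can check your reading of any permission string with stat, which prints the symbolic form (%A) alongside the octal form (%a) covered later in this chapter. A quick sketch on a scratch file:

```shell
# Create a scratch file and set a known mode
f=$(mktemp)
chmod 644 "$f"

# %A = symbolic, %a = octal, %n = file name
stat -c '%A %a %n' "$f"
# -rw-r--r-- 644 /tmp/tmp.XXXXXXXX
```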

Hands-On: See Permissions in Action

# Create a test directory
mkdir -p /tmp/permlab
cd /tmp/permlab

# Create a file
echo "Hello, permissions!" > testfile.txt
ls -l testfile.txt
# -rw-r--r-- 1 youruser yourgroup 20 ... testfile.txt

# Remove read permission for yourself
chmod u-r testfile.txt
cat testfile.txt
# cat: testfile.txt: Permission denied

# Restore read, remove for others
chmod u+r,o-r testfile.txt
ls -l testfile.txt
# -rw-r----- 1 youruser yourgroup 20 ... testfile.txt

# Make it executable
chmod u+x testfile.txt
ls -l testfile.txt
# -rwxr----- 1 youruser yourgroup 20 ... testfile.txt

Directory Permissions Are Different

This is a common source of confusion. For directories:

  • r (read): You can list the directory contents with ls
  • w (write): You can create, rename, or delete files inside
  • x (execute): You can enter the directory with cd and access files inside

# Create a test directory
mkdir /tmp/dirtest
echo "secret" > /tmp/dirtest/file.txt

# Remove execute permission
chmod a-x /tmp/dirtest
ls /tmp/dirtest          # Works (read still allowed)
cat /tmp/dirtest/file.txt  # FAILS -- can't traverse directory

# Restore execute, remove read
chmod a+x,a-r /tmp/dirtest
ls /tmp/dirtest          # FAILS -- can't list
cat /tmp/dirtest/file.txt  # Works (if you know the filename)

# Clean up
chmod 755 /tmp/dirtest

Think About It: A directory has r permission but not x. Can you list files in it? Can you read those files? What about the reverse -- x but no r?


Octal (Numeric) Notation

Sysadmins almost always use the octal shorthand instead of the symbolic rwx notation. Each permission bit is a power of 2:

r = 4
w = 2
x = 1

So:  rwx = 4+2+1 = 7
     rw- = 4+2+0 = 6
     r-x = 4+0+1 = 5
     r-- = 4+0+0 = 4
     --- = 0+0+0 = 0

Three digits for owner, group, others:

chmod 755 file  -->  rwxr-xr-x   (owner: full, group/others: read+execute)
chmod 644 file  -->  rw-r--r--   (owner: read+write, group/others: read only)
chmod 700 file  -->  rwx------   (owner: full, nobody else)
chmod 600 file  -->  rw-------   (owner: read+write, nobody else)
chmod 777 file  -->  rwxrwxrwx   (everyone: full access -- almost never do this!)
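As a self-test of the digit arithmetic above, here is a tiny bash sketch that converts a three-digit octal mode into rwx notation, one digit at a time (a learning aid, not a real admin tool):

```shell
# Decode a single octal digit (0-7) into rwx notation
decode_digit() {
    local d=$1 out=""
    (( d & 4 )) && out+="r" || out+="-"
    (( d & 2 )) && out+="w" || out+="-"
    (( d & 1 )) && out+="x" || out+="-"
    printf '%s' "$out"
}

# Decode a three-digit octal mode like 644 (special bits not handled)
decode_mode() {
    local mode=$1
    printf '%s%s%s\n' \
        "$(decode_digit "${mode:0:1}")" \
        "$(decode_digit "${mode:1:1}")" \
        "$(decode_digit "${mode:2:1}")"
}

decode_mode 644   # rw-r--r--
decode_mode 750   # rwxr-x---
```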

Common permissions you will use every day:

Octal   Symbolic    Typical Use
644     rw-r--r--   Regular files (configs, HTML, etc.)
755     rwxr-xr-x   Directories, executable scripts
600     rw-------   Private files (SSH keys, secrets)
700     rwx------   Private directories
750     rwxr-x---   Group-accessible directories
664     rw-rw-r--   Shared writable files

# Practice: set permissions using octal
chmod 644 /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

chmod 600 /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

chmod 755 /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

Safety Warning: Never use chmod 777 on production systems. It means "every user on this system can read, write, and execute this file." If someone tells you to run chmod -R 777 /, they are either joking or about to destroy your system's security.


Ownership: chown and chgrp

Every file has an owner (user) and a group. You change them with chown and chgrp.

# View ownership
ls -l /tmp/permlab/testfile.txt
# -rwxr-xr-x 1 youruser yourgroup 20 ... testfile.txt
#               ^^^^^^^^ ^^^^^^^^^
#               owner    group

# Change owner (requires root)
sudo chown root /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

# Change group
sudo chgrp root /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

# Change both at once: user:group
sudo chown youruser:yourgroup /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt

# Recursively change ownership of a directory
sudo chown -R www-data:www-data /var/www/html 2>/dev/null

Real-World Scenario: Web Server Permissions

A web server (Nginx or Apache) runs as a specific user, typically www-data (Debian/Ubuntu) or nginx (RHEL/Fedora). The web content files must be readable by that user.

# Typical web directory setup
sudo mkdir -p /var/www/mysite
sudo chown -R www-data:www-data /var/www/mysite
sudo chmod -R 755 /var/www/mysite
sudo chmod 644 /var/www/mysite/*.html 2>/dev/null

Distro Note: The web server user is www-data on Debian/Ubuntu, nginx on RHEL/Fedora (when using Nginx), and apache on RHEL/Fedora (when using Apache httpd).


The umask: Default Permission Control

When you create a new file or directory, the permissions are not simply wide open. New files start from a base of 666 and new directories from 777, and that base is then filtered through your umask (user file-creation mask).

# Check your current umask
umask
# Typical output: 0022

# What does 0022 mean?
# Default for files:   666 - 022 = 644 (rw-r--r--)
# Default for dirs:    777 - 022 = 755 (rwxr-xr-x)

# Verify
touch /tmp/umask-test-file
mkdir /tmp/umask-test-dir
ls -l /tmp/umask-test-file
ls -ld /tmp/umask-test-dir

The umask masks out (removes) permission bits; for common values the effect looks like subtraction. Common umask values:

umask   File result   Directory result   Use case
022     644           755                Default (shared readable)
027     640           750                Group-readable, others blocked
077     600           700                Private (only owner)
002     664           775                Group-collaborative

# Temporarily change umask
umask 077
touch /tmp/private-file
ls -l /tmp/private-file
# -rw------- 1 youruser yourgroup 0 ... /tmp/private-file

# Reset to default
umask 022

To make umask changes permanent, add umask 027 to your ~/.bashrc or set it system-wide in /etc/login.defs or /etc/profile.
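Strictly speaking, the umask is a bit mask, not arithmetic subtraction: the kernel computes mode = base AND NOT umask. Subtraction happens to give the same answer for common masks like 022, but not for a mask such as 033. You can verify this directly in the shell:

```shell
# Files start from base 666, directories from 777
printf '%03o\n' $(( 0666 & ~0022 ))   # 644
printf '%03o\n' $(( 0777 & ~0022 ))   # 755

# Where plain subtraction would mislead: 666 - 033 = 633 in octal,
# but masking only clears the w and x bits, keeping group read:
printf '%03o\n' $(( 0666 & ~0033 ))   # 644
```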


Special Permission Bits: SUID, SGID, and Sticky

Beyond the basic rwx bits, there are three special bits that you will encounter in production systems.

SUID (Set User ID) -- The 4 Bit

When set on an executable, the program runs as the file's owner, not the user who invoked it. This is how passwd can modify /etc/shadow even when run by a regular user.

# Notice the 's' in the owner's execute position
ls -l /usr/bin/passwd
# -rwsr-xr-x 1 root root ... /usr/bin/passwd
#    ^
#    SUID bit

# Find all SUID binaries on the system
sudo find / -perm -4000 -type f 2>/dev/null

Setting SUID:

# Using octal (4 prepended)
chmod 4755 somefile

# Using symbolic
chmod u+s somefile

Safety Warning: SUID binaries are a security-sensitive feature. A SUID root program with a vulnerability can give attackers root access. Regular security audits should check for unexpected SUID binaries.

SGID (Set Group ID) -- The 2 Bit

On an executable: the program runs with the file's group privileges. On a directory: new files created inside inherit the directory's group (instead of the creator's primary group). This is extremely useful for shared project directories.

# Create a shared project directory
sudo mkdir /tmp/project
sudo chgrp developers /tmp/project 2>/dev/null || sudo chgrp staff /tmp/project
sudo chmod 2775 /tmp/project
ls -ld /tmp/project
# drwxrwsr-x ... /tmp/project
#         ^
#         SGID bit (s in group execute position)

# New files inherit the group
touch /tmp/project/newfile
ls -l /tmp/project/newfile
# The group will be 'developers' (or 'staff'), not your primary group

Sticky Bit -- The 1 Bit

When set on a directory, only the file's owner (or root) can delete or rename files inside, even if others have write permission. The classic example is /tmp.

ls -ld /tmp
# drwxrwxrwt ... /tmp
#          ^
#          Sticky bit (t in others execute position)

# Without the sticky bit, any user could delete any other user's files in /tmp
# With it, you can only delete your own files

Setting the sticky bit:

chmod 1777 /tmp/shared-dir    # Octal
chmod +t /tmp/shared-dir      # Symbolic

Summary of Special Bits

Octal    Symbolic    Position              Effect
4xxx     u+s         Owner execute: s/S    SUID -- run as file owner
2xxx     g+s         Group execute: s/S    SGID -- run as file group / inherit group
1xxx     +t          Other execute: t/T    Sticky -- only owner can delete

When you see an uppercase S or T instead of lowercase, it means the special bit is set but the underlying execute bit is NOT set (an unusual and often incorrect configuration).

# Uppercase S means SUID without execute -- usually a mistake
chmod 4644 /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt
# -rwSr--r-- ... (capital S = SUID set but no execute)

# Fix it
chmod 4755 /tmp/permlab/testfile.txt
ls -l /tmp/permlab/testfile.txt
# -rwsr-xr-x ... (lowercase s = SUID with execute)

Access Control Lists (ACLs)

Traditional Unix permissions are limited: one owner, one group, and everyone else. What if you need to give user alice read access and user bob read+write access to the same file, without creating a special group? ACLs solve this.

Checking ACL Support

# Check if the filesystem supports ACLs
mount | grep -E 'ext4|xfs|btrfs'
# Most modern filesystems support ACLs by default

# Install ACL tools if needed
# Debian/Ubuntu: sudo apt install acl
# Fedora/RHEL: sudo dnf install acl
# (usually pre-installed)

Using getfacl and setfacl

# Create a test file
echo "ACL test" > /tmp/acltest.txt

# View current ACL
getfacl /tmp/acltest.txt
# file: tmp/acltest.txt
# owner: youruser
# group: yourgroup
# user::rw-
# group::r--
# other::r--

# Grant specific user read+write access
sudo setfacl -m u:nobody:rw /tmp/acltest.txt

# View updated ACL
getfacl /tmp/acltest.txt
# Notice the new line: user:nobody:rw-

# ls -l now shows a '+' indicating ACLs are present
ls -l /tmp/acltest.txt
# -rw-rw-r--+ 1 youruser yourgroup ... /tmp/acltest.txt
#           ^
#           The + means ACLs are set

# Grant a group specific permissions
sudo setfacl -m g:adm:r /tmp/acltest.txt

# Set a default ACL on a directory (affects new files created inside)
mkdir /tmp/acl-dir
setfacl -d -m u:nobody:rwx /tmp/acl-dir
getfacl /tmp/acl-dir

# Remove a specific ACL entry
setfacl -x u:nobody /tmp/acltest.txt

# Remove all ACLs
setfacl -b /tmp/acltest.txt

ACL Quick Reference

Command                          Effect
getfacl file                     Show ACLs on a file
setfacl -m u:alice:rwx file      Give user alice rwx access
setfacl -m g:devs:rx file        Give group devs r-x access
setfacl -x u:alice file          Remove alice's ACL entry
setfacl -b file                  Remove all ACLs
setfacl -d -m u:alice:rwx dir    Set default ACL for new files in dir
setfacl -R -m u:alice:rx dir     Apply recursively

Think About It: When would you use ACLs instead of traditional Unix permissions? Think about a web development team where the designer needs read-only access to CSS files, the developer needs full access, and the deployment bot needs read+execute.
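The team scenario can be sketched with one ACL entry per person. The snippet below uses the system accounts daemon and nobody as stand-ins for real teammates (substitute actual usernames in practice), and assumes the acl tools are installed:

```shell
# Work on a scratch copy of a stylesheet
cd "$(mktemp -d)"
echo "body { margin: 0; }" > site.css

# 'daemon' stands in for the read-only designer,
# 'nobody' for the read+write developer
setfacl -m u:daemon:r  site.css
setfacl -m u:nobody:rw site.css

getfacl -c site.css   # -c omits the '# file:' header lines
```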


Real-World Permission Scenarios

Scenario 1: SSH Key Permissions

SSH is very strict about permissions. If your key files are too permissive, SSH will refuse to use them.

# These are the required permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa            # Private key: owner read/write ONLY
chmod 644 ~/.ssh/id_rsa.pub        # Public key: readable
chmod 600 ~/.ssh/authorized_keys   # authorized_keys: owner read/write ONLY
chmod 644 ~/.ssh/config            # SSH config: readable

# If you get "Permissions too open" errors, this is why

Scenario 2: Shared Team Directory

# Create a shared directory for the 'devteam' group
sudo groupadd devteam
sudo usermod -aG devteam alice
sudo usermod -aG devteam bob

sudo mkdir /opt/project
sudo chown root:devteam /opt/project
sudo chmod 2775 /opt/project
# 2 = SGID (new files inherit the group)
# 775 = rwxrwxr-x (owner and group can write, others can read)

Scenario 3: Fixing the Web Server 403 Error

# The problem: web server can't read files
# Step 1: Check what user the web server runs as
ps aux | grep -E 'nginx|apache|httpd' | head -3

# Step 2: Check current permissions
ls -la /var/www/html/

# Step 3: Fix ownership
sudo chown -R www-data:www-data /var/www/html/

# Step 4: Fix permissions
sudo find /var/www/html -type d -exec chmod 755 {} \;
sudo find /var/www/html -type f -exec chmod 644 {} \;

Debug This

You have a script called deploy.sh with the following permissions:

-rw-r--r-- 1 deploy deploy 1542 Jan 20 14:30 deploy.sh

When you try to run it, you get "Permission denied." You run chmod 777 deploy.sh and it works, but your security team flags this as a violation. What is the minimum permission change needed?

Solution

The script is missing the execute bit. You do not need 777. The minimum fix is:

chmod u+x deploy.sh
# Result: -rwxr--r-- (owner can execute, no extra access for others)

# Or if the whole team needs to run it:
chmod 755 deploy.sh
# Result: -rwxr-xr-x (everyone can execute, only owner can modify)

Using 777 was wrong because it gives write access to everyone -- any user on the system could modify your deploy script and inject malicious commands.
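You can reproduce the whole failure and the minimal fix in a scratch file:

```shell
# Build a stand-in for deploy.sh
f=$(mktemp)
printf '#!/bin/bash\necho deployed\n' > "$f"
chmod 644 "$f"                  # the broken state: no execute bit

"$f" 2>/dev/null || echo "Permission denied, as expected"

chmod u+x "$f"                  # the minimal fix
"$f"                            # now runs and prints: deployed
```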


What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 6 RECAP                          |
+------------------------------------------------------------------+
|                                                                  |
|  File Types: regular(-) dir(d) link(l) block(b) char(c)          |
|              socket(s) pipe(p)                                   |
|                                                                  |
|  Permission Bits:    r=4   w=2   x=1                             |
|  Three sets:         Owner | Group | Others                      |
|  Common octals:      644 (files), 755 (dirs), 600 (private)      |
|                                                                  |
|  Commands:                                                       |
|    chmod   -- Change permissions (symbolic or octal)             |
|    chown   -- Change owner (and optionally group)                |
|    chgrp   -- Change group                                       |
|    umask   -- Set default permission mask                        |
|                                                                  |
|  Special bits:                                                   |
|    SUID (4xxx)   -- Run as file owner                            |
|    SGID (2xxx)   -- Run as file group / inherit group in dirs    |
|    Sticky (1xxx) -- Only owner can delete in shared dirs         |
|                                                                  |
|  ACLs: getfacl / setfacl for fine-grained access control         |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercises

  1. Decode these permissions. Without using a computer, write out what each octal means in rwx notation:

    • 750
    • 4711
    • 1777
    • 2755
    • 0600
  2. Permission detective. Run ls -l /usr/bin/passwd, ls -l /etc/shadow, and ls -l /etc/passwd. Explain why passwd has the SUID bit and how this allows regular users to change their password.

  3. Directory permissions lab. Create a directory where:

    • The owner can do everything
    • The group can read and enter but not create files
    • Others have no access at all What octal value is this?
  4. ACL challenge. Create a file. Using ACLs, give three different users three different levels of access (one gets r, another rw, another rwx). Verify with getfacl.

  5. Security audit. Find all SUID and SGID binaries on your system. How many are there? Research what three of them do and explain why they need the special bit.

Bonus Challenge

Write a script that takes a directory as an argument and produces a permission report:

  • Lists all files with their octal permissions
  • Flags any files with 777 permissions
  • Flags any SUID/SGID files
  • Reports the total number of files for each permission level
One possible solution:

#!/bin/bash
# permission-audit.sh - Audit file permissions in a directory
DIR="${1:-.}"
echo "=== Permission Audit: $DIR ==="
echo ""

echo "--- Files with 777 permissions (DANGER!) ---"
find "$DIR" -perm 777 -type f 2>/dev/null

echo ""
echo "--- SUID files ---"
find "$DIR" -perm -4000 -type f 2>/dev/null

echo ""
echo "--- SGID files ---"
find "$DIR" -perm -2000 -type f 2>/dev/null

echo ""
echo "--- Permission distribution ---"
find "$DIR" -type f -printf '%m\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20

echo ""
echo "--- World-writable files ---"
find "$DIR" -perm -o+w -type f 2>/dev/null

Next up: Chapter 7 -- Disks, Partitions & Filesystems. Now that you know about files and permissions, let us look at the physical storage underneath it all.

Disks, Partitions & Filesystems

Why This Matters

A monitoring alert fires: "Disk usage on /var has crossed 90%." You need to add a new disk, partition it, create a filesystem, and mount it -- all without rebooting, all without losing data, and ideally within the next fifteen minutes before the application crashes because it cannot write logs.

This is not a rare event. Disk management is a bread-and-butter skill for any Linux administrator. Whether you are provisioning a new server, expanding storage for a growing database, or preparing a fresh cloud instance, you need to understand how Linux sees physical disks, how you carve them into partitions, how you put a filesystem on top, and how you make that filesystem accessible to the rest of the system.


Try This Right Now

# What block devices does the system have?
lsblk

# How much disk space is used?
df -h

# What filesystems are mounted?
mount | column -t | head -20

# What is the disk usage of your current directory?
du -sh .

# What are the UUIDs of your partitions?
sudo blkid

These five commands form the core toolkit for day-to-day disk management. Let us understand every one of them deeply.


Block Devices: How Linux Sees Disks

A block device is a piece of hardware (or virtualized hardware) that stores data in fixed-size blocks and supports random access. Hard drives, SSDs, NVMe drives, USB sticks, and virtual disks are all block devices.

Linux represents block devices as files in /dev/:

Device type       Naming scheme         Example
-----------       -------------         -------
SATA/SCSI/SAS     /dev/sdX              /dev/sda, /dev/sdb
NVMe              /dev/nvmeXnY          /dev/nvme0n1
VirtIO (VMs)      /dev/vdX              /dev/vda, /dev/vdb
MMC/SD cards      /dev/mmcblkX          /dev/mmcblk0
Floppy (legacy)   /dev/fdX              /dev/fd0

Partitions are numbered:

/dev/sda          <-- Entire first SATA disk
/dev/sda1         <-- First partition on sda
/dev/sda2         <-- Second partition on sda
/dev/sda3         <-- Third partition on sda

/dev/nvme0n1      <-- Entire first NVMe drive
/dev/nvme0n1p1    <-- First partition on nvme0n1
/dev/nvme0n1p2    <-- Second partition on nvme0n1

lsblk -- Your Best Friend

lsblk lists all block devices in a tree format, showing the relationship between disks, partitions, and mount points.

lsblk

Typical output:

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0    50G  0 disk
├─sda1        8:1    0     1G  0 part /boot
├─sda2        8:2    0    45G  0 part /
└─sda3        8:3    0     4G  0 part [SWAP]
sdb           8:16   0   100G  0 disk
└─sdb1        8:17   0   100G  0 part /data

Useful lsblk options:

# Show filesystem type and UUIDs
lsblk -f

# Show sizes in bytes
lsblk -b

# Show all devices including empty ones
lsblk -a

# Output specific columns
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID

MBR vs GPT: Partition Table Formats

Before you create partitions, the disk needs a partition table -- a data structure at the beginning of the disk that records the layout.

MBR (Master Boot Record)

  • The original PC partition scheme from 1983
  • Supports up to 4 primary partitions (or 3 primary + 1 extended containing many logical partitions)
  • Maximum disk size: 2 TB
  • Stores the partition table in the first 512 bytes of the disk
  • Used with BIOS boot

GPT (GUID Partition Table)

  • Modern standard, part of UEFI specification
  • Supports up to 128 partitions by default (no extended/logical nonsense)
  • Maximum disk size: 9.4 ZB (zettabytes -- effectively unlimited)
  • Stores redundant copies of the partition table (beginning and end of disk)
  • Includes CRC32 checksums for integrity
  • Used with UEFI boot (but can also work with BIOS via "BIOS boot partition")

+--------------------------------------------------------------------+
|                       MBR vs GPT Comparison                        |
+--------------------------------------------------------------------+
|                                                                    |
|  MBR (Legacy)                   GPT (Modern)                       |
|  +-----+----+----+----+----+    +-----+----+----+------+------+    |
|  | MBR | P1 | P2 | P3 | P4 |    | Hdr | P1 | .. | P128 | Bkup |    |
|  +-----+----+----+----+----+    +-----+----+----+------+------+    |
|  Max 4 primary partitions       Up to 128 partitions               |
|  Max 2 TB disk                  Max 9.4 ZB disk                    |
|  No redundancy                  Backup table at disk end           |
|  BIOS boot                      UEFI boot (usually)                |
|                                                                    |
+--------------------------------------------------------------------+

Think About It: When should you still use MBR? Mainly for legacy systems that boot with BIOS (not UEFI), or for very small disks where compatibility with older tools matters. For anything new, use GPT.
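In practice the quickest check is lsblk -o NAME,PTTYPE, where PTTYPE shows dos for MBR and gpt for GPT. Under the hood, GPT is identified by the magic string "EFI PART" at the start of LBA 1 -- byte offset 512 on a disk with 512-byte sectors. The sketch below demonstrates the signature check on a scratch image file rather than a real disk:

```shell
# Create a small scratch image and plant a GPT signature in it
img=$(mktemp)
truncate -s 1M "$img"
printf 'EFI PART' | dd of="$img" bs=1 seek=512 conv=notrunc 2>/dev/null

# Read 8 bytes at offset 512 and compare
sig=$(dd if="$img" bs=1 skip=512 count=8 2>/dev/null)
if [ "$sig" = "EFI PART" ]; then
    echo "GPT signature found"
else
    echo "no GPT signature (MBR or blank)"
fi
```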


Partitioning Tools: fdisk and parted

fdisk -- Interactive Disk Partitioning

fdisk is the classic Linux partitioning tool. It now supports both MBR and GPT. It is interactive but straightforward.

Safety Warning: Partitioning a disk that contains data will destroy that data. Always double-check the device name. Running fdisk /dev/sda when you meant /dev/sdb could wipe your operating system. Use lsblk first to confirm which disk is which.

# ALWAYS verify your target disk first
lsblk

# View existing partition table (safe, read-only)
sudo fdisk -l /dev/sdb

# Start interactive partitioning (ONLY on a disk you want to modify)
sudo fdisk /dev/sdb

Inside fdisk, the key commands are:

Command   Action
p         Print current partition table
n         Create a new partition
d         Delete a partition
t         Change partition type
g         Create a new GPT partition table
o         Create a new MBR partition table
w         Write changes and exit
q         Quit without saving

Hands-On: Partitioning with fdisk (Using a Loop Device)

We will safely practice by creating a virtual disk using a loop device. This will not touch your real disks.

# Create a 500MB file to use as a virtual disk
dd if=/dev/zero of=/tmp/practice-disk.img bs=1M count=500

# Attach it as a loop device
sudo losetup -fP /tmp/practice-disk.img
LOOPDEV=$(sudo losetup -j /tmp/practice-disk.img | cut -d: -f1)
echo "Our virtual disk is: $LOOPDEV"

# Verify it appears
lsblk $LOOPDEV

# Now partition it with fdisk
sudo fdisk $LOOPDEV

Inside fdisk, follow these steps:

Command (m for help): g        <-- Create GPT partition table
Command (m for help): n        <-- New partition
Partition number: 1            <-- Accept default
First sector: [Enter]          <-- Accept default
Last sector: +200M             <-- 200MB partition
Command (m for help): n        <-- New partition
Partition number: 2            <-- Accept default
First sector: [Enter]          <-- Accept default
Last sector: [Enter]           <-- Use remaining space
Command (m for help): p        <-- Print to verify
Command (m for help): w        <-- Write and exit

# Verify the partitions
lsblk $LOOPDEV

# You should see:
# loop0        7:0    0   500M  0 loop
# ├─loop0p1  259:0    0   200M  0 part
# └─loop0p2  259:1    0   298M  0 part

parted -- Non-Interactive Partitioning

parted is more powerful than fdisk and can be used non-interactively, which makes it better for scripting.

# View disk info
sudo parted $LOOPDEV print

# Non-interactive examples (DON'T run these -- just reference):
# Create GPT label
# sudo parted /dev/sdb mklabel gpt

# Create a partition from 1MiB to 500MiB
# sudo parted /dev/sdb mkpart primary ext4 1MiB 500MiB

Distro Note: On Fedora/RHEL, parted is installed by default. On minimal Debian/Ubuntu installs, you may need sudo apt install parted.


Creating Filesystems with mkfs

A partition is just a raw chunk of disk space. Before you can store files on it, you need to put a filesystem on it. The mkfs command creates filesystems.

Common Linux Filesystems

Filesystem   Command      Strengths                             Use Case
ext4         mkfs.ext4    Mature, reliable, widely supported    General purpose, default choice
XFS          mkfs.xfs     Excellent large file / parallel I/O   Databases, large files, RHEL default
Btrfs        mkfs.btrfs   Snapshots, checksums, compression     Advanced features, openSUSE/Fedora

# Create ext4 filesystem on our practice partitions
sudo mkfs.ext4 ${LOOPDEV}p1

# Create XFS filesystem
sudo mkfs.xfs ${LOOPDEV}p2

# Verify
sudo blkid ${LOOPDEV}p1 ${LOOPDEV}p2
lsblk -f $LOOPDEV

You should see output like:

NAME       FSTYPE LABEL UUID                                 MOUNTPOINT
loop0
├─loop0p1  ext4         a1b2c3d4-e5f6-7890-abcd-ef1234567890
└─loop0p2  xfs          f1e2d3c4-b5a6-7890-abcd-ef0987654321

mkfs Options

# ext4 with a label
sudo mkfs.ext4 -L "mydata" /dev/sdXn

# ext4 with specific block size
sudo mkfs.ext4 -b 4096 /dev/sdXn

# XFS with a label
sudo mkfs.xfs -L "bigdata" /dev/sdXn

# Btrfs with a label
sudo mkfs.btrfs -L "snapshots" /dev/sdXn

Safety Warning: mkfs destroys all existing data on the partition. There is no "Are you sure?" prompt for most variants. Triple-check the device name before running it.


Mounting and Unmounting Filesystems

A filesystem on a partition is useless until it is mounted -- attached to a directory in the filesystem tree.

mount -- Attach a Filesystem

# Create mount points
sudo mkdir -p /mnt/part1
sudo mkdir -p /mnt/part2

# Mount our practice partitions
sudo mount ${LOOPDEV}p1 /mnt/part1
sudo mount ${LOOPDEV}p2 /mnt/part2

# Verify
df -h /mnt/part1 /mnt/part2
mount | grep loop

# Use the filesystems
echo "Hello from ext4!" | sudo tee /mnt/part1/hello.txt
echo "Hello from XFS!" | sudo tee /mnt/part2/hello.txt
cat /mnt/part1/hello.txt
cat /mnt/part2/hello.txt

umount -- Detach a Filesystem

# Unmount (note: it's "umount", not "unmount"!)
sudo umount /mnt/part1
sudo umount /mnt/part2

If you get "device is busy":

# Find what is using the mount point
sudo lsof +D /mnt/part1
# or
sudo fuser -mv /mnt/part1

# Force unmount (use with caution)
sudo umount -l /mnt/part1    # Lazy unmount -- detaches immediately,
                              # cleans up when no longer in use

Mount Options

# Mount read-only
sudo mount -o ro ${LOOPDEV}p1 /mnt/part1

# Mount with no-exec (prevent running executables)
sudo mount -o noexec ${LOOPDEV}p1 /mnt/part1

# Mount with nosuid (ignore SUID bits)
sudo mount -o nosuid ${LOOPDEV}p1 /mnt/part1

# Combine options
sudo mount -o ro,noexec,nosuid ${LOOPDEV}p1 /mnt/part1

# Remount with different options (without unmounting)
sudo mount -o remount,rw /mnt/part1

Persistent Mounts with /etc/fstab

Mounts created with the mount command are temporary -- they disappear on reboot. To make mounts persistent, add them to /etc/fstab.

# View your current fstab
cat /etc/fstab

fstab Format

Each line in /etc/fstab has six fields:

<device>     <mount point>  <fstype>  <options>       <dump>  <pass>
UUID=xxxx    /data          ext4      defaults        0       2
/dev/sdb1    /backup        xfs       defaults,noatime 0      2

Field         Meaning
device        What to mount (UUID, LABEL, or device path)
mount point   Where to mount it
fstype        Filesystem type (ext4, xfs, btrfs, swap, etc.)
options       Mount options (defaults, ro, noexec, noatime, etc.)
dump          Backup with dump? (0=no, 1=yes. Almost always 0.)
pass          fsck order (0=skip, 1=root, 2=other filesystems)
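To see the six fields labeled on a concrete entry, a short awk one-liner does the splitting; a here-document with a sample line stands in for /etc/fstab so the sketch is self-contained (the UUID is made up):

```shell
# Skip comments and short lines, then label the six fields
awk '!/^[[:space:]]*#/ && NF >= 6 {
    printf "device=%s  mountpoint=%s  fstype=%s  options=%s  dump=%s  pass=%s\n",
           $1, $2, $3, $4, $5, $6
}' <<'EOF'
# comment lines and blanks are skipped
UUID=a1b2c3d4-e5f6  /data  ext4  defaults,noatime  0  2
EOF
```

Point the same awk program at /etc/fstab to audit your real entries.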

Using UUIDs (Best Practice)

Device names like /dev/sdb1 can change if you add or remove disks. UUIDs are unique identifiers that never change.

# Find UUIDs
sudo blkid

# Example output:
# /dev/sda1: UUID="a1b2c3d4-..." TYPE="ext4"
# /dev/sda2: UUID="e5f6a7b8-..." TYPE="swap"

Adding an Entry to fstab

# Get the UUID of your partition
UUID=$(sudo blkid -s UUID -o value ${LOOPDEV}p1)
echo "UUID is: $UUID"

# Add to fstab (be VERY careful editing this file)
echo "UUID=$UUID  /mnt/part1  ext4  defaults  0  2" | sudo tee -a /etc/fstab

# Test without rebooting
sudo mount -a

# Verify
df -h /mnt/part1

Safety Warning: A syntax error in /etc/fstab can prevent your system from booting. Always test with sudo mount -a after editing. If you are editing remotely, keep a second SSH session open as a safety net. Also consider using sudo findmnt --verify to check for errors.

# Validate fstab syntax
sudo findmnt --verify --tab-file /etc/fstab

Think About It: Why does the pass field matter? Think about what happens if the system loses power while writing to disk, and the filesystem needs to be checked for consistency before it can be safely used.


Checking Disk Space: df and du

df -- Disk Free (Filesystem Level)

Shows how much space is used and available on each mounted filesystem.

# Human-readable sizes
df -h

# Show filesystem types
df -hT

# Check specific mount point
df -h /var

# Show only real filesystems (exclude tmpfs, devtmpfs, etc.)
df -h -x tmpfs -x devtmpfs -x squashfs

Example output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        45G   12G   31G  28% /
/dev/sda1       976M  150M  760M  17% /boot
/dev/sdb1       100G   45G   55G  45% /data

du -- Disk Usage (Directory Level)

Shows the disk space used by files and directories.

# Size of current directory
du -sh .

# Size of each subdirectory
du -sh /var/*/ 2>/dev/null | sort -rh | head -10

# Size of a specific directory, 1 level deep
du -h --max-depth=1 /var 2>/dev/null | sort -rh

# Find the largest files in a directory
du -ah /var/log 2>/dev/null | sort -rh | head -10

# Total size of a directory
du -sh /usr

Practical: Finding What Is Eating Your Disk

# Step 1: Which filesystem is full?
df -h

# Step 2: Find the biggest directories
sudo du -h --max-depth=1 / 2>/dev/null | sort -rh | head -10

# Step 3: Drill down into the culprit
sudo du -h --max-depth=1 /var 2>/dev/null | sort -rh | head -10

# Step 4: Find individual large files
sudo find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -k5 -rh | head -10

# Step 5: Check for deleted files still held open
sudo lsof +L1 2>/dev/null

Swap Space

Swap is disk space used as an extension of RAM. When physical memory is full, the kernel moves inactive pages to swap. While slower than RAM, it prevents out-of-memory crashes.

Checking Current Swap

# View swap
free -h
swapon --show
cat /proc/swaps

Creating a Swap Partition

# On a real disk (example -- do not run blindly):
# sudo mkswap /dev/sda3
# sudo swapon /dev/sda3

Creating a Swap File

This is often easier than creating a swap partition, especially on cloud instances.

# Create a 1GB swap file
sudo fallocate -l 1G /swapfile
# or: sudo dd if=/dev/zero of=/swapfile bs=1M count=1024

# Set correct permissions (MUST be 600)
sudo chmod 600 /swapfile

# Format as swap
sudo mkswap /swapfile

# Enable it
sudo swapon /swapfile

# Verify
swapon --show
free -h

To make it permanent, add to /etc/fstab:

/swapfile    none    swap    sw    0    0

Swappiness

The vm.swappiness parameter controls how aggressively the kernel uses swap (0-100).

# Check current value (default is usually 60)
cat /proc/sys/vm/swappiness

# Reduce swappiness (prefer RAM over swap)
sudo sysctl vm.swappiness=10

# Make permanent
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.d/99-swappiness.conf

Hands-On: Complete Disk Workflow

Let us bring everything together. We will use our practice loop device to walk through the full lifecycle.

# Step 1: Create a virtual disk
dd if=/dev/zero of=/tmp/lab-disk.img bs=1M count=200

# Step 2: Attach as loop device
sudo losetup -fP /tmp/lab-disk.img
DISK=$(sudo losetup -j /tmp/lab-disk.img | cut -d: -f1)
echo "Working with: $DISK"

# Step 3: Create a GPT partition table and one partition
echo -e "g\nn\n1\n\n\nw" | sudo fdisk $DISK

# Step 4: Create an ext4 filesystem with a label
sudo mkfs.ext4 -L "labdata" ${DISK}p1

# Step 5: Create a mount point
sudo mkdir -p /mnt/labdata

# Step 6: Mount it
sudo mount ${DISK}p1 /mnt/labdata

# Step 7: Verify
lsblk $DISK
df -h /mnt/labdata
mount | grep labdata

# Step 8: Use it
echo "Disk management is easy!" | sudo tee /mnt/labdata/test.txt
cat /mnt/labdata/test.txt

# Step 9: Check the filesystem details
sudo tune2fs -l ${DISK}p1 | head -20    # ext4 info
sudo dumpe2fs ${DISK}p1 2>/dev/null | head -30

# Step 10: Clean up
sudo umount /mnt/labdata
sudo losetup -d $DISK
rm /tmp/lab-disk.img

Debug This

A developer reports: "I added a 100GB disk to the server and added it to /etc/fstab, but after rebooting the server, it boots into emergency mode." You check the console and see filesystem mount errors.

What are the likely causes?

Solution

Common causes of fstab-related boot failures:

  1. Typo in UUID. Even one wrong character will prevent the mount from finding the device.

    sudo blkid  # Verify the correct UUID
    
  2. Wrong filesystem type. If the fstab says ext4 but the partition has xfs, the mount fails.

    sudo blkid -s TYPE -o value /dev/sdb1
    
  3. Device name changed. If fstab uses /dev/sdb1 and the disk enumeration changed, the mount fails. This is why UUIDs are recommended.

  4. Missing mount point directory. If the directory in fstab does not exist, the mount fails.

    ls -ld /mnt/data
    
  5. The pass (fsck) field. If set to a non-zero value and the filesystem check fails (corrupted filesystem), the system drops to emergency mode.

Fix from emergency mode:

# Edit fstab to fix the error
vi /etc/fstab

# Or use nofail option to prevent boot failure
UUID=xxxx  /data  ext4  defaults,nofail  0  2

The nofail mount option tells the system to continue booting even if this mount fails -- critical for non-essential data partitions.


What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 7 RECAP                          |
+------------------------------------------------------------------+
|                                                                  |
|  Block Devices:                                                  |
|    /dev/sdX (SATA)   /dev/nvmeXnY (NVMe)   /dev/vdX (VirtIO)     |
|                                                                  |
|  Partition Tables: MBR (legacy, 2TB max) vs GPT (modern)         |
|                                                                  |
|  Partitioning: fdisk (interactive)   parted (scriptable)         |
|                                                                  |
|  Filesystems: ext4 (default), XFS (large), Btrfs (snapshots)     |
|  Create with: mkfs.ext4, mkfs.xfs, mkfs.btrfs                    |
|                                                                  |
|  Mounting: mount/umount, /etc/fstab for persistence              |
|  Always use UUIDs in fstab, not device names                     |
|  Use 'nofail' for non-critical mounts                            |
|                                                                  |
|  Space checks: df -h (filesystem), du -sh (directory)            |
|                                                                  |
|  Swap: mkswap + swapon, swap files for flexibility               |
|                                                                  |
|  Tools: lsblk, blkid, fdisk, parted, mkfs, mount, umount,        |
|         df, du, swapon, free, findmnt                            |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercises

  1. Explore your system. Run lsblk -f and df -hT. Identify every physical disk, partition, filesystem type, and mount point on your system. Draw it as a tree diagram.

  2. Practice safe partitioning. Create a 1GB virtual disk with dd, partition it into three parts using fdisk (a 200MB ext4, a 500MB XFS, and the rest as swap). Create the filesystems, mount them, and verify.

  3. fstab lab. For each partition in exercise 2, add an fstab entry using UUIDs. Use the nofail option. Test with sudo mount -a.

  4. Disk usage investigation. Write a one-liner that finds the 10 largest files on your system (hint: find / -type f -printf '%s %p\n' | sort -rn | head -10).

  5. Swap file. Create a 512MB swap file, enable it, verify it shows in free -h, then disable and remove it.

Bonus Challenge

Write a script called disk-report.sh that generates a comprehensive disk report:

#!/bin/bash
echo "=== Disk Report: $(hostname) -- $(date) ==="
echo ""

echo "--- Block Devices ---"
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID

echo ""
echo "--- Filesystem Usage ---"
df -hT -x tmpfs -x devtmpfs -x squashfs

echo ""
echo "--- Largest Directories Under / ---"
sudo du -h --max-depth=1 / 2>/dev/null | sort -rh | head -10

echo ""
echo "--- Swap Status ---"
free -h | grep -i swap
swapon --show

echo ""
echo "--- Partitions Over 80% Full ---"
df -h | awk 'NR>1 && int($5)>80 {print $0}'

echo ""
echo "--- fstab Entries ---"
grep -v '^#' /etc/fstab | grep -v '^$'

Next up: Chapter 8 -- Links, Inodes & the VFS. You now understand the physical storage layer. It is time to peek behind the curtain and see how Linux actually tracks files internally.

Links, Inodes & the VFS

Why This Matters

You delete a 2GB log file, but df -h shows no change in disk usage. You check and the file is gone -- ls confirms it. Yet the space is not freed. What is going on?

The answer lies in a concept called inodes -- the hidden data structures that the filesystem uses to track every file. The file you deleted was still held open by a running process, and as long as that process keeps its file descriptor open, the inode (and the data blocks it points to) stays alive. Understanding inodes, links, and how Linux's Virtual Filesystem layer ties everything together gives you the ability to diagnose puzzles like this in seconds.

This chapter pulls back the curtain on how Linux tracks files internally, how hard links and soft links work (and why they behave differently), and how the VFS lets you interact with wildly different systems -- physical disks, RAM-based pseudo-filesystems, network shares -- through a single uniform interface.


Try This Right Now

# See the inode number of a file
ls -i /etc/hostname

# Get detailed inode information
stat /etc/hostname

# See inode usage on your filesystem
df -i

# Create and compare hard and soft links
echo "original content" > /tmp/original.txt
ln /tmp/original.txt /tmp/hardlink.txt
ln -s /tmp/original.txt /tmp/softlink.txt

ls -li /tmp/original.txt /tmp/hardlink.txt /tmp/softlink.txt

Look at the output of that last command carefully. The original file and the hard link share the same inode number. The soft link has a different inode. This single observation is the key to understanding everything in this chapter.


What Is an Inode?

An inode (index node) is a data structure on disk that stores all the metadata about a file -- everything except the file's name and its actual data. Every file and directory on an ext4, XFS, or Btrfs filesystem has exactly one inode.

What an Inode Stores

+------------------------------------------+
|              INODE #12345                 |
+------------------------------------------+
| File type:       regular file             |
| Permissions:     rw-r--r-- (644)          |
| Owner (UID):     1000                     |
| Group (GID):     1000                     |
| Size:            4096 bytes               |
| Timestamps:                               |
|   - atime:       last accessed            |
|   - mtime:       last modified            |
|   - ctime:       last status change       |
| Link count:      1                        |
| Pointers to data blocks:                  |
|   Block 0 -> disk sector 48392           |
|   Block 1 -> disk sector 48393           |
|   Block 2 -> disk sector 48400           |
+------------------------------------------+

Notice what is not in the inode: the filename. The filename lives in the directory that contains the file. A directory is essentially a table mapping names to inode numbers.

Directory: /home/alice/
+-------------------+--------+
| Filename          | Inode  |
+-------------------+--------+
| .                 | 23001  |
| ..                | 2      |
| notes.txt         | 23042  |
| report.pdf        | 23108  |
| scripts/          | 23200  |
+-------------------+--------+
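You can watch this name-to-inode table directly: ls -ia prints the inode column for every entry, including . and .. (the scratch directory under /tmp is just an example):

```shell
# Build a scratch directory and dump its name -> inode table
mkdir -p /tmp/dirtab && cd /tmp/dirtab
echo "hello" > notes.txt
mkdir scripts

# Each line: inode number, then the name that maps to it
ls -ia .
```

Note that . and .. get inode numbers too -- they are ordinary entries in the table, pointing at this directory and its parent.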

The stat Command

stat shows you everything the inode contains:

stat /etc/hostname
  File: /etc/hostname
  Size: 7             Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d    Inode: 131073      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-01-15 10:00:00.000000000 +0000
Modify: 2025-01-10 08:30:00.000000000 +0000
Change: 2025-01-10 08:30:00.000000000 +0000
 Birth: 2025-01-10 08:30:00.000000000 +0000

Key fields:

  • Inode: The inode number (131073 in this example)
  • Links: The hard link count (how many directory entries point to this inode)
  • Access/Modify/Change: The three timestamps
    • atime: When the file was last read
    • mtime: When the file's contents were last modified
    • ctime: When the inode's metadata was last changed (permissions, owner, link count)
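A quick experiment makes the mtime/ctime distinction visible (a sketch using a throwaway file in /tmp, not part of the chapter's lab):

```shell
# chmod touches only metadata: ctime moves forward, mtime stays put
f=/tmp/ts-demo.txt
echo "hi" > "$f"        # writing sets both mtime and ctime

sleep 1
chmod 600 "$f"          # permission change: only ctime updates

stat -c 'mtime: %y' "$f"
stat -c 'ctime: %z' "$f"
# ctime is now later than mtime
```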

# See inode numbers alongside filenames
ls -i /etc/ | head -10

# Check inode usage (you can run out of inodes!)
df -i

Think About It: Can you run out of inodes even if you have plenty of disk space? Yes! If you create millions of tiny files (as some mail servers or cache systems do), you might exhaust inodes before disk space. On ext4, the inode count is set at filesystem creation time with mkfs.ext4 -N.


Hard Links

A hard link is an additional directory entry that points to the same inode as an existing file. The hard link and the original file are completely indistinguishable -- they are both equally "real" names for the same data.

# Create a file
echo "I am the original" > /tmp/original.txt
stat /tmp/original.txt | grep -E "Inode|Links"
# Inode: 12345   Links: 1

# Create a hard link
ln /tmp/original.txt /tmp/hardlink.txt
stat /tmp/original.txt | grep -E "Inode|Links"
# Inode: 12345   Links: 2    <-- Link count increased!

stat /tmp/hardlink.txt | grep -E "Inode|Links"
# Inode: 12345   Links: 2    <-- Same inode!

# Both names see the same content
cat /tmp/original.txt
cat /tmp/hardlink.txt

# Modify through one name, see it through the other
echo "modified!" >> /tmp/hardlink.txt
cat /tmp/original.txt
# I am the original
# modified!

Deleting a file (with rm) actually removes a directory entry and decrements the inode's link count. The data is only freed when the link count reaches zero AND no processes have the file open.

# Check link count
stat /tmp/original.txt | grep Links
# Links: 2

# Delete the "original" -- only removes one name
rm /tmp/original.txt

# The hard link still works! Data is intact.
cat /tmp/hardlink.txt
# I am the original
# modified!

stat /tmp/hardlink.txt | grep Links
# Links: 1    <-- Count decreased, but > 0, so data survives

  BEFORE rm:                            AFTER rm:

  "original.txt" --+                    "hardlink.txt" --> Inode 12345
                   |--> Inode 12345                        Links: 1
  "hardlink.txt" --+    Links: 2                           Data: still allocated
                        Data: allocated
Hard Link Limitations

Hard links come with two important restrictions:

  1. Cannot cross filesystems. A hard link must be on the same filesystem as the target because inode numbers are only unique within a filesystem.

  2. Cannot link to directories (for regular users). This prevents circular references in the directory tree. Only the filesystem itself creates hard links to directories (. and ..).

# This will fail:
ln /tmp/somefile /home/somefile    # FAILS if /tmp and /home are on different filesystems

# This will also fail:
ln /tmp/mydir /tmp/mydir-link      # FAILS: can't hard link directories
# ln: /tmp/mydir: hard link not allowed for directory

Think About It: Why would circular directory hard links be catastrophic? Think about what happens to find, du, or any tool that walks the directory tree recursively.
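You cannot create a hard-link cycle yourself, but symlinks can form loops, and every tree-walking tool has to detect them. GNU find does exactly that:

```shell
# Create a symlink loop: the directory contains a link back to itself
mkdir -p /tmp/loopdir
ln -sfn /tmp/loopdir /tmp/loopdir/self

# Following symlinks (-L) would recurse forever; find detects the cycle
# and warns ("File system loop detected") instead of hanging
find -L /tmp/loopdir 2>&1 || true
```

With hard links there would be no symlink to notice -- the loop would be invisible in the directory structure itself, which is exactly why the kernel forbids it.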


Symbolic (Soft) Links

A symbolic link (symlink) is a special file that contains a text path pointing to another file. It is like a shortcut or alias. Unlike hard links, symlinks:

  • Have their own inode
  • Can cross filesystem boundaries
  • Can point to directories
  • Can point to files that do not exist (dangling link)

# Create a symlink
echo "target file content" > /tmp/target.txt
ln -s /tmp/target.txt /tmp/symlink.txt

# Inspect
ls -l /tmp/symlink.txt
# lrwxrwxrwx 1 user user 15 ... /tmp/symlink.txt -> /tmp/target.txt
#  ^                                                   ^^^^^^^^^^^^^^^^
#  'l' = link type                                     target path stored in the link

# Different inodes
ls -li /tmp/target.txt /tmp/symlink.txt
# 12345 -rw-r--r-- 1 user user 20 ... /tmp/target.txt
# 12346 lrwxrwxrwx 1 user user 15 ... /tmp/symlink.txt -> /tmp/target.txt

# Reading through the symlink gives you the target's content
cat /tmp/symlink.txt
# target file content

# Delete the target
rm /tmp/target.txt

# The symlink still exists but points nowhere
ls -l /tmp/symlink.txt
# lrwxrwxrwx 1 user user 15 ... /tmp/symlink.txt -> /tmp/target.txt

cat /tmp/symlink.txt
# cat: /tmp/symlink.txt: No such file or directory

# Find dangling symlinks
find /tmp -xtype l 2>/dev/null

# Symlinks can point to directories (unlike hard links)
ln -s /var/log /tmp/logs
ls /tmp/logs/
# Shows contents of /var/log/

Symlinks are everywhere in a modern Linux system:

# UsrMerge: /bin -> /usr/bin
ls -l /bin

# Alternatives system (Debian/Ubuntu)
ls -l /usr/bin/python3
ls -l /etc/alternatives/editor

# Library versioning
ls -l /usr/lib/x86_64-linux-gnu/libssl* 2>/dev/null || ls -l /usr/lib64/libssl* 2>/dev/null
# libssl.so -> libssl.so.3
# libssl.so.3 -> libssl.so.3.0.0

+----------------------------------------------------------------+
|                   Hard Link vs Symbolic Link                   |
+----------------------------------------------------------------+
|                                                                |
|  Feature               Hard Link         Symbolic Link         |
|  -----------------------------------------------------        |
|  Same inode as target  Yes               No (own inode)        |
|  Cross filesystems     No                Yes                   |
|  Link to directories   No*               Yes                   |
|  Survives target       Yes (data stays)  No (dangling link)    |
|    deletion                                                    |
|  File type in ls -l    Same as target    'l' (link)            |
|  Size                  Same as target    Length of path string |
|  Relative paths        N/A               Relative to link's dir|
|  Performance           Direct (fast)     Extra lookup (tiny)   |
|                                                                |
|  * root can hard-link directories on some FS, but shouldn't    |
+----------------------------------------------------------------+

Hands-On: Seeing the Difference

mkdir -p /tmp/linklab && cd /tmp/linklab

# Create original file
echo "I am the data" > original.txt

# Create both types of links
ln original.txt hard.txt
ln -s original.txt soft.txt

# Compare inodes
ls -li original.txt hard.txt soft.txt
# original.txt and hard.txt: SAME inode number
# soft.txt: DIFFERENT inode number

# Compare with stat
stat original.txt | grep -E "Inode|Links|Size"
stat hard.txt | grep -E "Inode|Links|Size"
stat soft.txt | grep -E "Inode|Links|Size"

# Delete the original
rm original.txt

# Hard link survives
cat hard.txt
# I am the data

# Soft link is broken
cat soft.txt
# cat: soft.txt: No such file or directory

# The file type
file hard.txt
# hard.txt: ASCII text
file soft.txt
# soft.txt: broken symbolic link to original.txt

The Virtual Filesystem (VFS) Layer

Here is one of the most elegant ideas in Linux kernel design. Linux supports dozens of different filesystems: ext4, XFS, Btrfs, FAT32, NTFS, NFS, procfs, sysfs, tmpfs, and many more. Yet when you type cat /proc/cpuinfo, you use the same cat command you would use on a regular file. When you write to /sys/class/leds/led0/brightness, you use the same echo command. How?

The Virtual Filesystem Switch (VFS) is an abstraction layer in the kernel that provides a uniform interface for all filesystems. User programs never talk to a specific filesystem -- they make VFS system calls (open, read, write, close, stat), and the VFS routes each call to the appropriate filesystem driver.

  +----------------------------------------------+
  |          User Space Applications              |
  |   (cat, ls, cp, vim, python, nginx...)       |
  +----------------------------------------------+
              |  system calls: open, read, write, stat
              v
  +----------------------------------------------+
  |        VFS (Virtual Filesystem Switch)        |
  |  Uniform interface: inodes, dentries, files   |
  +----------------------------------------------+
      |          |          |          |          |
      v          v          v          v          v
  +------+  +------+  +------+  +------+  +------+
  | ext4 |  | XFS  |  | proc |  | sys  |  | NFS  |
  +------+  +------+  +------+  +------+  +------+
      |          |          |          |          |
      v          v          v          v          v
  [disk]    [disk]    [kernel]  [kernel]  [network]

The VFS defines a set of data structures that every filesystem must implement:

  • superblock: Metadata about the entire filesystem (size, block size, state)
  • inode: Metadata about a single file
  • dentry: A directory entry (maps a name to an inode)
  • file: An open file (tracks current position, open flags, etc.)

Each filesystem driver provides its own implementations of operations like "read inode from disk" or "create a new file." The VFS calls the right one.
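You can feel this uniformity from the shell. The same tools -- and the same read() path underneath -- work on a disk-backed file and a kernel-generated one (both paths exist on any modern Linux):

```shell
# A disk-backed file and a procfs file through the same interface
head -c 40 /etc/hostname      # backed by real data blocks on disk
head -c 40 /proc/version      # generated by the kernel at read time

# procfs files report size 0 -- there are no data blocks behind them,
# yet reading them returns content
stat -c '%s bytes on "disk": %n' /proc/version
```

The VFS hands both read() calls to the right driver: ext4 fetches blocks from disk, procfs generates the bytes on demand. Userspace never sees the difference.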

# You can see all mounted filesystem types
mount | awk '{print $5}' | sort -u

# Or more cleanly
cat /proc/filesystems
# nodev  sysfs
# nodev  tmpfs
# nodev  proc
#        ext4
#        xfs
# nodev  devtmpfs
# ...

# "nodev" means the filesystem doesn't use a block device

Think About It: Why is the VFS architecture so powerful? Think about what it means for userspace programs. A program that works with files does not need to know or care whether it is reading from a local ext4 disk, a network NFS share, or a virtual /proc file. The VFS makes them all look the same.


/proc -- The Process Filesystem

/proc is a virtual filesystem -- none of its files exist on disk. The kernel generates their contents on the fly when you read them. It serves two purposes: exposing process information and providing kernel tuning parameters.

Process Information

Every running process has a directory in /proc named by its PID:

# Your current shell's PID
echo $$

# Explore your own process
ls /proc/$$/

# Key files in a process directory
cat /proc/$$/cmdline | tr '\0' ' '; echo   # Command line
cat /proc/$$/status | head -10              # Status info
cat /proc/$$/environ | tr '\0' '\n' | head -5  # Environment vars
ls -l /proc/$$/fd/                          # Open file descriptors
cat /proc/$$/maps | head -10                # Memory mappings
readlink /proc/$$/exe                       # Path to executable
readlink /proc/$$/cwd                       # Current working directory

System Information

# CPU information
cat /proc/cpuinfo | grep "model name" | uniq

# Memory
cat /proc/meminfo | head -5

# Kernel version
cat /proc/version

# Uptime (in seconds)
cat /proc/uptime

# Load averages
cat /proc/loadavg

# Mounted filesystems (from kernel's perspective)
cat /proc/mounts | head -10

# Command line the kernel was booted with
cat /proc/cmdline

Kernel Tuning via /proc/sys/

The /proc/sys/ directory contains files that map to kernel parameters. You can read and write them to tune system behavior at runtime.

# IP forwarding (routing)
cat /proc/sys/net/ipv4/ip_forward

# Maximum number of open files
cat /proc/sys/fs/file-max

# Maximum number of PIDs
cat /proc/sys/kernel/pid_max

# Hostname
cat /proc/sys/kernel/hostname

# Change a parameter at runtime (requires root)
sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'

# The sysctl command provides a friendlier interface
sysctl net.ipv4.ip_forward
sudo sysctl -w net.ipv4.ip_forward=1

/sys -- The Sysfs Filesystem

/sys (sysfs) exports the kernel's view of devices, drivers, buses, and other kernel objects as a filesystem hierarchy. Introduced in Linux 2.6, it replaced much of the hardware information that used to live in /proc.

# Block devices and their attributes
ls /sys/block/
cat /sys/block/sda/size 2>/dev/null    # Size in 512-byte sectors
cat /sys/block/sda/queue/scheduler 2>/dev/null  # I/O scheduler

# Network interfaces
ls /sys/class/net/
cat /sys/class/net/eth0/address 2>/dev/null   # MAC address
cat /sys/class/net/eth0/mtu 2>/dev/null       # MTU
cat /sys/class/net/eth0/operstate 2>/dev/null # up/down

# Power management
ls /sys/power/ 2>/dev/null
cat /sys/power/state 2>/dev/null

# CPU information
ls /sys/devices/system/cpu/
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 2>/dev/null

# Modules (loaded kernel modules)
ls /sys/module/ | head -10

Writing to /sys

Some /sys files are writable, allowing you to change hardware and driver settings:

# Example: Change I/O scheduler for a disk
cat /sys/block/sda/queue/scheduler 2>/dev/null
# [mq-deadline] none
# sudo sh -c 'echo "none" > /sys/block/sda/queue/scheduler'

# Example: Control CPU frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null

Device Files in /dev

As discussed in Chapter 5, /dev contains device files. Now that you understand inodes, let us look at how device files actually work.

Device files have a major number (identifies the driver) and a minor number (identifies the specific device within that driver):

ls -l /dev/sda /dev/null /dev/tty /dev/zero 2>/dev/null
# brw-rw---- 1 root disk    8,  0 ... /dev/sda   (block, major 8, minor 0)
# crw-rw-rw- 1 root root    1,  3 ... /dev/null  (char, major 1, minor 3)
# crw-rw-rw- 1 root tty     5,  0 ... /dev/tty   (char, major 5, minor 0)
# crw-rw-rw- 1 root root    1,  5 ... /dev/zero  (char, major 1, minor 5)

Block Devices vs Character Devices

Block Devices (b):
  - Accessed in blocks (typically 512 bytes or 4096 bytes)
  - Support random access (seek to any position)
  - Examples: hard drives, SSDs, USB drives
  - /dev/sda, /dev/nvme0n1, /dev/loop0

Character Devices (c):
  - Accessed one character (byte) at a time
  - Usually sequential access (some support seek)
  - Examples: terminals, serial ports, random number generators
  - /dev/tty, /dev/null, /dev/urandom
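stat can tell the two kinds apart, and also reports the major/minor pair shown above:

```shell
# %F = file type, %t/%T = major/minor device numbers (printed in hex)
stat -c '%n: %F, major=%t minor=%T' /dev/null /dev/zero
# Both are character devices on major 1 (the "memory devices" driver)
```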

Special Device Files

# /dev/null -- The black hole. Discard any output.
echo "this goes nowhere" > /dev/null

# /dev/zero -- Infinite stream of zero bytes.
head -c 1024 /dev/zero | xxd | head

# /dev/urandom -- Infinite stream of random bytes.
head -c 32 /dev/urandom | xxd

# /dev/full -- Always reports "disk full" on write.
echo "test" > /dev/full
# bash: echo: write error: No space left on device

# /dev/random -- Random bytes (may block if entropy pool is empty, on older kernels)
head -c 32 /dev/random | xxd

The udev System

Modern Linux uses udev (managed by systemd as systemd-udevd) to dynamically create and manage device files. When you plug in a USB drive, udev detects the hardware event, creates the appropriate /dev entry, and can run rules to set permissions, create symlinks, or trigger scripts.

# View udev rules
ls /etc/udev/rules.d/
ls /usr/lib/udev/rules.d/ | head -20

# Monitor device events in real time (plug/unplug something)
sudo udevadm monitor --property
# (Press Ctrl+C to stop)

# Get device info
sudo udevadm info /dev/sda 2>/dev/null | head -20

Hands-On: Inode Experiments

Exercise 1: Watching Link Counts

mkdir -p /tmp/inode-lab && cd /tmp/inode-lab

# Create a file and check its link count
echo "test data" > file.txt
stat file.txt | grep Links
# Links: 1

# Create hard links
ln file.txt link1.txt
ln file.txt link2.txt
ln file.txt link3.txt
stat file.txt | grep Links
# Links: 4

# All share the same inode
ls -li file.txt link1.txt link2.txt link3.txt
# All show the same inode number

# Delete some links
rm link1.txt link2.txt
stat file.txt | grep Links
# Links: 2

rm link3.txt
stat file.txt | grep Links
# Links: 1  (only the original name remains)

Exercise 2: Why Deleted Files Still Use Space

# Create a large file
dd if=/dev/zero of=/tmp/bigfile bs=1M count=100

# Check space
df -h /tmp

# Open the file in the background (keep a file descriptor open)
sleep 3600 < /tmp/bigfile &
BG_PID=$!

# Delete the file
rm /tmp/bigfile

# Check: ls shows it's gone
ls /tmp/bigfile 2>&1
# No such file or directory

# But space is NOT freed!
df -h /tmp
# (same usage as before)

# Find deleted-but-open files
sudo lsof +L1 2>/dev/null | grep bigfile

# The inode still exists because the process holds an open file descriptor.
# The on-disk link count is 0, but the kernel keeps the inode (and its data
# blocks) alive until the last file descriptor is closed.

# Kill the background process
kill $BG_PID

# NOW the space is freed
df -h /tmp

This is exactly the mystery from our opening scenario. Now you know how to diagnose it.

Exercise 3: Directory Link Counts

# A directory's link count has a special meaning
mkdir -p /tmp/dir-links/sub1/sub2

# Check the link count of /tmp/dir-links
stat /tmp/dir-links | grep Links
# Links: 3

# Why 3?
#   1. The entry "dir-links" in /tmp
#   2. The "." entry inside /tmp/dir-links itself
#   3. The ".." entry inside /tmp/dir-links/sub1
#
# Formula: link_count = 2 + number_of_immediate_subdirectories

ls -la /tmp/dir-links
# drwxr-xr-x 3 user user ...  .      <-- "." is a hard link to itself
# drwxrwxrwt ... ...           ..     <-- ".." links to parent

# Add another subdirectory
mkdir /tmp/dir-links/sub3
stat /tmp/dir-links | grep Links
# Links: 4  (2 + 2 subdirectories)

Debug This

A user reports: "I created a symbolic link to /opt/app/config.yaml, but the application says the file does not exist, even though I can see it with ls -l."

You check:

$ ls -l /home/alice/config.yaml
lrwxrwxrwx 1 alice alice 22 Jan 15 10:00 /home/alice/config.yaml -> ../opt/app/config.yaml

What is wrong?

Solution

The symlink uses a relative path: ../opt/app/config.yaml. Relative symlink targets are resolved relative to the symlink's location, not the current working directory.

The symlink is at /home/alice/config.yaml. The relative path ../opt/app/config.yaml resolves to /home/opt/app/config.yaml, which does not exist.

The fix is to use an absolute path:

rm /home/alice/config.yaml
ln -s /opt/app/config.yaml /home/alice/config.yaml

Or a correct relative path:

ln -s ../../opt/app/config.yaml /home/alice/config.yaml

Lesson: When in doubt, use absolute paths for symlinks. Relative paths are useful when the entire directory tree might move (e.g., inside a container or chroot), but they are a common source of bugs.
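The failure is easy to reproduce in a sandbox (everything under /tmp/symlab is hypothetical, mirroring the report):

```shell
# Recreate the broken layout
mkdir -p /tmp/symlab/home/alice /tmp/symlab/opt/app
echo "key: value" > /tmp/symlab/opt/app/config.yaml

# Relative target resolves from the SYMLINK's directory, not from $PWD
ln -sfn ../opt/app/config.yaml /tmp/symlab/home/alice/config.yaml
readlink -m /tmp/symlab/home/alice/config.yaml
# -> /tmp/symlab/home/opt/app/config.yaml -- which does not exist

# The fix: an absolute target (or the correct ../../opt/app/config.yaml)
ln -sfn /tmp/symlab/opt/app/config.yaml /tmp/symlab/home/alice/config.yaml
cat /tmp/symlab/home/alice/config.yaml
# key: value
```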


What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 8 RECAP                          |
+------------------------------------------------------------------+
|                                                                  |
|  Inodes: The hidden data structure behind every file             |
|    - Stores metadata: permissions, timestamps, data block ptrs   |
|    - Does NOT store the filename (that's in the directory)       |
|    - Use stat to inspect, ls -i for inode numbers, df -i count   |
|                                                                  |
|  Hard Links: Additional name -> same inode                       |
|    - Same inode number, shared data                              |
|    - Survives "deletion" of other names (link count > 0)         |
|    - Cannot cross filesystems, cannot link to directories        |
|                                                                  |
|  Soft (Symbolic) Links: Separate file containing a path          |
|    - Different inode, stores target path as content              |
|    - Can cross filesystems, can link to directories              |
|    - Can become dangling (broken) if target is deleted           |
|                                                                  |
|  VFS: The kernel abstraction making all filesystems uniform      |
|    - ext4, XFS, proc, sys, NFS all behind one interface          |
|    - Userspace uses same syscalls for all filesystem types       |
|                                                                  |
|  /proc: Virtual FS for process info and kernel tuning            |
|  /sys:  Virtual FS for device/driver/bus information             |
|  /dev:  Device files (block, character) managed by udev          |
|                                                                  |
|  Tools: stat, ls -i, ln, ln -s, readlink, lsof +L1, df -i        |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercises

  1. Inode investigation. Create a file, then create 5 hard links to it. Use stat to verify the link count at each step. Delete the links one by one and confirm the count decreases. At what point is the data actually freed?

  2. Symlink maze. Create a chain of symlinks: a -> b -> c -> d -> real_file. Does reading a work? Now delete c. What happens when you read a? What error message do you get?

  3. The /proc explorer. Write a script that takes a PID as an argument and prints:

    • The command that started the process
    • Its current working directory
    • The number of open file descriptors
    • Its memory usage (from /proc/<PID>/status)

  4. Deleted-but-open files. Create a 50MB file, open it with tail -f in the background, delete the file, and verify with df that the space is not freed. Then kill tail and verify the space is freed. Use lsof +L1 to find the deleted-but-open file before killing the process.

  5. Directory link count formula. Verify that a directory's link count equals 2 + (number of immediate subdirectories). Create a directory with 0, 1, 3, and 5 subdirectories, checking the link count each time.

Bonus Challenge

Write a script called link-analyzer.sh that takes a filename as an argument and reports:

  • Whether it is a regular file, symlink, directory, device, etc.
  • Its inode number
  • Its link count
  • If it is a symlink, the target path (and whether the target exists)
  • If it is a regular file with link count > 1, find all other hard links to the same inode
One possible solution:

#!/bin/bash
# link-analyzer.sh -- Analyze links and inodes for a given file
FILE="${1:?Usage: $0 <filename>}"

if [ ! -e "$FILE" ] && [ ! -L "$FILE" ]; then
    echo "Error: '$FILE' does not exist"
    exit 1
fi

echo "=== Link Analysis: $FILE ==="
echo ""

# File type
TYPE=$(stat -c %F "$FILE" 2>/dev/null || stat -f %HT "$FILE" 2>/dev/null)
echo "Type:       $TYPE"

# Inode number
INODE=$(stat -c %i "$FILE" 2>/dev/null)
echo "Inode:      $INODE"

# Link count
LINKS=$(stat -c %h "$FILE" 2>/dev/null)
echo "Link count: $LINKS"

# If symlink, show target
if [ -L "$FILE" ]; then
    TARGET=$(readlink "$FILE")
    echo "Symlink target: $TARGET"
    if [ -e "$FILE" ]; then
        echo "Target status: EXISTS"
        echo "Resolved path: $(readlink -f "$FILE")"
    else
        echo "Target status: BROKEN (dangling symlink)"
    fi
fi

# If regular file with multiple hard links, find siblings
if [ -f "$FILE" ] && [ "$LINKS" -gt 1 ]; then
    echo ""
    echo "--- Other hard links to inode $INODE ---"
    MOUNT=$(df --output=target "$FILE" 2>/dev/null | tail -1)
    CANON=$(readlink -f "$FILE")
    if [ -n "$MOUNT" ]; then
        # -xdev: stay on this filesystem; compare canonical paths so a
        # relative invocation (./file) still excludes the file itself
        sudo find "$MOUNT" -xdev -inum "$INODE" 2>/dev/null | grep -vx "$CANON"
    fi
fi

echo ""
echo "--- Full stat output ---"
stat "$FILE"

This completes Part II of the book. You now understand how Linux organizes files on disk (Chapter 5), how access is controlled (Chapter 6), how physical storage is managed (Chapter 7), and the internal mechanisms of inodes, links, and the VFS (Chapter 8). Next, in Part III, we move to the dynamic side of Linux: users, processes, signals, and inter-process communication.

Users, Groups & Access Control

Why This Matters

Imagine you just set up a shared Linux server for your team. Three developers need to deploy code, two interns need read-only access to logs, and one DBA needs access only to the database directory. Without a solid understanding of users, groups, and access control, you will either lock everyone out or -- worse -- give everyone root access and pray nothing breaks.

Every file, every process, every socket on a Linux system is owned by a user and associated with a group. The entire security model of Linux rests on this foundation. When a web server gets compromised, it is the user/group model that determines whether the attacker can read /etc/shadow or pivot to other services. When a junior admin accidentally runs rm -rf /, it is the permission model that decides how much damage actually happens.

This chapter gives you the complete picture: how Linux identifies users, how groups organize access, how sudo elevates privileges safely, and how PAM ties authentication together behind the scenes.

Try This Right Now

Open a terminal and run these commands to see your own identity:

# Who am I?
whoami

# Full identity -- UID, GID, and all groups
id

# All users on this system (just usernames)
cut -d: -f1 /etc/passwd

# All groups on this system
cut -d: -f1 /etc/group

# Which groups do I belong to?
groups

You will see output like:

$ id
uid=1000(alice) gid=1000(alice) groups=1000(alice),27(sudo),999(docker)

That single line tells you everything the kernel needs to make access decisions about you.


Understanding UIDs and GIDs

Every user on a Linux system is identified by a User ID (UID) -- a number, not a name. The username is just a human-friendly label mapped to that number in /etc/passwd. The kernel does not care about "alice"; it cares about UID 1000.

Similarly, every group has a Group ID (GID).
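You can watch the name-to-number mapping in both directions. A quick, read-only check (getent consults the same name-service databases the rest of the system uses):

```shell
# Names are just labels; the kernel tracks the numbers
id -u root          # prints 0 -- root's UID
id -g root          # prints 0 -- root's primary GID
getent passwd 0     # look up the passwd entry by UID
getent group 0      # look up the group entry by GID
```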

The UID Landscape

+--------------------+--------------------------------+
|  UID Range         |  Purpose                       |
|--------------------|--------------------------------|
|  0                 |  root (superuser)              |
|  1 - 999           |  System/service accounts       |
|  1000 - 60000      |  Regular (human) users         |
|  65534             |  nobody (unmapped/overflow)    |
+--------------------+--------------------------------+

Distro Note: The boundary between system and regular UIDs varies. Debian/Ubuntu typically use 1000+ for regular users. RHEL/Fedora also use 1000+. Older systems may use 500+. Check /etc/login.defs for the UID_MIN and UID_MAX values on your system.

Why Does This Matter?

System accounts (UID 1-999) exist to run services: the www-data user runs your web server, and mysql runs your database. These accounts typically have:

  • No valid login shell (set to /usr/sbin/nologin or /bin/false)
  • No home directory (or a non-standard one)
  • No password

This is a deliberate security design: even if someone exploits your web server, they land in a restricted account that cannot log in interactively.

# See the shell assigned to www-data (if it exists)
grep www-data /etc/passwd

# See root's entry
grep ^root /etc/passwd

The Three Identity Files

Linux stores user and group information in three critical files. Let us examine each.

/etc/passwd -- The User Database

Despite its name, this file does NOT contain passwords (not anymore). It is world-readable and contains one line per user:

# Look at your own entry
grep "^$(whoami):" /etc/passwd

The format is seven colon-separated fields:

username:x:UID:GID:comment:home_directory:shell

Example:

alice:x:1000:1000:Alice Johnson:/home/alice:/bin/bash
+-----------+----------------+---------------------------------------+
| Field     | Value          | Meaning                               |
+-----------+----------------+---------------------------------------+
| username  | alice          | Login name                            |
| password  | x              | Password is in /etc/shadow            |
| UID       | 1000           | Numeric user ID                       |
| GID       | 1000           | Primary group ID                      |
| comment   | Alice Johnson  | Full name / description (GECOS field) |
| home      | /home/alice    | Home directory                        |
| shell     | /bin/bash      | Default login shell                   |
+-----------+----------------+---------------------------------------+

The x in the password field means "look in /etc/shadow instead." In ancient Unix, the encrypted password was stored right here in this world-readable file. That was a terrible idea.
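If you ever need individual fields in a script, cut splits them cleanly on the colons. A small sketch (it uses getent rather than reading /etc/passwd directly, so it also works when accounts come from LDAP or other NSS sources):

```shell
# Pull selected fields out of the current user's passwd entry
entry=$(getent passwd "$(whoami)")
user=$(echo "$entry"  | cut -d: -f1)   # field 1: username
uid=$(echo "$entry"   | cut -d: -f3)   # field 3: UID
shell=$(echo "$entry" | cut -d: -f7)   # field 7: login shell
echo "user=$user uid=$uid shell=$shell"
```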

/etc/shadow -- The Password Vault

This file stores the actual password hashes and is readable only by root:

# This will fail as a regular user
cat /etc/shadow

# This works
sudo cat /etc/shadow

Format:

username:$algorithm$salt$hash:last_changed:min:max:warn:inactive:expire:reserved

Example:

alice:$6$rANd0mSaLt$kQ8Xp...long_hash...:19500:0:99999:7:::

The password hash format tells you the algorithm:

  • $1$ -- MD5 (ancient, insecure)
  • $5$ -- SHA-256
  • $6$ -- SHA-512 (most common today)
  • $y$ -- yescrypt (newer, used by Debian 12+, Fedora 35+)
+---------------+--------------------------------------------------------+
| Field         | Meaning                                                |
+---------------+--------------------------------------------------------+
| last_changed  | Days since Jan 1, 1970 that password was last changed  |
| min           | Minimum days before password can be changed            |
| max           | Maximum days before password must be changed           |
| warn          | Days before expiry to warn the user                    |
| inactive      | Days after expiry before account is disabled           |
| expire        | Date the account expires (days since epoch)            |
+---------------+--------------------------------------------------------+
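The day counts are easy to turn into calendar dates with GNU date. Using the 19500 from the example entry above:

```shell
# Convert "days since Jan 1, 1970" into a readable date (GNU date)
date -d "1970-01-01 +19500 days" +%Y-%m-%d    # prints 2023-05-23
```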

Think About It: Why is the password hash stored in a separate file from /etc/passwd? What would happen if any user on the system could read everyone's password hashes?

/etc/group -- The Group Database

# See all groups
cat /etc/group

# See groups that list you as a member (word match, not substring;
# note your primary group may not list you explicitly here)
grep -w "$(whoami)" /etc/group

Format:

groupname:x:GID:member1,member2,member3

Example:

sudo:x:27:alice,bob
docker:x:999:alice
developers:x:1001:alice,bob,charlie

A user has one primary group (from /etc/passwd, GID field) and zero or more supplementary groups (listed in /etc/group). When you create a file, it is owned by your primary group by default.
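You can verify the default-group rule in a few seconds (safe; creates and removes one temporary file):

```shell
# New files pick up your primary group (on directories without setgid)
tmp=$(mktemp)
ls -ln "$tmp"    # the numeric group column is your primary GID
[ "$(stat -c %g "$tmp")" = "$(id -g)" ] && echo "owned by my primary group"
rm -f "$tmp"
```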


Hands-On: Managing Users

Creating Users

# Create a new user with defaults
sudo useradd testuser1

# Create a user with all the options you typically want
sudo useradd -m -s /bin/bash -c "Test User Two" testuser2

The flags explained:

  • -m -- Create home directory
  • -s /bin/bash -- Set login shell
  • -c "Test User Two" -- Set the GECOS comment field

Distro Note: On Debian/Ubuntu, useradd does NOT create a home directory by default -- you need -m. On RHEL/Fedora, useradd creates a home directory by default. To be safe, always use -m explicitly.

# Verify the user was created
grep testuser2 /etc/passwd
id testuser2
ls -la /home/testuser2/

Setting Passwords

# Set a password for the new user
sudo passwd testuser2

You will be prompted to enter the password twice. The passwd command handles all the hashing and writes to /etc/shadow.

# Check the shadow entry (see that a hash now exists)
sudo grep testuser2 /etc/shadow

Creating System Users

System users are for running services, not for humans to log in:

# Create a system user for a hypothetical app
sudo useradd --system --no-create-home --shell /usr/sbin/nologin myapp

# Verify -- note the low UID
id myapp
grep myapp /etc/passwd

Modifying Users

# Change a user's shell
sudo usermod -s /bin/zsh testuser2

# Add a user to a supplementary group (KEEP existing groups with -a)
sudo usermod -aG sudo testuser2

# Change the user's comment
sudo usermod -c "Test Account" testuser2

# Lock an account (disable login without deleting)
sudo usermod -L testuser2

# Unlock an account
sudo usermod -U testuser2

WARNING: Using usermod -G WITHOUT -a will REMOVE the user from all other supplementary groups. Always use -aG to append to existing groups. This is one of the most common and dangerous mistakes in user management.

Deleting Users

# Delete user but keep their home directory
sudo userdel testuser1

# Delete user AND their home directory and mail spool
sudo userdel -r testuser2

WARNING: userdel -r permanently removes the user's home directory. In production, consider locking the account instead and archiving the home directory first.


Hands-On: Managing Groups

# Create a new group
sudo groupadd developers

# Create a group with a specific GID
sudo groupadd -g 2000 ops

# Add existing users to the group
sudo usermod -aG developers alice

# See group membership
getent group developers

# Remove a user from a group (use gpasswd)
sudo gpasswd -d alice developers

# Delete a group
sudo groupdel ops

Primary vs. Supplementary Groups

+--------------------------------------------------+
|  User: alice                                     |
|                                                  |
|  Primary Group: alice (GID 1000)                 |
|    - Assigned in /etc/passwd                     |
|    - Default group for new files                 |
|                                                  |
|  Supplementary Groups:                           |
|    - sudo (GID 27)     -- can use sudo           |
|    - docker (GID 999)  -- can use Docker         |
|    - developers (GID 1001)                       |
+--------------------------------------------------+

When alice creates a file, it is owned by alice:alice (user:primary_group). If she wants new files to be owned by the developers group, she can:

# Temporarily change primary group for this session
newgrp developers

# Now create a file -- it belongs to 'developers' group
touch /tmp/team-file
ls -l /tmp/team-file

Think About It: A developer says "I added myself to the docker group but I still get permission denied." What is the most likely cause? (Hint: group membership changes require a fresh login session.)


su vs. sudo: Elevating Privileges

su -- Switch User

su lets you become another user entirely. You need that user's password:

# Become root (need root's password)
su -

# Become another user
su - alice

# Run a single command as another user
su -c "whoami" alice

The - (or -l) flag is important: it gives you a full login shell with that user's environment. Without it, you keep your current environment, which can cause confusing path and variable issues.

sudo -- Superuser Do

sudo lets you run commands as root (or another user) using YOUR OWN password, if you are authorized:

# Run a single command as root
sudo apt update

# Open a root shell
sudo -i

# Run a command as a different user
sudo -u postgres psql

# See what you are allowed to do
sudo -l

su vs. sudo: When to Use Which

+-------------------+--------------------+----------------------+
|  Feature          |  su                |  sudo                |
|-------------------|--------------------|----------------------|
|  Password needed  |  Target user's     |  Your own            |
|  Granularity      |  All or nothing    |  Per-command control |
|  Audit trail      |  Minimal           |  Full logging        |
|  Best for         |  Switching users   |  Admin tasks         |
+-------------------+--------------------+----------------------+

In modern Linux administration, sudo is preferred almost universally. It provides better auditing (every sudo command is logged), does not require sharing the root password, and allows fine-grained control over what each user can do.


The sudoers File and visudo

The /etc/sudoers file controls who can use sudo and what they can do. NEVER edit it directly with a text editor -- always use visudo:

# Open sudoers for editing (uses your default EDITOR)
sudo visudo

visudo checks syntax before saving. A syntax error in sudoers can lock you out of sudo entirely.

sudoers Syntax

The basic format is:

who    where=(as_whom)    what

Examples:

# alice can run any command as any user on any host
alice   ALL=(ALL:ALL)   ALL

# bob can only restart nginx
bob     ALL=(root)      /usr/bin/systemctl restart nginx

# The %sudo group can run anything (the % means "group")
%sudo   ALL=(ALL:ALL)   ALL

# ops group can run any command without a password
%ops    ALL=(ALL)       NOPASSWD: ALL

# charlie can run specific commands without a password
charlie ALL=(root)      NOPASSWD: /usr/bin/systemctl restart myapp, /usr/bin/journalctl -u myapp

Drop-in Files

Rather than editing the main sudoers file, you can create files in /etc/sudoers.d/:

# Create a drop-in file for a team
sudo visudo -f /etc/sudoers.d/developers

Add content like:

%developers ALL=(root) /usr/bin/systemctl restart myapp, /usr/bin/tail -f /var/log/myapp/*
# Make sure the include directive exists in sudoers
sudo grep includedir /etc/sudoers
# You should see: @includedir /etc/sudoers.d

Distro Note: Debian/Ubuntu ship with the @includedir /etc/sudoers.d directive enabled by default. On RHEL/CentOS, it is also present but uses the older #includedir syntax (the # is NOT a comment here -- this is one of the strangest syntax decisions in Unix history).


Debug This: sudo Permission Problems

A user reports they cannot use sudo despite being in the sudo group:

# The user runs:
sudo whoami
# Output: alice is not in the sudoers file. This incident will be reported.

Diagnosis steps:

# 1. Check if the user is actually in the sudo group
id alice
# Look for "sudo" or "wheel" in the groups list

# 2. Did they log out and back in after being added?
# Group changes require a new session!

# 3. Check sudoers configuration
sudo visudo
# Look for: %sudo ALL=(ALL:ALL) ALL
# On RHEL: %wheel ALL=(ALL) ALL

# 4. Check for syntax errors in drop-in files
sudo visudo -c
# This checks ALL sudoers files for syntax errors

# 5. Check the log for details
sudo grep sudo /var/log/auth.log    # Debian/Ubuntu
sudo grep sudo /var/log/secure      # RHEL/Fedora

Distro Note: The admin group is called sudo on Debian/Ubuntu and wheel on RHEL/Fedora/Arch. The name goes back to the 1970s: "big wheel" was slang for someone with power.


PAM: Pluggable Authentication Modules

PAM is the framework that sits behind every authentication event on Linux. When you type your password at the login screen, when you su to another user, when you sudo -- PAM is handling it.

How PAM Works

+-----------------+     +--------+     +------------------+
|  Application    |---->|  PAM   |---->|  PAM Modules     |
|  (login, sudo,  |     |  Core  |     |  pam_unix.so     |
|   ssh, su)      |     |        |     |  pam_ldap.so     |
+-----------------+     +--------+     |  pam_google...so |
                                       |  pam_limits.so   |
                                       +------------------+

Applications do not handle passwords themselves. They ask PAM, and PAM consults a chain of modules. This means you can change how authentication works (add two-factor, switch to LDAP) without modifying any application.

PAM Configuration

PAM config files live in /etc/pam.d/, one per service:

# See all PAM-aware services
ls /etc/pam.d/

# Look at the sudo PAM config
cat /etc/pam.d/sudo

A PAM config line has four fields:

type    control    module    [arguments]

  • type: auth (verify identity), account (check access), password (change password), session (setup/teardown)
  • control: required, requisite, sufficient, optional
  • module: The shared library (e.g., pam_unix.so)
  • arguments: Optional parameters passed to the module

Example (/etc/pam.d/sudo):

auth    required    pam_unix.so
account required    pam_unix.so
session required    pam_limits.so

You do not need to master PAM internals right now, but understanding that it exists and how it is structured will help immensely when you need to configure LDAP authentication, set up two-factor auth, or debug "why can't this user log in?"


Hands-On: Practical Scenarios

Scenario 1: Onboarding a New Developer

# Create the user
sudo useradd -m -s /bin/bash -c "Dave Chen" dchen

# Set an initial password
sudo passwd dchen

# Add to the developers group
sudo usermod -aG developers dchen

# Give limited sudo access
sudo visudo -f /etc/sudoers.d/developers
# Add: %developers ALL=(root) /usr/bin/systemctl restart myapp

# Verify everything
id dchen
sudo -l -U dchen

Scenario 2: Offboarding an Employee

# Lock the account immediately
sudo usermod -L jsmith

# Kill any running processes
sudo pkill -u jsmith

# Expire the account
sudo usermod --expiredate 1 jsmith

# Archive home directory before deletion
sudo tar czf /backups/jsmith-home-$(date +%Y%m%d).tar.gz /home/jsmith

# Remove the user and home directory
sudo userdel -r jsmith

# Check for orphaned files anywhere on the system
sudo find / -nouser -o -nogroup 2>/dev/null

Scenario 3: Restricting a Contractor to a Specific Directory

# Create the user with a restricted shell
sudo useradd -m -s /bin/rbash contractor1
sudo passwd contractor1

# Create a limited PATH
sudo mkdir /home/contractor1/bin
sudo ln -s /usr/bin/ls /home/contractor1/bin/
sudo ln -s /usr/bin/cat /home/contractor1/bin/

# Set the restricted PATH in their .bashrc
echo 'export PATH=$HOME/bin' | sudo tee /home/contractor1/.bashrc
sudo chown contractor1:contractor1 /home/contractor1/.bashrc
sudo chmod 444 /home/contractor1/.bashrc

Password Policies and Account Aging

# See password aging info for a user
sudo chage -l alice

# Force password change on next login
sudo chage -d 0 alice

# Set password to expire in 90 days
sudo chage -M 90 alice

# Set minimum 7 days between password changes
sudo chage -m 7 alice

# Set account to expire on a specific date
sudo chage -E 2025-12-31 contractor1

You can set system-wide defaults in /etc/login.defs:

grep -E "^(PASS_MAX_DAYS|PASS_MIN_DAYS|PASS_WARN_AGE|UID_MIN|UID_MAX)" /etc/login.defs

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 9 Recap: Users, Groups & Access Control                 |
|------------------------------------------------------------------|
|                                                                  |
|  - Every user has a UID; every group has a GID.                  |
|  - UIDs 0-999 are system accounts; 1000+ are regular users.      |
|  - /etc/passwd stores user info (not passwords!).                |
|  - /etc/shadow stores password hashes (root-readable only).      |
|  - /etc/group maps group names to GIDs and members.              |
|  - useradd, usermod, userdel manage users.                       |
|  - groupadd, gpasswd manage groups.                              |
|  - sudo is preferred over su for administrative tasks.           |
|  - sudoers controls who can sudo; always edit with visudo.       |
|  - PAM handles all authentication behind the scenes.             |
|  - Group changes require logging out and back in.                |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: User Audit

Run awk -F: '$3 >= 1000 && $3 < 65534 {print $1, $3, $7}' /etc/passwd to list all regular users, their UIDs, and shells. Are there any with /bin/false or /usr/sbin/nologin? Why might a regular-range UID have a non-login shell?

Exercise 2: Group Design

Design a group structure for a team of 5 developers, 2 QA engineers, and 1 DBA. What groups would you create? Which sudo permissions would each group need? Write the sudoers rules.

Exercise 3: Shadow File Analysis

Run sudo awk -F: '{print $1, $3, $5}' /etc/shadow to see usernames, last-changed dates, and max days. Calculate the actual date each password was last changed (hint: the number is days since January 1, 1970 -- use date -d "1970-01-01 + N days").

Exercise 4: Lock and Investigate

Create a test user, set a password, lock the account with usermod -L, then examine /etc/shadow. What character was added to the password hash? Unlock it and check again.

Bonus Challenge

Write a bash script that takes a username as an argument and outputs a full "identity report": username, UID, GID, primary group name, all supplementary groups, home directory, shell, password status (locked/unlocked/no password), last password change, and account expiry date. Use only standard commands (id, passwd, chage, grep, awk).

Processes & Job Control

Why This Matters

You SSH into a production server and the monitoring alert says CPU usage is at 98%. Something is eating all your compute. Or maybe you started a long database migration over SSH and your connection dropped -- is the migration still running? Did it die? How do you reconnect to it?

Every command you type, every service running in the background, every daemon handling network requests -- they are all processes. Understanding how processes work, how to inspect them, how to control them, and how to keep them running when you walk away from the terminal is fundamental to Linux administration.

This chapter takes you from "what is a process?" to confidently managing foreground and background jobs, diagnosing runaway processes, and understanding the process lifecycle from birth to death.

Try This Right Now

# How many processes are running on your system right now?
ps aux | wc -l

# What is YOUR current shell's process ID?
echo $$

# What is your shell's parent process?
echo $PPID

# See the process tree rooted at PID 1
pstree -p 1 | head -30

# What is consuming the most CPU right now?
ps aux --sort=-%cpu | head -5

You will see that even a "quiet" Linux system has dozens or hundreds of processes running. Every one of them has a story.


What Is a Process?

A process is a running instance of a program. The program /usr/bin/bash is a file sitting on disk. When you launch a terminal, the kernel loads that program into memory, assigns it resources, and creates a process. If you open three terminals, you have three separate bash processes, each with its own memory, its own variables, its own PID.
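You can see the one-program, many-processes split for yourself. A quick sketch (sleep 987 is just an arbitrary, easy-to-spot argument):

```shell
# One program on disk, two independent processes in memory
sleep 987 & p1=$!
sleep 987 & p2=$!
sleep 0.2                               # give them a moment to appear
ps -eo pid,args | grep '[s]leep 987'    # two rows, two different PIDs
kill "$p1" "$p2"                        # clean up
```

The `[s]leep` trick keeps grep from matching its own command line in the process list.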

Key Attributes of Every Process

+------------------------------------------+
|  Process Attributes                      |
|------------------------------------------|
|  PID    - Process ID (unique number)     |
|  PPID   - Parent Process ID              |
|  UID    - User who owns it               |
|  State  - Running, Sleeping, etc.        |
|  Nice   - Priority value                 |
|  Memory - How much RAM it uses           |
|  CPU    - How much CPU time it uses      |
|  TTY    - Terminal it is attached to     |
|  CMD    - The command that started it    |
+------------------------------------------+

How Processes Are Born: fork() and exec()

When you type ls in your shell, here is what actually happens:

  Your bash shell (PID 1234)
       |
       | 1. fork() -- creates a COPY of itself
       |
       v
  Child bash (PID 5678)     <-- exact clone of parent
       |
       | 2. exec() -- replaces itself with the 'ls' program
       |
       v
  ls process (PID 5678)     <-- now running ls code
       |
       | 3. ls runs, outputs results
       |
       | 4. exit() -- process terminates
       |
       v
  Parent bash (PID 1234) collects exit status
  (the child is now gone)

This fork-then-exec pattern is how nearly every process on Linux is created. The very first process, PID 1 (init or systemd), is started directly by the kernel. Every other userspace process is a descendant of PID 1 (kernel threads are the exception -- they descend from kthreadd, PID 2).

# See the full process tree to verify this
pstree -p
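You can also watch the two halves of the pattern from bash itself: a subshell is a fork of your shell, so it reports a new PID, and exec replaces a process image in place. A sketch using the bash-specific $BASHPID (unlike $$, it changes in subshells):

```shell
#!/bin/bash
# fork: the ( ... ) subshell is a copy of this shell with its own PID
echo "parent PID: $$"
( echo "child PID:  $BASHPID" )    # a different number: the forked copy

# exec: replace the current process image; nothing after it ever runs
( exec true; echo "never printed" )
echo "parent is still here: $$"
```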

Think About It: If every process is created by forking, and PID 1 is the ancestor of all, what happens to a process if its parent dies before it does? (We will answer this when we discuss zombie and orphan processes.)


Inspecting Processes with ps

ps is the workhorse command for viewing processes. It has two major syntax styles (because Unix history is messy):

BSD Style (No Dashes)

# The "classic" incantation -- show ALL processes with details
ps aux

Output columns:

USER       PID  %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1   0.0  0.1 169516 13200 ?        Ss   Feb10   0:07 /sbin/init
alice     1234   0.0  0.0  10056  3456 pts/0    Ss   09:15   0:00 -bash
alice     5678  45.2  2.1 987654 56789 pts/0    R+   09:30   5:12 python train.py
+----------+--------------------------------------------------+
| Column   | Meaning                                          |
+----------+--------------------------------------------------+
| USER     | Process owner                                    |
| PID      | Process ID                                       |
| %CPU     | CPU percentage                                   |
| %MEM     | Memory percentage                                |
| VSZ      | Virtual memory size (KB)                         |
| RSS      | Resident Set Size -- actual physical memory (KB) |
| TTY      | Terminal (? means no terminal -- a daemon)       |
| STAT     | Process state (see below)                        |
| START    | When the process started                         |
| TIME     | Total CPU time consumed                          |
| COMMAND  | The command line                                 |
+----------+--------------------------------------------------+

System V Style (With Dashes)

# Full-format listing
ps -ef

# Full-format with thread info
ps -eLf

Sample ps -ef output:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Feb10 ?        00:00:07 /sbin/init
alice     1234  1100  0 09:15 pts/0    00:00:00 -bash

Useful ps Recipes

# Find a specific process
ps aux | grep nginx

# Better: use pgrep (no grep-in-grep problem)
pgrep -a nginx

# Show process tree
ps auxf

# Show specific columns
ps -eo pid,ppid,user,%cpu,%mem,stat,cmd --sort=-%mem | head -20

# Show all processes by a specific user
ps -u alice

# Show process with specific PID
ps -p 1234 -o pid,ppid,stat,cmd

Process States

Every process is in one of several states. The STAT column in ps shows this:

+-------+------------------+------------------------------------------+
| Code  | State            | Meaning                                  |
+-------+------------------+------------------------------------------+
|  R    | Running          | Actively using CPU or ready to run       |
|  S    | Sleeping         | Waiting for an event (I/O, signal, etc.) |
|  D    | Uninterruptible  | Waiting for I/O (cannot be killed!)      |
|       | Sleep            |                                          |
|  Z    | Zombie           | Finished but parent hasn't collected     |
|       |                  | its exit status                          |
|  T    | Stopped          | Suspended (e.g., Ctrl+Z)                 |
|  t    | Traced           | Stopped by a debugger                    |
|  I    | Idle             | Kernel thread, idle                      |
+-------+------------------+------------------------------------------+

Additional modifiers appear after the main state:

+-------+----------------------------------------------+
| Mod   | Meaning                                      |
+-------+----------------------------------------------+
|  s    | Session leader                               |
|  +    | In the foreground process group              |
|  l    | Multi-threaded                               |
|  <    | High priority (negative nice value)          |
|  N    | Low priority (positive nice value)           |
+-------+----------------------------------------------+

So Ss means "sleeping, session leader." R+ means "running, foreground." Ssl means "sleeping, session leader, multi-threaded."
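A quick way to survey the states on your own machine (the first character of STAT is the main state; the rest are modifiers):

```shell
# Tally processes by primary state letter (S, R, I, Z, ...)
ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn
```

On a healthy system most processes are sleeping (S); anything stuck in D or piling up in Z deserves attention.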

The Dangerous D State

A process in state D (uninterruptible sleep) cannot be killed, not even with SIGKILL. It is waiting for I/O to complete -- usually disk or NFS. If you see many processes stuck in D state, you likely have a storage problem (dead NFS mount, failing disk, etc.).

# Find processes in D state
ps aux | awk '$8 ~ /D/'

Zombie Processes

A zombie (Z) is not actually running. It is a dead process whose entry still sits in the process table because its parent has not called wait() to collect its exit status. Zombies consume almost no resources (just a PID table entry), but a large number of them indicates a buggy parent process.

# Find zombie processes (STAT starts with Z, e.g. "Z" or "Z+")
ps aux | awk '$8 ~ /^Z/'

# Or
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

The fix for zombies is to fix or restart their parent process. Killing a zombie does nothing -- it is already dead.
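You can manufacture a short-lived zombie to watch all of this safely. In this sketch the parent execs into sleep, which never calls wait(), so its child sits in the process table as defunct until sleep exits, at which point init adopts and reaps the zombie:

```shell
# Create a temporary zombie for about 3 seconds
( sleep 0.2 & exec sleep 3 ) &   # child exits; parent (now sleep) never reaps it
sleep 1
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'   # the <defunct> entry
wait                             # let the 3-second parent finish
```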


Real-Time Monitoring with top and htop

top

# Launch top
top

Key top commands while running:

+------+----------------------------------+
| Key  | Action                           |
+------+----------------------------------+
| P    | Sort by CPU                      |
| M    | Sort by memory                   |
| k    | Kill a process (enter PID)       |
| r    | Renice a process                 |
| 1    | Show individual CPU cores        |
| c    | Toggle full command path         |
| f    | Choose which fields to display   |
| q    | Quit                             |
+------+----------------------------------+

htop (Better Interactive Viewer)

# Install if not present
sudo apt install htop    # Debian/Ubuntu
sudo dnf install htop    # RHEL/Fedora

# Launch
htop

htop provides:

  • Color-coded CPU/memory bars
  • Mouse support
  • Horizontal scrolling for long commands
  • Tree view (press F5)
  • Easy filtering (press F4)
  • Easy process killing (press F9)

Distro Note: htop is not installed by default on most distributions. It is available in the standard repositories for all major distros. Install it -- you will use it constantly.


Exploring /proc/PID/

Every running process has a directory under /proc/ named after its PID. This is a virtual filesystem -- the kernel generates the contents on the fly.

# Explore your own shell's process info
ls /proc/$$/

# What command started this process?
cat /proc/$$/cmdline | tr '\0' ' '; echo

# What is its current working directory?
ls -l /proc/$$/cwd

# What is its executable?
ls -l /proc/$$/exe

# What environment variables does it have?
cat /proc/$$/environ | tr '\0' '\n' | head -10

# What files does it have open?
ls -l /proc/$$/fd/

# Its memory map
cat /proc/$$/maps | head -10

# Process status summary
cat /proc/$$/status | head -20

The status file is especially useful:

cat /proc/$$/status

Key fields:

  • Name: -- process name
  • State: -- current state
  • Pid: / PPid: -- PID and parent PID
  • Uid: / Gid: -- real, effective, saved, filesystem UIDs
  • VmRSS: -- resident memory size
  • Threads: -- number of threads
  • voluntary_ctxt_switches: -- context switches

Think About It: The /proc/$$/fd/ directory shows all open file descriptors. File descriptor 0 is stdin, 1 is stdout, 2 is stderr. What do the symlinks point to when you are working in a terminal? What would they point to for a daemon process?


Foreground and Background Jobs

Running Commands in the Background

Add & to run a command in the background:

# Start a long-running command in the background
sleep 300 &
# Output: [1] 12345
# [1] is the job number, 12345 is the PID

Managing Jobs

# List all background jobs in this shell
jobs
# Output: [1]+  Running    sleep 300 &

# Bring a background job to the foreground
fg %1

# Suspend a foreground job (Ctrl+Z)
# Then see it in jobs list
jobs
# Output: [1]+  Stopped    sleep 300

# Resume it in the background
bg %1

# Resume it in the foreground
fg %1

The Job Control Workflow

                    Ctrl+Z
  FOREGROUND  ──────────────>  STOPPED
      ^                          |
      |                          |
    fg %N                      bg %N
      |                          |
      |                          v
      +──────── BACKGROUND <─────+
                   (& at start)

Practical Example

# Start a large file copy in the foreground
cp -r /data/bigdir /backup/bigdir

# Oh wait, this is taking forever. Suspend it:
# Press Ctrl+Z
# Output: [1]+  Stopped    cp -r /data/bigdir /backup/bigdir

# Resume it in the background so you can do other work
bg %1
# Output: [1]+ cp -r /data/bigdir /backup/bigdir &

# Now you can keep using the terminal
# Check on the job periodically
jobs

Keeping Processes Alive: nohup and disown

When you log out of a terminal, the shell sends SIGHUP to all its child processes. This kills them. That is fine for interactive commands, but terrible for long-running tasks.

nohup -- Plan Ahead

If you know beforehand that a command should survive logout:

# nohup redirects output to nohup.out and ignores SIGHUP
nohup python3 long_training.py &

# Or redirect output yourself
nohup ./backup_script.sh > /var/log/backup.log 2>&1 &

disown -- After the Fact

You already started a job and forgot nohup? Use disown:

# Start a long job
python3 train_model.py &
# Output: [1] 34567

# Remove it from the shell's job table
disown %1

# Now you can log out safely -- the process will keep running
# disown all background jobs
disown -a

# Keep the job in the jobs table, but mark it so SIGHUP is not sent
disown -h %1

Which to Use?

+----------------------------------------------------------+
|  Situation                        |  Use                 |
|-----------------------------------|----------------------|
|  Starting a new long-running job  |  nohup command &     |
|  Already running, forgot nohup    |  Ctrl+Z, bg, disown  |
|  Need to reconnect to output      |  Use tmux or screen  |
+----------------------------------------------------------+

Think About It: Neither nohup nor disown lets you reconnect to the process output later. What tool from Chapter 26 (tmux) solves this problem? Why is tmux or screen often the better choice from the start?


Process Priority: nice and renice

Linux uses a priority system to decide how much CPU time each process gets. The "nice" value ranges from -20 (highest priority, least nice to other processes) to 19 (lowest priority, most nice):

  -20 ─────────── 0 ─────────── 19
  Highest priority    Default    Lowest priority
  (least nice)                   (most nice)

Setting Priority at Launch

# Run a CPU-heavy command at low priority
nice -n 10 make -j$(nproc)

# Run at high priority (requires root)
sudo nice -n -5 ./critical-task

Changing Priority of Running Processes

# Find the PID
pgrep -a my_script

# Lower its priority (unprivileged users can only increase the
# niceness of their own processes)
renice 15 -p 12345

# Raise its priority (requires root)
sudo renice -10 -p 12345

# Renice all processes by a user
sudo renice 10 -u testuser

Practical Example

# You need to compile a project but don't want it to slow down
# the web server running on the same machine
nice -n 19 make -j$(nproc)

# The backup is competing with other workloads for CPU time
# Lower its CPU priority (note: nice only affects CPU scheduling --
# for disk I/O contention, ionice is the right tool)
renice 15 -p $(pgrep -x rsync)

Killing Processes

The kill Command

Despite its name, kill actually sends signals. We will cover signals in depth in Chapter 11, but here are the essentials:

# Send SIGTERM (graceful termination) -- this is the default
kill 12345

# Send SIGKILL (forceful termination -- cannot be caught)
kill -9 12345
# Or equivalently:
kill -KILL 12345

# Send SIGTERM to all processes with a given name
killall nginx

# Same, with pattern matching
pkill -f "python.*train"

# Kill all processes owned by a user
pkill -u testuser

WARNING: kill -9 should be your LAST resort. It does not give the process a chance to clean up -- temporary files may be left behind, database connections may not be closed properly, data may be corrupted. Always try kill (SIGTERM) first, wait a few seconds, and only use kill -9 if the process refuses to die.

The Escalation Ladder

# Step 1: Ask nicely (SIGTERM)
kill 12345

# Step 2: Wait a moment
sleep 5

# Step 3: Check if it is still running
ps -p 12345

# Step 4: Only if still running, force kill
kill -9 12345
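
The ladder above can be wrapped in a small reusable helper. This is only a sketch -- the function name terminate and the default timeout are our own choices, not a standard tool:

```shell
#!/bin/bash
# terminate: SIGTERM first, poll for a few seconds, SIGKILL as last resort
terminate() {
    local pid=$1 timeout=${2:-5}
    kill "$pid" 2>/dev/null || return 0          # already gone? done
    for _ in $(seq "$timeout"); do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
    done
    kill -9 "$pid" 2>/dev/null                   # still alive: force it
}

# Usage: give the process 1 second before escalating
sleep 300 &
terminate $! 1
```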

Debug This: Diagnosing a High-CPU Process

Your monitoring says the system load is 12 on a 4-core machine. Diagnose it:

# Step 1: See system load averages
uptime
# Output: load average: 12.05, 11.30, 9.45

# Step 2: What is consuming CPU?
ps aux --sort=-%cpu | head -10

# Step 3: Is it a single process or many?
# If one process shows 400% CPU, it is multi-threaded
# If many processes show 100%, you have too many competing

# Step 4: Investigate the top consumer
# Say PID 5678 is at 350% CPU
cat /proc/5678/status | grep -E "Name|State|Threads|Pid|PPid"

# Step 5: What files does it have open?
ls -l /proc/5678/fd/ | head -20

# Step 6: What is it actually doing? (trace its system calls)
sudo strace -p 5678 -c
# Press Ctrl+C after a few seconds to see a summary

# Step 7: What started it? Check full command line
cat /proc/5678/cmdline | tr '\0' ' '; echo

# Step 8: If it is runaway garbage, kill it gracefully
kill 5678
sleep 3
# If still alive:
kill -9 5678

Hands-On: Process Exploration Lab

# 1. Create some processes to observe
sleep 600 &
sleep 601 &
sleep 602 &

# 2. See them in your job list
jobs -l

# 3. See them in ps
ps aux | grep sleep

# 4. Look at their parent PID -- it should be your shell
echo "My shell PID: $$"
ps -p $(pgrep -f "sleep 60[0-2]") -o pid,ppid,stat,cmd

# 5. Suspend one
kill -STOP $(pgrep -f "sleep 600")
# Check its state changed to T
ps -p $(pgrep -f "sleep 600") -o pid,stat,cmd

# 6. Resume it
kill -CONT $(pgrep -f "sleep 600")
# Check its state is back to S
ps -p $(pgrep -f "sleep 600") -o pid,stat,cmd

# 7. Create a "zombie" (for educational purposes)
# The child exits after 1 second, but exec replaces the parent with
# sleep, which never calls wait() -- the child stays a zombie until
# the parent exits
bash -c 'sleep 1 & exec sleep 30' &
sleep 2
ps aux | grep defunct | grep -v grep

# 8. Clean up
kill $(pgrep -f "sleep 60[0-2]") 2>/dev/null
pkill -f "sleep 30" 2>/dev/null

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 10 Recap: Processes & Job Control                       |
|------------------------------------------------------------------|
|                                                                  |
|  - A process is a running instance of a program.                 |
|  - Every process has a PID and a parent (PPID).                  |
|  - Processes are created via fork()+exec().                      |
|  - PID 1 (init/systemd) is the ancestor of all processes.        |
|  - ps aux and ps -ef show process listings.                      |
|  - Process states: R (running), S (sleeping), D (disk wait),     |
|    Z (zombie), T (stopped).                                      |
|  - Use & to background, Ctrl+Z to suspend, bg/fg to resume.      |
|  - nohup and disown keep processes alive after logout.           |
|  - nice/renice control scheduling priority (-20 to 19).          |
|  - kill sends signals; always try SIGTERM before SIGKILL.        |
|  - /proc/PID/ is a goldmine of per-process information.          |
|  - top/htop provide real-time monitoring.                        |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Process Genealogy

Run pstree -p $$ to see the process tree from your shell to PID 1. Count how many generations separate your shell from PID 1. What are the intermediate processes?

Exercise 2: Resource Hog

Write a command that consumes CPU (e.g., yes > /dev/null &). Launch three of them with different nice values (0, 10, 19). Use top to observe how CPU time is distributed. Which one gets the most? Kill them all when done.

Exercise 3: Job Control Mastery

Start five sleep 1000 commands in the background. Use jobs to list them. Bring the third one to the foreground, suspend it with Ctrl+Z, then resume it in the background. Kill the second one by job number (kill %2). Verify with jobs.

Exercise 4: /proc Deep Dive

Pick any running process and explore its /proc/PID/ directory. Read status, cmdline, environ, maps, and fd/. Write a one-paragraph summary of what this process is doing, based solely on what you found in /proc.

Bonus Challenge

Write a bash script called proc-report.sh that takes a PID as an argument and outputs: the command name, its parent's command name, its state, memory usage (RSS), number of open file descriptors, and how long it has been running. All information should come from /proc/PID/ and related files.

Signals Deep Dive

Why This Matters

You press Ctrl+C and a program stops. You close a terminal and background jobs die. You run sudo systemctl reload nginx and Nginx picks up its new configuration without dropping a single connection. You type kill -9 and a stubborn process vanishes.

All of these work through signals -- the kernel's mechanism for delivering notifications to processes. Signals are how Linux says "hey, something just happened that you need to deal with." They are the nervous system of process management, and understanding them is the difference between blindly typing kill -9 and surgically managing processes like a professional.

This chapter covers what signals are, which ones matter most, how to trap them in your own scripts, and how production services use them for graceful reloads and shutdowns.

Try This Right Now

# See all available signals on your system
kill -l

# Start a process and send it different signals
sleep 300 &
PID=$!
echo "Started sleep with PID $PID"

# Send SIGTERM (signal 15) -- the polite termination
kill $PID
# The process is gone

# Start another
sleep 300 &
PID=$!

# Send SIGSTOP (pause the process)
kill -STOP $PID
ps -p $PID -o pid,stat,cmd
# State should show 'T' (stopped)

# Resume it
kill -CONT $PID
ps -p $PID -o pid,stat,cmd
# State should be back to 'S' (sleeping)

# Clean up
kill $PID

What Are Signals?

A signal is an asynchronous notification sent to a process. Think of it like tapping someone on the shoulder -- the process was doing something else, and now it has to respond.

Signals can come from:

  • The kernel -- hardware faults (SIGSEGV), timer expired, child process died
  • Another process -- via the kill system call (despite the name, it sends any signal)
  • The terminal -- key combinations like Ctrl+C, Ctrl+Z, Ctrl+\
  • The process itself -- a program can send signals to itself

When a signal arrives, the process can:

  1. Handle it -- run a custom signal handler function
  2. Ignore it -- pretend it never happened
  3. Let the default action occur -- each signal has a defined default

Two signals are special: SIGKILL (9) and SIGSTOP (19) can NEVER be caught, handled, or ignored. The kernel enforces them directly.
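
Each of the three choices can be demonstrated with bash's trap builtin. A self-signaling sketch:

```shell
#!/bin/bash
# 1. Handle: run custom code when the signal arrives
trap 'echo "handled SIGINT"' INT
kill -INT $$          # signal ourselves
echo "still alive"

# 2. Ignore: an empty handler string discards the signal
trap '' INT
kill -INT $$
echo "ignored it"

# 3. Default: restore the default action (here: terminate)
trap - INT
```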


The Signal Table: Signals You Need to Know

+--------+----------+---------+------------------------------------------------+
| Number | Name     | Default | Purpose                                        |
+--------+----------+---------+------------------------------------------------+
|   1    | SIGHUP   | Term    | Terminal hangup; daemons: reload config        |
|   2    | SIGINT   | Term    | Interrupt from keyboard (Ctrl+C)               |
|   3    | SIGQUIT  | Core    | Quit from keyboard (Ctrl+\), produces core dump|
|   6    | SIGABRT  | Core    | Abort signal from abort() call                 |
|   9    | SIGKILL  | Term    | Force kill (CANNOT be caught or ignored)       |
|  10    | SIGUSR1  | Term    | User-defined signal 1                          |
|  11    | SIGSEGV  | Core    | Invalid memory reference (segfault)            |
|  12    | SIGUSR2  | Term    | User-defined signal 2                          |
|  13    | SIGPIPE  | Term    | Broken pipe (write to pipe with no reader)     |
|  14    | SIGALRM  | Term    | Timer alarm from alarm()                       |
|  15    | SIGTERM  | Term    | Graceful termination request                   |
|  17    | SIGCHLD  | Ignore  | Child process stopped or terminated            |
|  18    | SIGCONT  | Cont    | Resume if stopped                              |
|  19    | SIGSTOP  | Stop    | Pause process (CANNOT be caught or ignored)    |
|  20    | SIGTSTP  | Stop    | Stop from terminal (Ctrl+Z)                    |
+--------+----------+---------+------------------------------------------------+

Distro Note: Signal numbers can differ between architectures (e.g., SIGUSR1 is 10 on x86 but 16 on MIPS). Always use signal names, not numbers, in scripts and commands for portability. Write kill -TERM instead of kill -15.
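
bash's builtin kill -l translates both ways, which is handy when auditing scripts written with bare numbers (the numbers shown are for x86):

```shell
# Number to name, and name to number (bash builtin kill)
kill -l 15      # prints TERM
kill -l TERM    # prints 15

# Related: a process killed by signal N exits with status 128+N
sleep 100 &
kill -9 $!
wait $! 2>/dev/null || echo "exit status: $?"   # 137 = 128 + 9 (SIGKILL)
```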

Default Actions Explained

  • Term -- Terminate the process
  • Core -- Terminate and produce a core dump file (for debugging)
  • Stop -- Pause (suspend) the process
  • Cont -- Resume a stopped process
  • Ignore -- Do nothing

Signals in Detail

SIGTERM (15) -- The Gentleman's Termination

This is the default signal sent by kill. It is a polite request: "please shut down." Well-written programs catch SIGTERM, clean up resources (close files, release locks, flush buffers), and exit cleanly.

# These are equivalent:
kill 12345
kill -15 12345
kill -TERM 12345
kill -SIGTERM 12345

SIGKILL (9) -- The Executioner

SIGKILL terminates a process immediately. The process gets no chance to handle it, no chance to clean up. The kernel simply removes it.

kill -9 12345
kill -KILL 12345

When to use SIGKILL:

  • The process is not responding to SIGTERM
  • The process is stuck and must be removed
  • You are dealing with a compromised process

When NOT to use SIGKILL:

  • As your first attempt (always try SIGTERM first)
  • On database processes (risk of data corruption)
  • On processes holding locks (locks may become stale)

WARNING: kill -9 is the sledgehammer. It leaves temporary files behind, does not close database connections cleanly, does not release file locks, and does not flush write buffers. Use it only after SIGTERM has failed.

SIGINT (2) -- The Keyboard Interrupt

This is what Ctrl+C sends. Programs can catch it to clean up gracefully:

# Start a process
sleep 300
# Press Ctrl+C
# The process receives SIGINT and terminates

SIGQUIT (3) -- Quit with Core Dump

Ctrl+\ sends SIGQUIT. Like SIGINT, but also produces a core dump for debugging:

# Start a process
sleep 300
# Press Ctrl+\
# Output: Quit (core dumped)

SIGHUP (1) -- Hang Up / Reload

Originally meant "the terminal connection was lost" (the modem hung up). Today it has two common uses:

  1. Terminal hangup: When you close a terminal, SIGHUP is sent to all processes in that terminal
  2. Daemon reload: By convention, sending SIGHUP to a daemon tells it to re-read its configuration file without restarting

# Reload Nginx configuration
sudo kill -HUP $(cat /run/nginx.pid)
# Or equivalently:
sudo systemctl reload nginx  # This sends SIGHUP under the hood

# Reload sshd (pgrep -o selects the oldest match: the master daemon)
sudo kill -HUP $(pgrep -o -x sshd)

SIGSTOP (19) and SIGCONT (18) -- Pause and Resume

SIGSTOP pauses a process. SIGCONT resumes it. SIGSTOP cannot be caught -- this is how debuggers freeze processes.

# Pause a process
kill -STOP 12345

# Resume it
kill -CONT 12345

The terminal equivalent of SIGSTOP is SIGTSTP (20), sent by Ctrl+Z. Unlike SIGSTOP, SIGTSTP CAN be caught and handled.
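
You can see the difference for yourself: a trapped SIGTSTP runs your handler instead of stopping the process. (Do not try the same with SIGSTOP -- that one really will stop your shell.)

```shell
#!/bin/bash
# SIGTSTP can be caught -- the handler runs and the process continues
trap 'echo "caught SIGTSTP -- not stopping"' TSTP
kill -TSTP $$
echo "still running"
```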

SIGUSR1 (10) and SIGUSR2 (12) -- User Defined

These have no predefined meaning. Programs define what they do:

# Example: dd reports progress on SIGUSR1
dd if=/dev/urandom of=/tmp/testfile bs=1M count=1000 &
DD_PID=$!

# Send SIGUSR1 to get a progress report
kill -USR1 $DD_PID
# dd outputs bytes transferred so far

# Clean up
kill $DD_PID
rm -f /tmp/testfile

Other examples:

  • Nginx uses SIGUSR1 to reopen log files (log rotation)
  • Some programs use SIGUSR2 to toggle debug mode

SIGPIPE (13) -- Broken Pipe

Sent when a process writes to a pipe but the reading end has been closed:

# This triggers SIGPIPE:
yes | head -1
# 'yes' keeps writing, but 'head' exits after 1 line
# The kernel sends SIGPIPE to 'yes'
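
The shell reports death-by-signal as 128 plus the signal number, so the fate of yes is visible in bash's PIPESTATUS array:

```shell
# 'head' exits after one line; the kernel kills 'yes' with SIGPIPE (13)
yes | head -1 > /dev/null
echo "${PIPESTATUS[0]}"   # 141 = 128 + 13
```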

SIGCHLD (17) -- Child Status Changed

Sent to a parent when a child process stops, continues, or terminates. The parent should call wait() to collect the child's exit status. If it doesn't, the child becomes a zombie.
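
In shell scripts, wait is what performs this collection -- it reaps the child and hands back its exit status:

```shell
#!/bin/bash
# Start a failing child, then reap it with wait
false &
pid=$!
s=0
wait "$pid" || s=$?   # collects the exit status; no zombie is left behind
echo "child $pid exited with status $s"
```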

Think About It: When you run systemctl reload nginx, Nginx re-reads its config and applies it without dropping connections. How is this possible? (Hint: Nginx's master process catches SIGHUP, re-reads the config, spawns new worker processes with the new config, and gracefully shuts down old workers.)


Sending Signals: kill, killall, pkill

kill -- By PID

# Send SIGTERM (default)
kill 12345

# Send a specific signal
kill -HUP 12345
kill -SIGUSR1 12345

# Send to multiple PIDs
kill 12345 12346 12347

# Send signal 0 -- checks if process exists (sends nothing)
kill -0 12345
echo $?  # 0 = process exists and you can signal it; nonzero = gone or no permission

killall -- By Name

# Kill all processes named "python3"
killall python3

# Send SIGHUP to all nginx processes
killall -HUP nginx

# Interactive mode -- confirm before each kill
killall -i python3

# Only kill processes owned by a specific user
killall -u alice python3

pkill -- By Pattern

# Kill processes matching a pattern
pkill -f "python train"

# Send SIGHUP to processes matching a pattern
pkill -HUP -f "gunicorn"

# Kill processes owned by a user
pkill -u testuser

# Kill the oldest matching process
pkill -o -f "worker"

# Kill the newest matching process
pkill -n -f "worker"

WARNING: Be very careful with killall and pkill. killall python3 kills ALL python3 processes, not just yours (if you are root). Always verify what will be matched first with pgrep:

# See what pkill would match BEFORE actually killing
pgrep -a -f "python train"

Trapping Signals in Bash

The trap builtin lets your scripts catch signals and run custom code in response. This is how you write scripts that clean up after themselves:

Basic Trap Syntax

trap 'commands' SIGNAL_LIST

Example: Cleanup on Exit

#!/bin/bash
TMPFILE=$(mktemp)
echo "Working in $TMPFILE"

# Clean up on exit, interrupt, or termination
trap 'echo "Cleaning up..."; rm -f "$TMPFILE"; exit' EXIT INT TERM

# Do some work
echo "data" > "$TMPFILE"
sleep 30

echo "Done"

Example: Graceful Shutdown Script

#!/bin/bash

RUNNING=true

cleanup() {
    echo "Received shutdown signal. Cleaning up..."
    RUNNING=false
    # Close database connections, flush caches, etc.
    echo "Cleanup complete. Exiting."
    exit 0
}

trap cleanup SIGTERM SIGINT

echo "Service started. PID: $$"

# Main loop
while $RUNNING; do
    # Do work here
    echo "Working... $(date)"
    sleep 5
done

Test it:

# Terminal 1:
bash graceful_shutdown.sh

# Terminal 2:
kill $(pgrep -f graceful_shutdown)
# Watch Terminal 1 -- it should print cleanup messages

Example: Ignore a Signal

#!/bin/bash
# Ignore SIGINT (Ctrl+C) -- use with caution!
trap '' SIGINT

echo "You cannot Ctrl+C me! (Use Ctrl+\ or kill from another terminal)"
sleep 60

Example: Trap on SIGHUP for Config Reload

#!/bin/bash
CONFIG_FILE="/etc/myapp/config.conf"

load_config() {
    echo "Loading configuration from $CONFIG_FILE..."
    # In a real script, you would source the config file here
    # source "$CONFIG_FILE"
    echo "Configuration reloaded at $(date)"
}

trap load_config SIGHUP

load_config  # Initial load

echo "Running as PID $$. Send SIGHUP to reload config."

while true; do
    # Main application loop
    sleep 10
done

Test it:

# Terminal 1:
bash config_daemon.sh

# Terminal 2:
kill -HUP $(pgrep -f config_daemon)
# Watch Terminal 1 -- it should print "Configuration reloaded"

Trap Gotchas

# List all active traps
trap -p

# Reset a trap to default behavior
trap - SIGINT

# The EXIT trap fires on ANY exit (normal, error, signal)
trap 'echo "Goodbye"' EXIT

# Traps in subshells: a subshell resets caught traps to their
# defaults (ignored signals stay ignored), and trap changes made
# inside a subshell never affect the parent
(trap 'echo "sub"' SIGINT)
# The parent's SIGINT trap is unchanged

Think About It: If you trap SIGTERM in a script but your script calls kill -9 $$, will the trap run? Why or why not?


How Daemons Handle SIGHUP: A Deep Look

The SIGHUP reload pattern is one of the most important signal conventions in Linux. Here is how it typically works with a real service like Nginx:

                    SIGHUP
                      |
                      v
  +-----------------------------------+
  |  Nginx Master Process (PID 100)   |
  |  1. Catches SIGHUP                |
  |  2. Re-reads nginx.conf           |
  |  3. Validates new configuration   |
  |  4. If valid:                     |
  |     a. Spawns new worker processes|
  |     b. Signals old workers to     |
  |        finish current requests    |
  |     c. Old workers exit after     |
  |        completing in-flight work  |
  |  5. If invalid:                   |
  |     a. Logs error                 |
  |     b. Keeps running with old     |
  |        configuration              |
  +-----------------------------------+
         |                    |
         v                    v
  +-------------+     +-------------+
  | Old Worker  |     | New Worker  |
  | (finishing  |     | (new config)|
  |  requests)  |     |             |
  +-------------+     +-------------+

This is why systemctl reload nginx does not cause downtime -- existing connections are served to completion by old workers while new connections go to new workers with the updated config.
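
Here is a toy version of the master/worker reload dance in bash -- purely a sketch (a real daemon tracks many workers and re-reads a config file), with sleep standing in for a worker process:

```shell
#!/bin/bash
# Toy "master": on SIGHUP, start a new worker, then retire the old one

start_worker() {
    sleep 1000 &                 # stand-in for a real worker
    WORKER=$!
    echo "worker $WORKER started"
}

reload() {
    local old=$WORKER
    start_worker                 # the new worker comes up first...
    kill -TERM "$old"            # ...then the old one is retired
    echo "worker $old retired"
}

trap reload HUP

start_worker
kill -HUP $$                     # simulate: ask ourselves to reload
sleep 1                          # let the trap run
kill -TERM "$WORKER"             # shut down the surviving worker
```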

Services and Their Signal Conventions

+------------+------------------+---------------+------------------+
| Service    | SIGHUP           | SIGUSR1       | SIGUSR2          |
+------------+------------------+---------------+------------------+
| Nginx      | Reload config    | Reopen logs   | Upgrade binary   |
| Apache     | Graceful restart | Reopen logs   | Graceful restart |
| PostgreSQL | Reload config    | --            | --               |
| sshd       | Reload config    | --            | --               |
| rsyslog    | Reload config    | --            | --               |
+------------+------------------+---------------+------------------+

# The systemctl commands that use signals under the hood:
sudo systemctl reload nginx     # Sends SIGHUP
sudo systemctl stop nginx       # Sends SIGTERM, then SIGKILL after timeout
sudo systemctl kill nginx       # Sends SIGTERM by default
sudo systemctl kill -s HUP nginx  # Sends SIGHUP explicitly

Signal Handling Best Practices

For Scripts

  1. Always trap EXIT for cleanup: Temporary files, lock files, PID files -- clean them up.

trap 'rm -f /tmp/myapp.lock' EXIT

  2. Use named signals, not numbers: kill -TERM is portable; kill -15 may not be.

  3. Do not trap SIGKILL or SIGSTOP: You cannot. If you try, the trap is silently ignored.

  4. Keep trap handlers short: A signal can arrive at any time. Your handler should do minimal work -- set a flag, then let the main loop check the flag.

# Good: set a flag
SHUTDOWN=false
trap 'SHUTDOWN=true' SIGTERM

while ! $SHUTDOWN; do
    do_work
done
cleanup

  5. Propagate signals to child processes in scripts:

trap 'kill 0' SIGTERM SIGINT
# 'kill 0' sends the signal to the entire process group

For System Administration

  1. Always try SIGTERM before SIGKILL: Give processes time to clean up.

  2. Use systemctl for managed services: Let systemd handle signal delivery.

  3. Use SIGHUP for config reloads: Check if the service supports it first (man service_name).

  4. Use kill -0 to check if a process is alive:

if kill -0 "$PID" 2>/dev/null; then
    echo "Process $PID is still running"
else
    echo "Process $PID is gone"
fi

Debug This: A Script That Won't Die

You have a script running that ignores Ctrl+C:

#!/bin/bash
trap '' SIGINT SIGTERM
echo "I am invincible! PID: $$"
while true; do sleep 1; done

How do you stop it?

Diagnosis:

# Ctrl+C won't work (SIGINT is trapped and ignored)
# kill PID won't work (SIGTERM is trapped and ignored)
# Note: pgrep -f matches the command line, not the script's output,
# so assume the script was saved as invincible.sh

# Option 1: SIGKILL -- cannot be trapped
kill -9 $(pgrep -f invincible.sh)

# Option 2: SIGQUIT -- they forgot to trap it
kill -QUIT $(pgrep -f invincible.sh)

# Option 3: SIGSTOP -- pause it, then SIGKILL
kill -STOP $(pgrep -f invincible.sh)
kill -9 $(pgrep -f invincible.sh)

The lesson: trapping SIGINT and SIGTERM can be useful for cleanup, but ignoring them entirely (empty trap handler) is an anti-pattern. Always do something useful in the handler, and always exit afterward.
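
A better-behaved version catches the signal, does its cleanup, and still exits. The sketch below signals itself so you can watch the flow in one run:

```shell
#!/bin/bash
# Catch the signal, clean up, THEN exit -- never just swallow it
trap 'echo "cleaning up"; exit 0' INT TERM

echo "running as PID $$"
kill -TERM $$      # simulate an external kill
sleep 10           # never reached: the trap fires and exits first
echo "never printed"
```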


Hands-On: Signal Playground

Create a script that demonstrates signal handling:

#!/bin/bash
# signal_playground.sh

echo "Signal Playground - PID: $$"
echo "Send me signals and watch what happens!"
echo ""

trap 'echo "[$(date +%T)] Caught SIGHUP -- reloading config..."' HUP
trap 'echo "[$(date +%T)] Caught SIGUSR1 -- toggling debug mode"' USR1
trap 'echo "[$(date +%T)] Caught SIGUSR2 -- dumping status"' USR2
trap 'echo "[$(date +%T)] Caught SIGINT -- Ctrl+C pressed"; exit 0' INT
trap 'echo "[$(date +%T)] Caught SIGTERM -- shutting down gracefully"; exit 0' TERM
trap 'echo "[$(date +%T)] EXIT trap -- final cleanup done"' EXIT

echo "Waiting for signals... (try these in another terminal)"
echo "  kill -HUP $$"
echo "  kill -USR1 $$"
echo "  kill -USR2 $$"
echo "  kill -INT $$     (or press Ctrl+C)"
echo "  kill -TERM $$"
echo ""

while true; do
    sleep 1
done

Run it and experiment from another terminal:

# Terminal 1:
bash signal_playground.sh

# Terminal 2:
PID=<the PID shown>
kill -HUP $PID
kill -USR1 $PID
kill -USR2 $PID
kill -TERM $PID

Signals and Process Groups

When you press Ctrl+C, SIGINT is sent not just to one process but to the entire foreground process group. This is why a pipeline like cat file | grep pattern | sort is killed entirely by one Ctrl+C.

# See process group IDs
ps -eo pid,pgid,sid,cmd | head -20

# Send a signal to an entire process group
kill -TERM -- -12345   # Note the negative PID -- means "process group 12345"
                       # ('--' stops option parsing so -12345 is read as a PID)

This is also how kill 0 works in a trap handler -- it sends the signal to every process in the current process group.
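
Putting this together: give a pipeline its own session (and therefore its own process group) with setsid, then take down every member with one negative-PID kill. A sketch; setsid ships with util-linux:

```shell
# Start a pipeline in its own session/process group
setsid bash -c 'sleep 333 | sleep 333' &
sleep 1   # let the new session start

# Find the group ID, then signal the whole group at once
PGID=$(ps -o pgid= -p "$(pgrep -f 'sleep 333' | head -1)" | tr -d ' ')
kill -TERM -- "-$PGID"    # leading minus = the entire group

sleep 1
pgrep -f 'sleep 333' || echo "all group members terminated"
```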


What Just Happened?

+------------------------------------------------------------------+
|  Chapter 11 Recap: Signals Deep Dive                             |
|------------------------------------------------------------------|
|                                                                  |
|  - Signals are asynchronous notifications to processes.          |
|  - SIGTERM (15): graceful termination request (default kill).    |
|  - SIGKILL (9): force kill, CANNOT be caught -- last resort.     |
|  - SIGINT (2): keyboard interrupt (Ctrl+C).                      |
|  - SIGHUP (1): hangup / reload config for daemons.               |
|  - SIGSTOP/SIGCONT: pause and resume (SIGSTOP can't be caught).  |
|  - SIGUSR1/SIGUSR2: user-defined, application-specific.          |
|  - trap in bash lets scripts handle signals.                     |
|  - Always trap EXIT for cleanup of temp files and locks.         |
|  - Daemons use SIGHUP to reload config without downtime.         |
|  - Use signal names (SIGTERM) not numbers (15) for portability.  |
|  - Try SIGTERM first; only SIGKILL as last resort.               |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Signal Identification

Run kill -l and identify the signal number for: SIGHUP, SIGINT, SIGTERM, SIGKILL, SIGUSR1, SIGCHLD, SIGSTOP, SIGCONT. Then write the keyboard shortcut that sends SIGINT, SIGQUIT, and SIGTSTP.

Exercise 2: Trap Practice

Write a script that creates three temporary files on startup and removes them all when it receives SIGINT or SIGTERM, or when it exits normally. Test it by killing it with different signals and verifying the temp files are gone.

Exercise 3: Process Group Experiment

Start a pipeline: sleep 100 | sleep 200 | sleep 300 &. Find the process group ID for all three processes (use ps -eo pid,pgid,cmd | grep sleep). Send SIGTERM to the process group using kill -TERM -PGID. Verify all three are gone.

Exercise 4: SIGHUP Reload

Write a simple script that reads a "config file" (just a text file with a key=value pair) on startup and re-reads it whenever it receives SIGHUP. Test it by changing the config file and sending SIGHUP.

Bonus Challenge

Write a bash script that acts as a simple process monitor. It takes a command as arguments, runs it, and if the command dies unexpectedly (exit code not 0), it restarts it automatically. Use trap to handle SIGCHLD or simply loop with wait. The monitor itself should handle SIGTERM gracefully by forwarding it to the child process and then exiting.

Inter-Process Communication

Why This Matters

You type cat access.log | grep "404" | sort | uniq -c | sort -rn | head -10 and instantly get the top 10 most common 404 URLs from a log file. Five separate programs just cooperated seamlessly to produce that result. How?

Or consider this: your web browser talks to a local caching proxy. Your application connects to a PostgreSQL database running on the same machine. Your container runtime communicates with its daemon. None of these use the network -- they all use Inter-Process Communication (IPC) mechanisms built into the kernel.

Linux provides multiple IPC mechanisms, each suited to different scenarios. This chapter covers the ones you will encounter daily -- pipes, named pipes, redirections, process substitution -- and introduces the ones you need to know about for deeper work: shared memory, Unix domain sockets, and message queues.

Try This Right Now

# A pipeline: three processes communicating through pipes
echo "hello world" | tr 'a-z' 'A-Z' | rev
# Output: DLROW OLLEH

# How many processes were involved?
# Three: echo, tr, rev -- all connected by pipes

# See a pipe in action with /proc
sleep 100 | sleep 200 &
ls -l /proc/$(pgrep -nf "sleep 100")/fd/
# fd/1 (stdout) will be a pipe
ls -l /proc/$(pgrep -nf "sleep 200")/fd/
# fd/0 (stdin) will be a pipe

# Clean up
kill %1 2>/dev/null

Pipes: The Unix Superpower

The pipe (|) is the single most important IPC mechanism in Unix. It connects the standard output of one process to the standard input of the next:

+-----------+  stdout     stdin  +-----------+  stdout     stdin  +-----------+
| Process A |------- pipe ------>| Process B |------- pipe ------>| Process C |
+-----------+                    +-----------+                    +-----------+

How Pipes Work Internally

When the shell sees cmd1 | cmd2, it:

  1. Creates a pipe (a small kernel buffer, typically 64KB on Linux)
  2. Forks two child processes
  3. Connects cmd1's stdout (fd 1) to the write end of the pipe
  4. Connects cmd2's stdin (fd 0) to the read end of the pipe
  5. cmd1 writes data into the pipe; cmd2 reads data from the pipe

          Kernel Pipe Buffer (64KB)
         +------------------------+
cmd1 --> | data flows this way -> | --> cmd2
  fd 1   +------------------------+   fd 0
 (write)                             (read)

Key characteristics:

  • Pipes are unidirectional -- data flows one way only
  • Pipes are anonymous -- they exist only while the processes are running
  • If the pipe buffer is full, the writer blocks until the reader consumes data
  • If the reader exits, the writer gets SIGPIPE
  • Pipes connect processes that share a common ancestor (usually the shell)

Pipeline Examples

# Count lines in a file
cat /etc/passwd | wc -l
# Better: wc -l < /etc/passwd (no useless 'cat')

# Find the 10 largest files in /var/log
du -sh /var/log/* 2>/dev/null | sort -rh | head -10

# Show unique shells used on the system
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn

# Monitor a log file and filter for errors
tail -f /var/log/syslog | grep --line-buffered "error"

# Count processes per user
ps aux | awk '{print $1}' | sort | uniq -c | sort -rn

# Generate a random password (redirect instead of a useless 'cat')
tr -dc 'A-Za-z0-9!@#$' < /dev/urandom | head -c 20; echo

Pipeline Exit Status

By default, the shell reports the exit status of the LAST command in a pipeline:

false | true
echo $?
# Output: 0 (true's exit code, not false's)

To get the exit status of every command in the pipeline:

# Enable pipefail -- the pipeline fails if ANY command fails
set -o pipefail

false | true
echo $?
# Output: 1 (false's exit code)

# Or check individual statuses with PIPESTATUS (bash-specific)
cat nonexistent_file 2>/dev/null | sort | head
echo "${PIPESTATUS[@]}"
# Output: 1 0 0

Think About It: Why does grep pattern file | wc -l work but might give the wrong answer if grep fails? How does set -o pipefail help in scripts?


Redirection: Controlling File Descriptors

Every process starts with three open file descriptors:

+-----+--------+---------------------+
| FD  | Name   | Default             |
+-----+--------+---------------------+
|  0  | stdin  | Keyboard / terminal |
|  1  | stdout | Terminal screen     |
|  2  | stderr | Terminal screen     |
+-----+--------+---------------------+

Redirection lets you change where these point.

Output Redirection

# Redirect stdout to a file (overwrites)
echo "hello" > output.txt

# Redirect stdout to a file (appends)
echo "world" >> output.txt

# Redirect stderr to a file
ls /nonexistent 2> errors.txt

# Redirect stderr to the same place as stdout
ls /nonexistent /tmp 2>&1

# Redirect both stdout and stderr to a file
ls /nonexistent /tmp > all_output.txt 2>&1
# Or the modern shorthand (bash 4+):
ls /nonexistent /tmp &> all_output.txt

# Append both stdout and stderr
command &>> logfile.txt

The Order Matters

This is a classic gotcha:

# WRONG: stderr goes to terminal, not the file
ls /nonexistent /tmp 2>&1 > output.txt
# Why? 2>&1 duplicates fd 2 to where fd 1 currently points (terminal)
# Then > output.txt redirects fd 1 to the file
# So fd 2 still points to the terminal!

# RIGHT: redirect stdout first, then stderr to the same place
ls /nonexistent /tmp > output.txt 2>&1
# > output.txt redirects fd 1 to the file
# 2>&1 duplicates fd 2 to where fd 1 now points (the file)
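
You can prove this to yourself in an empty directory:

```shell
# Wrong order: wrong.txt ends up empty (the error went to the terminal)
ls /nonexistent 2>&1 > wrong.txt

# Right order: right.txt captures the error message
ls /nonexistent > right.txt 2>&1

wc -c wrong.txt right.txt
# wrong.txt: 0 bytes; right.txt: the full error text
```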

Input Redirection

# Read from a file instead of the keyboard
sort < unsorted_list.txt

# Here document -- inline multi-line input
cat << 'EOF'
This is line one.
This is line two.
Variables like $HOME are NOT expanded (single-quoted delimiter).
EOF

# Here document with variable expansion
cat << EOF
Your home directory is $HOME
Your shell is $SHELL
EOF

# Here string -- single-line input
grep "pattern" <<< "search in this string"

/dev/null -- The Black Hole

/dev/null is a special file that discards everything written to it and produces nothing when read:

# Discard stdout (keep only errors)
find / -name "*.conf" > /dev/null

# Discard stderr (keep only normal output)
find / -name "*.conf" 2>/dev/null

# Discard everything
command > /dev/null 2>&1

# Check if a command succeeds without seeing output
if grep -q "pattern" file.txt 2>/dev/null; then
    echo "Found it"
fi

Redirecting to Multiple Places with tee

tee reads stdin and writes to both stdout AND a file:

# See output AND save it to a file
make 2>&1 | tee build.log

# Append instead of overwrite
df -h | tee -a disk_report.txt

# Write to multiple files
echo "log entry" | tee file1.log file2.log file3.log

                   +-----------> Terminal (stdout)
                   |
stdin --> [ tee ] -+
                   |
                   +-----------> file.log

Named Pipes (FIFOs)

Regular pipes are anonymous -- they exist only within a pipeline. Named pipes (FIFOs) are visible in the filesystem and can connect unrelated processes:

# Create a named pipe
mkfifo /tmp/mypipe

# Check it
ls -l /tmp/mypipe
# prw-r--r-- 1 alice alice 0 ... /tmp/mypipe
# The 'p' at the start means "pipe"

Using Named Pipes

Named pipes are blocking: a writer blocks until a reader opens the pipe, and vice versa. You need two terminals (or background processes):

# Terminal 1: Write to the pipe (this will block until someone reads)
echo "Hello from Terminal 1" > /tmp/mypipe

# Terminal 2: Read from the pipe
cat < /tmp/mypipe
# Output: Hello from Terminal 1
# Both commands complete
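
You can also demonstrate this in a single terminal by backgrounding the reader first (the paths are illustrative):

```shell
mkfifo /tmp/demo.fifo

# Reader in the background -- blocks until a writer opens the other end
cat /tmp/demo.fifo > /tmp/received.txt &

# Writer -- unblocks both sides, then both complete
echo "hello over the fifo" > /tmp/demo.fifo
wait

cat /tmp/received.txt
# Output: hello over the fifo

rm /tmp/demo.fifo /tmp/received.txt
```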

Practical Use Case: Log Processing Pipeline

# Create a named pipe for log processing
mkfifo /tmp/log_pipe

# Terminal 1: Tail a log into the pipe
tail -f /var/log/syslog > /tmp/log_pipe 2>/dev/null &

# Terminal 2: Process logs from the pipe
cat /tmp/log_pipe | grep --line-buffered "error" | while read line; do
    echo "[ALERT] $line"
done

# Clean up
rm /tmp/log_pipe

Practical Use Case: Parallel Processing

# Create a named pipe
mkfifo /tmp/pipe_a

# Split processing: checksum and compress the same data simultaneously
tee /tmp/pipe_a < largefile.bin | md5sum > checksum.txt &
gzip < /tmp/pipe_a > largefile.bin.gz &
wait

# Clean up
rm /tmp/pipe_a

Think About It: What happens if you open a named pipe for writing but no process ever opens it for reading? How is this different from writing to a regular file?


Process Substitution

Process substitution lets you use a process's output (or input) as if it were a file. This is a bash feature (not POSIX sh).

Output Process Substitution: <(command)

The <(command) syntax runs command and makes its output available as a file path:

# Compare the output of two commands
diff <(ls /dir1) <(ls /dir2)

# Compare sorted lists without creating temporary files
diff <(sort file1.txt) <(sort file2.txt)

# What is the "file"?
echo <(echo hello)
# Output: /dev/fd/63 (or similar -- it's a file descriptor)

This is incredibly powerful because many commands expect file arguments, not piped input:

# paste needs two files -- use process substitution
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd)

# Feed two data streams to a command that expects files
join <(sort file1.txt) <(sort file2.txt)

# Load data from a command into a while loop without subshell issues
while read -r user shell; do
    echo "User $user uses $shell"
done < <(awk -F: '{print $1, $7}' /etc/passwd)

Input Process Substitution: >(command)

The >(command) syntax creates a file path that feeds into a command's stdin:

# Write to two destinations simultaneously
echo "log entry" | tee >(gzip > compressed.gz) >(wc -c > byte_count.txt)

# Send output to both a file and a log processor
some_command > >(tee output.log) 2> >(tee error.log >&2)

Why Not Just Use Pipes?

Process substitution solves problems that pipes cannot:

# Problem: compare output of two commands
# With pipes -- impossible (pipes are linear, not branching)
# With process substitution:
diff <(find /dir1 -type f | sort) <(find /dir2 -type f | sort)

# Problem: the while-read-pipe subshell issue
# This LOSES the variable outside the loop:
count=0
cat file.txt | while read line; do
    count=$((count + 1))
done
echo "$count"  # Prints 0! The while loop ran in a subshell

# Process substitution avoids the subshell:
count=0
while read line; do
    count=$((count + 1))
done < <(cat file.txt)
echo "$count"  # Prints the correct count

Shared Memory Overview

Shared memory is the fastest IPC mechanism. Two or more processes map the same region of physical memory into their address spaces:

  Process A                     Process B
  +----------+                 +----------+
  | Address  |                 | Address  |
  | Space    |                 | Space    |
  |          |                 |          |
  | Shared   |---+         +---| Shared   |
  | Region   |   |         |   | Region   |
  +----------+   |         |   +----------+
                  v         v
            +------------------+
            | Physical Memory  |
            | (shared segment) |
            +------------------+

There is no copying of data -- both processes read and write to the same memory. This makes it extremely fast but also means you need synchronization (mutexes, semaphores) to prevent data corruption.

POSIX Shared Memory

# List existing shared memory segments
ls /dev/shm/

# See System V shared memory segments
ipcs -m

# See all IPC resources (shared memory, semaphores, message queues)
ipcs -a

Shared memory is commonly used by:

  • Database systems (PostgreSQL shared buffers)
  • Web servers (shared worker state)
  • Audio/video processing (passing frames between processes)
  • tmpfs mounts (/dev/shm is a tmpfs)

# /dev/shm is a tmpfs mount -- a RAM-based filesystem
df -h /dev/shm
mount | grep shm

# You can use it for fast temporary storage
echo "fast data" > /dev/shm/temp_data
# But remember: it vanishes on reboot!

Distro Note: The default size of /dev/shm is 50% of RAM on Debian/Ubuntu and RHEL/Fedora alike. You can resize it: sudo mount -o remount,size=2G /dev/shm.


Unix Domain Sockets

Unix domain sockets are like network sockets but for local communication only. They are faster than TCP/IP sockets (no network stack overhead) and support both stream and datagram modes.

# Find Unix domain sockets on your system
ss -xl  # or: ss -x

# Or look in common locations
ls -l /var/run/*.sock 2>/dev/null
ls -l /run/*.sock 2>/dev/null
ls -l /tmp/*.sock 2>/dev/null

Common examples you will encounter:

# Docker daemon socket
ls -l /var/run/docker.sock

# MySQL/MariaDB socket
ls -l /var/run/mysqld/mysqld.sock

# PostgreSQL socket
ls -l /var/run/postgresql/.s.PGSQL.5432

# systemd-journald socket
ls -l /run/systemd/journal/socket

# D-Bus system socket
ls -l /run/dbus/system_bus_socket

How They Differ from Pipes

+---------------------+--------------------+----------------------+
| Feature             | Pipes              | Unix Sockets         |
+---------------------+--------------------+----------------------+
| Direction           | Unidirectional     | Bidirectional        |
| Connections         | One-to-one         | Many-to-one          |
| Related processes   | Required (pipes),  | Not required         |
| needed?             | optional (FIFOs)   |                      |
| File on disk        | No (pipes),        | Yes (socket file)    |
|                     | yes (FIFOs)        |                      |
| Protocol support    | Byte stream only   | Stream or datagram   |
| Permissions         | Via fd inheritance | Via file permissions |
+---------------------+--------------------+----------------------+

Practical Example: Communicating with Docker via Socket

# Docker CLI talks to dockerd via Unix socket
# You can do the same with curl:
sudo curl --unix-socket /var/run/docker.sock http://localhost/version
# Returns JSON with Docker version info

# List containers via the socket API
sudo curl --unix-socket /var/run/docker.sock http://localhost/containers/json

Creating a Simple Unix Domain Socket (with socat)

# Install socat if not present
sudo apt install socat    # Debian/Ubuntu
sudo dnf install socat    # RHEL/Fedora

# Terminal 1: Create a socket server
socat UNIX-LISTEN:/tmp/test.sock,fork EXEC:/bin/cat

# Terminal 2: Connect and send data
echo "Hello, socket!" | socat - UNIX-CONNECT:/tmp/test.sock

# Clean up
rm -f /tmp/test.sock

Message Queues Overview

Message queues allow processes to exchange discrete messages through a kernel-maintained queue. Unlike pipes (byte streams), messages maintain their boundaries.

  Process A                                Process B
  +----------+    +----------------+    +----------+
  |  Send    |    | Kernel Message |    | Receive  |
  |  msg 1   |--->|     Queue      |--->|  msg 1   |
  |  msg 2   |--->|  [1] [2] [3]   |--->|  msg 2   |
  |  msg 3   |--->|                |--->|  msg 3   |
  +----------+    +----------------+    +----------+

Key characteristics:

  • Messages have types and priorities
  • Messages maintain boundaries (unlike pipes, which are byte streams)
  • The queue persists until explicitly removed (survives process death)
  • Kernel enforces queue size limits

Viewing Message Queues

# List POSIX message queues
ls /dev/mqueue/ 2>/dev/null

# List System V message queues
ipcs -q

# Show all IPC objects with details
ipcs -a

Linux supports both System V IPC (older: shmget, msgget, semget) and POSIX IPC (newer: shm_open, mq_open, sem_open). New code should prefer POSIX. Use ipcs / ipcrm to manage System V resources, and browse /dev/shm / /dev/mqueue for POSIX resources.
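
For a quick hands-on look at System V shared memory, util-linux ships ipcmk, which creates a segment and prints its id (the id on your machine will differ):

```shell
# Create a 1 MiB System V shared memory segment and capture its id
id=$(ipcmk -M 1048576 | awk '{print $NF}')

# It now appears in the listing
ipcs -m | grep -w "$id"

# Remove it when done
ipcrm -m "$id"
```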


Debug This: Mystery Broken Pipe

A user reports that their script fails with "Broken pipe" errors:

#!/bin/bash
generate_report | head -5
echo "Report generated successfully"

The script works but prints write error: Broken pipe to stderr.

Diagnosis:

generate_report produces many lines of output. head -5 reads only 5 lines and then exits, closing the read end of the pipe. When generate_report tries to write the next line, the kernel sends it SIGPIPE and it dies.

Solutions:

# Option 1: Suppress the error message
generate_report 2>/dev/null | head -5

# Option 2: Drain the remaining output so the writer never sees a closed pipe
# (generate_report runs to completion -- slower, but no SIGPIPE)
generate_report | { head -5; cat > /dev/null; }

# Option 3: Trap SIGPIPE (if you control the script doing the writing)
trap '' SIGPIPE

# Option 4: In many cases, this is harmless -- the exit code
# of the pipeline is 0 (head succeeded), and the SIGPIPE
# is just the kernel cleaning up efficiently

Think About It: Is a broken pipe actually an error? Or is it the kernel's efficient way of saying "nobody is listening, so stop wasting effort"?


Hands-On: Building an IPC Pipeline

Let us build a practical log processing system using different IPC mechanisms:

# Step 1: Create a named pipe for log ingestion
mkfifo /tmp/log_pipe

# Step 2: Write a log producer (simulates an application logging)
for i in $(seq 1 100); do
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$(shuf -e INFO WARN ERROR -n1)] Message $i"
    sleep 0.1
done > /tmp/log_pipe &
PRODUCER_PID=$!

# Step 3: Split the logs by severity as they stream through
cat /tmp/log_pipe | tee \
    >(grep "ERROR" >> /tmp/errors.log) \
    >(grep "WARN" >> /tmp/warnings.log) \
    > /tmp/all.log

# Step 4: Wait for the producer to finish
wait $PRODUCER_PID 2>/dev/null

# Step 5: Check results
echo "=== Error count ==="
wc -l < /tmp/errors.log
echo "=== Warning count ==="
wc -l < /tmp/warnings.log
echo "=== Total messages ==="
wc -l < /tmp/all.log

# Step 6: Compare error and warning files
diff <(cut -d']' -f2 /tmp/errors.log | sort) \
     <(cut -d']' -f2 /tmp/warnings.log | sort) | head -20

# Clean up
rm -f /tmp/log_pipe /tmp/errors.log /tmp/warnings.log /tmp/all.log

IPC Mechanism Selection Guide

+-------------------+-------------+----------+----------+-----------+
| Mechanism         | Direction   | Speed    | Related  | Best For  |
|                   |             |          | Procs?   |           |
+-------------------+-------------+----------+----------+-----------+
| Pipe (|)          | One-way     | Fast     | Yes      | Pipelines |
| Named Pipe (FIFO) | One-way     | Fast     | No       | Producer- |
|                   |             |          |          | consumer  |
| Unix Socket       | Two-way     | Fast     | No       | Client-   |
|                   |             |          |          | server    |
| Shared Memory     | Both read/  | Fastest  | No       | Large data|
|                   | write       |          |          | sharing   |
| Message Queue     | One-way     | Moderate | No       | Discrete  |
|                   | per queue   |          |          | messages  |
| Signals           | One-way     | Fast     | No       | Simple    |
|                   | (notify)    |          |          | events    |
| Files             | Both        | Slow     | No       | Persistent|
|                   |             | (disk)   |          | data      |
+-------------------+-------------+----------+----------+-----------+

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 12 Recap: Inter-Process Communication                   |
|------------------------------------------------------------------|
|                                                                  |
|  - Pipes (|) connect stdout of one process to stdin of the next. |
|  - Pipes are anonymous, unidirectional, and kernel-buffered.     |
|  - Redirections (>, >>, 2>&1, <) control file descriptors.       |
|  - /dev/null discards output; tee duplicates it.                 |
|  - Named pipes (mkfifo) live in the filesystem and connect       |
|    unrelated processes.                                          |
|  - Process substitution <() and >() expose commands as files.    |
|  - Shared memory is the fastest IPC (no data copying).           |
|  - Unix domain sockets provide bidirectional local IPC.          |
|  - Message queues preserve message boundaries.                   |
|  - Use pipefail in scripts to catch pipeline errors.             |
|  - Order matters in redirections: > file 2>&1 is correct.        |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Redirection Mastery

Write a command that runs find / -name "*.conf" and saves normal output to found.txt, errors to errors.txt, and also displays both on the terminal simultaneously. (Hint: you need tee and redirection.)

Exercise 2: Named Pipe Chat

Create a simple two-way chat system using two named pipes. Two terminals should be able to send messages to each other. Each terminal reads from one pipe and writes to the other.

Exercise 3: Process Substitution Power

Without creating any temporary files, find all files that exist in /etc but not in /usr/etc (or any two directories). Use diff with process substitution and find.

Exercise 4: Pipeline Analysis

Run cat /etc/passwd | cut -d: -f7 | sort | uniq -c | sort -rn. Then rewrite it without cat (using input redirection). Then use ${PIPESTATUS[@]} to verify all pipeline stages succeeded.

Bonus Challenge

Write a script that uses a named pipe to implement a simple job queue. One terminal acts as the "dispatcher" writing commands to the pipe, and another terminal acts as the "worker" reading and executing them one at a time. Include error handling and logging of each job's exit status.

The Kernel Up Close

Why This Matters

Every command you have run so far in this book -- every file you opened, every process you started, every network packet you sent -- went through the Linux kernel. The kernel is the one piece of software that sits between your programs and the hardware. It manages memory, schedules processes, handles disk I/O, drives network interfaces, and enforces security.

Yet most Linux users never look at the kernel directly. They interact with it through commands, system calls, and virtual filesystems without realizing it. This chapter pulls back the curtain. You will learn what the kernel actually does, how to inspect it, how to load and unload kernel modules, how to tune kernel behavior at runtime, and how to read the kernel's own log messages.

This knowledge is essential for performance tuning, hardware troubleshooting, security hardening, and understanding why things work the way they do.

Try This Right Now

# What kernel are you running?
uname -a

# Kernel version only
uname -r

# How long has this kernel been running?
uptime

# See kernel log messages (most recent)
dmesg | tail -20

# How many kernel modules are loaded?
lsmod | wc -l

# Peek at the kernel's view of your CPU
cat /proc/cpuinfo | head -20

# How much memory does the kernel see?
cat /proc/meminfo | head -10

Kernel vs. Userspace

The most fundamental distinction in Linux is between kernel space and user space.

+--------------------------------------------------+
|                User Space                        |
|                                                  |
|   +--------+  +--------+  +--------+  +------+  |
|   | bash   |  | nginx  |  | python |  | top  |  |
|   +--------+  +--------+  +--------+  +------+  |
|                                                  |
|   Applications, libraries (glibc), utilities     |
|                                                  |
+=======================+=========================+
|        System Call Interface (syscall)           |
+=======================+=========================+
|                                                  |
|                Kernel Space                      |
|                                                  |
|   +-----------+  +---------+  +----------+       |
|   | Process   |  | Memory  |  | Network  |       |
|   | Scheduler |  | Manager |  | Stack    |       |
|   +-----------+  +---------+  +----------+       |
|                                                  |
|   +-----------+  +---------+  +----------+       |
|   | VFS       |  | Device  |  | Security |       |
|   |           |  | Drivers |  | (LSM)    |       |
|   +-----------+  +---------+  +----------+       |
|                                                  |
+=======================+=========================+
|              Hardware                            |
|   CPU, RAM, Disk, Network, USB, GPU, ...         |
+--------------------------------------------------+

Why Two Spaces?

  • Kernel space has unrestricted access to hardware. A bug here can crash the entire system.
  • User space is restricted. A bug in your application cannot (usually) crash the kernel or affect other users.

The CPU enforces this split using hardware protection rings:

  • Ring 0: Kernel mode (full hardware access)
  • Ring 3: User mode (restricted)

When your program needs something that requires kernel privileges (opening a file, sending a network packet, allocating memory), it makes a system call.


System Calls: The Gateway

A system call (syscall) is how user-space programs request services from the kernel. Every meaningful operation eventually becomes a system call.

  Your Program (user space)
       |
       | printf("hello\n")
       |
       v
  C Library (glibc)
       |
       | write(1, "hello\n", 6)  <-- system call wrapper
       |
       v
  Kernel (kernel space)
       |
       | Actually writes bytes to the terminal device
       |
       v
  Hardware (terminal/screen)

Common System Calls

+----------+----------------------------------+-----------------------------+
| Syscall  | What It Does                     | You Use It When...          |
+----------+----------------------------------+-----------------------------+
| open()   | Open a file                      | Opening any file            |
| read()   | Read from a file descriptor      | Reading file contents       |
| write()  | Write to a file descriptor       | Writing to a file or stdout |
| close()  | Close a file descriptor          | Done with a file            |
| fork()   | Create a child process           | Starting a new process      |
| exec()   | Replace process with new program | Running a command           |
| mmap()   | Map file/memory into addr space  | Memory allocation, file I/O |
| socket() | Create a network socket          | Any network operation       |
| ioctl()  | Device-specific control          | Hardware configuration      |
+----------+----------------------------------+-----------------------------+

Watching System Calls with strace

strace lets you see every system call a process makes:

# Trace a simple command
strace ls /tmp 2>&1 | head -30

# Trace a running process
sudo strace -p $(pgrep nginx | head -1) -e trace=read,write

# Count system calls (summary mode)
strace -c ls /tmp

# Trace file-related calls only
strace -e trace=file ls /tmp

# Trace network-related calls only
strace -e trace=network curl -s example.com > /dev/null

# Trace with timestamps
strace -t ls /tmp 2>&1 | head -10

Example output from strace -c ls /tmp:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000050          10         5           openat
 20.00    0.000040           5         8           mmap
 15.00    0.000030           4         7           close
 10.00    0.000020           3         6           fstat
 10.00    0.000020           4         5           read
  5.00    0.000010          10         1           getdents64
  ...

Think About It: When you run echo "hello", how many system calls happen? Try strace echo "hello" 2>&1 | wc -l to find out. Why are there so many for such a simple command?


Kernel Modules

The Linux kernel is modular. Rather than compiling every possible driver and feature into the kernel image, Linux loads functionality on demand through kernel modules. These are .ko (kernel object) files.

  +----------------------------------+
  |         Linux Kernel             |
  |                                  |
  |  Core (always loaded):           |
  |  - Process scheduler             |
  |  - Memory manager                |
  |  - VFS layer                     |
  |                                  |
  |  Modules (loaded on demand):     |
  |  +----------+ +----------+      |
  |  | ext4.ko  | | e1000.ko |      |
  |  +----------+ +----------+      |
  |  +----------+ +----------+      |
  |  | nf_tables| | usb_hid  |      |
  |  +----------+ +----------+      |
  +----------------------------------+

Listing Loaded Modules

# List all currently loaded modules
lsmod

# Output format:
# Module                  Size  Used by
# nf_tables             303104  0
# e1000                  151552  0
# ext4                   806912  1
# ...

The columns are:

  • Module: Module name
  • Size: Memory used (bytes)
  • Used by: Count of dependents, and which modules depend on it

# Filter for a specific module
lsmod | grep ext4

# Count loaded modules
lsmod | wc -l

Getting Module Information

# Detailed info about a module
modinfo ext4

# Key fields:
# filename:       /lib/modules/.../ext4.ko
# license:        GPL
# description:    Fourth Extended Filesystem
# depends:        jbd2,mbcache,crc16
# parm:           ...  (module parameters)

# Just show the description
modinfo -d ext4

# Show module parameters
modinfo -p ext4

# Show the file path
modinfo -n ext4

Loading and Unloading Modules

# Load a module (resolves dependencies automatically)
sudo modprobe snd_dummy

# Verify it loaded
lsmod | grep snd_dummy

# Unload a module
sudo modprobe -r snd_dummy

# Load with parameters
sudo modprobe loop max_loop=64

WARNING: Be very careful loading and unloading kernel modules on production systems. Unloading a module that is in use can crash the system. modprobe -r will refuse if the module is in use, but forcing removal (rmmod -f) can cause a kernel panic.

Module Dependencies

Modules can depend on other modules. modprobe handles this automatically, but you can see the dependency tree:

# Show what a module depends on
modinfo ext4 | grep depends

# Show the full dependency tree
modprobe --show-depends ext4

Blacklisting Modules

Sometimes you need to prevent a module from loading (conflicting drivers, security):

# Create a blacklist file
sudo tee /etc/modprobe.d/blacklist-example.conf << 'EOF'
# Prevent the nouveau driver from loading (example)
blacklist nouveau
EOF

# After blacklisting, update initramfs
sudo update-initramfs -u      # Debian/Ubuntu
sudo dracut --force            # RHEL/Fedora

Distro Note: Module blacklisting syntax is the same across distributions, but the command to rebuild initramfs differs. Debian/Ubuntu use update-initramfs, RHEL/Fedora use dracut.


Exploring /proc -- The Process Filesystem

/proc is a virtual filesystem. Nothing in it exists on disk -- the kernel generates its contents on the fly when you read them. It is your window into the kernel's state.

System-Wide Information

# Kernel version
cat /proc/version

# CPU information
cat /proc/cpuinfo

# Memory statistics
cat /proc/meminfo

# Uptime (in seconds)
cat /proc/uptime

# Load average
cat /proc/loadavg

# Mounted filesystems
cat /proc/mounts

# Currently active partitions
cat /proc/partitions

# Network statistics
cat /proc/net/dev

# Open file count system-wide
cat /proc/sys/fs/file-nr

# Maximum number of open files
cat /proc/sys/fs/file-max

# Kernel command line (boot parameters)
cat /proc/cmdline

Per-Process Information

Each PID has its own directory (covered in Chapter 10, but here is the kernel-focused view):

# Pick a PID (your own shell)
PID=$$

# Command that started this process
cat /proc/$PID/cmdline | tr '\0' ' '; echo

# Process status (kernel's view)
cat /proc/$PID/status

# Memory map
cat /proc/$PID/maps | head -10

# Open file descriptors
ls -l /proc/$PID/fd/

# Limits applied to this process
cat /proc/$PID/limits

# cgroup membership
cat /proc/$PID/cgroup

# Namespace information
ls -l /proc/$PID/ns/

# Scheduling information
cat /proc/$PID/sched | head -20
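
Tools like ps are, at heart, loops over these files. A minimal sketch (per proc(5), the stat file starts with the pid, the command name in parentheses, then the state letter):

```shell
# Mini-ps: PID, command name, and state, read straight from /proc
# (command names containing spaces would need fancier parsing)
for d in /proc/[0-9]*; do
    read -r pid comm state _ < "$d/stat" 2>/dev/null || continue
    printf '%-8s %-20s %s\n' "$pid" "$comm" "$state"
done | head -10
```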

Interesting /proc Files

# Random number entropy available
cat /proc/sys/kernel/random/entropy_avail

# Hostname
cat /proc/sys/kernel/hostname

# OS type
cat /proc/sys/kernel/ostype

# Swappiness (how aggressively kernel swaps)
cat /proc/sys/vm/swappiness

# IP forwarding enabled?
cat /proc/sys/net/ipv4/ip_forward

# Maximum number of processes
cat /proc/sys/kernel/pid_max

# Kernel taint flags (non-zero means something unusual)
cat /proc/sys/kernel/tainted

Think About It: /proc files have a size of 0 bytes according to ls -l, yet cat can read content from them. Why? What does this tell you about how /proc works?


Exploring /sys -- The Device Filesystem

/sys (sysfs) is another virtual filesystem, focused on devices and kernel subsystems:

# Block devices (disks)
ls /sys/block/

# Network devices and their MAC addresses
ls /sys/class/net/
cat /sys/class/net/eth0/address 2>/dev/null

# CPU frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null

# Disk queue scheduler
cat /sys/block/sda/queue/scheduler 2>/dev/null

While /proc is a mix of process info and kernel state (and is older, from Linux 1.0), /sys is a cleaner, hierarchical view focused on devices and drivers (introduced in Linux 2.6). Both are virtual -- nothing on disk.
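
Because sysfs exposes one value per file, quick inventories are simple loops. For example, each network interface with its link state:

```shell
# Every network interface and its operational state, straight from sysfs
for dev in /sys/class/net/*; do
    printf '%-12s %s\n' "${dev##*/}" "$(cat "$dev/operstate")"
done
# Example output:
# eth0         up
# lo           unknown
```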


uname: Kernel Identity

# All information at once
uname -a

# Most commonly used flags
uname -r     # Kernel release: 6.1.0-18-amd64
uname -m     # Architecture: x86_64
uname -s     # Kernel name: Linux
uname -n     # Hostname

The kernel version string decoded:

  6.1.0-18-amd64
  | | |  |    |
  | | |  |    +-- Architecture variant
  | | |  +------- Distro patch level
  | | +---------- Patch version
  | +------------ Minor version
  +-------------- Major version
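
In a script, those fields split apart with standard parameter expansion (this assumes the usual X.Y.Z-extra form):

```shell
# Split the running kernel's release string into its fields
release=$(uname -r)                 # e.g. 6.1.0-18-amd64
IFS=. read -r major minor rest <<< "$release"
patch=${rest%%-*}                   # drop the distro suffix
echo "major=$major minor=$minor patch=$patch"
```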

Kernel Parameters with sysctl

sysctl reads and writes kernel parameters at runtime. These correspond to files under /proc/sys/:

# List all kernel parameters
sysctl -a 2>/dev/null | head -20

# Read a specific parameter
sysctl net.ipv4.ip_forward
# Same as: cat /proc/sys/net/ipv4/ip_forward

# Set a parameter temporarily (until reboot)
sudo sysctl net.ipv4.ip_forward=1

# Set a parameter permanently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-forwarding.conf
sudo sysctl --system    # Reload all sysctl config
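
The name-to-path mapping is mechanical -- dots become slashes under /proc/sys -- so you can always cross-check a parameter both ways:

```shell
# kernel.ostype lives at /proc/sys/kernel/ostype
key=kernel.ostype
path=/proc/sys/$(echo "$key" | tr . /)

sysctl -n "$key"    # Linux
cat "$path"         # Linux -- identical value
```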

Important sysctl Parameters

# Network
sysctl net.ipv4.ip_forward                    # IP routing
sysctl net.ipv4.tcp_syncookies                # SYN flood protection
sysctl net.core.somaxconn                     # Max socket listen backlog
sysctl net.ipv4.tcp_max_syn_backlog           # SYN queue size

# Virtual memory
sysctl vm.swappiness                          # Swap aggressiveness (0-100)
sysctl vm.dirty_ratio                         # % of RAM for dirty pages before sync
sysctl vm.overcommit_memory                   # Memory overcommit policy

# Kernel
sysctl kernel.pid_max                         # Maximum PID value
sysctl kernel.hostname                        # System hostname
sysctl kernel.panic                           # Seconds before reboot on panic (0=hang)

# File system
sysctl fs.file-max                            # Maximum open files system-wide
sysctl fs.inotify.max_user_watches            # inotify watch limit

Practical: Tuning for a Web Server

# Increase connection backlog for high-traffic servers
sudo sysctl net.core.somaxconn=65535
sudo sysctl net.ipv4.tcp_max_syn_backlog=65535

# Increase file descriptor limits
sudo sysctl fs.file-max=2097152

# Increase inotify watches (for file-watching dev tools)
sudo sysctl fs.inotify.max_user_watches=524288

# Make changes permanent
sudo tee /etc/sysctl.d/99-webserver.conf << 'EOF'
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
EOF

sudo sysctl --system

dmesg: The Kernel Ring Buffer

dmesg displays the kernel ring buffer -- a circular log where the kernel writes messages about hardware detection, driver loading, errors, and other events:

# View all kernel messages
dmesg

# View with human-readable timestamps
dmesg -T

# View with color
dmesg --color=always | less -R

# Show only errors and warnings
dmesg --level=err,warn

# Follow new messages in real time (like tail -f)
dmesg -w

# Earliest messages from this boot (dmesg only covers the current boot)
dmesg -T | head -50

# Clear the ring buffer (root only)
sudo dmesg -c

What to Look for in dmesg

# Hardware detection at boot
dmesg | grep -i "cpu\|memory\|disk\|network\|usb"

# Disk/storage messages
dmesg | grep -i "sd[a-z]\|nvme\|ext4\|xfs"

# Network interface detection
dmesg | grep -i "eth\|ens\|wlan\|link"

# Errors (these are important!)
dmesg --level=err

# Out of memory events
dmesg | grep -i "oom\|out of memory"

# USB device events
dmesg | grep -i usb

# Firewall drops (if logging is enabled)
dmesg | grep -i "iptables\|nftables\|DROP"

dmesg and journalctl

On systemd systems, kernel messages are also captured by journald:

# Kernel messages via journalctl
journalctl -k

# Kernel messages from current boot
journalctl -k -b 0

# Kernel messages from previous boot
journalctl -k -b -1

# Follow kernel messages
journalctl -kf

Think About It: The kernel ring buffer has a fixed size (typically 256KB-1MB). What happens when it fills up? How does this affect your ability to investigate boot problems hours after the system started?


Debug This: Identifying a Missing Driver

A new USB device is plugged in but does not work:

# Step 1: Check dmesg for recent USB events
dmesg -T | tail -30

# You might see something like:
# [timestamp] usb 1-1: new high-speed USB device number 4
# [timestamp] usb 1-1: New USB device found, idVendor=1234, idProduct=5678
# [timestamp] usb 1-1: New USB device strings: Mfr=1, Product=2, Serial=3

# Step 2: Check if a driver was loaded
dmesg -T | grep -i "driver\|module\|bound"

# Step 3: Find the vendor/product ID
lsusb
# Bus 001 Device 004: ID 1234:5678 Unknown Device

# Step 4: Search for a matching module (modules may be compressed: .ko.zst, .ko.xz)
find /lib/modules/$(uname -r) -name "*.ko*" | xargs modinfo 2>/dev/null | grep -B5 "1234"

# Step 5: Check if the module exists but isn't loaded
modprobe --show-depends relevant_module

# Step 6: Try loading it manually
sudo modprobe relevant_module

# Step 7: Check dmesg again
dmesg -T | tail -10

Hands-On: Kernel Exploration Lab

# 1. Determine your exact kernel version and architecture
uname -r
uname -m

# 2. How many system calls does the kernel support?
# (On x86_64 systems; the header path varies by distribution)
grep -c "^#define __NR_" /usr/include/asm/unistd_64.h 2>/dev/null || \
ausyscall --dump 2>/dev/null | wc -l

# 3. What kernel modules are loaded for your filesystem?
lsmod | grep -E "ext4|xfs|btrfs"

# 4. What is the kernel's view of your disks?
cat /proc/partitions

# 5. Check kernel taint status (0 = clean, non-zero = something unusual)
cat /proc/sys/kernel/tainted

# 6. See the kernel command line (how it was booted)
cat /proc/cmdline

# 7. What interrupts are firing?
cat /proc/interrupts | head -20

# 8. Check the current swappiness
sysctl vm.swappiness

# 9. Temporarily change swappiness and verify
sudo sysctl vm.swappiness=10
sysctl vm.swappiness
# Reset it
sudo sysctl vm.swappiness=60

# 10. Trace system calls of a simple command
strace -c date 2>&1

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 13 Recap: The Kernel Up Close                           |
|------------------------------------------------------------------|
|                                                                  |
|  - The kernel manages hardware, processes, memory, and I/O.     |
|  - User space programs access the kernel via system calls.       |
|  - strace lets you watch system calls in real time.              |
|  - Kernel modules (.ko) load functionality on demand.            |
|  - lsmod, modprobe, modinfo manage modules.                     |
|  - /proc is a virtual filesystem exposing kernel state.          |
|  - /sys exposes device and driver information hierarchically.    |
|  - uname -r shows your kernel version.                           |
|  - sysctl reads and tunes kernel parameters at runtime.         |
|  - dmesg shows the kernel ring buffer (hardware, drivers, errors)|
|  - Kernel parameters can be made permanent in /etc/sysctl.d/.   |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: System Call Counting

Use strace -c on three different commands: ls /tmp, cat /etc/passwd, and curl -s example.com > /dev/null. Compare the number and types of system calls. Which command makes the most? Why?

Exercise 2: Module Investigation

Run lsmod and pick three modules you do not recognize. Use modinfo to learn about each one: what does it do, what license is it under, and what parameters does it accept?

Exercise 3: /proc Scavenger Hunt

Using only files in /proc, determine: (a) how many CPUs/cores the kernel sees, (b) total installed RAM, (c) current load average, (d) the kernel's command line boot parameters, and (e) how many file descriptors are currently in use system-wide.

Exercise 4: sysctl Tuning

Read the current values of vm.swappiness, net.ipv4.ip_forward, and fs.file-max. Change vm.swappiness to 10, verify the change took effect, then set it back to the original value. Write the appropriate line for /etc/sysctl.d/ to make it permanent.

Bonus Challenge

Write a script called kernel-report.sh that outputs a comprehensive report: kernel version, architecture, uptime, number of loaded modules, top 5 modules by memory usage, number of running processes, file descriptor usage, and any errors in the kernel ring buffer from the last hour. Format the output cleanly with headers and dividers.
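
One possible skeleton to get you started (a sketch, not a complete solution -- the section helper and the guarded commands are my own conventions; fill in the remaining report sections from this chapter):

```shell
#!/bin/sh
# kernel-report.sh -- starting skeleton for the bonus challenge
section() { printf '\n===== %s =====\n' "$1"; }

section "Kernel version and architecture"
uname -r
uname -m

section "Uptime"
awk '{printf "%.1f hours\n", $1 / 3600}' /proc/uptime

section "Loaded modules"
if [ -r /proc/modules ]; then
    wc -l < /proc/modules
else
    echo "module list unavailable"
fi

section "Recent kernel errors"
# --since needs util-linux >= 2.37; drop it on older systems
dmesg --level=err --since "1 hour ago" 2>/dev/null || echo "(dmesg unavailable)"

# TODO: top 5 modules by memory (sort /proc/modules by field 2),
# process count, and file descriptor usage (/proc/sys/fs/file-nr)
```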

Boot Process Demystified

Why This Matters

You press the power button. Sixty seconds later, you are staring at a login prompt. What just happened in those sixty seconds?

Understanding the boot process is not academic trivia. When a server refuses to boot, when a kernel update goes wrong, when you need to recover a system with a corrupted root filesystem, when a misconfigured GRUB entry leaves you staring at a blinking cursor -- that is when this knowledge saves you. Every minute a production server is down costs money. Knowing the boot sequence means you can diagnose where it broke and fix it.

This chapter walks through every stage, from the moment electricity hits the motherboard to the moment systemd presents you with a login prompt. You will learn to configure GRUB, inspect initramfs, analyze boot performance, and rescue a system that will not boot.

Try This Right Now

# How long did your system take to boot?
systemd-analyze

# Breakdown by stage
systemd-analyze blame | head -15

# Visual chain of the boot process
systemd-analyze critical-chain

# What target (runlevel) are you running?
systemctl get-default

# When was this system last booted?
who -b

# See the kernel's boot parameters
cat /proc/cmdline

# Check for boot errors
journalctl -b 0 -p err

The Boot Sequence: Bird's Eye View

+------------------------------------------------------------------+
|  1. FIRMWARE (BIOS or UEFI)                                     |
|     - Hardware initialization (POST)                             |
|     - Find and load the bootloader                               |
+------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------+
|  2. BOOTLOADER (GRUB2)                                          |
|     - Present boot menu                                          |
|     - Load the kernel and initramfs into memory                  |
+------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------+
|  3. KERNEL                                                       |
|     - Initialize hardware, memory, CPU                           |
|     - Mount initramfs as temporary root                          |
|     - Find and mount the real root filesystem                    |
|     - Start PID 1                                                |
+------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------+
|  4. INIT SYSTEM (systemd, PID 1)                                |
|     - Mount remaining filesystems                                |
|     - Start services according to the target                    |
|     - Present login prompt                                       |
+------------------------------------------------------------------+

Let us examine each stage in detail.


Stage 1: Firmware -- BIOS and UEFI

When you press the power button, the CPU starts executing code from a chip on the motherboard -- the firmware.

BIOS (Legacy)

The Basic Input/Output System is the older standard (1980s):

  1. POST (Power-On Self-Test) -- checks CPU, RAM, and basic hardware
  2. Looks for a bootable device based on the boot order (hard disk, USB, network)
  3. Reads the first 512 bytes of the boot device -- the Master Boot Record (MBR)
  4. Executes the bootloader code found in the MBR

MBR limitations:

  • Only 446 bytes for bootloader code (tiny!)
  • Maximum disk size: 2 TB
  • Maximum 4 primary partitions
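
You can verify part of the MBR story yourself: the last two bytes of the first sector are the boot signature 0x55 0xAA. A sketch (the helper name is my own; point it at a disk image file to experiment without root):

```shell
# Print the boot-signature bytes of a device or image's first sector.
# A valid MBR ends in "55 aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' '
}

# On a BIOS-booted machine (needs root):
#   [ "$(mbr_signature /dev/sda)" = "55aa" ] && echo "MBR present"
```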

UEFI (Modern)

The Unified Extensible Firmware Interface is the modern replacement:

  1. POST -- same hardware checks
  2. Reads the EFI System Partition (ESP) -- a FAT32 partition, usually mounted at /boot/efi
  3. Runs the bootloader EFI application (e.g., grubx64.efi)
  4. Supports Secure Boot (only runs cryptographically signed bootloaders)

UEFI advantages:

  • Supports disks larger than 2 TB (GPT partitioning)
  • Faster boot (can skip legacy compatibility)
  • Secure Boot protects against rootkits
  • Can boot directly from EFI applications (no MBR needed)

# Check if your system uses UEFI or BIOS
ls /sys/firmware/efi 2>/dev/null && echo "UEFI boot" || echo "Legacy BIOS boot"

# If UEFI, see the boot entries
efibootmgr -v 2>/dev/null

# See the EFI System Partition
mount | grep efi
ls /boot/efi/EFI/ 2>/dev/null

Think About It: Why does the EFI System Partition use FAT32 and not ext4 or XFS? (Hint: the firmware needs to read it before any Linux driver is loaded.)


Stage 2: The GRUB2 Bootloader

GRUB2 (GRand Unified Bootloader version 2) is the standard bootloader on most Linux distributions. Its job is to load the kernel and initramfs into memory.

What GRUB Does

  1. Presents a boot menu (if configured to show one)
  2. Lets you select which kernel to boot, edit boot parameters, or enter a recovery shell
  3. Loads the selected kernel image (vmlinuz-*) into memory
  4. Loads the initial RAM filesystem (initramfs-* or initrd-*) into memory
  5. Passes control to the kernel with the specified boot parameters

GRUB Configuration

# Main GRUB config file (DO NOT edit directly)
ls -l /boot/grub/grub.cfg      # Debian/Ubuntu
ls -l /boot/grub2/grub.cfg     # RHEL/Fedora

# Edit GRUB defaults instead
cat /etc/default/grub

The /etc/default/grub file contains the settings you should modify:

# Key settings in /etc/default/grub:

GRUB_DEFAULT=0                    # Default menu entry (0 = first)
GRUB_TIMEOUT=5                    # Seconds to show menu (0 = skip, -1 = wait forever)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"  # Kernel params for default boot
GRUB_CMDLINE_LINUX=""             # Kernel params for ALL boots
GRUB_DISABLE_RECOVERY="false"    # Show recovery entries?

Common kernel parameters you might add:

+----------------------------+------------------------------------+
| Parameter                  | Effect                             |
+----------------------------+------------------------------------+
| quiet                      | Suppress most boot messages        |
| splash                     | Show graphical splash screen       |
| nomodeset                  | Disable kernel mode-setting        |
|                            | (GPU troubleshooting)              |
| single / 1                 | Boot into single-user mode         |
| init=/bin/bash             | Skip init, drop to shell           |
|                            | (emergency recovery)               |
| rd.break                   | Break into initramfs shell         |
| systemd.unit=rescue.target | Boot into rescue mode              |
| mem=4G                     | Limit usable memory                |
+----------------------------+------------------------------------+
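
The parameters the running kernel was actually booted with live in /proc/cmdline; splitting them one per line makes them much easier to scan:

```shell
# One boot parameter per line -- easier to read than the raw string
tr ' ' '\n' < /proc/cmdline

# Is a particular parameter active?
grep -qw quiet /proc/cmdline && echo "quiet boot" || echo "verbose boot"
```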

Regenerating GRUB Configuration

After changing /etc/default/grub, regenerate the actual config:

# Debian/Ubuntu
sudo update-grub

# RHEL/Fedora
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# For UEFI systems on RHEL/Fedora
sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Distro Note: Debian/Ubuntu provide the update-grub convenience wrapper. RHEL/Fedora require the full grub2-mkconfig command. Both do the same thing -- regenerate /boot/grub/grub.cfg from templates and /etc/default/grub.

WARNING: Never edit /boot/grub/grub.cfg directly. Your changes will be overwritten the next time update-grub runs (e.g., during a kernel update). Always edit /etc/default/grub and regenerate.

GRUB Interactive Editing

At the GRUB menu (press Shift during boot on BIOS, or Esc on UEFI):

  • Press e to edit the selected entry's boot parameters
  • Find the line starting with linux (the kernel command line)
  • Add or modify parameters
  • Press Ctrl+X or F10 to boot with those changes (temporary, one-time)
  • Press c to drop to a GRUB command-line shell

This is invaluable for recovery. For example, to boot into a root shell:

# At the GRUB edit screen, add to the end of the 'linux' line:
init=/bin/bash
# Then press Ctrl+X to boot

GRUB Installation

# Install GRUB to a disk's MBR (BIOS)
sudo grub-install /dev/sda

# Install GRUB to EFI System Partition
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi

# Reinstall with a fresh device probe (also a quick sanity check)
sudo grub-install --recheck /dev/sda

Stage 3: The Kernel Loads

Once GRUB loads the kernel into memory and passes control, the kernel takes over:

Kernel Initialization Sequence

  1. Decompress kernel image (vmlinuz -> vmlinux)
  2. Initialize CPU and memory management
  3. Initialize essential hardware (console, timer, interrupts)
  4. Mount initramfs as temporary root (/)
  5. Run /init from initramfs
  6. initramfs loads necessary drivers (storage, filesystem)
  7. Mount the real root filesystem
  8. Pivot root from initramfs to real filesystem
  9. Execute /sbin/init (systemd) as PID 1

You can watch this process in dmesg:

# First few kernel messages -- decompression and initialization
dmesg | head -30

# Look for the root filesystem mounting
dmesg | grep -i "root\|mount\|ext4\|xfs"

# Look for the transition to userspace
dmesg | grep -i "init\|systemd\|pid 1"

initramfs: The Bridge

The kernel cannot mount your root filesystem without the right drivers. But those drivers are stored ON the root filesystem. Chicken-and-egg problem.

The solution is initramfs (initial RAM filesystem) -- a compressed archive containing just enough to find and mount the root filesystem:

+------------------------------------------+
|  initramfs contents:                     |
|  - Filesystem drivers (ext4, xfs, etc.)  |
|  - Storage drivers (SCSI, NVMe, RAID)   |
|  - LVM tools (if root is on LVM)        |
|  - LUKS tools (if root is encrypted)    |
|  - /init script (orchestrates mounting)  |
|  - udev rules for device detection      |
+------------------------------------------+

# See your initramfs files
ls -lh /boot/initrd.img-*    # Debian/Ubuntu
ls -lh /boot/initramfs-*     # RHEL/Fedora

# Examine initramfs contents
lsinitramfs /boot/initrd.img-$(uname -r) | head -30              # Debian/Ubuntu
lsinitrd /boot/initramfs-$(uname -r).img | head -30              # RHEL/Fedora

# Rebuild initramfs (if you've changed modules or drivers)
sudo update-initramfs -u -k $(uname -r)    # Debian/Ubuntu
sudo dracut --force                          # RHEL/Fedora
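
Distributions compress the initramfs with different algorithms (gzip, zstd, lz4, xz), which matters if you ever unpack one by hand. A sketch that identifies the format from the file's first four bytes (the helper name is my own; some images prepend an uncompressed early-microcode cpio archive, which shows up as the "070701" cpio magic):

```shell
# Identify an initramfs image's compression from its magic bytes.
initramfs_compression() {
    case "$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' ')" in
        1f8b*)     echo gzip ;;
        28b52ffd)  echo zstd ;;
        04224d18)  echo lz4 ;;
        fd377a58)  echo xz ;;
        30373037)  echo "uncompressed cpio (early microcode?)" ;;
        *)         echo unknown ;;
    esac
}

for img in /boot/initrd.img-"$(uname -r)" /boot/initramfs-"$(uname -r)".img; do
    if [ -f "$img" ]; then
        echo "$img: $(initramfs_compression "$img")"
    fi
done
```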

Think About It: If your root filesystem is on a simple SATA drive with ext4, the kernel might have those drivers built-in and could theoretically skip initramfs. But if your root is on an encrypted LVM volume on top of software RAID, initramfs is absolutely essential. Why?


Stage 4: systemd Takes Over (PID 1)

Once the kernel mounts the root filesystem, it executes /sbin/init, which on modern distributions is a symlink to systemd. systemd becomes PID 1 -- the parent of all processes.

Boot Targets (Modern Runlevels)

systemd uses targets instead of the old SysVinit runlevels:

+----------+-----------------------+-------------------------------+
| Runlevel | systemd Target        | Description                   |
+----------+-----------------------+-------------------------------+
|    0     | poweroff.target       | Halt the system               |
|    1     | rescue.target         | Single-user mode, root shell  |
|    2     | multi-user.target *   | (Debian: same as 3)           |
|    3     | multi-user.target     | Full multi-user, no GUI       |
|    4     | multi-user.target     | Unused (customizable)         |
|    5     | graphical.target      | Multi-user with GUI           |
|    6     | reboot.target         | Reboot                        |
+----------+-----------------------+-------------------------------+

# What target is currently active?
systemctl get-default

# Change the default target
sudo systemctl set-default multi-user.target    # Server (no GUI)
sudo systemctl set-default graphical.target     # Desktop (with GUI)

# Switch to a different target right now (without reboot)
sudo systemctl isolate rescue.target            # Single-user mode
sudo systemctl isolate multi-user.target        # Back to normal
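
The runlevel-to-target mapping in the table above can be written as a tiny lookup -- essentially what the compatibility aliases runlevel0.target through runlevel6.target encode (a sketch; the function name is my own):

```shell
# SysVinit runlevel -> systemd target (per the table above)
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel" >&2; return 1 ;;
    esac
}

runlevel_to_target 3
runlevel_to_target 5
```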

What systemd Does During Boot

# See the dependency order of boot
systemd-analyze critical-chain

# Output (example):
# graphical.target @12.345s
# └─multi-user.target @12.300s
#   └─network-online.target @10.200s
#     └─NetworkManager-wait-online.service @5.100s +5.099s
#       └─NetworkManager.service @3.200s +1.899s
#         └─dbus.service @2.800s +0.399s
#           └─basic.target @2.700s
#             └─sockets.target @2.699s

Analyzing Boot Performance

# Total boot time breakdown
systemd-analyze
# Output: Startup finished in 2.5s (firmware) + 3.1s (loader) +
#         4.2s (kernel) + 12.3s (userspace) = 22.1s

# Which services took the longest?
systemd-analyze blame | head -20

# Generate an SVG boot chart (visual timeline)
systemd-analyze plot > boot-chart.svg
# Open in a browser to see a graphical timeline

# Check for units that are slowing down boot
systemd-analyze critical-chain --no-pager

# See target dependencies
systemctl list-dependencies multi-user.target | head -30

Boot Log

# View complete boot log for this boot
journalctl -b 0

# View boot log for the previous boot
journalctl -b -1

# Show only errors from this boot
journalctl -b 0 -p err

# Show kernel messages from boot
journalctl -b 0 -k

# List all recorded boots
journalctl --list-boots

Troubleshooting Boot Failures

Identifying Where Boot Failed

+-----------------------------+-----------------------+
|  Symptom                    | Stage That Failed     |
|-----------------------------|-----------------------|
|  No display, no beep        | Hardware / BIOS       |
|  BIOS screen, then blank    | Bootloader (GRUB)     |
|  GRUB menu appears,         | Kernel loading or     |
|  then kernel panic          | initramfs             |
|  Kernel loads, services     | systemd / services    |
|  fail, no login             |                       |
|  Login appears but can't    | Authentication / PAM  |
|  log in                     |                       |
+-----------------------------+-----------------------+

Rescue Mode

If the system boots far enough for GRUB to work:

Method 1: GRUB Kernel Parameter Edit

  1. At the GRUB menu, press e
  2. Find the line starting with linux
  3. Add systemd.unit=rescue.target at the end
  4. Press Ctrl+X to boot
  5. You will get a root shell (may require root password)

Method 2: Emergency Mode

For more serious problems (filesystem issues):

  1. At GRUB, press e
  2. Add systemd.unit=emergency.target to the kernel line
  3. Press Ctrl+X
  4. Only the root filesystem is mounted (read-only)

# In emergency mode, remount root as read-write
mount -o remount,rw /

# Fix the problem (e.g., edit /etc/fstab)
vi /etc/fstab

# Reboot
reboot

Method 3: init=/bin/bash

When even systemd will not start:

  1. At GRUB, press e
  2. Replace ro quiet splash with rw init=/bin/bash
  3. Press Ctrl+X
  4. You drop directly to a bash shell as root, no authentication

# Root filesystem may be read-only; remount it
mount -o remount,rw /

# Reset a forgotten root password
passwd root

# Fix other issues as needed

# When done, remount read-only and reboot
mount -o remount,ro /
exec /sbin/reboot -f

WARNING: init=/bin/bash bypasses ALL authentication. Anyone with physical access to the GRUB menu can become root. This is why physical security matters, and why some organizations configure GRUB passwords.

Fixing a Broken GRUB

If GRUB itself is corrupted:

# Boot from a live USB/ISO
# Mount the installed system's partitions:
sudo mount /dev/sda2 /mnt          # Root partition
sudo mount /dev/sda1 /mnt/boot/efi # EFI partition (if UEFI)
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys

# Chroot into the installed system
sudo chroot /mnt

# Reinstall GRUB
grub-install /dev/sda              # BIOS
grub-install --target=x86_64-efi --efi-directory=/boot/efi  # UEFI

# Regenerate config
update-grub                        # Debian/Ubuntu
grub2-mkconfig -o /boot/grub2/grub.cfg  # RHEL/Fedora

# Exit chroot and reboot
exit
sudo reboot

Fixing a Broken /etc/fstab

A common boot failure: you added an entry to /etc/fstab with a typo, and now the system drops to an emergency shell on boot.

# In emergency mode:
# The error message will tell you which line failed
# Mount root as read-write
mount -o remount,rw /

# Edit fstab
vi /etc/fstab

# Either fix the typo or comment out the problematic line
# Save and reboot
reboot

Think About It: Why does a bad /etc/fstab entry prevent booting, even if the bad entry is for a non-root filesystem like /data? (Hint: look at the mount options -- is nofail set?)

Pro tip: always add nofail to non-critical mount entries in /etc/fstab:

/dev/sdb1  /data  ext4  defaults,nofail  0  2

This tells systemd: "Try to mount this, but do not fail the boot if it cannot be mounted."
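
util-linux also ships a checker that catches most fstab mistakes before they cost you a boot: findmnt --verify validates /etc/fstab, and --tab-file lets you check a candidate file first. A sketch (the candidate path is made up):

```shell
# Validate a candidate fstab before installing it as /etc/fstab.
# (To check the live file instead: sudo findmnt --verify)
cat > /tmp/fstab.candidate << 'EOF'
/dev/sdb1  /data  ext4  defaults,nofail  0  2
EOF

findmnt --verify --tab-file /tmp/fstab.candidate || echo "fix before reboot"
```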


Hands-On: Boot Analysis

# 1. Analyze your boot time
systemd-analyze

# 2. Find the slowest services
systemd-analyze blame | head -10

# 3. See the critical chain (the longest dependency path)
systemd-analyze critical-chain

# 4. Check for failed services during boot
systemctl --failed

# 5. See what the kernel was told at boot
cat /proc/cmdline

# 6. Examine the initramfs for your current kernel
file /boot/initrd.img-$(uname -r) 2>/dev/null || \
file /boot/initramfs-$(uname -r).img 2>/dev/null

# 7. Check GRUB configuration
cat /etc/default/grub

# 8. List available kernels in GRUB
grep menuentry /boot/grub/grub.cfg 2>/dev/null | head -10 || \
grep menuentry /boot/grub2/grub.cfg 2>/dev/null | head -10

# 9. Check for boot errors
journalctl -b 0 -p err --no-pager | head -30

# 10. See how many boots have been recorded
journalctl --list-boots

GRUB Password Protection

To prevent unauthorized users from editing GRUB entries (and bypassing authentication with init=/bin/bash):

# Generate a GRUB password hash
grub-mkpasswd-pbkdf2              # grub2-mkpasswd-pbkdf2 on RHEL/Fedora
# Enter and confirm a password
# Output: grub.pbkdf2.sha512.10000.HASH...

# Add to /etc/grub.d/40_custom
sudo tee -a /etc/grub.d/40_custom << 'EOF'
set superusers="admin"
password_pbkdf2 admin grub.pbkdf2.sha512.10000.YOUR_HASH_HERE
EOF

# Regenerate GRUB config
sudo update-grub                              # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL/Fedora

Now editing GRUB entries at boot will require the GRUB password.


Debug This: System Drops to Emergency Shell

After a reboot, the system drops to:

You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" to try again
to boot into default mode.
Give root password for maintenance:

Diagnosis steps:

# 1. Enter root password (if set)

# 2. Check the journal for the cause
journalctl -xb --no-pager | grep -E "Failed|Error|error" | head -20

# 3. Common cause #1: /etc/fstab error
# Look for filesystem mount failures
systemctl --failed
# If a mount unit failed:
cat /etc/fstab
# Fix the bad line

# 4. Common cause #2: Disk UUID changed
blkid
# Compare UUIDs to what is in /etc/fstab

# 5. Common cause #3: Filesystem needs fsck
# The error message often says which device
fsck /dev/sda2

# 6. Remount root as read-write to make changes
mount -o remount,rw /

# 7. Make your fix, then:
systemctl default
# Or simply:
reboot

The Complete Boot Timeline

Here is the full sequence with approximate times for a modern SSD-based system:

  T+0.0s   Power button pressed
  T+0.5s   UEFI POST completes
  T+1.0s   UEFI finds and loads GRUB
  T+1.5s   GRUB loads kernel + initramfs
  T+2.0s   Kernel decompresses and initializes
  T+3.0s   initramfs mounts real root filesystem
  T+3.5s   systemd starts as PID 1
  T+4.0s   systemd mounts filesystems from /etc/fstab
  T+5.0s   systemd starts udev (device manager)
  T+6.0s   Network services start
  T+8.0s   Login services start (sshd, getty)
  T+10.0s  System is fully booted
            (graphical target may take longer)

# Verify this timeline on your system
systemd-analyze
# Output: Startup finished in 1.8s (firmware) + 2.3s (loader) +
#         3.5s (kernel) + 8.7s (userspace) = 16.3s

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 14 Recap: Boot Process Demystified                      |
|------------------------------------------------------------------|
|                                                                  |
|  - Boot sequence: Firmware -> GRUB -> Kernel -> systemd          |
|  - BIOS is legacy (MBR, 2TB limit); UEFI is modern (GPT).       |
|  - GRUB2 loads the kernel and initramfs into memory.             |
|  - Edit /etc/default/grub, then run update-grub.                |
|  - initramfs provides drivers needed to mount the root FS.       |
|  - The kernel starts systemd as PID 1.                           |
|  - systemd targets replaced SysVinit runlevels.                  |
|  - systemd-analyze shows boot timing and bottlenecks.            |
|  - Rescue mode: add systemd.unit=rescue.target to kernel line.   |
|  - Emergency recovery: init=/bin/bash bypasses all auth.         |
|  - GRUB passwords prevent unauthorized boot parameter edits.     |
|  - Always use 'nofail' for non-critical mounts in /etc/fstab.   |
|  - journalctl -b 0 -p err shows boot errors.                    |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Boot Time Analysis

Run systemd-analyze blame | head -20 and identify the three slowest services. Research what each one does. Could any of them be disabled on a server that does not need a GUI?

Exercise 2: Kernel Command Line

Read /proc/cmdline. Look up each parameter. Try temporarily adding quiet (if not present) or removing it (if present) by editing the GRUB entry at boot time (press e, modify, Ctrl+X). Notice the difference in boot verbosity.

Exercise 3: initramfs Exploration

List the contents of your initramfs (lsinitramfs or lsinitrd). Find the filesystem driver modules included. Find the /init script. What does it do?

Exercise 4: Target Practice

Check your current default target (systemctl get-default). If it is graphical.target, switch to multi-user.target and reboot. Log in via the text console. Switch back and reboot again. Compare boot times between the two.

Bonus Challenge

Intentionally break your boot process in a VM (not on a real machine!) by adding a nonexistent UUID to /etc/fstab. Boot the VM and practice recovering:

  1. Use GRUB edit mode to boot into rescue mode
  2. Fix the /etc/fstab entry
  3. Reboot and verify normal boot

Then, practice the GRUB recovery process:

  1. From a live USB/ISO, mount the installed system's partitions
  2. Chroot into the installed system
  3. Reinstall and reconfigure GRUB
  4. Reboot into the repaired system

systemd: The Init System

Why This Matters

Picture this: you deploy a web application at 2 AM, and the server reboots unexpectedly. When it comes back up, your database starts before the network is ready, your app starts before the database is listening, and your reverse proxy starts before your app is responding. Nothing works. Users see errors. Your phone rings.

This is the exact problem an init system solves. It is the very first process that runs on your Linux system (PID 1), and it is responsible for starting everything else in the right order, keeping services alive, and shutting things down cleanly. On virtually every modern Linux distribution, that init system is systemd.

Whether you are a developer deploying applications, a sysadmin managing servers, or someone learning Linux for the first time, understanding systemd is non-negotiable. It controls how your system boots, how services run, how logs are collected, and how your system shuts down.


Try This Right Now

Open a terminal and run these commands. No setup required:

# What is PID 1 on your system?
ps -p 1 -o comm=

# How long has your system been running?
systemctl status --no-pager | head -5

# List all running services
systemctl list-units --type=service --state=running --no-pager

# What target (runlevel) is your system in?
systemctl get-default

If ps -p 1 printed systemd, you are running systemd. That covers Ubuntu, Fedora, Debian, RHEL, Arch, openSUSE, and nearly every other mainstream distribution.
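
On minimal systems and containers without ps, /proc answers the same question:

```shell
# The name of PID 1, straight from /proc
cat /proc/1/comm

# Where does the traditional init path point?
readlink -f /sbin/init 2>/dev/null || echo "/sbin/init not present"
```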


What Is an Init System?

When the Linux kernel finishes its own initialization, it does one final thing: it launches a single userspace process. This process gets PID 1, and it becomes the ancestor of every other process on the system.

This PID 1 process is the init system, and it has several critical responsibilities:

  1. Start system services in the correct order (networking, logging, databases, etc.)
  2. Manage dependencies between services (the database needs the network first)
  3. Supervise running services and restart them if they crash
  4. Handle system state transitions (booting, shutting down, rebooting)
  5. Reap orphaned processes (adopt zombie children whose parents died)

+----------------------------------------------------------+
|                    Linux Kernel                           |
|  (hardware init, drivers, mount root filesystem)         |
+---------------------------+------------------------------+
                            |
                            v
                    +-------+-------+
                    |   PID 1       |
                    |   (systemd)   |
                    +-------+-------+
                            |
              +-------------+-------------+
              |             |             |
              v             v             v
        +---------+   +---------+   +---------+
        | sshd    |   | nginx   |   | cron    |
        | (svc)   |   | (svc)   |   | (svc)   |
        +---------+   +---------+   +---------+
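
Responsibility #5 is easy to watch in action: start a process whose parent exits immediately, then check who adopted it. A sketch (on desktop systems the adopter may be a per-user subreaper such as systemd --user rather than PID 1 itself):

```shell
# Launch a sleeper from a throwaway shell; the shell exits at once,
# orphaning the sleeper, and init (or a subreaper) adopts it.
pid=$(sh -c 'sleep 3 >/dev/null 2>&1 & echo $!')
sleep 1                          # give the kernel a moment to reparent
awk '{print "adopted by PID", $4}' "/proc/$pid/stat"
```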

A Brief History: SysVinit to systemd

SysVinit (The Old Way)

For decades, Linux used SysVinit, inherited from Unix System V. It worked like this:

  • Shell scripts in /etc/init.d/ controlled each service
  • Scripts were symlinked into numbered "runlevel" directories (/etc/rc3.d/, etc.)
  • Services started sequentially, one after another
  • The naming convention (S20sshd, S30apache) controlled start order

# Old SysVinit style (you may still see this on older systems)
/etc/init.d/apache2 start
/etc/init.d/apache2 stop
/etc/init.d/apache2 restart

The problems with SysVinit were real:

  • Slow boot times because services started one at a time
  • No dependency tracking — just numbered ordering and hope
  • No process supervision — if a service crashed, nobody restarted it
  • Shell scripts everywhere — fragile, inconsistent, hard to debug

systemd (The Modern Way)

Lennart Poettering and Kay Sievers created systemd in 2010. It was controversial (many Unix purists objected to its scope), but it solved genuine problems:

  • Parallel service startup dramatically reduced boot times
  • Declarative unit files replaced fragile shell scripts
  • Dependency management ensured correct startup order
  • Process supervision with automatic restart on failure
  • Unified logging via the journal (journald)
  • On-demand service activation via sockets and D-Bus

By 2015, nearly every major distribution -- including Debian, Ubuntu, Fedora, RHEL, and Arch -- had adopted systemd.

Think About It: Why would parallel service startup require explicit dependency management? What could go wrong if you just started everything at the same time without tracking which services depend on which?


systemctl: Your Primary Interface

systemctl is the command you will use most often to interact with systemd. Think of it as the control panel for everything running on your system.

Starting and Stopping Services

# Start a service (takes effect immediately)
sudo systemctl start nginx

# Stop a service
sudo systemctl stop nginx

# Restart a service (stop + start)
sudo systemctl restart nginx

# Reload a service configuration without full restart
# (not all services support this)
sudo systemctl reload nginx

# Reload if supported, otherwise restart
sudo systemctl reload-or-restart nginx

Enabling and Disabling Services

Starting a service only affects the current session. If you reboot, it will not start automatically unless you enable it:

# Enable a service to start at boot
sudo systemctl enable nginx

# Disable a service from starting at boot
sudo systemctl disable nginx

# Enable AND start in one command
sudo systemctl enable --now nginx

# Disable AND stop in one command
sudo systemctl disable --now nginx

When you enable a service, systemd creates a symlink in the appropriate target directory. When you disable it, that symlink is removed.

# See what enabling actually does
sudo systemctl enable --now nginx
# Output: Created symlink /etc/systemd/system/multi-user.target.wants/nginx.service
#         -> /usr/lib/systemd/system/nginx.service

Checking Service Status

# Detailed status of a service
systemctl status nginx

Here is what typical output looks like:

● nginx.service - A high performance web server
     Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; preset: disabled)
     Active: active (running) since Mon 2025-03-10 14:22:01 UTC; 3h ago
       Docs: man:nginx(8)
    Process: 1234 ExecStartPre=/usr/bin/nginx -t -q (code=exited, status=0/SUCCESS)
   Main PID: 1235 (nginx)
      Tasks: 3 (limit: 4915)
     Memory: 8.2M
        CPU: 120ms
     CGroup: /system.slice/nginx.service
             ├─1235 "nginx: master process /usr/bin/nginx"
             ├─1236 "nginx: worker process"
             └─1237 "nginx: worker process"

Mar 10 14:22:01 server01 systemd[1]: Starting nginx.service...
Mar 10 14:22:01 server01 systemd[1]: Started nginx.service.

Let us break down each line:

Field      Meaning
Loaded     Where the unit file lives, whether it is enabled
Active     Current state and how long it has been running
Main PID   The primary process ID
Tasks      Number of tasks (threads/processes) in the cgroup
Memory     Memory consumed by the service and its children
CGroup     The cgroup tree showing all child processes

Quick Status Checks

Sometimes you just need a yes/no answer:

# Is it running?
systemctl is-active nginx
# Output: active

# Is it enabled at boot?
systemctl is-enabled nginx
# Output: enabled

# Has it failed?
systemctl is-failed nginx
# Output: active   (meaning "not failed")

These commands return exit codes you can use in scripts:

if systemctl is-active --quiet nginx; then
    echo "nginx is running"
else
    echo "nginx is NOT running"
fi

Hands-On: Exploring Your System's Services

Let us explore what is running on your system right now.

Step 1: List All Running Services

systemctl list-units --type=service --state=running --no-pager

You will see output like:

UNIT                      LOAD   ACTIVE SUB     DESCRIPTION
cron.service              loaded active running Regular background program processing
dbus.service              loaded active running D-Bus System Message Bus
NetworkManager.service    loaded active running Network Manager
sshd.service              loaded active running OpenSSH Daemon
systemd-journald.service  loaded active running Journal Service
systemd-udevd.service     loaded active running Rule-based Manager for Device Events
...

Step 2: List Failed Services

systemctl list-units --type=service --state=failed --no-pager

On a healthy system, this should be empty. If you see failures, investigate with systemctl status <unit-name>.

Step 3: List All Installed Services (Running or Not)

systemctl list-unit-files --type=service --no-pager

This shows every service unit file installed on your system and whether it is enabled, disabled, static, or masked.
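To get a feel for those states, you can tally them. The sketch below uses hard-coded sample lines so the shape of the pipeline is visible; feeding it real systemctl list-unit-files --no-legend output works the same way:

```shell
# Tally service unit files by state (sample data stands in for
# `systemctl list-unit-files --type=service --no-legend` output).
sample='cron.service enabled
ssh.service enabled
bluetooth.service disabled
getty@.service static'
echo "$sample" |
    awk '{count[$2]++} END {for (s in count) print count[s], s}' |
    sort -rn
```

On the sample data the first line is 2 enabled; on a real system, the static count is usually the largest.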

Step 4: View a Service's Unit File

systemctl cat sshd.service

This prints the actual unit file contents. We will study unit file anatomy in Chapter 16.

Distro Note: On Ubuntu/Debian, the SSH service is called ssh.service. On Fedora/RHEL/Arch, it is sshd.service. When in doubt, use tab completion: systemctl status ssh<TAB>.


Unit Types: Not Just Services

systemd does not just manage services. It manages many types of "units." Each unit type handles a different kind of system resource:

Unit Type   Extension    Purpose
Service     .service     Daemons and processes
Socket      .socket      IPC or network sockets (for activation)
Timer       .timer       Scheduled tasks (like cron jobs)
Mount       .mount       Filesystem mount points
Automount   .automount   On-demand filesystem mounting
Target      .target      Groups of units (like runlevels)
Device      .device      Kernel device events
Path        .path        Filesystem path monitoring
Swap        .swap        Swap space
Slice       .slice       Resource management groups (cgroups)
Scope       .scope       Externally created process groups

Listing Different Unit Types

# List all active timers
systemctl list-timers --no-pager

# List all mount units
systemctl list-units --type=mount --no-pager

# List all socket units
systemctl list-units --type=socket --no-pager

# List all targets
systemctl list-units --type=target --no-pager

Service Units

These are the most common. They manage long-running daemons:

systemctl list-units --type=service --no-pager | head -20

Socket Units

Socket units enable socket activation: systemd listens on a socket and only starts the actual service when a connection arrives. This saves resources.

# See which sockets systemd is listening on
systemctl list-sockets --no-pager
LISTEN                        UNIT                     ACTIVATES
/run/dbus/system_bus_socket   dbus.socket              dbus.service
/run/systemd/journal/socket   systemd-journald.socket  systemd-journald.service
[::]:22                       sshd.socket              sshd.service

Timer Units

Timer units are systemd's replacement for cron jobs. We will cover these in detail in Chapters 16 and 24.

# List all active timers and when they fire next
systemctl list-timers --all --no-pager

Mount Units

Every entry in /etc/fstab gets automatically converted to a mount unit:

# See mount units
systemctl list-units --type=mount --no-pager
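Mount unit names are derived from the mount point path: slashes become dashes and the leading slash is dropped. The proper tool for this is systemd-escape, but the core rule is simple enough to sketch in shell (the path is just an example):

```shell
# /var/log  ->  var-log.mount
# systemd-escape -p --suffix=mount /var/log does this properly,
# including escaping characters that tr cannot handle.
mount_point="/var/log"
unit_name="$(printf '%s' "${mount_point#/}" | tr '/' '-').mount"
echo "$unit_name"   # var-log.mount
```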

Think About It: Why would systemd want to manage mount points as units instead of just reading /etc/fstab the old-fashioned way? Think about dependencies: what if a service needs a specific filesystem to be mounted before it can start?


Targets: The New Runlevels

In SysVinit, runlevels (0-6) defined what state the system was in. systemd replaces runlevels with targets -- units that group other units together.

Runlevel to Target Mapping

Runlevel   systemd Target      Purpose
0          poweroff.target     Halt the system
1          rescue.target       Single-user mode (recovery)
2          multi-user.target   Multi-user, no GUI (Debian-specific)
3          multi-user.target   Multi-user, no GUI
4          multi-user.target   Unused (custom)
5          graphical.target    Multi-user with GUI
6          reboot.target       Reboot

Checking and Changing the Default Target

# What target does your system boot into?
systemctl get-default
# Output: graphical.target   (desktop) or multi-user.target (server)

# Change default to text/server mode
sudo systemctl set-default multi-user.target

# Change default to graphical/desktop mode
sudo systemctl set-default graphical.target

Switching Targets at Runtime

# Switch to rescue mode (single-user, for recovery)
sudo systemctl isolate rescue.target

# Switch to multi-user (text mode)
sudo systemctl isolate multi-user.target

# Switch to graphical mode
sudo systemctl isolate graphical.target

WARNING: systemctl isolate rescue.target will kill most running services and drop you to a root shell. Do not run this on a remote server unless you have console access.

Understanding Target Dependencies

Targets are like dependency trees. multi-user.target depends on basic.target, which depends on sysinit.target, which depends on local-fs.target and others:

graphical.target
    └── multi-user.target
            └── basic.target
                    ├── sockets.target
                    ├── timers.target
                    ├── paths.target
                    ├── slices.target
                    └── sysinit.target
                            ├── local-fs.target
                            ├── swap.target
                            └── cryptsetup.target

You can visualize this:

# Show what a target "wants" (its dependencies)
systemctl list-dependencies multi-user.target --no-pager

# Show the full boot dependency tree
systemctl list-dependencies default.target --no-pager

Hands-On: Managing a Real Service

Let us work through a complete service management workflow using the SSH daemon.

Step 1: Check Current Status

systemctl status sshd.service

Distro Note: Use ssh.service on Debian/Ubuntu.

Step 2: Stop the Service

sudo systemctl stop sshd.service

WARNING: If you are connected via SSH, do NOT stop sshd. Your existing connection will survive, but you will not be able to open new ones. Use a different service for practice if you are remote.

Step 3: Verify It Stopped

systemctl is-active sshd.service
# Output: inactive

systemctl status sshd.service
# Active line now shows: inactive (dead)

Step 4: Start It Again

sudo systemctl start sshd.service

systemctl is-active sshd.service
# Output: active

Step 5: Check Boot Configuration

systemctl is-enabled sshd.service
# Output: enabled

Step 6: View Recent Logs

# Last 20 log entries for sshd
journalctl -u sshd.service -n 20 --no-pager

Masking Services: The Nuclear Option

Sometimes disabling a service is not enough. Another service or a system update might re-enable it. Masking a service makes it completely impossible to start:

# Mask a service (symlinks unit file to /dev/null)
sudo systemctl mask bluetooth.service

# Try to start it — it will refuse
sudo systemctl start bluetooth.service
# Failed to start bluetooth.service: Unit bluetooth.service is masked.

# Unmask it when you want to allow it again
sudo systemctl unmask bluetooth.service

Masking is useful when:

  • You want to ensure a service never runs (security hardening)
  • Two services conflict and you want to permanently disable one
  • You are troubleshooting and want to eliminate a service entirely

# See what a mask looks like
ls -la /etc/systemd/system/bluetooth.service
# /etc/systemd/system/bluetooth.service -> /dev/null

Debug This: Why Won't My Service Start?

You install nginx and try to start it, but it fails:

sudo systemctl start nginx.service
# Job for nginx.service failed because the control process exited with error code.

Here is your debugging workflow:

Step 1: Check Status

systemctl status nginx.service --no-pager -l

The -l flag prevents line truncation. Look for error messages in the log section at the bottom.

Step 2: Check the Journal

journalctl -u nginx.service -n 50 --no-pager

Look for lines marked with err or crit priority. You can show only err-and-worse entries with journalctl -u nginx.service -p err.

Step 3: Check Configuration Syntax

# For nginx specifically
sudo nginx -t

Step 4: Check Port Conflicts

# Is something else using port 80?
sudo ss -tlnp | grep :80

Step 5: Check File Permissions

# Can the service user read its config?
ls -la /etc/nginx/nginx.conf

# Can it write to its log directory?
ls -la /var/log/nginx/

Step 6: Try Starting Manually

# Run the exact command from the unit file
systemctl cat nginx.service | grep ExecStart
# ExecStart=/usr/sbin/nginx -g 'daemon off;'

# Run a config test manually to see the error output directly
sudo /usr/sbin/nginx -t

Common causes of service start failures:

  • Configuration syntax errors in the service's config file
  • Port already in use by another service
  • Missing files or directories that the service expects
  • Permission denied on config files, log directories, or PID files
  • Missing dependencies (a library or another service)

Useful systemctl Commands Reference

# Reload systemd itself after editing unit files
sudo systemctl daemon-reload

# Show all properties of a unit
systemctl show nginx.service --no-pager

# Show a specific property
systemctl show nginx.service --property=MainPID
systemctl show nginx.service --property=ActiveState

# List all units that failed
systemctl --failed --no-pager

# Reset the "failed" state of a unit
sudo systemctl reset-failed nginx.service

# Show the boot time breakdown
systemd-analyze

# Show which services took the longest to start
systemd-analyze blame --no-pager

# Show the critical chain (boot bottlenecks)
systemd-analyze critical-chain --no-pager

# Verify a unit file for errors
systemd-analyze verify /etc/systemd/system/myapp.service

systemd-analyze: Understanding Boot Performance

systemd-analyze is a powerful tool for understanding what happens during boot:

# Overall boot time
systemd-analyze
# Startup finished in 2.5s (kernel) + 5.1s (userspace) = 7.6s

# Which services took the longest?
systemd-analyze blame --no-pager | head -10
# 3.2s   NetworkManager-wait-online.service
# 1.1s   snapd.service
# 0.8s   udisks2.service
# 0.5s   accounts-daemon.service
# ...

# Show the critical path through the boot
systemd-analyze critical-chain --no-pager

The critical chain shows the longest dependency path through boot. This is where to focus if you want to speed up boot times.
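Keep in mind that blame entries overlap because services start in parallel, so they do not sum to the total boot time. Summing the worst offenders still gives a rough ceiling on potential savings; here is a sketch over sample blame output (pipe real systemd-analyze blame output through the same awk):

```shell
# Sum blame entries, normalizing ms to seconds (sample data).
blame='3.2s NetworkManager-wait-online.service
1.1s snapd.service
800ms udisks2.service'
echo "$blame" | awk '{
    t = $1
    if (t ~ /ms$/) { sub(/ms$/, "", t); t /= 1000 }  # 800ms -> 0.8
    else           { sub(/s$/,  "", t) }             # 3.2s  -> 3.2
    sum += t
} END { printf "%.1fs total\n", sum }'
```

For the sample data this prints 5.1s total.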


What Just Happened?

+----------------------------------------------------------------+
|                        CHAPTER 15 RECAP                         |
+----------------------------------------------------------------+
|                                                                |
|  - The init system (PID 1) starts and supervises all services  |
|  - systemd replaced SysVinit with parallel startup,            |
|    dependency management, and process supervision              |
|  - systemctl is your main tool: start, stop, restart,          |
|    enable, disable, status                                     |
|  - Unit types: service, socket, timer, mount, target, etc.     |
|  - Targets replace runlevels: multi-user.target,               |
|    graphical.target, rescue.target                             |
|  - enable = start at boot; start = start now                   |
|  - mask = prevent a service from starting entirely             |
|  - systemd-analyze helps diagnose slow boots                   |
|                                                                |
+----------------------------------------------------------------+

Try This

Exercise 1: Service Inventory

List all enabled services on your system. How many are there? Pick three you do not recognize and look up what they do using systemctl cat and man.

systemctl list-unit-files --type=service --state=enabled --no-pager

Exercise 2: Boot Analysis

Run systemd-analyze blame and find the three slowest services. Research whether any can be safely disabled on your system.

Exercise 3: Target Exploration

Run systemctl list-dependencies graphical.target and trace the dependency tree. Draw it on paper. How many levels deep does it go?

Exercise 4: Service Lifecycle

Install a simple service (like apache2 or httpd), then practice the full lifecycle:

sudo apt install apache2        # or: sudo dnf install httpd
sudo systemctl start apache2    # or: httpd
sudo systemctl status apache2
sudo systemctl stop apache2
sudo systemctl enable apache2
sudo systemctl disable apache2
sudo systemctl mask apache2
sudo systemctl start apache2    # Watch it fail
sudo systemctl unmask apache2

Bonus Challenge

Use systemd-analyze critical-chain to find the boot bottleneck on your system. Can you reduce boot time by disabling unnecessary services? Document the before and after boot times.

Writing & Managing Services

Why This Matters

You have written an application -- maybe a Python web server, a Go API, or a Node.js background worker. It runs fine when you type the command in a terminal. But what happens when you log out? It dies. What happens when it crashes at 3 AM? It stays dead. What happens when the server reboots? Nobody starts it.

This is where writing your own systemd service unit comes in. A unit file is a simple text file that tells systemd how to start, stop, supervise, and restart your application. No shell script gymnastics. No screen sessions. No nohup hacks. Just a declarative configuration that systemd follows reliably, every time.

This chapter teaches you to write unit files from scratch, understand every directive that matters, configure restart policies, manage dependencies, and replace cron jobs with systemd timers.


Try This Right Now

Let us create a trivially simple service in under a minute:

# Create a tiny script
sudo tee /usr/local/bin/hello-service.sh << 'SCRIPT'
#!/bin/bash
while true; do
    echo "Hello from my service at $(date)"
    sleep 10
done
SCRIPT
sudo chmod +x /usr/local/bin/hello-service.sh

# Create a unit file for it
sudo tee /etc/systemd/system/hello.service << 'UNIT'
[Unit]
Description=My Hello Service

[Service]
ExecStart=/usr/local/bin/hello-service.sh

[Install]
WantedBy=multi-user.target
UNIT

# Load, start, and watch it
sudo systemctl daemon-reload
sudo systemctl start hello.service
journalctl -u hello.service -f

You should see "Hello from my service" messages appearing every 10 seconds. Press Ctrl+C to stop watching the log. The service keeps running.

# Clean up when done
sudo systemctl stop hello.service
sudo systemctl disable hello.service
sudo rm /etc/systemd/system/hello.service
sudo rm /usr/local/bin/hello-service.sh
sudo systemctl daemon-reload

Unit File Anatomy

Every systemd unit file has the same basic structure: sections (denoted by square brackets) containing key-value directives.

+---------------------------------------------------+
|  [Unit]           <-- Metadata & Dependencies     |
|  Description=...                                   |
|  After=...                                         |
|  Requires=...                                      |
|                                                    |
|  [Service]        <-- How to Run                  |
|  Type=...                                          |
|  ExecStart=...                                     |
|  Restart=...                                       |
|                                                    |
|  [Install]        <-- Boot Integration            |
|  WantedBy=...                                      |
+---------------------------------------------------+

The [Unit] Section

This section describes the unit and defines its relationships to other units.

[Unit]
Description=My Application Server
Documentation=https://example.com/docs
After=network.target postgresql.service
Requires=postgresql.service
Wants=redis.service

Directive             Purpose
Description=          Human-readable name shown in systemctl status
Documentation=        URL or man page reference
After=                Start this unit after the listed units
Before=               Start this unit before the listed units
Requires=             Hard dependency -- if the required unit fails, this unit fails too
Wants=                Soft dependency -- if the wanted unit fails, this unit still starts
BindsTo=              Like Requires=, but also stops this unit if the bound unit stops
Conflicts=            Cannot run at the same time as the listed units
ConditionPathExists=  Only start if the given path exists

After vs Requires: A Critical Distinction

These two directives do different things, and confusing them is a very common mistake:

  • After= controls ordering -- "start me after X has started"
  • Requires= controls dependency -- "if X fails, I fail too"

You almost always want both together:

# WRONG: ordering without dependency
After=postgresql.service
# PostgreSQL starts first, but if it fails, your app starts anyway

# WRONG: dependency without ordering
Requires=postgresql.service
# They might start at the same time (parallel), causing race conditions

# RIGHT: both together
After=postgresql.service
Requires=postgresql.service
# PostgreSQL starts first, AND your app won't start if PostgreSQL fails

Think About It: When would you use Wants= instead of Requires=? Think of a case where you would prefer your application to start even if an optional dependency failed.

The [Service] Section

This is where you define how the service actually runs.

[Service]
Type=simple
User=appuser
Group=appgroup
WorkingDirectory=/opt/myapp
Environment=NODE_ENV=production
EnvironmentFile=/opt/myapp/.env
ExecStartPre=/opt/myapp/check-config.sh
ExecStart=/opt/myapp/server --port 8080
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

We will explore each important directive in detail below.

The [Install] Section

This section defines how the unit integrates with the boot process.

[Install]
WantedBy=multi-user.target

Directive    Purpose
WantedBy=    When enabled, add this unit to the listed target's "wants"
RequiredBy=  When enabled, add this unit to the listed target's "requires"
Also=        When enabling this unit, also enable the listed units
Alias=       Additional names for this unit

WantedBy=multi-user.target is the most common value. It means "start this service when the system reaches multi-user mode" -- which is the normal boot target for servers.


ExecStart, ExecStop, and ExecReload

ExecStart

The most important directive. This is the command that starts your service:

# Simple command
ExecStart=/usr/bin/python3 /opt/myapp/server.py

# With arguments
ExecStart=/usr/bin/node /opt/myapp/index.js --port 3000

# IMPORTANT: Must be an absolute path. This will NOT work:
# ExecStart=python3 server.py    <-- WRONG

Rules for ExecStart:

  • Must use an absolute path to the executable
  • For Type=simple and Type=forking, there can be only one ExecStart line
  • For Type=oneshot, you can have multiple ExecStart lines
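When in doubt about the absolute path, resolve it with command -v before writing the unit file:

```shell
# command -v resolves a name to the absolute path the shell would run;
# paste that path into ExecStart= instead of guessing.
command -v sh     # prints something like /usr/bin/sh
command -v env    # prints something like /usr/bin/env
```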

ExecStartPre and ExecStartPost

Run commands before or after the main process starts:

ExecStartPre=/opt/myapp/validate-config.sh
ExecStart=/opt/myapp/server
ExecStartPost=/opt/myapp/notify-started.sh

Prefix with - to ignore failures:

# If this check fails, still start the service
ExecStartPre=-/opt/myapp/optional-check.sh

ExecStop

How to stop the service. If not specified, systemd sends SIGTERM (and then SIGKILL after a timeout):

# Custom graceful shutdown
ExecStop=/opt/myapp/graceful-shutdown.sh

ExecReload

What to do when systemctl reload is called. Typically sends SIGHUP:

ExecReload=/bin/kill -HUP $MAINPID

$MAINPID is a special variable systemd sets to the PID of the main process.


Service Types: Type=

The Type= directive tells systemd how your service starts and how to track its main process. Getting this wrong is one of the most common sources of service management bugs.

Type=simple (Default)

systemd considers the service "started" as soon as ExecStart runs. The process specified by ExecStart is the main process.

[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/myapp/server.py

Use when: your application runs in the foreground and does not fork.

Type=forking

For traditional daemons that fork a child process and then the parent exits. systemd considers the service started when the parent process exits.

[Service]
Type=forking
PIDFile=/var/run/myapp.pid
ExecStart=/opt/myapp/start.sh

Use when: your application daemonizes itself (forks into the background). You usually need PIDFile= so systemd can track the main process.

Type=oneshot

For services that do a single task and then exit. systemd waits for the process to finish before considering the unit "started."

[Service]
Type=oneshot
ExecStart=/opt/myapp/run-migration.sh
ExecStart=/opt/myapp/seed-database.sh
RemainAfterExit=yes

Use when: you need to run a setup task at boot (like loading firewall rules). With RemainAfterExit=yes, the unit shows as "active" even after the process exits.

Type=notify

The service sends a notification to systemd when it is ready. This is the most precise way to signal readiness.

[Service]
Type=notify
ExecStart=/opt/myapp/server

The application must call sd_notify(0, "READY=1") (using the systemd library) or write the same message to the socket named in $NOTIFY_SOCKET. Many modern services support this (PostgreSQL and MariaDB, for example).
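For shell-based services, systemd ships a small helper, systemd-notify, that wraps sd_notify. A hypothetical unit for such a script might look like this (the path and script name are illustrative, not from any real package):

```ini
# Hypothetical Type=notify unit. NotifyAccess=main allows the main
# process (our script) to send the readiness notification itself.
[Service]
Type=notify
NotifyAccess=main
ExecStart=/opt/myapp/start-with-notify.sh
```

Inside start-with-notify.sh, the script would do its warm-up work and then run systemd-notify --ready before entering its main loop.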

Type=exec

Similar to simple, but systemd considers the service started only after the binary has been successfully executed (after the exec() system call). This catches cases where the binary does not exist or cannot be executed.

+---------------------------------------------------+
|  Type=simple   ->  Started immediately             |
|  Type=exec     ->  Started after exec() succeeds   |
|  Type=forking  ->  Started when parent exits       |
|  Type=oneshot  ->  Started when process exits      |
|  Type=notify   ->  Started when service signals    |
+---------------------------------------------------+

Think About It: You have an application that takes 30 seconds to warm up (loading data into memory, connecting to databases) before it can serve requests. Which Type would you choose, and why?


Restart Policies

One of systemd's most valuable features: automatic restart when a service crashes.

[Service]
Restart=on-failure
RestartSec=5

Restart= Options

Value         Restarts On
no            Never restart (default)
on-success    Clean exit (exit code 0)
on-failure    Non-zero exit code, signal, timeout, watchdog
on-abnormal   Signal, timeout, watchdog (but NOT non-zero exit)
on-abort      Unclean signal only
on-watchdog   Watchdog timeout only
always        Always restart, no matter what

For most services, you want either on-failure or always:

# Restart only on crashes (not on intentional stops)
Restart=on-failure
RestartSec=5

# Always restart (even after clean exit -- useful for workers)
Restart=always
RestartSec=5

Preventing Restart Loops

If a service is badly broken, you do not want systemd to restart it forever:

[Service]
Restart=on-failure
RestartSec=5
StartLimitIntervalSec=300
StartLimitBurst=5

This means: if the service fails 5 times within 300 seconds (5 minutes), stop trying. The service enters a "failed" state.
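The rule is a plain rate limit: count failures inside a sliding time window. A bash sketch of the decision, using the numbers above and a hypothetical crash history:

```shell
# StartLimitBurst=5 failures within StartLimitIntervalSec=300 seconds
# means systemd stops retrying and marks the unit failed.
burst=5
interval=300
failures=6    # hypothetical: the service crashed 6 times...
window=120    # ...within a 120-second window

if [ "$failures" -ge "$burst" ] && [ "$window" -le "$interval" ]; then
    echo "rate limit hit: unit enters the failed state"
    echo "recover with: systemctl reset-failed <unit>"
fi
```

Once in the failed state, the unit stays down until you run systemctl reset-failed (or restart it manually).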

Distro Note: On systemd 230 and later, StartLimitIntervalSec= and StartLimitBurst= officially belong in the [Unit] section, though they are still accepted under [Service] for backward compatibility. Older versions (before 230) used StartLimitInterval= and StartLimitBurst= in [Service].


Hands-On: Writing a Custom Service

Let us write a proper service for a Python web application.

Step 1: Create the Application

sudo mkdir -p /opt/mywebapp
sudo tee /opt/mywebapp/app.py << 'PYTHON'
#!/usr/bin/env python3
"""A tiny HTTP server for demonstration."""
from http.server import HTTPServer, SimpleHTTPRequestHandler
import os
import signal
import sys

PORT = int(os.environ.get('PORT', 8080))

def graceful_shutdown(signum, frame):
    print(f"Received signal {signum}, shutting down gracefully...", flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)

print(f"Starting server on port {PORT}", flush=True)
server = HTTPServer(('', PORT), SimpleHTTPRequestHandler)
print(f"Server is ready and listening on port {PORT}", flush=True)
server.serve_forever()
PYTHON
sudo chmod +x /opt/mywebapp/app.py

Step 2: Create a Dedicated User

sudo useradd --system --no-create-home --shell /usr/sbin/nologin mywebapp

Step 3: Write the Unit File

sudo tee /etc/systemd/system/mywebapp.service << 'UNIT'
[Unit]
Description=My Python Web Application
After=network.target
Documentation=https://example.com/mywebapp

[Service]
Type=simple
User=mywebapp
Group=mywebapp
WorkingDirectory=/opt/mywebapp
Environment=PORT=8080
ExecStart=/usr/bin/python3 /opt/mywebapp/app.py
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
StartLimitIntervalSec=300
StartLimitBurst=5

# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes

StandardOutput=journal
StandardError=journal
SyslogIdentifier=mywebapp

[Install]
WantedBy=multi-user.target
UNIT

Step 4: Deploy and Start

# Reload systemd to pick up the new unit file
sudo systemctl daemon-reload

# Start and enable the service
sudo systemctl enable --now mywebapp.service

# Check status
systemctl status mywebapp.service

# Test it
curl http://localhost:8080/

Step 5: Test Restart Behavior

# Find the main PID
systemctl show mywebapp.service --property=MainPID
# MainPID=12345

# Kill it rudely (simulating a crash)
sudo kill -9 $(systemctl show mywebapp.service --property=MainPID --value)

# Wait a moment, then check -- it should have restarted
sleep 6
systemctl status mywebapp.service
# Notice the PID has changed and the service is active

Step 6: Check the Logs

# View all logs for this service
journalctl -u mywebapp.service --no-pager

# Follow logs in real time
journalctl -u mywebapp.service -f

Clean Up

sudo systemctl disable --now mywebapp.service
sudo rm /etc/systemd/system/mywebapp.service
sudo rm -rf /opt/mywebapp
sudo userdel mywebapp
sudo systemctl daemon-reload

Service Security Hardening

systemd provides powerful security directives that sandbox your service. Use them wherever possible:

[Service]
# Run as non-root
User=myapp
Group=myapp

# Cannot gain new privileges (e.g., via setuid binaries)
NoNewPrivileges=yes

# Make the entire filesystem read-only except specified paths
ProtectSystem=strict
ReadWritePaths=/var/lib/myapp /var/log/myapp

# Hide /home, /root, /run/user
ProtectHome=yes

# Private /tmp (isolated from other services)
PrivateTmp=yes

# Cannot modify kernel variables
ProtectKernelTunables=yes

# Cannot load kernel modules
ProtectKernelModules=yes

# Restrict network families
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

# Restrict system calls
SystemCallFilter=@system-service

You can check the security score of any service:

systemd-analyze security mywebapp.service

This reports an exposure score for each service from 0.0 (fully locked down) to 10.0 (fully exposed) -- lower is better -- and shows which hardening directives are missing.


Dependencies Deep Dive

Ordering: After= and Before=

These control when units start relative to each other:

# My app starts AFTER network and database are ready
After=network.target postgresql.service

# My app starts BEFORE the monitoring agent
Before=monitoring-agent.service

Dependency: Requires= and Wants=

These control whether units must succeed:

# Hard dependency: if PostgreSQL fails to start, my app fails too
Requires=postgresql.service

# Soft dependency: try to start Redis, but my app works without it
Wants=redis.service

Combining Them

A complete dependency setup:

[Unit]
Description=My Application
After=network.target postgresql.service redis.service
Requires=postgresql.service
Wants=redis.service

This means:

  1. Start after network, PostgreSQL, and Redis
  2. Fail if PostgreSQL is not running
  3. Continue even if Redis is not running

BindsTo=

Stronger than Requires. If the bound unit stops at any time (not just at startup), this unit also stops:

[Unit]
BindsTo=postgresql.service
After=postgresql.service

Conflicts=

Ensures two units never run simultaneously:

[Unit]
Conflicts=apache2.service

If you start this service, apache2.service is stopped automatically.


systemd Timers: The Modern Cron

systemd timers are a powerful replacement for cron jobs. They offer better logging, dependency management, and resource control.

Timer Anatomy

A timer requires two files:

  1. A .timer unit (the schedule)
  2. A .service unit (the actual work)

Example: Run a Backup Every Day at 2 AM

The service file (/etc/systemd/system/backup.service):

[Unit]
Description=Daily Backup Job

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup.sh
User=backup
StandardOutput=journal

The timer file (/etc/systemd/system/backup.timer):

[Unit]
Description=Run Backup Daily at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target

# Enable the timer (not the service)
sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer

# Check when it will fire next
systemctl list-timers backup.timer --no-pager

Timer Directives

Directive             Purpose
OnCalendar=           Calendar-based schedule (like cron)
OnBootSec=            Run X time after boot
OnUnitActiveSec=      Run X time after the service last ran
OnStartupSec=         Run X time after systemd started
Persistent=true       If the system was off when the timer should have
                      fired, run it at next boot
RandomizedDelaySec=   Add random delay to prevent thundering herd
AccuracySec=          How precise the timer needs to be

OnCalendar Syntax

The OnCalendar format is DayOfWeek Year-Month-Day Hour:Minute:Second:

OnCalendar=*-*-* 02:00:00          # Every day at 2:00 AM
OnCalendar=Mon *-*-* 09:00:00      # Every Monday at 9:00 AM
OnCalendar=*-*-01 00:00:00         # First day of every month
OnCalendar=*-01-01 00:00:00        # January 1st every year
OnCalendar=hourly                   # Every hour
OnCalendar=daily                    # Every day at midnight
OnCalendar=weekly                   # Every Monday at midnight
OnCalendar=*-*-* *:00:00           # Every hour on the hour
OnCalendar=*-*-* *:*:00            # Every minute
OnCalendar=*-*-* 08..17:00:00      # Every hour from 8 AM to 5 PM

Validate your schedule with systemd-analyze:

# When will this fire next?
systemd-analyze calendar "Mon *-*-* 09:00:00"
# Next elapse: Mon 2025-03-17 09:00:00 UTC

# How about every 15 minutes?
systemd-analyze calendar "*-*-* *:00/15:00"

Relative Timers

Instead of calendar-based, run relative to events:

[Timer]
# 15 minutes after boot
OnBootSec=15min

# Every 30 minutes after the service last ran
OnUnitActiveSec=30min
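
Both directives can live in the same [Timer] section. As a sketch, a hypothetical healthcheck.timer that fires shortly after boot and then keeps firing at a fixed interval:

```ini
# /etc/systemd/system/healthcheck.timer (hypothetical example)
[Unit]
Description=Periodic Health Check

[Timer]
# First run: 5 minutes after boot
OnBootSec=5min
# Then: every 30 minutes after each run of healthcheck.service
OnUnitActiveSec=30min

[Install]
WantedBy=timers.target
```

The OnBootSec= line matters: a timer with only OnUnitActiveSec= never fires on its own, because that directive measures time since the service was last active -- and a service that has never run has no "last active" time.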

Socket Activation Basics

Socket activation lets systemd listen on a port and start the service only when a connection arrives. This means:

  • Services start on-demand, not at boot (faster boot)
  • If no one connects, the service never runs (saves resources)
  • systemd can restart a crashed service without losing queued connections

How Socket Activation Works

+----------+       +---------+       +------------+
| Client   | ----> | systemd | ----> | Service    |
| connects |       | (holds  |       | (started   |
|          |       |  socket)|       |  on demand)|
+----------+       +---------+       +------------+

1. systemd creates the socket and listens
2. Client connects to the socket
3. systemd starts the service
4. systemd passes the socket file descriptor to the service
5. Service handles the connection

Example: Socket-Activated Service

Socket file (/etc/systemd/system/myapp.socket):

[Unit]
Description=My App Socket

[Socket]
ListenStream=8080
Accept=no

[Install]
WantedBy=sockets.target

Service file (/etc/systemd/system/myapp.service):

[Unit]
Description=My App Service
Requires=myapp.socket

[Service]
Type=simple
ExecStart=/opt/myapp/server

# Enable the socket (not the service directly)
sudo systemctl enable --now myapp.socket

# The service is not running yet
systemctl is-active myapp.service
# inactive

# Connect to the socket
curl http://localhost:8080/
# Now the service starts automatically

systemctl is-active myapp.service
# active

Debug This: Service Keeps Crashing

Your custom service starts but immediately dies, and systemd keeps restarting it:

● myapp.service - My Application
     Active: activating (auto-restart) (Result: exit-code)

The journal shows:

myapp.service: Main process exited, code=exited, status=1/FAILURE
myapp.service: Scheduled restart job, restart counter is at 4.

Here is your debugging checklist:

  1. Check the full journal output:

    journalctl -u myapp.service -n 100 --no-pager
    
  2. Run ExecStart manually to see errors directly:

    systemctl cat myapp.service | grep ExecStart
    # Then run that command as the same user:
    sudo -u appuser /opt/myapp/server
    
  3. Check that the binary exists and is executable:

    ls -la /opt/myapp/server
    file /opt/myapp/server
    
  4. Check that the user has correct permissions:

    sudo -u appuser ls -la /opt/myapp/
    sudo -u appuser cat /opt/myapp/config.yaml
    
  5. Check environment variables:

    systemctl show myapp.service --property=Environment
    
  6. Temporarily stop restart looping to debug:

    sudo systemctl stop myapp.service
    # Now you can investigate without it restarting
    

Where Unit Files Live

+------------------------------------------------------------------+
|  /usr/lib/systemd/system/   <- Vendor/package-provided units     |
|                                (do NOT edit these directly)      |
|                                                                  |
|  /etc/systemd/system/       <- Admin-created units (your stuff)  |
|                                (this is where you create them)   |
|                                                                  |
|  /run/systemd/system/       <- Runtime-only units                |
|                                (disappear on reboot)             |
+------------------------------------------------------------------+

Priority: /etc > /run > /usr/lib

To override a vendor-provided unit without editing it directly:

# Create an override directory
sudo systemctl edit nginx.service
# This opens an editor for /etc/systemd/system/nginx.service.d/override.conf

Or manually:

sudo mkdir -p /etc/systemd/system/nginx.service.d/
sudo tee /etc/systemd/system/nginx.service.d/override.conf << 'OVERRIDE'
[Service]
LimitNOFILE=65535
OVERRIDE
sudo systemctl daemon-reload
sudo systemctl restart nginx

What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 16 RECAP                         |
+------------------------------------------------------------------+
|                                                                  |
|  - Unit files have three sections: [Unit], [Service], [Install]  |
|  - ExecStart= must use absolute paths                            |
|  - Type= controls how systemd tracks your process:               |
|    simple, forking, oneshot, notify                              |
|  - Restart=on-failure with RestartSec= for automatic recovery    |
|  - After= controls order; Requires= controls dependency          |
|  - Use both together for proper dependency management            |
|  - systemd timers replace cron with OnCalendar= schedules        |
|  - Socket activation starts services on-demand                   |
|  - Put custom units in /etc/systemd/system/                      |
|  - Always run daemon-reload after editing unit files             |
|  - Security hardening: User=, ProtectSystem=, PrivateTmp=        |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Write a One-Shot Service

Write a Type=oneshot service that creates a /tmp/system-booted file containing the current timestamp. Enable it so it runs at boot.

Exercise 2: Build a Timer

Create a systemd timer that runs a script every 15 minutes. The script should append the current date and system load average (uptime) to a log file. Verify the timer with systemctl list-timers.

Exercise 3: Dependency Chain

Create three services: service-a, service-b, and service-c. Configure them so that service-c requires service-b, which requires service-a. Verify the ordering:

sudo systemctl start service-c.service
# Should automatically start a and b first

Exercise 4: Crash Recovery

Create a service that intentionally exits with an error after 5 seconds. Set up Restart=on-failure with RestartSec=3. Watch it restart using journalctl -f. Then add StartLimitBurst=3 and StartLimitIntervalSec=60 and observe what happens after the third failure.

Bonus Challenge

Convert one of your existing cron jobs to a systemd timer. Compare the two approaches. Which gives you better logging? Which is easier to debug when something goes wrong?

Logging with journald & syslog

Why This Matters

It is 3 AM. Your monitoring system pages you: "Website down." You SSH into the server, and the application is not running. You need answers, fast. When did it stop? What error caused it? Did something else fail first?

Every answer lives in the logs.

Logging is how Linux systems tell you what is happening, what went wrong, and what happened leading up to a failure. Without good logging, you are debugging blind. With good logging, you have a time machine that lets you replay exactly what happened on your system.

Modern Linux has two logging systems that work together: journald (systemd's structured journal) and rsyslog (the traditional syslog daemon). This chapter teaches you to query, filter, manage, and rotate logs so you always have the information you need when something breaks.


Try This Right Now

# See the last 20 log entries
journalctl -n 20 --no-pager

# Follow the log in real time (like tail -f but for everything)
journalctl -f
# Press Ctrl+C to stop

# See logs from the current boot only
journalctl -b --no-pager | head -30

# Find logs from a specific service
journalctl -u sshd.service -n 10 --no-pager

# See all ERROR-level messages from the last hour
journalctl -p err --since "1 hour ago" --no-pager

The Two Logging Systems

Modern Linux distributions run two logging systems side by side:

+----------------------------------------------------------+
|  Applications / Services / Kernel                         |
+----------+----------------------------+------------------+
           |                            |
           v                            v
  +--------+---------+       +----------+-----------+
  |    journald      |       |     rsyslog          |
  | (systemd journal)|       | (traditional syslog) |
  | Binary, indexed  |       | Plain text files     |
  | /run/log/journal |       | /var/log/syslog      |
  | or               |       | /var/log/messages    |
  | /var/log/journal |       | /var/log/auth.log    |
  +------------------+       +----------------------+

journald collects logs from:

  • systemd services (stdout/stderr)
  • The kernel (kmsg)
  • Syslog messages
  • Audit framework

rsyslog receives messages from:

  • journald (forwarded)
  • Direct syslog connections
  • Remote log sources

On most modern systems, journald is the primary collector, and rsyslog writes the traditional text files that many tools expect.


journalctl: Your Log Investigation Tool

journalctl is the command-line tool for querying the systemd journal. It is incredibly powerful once you know its filtering options.

Basic Usage

# View all logs (oldest first, paged)
journalctl

# View all logs (newest first)
journalctl -r

# View last N entries
journalctl -n 50

# Follow new entries in real time
journalctl -f

# No pager (dump to stdout, useful for piping)
journalctl --no-pager

Filtering by Unit

This is the most common filter -- show logs from a specific service:

# Logs from nginx
journalctl -u nginx.service --no-pager

# Logs from multiple units
journalctl -u nginx.service -u php-fpm.service --no-pager

# Follow a specific service's logs
journalctl -u myapp.service -f

Filtering by Time

# Logs since a specific time
journalctl --since "2025-03-10 14:00:00" --no-pager

# Logs in a time range
journalctl --since "2025-03-10 14:00" --until "2025-03-10 15:00" --no-pager

# Relative time expressions
journalctl --since "1 hour ago" --no-pager
journalctl --since "30 min ago" --no-pager
journalctl --since yesterday --no-pager
journalctl --since today --no-pager

Filtering by Boot

# Current boot only
journalctl -b

# Previous boot
journalctl -b -1

# Two boots ago
journalctl -b -2

# List all recorded boots
journalctl --list-boots

Sample output from --list-boots:

-3 abc123... Sat 2025-03-08 10:15:22 — Sat 2025-03-08 22:01:15
-2 def456... Sun 2025-03-09 08:30:11 — Sun 2025-03-09 23:45:00
-1 ghi789... Mon 2025-03-10 07:00:05 — Mon 2025-03-10 23:59:59
 0 jkl012... Tue 2025-03-11 06:55:30 — Tue 2025-03-11 14:22:10

Think About It: Why would you want to look at logs from a previous boot? Think about what happens when a system crashes and reboots -- the clues to the crash are in the previous boot's logs, not the current one.

Filtering by Priority

Syslog priorities, from most to least severe:

Priority  Keyword  Meaning
0         emerg    System is unusable
1         alert    Immediate action needed
2         crit     Critical conditions
3         err      Error conditions
4         warning  Warning conditions
5         notice   Normal but significant
6         info     Informational
7         debug    Debug-level messages

# Show only errors and above (emerg, alert, crit, err)
journalctl -p err --no-pager

# Show warnings and above
journalctl -p warning --no-pager

# Show a specific priority range
journalctl -p warning..err --no-pager

# Combine with other filters
journalctl -p err -u nginx.service --since today --no-pager

Filtering by Other Fields

The journal stores structured data. You can filter on many fields:

# By process ID
journalctl _PID=1234 --no-pager

# By user ID
journalctl _UID=1000 --no-pager

# By executable path
journalctl _EXE=/usr/sbin/sshd --no-pager

# By hostname (useful in centralized logging)
journalctl _HOSTNAME=webserver01 --no-pager

# By kernel messages only
journalctl -k --no-pager
# or equivalently:
journalctl _TRANSPORT=kernel --no-pager

Output Formats

# Default format (human-readable)
journalctl -n 5

# Short with precise timestamps
journalctl -n 5 -o short-precise

# JSON, one entry per line (great for parsing and piping to jq)
journalctl -n 5 -o json --no-pager

# Pretty-printed JSON, one field per line (easier to read)
journalctl -n 5 -o json-pretty --no-pager

# Verbose (show all fields)
journalctl -n 1 -o verbose --no-pager

# Export format (for backup/transfer)
journalctl -o export --no-pager > journal-export.bin

The verbose output is particularly useful for understanding what metadata the journal stores:

journalctl -n 1 -o verbose --no-pager
Tue 2025-03-11 14:22:01.123456 UTC [s=abc123;i=42;b=def456...]
    _TRANSPORT=syslog
    PRIORITY=6
    SYSLOG_IDENTIFIER=sshd
    _PID=1234
    _UID=0
    _GID=0
    _EXE=/usr/sbin/sshd
    _COMM=sshd
    _CMDLINE=sshd: user [priv]
    MESSAGE=Accepted publickey for user from 10.0.0.1 port 54321
    ...

Hands-On: Log Investigation Workflow

Let us practice a realistic log investigation. We will look at SSH authentication events.

Step 1: Find SSH Events

journalctl -u sshd.service --since today --no-pager

Distro Note: Use -u ssh.service on Ubuntu/Debian.

Step 2: Filter for Authentication Failures

journalctl -u sshd.service --since today --no-pager | grep -i "failed\|invalid\|error"

Or use the journal's native grep:

journalctl -u sshd.service --since today --no-pager --grep="Failed password"

Step 3: Count Events

# How many failed login attempts today?
journalctl -u sshd.service --since today --no-pager --grep="Failed password" | wc -l

Step 4: Extract Attacker IPs

journalctl -u sshd.service --since today --no-pager --grep="Failed password" \
  | grep -oP 'from \K[0-9.]+' | sort | uniq -c | sort -rn | head -10

This gives you the top 10 IP addresses attempting failed SSH logins.
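
To see how the extraction works, run the same grep against a canned sample line (the IP below is from the TEST-NET-3 documentation range, and this requires GNU grep with -P support):

```shell
#!/usr/bin/env bash
# A single sample line, shaped like sshd's "Failed password" messages
line='Failed password for invalid user admin from 203.0.113.7 port 22 ssh2'

# \K discards everything matched so far, so -o prints only the IP
echo "$line" | grep -oP 'from \K[0-9.]+'
# 203.0.113.7
```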

Step 5: See the Full Context Around an Event

# Find an interesting timestamp, then look at everything around that time
journalctl --since "2025-03-11 14:20:00" --until "2025-03-11 14:25:00" --no-pager

Persistent Journal Storage

By default, many distributions store the journal only in /run/log/journal/, which means logs are lost on reboot. For production systems, you want persistent storage.

Check Your Current Setup

# Where is the journal stored?
journalctl --disk-usage

# Is it persistent?
ls -la /var/log/journal/ 2>/dev/null && echo "Persistent" || echo "Volatile"

Enable Persistent Storage

# Create the persistent journal directory
sudo mkdir -p /var/log/journal

# Set correct ownership
sudo systemd-tmpfiles --create --prefix /var/log/journal

# Restart journald to start using persistent storage
sudo systemctl restart systemd-journald

# Verify
journalctl --disk-usage

Or configure it in the journal configuration:

sudo mkdir -p /etc/systemd/journald.conf.d/
sudo tee /etc/systemd/journald.conf.d/persistent.conf << 'CONF'
[Journal]
Storage=persistent
CONF
sudo systemctl restart systemd-journald

The Storage= options are:

Value        Behavior
auto         Persistent if /var/log/journal/ exists, otherwise volatile (default)
persistent   Always persistent, creates the directory if needed
volatile     Only store in /run/log/journal/ (lost on reboot)
none         Do not store logs at all (not recommended)

Journal Size Management

The journal can grow large on busy systems. Configure size limits:

sudo tee /etc/systemd/journald.conf.d/size.conf << 'CONF'
[Journal]
# Maximum disk space the journal can use
SystemMaxUse=500M

# Maximum size of individual journal files
SystemMaxFileSize=50M

# Keep at least this much free disk space
SystemKeepFree=1G

# Maximum time to keep entries
MaxRetentionSec=30day
CONF
sudo systemctl restart systemd-journald

Manual Cleanup

# See current disk usage
journalctl --disk-usage
# Archived and active journals take up 1.2G in the file system.

# Remove entries older than 2 weeks
sudo journalctl --vacuum-time=2weeks

# Reduce journal to a specific size
sudo journalctl --vacuum-size=500M

# Remove entries beyond a number of journal files
sudo journalctl --vacuum-files=5

Think About It: What is the trade-off between keeping more log history and managing disk space? On a production server, how far back would you want to keep logs, and why?


rsyslog and /var/log

While journald is the modern standard, rsyslog and the traditional /var/log/ files are still important. Many tools, scripts, and monitoring systems expect plain text log files.

The /var/log Directory

ls /var/log/

Common log files:

File                  Contents
/var/log/syslog       General system messages (Debian/Ubuntu)
/var/log/messages     General system messages (RHEL/Fedora)
/var/log/auth.log     Authentication events (Debian/Ubuntu)
/var/log/secure       Authentication events (RHEL/Fedora)
/var/log/kern.log     Kernel messages
/var/log/dmesg        Boot-time kernel messages
/var/log/boot.log     Boot process messages
/var/log/cron         Cron job execution
/var/log/maillog      Mail server logs
/var/log/nginx/       nginx access and error logs
/var/log/apt/         Package management logs (Debian/Ubuntu)
/var/log/dnf.log      Package management log (Fedora)

Distro Note: Debian/Ubuntu use /var/log/syslog and /var/log/auth.log. RHEL and Fedora use /var/log/messages and /var/log/secure. The content is the same; only the filenames differ.

rsyslog Configuration

rsyslog's main configuration is in /etc/rsyslog.conf with additional files in /etc/rsyslog.d/.

# View main config
cat /etc/rsyslog.conf

The configuration uses rules in the format:

facility.priority    destination

Example rules:

# All auth messages go to auth.log
auth,authpriv.*      /var/log/auth.log

# Everything except auth goes to syslog
*.*;auth,authpriv.none    /var/log/syslog

# Kernel messages
kern.*               /var/log/kern.log

# Emergency messages to all logged-in users
*.emerg              :omusrmsg:*

Syslog Facilities

Facility        Purpose
auth            Authentication
authpriv        Private authentication
cron            Cron daemon
daemon          System daemons
kern            Kernel
mail            Mail system
user            User processes
local0-local7   Custom use

Testing Syslog

# Send a test message to syslog
logger "This is a test message from $(whoami)"

# Send with a specific facility and priority
logger -p local0.notice "Test from local0"

# Check it arrived
journalctl --since "1 min ago" --no-pager
tail -5 /var/log/syslog 2>/dev/null || tail -5 /var/log/messages 2>/dev/null

Log Rotation with logrotate

Text log files grow forever unless rotated. logrotate handles this automatically: it compresses old logs, removes ancient ones, and signals services to reopen their log files.

How logrotate Works

+------------------+    logrotate    +------------------+
| access.log       | -------------> | access.log       |  (current, fresh)
| (500 MB, 7 days) |                | access.log.1     |  (yesterday)
+------------------+                | access.log.2.gz  |  (2 days ago, compressed)
                                    | access.log.3.gz  |  (3 days ago, compressed)
                                    +------------------+

Configuration

Global settings are in /etc/logrotate.conf. Per-application configs are in /etc/logrotate.d/.

# See what configs exist
ls /etc/logrotate.d/

Example configuration for a custom application:

sudo tee /etc/logrotate.d/myapp << 'CONF'
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 myapp myapp
    sharedscripts
    postrotate
        systemctl reload myapp.service 2>/dev/null || true
    endscript
}
CONF

Let us break down each directive:

Directive                  Meaning
daily                      Rotate once per day (also: weekly, monthly)
missingok                  Do not error if the log file is missing
rotate 14                  Keep 14 rotated files before deleting
compress                   Compress rotated files with gzip
delaycompress              Do not compress the most recent rotated file
notifempty                 Do not rotate if the file is empty
create 0640 myapp myapp    Create new log file with these permissions
sharedscripts              Run postrotate only once, not per file
postrotate/endscript       Commands to run after rotation

Testing logrotate

# Dry run (see what would happen without doing it)
sudo logrotate --debug /etc/logrotate.d/myapp

# Force a rotation right now
sudo logrotate --force /etc/logrotate.d/myapp

# Run the full logrotate (as cron normally does)
sudo logrotate /etc/logrotate.conf

Viewing the nginx Rotation Config

cat /etc/logrotate.d/nginx

Typical output:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then
            run-parts /etc/logrotate.d/httpd-prerotate
        fi
    endscript
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}

Hands-On: Setting Up Complete Logging

Let us set up proper logging for a custom application.

Step 1: Create a Test Application That Logs

sudo mkdir -p /opt/logdemo
sudo tee /opt/logdemo/app.sh << 'SCRIPT'
#!/bin/bash
while true; do
    # Log to stdout (captured by journald)
    echo "[INFO] Processing request at $(date)"

    # Simulate occasional errors
    if (( RANDOM % 5 == 0 )); then
        echo "[ERROR] Something went wrong at $(date)" >&2
    fi

    sleep 5
done
SCRIPT
sudo chmod +x /opt/logdemo/app.sh

Step 2: Create a Service for It

sudo tee /etc/systemd/system/logdemo.service << 'UNIT'
[Unit]
Description=Logging Demo Application

[Service]
Type=simple
ExecStart=/opt/logdemo/app.sh
StandardOutput=journal
StandardError=journal
SyslogIdentifier=logdemo
Restart=on-failure

[Install]
WantedBy=multi-user.target
UNIT

sudo systemctl daemon-reload
sudo systemctl start logdemo.service

Step 3: Query the Logs

# All logs from our demo
journalctl -u logdemo.service --no-pager -n 20

# Only errors
journalctl -u logdemo.service -p err --no-pager

# Follow in real time
journalctl -u logdemo.service -f

# JSON output for parsing
journalctl -u logdemo.service -o json-pretty -n 5 --no-pager

Step 4: Clean Up

sudo systemctl stop logdemo.service
sudo rm /etc/systemd/system/logdemo.service
sudo rm -rf /opt/logdemo
sudo systemctl daemon-reload

Debug This: Where Did My Logs Go?

You check journalctl -u myapp.service and see nothing. The service is running. Where are the logs?

Diagnosis checklist:

  1. Is the service actually producing output?

    systemctl cat myapp.service | grep -E "Standard(Output|Error)"
    

    If StandardOutput=null or StandardError=null, output is discarded.

  2. Is the application logging to a file instead of stdout? Many applications write to their own log files. Check the app's configuration.

    ls -la /var/log/myapp/ 2>/dev/null
    
  3. Is the journal full and dropping messages?

    journalctl --disk-usage
    journalctl -p warning --no-pager | grep -i "journal"
    
  4. Is the service running under the wrong identifier?

    # Maybe it is logging under a different name
    journalctl --since "5 min ago" --no-pager | grep -i myapp
    
  5. Are you looking at the right boot?

    # Make sure you are looking at the current boot
    journalctl -u myapp.service -b 0 --no-pager
    

Centralized Logging Concepts

In production, you rarely look at logs on individual servers. Instead, you send logs to a centralized system.

Why Centralize?

  • Persistence: If a server dies, its local logs may be lost
  • Correlation: See events from multiple servers in one place
  • Search: Query across all servers at once
  • Alerting: Trigger alerts on specific log patterns
  • Compliance: Some regulations require centralized, tamper-proof logs

Common Approaches

+----------+     +----------+     +----------+
| Server A |     | Server B |     | Server C |
|  rsyslog | --> |          | --> |          |
|  journald|     |  rsyslog |     |  journald|
+----+-----+     +----+-----+     +----+-----+
     |                |                |
     +--------+-------+-------+--------+
              |               |
              v               v
     +-----------------+  +------------------+
     |  Central Syslog |  |  Elasticsearch   |
     |  Server         |  |  (ELK/OpenSearch)|
     +-----------------+  +------------------+

Forwarding with rsyslog

To send logs to a remote syslog server:

sudo tee /etc/rsyslog.d/50-remote.conf << 'CONF'
# Forward all logs to central server via TCP
*.* @@logserver.example.com:514

# Or via UDP (single @)
# *.* @logserver.example.com:514
CONF
sudo systemctl restart rsyslog

Forwarding with journald

journald can forward to a remote journal:

sudo tee /etc/systemd/journal-upload.conf << 'CONF'
[Upload]
URL=http://logserver.example.com:19532
CONF
sudo systemctl enable --now systemd-journal-upload.service

Open Source Centralized Logging Stacks

Stack        Components
ELK          Elasticsearch + Logstash + Kibana
OpenSearch   OpenSearch + Data Prepper + OpenSearch Dashboards
Loki         Grafana Loki + Promtail + Grafana
Graylog      Graylog + MongoDB + OpenSearch

Grafana Loki is particularly popular because it is lightweight and integrates naturally with Grafana dashboards. It is designed to be "like Prometheus, but for logs."


What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 17 RECAP                         |
+------------------------------------------------------------------+
|                                                                  |
|  - journalctl is your primary log investigation tool             |
|  - Filter by unit (-u), time (--since/--until), priority (-p),   |
|    and boot (-b)                                                 |
|  - Enable persistent journal storage for production systems      |
|  - Manage journal size with SystemMaxUse= and vacuum commands    |
|  - rsyslog writes traditional /var/log text files                |
|  - /var/log/syslog (Debian) or /var/log/messages (RHEL) for      |
|    general messages                                              |
|  - logrotate compresses and cleans old log files                 |
|  - Centralized logging (ELK, Loki, Graylog) for production       |
|  - logger command sends test messages to syslog                  |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Log Investigation

Use journalctl to answer these questions about your system:

  • How many error-level messages occurred today?
  • Which service produced the most log entries in the last hour?
  • When did your system last boot?

Exercise 2: Persistent Journal

Check whether your journal is persistent. If not, enable persistent storage and verify that logs survive a reboot (if you can reboot your system).

Exercise 3: Custom logrotate

Create a logrotate configuration for a fictional application that writes to /var/log/myapp/app.log. Configure it to rotate weekly, keep 8 weeks of history, compress old files, and test with logrotate --debug.

Exercise 4: Priority Filtering

Use journalctl -p to list all critical and error messages from the last 7 days. Are there any patterns? Any services that appear repeatedly?

Bonus Challenge

Set up rsyslog to write all authentication-related log entries to a separate file at /var/log/auth-audit.log. Configure logrotate for this file. Then generate some authentication events (SSH logins, sudo commands) and verify they appear in both journalctl and your custom log file.

Bash In Depth

Why This Matters

You already know how to type commands into a terminal. But have you ever been confused by why echo $HOME prints your home directory while echo '$HOME' prints the literal text $HOME? Have you wondered why rm * deletes all files but rm "*" tries to delete a file literally named *? Or why echo {1..10} prints 1 2 3 4 5 6 7 8 9 10?

These behaviors are not random. Bash processes every command line through a precise sequence of expansions before executing it. Understanding this sequence is what separates someone who uses the shell from someone who truly controls it.

This chapter takes you deep into how Bash works: the expansion order, quoting rules, variables, arrays, and special parameters. Master these, and you will write commands and scripts that do exactly what you intend, every time.


Try This Right Now

Run each of these and observe the differences carefully:

# Brace expansion
echo {a,b,c}-{1,2}

# Tilde expansion
echo ~
echo ~root

# Parameter expansion
name="Linux"
echo "Hello, $name"
echo "Hello, ${name}!"
echo "Length: ${#name}"

# Command substitution
echo "Today is $(date +%A)"

# Arithmetic expansion
echo "2 + 3 = $((2 + 3))"

# Globbing
echo /etc/*.conf | tr ' ' '\n' | head -5

Now observe how quoting changes things:

echo $HOME         # Expanded
echo "$HOME"       # Expanded
echo '$HOME'       # NOT expanded -- literal text
echo \$HOME        # NOT expanded -- escaped

The Shell Expansion Order

When you type a command and press Enter, Bash does not simply pass your text to the program. It processes it through a specific sequence of expansions, in this exact order:

+------------------------------------------------------------------+
|  1. Brace Expansion        {a,b,c}  {1..5}                       |
|  2. Tilde Expansion        ~  ~user                              |
|  3. Parameter Expansion    $var  ${var}  ${var:-default}         |
|  4. Command Substitution   $(cmd)  `cmd`                         |
|  5. Arithmetic Expansion   $((expr))                             |
|  6. Word Splitting         (on unquoted results of 3, 4, 5)      |
|  7. Pathname Expansion     *  ?  [abc]  (globbing)               |
+------------------------------------------------------------------+

Each step operates on the output of the previous step. This ordering matters because it determines what you can combine and what you cannot.
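
You can watch step 6 (word splitting) act on the result of step 3 (parameter expansion). printf prints one bracketed item per argument it receives, so it reveals how many words the shell actually produced:

```shell
#!/usr/bin/env bash
files="a.txt b.txt"

# Unquoted: the expansion result is split on whitespace -> two arguments
printf '<%s>\n' $files
# <a.txt>
# <b.txt>

# Quoted: word splitting is suppressed -> one argument
printf '<%s>\n' "$files"
# <a.txt b.txt>
```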

Step 1: Brace Expansion

Braces generate multiple strings. This happens first, before any variable expansion:

# Comma-separated list
echo {cat,dog,fish}
# cat dog fish

# Sequence
echo {1..5}
# 1 2 3 4 5

# Sequence with step
echo {0..20..5}
# 0 5 10 15 20

# Letter sequence
echo {a..f}
# a b c d e f

# Combinations (cartesian product)
echo {web,db}-{01,02}
# web-01 web-02 db-01 db-02

# Practical: create multiple directories
mkdir -p project/{src,tests,docs}/{v1,v2}
# Creates: project/src/v1  project/src/v2  project/tests/v1 ...

# Practical: backup a file
cp config.yml{,.bak}
# Equivalent to: cp config.yml config.yml.bak

Key rule: Brace expansion happens before variable expansion. This means you cannot use a variable inside braces for sequence generation:

n=5
echo {1..$n}
# Output: {1..5}  <-- Not expanded! Braces happen first, $n isn't resolved yet
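If the bound really is in a variable, reach for something that runs after expansion: the external seq command, or a C-style loop. A quick sketch:

```shell
n=5

# Workaround 1: seq is an external command, so "$n" is expanded
# by the shell before seq ever runs
for i in $(seq 1 "$n"); do
    echo "seq gave: $i"
done

# Workaround 2: C-style loop -- arithmetic context, no brace expansion involved
for ((i = 1; i <= n; i++)); do
    echo "loop gave: $i"
done
```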

Step 2: Tilde Expansion

The tilde ~ expands to home directories:

echo ~                # /home/yourusername
echo ~root            # /root
echo ~nobody          # /nonexistent (or wherever nobody's home is)
echo ~/Documents      # /home/yourusername/Documents

This only works at the beginning of a word. A tilde in the middle of text is just a literal tilde.

Step 3: Parameter Expansion

This is where variables get replaced with their values. Bash offers far more than just $var:

name="hello world"

# Basic expansion
echo $name          # hello world
echo ${name}        # hello world (braces clarify boundaries)
echo "${name}ish"   # hello worldish

# String length
echo ${#name}       # 11

# Default values
echo ${unset_var:-default}    # default (use default if unset/empty)
echo ${unset_var:=default}    # default (AND assign the default)

# Substring extraction
str="Hello, World!"
echo ${str:7}       # World!
echo ${str:7:5}     # World

# Pattern removal
path="/home/user/documents/file.tar.gz"
echo ${path##*/}    # file.tar.gz  (remove longest prefix matching */)
echo ${path#*/}     # home/user/documents/file.tar.gz  (remove shortest prefix)
echo ${path%%.*}    # /home/user/documents/file  (remove longest suffix matching .*)
echo ${path%.*}     # /home/user/documents/file.tar  (remove shortest suffix)

# Substitution
echo ${path/user/admin}      # /home/admin/documents/file.tar.gz  (first match)
echo ${path//o/0}            # /h0me/user/d0cuments/file.tar.gz  (all matches)

# Case modification (Bash 4+)
greeting="hello world"
echo ${greeting^}    # Hello world  (capitalize first letter)
echo ${greeting^^}   # HELLO WORLD  (capitalize all)
upper="HELLO"
echo ${upper,}       # hELLO  (lowercase first letter)
echo ${upper,,}      # hello  (lowercase all)

Think About It: Given the file path /var/log/nginx/access.log, how would you extract just the filename access.log using parameter expansion? What about just the extension log?

Step 4: Command Substitution

Replace a command with its output:

# Modern syntax (preferred)
echo "Today is $(date +%Y-%m-%d)"

# Old syntax (backticks -- avoid for readability)
echo "Today is `date +%Y-%m-%d`"

# Nested command substitution (much cleaner with $() than backticks)
echo "Config dir: $(dirname $(readlink -f /etc/resolv.conf))"

# Assign to variable
file_count=$(ls /etc/*.conf 2>/dev/null | wc -l)
echo "Found $file_count conf files"

Always prefer $(...) over backticks. Backticks are harder to read and cannot nest cleanly.

Step 5: Arithmetic Expansion

Perform integer math directly:

echo $((2 + 3))         # 5
echo $((10 / 3))        # 3  (integer division!)
echo $((10 % 3))        # 1  (modulo)
echo $((2 ** 10))       # 1024  (exponentiation)

x=10
echo $((x + 5))         # 15  (no $ needed inside $(()))
echo $((x++))           # 10  (post-increment, x is now 11)
echo $x                 # 11

# Comparison (returns 1 for true, 0 for false)
echo $((5 > 3))         # 1
echo $((5 < 3))         # 0

WARNING: Bash arithmetic is integer only. $((10 / 3)) gives 3, not 3.333. For floating point, use bc or awk.

Step 6: Word Splitting

After parameter expansion, command substitution, and arithmetic expansion, Bash splits the results into separate words based on the IFS (Internal Field Separator) variable.

# Default IFS is space, tab, newline
files="file1.txt file2.txt file3.txt"
for f in $files; do        # Word splitting splits into three words
    echo "Processing: $f"
done

# This is why quoting is critical:
filename="my important file.txt"
touch "$filename"           # Creates ONE file: "my important file.txt"
touch $filename             # Creates THREE files: "my", "important", "file.txt"

Word splitting does not happen on text inside double quotes. This is why you should almost always quote your variables.
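Word splitting can also be put to deliberate use. Setting IFS just for a single read command splits a delimited record into fields without affecting the rest of the shell:

```shell
# Split a passwd-style record into fields.
# IFS=: applies only to this one read command.
record="alice:x:1001:1001:Alice:/home/alice:/bin/bash"
IFS=: read -r user pass uid gid gecos home shell <<< "$record"

echo "User:  $user"     # alice
echo "UID:   $uid"      # 1001
echo "Shell: $shell"    # /bin/bash
```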

Step 7: Pathname Expansion (Globbing)

The final step expands wildcard patterns into matching filenames:

# * matches any string (including empty)
echo /etc/*.conf

# ? matches exactly one character
echo /etc/host?

# [abc] matches one character from the set
echo /dev/sd[a-c]

# [!abc] or [^abc] matches one character NOT in the set
echo /dev/sd[!a]

# ** matches directories recursively (needs shopt -s globstar)
shopt -s globstar
echo /etc/**/*.conf

Globbing only happens on unquoted text. This is another reason quoting matters:

echo *.txt        # Expands to matching files
echo "*.txt"      # Literal string: *.txt
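One more globbing subtlety: if a pattern matches nothing, Bash leaves it in place as literal text by default, which can feed a loop the pattern itself instead of filenames. The nullglob option changes that:

```shell
cd "$(mktemp -d)"    # a fresh empty directory: no .txt files here

# Default behavior: an unmatched glob stays literal
for f in *.txt; do
    echo "Default: got literal [$f]"    # prints [*.txt]
done

# With nullglob, an unmatched glob expands to nothing -- the body never runs
shopt -s nullglob
for f in *.txt; do
    echo "nullglob: this never prints"
done
shopt -u nullglob
```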

Quoting Rules

Quoting controls which expansions happen. This is one of the most important things to understand in Bash.

No Quotes

All expansions happen. Word splitting and globbing happen.

echo $HOME/*.txt
# Expands $HOME, then globs for .txt files

Double Quotes (")

Parameter expansion, command substitution, and arithmetic expansion happen. Word splitting and globbing do not happen.

echo "$HOME/*.txt"
# Expands $HOME to /home/user, but *.txt stays literal
# Output: /home/user/*.txt

# Preserves whitespace in variables
greeting="  hello   world  "
echo $greeting         # hello world  (whitespace collapsed)
echo "$greeting"       #   hello   world   (whitespace preserved)

Single Quotes (')

Nothing is expanded. Everything is literal.

echo '$HOME is not expanded'
# Output: $HOME is not expanded

echo 'No $(commands) or $((math)) either'
# Output: No $(commands) or $((math)) either

Escape Character (\)

Removes the special meaning of the next character:

echo \$HOME            # $HOME  (literal dollar sign)
echo "She said \"hi\"" # She said "hi"
echo 'It'\''s here'    # It's here (ending and reopening single quotes)

The $'...' Syntax

Allows escape sequences like \n, \t:

echo $'Line 1\nLine 2\tTabbed'
# Line 1
# Line 2	Tabbed

Summary Table

Context         Variable   Command Sub   Glob   Word Split
Unquoted        Yes        Yes           Yes    Yes
Double "..."    Yes        Yes           No     No
Single '...'    No         No            No     No

Hands-On: Quoting Pitfalls

Try each of these to internalize the differences:

# Setup
mkdir -p /tmp/quoting-lab
cd /tmp/quoting-lab
touch "file one.txt" "file two.txt" "file three.txt"

# WRONG: word splitting breaks filenames with spaces
for f in $(ls); do
    echo "File: $f"
done
# Output: each word separately (broken!)

# RIGHT: use glob instead of ls, with proper quoting
for f in *.txt; do
    echo "File: $f"
done
# Output: correct filenames

# The difference between $@ and $*
# Create a test script:
cat > /tmp/quoting-lab/test-args.sh << 'SCRIPT'
#!/bin/bash
echo '--- $* (unquoted) ---'
for arg in $*; do echo "  [$arg]"; done

echo '--- "$*" (quoted) ---'
for arg in "$*"; do echo "  [$arg]"; done

echo '--- $@ (unquoted) ---'
for arg in $@; do echo "  [$arg]"; done

echo '--- "$@" (quoted -- correct) ---'
for arg in "$@"; do echo "  [$arg]"; done
SCRIPT
chmod +x /tmp/quoting-lab/test-args.sh

# Test it with arguments that contain spaces
/tmp/quoting-lab/test-args.sh "hello world" "foo bar"

# Clean up
rm -rf /tmp/quoting-lab

Variables

Setting Variables

# No spaces around the = sign!
name="Alice"        # Correct
name = "Alice"      # WRONG -- Bash thinks "name" is a command

# No need to quote simple values
count=42

# Quote when value contains spaces or special characters
greeting="Hello, World!"
path="/home/user/my files"

Local Variables

By default, variables are local to the current shell:

color="blue"
echo $color          # blue

# Start a subshell
bash -c 'echo $color'  # (empty -- variable not inherited)

Exported Variables (Environment Variables)

Use export to make a variable available to child processes:

export DATABASE_URL="postgres://localhost:5432/mydb"

# Now child processes can see it
bash -c 'echo $DATABASE_URL'
# postgres://localhost:5432/mydb

# Or set and export in one step
export API_KEY="secret123"

Readonly Variables

readonly PI=3.14159
PI=3.0
# bash: PI: readonly variable

Unsetting Variables

temp="something"
unset temp
echo $temp           # (empty)

Arrays

Bash supports indexed arrays (like lists) and associative arrays (like dictionaries/maps).

Indexed Arrays

# Declare an array
fruits=("apple" "banana" "cherry" "date")

# Access elements (0-indexed)
echo ${fruits[0]}     # apple
echo ${fruits[2]}     # cherry

# All elements
echo ${fruits[@]}     # apple banana cherry date

# Number of elements
echo ${#fruits[@]}    # 4

# Add an element
fruits+=("elderberry")
echo ${#fruits[@]}    # 5

# Iterate
for fruit in "${fruits[@]}"; do
    echo "I like $fruit"
done

# Slice (elements 1 through 2)
echo ${fruits[@]:1:2}   # banana cherry

# Indices
echo ${!fruits[@]}    # 0 1 2 3 4

# Remove an element (leaves a gap!)
unset 'fruits[1]'
echo ${fruits[@]}     # apple cherry date elderberry
echo ${!fruits[@]}    # 0 2 3 4   (index 1 is gone, others unchanged)
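When you need command output in an array, mapfile (also spelled readarray, Bash 4+) loads one line per element and sidesteps word splitting entirely -- much safer than arr=($(cmd)):

```shell
# mapfile -t: one line per element, trailing newlines stripped
mapfile -t servers < <(printf '%s\n' "web 01" "web 02" "db 01")

echo "Count: ${#servers[@]}"    # 3 -- spaces inside each line are preserved
for s in "${servers[@]}"; do
    echo "Server: [$s]"
done
```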

Associative Arrays (Bash 4+)

# Must declare with -A
declare -A config

config[host]="localhost"
config[port]="5432"
config[database]="mydb"
config[user]="admin"

# Access
echo ${config[host]}         # localhost
echo ${config[port]}         # 5432

# All values
echo ${config[@]}            # localhost 5432 mydb admin (order not guaranteed)

# All keys
echo ${!config[@]}           # host port database user

# Number of elements
echo ${#config[@]}           # 4

# Iterate over key-value pairs
for key in "${!config[@]}"; do
    echo "$key = ${config[$key]}"
done

# Check if a key exists
if [[ -v config[host] ]]; then
    echo "host is set"
fi

Think About It: Why might you prefer an associative array over a series of individual variables when managing configuration values?


Special Variables

Bash provides several special variables that are essential for scripting:

Variable    Meaning
$$          PID of the current shell
$!          PID of the last background process
$BASHPID    PID of the current Bash process (differs from $$ in subshells)
$PPID       Parent process ID

echo "My PID: $$"
sleep 100 &
echo "Background PID: $!"
echo "Parent PID: $PPID"
Variable    Meaning
$0          Name of the script or shell
$1-$9       Positional parameters (first 9 arguments)
${10}       10th argument and beyond (braces required)
$#          Number of positional parameters
$@          All arguments (preserves quoting when in "$@")
$*          All arguments (joins into single string when in "$*")

# In a script:
echo "Script name: $0"
echo "First arg: $1"
echo "All args: $@"
echo "Arg count: $#"

The difference between "$@" and "$*" is critical:

# "$@" preserves each argument as a separate word -- USUALLY WHAT YOU WANT
# "$*" joins all arguments into a single string separated by first char of IFS
Variable    Meaning
$?          Exit status of the last command (0 = success)
$_          Last argument of the previous command

ls /etc/hosts
echo $?          # 0 (success)

ls /nonexistent
echo $?          # 2 (error)

echo hello world
echo $_          # world (last argument of previous command)

Shell and Environment Variables

Variable    Meaning
$HOME       Home directory
$USER       Current username
$HOSTNAME   System hostname
$PWD        Current working directory
$OLDPWD     Previous working directory
$PATH       Executable search path
$SHELL      Default shell path
$IFS        Internal Field Separator
$RANDOM     Random integer (0-32767)
$LINENO     Current line number in a script
$SECONDS    Seconds since the shell started

echo "User $USER on $HOSTNAME in $PWD"
echo "Random number: $RANDOM"
echo "Shell has been running for $SECONDS seconds"
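Two of these are more than read-only status. SECONDS is writable, so you can reset it to time a block of commands, and RANDOM is handy for adding jitter to delays. A quick sketch:

```shell
# Reset SECONDS to use it as a stopwatch
SECONDS=0
sleep 2
echo "That took $SECONDS seconds"

# RANDOM for jitter, e.g. a randomized retry delay of 1-5 seconds
delay=$(( RANDOM % 5 + 1 ))
echo "Would sleep for $delay seconds"
```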

Hands-On: Expansion Mastery

Challenge 1: Build File Paths

# Use parameter expansion to manipulate this path:
filepath="/var/log/nginx/access.log.2.gz"

# Extract just the filename
echo ${filepath##*/}
# access.log.2.gz

# Extract just the directory
echo ${filepath%/*}
# /var/log/nginx

# Extract the extension
echo ${filepath##*.}
# gz

# Remove all extensions
echo ${filepath%%.*}
# /var/log/nginx/access

# Replace nginx with apache
echo ${filepath/nginx/apache}
# /var/log/apache/access.log.2.gz

Challenge 2: Batch Rename with Parameter Expansion

mkdir -p /tmp/rename-lab
cd /tmp/rename-lab
touch photo_{001..005}.JPG

# Rename .JPG to .jpg using parameter expansion
for f in *.JPG; do
    mv "$f" "${f%.JPG}.jpg"
done
ls
# photo_001.jpg  photo_002.jpg  photo_003.jpg  photo_004.jpg  photo_005.jpg

# Clean up
rm -rf /tmp/rename-lab

Challenge 3: Default Values in Practice

# A script that uses default values
DB_HOST=${DB_HOST:-localhost}
DB_PORT=${DB_PORT:-5432}
DB_NAME=${DB_NAME:-myapp}

echo "Connecting to $DB_HOST:$DB_PORT/$DB_NAME"
# If no env vars set: Connecting to localhost:5432/myapp
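A related expansion, ${var:?message}, is for values that are required rather than defaultable: it prints the message to stderr and aborts the shell if the variable is unset or empty. Demonstrated in a subshell so it does not kill your interactive session:

```shell
# ${var:?message} aborts with an error when the variable is unset or empty.
# The parentheses run the demo in a subshell.
(
    unset DB_PASSWORD
    echo "Password: ${DB_PASSWORD:?must be set in the environment}"
)
echo "Exit status was: $?"    # non-zero: the subshell aborted
```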

Debug This: Quoting Gone Wrong

Someone wrote this script and it does not work with filenames containing spaces:

#!/bin/bash
# BUG: This breaks on filenames with spaces
for file in $(find /data -name "*.csv"); do
    wc -l $file
done

Problems:

  1. $(find ...) is subject to word splitting
  2. $file is unquoted, so it splits on spaces

Fixed version:

#!/bin/bash
# FIXED: Use find -exec or a while loop with null delimiter
find /data -name "*.csv" -print0 | while IFS= read -r -d '' file; do
    wc -l "$file"
done

Or even simpler with find -exec:

find /data -name "*.csv" -exec wc -l {} \;

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 18 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  Bash expansion order:                                           |
|  1. Brace  2. Tilde  3. Parameter  4. Command Sub               |
|  5. Arithmetic  6. Word Splitting  7. Globbing                   |
|                                                                  |
|  Quoting:                                                        |
|  - Double quotes: variables expand, no glob/split                |
|  - Single quotes: nothing expands                                |
|  - Always quote "$variable" to prevent word splitting            |
|                                                                  |
|  Parameter expansion: ${var:-default}, ${var##pattern},          |
|  ${var%pattern}, ${var/old/new}, ${#var}                         |
|                                                                  |
|  Arrays: fruits=(...), ${fruits[@]}, ${#fruits[@]}               |
|  Associative arrays: declare -A, ${map[key]}                     |
|                                                                  |
|  Special vars: $?, $$, $!, $@, $#, $0                            |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Expansion Order

Predict the output of each line before running it, then verify:

echo {1..3}_{a,b}
echo ~root/test
echo "Today: $(date +%H:%M) - PID $$"
echo '$HOME is' $HOME
echo $((2**8))

Exercise 2: Parameter Expansion Practice

Given url="https://www.example.com/path/to/page.html?query=1", use parameter expansion to extract:

  • Just the protocol (https)
  • Just the filename (page.html?query=1)
  • The URL with example.com replaced by mysite.org

Exercise 3: Array Operations

Create an indexed array of 5 Linux distribution names. Write a loop that prints each with its index number. Then create an associative array mapping each distro to its package manager.

Exercise 4: Special Variables

Write a short script that prints: its own name, all arguments it received, the count of arguments, and the PID it is running as. Test it with various inputs.

Bonus Challenge

Write a one-liner using brace expansion that creates a directory structure for a new project with: src/, tests/, docs/, config/, and inside each a README.md file. Use only mkdir -p and touch with brace expansion.

Bash Scripting

Why This Matters

You have been typing commands one at a time. That works fine when you are doing something once. But what happens when you need to do the same thing every day? Or on fifty servers? Or in a CI/CD pipeline at 2 AM with nobody at the keyboard?

You write a script.

A Bash script is just a text file full of commands that Bash executes in sequence. But scripts can also make decisions, loop over data, accept arguments, handle errors, and call functions. A well-written script can replace hours of manual work with a single command.

This chapter takes you from writing your first script to writing robust, production-grade Bash that handles errors gracefully, parses command-line arguments, and does not break when the unexpected happens.


Try This Right Now

Create and run your first script:

cat > /tmp/hello.sh << 'SCRIPT'
#!/bin/bash
echo "Hello, $(whoami)!"
echo "Today is $(date +%A), $(date +%B) $(date +%d)"
echo "You are running Bash $BASH_VERSION"
echo "Your system has $(nproc) CPU cores"
echo "Free memory: $(free -h | awk '/Mem:/ {print $4}')"
SCRIPT
chmod +x /tmp/hello.sh
/tmp/hello.sh

You should see a personalized system summary. That is a script: commands in a file, executed as a unit.


The Shebang Line

Every script should start with a shebang (#!) that tells the system which interpreter to use:

#!/bin/bash

Common shebangs:

#!/bin/bash        # Use Bash specifically
#!/bin/sh          # Use the system's POSIX shell (might not be Bash)
#!/usr/bin/env bash  # Find bash in PATH (more portable)
#!/usr/bin/env python3  # For Python scripts

Why does this matter? Without a shebang, the script is run by whatever shell invoked it. With a shebang, it always runs with the correct interpreter, even when called from a different shell.

# Make a script executable and run it
chmod +x myscript.sh
./myscript.sh        # Uses the shebang interpreter

# Or explicitly invoke bash (shebang is ignored)
bash myscript.sh

Distro Note: On some systems (FreeBSD, certain containers), Bash is not at /bin/bash. Using #!/usr/bin/env bash is more portable because it searches $PATH.


Exit Codes

Every command returns an exit code: an integer from 0 to 255.

# 0 means success
ls /tmp
echo $?    # 0

# Non-zero means failure
ls /nonexistent
echo $?    # 2

# You can set your own exit code
exit 0     # Success
exit 1     # General error
exit 2     # Misuse of command

Conventional exit codes:

Code    Meaning
0       Success
1       General error
2       Misuse of command/arguments
126     Command found but not executable
127     Command not found
128+N   Killed by signal N (e.g., 137 = killed by SIGKILL)

Using exit codes in scripts:

#!/bin/bash
if [ ! -f /etc/hosts ]; then
    echo "ERROR: /etc/hosts not found" >&2
    exit 1
fi
echo "OK: /etc/hosts exists"
exit 0

Conditionals

The if Statement

#!/bin/bash

if [ -f /etc/hosts ]; then
    echo "/etc/hosts exists"
elif [ -f /etc/hostname ]; then
    echo "/etc/hostname exists"
else
    echo "Neither file found"
fi

test, [ ], and [[ ]]

There are three ways to test conditions:

# These are equivalent:
test -f /etc/hosts
[ -f /etc/hosts ]

# [[ ]] is a Bash enhancement (preferred):
[[ -f /etc/hosts ]]

Why prefer [[ ]]?

  • No word splitting on variables (safer)
  • Supports && and || inside the brackets
  • Supports pattern matching with == and regex with =~
  • Does not need quoting on variables (though quoting is still good practice)

File Tests

[[ -f /path ]]    # True if file exists and is a regular file
[[ -d /path ]]    # True if directory exists
[[ -e /path ]]    # True if anything exists at that path
[[ -r /path ]]    # True if readable
[[ -w /path ]]    # True if writable
[[ -x /path ]]    # True if executable
[[ -s /path ]]    # True if file exists and is not empty
[[ -L /path ]]    # True if symbolic link

String Tests

[[ -z "$str" ]]            # True if string is empty
[[ -n "$str" ]]            # True if string is NOT empty
[[ "$a" == "$b" ]]         # True if strings are equal
[[ "$a" != "$b" ]]         # True if strings are not equal
[[ "$a" < "$b" ]]          # True if a sorts before b
[[ "$str" == *.txt ]]      # Pattern matching (glob)
[[ "$str" =~ ^[0-9]+$ ]]   # Regex matching
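When the =~ operator matches, Bash stores the whole match and each capture group in the BASH_REMATCH array, which makes [[ ]] useful for light text extraction:

```shell
line="error: code=404 path=/index.html"

if [[ "$line" =~ code=([0-9]+) ]]; then
    echo "Whole match: ${BASH_REMATCH[0]}"   # code=404
    echo "Status code: ${BASH_REMATCH[1]}"   # 404 (first capture group)
fi
```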

Integer Comparisons

[[ $a -eq $b ]]    # Equal
[[ $a -ne $b ]]    # Not equal
[[ $a -lt $b ]]    # Less than
[[ $a -le $b ]]    # Less than or equal
[[ $a -gt $b ]]    # Greater than
[[ $a -ge $b ]]    # Greater than or equal

# Or use (( )) for arithmetic comparisons (more readable):
(( a == b ))
(( a != b ))
(( a < b ))
(( a > b ))

Combining Conditions

# AND
[[ -f /etc/hosts && -r /etc/hosts ]]

# OR
[[ -f /etc/hosts || -f /etc/hostname ]]

# NOT
[[ ! -f /etc/hosts ]]

# Complex combination
if [[ -f "$config" && -r "$config" ]] && command -v jq &>/dev/null; then
    echo "Config file is readable and jq is available"
fi

Think About It: Why is [[ -f "$file" ]] safer than [ -f $file ] when $file might contain spaces or be empty?


Loops

for Loops

# Over a list
for color in red green blue; do
    echo "Color: $color"
done

# Over command output
for user in $(cut -d: -f1 /etc/passwd | head -5); do
    echo "User: $user"
done

# Over files (use glob, not ls!)
for f in /etc/*.conf; do
    echo "Config: $f"
done

# C-style for loop
for ((i=1; i<=5; i++)); do
    echo "Iteration $i"
done

# Over a range
for i in {1..10}; do
    echo "Number $i"
done

while Loops

# Basic while
count=1
while [[ $count -le 5 ]]; do
    echo "Count: $count"
    ((count++))
done

# Read a file line by line (correct way)
while IFS= read -r line; do
    echo "Line: $line"
done < /etc/hostname

# Read command output line by line
ps aux | while IFS= read -r line; do
    echo "$line"
done

# Infinite loop (useful for daemons, menus)
while true; do
    echo "Working... (Ctrl+C to stop)"
    sleep 5
done
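One caveat with the "read command output" form above: each element of a pipeline runs in a subshell, so variables modified inside the loop vanish when the pipeline ends. Process substitution avoids this by keeping the loop in the current shell:

```shell
# PITFALL: the while loop runs in a subshell, so the increment is lost
count=0
printf '%s\n' a b c | while IFS= read -r line; do
    count=$((count + 1))
done
echo "After pipe: $count"                    # 0

# FIX: process substitution keeps the loop in the current shell
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(printf '%s\n' a b c)
echo "After process substitution: $count"    # 3
```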

until Loops

Like while, but runs until the condition becomes true:

count=1
until [[ $count -gt 5 ]]; do
    echo "Count: $count"
    ((count++))
done

Loop Control

# break -- exit the loop
for i in {1..100}; do
    if [[ $i -eq 5 ]]; then
        break
    fi
    echo $i
done
# Prints: 1 2 3 4

# continue -- skip to next iteration
for i in {1..10}; do
    if (( i % 2 == 0 )); then
        continue
    fi
    echo $i
done
# Prints: 1 3 5 7 9

case Statements

Pattern matching for cleaner multi-branch logic:

#!/bin/bash
case "$1" in
    start)
        echo "Starting service..."
        ;;
    stop)
        echo "Stopping service..."
        ;;
    restart)
        echo "Restarting service..."
        ;;
    status)
        echo "Service status: running"
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac

Cases support patterns:

case "$input" in
    [0-9]*)
        echo "Starts with a number"
        ;;
    *.txt|*.md)
        echo "Text-like file"
        ;;
    y|Y|yes|YES)
        echo "Affirmative"
        ;;
    *)
        echo "Unknown input"
        ;;
esac

Functions

#!/bin/bash

# Define a function
greet() {
    local name="$1"
    local time_of_day="$2"
    echo "Good $time_of_day, $name!"
}

# Call the function
greet "Alice" "morning"
greet "Bob" "evening"

Function Return Values

Functions use return for exit codes (0-255), not for returning data:

is_root() {
    [[ $(id -u) -eq 0 ]]
    return $?
}

if is_root; then
    echo "Running as root"
else
    echo "Not root"
fi

To return data, use echo (command substitution) or a global variable:

# Method 1: echo + command substitution (preferred)
get_hostname() {
    echo "$(hostname -f)"
}
my_host=$(get_hostname)

# Method 2: global variable (use sparingly)
get_info() {
    RESULT_OS=$(uname -s)
    RESULT_ARCH=$(uname -m)
}
get_info
echo "OS: $RESULT_OS, Arch: $RESULT_ARCH"

Local Variables

Always use local inside functions to avoid polluting the global scope:

bad_function() {
    counter=0    # GLOBAL -- bleeds into the caller's scope
}

good_function() {
    local counter=0    # LOCAL -- contained within the function
}

Here Documents

Here documents let you embed multi-line text in a script:

# Basic here document
cat << 'EOF'
This is a multi-line
text block. Variables like $HOME
are NOT expanded because we quoted 'EOF'.
EOF

# With expansion (no quotes on delimiter)
cat << EOF
Your home directory is: $HOME
Today is: $(date)
EOF

# Indented here document (<<- strips leading tabs)
if true; then
	cat <<- EOF
	This text can be indented with tabs
	and the tabs are stripped from output.
	EOF
fi

# Here string (single line)
grep "root" <<< "root:x:0:0:root:/root:/bin/bash"

Practical use -- creating a config file from a script:

cat > /tmp/myapp.conf << EOF
# Generated by setup script on $(date)
server_name=$HOSTNAME
listen_port=8080
log_level=info
EOF

Argument Parsing with getopts

For scripts that accept command-line flags:

#!/bin/bash

# Default values
verbose=false
output_file=""
count=1

# Parse options
while getopts "vo:c:h" opt; do
    case "$opt" in
        v) verbose=true ;;
        o) output_file="$OPTARG" ;;
        c) count="$OPTARG" ;;
        h)
            echo "Usage: $0 [-v] [-o output_file] [-c count] [files...]"
            exit 0
            ;;
        *)
            echo "Usage: $0 [-v] [-o output_file] [-c count] [files...]" >&2
            exit 1
            ;;
    esac
done

# Remove parsed options, leaving positional arguments
shift $((OPTIND - 1))

# Now "$@" contains the remaining arguments
echo "Verbose: $verbose"
echo "Output: $output_file"
echo "Count: $count"
echo "Remaining args: $@"

The option string "vo:c:h" means:

  • v -- flag (no argument)
  • o: -- option requiring an argument (the colon)
  • c: -- option requiring an argument
  • h -- flag (no argument)

./myscript.sh -v -o results.txt -c 5 file1.txt file2.txt
# Verbose: true
# Output: results.txt
# Count: 5
# Remaining args: file1.txt file2.txt
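getopts only understands short options. When you want long options like --verbose, a common pattern is a manual while/case loop over "$@". The option names below are just examples, not a fixed convention:

```shell
#!/bin/bash
verbose=false
output_file=""

while [[ $# -gt 0 ]]; do
    case "$1" in
        --verbose)   verbose=true; shift ;;
        --output)    output_file="$2"; shift 2 ;;   # value in the next argument
        --output=*)  output_file="${1#--output=}"; shift ;;  # --output=FILE form
        --)          shift; break ;;                # explicit end of options
        -*)          echo "Unknown option: $1" >&2; exit 1 ;;
        *)           break ;;                       # first positional argument
    esac
done

echo "Verbose: $verbose"
echo "Output: $output_file"
echo "Remaining: $*"
```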

Defensive Scripting: set -euo pipefail

The single most important line you can add to any script:

#!/bin/bash
set -euo pipefail

What each flag does:

Flag         Behavior
-e           Exit immediately if any command fails (non-zero exit code)
-u           Treat unset variables as an error
-o pipefail  A pipeline fails if any command in it fails (not just the last)

Why This Matters

Without set -e:

#!/bin/bash
cd /nonexistent        # Fails silently, cd does not happen
rm -rf *               # DELETES FILES IN THE CURRENT DIRECTORY!

With set -e:

#!/bin/bash
set -e
cd /nonexistent        # Script exits here with an error
rm -rf *               # Never reached

Without set -u:

#!/bin/bash
rm -rf "$DIERCTORY"/data    # Typo in variable name!
# Without -u, $DIERCTORY is empty, so this becomes: rm -rf /data

With set -u:

#!/bin/bash
set -u
rm -rf "$DIERCTORY"/data    # Script exits: DIERCTORY: unbound variable

WARNING: set -e can be surprising. It does not trigger on commands tested in if or while conditions, or on any command followed by && or || except the last one in the list. Always test your error handling.

Handling Expected Failures with set -e

Sometimes a command is allowed to fail:

#!/bin/bash
set -euo pipefail

# Method 1: OR with true
grep "pattern" file.txt || true
# grep failing (no match) won't exit the script

# Method 2: Conditional
if grep -q "pattern" file.txt; then
    echo "Found it"
fi
# The if-condition doesn't trigger set -e

# Method 3: Explicit check
result=$(command_that_might_fail) || {
    echo "Command failed, but we'll continue"
}

Debugging with set -x

When a script does not behave as expected, set -x shows every command as it executes:

#!/bin/bash
set -x    # Enable trace mode

name="world"
echo "Hello, $name"
ls /tmp/*.txt 2>/dev/null | wc -l

Output:

+ name=world
+ echo 'Hello, world'
Hello, world
+ ls /tmp/a.txt /tmp/b.txt
+ wc -l
2

Each line prefixed with + is Bash showing you the command after expansion. This is invaluable for finding bugs.

You can also enable tracing for a section of a script:

#!/bin/bash
echo "Normal output"

set -x
# Traced section
result=$((2 + 3))
echo "Result: $result"
set +x

echo "Normal output again"

Using PS4 for Better Traces

Customize the trace prefix to show more context:

#!/bin/bash
export PS4='+ ${BASH_SOURCE}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
set -x

my_function() {
    echo "Inside function"
}
my_function

Output:

+ myscript.sh:7: my_function()
+ myscript.sh:4: my_function(): echo 'Inside function'
Inside function

ShellCheck: Your Script Linter

ShellCheck is an open source static analysis tool that catches common Bash mistakes. Install it and use it on every script you write.

# Install
sudo apt install shellcheck        # Debian/Ubuntu
sudo dnf install ShellCheck        # Fedora
sudo pacman -S shellcheck          # Arch

Example: create a script with common mistakes:

cat > /tmp/buggy.sh << 'SCRIPT'
#!/bin/bash
echo $1
cd $dir
for f in $(ls *.txt); do
    cat $f
done
[ $var = "hello" ]
SCRIPT

shellcheck /tmp/buggy.sh

ShellCheck output:

In /tmp/buggy.sh line 2:
echo $1
     ^-- SC2086: Double quote to prevent globbing and word splitting.

In /tmp/buggy.sh line 3:
cd $dir
   ^--- SC2086: Double quote to prevent globbing and word splitting.
   ^--- SC2164: Use 'cd ... || exit' in case cd fails.

In /tmp/buggy.sh line 4:
for f in $(ls *.txt); do
         ^--------- SC2045: Iterating over ls output is fragile. Use globs.

In /tmp/buggy.sh line 5:
    cat $f
        ^-- SC2086: Double quote to prevent globbing and word splitting.

In /tmp/buggy.sh line 7:
[ $var = "hello" ]
  ^--- SC2086: Double quote to prevent globbing and word splitting.
  ^--- SC2154: var is referenced but not assigned.

Every warning is a real bug or potential bug. The fixed version:

#!/bin/bash
echo "$1"
cd "$dir" || exit 1
for f in *.txt; do
    cat "$f"
done
[[ "$var" == "hello" ]]

Think About It: Why does ShellCheck warn against for f in $(ls *.txt)? Think about what happens when filenames contain spaces, newlines, or special characters.


Hands-On: A Complete Script

Let us write a real-world script: a log analyzer that processes system logs.

cat > /tmp/log-analyzer.sh << 'MAINSCRIPT'
#!/bin/bash
set -euo pipefail

#-------------------------------------------------------
# log-analyzer.sh - Analyze system logs for SSH activity
#-------------------------------------------------------

# Default values
SINCE="today"
TOP_N=10
VERBOSE=false

# Colors (only if stdout is a terminal)
if [[ -t 1 ]]; then
    RED='\033[0;31m'
    GREEN='\033[0;32m'
    YELLOW='\033[0;33m'
    NC='\033[0m'  # No Color
else
    RED='' GREEN='' YELLOW='' NC=''
fi

usage() {
    cat << EOF
Usage: $(basename "$0") [OPTIONS]

Analyze system logs for SSH activity.

Options:
    -s SINCE    Time period (default: "today")
    -n NUM      Show top N results (default: 10)
    -v          Verbose output
    -h          Show this help message

Examples:
    $(basename "$0")
    $(basename "$0") -s "1 hour ago" -n 5
    $(basename "$0") -s "2025-03-10" -v
EOF
}

log_info() {
    echo -e "${GREEN}[INFO]${NC} $*"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $*" >&2
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $*" >&2
}

# Parse arguments
while getopts "s:n:vh" opt; do
    case "$opt" in
        s) SINCE="$OPTARG" ;;
        n) TOP_N="$OPTARG" ;;
        v) VERBOSE=true ;;
        h) usage; exit 0 ;;
        *) usage; exit 1 ;;
    esac
done
shift $((OPTIND - 1))

# Validate
if ! [[ "$TOP_N" =~ ^[0-9]+$ ]]; then
    log_error "Invalid number: $TOP_N"
    exit 1
fi

# Check if we can read the journal
if ! command -v journalctl &>/dev/null; then
    log_error "journalctl not found. Is systemd installed?"
    exit 1
fi

log_info "Analyzing SSH logs since: $SINCE"
echo ""

# Count total SSH log entries
total=$(journalctl -u sshd.service -u ssh.service \
    --since "$SINCE" --no-pager 2>/dev/null | wc -l || true)
log_info "Total SSH log entries: $total"

# Count failed login attempts
failed=$(journalctl -u sshd.service -u ssh.service \
    --since "$SINCE" --no-pager 2>/dev/null \
    | grep -ci "failed\|invalid" || true)
log_info "Failed login attempts: $failed"

# Count successful logins
success=$(journalctl -u sshd.service -u ssh.service \
    --since "$SINCE" --no-pager 2>/dev/null \
    | grep -ci "accepted" || true)
log_info "Successful logins: $success"

echo ""
log_info "Top $TOP_N source IPs (failed attempts):"
echo "---"
journalctl -u sshd.service -u ssh.service \
    --since "$SINCE" --no-pager 2>/dev/null \
    | grep -i "failed" \
    | grep -oP 'from \K[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | sort | uniq -c | sort -rn | head -"$TOP_N" || true
echo ""

if $VERBOSE; then
    log_info "Last 5 SSH events:"
    echo "---"
    journalctl -u sshd.service -u ssh.service \
        --since "$SINCE" --no-pager -n 5 2>/dev/null || true
fi

log_info "Analysis complete."
MAINSCRIPT

chmod +x /tmp/log-analyzer.sh

Test it:

/tmp/log-analyzer.sh -h
/tmp/log-analyzer.sh -v
/tmp/log-analyzer.sh -s "1 week ago" -n 5

Debug This: Script Fails Silently

Someone wrote this script, but it silently dies with no output and a nonzero exit code:

#!/bin/bash
set -euo pipefail

LOGFILE="/var/log/myapp/app.log"

# Count errors in the log
error_count=$(grep -c "ERROR" $LOGFILE)
echo "Found $error_count errors"

What is wrong?

  1. $LOGFILE is unquoted (though this specific path has no spaces, it is still bad practice)
  2. If the file does not exist, grep fails with exit code 2, and set -e kills the script silently
  3. If the file exists but has no matches, grep -c returns exit code 1 (no match), and set -e kills the script

Fixed:

#!/bin/bash
set -euo pipefail

LOGFILE="/var/log/myapp/app.log"

if [[ ! -f "$LOGFILE" ]]; then
    echo "ERROR: Log file not found: $LOGFILE" >&2
    exit 1
fi

error_count=$(grep -c "ERROR" "$LOGFILE" || true)
echo "Found $error_count errors"
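
The grep -c behavior in item 3 is easy to miss, so here it is in isolation, using a throwaway sample file (the path /tmp/clean.log is invented for the demo):

```shell
# grep -c prints 0 on no match -- but still exits with status 1
printf 'all good here\n' > /tmp/clean.log
grep -c "ERROR" /tmp/clean.log || echo "grep exited with status $?"
# 0
# grep exited with status 1
rm /tmp/clean.log
```

Under set -e, that exit status 1 is what kills the script, even though grep already printed a perfectly usable count. That is also why the fix uses || true rather than || echo 0: a fallback echo would produce a second, duplicate zero.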

Script Template

Use this as a starting point for new scripts:

#!/bin/bash
set -euo pipefail

# Description: What this script does
# Usage: ./script.sh [-v] [-o output] <input>

readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"

# Default values
VERBOSE=false
OUTPUT=""

usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] <input>

Options:
    -v          Verbose output
    -o FILE     Output file
    -h          Show help
EOF
}

die() {
    echo "ERROR: $*" >&2
    exit 1
}

log() {
    if $VERBOSE; then
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >&2
    fi
}

# Parse arguments
while getopts "vo:h" opt; do
    case "$opt" in
        v) VERBOSE=true ;;
        o) OUTPUT="$OPTARG" ;;
        h) usage; exit 0 ;;
        *) usage; exit 1 ;;
    esac
done
shift $((OPTIND - 1))

# Validate required arguments
if [[ $# -lt 1 ]]; then
    die "Missing required argument. Use -h for help."
fi

INPUT="$1"
log "Processing: $INPUT"

# Main logic here
echo "TODO: implement"

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 19 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  - Start scripts with #!/bin/bash and set -euo pipefail         |
|  - Exit codes: 0 = success, non-zero = failure                  |
|  - Conditionals: prefer [[ ]] over [ ]; use (( )) for math     |
|  - Loops: for, while, until; use globs not ls for files         |
|  - case statements for multi-branch pattern matching            |
|  - Functions: use local variables, return exit codes             |
|  - Here documents for multi-line text embedding                  |
|  - getopts for command-line argument parsing                     |
|  - set -x for debugging; PS4 for better traces                  |
|  - ShellCheck catches common bugs -- use it always              |
|  - Always quote variables: "$var" not $var                      |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: System Info Script

Write a script that displays: hostname, kernel version, uptime, CPU count, total/free memory, total/free disk space, and the number of logged-in users. Accept a -j flag that outputs everything as JSON.

Exercise 2: File Organizer

Write a script that takes a directory as an argument and organizes files into subdirectories by extension (e.g., txt/, pdf/, jpg/). Include a -n (dry run) flag that shows what would happen without actually moving files.

Exercise 3: Process Monitor

Write a script that checks if a given process name is running. If not, print a warning. Accept the process name as an argument. Add a -l (loop) flag that checks every 10 seconds.

Exercise 4: Validate and Fix

Run ShellCheck on every script in your home directory. Fix the top 3 most common warnings it finds.

Bonus Challenge

Write a deployment script that: (1) accepts -e for environment (staging/production), (2) validates prerequisites (git, rsync, ssh), (3) shows a confirmation prompt with what it will do, (4) simulates deploying by syncing a local directory to a remote path using rsync, and (5) logs everything to a timestamped file. Include proper error handling and rollback on failure.

Regular Expressions

Why This Matters

You have a 2 GB server log and you need to find every line where someone accessed the /api/users endpoint from an IP address starting with 10.0.. Or you need to validate that a configuration file contains properly formatted email addresses. Or you need to extract all phone numbers from a messy text dump.

You could write a custom program for each of these. Or you could write a regular expression in 30 seconds and use it with grep, sed, awk, or any programming language on the planet.

Regular expressions (regex) are a pattern language for matching text. They are one of the most powerful and universally useful tools in computing. Every text editor, every programming language, every log analysis tool supports them. Learn regex once, use it everywhere, forever.

This chapter teaches you regex from the ground up: what the symbols mean, how to combine them, and how to use them with grep for real-world text searching.


Try This Right Now

# Create a sample file to work with
cat > /tmp/regex-lab.txt << 'DATA'
john.doe@example.com
jane_smith@company.org
invalid-email@
bob@test.co.uk
192.168.1.1
10.0.0.255
300.400.500.600
127.0.0.1
ERROR: Connection timeout at 14:23:45
WARNING: Disk usage at 85%
INFO: User login successful
ERROR: File not found: /var/data/report.csv
phone: 555-123-4567
phone: (555) 123-4567
phone: 5551234567
2025-03-10 14:22:01 server01 sshd: Failed password for root from 10.0.0.5
2025-03-10 14:23:15 server01 sshd: Accepted password for alice from 192.168.1.50
DATA

# Find lines containing "ERROR"
grep "ERROR" /tmp/regex-lab.txt

# Find lines starting with a number
grep "^[0-9]" /tmp/regex-lab.txt

# Find email-like patterns
grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" /tmp/regex-lab.txt

# Find IP addresses (rough pattern)
grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /tmp/regex-lab.txt

BRE vs ERE: Two Regex Flavors

Linux tools support two flavors of regular expressions:

BRE (Basic Regular Expressions) -- the default for grep and sed:

  • Metacharacters ? + { } ( ) | must be escaped with \ to have special meaning
  • Without escaping, they are literal characters

ERE (Extended Regular Expressions) -- used by grep -E (or egrep) and sed -E:

  • Metacharacters ? + { } ( ) | have special meaning by default
  • To match them literally, escape with \

# BRE: must escape + and ( )
grep 'ab\+c' file          # One or more 'b'
grep '\(abc\)\{2\}' file   # Exactly two "abc"

# ERE: cleaner, no escaping needed
grep -E 'ab+c' file        # One or more 'b'
grep -E '(abc){2}' file    # Exactly two "abc"

Recommendation: Use grep -E (ERE) for nearly everything. It is cleaner and more readable. The rest of this chapter uses ERE unless noted.

+-----------------+--------------+-------------+
|  Feature        | BRE          | ERE         |
+-----------------+--------------+-------------+
|  ?              | literal      | 0 or 1      |
|  +              | literal      | 1 or more   |
|  {n,m}          | \{n,m\}      | {n,m}       |
|  ( )            | \( \)        | ( )         |
|  |              | literal      | alternation |
|  . * ^ $ [ ]    | same         | same        |
+-----------------+--------------+-------------+

Metacharacters: The Building Blocks

The Dot: Match Any Character

. matches any single character (except newline):

echo -e "cat\ncar\ncap\ncab\ncan" | grep -E 'ca.'
# cat, car, cap, cab, can -- all match

echo -e "cat\ncoat\nct" | grep -E 'c.t'
# cat matches; ct does NOT (. requires exactly one character)
# coat does NOT match either (a single . allows only one character between c and t)

Anchors: Where to Match

^ matches the start of a line. $ matches the end:

# Lines starting with "ERROR"
grep -E '^ERROR' /tmp/regex-lab.txt

# Lines ending with ".com"
grep -E '\.com$' /tmp/regex-lab.txt

# Lines that are exactly "127.0.0.1"
grep -E '^127\.0\.0\.1$' /tmp/regex-lab.txt

# Empty lines
grep -E '^$' /tmp/regex-lab.txt

Character Classes: Match One of a Set

[...] matches any single character in the set:

# Match vowels
echo -e "bat\nbet\nbit\nbot\nbut" | grep -E 'b[aeiou]t'
# bat, bet, bit, bot, but

# Match digits
grep -E '[0-9]' /tmp/regex-lab.txt

# Match uppercase letters
grep -E '[A-Z]' /tmp/regex-lab.txt

# Negate: match anything NOT in the set
echo -e "bat\nbet\nbit\nbot\nbut" | grep -E 'b[^aeiou]t'
# (no output -- all have vowels)

POSIX Character Classes

More portable than ranges like [A-Z] (which depend on locale):

+-------------+------------------------------------+
|  Class      |  Matches                           |
+-------------+------------------------------------+
|  [:alpha:]  |  Letters (a-z, A-Z)                |
|  [:digit:]  |  Digits (0-9)                      |
|  [:alnum:]  |  Letters and digits                |
|  [:upper:]  |  Uppercase letters                 |
|  [:lower:]  |  Lowercase letters                 |
|  [:space:]  |  Whitespace (space, tab, newline)  |
|  [:punct:]  |  Punctuation characters            |
|  [:print:]  |  Printable characters              |
+-------------+------------------------------------+

# Match lines containing uppercase letters
grep -E '[[:upper:]]' /tmp/regex-lab.txt

# Match lines starting with a digit
grep -E '^[[:digit:]]' /tmp/regex-lab.txt

Note the double brackets: [[:digit:]]. The outer [] is the character class syntax; the inner [:digit:] is the POSIX class name.
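
POSIX classes also combine with extra characters inside the outer brackets. A small sketch (the sample strings are invented for the demo):

```shell
# [[:alnum:]_] = letters, digits, OR underscore -- handy for identifiers
echo -e "var_name\nvar-name\nvar name" | grep -E '^[[:alnum:]_]+$'
# var_name (the hyphen and space versions fail the class)
```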


Quantifiers: How Many Times

Quantifiers specify how many times the preceding element must match:

+--------------+-------------------------------+
|  Quantifier  |  Meaning                      |
+--------------+-------------------------------+
|  *           |  Zero or more                 |
|  +           |  One or more (ERE)            |
|  ?           |  Zero or one (ERE)            |
|  {n}         |  Exactly n times (ERE)        |
|  {n,}        |  n or more times (ERE)        |
|  {n,m}       |  Between n and m times (ERE)  |
+--------------+-------------------------------+

# * -- zero or more
echo -e "ac\nabc\nabbc\nabbbc" | grep -E 'ab*c'
# ac, abc, abbc, abbbc (all match -- zero or more 'b')

# + -- one or more
echo -e "ac\nabc\nabbc\nabbbc" | grep -E 'ab+c'
# abc, abbc, abbbc (NOT ac -- needs at least one 'b')

# ? -- zero or one
echo -e "color\ncolour" | grep -E 'colou?r'
# color, colour (the 'u' is optional)

# {n} -- exactly n times
echo -e "ab\naab\naaab\naaaab" | grep -E 'a{3}b'
# aaab (exactly 3 a's before b)

# {n,m} -- between n and m times
echo -e "ab\naab\naaab\naaaab" | grep -E 'a{2,3}b'
# aab, aaab (2 or 3 a's before b)

# {n,} -- n or more
echo -e "ab\naab\naaab\naaaab" | grep -E 'a{2,}b'
# aab, aaab, aaaab (2 or more a's)

Think About It: What is the difference between .* and .+? When would the distinction matter?


Alternation and Grouping

Alternation: OR

The | operator matches either the left or right pattern:

# Match ERROR or WARNING
grep -E 'ERROR|WARNING' /tmp/regex-lab.txt

# Match cat, dog, or fish
echo -e "I have a cat\nI have a dog\nI have a fish" | grep -E 'cat|dog|fish'

Grouping: Parentheses

Parentheses group parts of a pattern:

# Alternation inside a group: matches "gray" or "grey"
echo -e "gray\ngrey\ngruy" | grep -E 'gr(a|e)y'
# gray, grey (gruy does not match)

# Group + quantifier
echo -e "ab\nabab\nababab" | grep -E '(ab){2,}'
# abab, ababab

# Match repeated words
echo -e "the the cat\na big big dog" | grep -E '([a-z]+) \1'
# (This uses backreferences -- see below)

Backreferences

Capture groups and refer back to them with \1, \2, etc.:

# Find repeated words (BRE -- backreferences work in BRE with grep)
echo -e "the the cat\na big dog" | grep '\([a-z]\+\) \1'
# the the cat

# Note: backreference support in ERE varies by tool.
# GNU grep -E accepts them as an extension:
echo -e "the the cat\na big dog" | grep -E '([a-z]+) \1'

Backreferences are most useful in sed for search-and-replace (covered in Chapter 21).


Practical Examples

Example 1: Matching IP Addresses

A rough pattern for IPv4 addresses:

# Basic pattern (matches invalid IPs too, like 999.999.999.999)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/regex-lab.txt

A more precise pattern (validates 0-255 for each octet):

# Strict IPv4 validation
grep -E '^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$' /tmp/regex-lab.txt

Let us break this down:

25[0-5]           --> matches 250-255
2[0-4][0-9]       --> matches 200-249
[01]?[0-9][0-9]?  --> matches 0-199
\.                --> literal dot
{3}               --> repeat the octet+dot pattern 3 times

# Test it
echo "192.168.1.1" | grep -E '^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
# Match

echo "300.400.500.600" | grep -E '^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
# No match (correct!)

Example 2: Matching Email Addresses

# Simplified email pattern
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' /tmp/regex-lab.txt

Breaking it down:

[a-zA-Z0-9._%+-]+     --> local part (before @): letters, digits, special chars
@                       --> literal @
[a-zA-Z0-9.-]+        --> domain name: letters, digits, dots, hyphens
\.                      --> literal dot
[a-zA-Z]{2,}          --> TLD: at least 2 letters
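
Anchoring the pattern with ^ and $ turns it into a validator. A quick sketch looping over a few made-up addresses:

```shell
# Validate candidate addresses (sample strings invented for the demo)
for addr in "john.doe@example.com" "invalid-email@" "bob@test.co.uk"; do
    if echo "$addr" | grep -qE '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'; then
        echo "VALID:   $addr"
    else
        echo "INVALID: $addr"
    fi
done
# VALID:   john.doe@example.com
# INVALID: invalid-email@
# VALID:   bob@test.co.uk
```

Without the anchors, grep would happily report a match on any line that merely contains an email-shaped substring.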

Example 3: Matching Log Lines

# Match timestamps like "14:23:45" or "2025-03-10 14:22:01"
grep -E '[0-9]{2}:[0-9]{2}:[0-9]{2}' /tmp/regex-lab.txt

# Match date-time format "YYYY-MM-DD HH:MM:SS"
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' /tmp/regex-lab.txt

# Match failed SSH attempts and extract username
grep -E 'Failed password for [a-zA-Z0-9_]+' /tmp/regex-lab.txt
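
If your grep is GNU grep, the -P flag (PCRE) plus \K can strip the fixed prefix and print only the username -- the same trick the log-analyzer script used to extract IPs:

```shell
# \K discards everything matched so far, leaving only the username
echo "sshd: Failed password for root from 10.0.0.5" \
    | grep -oP 'Failed password for \K[a-zA-Z0-9_]+'
# root
```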

Example 4: Matching Phone Numbers (Multiple Formats)

# Match various phone formats
grep -E '(\(?[0-9]{3}\)?[-. ]?)?[0-9]{3}[-. ]?[0-9]{4}' /tmp/regex-lab.txt

grep Options for Regex Work

Essential grep Flags

# -E: Extended regex (always use this)
grep -E 'pattern' file

# -i: Case-insensitive
grep -Ei 'error|warning' /var/log/syslog

# -v: Invert match (show lines that do NOT match)
grep -Ev '^#|^$' /etc/ssh/sshd_config
# Show config without comments or blank lines

# -c: Count matching lines
grep -Ec 'ERROR' logfile

# -n: Show line numbers
grep -En 'TODO' *.py

# -l: Show only filenames (not matching lines)
grep -Erl 'password' /etc/

# -o: Show only the matching part (not the whole line)
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/regex-lab.txt

# -w: Match whole words only
echo -e "cat\ncatalog\nconcat" | grep -w 'cat'
# Only "cat" matches, not "catalog" or "concat"

# -A N: Show N lines AFTER match
grep -EA 2 'ERROR' /tmp/regex-lab.txt

# -B N: Show N lines BEFORE match
grep -EB 2 'ERROR' /tmp/regex-lab.txt

# -C N: Show N lines of context (before and after)
grep -EC 2 'ERROR' /tmp/regex-lab.txt

Combining grep with Other Tools

# Count unique IP addresses in a log
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log \
    | sort | uniq -c | sort -rn | head -10

# Find all functions in a Python file
grep -En '^def [a-zA-Z_]+' script.py

# Find TODO/FIXME comments across a project
grep -Ern 'TODO|FIXME|HACK|XXX' /opt/myproject/ --include="*.py"

# Find config files containing a specific setting
grep -Erl 'max_connections' /etc/

Hands-On: Regex Practice

Step 1: Setup

# Create a practice log file
cat > /tmp/practice.log << 'LOG'
2025-03-10 08:00:01 INFO  Application started on port 8080
2025-03-10 08:00:02 INFO  Connected to database at 10.0.1.50:5432
2025-03-10 08:15:33 WARN  High memory usage: 82%
2025-03-10 08:30:00 INFO  Processed 1500 requests in 60s
2025-03-10 09:00:01 ERROR Connection refused to 10.0.1.50:5432
2025-03-10 09:00:05 ERROR Retry 1/3: Connection refused
2025-03-10 09:00:10 ERROR Retry 2/3: Connection refused
2025-03-10 09:00:15 ERROR Retry 3/3: Connection refused
2025-03-10 09:00:15 FATAL All retries exhausted, shutting down
2025-03-10 09:01:00 INFO  Application restarted by systemd
2025-03-10 09:01:01 INFO  Connected to database at 10.0.1.50:5432
2025-03-10 10:45:22 WARN  Slow query detected: 2340ms
2025-03-10 11:00:00 INFO  Health check: OK
2025-03-10 12:30:45 ERROR Invalid input from user_id=42: "Robert'); DROP TABLE users;--"
2025-03-10 13:00:00 INFO  Backup completed: /var/backups/db-20250310.sql.gz (2.3GB)
LOG

Step 2: Practice Queries

# 1. Find all ERROR and FATAL lines
grep -E '(ERROR|FATAL)' /tmp/practice.log

# 2. Find all IP addresses
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/practice.log

# 3. Find lines with percentage values
grep -E '[0-9]+%' /tmp/practice.log

# 4. Find timestamps between 09:00 and 10:00
grep -E '09:[0-9]{2}:[0-9]{2}' /tmp/practice.log

# 5. Find retry messages and extract the attempt number
grep -Eo 'Retry [0-9]+/[0-9]+' /tmp/practice.log

# 6. Find lines that do NOT contain INFO
grep -Ev 'INFO' /tmp/practice.log

# 7. Find the SQL injection attempt
grep -E "DROP TABLE" /tmp/practice.log
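
One more query worth keeping: tally entries per log level. Shown here against inline sample lines (invented for the demo) so the expected output is visible:

```shell
# Extract each level keyword, then count occurrences
printf '%s\n' "INFO ok" "ERROR boom" "ERROR again" "WARN hmm" "INFO done" "INFO bye" \
    | grep -oE 'INFO|WARN|ERROR|FATAL' | sort | uniq -c | sort -rn
#       3 INFO
#       2 ERROR
#       1 WARN
```

Run the same pipeline against /tmp/practice.log to tally the real file.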

Debug This: Why Doesn't My Regex Match?

You write this command to pick out failed SSH logins:

grep -E "Failed password for .+ from [0-9]+.[0-9]+.[0-9]+.[0-9]+" /tmp/regex-lab.txt

It prints the failed-login line, so it looks correct. But there is a latent bug: each unescaped . in the IP portion matches any character, not just a literal dot. The pattern works "too well" -- it matches what you wanted, but it would also match lines where those positions hold something other than dots. Watch the same mistake in miniature:

# Common mistake: forgetting to escape dots in IP patterns
echo "192x168x1x1" | grep -E '192.168.1.1'
# MATCHES! Because . means "any character"

echo "192x168x1x1" | grep -E '192\.168\.1\.1'
# No match (correct -- dots must be literal)

Lesson: When matching literal dots, periods, or other metacharacters, always escape them with \.

Common regex debugging tips:

  1. Start with a simpler pattern and gradually add complexity
  2. Use grep -o to see exactly what is matching
  3. Test on simple input first, then scale to real data
  4. Remember to escape metacharacters when you want their literal form
  5. Check BRE vs ERE -- are you using grep or grep -E?
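
Tip 1 in action. Here is what incremental building looks like against a single sample line:

```shell
# Build the IP-extraction pattern in three stages
line="sshd: Failed password for root from 10.0.0.5"
echo "$line" | grep -oE 'from'                       # stage 1: find the anchor word
echo "$line" | grep -oE 'from [0-9]+'                # stage 2: add the first octet
echo "$line" | grep -oE 'from ([0-9]+\.){3}[0-9]+'   # stage 3: the full address
# from
# from 10
# from 10.0.0.5
```

Each stage either matches (keep going) or fails (you know exactly which addition broke it).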

Quick Reference

+------------------------------------------------------------+
|  REGEX QUICK REFERENCE (ERE)                                |
+------------------------------------------------------------+
|                                                             |
|  .          Any character (except newline)                  |
|  ^          Start of line                                   |
|  $          End of line                                     |
|  *          Zero or more of preceding                       |
|  +          One or more of preceding                        |
|  ?          Zero or one of preceding                        |
|  {n}        Exactly n of preceding                          |
|  {n,m}      Between n and m of preceding                    |
|  [abc]      One character from set                          |
|  [^abc]     One character NOT in set                        |
|  [a-z]      Character range                                 |
|  (abc)      Group                                           |
|  a|b        Alternation (a or b)                            |
|  \1         Backreference to group 1                        |
|  \.         Literal dot (escape metacharacters with \)      |
|                                                             |
|  [:alpha:]  Letters        [:digit:]  Digits                |
|  [:alnum:]  Alphanumeric   [:space:]  Whitespace            |
|  [:upper:]  Uppercase      [:lower:]  Lowercase             |
+------------------------------------------------------------+

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 20 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  - Regular expressions match patterns in text                    |
|  - BRE (basic) vs ERE (extended) -- use grep -E for ERE        |
|  - . matches any character; use \. for literal dot              |
|  - ^ and $ anchor to line start/end                             |
|  - [abc] character classes; [^abc] negated classes              |
|  - *, +, ?, {n,m} are quantifiers                               |
|  - ( ) groups patterns; | provides alternation                  |
|  - grep -o shows only matching text                             |
|  - grep -E 'pattern' is your go-to for searching               |
|  - Always escape metacharacters when matching literally         |
|  - Build patterns incrementally: start simple, add detail       |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Log Analysis

Using /tmp/practice.log, write regex patterns to:

  • Extract all timestamps (HH:MM:SS format)
  • Find lines where processing took more than 1000ms
  • Extract all file paths (starting with /)

Exercise 2: Data Validation

Write regex patterns that validate:

  • A date in YYYY-MM-DD format
  • A 24-hour time in HH:MM format
  • A US ZIP code (5 digits, optionally followed by dash and 4 digits)

Test each with echo "test-string" | grep -E 'pattern'.

Exercise 3: Config File Cleaning

Take /etc/ssh/sshd_config (or any config file with comments) and use grep -Ev to remove all comment lines (starting with #) and blank lines in a single command.

Exercise 4: Combined Pattern

Write a single grep -E command that finds all lines in /tmp/regex-lab.txt containing either an IP address, an email address, or a phone number.

Bonus Challenge

Write a regex that matches a valid MAC address in the format AA:BB:CC:DD:EE:FF (where each pair is two hexadecimal digits). Test it against both valid and invalid MAC addresses. Then modify it to also accept dashes (AA-BB-CC-DD-EE-FF) as separators.

sed: Stream Editing Mastery

Why This Matters

You need to change a configuration value across 200 files. Or strip all comments from a config before processing it. Or reformat a CSV export that has dates in the wrong format. Or fix a typo in a thousand log entries being piped through a pipeline.

You are not going to open each file in a text editor. You are going to use sed.

sed (stream editor) reads input line by line, applies transformations, and writes the result to standard output. It does not load the entire file into memory, so it can process files of any size. It works in pipelines, so it fits naturally into the Unix tool chain. And it can edit files in-place, making batch modifications trivial.

If grep finds text, sed transforms it.


Try This Right Now

# Simple substitution
echo "Hello World" | sed 's/World/Linux/'
# Hello Linux

# Delete lines containing a pattern
echo -e "keep this\ndelete this line\nkeep this too" | sed '/delete/d'
# keep this
# keep this too

# Print only lines matching a pattern (like grep)
echo -e "ERROR: something\nINFO: normal\nERROR: another" | sed -n '/ERROR/p'
# ERROR: something
# ERROR: another

# In-place edit of a file (with backup)
echo "color=red" > /tmp/test.conf
sed -i.bak 's/red/blue/' /tmp/test.conf
cat /tmp/test.conf      # color=blue
cat /tmp/test.conf.bak  # color=red
rm /tmp/test.conf /tmp/test.conf.bak

How sed Works

sed processes input one line at a time through this cycle:

+------------------------------------------------------------------+
|  1. Read a line from input into the "pattern space"              |
|  2. Apply all commands (in order) to the pattern space           |
|  3. Print the pattern space to stdout (unless -n is used)        |
|  4. Clear the pattern space                                       |
|  5. Repeat for the next line                                      |
+------------------------------------------------------------------+

    Input        Pattern Space       Commands        Output
  +-------+     +------------+     +----------+    +--------+
  | Line 1| --> |  "Line 1"  | --> | s/old/new| -> | result |
  | Line 2|     |            |     | /pat/d   |    |        |
  | Line 3|     |            |     | ...      |    |        |
  +-------+     +------------+     +----------+    +--------+

The basic syntax is:

sed [options] 'commands' [input-file...]

# Or in a pipeline:
command | sed 'commands'

Substitution: s///

The s command is what you will use 90% of the time. It replaces text matching a pattern with a replacement.

Basic Substitution

# Replace first occurrence on each line
echo "cat cat cat" | sed 's/cat/dog/'
# dog cat cat (only the first "cat" changed)

# Replace ALL occurrences on each line (global flag)
echo "cat cat cat" | sed 's/cat/dog/g'
# dog dog dog

# Replace the Nth occurrence
echo "cat cat cat cat" | sed 's/cat/dog/3'
# cat cat dog cat (only the 3rd)

Substitution Flags

+----------+------------------------------------------------+
|  Flag    |  Meaning                                       |
+----------+------------------------------------------------+
|  g       |  Replace all occurrences on the line (global)  |
|  p       |  Print the line if a substitution was made     |
|  i or I  |  Case-insensitive match                        |
|  n       |  Replace only the Nth occurrence               |
|  w file  |  Write matched lines to a file                 |
+----------+------------------------------------------------+

# Case-insensitive substitution
echo "Hello HELLO hello" | sed 's/hello/hi/gi'
# hi hi hi

# Print only lines where substitution happened
echo -e "good line\nbad line\ngood line" | sed -n 's/bad/fixed/p'
# fixed line

Using Different Delimiters

When your pattern contains /, use a different delimiter to avoid escaping:

# Awkward with / delimiter
sed 's/\/home\/user/\/opt\/app/' file

# Much cleaner with | or # as delimiter
sed 's|/home/user|/opt/app|' file
sed 's#/home/user#/opt/app#' file

You can use almost any character as the delimiter.

The Replacement String

Special sequences in the replacement:

+-----------+-----------------------------------------------+
|  Sequence |  Meaning                                      |
+-----------+-----------------------------------------------+
|  &        |  The entire matched text                      |
|  \1-\9    |  Backreference to capture group               |
|  \U       |  Uppercase everything that follows (GNU sed)  |
|  \L       |  Lowercase everything that follows (GNU sed)  |
|  \u       |  Uppercase only the next character (GNU sed)  |
|  \l       |  Lowercase only the next character (GNU sed)  |
+-----------+-----------------------------------------------+

# & refers to the whole match
echo "hello world" | sed 's/[a-z]*/(&)/g'
# (hello) (world)

# Backreferences with capture groups
echo "John Smith" | sed 's/\([A-Z][a-z]*\) \([A-Z][a-z]*\)/\2, \1/'
# Smith, John

# Same with ERE (-E flag, cleaner syntax)
echo "John Smith" | sed -E 's/([A-Z][a-z]+) ([A-Z][a-z]+)/\2, \1/'
# Smith, John

# Case conversion (GNU sed)
echo "hello world" | sed 's/.*/\U&/'
# HELLO WORLD

echo "HELLO WORLD" | sed 's/.*/\L&/'
# hello world

# Capitalize first letter of each word
echo "hello world" | sed -E 's/\b([a-z])/\u\1/g'
# Hello World

Think About It: When would you use & versus \1 in a replacement? Think about when you want the entire match versus just part of it.


Addresses: Targeting Specific Lines

By default, sed commands apply to every line. Addresses restrict which lines a command operates on.

Line Number Addresses

# Only line 3
sed '3s/old/new/' file

# Lines 2 through 5
sed '2,5s/old/new/' file

# First line
sed '1s/old/new/' file

# Last line
sed '$s/old/new/' file

# Every other line, starting from line 1 (first~step is a GNU sed extension)
sed '1~2s/old/new/' file

Pattern Addresses

# Lines matching a pattern
sed '/ERROR/s/old/new/' file

# Lines NOT matching a pattern
sed '/ERROR/!s/old/new/' file

# Between two patterns (inclusive)
sed '/START/,/END/s/old/new/' file
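
To see a pattern range in action, here is a self-contained sketch with invented START/END marker lines:

```shell
# The substitution fires only on lines inside the START..END block
printf '%s\n' "old outside" "START" "old inside" "END" "old after" \
    | sed '/START/,/END/s/old/new/'
# old outside
# START
# new inside
# END
# old after
```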

Combining Address Types

# From line 5 to the first line matching "END"
sed '5,/END/s/old/new/' file

# From line matching "START" to line 10
sed '/START/,10s/old/new/' file

Delete, Insert, and Append

Delete Lines: d

# Delete lines matching a pattern
sed '/^#/d' /etc/ssh/sshd_config
# Remove all comment lines

# Delete blank lines
sed '/^$/d' file

# Delete comment lines AND blank lines
sed '/^#/d; /^$/d' file

# Delete lines 1 through 5
sed '1,5d' file

# Delete the last line
sed '$d' file

# Delete everything EXCEPT lines matching a pattern
sed '/important/!d' file
# (equivalent to grep "important")
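
The /pattern/!d trick is worth a quick demonstration with inline input:

```shell
# Keep only matching lines -- sed behaving exactly like grep
printf 'noise\nimportant: disk full\nnoise\n' | sed '/important/!d'
# important: disk full
```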

Insert Lines: i

Insert text before a line:

# Insert before line 3
sed '3i\This is inserted before line 3' file

# Insert before lines matching a pattern
sed '/ERROR/i\--- Error found below ---' file

Append Lines: a

Append text after a line:

# Append after line 3
sed '3a\This is appended after line 3' file

# Append after the last line
sed '$a\This is the final line' file

# Add a blank line after each line (double-space)
# (G appends the empty hold space, which contributes a newline)
sed G file

Change Lines: c

Replace entire lines:

# Replace line 3 entirely
sed '3c\This replaces line 3 completely' file

# Replace lines matching a pattern
sed '/old_setting/c\new_setting=true' config.ini

In-Place Editing: -i

The -i flag modifies files directly instead of writing to stdout.

# Edit in-place (no backup -- dangerous!)
sed -i 's/old/new/g' file.txt

# Edit in-place WITH backup
sed -i.bak 's/old/new/g' file.txt
# Creates file.txt.bak with original content

WARNING: sed -i modifies files permanently. Always test your command without -i first, or use -i.bak to keep a backup. There is no undo.

Distro Note: On macOS/BSD, sed -i requires an argument (even empty): sed -i '' 's/old/new/' file. On GNU/Linux, sed -i 's/old/new/' file works without an argument. For cross-platform scripts, use sed -i.bak which works on both.

In-Place Editing Multiple Files

# Change "foo" to "bar" in all .conf files
sed -i.bak 's/foo/bar/g' /etc/myapp/*.conf

# Remove backup files after verifying
diff /etc/myapp/main.conf /etc/myapp/main.conf.bak
rm /etc/myapp/*.bak

Multiple Commands

Using -e

sed -e 's/cat/dog/' -e 's/red/blue/' file

Using Semicolons

sed 's/cat/dog/; s/red/blue/' file

Using a Script File

For complex transformations, put commands in a file:

cat > /tmp/transform.sed << 'SED'
# Remove comments
/^#/d

# Remove blank lines
/^$/d

# Trim trailing whitespace
s/[[:space:]]*$//

# Replace tabs with spaces
s/\t/    /g
SED

sed -f /tmp/transform.sed input.txt

Hands-On: Practical sed Examples

Setup

cat > /tmp/sed-lab.txt << 'DATA'
# Server Configuration
# Last updated: 2025-03-01

server_name = production-web-01
listen_port = 8080
max_connections = 100
log_level = debug

# Database settings
db_host = 10.0.1.50
db_port = 5432
db_name = myapp_prod
db_user = admin
db_password = secret123

# Feature flags
enable_cache = true
enable_debug = true
DATA

Example 1: Clean Config (Remove Comments and Blank Lines)

sed '/^#/d; /^$/d' /tmp/sed-lab.txt

Output:

server_name = production-web-01
listen_port = 8080
max_connections = 100
log_level = debug
db_host = 10.0.1.50
db_port = 5432
db_name = myapp_prod
db_user = admin
db_password = secret123
enable_cache = true
enable_debug = true

Example 2: Change a Configuration Value

# Change log_level from debug to info
sed 's/log_level = debug/log_level = info/' /tmp/sed-lab.txt

More robust (handles varying whitespace):

sed -E 's/(log_level\s*=\s*).*/\1info/' /tmp/sed-lab.txt

Example 3: Comment Out a Line

# Comment out the debug setting
sed '/enable_debug/s/^/# /' /tmp/sed-lab.txt

Example 4: Add a Setting After a Section Header

# Add a timeout setting after the database section header
sed '/# Database settings/a\db_timeout = 30' /tmp/sed-lab.txt

Example 5: Extract Values

# Extract just the database host value
sed -n '/^db_host/s/.*= //p' /tmp/sed-lab.txt
# 10.0.1.50
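
The same capture-and-reprint idea generalizes to reformatting every key/value line at once. A sketch using two inline sample lines instead of the lab file:

```shell
# Turn "key = value" lines into shell-style key="value" assignments
printf 'db_host = 10.0.1.50\ndb_port = 5432\n' \
    | sed -E 's/^([a-z_]+) *= *(.*)/\1="\2"/'
# db_host="10.0.1.50"
# db_port="5432"
```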

Example 6: Multiple Transformations for Deployment

# Prepare config for staging environment
sed -E \
    -e 's/(server_name\s*=\s*).*/\1staging-web-01/' \
    -e 's/(db_name\s*=\s*).*/\1myapp_staging/' \
    -e 's/(enable_debug\s*=\s*).*/\1false/' \
    -e 's/(log_level\s*=\s*).*/\1warning/' \
    /tmp/sed-lab.txt

The Hold Space

sed has two buffers: the pattern space (where the current line lives) and the hold space (a secondary buffer for storing text between lines).

Command   Action
h         Copy pattern space to hold space (overwrite)
H         Append pattern space to hold space
g         Copy hold space to pattern space (overwrite)
G         Append hold space to pattern space
x         Exchange pattern space and hold space

The hold space is an advanced feature, but here is a practical example:

# Reverse the order of lines in a file
sed -n '1!G;h;$p' file

# This is equivalent to the tac command:
tac file

How it works:

  1. 1!G -- For every line except the first, append hold space to pattern space
  2. h -- Copy pattern space to hold space
  3. $p -- On the last line, print the pattern space

Another practical use -- print a line and the line before it:

# Print the line before each ERROR line (gives context)
sed -n '/ERROR/{x;p;x;p;};h' /tmp/practice.log 2>/dev/null || true

For most practical tasks, you will not need the hold space. The pattern space and regular commands handle 95% of use cases.


Debug This: sed Substitution Not Working

You try to uncomment a line in a config file:

sed 's/^#listen_port/listen_port/' /tmp/sed-lab.txt

But nothing changes. The line still has the #.

Diagnosis:

# Look at the actual line
grep "listen_port" /tmp/sed-lab.txt
# listen_port = 8080

The line is not commented out. There is no # before listen_port. Your pattern does not match anything, so nothing changes. sed does not produce an error when a pattern does not match -- it just leaves the line unchanged.
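A handy way to catch this silent failure: add the p flag and -n so sed prints only the lines where the substitution actually fired -- empty output means your pattern matched nothing. A sketch against the lab file:

```shell
# Prints the changed line if the pattern matched; prints nothing otherwise
sed -n 's/^#listen_port/listen_port/p' /tmp/sed-lab.txt

# Usable as a yes/no check in scripts
if [ -n "$(sed -n 's/^#listen_port/listen_port/p' /tmp/sed-lab.txt)" ]; then
    echo "pattern matched"
else
    echo "pattern did not match"
fi
```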

Another common issue:

# Trying to replace a path
sed 's/home/user/opt/app/' file
# ERROR: unknown option to 's' command

The / in the path conflicts with the / delimiter. Fix it by using a different delimiter:

sed 's|home/user|opt/app|' file

sed debugging tips:

  1. Always test without -i first -- let sed print to stdout
  2. Use -n with p to see which lines match: sed -n '/pattern/p'
  3. When in doubt, print the file and look at the actual content
  4. Use a different delimiter when patterns contain /
  5. Remember: BRE by default, add -E for extended regex
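Tips 1 through 5 combine into a repeatable workflow. A sketch using the lab file from this chapter:

```shell
# Step 1 -- preview: print only the lines the substitution would change
sed -n 's/log_level = debug/log_level = info/p' /tmp/sed-lab.txt

# Step 2 -- dry run: apply to stdout and inspect the whole result
sed 's/log_level = debug/log_level = info/' /tmp/sed-lab.txt

# Step 3 -- commit with a backup, then confirm exactly what changed
sed -i.bak 's/log_level = debug/log_level = info/' /tmp/sed-lab.txt
diff /tmp/sed-lab.txt.bak /tmp/sed-lab.txt
```

Note that diff exits non-zero when the files differ, so in a script running under set -e you would append || true to the last line.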

What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 21 RECAP                         |
+------------------------------------------------------------------+
|                                                                  |
|  - sed processes input line-by-line through the pattern space    |
|  - s/old/new/g is the workhorse command (substitute)             |
|  - Use g flag for all occurrences, i flag for case-insensitive   |
|  - Addresses target specific lines: numbers, patterns, ranges    |
|  - d deletes lines, i inserts before, a appends after            |
|  - -i edits files in-place (use -i.bak for safety)               |
|  - Use | or # as delimiter when patterns contain /               |
|  - -E enables extended regex (cleaner syntax)                    |
|  - & in replacement = entire match; \1 = first capture group     |
|  - Always test without -i before modifying files                 |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Config File Editing

Take a copy of /etc/ssh/sshd_config and use sed to:

  • Remove all comment lines and blank lines
  • Change #Port 22 to Port 2222 (uncomment and change)
  • Change #PermitRootLogin to PermitRootLogin no

Exercise 2: Log Processing

Using the practice log from Chapter 20, use sed to:

  • Replace all IP addresses with [REDACTED]
  • Convert all timestamps from HH:MM:SS to just HH:MM
  • Remove all INFO lines, keeping only WARN, ERROR, and FATAL

Exercise 3: Batch File Rename

Create 10 files named report_2024_01.txt through report_2024_10.txt. Use a combination of ls and sed to generate mv commands that rename them to report_2025_01.txt through report_2025_10.txt. Pipe the output to bash to execute the renames.

Exercise 4: Data Reformatting

Create a CSV file with names in "First Last" format. Use sed to convert them to "Last, First" format.

Bonus Challenge

Write a sed script (using -f) that takes an HTML file and strips all HTML tags, converting it to plain text. Test it on a simple HTML file you create.

awk: Pattern Scanning & Reporting

Why This Matters

You are staring at a log file with millions of lines. You need to know: what is the average response time for requests to the /api/orders endpoint? Or you have a CSV file with sales data and you need to sum the revenue column grouped by region. Or you need to reformat the output of ps aux to show only the top memory consumers with their process names and memory percentages.

grep can find lines. sed can transform text. But when you need to compute, restructure, or report on structured data, you need awk.

awk is a pattern-scanning and processing language. It automatically splits every line into fields, has variables for tracking state across lines, supports arithmetic, and has built-in constructs for conditional logic and loops. It sits right at the boundary between a Unix utility and a programming language -- and that is exactly what makes it so powerful for data processing.


Try This Right Now

# Print the 1st and 3rd fields of /etc/passwd (username and UID)
awk -F: '{print $1, $3}' /etc/passwd | head -5

# Sum a column of numbers
echo -e "10\n20\n30\n40" | awk '{sum += $1} END {print "Total:", sum}'

# Find processes using more than 1% memory
ps aux | awk '$4 > 1.0 {print $4"%", $11}'

# Count lines in a file (like wc -l)
awk 'END {print NR}' /etc/passwd

# Print lines longer than 80 characters
awk 'length > 80' /etc/services | head -5

The awk Program Structure

Every awk program follows this pattern:

pattern { action }

  • pattern -- a condition that selects which lines to process
  • action -- what to do with the selected lines (enclosed in { })

If you omit the pattern, the action applies to every line. If you omit the action, the default action is to print the line.

# Pattern only (print matching lines)
awk '/ERROR/' logfile

# Action only (applies to every line)
awk '{print $1}' logfile

# Both pattern and action
awk '/ERROR/ {print $1, $4}' logfile

The Three Sections: BEGIN, Main, END

awk '
    BEGIN { ... }      # Runs once, before processing any input
    /pattern/ { ... }  # Runs for each matching input line
    END { ... }        # Runs once, after all input is processed
'
+------------------------------------------------------------------+
|                                                                   |
|  BEGIN { setup code }     <--- Runs once before input             |
|          |                                                        |
|          v                                                        |
|  +----[ Read line ]<---+  <--- Main loop: for each line          |
|  |   pattern { action } |                                         |
|  |   pattern { action } |                                         |
|  +------->---------+    |                                         |
|          |              |                                         |
|          +--- more lines?                                         |
|          |                                                        |
|          v (no more lines)                                        |
|  END { cleanup code }     <--- Runs once after all input          |
|                                                                   |
+------------------------------------------------------------------+

Example:

awk '
    BEGIN { print "=== User Report ===" }
    /\/bin\/bash$/ { print $1 }
    END { print "=== End ===" }
' FS=: /etc/passwd

Fields: $1, $2, $NF

awk automatically splits each input line into fields. By default, the delimiter is whitespace (spaces and tabs).

Symbol     Meaning
$0         The entire current line
$1         First field
$2         Second field
$NF        Last field
$(NF-1)    Second-to-last field
NF         Number of fields on this line
NR         Current line number (record number)

# Sample: ps output
ps aux | head -5 | awk '{print "PID:", $2, "  CMD:", $11}'

# Print the last field of each line
echo -e "one two three\nfour five six" | awk '{print $NF}'
# three
# six

# Print line number and line content
awk '{print NR": "$0}' /etc/hostname

Changing the Field Separator

Use -F to set a custom field separator:

# Parse /etc/passwd (colon-separated)
awk -F: '{print "User:", $1, "  Shell:", $7}' /etc/passwd | head -5

# Parse CSV
echo "Alice,30,Engineering" | awk -F, '{print $1, "is in", $3}'
# Alice is in Engineering

# Use = as the field separator
echo "key=value" | awk -F= '{print "Key:", $1, "Value:", $2}'

You can also set FS in the BEGIN block:

awk 'BEGIN {FS=":"} {print $1, $3}' /etc/passwd | head -5

Built-In Variables

Variable   Meaning                                    Default
FS         Input field separator                      whitespace
OFS        Output field separator                     space
RS         Input record separator                     newline
ORS        Output record separator                    newline
NR         Current record number (across all files)   --
NF         Number of fields in current record         --
FNR        Record number in current file              --
FILENAME   Current input filename                     --

OFS: Output Field Separator

When you use a comma in print, awk inserts the OFS between fields:

# Default OFS is space
awk -F: '{print $1, $3}' /etc/passwd | head -3
# root 0
# daemon 1
# bin 2

# Set OFS to tab
awk -F: -v OFS='\t' '{print $1, $3}' /etc/passwd | head -3
# root	0
# daemon	1
# bin	2

# Set OFS to comma (create CSV)
awk -F: -v OFS=',' '{print $1, $3, $7}' /etc/passwd | head -3
# root,0,/bin/bash
# daemon,1,/usr/sbin/nologin
# bin,2,/usr/sbin/nologin

Think About It: What is the difference between print $1, $2 (with comma) and print $1 $2 (without comma)? Try both and observe the output.

NR and FNR

# Print line numbers
awk '{print NR, $0}' /etc/hostname

# Skip the header row (line 1)
awk 'NR > 1 {print}' data.csv

# Print specific lines
awk 'NR >= 5 && NR <= 10' /etc/passwd
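FNR and FILENAME from the variables table earlier matter once awk reads more than one file: NR keeps counting across all input, while FNR resets to 1 for each file. A sketch that feeds the same file twice:

```shell
# Print a banner at the start of each file, then both counters per line
awk 'FNR == 1 {print "--- " FILENAME " ---"} {print "NR=" NR, "FNR=" FNR}' \
    /etc/hostname /etc/hostname

# The classic two-file idiom built on NR == FNR:
# lines of file2 that do not appear in file1
# awk 'NR == FNR {seen[$0]; next} !($0 in seen)' file1 file2
```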

Patterns: Selecting Lines

Regular Expression Patterns

# Lines matching a regex
awk '/^root/' /etc/passwd

# Lines NOT matching a regex
awk '!/^#/' /etc/ssh/sshd_config

# Field-specific regex
awk -F: '$7 ~ /bash/' /etc/passwd
# Lines where field 7 contains "bash"

awk -F: '$7 !~ /nologin/' /etc/passwd
# Lines where field 7 does NOT contain "nologin"

Comparison Patterns

# Numeric comparisons
awk -F: '$3 >= 1000' /etc/passwd
# Users with UID >= 1000 (regular users)

awk -F: '$3 == 0' /etc/passwd
# Users with UID 0 (root)

# String comparisons
awk -F: '$1 == "root"' /etc/passwd

Range Patterns

# Print lines between two patterns (inclusive)
awk '/START/,/END/' file

# Print lines between line 5 and line 10
awk 'NR==5, NR==10' file

Compound Patterns

# AND
awk -F: '$3 >= 1000 && $7 ~ /bash/' /etc/passwd

# OR
awk '/ERROR/ || /FATAL/' logfile

# NOT
awk '!/^#/ && !/^$/' config.file

printf: Formatted Output

print is convenient, but printf gives you precise control over formatting:

# Basic printf (no automatic newline!)
awk '{printf "%-20s %5d\n", $1, $3}' FS=: /etc/passwd | head -5

Output:

root                     0
daemon                   1
bin                      2
sys                      3
sync                     4

Format Specifiers

Format   Meaning
%s       String
%d       Integer
%f       Floating point
%e       Scientific notation
%x       Hexadecimal
%o       Octal
%%       Literal percent sign
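The less common specifiers in action:

```shell
awk 'BEGIN {
    printf "%e\n", 12345.678    # 1.234568e+04 (scientific notation)
    printf "%x\n", 255          # ff (hexadecimal)
    printf "%o\n", 8            # 10 (octal)
    printf "100%%\n"            # 100% (literal percent sign)
}'
```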

Width and Alignment

Modifier   Meaning
%10s       Right-aligned, 10 chars wide
%-10s      Left-aligned, 10 chars wide
%05d       Zero-padded, 5 digits
%.2f       Float with 2 decimal places

# Formatted table output
awk -F: '
    BEGIN { printf "%-15s %6s %s\n", "USERNAME", "UID", "SHELL" }
    $3 >= 1000 {
        printf "%-15s %6d %s\n", $1, $3, $7
    }
' /etc/passwd

Output:

USERNAME           UID SHELL
nobody           65534 /usr/sbin/nologin
user1             1000 /bin/bash
user2             1001 /bin/zsh

Conditionals and Loops in awk

awk supports full programming constructs.

if-else

awk -F: '{
    if ($3 == 0) {
        print $1, "is root"
    } else if ($3 < 1000) {
        print $1, "is a system account"
    } else {
        print $1, "is a regular user"
    }
}' /etc/passwd

Ternary Operator

awk -F: '{
    type = ($3 < 1000) ? "system" : "regular"
    print $1, type
}' /etc/passwd | head -5

for Loop

# Print each field on its own line
echo "one two three four five" | awk '{
    for (i = 1; i <= NF; i++) {
        print "Field", i":", $i
    }
}'

while Loop

# Factorial calculator
echo "5" | awk '{
    n = $1
    result = 1
    while (n > 1) {
        result *= n
        n--
    }
    print $1"! =", result
}'
# 5! = 120

Hands-On: Practical awk Examples

Setup

cat > /tmp/sales.csv << 'CSV'
Region,Product,Quantity,Price
North,Widget,100,9.99
South,Widget,150,9.99
East,Gadget,200,19.99
West,Widget,75,9.99
North,Gadget,120,19.99
South,Gadget,180,19.99
East,Widget,90,9.99
West,Gadget,60,19.99
North,Doohickey,50,29.99
South,Doohickey,80,29.99
CSV

Example 1: Total Revenue

awk -F, 'NR > 1 {
    revenue = $3 * $4
    total += revenue
}
END {
    printf "Total Revenue: $%.2f\n", total
}' /tmp/sales.csv

Example 2: Revenue by Region

awk -F, 'NR > 1 {
    region_rev[$1] += $3 * $4
}
END {
    for (region in region_rev) {
        printf "%-10s $%10.2f\n", region, region_rev[region]
    }
}' /tmp/sales.csv

Example 3: Revenue by Product

awk -F, 'NR > 1 {
    prod_qty[$2] += $3
    prod_rev[$2] += $3 * $4
}
END {
    printf "%-12s %8s %12s\n", "Product", "Qty", "Revenue"
    printf "%-12s %8s %12s\n", "-------", "---", "-------"
    for (p in prod_qty) {
        printf "%-12s %8d $%10.2f\n", p, prod_qty[p], prod_rev[p]
    }
}' /tmp/sales.csv

Example 4: Parse ps Output for Top Memory Users

ps aux | awk 'NR > 1 {
    mem[$11] += $4
}
END {
    for (proc in mem) {
        if (mem[proc] > 0.5) {
            printf "%6.1f%%  %s\n", mem[proc], proc
        }
    }
}' | sort -rn | head -10

Example 5: Log Analysis

cat > /tmp/access.log << 'LOG'
10.0.0.1 - - [10/Mar/2025:14:00:01] "GET /api/users HTTP/1.1" 200 1234 0.045
10.0.0.2 - - [10/Mar/2025:14:00:02] "POST /api/orders HTTP/1.1" 201 567 0.230
10.0.0.1 - - [10/Mar/2025:14:00:03] "GET /api/users HTTP/1.1" 200 1234 0.038
10.0.0.3 - - [10/Mar/2025:14:00:04] "GET /api/products HTTP/1.1" 200 8901 0.120
10.0.0.2 - - [10/Mar/2025:14:00:05] "GET /api/users HTTP/1.1" 200 1234 0.042
10.0.0.1 - - [10/Mar/2025:14:00:06] "POST /api/orders HTTP/1.1" 500 234 1.500
10.0.0.4 - - [10/Mar/2025:14:00:07] "GET /api/products HTTP/1.1" 200 8901 0.115
10.0.0.1 - - [10/Mar/2025:14:00:08] "GET /health HTTP/1.1" 200 2 0.001
LOG

# Average response time per endpoint
awk '{
    endpoint = $6
    time = $NF
    count[endpoint]++
    total[endpoint] += time
}
END {
    printf "%-20s %8s %10s\n", "Endpoint", "Requests", "Avg Time"
    printf "%-20s %8s %10s\n", "--------", "--------", "--------"
    for (ep in count) {
        printf "%-20s %8d %10.3fs\n", ep, count[ep], total[ep]/count[ep]
    }
}' /tmp/access.log

Example 6: Status Code Summary

awk '{
    codes[$8]++
}
END {
    for (code in codes) {
        printf "HTTP %s: %d requests\n", code, codes[code]
    }
}' /tmp/access.log | sort

Associative Arrays

awk has built-in associative arrays (similar to dictionaries/hash maps). You have already seen them in the examples above. Here are the details:

# Arrays are created by use
awk 'BEGIN {
    fruits["apple"] = 5
    fruits["banana"] = 3
    fruits["cherry"] = 8

    # Iterate over keys
    for (key in fruits) {
        print key, fruits[key]
    }

    # Check if key exists
    if ("apple" in fruits) {
        print "We have apples!"
    }

    # Delete an element
    delete fruits["banana"]

    # Length of array (GNU awk)
    print "Items:", length(fruits)
}'

Counting Pattern: The Most Common Use

# Count words in a file
awk '{
    for (i = 1; i <= NF; i++) {
        words[tolower($i)]++
    }
}
END {
    for (w in words) {
        printf "%5d %s\n", words[w], w
    }
}' /tmp/sed-lab.txt 2>/dev/null | sort -rn | head -10

Useful Built-In Functions

String Functions

# length() -- string length
echo "hello" | awk '{print length($0)}'   # 5

# substr() -- substring
echo "Hello World" | awk '{print substr($0, 7)}'   # World
echo "Hello World" | awk '{print substr($0, 1, 5)}'   # Hello

# index() -- find substring position
echo "Hello World" | awk '{print index($0, "World")}'   # 7

# split() -- split string into array
echo "a:b:c:d" | awk '{n = split($0, arr, ":"); for(i=1;i<=n;i++) print arr[i]}'

# toupper() / tolower()
echo "Hello World" | awk '{print toupper($0)}'   # HELLO WORLD
echo "Hello World" | awk '{print tolower($0)}'   # hello world

# gsub() -- global substitution (returns count of replacements)
echo "aabaa" | awk '{gsub(/a/, "X"); print}'   # XXbXX

# sub() -- substitute first occurrence only
echo "aabaa" | awk '{sub(/a/, "X"); print}'   # Xabaa

# match() -- regex match (sets RSTART and RLENGTH)
echo "Error at line 42" | awk '{match($0, /[0-9]+/); print substr($0, RSTART, RLENGTH)}'
# 42

Numeric Functions

awk 'BEGIN {
    print int(3.9)        # 3
    print sqrt(144)       # 12
    print log(2.718)      # ~1
    print sin(3.14159)    # ~0
    print rand()          # random 0-1
    srand()               # seed random number generator
}'

Think About It: Why does awk use gsub and sub for substitution instead of using the s/// syntax like sed? Think about awk's design as a programming language versus sed's design as a stream editor.


Debug This: awk Not Splitting Fields Correctly

You parse a CSV file and the fields seem wrong:

echo 'Alice,"New York, NY",30' | awk -F, '{print "Name:", $1, "City:", $2}'
# Name: Alice City: "New York

Problem: awk's -F, does not handle quoted CSV fields. The comma inside the quotes is treated as a field separator.

Solutions:

  1. Use FPAT (GNU awk) to define what a field looks like instead of what separates fields:
echo 'Alice,"New York, NY",30' | awk -v FPAT='([^,]*)|("[^"]*")' '{
    print "Name:", $1
    print "City:", $2
    print "Age:", $3
}'
  2. For serious CSV work, use a dedicated CSV tool like csvtool, mlr (Miller), or a few lines of Python with its csv module.

What Just Happened?

+------------------------------------------------------------------+
|                         CHAPTER 22 RECAP                         |
+------------------------------------------------------------------+
|                                                                  |
|  - awk structure: pattern { action }                             |
|  - Fields: $1, $2, ..., $NF (automatic splitting)                |
|  - -F sets the field separator                                   |
|  - BEGIN runs before input; END runs after all input             |
|  - NR = line number, NF = number of fields                       |
|  - printf for formatted output (%-10s, %6d, %.2f)                |
|  - Associative arrays for counting and grouping                  |
|  - Built-in: length, substr, split, gsub, toupper, tolower       |
|  - Comparisons: $3 > 100, $1 == "root", $7 ~ /bash/              |
|  - awk is ideal for: column extraction, aggregation,             |
|    reformatting structured text, and simple reporting            |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: System Report

Write an awk command that parses df -h output and prints only filesystems that are more than 50% full, formatted as a clean table.

Exercise 2: CSV Analysis

Using /tmp/sales.csv, write awk commands to:

  • Find the region with the highest total revenue
  • Find the product with the highest average price
  • Generate a formatted report with headers, data, and totals

Exercise 3: Log Parser

Using /tmp/access.log, write an awk program that:

  • Counts requests per IP address
  • Identifies the slowest request (highest response time)
  • Calculates the total bytes transferred
  • Reports the percentage of 5xx errors

Exercise 4: /etc/passwd Analysis

Using awk, produce a report showing:

  • Total number of users
  • Number of users with bash as their shell
  • Number of system accounts (UID < 1000)
  • Number of regular accounts (UID >= 1000)
  • The user with the highest UID

Bonus Challenge

Write an awk program that reads /etc/passwd and generates a properly formatted HTML table with columns for Username, UID, GID, Home Directory, and Shell. Include a header row and alternating row colors using inline CSS.

Text Processing Toolkit

Why This Matters

The power of Linux lies not in any single tool, but in how tools combine. Each utility in this chapter does one thing well: sort sorts, uniq deduplicates, cut extracts columns, tr translates characters. Alone, each is simple. Piped together, they become a data processing pipeline that can rival purpose-built programs.

Need to find the top 10 most active IP addresses in a web server log? That is awk, sort, uniq -c, and head piped together. Need to compare two configuration files to see what changed? That is diff. Need to run a command on every file matching a pattern? That is xargs.

This chapter is your reference and training ground for the essential text processing utilities. Master these tools and their combinations, and you will solve most data problems without ever writing a script.


Try This Right Now

# Create a sample data file
cat > /tmp/toolkit-data.txt << 'DATA'
banana
apple
cherry
banana
date
apple
elderberry
banana
fig
apple
cherry
date
grape
DATA

# Sort, count duplicates, show top 3
sort /tmp/toolkit-data.txt | uniq -c | sort -rn | head -3
#   3 banana
#   3 apple
#   2 cherry

# One-liner: find the 5 most common words in a file
tr -s '[:space:]' '\n' < /etc/services | tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -rn | head -5

sort: Ordering Lines

sort arranges lines in order. It is far more powerful than just alphabetical sorting.

# Alphabetical sort (default)
sort /tmp/toolkit-data.txt

# Reverse order
sort -r /tmp/toolkit-data.txt

# Numeric sort (-n)
echo -e "10\n2\n100\n20\n1" | sort -n
# 1  2  10  20  100

# Without -n, "10" comes before "2" (lexicographic)
echo -e "10\n2\n100\n20\n1" | sort
# 1  10  100  2  20

# Human-readable numeric sort (-h): handles K, M, G suffixes
echo -e "1G\n500M\n2G\n100K" | sort -h
# 100K  500M  1G  2G

# Sort by specific field
echo -e "Alice 30\nBob 25\nCharlie 35" | sort -k2 -n
# Bob 25  Alice 30  Charlie 35

# Sort by multiple keys
echo -e "A 3\nB 1\nA 1\nB 3" | sort -k1,1 -k2,2n
# A 1  A 3  B 1  B 3

# Remove duplicates while sorting (-u)
echo -e "banana\napple\nbanana\ncherry" | sort -u
# apple  banana  cherry

# Case-insensitive sort (-f)
echo -e "Banana\napple\nCherry" | sort -f
# apple  Banana  Cherry

# Sort CSV by 3rd column (comma-separated)
sort -t, -k3 -n file.csv

Key Specification: -k

The -k flag specifies which field to sort on:

# -k2          Sort on field 2 through end of line
# -k2,2        Sort on field 2 only
# -k2,2n       Sort on field 2, numerically
# -k2,2nr      Sort on field 2, numerically, reversed
# -k1,1 -k3,3n Sort on field 1 (alpha), then field 3 (numeric)
# Practical: sort /etc/passwd by UID (field 3)
sort -t: -k3,3n /etc/passwd | head -5

uniq: Removing Duplicates

uniq removes adjacent duplicate lines. This means you almost always need to sort first.

# Remove adjacent duplicates (sort first!)
sort /tmp/toolkit-data.txt | uniq

# Count occurrences (-c)
sort /tmp/toolkit-data.txt | uniq -c
#   3 apple
#   3 banana
#   2 cherry
#   2 date
#   1 elderberry
#   1 fig
#   1 grape

# Show only duplicated lines (-d)
sort /tmp/toolkit-data.txt | uniq -d
# apple  banana  cherry  date

# Show only unique lines (appearing exactly once) (-u)
sort /tmp/toolkit-data.txt | uniq -u
# elderberry  fig  grape

# Case-insensitive (-i) -- keeps the first line of each group
echo -e "Apple\napple\nAPPLE" | sort | uniq -i
# Apple   (which variant survives depends on your locale's sort order)

The sort | uniq -c | sort -rn pattern is so common it deserves its own shorthand in your memory:

# "Count and rank" pattern -- you will use this constantly
some_command | sort | uniq -c | sort -rn | head -10

cut: Extracting Columns

cut extracts specific columns or fields from each line.

# Extract by character position
echo "Hello World" | cut -c1-5
# Hello

echo "Hello World" | cut -c7-
# World

# Extract by delimiter and field
echo "root:x:0:0:root:/root:/bin/bash" | cut -d: -f1
# root

echo "root:x:0:0:root:/root:/bin/bash" | cut -d: -f1,7
# root:/bin/bash

echo "root:x:0:0:root:/root:/bin/bash" | cut -d: -f1,3-5
# root:0:0:root

# Extract from CSV
echo "Alice,30,Engineering" | cut -d, -f1,3
# Alice,Engineering

Practical Uses

# Get all usernames
cut -d: -f1 /etc/passwd

# Get all shells in use
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn

# Extract columns from space-delimited output
# (cut works poorly with multiple spaces -- use awk instead)
df -h | cut -c1-20,45-

Think About It: When would you choose cut over awk for extracting fields? Hint: think about when the input is cleanly delimited versus when fields are separated by variable whitespace.
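You can see the trade-off directly -- with repeated spaces, every space is a separator to cut, so some fields come out empty, while awk collapses whitespace runs by default:

```shell
# cut: repeated spaces create empty fields, so field 2 is empty here
echo "alpha   beta" | cut -d' ' -f2
# (empty line)

# awk: runs of whitespace count as a single separator
echo "alpha   beta" | awk '{print $2}'
# beta
```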


paste: Merging Lines Side by Side

paste joins lines from multiple files or merges consecutive lines.

# Merge two files side by side
echo -e "Alice\nBob\nCharlie" > /tmp/names.txt
echo -e "30\n25\n35" > /tmp/ages.txt
paste /tmp/names.txt /tmp/ages.txt
# Alice	30
# Bob	25
# Charlie	35

# Custom delimiter
paste -d, /tmp/names.txt /tmp/ages.txt
# Alice,30
# Bob,25
# Charlie,35

# Merge all lines into one (serial mode)
echo -e "one\ntwo\nthree" | paste -sd,
# one,two,three

# Merge every N lines (using - as stdin placeholder)
echo -e "1\n2\n3\n4\n5\n6" | paste - - -
# 1	2	3
# 4	5	6

rm /tmp/names.txt /tmp/ages.txt

tr: Translating Characters

tr translates (replaces) or deletes characters. It works on characters, not strings.
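The phrase "characters, not strings" means the two sets are matched position by position: tr 'ab' 'xy' maps every a to x and every b to y; it does not replace the string "ab" with "xy":

```shell
# tr maps each character independently
echo "abba" | tr 'ab' 'xy'
# xyyx

# sed, by contrast, matches the string "ab" as a unit
echo "abba" | sed 's/ab/xy/'
# xyba
```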

# Replace lowercase with uppercase
echo "hello world" | tr 'a-z' 'A-Z'
# HELLO WORLD

# Replace uppercase with lowercase
echo "HELLO WORLD" | tr 'A-Z' 'a-z'
# hello world

# Replace spaces with newlines (one word per line)
echo "one two three" | tr ' ' '\n'
# one
# two
# three

# Squeeze repeated characters (-s)
echo "hello     world" | tr -s ' '
# hello world

# Delete characters (-d)
echo "Hello, World! 123" | tr -d '[:digit:]'
# Hello, World!

echo "Hello, World! 123" | tr -d '[:punct:]'
# Hello World 123

# Replace non-alphanumeric with underscores
echo "file name (2).txt" | tr -c '[:alnum:].\n' '_'
# file_name__2_.txt

# Squeeze multiple newlines into one
cat file_with_blanks.txt | tr -s '\n'

# Remove carriage returns (Windows line endings)
tr -d '\r' < windows-file.txt > unix-file.txt

Character Classes for tr

ClassCharacters
[:alpha:]Letters
[:digit:]Digits
[:alnum:]Letters and digits
[:upper:]Uppercase
[:lower:]Lowercase
[:space:]Whitespace
[:punct:]Punctuation

wc: Counting

wc (word count) counts lines, words, and characters.

# All three counts
wc /etc/hosts
#   12   35  338 /etc/hosts
#  lines words bytes

# Lines only (-l)
wc -l /etc/passwd
# 35 /etc/passwd

# Words only (-w)
wc -w /etc/hosts

# Characters only (-c for bytes, -m for characters)
wc -c /etc/hosts
wc -m /etc/hosts

# Count from a pipeline
ps aux | wc -l

# Multiple files
wc -l /etc/passwd /etc/group /etc/hosts

head and tail: Beginning and End

# First 10 lines (default)
head /etc/passwd

# First N lines
head -n 5 /etc/passwd
head -5 /etc/passwd          # Shorthand

# All but the last N lines
head -n -5 /etc/passwd       # Everything except last 5

# Last 10 lines (default)
tail /etc/passwd

# Last N lines
tail -n 5 /etc/passwd
tail -5 /etc/passwd          # Shorthand

# Starting from line N
tail -n +5 /etc/passwd       # From line 5 to end

# Follow a file in real time (-f)
tail -f /var/log/syslog

# Follow and retry if file is recreated (-F)
tail -F /var/log/nginx/access.log

Extracting a Range of Lines

# Lines 10-20 of a file
sed -n '10,20p' file

# Or with head and tail
head -20 file | tail -11

tee: Split Output

tee writes output to both stdout and one or more files:

# Save output while also displaying it
ls -la /etc | tee /tmp/etc-listing.txt

# Append instead of overwrite
echo "new entry" | tee -a /tmp/log.txt

# Write to multiple files
echo "data" | tee file1.txt file2.txt file3.txt

# Use in a pipeline (save intermediate results)
ps aux | tee /tmp/all-processes.txt | grep nginx | tee /tmp/nginx-processes.txt | wc -l

diff: Comparing Files

diff shows the differences between two files.

# Create two similar files
echo -e "line 1\nline 2\nline 3" > /tmp/file1.txt
echo -e "line 1\nline TWO\nline 3\nline 4" > /tmp/file2.txt

# Normal diff
diff /tmp/file1.txt /tmp/file2.txt
# 2c2
# < line 2
# ---
# > line TWO
# 3a4
# > line 4

# Unified diff (-u) -- most readable format
diff -u /tmp/file1.txt /tmp/file2.txt
# --- /tmp/file1.txt
# +++ /tmp/file2.txt
# @@ -1,3 +1,4 @@
#  line 1
# -line 2
# +line TWO
#  line 3
# +line 4

# Side-by-side (-y)
diff -y /tmp/file1.txt /tmp/file2.txt
# line 1            line 1
# line 2          | line TWO
# line 3            line 3
#                 > line 4

# Just tell me if they differ (exit code)
diff -q /tmp/file1.txt /tmp/file2.txt
# Files /tmp/file1.txt and /tmp/file2.txt differ

# Recursive diff on directories
diff -r /etc/ssh/ /tmp/ssh-backup/

# Color diff (if available)
diff --color /tmp/file1.txt /tmp/file2.txt

rm /tmp/file1.txt /tmp/file2.txt

comm: Compare Sorted Files

comm compares two sorted files and shows three columns:

  1. Lines only in file 1
  2. Lines only in file 2
  3. Lines in both files
echo -e "apple\nbanana\ncherry" > /tmp/a.txt
echo -e "banana\ncherry\ndate" > /tmp/b.txt

comm /tmp/a.txt /tmp/b.txt
# apple
# 		banana
# 		cherry
# 	date

# Show only lines unique to file 1
comm -23 /tmp/a.txt /tmp/b.txt
# apple

# Show only lines unique to file 2
comm -13 /tmp/a.txt /tmp/b.txt
# date

# Show only lines in common
comm -12 /tmp/a.txt /tmp/b.txt
# banana
# cherry

rm /tmp/a.txt /tmp/b.txt

join: Database-Style Joins

join merges two sorted files on a common field, like an SQL JOIN:

echo -e "1 Alice\n2 Bob\n3 Charlie" > /tmp/users.txt
echo -e "1 Engineering\n2 Marketing\n3 Engineering" > /tmp/depts.txt

join /tmp/users.txt /tmp/depts.txt
# 1 Alice Engineering
# 2 Bob Marketing
# 3 Charlie Engineering

# Join on different fields
echo -e "Alice 1\nBob 2\nCharlie 3" > /tmp/users2.txt
join -1 2 -2 1 /tmp/users2.txt /tmp/depts.txt
# 1 Alice Engineering
# 2 Bob Marketing
# 3 Charlie Engineering

rm /tmp/users.txt /tmp/depts.txt /tmp/users2.txt

xargs: Building Commands from Input

xargs reads items from stdin and passes them as arguments to a command. It is the bridge between output and execution.

# Basic: pass lines as arguments
echo -e "file1.txt\nfile2.txt\nfile3.txt" | xargs ls -l

# Find and delete (safer than find -delete)
find /tmp -name "*.bak" -print | xargs rm -v

# Null-delimited (handles spaces in filenames)
find /tmp -name "*.log" -print0 | xargs -0 ls -l

# Run command for each item individually (-I)
echo -e "alice\nbob\ncharlie" | xargs -I {} echo "Hello, {}!"
# Hello, alice!
# Hello, bob!
# Hello, charlie!

# Limit number of arguments per command (-n)
echo -e "1\n2\n3\n4\n5\n6" | xargs -n 2 echo
# 1 2
# 3 4
# 5 6

# Parallel execution (-P)
echo -e "1\n2\n3\n4" | xargs -P 4 -I {} sh -c 'sleep 1; echo "Done: {}"'
# All four complete in ~1 second instead of ~4

# Prompt before executing (-p)
echo "important-file.txt" | xargs -p rm

# Practical: grep across files found by find
find /etc -name "*.conf" -print0 2>/dev/null | xargs -0 grep -l "port" 2>/dev/null

WARNING: Without -print0 and -0, xargs breaks on filenames with spaces, quotes, or backslashes. Always use the null-delimiter pair for robustness.
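A quick way to see the failure mode, using a throwaway directory (the path is illustrative):

```shell
mkdir -p /tmp/xargs-demo && cd /tmp/xargs-demo
touch "plain.log" "has space.log"

# Broken: "has space.log" arrives at ls as two separate arguments,
# so ls complains about files named "./has" and "space.log"
find . -name "*.log" | xargs ls

# Robust: null-delimited from producer to consumer
find . -name "*.log" -print0 | xargs -0 ls

cd - && rm -rf /tmp/xargs-demo
```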


Hands-On: Combining Tools

The real power emerges when you combine these tools in pipelines.

Setup

cat > /tmp/weblog.txt << 'LOG'
10.0.0.1 GET /index.html 200 1234
10.0.0.2 POST /api/login 200 567
10.0.0.1 GET /api/users 200 8901
10.0.0.3 GET /index.html 200 1234
10.0.0.2 GET /api/users 404 123
10.0.0.1 POST /api/orders 500 234
10.0.0.4 GET /index.html 200 1234
10.0.0.1 GET /api/users 200 8901
10.0.0.2 GET /api/products 200 5678
10.0.0.3 POST /api/orders 201 890
10.0.0.1 GET /api/users 200 8901
10.0.0.5 GET /index.html 200 1234
10.0.0.2 DELETE /api/users/42 403 98
10.0.0.1 GET /favicon.ico 404 0
LOG

Pipeline 1: Top 5 IP Addresses by Request Count

awk '{print $1}' /tmp/weblog.txt | sort | uniq -c | sort -rn | head -5

Output:

      6 10.0.0.1
      4 10.0.0.2
      2 10.0.0.3
      1 10.0.0.5
      1 10.0.0.4

Pipeline 2: Most Requested Endpoints

awk '{print $3}' /tmp/weblog.txt | sort | uniq -c | sort -rn

Pipeline 3: Error Requests (4xx and 5xx)

awk '$4 >= 400 {print $4, $1, $2, $3}' /tmp/weblog.txt | sort

Pipeline 4: Total Bytes Transferred by Endpoint

awk '{bytes[$3] += $5} END {for(ep in bytes) print bytes[ep], ep}' /tmp/weblog.txt | sort -rn

Pipeline 5: Unique IPs per Endpoint

awk '{print $3, $1}' /tmp/weblog.txt | sort -u | awk '{print $1}' | sort | uniq -c | sort -rn

Pipeline 6: Find Large Files and Their Total Size

find /var/log -type f -name "*.log" -print0 2>/dev/null | xargs -0 du -sh 2>/dev/null | sort -rh | head -10

Think About It: Look at Pipeline 5. Why does it need three separate sort invocations? What would happen if we removed the first sort -u?


Debug This: Pipeline Producing Wrong Results

You try to count how many users use each login shell in /etc/passwd:

cut -d: -f7 /etc/passwd | uniq -c

The output shows every shell with a count of 1, which is wrong. You know /bin/bash appears multiple times.

Problem: uniq only removes adjacent duplicates. Without sorting first, it compares each line only to the previous line.

Fix:

cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn

Now you see the correct counts.
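
The adjacency rule is easy to demonstrate with three lines of input:

```shell
# The two "a" lines are separated by "b", so uniq keeps both
printf 'a\nb\na\n' | uniq | wc -l
# 3

# Sorting first makes the duplicates adjacent, so uniq collapses them
printf 'a\nb\na\n' | sort | uniq | wc -l
# 2
```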


Quick Reference

+------------------------------------------------------------------+
|  TOOL        | PURPOSE            | KEY FLAGS                    |
+------------------------------------------------------------------+
|  sort        | Order lines        | -n, -r, -k, -t, -u, -h       |
|  uniq        | Deduplicate        | -c, -d, -u, -i               |
|  cut         | Extract columns    | -d, -f, -c                   |
|  paste       | Merge lines        | -d, -s                       |
|  tr          | Translate chars    | -d, -s, -c                   |
|  wc          | Count              | -l, -w, -c, -m               |
|  head        | First N lines      | -n                           |
|  tail        | Last N lines       | -n, -f, -F, +N               |
|  tee         | Split output       | -a                           |
|  diff        | Compare files      | -u, -y, -r, -q               |
|  comm        | Compare sorted     | -1, -2, -3, -12, -23, -13    |
|  join        | Merge on field     | -1, -2, -t                   |
|  xargs       | Build commands     | -I, -0, -n, -P, -p           |
+------------------------------------------------------------------+

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 23 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  - sort | uniq -c | sort -rn is the "count and rank" pattern    |
|  - cut extracts columns by delimiter; awk handles variable      |
|    whitespace better                                             |
|  - tr translates or deletes characters (not strings)            |
|  - diff -u shows differences in unified format                   |
|  - xargs converts stdin to command arguments                     |
|  - Always use find -print0 | xargs -0 for safe file handling   |
|  - tee saves output while passing it through the pipeline       |
|  - paste merges files or lines side by side                     |
|  - comm compares sorted files (unique to each, common to both)  |
|  - The real power is in combining tools with pipes              |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Word Frequency

Take any text file (like /usr/share/common-licenses/GPL-3 if available, or download one) and find the 20 most frequently used words. Use tr, sort, uniq, and head.

Exercise 2: Log Analysis Pipeline

Using /tmp/weblog.txt:

  • Find the IP that made the most POST requests
  • Find the endpoint with the highest error rate (4xx/5xx)
  • Calculate the average bytes per request

Exercise 3: Comparing Configurations

Copy /etc/ssh/sshd_config to /tmp/sshd_config_modified. Make three changes to the copy (uncomment a line, change a value, add a new line). Use diff -u to create a patch, then explore comm to see the differences.

Exercise 4: Batch Operations with xargs

Find all .conf files under /etc and use xargs to count the total number of non-comment, non-empty lines across all of them.

find /etc -name "*.conf" -print0 2>/dev/null | \
    xargs -0 grep -vh '^#' 2>/dev/null | \
    grep -v '^$' | wc -l

(The -h flag matters: when grep is given multiple files it prefixes each line with the filename, which would stop the ^# and ^$ patterns from matching.)

Bonus Challenge

Write a single pipeline (no scripts, no temporary files) that reads /etc/passwd and produces a formatted table showing: shells in the first column, the count of users per shell in the second column, and the usernames in the third column (comma-separated). Sort by count, descending. This combines cut, sort, awk, paste, and more.

Scheduling & Automation

Why This Matters

Every system administrator and developer has tasks that need to run on a schedule: database backups at midnight, log rotation every week, security scans every Sunday, certificate renewals before they expire, health checks every five minutes.

You could set a reminder and do these things manually. But you are human -- you will forget, you will be on vacation, you will be asleep. The machine never forgets. The machine never sleeps.

Linux provides several tools for scheduling automated tasks: cron (the classic scheduler), anacron (for machines that are not always on), at (for one-time future tasks), and systemd timers (the modern alternative). This chapter covers all of them, compares their strengths, and teaches you best practices for automation that runs reliably for years.


Try This Right Now

# See what is currently scheduled for your user
crontab -l 2>/dev/null || echo "No crontab for $(whoami)"

# See all system-wide cron jobs
ls /etc/cron.d/ /etc/cron.daily/ /etc/cron.hourly/ /etc/cron.weekly/ /etc/cron.monthly/ 2>/dev/null

# See all active systemd timers
systemctl list-timers --all --no-pager

# Schedule a one-time command for 2 minutes from now (if 'at' is installed)
echo "echo 'Hello from the future!' >> /tmp/at-test.txt" | at now + 2 minutes 2>/dev/null || echo "Install 'at': sudo apt install at"

# Check what's queued
atq 2>/dev/null

cron: The Classic Scheduler

cron is the traditional Unix job scheduler. It has been around since the 1970s, and it is available on every Linux system. The cron daemon (crond or cron) wakes up every minute, checks the schedule, and runs any jobs that are due.

crontab: User Cron Tables

Each user can have their own crontab (cron table):

# View your crontab
crontab -l

# Edit your crontab
crontab -e

# Remove your crontab entirely
crontab -r

WARNING: crontab -r removes your entire crontab without confirmation. If you have important jobs, back them up first: crontab -l > ~/crontab-backup.txt

Distro Note: On some systems, the default editor for crontab -e is vi. To change it: export EDITOR=nano (or add it to your ~/.bashrc).

Crontab Syntax

Each line in a crontab follows this format:

*  *  *  *  *  command-to-run
│  │  │  │  │
│  │  │  │  └── Day of Week   (0-7, 0 and 7 = Sunday)
│  │  │  └───── Month         (1-12)
│  │  └──────── Day of Month  (1-31)
│  └─────────── Hour          (0-23)
└────────────── Minute        (0-59)
+----------------------------------------------------------------+
|  Field         | Range   | Special Values                      |
+----------------------------------------------------------------+
|  Minute        | 0-59    | * (every), */5 (every 5)           |
|  Hour          | 0-23    | * (every), 1-5 (range)             |
|  Day of Month  | 1-31    | * (every), 1,15 (list)             |
|  Month         | 1-12    | * (every), jan-dec                  |
|  Day of Week   | 0-7     | * (every), mon-fri                  |
+----------------------------------------------------------------+

Common Schedules

# Every minute
* * * * * /path/to/script.sh

# Every 5 minutes
*/5 * * * * /path/to/script.sh

# Every hour at minute 0
0 * * * * /path/to/script.sh

# Every day at 2:30 AM
30 2 * * * /path/to/script.sh

# Every Monday at 9 AM
0 9 * * 1 /path/to/script.sh

# Every weekday at 6 PM
0 18 * * 1-5 /path/to/script.sh

# First day of every month at midnight
0 0 1 * * /path/to/script.sh

# Every 15 minutes during business hours (9-17)
*/15 9-17 * * 1-5 /path/to/script.sh

# Twice a day at 8 AM and 8 PM
0 8,20 * * * /path/to/script.sh

# Every Sunday at 3 AM
0 3 * * 0 /path/to/script.sh

# January 1st at midnight
0 0 1 1 * /path/to/script.sh

Special Strings

Some cron implementations support shorthand:

@reboot    /path/to/script.sh    # Run once at startup
@yearly    /path/to/script.sh    # 0 0 1 1 *
@monthly   /path/to/script.sh    # 0 0 1 * *
@weekly    /path/to/script.sh    # 0 0 * * 0
@daily     /path/to/script.sh    # 0 0 * * *
@hourly    /path/to/script.sh    # 0 * * * *

Environment in cron

Cron jobs run in a minimal environment. Your $PATH, aliases, and shell functions are not available. This is the number one source of cron bugs.

# BAD: relies on $PATH to find python3
* * * * * python3 /opt/myapp/script.py

# GOOD: use absolute paths
* * * * * /usr/bin/python3 /opt/myapp/script.py

# Or set PATH in the crontab
PATH=/usr/local/bin:/usr/bin:/bin
* * * * * python3 /opt/myapp/script.py

You can also set other environment variables at the top of the crontab:

# Set environment at top of crontab
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
MAILTO=admin@example.com
HOME=/home/myuser

# Jobs follow
30 2 * * * /opt/scripts/backup.sh
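
Before deploying a job, you can rehearse it under a stripped-down environment similar to cron's. This is a sketch, not an exact replica -- the variables cron actually sets vary by implementation:

```shell
# env -i starts from an empty environment; add back only what cron provides
env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin \
    /bin/sh -c 'echo "PATH=$PATH"; command -v python3 || echo "python3 not on this PATH"'
```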

Think About It: Why does cron use a minimal environment instead of sourcing the user's shell profile? Think about what could go wrong if cron inherited a user's full interactive environment.


Hands-On: Setting Up a Cron Job

Step 1: Create a Script

mkdir -p ~/scripts
cat > ~/scripts/system-snapshot.sh << 'SCRIPT'
#!/bin/bash
set -euo pipefail

LOGFILE="/tmp/system-snapshots.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

{
    echo "=== System Snapshot: $TIMESTAMP ==="
    echo "Uptime: $(uptime -p)"
    echo "Load: $(cat /proc/loadavg)"
    echo "Memory: $(free -h | awk '/Mem:/ {print $3, "/", $2}')"
    echo "Disk: $(df -h / | awk 'NR==2 {print $3, "/", $2, "("$5" used)"}')"
    echo ""
} >> "$LOGFILE"
SCRIPT
chmod +x ~/scripts/system-snapshot.sh

# Test it
~/scripts/system-snapshot.sh
cat /tmp/system-snapshots.log

Step 2: Schedule It

# Add a cron job to run every 15 minutes
(crontab -l 2>/dev/null; echo "*/15 * * * * $HOME/scripts/system-snapshot.sh") | crontab -

# Verify
crontab -l

Step 3: Check Cron Logs

# On Debian/Ubuntu
grep CRON /var/log/syslog | tail -5

# On RHEL/Fedora
grep CRON /var/log/cron | tail -5

# Or via journald
journalctl -u cron.service --since "30 min ago" --no-pager 2>/dev/null || \
journalctl -u crond.service --since "30 min ago" --no-pager 2>/dev/null

Step 4: Clean Up

# Remove just that one job
crontab -l | grep -v 'system-snapshot' | crontab -

# Verify it is gone
crontab -l

System-Wide Cron

Beyond per-user crontabs, there are system-wide cron locations:

/etc/crontab

The system crontab has an extra field -- the username:

# /etc/crontab format:
# min hour dom month dow USER command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || run-parts --report /etc/cron.daily
47 6    * * 7   root    test -x /usr/sbin/anacron || run-parts --report /etc/cron.weekly
52 6    1 * *   root    test -x /usr/sbin/anacron || run-parts --report /etc/cron.monthly

/etc/cron.d/

Individual cron files for system services (same format as /etc/crontab):

ls /etc/cron.d/

Drop-In Directories

Scripts placed in these directories run at the indicated frequency:

/etc/cron.hourly/    # Runs every hour
/etc/cron.daily/     # Runs every day
/etc/cron.weekly/    # Runs every week
/etc/cron.monthly/   # Runs every month

Scripts in these directories must be executable and should not have a .sh extension on some systems (run-parts may skip files with dots in the name).

# List daily cron jobs
ls -la /etc/cron.daily/

# You'll see things like:
# logrotate
# man-db
# apt-compat
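
On Debian-style systems, run-parts --test previews which scripts would run without executing them -- a handy way to verify the no-dots rule (the /tmp/cron-demo directory is just for illustration):

```shell
# Two executable scripts, one with a dot in its name
mkdir -p /tmp/cron-demo
printf '#!/bin/sh\necho hello\n' > /tmp/cron-demo/goodjob
printf '#!/bin/sh\necho hello\n' > /tmp/cron-demo/badjob.sh
chmod +x /tmp/cron-demo/goodjob /tmp/cron-demo/badjob.sh

# Lists goodjob only; badjob.sh is skipped because of the dot
run-parts --test /tmp/cron-demo

rm -r /tmp/cron-demo
```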

anacron: For Machines That Sleep

cron expects the machine to be running 24/7. If a daily job is scheduled for 3 AM and the machine is off at 3 AM, the job simply does not run. anacron fixes this.

anacron tracks when jobs last ran and executes them when the machine comes back online. It is ideal for laptops and desktops that are not always on.

# Check anacron configuration
cat /etc/anacrontab

Typical contents:

# period  delay  job-identifier  command
1         5      cron.daily      run-parts /etc/cron.daily
7         10     cron.weekly     run-parts /etc/cron.weekly
@monthly  15     cron.monthly    run-parts /etc/cron.monthly
+----------------------------------------------------------------+
|  Field           | Meaning                                     |
+----------------------------------------------------------------+
|  period          | How often (in days)                         |
|  delay           | Minutes to wait after boot before running   |
|  job-identifier  | Unique name (timestamp stored in            |
|                  | /var/spool/anacron/)                        |
|  command         | What to run                                 |
+----------------------------------------------------------------+

anacron checks timestamps in /var/spool/anacron/. If a job has not run within its period, anacron runs it (after the specified delay).

# See when jobs last ran
cat /var/spool/anacron/cron.daily
# 20250310

# Force anacron to run all pending jobs
sudo anacron -f

at: One-Time Scheduled Commands

The at command schedules a command to run once at a specific time in the future.

# Install if needed
sudo apt install at        # Debian/Ubuntu
sudo dnf install at        # Fedora/RHEL

Scheduling with at

# Run at a specific time
at 14:30 << 'JOB'
echo "Reminder: meeting in 30 minutes" | mail -s "Meeting" user@example.com
JOB

# Run at a relative time
at now + 30 minutes << 'JOB'
/opt/scripts/cleanup.sh
JOB

# Other time formats
at midnight
at noon
at teatime        # 4:00 PM
at 9:00 AM tomorrow
at now + 2 hours
at 3:00 PM next Friday

Managing at Jobs

# List pending jobs
atq

# View a specific job's commands
at -c 5    # Show job number 5

# Remove a pending job
atrm 5     # Remove job number 5

batch: Run When Load is Low

batch is like at but waits until the system load drops below a threshold:

batch << 'JOB'
/opt/scripts/heavy-processing.sh
JOB

This is useful for resource-intensive tasks that should not interfere with normal operations.


systemd Timers: The Modern Way

systemd timers are a powerful, modern alternative to cron. We introduced them in Chapter 16. Here we go deeper.

Why Use systemd Timers?

+----------------------------------------------------------------------------------------------+
|  Feature               | cron                              | systemd Timers                  |
+----------------------------------------------------------------------------------------------+
|  Logging               | Mail output or redirect manually  | Automatic journal integration   |
|  Dependencies          | None                              | Full systemd dependency system  |
|  Resource control      | None                              | cgroups (CPU, memory limits)    |
|  Randomized delay      | No                                | Yes (prevent thundering herd)   |
|  Persistent (catch-up) | No (anacron needed)               | Built-in with Persistent=true   |
|  Security sandboxing   | No                                | Full systemd sandboxing         |
|  Calendar precision    | Minute-level                      | Second-level or beyond          |
|  Status/debugging      | Check mail/logs                   | systemctl status, list-timers   |
+----------------------------------------------------------------------------------------------+

Creating a systemd Timer

You need two files: a .service (what to run) and a .timer (when to run).

Service file (/etc/systemd/system/disk-report.service):

[Unit]
Description=Generate Disk Usage Report

[Service]
Type=oneshot
ExecStart=/usr/local/bin/disk-report.sh
User=root
StandardOutput=journal
StandardError=journal

Timer file (/etc/systemd/system/disk-report.timer):

[Unit]
Description=Run Disk Report Every 6 Hours

[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target

The script (/usr/local/bin/disk-report.sh):

#!/bin/bash
set -euo pipefail
echo "=== Disk Report: $(date) ==="
df -h | grep -E '^/dev/'
echo ""
du -sh /var/log/ /var/cache/ /tmp/ 2>/dev/null

Make the script executable, then reload systemd and enable the timer:

sudo chmod +x /usr/local/bin/disk-report.sh
sudo systemctl daemon-reload
sudo systemctl enable --now disk-report.timer

OnCalendar Expressions

OnCalendar=*-*-* 02:00:00          # Daily at 2 AM
OnCalendar=Mon *-*-* 09:00:00      # Mondays at 9 AM
OnCalendar=*-*-* 00/6:00:00        # Every 6 hours (0, 6, 12, 18)
OnCalendar=*-*-* *:00/15:00        # Every 15 minutes
OnCalendar=*-*-1 00:00:00          # First of each month
OnCalendar=Mon..Fri *-*-* 08:00:00 # Weekdays at 8 AM
OnCalendar=hourly                   # Shorthand for *-*-* *:00:00
OnCalendar=daily                    # Shorthand for *-*-* 00:00:00
OnCalendar=weekly                   # Shorthand for Mon *-*-* 00:00:00

Validate your schedule:

systemd-analyze calendar "Mon..Fri *-*-* 08:00:00"
systemd-analyze calendar "*-*-* *:00/15:00"
systemd-analyze calendar "daily"

Monotonic Timers (Relative)

Instead of calendar-based, trigger relative to system events:

[Timer]
OnBootSec=15min          # 15 minutes after boot
OnStartupSec=30min       # 30 minutes after systemd started
OnUnitActiveSec=1h       # 1 hour after the service last ran
OnUnitInactiveSec=30min  # 30 minutes after the service became inactive

Managing Timers

# List all timers and their next firing time
systemctl list-timers --all --no-pager

# Check a specific timer
systemctl status disk-report.timer

# Manually trigger the associated service (for testing)
sudo systemctl start disk-report.service

# View the service's output
journalctl -u disk-report.service --no-pager -n 20

# Disable a timer
sudo systemctl disable --now disk-report.timer

Hands-On: Migrating from cron to systemd Timer

Let us convert a common cron job to a systemd timer.

The Original Cron Job

# In crontab: run backup every night at 2 AM
0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1

Step 1: Create the Service

sudo tee /etc/systemd/system/nightly-backup.service << 'UNIT'
[Unit]
Description=Nightly Backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup.sh
User=root
StandardOutput=journal
StandardError=journal
SyslogIdentifier=nightly-backup

# Security hardening
NoNewPrivileges=yes
ProtectHome=read-only
PrivateTmp=yes
UNIT

Step 2: Create the Timer

sudo tee /etc/systemd/system/nightly-backup.timer << 'UNIT'
[Unit]
Description=Run Nightly Backup at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=600

[Install]
WantedBy=timers.target
UNIT

Step 3: Enable and Test

sudo systemctl daemon-reload
sudo systemctl enable --now nightly-backup.timer

# Verify the timer is scheduled
systemctl list-timers nightly-backup.timer --no-pager

# Test by running the service manually
sudo systemctl start nightly-backup.service

# Check the output
journalctl -u nightly-backup.service --no-pager

Step 4: Remove the Old Cron Job

crontab -l | grep -v 'backup.sh' | crontab -

Comparing cron vs systemd Timers

+-----------------------------------------------------------------------------------------+
|  Aspect            | cron                            | systemd Timers                   |
+-----------------------------------------------------------------------------------------+
|  Setup complexity  | One line                        | Two files (service + timer)      |
|  Learning curve    | Familiar, simple                | More to learn initially          |
|  Logging           | Manual (redirect to file/mail)  | Automatic journald               |
|  Error handling    | Check exit code manually        | systemctl status shows failures  |
|  Dependencies      | None                            | Full systemd dependency graph    |
|  Resource limits   | None built-in                   | CPU, memory limits via cgroups   |
|  Catch-up          | Missed = skipped                | Persistent=true catches up       |
|  Debugging         | Read log files                  | systemctl status, journalctl     |
|  User timers       | crontab -e                      | ~/.config/systemd/user/          |
+-----------------------------------------------------------------------------------------+

When to use cron:

  • Simple, quick one-liners
  • Systems without systemd
  • When minimal setup overhead matters

When to use systemd timers:

  • Production services needing reliability
  • Jobs with dependencies on other services
  • When you need resource control or sandboxing
  • When logging and debugging matter

Think About It: You have a job that sends daily reports. It takes about 30 seconds to run. Which would you choose, cron or a systemd timer, and why?


Automation Best Practices

1. Always Log Output

# Cron: redirect both stdout and stderr
0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1

# systemd: automatic, but set SyslogIdentifier
[Service]
StandardOutput=journal
SyslogIdentifier=my-backup

2. Handle Errors Gracefully

#!/bin/bash
set -euo pipefail

cleanup() {
    echo "[$(date)] Script exiting with code $?" >> /var/log/myjob.log
}
trap cleanup EXIT

echo "[$(date)] Starting backup..." >> /var/log/myjob.log
# ... actual work ...
echo "[$(date)] Backup complete." >> /var/log/myjob.log

3. Prevent Overlapping Runs (Locking)

If a job takes longer than its interval, you can end up with multiple instances running simultaneously. Use a lock file:

#!/bin/bash
set -euo pipefail

LOCKFILE="/var/lock/mybackup.lock"

# Use flock for atomic locking
exec 200>"$LOCKFILE"
if ! flock -n 200; then
    echo "Another instance is already running. Exiting."
    exit 1
fi

# Your actual work here
echo "Running backup at $(date)"
sleep 60  # Simulating long-running job
echo "Backup complete at $(date)"

# Lock is automatically released when the script exits

Or use flock directly on the command line:

# In crontab:
*/5 * * * * flock -n /var/lock/myjob.lock /opt/scripts/myjob.sh
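
You can watch the second caller being refused, assuming flock (from util-linux) is available:

```shell
# Hold the lock in a background shell for two seconds...
( flock -n 9 && sleep 2 ) 9>/tmp/demo.lock &

# ...then try to take it while it is still held
sleep 1
flock -n /tmp/demo.lock true && echo "acquired" || echo "lock busy"
# lock busy

wait
rm -f /tmp/demo.lock
```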

4. Use Timeouts

Prevent a job from running forever:

# Cron: use timeout command
0 2 * * * timeout 3600 /opt/scripts/backup.sh

# systemd: built-in timeout
[Service]
TimeoutStartSec=3600
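
timeout comes from GNU coreutils; when the limit is hit it kills the command and exits with status 124, which your wrapper script can check:

```shell
# sleep 5 is killed after 1 second
timeout 1 sleep 5
echo "exit code: $?"
# exit code: 124
```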

5. Monitor Your Jobs

# Create a simple monitoring wrapper
#!/bin/bash
set -uo pipefail   # deliberately no -e: with -e the script would exit on
                   # the first failure, before we could capture the code

JOB_NAME="nightly-backup"
START_TIME=$(date +%s)

EXIT_CODE=0
/opt/scripts/backup.sh || EXIT_CODE=$?

END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))

if [[ $EXIT_CODE -ne 0 ]]; then
    echo "ALERT: $JOB_NAME failed with exit code $EXIT_CODE after ${DURATION}s" | \
        mail -s "Job Failure: $JOB_NAME" admin@example.com
fi

echo "$JOB_NAME completed in ${DURATION}s with exit code $EXIT_CODE"

6. Set MAILTO (cron)

cron can email the output of jobs:

# At the top of crontab
MAILTO=admin@example.com

# Or to disable email for a specific job
0 2 * * * /opt/scripts/backup.sh > /dev/null 2>&1

Debug This: Cron Job Not Running

You set up a cron job but it never fires. Here is your debugging checklist:

  1. Is cron running?

    systemctl status cron.service 2>/dev/null || systemctl status crond.service
    
  2. Is the crontab syntax correct?

    crontab -l
    # Common mistake: 6 fields instead of 5 (user field only in /etc/crontab)
    
  3. Can the script be found and executed?

    # Use absolute paths!
    which python3
    ls -la /opt/scripts/backup.sh
    
  4. Check cron logs:

    grep CRON /var/log/syslog | tail -20
    journalctl -u cron.service --since "1 hour ago" --no-pager
    
  5. Check permissions:

    # Script must be executable
    chmod +x /opt/scripts/backup.sh
    
    # Check /etc/cron.allow and /etc/cron.deny
    cat /etc/cron.allow 2>/dev/null
    cat /etc/cron.deny 2>/dev/null
    
  6. Test the command manually as the cron user:

    # Run with cron's minimal environment
    env -i /bin/bash --noprofile --norc -c '/opt/scripts/backup.sh'
    
  7. Check for environment issues:

    # Add this as the first line in your cron script
    env > /tmp/cron-env-debug.txt
    # Then compare with your interactive environment
    

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 24 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  cron:                                                           |
|  - Classic scheduler, minute-level precision                     |
|  - Syntax: min hour dom month dow command                       |
|  - Use crontab -e to edit, crontab -l to list                  |
|  - Always use absolute paths in cron jobs                       |
|                                                                  |
|  anacron:                                                        |
|  - Catches up on missed jobs after downtime                     |
|  - Ideal for laptops and desktops                               |
|                                                                  |
|  at:                                                             |
|  - One-time future execution                                    |
|  - at now + 30 minutes, at 3:00 PM tomorrow                    |
|                                                                  |
|  systemd timers:                                                 |
|  - Modern, with logging, dependencies, sandboxing               |
|  - OnCalendar for schedules, Persistent for catch-up            |
|  - Two files: .service + .timer                                 |
|                                                                  |
|  Best practices:                                                 |
|  - Always log output                                            |
|  - Use flock to prevent overlapping runs                        |
|  - Set timeouts for long-running jobs                           |
|  - Monitor job success/failure                                  |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Cron Basics

Set up a cron job that runs every 5 minutes and appends the current date and load average to /tmp/load-monitor.log. Let it run for 30 minutes, then check the log. Remove the cron job when done.

Exercise 2: systemd Timer

Convert the cron job from Exercise 1 to a systemd timer. Use OnUnitActiveSec=5min for the timing. Compare the logging experience: which is easier to check for errors?

Exercise 3: Locking

Create two cron jobs that run the same script at the same time (every minute). The script should sleep for 90 seconds. Observe what happens without locking (two instances run simultaneously). Then add flock and verify only one runs at a time.

Exercise 4: Catch-Up Behavior

Create a systemd timer with Persistent=true set to run every hour. Stop the timer for 3 hours, then restart it. Does it catch up on the missed runs? Now try the same with a cron job. What is the difference?

Bonus Challenge

Build a complete automation suite: a systemd timer that runs a script every 6 hours. The script should: (1) check disk usage and warn if any filesystem is over 80%, (2) check for failed systemd services, (3) report the top 5 CPU-consuming processes, and (4) send the report to a log file. Add flock for safety, a timeout of 60 seconds, and proper error handling. Include the service file, timer file, and script.

Vim Essentials

Why This Matters

It is 2 AM. You are SSH'd into a production server that just started throwing 502 errors. The application config has a typo, and the only editors installed on this minimal server image are vi and nano. Your colleague used nano, accidentally saved the file with trailing whitespace that broke the YAML parser, and now the service will not start at all.

Vim (and its predecessor vi) is the one editor you can count on being available on virtually every Unix and Linux system ever built -- from a Raspberry Pi to a mainframe, from a Docker container to a rescue boot disk. Knowing Vim is not about editor wars or productivity hacks. It is about having a reliable, powerful tool in your hands no matter where you land.

Beyond availability, Vim's modal editing model is genuinely efficient once you internalize it. Instead of holding modifier keys for every operation, you switch between modes -- one for navigating, one for inserting text, one for selecting, one for running commands. This separation means your hands rarely leave the home row, and complex edits become composable sequences of simple keystrokes.

You do not need to master Vim today. You need to be competent enough to open a file, make a change, save it, and quit. Everything beyond that is a bonus that will come with practice.


Try This Right Now

Open a terminal and run:

vimtutor

This launches Vim's built-in interactive tutorial. It takes about 30 minutes. If you have never used Vim before, vimtutor is the single best starting point. It teaches by doing.

If vimtutor is not available:

# Debian/Ubuntu
sudo apt install vim

# Fedora/RHEL
sudo dnf install vim-enhanced

# Arch
sudo pacman -S vim

Distro Note: Most minimal installations include vi (often a stripped-down version called vim-tiny or the POSIX vi). The full vim package gives you syntax highlighting, undo history, and all the features covered in this chapter.


Understanding Modal Editing

Most editors you have used -- nano, Notepad, VS Code -- are modeless. Every key you press inserts a character. To do anything else, you hold Ctrl, Alt, or Shift.

Vim is different. It has modes. The mode you are in determines what your keypresses do.

+----------------------------------------------------------+
|                        VIM MODES                         |
+----------------------------------------------------------+
|                                                          |
|                      NORMAL MODE                         |
|                 (navigation, commands)                   |
|                      |       ^                           |
|                i,a,o |       | Esc                       |
|                      v       |                           |
|                      INSERT MODE                         |
|                     (typing text)                        |
|                                                          |
|   From NORMAL:                 From NORMAL:              |
|   v, V, Ctrl-v                 : (colon)                 |
|        |                            |                    |
|        v                            v                    |
|   VISUAL MODE                  COMMAND-LINE MODE         |
|   (selecting text)             (ex commands like :w :q)  |
|        |                            |                    |
|        | Esc                        | Enter or Esc       |
|        v                            v                    |
|   NORMAL MODE                  NORMAL MODE               |
+----------------------------------------------------------+

The Four Essential Modes

Normal Mode -- This is where Vim starts. Your keys are commands, not characters. Pressing j moves the cursor down, not inserting the letter "j". This feels alien at first. Give it time.

Insert Mode -- This is the familiar typing mode. Press i to enter insert mode, type your text, press Esc to return to normal mode.

Visual Mode -- For selecting text. Press v for character-wise, V for line-wise, or Ctrl-v for block (column) selection.

Command-Line Mode -- Press : from normal mode to type commands at the bottom of the screen: saving files, quitting, searching and replacing, running shell commands.

Think About It: Why would separating navigation from text input be an advantage? Consider how much time you spend navigating versus actually typing new text when editing config files.


Opening Files and the Basics

Opening Vim

# Open a file
vim /etc/hostname

# Open a file at a specific line number
vim +42 /etc/nginx/nginx.conf

# Open a file and search for a pattern
vim +/server_name /etc/nginx/nginx.conf

# Open multiple files
vim file1.conf file2.conf

# Open in read-only mode
vim -R /etc/passwd

The Status Line

When you open a file in Vim, look at the bottom of the screen:

"myfile.conf" 42L, 1337C

This tells you the filename, line count, and character count. When you are in different modes, the bottom-left will show:

-- INSERT --          (you are in insert mode)
-- VISUAL --          (you are in visual mode)
-- VISUAL LINE --     (line-wise visual mode)
-- VISUAL BLOCK --    (block visual mode)
:                     (command-line mode)
(nothing)             (normal mode)

Essential Motions (Normal Mode)

In normal mode, you navigate without touching the mouse. Here are the motions you need to know, organized from most basic to most useful.

Character and Line Movement

+------------------------------------------------+
|  Key  | Action                                 |
+------------------------------------------------+
|  h    | Move left one character                |
|  j    | Move down one line                     |
|  k    | Move up one line                       |
|  l    | Move right one character               |
+------------------------------------------------+

Think of it this way: j looks like a down arrow. h is on the left, l is on the right.

You can prefix any motion with a number:

5j      Move down 5 lines
10k     Move up 10 lines
3l      Move right 3 characters

Word Movement

Key     Action
w       Jump to the start of the next word
b       Jump to the start of the previous word
e       Jump to the end of the current/next word
W       Jump to the next WORD (whitespace-delimited)
B       Jump to the previous WORD
E       Jump to the end of the current/next WORD

The difference between w and W: lowercase w treats punctuation as word boundaries. W only treats whitespace as boundaries. So in server_name=localhost, pressing w would stop at _, =, and l. Pressing W would jump over the whole thing.

Line Movement

Key         Action
0           Jump to the first column of the line
^           Jump to the first non-blank character
$           Jump to the end of the line
f{char}     Jump forward to the next occurrence of {char} on this line
F{char}     Jump backward to the previous occurrence of {char}
t{char}     Jump forward to just before {char}

File Movement

Key         Action
gg          Jump to the first line of the file
G           Jump to the last line of the file
42G         Jump to line 42
:42         Jump to line 42 (command-line mode)
Ctrl-d      Scroll down half a page
Ctrl-u      Scroll up half a page
Ctrl-f      Scroll down a full page
Ctrl-b      Scroll up a full page
H           Jump to the top of the screen (High)
M           Jump to the middle of the screen (Middle)
L           Jump to the bottom of the screen (Low)

Hands-On: Practice Navigation

  1. Open the system dictionary file (a large file, perfect for practice):
vim /usr/share/dict/words

If that file does not exist, use any long file, or create one:

seq 1 500 | vim -

  2. Try these commands in sequence:
gg          Go to the top
G           Go to the bottom
50G         Go to line 50
Ctrl-d      Scroll down
Ctrl-u      Scroll up
0           Go to start of line
$           Go to end of line
w           Jump word by word
b           Jump back word by word
  3. Press :q! to quit without saving.

Entering and Leaving Insert Mode

There are several ways to enter insert mode, each placing your cursor differently:

Key     Action
i       Insert before the cursor
I       Insert at the beginning of the line
a       Append after the cursor
A       Append at the end of the line
o       Open a new line below and enter insert mode
O       Open a new line above and enter insert mode

To leave insert mode, press Esc. Some people also use Ctrl-[, which does the same thing and is easier to reach.

Think About It: Why does Vim offer six different ways to enter insert mode? Think about the most common editing patterns: adding to the end of a line (A), starting a new config entry below the current one (o), inserting before a specific character (i).


Editing Commands (Normal Mode)

This is where Vim starts to feel powerful. These commands follow a grammar: operator + motion (or operator + text object).

Basic Editing

Command     Action
x           Delete the character under the cursor
X           Delete the character before the cursor
r{char}     Replace the character under the cursor with {char}
dd          Delete (cut) the entire current line
yy          Yank (copy) the entire current line
p           Paste after the cursor
P           Paste before the cursor
u           Undo the last change
Ctrl-r      Redo (undo the undo)
.           Repeat the last change

The Operator + Motion Grammar

Vim commands are composable. An operator followed by a motion applies that operation over the motion's range:

d       = delete operator
w       = word motion
dw      = delete from cursor to next word

c       = change operator (delete + enter insert mode)
$       = end of line motion
c$      = change from cursor to end of line

y       = yank (copy) operator
gg      = beginning of file
ygg     = yank from cursor to beginning of file

Common combinations:

Command     Action
dw          Delete from cursor to next word
d$ or D     Delete from cursor to end of line
d0          Delete from cursor to beginning of line
dG          Delete from cursor to end of file
dgg         Delete from cursor to beginning of file
cw          Change word (delete word + enter insert mode)
c$ or C     Change to end of line
ci"         Change inside quotes
di(         Delete inside parentheses
da{         Delete around braces (including the braces)
yaw         Yank a word (including surrounding whitespace)

Text Objects

Text objects let you operate on structured chunks of text. They start with i (inside) or a (around):

ciw     Change inside word
daw     Delete a word (word + surrounding whitespace)
ci"     Change inside double quotes
da"     Delete around double quotes (quotes + content)
ci(     Change inside parentheses
da[     Delete around square brackets
cit     Change inside HTML/XML tag

Hands-On: Editing Practice

Create a practice file:

vim /tmp/vim-practice.txt

Press i to enter insert mode and type:

server_name = oldhostname
port = 8080
log_level = debug
database_host = 192.168.1.100
database_port = 5432

Press Esc to return to normal mode. Now practice:

gg          Go to line 1
f=          Jump to the = sign
w           Move to "oldhostname"
cw          Change word -- type "newhostname", then Esc
j           Move down to port line
$           Go to end of line
r0          Replace the last "0" with "0" (no change, just practice)
j           Move to log_level line
fdbug       Oops -- f takes exactly ONE character, so this runs fd, then b, u, g as stray commands

Let us do that last one properly:

j           Move to the log_level line
fd          Jump to "d" in "debug"
cw          Change word -- type "info", then Esc

Save your work with :w and quit with :q (or combine: :wq).
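
If you would rather not retype practice files by hand, you can seed them from the shell with a heredoc. A quick sketch (the filename is just an example):

```shell
# Write the practice file in one go. Quoting 'EOF' prevents the shell
# from expanding variables or backticks inside the heredoc body.
cat > /tmp/vim-practice.txt << 'EOF'
server_name = oldhostname
port = 8080
log_level = debug
EOF

# Verify the contents
cat /tmp/vim-practice.txt
```

Then open it with vim /tmp/vim-practice.txt and jump straight to the editing drills.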


Searching and Replacing

Searching

Command     Action
/pattern    Search forward for "pattern"
?pattern    Search backward for "pattern"
n           Jump to the next match
N           Jump to the previous match
*           Search forward for the word under the cursor
#           Search backward for the word under the cursor

Example: To find all occurrences of "error" in a log file:

/error

Press Enter, then n to cycle through matches.
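
The same search works outside the editor: grep understands a compatible basic-regex syntax, which is handy when you only want the matching lines, not an editing session. A minimal sketch (the log path is just an example):

```shell
# Find lines containing "error", with line numbers (-n) --
# the shell analog of /error plus n-cycling in Vim.
printf 'ok\nerror: disk full\nok\n' > /tmp/app.log
grep -n 'error' /tmp/app.log
# -> 2:error: disk full
```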

Search and Replace

The substitute command uses this syntax:

:[range]s/pattern/replacement/[flags]

Common examples:

:s/old/new/           " Replace first 'old' on the current line
:s/old/new/g          " Replace all 'old' on the current line
:%s/old/new/g         " Replace all 'old' in the entire file
:%s/old/new/gc        " Replace all, but ask for confirmation each time
:10,20s/old/new/g     " Replace all 'old' on lines 10-20
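
Vim's :s inherits its pattern/replacement/flags shape from the classic ed/sed family, so the same substitution can be scripted with sed when you do not need an interactive session. A sketch:

```shell
# Equivalent of :%s/8080/3000/g, applied to a stream:
printf 'port=8080\nport=8080\n' | sed 's/8080/3000/g'
# -> port=3000
# -> port=3000
```

sed prints the result to stdout; add -i (GNU sed) to edit a file in place.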

Hands-On: Search and Replace

vim /tmp/search-practice.txt

Enter insert mode (i) and paste:

server1 host=10.0.0.1 port=8080
server2 host=10.0.0.2 port=8080
server3 host=10.0.0.3 port=8080
server4 host=10.0.0.4 port=9090

Press Esc, then:

/8080           Search for 8080 -- cursor jumps to first match
n               Next match
n               Next match
N               Previous match

:%s/8080/3000/g     Replace all 8080 with 3000
u                   Undo that change

:%s/8080/3000/gc    Replace with confirmation -- press y/n for each

Saving, Quitting, and File Operations

This is the section everyone needs. If you remember nothing else from this chapter, remember these:

Command             Action
:w                  Save (write) the file
:q                  Quit (fails if there are unsaved changes)
:wq                 Save and quit
:x                  Save and quit (only writes if changes were made)
ZZ                  Save and quit (normal mode shortcut for :x)
:q!                 Quit without saving (discard changes)
:w newfile.txt      Save as a different filename
:e otherfile.txt    Open a different file
:r filename         Read and insert the contents of another file
:!command           Run a shell command without leaving Vim
:r !command         Insert the output of a shell command

WARNING: :q! discards ALL unsaved changes without confirmation. Use it deliberately, not out of frustration.

Hands-On: File Operations

vim /tmp/fileops-practice.txt

Enter insert mode and type a few lines, then:

Esc             Return to normal mode
:w              Save the file
:!ls /tmp       Run ls without leaving Vim -- press Enter to return
:r !date        Insert the current date/time into the file
:w backup.txt   Save a copy as backup.txt
:q              Quit

Visual Mode: Selecting Text

Visual mode lets you select text and then operate on it.

Key         Mode            Selection Type
v           Visual          Character-wise selection
V           Visual Line     Whole-line selection
Ctrl-v      Visual Block    Column/rectangular selection

Common Visual Mode Workflow

  1. Enter visual mode (v, V, or Ctrl-v)
  2. Move the cursor to extend the selection
  3. Apply an operator: d (delete), y (yank), c (change), > (indent), < (unindent)

Hands-On: Block Selection

Visual block mode (Ctrl-v) is especially useful for editing columnar data:

vim /tmp/block-practice.txt

Enter insert mode and type:

host1  10.0.0.1   active
host2  10.0.0.2   active
host3  10.0.0.3   active
host4  10.0.0.4   active

Now add a comment character to the beginning of every line:

gg              Go to the first line
Ctrl-v          Enter visual block mode
3j              Select down 4 lines (first + 3 more)
I               Capital I -- insert at block beginning
#               Type the # character
Esc             Press Escape -- the # appears on all selected lines

To remove those comments:

gg              Go to the first line
Ctrl-v          Enter visual block mode
3j              Select down 4 lines
x               Delete the selected column

Registers: Vim's Clipboards

When you delete or yank text, it goes into a register. Vim has many registers, not just one clipboard.

Register    Description
""          Default (unnamed) register -- last delete or yank
"0          Yank register -- last yank only (not deletes)
"a to "z    Named registers -- you control what goes in
"+          System clipboard (if Vim is compiled with +clipboard)
"*          Primary selection (X11 middle-click paste)

Using Named Registers

"ayy        Yank current line into register a
"bdd        Delete current line into register b
"ap         Paste from register a
"bp         Paste from register b
:reg        View all registers and their contents

This is useful when you need to juggle multiple pieces of text.

Distro Note: System clipboard support ("+ register) requires vim-gtk3 on Debian/Ubuntu, vim-X11 on Fedora/RHEL, or gvim on Arch. The terminal vim package often lacks clipboard support. Check with vim --version | grep clipboard.


Macros: Repeating Complex Edits

Macros let you record a sequence of keystrokes and replay them. This is extraordinarily useful for repetitive edits.

Recording and Playing Macros

qa          Start recording into register a
(do stuff)  Your keystrokes are recorded
q           Stop recording
@a          Play back the macro in register a
@@          Repeat the last played macro
5@a         Play the macro 5 times

Hands-On: Macro Editing

Suppose you have a list of hostnames and need to wrap each in quotes and add a comma:

vim /tmp/macro-practice.txt

Enter insert mode and type:

server1
server2
server3
server4
server5

Press Esc, then:

gg          Go to line 1
qa          Start recording into register a
I"          Insert " at beginning of line
Esc         Back to normal mode
A",         Append ", at end of line
Esc         Back to normal mode
j           Move down one line
q           Stop recording
4@a         Play the macro 4 times (for the remaining 4 lines)

Result:

"server1",
"server2",
"server3",
"server4",
"server5",

Think About It: How would you modify this macro to also add a trailing comma on all lines except the last? (Hint: record without j, use a count, then manually edit the last line.)
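
For a one-shot transformation like the quoting above, a stream editor can do the same job. A sed sketch, useful when the file is huge or you are already in a pipeline:

```shell
# Wrap every line in quotes and append a comma, like the recorded macro.
# In the replacement, & stands for the matched text (here, the whole line).
printf 'server1\nserver2\n' | sed 's/.*/"&",/'
# -> "server1",
# -> "server2",
```

The macro wins when the edit is more contextual than a single regex can express; sed wins on bulk and repeatability.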


Configuring Vim: .vimrc Basics

Vim reads its configuration from ~/.vimrc (or ~/.vim/vimrc). Here is a sensible starting configuration:

vim ~/.vimrc
" Basic settings
set nocompatible          " Use Vim defaults, not vi
syntax on                 " Enable syntax highlighting
set number                " Show line numbers
set relativenumber        " Show relative line numbers
set ruler                 " Show cursor position
set showcmd               " Show partial commands
set showmatch             " Highlight matching brackets

" Indentation
set tabstop=4             " Tab width = 4 spaces
set shiftwidth=4          " Indent width = 4 spaces
set expandtab             " Use spaces instead of tabs
set autoindent            " Copy indent from current line
set smartindent           " Smart autoindenting for C-like languages

" Search
set hlsearch              " Highlight search matches
set incsearch             " Incremental search (highlight as you type)
set ignorecase            " Case-insensitive search...
set smartcase             " ...unless you use uppercase letters

" Usability
set wildmenu              " Enhanced command-line completion
set scrolloff=5           " Keep 5 lines visible above/below cursor
set backspace=indent,eol,start  " Make backspace work normally
set mouse=a               " Enable mouse support (optional)

" File handling
set encoding=utf-8        " UTF-8 encoding
set fileformats=unix,dos  " Prefer Unix line endings
set nobackup              " Don't create backup files
set noswapfile            " Don't create swap files (live dangerously)

After saving, reload with :source ~/.vimrc or restart Vim.


Debug This

You SSH into a server to fix an Nginx config. You open the file:

vim /etc/nginx/sites-available/default

You make your changes, then try to save:

:w

You get this error:

E45: 'readonly' option is set (add ! to override)

You try :w! and get:

E212: Can't open file for writing

What is happening? You opened the file as a non-root user and do not have write permission. Here are your options:

Option 1: Save to a temporary file and move it:

:w /tmp/nginx-fix.conf

Then in another terminal: sudo cp /tmp/nginx-fix.conf /etc/nginx/sites-available/default

Option 2: Use the tee trick (write with sudo from within Vim):

:w !sudo tee % > /dev/null

This pipes the buffer through sudo tee, where % is Vim's shorthand for the current filename. You will be prompted for your password.
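
To see why this works, here is the tee mechanism on its own, without sudo: tee reads standard input and writes it to the named file, so when tee runs as root it can write where your unprivileged editor cannot. The demo path is just an example:

```shell
# Buffer contents arrive on tee's stdin; tee writes the file and also
# echoes the data to stdout, which the Vim trick discards with > /dev/null.
printf 'new config\n' | tee /tmp/tee-demo.conf > /dev/null
cat /tmp/tee-demo.conf
# -> new config
```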

Option 3: Open the file with sudo from the start:

sudo vim /etc/nginx/sites-available/default

WARNING: Running sudo vim gives the editor root privileges. If you have untrusted plugins or a complex .vimrc, this can be a security risk. Consider using sudoedit or sudo -e instead, which copies the file, lets you edit the copy, then moves it back.


Quick Reference Card

+---------------------------------------------------------------+
|                      VIM QUICK REFERENCE                      |
+---------------------------------------------------------------+
| MOVEMENT            | EDITING            | FILES              |
| h/j/k/l  arrows     | i    insert before | :w   save          |
| w/b      word jump  | a    append after  | :q   quit          |
| 0/$      line edges | o/O  new line      | :wq  save+quit     |
| gg/G     file edges | dd   delete line   | :q!  force quit    |
| Ctrl-d/u half-page  | yy   copy line     | :e   open file     |
| /pattern search     | p    paste         | :!   shell cmd     |
| n/N      next/prev  | u    undo          |                    |
| *        word search| Ctrl-r redo        |                    |
|                     | .    repeat        |                    |
+---------------------------------------------------------------+
| VISUAL              | MACROS             | SEARCH/REPLACE     |
| v   character       | qa  record to a    | /text  search fwd  |
| V   line            | q   stop record    | ?text  search back |
| C-v block           | @a  play macro     | :%s/a/b/g  replace |
| d/y after select    | @@  repeat last    | :%s/a/b/gc confirm |
+---------------------------------------------------------------+

What Just Happened?

+------------------------------------------------------------------+
|                          CHAPTER RECAP                           |
+------------------------------------------------------------------+
|                                                                  |
|  Vim uses MODAL EDITING: Normal, Insert, Visual, Command-Line.   |
|                                                                  |
|  Normal mode is home base -- navigation and commands.            |
|  Press i/a/o to enter Insert mode, Esc to return.                |
|                                                                  |
|  Commands are COMPOSABLE: operator + motion = action.            |
|    dw = delete word,  c$ = change to end of line                 |
|                                                                  |
|  Search with /pattern, replace with :%s/old/new/g                |
|                                                                  |
|  Save with :w, quit with :q, force quit with :q!                 |
|                                                                  |
|  Visual mode (v/V/Ctrl-v) for selecting, then apply operator.    |
|                                                                  |
|  Macros (qa...q, @a) automate repetitive edits.                  |
|                                                                  |
|  Customize Vim with ~/.vimrc for a better experience.            |
|                                                                  |
+------------------------------------------------------------------+

Try This

  1. Config File Editing: Open /etc/fstab (read-only: vim -R /etc/fstab), practice navigating with motions. Jump to specific lines, search for "uuid", use * on a word.

  2. Macro Challenge: Create a file with 20 lines of key=value pairs. Record a macro that converts key=value to export KEY="value" (uppercase the key, add export and quotes). Apply it to all 20 lines.

  3. Visual Block Power: Create a file with 10 lines of data separated by spaces. Use visual block mode to delete a column, add a column, and reorder columns.

  4. Search and Replace: Download a sample config file (e.g., a default nginx.conf) and practice replacing all IP addresses matching 127.0.0.1 with 0.0.0.0.

  5. Bonus Challenge: Open your ~/.bashrc in Vim. Using only Vim commands (no mouse), navigate to a specific alias, duplicate it, change the duplicate's name, and save. Time yourself. Repeat until you can do it in under 10 seconds.

tmux: Terminal Multiplexing

Why This Matters

You are deploying a critical database migration over SSH to a production server. The script will take 45 minutes to run. Twenty minutes in, your Wi-Fi drops. Your SSH session dies. The migration script -- was it still running? Was it halfway through altering a table? You have no way to know, no way to reconnect to that session, and now you are staring at a potential data corruption scenario.

This exact nightmare is why tmux exists.

tmux (terminal multiplexer) lets you create persistent terminal sessions that survive disconnections. When your SSH connection drops, the tmux session keeps running on the server. You reconnect, reattach, and everything is exactly where you left it -- running processes, command history, output scrollback, all of it.

But tmux is far more than an insurance policy against dropped connections. It lets you split your terminal into multiple panes, manage multiple windows (like tabs), and run several workflows simultaneously in a single SSH connection. Instead of opening five SSH sessions to the same server, you open one and use tmux to organize your workspace.


Try This Right Now

# Install tmux
# Debian/Ubuntu
sudo apt install tmux

# Fedora/RHEL
sudo dnf install tmux

# Arch
sudo pacman -S tmux

# Start a new tmux session
tmux

# You are now inside tmux. Notice the green status bar at the bottom.
# Type a command:
top

# Now detach from the session: press Ctrl-b, then d
# (That means: hold Ctrl, press b, release both, then press d)

# You are back at your regular shell. top is still running inside tmux.
# Reattach:
tmux attach

You just experienced the core superpower of tmux: detach and reattach. The top process kept running even though you "left."


Core Concepts: Sessions, Windows, and Panes

tmux organizes your work in a three-level hierarchy:

+-------------------------------------------------------+
|                   tmux SERVER                          |
|  (background process managing everything)             |
|                                                        |
|  +--------------------------------------------------+ |
|  |  SESSION: "webserver"                             | |
|  |                                                    | |
|  |  +-------------------+  +-------------------+     | |
|  |  | WINDOW 0: "edit"  |  | WINDOW 1: "logs"  |     | |
|  |  |                   |  |                   |     | |
|  |  | +------+--------+ |  | +---------------+ |     | |
|  |  | | PANE | PANE   | |  | |     PANE      | |     | |
|  |  | |  0   |   1    | |  | |      0        | |     | |
|  |  | | vim  | shell  | |  | | tail -f log   | |     | |
|  |  | +------+--------+ |  | +---------------+ |     | |
|  |  +-------------------+  +-------------------+     | |
|  +--------------------------------------------------+ |
|                                                        |
|  +--------------------------------------------------+ |
|  |  SESSION: "database"                              | |
|  |  ...                                               | |
|  +--------------------------------------------------+ |
+-------------------------------------------------------+

Session -- A collection of windows. Think of it as a workspace. You might have a "webserver" session and a "database" session.

Window -- A full-screen view within a session. Like tabs in a browser. Each window has a name shown in the status bar.

Pane -- A subdivision of a window. You can split a window horizontally or vertically into multiple panes, each running its own shell.


The Prefix Key

Almost every tmux command starts with a prefix key combination. The default prefix is:

Ctrl-b

You press Ctrl-b, release it, then press the command key. This two-step process prevents tmux keybindings from interfering with normal typing.

Throughout this chapter, I will write prefix commands as Ctrl-b <key>. For example, Ctrl-b d means: press Ctrl-b, release, press d.

Think About It: Why does tmux use a prefix key instead of direct shortcuts like Ctrl-d for detach? Consider that tmux runs inside a terminal where programs like vim, bash, and python all have their own keybindings. The prefix creates a clean namespace.


Managing Sessions

Creating Sessions

# Start a new unnamed session
tmux

# Start a named session
tmux new-session -s webserver

# Short form
tmux new -s webserver

# Start a session and immediately run a command
tmux new -s monitoring 'htop'

Listing Sessions

# From outside tmux
tmux list-sessions
tmux ls

# Example output:
# database: 1 windows (created Sat Feb 21 10:30:00 2026)
# webserver: 3 windows (created Sat Feb 21 09:15:00 2026) (attached)

The (attached) label tells you which session is currently active.

Attaching and Detaching

# Detach from current session (from inside tmux)
# Ctrl-b d

# Attach to the most recent session
tmux attach
tmux a

# Attach to a specific session
tmux attach -t webserver
tmux a -t database

# Attach to a session, detaching it from any other client first
tmux a -dt webserver

Switching and Killing Sessions

Ctrl-b s        List all sessions and switch interactively
Ctrl-b (        Switch to the previous session
Ctrl-b )        Switch to the next session
# Kill a specific session
tmux kill-session -t database

# Kill all sessions except "webserver"
tmux kill-session -a -t webserver

# Kill the tmux server (destroys everything)
tmux kill-server

WARNING: tmux kill-server destroys ALL sessions, windows, and panes immediately. Any running processes inside tmux will receive SIGHUP and typically terminate. Use this only when you truly want to clean up everything.
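
If you have never watched SIGHUP in action, a quick shell demo (no tmux required) shows what happens to a process that receives it:

```shell
# Start a long-running process, send it SIGHUP, and inspect its fate.
sleep 30 &
pid=$!
kill -HUP "$pid"
wait "$pid" 2>/dev/null

# Exit status 129 = 128 + 1 (SIGHUP's signal number)
echo "exit status: $?"
```

This is exactly why tmux matters: the tmux server, not your terminal, is the parent of your shells, so closing the terminal never sends them this signal.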

Hands-On: Session Management

Let us create a realistic multi-session setup:

# Create a session for web server work
tmux new -s web -d          # -d starts it detached

# Create a session for database work
tmux new -s db -d

# Create a session for monitoring
tmux new -s monitor -d

# List them
tmux ls
# Output:
# db: 1 windows (created ...)
# monitor: 1 windows (created ...)
# web: 1 windows (created ...)

# Attach to the web session
tmux a -t web

# Inside the session, switch between them:
# Ctrl-b s    (shows an interactive list, use arrows and Enter)

# Detach
# Ctrl-b d

# Clean up
tmux kill-session -t db
tmux kill-session -t monitor
tmux kill-session -t web

Managing Windows

Windows are like tabs within a session.

Window Commands

Key Binding     Action
Ctrl-b c        Create a new window
Ctrl-b ,        Rename the current window
Ctrl-b n        Switch to the next window
Ctrl-b p        Switch to the previous window
Ctrl-b 0-9      Switch to window by number
Ctrl-b w        List all windows interactively
Ctrl-b &        Kill the current window (with confirmation)
Ctrl-b l        Toggle to the last active window

The Status Bar

The bottom of the screen shows your windows:

[web] 0:edit* 1:logs- 2:shell                    "hostname" 14:30 21-Feb
  ^     ^       ^       ^                            ^          ^
session current previous other                    hostname     time
name    window  window   window
        (*)     (-)

The * marks the active window. The - marks the previously active window.

Hands-On: Working with Windows

tmux new -s practice

Inside the session:

Ctrl-b c              Create a new window (you're now in window 1)
Ctrl-b ,              Rename it -- type "logs" and press Enter
Ctrl-b c              Create another window (window 2)
Ctrl-b ,              Rename it -- type "editor"
Ctrl-b 0              Switch to window 0
Ctrl-b ,              Rename it -- type "shell"
Ctrl-b w              List all windows -- navigate and select one
Ctrl-b n              Next window
Ctrl-b p              Previous window

Managing Panes

Panes let you see and work with multiple terminals side by side.

Pane Commands

Key Binding     Action
Ctrl-b %        Split the current pane vertically (left/right)
Ctrl-b "        Split the current pane horizontally (top/bottom)
Ctrl-b arrow    Move between panes using arrow keys
Ctrl-b o        Cycle to the next pane
Ctrl-b ;        Toggle to the last active pane
Ctrl-b z        Zoom/unzoom the current pane (full screen toggle)
Ctrl-b x        Kill the current pane (with confirmation)
Ctrl-b {        Swap with the previous pane
Ctrl-b }        Swap with the next pane
Ctrl-b Space    Cycle through pane layouts
Ctrl-b q        Show pane numbers (press number to jump to that pane)

Resizing Panes

Ctrl-b Ctrl-arrow     Resize pane in direction of arrow (1 cell at a time)
Ctrl-b Alt-arrow      Resize pane in direction of arrow (5 cells at a time)

Or use the command line:

Ctrl-b :              Enter command mode
resize-pane -D 10     Resize down by 10 cells
resize-pane -U 5      Resize up by 5 cells
resize-pane -L 10     Resize left by 10 cells
resize-pane -R 10     Resize right by 10 cells

Hands-On: Building a Dashboard

Let us create a practical monitoring layout:

tmux new -s dashboard
# Split vertically (left and right)
Ctrl-b %

# In the right pane, split horizontally (top and bottom)
Ctrl-b "

# Now you have three panes:
# +------------------+------------------+
# |                  |                  |
# |     pane 0       |     pane 1       |
# |                  |                  |
# |                  +------------------+
# |                  |                  |
# |                  |     pane 2       |
# |                  |                  |
# +------------------+------------------+

# Navigate to pane 0 (left)
Ctrl-b Left

# Run a process in each pane:
# Pane 0: system monitoring
htop

# Ctrl-b Right to go to pane 1
Ctrl-b Right
# Run log watching
tail -f /var/log/syslog    # or journalctl -f on systemd systems

# Ctrl-b Down to go to pane 2
Ctrl-b Down
# Free for commands

To zoom into any pane temporarily:

Ctrl-b z         Zoom current pane to full window
Ctrl-b z         Press again to unzoom back to the layout

Copy Mode: Scrolling and Copying Text

By default, you cannot scroll up in tmux using your mouse scroll wheel (unless you enable mouse mode). Instead, tmux has a copy mode.

Entering and Using Copy Mode

Ctrl-b [         Enter copy mode
q                Exit copy mode

In copy mode, you can scroll and navigate:

Arrow keys       Move cursor
Page Up/Down     Scroll by page
g                Go to top of buffer
G                Go to bottom of buffer
/pattern         Search forward
?pattern         Search backward
n                Next search match
N                Previous search match

Copying Text

tmux supports two key binding styles: emacs (default) and vi. With the default emacs bindings:

Ctrl-b [         Enter copy mode
                 Navigate to the start of text you want
Ctrl-Space       Start selection
                 Navigate to the end of text
Alt-w            Copy the selection and exit copy mode (Ctrl-w does the same)
Ctrl-b ]         Paste the copied text

With vi-style bindings (add set-window-option -g mode-keys vi to .tmux.conf):

Ctrl-b [         Enter copy mode
                 Navigate to the start of text
Space            Start selection
                 Navigate to the end
Enter            Copy selection
Ctrl-b ]         Paste

Think About It: When would you use tmux copy mode instead of just piping output to a file? Consider cases where a program's output has already scrolled past and you need to grab something from the scrollback buffer.


Practical Workflows

Workflow 1: SSH + tmux for Remote Work

This is the single most important tmux workflow. It protects you from disconnections.

# SSH into a remote server
ssh user@production-server

# Start a named tmux session
tmux new -s deploy

# Run your long-running deployment
./deploy.sh

# If you need to disconnect (or if your connection drops):
# Ctrl-b d

# Later, reconnect:
ssh user@production-server
tmux a -t deploy
# Everything is exactly as you left it

Pro tip: Always name your sessions when working on remote servers. If you just run tmux, you will end up with multiple unnamed sessions and forget which is which.

Workflow 2: Development Environment

Set up a complete development workspace with one command:

# Create a script to set up your tmux workspace
cat > ~/tmux-dev.sh << 'EOF'
#!/bin/bash
SESSION="dev"

# Create session with first window named "editor"
tmux new-session -d -s $SESSION -n editor

# Window 0: editor
tmux send-keys -t $SESSION:0 'vim .' C-m

# Window 1: server
tmux new-window -t $SESSION -n server
tmux send-keys -t $SESSION:1 'echo "Start your server here"' C-m

# Window 2: logs + shell (split pane)
tmux new-window -t $SESSION -n logs
tmux split-window -h -t $SESSION:2
tmux send-keys -t $SESSION:2.0 'journalctl -f' C-m
tmux send-keys -t $SESSION:2.1 'echo "Shell ready"' C-m

# Window 3: git
tmux new-window -t $SESSION -n git
tmux send-keys -t $SESSION:3 'git status' C-m

# Select the first window
tmux select-window -t $SESSION:0

# Attach to the session
tmux attach -t $SESSION
EOF

chmod +x ~/tmux-dev.sh

Workflow 3: Monitoring Multiple Servers

# Split and SSH to different servers in each pane
tmux new -s servers

# Ctrl-b %   (split vertically)
# Ctrl-b "   (split the right pane horizontally)

# In each pane, SSH to a different server:
# Pane 0: ssh web-server-1
# Pane 1: ssh web-server-2
# Pane 2: ssh db-server-1

# Synchronize input to all panes (type once, execute everywhere):
Ctrl-b :
setw synchronize-panes on

# Now every keystroke goes to all panes simultaneously!
# Type: uptime
# All three servers show their uptime

# Turn it off when done:
Ctrl-b :
setw synchronize-panes off

WARNING: Synchronized panes means EVERY keystroke goes to ALL panes. If you type sudo rm -rf / in synchronized mode, it executes on every server. Be extremely careful. Always turn off synchronization when you are done with the parallel command.


Customizing tmux: .tmux.conf

tmux reads its configuration from ~/.tmux.conf. Here is a practical starting configuration:

vim ~/.tmux.conf
# ============================================
# General Settings
# ============================================

# Use 256 colors
set -g default-terminal "screen-256color"

# Increase scrollback buffer
set -g history-limit 50000

# Start window numbering at 1 (not 0)
set -g base-index 1
setw -g pane-base-index 1

# Renumber windows when one is closed
set -g renumber-windows on

# Reduce escape delay (important for Vim users)
set -sg escape-time 10

# Enable mouse support
set -g mouse on

# ============================================
# Key Bindings
# ============================================

# Remap prefix to Ctrl-a (easier to reach than Ctrl-b) -- uncomment to enable
# unbind C-b
# set -g prefix C-a
# bind C-a send-prefix

# Split panes with | and - (more intuitive)
bind | split-window -h -c "#{pane_current_path}"
bind - split-window -v -c "#{pane_current_path}"
unbind '"'
unbind %

# New windows open in the current directory
bind c new-window -c "#{pane_current_path}"

# Reload config with prefix + r
bind r source-file ~/.tmux.conf \; display "Config reloaded!"

# Use vi keys in copy mode
setw -g mode-keys vi

# ============================================
# Pane Navigation (Vim-style)
# ============================================

bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R

# Resize panes with Shift+h/j/k/l (the -r flag makes the bindings repeatable)
bind -r H resize-pane -L 5
bind -r J resize-pane -D 5
bind -r K resize-pane -U 5
bind -r L resize-pane -R 5

# ============================================
# Status Bar
# ============================================

# Status bar colors
set -g status-style bg=colour235,fg=colour136

# Current window highlight
setw -g window-status-current-style fg=colour166,bold

# Status bar content
set -g status-left '#[fg=colour46]#S #[fg=colour245]| '
set -g status-right '#[fg=colour245]%Y-%m-%d #[fg=colour136]%H:%M '
set -g status-left-length 20

After saving, reload the config:

Ctrl-b :
source-file ~/.tmux.conf

Or if you added the bind r shortcut: Ctrl-b r.

Distro Note: On older systems with tmux < 2.1, some settings like set -g mouse on do not exist. Older versions used separate mouse-select-pane, mouse-resize-pane, and mouse-select-window options. Check your tmux version with tmux -V.
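
If you share one .tmux.conf across machines with different tmux versions, the if-shell command can apply the right option set conditionally. A sketch (the awk version test is an assumption; adapt the threshold to your fleet):

```
# ~/.tmux.conf -- version-aware mouse settings (sketch)
# if-shell runs a shell command, then applies the first tmux command on
# success and the second on failure.
if-shell 'tmux -V | awk "{exit !(\$2 >= 2.1)}"' \
    'set -g mouse on' \
    'set -g mouse-select-pane on; set -g mouse-resize-pane on; set -g mouse-select-window on'
```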


tmux Command Reference

You can run tmux commands from the command line or from within tmux (via Ctrl-b :).

# Session commands
tmux new -s name              Create named session
tmux ls                       List sessions
tmux a -t name                Attach to session
tmux kill-session -t name     Kill session

# Window commands (from command mode Ctrl-b :)
new-window -n name            Create named window
rename-window name            Rename current window
kill-window                   Kill current window

# Pane commands (from command mode)
split-window -h               Split left/right (panes side by side)
split-window -v               Split top/bottom (panes stacked)
resize-pane -D/U/L/R N        Resize pane by N cells

# Misc
list-keys                     Show all key bindings
show-options -g               Show global options
display-message "text"        Show a message

Debug This

You run tmux a and get:

no sessions

But you are sure you started one earlier today. What happened?

Diagnosis:

  1. The tmux server may have been restarted (e.g., system reboot). tmux sessions do not survive reboots by default -- they live in memory.

  2. Someone (or a cron job) may have run tmux kill-server.

  3. The session may have exited naturally. If the last shell in a session exits, the session closes. If you ran tmux new -s work 'python script.py' and the script finished, the session is gone.

Prevention:

  • For long-running work, make sure the session has a regular shell, not just a single command.
  • Use tmux new-session -A -s name, which attaches to the session if it exists and creates it otherwise (tmux 1.8+).
  • Consider tmux session persistence plugins like tmux-resurrect or tmux-continuum that can save and restore sessions across reboots.
  • If your system runs tmux as a systemd service, check it with systemctl status.

What Just Happened?

+------------------------------------------------------------------+
|                          CHAPTER RECAP                           |
+------------------------------------------------------------------+
|                                                                  |
|  tmux creates PERSISTENT terminal sessions that survive          |
|  disconnections -- essential for remote server work.             |
|                                                                  |
|  Hierarchy: Session > Window > Pane                              |
|                                                                  |
|  PREFIX KEY: Ctrl-b (then command key)                           |
|                                                                  |
|  Sessions: new -s name, attach -t name, Ctrl-b d to detach       |
|  Windows:  Ctrl-b c (create), Ctrl-b n/p (next/prev)             |
|  Panes:    Ctrl-b % (v-split), Ctrl-b " (h-split)                |
|            Ctrl-b arrow (navigate), Ctrl-b z (zoom)              |
|                                                                  |
|  Copy mode: Ctrl-b [ to enter, scroll/search/copy                |
|                                                                  |
|  Customize with ~/.tmux.conf                                     |
|                                                                  |
|  Key workflow: SSH -> tmux new -s name -> work -> Ctrl-b d       |
|                SSH -> tmux a -t name -> resume exactly           |
|                                                                  |
+------------------------------------------------------------------+

Try This

  1. Session Persistence Test: Start a tmux session, run ping 8.8.8.8, detach, wait 30 seconds, reattach. Verify the ping is still running and count the packets.

  2. Window Workflow: Create a session with 4 windows named "edit", "build", "test", and "deploy". Practice switching between them using numbers and the interactive list.

  3. Pane Layout: Create a 4-pane layout (2x2 grid) in a single window. Run a different monitoring command in each pane: top, iostat 1, watch df -h, and journalctl -f.

  4. Synchronized Panes: If you have access to multiple machines (or use localhost), set up synchronized panes to run the same command on three "servers" simultaneously.

  5. tmux Scripting: Write a shell script that creates your ideal development environment with named windows, split panes, and commands pre-typed in each pane. Make it idempotent (check if the session exists before creating it).

  6. Bonus Challenge: Customize your ~/.tmux.conf to use Ctrl-a as the prefix key (screen-style), add vim-style pane navigation, and make the status bar show system load. Share your config with a colleague and have them try it.

Git for Ops

Why This Matters

Your colleague made a change to the Nginx configuration last Thursday. It broke something subtle -- response headers are missing, and a downstream service started failing. Nobody remembers exactly what changed, when, or why. The config file has no comments about the modification. You are left diffing mental models and guessing.

Now imagine the same scenario, but the config is tracked in Git. You run git log and see:

commit a3f9e2b  Thu Feb 19 14:32:00 2026
Author: Priya <priya@ops.team>
    Remove X-Request-ID header to fix proxy buffering issue

There it is. You see exactly what changed, when, who did it, and why. You run git revert a3f9e2b, and the header is back. Problem solved in under two minutes.

Git is not just for software developers. For operations engineers, Git is the backbone of infrastructure as code, configuration management, change tracking, collaboration, and disaster recovery. If you manage servers and you are not using Git, you are flying blind.


Try This Right Now

# Install Git
# Debian/Ubuntu
sudo apt install git

# Fedora/RHEL
sudo dnf install git

# Arch
sudo pacman -S git

# Set your identity (Git needs to know who makes each commit)
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Create a practice repository
mkdir ~/git-practice && cd ~/git-practice
git init
echo "Hello, Git!" > README.txt
git add README.txt
git commit -m "Initial commit"

# See what you just did
git log

You have just created a repository, tracked a file, and saved a snapshot of it. Every concept in this chapter builds on this foundation.


How Git Works: The Mental Model

Git is a distributed version control system. But what does that actually mean?

Think of Git as a series of snapshots. Every time you commit, Git takes a picture of all your tracked files at that moment and stores a reference to that snapshot. If a file has not changed, Git does not store it again -- it just links to the previous identical version.

+---------------------------------------------------------+
|                    GIT MENTAL MODEL                     |
+---------------------------------------------------------+
|                                                         |
|  Working       Staging         Repository               |
|  Directory     Area (Index)    (.git/)                  |
|                                                         |
|  +----------+  +----------+  +--------------------+     |
|  |          |  |          |  |  commit 3 (HEAD)   |     |
|  | files    |  | files    |  |     |              |     |
|  | you see  |->| ready to |->|  commit 2          |     |
|  | and edit |  | commit   |  |     |              |     |
|  |          |  |          |  |  commit 1          |     |
|  +----------+  +----------+  +--------------------+     |
|                                                         |
|      git add         git commit                         |
|                                                         |
+---------------------------------------------------------+

Working Directory -- The actual files on disk. This is what you see and edit.

Staging Area (Index) -- A holding area where you prepare changes for the next commit. You choose exactly which changes to include.

Repository (.git/) -- The database of all commits, branches, and history. This lives in the hidden .git/ directory.

Think About It: Why does Git have a staging area instead of just committing all changes? Consider a scenario where you fixed a bug AND reformatted a config file. The staging area lets you commit the bug fix separately from the formatting change, keeping the history clean and meaningful.
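
To make this concrete, here is a throwaway demo (file names and messages are invented) that commits two unrelated edits as two separate commits:

```shell
#!/bin/sh
# Selective staging demo in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # local identity, just for the demo
git config user.name  "Demo"

printf 'port=8080\n' > app.conf
printf 'echo deploy\n' > deploy.sh
git add app.conf deploy.sh
git commit -qm "Baseline"

# Two unrelated edits land in the working directory...
printf 'port=9090\n' > app.conf          # a bug fix
printf 'echo rollback\n' >> deploy.sh    # a new feature

# ...but the staging area lets us commit them independently.
git add app.conf
git commit -qm "Fix listen port"
git add deploy.sh
git commit -qm "Add rollback step to deploy script"

git log --oneline    # three commits, each with a single clear purpose
```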


The Basic Git Workflow

Step 1: Check Status

Always start by understanding the current state:

git status

Output:

On branch main
nothing to commit, working tree clean

This tells you: you are on the main branch, and there are no uncommitted changes.

Step 2: Make Changes

# Create or edit files
echo "server_name=webserver01" > server.conf
echo "port=8080" >> server.conf
echo "log_level=info" >> server.conf

Check status again:

git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        server.conf

nothing added to commit but untracked files present

Git sees the new file but is not tracking it yet.

Step 3: Stage Changes

# Stage a specific file
git add server.conf

# Stage all changes (new, modified, deleted files)
git add .

# Stage only modified and deleted files (not new untracked files)
git add -u
git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   server.conf

The file is now staged (in the index), ready to be committed.

Step 4: Commit

git commit -m "Add initial server configuration"
[main a1b2c3d] Add initial server configuration
 1 file changed, 3 insertions(+)
 create mode 100644 server.conf

Commit messages matter. Write them as if you are explaining the change to a colleague who will read it at 3 AM during an outage. Good: "Fix log rotation to prevent disk full on /var/log". Bad: "update stuff".

Step 5: View History

git log
commit a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
Author: Your Name <your.email@example.com>
Date:   Sat Feb 21 10:00:00 2026 +0000

    Add initial server configuration

commit 1234567890abcdef1234567890abcdef12345678
Author: Your Name <your.email@example.com>
Date:   Sat Feb 21 09:50:00 2026 +0000

    Initial commit

Useful log variations:

# Compact one-line format
git log --oneline

# Show what changed in each commit
git log -p

# Show stats (files changed, insertions, deletions)
git log --stat

# Last 5 commits
git log -5

# Commits by a specific author
git log --author="Priya"

# Commits in the last week
git log --since="1 week ago"

# Search commit messages
git log --grep="nginx"
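
The filters are easiest to appreciate on a repo with more than one author. A disposable demo (names and messages are made up):

```shell
#!/bin/sh
# git log filtering demo: --grep matches messages, --author matches authors.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name="Priya" -c user.email="priya@ops.team" \
    commit -q --allow-empty -m "nginx: raise worker_connections"
git -c user.name="Sam" -c user.email="sam@ops.team" \
    commit -q --allow-empty -m "sshd: disable password auth"

git log --oneline --grep="nginx"    # only the nginx commit
git log --oneline --author="Sam"    # only Sam's commit
```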

Viewing Changes with git diff

git diff is your X-ray vision into what changed.

# Edit the config file
echo "log_level=debug" >> server.conf

# See unstaged changes (working directory vs staging area)
git diff

# Stage the change
git add server.conf

# See staged changes (staging area vs last commit)
git diff --staged

# See changes between any two commits
git diff abc123..def456

# See changes in a specific file
git diff server.conf

# See just the names of changed files
git diff --name-only

Example output:

diff --git a/server.conf b/server.conf
index 1234567..abcdefg 100644
--- a/server.conf
+++ b/server.conf
@@ -1,3 +1,4 @@
 server_name=webserver01
 port=8080
 log_level=info
+log_level=debug

Lines starting with + are additions, - are deletions. The @@ line shows the location in the file.


Undoing Things

Everyone makes mistakes. Git gives you several levels of undo.

Unstage a File

# You staged something you did not mean to
git restore --staged server.conf

# Older syntax (still works)
git reset HEAD server.conf

Discard Working Directory Changes

# Discard changes to a specific file (revert to last committed version)
git restore server.conf

# Older syntax
git checkout -- server.conf

WARNING: git restore <file> permanently discards your uncommitted changes to that file. There is no undo for this operation. The changes are gone.

Amend the Last Commit

# Forgot to include a file, or want to fix the commit message
git add forgotten-file.conf
git commit --amend -m "Add initial server configuration with all files"

Revert a Commit

# Create a new commit that undoes a previous commit
# This is safe -- it preserves history
git revert a1b2c3d

This does not delete the original commit. It creates a new commit that applies the inverse of the original changes. This is the safe way to undo changes, especially on shared branches.
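
A quick throwaway experiment shows that history is preserved (file names and messages are arbitrary):

```shell
#!/bin/sh
# git revert demo: the bad commit stays in history; a new inverse commit lands.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name  "Demo"

printf 'log_level=info\n' > server.conf
git add server.conf && git commit -qm "Set info logging"
printf 'log_level=debug\n' > server.conf
git commit -qam "Enable debug logging"

git revert --no-edit HEAD    # undo the debug change with a NEW commit

cat server.conf              # file is back to log_level=info
git log --oneline            # all three commits remain visible
```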


Branches: Parallel Timelines

Branches let you work on changes in isolation without affecting the main line. This is essential for ops work -- you can test a config change on a branch before applying it.

          main:    A---B---C
                            \
          feature:           D---E

Branch Commands

# List all branches
git branch

# Create a new branch
git branch feature/new-nginx-config

# Switch to a branch
git checkout feature/new-nginx-config

# Create and switch in one command
git checkout -b feature/new-nginx-config

# Modern syntax (Git 2.23+)
git switch -c feature/new-nginx-config

# Delete a branch (after merging)
git branch -d feature/new-nginx-config

# Force delete an unmerged branch
git branch -D feature/abandoned-experiment

Merging Branches

When your changes on a branch are ready, merge them back:

# Switch to the branch you want to merge INTO
git checkout main

# Merge the feature branch
git merge feature/new-nginx-config

If Git can cleanly combine the changes, the merge completes automatically -- either as a fast-forward (when your branch simply extends main) or by creating a merge commit. If the same lines were changed in both branches, you get a merge conflict.

Resolving Merge Conflicts

When a conflict occurs, Git marks the conflicting sections in the file:

<<<<<<< HEAD
log_level=info
=======
log_level=debug
>>>>>>> feature/new-nginx-config

To resolve:

  1. Open the file and decide which version to keep (or combine them)
  2. Remove the conflict markers (<<<<<<<, =======, >>>>>>>)
  3. Stage the resolved file: git add server.conf
  4. Complete the merge: git commit
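
You can practice the whole cycle safely in a scratch repo; this sketch manufactures a conflict on purpose (branch and file names are invented):

```shell
#!/bin/sh
# Merge-conflict drill: create, inspect, and resolve a conflict.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name  "Demo"
base=$(git symbolic-ref --short HEAD)   # default branch (main or master)

printf 'log_level=warn\n' > server.conf
git add server.conf && git commit -qm "Baseline"

git checkout -q -b feature
printf 'log_level=debug\n' > server.conf
git commit -qam "Debug logging"

git checkout -q "$base"
printf 'log_level=info\n' > server.conf
git commit -qam "Info logging"

# Both branches changed the same line: the merge stops with a conflict.
git merge feature || echo "Conflict, as expected"
grep '<<<<<<<' server.conf              # conflict markers are in the file

# Resolve: pick a version, remove the markers, stage, and commit.
printf 'log_level=debug\n' > server.conf
git add server.conf
git commit -qm "Merge feature: keep debug log level"
```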

Think About It: Why would you use branches for ops work instead of just editing files directly on main? Consider code review, testing changes before applying them, and the ability to quickly revert if something goes wrong.


Working with Remote Repositories

So far, everything has been local. Remote repositories let you collaborate, back up your work, and deploy from a central location.

Cloning a Repository

# Clone an existing repository
git clone https://github.com/example/server-configs.git

# Clone into a specific directory
git clone https://github.com/example/server-configs.git my-configs

# Clone via SSH (requires SSH key setup)
git clone git@github.com:example/server-configs.git

Adding a Remote

If you started with git init locally:

# Add a remote (conventionally named "origin")
git remote add origin https://github.com/yourteam/server-configs.git

# List remotes
git remote -v

# Output:
# origin  https://github.com/yourteam/server-configs.git (fetch)
# origin  https://github.com/yourteam/server-configs.git (push)

Push, Pull, and Fetch

# Push your commits to the remote
git push origin main

# Push and set the upstream tracking (do this the first time)
git push -u origin main

# After -u, you can just use:
git push

# Fetch changes from the remote (does not modify your working directory)
git fetch origin

# Pull changes (fetch + merge)
git pull origin main

# After setting upstream:
git pull

+---------------------------------------------------------------+
|                      PUSH AND PULL FLOW                       |
+---------------------------------------------------------------+
|                                                               |
|  Your Local Repo           Remote Repo (GitHub, GitLab, ...)  |
|                                                               |
|  +----------------+          +----------------+               |
|  | Working Dir    |          |                |               |
|  | Staging Area   | --push-->|  main branch   |               |
|  | Local Commits  |<--pull-- |                |               |
|  +----------------+          +----------------+               |
|                                                               |
+---------------------------------------------------------------+

Distro Note: Git is available on all major distributions. For SSH-based remote access, ensure openssh-client is installed. On minimal server images, you may need to install it separately: sudo apt install openssh-client (Debian/Ubuntu) or sudo dnf install openssh-clients (Fedora/RHEL).


.gitignore: Keeping Secrets Out

The .gitignore file tells Git which files to ignore. This is critical for ops -- you must never commit passwords, API keys, or sensitive data.

vim .gitignore
# Ignore editor backup files
*.swp
*.swo
*~

# Ignore OS files
.DS_Store
Thumbs.db

# Ignore secrets and credentials
*.key
*.pem
*.p12
.env
secrets.yml
credentials.conf

# Ignore logs
*.log
logs/

# Ignore compiled files
*.pyc
__pycache__/

# Ignore but track the template
!credentials.conf.example

The ! prefix negates a pattern -- useful for tracking a template file while ignoring the actual secrets file.

git add .gitignore
git commit -m "Add .gitignore to exclude secrets and temp files"

WARNING: .gitignore only prevents NEW files from being tracked. If you already committed a file with secrets and then add it to .gitignore, the secret is still in the Git history. To remove it, you need git rm --cached secrets.conf followed by a commit, and ideally a history rewrite with git filter-branch or git filter-repo. Better yet, rotate the compromised secret immediately.
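
Here is the untracking flow end to end in a disposable repo (the file name and its content are made up):

```shell
#!/bin/sh
# Stop tracking an already-committed secrets file.
# NOTE: this removes it from FUTURE commits only -- old history still has it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name  "Demo"

printf 'password=hunter2\n' > credentials.conf
git add credentials.conf && git commit -qm "Oops: committed a secret"

git rm -q --cached credentials.conf     # untrack, but keep the file on disk
printf 'credentials.conf\n' > .gitignore
git add .gitignore
git commit -qm "Stop tracking credentials.conf"

git ls-files                            # only .gitignore is tracked now
git show HEAD~1:credentials.conf        # ...but history still leaks the secret
```

This is why the warning above ends with rotating the secret: untracking is damage control, not deletion.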


Practical Ops Workflows

Workflow 1: Tracking Configuration Changes

Use Git to track changes to system configuration files:

# Initialize a repo for your configs
sudo mkdir /etc/myconfigs
sudo git -C /etc/myconfigs init

# Or track specific config files by copying them
mkdir ~/config-tracking && cd ~/config-tracking
git init

# Copy configs you want to track
cp /etc/nginx/nginx.conf .
cp /etc/ssh/sshd_config .
git add .
git commit -m "Baseline: current production configs"

# When you make changes, commit them with context
vim nginx.conf  # make your changes
git diff        # review what changed
git add nginx.conf
git commit -m "Increase worker_connections to 4096 for traffic spike

Ticket: OPS-1234
Approved by: Team Lead
Rollback: revert this commit"

Workflow 2: Managing Dotfiles

Track your personal configuration files across machines:

# Initialize a dotfiles repo
mkdir ~/dotfiles && cd ~/dotfiles
git init

# Copy your dotfiles
cp ~/.bashrc .
cp ~/.vimrc .
cp ~/.tmux.conf .
cp ~/.gitconfig .

git add .
git commit -m "Add dotfiles from workstation"

# Push to a remote
git remote add origin git@github.com:yourusername/dotfiles.git
git push -u origin main

# On a new machine, clone and create symlinks
git clone git@github.com:yourusername/dotfiles.git ~/dotfiles
ln -sf ~/dotfiles/.bashrc ~/.bashrc
ln -sf ~/dotfiles/.vimrc ~/.vimrc
ln -sf ~/dotfiles/.tmux.conf ~/.tmux.conf

Workflow 3: Collaborative Script Management

# Clone the team's scripts repo
git clone git@github.com:yourteam/ops-scripts.git
cd ops-scripts

# Create a branch for your new script
git checkout -b feature/disk-cleanup-script

# Write and test your script
vim disk-cleanup.sh
chmod +x disk-cleanup.sh
./disk-cleanup.sh --dry-run

# Commit and push
git add disk-cleanup.sh
git commit -m "Add disk cleanup script for /var/log rotation

- Removes logs older than 30 days
- Compresses logs older than 7 days
- Sends summary to syslog
- Supports --dry-run flag"

git push -u origin feature/disk-cleanup-script

# Open a pull request for team review (using GitHub CLI)
# gh pr create --title "Add disk cleanup script" --body "..."

Workflow 4: Emergency Rollback

# Something broke after a config change
# Find the commit that caused the issue
git log --oneline -10

# Output:
# a3f9e2b Fix proxy timeout settings
# b7c1d4e Update SSL cipher suite
# 9e2f3a1 Add rate limiting rules
# ...

# The proxy timeout change broke things -- revert it
git revert a3f9e2b

# Or, check out a specific file from a previous commit
git checkout 9e2f3a1 -- nginx.conf

# Review the reverted state
git diff HEAD~1
git status

# Commit if needed and deploy

Hands-On: Complete Git Exercise

Let us walk through a realistic scenario from start to finish:

# 1. Set up the repository
mkdir ~/server-configs && cd ~/server-configs
git init

# 2. Create initial configuration files
cat > nginx.conf << 'EOF'
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    server {
        listen 80;
        server_name example.com;
        root /var/www/html;
    }
}
EOF

cat > .gitignore << 'EOF'
*.key
*.pem
.env
EOF

# 3. Initial commit
git add .
git commit -m "Initial server configuration"

# 4. Create a branch for SSL configuration
git checkout -b feature/add-ssl

# 5. Modify the config
cat > nginx.conf << 'EOF'
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    server {
        listen 80;
        server_name example.com;
        return 301 https://$server_name$request_uri;
    }
    server {
        listen 443 ssl;
        server_name example.com;
        ssl_certificate /etc/ssl/certs/example.crt;
        ssl_certificate_key /etc/ssl/private/example.key;
        root /var/www/html;
    }
}
EOF

# 6. Review and commit
git diff
git add nginx.conf
git commit -m "Add SSL configuration with HTTP-to-HTTPS redirect"

# 7. Switch back to main and merge
git checkout main
git merge feature/add-ssl

# 8. View the complete history
git log --oneline --graph

# 9. Clean up the merged branch
git branch -d feature/add-ssl

# 10. Check the final state
git log --oneline
git status

Debug This

You are trying to push your changes and get this error:

! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'origin'
hint: Updates were rejected because the remote contains work that you do not
hint: have locally.

What is happening? Someone else pushed commits to the remote main branch since your last pull. Your local branch has diverged from the remote.

Solution:

# Fetch the latest changes
git fetch origin

# Merge them into your branch
git merge origin/main

# Or, combine fetch and merge:
git pull origin main

# Resolve any conflicts if needed, then push
git push origin main

Alternatively, use rebase to keep a linear history:

git pull --rebase origin main
git push origin main

Essential Git Configuration

Here are some settings that improve the Git experience:

# Set your identity
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Set default branch name to 'main'
git config --global init.defaultBranch main

# Use colors in output
git config --global color.ui auto

# Set default editor
git config --global core.editor vim

# Useful aliases
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.ci commit
git config --global alias.lg "log --oneline --graph --all --decorate"
git config --global alias.last "log -1 HEAD"
git config --global alias.unstage "restore --staged"

# View your config
git config --list
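
Aliases behave exactly like the command they expand to, flags included. A quick sanity check (uses a repo-local alias so your global config stays untouched):

```shell
#!/bin/sh
# Verify that an alias and its underlying command produce identical output.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config alias.st status     # local alias: git st == git status
git st --short                 # flags pass straight through to the real command
```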

What Just Happened?

+------------------------------------------------------------------+
|                          CHAPTER RECAP                           |
+------------------------------------------------------------------+
|                                                                  |
|  Git tracks SNAPSHOTS of your files, not just differences.       |
|                                                                  |
|  Three areas: Working Dir -> Staging Area -> Repository          |
|    git add    = stage changes                                    |
|    git commit = save snapshot                                    |
|    git push   = share with remote                                |
|                                                                  |
|  git status  = see current state                                 |
|  git log     = see history                                       |
|  git diff    = see changes                                       |
|                                                                  |
|  Branches let you work on changes in ISOLATION.                  |
|    git checkout -b feature -> work -> git merge                  |
|                                                                  |
|  For ops: track configs, manage dotfiles, collaborate on         |
|  scripts, and maintain an audit trail of every change.           |
|                                                                  |
|  NEVER commit secrets. Use .gitignore proactively.               |
|                                                                  |
+------------------------------------------------------------------+

Try This

  1. Config Tracking: Initialize a Git repository and track copies of three system config files (e.g., /etc/hostname, /etc/hosts, /etc/resolv.conf). Make changes, commit them with meaningful messages, and use git log -p to review the history.

  2. Branch Workflow: Create a branch called feature/new-firewall-rules. Add a file with sample iptables rules. Commit it. Switch back to main, make a different change, commit. Then merge the feature branch. Resolve any conflicts.

  3. Dotfiles Repository: Set up a dotfiles repo with your .bashrc, .vimrc, and .tmux.conf. Push it to a remote (GitHub, GitLab, or a self-hosted Git server). Clone it on a different machine (or in a different directory) and create symlinks.

  4. Revert Practice: Make three commits in a row. Then use git revert to undo the middle commit. Verify that the first and third commit's changes are preserved.

  5. Bonus Challenge: Write a shell script that automates daily backups of critical config files into a Git repository. It should: copy the files, commit with a timestamp message, and push to a remote. Set it up as a cron job (see Chapter 24).

Networking Concepts: OSI & TCP/IP Models

Why This Matters

You are on a call with the network team. The application is down. Someone says, "It's a Layer 3 issue -- routing is broken." Someone else says, "No, I think it's Layer 7 -- the application is returning 503s." A third person asks, "Have you checked Layer 2? Maybe it's an ARP problem."

If you do not know what these layers mean, you are lost in this conversation. And that conversation is happening in real time while a production service is down.

The OSI and TCP/IP networking models are not academic abstractions. They are the shared vocabulary that every systems administrator, network engineer, and DevOps engineer uses to diagnose problems, communicate precisely, and isolate faults. When you understand the layers, you can systematically eliminate possibilities: "The physical link is up, the IP address is correct, the port is open -- so the problem must be at the application layer."

This chapter gives you that vocabulary and that systematic thinking.


Try This Right Now

Let us trace a network request through the layers, right now, on your machine:

# See your network interfaces (Layer 2 - Data Link)
ip link show

# See your IP addresses (Layer 3 - Network)
ip addr show

# Check if you can reach a host (Layer 3 - Network)
ping -c 3 8.8.8.8

# See the path packets take (Layer 3 - Network)
tracepath 8.8.8.8

# Check if a TCP port is reachable (Layer 4 - Transport)
ss -tlnp

# Make an HTTP request (Layer 7 - Application)
curl -I https://example.com

Each of these commands operates at a different layer of the network stack. By the end of this chapter, you will understand exactly why.


The OSI Model: Seven Layers of Networking

The Open Systems Interconnection (OSI) model breaks network communication into seven layers. Each layer has a specific job, and each layer provides services to the layer above it while relying on the layer below it.

+---------------------------------------------------------------+
|                         THE OSI MODEL                         |
+---------------------------------------------------------------+
|                                                               |
|  Layer 7  APPLICATION    HTTP, DNS, SSH, FTP, SMTP            |
|           -------------------------------------------------   |
|  Layer 6  PRESENTATION   Encryption, compression, encoding    |
|           -------------------------------------------------   |
|  Layer 5  SESSION        Connection management, sessions      |
|           -------------------------------------------------   |
|  Layer 4  TRANSPORT      TCP, UDP -- ports, reliability       |
|           -------------------------------------------------   |
|  Layer 3  NETWORK        IP -- addressing, routing            |
|           -------------------------------------------------   |
|  Layer 2  DATA LINK      Ethernet, MAC addresses, switches    |
|           -------------------------------------------------   |
|  Layer 1  PHYSICAL       Cables, radio waves, voltage         |
|                                                               |
+---------------------------------------------------------------+

A common mnemonic to remember the layers from bottom to top:

Please Do Not Throw Sausage Pizza Away

(Physical, Data Link, Network, Transport, Session, Presentation, Application)

Or from top to bottom:

All People Seem To Need Data Processing


Layer by Layer: What Each Layer Does

Layer 1: Physical

What it does: Transmits raw bits over a physical medium.

What it includes: Ethernet cables (Cat5e, Cat6), fiber optic cables, Wi-Fi radio waves, voltage levels, pin layouts, signal timing.

Troubleshooting at this layer:

  • Is the cable plugged in?
  • Is the link light on the switch/NIC green?
  • Is the network interface up?
# Check if the physical link is detected
ip link show eth0

# Look for "state UP" or "state DOWN"
# A "NO-CARRIER" state means no cable/link detected

# Check interface statistics for physical errors
ip -s link show eth0
# Look for: errors, dropped, overruns, carrier errors

Real-world analogy: Layer 1 is the road. It does not care what vehicles are on it -- it just provides the physical surface for things to travel on.

Layer 2: Data Link

What it does: Handles communication between devices on the same local network (LAN). Uses MAC (Media Access Control) addresses -- the hardware address burned into every network card.

Key concepts:

  • MAC addresses (e.g., 00:1A:2B:3C:4D:5E) -- 48-bit hardware addresses
  • Ethernet frames -- the unit of data at this layer
  • Switches -- Layer 2 devices that forward frames based on MAC addresses
  • ARP -- translates IP addresses to MAC addresses
# See your MAC addresses
ip link show

# See the ARP cache (IP-to-MAC mappings your system knows about)
ip neigh show

# Example output:
# 192.168.1.1 dev eth0 lladdr aa:bb:cc:dd:ee:ff REACHABLE

Real-world analogy: Layer 2 is the local postal system within a building. It uses apartment numbers (MAC addresses) to deliver mail within the building, but it cannot deliver to another building.

Layer 3: Network

What it does: Handles communication between different networks. Uses IP addresses for logical addressing and routing to determine the path packets take.

Key concepts:

  • IP addresses (e.g., 192.168.1.100 for IPv4, 2001:db8::1 for IPv6)
  • Packets -- the unit of data at this layer
  • Routers -- Layer 3 devices that forward packets between networks
  • Routing tables -- rules that determine where to send packets
# See your IP addresses
ip addr show

# See the routing table
ip route show

# Test Layer 3 connectivity
ping -c 3 192.168.1.1

# Trace the route to a destination
tracepath 8.8.8.8

Real-world analogy: Layer 3 is the national postal system. It uses street addresses (IP addresses) and knows how to route mail between cities (networks) using sorting facilities (routers).

Layer 4: Transport

What it does: Provides end-to-end communication between applications. Adds the concept of ports, which identify specific services on a host.

Key protocols:

  • TCP (Transmission Control Protocol) -- reliable, ordered, connection-oriented
  • UDP (User Datagram Protocol) -- fast, connectionless, no guarantee of delivery
# See active TCP connections and listening ports
ss -tlnp

# See active UDP sockets
ss -ulnp

# Check if a specific port is reachable
# (If nmap is installed)
nmap -p 443 example.com

Real-world analogy: Layer 4 is the postal service's handling options. TCP is like registered mail with tracking and delivery confirmation. UDP is like dropping a postcard in the mailbox -- fast, but you do not know if it arrived.

Layer 5: Session

What it does: Manages sessions (ongoing conversations) between applications. Handles setup, maintenance, and teardown of communication sessions.

What it includes: Session establishment, synchronization, authentication tokens, RPC (Remote Procedure Calls).

In practice, this layer is often merged with Layers 6 and 7 in modern implementations. You will rarely hear someone say "it's a Layer 5 problem."
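There is no dedicated Layer 5 tool on Linux, but the session idea is visible in the kernel's TCP state machine. As a rough illustration, ss can filter sockets by connection state:

```shell
# TCP sockets with an established "session"
ss -t state established

# Sessions in teardown (waiting out the TIME-WAIT period)
ss -t state time-wait
```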

Layer 6: Presentation

What it does: Translates data formats between the network and the application. Handles encryption, compression, and character encoding.

What it includes: SSL/TLS encryption, data serialization (JSON, XML), character sets (ASCII, UTF-8), image format conversion.

# TLS operates at this layer
# Check the TLS certificate of a site
openssl s_client -connect example.com:443 < /dev/null 2>/dev/null | \
    openssl x509 -noout -subject -dates

Layer 7: Application

What it does: The layer closest to the user. Provides network services directly to applications.

Key protocols: HTTP/HTTPS, DNS, SSH, FTP, SMTP, IMAP, SNMP, LDAP.

# HTTP request (Layer 7)
curl -v https://example.com

# DNS query (Layer 7)
dig example.com

# SSH connection (Layer 7)
ssh user@server

Real-world analogy: Layer 7 is the actual content of the letter. Layers 1-6 got the letter to you. Layer 7 is you reading it and understanding the message.

Think About It: When you type https://example.com in a browser, which layers are involved? (Answer: all of them. Layer 7 = HTTP, Layer 6 = TLS encryption, Layer 4 = TCP connection to port 443, Layer 3 = IP routing, Layer 2 = Ethernet frame to gateway, Layer 1 = electrical signals on the wire.)


The TCP/IP Model: Four Layers

The OSI model is a theoretical reference. The TCP/IP model is what the internet actually runs on. It condenses seven layers into four practical layers:

+------------------------------------------------+
|       OSI MODEL          TCP/IP MODEL          |
+------------------------------------------------+
|                                                |
|  7  Application  \                             |
|  6  Presentation  >  APPLICATION               |
|  5  Session      /                             |
|                                                |
|  4  Transport        TRANSPORT                 |
|                                                |
|  3  Network          INTERNET                  |
|                                                |
|  2  Data Link   \                              |
|  1  Physical     >   NETWORK ACCESS            |
|                                                |
+------------------------------------------------+

TCP/IP Layers Explained

Application Layer -- Combines OSI Layers 5, 6, and 7. This is where protocols like HTTP, DNS, SSH, and SMTP live. The application handles its own session management and data formatting.

Transport Layer -- Same as OSI Layer 4. TCP and UDP operate here, providing port-based multiplexing and (in TCP's case) reliable delivery.

Internet Layer -- Same as OSI Layer 3. IP operates here, providing logical addressing and routing between networks.

Network Access Layer -- Combines OSI Layers 1 and 2. Handles the physical transmission of data and the local network delivery using hardware addresses.

Why Two Models?

The OSI model is great for understanding concepts and discussing problems. It gives you precise language (Layer 2, Layer 3, etc.).

The TCP/IP model is great for understanding implementation. It reflects how the internet protocol suite is actually built and how the Linux kernel's networking stack is organized.

In practice, people use OSI layer numbers (Layer 2, Layer 3, Layer 7) in conversation, but the software follows the TCP/IP model.


Data Encapsulation: How Data Flows Through Layers

When you send data over a network, each layer wraps the data from the layer above with its own header (and sometimes a trailer). This process is called encapsulation.

When data is received, each layer strips its header and passes the payload up to the next layer. This is decapsulation.

SENDING (Encapsulation - each layer adds a header):

+---------------------------------------------------+
|  Application Layer                                |
|  +----------------------------------------------+ |
|  |                    DATA                      | |
|  +----------------------------------------------+ |
|            |                                      |
|            v                                      |
|  Transport Layer                                  |
|  +----------------------------------------------+ |
|  | TCP/UDP |              DATA                  | |
|  | Header  |                                    | |
|  +----------------------------------------------+ |
|            |                                      |
|            v                                      |
|  Internet Layer                                   |
|  +----------------------------------------------+ |
|  | IP     | TCP/UDP |         DATA              | |
|  | Header | Header  |                           | |
|  +----------------------------------------------+ |
|            |                                      |
|            v                                      |
|  Network Access Layer                             |
|  +----------------------------------------------+ |
|  | Eth  | IP     | TCP/UDP |  DATA   | Eth      | |
|  | Hdr  | Header | Header  |         | Trailer  | |
|  +----------------------------------------------+ |
|                                                   |
|  Transmitted as bits on the wire                  |
+---------------------------------------------------+

The Names Change at Each Layer

+--------------------------+--------------------------------+
| Layer                    | Data Unit Name                 |
+--------------------------+--------------------------------+
| Application              | Data / Message                 |
| Transport                | Segment (TCP) / Datagram (UDP) |
| Internet/Network         | Packet                         |
| Network Access/Data Link | Frame                          |
| Physical                 | Bits                           |
+--------------------------+--------------------------------+

This naming matters when reading documentation and communicating with colleagues. "We're dropping packets" means a Layer 3 problem. "We're dropping frames" means a Layer 2 problem.


Practical Troubleshooting with Layers

The power of the layered model is systematic troubleshooting. Start from the bottom and work up:

+---------------------------------------------------------------+
|          TROUBLESHOOTING CHECKLIST BY LAYER                   |
+---------------------------------------------------------------+
|                                                               |
|  Layer 1 - Physical                                           |
|    [ ] Is the cable/link connected?                           |
|    [ ] ip link show -- is the interface UP?                   |
|    [ ] Any errors in ip -s link show?                         |
|                                                               |
|  Layer 2 - Data Link                                          |
|    [ ] Can you see MAC addresses? ip neigh show               |
|    [ ] Is ARP resolving? arping -c 3 <gateway>                |
|    [ ] Check switch port (if you have access)                 |
|                                                               |
|  Layer 3 - Network                                            |
|    [ ] Is IP address configured? ip addr show                 |
|    [ ] Can you ping the gateway? ping <gateway>               |
|    [ ] Can you ping the destination? ping <target>            |
|    [ ] Is routing correct? ip route show                      |
|    [ ] tracepath to find where packets stop                   |
|                                                               |
|  Layer 4 - Transport                                          |
|    [ ] Is the service listening? ss -tlnp                     |
|    [ ] Can you connect to the port? nc -zv <host> <port>      |
|    [ ] Firewall blocking? iptables -L -n / nft list ruleset   |
|                                                               |
|  Layer 7 - Application                                        |
|    [ ] Does the service respond? curl -v http://host:port     |
|    [ ] Check application logs                                 |
|    [ ] Is DNS resolving? dig <hostname>                       |
|    [ ] Is TLS working? openssl s_client -connect host:443     |
|                                                               |
+---------------------------------------------------------------+

Hands-On: Layer-by-Layer Troubleshooting

Let us walk through a real troubleshooting scenario. Imagine a web server is unreachable.

# Step 1: Layer 1 -- Is the interface up?
ip link show
# Look for "state UP"

# Step 2: Layer 2 -- Can we reach the local network?
ip neigh show
# Do we have ARP entries for the gateway?

# Step 3: Layer 3 -- Can we reach the gateway and beyond?
ip route show
# What is the default gateway?

ping -c 3 $(ip route show default | awk '{print $3}')
# Can we ping the gateway?

ping -c 3 8.8.8.8
# Can we reach the internet?

# Step 4: Layer 4 -- Is the port open?
ss -tlnp | grep ':80'
# Is anything listening on port 80?

# Step 5: Layer 7 -- Does the application respond?
curl -v http://localhost:80
# What does the application return?

If ping 8.8.8.8 works but ping google.com fails, the problem is DNS (Layer 7 in OSI, Application layer in TCP/IP), not routing (Layer 3).

If ping works but curl fails, the problem is at Layer 4 (port not open, firewall) or Layer 7 (application not running or misconfigured).

Think About It: You can ping a server by IP address but cannot connect to its web page. At which layers could the problem exist? (Layer 4: port blocked by firewall or service not running. Layer 7: application error, TLS certificate issue, etc.)


Protocols at Each Layer

Here is a reference mapping common protocols to their OSI layers:

+------------------+------------------+------------------------+
| Layer            | Protocol         | Purpose                |
+------------------+------------------+------------------------+
| 7 - Application  | HTTP/HTTPS       | Web traffic            |
| 7 - Application  | DNS              | Name resolution        |
| 7 - Application  | SSH              | Secure remote access   |
| 7 - Application  | FTP/SFTP         | File transfer          |
| 7 - Application  | SMTP/IMAP/POP3   | Email                  |
| 7 - Application  | SNMP             | Network management     |
| 7 - Application  | DHCP             | IP address assignment  |
| 7 - Application  | NFS              | Network file sharing   |
| 6 - Presentation | TLS/SSL          | Encryption             |
| 6 - Presentation | JPEG, MPEG       | Media encoding         |
| 5 - Session      | NetBIOS          | Session management     |
| 5 - Session      | RPC              | Remote procedure calls |
| 4 - Transport    | TCP              | Reliable transport     |
| 4 - Transport    | UDP              | Unreliable transport   |
| 3 - Network      | IP (IPv4/IPv6)   | Addressing, routing    |
| 3 - Network      | ICMP             | Diagnostics (ping)     |
| 3 - Network      | IGMP             | Multicast management   |
| 2 - Data Link    | Ethernet (802.3) | Wired LAN              |
| 2 - Data Link    | Wi-Fi (802.11)   | Wireless LAN           |
| 2 - Data Link    | ARP              | IP-to-MAC resolution   |
| 1 - Physical     | Cat6, Fiber      | Cables                 |
| 1 - Physical     | 802.11 radio     | Wi-Fi signals          |
+------------------+------------------+------------------------+

Distro Note: The tools for each layer are the same across distributions. However, some minimal images may not include tools like tracepath, nmap, or curl by default. Install them as needed: sudo apt install iputils-ping iputils-tracepath nmap curl (Debian/Ubuntu) or sudo dnf install iputils traceroute nmap curl (Fedora/RHEL).


How the Linux Kernel Implements the Network Stack

The Linux kernel implements the TCP/IP model directly. Understanding this helps you know where to look when troubleshooting:

+---------------------------------------------------------------+
|  USER SPACE                                                   |
|  +-------------------+  +-------------------+                 |
|  | curl / nginx /    |  | Application       |                 |
|  | your application  |  | (uses sockets)    |                 |
|  +--------+----------+  +--------+----------+                 |
|           |                      |                            |
|           v     SYSTEM CALLS     v                            |
|  ===========================================                  |
|  KERNEL SPACE                                                 |
|  +----------------------------------------------------+       |
|  | Socket Layer                                       |       |
|  | (connects userspace to the network stack)          |       |
|  +----------------------------------------------------+       |
|  +----------------------------------------------------+       |
|  | Transport Layer (TCP / UDP)                        |       |
|  | /proc/sys/net/ipv4/tcp_*  -- tuning parameters     |       |
|  +----------------------------------------------------+       |
|  +----------------------------------------------------+       |
|  | Network Layer (IP, routing, netfilter/iptables)    |       |
|  | /proc/sys/net/ipv4/ip_forward -- routing toggle    |       |
|  +----------------------------------------------------+       |
|  +----------------------------------------------------+       |
|  | Device Drivers (Ethernet, Wi-Fi)                   |       |
|  | /sys/class/net/  -- device information             |       |
|  +----------------------------------------------------+       |
|  ===========================================                  |
|  HARDWARE (NIC)                                               |
+---------------------------------------------------------------+
# See kernel network parameters
sysctl -a | grep net.ipv4 | head -20

# Check if IP forwarding is enabled (routing)
cat /proc/sys/net/ipv4/ip_forward
# 0 = disabled, 1 = enabled

# See network device info
ls /sys/class/net/

Debug This

A web application running on 192.168.1.50:8080 is unreachable from a client at 192.168.1.100. Walk through the layers to find the problem.

# Layer 1: Is the interface up?
ip link show eth0
# Output: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>  state UP
# Layer 1 is fine.

# Layer 2: Can we resolve the MAC address?
arping -c 1 192.168.1.50
# Output: Unicast reply from 192.168.1.50 [aa:bb:cc:dd:ee:ff]
# Layer 2 is fine.

# Layer 3: Can we ping the host?
ping -c 3 192.168.1.50
# Output: 3 packets transmitted, 3 received, 0% packet loss
# Layer 3 is fine.

# Layer 4: Is the port open?
nc -zv 192.168.1.50 8080
# Output: Connection refused
# Layer 4 PROBLEM -- nothing is listening on port 8080

Diagnosis: Layers 1-3 are all functional. The problem is at Layer 4 -- the application is not listening on port 8080. Possible causes: the application crashed, it is bound to localhost only, or it is running on a different port.

# On the server (192.168.1.50), check what is listening:
ss -tlnp | grep 8080
# (no output -- nothing on 8080)

ss -tlnp | grep python
# Output: LISTEN  0  5  127.0.0.1:8000  users:(("python3",pid=1234))

The app is running on port 8000 (not 8080) and bound to 127.0.0.1 (not all interfaces). Two problems, both at Layer 4/7. Fix the application configuration to listen on 0.0.0.0:8080.


What Just Happened?

+-----------------------------------------------------------------+
|                          CHAPTER RECAP                          |
+-----------------------------------------------------------------+
|                                                                 |
|  The OSI model has 7 layers; the TCP/IP model has 4.            |
|                                                                 |
|  OSI (bottom to top): Physical, Data Link, Network,             |
|    Transport, Session, Presentation, Application.               |
|                                                                 |
|  TCP/IP: Network Access, Internet, Transport, Application.      |
|                                                                 |
|  DATA ENCAPSULATION: each layer wraps data with its header.     |
|    Application: Data -> Transport: Segment -> Network: Packet   |
|    -> Data Link: Frame -> Physical: Bits                        |
|                                                                 |
|  TROUBLESHOOTING: Start at Layer 1 and work up.                 |
|    L1: Link up? | L2: ARP? | L3: Ping? Route?                   |
|    L4: Port open? | L7: App responding?                         |
|                                                                 |
|  Know the layers to COMMUNICATE precisely with your team.       |
|                                                                 |
+-----------------------------------------------------------------+

Try This

  1. Layer Identification: For each of the following scenarios, identify which OSI layer is most likely the source of the problem:

    • A server's Ethernet cable was unplugged
    • DNS resolution fails but pinging IP addresses works
    • A firewall is blocking port 443
    • A website returns a 500 Internal Server Error
    • A Wi-Fi adapter is not detected by the system
  2. Trace a Request: Run curl -v https://example.com and identify which layers are involved at each stage of the output (DNS resolution, TCP connection, TLS handshake, HTTP request/response).

  3. Layer-by-Layer Check: Pick a remote server you have access to. Walk through the troubleshooting checklist above, running each command and documenting the output.

  4. Packet Capture: If you have tcpdump installed, capture some traffic and identify the Layer 2 (MAC), Layer 3 (IP), and Layer 4 (TCP/UDP port) information:

    sudo tcpdump -i any -c 10 -nn
    
  5. Bonus Challenge: Draw your own network diagram showing how a packet travels from your machine to google.com. Include every device and every layer transition. How many Layer 3 hops does tracepath google.com show?

IP Addressing & Subnetting

Why This Matters

Your manager asks you to set up a new network segment for the development team. They need 50 IP addresses, isolated from the production network. "Just pick a subnet," they say.

If you do not understand subnetting, you might pick a range that overlaps with production, causing routing chaos. Or you might allocate a /24 (254 hosts) when you only need 50, wasting addresses. Or worse, you might pick a /28 (14 hosts) and run out of IPs in a week.

IP addressing and subnetting are foundational skills for anyone who manages servers. Every time you configure a network interface, set up a firewall rule, troubleshoot connectivity, or plan a network, you need to understand how IP addresses work, what a subnet mask means, and how to calculate network boundaries.

This chapter teaches you the math (it is simpler than you think) and the practical skills to work with IP addresses confidently.


Try This Right Now

# See your IP addresses and subnet masks
ip addr show

# Example output (look for "inet" lines):
# inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0

# That /24 is the subnet mask in CIDR notation
# Let's decode it:
# 192.168.1.100 = your IP address
# /24           = subnet mask (255.255.255.0)
# brd 192.168.1.255 = broadcast address

# See your default gateway (the router)
ip route show default

# Check another machine's IP
ping -c 1 google.com

By the end of this chapter, you will understand every number in that output.


IPv4 Addresses: The Basics

An IPv4 address is a 32-bit number, written as four octets (bytes) separated by dots. Each octet ranges from 0 to 255.

    192  .  168  .    1  .  100
      |       |       |      |
  Octet 1  Octet 2  Octet 3  Octet 4

Binary Representation

Computers do not see 192.168.1.100. They see 32 bits:

  192       168         1       100
 11000000  10101000  00000001  01100100

 Full binary: 11000000.10101000.00000001.01100100

To convert a decimal octet to binary, use powers of 2:

 Bit position:    128   64   32   16    8    4    2    1
                  ---   ---  ---  ---  ---  ---  ---  ---

 192 in binary:    1     1    0    0    0    0    0    0
                  128 + 64 = 192  ✓

 168 in binary:    1     0    1    0    1    0    0    0
                  128 + 32 + 8 = 168  ✓

 100 in binary:    0     1    1    0    0    1    0    0
                   64 + 32 + 4 = 100  ✓

Think About It: Why do octets max out at 255? Because 8 bits can represent values from 0 (00000000) to 255 (11111111). And 128+64+32+16+8+4+2+1 = 255.
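You can verify these conversions from the shell. A quick sketch that leans on python3 as a calculator (any tool that prints binary works just as well):

```shell
# Decimal octet to 8-bit binary
python3 -c "print(format(192, '08b'))"   # 11000000
python3 -c "print(format(168, '08b'))"   # 10101000
python3 -c "print(format(100, '08b'))"   # 01100100

# And back: binary string to decimal
python3 -c "print(int('11000000', 2))"   # 192
```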


Network and Host Portions

Every IP address has two parts:

  1. Network portion -- Identifies which network this address belongs to
  2. Host portion -- Identifies the specific device on that network

The subnet mask tells you where the dividing line is.

IP Address:     192.168.1.100
Subnet Mask:    255.255.255.0

Binary:
  IP:    11000000.10101000.00000001.01100100
  Mask:  11111111.11111111.11111111.00000000
         |<--- Network portion --->|<-Host->|

  Network: 192.168.1.0     (the "street")
  Host:    0.0.0.100        (the "house number")

Rule: Where the subnet mask has a 1-bit, that part of the address is the network portion. Where the mask has a 0-bit, that part is the host portion.
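Shell arithmetic can reproduce this rule octet by octet. A minimal sketch ANDing 192.168.1.100 with 255.255.255.0:

```shell
# AND each octet of the IP with the matching octet of the mask
echo "$(( 192 & 255 )).$(( 168 & 255 )).$(( 1 & 255 )).$(( 100 & 0 ))"
# prints: 192.168.1.0  -- the network address
```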


Subnet Masks and CIDR Notation

Subnet masks can be written in two ways:

Dotted Decimal Notation

255.255.255.0     = 24 bits of network, 8 bits of host
255.255.0.0       = 16 bits of network, 16 bits of host
255.0.0.0         = 8 bits of network, 24 bits of host
255.255.255.128   = 25 bits of network, 7 bits of host
255.255.255.192   = 26 bits of network, 6 bits of host

CIDR (Classless Inter-Domain Routing) Notation

CIDR notation appends the number of network bits after a slash:

192.168.1.100/24     same as  255.255.255.0
10.0.0.1/16          same as  255.255.0.0
172.16.0.1/12        same as  255.240.0.0
192.168.1.100/26     same as  255.255.255.192

CIDR notation is simpler and is what you will see in ip addr show output, config files, and modern documentation.
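To see what the slash number means in bits, this small sketch (plain bash arithmetic, no external tools) expands a prefix length into its dotted-decimal mask:

```shell
# Build a 32-bit mask with <prefix> leading 1-bits, then print each byte
prefix=26
mask=$(( 0xFFFFFFFF << (32 - prefix) & 0xFFFFFFFF ))
echo "$(( mask >> 24 & 255 )).$(( mask >> 16 & 255 )).$(( mask >> 8 & 255 )).$(( mask & 255 ))"
# prints: 255.255.255.192
```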

Common Subnet Masks Reference

+------+-------------------+-------+--------------------+
| CIDR | Subnet Mask       | Hosts | Common Use         |
+------+-------------------+-------+--------------------+
| /8   | 255.0.0.0         | 16M   | Class A networks   |
| /12  | 255.240.0.0       | 1M    | Large private nets |
| /16  | 255.255.0.0       | 65534 | Class B networks   |
| /20  | 255.255.240.0     | 4094  | Large subnets      |
| /22  | 255.255.252.0     | 1022  | Medium subnets     |
| /24  | 255.255.255.0     | 254   | Most common LAN    |
| /25  | 255.255.255.128   | 126   | Split /24 in half  |
| /26  | 255.255.255.192   | 62    | Quarter of /24     |
| /27  | 255.255.255.224   | 30    | Small subnet       |
| /28  | 255.255.255.240   | 14    | Very small subnet  |
| /29  | 255.255.255.248   | 6     | Point-to-point     |
| /30  | 255.255.255.252   | 2     | Router links       |
| /32  | 255.255.255.255   | 1     | Single host        |
+------+-------------------+-------+--------------------+

The "Hosts" column is calculated as: 2^(host bits) - 2. The subtraction of 2 accounts for the network address (all host bits = 0) and the broadcast address (all host bits = 1), neither of which can be assigned to a device.
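The formula is easy to confirm with bash arithmetic:

```shell
# Usable hosts = 2^(host bits) - 2
for prefix in 24 26 28 30; do
    echo "/${prefix} -> $(( 2 ** (32 - prefix) - 2 )) hosts"
done
# /24 -> 254 hosts
# /26 -> 62 hosts
# /28 -> 14 hosts
# /30 -> 2 hosts
```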


Calculating Subnets: Step by Step

This is the skill that matters most. Given an IP address and a CIDR prefix, you need to be able to calculate:

  1. The network address (first address in the range)
  2. The broadcast address (last address in the range)
  3. The usable host range (first usable to last usable)
  4. The number of usable hosts

Example 1: 192.168.1.100/24

Step 1: Identify the mask
  /24 = 24 network bits, 8 host bits
  Subnet mask: 255.255.255.0

Step 2: Calculate the network address
  AND the IP address with the subnet mask:

  IP:    192.168.  1.100    = 11000000.10101000.00000001.01100100
  Mask:  255.255.255.  0    = 11111111.11111111.11111111.00000000
  ---------------------------------------------------------
  Network: 192.168.1.  0    = 11000000.10101000.00000001.00000000

  Network address: 192.168.1.0

Step 3: Calculate the broadcast address
  Set all host bits to 1:

  Network: 192.168.  1.  0  = 11000000.10101000.00000001.00000000
  Host bits flipped to 1:    11000000.10101000.00000001.11111111
  Broadcast: 192.168.1.255

Step 4: Usable host range
  First usable: 192.168.1.1   (network address + 1)
  Last usable:  192.168.1.254 (broadcast address - 1)
  Total usable: 2^8 - 2 = 254 hosts

Example 2: 10.0.50.75/20

This is a less intuitive example where the subnet boundary falls in the middle of an octet.

Step 1: Identify the mask
  /20 = 20 network bits, 12 host bits
  Subnet mask: 255.255.240.0

  Why 240? The third octet has 4 network bits and 4 host bits:
  11110000 = 128+64+32+16 = 240

Step 2: Calculate the network address
  IP:    10.  0. 50. 75    = 00001010.00000000.00110010.01001011
  Mask: 255.255.240.  0    = 11111111.11111111.11110000.00000000
  ---------------------------------------------------------
  Network: 10.0. 48.  0    = 00001010.00000000.00110000.00000000

  Key: 50 AND 240:
    50  = 00110010
    240 = 11110000
    AND = 00110000 = 48

  Network address: 10.0.48.0

Step 3: Calculate the broadcast address
  Set all 12 host bits to 1:
  Network: 00001010.00000000.00110000.00000000
  Bcast:   00001010.00000000.00111111.11111111
           10.0.63.255

  Key: 48 OR 15 (since 4 host bits in the third octet: 00001111 = 15):
    48 + 15 = 63

  Broadcast address: 10.0.63.255

Step 4: Usable host range
  First usable: 10.0.48.1
  Last usable:  10.0.63.254
  Total usable: 2^12 - 2 = 4094 hosts
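The bitwise operations doing the real work in this example can be checked directly with shell arithmetic:

```shell
echo $(( 50 & 240 ))     # 48   -- third octet of the network address
echo $(( 48 | 15 ))      # 63   -- third octet of the broadcast address
echo $(( 2 ** 12 - 2 ))  # 4094 -- usable hosts
```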

Example 3: 172.16.5.130/26

Step 1: Identify the mask
  /26 = 26 network bits, 6 host bits
  Subnet mask: 255.255.255.192

  Fourth octet: 11000000 = 192
  Subnets in the last octet: 256/64 = 4 subnets
  (Block size = 2^6 = 64)

Step 2: Find which subnet block 130 falls in
  Block size = 64
  Subnet boundaries: 0, 64, 128, 192
  130 falls in the 128 block (128 <= 130 < 192)

  Network address: 172.16.5.128

Step 3: Broadcast address
  Next subnet starts at 192, so broadcast = 192 - 1 = 191
  Broadcast address: 172.16.5.191

Step 4: Usable host range
  First usable: 172.16.5.129
  Last usable:  172.16.5.190
  Total usable: 2^6 - 2 = 62 hosts

The Block Size Shortcut

For subnets within the last octet, the block size method is fastest:

Block size = 256 - subnet mask value in the relevant octet

/25:  256 - 128 = 128  (blocks: 0, 128)
/26:  256 - 192 = 64   (blocks: 0, 64, 128, 192)
/27:  256 - 224 = 32   (blocks: 0, 32, 64, 96, 128, 160, 192, 224)
/28:  256 - 240 = 16   (blocks: 0, 16, 32, 48, ... 240)
/29:  256 - 248 = 8    (blocks: 0, 8, 16, 24, ... 248)
/30:  256 - 252 = 4    (blocks: 0, 4, 8, 12, ... 252)

To find which subnet an address belongs to: find the largest block boundary that is less than or equal to the host octet value.
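Integer division gives you that boundary directly. A quick sketch for host octet 130 with block size 64 (a /26):

```shell
# Largest multiple of 64 that is <= 130
echo $(( 130 / 64 * 64 ))        # 128 -- network octet
# Last address before the next block
echo $(( 130 / 64 * 64 + 63 ))   # 191 -- broadcast octet
```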

Think About It: Your company has 200 employees who need IP addresses. What is the smallest subnet that can accommodate them all? (Answer: /24 gives 254 hosts. /25 gives only 126 -- not enough.)


Private Address Ranges (RFC 1918)

Not all IP addresses are routable on the public internet. RFC 1918 defines three ranges reserved for private use:

+-------------------+-----------------+--------------+
| Range             | CIDR            | Addresses    |
+-------------------+-----------------+--------------+
| 10.0.0.0 -        | 10.0.0.0/8      | 16,777,214   |
| 10.255.255.255    |                 |              |
+-------------------+-----------------+--------------+
| 172.16.0.0 -      | 172.16.0.0/12   | 1,048,574    |
| 172.31.255.255    |                 |              |
+-------------------+-----------------+--------------+
| 192.168.0.0 -     | 192.168.0.0/16  | 65,534       |
| 192.168.255.255   |                 |              |
+-------------------+-----------------+--------------+

Why these exist: There are not enough IPv4 addresses for every device on earth to have a public one. Private addresses can be reused on different private networks. A router using NAT (Network Address Translation) translates private addresses to a public address when traffic leaves the local network.

Other Special Address Ranges

+-------------------+------------------------------------------+
| Range             | Purpose                                  |
+-------------------+------------------------------------------+
| 127.0.0.0/8       | Loopback (localhost)                     |
| 169.254.0.0/16    | Link-local (APIPA, used when DHCP fails) |
| 224.0.0.0/4       | Multicast                                |
| 240.0.0.0/4       | Reserved (historically "Class E")        |
| 0.0.0.0/0         | Default route (means "any address")      |
| 255.255.255.255   | Limited broadcast                        |
+-------------------+------------------------------------------+
# Verify your loopback interface
ip addr show lo

# Output:
# inet 127.0.0.1/8 scope host lo

# The /8 means the entire 127.x.x.x range is loopback
ping -c 1 127.42.42.42
# It works! Any 127.x.x.x address loops back to your machine.

Hands-On: Subnetting with Linux Tools

Linux provides tools to verify your subnet calculations:

Using ipcalc

# Install ipcalc
# Debian/Ubuntu
sudo apt install ipcalc

# Fedora/RHEL
sudo dnf install ipcalc

# Use it
ipcalc 192.168.1.100/24

Sample output:

Address:   192.168.1.100        11000000.10101000.00000001. 01100100
Netmask:   255.255.255.0 = 24   11111111.11111111.11111111. 00000000
Wildcard:  0.0.0.255            00000000.00000000.00000000. 11111111
=>
Network:   192.168.1.0/24       11000000.10101000.00000001. 00000000
HostMin:   192.168.1.1          11000000.10101000.00000001. 00000001
HostMax:   192.168.1.254        11000000.10101000.00000001. 11111110
Broadcast: 192.168.1.255        11000000.10101000.00000001. 11111111
Hosts/Net: 254                   Class C, Private Internet

Try it with more complex subnets:

ipcalc 10.0.50.75/20
ipcalc 172.16.5.130/26
ipcalc 192.168.10.0/28

Distro Note: Debian/Ubuntu and Fedora/RHEL each ship a different program under the name ipcalc. Their output formats differ slightly, but both perform the same calculations. Some systems also have sipcalc, which gives more detailed output.

Using Python for Quick Calculations

python3 -c "
import ipaddress
net = ipaddress.ip_network('192.168.1.100/26', strict=False)
print(f'Network:   {net.network_address}')
print(f'Netmask:   {net.netmask}')
print(f'Broadcast: {net.broadcast_address}')
print(f'First:     {net.network_address + 1}')
print(f'Last:      {net.broadcast_address - 1}')
print(f'Hosts:     {net.num_addresses - 2}')
"

Output:

Network:   192.168.1.64
Netmask:   255.255.255.192
Broadcast: 192.168.1.127
First:     192.168.1.65
Last:      192.168.1.126
Hosts:     62

IPv6: A Brief Overview

IPv4 has approximately 4.3 billion addresses (2^32). That is not enough -- we ran out of new IPv4 allocations years ago. IPv6 solves this with 128-bit addresses, providing approximately 340 undecillion addresses (2^128).

IPv6 Address Format

An IPv6 address is written as eight groups of four hexadecimal digits, separated by colons:

Full:       2001:0db8:85a3:0000:0000:8a2e:0370:7334
Shortened:  2001:db8:85a3::8a2e:370:7334

Shortening rules:

  1. Drop leading zeros in each group: 0db8 becomes db8
  2. Replace consecutive all-zero groups with :: (only once per address)
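You can check both shortening rules with Python's standard ipaddress module, which the book used earlier for IPv4 -- a quick sketch:

```python
import ipaddress

# Parse the full form; ipaddress applies the shortening rules automatically
addr = ipaddress.ip_address('2001:0db8:85a3:0000:0000:8a2e:0370:7334')

print(addr.compressed)  # shortened form: 2001:db8:85a3::8a2e:370:7334
print(addr.exploded)    # full form with all leading zeros restored
```

The compressed and exploded properties give you the two canonical spellings of the same address, which is handy when comparing addresses from different tools.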

Special IPv6 Addresses

+------------+--------------------------------------------------+
| Address    | Purpose                                          |
+------------+--------------------------------------------------+
| ::1        | Loopback (like 127.0.0.1)                        |
| ::         | Unspecified (like 0.0.0.0)                       |
| fe80::/10  | Link-local (auto-configured on every interface)  |
| fc00::/7   | Unique local (like RFC 1918 private)             |
| 2000::/3   | Global unicast (public internet)                 |
| ff00::/8   | Multicast                                        |
+------------+--------------------------------------------------+

IPv6 on Linux

# See IPv6 addresses
ip -6 addr show

# Ping an IPv6 address
ping -6 ::1

# Ping a host via IPv6
ping -6 google.com

# See IPv6 routes
ip -6 route show

# See IPv6 neighbors (like ARP for IPv6)
ip -6 neigh show

IPv6 Subnetting

IPv6 subnetting follows the same principles as IPv4 but with larger numbers. The standard allocation for a single LAN is /64, which provides 2^64 host addresses per subnet -- more than enough for any conceivable LAN.

A typical IPv6 allocation:

  ISP gives you:     2001:db8:abcd::/48     (65,536 /64 subnets)
  You create subnets: 2001:db8:abcd:0001::/64  (subnet 1)
                      2001:db8:abcd:0002::/64  (subnet 2)
                      ...
                      2001:db8:abcd:ffff::/64  (subnet 65535)

Think About It: With IPv6 giving every LAN 2^64 host addresses, is subnetting for host conservation still relevant? (In IPv6, subnetting is about network organization and routing, not address conservation.)
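The /48-into-/64 carving shown above can be verified with Python's ipaddress module -- a small sketch using the documentation prefix 2001:db8:abcd::/48:

```python
import ipaddress

# Carve the /48 allocation into /64 LAN subnets
block = ipaddress.ip_network('2001:db8:abcd::/48')
subnets = block.subnets(new_prefix=64)   # a generator of all 65,536 /64s

print(next(subnets))                 # 2001:db8:abcd::/64   (subnet 0)
print(next(subnets))                 # 2001:db8:abcd:1::/64 (subnet 1)
print(block.num_addresses // 2**64)  # 65536 /64 subnets fit in a /48
```

Note the generator is lazy: it never materializes all 65,536 subnets, so the same approach works even for enormous prefixes.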


Debug This

You configure a server with IP 192.168.1.200/28 and a gateway of 192.168.1.1. The server cannot reach the gateway. Why?

Diagnosis:

Let us calculate the subnet for 192.168.1.200/28:

/28 = block size of 16
Subnet boundaries: 0, 16, 32, 48, ..., 192, 208, ...
200 falls in the 192 block

Network:   192.168.1.192
Broadcast: 192.168.1.207
Range:     192.168.1.193 - 192.168.1.206

The gateway 192.168.1.1 is in the 192.168.1.0/28 subnet (range 1-14). The server at 192.168.1.200 is in the 192.168.1.192/28 subnet. They are on different subnets.

The server will try to ARP for the gateway but will never get a response because the gateway is not on the same Layer 2 network segment (assuming the subnetting is enforced by the network).

Fix: Either change the server's IP to something in the same /28 as the gateway (e.g., 192.168.1.2/28), or change the subnet mask to include both addresses (e.g., use /24 instead of /28).
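You can confirm the whole diagnosis in a few lines with Python's ipaddress module -- a sketch reproducing the calculation above:

```python
import ipaddress

# The misconfigured server and the gateway from the scenario
server = ipaddress.ip_interface('192.168.1.200/28')
gateway = ipaddress.ip_address('192.168.1.1')

print(server.network)             # 192.168.1.192/28
print(gateway in server.network)  # False -- the gateway is outside the subnet

# The proposed fix: an address in the gateway's /28
print(ipaddress.ip_interface('192.168.1.2/28').network)  # 192.168.1.0/28
```

The membership test (`gateway in server.network`) is exactly the check the kernel performs when deciding whether a destination is on-link or must go via a router.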


What Just Happened?

+------------------------------------------------------------------+
|                        CHAPTER RECAP                              |
+------------------------------------------------------------------+
|                                                                   |
|  IPv4: 32-bit address, written as four octets (0-255).           |
|                                                                   |
|  Every IP has a NETWORK portion and a HOST portion.              |
|  The SUBNET MASK defines the boundary.                           |
|                                                                   |
|  CIDR notation: /24 = 24 network bits, 8 host bits.             |
|                                                                   |
|  To calculate a subnet:                                          |
|    1. Find the block size (256 - mask octet value)               |
|    2. Network = largest block boundary <= host value             |
|    3. Broadcast = next boundary - 1                              |
|    4. Usable hosts = block size - 2                              |
|                                                                   |
|  Private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16     |
|                                                                   |
|  IPv6: 128-bit addresses, written in hex with colons.            |
|  Standard LAN subnet: /64                                        |
|                                                                   |
|  Use ipcalc or Python's ipaddress module to verify.              |
|                                                                   |
+------------------------------------------------------------------+

Try This

  1. Manual Calculation: Without using any tools, calculate the network address, broadcast address, and usable host range for each of these:

    • 10.10.10.50/24
    • 172.16.100.200/20
    • 192.168.5.67/27
    • 10.0.0.1/30

    Then verify your answers with ipcalc.

  2. Network Planning: You need to design a network with four subnets:

    • Engineering: 100 hosts
    • Sales: 50 hosts
    • Management: 20 hosts
    • Guest Wi-Fi: 30 hosts

    Starting with 10.1.0.0, assign the smallest appropriate subnet to each department. Document the network address, usable range, and broadcast for each.

  3. Your Network: Run ip addr show and ip route show. Identify your IP address, subnet mask, network address, broadcast address, and gateway. Calculate how many hosts your subnet can support.

  4. IPv6 Exploration: Run ip -6 addr show on your system. What IPv6 addresses are configured? Are they link-local (fe80::) or global? What is their prefix length?

  5. Bonus Challenge: Write a bash script that takes an IP address and CIDR prefix as arguments and outputs the network address, broadcast address, first host, last host, and number of usable hosts. (Hint: use bitwise operations in bash with $(( )) or call Python's ipaddress module.)

TCP, UDP & Ports

Why This Matters

A web application is "slow." Users complain about timeouts. You SSH into the server, and everything looks fine -- CPU is idle, memory is available, disk I/O is normal. You check the web server, and it is running. But then you look at the network connections:

ss -s
Total: 12847
TCP:   12583 (estab 2034, closed 9200, orphaned 0, timewait 9150)

Over 9,000 connections stuck in TIME_WAIT. The server is running out of ephemeral ports. New connections cannot be established because there are no free source ports available.

Understanding TCP, UDP, connection states, and port management is not optional for anyone running production services. The transport layer is where your applications actually talk to each other, and when it breaks, knowing these fundamentals is the difference between a five-minute fix and hours of fumbling.


Try This Right Now

# See all listening TCP ports on your system
ss -tlnp

# See all established connections
ss -tnp

# Count connections by state
ss -s

# See what process is using a specific port
ss -tlnp | grep ':22'

# Check a well-known port mapping
grep -w "80" /etc/services

These commands will make much more sense by the end of this chapter.


TCP vs UDP: Two Transport Protocols

The transport layer (Layer 4) has two main protocols. They solve the same problem -- getting data between applications -- but they make very different trade-offs.

TCP: Transmission Control Protocol

TCP is connection-oriented and reliable. Before any data is sent, TCP establishes a connection. It guarantees that data arrives in order, without duplicates, and retransmits anything that gets lost.

Characteristics:

  • Connection-oriented (three-way handshake)
  • Reliable delivery (acknowledgments, retransmissions)
  • Ordered (data arrives in the sequence it was sent)
  • Flow control (sender slows down if receiver is overwhelmed)
  • Congestion control (sender slows down if the network is congested)
  • Higher overhead (headers, state tracking, retransmissions)

Used by: HTTP/HTTPS, SSH, FTP, SMTP, database connections, anything where losing data is unacceptable.
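These guarantees come "for free" from any plain TCP socket. A minimal sketch in Python (a local echo over the loopback interface; port 0 asks the OS for any free port):

```python
import socket
import threading

# Server: accept one connection and echo whatever arrives
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))    # port 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def echo_once():
    conn, _ = server.accept()    # blocks until the handshake completes
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)       # TCP delivers this reliably and in order

t = threading.Thread(target=echo_once)
t.start()

# Client: connect() performs the three-way handshake for you
client = socket.create_connection(('127.0.0.1', port))
client.sendall(b'hello over tcp')
print(client.recv(1024))         # b'hello over tcp'
client.close()
t.join()
server.close()
```

Notice that the application never sees SYN, ACK, retransmission, or ordering logic -- the kernel's TCP implementation handles all of it behind connect(), sendall(), and recv().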

UDP: User Datagram Protocol

UDP is connectionless and unreliable (in the technical sense -- it does not guarantee delivery). There is no handshake, no acknowledgment, no retransmission. You send a packet and hope it arrives.

Characteristics:

  • Connectionless (no handshake, just send)
  • Unreliable (no acknowledgment, no retransmission)
  • Unordered (packets can arrive in any order)
  • No flow control or congestion control
  • Lower overhead (minimal header, no state)
  • Faster for real-time applications

Used by: DNS (queries), DHCP, NTP, streaming video/audio, online gaming, VoIP, VPN tunnels (WireGuard, OpenVPN in UDP mode).
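The "just send and hope" model is visible in code: a UDP exchange needs no listen(), accept(), or connect(). A minimal sketch in Python over the loopback interface (where delivery is effectively guaranteed; over a real network the datagram could simply vanish):

```python
import socket

# A UDP "server" is just a bound socket -- no listen(), no accept()
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))  # port 0 = let the OS pick a free port
port = receiver.getsockname()[1]

# The sender fires a datagram at the address -- no handshake, no state
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b'hello over udp', ('127.0.0.1', port))

# Each recvfrom() returns one whole datagram (message-based, not a stream)
data, addr = receiver.recvfrom(1024)
print(data)                      # b'hello over udp'

sender.close()
receiver.close()
```

Compare this with the TCP sketch earlier: no connection is ever established, so there is nothing for ss to show in an ESTABLISHED state.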

Side-by-Side Comparison

+-----------------------------+-----------------------------+
|            TCP              |            UDP              |
+-----------------------------+-----------------------------+
| Connection-oriented         | Connectionless              |
| Reliable delivery           | Best-effort delivery        |
| Ordered                     | Unordered                   |
| Retransmits lost data       | Lost data stays lost        |
| Flow control                | No flow control             |
| Higher latency              | Lower latency               |
| 20-60 byte header           | 8 byte header               |
| Streaming data (byte stream)| Message-based (datagrams)   |
+-----------------------------+-----------------------------+
| Used for:                   | Used for:                   |
|   Web (HTTP/HTTPS)          |   DNS queries               |
|   Email (SMTP/IMAP)         |   Video streaming           |
|   SSH                       |   Online gaming             |
|   File transfer             |   VoIP (phone calls)        |
|   Database connections      |   DHCP                      |
+-----------------------------+-----------------------------+

Think About It: Why does DNS typically use UDP instead of TCP? DNS queries are small (usually fit in a single packet), and the speed of UDP matters when you resolve dozens of names per page load. However, DNS DOES fall back to TCP for large responses (like zone transfers) or when a response is truncated.


The TCP Three-Way Handshake

Before any data flows over a TCP connection, the two sides perform a three-way handshake to establish the connection. This is one of the most important concepts in networking.

    Client                              Server
      |                                   |
      |  1. SYN (seq=100)                |
      |  "I want to connect"             |
      |--------------------------------->|
      |                                   |
      |  2. SYN-ACK (seq=300, ack=101)   |
      |  "OK, I acknowledge your SYN"    |
      |<---------------------------------|
      |                                   |
      |  3. ACK (seq=101, ack=301)       |
      |  "Got it, connection established"|
      |--------------------------------->|
      |                                   |
      |  Connection ESTABLISHED           |
      |  Data can now flow both ways      |
      |                                   |

Step 1 - SYN: The client sends a SYN (synchronize) packet with a random initial sequence number. "I want to start a conversation."

Step 2 - SYN-ACK: The server responds with its own SYN and an ACK (acknowledgment) of the client's SYN. "I hear you, and I want to talk too."

Step 3 - ACK: The client acknowledges the server's SYN. "Great, let's go."

After this, the connection is ESTABLISHED and data flows.

Connection Teardown: Four-Way Handshake

Closing a TCP connection takes four steps (or sometimes three, with a combined FIN-ACK):

    Client                              Server
      |                                   |
      |  1. FIN                           |
      |  "I'm done sending"              |
      |--------------------------------->|
      |                                   |
      |  2. ACK                           |
      |  "OK, noted"                      |
      |<---------------------------------|
      |                                   |
      |  3. FIN                           |
      |  "I'm done too"                  |
      |<---------------------------------|
      |                                   |
      |  4. ACK                           |
      |  "Got it, connection closed"      |
      |--------------------------------->|
      |                                   |
      |  Connection CLOSED                |
      |                                   |

Hands-On: Watching the Handshake

You can actually see the three-way handshake using tcpdump:

# In terminal 1: start capturing on the loopback interface
sudo tcpdump -i lo -nn port 8080

# In terminal 2: start a simple listener
nc -l -p 8080

# In terminal 3: connect to it
nc localhost 8080

In the tcpdump output, you will see something like:

10:00:01 IP 127.0.0.1.54321 > 127.0.0.1.8080: Flags [S], seq 12345
10:00:01 IP 127.0.0.1.8080 > 127.0.0.1.54321: Flags [S.], seq 67890, ack 12346
10:00:01 IP 127.0.0.1.54321 > 127.0.0.1.8080: Flags [.], ack 67891

[S] = SYN, [S.] = SYN-ACK, [.] = ACK. You just witnessed the three-way handshake.


Port Numbers

A port number is a 16-bit integer (0-65535) that identifies a specific application or service on a host. Think of the IP address as a street address and the port as the apartment number.

   IP Address : Port
   192.168.1.100 : 443
        |            |
   "Which server"   "Which service on that server"

Port Ranges

+---------------+---------------------+---------------------------------------------------+
| Range         | Name                | Purpose                                           |
+---------------+---------------------+---------------------------------------------------+
| 0 - 1023      | Well-known / System | Reserved for standard services. Binding requires  |
|               |                     | root or CAP_NET_BIND_SERVICE                      |
| 1024 - 49151  | Registered          | Assigned by IANA for specific applications. Can   |
|               |                     | be used by regular users                          |
| 49152 - 65535 | Dynamic / Ephemeral | Used by the OS for outgoing connections           |
|               |                     | (source ports)                                    |
+---------------+---------------------+---------------------------------------------------+

Common Well-Known Ports

+--------+------------------+------------+
| Port   | Service          | Protocol   |
+--------+------------------+------------+
| 20/21  | FTP (data/ctrl)  | TCP        |
| 22     | SSH              | TCP        |
| 23     | Telnet           | TCP        |
| 25     | SMTP             | TCP        |
| 53     | DNS              | TCP/UDP    |
| 67/68  | DHCP (srv/client)| UDP        |
| 80     | HTTP             | TCP        |
| 110    | POP3             | TCP        |
| 123    | NTP              | UDP        |
| 143    | IMAP             | TCP        |
| 443    | HTTPS            | TCP        |
| 465    | SMTPS            | TCP        |
| 514    | Syslog           | UDP        |
| 587    | SMTP (submission)| TCP        |
| 993    | IMAPS            | TCP        |
| 995    | POP3S            | TCP        |
| 3306   | MySQL            | TCP        |
| 5432   | PostgreSQL       | TCP        |
| 6379   | Redis            | TCP        |
| 8080   | HTTP (alt)       | TCP        |
| 8443   | HTTPS (alt)      | TCP        |
+--------+------------------+------------+

The /etc/services File

Linux maintains a mapping of port numbers to service names:

# Search for a port number
grep -w "443" /etc/services

# Search for a service name
grep -w "ssh" /etc/services

# Count total entries
wc -l /etc/services

Sample output:

https           443/tcp
https           443/udp
ssh             22/tcp

Ephemeral Ports

When your machine initiates a connection (e.g., your browser connecting to a web server), the OS assigns a random source port from the ephemeral range:

# See the configured ephemeral port range
cat /proc/sys/net/ipv4/ip_local_port_range
# Output: 32768    60999

# This means the OS will use ports 32768-60999 for outgoing connections
# That's 28,232 available source ports

If you run out of ephemeral ports (too many connections, or too many stuck in TIME_WAIT), new outgoing connections will fail. This is a common production issue.

Think About It: If a server has one IP address and the ephemeral port range is 32768-60999, what is the maximum number of simultaneous outgoing connections it can make to a single destination IP and port? (Answer: 28,232. Each connection needs a unique source-IP:source-port to destination-IP:destination-port combination.)
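You can watch the OS hand out an ephemeral source port -- a small Python sketch that makes an outgoing connection over loopback and inspects the port it was given:

```python
import socket

# Something to connect to: a listener on an OS-chosen port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
dest_port = server.getsockname()[1]

# The client never chooses a source port -- the OS assigns an ephemeral one
client = socket.create_connection(('127.0.0.1', dest_port))
src_ip, src_port = client.getsockname()
print(f'source port assigned by the OS: {src_port}')

# On a default Linux box this falls inside ip_local_port_range (32768-60999)
client.close()
server.close()
```

Run it a few times and you will see a different source port each time -- the same churn that, at production scale, fills up the ephemeral range.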


Examining Connections with ss

The ss command (socket statistics) is the modern replacement for the older netstat. It is faster and provides more information.

Basic ss Usage

# Show all TCP sockets
ss -t

# Show all UDP sockets
ss -u

# Show listening sockets only
ss -l

# Show process information (requires root for all processes)
ss -p

# Show numeric addresses (don't resolve hostnames)
ss -n

# Common combinations:

# All listening TCP ports with process info and numeric addresses
ss -tlnp

# All established TCP connections
ss -tnp state established

# All listening UDP ports
ss -ulnp

# Summary statistics
ss -s

Reading ss Output

ss -tlnp
State    Recv-Q Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN   0      128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=1234,fd=3))
LISTEN   0      511     0.0.0.0:80          0.0.0.0:*          users:(("nginx",pid=5678,fd=6))
LISTEN   0      128     [::]:22             [::]:*             users:(("sshd",pid=1234,fd=4))
LISTEN   0      511     127.0.0.1:3306      0.0.0.0:*          users:(("mysqld",pid=9012,fd=20))

Breaking this down:

+---------------------+--------------------------------------------------+
| Field               | Meaning                                          |
+---------------------+--------------------------------------------------+
| State               | LISTEN = waiting for connections                 |
| Recv-Q              | Bytes in receive queue (0 is normal for LISTEN)  |
| Send-Q              | Backlog size (max pending connections)           |
| Local Address:Port  | What address and port the service is bound to    |
| Peer Address:Port   | * means accepting from any address               |
| Process             | The process, PID, and file descriptor            |
+---------------------+--------------------------------------------------+

Important: 0.0.0.0:80 means listening on ALL interfaces. 127.0.0.1:3306 means listening ONLY on localhost -- MySQL is not accessible from other machines. [::]:22 means listening on all IPv6 addresses.

Filtering Connections

# Filter by state
ss -tn state established
ss -tn state time-wait
ss -tn state close-wait

# Filter by port
ss -tn sport = :22                  # Source port 22
ss -tn dport = :443                 # Destination port 443
ss -tn '( sport = :80 or sport = :443 )'  # Port 80 or 443

# Filter by address
ss -tn dst 192.168.1.0/24          # Connections to a subnet

# Count connections per state
ss -tn state established | wc -l
ss -tn state time-wait | wc -l

The Older netstat (Still Useful)

# Install if not present
# Debian/Ubuntu: sudo apt install net-tools
# Fedora/RHEL: sudo dnf install net-tools

# Equivalent of ss -tlnp
netstat -tlnp

# All connections
netstat -an

# Statistics
netstat -s

Distro Note: netstat is part of the net-tools package, which is considered deprecated. Modern distributions may not include it by default. Use ss instead -- it is faster, more feature-rich, and part of the iproute2 package, which is installed by default on essentially every modern distribution.


TCP Connection States

A TCP connection goes through several states during its lifetime. Understanding these states is critical for debugging connection issues.

+------------------------------------------------------------------+
|                TCP CONNECTION STATE DIAGRAM                       |
+------------------------------------------------------------------+
|                                                                   |
|  Client                              Server                     |
|                                                                   |
|  CLOSED                              CLOSED                     |
|    |                                    |                        |
|    | connect()                          | listen()               |
|    v                                    v                        |
|  SYN_SENT ----SYN---->              LISTEN                      |
|    |                                    |                        |
|    |            <----SYN-ACK----        |                        |
|    v                                    v                        |
|  ESTABLISHED ----ACK---->          SYN_RECEIVED                 |
|    |                                    |                        |
|    |                                    v                        |
|    |                               ESTABLISHED                  |
|    |                                    |                        |
|    | (data transfer)                    |                        |
|    |                                    |                        |
|    | close()                            |                        |
|    v                                    |                        |
|  FIN_WAIT_1 ----FIN---->               |                        |
|    |                                    v                        |
|    |            <----ACK----       CLOSE_WAIT                   |
|    v                                    |                        |
|  FIN_WAIT_2                             | close()               |
|    |                                    v                        |
|    |            <----FIN----       LAST_ACK                     |
|    v                                    |                        |
|  TIME_WAIT ----ACK---->            CLOSED                       |
|    |                                                             |
|    | (wait 2*MSL)                                                |
|    v                                                             |
|  CLOSED                                                          |
|                                                                   |
+------------------------------------------------------------------+

States You Need to Know

+--------------+-----------------------------------------+---------------------------------------+
| State        | Meaning                                 | Common Issues                         |
+--------------+-----------------------------------------+---------------------------------------+
| LISTEN       | Waiting for incoming connections        | Normal for servers                    |
| ESTABLISHED  | Active connection, data flowing         | Normal                                |
| SYN_SENT     | Client sent SYN, waiting for response   | Firewall blocking, server down        |
| SYN_RECEIVED | Server received SYN, sent SYN-ACK       | SYN flood attack                      |
| TIME_WAIT    | Connection closed, waiting before reuse | Too many = port exhaustion            |
| CLOSE_WAIT   | Remote side closed, local has not yet   | Application bug (not closing sockets) |
| FIN_WAIT_1   | Sent FIN, waiting for ACK               | Remote side not responding            |
| FIN_WAIT_2   | Received ACK for our FIN                | Remote side has not sent FIN yet      |
| LAST_ACK     | Sent FIN, waiting for final ACK         | Unusual                               |
+--------------+-----------------------------------------+---------------------------------------+

Diagnosing State Problems

Too many TIME_WAIT:

# Count TIME_WAIT connections
ss -tn state time-wait | wc -l

# If thousands: connections are being opened and closed rapidly
# TIME_WAIT lasts for 2*MSL (Maximum Segment Lifetime), typically 60 seconds

# Kernel tuning options (use with caution):
# Allow reuse of TIME_WAIT sockets
cat /proc/sys/net/ipv4/tcp_tw_reuse
# 1 = enabled (safe for clients, helps with outgoing connections)

Too many CLOSE_WAIT:

# Count CLOSE_WAIT connections
ss -tn state close-wait | wc -l

# CLOSE_WAIT means the remote end closed the connection, but YOUR application
# has not called close() on the socket. This is almost always an application bug.
# Find which process has them:
ss -tnp state close-wait

WARNING: Do not blindly tune kernel TCP parameters to "fix" TIME_WAIT or other state issues. These parameters exist for good reasons (preventing old duplicate packets from being accepted by new connections). Instead, fix the root cause -- typically an application that opens too many short-lived connections.


Practical Port Scanning with nmap

nmap (Network Mapper) is the standard open-source tool for network exploration and port scanning. It helps you discover what services are running and what ports are open.

# Install nmap
# Debian/Ubuntu
sudo apt install nmap

# Fedora/RHEL
sudo dnf install nmap

# Arch
sudo pacman -S nmap

WARNING: Only scan networks and systems you own or have explicit permission to scan. Unauthorized port scanning can violate laws and policies. In many jurisdictions, scanning someone else's network without permission is illegal.

Basic Scans

# Scan common ports on a host
nmap 192.168.1.1

# Scan specific ports
nmap -p 22,80,443 192.168.1.1

# Scan a port range
nmap -p 1-1024 192.168.1.1

# Scan all 65535 ports
nmap -p- 192.168.1.1

# Scan a subnet
nmap 192.168.1.0/24

# UDP scan (requires root)
sudo nmap -sU -p 53,67,123 192.168.1.1

# Service version detection
nmap -sV -p 22,80,443 192.168.1.1

# OS detection (requires root)
sudo nmap -O 192.168.1.1

Reading nmap Output

Starting Nmap 7.94 ( https://nmap.org )
Nmap scan report for 192.168.1.1
Host is up (0.0010s latency).

PORT    STATE    SERVICE
22/tcp  open     ssh
80/tcp  open     http
443/tcp open     https
3306/tcp closed  mysql
8080/tcp filtered http-proxy

Nmap done: 1 IP address (1 host up) scanned in 0.25 seconds

+----------+-----------------------------------------------+
| State    | Meaning                                       |
+----------+-----------------------------------------------+
| open     | Port is accepting connections                 |
| closed   | Port is reachable but no service is listening |
| filtered | A firewall is blocking nmap's probe packets   |
+----------+-----------------------------------------------+

Hands-On: Scan Your Own Machine

# Scan your localhost
nmap localhost

# Compare with ss output
ss -tlnp

# They should show the same open ports
# nmap shows them from the "outside" perspective
# ss shows them from the "inside" perspective

Hands-On: Putting It All Together

Let us create a practical exercise that demonstrates TCP connections, ports, and state transitions:

# Terminal 1: Start a TCP listener on port 9999
nc -l -p 9999

# Terminal 2: Watch the connection states
watch -n 0.5 'ss -tn | grep 9999'

# Terminal 3: Connect to the listener
nc localhost 9999

# In Terminal 2, you should see:
# ESTAB  0  0  127.0.0.1:random_port  127.0.0.1:9999
# ESTAB  0  0  127.0.0.1:9999  127.0.0.1:random_port

# Type some text in Terminal 3 -- it appears in Terminal 1
# This is TCP at work: reliable, ordered delivery

# Press Ctrl-C in Terminal 3 to close the connection
# Watch Terminal 2 -- you may briefly see TIME_WAIT

# Now try UDP:
# Terminal 1:
nc -u -l -p 9999

# Terminal 3:
nc -u localhost 9999
# Type text -- it arrives, but there is no connection state to track
# UDP is connectionless

Debug This

A developer reports that their application cannot connect to the database. They get a "Connection refused" error on port 5432.

# Step 1: Is PostgreSQL listening?
ss -tlnp | grep 5432

# Output:
# LISTEN  0  128  127.0.0.1:5432  0.0.0.0:*  users:(("postgres",pid=1234,fd=5))

PostgreSQL is listening, but only on 127.0.0.1. If the application is on a different server, it cannot connect because PostgreSQL is not listening on the external interface.

Fix: Edit postgresql.conf and change listen_addresses:

# From:
listen_addresses = 'localhost'

# To:
listen_addresses = '*'       # Listen on all interfaces
# Or:
listen_addresses = '0.0.0.0' # Listen on all IPv4 interfaces

Then restart PostgreSQL and verify:

sudo systemctl restart postgresql
ss -tlnp | grep 5432
# Should now show 0.0.0.0:5432 instead of 127.0.0.1:5432

Also check pg_hba.conf to ensure the remote host is allowed to authenticate, and verify no firewall is blocking port 5432.


What Just Happened?

+------------------------------------------------------------------+
|                        CHAPTER RECAP                              |
+------------------------------------------------------------------+
|                                                                   |
|  TCP: Reliable, ordered, connection-oriented. Used for web,      |
|  email, SSH, databases. Three-way handshake: SYN, SYN-ACK, ACK. |
|                                                                   |
|  UDP: Fast, connectionless, best-effort. Used for DNS, video,    |
|  VoIP, gaming. No handshake, no guarantees.                      |
|                                                                   |
|  PORTS: 0-1023 (well-known), 1024-49151 (registered),           |
|  49152-65535 (ephemeral/dynamic).                                |
|                                                                   |
|  Key commands:                                                    |
|    ss -tlnp      Show listening TCP ports                        |
|    ss -tn        Show TCP connections                            |
|    ss -s         Connection statistics                           |
|    nmap host     Scan for open ports                             |
|                                                                   |
|  Connection states to watch: ESTABLISHED (normal),               |
|  TIME_WAIT (too many = exhaustion), CLOSE_WAIT (app bug).        |
|                                                                   |
|  0.0.0.0 = all interfaces. 127.0.0.1 = localhost only.           |
|                                                                   |
+------------------------------------------------------------------+

Try This

  1. Port Discovery: Run ss -tlnp on your system. For each listening port, identify the service and whether it is listening on all interfaces or just localhost.

  2. Connection Tracking: Open a web browser and load several pages. Use ss -tn to observe the TCP connections being created to remote servers. Note the source (ephemeral) ports and destination ports.

  3. State Observation: Write a small script that opens and closes 100 TCP connections to a local service rapidly. Use ss -tn state time-wait | wc -l to watch the TIME_WAIT accumulation.

  4. nmap Scan: Scan your own machine with nmap localhost. Compare the results with ss -tlnp. Are there any differences? Why might there be?

  5. Service Fingerprinting: Use nmap -sV to detect the version of services running on your machine. How much information does it reveal? Consider the security implications.

  6. Bonus Challenge: Set up a simple TCP echo server using nc or Python. Connect to it from another terminal, send data, and use tcpdump to capture the full TCP lifecycle: three-way handshake, data transfer, and four-way teardown. Identify each packet's flags.

DNS: The Internet's Phone Book

Why This Matters

It is Monday morning. Users report that your company website is unreachable. You try pinging it by IP address -- works fine. You try the domain name -- nothing. Your colleague in another office says it works for them. A third person reports it is intermittent.

Welcome to DNS hell.

DNS (Domain Name System) is the infrastructure that translates human-readable domain names like example.com into IP addresses like 93.184.216.34. Every single web request, email delivery, API call, and service discovery starts with a DNS lookup. When DNS breaks, the internet breaks -- or at least your view of it.

DNS problems are some of the most frustrating to diagnose because they are invisible. There is no clear error saying "DNS failed." You just get "host not found" or "connection timed out," and you have to figure out that the real issue is name resolution. Understanding how DNS works -- the hierarchy, the caching, the record types, the resolution process -- turns these mysterious failures into straightforward troubleshooting.
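The same lookup every browser performs is available to any program through the system resolver. A small sketch using Python's socket module (resolving localhost so it works offline; swap in any real domain name to go through your configured nameservers):

```python
import socket

# getaddrinfo() is the standard resolver entry point: it consults
# /etc/hosts and the nameservers from /etc/resolv.conf, in the order
# configured by the system (e.g. /etc/nsswitch.conf on glibc systems)
for family, _, _, _, sockaddr in socket.getaddrinfo('localhost', 80,
                                                    proto=socket.IPPROTO_TCP):
    if family == socket.AF_INET:
        print('IPv4:', sockaddr[0])   # 127.0.0.1
    elif family == socket.AF_INET6:
        print('IPv6:', sockaddr[0])   # ::1
```

When a user reports "host not found", this call is what failed -- which is why checking /etc/hosts and /etc/resolv.conf is always an early troubleshooting step.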


Try This Right Now

# Resolve a domain name to an IP address
dig example.com

# Simpler output -- just the answer
dig +short example.com

# Query a specific DNS server (Google's public DNS)
dig @8.8.8.8 example.com

# See the full resolution chain
dig +trace example.com

# Check your current DNS configuration
cat /etc/resolv.conf

# Check the local hosts file
cat /etc/hosts

How DNS Resolution Works

When you type www.example.com in your browser, here is what happens behind the scenes:

+------------------------------------------------------------------+
|                  DNS RESOLUTION PROCESS                           |
+------------------------------------------------------------------+
|                                                                   |
|  1. Browser/App                                                   |
|     "What is the IP for www.example.com?"                        |
|         |                                                         |
|         v                                                         |
|  2. Local Cache (OS resolver)                                    |
|     "Have I looked this up recently?"                            |
|     If YES --> return cached answer                              |
|     If NO  --> ask recursive resolver                            |
|         |                                                         |
|         v                                                         |
|  3. Recursive Resolver (ISP or 8.8.8.8)                          |
|     "I'll find the answer for you"                               |
|     Checks its own cache first.                                  |
|     If not cached, starts the iterative process:                 |
|         |                                                         |
|         v                                                         |
|  4. Root Name Server (.)                                         |
|     "I don't know www.example.com, but .com is handled           |
|      by these TLD servers: a.gtld-servers.net, ..."              |
|         |                                                         |
|         v                                                         |
|  5. TLD Name Server (.com)                                       |
|     "I don't know www.example.com, but example.com is            |
|      handled by these nameservers: ns1.example.com, ..."         |
|         |                                                         |
|         v                                                         |
|  6. Authoritative Name Server (example.com)                      |
|     "www.example.com is 93.184.216.34"                           |
|         |                                                         |
|         v                                                         |
|  Answer flows back through the chain, getting cached at          |
|  each level along the way.                                       |
|                                                                   |
+------------------------------------------------------------------+

Recursive vs Iterative Queries

Recursive query: The client asks a resolver, and the resolver does ALL the work to find the answer. Your computer makes recursive queries to your DNS resolver. "Find me the answer and come back when you have it."

Iterative query: The server responds with the best answer it has, which might be a referral to another server. The recursive resolver makes iterative queries to root, TLD, and authoritative servers. "I don't know, but ask that server over there."
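From an application's point of view, the recursive side of this split is all you ever see: the program hands a name to the OS stub resolver and gets back the final answer, with all the iterative legwork hidden. A minimal sketch using Python's standard `getaddrinfo` entry point (the `resolve` helper is ours, not a library function):

```python
# Applications do not walk the DNS tree themselves. They make ONE
# recursive query via the OS stub resolver and receive the final
# answer. getaddrinfo() is the standard entry point for this.
import socket

def resolve(name: str) -> list[str]:
    """Return the unique IP addresses the recursive resolver found."""
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})

# "localhost" resolves without any network round trip
print(resolve("localhost"))  # includes 127.0.0.1 and/or ::1
```

Everything between steps 3 and 6 of the diagram happens on the resolver's side of that single call.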

The DNS Hierarchy

DNS is organized as an inverted tree:

                          . (root)
               /      |      |      |     \
            .com    .org   .net   .uk    .io  ...
           /    \
    example.com  google.com
       /       \
www.example.com  mail.example.com

Root servers -- 13 groups of root name servers (named a.root-servers.net through m.root-servers.net), replicated globally via anycast. They know where to find each TLD.

TLD (Top-Level Domain) servers -- Manage domains like .com, .org, .net, .uk. They know which authoritative servers handle each domain within their TLD.

Authoritative servers -- The definitive source for a specific domain. They hold the actual DNS records (A, AAAA, MX, etc.) and give the final answer.

Think About It: Why is DNS hierarchical instead of having one giant database with all domain names? Consider scalability (billions of lookups per day), administration (each organization manages its own records), and resilience (no single point of failure).


DNS Record Types

DNS stores different types of records for different purposes. Here are the ones you will encounter most often:

A Record (Address)

Maps a domain name to an IPv4 address. The most fundamental record type.

dig A example.com +short
# 93.184.216.34

AAAA Record (IPv6 Address)

Maps a domain name to an IPv6 address. The name "AAAA" is because IPv6 addresses are four times the length of IPv4 addresses.

dig AAAA example.com +short
# 2606:2800:220:1:248:1893:25c8:1946

CNAME Record (Canonical Name)

An alias that points one domain name to another. When you query a CNAME, the resolver follows it to the target name and returns that IP.

dig CNAME www.example.com +short
# example.com
# (www.example.com is an alias for example.com)

Important: A CNAME cannot coexist with other record types at the same name. You cannot have both a CNAME and an MX record for example.com. This is a common source of misconfigurations.
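The exclusivity rule can be expressed as a small zone-sanity check. This is a hypothetical helper for illustration, not part of any DNS library:

```python
# Hypothetical zone-sanity check illustrating the CNAME exclusivity
# rule: if a name has a CNAME record, it may have no other record types.
def cname_conflicts(records: list[tuple[str, str]]) -> list[str]:
    """records is a list of (name, rtype); return names breaking the rule."""
    by_name: dict[str, set[str]] = {}
    for name, rtype in records:
        by_name.setdefault(name, set()).add(rtype)
    return [n for n, types in by_name.items()
            if "CNAME" in types and len(types) > 1]

zone = [("example.com", "CNAME"), ("example.com", "MX"),
        ("www.example.com", "CNAME")]
print(cname_conflicts(zone))  # ['example.com']
```

The apex name fails the check because it carries both a CNAME and an MX record; the `www` alias is fine because the CNAME stands alone.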

MX Record (Mail Exchanger)

Specifies which mail servers handle email for a domain. Includes a priority number (lower number = higher priority).

dig MX example.com +short
# 10 mail.example.com.
# 20 mail2.example.com.

If the server with priority 10 is unreachable, email is delivered to priority 20.

NS Record (Name Server)

Identifies the authoritative DNS servers for a domain.

dig NS example.com +short
# a.iana-servers.net.
# b.iana-servers.net.

TXT Record (Text)

Holds arbitrary text data. Commonly used for SPF (email sender verification), DKIM (email signing), domain verification, and ACME challenges (Let's Encrypt).

dig TXT example.com +short
# "v=spf1 -all"

PTR Record (Pointer / Reverse DNS)

Maps an IP address back to a domain name. The reverse of an A record. Used for reverse DNS lookups.

dig -x 93.184.216.34 +short
# (returns the hostname associated with that IP)

SOA Record (Start of Authority)

Contains administrative information about a DNS zone: the primary nameserver, the administrator's email, serial number, and timing parameters for caching and zone transfers.

dig SOA example.com +short
# sns.dns.icann.org. noc.dns.icann.org. 2022091303 7200 3600 1209600 3600

Fields: primary NS, admin email (@ replaced by .), serial, refresh, retry, expire, minimum TTL.
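A quick sketch parsing the `+short` SOA line from above into the named fields, in the order just listed:

```python
# Parse a dig +short SOA answer into named fields. The field order
# matches the one described above.
def parse_soa(line: str) -> dict:
    fields = line.split()
    keys = ["primary_ns", "admin_email", "serial",
            "refresh", "retry", "expire", "minimum_ttl"]
    record = dict(zip(keys, fields))
    for k in keys[2:]:           # the numeric timing fields
        record[k] = int(record[k])
    return record

soa = parse_soa("sns.dns.icann.org. noc.dns.icann.org. "
                "2022091303 7200 3600 1209600 3600")
print(soa["serial"], soa["refresh"])  # 2022091303 7200
```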

Complete Record Type Reference

+--------+-------------------+-----------------------------------+
| Type   | Name              | Purpose                           |
+--------+-------------------+-----------------------------------+
| A      | Address           | Domain -> IPv4 address            |
| AAAA   | IPv6 Address      | Domain -> IPv6 address            |
| CNAME  | Canonical Name    | Alias -> another domain name      |
| MX     | Mail Exchanger    | Mail server for the domain        |
| NS     | Name Server       | Authoritative DNS servers         |
| TXT    | Text              | SPF, DKIM, verification, etc.     |
| PTR    | Pointer           | IP address -> domain (reverse)    |
| SOA    | Start of Authority| Zone admin info and parameters    |
| SRV    | Service           | Service discovery (host:port)     |
| CAA    | Cert Authority    | Which CAs can issue certs         |
+--------+-------------------+-----------------------------------+

The dig Command: Deep Dive

dig (Domain Information Groper) is the most important DNS troubleshooting tool. It is flexible, detailed, and reveals exactly what is happening.

Basic Usage

# Simple query
dig example.com

# Query a specific record type
dig MX example.com

# Query a specific DNS server
dig @8.8.8.8 example.com

# Short output (just the answer)
dig +short example.com

# No extra sections (cleaner output)
dig +noall +answer example.com

Reading Full dig Output

dig example.com
; <<>> DiG 9.18.1 <<>> example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            86400   IN      A       93.184.216.34

;; Query time: 23 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Sat Feb 21 10:00:00 UTC 2026
;; MSG SIZE  rcvd: 56

Breaking this down:

+------------------+------------------------------------------------------+
| Section          | What It Tells You                                    |
+------------------+------------------------------------------------------+
| status: NOERROR  | Query was successful. NXDOMAIN means the domain does |
|                  | not exist. SERVFAIL means the server had an error.   |
| flags: qr rd ra  | qr = query response, rd = recursion desired,         |
|                  | ra = recursion available                             |
| QUESTION SECTION | What was asked (A record for example.com)            |
| ANSWER SECTION   | The answer: IP 93.184.216.34, TTL 86400 seconds      |
|                  | (24 hours)                                           |
| Query time       | How long the lookup took                             |
| SERVER           | Which DNS server answered                            |
+------------------+------------------------------------------------------+
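The ANSWER section follows a fixed five-column layout (name, TTL, class, type, data), which makes it easy to parse in scripts. A minimal sketch:

```python
# Split one dig ANSWER-section line into its five columns:
# name, TTL, class, type, and record data.
def parse_answer(line: str) -> dict:
    name, ttl, rclass, rtype, *rdata = line.split()
    return {"name": name, "ttl": int(ttl), "class": rclass,
            "type": rtype, "data": " ".join(rdata)}

rec = parse_answer("example.com.            86400   IN      A       93.184.216.34")
print(rec["ttl"], rec["data"])  # 86400 93.184.216.34
```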

Advanced dig Usage

# Trace the full resolution path (shows each step)
dig +trace example.com

# Ask for all record types (note: many servers now refuse ANY
# queries per RFC 8482 and return a minimal answer instead)
dig ANY example.com

# Reverse DNS lookup
dig -x 93.184.216.34

# Check a specific nameserver's records
dig @ns1.example.com example.com

# Check if a response is authoritative
dig +noall +answer +authority example.com

# Check the TTL (Time To Live) -- how long the record is cached
dig +noall +answer example.com
# The number after the domain name is the TTL in seconds

# Batch queries from a file
echo -e "example.com\ngoogle.com\ngithub.com" > domains.txt
dig -f domains.txt +short

Hands-On: Tracing DNS Resolution

# Watch the entire resolution chain
dig +trace www.google.com

This shows you exactly which servers were queried at each level:

.                       518400  IN  NS  a.root-servers.net.
.                       518400  IN  NS  b.root-servers.net.
(... root servers ...)
;; Received 525 bytes from 127.0.0.53#53

com.                    172800  IN  NS  a.gtld-servers.net.
com.                    172800  IN  NS  b.gtld-servers.net.
(... .com TLD servers ...)
;; Received 734 bytes from 198.41.0.4#53 (a.root-servers.net)

google.com.             172800  IN  NS  ns1.google.com.
google.com.             172800  IN  NS  ns2.google.com.
(... Google's nameservers ...)
;; Received 836 bytes from 192.5.6.30#53 (a.gtld-servers.net)

www.google.com.         300     IN  A   142.250.80.100
;; Received 48 bytes from 216.239.32.10#53 (ns1.google.com)

You can see the query going from root -> .com TLD -> google.com authoritative -> answer.


nslookup: The Quick Alternative

nslookup is simpler than dig but less detailed. It is available on almost every system, including Windows.

# Basic lookup
nslookup example.com

# Specify a DNS server
nslookup example.com 8.8.8.8

# Lookup a specific record type
nslookup -type=mx example.com
nslookup -type=ns example.com
nslookup -type=txt example.com

# Reverse lookup
nslookup 93.184.216.34

DNS Configuration on Linux

/etc/resolv.conf

This file tells your system which DNS servers to use:

cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 8.8.8.8
nameserver 8.8.4.4
search example.com internal.example.com

+------------+------------------------------------------------+
| Directive  | Meaning                                        |
+------------+------------------------------------------------+
| nameserver | IP address of a DNS resolver (up to 3)         |
| search     | Domains to append when searching short names   |
| domain     | Default domain (older alternative to search)   |
+------------+------------------------------------------------+

The search directive means that if you query webserver, the resolver will try webserver.example.com and then webserver.internal.example.com before giving up.
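A simplified model of that expansion (real resolvers also apply an `ndots` threshold to decide whether to try the name as-is first; this sketch only covers unqualified names):

```python
# Simplified model of resolv.conf search-list expansion for an
# unqualified name: try each search domain in order, then the bare name.
def search_candidates(name: str, search: list[str]) -> list[str]:
    if "." in name:              # already qualified: try it as given
        return [name]
    return [f"{name}.{domain}" for domain in search] + [name]

print(search_candidates("webserver", ["example.com", "internal.example.com"]))
# ['webserver.example.com', 'webserver.internal.example.com', 'webserver']
```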

WARNING: On many modern systems, /etc/resolv.conf is managed automatically by systemd-resolved, NetworkManager, or dhclient. Editing it directly may work temporarily but get overwritten. Check if the file is a symlink: ls -la /etc/resolv.conf

/etc/hosts

The hosts file provides static name-to-IP mappings that bypass DNS entirely. It is checked BEFORE DNS servers.

cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       myhostname
::1             localhost ip6-localhost

# Custom entries
192.168.1.10    dbserver.internal  dbserver
192.168.1.20    webserver.internal webserver

Use cases for /etc/hosts:

  • Testing a website before DNS is configured
  • Blocking domains (point them to 127.0.0.1)
  • Local development (mapping myapp.local to 127.0.0.1)
  • Servers that need to resolve each other without DNS

/etc/nsswitch.conf

This file controls the order in which name resolution sources are consulted:

grep hosts /etc/nsswitch.conf
hosts:          files dns mymachines

This means: check /etc/hosts first (files), then DNS (dns). Changing the order changes the behavior.

Think About It: A developer adds 192.168.1.99 api.production.com to their /etc/hosts for testing. They forget about it. Months later, the production API server changes its IP. Why does their application break while everyone else's works fine?


systemd-resolved: Modern DNS Management

Many modern Linux distributions use systemd-resolved as the local DNS resolver. It provides caching, DNSSEC validation, and per-interface DNS configuration.

# Check if systemd-resolved is running
systemctl status systemd-resolved

# See current DNS configuration
resolvectl status

# Query using systemd-resolved directly
resolvectl query example.com

# See DNS statistics (cache hits, misses)
resolvectl statistics

# Flush the DNS cache
resolvectl flush-caches
# Or, on older releases (the deprecated command name):
sudo systemd-resolve --flush-caches

How systemd-resolved Works

+----------------------------------------------------------+
|                                                          |
|  Application                                             |
|      |                                                   |
|      v                                                   |
|  NSS (checks /etc/nsswitch.conf)                        |
|      |                                                   |
|      v                                                   |
|  systemd-resolved  (127.0.0.53:53)                      |
|  +----------------------------------------------------+ |
|  | Local cache                                         | |
|  | DNSSEC validation                                   | |
|  | Per-link DNS configuration                          | |
|  +----------------------------------------------------+ |
|      |                                                   |
|      v                                                   |
|  Upstream DNS servers                                    |
|  (configured per-interface via DHCP or manually)        |
|                                                          |
+----------------------------------------------------------+

When systemd-resolved is active, /etc/resolv.conf typically contains:

nameserver 127.0.0.53

This points to the local stub resolver. The actual upstream DNS servers are managed by systemd-resolved and visible via resolvectl status.

Distro Note: systemd-resolved is default on Ubuntu 18.04+ and Fedora. It is optional on Arch and not typically used on Debian (though available). On systems without systemd-resolved, DNS is configured directly in /etc/resolv.conf, often managed by NetworkManager or dhclient.


DNS Caching and TTL

Every DNS record has a TTL (Time To Live) value, specified in seconds. This tells resolvers how long they can cache the record before they must query again.

# See the TTL in dig output
dig +noall +answer example.com
# example.com.    86400   IN  A  93.184.216.34
#                 ^^^^^
#                 TTL: 86400 seconds = 24 hours

# Query again -- the TTL will count down
dig +noall +answer example.com
# example.com.    85200   IN  A  93.184.216.34
#                 ^^^^^
#                 TTL has decreased (time passed since last query)

Why TTL Matters

  • High TTL (e.g., 86400 = 24h): Less DNS traffic, faster resolution from cache. But changes take up to 24 hours to propagate.
  • Low TTL (e.g., 60 = 1 min): Changes propagate quickly. But more DNS queries, slightly slower resolution.

When you are planning a DNS change (like migrating to a new server), lower the TTL days in advance so that when you make the change, the old cached records expire quickly.
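The worst case is simple arithmetic: a resolver may have cached the record one second before your change, so the change is only guaranteed visible everywhere after the OLD TTL has fully elapsed. A sketch:

```python
# Worst-case propagation window for a DNS change: the old TTL must
# expire from every cache that fetched the record just before the change.
from datetime import datetime, timedelta

def fully_propagated_by(change_time: datetime, old_ttl_seconds: int) -> datetime:
    return change_time + timedelta(seconds=old_ttl_seconds)

change = datetime(2026, 2, 21, 10, 0, 0)
print(fully_propagated_by(change, 86400))   # 2026-02-22 10:00:00 (24h TTL)
print(fully_propagated_by(change, 300))     # 2026-02-21 10:05:00 (5min TTL)
```

This is exactly why you lower the TTL at least one full old-TTL period before the migration.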

Flushing DNS Caches

When you make a DNS change and need it to take effect immediately on your machine:

# systemd-resolved
resolvectl flush-caches

# nscd (Name Service Cache Daemon)
sudo systemctl restart nscd

# dnsmasq (if used as local resolver)
sudo systemctl restart dnsmasq

# There is no universal "flush DNS" on Linux -- it depends on
# what resolver you are running

Debug This

Users report that app.internal.example.com resolves to the wrong IP address on some servers but works correctly on others.

Step 1: Check what each server resolves:

# On the broken server
dig +short app.internal.example.com
# 10.0.0.50  (wrong -- old IP)

# On a working server
dig +short app.internal.example.com
# 10.0.1.100  (correct -- new IP)

Step 2: Check if the answer is cached:

# Query the authoritative server directly
dig +short @ns1.example.com app.internal.example.com
# 10.0.1.100  (authoritative answer is correct)

Step 3: The broken server has a stale cached answer. Check where it gets DNS:

resolvectl status
# Or:
cat /etc/resolv.conf

Step 4: Flush the cache:

resolvectl flush-caches

Step 5: Check /etc/hosts:

grep app.internal /etc/hosts
# 10.0.0.50  app.internal.example.com    <-- Found it!

Someone added a static entry in /etc/hosts months ago and forgot about it. Since files comes before dns in /etc/nsswitch.conf, the hosts file entry takes precedence over DNS.

Fix: Remove the stale entry from /etc/hosts.


What Just Happened?

+------------------------------------------------------------------+
|                        CHAPTER RECAP                              |
+------------------------------------------------------------------+
|                                                                   |
|  DNS translates domain names to IP addresses.                    |
|                                                                   |
|  Resolution path: Local cache -> Recursive resolver ->           |
|    Root server -> TLD server -> Authoritative server             |
|                                                                   |
|  Key record types:                                               |
|    A (IPv4), AAAA (IPv6), CNAME (alias), MX (mail),             |
|    NS (nameserver), TXT (text), PTR (reverse), SOA (admin)      |
|                                                                   |
|  Essential commands:                                              |
|    dig domain          Full DNS query                            |
|    dig +short domain   Just the answer                           |
|    dig +trace domain   Full resolution chain                     |
|    dig @server domain  Query specific server                     |
|                                                                   |
|  Configuration:                                                   |
|    /etc/resolv.conf    DNS server settings                       |
|    /etc/hosts          Static name mappings (checked first)      |
|    /etc/nsswitch.conf  Resolution order                          |
|                                                                   |
|  TTL controls caching. Lower TTL before DNS changes.             |
|                                                                   |
+------------------------------------------------------------------+

Try This

  1. Record Type Exploration: Use dig to query every record type (A, AAAA, MX, NS, TXT, SOA) for google.com. What do you learn about Google's infrastructure?

  2. Trace Resolution: Run dig +trace example.com and dig +trace github.com. Compare the resolution paths. How many levels of nameservers does each pass through?

  3. Local Overrides: Add a temporary entry in /etc/hosts that maps testsite.local to 127.0.0.1. Verify it works with ping testsite.local. Then remove it.

  4. DNS Server Comparison: Query the same domain using different DNS servers and compare the results and response times:

    dig @8.8.8.8 example.com       # Google
    dig @1.1.1.1 example.com       # Cloudflare
    dig @9.9.9.9 example.com       # Quad9
    
  5. Reverse DNS: Find the PTR record for several well-known IP addresses:

    dig -x 8.8.8.8
    dig -x 1.1.1.1
    
  6. Bonus Challenge: Set up a local DNS cache using dnsmasq or unbound. Configure your system to use it as the primary resolver. Measure the performance improvement by timing repeated queries with dig before and after caching.

DHCP, ARP & ICMP

Why This Matters

You plug a new server into the network. Within seconds, it has an IP address, a subnet mask, a default gateway, and DNS server addresses. You did not configure any of this. How did it know?

DHCP gave it an IP address. ARP let it find the gateway's MAC address so it could actually send packets. And when you run ping to verify connectivity, that is ICMP doing the work.

These three protocols operate largely behind the scenes, but they are foundational to how networks actually function. When DHCP fails, machines come up with no network configuration (or worse, a link-local address that goes nowhere). When ARP breaks, machines have IP addresses but cannot communicate with anything on their local network. And ICMP is the first tool you reach for when diagnosing any connectivity problem.

Understanding DHCP, ARP, and ICMP transforms you from someone who "just plugs things in and hopes" to someone who understands what happens at every step and can diagnose when things go wrong.


Try This Right Now

# See your current DHCP lease information
cat /var/lib/dhcp/dhclient.leases 2>/dev/null || \
cat /var/lib/dhclient/dhclient.leases 2>/dev/null || \
cat /var/lib/NetworkManager/*.lease 2>/dev/null || \
echo "Check: journalctl -u NetworkManager | grep -i dhcp"

# See your ARP cache (IP-to-MAC address mappings)
ip neigh show

# Send ICMP echo requests (ping)
ping -c 4 8.8.8.8

# Trace the path to a destination using ICMP
tracepath 8.8.8.8

DHCP: Dynamic Host Configuration Protocol

DHCP automatically assigns IP addresses and network configuration to devices. Without DHCP, you would need to manually configure every single device on the network -- every server, every laptop, every phone, every IoT device. That does not scale.

The DORA Process

DHCP uses a four-step process often remembered as DORA: Discover, Offer, Request, Acknowledge.

+------------------------------------------------------------------+
|                    DHCP DORA PROCESS                              |
+------------------------------------------------------------------+
|                                                                   |
|  Client (no IP yet)                    DHCP Server               |
|       |                                     |                     |
|       |  1. DHCP DISCOVER (broadcast)       |                     |
|       |  "Is there a DHCP server out there?"|                     |
|       |  src: 0.0.0.0  dst: 255.255.255.255 |                     |
|       |------------------------------------>|                     |
|       |                                     |                     |
|       |  2. DHCP OFFER                      |                     |
|       |  "I can offer you 192.168.1.100"    |                     |
|       |  + subnet mask, gateway, DNS, lease |                     |
|       |<------------------------------------|                     |
|       |                                     |                     |
|       |  3. DHCP REQUEST (broadcast)        |                     |
|       |  "I'll take 192.168.1.100 please"   |                     |
|       |------------------------------------>|                     |
|       |                                     |                     |
|       |  4. DHCP ACKNOWLEDGE                |                     |
|       |  "Confirmed. It's yours for 24h"    |                     |
|       |<------------------------------------|                     |
|       |                                     |                     |
|  Client now has:                                                  |
|    IP: 192.168.1.100                                              |
|    Mask: 255.255.255.0                                            |
|    Gateway: 192.168.1.1                                           |
|    DNS: 8.8.8.8, 8.8.4.4                                         |
|    Lease: 86400 seconds (24 hours)                                |
|                                                                   |
+------------------------------------------------------------------+

Step 1 - Discover: The client has no IP address, so it sends a broadcast message to the entire local network (destination 255.255.255.255, source 0.0.0.0). "Is there a DHCP server out there?"

Step 2 - Offer: Any DHCP server on the network responds with an offer: an available IP address and all the configuration parameters.

Step 3 - Request: The client broadcasts a request for the offered address. This is broadcast (not unicast) so that if multiple DHCP servers made offers, the others know the client chose a different server.

Step 4 - Acknowledge: The server confirms the lease. The client can now use the IP address for the duration of the lease.

Why Broadcasts?

The Discover and Request are sent as broadcasts because the client does not yet have an IP address -- it cannot send unicast traffic. The DHCP server knows to listen for these broadcasts on UDP port 67, and the client listens on UDP port 68.
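To make the wire format concrete, here is a sketch (stdlib only) that builds, but does not send, a minimal DHCP DISCOVER payload. The MAC address and transaction ID are made-up example values:

```python
# Build a minimal DHCP DISCOVER: a 236-byte BOOTP header followed by
# the DHCP magic cookie and options. Sending it would go out as a
# UDP broadcast from port 68 to port 67.
import struct

def build_discover(mac: bytes, xid: int) -> bytes:
    header = struct.pack(
        "!BBBB I HH 4s4s4s4s 16s 64s 128s",
        1, 1, 6, 0,          # op=BOOTREQUEST, htype=ethernet, hlen=6, hops=0
        xid,                 # transaction ID chosen by the client
        0, 0x8000,           # secs, flags (broadcast bit set)
        b"\0" * 4,           # ciaddr: client has no IP yet (0.0.0.0)
        b"\0" * 4, b"\0" * 4, b"\0" * 4,   # yiaddr, siaddr, giaddr
        mac.ljust(16, b"\0"),              # chaddr, padded to 16 bytes
        b"\0" * 64, b"\0" * 128)           # sname, file (unused)
    options = bytes([99, 130, 83, 99,      # DHCP magic cookie
                     53, 1, 1,             # option 53: msg type = DISCOVER
                     255])                 # end of options
    return header + options

pkt = build_discover(b"\xaa\xaa\xaa\xaa\xaa\xaa", 0x12345678)
print(len(pkt))   # 236-byte BOOTP header + 8 option bytes = 244
```

Note how the client identifies itself only by its MAC address (`chaddr`); everything IP-related in the header is zeros until the server's OFFER fills it in.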

DHCP Lease Lifecycle

+------------------------------------------------------------------+
|                   DHCP LEASE TIMELINE                             |
+------------------------------------------------------------------+
|                                                                   |
|  |--- Lease obtained ---|--- 50% (T1) ---|--- 87.5% (T2) ---|  |
|  0                      T1               T2              Expiry  |
|  |                       |                |                  |   |
|  | Using the IP address  | Try to renew   | Try to rebind    |   |
|  | normally              | with same      | with ANY DHCP    |   |
|  |                       | server         | server           |   |
|  |                       | (unicast)      | (broadcast)      |   |
|                                                                   |
+------------------------------------------------------------------+

At T1 (50% of lease duration), the client tries to renew its lease with the same DHCP server using unicast. If successful, the lease timer resets.

At T2 (87.5% of lease duration), if T1 renewal failed, the client broadcasts a request to any DHCP server on the network.

If neither renewal succeeds, the lease expires and the client must start the DORA process again.
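The T1 and T2 thresholds are fixed fractions of the lease, so they are easy to compute. A sketch:

```python
# Standard DHCP renewal timers: T1 (renew with the same server) at
# 50% of the lease, T2 (rebind with any server) at 87.5%.
def lease_timers(lease_seconds: int) -> tuple[int, int]:
    t1 = lease_seconds // 2              # renew, unicast
    t2 = int(lease_seconds * 0.875)      # rebind, broadcast
    return t1, t2

print(lease_timers(86400))   # (43200, 75600) -- 12h and 21h into a 24h lease
```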

DHCP on Linux: dhclient

dhclient is the traditional DHCP client on Linux:

# Request a new DHCP lease
sudo dhclient eth0

# Release the current lease
sudo dhclient -r eth0

# Request a lease with verbose output
sudo dhclient -v eth0

# View the current lease
cat /var/lib/dhcp/dhclient.leases

A lease file looks like:

lease {
  interface "eth0";
  fixed-address 192.168.1.100;
  option subnet-mask 255.255.255.0;
  option routers 192.168.1.1;
  option domain-name-servers 8.8.8.8, 8.8.4.4;
  option domain-name "example.com";
  renew 4 2026/02/22 10:00:00;
  rebind 4 2026/02/22 22:00:00;
  expire 5 2026/02/23 01:00:00;
}

Distro Note: Different distributions use different DHCP clients. Ubuntu Server 18.04+ uses netplan with systemd-networkd, which has a built-in DHCP client; Ubuntu Desktop, Fedora, and RHEL use NetworkManager, which has its own DHCP handling. The dhclient command may not be present on newer systems. On NetworkManager systems, use nmcli instead:

nmcli device show eth0 | grep -i dhcp
nmcli connection show "Wired connection 1" | grep -i dhcp

Watching DHCP in Real-Time

# Watch DHCP traffic with tcpdump
sudo tcpdump -i eth0 -n port 67 or port 68

# Watch DHCP activity in the system journal
journalctl -f -u NetworkManager | grep -i dhcp
# Or:
journalctl -f -u systemd-networkd | grep -i dhcp

Think About It: What happens when you plug a server into a network with no DHCP server? The client sends Discovers and gets no response. After timing out, many systems auto-configure a "link-local" address in the 169.254.0.0/16 range (APIPA). This address allows local-only communication but cannot reach the internet.


ARP: Address Resolution Protocol

ARP solves a fundamental problem: your machine knows the IP address it wants to reach (Layer 3), but the Ethernet frame needs a MAC address (Layer 2). How does it find the MAC address for a given IP?

How ARP Works

+------------------------------------------------------------------+
|                    ARP RESOLUTION PROCESS                        |
+------------------------------------------------------------------+
|                                                                   |
|  Host A (192.168.1.100)          Host B (192.168.1.200)          |
|  MAC: aa:aa:aa:aa:aa:aa          MAC: bb:bb:bb:bb:bb:bb          |
|       |                                     |                     |
|       |  1. ARP Request (broadcast)         |                     |
|       |  "Who has 192.168.1.200?            |                     |
|       |   Tell 192.168.1.100"               |                     |
|       |------------------------------------>|                     |
|       |  (sent to ff:ff:ff:ff:ff:ff)        |                     |
|       |                                     |                     |
|       |  2. ARP Reply (unicast)             |                     |
|       |  "192.168.1.200 is at               |                     |
|       |   bb:bb:bb:bb:bb:bb"                |                     |
|       |<------------------------------------|                     |
|       |                                     |                     |
|  Host A now knows:                                                |
|    192.168.1.200 -> bb:bb:bb:bb:bb:bb                            |
|  Stores this in its ARP cache.                                    |
|                                                                   |
+------------------------------------------------------------------+

Step 1: Host A needs to send a packet to 192.168.1.200 but only knows the IP address. It broadcasts an ARP request to ff:ff:ff:ff:ff:ff (every device on the LAN): "Who has 192.168.1.200? Tell 192.168.1.100."

Step 2: Host B recognizes its own IP address and replies directly (unicast) to Host A: "192.168.1.200 is at bb:bb:bb:bb:bb:bb."

Host A stores this mapping in its ARP cache and can now send the Ethernet frame.

Important ARP Detail

ARP only works within the same broadcast domain (same subnet / same VLAN). If the destination IP is on a different network, the sending host ARPs for the default gateway's MAC address and sends the packet to the router. The router then handles forwarding it to the correct network.

Host wants to reach 8.8.8.8 (different network):

  1. "8.8.8.8 is not on my local subnet"
  2. "I need to send it to my gateway: 192.168.1.1"
  3. ARP for 192.168.1.1 -> router's MAC address
  4. Send the IP packet (dst: 8.8.8.8) inside an Ethernet
     frame (dst: router's MAC)
  5. Router receives it, looks up the route for 8.8.8.8,
     and forwards it onward

The ARP Cache

Every system maintains an ARP cache -- a table of recently resolved IP-to-MAC mappings:

# View the ARP cache
ip neigh show
192.168.1.1 dev eth0 lladdr aa:bb:cc:dd:ee:ff REACHABLE
192.168.1.50 dev eth0 lladdr 11:22:33:44:55:66 STALE
192.168.1.75 dev eth0  FAILED

+------------+------------------------------------------------------+
| State      | Meaning                                              |
+------------+------------------------------------------------------+
| REACHABLE  | Recently confirmed, entry is valid                   |
| STALE      | Entry has not been confirmed recently, may be        |
|            | outdated                                             |
| DELAY      | Waiting to confirm the entry                         |
| FAILED     | ARP resolution failed (host did not respond)         |
| PERMANENT  | Manually added static entry                          |
| INCOMPLETE | ARP request sent, no response yet                    |
+------------+------------------------------------------------------+

Managing the ARP Cache

# View the ARP cache
ip neigh show

# Delete a specific entry
sudo ip neigh del 192.168.1.50 dev eth0

# Flush the entire ARP cache
sudo ip neigh flush all

# Add a static ARP entry (permanent)
sudo ip neigh add 192.168.1.10 lladdr 00:11:22:33:44:55 dev eth0

# The older arp command (from net-tools)
arp -n                          # View cache
sudo arp -d 192.168.1.50       # Delete entry
sudo arp -s 192.168.1.10 00:11:22:33:44:55  # Add static entry

Hands-On: Watching ARP

# Terminal 1: Watch ARP traffic
sudo tcpdump -i eth0 -nn arp

# Terminal 2: Flush the cache and ping a local host
sudo ip neigh flush all
ping -c 1 192.168.1.1

# In Terminal 1, you should see:
# ARP, Request who-has 192.168.1.1 tell 192.168.1.100
# ARP, Reply 192.168.1.1 is-at aa:bb:cc:dd:ee:ff

# Check the ARP cache -- the entry should now be there
ip neigh show | grep 192.168.1.1

ARP and arping

arping is a specialized tool that sends ARP requests and measures responses:

# Install arping
# Debian/Ubuntu
sudo apt install arping

# Fedora/RHEL (part of iputils)
sudo dnf install iputils

# Send ARP requests to verify a host is reachable at Layer 2
sudo arping -c 3 192.168.1.1

# Output:
# ARPING 192.168.1.1 from 192.168.1.100 eth0
# Unicast reply from 192.168.1.1 [aa:bb:cc:dd:ee:ff]  0.823ms
# Unicast reply from 192.168.1.1 [aa:bb:cc:dd:ee:ff]  0.721ms
# Unicast reply from 192.168.1.1 [aa:bb:cc:dd:ee:ff]  0.654ms

This is useful when ping fails (maybe ICMP is blocked by a firewall) but you want to verify that a host is physically present on the network.

Think About It: What happens if two devices on the same network have the same IP address? Both will respond to ARP requests, and traffic will be unpredictably delivered to one or the other. This is called an IP conflict and causes intermittent, maddening connectivity issues. Some systems detect this and log warnings.


ICMP: Internet Control Message Protocol

ICMP is a Layer 3 protocol used for diagnostics and error reporting. It is not used to transfer data between applications -- instead, it carries control messages about the network itself.

The two most common ICMP tools you will use daily are ping and traceroute/tracepath.

ICMP Message Types

Type  Code  Name                                 Description
0     0     Echo Reply                           Response to a ping
3     0     Destination Unreachable: Network     Cannot reach the network
3     1     Destination Unreachable: Host        Cannot reach the host
3     3     Destination Unreachable: Port        No service on that port (UDP)
3     13    Destination Unreachable: Filtered    Firewall blocked it
8     0     Echo Request                         Ping request
11    0     Time Exceeded                        TTL expired (used by traceroute)

ping: Testing Connectivity

ping sends ICMP Echo Request packets and waits for Echo Reply packets. It is the first tool you reach for when diagnosing connectivity.

# Basic ping
ping 8.8.8.8

# Send exactly 4 pings
ping -c 4 8.8.8.8

# Set the per-reply timeout (seconds)
ping -c 4 -W 2 8.8.8.8

# Change the interval between pings (default: 1 second)
ping -c 4 -i 0.5 8.8.8.8

# Set the ICMP payload size (default: 56 data bytes, 64 with the ICMP header)
ping -c 4 -s 1400 8.8.8.8

# Ping with a deadline (total seconds to run)
ping -w 5 8.8.8.8

Reading ping Output

ping -c 4 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=12.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=11.8 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=118 time=12.1 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=118 time=11.9 ms

--- 8.8.8.8 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 11.800/12.025/12.300/0.183 ms
Field        Meaning
icmp_seq     Sequence number (detect out-of-order or lost packets)
ttl          Time To Live (how many router hops until the packet is dropped)
time         Round-trip time in milliseconds
packet loss  Percentage of pings that did not get a reply
rtt          Round-trip time statistics: min/average/max/mean deviation (mdev)
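The rtt summary line is easy to pull apart in a script. A sketch with the sample line hardcoded so it runs without a network -- on a real run, capture the last line of ping output instead:

```shell
# Extract the average round-trip time from ping's rtt summary line.
line='rtt min/avg/max/mdev = 11.800/12.025/12.300/0.183 ms'
avg=$(echo "$line" | cut -d= -f2 | cut -d/ -f2)
echo "average rtt: $avg ms"
```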

What ping Results Tell You

+-----------------------------------------------------+
|  INTERPRETING PING RESULTS                          |
+-----------------------------------------------------+
|                                                     |
|  All replies, 0% loss, low time:                    |
|    --> Connectivity is good                         |
|                                                     |
|  All replies, 0% loss, high time:                   |
|    --> Network congestion or long path              |
|                                                     |
|  Some replies, >0% loss:                            |
|    --> Packet loss -- network problem, overloaded   |
|        link, or intermittent issue                  |
|                                                     |
|  No replies, 100% loss:                             |
|    --> Host is down, unreachable, or ICMP is        |
|        blocked by a firewall                        |
|                                                     |
|  "Destination Host Unreachable":                    |
|    --> A router on the path cannot reach the host   |
|        (Layer 3 routing issue)                      |
|                                                     |
|  "Request timed out" (no output at all):            |
|    --> Packets are being silently dropped           |
|        (typically a firewall)                       |
|                                                     |
+-----------------------------------------------------+

WARNING: Many servers and firewalls intentionally block ICMP. A host that does not respond to ping is not necessarily down. Always verify with other methods (TCP connection on a known port, for example: nc -zv host 22).

traceroute and tracepath: Tracing the Path

traceroute and tracepath show the route packets take from your machine to a destination. They work by exploiting the IP TTL (Time To Live) field.

How it works:

  1. Send a packet with TTL=1. The first router decrements it to 0 and sends back an ICMP "Time Exceeded" message. Now you know the first hop.
  2. Send a packet with TTL=2. The second router sends back Time Exceeded. Now you know the second hop.
  3. Continue until the destination is reached.

# tracepath (no root required, uses UDP)
tracepath 8.8.8.8

# traceroute (may require root for ICMP mode)
traceroute 8.8.8.8

# traceroute using TCP (useful when ICMP is blocked)
sudo traceroute -T -p 443 8.8.8.8

# traceroute using ICMP
sudo traceroute -I 8.8.8.8

Reading tracepath Output

tracepath 8.8.8.8
 1?: [LOCALHOST]                        pmtu 1500
 1:  gateway                                         0.321ms
 1:  gateway                                         0.262ms
 2:  10.0.0.1                                        1.234ms asymm  3
 3:  192.168.100.1                                   5.678ms
 4:  isp-router.example.net                          10.234ms
 5:  72.14.236.216                                   11.567ms reached
     Resume: pmtu 1500 hops 5 back 5
Field        Meaning
Hop number   Distance in routers from you
Hostname/IP  The router at that hop
Time         Round-trip time to that hop
* * *        No response (router does not reply or ICMP is blocked)
asymm        Asymmetric route (return path is different)
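The hop count in the closing Resume line is easy to grab in a script. A sketch with the sample line hardcoded so it runs without a network:

```shell
# Pull the hop count out of tracepath's closing Resume line
# (the field after the literal word "hops").
line='Resume: pmtu 1500 hops 5 back 5'
hops=$(echo "$line" | awk '{for (i = 1; i <= NF; i++) if ($i == "hops") print $(i + 1)}')
echo "path length: $hops hops"
```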

Hands-On: Comparing Paths

# Trace to different destinations and compare paths
tracepath 8.8.8.8        # Google DNS
tracepath 1.1.1.1        # Cloudflare DNS
tracepath github.com     # GitHub

# Notice where the paths diverge -- that's where the internet
# exchanges traffic between different networks

Distro Note: tracepath is included in iputils on most distributions. traceroute is a separate package: sudo apt install traceroute (Debian/Ubuntu) or sudo dnf install traceroute (Fedora/RHEL). If neither is available, mtr is an excellent alternative that combines ping and traceroute: sudo apt install mtr-tiny.


Practical Troubleshooting: Putting It All Together

Here is a systematic approach using all three protocols:

Scenario: New Server Cannot Reach the Internet

# Step 1: Did we get an IP from DHCP?
ip addr show eth0
# Look for an "inet" line
# If 169.254.x.x --> DHCP failed (link-local address)
# If no "inet" --> no IP at all

# Step 2: If no DHCP, try requesting manually
sudo dhclient -v eth0
# Watch the DORA process. Does it get an offer?

# Step 3: Can we reach the local network? (ARP test)
ip route show   # What is the gateway?
sudo arping -c 3 192.168.1.1   # Can we reach it at Layer 2?

# Step 4: Can we ping the gateway? (ICMP test)
ping -c 3 192.168.1.1

# Step 5: Can we reach the internet?
ping -c 3 8.8.8.8

# Step 6: Where does the path break?
tracepath 8.8.8.8
# Look for where the responses stop

# Step 7: Can we resolve DNS?
dig google.com
# If ping by IP works but DNS fails, the problem is DNS

Scenario: Intermittent Connectivity

# Long-running ping to detect packet loss patterns
ping -c 100 192.168.1.1

# Use mtr for continuous traceroute (combines ping + traceroute)
mtr 8.8.8.8

# Check ARP -- is the MAC address flapping?
watch -n 1 'ip neigh show | grep 192.168.1.1'
# If the MAC address keeps changing, there might be a duplicate IP
# or a network loop

Scenario: Can Ping by IP, Cannot Resolve Names

# Verify DNS is the issue
ping -c 1 8.8.8.8           # Works
ping -c 1 google.com        # Fails

# Check DNS configuration
cat /etc/resolv.conf
# Is there a nameserver configured?

# Test DNS directly
dig @8.8.8.8 google.com     # Does a known-good DNS server work?
dig @$(grep nameserver /etc/resolv.conf | head -1 | awk '{print $2}') google.com
# Does the configured DNS server work?

# If the configured server fails, fix /etc/resolv.conf or
# check if the DNS server is reachable
ping -c 1 $(grep nameserver /etc/resolv.conf | head -1 | awk '{print $2}')
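The grep | head | awk pipeline above can be collapsed into a single awk call. A sketch against a sample file written to /tmp so it is self-contained -- point it at /etc/resolv.conf on a real system:

```shell
# Extract the first configured nameserver from a resolv.conf-style file.
cat > /tmp/resolv.conf.sample <<'EOF'
# Generated by NetworkManager
search example.internal
nameserver 192.168.1.1
nameserver 8.8.8.8
EOF
ns=$(awk '/^nameserver/ {print $2; exit}' /tmp/resolv.conf.sample)
echo "first nameserver: $ns"
```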

Debug This

A newly provisioned virtual machine has an IP address (192.168.1.50/24) and can ping itself and the gateway (192.168.1.1). But it cannot ping any other machine on the same subnet (e.g., 192.168.1.100).

ping -c 3 192.168.1.1       # Works
ping -c 3 192.168.1.100     # "Destination Host Unreachable"
ip neigh show                # 192.168.1.100  FAILED

Diagnosis:

The ARP resolution for 192.168.1.100 is failing. The VM sends ARP requests but gets no reply. Possible causes:

  1. The target host is down -- Verify that 192.168.1.100 is actually powered on and has its interface up.

  2. VLAN mismatch -- The VM and the target might be on different VLANs, even though they have addresses in the same subnet. VLANs create separate Layer 2 broadcast domains.

  3. VM network mode -- If this is a VM, check the virtual switch configuration. The VM might be in "NAT" mode (its own private network) instead of "bridged" mode (shared network with the host).

  4. Firewall blocking ARP -- Unusual, but some host firewalls or security tools can block ARP. Check iptables -L and any bridge-level filtering (ebtables).

Resolution:

# Check from the target side -- can it see the ARP request?
# On 192.168.1.100:
sudo tcpdump -i eth0 arp

# Check the VM's virtual switch settings
# (depends on your hypervisor: VirtualBox, KVM, VMware)

# If VLAN mismatch, check the switch port configuration
# or the virtual switch VLAN tagging

What Just Happened?

+------------------------------------------------------------------+
|                          CHAPTER RECAP                           |
+------------------------------------------------------------------+
|                                                                  |
|  DHCP: Automatic IP configuration using the DORA process.        |
|    Discover -> Offer -> Request -> Acknowledge                   |
|    Lease-based: IP is "rented" for a specific duration.          |
|    Renewal at 50% (T1) and 87.5% (T2) of lease time.             |
|                                                                  |
|  ARP: Resolves IP addresses to MAC addresses on the LAN.         |
|    Broadcast request, unicast reply.                             |
|    ARP cache: ip neigh show                                      |
|    arping: test Layer 2 reachability                             |
|                                                                  |
|  ICMP: Network diagnostics and error reporting.                  |
|    ping: test host reachability and latency                      |
|    tracepath/traceroute: discover the network path               |
|    ICMP types: Echo (ping), Destination Unreachable,             |
|    Time Exceeded (traceroute)                                    |
|                                                                  |
|  Troubleshooting order:                                          |
|    DHCP (do I have an IP?) -> ARP (can I reach local hosts?)     |
|    -> ICMP (can I reach remote hosts?) -> DNS (can I resolve?)   |
|                                                                  |
+------------------------------------------------------------------+

Try This

  1. DHCP Observation: On a test machine or VM, release your DHCP lease and watch the DORA process in real-time:

    # Terminal 1: Watch DHCP traffic
    sudo tcpdump -i eth0 -nn port 67 or port 68
    
    # Terminal 2: Release and renew
    sudo dhclient -r eth0   # Release
    sudo dhclient -v eth0   # Request new lease
    
  2. ARP Exploration: Flush your ARP cache, then ping several machines on your local network. Watch the ARP cache populate:

    sudo ip neigh flush all
    ping -c 1 192.168.1.1
    ip neigh show
    ping -c 1 192.168.1.100
    ip neigh show
    
  3. Ping Patterns: Run a long-duration ping to your gateway and to an internet host simultaneously. Compare packet loss and latency:

    ping -c 100 192.168.1.1 > /tmp/gateway-ping.txt &
    ping -c 100 8.8.8.8 > /tmp/internet-ping.txt &
    wait
    tail -3 /tmp/gateway-ping.txt
    tail -3 /tmp/internet-ping.txt
    
  4. Tracepath Investigation: Run tracepath to three different destinations on different continents. Compare the number of hops and latency at each hop. Where does the latency jump significantly? (Usually at intercontinental links.)

  5. arping vs ping: Find a host on your network that responds to ping but not to arping, or vice versa. What does this tell you about the host's firewall configuration?

  6. Bonus Challenge: Set up a small DHCP server using dnsmasq on a test network. Configure it to hand out addresses in a specific range with a custom lease time, gateway, and DNS server. Connect a client and verify the full DORA process with tcpdump.

    # Example dnsmasq config (/etc/dnsmasq.conf):
    # interface=eth1
    # dhcp-range=10.0.0.100,10.0.0.200,255.255.255.0,1h
    # dhcp-option=3,10.0.0.1     # Gateway
    # dhcp-option=6,8.8.8.8      # DNS server
    

Configuring Network Interfaces

Why This Matters

You have just been handed SSH credentials to a brand-new bare-metal server in your company's data center. The machine boots, but it has no IP address. Nobody can reach it, and it cannot reach the internet. Your job is to bring it onto the network -- assign an IP, set a default gateway, configure DNS, and make sure everything survives a reboot.

This is the bread and butter of Linux system administration. Every single server, VM, container host, or IoT device you will ever touch needs its network configured. Whether you are setting up a DHCP-based laptop or a multi-homed production server with VLANs and bonded interfaces, the tools in this chapter are what you will reach for.


Try This Right Now

Open a terminal on any Linux machine and run:

# Show all network interfaces and their addresses
ip addr show

# Show only interfaces that are UP
ip link show up

# Show the routing table
ip route show

You should see at least two interfaces: lo (the loopback) and something like eth0, ens33, enp0s3, or wlp2s0 (your real network interface). Take note of your interface name -- you will need it throughout this chapter.


The ip Command: Your Primary Tool

The ip command from the iproute2 package is the modern, standard way to configure networking on Linux. It replaced the older ifconfig, route, and arp commands. If you only learn one networking tool, make it ip.

The ip command is organized into objects:

ip <object> <command>

Objects:
  addr     - IP addresses
  link     - Network interfaces (layer 2)
  route    - Routing table
  neigh    - ARP / neighbor cache
  netns    - Network namespaces

Viewing Interface Information

# Full details of all interfaces
ip addr show

# Short form
ip a

# Just one interface
ip addr show dev eth0

# Only show IPv4 addresses
ip -4 addr show

# Only show IPv6 addresses
ip -6 addr show

# Machine-readable JSON output
ip -j addr show | python3 -m json.tool

Here is what typical ip addr show output looks like:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:a1:b2:c3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fea1:b2c3/64 scope link
       valid_lft forever preferred_lft forever

Let's decode each piece:

+------------------------------------------------------------------+
| Field                        | Meaning                           |
|------------------------------|-----------------------------------|
| eth0                         | Interface name                    |
| <BROADCAST,MULTICAST,UP,     | Interface flags                   |
|  LOWER_UP>                   | UP=admin up, LOWER_UP=cable in    |
| mtu 1500                     | Maximum Transmission Unit         |
| state UP                     | Operational state                 |
| link/ether 52:54:00:a1:b2:c3 | MAC address                       |
| inet 192.168.1.100/24        | IPv4 address with prefix length   |
| brd 192.168.1.255            | Broadcast address                 |
| scope global                 | Address is globally reachable     |
| inet6 fe80::...              | IPv6 link-local address           |
+------------------------------------------------------------------+
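A script can slice the inet line apart with awk and shell parameter expansion. A sketch with the sample line hardcoded so it runs without a live interface:

```shell
# Split "address/prefix" out of an `ip addr show` inet line.
line='    inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0'
cidr=$(echo "$line" | awk '{print $2}')   # -> 192.168.1.100/24
addr=${cidr%/*}       # strip the /suffix  -> 192.168.1.100
prefix=${cidr#*/}     # strip the prefix/  -> 24
echo "address=$addr prefix=/$prefix"
```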

Managing Interfaces with ip link

# Bring an interface down
sudo ip link set eth0 down

# Bring it back up
sudo ip link set eth0 up

# Change the MTU
sudo ip link set eth0 mtu 9000

# Change the MAC address (interface must be down)
sudo ip link set eth0 down
sudo ip link set eth0 address 02:00:00:00:00:01
sudo ip link set eth0 up

# Show interface statistics
ip -s link show eth0

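The 02: prefix in the MAC example above is deliberate: the second-lowest bit of the first octet marks an address as locally administered, so it cannot collide with a vendor-assigned (burned-in) MAC. A quick check in shell, with an illustrative address:

```shell
# Test the locally-administered bit (bit 1 of the first MAC octet).
mac=02:00:00:00:00:01
first=$(echo "$mac" | cut -d: -f1)
la=$(( 0x$first & 2 ))    # non-zero -> locally administered
if [ "$la" -ne 0 ]; then
  echo "$mac is locally administered"
fi
```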
Safety Warning: Running ip link set eth0 down on a remote server over SSH through that same interface will immediately disconnect you. Always ensure you have out-of-band access (console, IPMI, or a second interface) before taking an interface down remotely.

Managing IP Addresses with ip addr

# Add an IP address to an interface
sudo ip addr add 192.168.1.100/24 dev eth0

# Add a second IP address (yes, one interface can have multiple)
sudo ip addr add 192.168.1.101/24 dev eth0

# Remove an IP address
sudo ip addr del 192.168.1.101/24 dev eth0

# Flush ALL addresses from an interface
sudo ip addr flush dev eth0

Safety Warning: ip addr flush will remove ALL IP addresses from the interface. On a remote machine, this is just as dangerous as taking the interface down.

Managing Routes with ip route

# View the routing table
ip route show

# Add a default gateway
sudo ip route add default via 192.168.1.1

# Add a specific route
sudo ip route add 10.0.0.0/8 via 192.168.1.254

# Delete a route
sudo ip route del 10.0.0.0/8

# Replace a route (add or update)
sudo ip route replace default via 192.168.1.1
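When several routes could carry a packet, the kernel picks the most specific one: longest-prefix match. A toy sketch of that selection over an illustrative three-route table (the real lookup lives in the kernel, not in shell):

```shell
# Toy longest-prefix match over a small routing table.
ip2int() {
  echo "$1" | awk -F. '{print $1*16777216 + $2*65536 + $3*256 + $4}'
}

dst=$(ip2int 10.1.2.3)
best_len=-1
best_route=""
# Each table row: network prefix-length next-hop (illustrative values).
while read -r net len via; do
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if [ $(( $(ip2int "$net") & mask )) -eq $(( dst & mask )) ] &&
     [ "$len" -gt "$best_len" ]; then
    best_len=$len
    best_route="$net/$len via $via"
  fi
done <<EOF
0.0.0.0 0 192.168.1.1
10.0.0.0 8 192.168.1.254
10.1.0.0 16 192.168.1.253
EOF
echo "chosen: $best_route"
```

On a real system, ip route get 10.1.2.3 asks the kernel to perform exactly this lookup and prints the winning route.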

Think About It: You have added an IP address and a default gateway with ip commands. You reboot the server. Are those settings still there? Why or why not?

The answer is no. Everything done with the ip command is temporary. It lives in the kernel's memory and vanishes on reboot. To make changes permanent, you need a configuration file or a network management daemon. That is what the rest of this chapter covers.


The Legacy: ifconfig

You will still see ifconfig in older scripts, blog posts, and Stack Overflow answers. It comes from the net-tools package and is considered deprecated.

# View interfaces (legacy)
ifconfig

# Assign an IP (legacy)
sudo ifconfig eth0 192.168.1.100 netmask 255.255.255.0 up

Why you should use ip instead:

  • ifconfig cannot show all addresses on an interface (only the primary)
  • ifconfig cannot handle advanced features like policy routing or network namespaces
  • ifconfig is not installed by default on many modern distributions
  • The iproute2 suite (ip) is actively maintained; net-tools is not

Distro Note: On minimal installs of RHEL/CentOS, Fedora, Debian, and Ubuntu Server, ifconfig may not even be present. The ip command is always available.


NetworkManager: The Desktop and Server Standard

NetworkManager is the most widely used network management daemon on Linux today. It is the default on Fedora, RHEL, CentOS, Ubuntu Desktop, and many others. It handles wired, wireless, VPN, and mobile broadband connections.

+--------------------------------------------------------+
|                    NetworkManager                      |
|                                                        |
| nmcli    nmtui    GNOME Settings  nm-connection-editor |
|   |        |            |                 |            |
|   +--------+------------+-----------------+            |
|                    |                                   |
|            NetworkManager daemon                       |
|                    |                                   |
|            Kernel networking stack                     |
+--------------------------------------------------------+

nmcli: The Command-Line Interface

nmcli is the most powerful way to interact with NetworkManager from the terminal.

# Show overall status
nmcli general status

# List all connections
nmcli connection show

# Show active connections
nmcli connection show --active

# Show details of a specific connection
nmcli connection show "Wired connection 1"

# List all devices
nmcli device status

Creating a Static IP Connection

# Create a new connection with a static IP
sudo nmcli connection add \
  con-name "static-eth0" \
  type ethernet \
  ifname eth0 \
  ipv4.addresses 192.168.1.100/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.dns "8.8.8.8 8.8.4.4" \
  ipv4.method manual

# Activate it
sudo nmcli connection up "static-eth0"

Switching to DHCP

# Create a DHCP connection
sudo nmcli connection add \
  con-name "dhcp-eth0" \
  type ethernet \
  ifname eth0 \
  ipv4.method auto

# Activate it
sudo nmcli connection up "dhcp-eth0"

Modifying an Existing Connection

# Change the DNS servers
sudo nmcli connection modify "static-eth0" ipv4.dns "1.1.1.1 9.9.9.9"

# Add a secondary IP address
sudo nmcli connection modify "static-eth0" +ipv4.addresses 192.168.1.101/24

# Re-activate the connection so the new settings take effect
sudo nmcli connection up "static-eth0"

Quick Cheat Sheet

# Bring a connection down
sudo nmcli connection down "static-eth0"

# Delete a connection
sudo nmcli connection delete "static-eth0"

# Set a connection to auto-connect on boot
sudo nmcli connection modify "static-eth0" connection.autoconnect yes

# Show the wifi networks (on a laptop)
nmcli device wifi list

# Connect to wifi
sudo nmcli device wifi connect "MyNetwork" password "secret123"

nmtui: The Text User Interface

If you prefer a visual, menu-driven approach in the terminal:

sudo nmtui

This launches a curses-based interface where you can:

  • Edit a connection
  • Activate a connection
  • Set the system hostname

It is perfect for quick configuration when you do not want to remember nmcli syntax.


systemd-networkd: Lightweight Network Configuration

On servers, containers, and embedded systems, systemd-networkd is a lighter alternative to NetworkManager. It is part of systemd and uses simple .network configuration files.

Enabling systemd-networkd

# If NetworkManager is running, disable it first to avoid conflicts
sudo systemctl stop NetworkManager
sudo systemctl disable NetworkManager

# Enable systemd-networkd and systemd-resolved
sudo systemctl enable --now systemd-networkd
sudo systemctl enable --now systemd-resolved

Configuration Files

Configuration lives in /etc/systemd/network/. Files are processed in alphabetical order and use the .network extension.

Static IP Configuration

Create /etc/systemd/network/20-wired.network:

[Match]
Name=eth0

[Network]
Address=192.168.1.100/24
Gateway=192.168.1.1
DNS=8.8.8.8
DNS=8.8.4.4

DHCP Configuration

Create /etc/systemd/network/20-wired.network:

[Match]
Name=eth0

[Network]
DHCP=yes

After creating or modifying files:

# Reload and apply
sudo networkctl reload

# Check status
networkctl status eth0
networkctl list

Distro Note: Ubuntu Server (since 17.10) uses Netplan, which is a YAML-based abstraction layer that can generate configuration for either NetworkManager or systemd-networkd. We cover Netplan shortly.


Debian-style: /etc/network/interfaces

On Debian and older Ubuntu systems (before Netplan), the classic configuration file is /etc/network/interfaces. It is managed by the ifupdown package.

Static IP

# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.100
    netmask 255.255.255.0
    gateway 192.168.1.1
    dns-nameservers 8.8.8.8 8.8.4.4

DHCP

auto eth0
iface eth0 inet dhcp

Applying Changes

# Bring an interface down and back up
sudo ifdown eth0 && sudo ifup eth0

# Or restart the networking service
sudo systemctl restart networking

RHEL-style: Network Scripts (Legacy)

On older RHEL, CentOS (7 and earlier), and Fedora systems, network configuration lived in per-interface scripts under /etc/sysconfig/network-scripts/.

Static IP

# /etc/sysconfig/network-scripts/ifcfg-eth0

TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.100
PREFIX=24
GATEWAY=192.168.1.1
DNS1=8.8.8.8
DNS2=8.8.4.4

DHCP

# /etc/sysconfig/network-scripts/ifcfg-eth0

TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes

Applying Changes

sudo systemctl restart network
# or for a single interface:
sudo ifdown eth0 && sudo ifup eth0

Distro Note: RHEL 9 and later have removed legacy network scripts entirely. NetworkManager with nmcli is the only supported method. The ifcfg files may still work (NetworkManager reads them), but the new keyfile format in /etc/NetworkManager/system-connections/ is preferred.


Netplan (Ubuntu)

Ubuntu Server uses Netplan as an abstraction layer. You write YAML files, and Netplan generates the backend configuration.

Configuration lives in /etc/netplan/. The file is usually named something like 01-netcfg.yaml or 50-cloud-init.yaml.

Static IP

# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd       # or NetworkManager
  ethernets:
    eth0:
      addresses:
        - 192.168.1.100/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 8.8.8.8
          - 8.8.4.4

DHCP

network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true

Applying Changes

# Test the configuration (auto-reverts after 120 seconds if you don't confirm)
sudo netplan try

# Apply permanently
sudo netplan apply

# Generate backend config without applying
sudo netplan generate

The netplan try command is brilliant for remote servers -- if your configuration is broken, it will automatically revert, saving you from being locked out.


Static vs DHCP: When to Use Which

+--------------------------------------------------------------+
|            Static IP                |         DHCP           |
|-------------------------------------|------------------------|
| Servers                             | Desktops/laptops       |
| Network infrastructure              | Guest networks         |
| DNS servers                         | Development VMs        |
| Load balancers                      | IoT devices (some)     |
| Database servers                    | Containers (often)     |
|                                     |                        |
| You control the exact address       | Address assigned       |
| Survives DHCP server outages        |   automatically        |
| Required for services others        | Less config to manage  |
|   connect to by IP                  | Easy to move between   |
|                                     |   networks             |
+--------------------------------------------------------------+

Think About It: Your company has a DHCP server that hands out addresses in the range 192.168.1.100-200. You manually assign 192.168.1.150 as a static IP to your new server. What could go wrong?

The DHCP server might hand out 192.168.1.150 to another machine, creating an IP conflict. The fix is to either use a static IP outside the DHCP range or create a DHCP reservation for your server's MAC address.
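One way to create such a reservation with dnsmasq (the DHCP server suggested in the earlier bonus challenge) is a dhcp-host line; the MAC and address here are illustrative:

```
# /etc/dnsmasq.conf -- pin 192.168.1.150 to one machine's MAC so the
# pool never hands that address to anyone else
dhcp-host=52:54:00:a1:b2:c3,192.168.1.150
```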


Hands-On: Configure a Network Interface from Scratch

Let's walk through configuring a static IP using nmcli, which works across most modern distributions.

Step 1: Identify your interface

nmcli device status

Expected output:

DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     Wired connection 1
lo      loopback  unmanaged     --

Step 2: Create a new connection profile

sudo nmcli connection add \
  con-name "my-static" \
  type ethernet \
  ifname eth0 \
  ipv4.addresses 10.0.0.50/24 \
  ipv4.gateway 10.0.0.1 \
  ipv4.dns "1.1.1.1" \
  ipv4.method manual \
  connection.autoconnect yes

Step 3: Activate the connection

sudo nmcli connection up "my-static"

Step 4: Verify

ip addr show eth0
ip route show
cat /etc/resolv.conf
ping -c 3 1.1.1.1

Step 5: Test DNS resolution

ping -c 3 google.com

If the ping to 1.1.1.1 works but google.com does not resolve, your DNS configuration needs fixing.


VLANs: Virtual LANs

VLANs let you segment a single physical network into multiple logical networks. On Linux, you create a VLAN sub-interface tagged with a VLAN ID.

# Using ip command (temporary)
sudo ip link add link eth0 name eth0.100 type vlan id 100
sudo ip addr add 10.100.0.1/24 dev eth0.100
sudo ip link set eth0.100 up

# Using nmcli (persistent)
sudo nmcli connection add \
  con-name "vlan100" \
  type vlan \
  ifname eth0.100 \
  dev eth0 \
  id 100 \
  ipv4.addresses 10.100.0.1/24 \
  ipv4.method manual

To verify:

ip -d link show eth0.100
cat /proc/net/vlan/eth0.100

Network Bonding: Combining Interfaces

Bonding (also called NIC teaming) combines two or more physical interfaces into one logical interface for redundancy or increased throughput.

+-------------------+
|    bond0          |  <-- logical interface (192.168.1.100)
|   /       \       |
| eth0     eth1     |  <-- physical interfaces
+-------------------+

Common bonding modes:

Mode  Name           Description
0     balance-rr     Round-robin, requires switch support
1     active-backup  One active, others standby (most common)
2     balance-xor    XOR-based hash
4     802.3ad        LACP, requires switch support
6     balance-alb    Adaptive load balancing

Creating a Bond with nmcli

# Create the bond
sudo nmcli connection add \
  con-name "bond0" \
  type bond \
  ifname bond0 \
  bond.options "mode=active-backup,miimon=100" \
  ipv4.addresses 192.168.1.100/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.method manual

# Add slave interfaces
sudo nmcli connection add \
  con-name "bond0-slave1" \
  type ethernet \
  ifname eth0 \
  master bond0

sudo nmcli connection add \
  con-name "bond0-slave2" \
  type ethernet \
  ifname eth1 \
  master bond0

# Activate
sudo nmcli connection up bond0

To check bond status:

cat /proc/net/bonding/bond0
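The active slave can be pulled out of that file with awk. A sketch with sample contents hardcoded so it runs without a bond -- point awk at /proc/net/bonding/bond0 on a real system:

```shell
# Extract the currently active slave from bonding status output.
active=$(printf '%s\n' \
  'Bonding Mode: fault-tolerance (active-backup)' \
  'Currently Active Slave: eth0' \
  'MII Status: up' |
  awk -F': ' '/Currently Active Slave/ {print $2}')
echo "active slave: $active"
```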

Debug This

Scenario: A junior admin reports that a newly deployed server has no network connectivity. They show you this:

$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:a1:b2:c3 brd ff:ff:ff:ff:ff:ff

$ ip route show
(empty output)

What are the problems? How do you fix them?

Diagnosis:

  1. The interface is DOWN -- notice the flags show <BROADCAST,MULTICAST> but no UP or LOWER_UP. Also, state DOWN and qdisc noop (no queuing discipline assigned).
  2. There is no IP address assigned.
  3. There is no routing table -- no default gateway.

Fix:

# Step 1: Bring the interface up
sudo ip link set eth0 up

# Step 2: Assign an IP address
sudo ip addr add 192.168.1.100/24 dev eth0

# Step 3: Add a default gateway
sudo ip route add default via 192.168.1.1

# Step 4: Make it permanent (using nmcli)
sudo nmcli connection add con-name "eth0-static" type ethernet ifname eth0 \
  ipv4.addresses 192.168.1.100/24 ipv4.gateway 192.168.1.1 \
  ipv4.dns "8.8.8.8" ipv4.method manual
sudo nmcli connection up "eth0-static"

What Just Happened?

+-------------------------------------------------------------------+
|                     Chapter 33 Recap                              |
+-------------------------------------------------------------------+
|                                                                   |
|  * `ip` is the modern tool for viewing and configuring            |
|    interfaces, addresses, and routes. Changes are temporary.      |
|                                                                   |
|  * NetworkManager (`nmcli`, `nmtui`) makes configuration          |
|    persistent and handles complex setups. Default on most distros.|
|                                                                   |
|  * systemd-networkd is a lighter daemon controlled via            |
|    .network files in /etc/systemd/network/.                       |
|                                                                   |
|  * Distro-specific methods:                                       |
|    - Debian: /etc/network/interfaces                              |
|    - RHEL legacy: /etc/sysconfig/network-scripts/                 |
|    - Ubuntu modern: Netplan (/etc/netplan/*.yaml)                 |
|                                                                   |
|  * Static IPs are for servers and infrastructure.                 |
|    DHCP is for clients and dynamic environments.                  |
|                                                                   |
|  * VLANs segment traffic on a single physical interface.          |
|  * Bonding combines multiple interfaces for redundancy.           |
|                                                                   |
+-------------------------------------------------------------------+

Try This

  1. Basic configuration: Using nmcli, create a connection profile with a static IP of your choice. Verify it works with ping, then switch the same interface to DHCP.

  2. Multi-address: Add three IP addresses to a single interface. Verify all three respond to ping from another machine on the same network.

  3. Explore your system: Run nmcli connection show and examine every field in one of your connection profiles. Identify at least five settings you did not know about.

  4. Compare methods: Configure the same static IP using ip commands, then using nmcli, then using a Netplan YAML file (on Ubuntu) or a manual config file for your distribution. Reboot after each method and verify which ones survive.

  5. Bonus challenge: Set up a VLAN sub-interface on your machine. You will need a switch or virtual network that supports VLAN tagging. Verify the VLAN interface gets its own IP address in a different subnet than the parent interface.

Firewalls: iptables & nftables

Why This Matters

You have just deployed a web server. It is serving pages beautifully on ports 80 and 443. But it is also running MySQL on port 3306, an unprotected Redis on port 6379, and SSH on port 22 -- all exposed to the entire internet. Within hours, bots will find the open ports. Within days, someone will brute-force your SSH or exploit your unprotected Redis.

A firewall is the gatekeeper that decides which network traffic is allowed in, allowed out, and allowed to pass through your machine. On Linux, that gatekeeper lives inside the kernel itself, and it is called Netfilter. Everything in this chapter -- iptables, nftables, ufw, firewalld -- is just a different way to talk to Netfilter.


Try This Right Now

# Check if iptables rules exist
sudo iptables -L -n -v

# Check if nftables rules exist
sudo nft list ruleset

# Check if ufw is active (Ubuntu/Debian)
sudo ufw status

# Check if firewalld is active (RHEL/Fedora)
sudo firewall-cmd --state

At least one of these should show you the current firewall state of your system. If all of them return empty results or "inactive," your machine currently has no firewall rules -- all traffic is allowed.


The Netfilter Framework

Netfilter is the packet-filtering framework built into the Linux kernel. It provides hooks at various points in the networking stack where code can inspect and manipulate packets.

                        Incoming Packet
                              |
                              v
                      +---------------+
                      |  PREROUTING   |  (NAT, mangle)
                      +-------+-------+
                              |
                    +----Is it for us?----+
                    |                     |
                    v                     v
            +-------+-------+    +-------+-------+
            |    INPUT      |    |    FORWARD    |
            | (filter)      |    | (filter)      |
            +-------+-------+    +-------+-------+
                    |                     |
                    v                     v
             Local Process         +-----+------+
                    |               | POSTROUTING|
                    v               |  (NAT)     |
            +-------+-------+      +-----+------+
            |    OUTPUT     |            |
            | (filter, NAT) |            v
            +-------+-------+     Outgoing Packet
                    |
                    v
            +-------+-------+
            | POSTROUTING   |
            |  (NAT)        |
            +-------+-------+
                    |
                    v
             Outgoing Packet

The five hooks (also called chains in iptables) are:

+---------------------------------------------------------------------+
|  Chain       | When it fires                                        |
|--------------|------------------------------------------------------|
|  PREROUTING  | As soon as a packet arrives, before routing decision |
|  INPUT       | Packet is destined for this machine                  |
|  FORWARD     | Packet is passing through this machine to another    |
|  OUTPUT      | Packet originates from this machine                  |
|  POSTROUTING | Just before a packet leaves this machine             |
+---------------------------------------------------------------------+
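
A packet never traverses all five hooks; which ones fire depends on where it originates and where it is headed. The decision can be sketched as a small shell function (illustrative only -- the kernel makes this choice internally, after PREROUTING):

```bash
#!/usr/bin/env bash
# Sketch: which Netfilter chains fire for a packet, based on where it
# originates and where it is headed. Illustrative logic only.
chain_path() {
    local origin=$1 dest=$2
    if [ "$origin" = local ]; then
        echo "OUTPUT -> POSTROUTING"                 # locally generated
    elif [ "$dest" = local ]; then
        echo "PREROUTING -> INPUT"                   # arriving, for us
    else
        echo "PREROUTING -> FORWARD -> POSTROUTING"  # passing through
    fi
}
chain_path remote local    # -> PREROUTING -> INPUT
chain_path local remote    # -> OUTPUT -> POSTROUTING
chain_path remote remote   # -> PREROUTING -> FORWARD -> POSTROUTING
```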

iptables: The Classic Firewall

iptables has been the standard Linux firewall tool since 2001. Even though nftables is its successor, iptables remains widely deployed and its concepts are essential knowledge.

Tables and Chains

iptables organizes rules into tables, each containing chains:

+------------------------------------------------------------------+
|  Table      | Purpose                  | Chains                  |
|-------------|--------------------------|-------------------------|
|  filter     | Packet filtering         | INPUT, FORWARD, OUTPUT  |
|             | (default table)          |                         |
|  nat        | Network Address          | PREROUTING, OUTPUT,     |
|             | Translation              | POSTROUTING             |
|  mangle     | Packet alteration        | All five chains         |
|  raw        | Connection tracking      | PREROUTING, OUTPUT      |
|             | exemptions               |                         |
+------------------------------------------------------------------+

Most of the time you will work with the filter table (the default) and occasionally the nat table.

Rule Anatomy

Every iptables rule has this structure:

iptables -t <table> -A <chain> <match-conditions> -j <target>

Where:

  • -t <table>: Which table (default: filter)
  • -A <chain>: Append to which chain
  • Match conditions: What to match (source, destination, port, protocol, etc.)
  • -j <target>: What to do with matching packets
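
Because a rule is just assembled options, it is easy to wrap iptables in a dry-run helper that prints each command before you commit to running it on a remote machine. The ipt function below is our own convention, not an iptables feature:

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper (our own helper, not part of iptables):
# while DRY_RUN is non-empty, print each command instead of running it,
# so a whole rules script can be reviewed before it is applied.
DRY_RUN=1
ipt() {
    if [ -n "$DRY_RUN" ]; then
        echo "iptables $*"
    else
        sudo iptables "$@"
    fi
}
ipt -A INPUT -p tcp --dport 22 -j ACCEPT
# -> iptables -A INPUT -p tcp --dport 22 -j ACCEPT
```

Set DRY_RUN= (empty) once the printed commands look right, and the same script applies them for real.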

Common targets:

+--------------------------------------------------------------------+
|  Target     | Action                                               |
|-------------|------------------------------------------------------|
|  ACCEPT     | Allow the packet through                             |
|  DROP       | Silently discard the packet (sender gets no response)|
|  REJECT     | Discard and send an error back (connection refused)  |
|  LOG        | Log the packet, then continue processing the next    |
|             | rule                                                 |
|  MASQUERADE | Replace source IP with the outgoing interface's IP   |
|             | (NAT)                                                |
+--------------------------------------------------------------------+

Viewing Current Rules

# List all rules in the filter table with line numbers
sudo iptables -L -n -v --line-numbers

# List rules in the nat table
sudo iptables -t nat -L -n -v

# List rules in the raw iptables format (best for scripts)
sudo iptables-save

The -n flag prevents DNS lookups (much faster), and -v shows packet/byte counters.

Essential iptables Examples

Allow established connections (stateful firewall)

This is almost always the first rule you should add:

sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

This allows return traffic for connections your machine initiated, and related traffic like ICMP error messages.

Allow SSH

sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT

Allow HTTP and HTTPS

sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT

Allow loopback traffic

sudo iptables -A INPUT -i lo -j ACCEPT

Allow ping (ICMP)

sudo iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT

Allow SSH only from a specific network

sudo iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/8 -j ACCEPT

Drop everything else (default deny)

sudo iptables -P INPUT DROP
sudo iptables -P FORWARD DROP
sudo iptables -P OUTPUT ACCEPT

Safety Warning: Setting the INPUT policy to DROP before adding an ACCEPT rule for SSH will immediately lock you out of a remote server. Always add your allow rules first, then set the default policy.

Think About It: Why do we set OUTPUT policy to ACCEPT instead of DROP? Think about what would break if outgoing traffic were blocked by default.

If you set OUTPUT to DROP, your server could not make DNS queries, download package updates, reach NTP servers, or initiate any outgoing connection. While a DROP OUTPUT policy is more secure, it requires carefully whitelisting every outbound connection your server needs, which is difficult to maintain.

A Complete iptables Setup for a Web Server

#!/bin/bash
# Flush existing rules
sudo iptables -F
sudo iptables -t nat -F

# Default policies
sudo iptables -P INPUT DROP
sudo iptables -P FORWARD DROP
sudo iptables -P OUTPUT ACCEPT

# Allow loopback
sudo iptables -A INPUT -i lo -j ACCEPT

# Allow established and related connections
sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH (from management network only)
sudo iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/24 -j ACCEPT

# Allow HTTP and HTTPS from anywhere
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Allow ping
sudo iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT

# Log dropped packets (optional, useful for debugging)
sudo iptables -A INPUT -j LOG --log-prefix "iptables-dropped: " --log-level 4

Inserting, Deleting, and Replacing Rules

# Insert a rule at position 1 (top of chain)
sudo iptables -I INPUT 1 -p tcp --dport 8080 -j ACCEPT

# Delete a rule by specification
sudo iptables -D INPUT -p tcp --dport 8080 -j ACCEPT

# Delete a rule by line number
sudo iptables -D INPUT 3

# Replace a rule at position 2
sudo iptables -R INPUT 2 -p tcp --dport 8443 -j ACCEPT

Saving and Restoring Rules

iptables rules are stored in kernel memory and are lost on reboot. You must save them explicitly.

# Save current rules to a file
sudo iptables-save > /etc/iptables/rules.v4

# Restore rules from a file
sudo iptables-restore < /etc/iptables/rules.v4
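
Because a bad ruleset can lock you out of a remote machine, a common pattern is apply-with-rollback: snapshot the rules, apply the new ones, and restore the snapshot unless you confirm within a timeout. A generic sketch (the three commands are passed as strings; with iptables you would pass iptables-save, your rules script, and iptables-restore):

```bash
#!/usr/bin/env bash
# Sketch of an apply-with-rollback pattern. save_cmd snapshots current
# state to stdout, apply_cmd makes the change, restore_cmd reads a
# snapshot from stdin. If no confirmation arrives before the timeout
# (e.g. the new rules cut your SSH session), the snapshot is restored.
safe_apply() {
    local save_cmd=$1 apply_cmd=$2 restore_cmd=$3 timeout=${4:-30}
    local backup ans
    backup=$(mktemp) || return 1
    eval "$save_cmd" > "$backup" || return 1          # snapshot
    eval "$apply_cmd" || { rm -f "$backup"; return 1; }
    if read -r -t "$timeout" -p "Keep new rules? [y/N] " ans \
        && [ "$ans" = y ]; then
        echo "kept"
    else
        eval "$restore_cmd" < "$backup"               # roll back
        echo "rolled back"
    fi
    rm -f "$backup"
}
```

For example: safe_apply 'iptables-save' 'bash /root/new-rules.sh' 'iptables-restore' 60 -- if the new rules lock you out, you cannot confirm, and the old rules return after 60 seconds (the script path is illustrative).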

Distro Note:

  • Debian/Ubuntu: Install iptables-persistent to auto-load rules on boot: sudo apt install iptables-persistent Rules are stored in /etc/iptables/rules.v4 and /etc/iptables/rules.v6.

  • RHEL/CentOS 7: Use systemctl enable iptables and save with sudo service iptables save. Rules go to /etc/sysconfig/iptables.

  • RHEL/CentOS 8+, Fedora: firewalld is the default; if you use raw iptables, disable firewalld first.


nftables: The Modern Successor

nftables replaces iptables, ip6tables, arptables, and ebtables with a single unified framework. It is the default on Debian 10+, RHEL 8+, and Fedora.

Why nftables?

  • Single tool for IPv4, IPv6, ARP, and bridging (no more separate ip6tables)
  • Better syntax -- more readable and consistent
  • Better performance -- rules are compiled into a virtual machine in the kernel
  • Atomic rule updates -- load an entire ruleset at once, not rule by rule
  • Built-in sets and maps -- native support for collections of addresses/ports

nft Basics

# List the entire ruleset
sudo nft list ruleset

# List tables
sudo nft list tables

# Flush all rules
sudo nft flush ruleset

nftables Structure

nftables uses a hierarchy: tables contain chains, which contain rules.

ruleset
  +-- table inet filter
        +-- chain input
        |     +-- rule: accept established
        |     +-- rule: accept ssh
        |     +-- rule: drop
        +-- chain forward
        +-- chain output

The inet family handles both IPv4 and IPv6 simultaneously.

Creating Rules with nft

# Create a table
sudo nft add table inet filter

# Create a chain (base chain attached to a hook)
sudo nft add chain inet filter input '{ type filter hook input priority 0; policy drop; }'

# Allow loopback
sudo nft add rule inet filter input iif lo accept

# Allow established connections
sudo nft add rule inet filter input ct state established,related accept

# Allow SSH
sudo nft add rule inet filter input tcp dport 22 accept

# Allow HTTP and HTTPS
sudo nft add rule inet filter input tcp dport { 80, 443 } accept

# Allow ping
sudo nft add rule inet filter input icmp type echo-request accept
sudo nft add rule inet filter input icmpv6 type echo-request accept

Notice how nftables uses sets ({ 80, 443 }) natively -- no need for multiport match extensions like in iptables.

A Complete nftables Configuration File

The cleanest way to use nftables is with a configuration file. Create /etc/nftables.conf:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Allow loopback
        iif lo accept

        # Allow established/related connections
        ct state established,related accept

        # Drop invalid connections
        ct state invalid drop

        # Allow ICMP and ICMPv6
        ip protocol icmp accept
        ip6 nexthdr icmpv6 accept

        # Allow SSH from management network
        ip saddr 10.0.0.0/24 tcp dport 22 accept

        # Allow HTTP and HTTPS
        tcp dport { 80, 443 } accept

        # Log everything else
        log prefix "nftables-dropped: " counter drop
    }

    chain forward {
        type filter hook forward priority 0; policy drop;
    }

    chain output {
        type filter hook output priority 0; policy accept;
    }
}

Load it:

sudo nft -f /etc/nftables.conf

# Enable nftables service to load on boot
sudo systemctl enable nftables

Named Sets

One of the most powerful nftables features is named sets:

table inet filter {
    set allowed_ssh {
        type ipv4_addr
        elements = { 10.0.0.5, 10.0.0.10, 192.168.1.0/24 }
    }

    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        ip saddr @allowed_ssh tcp dport 22 accept
        tcp dport { 80, 443 } accept
    }
}

You can dynamically add and remove elements from a set:

# Add an address to the set
sudo nft add element inet filter allowed_ssh { 10.0.0.20 }

# Remove an address
sudo nft delete element inet filter allowed_ssh { 10.0.0.20 }
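
Dynamic sets pair naturally with scripting. As a sketch, here is a generator that turns a plain allowlist file into nft add element commands you can review and then pipe to a shell (the file path and set name are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: emit one "nft add element" command per address in a plain
# allowlist file. Path and set name (inet filter allowed_ssh) are
# illustrative. Review the output, then pipe it to sh to apply.
gen_set_updates() {
    local file=$1
    # Skip blank lines and comments; one element per command keeps a
    # single bad address from aborting the rest.
    grep -Ev '^[[:space:]]*(#|$)' "$file" | while read -r addr; do
        echo "nft add element inet filter allowed_ssh { $addr }"
    done
}
printf '10.0.0.5\n# office network\n192.168.1.0/24\n' > /tmp/allowlist.txt
gen_set_updates /tmp/allowlist.txt
```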

Think About It: You are migrating from iptables to nftables. You have 200 iptables rules across three scripts. Is there a way to translate them automatically?

Yes! The iptables-translate command converts iptables rules to nftables syntax:

iptables-translate -A INPUT -p tcp --dport 22 -j ACCEPT
# Output: nft add rule ip filter INPUT tcp dport 22 counter accept

And iptables-restore-translate converts an entire iptables-save dump (it reads the dump from a file given with -f):

sudo iptables-save > /tmp/rules.txt
iptables-restore-translate -f /tmp/rules.txt

ufw: Uncomplicated Firewall (Ubuntu/Debian)

ufw is a user-friendly frontend to iptables/nftables. It is the default firewall management tool on Ubuntu.

# Enable ufw
sudo ufw enable

# Check status
sudo ufw status verbose

# Allow SSH (always do this BEFORE enabling!)
sudo ufw allow ssh

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Allow from a specific network
sudo ufw allow from 10.0.0.0/24 to any port 22

# Deny a specific port
sudo ufw deny 3306/tcp

# Delete a rule
sudo ufw delete allow 80/tcp

# Reset to defaults
sudo ufw reset

# Show numbered rules for deletion
sudo ufw status numbered
sudo ufw delete 3

Safety Warning: Always sudo ufw allow ssh (or sudo ufw allow 22/tcp) before running sudo ufw enable on a remote server. Enabling ufw without an SSH rule will lock you out immediately.


firewalld: Zone-Based Firewall (RHEL/Fedora)

firewalld is the default on RHEL, CentOS, and Fedora. It uses the concept of zones to group interfaces and apply different trust levels.

+--------------------------------------------------------------+
|  Zone      | Default behavior    | Typical use               |
|------------|---------------------|---------------------------|
|  drop      | Drop all incoming   | Maximum restriction       |
|  block     | Reject all incoming | Like drop, but replies    |
|  public    | Reject, allow SSH   | Default zone              |
|  external  | Masquerade, SSH     | External-facing NAT       |
|  dmz       | Allow SSH only      | DMZ servers               |
|  work      | Allow SSH, some     | Work network              |
|  home      | Allow more          | Home network              |
|  internal  | Allow more          | Internal LAN              |
|  trusted   | Accept all          | Maximum trust             |
+--------------------------------------------------------------+

firewalld Commands

# Check state
sudo firewall-cmd --state

# Show default zone
sudo firewall-cmd --get-default-zone

# List all zones and their settings
sudo firewall-cmd --list-all-zones

# List current zone rules
sudo firewall-cmd --list-all

# Add a service (temporary, until reload)
sudo firewall-cmd --add-service=http

# Add a service permanently
sudo firewall-cmd --add-service=http --permanent
sudo firewall-cmd --add-service=https --permanent

# Add a specific port
sudo firewall-cmd --add-port=8080/tcp --permanent

# Allow from a specific source
sudo firewall-cmd --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" port port="22" protocol="tcp" accept' --permanent

# Reload to apply permanent changes
sudo firewall-cmd --reload

# Remove a service
sudo firewall-cmd --remove-service=http --permanent
sudo firewall-cmd --reload

Distro Note: On RHEL/Fedora systems, avoid using raw iptables/nftables commands when firewalld is running -- they will conflict. Either use firewalld exclusively or disable it and manage rules directly.


Hands-On: Build a Firewall for an SSH-Only Server

Let's build a lockdown firewall that only allows SSH. We will do this three ways.

Method 1: iptables

# Flush
sudo iptables -F

# Allow loopback
sudo iptables -A INPUT -i lo -j ACCEPT

# Allow established
sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Default deny
sudo iptables -P INPUT DROP
sudo iptables -P FORWARD DROP
sudo iptables -P OUTPUT ACCEPT

# Verify
sudo iptables -L -n -v

Method 2: nftables

sudo nft flush ruleset
sudo nft add table inet filter
sudo nft add chain inet filter input '{ type filter hook input priority 0; policy drop; }'
sudo nft add rule inet filter input iif lo accept
sudo nft add rule inet filter input ct state established,related accept
sudo nft add rule inet filter input tcp dport 22 accept

# Verify
sudo nft list ruleset

Method 3: ufw

sudo ufw reset
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw enable

# Verify
sudo ufw status verbose

All three produce effectively the same result: only SSH on port 22 is allowed in, everything else is dropped.


NAT with iptables/nftables

Network Address Translation allows machines on a private network to reach the internet through a Linux gateway.

Source NAT (Masquerade) with iptables

# Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1

# Masquerade outgoing traffic on eth0
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Allow forwarding from internal network
sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

Port Forwarding (DNAT) with iptables

Forward incoming traffic on port 8080 to an internal server:

sudo iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 10.0.0.50:80
sudo iptables -A FORWARD -p tcp -d 10.0.0.50 --dport 80 -j ACCEPT

NAT with nftables

table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100;
        tcp dport 8080 dnat to 10.0.0.50:80
    }

    chain postrouting {
        type nat hook postrouting priority 100;
        oifname "eth0" masquerade
    }
}

Debug This

Scenario: You set up a firewall on your web server. Users report that the website is not loading, but you can SSH in just fine. You check:

$ sudo iptables -L -n --line-numbers
Chain INPUT (policy DROP)
num  target  prot  opt  source       destination
1    ACCEPT  all   --   0.0.0.0/0    0.0.0.0/0    ctstate ESTABLISHED,RELATED
2    ACCEPT  tcp   --   0.0.0.0/0    0.0.0.0/0    tcp dpt:22
3    ACCEPT  tcp   --   0.0.0.0/0    0.0.0.0/0    tcp dpt:443

$ curl -I http://localhost
HTTP/1.1 200 OK

The web server is running (curl from localhost works), and HTTPS (443) is allowed. But users still cannot load the site via HTTP. What is wrong?

Diagnosis: Port 80 (HTTP) is not in the rules. Many users will type http:// or their browser will try HTTP first. There is a rule for port 443 but not for port 80.

Fix:

sudo iptables -I INPUT 3 -p tcp --dport 80 -j ACCEPT

Or if the web server should redirect HTTP to HTTPS, you still need to allow port 80 so the redirect can happen:

sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT

What Just Happened?

+-------------------------------------------------------------------+
|                     Chapter 34 Recap                              |
+-------------------------------------------------------------------+
|                                                                   |
|  * Netfilter is the kernel framework. iptables and nftables are   |
|    userspace tools that configure it.                             |
|                                                                   |
|  * iptables uses tables (filter, nat, mangle) and chains          |
|    (INPUT, OUTPUT, FORWARD, PREROUTING, POSTROUTING).             |
|                                                                   |
|  * Rules match packets and jump to targets: ACCEPT, DROP,         |
|    REJECT, LOG, MASQUERADE.                                       |
|                                                                   |
|  * nftables is the modern replacement. It offers a cleaner        |
|    syntax, native sets, atomic updates, and unified IPv4/IPv6.    |
|                                                                   |
|  * ufw (Ubuntu) and firewalld (RHEL/Fedora) are user-friendly     |
|    frontends -- they still use Netfilter under the hood.          |
|                                                                   |
|  * ALWAYS add SSH allow rules before setting default DROP policy. |
|                                                                   |
|  * Save your rules! iptables rules vanish on reboot unless        |
|    you explicitly save them.                                      |
|                                                                   |
+-------------------------------------------------------------------+

Try This

  1. Lockdown exercise: Starting from a machine with no firewall rules, build a firewall that allows only SSH from your network and ping. Test by scanning yourself with nmap from another machine.

  2. iptables to nftables: Write five iptables rules, save them with iptables-save, and then translate them with iptables-restore-translate. Compare the two syntaxes.

  3. ufw practice: Using ufw, allow HTTP, HTTPS, and SSH. Then add a rule to deny all traffic from a specific IP address. Verify with sudo ufw status numbered.

  4. Rate limiting: Use iptables to limit SSH connections to 3 per minute per source IP (hint: use the limit or recent match module). Test by trying to connect rapidly.

  5. Bonus challenge: Set up a Linux machine as a NAT gateway. Configure two VMs: one as the gateway with two interfaces (internal and external), and one as an internal client. The client should be able to reach the internet through the gateway using masquerade NAT. Verify with curl from the internal client.

Routing & Network Troubleshooting

Why This Matters

It is 2 AM. Your monitoring system is firing alerts: the application is down. You SSH into the server and discover the app itself is fine -- it just cannot reach the database server on another subnet. Or perhaps a DNS change has gone wrong. Or maybe a newly added firewall rule is silently eating packets.

Network problems are among the most common and most stressful issues you will face as a Linux admin. They are also the most satisfying to solve, because Linux gives you incredible tools to peel back every layer of the network stack and see exactly what is happening. This chapter teaches you a systematic approach to diagnosing network issues and the tools that make it possible.


Try This Right Now

Run these commands and observe the output. They form the foundation of every network troubleshooting session:

# Can I reach the local network?
ping -c 3 $(ip route | awk '/default/ {print $3}')

# Can I reach the internet by IP?
ping -c 3 1.1.1.1

# Can I resolve DNS?
ping -c 3 google.com

# What is my routing table?
ip route show

# What connections are active right now?
ss -tunap

If the first ping fails, you have a local network problem. If the second works but the third fails, you have a DNS problem. If all three work, your basic connectivity is fine and the problem is elsewhere.


The Linux Routing Table

Every Linux system maintains a routing table that tells the kernel where to send packets. When a packet needs to leave your machine, the kernel looks up the destination IP in the routing table and picks the best matching route.

Viewing the Routing Table

# Modern way
ip route show

# Example output:
# default via 192.168.1.1 dev eth0 proto dhcp metric 100
# 10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.1
# 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100

# Legacy way (avoid, but you'll see it in old docs)
route -n
netstat -rn

Let's decode this:

+----------------------------------------------------------------------+
| Route                                  | Meaning                     |
|----------------------------------------|-----------------------------|
| default via 192.168.1.1 dev eth0       | Default gateway: send all   |
|                                        | unknown traffic to          |
|                                        | 192.168.1.1 via eth0        |
|                                        |                             |
| 10.0.0.0/24 dev eth1 src 10.0.0.1      | The 10.0.0.0/24 network is  |
|                                        | directly attached to eth1.  |
|                                        | Use source IP 10.0.0.1.     |
|                                        |                             |
| 192.168.1.0/24 dev eth0                | The 192.168.1.0/24 network  |
|   src 192.168.1.100                    | is directly attached to     |
|                                        | eth0.                       |
+----------------------------------------------------------------------+

How Routing Decisions Work

The kernel uses longest prefix match: it picks the most specific route that matches the destination.

Destination: 10.0.0.50

Routing table:
  default via 192.168.1.1 dev eth0       (/0  -- matches everything)
  10.0.0.0/24 dev eth1                   (/24 -- matches 10.0.0.*)
  10.0.0.48/30 dev eth2                  (/30 -- matches 10.0.0.48-51)

Winner: 10.0.0.48/30 via eth2 (longest prefix = most specific)
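
You can reproduce the longest-prefix decision with ordinary integer arithmetic: convert each address to a 32-bit number, apply the prefix mask, and keep the longest matching prefix. A sketch mirroring the example above:

```bash
#!/usr/bin/env bash
# Sketch: longest-prefix route selection in shell arithmetic.
ip2int() {   # dotted quad -> 32-bit integer
    local IFS=. a b c d
    read -r a b c d <<<"$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}
best_route() {   # args: destination, then routes as "cidr=interface"
    local dst best="" bestlen=-1 entry cidr net len hop mask
    dst=$(ip2int "$1"); shift
    for entry in "$@"; do
        cidr=${entry%%=*}; hop=${entry#*=}
        net=$(ip2int "${cidr%/*}"); len=${cidr#*/}
        # Build the netmask for this prefix length (e.g. /24 -> /FFFFFF00)
        mask=$(( len == 0 ? 0 : 0xFFFFFFFF << (32 - len) & 0xFFFFFFFF ))
        # A route matches if the masked destination equals the network;
        # keep the most specific (longest prefix) match.
        if (( (dst & mask) == (net & mask) && len > bestlen )); then
            best=$hop; bestlen=$len
        fi
    done
    echo "$best"
}
# The example from the text: 10.0.0.50 matches /0, /24, and /30; the
# /30 route (most specific) wins.
best_route 10.0.0.50 "0.0.0.0/0=eth0" "10.0.0.0/24=eth1" "10.0.0.48/30=eth2"
# -> eth2
```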

The Default Gateway

The default gateway is the route of last resort. If no other route matches a destination, the packet goes to the default gateway. On most single-homed machines, this is the only route that matters for internet-bound traffic.

# View the default gateway
ip route show default

# Set a default gateway (temporary)
sudo ip route add default via 192.168.1.1

# Replace the default gateway
sudo ip route replace default via 192.168.1.254

# Delete the default gateway
sudo ip route del default

Adding Static Routes

Static routes tell the kernel about networks that are not directly connected but are reachable through a specific gateway.

# Add a static route: "to reach 10.10.0.0/16, go via 192.168.1.254"
sudo ip route add 10.10.0.0/16 via 192.168.1.254

# Add a route through a specific interface
sudo ip route add 172.16.0.0/12 via 10.0.0.1 dev eth1

# Add a route with a specific metric (lower = preferred)
sudo ip route add 10.20.0.0/16 via 192.168.1.254 metric 200

# Delete a static route
sudo ip route del 10.10.0.0/16

Think About It: You add a static route to 10.10.0.0/16 via 192.168.1.254, but pinging 10.10.0.5 still fails. The gateway 192.168.1.254 is reachable (you can ping it). What could be wrong?

Several possibilities: the gateway at 192.168.1.254 does not have a route to 10.10.0.0/16 either, the gateway does not have IP forwarding enabled, a firewall on the gateway is blocking forwarded traffic, or the destination host does not have a return route back to you.

Making Static Routes Persistent

As always, ip route add is temporary. To persist routes:

NetworkManager (nmcli):

sudo nmcli connection modify "my-connection" +ipv4.routes "10.10.0.0/16 192.168.1.254"
sudo nmcli connection up "my-connection"

systemd-networkd (add to your .network file):

[Route]
Destination=10.10.0.0/16
Gateway=192.168.1.254

Debian /etc/network/interfaces (add to the interface stanza):

up ip route add 10.10.0.0/16 via 192.168.1.254

Netplan:

network:
  ethernets:
    eth0:
      routes:
        - to: 10.10.0.0/16
          via: 192.168.1.254

IP Forwarding: Turning Linux into a Router

By default, Linux drops packets that arrive on one interface and are destined for another. To make Linux forward packets (act as a router), you must enable IP forwarding.

# Check current setting (0 = disabled, 1 = enabled)
cat /proc/sys/net/ipv4/ip_forward

# Enable temporarily
sudo sysctl -w net.ipv4.ip_forward=1

# Enable permanently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-ip-forward.conf
sudo sysctl -p /etc/sysctl.d/99-ip-forward.conf

For IPv6 forwarding:

sudo sysctl -w net.ipv6.conf.all.forwarding=1

Simple NAT Gateway

A common setup: Linux machine with two interfaces acting as a gateway for an internal network.

 Internet                       Internal Network
    |                                |
    |   eth0 (public IP)            eth1 (10.0.0.1/24)
    +--------[ Linux Gateway ]--------+
              IP forwarding ON
              NAT (masquerade)
                                     |
                              [ Internal hosts ]
                              10.0.0.0/24

# Enable forwarding
sudo sysctl -w net.ipv4.ip_forward=1

# Add masquerade rule (replace eth0 with your external interface)
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Allow forwarding between interfaces
sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

Internal hosts need their default gateway set to 10.0.0.1, and they will be able to reach the internet through the Linux gateway.


The Troubleshooting Methodology

When network connectivity fails, work through the layers systematically. Do not skip ahead -- each step rules out a category of problems.

+-------------------------------------------------------------------+
|  Step  | Check                | Tool          | Rules Out          |
|--------|----------------------|---------------|--------------------|
|   1    | Is the interface up? | ip link       | Physical/driver    |
|   2    | Do I have an IP?     | ip addr       | DHCP/config        |
|   3    | Can I reach the      | ping gateway  | Local network/     |
|        | gateway?             |               | ARP/switching      |
|   4    | Can I reach the      | ping 1.1.1.1  | Routing/gateway/   |
|        | internet by IP?      |               | ISP                |
|   5    | Does DNS resolve?    | dig, nslookup | DNS configuration  |
|   6    | Can I reach the      | curl, telnet  | Firewall/app/port  |
|        | target service?      |               | issues             |
+-------------------------------------------------------------------+

Let's go through each tool in detail.
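The table translates almost line-for-line into a script. A sketch that walks the ladder and flags the first failing layer (the gateway address and target name are placeholders, and each check is a simplification of the corresponding tool):

```shell
#!/bin/sh
# Walk the troubleshooting ladder top to bottom; the first FAIL line
# is where to start digging. GW and TARGET are placeholders.
GW=${GW:-192.168.1.1}
TARGET=${TARGET:-example.com}

step() {                       # step <label> <command...>
  label=$1; shift
  if "$@" >/dev/null 2>&1; then echo "ok   $label"
  else echo "FAIL $label"
  fi
}

step "1 interface up"    sh -c "ip link show | grep -q 'state UP'"
step "2 have an IP"      sh -c "ip -4 addr show | grep -q 'inet '"
step "3 reach gateway"   ping -c 1 -W 2 "$GW"
step "4 reach internet"  ping -c 1 -W 2 1.1.1.1
step "5 DNS resolves"    getent hosts "$TARGET"
step "6 reach service"   curl -s --connect-timeout 5 -o /dev/null "http://$TARGET/"
```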


Tool: ping -- Basic Connectivity

ping sends ICMP echo request packets and waits for replies. It tests basic IP connectivity and measures round-trip time.

# Ping a host (Ctrl+C to stop)
ping 192.168.1.1

# Send exactly 5 pings
ping -c 5 google.com

# Set a timeout of 2 seconds per ping
ping -W 2 -c 3 10.0.0.1

# Flood ping (root only, sends as fast as possible -- only against hosts you control)
sudo ping -f -c 1000 192.168.1.1

# Ping with a specific source interface
ping -I eth1 10.0.0.1

What ping results tell you:

PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=12.3 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=11.8 ms

--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 11.8/12.0/12.3/0.250 ms

  • time: Round-trip time. More than 100 ms to hosts on your own network is suspicious.
  • ttl: Time To Live, decremented at each hop. Helps estimate how far away a host is.
  • packet loss: On a wired local network, any loss is a problem; over the internet, sustained loss above 1-2% is significant.
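The TTL arithmetic is easy to automate: most senders start at 64, 128 (Windows), or 255, so subtracting the observed TTL from the next-highest common starting value estimates the hop count. A sketch (this is a heuristic, and the function name is mine):

```shell
# Estimate how many hops away a host is from the TTL in its ping reply.
# Assumes the sender used one of the common initial TTLs: 64, 128, 255.
hops_from_ttl() {
  for init in 64 128 255; do
    if [ "$1" -le "$init" ]; then
      echo $(( init - $1 ))
      return 0
    fi
  done
  echo "?"
}

hops_from_ttl 57    # the ttl=57 reply above: 64 - 57 = 7 hops away
```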

Think About It: You can ping 8.8.8.8 but not google.com. You can ping 1.1.1.1 but not cloudflare.com. What is the most likely problem?

DNS resolution is broken. IP connectivity works fine. Check /etc/resolv.conf for your DNS server settings and test with dig google.com @8.8.8.8.


Tool: traceroute / tracepath -- Tracing the Path

traceroute shows every router (hop) between you and the destination. It works by sending packets with increasing TTL values.

# Basic traceroute
traceroute google.com

# Use ICMP instead of UDP (sometimes more reliable)
sudo traceroute -I google.com

# Use TCP SYN on port 80 (gets through more firewalls)
sudo traceroute -T -p 80 google.com

# tracepath (no root needed, uses UDP)
tracepath google.com

Example output:

traceroute to google.com (142.250.80.46), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.112 ms  1.001 ms
 2  10.0.0.1 (10.0.0.1)  5.432 ms  5.321 ms  5.210 ms
 3  isp-router.example.com (203.0.113.1)  10.123 ms  10.234 ms  10.345 ms
 4  * * *
 5  142.250.80.46 (142.250.80.46)  15.678 ms  15.567 ms  15.456 ms

  • * * * means that hop did not respond. This is often normal -- many routers are configured not to reply to traceroute probes.
  • If the trace stops at a certain hop and never progresses, there is likely a routing problem or firewall at that hop.

Distro Note: traceroute is not always installed by default. Install it with sudo apt install traceroute (Debian/Ubuntu) or sudo dnf install traceroute (RHEL/Fedora). tracepath (from iputils) is usually pre-installed.


Tool: dig -- DNS Troubleshooting

dig is the gold standard for DNS troubleshooting. It queries DNS servers directly and shows the full response.

# Basic lookup
dig google.com

# Query a specific DNS server
dig @8.8.8.8 google.com

# Look up a specific record type
dig google.com MX
dig google.com AAAA
dig google.com NS

# Short output (just the answer)
dig +short google.com

# Trace the full resolution path
dig +trace google.com

# Reverse DNS lookup
dig -x 8.8.8.8

When DNS is not working, the most useful test is:

# Test with your configured DNS server
dig google.com

# Test with a known-good public DNS server
dig @8.8.8.8 google.com

If the second works but the first does not, your configured DNS server (in /etc/resolv.conf) is the problem.

# Check your DNS configuration
cat /etc/resolv.conf

# On systems using systemd-resolved
resolvectl status
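When a script needs to decide which resolver to retest, you can pull the configured servers out of resolv.conf-style text with a one-line awk filter (a small sketch; on systemd-resolved machines this will typically show the 127.0.0.53 stub):

```shell
#!/bin/sh
# List the resolvers from resolv.conf-format input, ignoring other lines.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

if [ -r /etc/resolv.conf ]; then nameservers < /etc/resolv.conf; fi
```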


Tool: ss -- Socket Statistics

ss replaces the older netstat command. It shows listening ports, active connections, and socket details.

# Show all listening TCP ports
ss -tlnp

# Show all listening UDP ports
ss -ulnp

# Show all established connections
ss -tnp

# Show connections to a specific port
ss -tnp | grep :443

# Show socket summary statistics
ss -s

# Show all TCP sockets in all states
ss -ta

Breaking down the flags:

  • -t: TCP
  • -u: UDP
  • -l: Listening only
  • -n: Show numbers (no DNS resolution)
  • -p: Show the process using each socket

Example output:

State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22           0.0.0.0:*          users:(("sshd",pid=1234,fd=3))
LISTEN  0       511     0.0.0.0:80           0.0.0.0:*          users:(("nginx",pid=5678,fd=6))
ESTAB   0       0       192.168.1.100:22     192.168.1.50:54321  users:(("sshd",pid=9012,fd=4))

This tells you:

  • SSH is listening on all interfaces, port 22
  • Nginx is listening on all interfaces, port 80
  • There is one active SSH connection from 192.168.1.50
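You can reduce ss -tlnp output to a quick "port, process" summary with a little awk. A sketch (it handles IPv4 addresses like the sample above; bracketed IPv6 addresses would need more care):

```shell
#!/bin/sh
# Summarize `ss -tlnp`-style lines as "port process".
listeners() {
  awk '$1 == "LISTEN" {
    n = split($4, a, ":")                  # Local Address:Port -> take the port
    proc = "-"
    if (match($0, /\("[^"]+"/))            # users:(("nginx",... -> nginx
      proc = substr($0, RSTART + 2, RLENGTH - 3)
    print a[n], proc
  }'
}

ss -tlnp 2>/dev/null | listeners
```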

Tool: tcpdump -- Packet Capture

tcpdump is your most powerful network troubleshooting tool. It captures and displays actual packets on the wire. When all else fails, tcpdump tells you exactly what is happening.

# Capture all traffic on eth0
sudo tcpdump -i eth0

# Capture only traffic to/from a specific host
sudo tcpdump -i eth0 host 10.0.0.5

# Capture only TCP traffic on port 80
sudo tcpdump -i eth0 tcp port 80

# Capture DNS traffic
sudo tcpdump -i eth0 port 53

# Capture ICMP (ping)
sudo tcpdump -i eth0 icmp

# Show packet contents in ASCII
sudo tcpdump -i eth0 -A port 80

# Show packet contents in hex and ASCII
sudo tcpdump -i eth0 -XX port 80

# Save capture to a file (for analysis in Wireshark)
sudo tcpdump -i eth0 -w capture.pcap

# Read a saved capture
sudo tcpdump -r capture.pcap

# Capture only 100 packets
sudo tcpdump -i eth0 -c 100

# Don't resolve hostnames (faster)
sudo tcpdump -i eth0 -n

tcpdump Filter Expressions

You can build complex filters:

# Traffic to or from a subnet
sudo tcpdump -i eth0 net 10.0.0.0/24

# Traffic from a source to a specific destination port
sudo tcpdump -i eth0 src 192.168.1.50 and dst port 443

# Traffic that is NOT SSH (filter out your own session)
sudo tcpdump -i eth0 not port 22

# SYN packets only (new TCP connections)
sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'

# TCP packets carrying payload on port 80 (catches HTTP requests and responses)
sudo tcpdump -i eth0 -A 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
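That last filter looks cryptic, but it is plain arithmetic: ip[2:2] is the IP total length, (ip[0]&0xf)<<2 converts the IP header length from 32-bit words to bytes, and (tcp[12]&0xf0)>>2 does the same for the TCP header; the packet matches when the remaining payload is non-zero. The same computation as a shell function (illustrative only -- tcpdump does this per packet in the kernel):

```shell
# payload = ip_total_length - ip_header_bytes - tcp_header_bytes
# Header lengths arrive as 32-bit word counts, hence the * 4.
tcp_payload_len() {   # args: total_len ihl_words tcp_data_offset_words
  echo $(( $1 - $2 * 4 - $3 * 4 ))
}

tcp_payload_len 60 5 8    # 60-byte packet, 20 B IP hdr, 32 B TCP hdr -> 8
```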

Reading tcpdump Output

14:23:45.123456 IP 192.168.1.100.54321 > 93.184.216.34.80: Flags [S], seq 1234567890, win 65535, length 0
14:23:45.234567 IP 93.184.216.34.80 > 192.168.1.100.54321: Flags [S.], seq 987654321, ack 1234567891, win 65535, length 0
14:23:45.234600 IP 192.168.1.100.54321 > 93.184.216.34.80: Flags [.], ack 1, win 65535, length 0

This is a TCP three-way handshake:

  1. [S] -- SYN: Client initiates connection
  2. [S.] -- SYN-ACK: Server acknowledges and responds
  3. [.] -- ACK: Client acknowledges, connection established

Common flags: [S] SYN, [S.] SYN-ACK, [.] ACK, [P.] PSH-ACK (data), [F.] FIN-ACK (close), [R] RST (reset/reject).

Safety Warning: tcpdump can capture sensitive data including passwords sent in plain text, session tokens, and personal information. Use it responsibly and be mindful of capture files stored on disk.


Tool: curl -- Application-Layer Testing

curl tests HTTP/HTTPS connectivity and is essential for verifying web services.

# Basic request
curl http://example.com

# Show response headers
curl -I http://example.com

# Verbose output (shows the full connection process)
curl -v https://example.com

# Follow redirects
curl -L http://example.com

# Test a specific port
curl http://10.0.0.5:8080

# Set a timeout
curl --connect-timeout 5 --max-time 10 http://example.com

# Test with a specific Host header (useful for virtual hosts)
curl -H "Host: mysite.com" http://10.0.0.5

# Test HTTPS, ignoring certificate errors
curl -k https://self-signed-server.local
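In scripts you usually care about the status code rather than the body. A minimal health-check helper (the 2xx/3xx policy, the timeouts, and the function name are my choices, not curl defaults):

```shell
#!/bin/sh
# Succeed only if the URL answers with a 2xx or 3xx status.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
              --connect-timeout 5 --max-time 10 "$1" 2>/dev/null)
  case "$code" in
    2*|3*) echo "up ($code)";   return 0 ;;
    *)     echo "down ($code)"; return 1 ;;   # 000 = no HTTP response at all
  esac
}

check_url http://example.com || echo "investigate"
```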

Hands-On: Systematic Network Troubleshooting

Here is a real-world troubleshooting workflow. Imagine you cannot reach a web server at web.example.com.

Step 1: Is my interface up and do I have an IP?

ip link show
ip addr show

Look for state UP and a valid inet address. If the interface is down or has no IP, fix that first (see Chapter 33).

Step 2: Can I reach my gateway?

ip route show default
ping -c 3 192.168.1.1    # (your gateway)

If this fails, the problem is local: check cables, switch, VLAN, ARP table.

ip neigh show    # Check ARP cache

Step 3: Can I reach an external IP?

ping -c 3 1.1.1.1

If this fails but the gateway ping works, the problem is upstream: routing, ISP, or the gateway is not forwarding traffic.

Step 4: Does DNS work?

dig web.example.com
dig @8.8.8.8 web.example.com

If DNS fails with your configured server but works with 8.8.8.8, update your DNS configuration.

Step 5: Can I reach the target service?

curl -v http://web.example.com
curl -v --connect-timeout 5 http://web.example.com:80

If DNS resolves and you can ping the server but curl times out, there may be a firewall blocking port 80.

Step 6: Is the port actually open on the remote end?

# Test if a port is open
ss -tln | grep :80                      # On the server itself
sudo nmap -p 80 web.example.com         # From your machine (if nmap is available)

Step 7: Capture packets to see what is happening

# On the server, watch for incoming connections on port 80
sudo tcpdump -i eth0 tcp port 80 -n

# From your machine, try to connect
curl http://web.example.com

If you see SYN packets arriving but no SYN-ACK response, the server's firewall is dropping them. If you see no packets at all, they are being dropped somewhere between you and the server.


Common Problems and Solutions

Problem: "Network is unreachable"

$ ping 10.0.0.5
connect: Network is unreachable

Cause: No route to the destination network.

Fix: Add a route or default gateway:

sudo ip route add default via 192.168.1.1

Problem: "Destination Host Unreachable"

$ ping 192.168.1.50
From 192.168.1.100 icmp_seq=1 Destination Host Unreachable

Cause: The destination is on the local network but not responding to ARP.

Fix: Check that the destination host is powered on, on the same VLAN, and has the correct IP configured. Check ARP:

ip neigh show

Problem: "Connection refused"

$ curl http://10.0.0.5:80
curl: (7) Failed to connect to 10.0.0.5 port 80: Connection refused

Cause: The port is not open. The host is reachable, but nothing is listening.

Fix: Start the service on the target host:

sudo systemctl start nginx
ss -tlnp | grep :80

Problem: "Connection timed out"

$ curl --connect-timeout 5 http://10.0.0.5:80
curl: (28) Connection timed out

Cause: A firewall is silently dropping packets (no response, not even a reject).

Fix: Check firewall rules on the target host and any intermediate firewalls:

sudo iptables -L -n -v | grep 80

Problem: "Name resolution failed"

$ ping google.com
ping: google.com: Temporary failure in name resolution

Cause: DNS is not configured or not reachable.

Fix:

cat /etc/resolv.conf
# If empty or wrong, temporarily fix:
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
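These messages map so cleanly onto layers that you can encode the whole section in a helper; the function below just restates the causes and fixes above (diagnose is my own name):

```shell
#!/bin/sh
# Map a connection error message to the layer that is broken.
diagnose() {
  case "$1" in
    *"Network is unreachable"*) echo "routing: no route -- check 'ip route'" ;;
    *"Host Unreachable"*)       echo "local net: ARP/host down -- check 'ip neigh'" ;;
    *"Connection refused"*)     echo "service: nothing listening -- check 'ss -tlnp'" ;;
    *"timed out"*)              echo "firewall: packets dropped -- check iptables" ;;
    *"name resolution"*)        echo "DNS: check /etc/resolv.conf" ;;
    *)                          echo "unrecognized -- capture packets with tcpdump" ;;
  esac
}

diagnose "curl: (7) Failed to connect to 10.0.0.5 port 80: Connection refused"
```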

Debug This

Scenario: A developer says their application on Server A (192.168.1.100) cannot connect to the API on Server B (10.0.0.50, port 8080). You investigate:

# From Server A:
$ ping 10.0.0.50
PING 10.0.0.50 (10.0.0.50) 56(84) bytes of data.
64 bytes from 10.0.0.50: icmp_seq=1 ttl=63 time=1.23 ms

$ curl --connect-timeout 5 http://10.0.0.50:8080
curl: (28) Connection timed out after 5001 milliseconds

$ sudo tcpdump -i eth0 host 10.0.0.50 and port 8080 -c 5
14:23:01 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:02 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:04 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345

What does this tell you, and what do you do next?

Diagnosis: Ping works (ICMP is allowed), but TCP to port 8080 times out. The tcpdump shows SYN packets being sent but no SYN-ACK coming back. This means one of three things:

  1. A firewall on Server B is dropping TCP packets to port 8080
  2. The application on Server B is not listening on port 8080
  3. A network firewall between the two is blocking port 8080

Next steps: SSH into Server B and check:

# Is anything listening on port 8080?
ss -tlnp | grep 8080

# What does the firewall look like?
sudo iptables -L -n -v | grep 8080

What Just Happened?

+-----------------------------------------------------------------+
|                        Chapter 35 Recap                         |
+-----------------------------------------------------------------+
|                                                                 |
|  * The routing table determines where packets are sent.         |
|    Longest prefix match wins.                                   |
|                                                                 |
|  * ip route manages routes. Static routes need persistence      |
|    via nmcli, netplan, or config files.                         |
|                                                                 |
|  * IP forwarding turns Linux into a router.                     |
|    Enable via sysctl net.ipv4.ip_forward=1.                     |
|                                                                 |
|  * Troubleshooting order: interface -> IP -> gateway ->         |
|    internet -> DNS -> service/port.                             |
|                                                                 |
|  * Key tools:                                                   |
|    - ping: basic connectivity                                   |
|    - traceroute: path to destination                            |
|    - dig: DNS queries                                           |
|    - ss: listening ports and connections                        |
|    - tcpdump: actual packet capture                             |
|    - curl: HTTP-level testing                                   |
|                                                                 |
|  * Error messages tell you exactly what layer is broken:        |
|    "Network unreachable" = no route                             |
|    "Connection refused"  = nothing listening                    |
|    "Connection timed out" = firewall dropping packets           |
|                                                                 |
+-----------------------------------------------------------------+

Try This

  1. Route tracing: Run traceroute (or tracepath) to five different websites. Compare the paths. Can you identify which hops belong to your ISP?

  2. tcpdump practice: Start a tcpdump capture on port 80, then open a web page in your browser. Identify the TCP handshake, the HTTP request, and the response.

  3. Break and fix: On a test VM, remove the default gateway (sudo ip route del default). Observe what breaks. Then add it back and verify.

  4. DNS investigation: Use dig +trace google.com to see the full DNS resolution chain from root servers to authoritative servers. How many DNS servers are involved?

  5. Bonus challenge: Set up two VMs on different subnets. Configure a third VM as a router between them (with IP forwarding and proper routes). Verify that the two VMs can ping each other through the router.

SSH: Secure Remote Access

Why This Matters

Before SSH existed, system administrators used Telnet and rsh to manage remote servers. Every keystroke -- including passwords -- was sent across the network in plain text. Anyone on the same network could read everything with a simple packet capture.

SSH (Secure Shell) changed everything. It encrypts all communication between your machine and the remote server. Today, SSH is so fundamental that it is practically impossible to do Linux administration without it. If you manage even one remote server, you are using SSH. If you manage hundreds, you need to understand SSH deeply -- key-based authentication, tunneling, agent forwarding, and hardening.

This chapter takes you from "I can ssh into a server" to "I understand SSH well enough to build secure, efficient remote access infrastructure."


Try This Right Now

# Check if SSH client is installed
ssh -V

# Check if SSH server is running on your machine
# (the service is sshd on RHEL/Fedora; on Debian/Ubuntu it may be named ssh)
systemctl status sshd

# List your existing SSH keys (if any)
ls -la ~/.ssh/

# Try connecting to localhost (if sshd is running)
ssh localhost

How SSH Works

When you type ssh user@server, a lot happens before you see that command prompt.

The Connection Process

  Client                              Server
    |                                   |
    |  1. TCP connection (port 22)      |
    |---------------------------------->|
    |                                   |
    |  2. Protocol version exchange     |
    |<--------------------------------->|
    |                                   |
    |  3. Key exchange (Diffie-Hellman) |
    |<--------------------------------->|
    |  (Both sides now have a shared    |
    |   session key for encryption)     |
    |                                   |
    |  4. Server authentication         |
    |<----------------------------------|
    |  (Server proves identity with     |
    |   its host key)                   |
    |                                   |
    |  5. User authentication           |
    |---------------------------------->|
    |  (Password, public key, etc.)     |
    |                                   |
    |  6. Encrypted session begins      |
    |<=================================>|
    |                                   |

Key concepts:

  • Key Exchange: Both sides negotiate a shared secret using Diffie-Hellman. This secret is used to encrypt the session. Even if someone captures all the traffic, they cannot derive the encryption key.

  • Host Key: The server has a unique key pair. The first time you connect, SSH asks you to verify the fingerprint. This fingerprint is stored in ~/.ssh/known_hosts. If it changes later, SSH warns you -- this could mean a man-in-the-middle attack.

  • User Authentication: After the encrypted channel is established, you prove your identity (password, key, certificate, etc.).

The "Host Key Changed" Warning

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

This warning means the server's host key is different from what you have stored. This happens when:

  • The server was reinstalled
  • The server's IP was reassigned to a different machine
  • Someone is attempting a man-in-the-middle attack

If you are sure the change is legitimate (e.g., you just reinstalled the server):

# Remove the old key for that host
ssh-keygen -R hostname_or_ip

# Then connect again, and you'll be prompted to accept the new key
ssh user@hostname

SSH Basics

Connecting to a Remote Server

# Basic connection (uses your current username)
ssh 192.168.1.100

# Specify a username
ssh admin@192.168.1.100

# Connect on a non-standard port
ssh -p 2222 admin@192.168.1.100

# Verbose mode (for debugging connection problems)
ssh -v admin@192.168.1.100

# Extra verbose (more detail)
ssh -vv admin@192.168.1.100

Running a Single Command

# Run a command on the remote server without opening a shell
ssh admin@server 'uptime'

# Run multiple commands
ssh admin@server 'hostname && uptime && df -h'

# Run a command that needs a TTY (e.g., sudo, top)
ssh -t admin@server 'sudo systemctl restart nginx'

The -t flag forces a pseudo-terminal allocation, which is needed for interactive commands.
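Single-command mode is also what makes quick fleet checks possible. A sketch that runs one command across several hosts (the host names are placeholders; BatchMode=yes makes ssh fail fast instead of prompting for a password):

```shell
#!/bin/sh
# Run one command on a list of hosts and flag the unreachable ones.
fanout() {
  cmd=$1; shift
  for h in "$@"; do
    printf '== %s ==\n' "$h"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" "$cmd" 2>/dev/null \
      || echo "unreachable (or command failed)"
  done
}

fanout uptime web1.example.com web2.example.com
```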


Key-Based Authentication

Password authentication works but has serious drawbacks:

  • Passwords can be brute-forced
  • Passwords can be phished or shoulder-surfed
  • Passwords are annoying to type hundreds of times a day
  • Passwords cannot be used for automated scripts

Key-based authentication uses public-key cryptography. You generate a key pair:

  • Private key: Lives on your machine. Never share it with anyone.
  • Public key: Goes on every server you want to access. Safe to share.

+-------------------+          +-------------------+
|    Your Machine   |          |   Remote Server   |
|                   |          |                   |
| ~/.ssh/id_ed25519 |          | ~/.ssh/           |
| (PRIVATE KEY)     |  proves  |  authorized_keys  |
| Keep this secret! |--------->| (PUBLIC KEY)      |
|                   |          |                   |
+-------------------+          +-------------------+

Generating a Key Pair

# Generate an Ed25519 key (recommended, modern, fast)
ssh-keygen -t ed25519 -C "your_email@example.com"

# You will be prompted:
# Enter file in which to save the key (/home/user/.ssh/id_ed25519):
# Enter passphrase (empty for no passphrase):

You should set a passphrase. It encrypts your private key on disk, so even if someone steals the file, they cannot use it without the passphrase.

If you need RSA compatibility (older systems):

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Copying Your Public Key to a Server

# The easy way (recommended)
ssh-copy-id admin@server

# This copies your public key to the server's ~/.ssh/authorized_keys
# and sets the correct permissions automatically.

If ssh-copy-id is not available:

# Manual method
cat ~/.ssh/id_ed25519.pub | ssh admin@server 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

Verifying It Works

# This should log you in without a password
ssh admin@server

# If you set a passphrase, it will ask for the passphrase (not the server password)

The authorized_keys File

On the server, each user's authorized keys live in ~/.ssh/authorized_keys. Each line is one public key:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHx... user@laptop
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJy... user@desktop
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ... admin@workstation

You can add options before each key to restrict it:

# Only allow this key from a specific IP
from="10.0.0.5" ssh-ed25519 AAAAC3...

# Only allow this key to run a specific command
command="/usr/local/bin/backup.sh" ssh-ed25519 AAAAC3...

# Disable forwarding for this key
no-port-forwarding,no-agent-forwarding ssh-ed25519 AAAAC3...

Think About It: Why is it important to set correct file permissions on the .ssh directory and its contents?

SSH is strict about permissions for security. If ~/.ssh is world-readable or authorized_keys is writable by others, SSH will refuse to use them. The correct permissions are:

  • ~/.ssh/ -- 700 (drwx------)
  • ~/.ssh/authorized_keys -- 600 (-rw-------)
  • ~/.ssh/id_ed25519 (private key) -- 600 (-rw-------)
  • ~/.ssh/id_ed25519.pub (public key) -- 644 (-rw-r--r--)
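Those four rules are mechanical enough to script. A sketch that applies them to whatever directory you point it at (fix_ssh_perms is my own name, not a standard tool; try it on a scratch copy first):

```shell
#!/bin/sh
# Apply the standard SSH permission set to a .ssh-style directory.
fix_ssh_perms() {
  dir=$1
  chmod 700 "$dir"
  [ -f "$dir/authorized_keys" ] && chmod 600 "$dir/authorized_keys"
  for f in "$dir"/id_*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.pub) chmod 644 "$f" ;;   # public keys are safe to be world-readable
      *)     chmod 600 "$f" ;;   # private keys: owner only
    esac
  done
}

if [ -d "$HOME/.ssh" ]; then fix_ssh_perms "$HOME/.ssh"; fi
```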

SSH Config File

Typing ssh -p 2222 -i ~/.ssh/special_key admin@long-server-name.example.com every time is tedious. The SSH config file (~/.ssh/config) solves this.

# ~/.ssh/config

# Production web server
Host web-prod
    HostName 203.0.113.50
    User admin
    Port 2222
    IdentityFile ~/.ssh/id_ed25519_work

# Database server (only reachable through web-prod)
Host db-prod
    HostName 10.0.0.50
    User dba
    ProxyJump web-prod

# Development machines
Host dev-*
    User developer
    Port 22
    IdentityFile ~/.ssh/id_ed25519_dev

Host dev-app
    HostName 192.168.1.10

Host dev-api
    HostName 192.168.1.11

# Default settings for all hosts
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
    AddKeysToAgent yes
    IdentitiesOnly yes

Now you can simply type:

ssh web-prod     # instead of ssh -p 2222 -i ~/.ssh/id_ed25519_work admin@203.0.113.50
ssh db-prod      # automatically jumps through web-prod
ssh dev-app      # uses developer@192.168.1.10

Important Config Options

+---------------------------------------------------------------------+
| Option               | Purpose                                      |
|----------------------|----------------------------------------------|
| HostName             | The actual hostname or IP                    |
| User                 | Username to connect as                       |
| Port                 | SSH port                                     |
| IdentityFile         | Path to private key                          |
| ProxyJump            | Jump through another host (bastion/jump box) |
| ServerAliveInterval  | Send keepalive every N seconds               |
| ServerAliveCountMax  | Disconnect after N missed keepalives         |
| ForwardAgent         | Forward SSH agent to remote host             |
| LocalForward         | Set up a local port forward                  |
| IdentitiesOnly       | Only use the specified key, not all in agent |
+---------------------------------------------------------------------+

SSH Agent

If you have a passphrase on your key (and you should), typing it every time gets old fast. The SSH agent stores your decrypted private key in memory so you only type the passphrase once per session.

# Start the SSH agent (often already running in desktop environments)
eval "$(ssh-agent -s)"

# Add your key to the agent
ssh-add ~/.ssh/id_ed25519
# Enter passphrase once

# List keys in the agent
ssh-add -l

# Remove all keys from the agent
ssh-add -D

Agent Forwarding

Agent forwarding lets you use your local SSH keys on a remote server without copying the private key to that server.

You (laptop)  --->  Jump box  --->  Internal server
   [key]          [no key stored,    [authenticates with
                   agent forwarded]   your key via agent]

# Enable agent forwarding for a connection
ssh -A user@jumpbox

# Or in ~/.ssh/config:
Host jumpbox
    ForwardAgent yes

Safety Warning: Only enable agent forwarding to servers you trust. A compromised server with access to your forwarded agent could use your keys to connect to other servers. Use ProxyJump instead of agent forwarding when possible -- it is more secure because the jump host never sees your keys.


Port Forwarding (SSH Tunnels)

SSH tunnels are one of the most useful and underappreciated features of SSH. They let you securely access services that are behind firewalls or only listening on localhost.

Local Port Forwarding

"Make a remote service available on my local machine."

Your machine (localhost:8080) --[SSH tunnel]--> Remote (localhost:5432)

You access localhost:8080, traffic goes through the SSH tunnel
to the remote server, which connects to its own localhost:5432.
# Forward local port 8080 to remote's localhost:5432 (PostgreSQL)

ssh -L 8080:localhost:5432 admin@remote-server

# Now you can connect to PostgreSQL at localhost:8080 from your machine
psql -h localhost -p 8080 -U myuser mydb

More specific syntax:

# Forward local port 3307 to a database at 10.0.0.50:3306 via jump-server
ssh -L 3307:10.0.0.50:3306 admin@jump-server

# The database is not directly reachable from your machine,
# but jump-server can reach it.
# You connect to localhost:3307, which tunnels through.

Remote Port Forwarding

"Make a local service available on the remote machine."

# Make your local web server (port 3000) available on remote port 9000
ssh -R 9000:localhost:3000 admin@remote-server

# Anyone who can reach remote-server:9000 can now access
# your local machine's port 3000 through the tunnel.

This is useful for:

  • Exposing a local development server to a remote tester
  • Giving remote access to a service behind NAT

Dynamic Port Forwarding (SOCKS Proxy)

This creates a SOCKS proxy that routes all traffic through the SSH connection.

# Create a SOCKS5 proxy on local port 1080
ssh -D 1080 admin@remote-server

# Configure your browser or application to use SOCKS proxy localhost:1080
# All traffic will be routed through remote-server

This is useful for:

  • Browsing the web as if you were on the remote network
  • Accessing internal web applications
  • Bypassing geographic restrictions

Persistent Tunnels

# Keep the tunnel open in the background
ssh -f -N -L 8080:localhost:5432 admin@remote-server

# -f: Go to background after authentication
# -N: Don't execute a remote command (just the tunnel)

In your SSH config:

Host tunnel-db
    HostName remote-server
    User admin
    LocalForward 8080 localhost:5432
    # Connect with: ssh -f -N tunnel-db
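With -f the shell prompt comes back before the forward is necessarily usable. When a script depends on the tunnel, poll the local port before proceeding (a sketch; it assumes nc is installed, and the port numbers are the examples from above):

```shell
#!/bin/sh
# Block until a TCP port accepts connections, or give up after a timeout.
wait_for_port() {   # wait_for_port <host> <port> <seconds>
  i=0
  while [ "$i" -lt "$3" ]; do
    if nc -z "$1" "$2" 2>/dev/null; then
      echo "open"; return 0
    fi
    i=$((i + 1)); sleep 1
  done
  echo "timeout"; return 1
}

# ssh -f -N -L 8080:localhost:5432 admin@remote-server
# wait_for_port 127.0.0.1 8080 10 && psql -h localhost -p 8080 -U myuser mydb
```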

File Transfer over SSH

scp -- Secure Copy

# Copy a file to a remote server
scp file.txt admin@server:/home/admin/

# Copy a file from a remote server
scp admin@server:/var/log/syslog ./

# Copy a directory recursively
scp -r ./project/ admin@server:/home/admin/

# Use a specific port
scp -P 2222 file.txt admin@server:/home/admin/

sftp -- Secure FTP

# Start an interactive SFTP session
sftp admin@server

# Inside sftp:
sftp> ls
sftp> cd /var/log
sftp> get syslog
sftp> put localfile.txt
sftp> mkdir new-directory
sftp> exit

rsync over SSH

rsync is the best tool for transferring files over SSH because it only transfers what has changed.

# Sync a directory to a remote server
rsync -avz ./project/ admin@server:/home/admin/project/

# Sync from remote to local
rsync -avz admin@server:/var/log/ ./logs/

# Use a specific SSH port
rsync -avz -e 'ssh -p 2222' ./data/ admin@server:/backup/

# Dry run (show what would be transferred)
rsync -avzn ./project/ admin@server:/home/admin/project/

# Delete files on destination that don't exist on source
rsync -avz --delete ./project/ admin@server:/home/admin/project/

Flags breakdown:

  • -a: Archive mode (preserves permissions, timestamps, symlinks, etc.)
  • -v: Verbose
  • -z: Compress during transfer
  • -n: Dry run
  • --delete: Remove files on destination not in source

Think About It: You need to transfer a 10 GB directory to a remote server. The connection drops halfway through. With scp, what happens? With rsync, what happens?

With scp, you start over from the beginning: the partial file is on the destination, but scp has no idea where it left off. With rsync, you run the same command again and it skips every file that already arrived intact, transferring only what is missing; add --partial (or -P) if you also want rsync to resume large, half-transferred files instead of re-sending them.


Hardening SSH: sshd_config

The SSH server configuration lives at /etc/ssh/sshd_config. Here are the essential hardening steps for production servers.

Disable Root Login

# /etc/ssh/sshd_config
PermitRootLogin no

This forces admins to log in as a regular user and use sudo. It also eliminates the "root" username as a brute-force target.

Key-Only Authentication (Disable Passwords)

# /etc/ssh/sshd_config
PasswordAuthentication no
PubkeyAuthentication yes

Safety Warning: Before disabling password authentication, make ABSOLUTELY sure that key-based authentication works. Test it in a separate terminal first. If you disable passwords and your key does not work, you will be locked out.

Change the Default Port

# /etc/ssh/sshd_config
Port 2222

This is security through obscurity and will not stop a determined attacker, but it filters out most of the automated brute-force noise aimed at port 22.

Limit Users and Groups

# Only allow specific users
AllowUsers admin deployer

# Or allow by group
AllowGroups ssh-users

Other Hardening Options

# Disable empty passwords
PermitEmptyPasswords no

# Set a login grace period (time to authenticate)
LoginGraceTime 30

# Limit authentication attempts per connection
MaxAuthTries 3

# Disable X11 forwarding (if not needed)
X11Forwarding no

# Disable TCP forwarding (if not needed)
AllowTcpForwarding no

# Use only protocol 2. Modern OpenSSH (7.0+) removed insecure protocol 1
# entirely, so this directive is obsolete there -- it only matters on old servers.
Protocol 2

# Specify allowed key exchange algorithms (modern only)
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org

# Specify allowed ciphers (modern only)
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com

# Specify allowed MACs (modern only)
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

Applying Changes

After editing sshd_config:

# Test the configuration for syntax errors (critical step!)
sudo sshd -t

# If no errors, reload the service
sudo systemctl reload sshd

Safety Warning: Always test the config with sshd -t before reloading. And always keep your current SSH session open while testing. Open a NEW terminal and try to connect. If the new connection works, you are safe. If it does not, you still have your existing session to fix things.
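Alongside sshd -t (which only checks syntax), you can grep the config for the two settings that matter most. A toy audit helper (audit_sshd is my own name; it is no substitute for reading the file):

```shell
#!/bin/sh
# Flag the two riskiest settings in an sshd_config-style file.
audit_sshd() {
  awk '
    tolower($1) == "permitrootlogin" && tolower($2) != "no" {
      print "weak: PermitRootLogin " $2; bad = 1
    }
    tolower($1) == "passwordauthentication" && tolower($2) != "no" {
      print "weak: PasswordAuthentication " $2; bad = 1
    }
    END { if (!bad) print "ok"; exit bad }
  ' "$1"
}

audit_sshd /etc/ssh/sshd_config 2>/dev/null || echo "review before exposing this host"
```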


Hands-On: Complete SSH Setup

Let's walk through a complete SSH setup from scratch.

Step 1: Generate a key pair on your local machine

ssh-keygen -t ed25519 -C "admin@company.com"
# Accept the default path, set a strong passphrase

Step 2: Copy the public key to the server

ssh-copy-id admin@192.168.1.100
# Enter the server password one last time

Step 3: Test key-based login

ssh admin@192.168.1.100
# Should ask for your key passphrase, NOT the server password

Step 4: Set up the SSH agent

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# Enter passphrase once

ssh admin@192.168.1.100
# No passphrase prompt this time

Step 5: Create an SSH config entry

cat >> ~/.ssh/config << 'EOF'
Host myserver
    HostName 192.168.1.100
    User admin
    IdentityFile ~/.ssh/id_ed25519
EOF

chmod 600 ~/.ssh/config

# Now just:
ssh myserver

Step 6: Harden the server (keep your current session open!)

ssh myserver
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup
sudo vim /etc/ssh/sshd_config

Add or modify:

PermitRootLogin no
PasswordAuthentication no
MaxAuthTries 3

Then test the config and reload (still from your existing session):

sudo sshd -t && sudo systemctl reload sshd

Step 7: Test from a new terminal

ssh myserver    # Should work with key
ssh -o PubkeyAuthentication=no myserver   # Should FAIL (password disabled)
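
As you accumulate servers, ~/.ssh/config scales nicely with shared defaults and per-host entries. A sketch -- all hostnames and addresses here are hypothetical, and ProxyJump needs OpenSSH 7.3 or newer:

```
# ~/.ssh/config -- illustrative layout, not tied to the server above
Host *
    ServerAliveInterval 60      # keep idle sessions alive
    IdentitiesOnly yes          # offer only the configured key

Host web1
    HostName 203.0.113.10
    User deployer
    IdentityFile ~/.ssh/id_ed25519

Host db1
    HostName 10.0.5.20          # only reachable from web1
    User deployer
    ProxyJump web1              # hop through web1 transparently
```

With this in place, ssh db1 transparently tunnels through web1.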

Debug This

Scenario: You cannot SSH into a server. You have already verified that the server is reachable (ping works). Running ssh -v admin@server shows:

debug1: Connecting to server [192.168.1.100] port 22.
debug1: Connection established.
debug1: identity file /home/user/.ssh/id_ed25519 type 3
debug1: Authentications that can continue: publickey
debug1: Trying private key: /home/user/.ssh/id_ed25519
debug1: Authentication failed.
Permission denied (publickey).

What is happening and how do you fix it?

Diagnosis: The server only accepts public key authentication (no password). Your key is being offered but rejected. Possible causes:

  1. Your public key is not in the server's ~/.ssh/authorized_keys
  2. The permissions on the server's ~/.ssh directory are wrong
  3. You are using the wrong key or the wrong username
  4. The server's authorized_keys file is owned by root or has wrong permissions

Fix: If you have out-of-band access (console):

# On the server, check the auth log
sudo tail -20 /var/log/auth.log       # Debian/Ubuntu
sudo tail -20 /var/log/secure         # RHEL/CentOS

# Common log messages:
# "Authentication refused: bad ownership or modes for directory /home/admin/.ssh"
# Fix:
chmod 700 /home/admin/.ssh
chmod 600 /home/admin/.ssh/authorized_keys
chown -R admin:admin /home/admin/.ssh

What Just Happened?

+-------------------------------------------------------------------+
|                     Chapter 36 Recap                               |
+-------------------------------------------------------------------+
|                                                                   |
|  * SSH encrypts all communication using key exchange and          |
|    symmetric encryption. It replaced insecure protocols like      |
|    Telnet and rsh.                                                |
|                                                                   |
|  * Key-based auth is more secure than passwords. Use              |
|    ssh-keygen (Ed25519) and ssh-copy-id.                         |
|                                                                   |
|  * The SSH config file (~/.ssh/config) saves you from typing     |
|    long commands and organizes access to many servers.            |
|                                                                   |
|  * SSH agent caches your key passphrase in memory.               |
|    Agent forwarding lets you use your keys on remote servers.    |
|                                                                   |
|  * Port forwarding creates encrypted tunnels:                    |
|    -L (local), -R (remote), -D (dynamic/SOCKS).                 |
|                                                                   |
|  * File transfers: scp (simple), sftp (interactive),             |
|    rsync (incremental, resumable -- best for large transfers).   |
|                                                                   |
|  * Harden sshd_config: disable root login, disable passwords,   |
|    limit users, change port, use modern algorithms.              |
|                                                                   |
|  * Always test config changes from a new terminal while keeping  |
|    your current session open.                                    |
|                                                                   |
+-------------------------------------------------------------------+

Try This

  1. Key rotation: Generate a new Ed25519 key pair. Add it to a server alongside your existing key. Verify both work. Remove the old key. This simulates key rotation.

  2. SSH tunnel: Start a simple web server on a remote machine (python3 -m http.server 8000 on localhost). Use local port forwarding to access it from your laptop at localhost:9000. Verify with curl localhost:9000.

  3. Jump host: If you have three machines (or VMs), configure the middle one as a jump host. Use ProxyJump in your SSH config to transparently reach the third machine through the second.

  4. Security audit: Examine your current sshd_config. How many of the hardening recommendations from this chapter are already in place? Apply the ones that are missing on a test system.

  5. Bonus challenge: Set up a dynamic SOCKS proxy with SSH and configure your web browser to use it. Visit a site like ifconfig.me to verify that your traffic is exiting from the remote server's IP address, not your own.

WireGuard VPN

Why This Matters

Your company has developers working from coffee shops, home offices, and airports. They need to reach internal services -- databases, staging environments, monitoring dashboards -- that should never be exposed to the public internet. You could use SSH tunnels for each service, but managing dozens of tunnels is painful and fragile. You need a VPN.

For decades, OpenVPN and IPsec were the standard choices. They work, but they are complex: OpenVPN has a 100,000+ line codebase, certificates to manage, and performance overhead. IPsec configuration can fill entire books on its own.

WireGuard is a modern VPN that takes a radically different approach: simplicity. Its entire codebase is about 4,000 lines of code, and it has been built into the Linux kernel since version 5.6. It uses state-of-the-art cryptography, and it is fast -- often significantly faster than OpenVPN or IPsec. Configuration is a single file.

If you need a VPN on Linux in 2025, WireGuard should be your first choice.


Try This Right Now

# Check if the WireGuard kernel module is available (root required)
sudo modprobe wireguard && echo "WireGuard module loaded" || echo "Not available"

# Check if wg tools are installed
wg --version

# If not installed, install it:
# Debian/Ubuntu:
# sudo apt install wireguard

# RHEL/CentOS/Fedora:
# sudo dnf install wireguard-tools

# Arch:
# sudo pacman -S wireguard-tools

Distro Note:

  • Ubuntu 20.04+, Debian 11+, Fedora 32+: WireGuard is in the kernel. Just install wireguard-tools for the userspace utilities.
  • CentOS/RHEL 8: You may need to install kmod-wireguard from ELRepo or use the DKMS module: sudo dnf install elrepo-release && sudo dnf install kmod-wireguard.
  • CentOS/RHEL 9+: WireGuard is in the kernel. Install wireguard-tools.
  • Older kernels (< 5.6): WireGuard is available as a DKMS module via the WireGuard PPA or ELRepo.

WireGuard vs OpenVPN

+----------------------------------------------------------------+
|  Feature           | WireGuard          | OpenVPN              |
|--------------------|--------------------|----------------------|
|  Codebase          | ~4,000 lines       | ~100,000+ lines      |
|  Protocol          | UDP only           | UDP or TCP           |
|  Cryptography      | Modern, fixed      | Configurable (TLS)   |
|                    | (Curve25519,       | (many cipher options)|
|                    | ChaCha20, Poly1305)|                      |
|  Performance       | Excellent (kernel) | Good (userspace)     |
|  Configuration     | Simple INI-like    | Complex config files |
|  Key management    | Public/private keys| Certificates (PKI)   |
|  Connection model  | Peer-to-peer       | Client-server        |
|  Stealth           | No response to     | Depends on config    |
|                    | unauthenticated    |                      |
|                    | packets            |                      |
|  Roaming           | Built-in           | Reconnect needed     |
|  Kernel integration| In-kernel (5.6+)   | Userspace (tun/tap)  |
+----------------------------------------------------------------+

WireGuard's philosophy: there are no configurable cipher suites. It uses one fixed set of modern, audited algorithms. If a vulnerability is found, a new version replaces the algorithms entirely. This eliminates the entire category of "misconfigured crypto" bugs.


Core Concepts

WireGuard thinks in terms of peers, not "clients" and "servers." Every machine that participates in a WireGuard network is a peer. Each peer has:

  1. A private key (kept secret)
  2. A public key (shared with other peers)
  3. An IP address on the VPN tunnel
  4. A list of allowed IPs (what traffic routes through the tunnel to this peer)

+--------------------+                    +--------------------+
|    Peer A          |   WireGuard        |    Peer B          |
|                    |   Tunnel           |                    |
|  Private key: kA   |<==================>|  Private key: kB   |
|  Public key:  KA   |   UDP port 51820   |  Public key:  KB   |
|  VPN IP: 10.0.0.1  |                    |  VPN IP: 10.0.0.2  |
|                    |                    |                    |
|  AllowedIPs for B: |                    |  AllowedIPs for A: |
|    10.0.0.2/32     |                    |    10.0.0.1/32     |
+--------------------+                    +--------------------+

The Cryptokey Routing Table

WireGuard's most elegant concept is the cryptokey routing table. It maps public keys to allowed IP addresses. When WireGuard receives an encrypted packet, it identifies the sending peer from the session negotiated with that peer's public key, decrypts the packet, and checks whether the inner source IP is in that peer's AllowedIPs. If not, the packet is dropped.

Similarly, when WireGuard needs to send a packet, it looks up the destination IP in the AllowedIPs of all peers and encrypts it for the matching peer, using the session keys negotiated with that peer's public key.

Cryptokey Routing Table:
+------------------------------------+
| Public Key        | AllowedIPs     |
|-------------------|----------------|
| KB (Peer B)       | 10.0.0.2/32    |
| KC (Peer C)       | 10.0.0.3/32,   |
|                   | 192.168.2.0/24 |
+------------------------------------+

Packet to 10.0.0.2 --> encrypt with KB --> send to Peer B's endpoint
Packet to 192.168.2.50 --> encrypt with KC --> send to Peer C's endpoint
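
The lookup can be sketched in a few lines of Python. This is a toy model of the concept, not WireGuard's actual implementation; the keys and networks are the ones from the table above:

```python
import ipaddress

# Toy cryptokey routing table: public key -> allowed networks
table = {
    "KB": ["10.0.0.2/32"],
    "KC": ["10.0.0.3/32", "192.168.2.0/24"],
}

def route(dst_ip):
    """Return the peer whose AllowedIPs contain dst_ip (most specific wins)."""
    dst = ipaddress.ip_address(dst_ip)
    best = None, -1
    for peer, nets in table.items():
        for net in map(ipaddress.ip_network, nets):
            if dst in net and net.prefixlen > best[1]:
                best = peer, net.prefixlen
    return best[0]

print(route("10.0.0.2"))       # KB
print(route("192.168.2.50"))   # KC
print(route("8.8.8.8"))        # None -> no matching peer, packet is dropped
```

A /32 beats a broader range because the most specific matching prefix wins, just as in an ordinary routing table.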

Hands-On: Point-to-Point Tunnel

Let's set up a basic WireGuard tunnel between two machines.

Step 1: Generate Keys on Both Machines

On Peer A (the "server"):

# Generate private key
wg genkey | sudo tee /etc/wireguard/private.key
sudo chmod 600 /etc/wireguard/private.key

# Derive public key from private key
sudo cat /etc/wireguard/private.key | wg pubkey | sudo tee /etc/wireguard/public.key

On Peer B (the "client"):

wg genkey | sudo tee /etc/wireguard/private.key
sudo chmod 600 /etc/wireguard/private.key
sudo cat /etc/wireguard/private.key | wg pubkey | sudo tee /etc/wireguard/public.key

Now exchange public keys between the two machines. You need:

  • Peer A's public key on Peer B
  • Peer B's public key on Peer A

Think About It: Why do we generate keys separately on each machine rather than generating all keys on one machine and distributing them? Think about what would happen if the private key were intercepted during transfer.

Generating keys locally means the private key never crosses the network. If you generate all keys on one machine and transfer them, the private keys could be intercepted, logged, or cached on intermediate systems. The private key should ideally never exist anywhere except on the machine that owns it.

Step 2: Configure Peer A (Server)

Create /etc/wireguard/wg0.conf on Peer A:

[Interface]
# Peer A's private key
PrivateKey = <PEER_A_PRIVATE_KEY>
Address = 10.0.0.1/24
ListenPort = 51820

[Peer]
# Peer B's public key
PublicKey = <PEER_B_PUBLIC_KEY>
AllowedIPs = 10.0.0.2/32

Step 3: Configure Peer B (Client)

Create /etc/wireguard/wg0.conf on Peer B:

[Interface]
# Peer B's private key
PrivateKey = <PEER_B_PRIVATE_KEY>
Address = 10.0.0.2/24

[Peer]
# Peer A's public key
PublicKey = <PEER_A_PUBLIC_KEY>
Endpoint = 203.0.113.50:51820
AllowedIPs = 10.0.0.1/32
PersistentKeepalive = 25

Key differences on the client side:

  • Endpoint: Peer A's public IP and port. Peer B needs to know where to reach Peer A. Peer A does not need an Endpoint for Peer B because it will learn Peer B's address from incoming packets.
  • PersistentKeepalive: Sends a keepalive packet every 25 seconds. This is essential when Peer B is behind NAT, to keep the NAT mapping alive.

Step 4: Bring Up the Tunnel

On both machines:

# Start the tunnel
sudo wg-quick up wg0

# Check status
sudo wg show

# Test connectivity
ping -c 3 10.0.0.1    # From Peer B
ping -c 3 10.0.0.2    # From Peer A

Step 5: Verify the Tunnel

sudo wg show

Expected output on Peer A:

interface: wg0
  public key: <PEER_A_PUBLIC_KEY>
  private key: (hidden)
  listening port: 51820

peer: <PEER_B_PUBLIC_KEY>
  endpoint: 198.51.100.20:43210
  allowed ips: 10.0.0.2/32
  latest handshake: 12 seconds ago
  transfer: 1.24 KiB received, 956 B sent

If you see "latest handshake" with a recent timestamp, the tunnel is working.

Step 6: Enable on Boot

sudo systemctl enable wg-quick@wg0

Bringing the Tunnel Down

sudo wg-quick down wg0

Safety Warning: The WireGuard configuration file contains your private key. Set strict permissions:

sudo chmod 600 /etc/wireguard/wg0.conf
sudo chown root:root /etc/wireguard/wg0.conf

Routing All Traffic Through the VPN

To use WireGuard as a full VPN (all internet traffic goes through the tunnel), modify the client configuration:

On Peer B (client):

[Interface]
PrivateKey = <PEER_B_PRIVATE_KEY>
Address = 10.0.0.2/24
DNS = 1.1.1.1

[Peer]
PublicKey = <PEER_A_PUBLIC_KEY>
Endpoint = 203.0.113.50:51820
AllowedIPs = 0.0.0.0/0, ::/0
PersistentKeepalive = 25

The key change is AllowedIPs = 0.0.0.0/0, ::/0 -- this means "route ALL traffic (IPv4 and IPv6) through this peer."

On Peer A (server), enable forwarding and NAT:

[Interface]
PrivateKey = <PEER_A_PRIVATE_KEY>
Address = 10.0.0.1/24
ListenPort = 51820
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

Also enable IP forwarding on the server:

echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-wireguard.conf
sudo sysctl -p /etc/sysctl.d/99-wireguard.conf

The PostUp and PostDown hooks in the config file run shell commands when the interface comes up or goes down. Here we add and remove NAT rules automatically.

Think About It: After setting AllowedIPs to 0.0.0.0/0, you notice your DNS queries are leaking -- they bypass the VPN and go to your local DNS server. Why, and how does the DNS = 1.1.1.1 line in the Interface section fix this?

wg-quick reads the DNS directive and configures the system's DNS resolver to use the specified server. Since all traffic (including DNS on port 53) is routed through AllowedIPs = 0.0.0.0/0, DNS queries now go through the tunnel to the VPN server, which forwards them to 1.1.1.1. Without the DNS directive, the system might still use the local DNS server configured by DHCP.


Multi-Peer Setup (Hub and Spoke)

A common architecture: one VPN server (hub) with multiple clients (spokes).

                    +-------------------+
                    |    VPN Server     |
                    |   10.0.0.1/24     |
                    |  (Public IP)      |
                    +---------+---------+
                              |
              +---------------+---------------+
              |               |               |
        +-----+----+    +-----+----+    +-----+----+
        | Client A |    | Client B |    | Client C |
        | 10.0.0.2 |    | 10.0.0.3 |    | 10.0.0.4 |
        +----------+    +----------+    +----------+

Server Configuration

# /etc/wireguard/wg0.conf on VPN Server

[Interface]
PrivateKey = <SERVER_PRIVATE_KEY>
Address = 10.0.0.1/24
ListenPort = 51820
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

[Peer]
# Client A
PublicKey = <CLIENT_A_PUBLIC_KEY>
AllowedIPs = 10.0.0.2/32

[Peer]
# Client B
PublicKey = <CLIENT_B_PUBLIC_KEY>
AllowedIPs = 10.0.0.3/32

[Peer]
# Client C
PublicKey = <CLIENT_C_PUBLIC_KEY>
AllowedIPs = 10.0.0.4/32

Client A Configuration

# /etc/wireguard/wg0.conf on Client A

[Interface]
PrivateKey = <CLIENT_A_PRIVATE_KEY>
Address = 10.0.0.2/24
DNS = 1.1.1.1

[Peer]
PublicKey = <SERVER_PUBLIC_KEY>
Endpoint = 203.0.113.50:51820
AllowedIPs = 10.0.0.0/24
PersistentKeepalive = 25

With AllowedIPs = 10.0.0.0/24, Client A can reach the server and all other clients through the VPN, but regular internet traffic goes directly (split tunneling).

For full tunnel (all traffic through VPN), change to AllowedIPs = 0.0.0.0/0, ::/0.

Adding a New Peer Without Restarting

You can add peers on the fly without restarting the WireGuard interface:

# On the server, add a new peer dynamically
sudo wg set wg0 peer <NEW_CLIENT_PUBLIC_KEY> allowed-ips 10.0.0.5/32

# Save the running config to the config file
sudo wg-quick save wg0

Allowing Clients to Communicate with Each Other

By default in the hub-and-spoke model, client-to-client traffic flows through the server. For this to work, IP forwarding must be enabled on the server, and the iptables FORWARD rule must be in place.

If client A wants to reach client B (10.0.0.3), the packet flow is:

Client A (10.0.0.2) --> VPN Server (10.0.0.1) --> Client B (10.0.0.3)

On each client, AllowedIPs must include the other clients' IPs (or the entire VPN subnet like 10.0.0.0/24).


Site-to-Site VPN

WireGuard can connect entire networks, not just individual hosts. Say Office A has the network 192.168.1.0/24 and Office B has 192.168.2.0/24.

Office A Gateway

[Interface]
PrivateKey = <OFFICE_A_PRIVATE_KEY>
Address = 10.0.0.1/24
ListenPort = 51820
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -D FORWARD -o wg0 -j ACCEPT

[Peer]
PublicKey = <OFFICE_B_PUBLIC_KEY>
Endpoint = office-b.example.com:51820
AllowedIPs = 10.0.0.2/32, 192.168.2.0/24
PersistentKeepalive = 25

Office B Gateway

[Interface]
PrivateKey = <OFFICE_B_PRIVATE_KEY>
Address = 10.0.0.2/24
ListenPort = 51820
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -D FORWARD -o wg0 -j ACCEPT

[Peer]
PublicKey = <OFFICE_A_PUBLIC_KEY>
Endpoint = office-a.example.com:51820
AllowedIPs = 10.0.0.1/32, 192.168.1.0/24
PersistentKeepalive = 25

Each gateway must also have IP forwarding enabled and must be the default gateway (or have static routes) for machines on their respective LANs.

On machines in Office A, add a route:

sudo ip route add 192.168.2.0/24 via 192.168.1.1   # Office A gateway's LAN IP

Troubleshooting WireGuard

Check 1: Is the Interface Up?

sudo wg show
ip addr show wg0

If wg show outputs nothing, the interface is not up:

sudo wg-quick up wg0
# Check for errors in the output

Check 2: Is the Handshake Happening?

sudo wg show

Look at the latest handshake field. If it says "none" or the timestamp is very old, the peers have not successfully established a session.

Common causes:

  • Firewall blocking UDP port 51820
  • Incorrect public key
  • Endpoint address is wrong
  • NAT not being traversed (add PersistentKeepalive)

Check 3: Firewall Rules

# Ensure UDP port 51820 is open on the server
sudo iptables -L -n | grep 51820
sudo ss -ulnp | grep 51820

# If using ufw:
sudo ufw allow 51820/udp

# If using firewalld:
sudo firewall-cmd --add-port=51820/udp --permanent
sudo firewall-cmd --reload

Check 4: IP Forwarding

If peers can ping each other but cannot reach networks behind a peer:

cat /proc/sys/net/ipv4/ip_forward
# Should be 1

Check 5: Key Mismatch

The most common configuration error is a key mismatch. Double-check:

  • Peer A's config has Peer B's public key (not private!)
  • Peer B's config has Peer A's public key
  3. Neither config has the wrong key pasted

# Verify a public key matches a private key
echo "<PRIVATE_KEY>" | wg pubkey
# Output should match the corresponding public key

Check 6: Listen for Packets

# On the server, watch for incoming WireGuard traffic
sudo tcpdump -i eth0 udp port 51820 -n

# You should see UDP packets arriving from the client's IP

If no packets arrive, the issue is upstream of the server (client firewall, NAT, ISP blocking UDP).

Check 7: AllowedIPs Configuration

A subtle but common problem: if the AllowedIPs on the server do not include the client's VPN IP, the server will decrypt packets from the client but then drop them because the source IP is not in the allowed list.

# Verify AllowedIPs for each peer
sudo wg show wg0

Debug This

Scenario: You have set up WireGuard between a server (Peer A, public IP 203.0.113.50) and a laptop (Peer B). sudo wg-quick up wg0 succeeds on both sides. But from Peer B, ping 10.0.0.1 (Peer A's VPN IP) fails. You check:

# On Peer B:
$ sudo wg show
interface: wg0
  public key: <PEER_B_PUB>
  private key: (hidden)
  listening port: 43210

peer: <PEER_A_PUB>
  endpoint: 203.0.113.50:51820
  allowed ips: 10.0.0.1/32
  latest handshake: (none)
  transfer: 0 B received, 920 B sent

What does "latest handshake: (none)" and "0 B received, 920 B sent" tell you?

Diagnosis: Peer B is sending packets (920 B sent = initiation handshake attempts) but receiving nothing back (0 B received). The handshake has never completed. This means Peer A is not responding. Possible causes:

  1. The server's firewall is blocking UDP port 51820.
  2. The server's WireGuard interface is not running.
  3. The server has the wrong public key for Peer B in its config.
  4. There is a NAT device in front of the server that is not forwarding port 51820.

Fix approach:

# SSH into Peer A and check:
sudo wg show                    # Is wg0 running?
sudo ss -ulnp | grep 51820     # Is WireGuard listening?
sudo iptables -L -n | grep 51820   # Is firewall blocking?
sudo tcpdump -i eth0 udp port 51820 -c 5   # Are packets arriving?

If tcpdump shows packets arriving but wg show shows no handshake, the key configuration is wrong. If tcpdump shows no packets, the issue is the network path (firewall or NAT between the two).


What Just Happened?

+-------------------------------------------------------------------+
|                     Chapter 37 Recap                               |
+-------------------------------------------------------------------+
|                                                                   |
|  * WireGuard is a modern VPN: simple, fast, in-kernel,            |
|    and uses state-of-the-art cryptography.                        |
|                                                                   |
|  * It uses a peer model, not client/server. Each peer has         |
|    a public/private key pair and a list of AllowedIPs.            |
|                                                                   |
|  * The cryptokey routing table maps public keys to IP             |
|    ranges, combining authentication and routing.                  |
|                                                                   |
|  * Configuration is a single INI-style file in                    |
|    /etc/wireguard/wg0.conf.                                       |
|                                                                   |
|  * wg-quick manages the interface lifecycle. Enable on boot       |
|    with systemctl enable wg-quick@wg0.                            |
|                                                                   |
|  * Route all traffic through VPN with AllowedIPs = 0.0.0.0/0.    |
|    Requires NAT and IP forwarding on the server.                  |
|                                                                   |
|  * Multi-peer setups use hub-and-spoke topology.                  |
|    Peers can be added dynamically with wg set.                    |
|                                                                   |
|  * Troubleshooting: check handshake status, firewall rules,       |
|    IP forwarding, and key correctness.                            |
|                                                                   |
+-------------------------------------------------------------------+

Try This

  1. Basic tunnel: Set up a WireGuard tunnel between two machines (VMs, cloud instances, or even two containers). Verify that you can ping across the tunnel.

  2. Full tunnel: Modify the setup so that all traffic from the client routes through the VPN. Visit ifconfig.me or curl ifconfig.me to verify your exit IP has changed to the server's IP.

  3. Multi-peer: Add a third peer to your setup. Verify that all three peers can ping each other through the VPN.

  4. Performance test: Install iperf3 on both ends. Run a bandwidth test through the WireGuard tunnel vs directly. How much overhead does WireGuard add?

    # On server: iperf3 -s
    # On client: iperf3 -c 10.0.0.1
    
  5. Bonus challenge: Set up a site-to-site VPN. Create two virtual networks (192.168.1.0/24 and 192.168.2.0/24) each behind a WireGuard gateway. Configure routing so that hosts on one network can reach hosts on the other network through the WireGuard tunnel, without WireGuard being installed on the individual hosts.

Linux Security Fundamentals

Why This Matters

It is 2:00 AM. Your phone buzzes. A monitoring alert says your web server is sending outbound traffic to an IP address in a country you have no business with. Someone has exploited a forgotten test account with the password test123, escalated privileges through a world-writable script, and is now exfiltrating your customer database.

This is not fiction. It happens every day. The difference between the teams that get breached and the teams that do not is rarely some expensive appliance -- it is the disciplined application of basic security principles. Strong passwords, minimal privileges, reduced attack surfaces, and automated patching stop the vast majority of real-world attacks. This chapter gives you the mindset and the concrete tools to harden a Linux system from the ground up.


Try This Right Now

Before we discuss theory, get a quick snapshot of your system's security posture:

# Who is logged in right now?
who

# Any users with UID 0 (root-equivalent)?
awk -F: '$3 == 0 {print $1}' /etc/passwd

# Find all SUID binaries (programs that run as their owner, often root)
find / -perm -4000 -type f 2>/dev/null

# What ports are listening for connections?
ss -tlnp

# When did each user last change their password?
sudo chage -l root

Run these commands on a system you manage. If any output surprises you, good -- that is exactly the kind of surprise we want to find before an attacker does.


The Security Mindset

Security is not a product you install. It is a way of thinking about every decision you make on a system. Three principles form the foundation.

Defense in Depth

Never rely on a single layer of protection. Think of it like a medieval castle:

    +-----------------------------------------------+
    |  Internet                                     |
    |  +------------------------------------------+ |
    |  |  Firewall (iptables/nftables)            | |
    |  |  +-------------------------------------+ | |
    |  |  |  Network segmentation (VLANs)       | | |
    |  |  |  +--------------------------------+ | | |
    |  |  |  |  SSH key-only access            | | | |
    |  |  |  |  +---------------------------+  | | | |
    |  |  |  |  |  File permissions          |  | | | |
    |  |  |  |  |  +---------------------+  |  | | | |
    |  |  |  |  |  | SELinux / AppArmor  |  |  | | | |
    |  |  |  |  |  |  +---------------+  |  |  | | | |
    |  |  |  |  |  |  |  Your Data    |  |  |  | | | |
    |  |  |  |  |  |  +---------------+  |  |  | | | |
    |  |  |  |  |  +---------------------+  |  | | | |
    |  |  |  |  +---------------------------+  | | | |
    |  |  |  +--------------------------------+ | | |
    |  |  +-------------------------------------+ | |
    |  +------------------------------------------+ |
    +-----------------------------------------------+

If the firewall has a gap, SSH keys still protect login. If a key is compromised, file permissions limit damage. If permissions are bypassed, SELinux provides a last line of defense. Each layer is independent.

Least Privilege

Every user, process, and program should have the absolute minimum permissions needed to do its job -- and nothing more. A web server does not need to read /etc/shadow. A database process does not need to bind to port 22. A developer does not need sudo su - on a production server.

Attack Surface Reduction

Every running service, open port, installed package, and user account is a potential entry point. The smaller the surface, the fewer targets an attacker has.

  MORE RISK                              LESS RISK
  +------------------+                  +------------------+
  | 47 services      |                  | 5 services       |
  | 200 packages     |    Harden ---->  | 40 packages      |
  | 15 user accounts |                  | 3 user accounts  |
  | 12 open ports    |                  | 2 open ports     |
  +------------------+                  +------------------+

Think About It: Look at a server you manage. How many services are running that you do not actually use? Run systemctl list-units --type=service --state=running to find out. Could any of them be stopped and disabled?


User Security

Strong Password Policies

Weak passwords remain the number one cause of breaches. Linux gives you tools to enforce policy.

Setting Password Quality Requirements

On RHEL/Fedora systems, password quality is managed by pam_pwquality:

# View current password quality settings
sudo cat /etc/security/pwquality.conf

Key settings to configure:

# Edit password quality configuration
sudo vi /etc/security/pwquality.conf

# /etc/security/pwquality.conf
minlen = 12           # Minimum password length
dcredit = -1          # Require at least 1 digit
ucredit = -1          # Require at least 1 uppercase letter
lcredit = -1          # Require at least 1 lowercase letter
ocredit = -1          # Require at least 1 special character
maxrepeat = 3         # No more than 3 consecutive identical characters
usercheck = 1         # Reject passwords that contain the username
enforce_for_root      # Apply rules even when root sets passwords

Distro Note: On Debian/Ubuntu, install libpam-pwquality first: sudo apt install libpam-pwquality. The configuration file is the same.
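
To see what these settings actually reject, here is a rough Python model of the checks. It is an illustration only -- real enforcement happens in PAM, and pwquality's credit system is more nuanced than this:

```python
import re

def check(password, username, minlen=12, maxrepeat=3):
    """Toy model of the pwquality rules configured above."""
    problems = []
    if len(password) < minlen:
        problems.append("shorter than minlen")
    for name, pattern in [("digit", r"\d"), ("uppercase", r"[A-Z]"),
                          ("lowercase", r"[a-z]"), ("special", r"[^A-Za-z0-9]")]:
        if not re.search(pattern, password):
            problems.append(f"missing {name}")
    # maxrepeat = 3 means a run of 4+ identical characters is rejected
    if re.search(r"(.)\1{%d}" % maxrepeat, password):
        problems.append("too many repeated characters")
    if username.lower() in password.lower():
        problems.append("contains username")
    return problems

print(check("alicepass", "alice"))       # several problems listed
print(check("Str0ng&Secure!", "alice"))  # -> [] (passes every rule)
```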

Password Aging with chage

The chage command controls password expiration and aging policies.

# View password aging info for a user
sudo chage -l alice
Last password change                    : Jan 15, 2026
Password expires                        : Apr 15, 2026
Password inactive                       : May 15, 2026
Account expires                         : never
Minimum number of days between changes  : 7
Maximum number of days between changes  : 90
Number of days of warning before expiry : 14

# Set maximum password age to 90 days
sudo chage -M 90 alice

# Set minimum days between changes (prevents rapid cycling)
sudo chage -m 7 alice

# Set warning period to 14 days before expiry
sudo chage -W 14 alice

# Force password change on next login
sudo chage -d 0 alice

# Set account expiration date
sudo chage -E 2026-12-31 alice

# Set inactive period (days after expiry before account locks)
sudo chage -I 30 alice
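The aging values are just day counts layered on top of the last-change date, so you can sanity-check the arithmetic yourself. A minimal sketch, assuming GNU date and reusing the example dates from the chage -l output above:

```shell
# Compute the expiry date chage would report: last change + maximum age.
last_change="2026-01-15"
max_days=90
expires=$(date -d "$last_change + $max_days days" +%Y-%m-%d)
echo "Password expires: $expires"    # Jan 15 + 90 days = 2026-04-15
```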

Hands-On: Set Up a Secure User Account

# Create a user with secure defaults
sudo useradd -m -s /bin/bash -c "Alice Engineer" alice

# Set a password
sudo passwd alice

# Configure password aging
sudo chage -M 90 -m 7 -W 14 -I 30 alice

# Verify the settings
sudo chage -l alice

# Lock an account (prefix password hash with !)
sudo usermod -L alice

# Unlock the account
sudo usermod -U alice

Account Auditing

Regularly audit your user accounts:

# Find accounts with empty passwords (CRITICAL security issue)
sudo awk -F: '($2 == "") {print $1}' /etc/shadow

# Find accounts with UID 0 (root-equivalent access)
awk -F: '$3 == 0 {print $1}' /etc/passwd

# Find users who can log in (have a valid shell)
grep -v '/nologin\|/false' /etc/passwd

# Check for locked accounts and accounts that never had a password set
# ("!" means locked; "!!" means never set, on RHEL-family systems)
sudo awk -F: '$2 == "!" || $2 == "!!" {print $1}' /etc/shadow

# List users in the sudo/wheel group
getent group sudo 2>/dev/null || getent group wheel
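Before pointing these one-liners at the real /etc/passwd, you can verify the awk and grep logic against made-up sample data. The accounts below, including the rogue second UID 0 entry, are hypothetical:

```shell
# Hypothetical /etc/passwd content with a planted second UID 0 account
sample_passwd='root:x:0:0:root:/root:/bin/bash
evil:x:0:0::/home/evil:/bin/bash
daemon:x:1:1::/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# UID 0 check -- should flag BOTH root and the rogue "evil" account
uid0=$(printf '%s\n' "$sample_passwd" | awk -F: '$3 == 0 {print $1}')
printf '%s\n' "$uid0"

# Login-shell check -- daemon's /usr/sbin/nologin is filtered out
printf '%s\n' "$sample_passwd" | grep -v '/nologin\|/false' | cut -d: -f1
```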

Think About It: Why is it dangerous to have more than one account with UID 0? (Hint: think about audit trails and accountability.)


File Security Review

Checking Permissions on Critical Files

Certain files must have strict permissions. If they do not, your system is vulnerable.

# /etc/shadow should be readable only by root
ls -l /etc/shadow
# Expected: -rw-r----- 1 root shadow

# /etc/passwd is world-readable (that is fine), but not writable
ls -l /etc/passwd
# Expected: -rw-r--r-- 1 root root

# SSH host keys must be root-only
ls -l /etc/ssh/ssh_host_*_key
# Expected: -rw------- 1 root root

# Home directories should not be world-readable
ls -ld /home/*
# Expected: drwx------ or drwxr-x---

# Cron directories
ls -ld /etc/cron.*
ls -l /etc/crontab
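You can wrap these spot checks in a small helper instead of eyeballing ls output. A sketch using find's symbolic -perm test on throwaway files (the helper name is made up; GNU find assumed):

```shell
# Returns success if "others" have read access to the file at all
is_world_readable() {
  [ -n "$(find "$1" -maxdepth 0 -perm /o=r)" ]
}

tmp=$(mktemp)                  # mktemp creates the file with mode 600
chmod 640 "$tmp"
is_world_readable "$tmp" && echo "exposed" || echo "private"   # private
chmod 644 "$tmp"
is_world_readable "$tmp" && echo "exposed" || echo "private"   # exposed
rm -f "$tmp"
```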

Finding World-Writable Files

World-writable files can be modified by any user -- a serious risk:

# Find world-writable files (excluding /proc and /sys)
find / -xdev -type f -perm -0002 -ls 2>/dev/null

# Find world-writable directories without sticky bit
find / -xdev -type d \( -perm -0002 -a ! -perm -1000 \) -ls 2>/dev/null

The sticky bit (t) on directories like /tmp prevents users from deleting each other's files. Without it, any user could delete any other user's temp files.
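You can watch the sticky bit appear and disappear on a scratch directory of your own (the paths are throwaway ones from mktemp):

```shell
# Toggle the sticky bit on a scratch directory and inspect the mode string
dir=$(mktemp -d)
chmod 1777 "$dir"                       # leading 1 = sticky bit, like /tmp
with_sticky=$(stat -c '%A' "$dir")      # drwxrwxrwt -- note the trailing "t"
chmod 0777 "$dir"
without_sticky=$(stat -c '%A' "$dir")   # drwxrwxrwx -- sticky bit cleared
echo "$with_sticky -> $without_sticky"
rmdir "$dir"
```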

Finding SUID and SGID Binaries

SUID (Set User ID) binaries run with the permissions of the file owner, typically root. A vulnerable SUID binary is a direct path to root access.

# Find all SUID binaries
find / -perm -4000 -type f 2>/dev/null

# Find all SGID binaries
find / -perm -2000 -type f 2>/dev/null

# Find both SUID and SGID
find / -perm /6000 -type f 2>/dev/null

Common legitimate SUID binaries include:

/usr/bin/passwd          - Users need to update /etc/shadow (owned by root)
/usr/bin/sudo            - Privilege escalation by design
/usr/bin/su              - Switch user
/usr/bin/chsh            - Change login shell
/usr/bin/mount           - Mount filesystems (sometimes)
/usr/bin/ping            - Raw socket access (on older systems)

Any SUID binary not on your expected list should be investigated. To remove the SUID bit from a binary you do not need:

# Remove SUID bit (be careful -- do not break system tools!)
sudo chmod u-s /path/to/suspicious/binary

WARNING: Never remove the SUID bit from /usr/bin/sudo or /usr/bin/passwd unless you know exactly what you are doing. You will lock yourself out of privilege escalation and password changes.
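A practical way to act on "know your expected list" is to save a baseline of SUID paths and diff each audit run against it. A sketch using comm(1) on hypothetical file lists -- /tmp/backdoor stands in for a planted binary, and in real use the current list would come from the find command above:

```shell
# Compare today's SUID inventory against a saved known-good baseline
baseline=$(mktemp); current=$(mktemp)
printf '%s\n' /usr/bin/passwd /usr/bin/su /usr/bin/sudo > "$baseline"
printf '%s\n' /tmp/backdoor /usr/bin/passwd /usr/bin/su /usr/bin/sudo > "$current"

# comm -13 on sorted input prints lines unique to the second file:
# exactly the SUID binaries that were not in the baseline
new_suid=$(comm -13 <(sort "$baseline") <(sort "$current"))
echo "Investigate: $new_suid"
rm -f "$baseline" "$current"
```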


Open Ports Audit

Every open port is a door. Know which doors are open and why.

# List all listening TCP and UDP ports with process info
sudo ss -tlnup

# Same information, different format
sudo netstat -tlnup    # (install net-tools if not available)

Example output:

State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=1234))
LISTEN  0       511     0.0.0.0:80          0.0.0.0:*          users:(("nginx",pid=5678))
LISTEN  0       128     127.0.0.1:5432      0.0.0.0:*          users:(("postgres",pid=9012))
LISTEN  0       128     0.0.0.0:3306        0.0.0.0:*          users:(("mysqld",pid=3456))

Reading this output, ask yourself:

  • Port 22 on 0.0.0.0 -- SSH is accessible from all interfaces. Is that intended? Should it be restricted to a management network?
  • Port 80 on 0.0.0.0 -- Web server is publicly accessible. Expected for a web server.
  • Port 5432 on 127.0.0.1 -- PostgreSQL listens only on localhost. Good -- it is not exposed to the network.
  • Port 3306 on 0.0.0.0 -- MySQL is accessible from all interfaces. Does it need to be? If only local applications use it, bind it to 127.0.0.1.
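You can automate the "is anything exposed that shouldn't be?" question by filtering on the bind address. A sketch over sample data shaped like the ss output above (real runs would pipe ss -tln into the awk filter):

```shell
# Hypothetical listener table (columns: state, recv-q, send-q, local addr:port)
sample='LISTEN 0 128 0.0.0.0:22
LISTEN 0 511 0.0.0.0:80
LISTEN 0 128 127.0.0.1:5432
LISTEN 0 128 0.0.0.0:3306'

# Anything bound to 0.0.0.0 is reachable on every interface -- each line
# printed here deserves an explicit "yes, that is intentional"
exposed=$(printf '%s\n' "$sample" | awk '$4 ~ /^0\.0\.0\.0:/ {print $4}')
printf '%s\n' "$exposed"
```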

Restricting a Service to Localhost

If a service should only accept local connections:

# For PostgreSQL, edit postgresql.conf
listen_addresses = 'localhost'

# For MySQL/MariaDB, edit my.cnf
bind-address = 127.0.0.1

# For Redis, edit redis.conf
bind 127.0.0.1

After changing the configuration, restart the service and verify:

sudo systemctl restart postgresql
sudo ss -tlnp | grep 5432
# Should show 127.0.0.1:5432, not 0.0.0.0:5432

Basic Hardening Checklist

Here is a practical hardening checklist you can walk through on any new server.

1. Update Everything

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y

# RHEL/Fedora
sudo dnf upgrade -y

2. Remove Unnecessary Packages and Services

# List running services
systemctl list-units --type=service --state=running

# Disable a service you do not need
sudo systemctl stop cups
sudo systemctl disable cups

# Remove packages you do not need
sudo apt remove --purge telnetd rsh-server    # Debian/Ubuntu
sudo dnf remove telnet-server rsh-server      # RHEL/Fedora

3. Configure SSH Securely

Edit /etc/ssh/sshd_config:

# Disable root login
PermitRootLogin no

# Disable password authentication (use keys only)
PasswordAuthentication no

# Note: modern OpenSSH speaks only protocol 2; the old
# "Protocol 2" directive is obsolete and can be omitted

# Limit SSH to specific users
AllowUsers alice bob

# Set idle timeout
ClientAliveInterval 300
ClientAliveCountMax 2

# Validate config before restarting
sudo sshd -t

# Restart SSH
sudo systemctl restart sshd

WARNING: Before disabling password authentication, make sure you have uploaded your SSH public key and tested key-based login. Otherwise you will lock yourself out.

4. Configure a Firewall

Use the firewall (covered in Chapter 34) to allow only needed ports:

# UFW (Ubuntu/Debian)
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

# firewalld (RHEL/Fedora)
sudo firewall-cmd --set-default-zone=drop
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload

5. Set Up Proper Logging

Ensure system logging is working and logs are being rotated:

# Check if rsyslog or journald is running
systemctl status rsyslog
systemctl status systemd-journald

# Check log rotation configuration
cat /etc/logrotate.conf
ls /etc/logrotate.d/

fail2ban: Automated Intrusion Prevention

fail2ban monitors log files for failed authentication attempts and automatically blocks offending IP addresses using firewall rules.

Installation

# Debian/Ubuntu
sudo apt install fail2ban

# RHEL/Fedora
sudo dnf install fail2ban

Configuration

Never edit the main config files directly. Create local overrides:

# Create a local jail configuration
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
sudo vi /etc/fail2ban/jail.local

Key settings in jail.local:

[DEFAULT]
# Ban IP for 1 hour
bantime = 3600

# Time window for counting failures
findtime = 600

# Number of failures before ban
maxretry = 5

# Email notifications (optional)
# destemail = admin@example.com
# action = %(action_mwl)s

[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600

Distro Note: The log path differs between distributions. Debian/Ubuntu uses /var/log/auth.log, while RHEL/Fedora uses /var/log/secure. Some newer setups using only journald may need backend = systemd instead.

Managing fail2ban

# Start and enable fail2ban
sudo systemctl start fail2ban
sudo systemctl enable fail2ban

# Check status of all jails
sudo fail2ban-client status

# Check status of SSH jail specifically
sudo fail2ban-client status sshd

Example output:

Status for the jail: sshd
|- Filter
|  |- Currently failed: 2
|  |- Total failed:     47
|  `- File list:        /var/log/auth.log
`- Actions
   |- Currently banned: 1
   |- Total banned:     5
   `- Banned IP list:   203.0.113.42

# Manually unban an IP
sudo fail2ban-client set sshd unbanip 203.0.113.42

# Manually ban an IP
sudo fail2ban-client set sshd banip 198.51.100.10

# Check what fail2ban regex matches (useful for debugging)
sudo fail2ban-regex /var/log/auth.log /etc/fail2ban/filter.d/sshd.conf
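Under the hood, a fail2ban filter is just a regex that pulls the attacker's IP out of a log line. You can prototype that matching with plain grep before editing anything in filter.d -- the log line below is made up:

```shell
# Hypothetical auth.log entry of the kind the sshd filter matches
line='Jan 15 03:22:41 web1 sshd[1234]: Failed password for invalid user admin from 203.0.113.42 port 54022 ssh2'

# Extract the source IP the way a fail2ban <HOST> capture would
ip=$(printf '%s\n' "$line" | grep -oE 'from ([0-9]{1,3}\.){3}[0-9]{1,3}' | awk '{print $2}')
echo "would ban: $ip"
```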

Hands-On: Test fail2ban

# 1. Start fail2ban with SSH jail enabled
sudo systemctl start fail2ban

# 2. From another machine, intentionally fail SSH login 4+ times
ssh baduser@your-server    # Enter wrong password repeatedly

# 3. Check that the IP was banned
sudo fail2ban-client status sshd

# 4. Check the iptables/nftables rules fail2ban created
sudo iptables -L f2b-sshd -n -v

Unattended Upgrades: Automatic Security Patches

One of the most impactful security measures is simply keeping your system patched. Unattended upgrades automate this for security updates.

Debian/Ubuntu

# Install unattended-upgrades
sudo apt install unattended-upgrades apt-listchanges

# Enable it
sudo dpkg-reconfigure -plow unattended-upgrades

The main configuration file:

sudo vi /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    // Uncomment the next line to include regular updates too
    // "${distro_id}:${distro_codename}-updates";
};

// Automatically reboot if required (e.g., kernel updates)
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "04:00";

// Email notification
Unattended-Upgrade::Mail "admin@example.com";
Unattended-Upgrade::MailReport "on-change";

// Remove unused dependencies
Unattended-Upgrade::Remove-Unused-Dependencies "true";

Enable the automatic timer:

sudo vi /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";

# Test the configuration (dry run); note the binary is "unattended-upgrade" (singular)
sudo unattended-upgrade --dry-run --debug

# Check the logs
cat /var/log/unattended-upgrades/unattended-upgrades.log

RHEL/Fedora

# Install dnf-automatic
sudo dnf install dnf-automatic

# Configure it
sudo vi /etc/dnf/automatic.conf
[commands]
# What to do: download, apply, or nothing
apply_updates = yes
# Only apply security updates
upgrade_type = security

[emitters]
# How to notify
emit_via = stdio

[email]
email_from = root@localhost
email_to = admin@example.com

# Enable the timer
sudo systemctl enable --now dnf-automatic.timer

# Verify the timer is active
systemctl status dnf-automatic.timer

Think About It: Should you enable automatic reboots on a production database server? What could go wrong? How might you handle the need for kernel updates that require a reboot on a system that cannot go down during business hours?


Debug This

A junior admin reports that users are complaining they cannot change their passwords. You investigate and find:

$ ls -l /usr/bin/passwd
-rwxr-xr-x 1 root root 68208 Mar 14 2025 /usr/bin/passwd
$ passwd
passwd: Authentication token manipulation error
passwd: password unchanged

What is wrong? (Hint: compare the permissions shown above with what the passwd binary normally requires.)

Answer: The SUID bit is missing. The passwd command needs to run as root to modify /etc/shadow, but the permissions show -rwxr-xr-x instead of -rwsr-xr-x. Someone (or a misconfigured script) removed the SUID bit. Fix it with:

sudo chmod u+s /usr/bin/passwd

Verify:

ls -l /usr/bin/passwd
# Should show: -rwsr-xr-x 1 root root 68208 ...

What Just Happened?

+------------------------------------------------------------------+
|                  LINUX SECURITY FUNDAMENTALS                     |
+------------------------------------------------------------------+
|                                                                  |
|  MINDSET:                                                        |
|    - Defense in depth: multiple independent layers               |
|    - Least privilege: minimum necessary permissions              |
|    - Attack surface reduction: remove what you don't need        |
|                                                                  |
|  USER SECURITY:                                                  |
|    - Enforce password quality with pam_pwquality                 |
|    - Control password aging with chage                           |
|    - Audit accounts: empty passwords, extra UID 0, unused        |
|                                                                  |
|  FILE SECURITY:                                                  |
|    - Check permissions on /etc/shadow, /etc/passwd, SSH keys     |
|    - Find world-writable files and directories                   |
|    - Audit SUID/SGID binaries -- know your expected list         |
|                                                                  |
|  NETWORK SECURITY:                                               |
|    - Audit open ports with ss -tlnup                             |
|    - Bind services to localhost when possible                    |
|    - Firewall: default deny, allow only what you need            |
|                                                                  |
|  AUTOMATED DEFENSE:                                              |
|    - fail2ban: auto-ban brute-force attackers                    |
|    - unattended-upgrades / dnf-automatic: auto-patch             |
|                                                                  |
|  HARDENING CHECKLIST:                                            |
|    1. Update everything                                          |
|    2. Remove unnecessary packages/services                       |
|    3. Configure SSH securely (keys, no root login)               |
|    4. Enable firewall with default deny                          |
|    5. Set up logging and monitoring                              |
|    6. Enable fail2ban                                            |
|    7. Enable automatic security updates                          |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Security Audit Script

Write a bash script that performs a basic security audit:

#!/bin/bash
echo "=== Security Audit Report ==="
echo "Date: $(date)"
echo ""
echo "--- Users with UID 0 ---"
awk -F: '$3 == 0 {print $1}' /etc/passwd
echo ""
echo "--- Accounts with empty passwords ---"
sudo awk -F: '($2 == "") {print $1}' /etc/shadow
echo ""
echo "--- SUID binaries ---"
find / -perm -4000 -type f 2>/dev/null
echo ""
echo "--- Listening ports ---"
ss -tlnp
echo ""
echo "--- Failed SSH logins (last 24h) ---"
sudo journalctl -u sshd --since "24 hours ago" | grep -c "Failed"
echo ""
echo "--- Packages needing updates ---"
apt list --upgradable 2>/dev/null | tail -n +2 | wc -l

Run it on your system and review the output.

Exercise 2: Harden a Fresh Server

Start with a fresh VM and apply the full hardening checklist from this chapter. Document every change you make. Then ask a colleague to try to find something you missed.

Exercise 3: fail2ban Custom Jail

Create a custom fail2ban jail for Nginx that bans IPs making too many 404 requests (a common sign of vulnerability scanning). Hint: you will need to write a custom filter in /etc/fail2ban/filter.d/.

Bonus Challenge

Set up a second VM and use it to attack your hardened server. Try brute-forcing SSH, scanning ports with nmap, and accessing services. Document what your defenses caught and what got through. This adversarial testing is how real security teams validate their hardening.

TLS/SSL & Public Key Infrastructure

Why This Matters

Every time you visit a website, check your email, or push code to a repository, your data travels across networks controlled by strangers. Without encryption, anyone sitting between you and the server -- an ISP, a coffee shop Wi-Fi operator, a compromised router -- can read and modify everything in transit. Your passwords, credit card numbers, private messages, all in plain text.

TLS (Transport Layer Security) is the protocol that prevents this. That padlock icon in your browser exists because of TLS, and behind TLS is an entire trust system called Public Key Infrastructure (PKI). Understanding how these pieces fit together is not optional for anyone who operates servers. Misconfigure TLS and you either break your service or leave it exposed. This chapter gives you the conceptual foundation; the next chapter puts it into practice with OpenSSL.


Try This Right Now

See TLS in action on your own machine:

# Connect to a website and see the TLS handshake
openssl s_client -connect example.com:443 -brief

# See the certificate chain your browser would verify
echo | openssl s_client -connect google.com:443 -showcerts 2>/dev/null \
  | grep "s:" | head -5

# Check which TLS version a server supports
openssl s_client -connect example.com:443 -tls1_3 -brief 2>&1 | head -3

# View the certificate details
echo | openssl s_client -connect example.com:443 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

You just performed a TLS handshake, inspected a certificate chain, and checked expiration dates -- all from the command line.


Symmetric vs Asymmetric Encryption

Before we can understand TLS, we need to understand two fundamentally different approaches to encryption.

Symmetric Encryption: One Key, Two Purposes

With symmetric encryption, the same key encrypts and decrypts. Think of it like a physical lock where both parties have a copy of the same key.

  Alice                                    Bob
    |                                       |
    |  Plaintext: "Hello Bob"               |
    |       |                               |
    |   [Encrypt with shared key K]         |
    |       |                               |
    |  Ciphertext: "x7$mQ2..."              |
    |  -------- send over network --------> |
    |                                       |
    |                               [Decrypt with same key K]
    |                                       |
    |                               Plaintext: "Hello Bob"

Algorithms: AES-256, ChaCha20

Strengths: Extremely fast. AES-256 can encrypt gigabytes per second on modern hardware.

Weakness: How do Alice and Bob agree on the shared key in the first place? If they send it over the network, anyone watching can intercept it. This is called the key distribution problem.
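You can run a symmetric round trip yourself with openssl enc. A minimal sketch, assuming OpenSSL 1.1.1+ (for the -pbkdf2 key derivation option); the passphrase is obviously a made-up shared secret:

```shell
# Symmetric round trip: the SAME key K encrypts and decrypts
key="correct horse battery staple"   # the shared secret both sides must already have
ciphertext=$(echo "Hello Bob" | openssl enc -aes-256-cbc -pbkdf2 -base64 -pass pass:"$key")
echo "On the wire: $ciphertext"      # unreadable without K
plaintext=$(echo "$ciphertext" | openssl enc -d -aes-256-cbc -pbkdf2 -base64 -pass pass:"$key")
echo "Decrypted:   $plaintext"       # Hello Bob
```

Notice that anyone who captures the ciphertext still needs K -- which is exactly why distributing K safely is the hard part.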

Asymmetric Encryption: Two Keys, Complementary

Asymmetric encryption uses a pair of mathematically related keys: a public key and a private key. What one key encrypts, only the other can decrypt.

  Alice                                    Bob
    |                                       |
    |       Bob's Public Key (known to all) |
    |  <----- Bob publishes it ------------ |
    |                                       |
    |  Plaintext: "Hello Bob"               |
    |       |                               |
    |   [Encrypt with Bob's PUBLIC key]     |
    |       |                               |
    |  Ciphertext: "j9#kL..."               |
    |  -------- send over network --------> |
    |                                       |
    |                [Decrypt with Bob's PRIVATE key]
    |                                       |
    |                Plaintext: "Hello Bob" |
    |                                       |
    |  Only Bob can decrypt -- only he has  |
    |  the private key.                     |

Algorithms: RSA, ECDSA, Ed25519

Strengths: Solves the key distribution problem. The public key can be shared openly; only the private key must be kept secret.

Weakness: Much slower than symmetric encryption (100 to 1000 times slower).
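The same exchange works on the command line: encrypt with a public key, decrypt with the matching private key. A sketch with throwaway RSA keys (the file names are arbitrary; requires the openssl CLI):

```shell
# Bob generates a key pair; pub.pem can be published, priv.pem stays secret
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out priv.pem 2>/dev/null
openssl pkey -in priv.pem -pubout -out pub.pem

# Alice encrypts with Bob's PUBLIC key...
echo "Hello Bob" > msg.txt
openssl pkeyutl -encrypt -pubin -inkey pub.pem -in msg.txt -out msg.bin

# ...and only Bob's PRIVATE key can decrypt it
decrypted=$(openssl pkeyutl -decrypt -inkey priv.pem -in msg.bin)
echo "$decrypted"                    # Hello Bob
rm -f priv.pem pub.pem msg.txt msg.bin
```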

The Hybrid Approach (What TLS Actually Does)

TLS uses both. Asymmetric encryption is used briefly at the start to securely exchange a symmetric key. Then the actual data flows using fast symmetric encryption.

  Phase 1: Key Exchange (Asymmetric - slow, but only a few messages)
  +-------------------------------------------------------------+
  |  Client and server use asymmetric crypto to agree on a      |
  |  shared secret. No eavesdropper can learn the secret even   |
  |  if they record every byte on the wire.                     |
  +-------------------------------------------------------------+
          |
          v
  Phase 2: Data Transfer (Symmetric - fast, bulk encryption)
  +-------------------------------------------------------------+
  |  All application data (HTTP requests, responses, etc.) is   |
  |  encrypted with the shared symmetric key (AES or ChaCha20). |
  +-------------------------------------------------------------+

Think About It: Why not just use asymmetric encryption for everything? Consider a busy web server handling 10,000 requests per second. What would happen to its CPU if every byte of every response used RSA encryption?


How TLS Works: The Handshake

The TLS handshake is the process by which client and server establish a secure connection. Here is the TLS 1.3 handshake, which is the current standard:

  Client (Browser)                          Server (Web Server)
       |                                         |
       |  1. ClientHello                         |
       |  - Supported TLS versions               |
       |  - Supported cipher suites              |
       |  - Client random                        |
       |  - Key share (DH public value)          |
       |  -------------------------------------> |
       |                                         |
       |                          2. ServerHello |
       |             - Chosen TLS version        |
       |             - Chosen cipher suite       |
       |             - Server random             |
       |             - Key share (DH public)     |
       |                                         |
       |             3. EncryptedExtensions      |
       |             4. Certificate              |
       |             5. CertificateVerify        |
       |             6. Finished                 |
       |  <------------------------------------- |
       |                                         |
       |  At this point, both sides can          |
       |  compute the shared secret from         |
       |  the key shares (Diffie-Hellman).       |
       |                                         |
       |  7. Finished                            |
       |  -------------------------------------> |
       |                                         |
       |  === Encrypted Application Data ===     |
       |  <----------------------------------->  |
       |                                         |

Let's break down what happens at each step:

  1. ClientHello: The client says "Here is what I support" and sends its half of the Diffie-Hellman key exchange.

  2. ServerHello: The server picks the strongest mutually-supported cipher suite and sends its half of the key exchange.

  3. EncryptedExtensions: Additional server parameters, sent encrypted now that both sides share a secret.

  4. Certificate: The server sends its certificate so the client can verify it is who it claims to be.

  5. CertificateVerify: The server proves it owns the private key matching the certificate by signing the handshake transcript.

  6-7. Finished: Both sides confirm the handshake is complete and unmodified.

After step 2, both sides can compute the shared secret using Diffie-Hellman. Everything from step 3 (EncryptedExtensions) onward is already encrypted.

TLS 1.3 vs TLS 1.2: TLS 1.3 completes the handshake in 1 round trip (1-RTT) instead of 2. It also removed support for insecure algorithms and simplified the protocol. TLS 1.2 is still widely used, but TLS 1.3 is the target for new deployments.


Certificates and X.509

A certificate is a digitally signed document that binds a public key to an identity (like a domain name). The standard format is X.509.

What Is Inside a Certificate?

+--------------------------------------------------+
|                  X.509 Certificate               |
+--------------------------------------------------+
|  Version: 3                                      |
|  Serial Number: 04:A3:7B:...                     |
|  Signature Algorithm: SHA256withRSA              |
|                                                  |
|  Issuer: CN=Let's Encrypt Authority X3           |
|          O=Let's Encrypt                         |
|          C=US                                    |
|                                                  |
|  Validity:                                       |
|    Not Before: Jan 1 00:00:00 2026 UTC           |
|    Not After:  Apr 1 00:00:00 2026 UTC           |
|                                                  |
|  Subject: CN=www.example.com                     |
|                                                  |
|  Subject Public Key Info:                        |
|    Algorithm: RSA (2048-bit)                     |
|    Public Key: 30:82:01:0a:02:82:01:01:...       |
|                                                  |
|  Extensions:                                     |
|    Subject Alternative Name (SAN):               |
|      DNS: www.example.com                        |
|      DNS: example.com                            |
|    Key Usage: Digital Signature, Key Encipherment|
|    Basic Constraints: CA:FALSE                   |
|                                                  |
|  Signature: a4:b7:c3:d1:...                      |
|  (Signed by the Issuer's private key)            |
+--------------------------------------------------+

Key fields:

  • Subject -- Who this certificate is for (the domain name).
  • Issuer -- Who signed this certificate (the Certificate Authority).
  • Validity -- When the certificate is valid. Expired certificates are rejected.
  • Public Key -- The subject's public key. Used by clients to encrypt data.
  • Subject Alternative Name (SAN) -- Additional domain names the certificate covers. Modern certificates use SAN instead of (or in addition to) the CN.
  • Signature -- The issuer's digital signature proving the certificate has not been tampered with.
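You can generate a throwaway self-signed certificate and inspect each of these fields yourself. A sketch assuming OpenSSL 1.1.1+ (for -addext); names and paths are placeholders. Because the certificate is self-signed, Subject and Issuer come out identical:

```shell
# Create a short-lived self-signed cert with a SAN covering two names
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout key.pem -out cert.pem -days 90 \
  -subj "/CN=www.example.com" \
  -addext "subjectAltName=DNS:www.example.com,DNS:example.com" 2>/dev/null

# Dump the key fields discussed above
fields=$(openssl x509 -in cert.pem -noout -subject -issuer -dates)
printf '%s\n' "$fields"
openssl x509 -in cert.pem -noout -text | grep -A1 'Subject Alternative Name'
rm -f key.pem cert.pem
```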

Certificate Chains and Trust

A single certificate is not enough. Browsers and operating systems need to trust it. This is where certificate chains come in.

The Chain of Trust

  +---------------------------+
  |   Root CA Certificate     |  Stored in browser/OS trust store
  |   (Self-signed)           |  Offline, heavily guarded
  |   Validity: 20+ years     |
  +---------------------------+
              |
              | Signs
              v
  +---------------------------+
  |   Intermediate CA Cert    |  Operated by the CA
  |   Issued by Root CA       |  Used for day-to-day signing
  |   Validity: 5-10 years    |
  +---------------------------+
              |
              | Signs
              v
  +---------------------------+
  |   Leaf Certificate        |  Your server's certificate
  |   (End-entity cert)       |  The one you install on Nginx
  |   Issued by Intermediate  |
  |   Validity: 90 days-1 yr  |
  +---------------------------+

When your browser connects to www.example.com:

  1. The server sends its leaf certificate and the intermediate certificate.
  2. The browser checks: "Was this leaf cert signed by this intermediate?"
  3. The browser checks: "Was this intermediate signed by a root CA I trust?"
  4. The browser has a built-in list of trusted root CAs. If the chain leads to one of them, the connection is trusted.
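openssl verify performs exactly this chain check. A sketch that builds a toy two-link chain -- a throwaway "root CA" signs a leaf certificate, then we verify the leaf against it (all names and files are made up):

```shell
# 1. A throwaway "root CA" (self-signed)
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout ca.key -out ca.crt -days 365 -subj "/CN=Toy Root CA" 2>/dev/null

# 2. A leaf key + CSR, which the "CA" then signs
openssl req -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout leaf.key -out leaf.csr -subj "/CN=www.example.com" 2>/dev/null
openssl x509 -req -in leaf.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 90 -out leaf.crt 2>/dev/null

# 3. The same question a browser asks: does the chain lead to a trusted root?
result=$(openssl verify -CAfile ca.crt leaf.crt)
echo "$result"                       # leaf.crt: OK
rm -f ca.key ca.crt ca.srl leaf.key leaf.csr leaf.crt
```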

Why Intermediates?

The root CA's private key is incredibly valuable. If it were compromised, every certificate it ever signed would be suspect. By keeping the root offline (literally in a vault), and using intermediates for daily signing, the root is protected. If an intermediate is compromised, only that intermediate is revoked -- not the root.

# See the root CAs your system trusts
ls /etc/ssl/certs/ | head -20

# Count how many root CAs your system trusts
ls /etc/ssl/certs/ | wc -l

# On Debian/Ubuntu, the bundle file is:
cat /etc/ssl/certs/ca-certificates.crt | grep "BEGIN CERTIFICATE" | wc -l

Distro Note: The CA certificate bundle lives in different places:

  • Debian/Ubuntu: /etc/ssl/certs/ca-certificates.crt
  • RHEL/Fedora: /etc/pki/tls/certs/ca-bundle.crt
  • Both are managed by the ca-certificates package.

Think About It: What happens if you visit a website whose certificate was signed by a root CA that your browser does not trust? You have seen this -- it is the "Your connection is not private" warning. What would an attacker need to do to make a fake certificate that your browser trusts?


Public and Private Keys

The public key and private key are mathematically related, but it is computationally infeasible to derive the private key from the public key.

How They Work Together

  SIGNING (proves identity):
  +----------------------------------------------------+
  |  1. Server hashes the message                      |
  |  2. Server encrypts the hash with its PRIVATE key  |
  |     This encrypted hash is the "signature"         |
  |  3. Client decrypts signature with PUBLIC key      |
  |  4. Client hashes the message independently        |
  |  5. If hashes match --> signature is valid          |
  +----------------------------------------------------+

  ENCRYPTION (protects data):
  +----------------------------------------------------+
  |  1. Client encrypts data with server's PUBLIC key   |
  |  2. Only the server's PRIVATE key can decrypt it   |
  +----------------------------------------------------+

The private key must never leave the server. If someone obtains your private key, they can:

  • Impersonate your server
  • Decrypt recorded traffic (unless you use forward secrecy)
  • Sign things as you

# A private key file should always have strict permissions
ls -l /etc/ssl/private/
# Expected: drwx--x--- root ssl-cert

ls -l /etc/ssl/private/server.key
# Expected: -rw-r----- root ssl-cert
# Or even: -rw------- root root

Key Types

Algorithm   Key Size   Security Level       Speed     Usage
---------   --------   --------------       -----     -----
RSA         2048-bit   Good                 Slower    Legacy, widely supported
RSA         4096-bit   Better               Slow      High-security needs
ECDSA       256-bit    Good (= RSA 3072)    Fast      Modern, recommended
Ed25519     256-bit    Good                 Fastest   SSH keys, newer TLS

ECDSA with P-256 offers equivalent security to RSA 3072-bit at a fraction of the computational cost. For new deployments, ECDSA is the standard choice.
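Generating such a key takes one command. A sketch with a hypothetical output path:

```shell
# Generate a P-256 (prime256v1) ECDSA private key
openssl ecparam -name prime256v1 -genkey -noout -out server-ec.key

# Confirm the key size and curve
info=$(openssl ec -in server-ec.key -noout -text 2>/dev/null)
printf '%s\n' "$info" | head -1      # Private-Key: (256 bit)
rm -f server-ec.key
```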


The CSR: Certificate Signing Request

When you want a Certificate Authority to issue you a certificate, you do not send them your private key (never do that). Instead, you create a CSR.

What Is in a CSR?

  +------------------------------------------------+
  |         Certificate Signing Request            |
  +------------------------------------------------+
  |  Subject:                                      |
  |    CN = www.example.com                        |
  |    O  = Example Corp                           |
  |    C  = IN                                     |
  |                                                |
  |  Public Key: (your public key)                 |
  |                                                |
  |  Signature: (signed with YOUR private key,     |
  |    proving you own the corresponding key)      |
  +------------------------------------------------+

The flow:

  You (Server Admin)              Certificate Authority
       |                                |
  1. Generate key pair                  |
     (private key + public key)         |
       |                                |
  2. Create CSR                         |
     (embed public key + identity)      |
       |                                |
  3. Send CSR  -----------------------> |
       |                                |
       |                  4. Verify you control the domain
       |                     (via HTTP challenge, DNS, etc.)
       |                                |
       |                  5. Sign the certificate
       |                     (using CA's private key)
       |                                |
  6. Receive signed cert <------------- |
       |                                |
  7. Install cert + private key         |
     on your web server                 |

The private key never leaves your server. The CA never sees it. The CSR proves you own the matching private key without revealing it.
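As a preview, the admin-side steps 1 and 2 of the flow above map onto two openssl commands (covered in depth in the OpenSSL Hands-On chapter; the filenames and subject values here are placeholders):

```shell
# 1. Generate the key pair (the public key is derivable from the private key)
openssl genrsa -out server.key 2048

# 2. Create the CSR, embedding the public key and identity
openssl req -new -key server.key -out server.csr \
  -subj "/C=IN/O=Example Corp/CN=www.example.com"

# The CSR carries a self-signature, proving possession of the private key
openssl req -in server.csr -noout -verify
```

The last command is exactly the check a CA performs first: does the CSR's signature verify against the public key embedded in it?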


Why PKI Matters

Without PKI, the internet has no way to answer the question: "Am I really talking to bank.example.com, or is someone pretending to be them?"

PKI provides three guarantees:

  1. Authentication -- The server is who it claims to be (verified by the CA chain).
  2. Confidentiality -- The data is encrypted in transit (symmetric encryption with keys exchanged via asymmetric crypto).
  3. Integrity -- The data has not been tampered with (message authentication codes).

What Happens Without TLS

  Without TLS (HTTP):
  +--------+                    +----------+                    +--------+
  | Client | --- "password" --> | Attacker | --- "password" --> | Server |
  +--------+     (plaintext)    +----------+     (plaintext)    +--------+
                                     |
                              Attacker reads
                              your password!

  With TLS (HTTPS):
  +--------+                    +----------+                    +--------+
  | Client | --- "x7#kQ..." --> | Attacker | --- "x7#kQ..." --> | Server |
  +--------+     (encrypted)    +----------+     (encrypted)    +--------+
                                     |
                              Attacker sees only
                              encrypted garbage.

Certificate Formats

Certificates come in several formats. Knowing which is which will save you hours of frustration.

PEM (Privacy Enhanced Mail)

The most common format on Linux. Base64-encoded with header/footer markers.

-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJALKhBz3h7qGhMA0GCSqGSIb3DQEBCwUAMEUxCzAJBgNV
BAYTAklOMQ0wCwYDVQQIDARUZXN0MQ0wCwYDVQQHDARUZXN0MRgwFgYDVQQKDA9F
... (base64 data) ...
eG1wbGUgQ29ycDAeFw0yNjAxMDEwMDAwMDBaFw0yNjA0MDEwMDAwMDBaMAAwggEi
-----END CERTIFICATE-----
  • File extensions: .pem, .crt, .cer, .key
  • Used by: Apache, Nginx, most Linux tools
  • You can concatenate multiple PEM certificates into one file (certificate chain).
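The concatenation property is worth seeing once. A self-contained sketch that builds a two-certificate bundle from throwaway self-signed certs (all filenames are placeholders in a temp directory):

```shell
# Build two throwaway self-signed certs in a temp dir, then bundle them.
tmp=$(mktemp -d)
for name in leaf intermediate; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout "$tmp/$name.key" -out "$tmp/$name.crt" \
    -subj "/CN=$name" 2>/dev/null
done

# Concatenated PEM certificates form a valid bundle/chain file
cat "$tmp/leaf.crt" "$tmp/intermediate.crt" > "$tmp/bundle.pem"

# Count the certificates in the bundle
grep -c 'BEGIN CERTIFICATE' "$tmp/bundle.pem"   # -> 2

rm -rf "$tmp"
```

The same grep -c trick is useful on real files like a fullchain.pem to confirm how many certificates it actually contains.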

DER (Distinguished Encoding Rules)

Binary format. Not human-readable.

  • File extensions: .der, .cer
  • Used by: Java, Windows, some embedded systems
  • Contains the same data as PEM, just binary-encoded instead of Base64.

PKCS#12 / PFX

A binary format that bundles the private key, certificate, and chain together in one password-protected file.

  • File extensions: .p12, .pfx
  • Used by: Windows, Java keystores, importing/exporting key+cert pairs
  • Useful for transferring credentials between systems.

Converting Between Formats

# PEM to DER
openssl x509 -in cert.pem -outform DER -out cert.der

# DER to PEM
openssl x509 -in cert.der -inform DER -outform PEM -out cert.pem

# PEM cert + key to PKCS#12
openssl pkcs12 -export -out bundle.p12 \
  -inkey private.key -in cert.pem -certfile chain.pem

# PKCS#12 to PEM
openssl pkcs12 -in bundle.p12 -out everything.pem -nodes

We will cover these commands in detail in the next chapter (OpenSSL Hands-On).

Quick Reference

+------------+--------+----------+-----------+------------------------+
| Format     | Encode | Readable | Extension | Common Use             |
+------------+--------+----------+-----------+------------------------+
| PEM        | Base64 | Yes      | .pem .crt | Linux, Nginx, Apache   |
| DER        | Binary | No       | .der .cer | Java, Windows          |
| PKCS#12    | Binary | No       | .p12 .pfx | Import/Export bundles  |
+------------+--------+----------+-----------+------------------------+

Hands-On: Inspect a Real Certificate Chain

Let's examine the certificate chain for a real website:

# Connect and show the full certificate chain
echo | openssl s_client -connect www.google.com:443 -showcerts 2>/dev/null

In the output, you will see multiple -----BEGIN CERTIFICATE----- blocks. These are the certificates in the chain, from leaf to intermediate.

# Split the chain into one file per certificate
echo | openssl s_client -connect www.google.com:443 -showcerts 2>/dev/null \
  | awk '/BEGIN CERT/,/END CERT/{print}' \
  | csplit -z -f cert- - '/BEGIN CERT/' '{*}' 2>/dev/null

# Now examine each certificate
for f in cert-*; do
  echo "=== $f ==="
  openssl x509 -in "$f" -noout -subject -issuer
  echo ""
done

# Clean up
rm -f cert-*

You should see something like:

=== cert-00 ===
subject=CN = www.google.com
issuer=C = US, O = Google Trust Services, CN = GTS CA 1C3

=== cert-01 ===
subject=C = US, O = Google Trust Services, CN = GTS CA 1C3
issuer=C = US, O = Google Trust Services LLC, CN = GTS Root R1

Notice how each certificate's issuer matches the next certificate's subject. That is the chain of trust.


Debug This

A colleague configures TLS on a new server, but clients see "certificate verify failed" errors. They run:

$ echo | openssl s_client -connect newserver.example.com:443 2>&1 | head -20

And see:

depth=0 CN = newserver.example.com
verify error:num=20:unable to get local issuer certificate
verify error:num=21:unable to verify the first certificate
Verify return code: 21 (unable to verify the first certificate)

The certificate itself is valid and not expired. What is the problem?

Answer: The server is sending only the leaf certificate, not the intermediate certificate. The client cannot build the chain from the leaf to a trusted root CA. The fix is to configure the server to send the full chain:

For Nginx:

ssl_certificate /etc/ssl/certs/fullchain.pem;  # leaf + intermediate
ssl_certificate_key /etc/ssl/private/server.key;

The fullchain.pem file should contain the leaf certificate followed by the intermediate certificate(s), concatenated together.


What Just Happened?

+------------------------------------------------------------------+
|              TLS/SSL & PUBLIC KEY INFRASTRUCTURE                 |
+------------------------------------------------------------------+
|                                                                  |
|  ENCRYPTION TYPES:                                               |
|    Symmetric  - Same key encrypts/decrypts (AES) - fast          |
|    Asymmetric - Key pair: public + private (RSA/ECDSA) - slow    |
|    TLS uses both: asymmetric for key exchange, symmetric for     |
|    data transfer (hybrid approach).                              |
|                                                                  |
|  TLS HANDSHAKE (1.3):                                            |
|    ClientHello --> ServerHello + Cert --> Finished               |
|    1 round trip, then encrypted data flows.                      |
|                                                                  |
|  CERTIFICATES (X.509):                                           |
|    Bind a public key to a domain name.                           |
|    Fields: Subject, Issuer, Validity, Public Key, SAN.           |
|                                                                  |
|  CHAIN OF TRUST:                                                 |
|    Root CA --> Intermediate CA --> Leaf (your cert)              |
|    Browsers trust root CAs; intermediates bridge the gap.        |
|                                                                  |
|  CSR FLOW:                                                       |
|    Generate key pair --> Create CSR --> Send to CA               |
|    --> CA verifies domain --> CA signs cert --> You install it   |
|                                                                  |
|  FORMATS:                                                        |
|    PEM (Base64, Linux) | DER (Binary, Java) | PKCS#12 (Bundle)   |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Inspect Certificates

Pick five websites you use daily. For each one, use openssl s_client to determine:

  • What TLS version is used?
  • What cipher suite is negotiated?
  • Who issued the certificate?
  • When does it expire?
  • How many certificates are in the chain?
# Template command
echo | openssl s_client -connect DOMAIN:443 -brief 2>&1
echo | openssl s_client -connect DOMAIN:443 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

Exercise 2: Trace the Trust

For one of those certificates, trace the full chain. Find the root CA in your system's trust store:

# Find the root CA name from the chain
echo | openssl s_client -connect example.com:443 -showcerts 2>/dev/null \
  | grep "s:.*CN"

# Search for it in the system trust store
grep -r "ISRG Root" /etc/ssl/certs/ 2>/dev/null | head -3

Exercise 3: Expired Certificate

Use openssl s_client to connect to expired.badssl.com:443. Read the error messages. What exactly does the output tell you about why the connection is untrusted?

Bonus Challenge

Draw your own ASCII diagram of what happens when you type https://example.com in your browser, from DNS resolution through the TLS handshake to the first encrypted HTTP request. Include every step, every key exchange, and every verification. This exercise forces you to truly understand the entire flow.

OpenSSL Hands-On

Why This Matters

In the previous chapter, we learned what TLS, certificates, and PKI are. Now it is time to get your hands dirty. OpenSSL is the Swiss Army knife of cryptography on Linux -- it generates keys, creates certificates, tests TLS connections, and converts between formats. You will use it when setting up HTTPS on a web server, debugging certificate problems at 3 AM, creating a development CA for your team, or verifying that a certificate renewal actually worked.

This chapter is almost entirely commands and output. You will generate keys, sign certificates, build a certificate chain, and test it all -- the way you would in a real production environment.


Try This Right Now

# What version of OpenSSL do you have?
openssl version

# With build details
openssl version -a

# Generate a random 32-byte hex string (useful for tokens/passwords)
openssl rand -hex 32

# Quick SHA-256 hash of data from stdin (for a file: openssl dgst -sha256 FILE)
echo "hello world" | openssl dgst -sha256

These commands confirm OpenSSL is installed and working.


Generating Private Keys

The private key is the foundation of everything. You generate it first, and everything else (CSR, certificate) builds on top of it.

RSA Keys

# Generate a 2048-bit RSA private key
openssl genrsa -out server-rsa.key 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
....................+++++
.......+++++
e is 65537 (0x010001)

# Generate a 4096-bit RSA key (more secure, slightly slower)
openssl genrsa -out server-rsa4096.key 4096

# Generate an encrypted private key (password-protected)
openssl genrsa -aes256 -out server-encrypted.key 2048
# You will be prompted for a passphrase

WARNING: If you password-protect a key used by a web server, the server will prompt for the passphrase every time it starts. This means automated restarts will hang. For server keys, typically leave them unencrypted but protect them with file permissions.

# Examine the key
openssl rsa -in server-rsa.key -text -noout | head -20

# Extract the public key from the private key
openssl rsa -in server-rsa.key -pubout -out server-rsa.pub

# View the public key
cat server-rsa.pub

ECDSA Keys

ECDSA keys are smaller and faster than RSA while providing equivalent security.

# List available elliptic curves
openssl ecparam -list_curves | head -10

# Generate an ECDSA key using the P-256 curve (most common)
openssl ecparam -genkey -name prime256v1 -noout -out server-ecdsa.key

# Examine the key
openssl ec -in server-ecdsa.key -text -noout

# Extract the public key
openssl ec -in server-ecdsa.key -pubout -out server-ecdsa.pub

Key Comparison

# Compare file sizes
ls -la server-rsa.key server-ecdsa.key
-rw------- 1 user user 1704 Feb 21 10:00 server-rsa.key     # 2048-bit RSA
-rw------- 1 user user  227 Feb 21 10:00 server-ecdsa.key    # P-256 ECDSA

The ECDSA key is much smaller but provides comparable security.

Securing Your Private Key

# Set strict permissions -- only the owner can read
chmod 600 server-rsa.key server-ecdsa.key

# Verify permissions
ls -la server-rsa.key server-ecdsa.key

Never store private keys in git repositories, shared directories, or anywhere publicly accessible. Never email a private key.
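One cheap safeguard before committing or sharing a directory: scan it for PEM private key headers.

```shell
# Lists any file under the current directory containing a PEM private key header.
# Matches "BEGIN PRIVATE KEY", "BEGIN RSA PRIVATE KEY", "BEGIN EC PRIVATE KEY", etc.
grep -rl "BEGIN .*PRIVATE KEY" . 2>/dev/null
```

Any output from this command deserves immediate attention; if such a file has already been pushed to a shared repository, treat the key as compromised and replace it.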

Think About It: What would happen if an attacker obtained your web server's private key? What could they do with it? How would you recover?


Creating a CSR (Certificate Signing Request)

The CSR contains your public key and the identity information (domain name, etc.) that you want the CA to certify.

Interactive CSR Creation

# Create a CSR interactively
openssl req -new -key server-rsa.key -out server.csr

You will be prompted for:

Country Name (2 letter code) [AU]: IN
State or Province Name [Some-State]: Karnataka
Locality Name []: Bangalore
Organization Name []: Example Corp
Organizational Unit Name []: Engineering
Common Name []: www.example.com
Email Address []: admin@example.com

Please enter the following 'extra' attributes:
A challenge password []:          # Leave empty
An optional company name []:      # Leave empty

Non-Interactive CSR Creation (Scriptable)

For automation, pass everything on the command line:

openssl req -new \
  -key server-rsa.key \
  -out server.csr \
  -subj "/C=IN/ST=Karnataka/L=Bangalore/O=Example Corp/CN=www.example.com"

CSR with Subject Alternative Names (SAN)

Modern certificates require SANs. The Common Name (CN) alone is no longer sufficient -- browsers check SAN entries.

Create a configuration file first:

cat > san.cnf << 'EOF'
[req]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = v3_req

[dn]
C = IN
ST = Karnataka
L = Bangalore
O = Example Corp
CN = www.example.com

[v3_req]
subjectAltName = @alt_names

[alt_names]
DNS.1 = www.example.com
DNS.2 = example.com
DNS.3 = api.example.com
EOF

# Generate CSR with SANs
openssl req -new -key server-rsa.key -out server-san.csr -config san.cnf

# Verify the CSR includes SANs
openssl req -in server-san.csr -text -noout | grep -A4 "Subject Alternative"

Examining a CSR

# View the full CSR details
openssl req -in server.csr -text -noout
Certificate Request:
    Data:
        Version: 1 (0x0)
        Subject: C=IN, ST=Karnataka, L=Bangalore, O=Example Corp, CN=www.example.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                ...
    Signature Algorithm: sha256WithRSAEncryption
         ...

# Verify the CSR is valid (signature matches the key)
openssl req -in server.csr -verify -noout
verify OK

Self-Signed Certificates

Self-signed certificates are useful for development, testing, and internal services. They are NOT trusted by browsers (you will get a warning) because no CA vouched for them.

Quick Self-Signed Certificate

# Generate key and self-signed cert in one command
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout selfsigned.key \
  -out selfsigned.crt \
  -days 365 \
  -subj "/C=IN/ST=Karnataka/O=Dev Team/CN=localhost"

Breaking down the flags:

  • -x509 -- Output a certificate instead of a CSR
  • -newkey rsa:2048 -- Generate a new 2048-bit RSA key
  • -nodes -- Do not encrypt the private key (no passphrase)
  • -days 365 -- Valid for one year
  • -subj -- Non-interactive subject

Self-Signed with SANs

openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout selfsigned.key \
  -out selfsigned.crt \
  -days 365 \
  -subj "/C=IN/ST=Karnataka/O=Dev Team/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"

Using a Self-Signed Certificate from an Existing Key

# Create self-signed cert from an existing key and CSR
openssl x509 -req -in server.csr \
  -signkey server-rsa.key \
  -out server-selfsigned.crt \
  -days 365

Examining Certificates

The openssl x509 command is your best friend for inspecting certificates.

View Full Certificate Details

openssl x509 -in selfsigned.crt -text -noout

This produces detailed output. The most important sections:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            4a:7b:c3:...
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=IN, ST=Karnataka, O=Dev Team, CN=localhost
        Validity
            Not Before: Feb 21 10:00:00 2026 GMT
            Not After : Feb 21 10:00:00 2027 GMT
        Subject: C=IN, ST=Karnataka, O=Dev Team, CN=localhost
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                ...
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:localhost, IP Address:127.0.0.1
    Signature Algorithm: sha256WithRSAEncryption
         ...

Quick Extractions

# Just the subject
openssl x509 -in cert.pem -noout -subject

# Just the issuer
openssl x509 -in cert.pem -noout -issuer

# Just the validity dates
openssl x509 -in cert.pem -noout -dates

# Subject, issuer, and dates together
openssl x509 -in cert.pem -noout -subject -issuer -dates

# The fingerprint (SHA-256)
openssl x509 -in cert.pem -noout -fingerprint -sha256

# Serial number
openssl x509 -in cert.pem -noout -serial

# Check SANs
openssl x509 -in cert.pem -noout -ext subjectAltName

Check If a Certificate Matches a Key

This is a common troubleshooting step -- you have a .crt and a .key file but are not sure they belong together.

# Compare the modulus hash of the certificate and key
openssl x509 -in server.crt -noout -modulus | openssl md5
openssl rsa -in server.key -noout -modulus | openssl md5

# If both MD5 hashes match, the cert and key go together

For ECDSA:

openssl x509 -in server.crt -noout -pubkey | openssl md5
openssl ec -in server.key -pubout 2>/dev/null | openssl md5

Checking Certificate Expiration

Expired certificates are the number one cause of TLS-related outages. Check expiration proactively.

# Check when a local certificate expires
openssl x509 -in /etc/ssl/certs/server.crt -noout -enddate

# Check when a remote server's certificate expires
echo | openssl s_client -connect example.com:443 2>/dev/null \
  | openssl x509 -noout -enddate

# Check if a certificate will expire within the next 30 days
openssl x509 -in cert.pem -noout -checkend 2592000
# Exit code 0 = still valid; exit code 1 = will expire within 30 days
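Because -checkend communicates through its exit code, it drops neatly into shell conditionals. A self-contained sketch that generates its own throwaway certificate:

```shell
tmp=$(mktemp -d)
# Throwaway 90-day self-signed cert, just for the demo
openssl req -x509 -newkey rsa:2048 -nodes -days 90 \
  -keyout "$tmp/demo.key" -out "$tmp/demo.crt" \
  -subj "/CN=demo" 2>/dev/null

# -checkend returns 0 if the cert is still valid past the given number of seconds
if openssl x509 -in "$tmp/demo.crt" -noout -checkend $((30 * 86400)) >/dev/null; then
  echo "OK: more than 30 days of validity left"
else
  echo "WARNING: expires within 30 days (or is already expired)"
fi
rm -rf "$tmp"
```

Point the same if/else at a real certificate path and you have the core of a renewal alert.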

Hands-On: Certificate Expiration Monitoring Script

#!/bin/bash
# check_cert_expiry.sh -- Check certificate expiration for multiple domains

DOMAINS="example.com google.com github.com"
WARN_DAYS=30

for domain in $DOMAINS; do
  expiry=$(echo | openssl s_client -connect "$domain:443" 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)

  if [ -z "$expiry" ]; then
    echo "[ERROR] $domain - Could not retrieve certificate"
    continue
  fi

  expiry_epoch=$(date -d "$expiry" +%s 2>/dev/null)
  now_epoch=$(date +%s)
  days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

  if [ "$days_left" -lt 0 ]; then
    echo "[EXPIRED] $domain - Expired $((-days_left)) days ago!"
  elif [ "$days_left" -lt "$WARN_DAYS" ]; then
    echo "[WARNING] $domain - Expires in $days_left days ($expiry)"
  else
    echo "[OK]      $domain - Expires in $days_left days ($expiry)"
  fi
done

chmod +x check_cert_expiry.sh
./check_cert_expiry.sh
[OK]      example.com - Expires in 245 days (Oct 28 12:00:00 2026 GMT)
[OK]      google.com - Expires in 67 days (Apr 29 08:30:00 2026 GMT)
[OK]      github.com - Expires in 180 days (Aug 20 00:00:00 2026 GMT)

Verifying Certificate Chains

Verify a Certificate Against the System Trust Store

# Verify using the system's CA bundle
openssl verify cert.pem

If the issuing CA is in the system trust store, you will see:

cert.pem: OK

Verify with an Explicit CA Certificate

# Verify with a specific CA certificate
openssl verify -CAfile ca-cert.pem server-cert.pem

# Verify with both a root CA and an intermediate
openssl verify -CAfile root-ca.pem -untrusted intermediate.pem server-cert.pem

Testing TLS Connections with s_client

openssl s_client is one of the most useful debugging tools. It acts as a basic TLS client.

# Basic TLS connection test
openssl s_client -connect example.com:443 -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
Requested Signature Algorithms: ...
Peer certificate: CN=example.com
...
Verification: OK

Common s_client Options

# Show the full certificate chain
openssl s_client -connect example.com:443 -showcerts

# Test a specific TLS version
openssl s_client -connect example.com:443 -tls1_2
openssl s_client -connect example.com:443 -tls1_3

# Connect with SNI (Server Name Indication) -- important for shared hosting
openssl s_client -connect shared-host.com:443 -servername www.example.com

# Test with a specific CA bundle
openssl s_client -connect example.com:443 -CAfile /etc/ssl/certs/ca-certificates.crt

# Send an HTTP request through the TLS connection
echo -e "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n" \
  | openssl s_client -connect example.com:443 -quiet 2>/dev/null

# Check SMTP with STARTTLS
openssl s_client -connect smtp.gmail.com:587 -starttls smtp -brief

Think About It: Why is SNI important? What happens on a server hosting multiple HTTPS websites on a single IP address if the client does not send the SNI extension?


Creating a Mini CA (Certificate Authority)

This is one of the most instructive exercises in this entire book. You will create your own CA, sign a server certificate with it, and verify the chain.

Step 1: Create the Root CA

# Create a directory structure
mkdir -p ~/miniCA/{root,intermediate,server}
cd ~/miniCA

# Generate the root CA private key
openssl genrsa -aes256 -out root/root-ca.key 4096
# Enter a strong passphrase -- this is your root CA's crown jewel

# Create the root CA certificate (self-signed)
openssl req -x509 -new \
  -key root/root-ca.key \
  -sha256 -days 3650 \
  -out root/root-ca.crt \
  -subj "/C=IN/ST=Karnataka/O=MiniCA/CN=MiniCA Root CA"
# You will be prompted for the root CA key's passphrase

# Verify the root CA certificate
openssl x509 -in root/root-ca.crt -text -noout | head -15

Notice that the Issuer and Subject are the same -- that is what makes it self-signed and a root certificate.
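You can check this programmatically too: a self-signed certificate's subject hash equals its issuer hash. A self-contained sketch (run the same two flags against root/root-ca.crt to check your own root):

```shell
# Self-contained demo: make a throwaway self-signed cert and compare hashes.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" -subj "/CN=Demo Root" 2>/dev/null

# Prints the subject hash, then the issuer hash -- identical for self-signed certs
openssl x509 -in "$tmp/ca.crt" -noout -subject_hash -issuer_hash
rm -rf "$tmp"
```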

Step 2: Create an Intermediate CA

# Generate the intermediate CA key
openssl genrsa -aes256 -out intermediate/intermediate.key 4096

# Create a CSR for the intermediate CA
openssl req -new \
  -key intermediate/intermediate.key \
  -out intermediate/intermediate.csr \
  -subj "/C=IN/ST=Karnataka/O=MiniCA/CN=MiniCA Intermediate CA"

Create a config file for intermediate CA extensions:

cat > intermediate/intermediate-ext.cnf << 'EOF'
[v3_intermediate_ca]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer
basicConstraints = critical, CA:true, pathlen:0
keyUsage = critical, digitalSignature, cRLSign, keyCertSign
EOF

# Sign the intermediate CSR with the root CA
openssl x509 -req \
  -in intermediate/intermediate.csr \
  -CA root/root-ca.crt \
  -CAkey root/root-ca.key \
  -CAcreateserial \
  -out intermediate/intermediate.crt \
  -days 1825 \
  -sha256 \
  -extfile intermediate/intermediate-ext.cnf \
  -extensions v3_intermediate_ca
# Verify the intermediate cert was signed by the root
openssl verify -CAfile root/root-ca.crt intermediate/intermediate.crt
intermediate/intermediate.crt: OK

Step 3: Create and Sign a Server Certificate

# Generate the server key (no passphrase for server use)
openssl genrsa -out server/server.key 2048

# Create a CSR with SANs
cat > server/server-ext.cnf << 'EOF'
[req]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = v3_req

[dn]
C = IN
ST = Karnataka
O = Example Corp
CN = app.example.local

[v3_req]
subjectAltName = @alt_names

[alt_names]
DNS.1 = app.example.local
DNS.2 = *.example.local
IP.1 = 192.168.1.100
EOF

openssl req -new \
  -key server/server.key \
  -out server/server.csr \
  -config server/server-ext.cnf

# Create a server cert extensions file
cat > server/server-sign-ext.cnf << 'EOF'
[server_cert]
basicConstraints = CA:FALSE
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = DNS:app.example.local,DNS:*.example.local,IP:192.168.1.100
EOF

# Sign the server CSR with the intermediate CA
openssl x509 -req \
  -in server/server.csr \
  -CA intermediate/intermediate.crt \
  -CAkey intermediate/intermediate.key \
  -CAcreateserial \
  -out server/server.crt \
  -days 365 \
  -sha256 \
  -extfile server/server-sign-ext.cnf \
  -extensions server_cert

Step 4: Create the Certificate Chain and Verify

# Create the full chain file (intermediate + root)
cat intermediate/intermediate.crt root/root-ca.crt > server/chain.pem

# Create the full chain including the server cert
cat server/server.crt intermediate/intermediate.crt > server/fullchain.pem

# Verify the complete chain
openssl verify -CAfile root/root-ca.crt \
  -untrusted intermediate/intermediate.crt \
  server/server.crt
server/server.crt: OK

# See the chain in detail
echo "=== Server Cert ==="
openssl x509 -in server/server.crt -noout -subject -issuer
echo ""
echo "=== Intermediate Cert ==="
openssl x509 -in intermediate/intermediate.crt -noout -subject -issuer
echo ""
echo "=== Root CA Cert ==="
openssl x509 -in root/root-ca.crt -noout -subject -issuer

Expected output:

=== Server Cert ===
subject=C=IN, ST=Karnataka, O=Example Corp, CN=app.example.local
issuer=C=IN, ST=Karnataka, O=MiniCA, CN=MiniCA Intermediate CA

=== Intermediate Cert ===
subject=C=IN, ST=Karnataka, O=MiniCA, CN=MiniCA Intermediate CA
issuer=C=IN, ST=Karnataka, O=MiniCA, CN=MiniCA Root CA

=== Root CA Cert ===
subject=C=IN, ST=Karnataka, O=MiniCA, CN=MiniCA Root CA
issuer=C=IN, ST=Karnataka, O=MiniCA, CN=MiniCA Root CA

The chain is clear: Server --> Intermediate --> Root (self-signed).
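To watch the chain work end to end, you can serve it locally with openssl s_server and connect with s_client, trusting only your mini root. A sketch, run from ~/miniCA (port 8443 is an arbitrary free port; -naccept 1 makes the server exit after one connection, and -cert_chain is an OpenSSL 1.0.2+ option):

```shell
# Serve the leaf certificate, sending the intermediate as the chain
openssl s_server -accept 8443 -naccept 1 \
  -key server/server.key -cert server/server.crt \
  -cert_chain intermediate/intermediate.crt >/dev/null 2>&1 &
srv=$!
sleep 1

# Connect, trusting ONLY our mini root CA
echo | openssl s_client -connect localhost:8443 \
  -CAfile root/root-ca.crt 2>/dev/null | grep "Verify return code"

wait "$srv" 2>/dev/null
```

If the chain is correct you should see a verify return code of 0 (ok). Drop the -cert_chain flag and rerun it to reproduce the "unable to verify the first certificate" failure from the previous chapter's Debug This.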


Converting Between Formats

PEM to DER and Back

# PEM to DER
openssl x509 -in server/server.crt -outform DER -out server/server.der

# DER to PEM
openssl x509 -in server/server.der -inform DER -outform PEM -out server/server-back.pem

# Verify they are identical
diff <(openssl x509 -in server/server.crt -noout -modulus) \
     <(openssl x509 -in server/server-back.pem -noout -modulus)
# No output means they match

Create a PKCS#12 Bundle

# Bundle cert + key + chain into a PKCS#12 file
openssl pkcs12 -export \
  -out server/server.p12 \
  -inkey server/server.key \
  -in server/server.crt \
  -certfile server/chain.pem \
  -name "app.example.local"
# You will be prompted for an export password

# Extract everything from a PKCS#12
openssl pkcs12 -in server/server.p12 -out server/extracted.pem -nodes
# Enter the export password

PEM Key Format Conversion

# Convert a traditional RSA key to PKCS#8 format
openssl pkcs8 -topk8 -inform PEM -outform PEM \
  -in server/server.key -out server/server-pkcs8.key -nocrypt

# Convert PKCS#8 back to traditional format
openssl rsa -in server/server-pkcs8.key -out server/server-traditional.key
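Both encodings hold the same key material. A self-contained round-trip check (the temp-dir filenames are placeholders):

```shell
tmp=$(mktemp -d)
openssl genrsa -out "$tmp/trad.key" 2048 2>/dev/null

# Traditional -> PKCS#8 and back again
openssl pkcs8 -topk8 -nocrypt -in "$tmp/trad.key" -out "$tmp/p8.key"
openssl rsa -in "$tmp/p8.key" -out "$tmp/roundtrip.key" 2>/dev/null

# The derived public keys must be byte-identical after the round trip
diff <(openssl rsa -in "$tmp/trad.key" -pubout 2>/dev/null) \
     <(openssl rsa -in "$tmp/roundtrip.key" -pubout 2>/dev/null) \
  && echo "round trip preserved the key"
rm -rf "$tmp"
```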

Debug This

You configure Nginx with a new certificate, but it refuses to start:

nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/ssl/private/server.key") failed
nginx: [emerg] cannot load certificate key "/etc/ssl/private/server.key":
  error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch

How do you diagnose and fix this?

Answer: The certificate and private key do not match. They were probably generated separately or mixed up during a renewal. Check the modulus of both:

# Check the certificate's public key
openssl x509 -in /etc/ssl/certs/server.crt -noout -modulus | openssl md5

# Check the private key's corresponding public key
openssl rsa -in /etc/ssl/private/server.key -noout -modulus | openssl md5

If the MD5 hashes are different, the files do not go together. You need to either find the correct matching key, or generate a new key and CSR and get a new certificate.


What Just Happened?

+------------------------------------------------------------------+
|                     OPENSSL HANDS-ON                             |
+------------------------------------------------------------------+
|                                                                  |
|  KEY GENERATION:                                                 |
|    RSA:   openssl genrsa -out key.pem 2048                       |
|    ECDSA: openssl ecparam -genkey -name prime256v1 -noout -out   |
|                                                                  |
|  CSR CREATION:                                                   |
|    openssl req -new -key key.pem -out request.csr                |
|    Use -config with SAN for modern certificates                  |
|                                                                  |
|  SELF-SIGNED:                                                    |
|    openssl req -x509 -newkey rsa:2048 -nodes ...                 |
|                                                                  |
|  EXAMINING CERTS:                                                |
|    openssl x509 -in cert.pem -text -noout                        |
|    -subject, -issuer, -dates, -fingerprint                       |
|                                                                  |
|  CERT MATCHING:                                                  |
|    Compare modulus MD5: cert vs key must match                   |
|                                                                  |
|  CHAIN VERIFICATION:                                             |
|    openssl verify -CAfile root.pem [-untrusted inter.pem] cert   |
|                                                                  |
|  TLS TESTING:                                                    |
|    openssl s_client -connect host:443 -brief                     |
|                                                                  |
|  MINI CA:                                                        |
|    Root CA --> signs Intermediate --> signs Server cert          |
|    fullchain.pem = server cert + intermediate(s)                 |
|                                                                  |
|  FORMAT CONVERSION:                                              |
|    PEM <-> DER:  openssl x509 -outform DER/PEM                   |
|    PKCS#12:      openssl pkcs12 -export / -in                    |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Full Certificate Lifecycle

Perform the complete lifecycle without looking at the chapter:

  1. Generate an ECDSA private key
  2. Create a CSR with SANs for myapp.local and *.myapp.local
  3. Self-sign it for 90 days
  4. Examine the certificate and verify the SANs are present
  5. Check the expiration date

Exercise 2: Mini CA Expansion

Using the mini CA you created in this chapter, sign certificates for three different services: a web server, a database, and an API gateway. Give each one different SANs. Verify all three against the chain.

Exercise 3: Certificate Detective

Pick any website and extract as much information as you can using only openssl s_client and openssl x509:

  • TLS version and cipher suite
  • Full certificate chain with subjects and issuers
  • Key type and size
  • SANs
  • Exact expiration timestamp
  • Certificate fingerprint

Bonus Challenge

Write a script that takes a domain name as an argument and produces a formatted report of all the TLS and certificate information for that domain. Include the certificate chain, key details, expiration, and whether the certificate will expire within the next 30 days.

Let's Encrypt & ACME

Why This Matters

Before Let's Encrypt, getting an HTTPS certificate for your website meant paying a Certificate Authority $50-$300 per year, waiting for manual verification, and remembering to renew before it expired. The result? Huge swaths of the internet ran on unencrypted HTTP. Login pages, e-commerce sites, personal blogs -- all sending data in plain text.

Let's Encrypt changed everything. It is a free, automated, and open Certificate Authority that has issued billions of certificates. Combined with the ACME protocol (Automatic Certificate Management Environment), it allows you to obtain and renew TLS certificates without any human intervention. Today, there is no excuse for running an unencrypted website.

This chapter covers how Let's Encrypt works, how to use certbot to get certificates, how to set up automatic renewal, and how to handle special cases like wildcard certificates.


Try This Right Now

If you have a server with a public domain name pointing to it, you can get a certificate in under a minute:

# Install certbot (Debian/Ubuntu)
sudo apt install certbot

# Get a standalone certificate (stop any web server on port 80 first)
sudo certbot certonly --standalone -d yourdomain.com

If you do not have a public server yet, you can still explore certbot:

# Install certbot
sudo apt install certbot    # Debian/Ubuntu
sudo dnf install certbot    # RHEL/Fedora

# See what certbot can do
certbot --help all

# Check if you already have any certificates
sudo certbot certificates

What Is Let's Encrypt?

Let's Encrypt is a non-profit Certificate Authority run by the Internet Security Research Group (ISRG). It provides:

  • Free domain-validated (DV) certificates
  • Automated issuance and renewal via the ACME protocol
  • Open -- all software and protocols are open source
  • Trusted -- Let's Encrypt certificates are trusted by all major browsers and operating systems

  Traditional CA Process:
  +--------+     +----------+     +------+     +--------+
  | Pay $$ | --> | Wait for | --> | Get  | --> | Renew  |
  |        |     | approval |     | cert |     | yearly |
  +--------+     +----------+     +------+     +--------+
       Manual, slow, expensive, easy to forget renewal

  Let's Encrypt Process:
  +----------+     +----------+     +-------+
  | Run      | --> | Auto-    | --> | Auto- |
  | certbot  |     | verified |     | renew |
  +----------+     +----------+     +-------+
       Automated, free, 90-day certs, auto-renewed

Why 90-Day Certificates?

Let's Encrypt certificates are valid for only 90 days (compared to the traditional 1 year). This seems like a hassle, but it is intentional:

  1. Encourages automation -- If you must renew every 90 days, you are forced to automate it. Automated renewal is more reliable than human memory.
  2. Limits damage -- If a key is compromised, the exposure window is shorter.
  3. Forces freshness -- Certificates and keys are regularly rotated.
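The 90-day window is easy to observe with openssl's -checkend flag, which is also the primitive that renewal automation is built on. A quick sketch using a throwaway self-signed certificate (all paths are scratch files):

```shell
# Create a throwaway 90-day self-signed cert in a scratch directory
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 90 \
  -subj "/CN=demo.local" -keyout "$tmp/key.pem" -out "$tmp/cert.pem" 2>/dev/null

# -checkend SECONDS succeeds if the cert is still valid that far in the future
openssl x509 -in "$tmp/cert.pem" -noout -checkend $((30*24*3600))
openssl x509 -in "$tmp/cert.pem" -noout -checkend $((120*24*3600)) \
  || echo "less than 120 days left -- time to renew"
```

If -checkend fails, the certificate expires within the given number of seconds -- exactly the condition a renewal job tests for.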

The ACME Protocol

ACME (Automatic Certificate Management Environment, defined in RFC 8555) is the protocol that makes automation possible. Here is how it works:

  ACME Client (certbot)                  ACME Server (Let's Encrypt)
       |                                         |
       |  1. Request certificate for             |
       |     example.com                         |
       |  -------------------------------------> |
       |                                         |
       |              2. Here is a challenge:    |
       |              Prove you control          |
       |              example.com                |
       |  <------------------------------------- |
       |                                         |
       |  3. Complete the challenge              |
       |     (place a file on web server         |
       |      or create a DNS record)            |
       |                                         |
       |  4. Challenge completed                 |
       |  -------------------------------------> |
       |                                         |
       |              5. Verify the challenge    |
       |              (fetch the file or         |
       |               query the DNS record)     |
       |                                         |
       |              6. Challenge passed!       |
       |              Here is your certificate.  |
       |  <------------------------------------- |
       |                                         |

The key insight is that Let's Encrypt never sees your private key. You generate the key locally, create a CSR, and only the CSR and challenge proof are sent to Let's Encrypt.
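To see this for yourself, here is roughly what any ACME client does locally before it ever talks to the CA (file names here are arbitrary):

```shell
# Scratch directory; nothing below touches the network
tmp=$(mktemp -d)

# 1. Generate the private key -- it never leaves this machine
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
  -out "$tmp/privkey.pem" 2>/dev/null

# 2. Build a CSR containing the public key and the domain
openssl req -new -key "$tmp/privkey.pem" \
  -subj "/CN=example.com" -out "$tmp/request.csr"

# 3. Only the CSR (plus the challenge proof) is sent to the CA
openssl req -in "$tmp/request.csr" -noout -subject
```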


Challenge Types

Let's Encrypt needs to verify that you control the domain before issuing a certificate. There are two main challenge types.

HTTP-01 Challenge

The most common challenge. Let's Encrypt asks you to place a specific file at a specific URL on your web server.

  Let's Encrypt says:
  "Place a file with content 'abc123...' at:
   http://example.com/.well-known/acme-challenge/TOKEN"

  Certbot:
  1. Places the file in the web server's document root
  2. Tells Let's Encrypt to check
  3. Let's Encrypt fetches the URL from their servers
  4. If the file content matches, you control the domain

Requirements:

  • Port 80 must be open and reachable from the internet
  • The domain must point to your server's IP address
  • Works for individual domain names only (not wildcards)
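You can recreate the challenge layout by hand to see exactly what the CA fetches. TOKEN and the file content are placeholders here -- the real values come from the ACME server and your account key:

```shell
# A scratch webroot standing in for /var/www/html
webroot=$(mktemp -d)

# Certbot creates this path and file during an HTTP-01 challenge
mkdir -p "$webroot/.well-known/acme-challenge"
echo "TOKEN.ACCOUNT_THUMBPRINT" > "$webroot/.well-known/acme-challenge/TOKEN"

# The CA fetches http://example.com/.well-known/acme-challenge/TOKEN;
# served from disk, this is the content it sees:
cat "$webroot/.well-known/acme-challenge/TOKEN"
```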

DNS-01 Challenge

Let's Encrypt asks you to create a specific DNS TXT record under your domain.

  Let's Encrypt says:
  "Create a DNS TXT record:
   _acme-challenge.example.com  TXT  'xyz789...'"

  Certbot:
  1. Creates the DNS record (manually or via DNS API)
  2. Tells Let's Encrypt to check
  3. Let's Encrypt queries DNS for the TXT record
  4. If the record content matches, you control the domain

Requirements:

  • Access to your domain's DNS management
  • DNS propagation can take time (seconds to minutes)
  • Required for wildcard certificates
  • Works even if your server is not publicly accessible

Think About It: Why can't HTTP-01 challenges be used for wildcard certificates? (Hint: think about what *.example.com means and how many servers it could point to.)


Installing Certbot

Debian/Ubuntu

# Install certbot and the Nginx plugin
sudo apt update
sudo apt install certbot python3-certbot-nginx

# Or for Apache
sudo apt install certbot python3-certbot-apache

RHEL/Fedora

# Enable EPEL repository (RHEL/CentOS)
sudo dnf install epel-release

# Install certbot and plugins
sudo dnf install certbot python3-certbot-nginx
# Or
sudo dnf install certbot python3-certbot-apache

Using snap (Distribution-Independent)

The certbot team recommends snap for the latest version:

# Install snap if not present
sudo apt install snapd    # Debian/Ubuntu
sudo dnf install snapd    # RHEL/Fedora

# Install certbot via snap
sudo snap install --classic certbot

# Create a symlink so certbot is in the path
sudo ln -s /snap/bin/certbot /usr/bin/certbot

Obtaining Certificates

Method 1: Standalone Mode

Certbot runs its own temporary web server on port 80 to answer the challenge. Use this when you do not have a web server running (or can stop it temporarily).

# Stop your web server first
sudo systemctl stop nginx    # or apache2

# Get the certificate
sudo certbot certonly --standalone -d example.com -d www.example.com

# Start your web server again
sudo systemctl start nginx

Method 2: Webroot Mode

Certbot places challenge files in your existing web server's document root. Use this when your web server is running and you do not want to stop it.

# Your web server must serve files from the webroot
sudo certbot certonly --webroot \
  -w /var/www/html \
  -d example.com -d www.example.com

For this to work, your web server must serve files from /var/www/html/.well-known/acme-challenge/. With Nginx, ensure you have:

server {
    listen 80;
    server_name example.com www.example.com;

    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    # Redirect everything else to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

Method 3: Nginx Plugin

The Nginx plugin handles everything -- it reads your Nginx config, obtains the certificate, and configures Nginx to use it.

# Get certificate and automatically configure Nginx
sudo certbot --nginx -d example.com -d www.example.com

Certbot will:

  1. Read your existing Nginx configuration
  2. Obtain the certificate using webroot via the existing Nginx
  3. Modify the Nginx config to add SSL directives
  4. Reload Nginx

Method 4: Apache Plugin

# Get certificate and automatically configure Apache
sudo certbot --apache -d example.com -d www.example.com

What Certbot Creates

After successful certificate issuance:

sudo ls -la /etc/letsencrypt/live/example.com/
cert.pem       -> ../../archive/example.com/cert1.pem       # Leaf certificate
chain.pem      -> ../../archive/example.com/chain1.pem      # Intermediate chain
fullchain.pem  -> ../../archive/example.com/fullchain1.pem  # cert + chain
privkey.pem    -> ../../archive/example.com/privkey1.pem    # Private key

These are symlinks to the latest versions. When certificates are renewed, the symlinks are updated to point to the new files.
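A minimal simulation of the symlink scheme (in a scratch directory) shows why your web server config can point at live/ forever:

```shell
d=$(mktemp -d)
mkdir -p "$d/archive/example.com" "$d/live/example.com"

# Initial issuance: version 1 lands in archive/, live/ points at it
echo "cert v1" > "$d/archive/example.com/fullchain1.pem"
ln -s ../../archive/example.com/fullchain1.pem "$d/live/example.com/fullchain.pem"
cat "$d/live/example.com/fullchain.pem"    # cert v1

# Renewal: version 2 is written and the symlink is re-pointed;
# the live/ path itself never changes
echo "cert v2" > "$d/archive/example.com/fullchain2.pem"
ln -sf ../../archive/example.com/fullchain2.pem "$d/live/example.com/fullchain.pem"
cat "$d/live/example.com/fullchain.pem"    # cert v2
```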

For Nginx, use:

ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

For Apache, use:

SSLCertificateFile    /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem

Hands-On: Full Nginx + Let's Encrypt Setup

Here is a complete walkthrough for setting up a new site with HTTPS.

Step 1: Set Up the Basic Nginx Site

# Create a basic Nginx config for your domain
sudo tee /etc/nginx/sites-available/example.com << 'EOF'
server {
    listen 80;
    server_name example.com www.example.com;

    root /var/www/example.com;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
EOF

# Enable the site
sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/

# Create the document root
sudo mkdir -p /var/www/example.com
echo "<h1>Hello, HTTPS!</h1>" | sudo tee /var/www/example.com/index.html

# Test and reload Nginx
sudo nginx -t && sudo systemctl reload nginx

Step 2: Get the Certificate

# Use the Nginx plugin
sudo certbot --nginx -d example.com -d www.example.com

Certbot will ask a few questions:

  • Email address for urgent notices (like expiration warnings)
  • Agreement to terms of service
  • Whether to redirect HTTP to HTTPS (say yes)

Step 3: Verify

# Test HTTPS
curl -I https://example.com

# Test that HTTP redirects to HTTPS
curl -I http://example.com

# Check the certificate details
echo | openssl s_client -connect example.com:443 -brief

# View certbot's view of the certificate
sudo certbot certificates

Auto-Renewal

Certificates expire every 90 days. Let's Encrypt recommends renewing at 60 days (30 days before expiry). Certbot handles this automatically.
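The arithmetic is easy to check with GNU date (the dates are illustrative):

```shell
# 90-day cert, renewed 30 days before expiry = a new cert roughly every 60 days
issued="2026-02-21"
expiry=$(date -u -d "$issued +90 days" +%F)    # 2026-05-22
renew=$(date -u -d "$expiry -30 days" +%F)     # 2026-04-22
echo "issued: $issued  expires: $expiry  renewable from: $renew"
```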

How Auto-Renewal Works

When you install certbot, it creates either a systemd timer or a cron job that runs certbot renew twice daily. The command checks all certificates and renews any that are within 30 days of expiry.

# Check if the systemd timer is active
systemctl status certbot.timer
● certbot.timer - Run certbot twice daily
     Loaded: loaded (/lib/systemd/system/certbot.timer; enabled)
     Active: active (waiting)
    Trigger: Fri 2026-02-21 14:23:00 UTC; 6h left
   Triggers: ● certbot.service

# Or check for a cron job
cat /etc/cron.d/certbot
# Certbot automatic renewal
0 */12 * * * root test -x /usr/bin/certbot -a \! -d /run/systemd/system && \
  perl -e 'sleep int(rand(43200))' && certbot -q renew

Testing Renewal

# Dry run -- test renewal without actually renewing
sudo certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/example.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert not yet due for renewal

- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The following certificates are not due for renewal yet:
  /etc/letsencrypt/live/example.com/fullchain.pem expires on 2026-05-22
No renewals were attempted.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Renewal Hooks

After a certificate is renewed, you typically need to reload your web server. Certbot supports hooks for this:

# Reload Nginx after successful renewal
sudo certbot renew --deploy-hook "systemctl reload nginx"

Or set it permanently in the renewal configuration:

sudo vi /etc/letsencrypt/renewal/example.com.conf

Add at the bottom under [renewalparams]:

[renewalparams]
# ... existing settings ...
post_hook = systemctl reload nginx

You can also place hook scripts in dedicated directories:

# Scripts in these directories run automatically during renewal
ls /etc/letsencrypt/renewal-hooks/
# deploy/   - runs after successful renewal
# post/     - runs after every renewal attempt
# pre/      - runs before every renewal attempt

# Example: reload Nginx after renewal
sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh << 'EOF'
#!/bin/bash
systemctl reload nginx
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh

Setting Up a systemd Timer Manually

If the certbot timer is not installed:

sudo tee /etc/systemd/system/certbot-renewal.service << 'EOF'
[Unit]
Description=Certbot Renewal
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/certbot renew --quiet --deploy-hook "systemctl reload nginx"
EOF

sudo tee /etc/systemd/system/certbot-renewal.timer << 'EOF'
[Unit]
Description=Run certbot renewal twice daily

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=3600
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now certbot-renewal.timer

Think About It: Why does certbot add a random delay before renewal? (Hint: what would happen if thousands of servers all tried to renew at exactly midnight?)


Wildcard Certificates

A wildcard certificate covers all subdomains of a domain: *.example.com matches www.example.com, api.example.com, mail.example.com, etc.

Requirements

  • Wildcard certificates require the DNS-01 challenge (HTTP-01 will not work)
  • You need API access to your DNS provider, or you must create TXT records manually

Manual DNS Challenge

# Request a wildcard certificate
sudo certbot certonly --manual --preferred-challenges dns \
  -d "*.example.com" -d example.com

Certbot will show:

Please deploy a DNS TXT record under the name:
  _acme-challenge.example.com
with the following value:
  aB3dEfGhIjKlMnOpQrStUvWxYz1234567890abc

Before continuing, verify the TXT record has been deployed. Depending on the
DNS provider, this may take a few seconds to a few minutes.

Press Enter to Continue

You must log into your DNS provider and create this TXT record, then press Enter.

WARNING: Manual DNS challenges cannot be renewed automatically. You will need to repeat the manual process every 90 days. For production use, use a DNS plugin instead.

Automated DNS Challenge with DNS Plugins

For automated renewal, use a certbot DNS plugin for your DNS provider:

# Cloudflare example
sudo apt install python3-certbot-dns-cloudflare    # Debian/Ubuntu
sudo dnf install python3-certbot-dns-cloudflare    # RHEL/Fedora

# Create API credentials file
sudo mkdir -p /etc/letsencrypt
sudo tee /etc/letsencrypt/cloudflare.ini << 'EOF'
dns_cloudflare_api_token = YOUR_CLOUDFLARE_API_TOKEN
EOF
sudo chmod 600 /etc/letsencrypt/cloudflare.ini

# Get a wildcard certificate with automatic DNS verification
sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  -d "*.example.com" -d example.com

Available DNS plugins include:

  • certbot-dns-cloudflare
  • certbot-dns-route53 (AWS)
  • certbot-dns-google (Google Cloud DNS)
  • certbot-dns-digitalocean
  • certbot-dns-linode
  • certbot-dns-ovh

Rate Limits

Let's Encrypt imposes rate limits to prevent abuse. Know these before you start testing in production:

+-----------------------------+------------------+-----------------------+
| Limit                       | Value            | Reset                 |
+-----------------------------+------------------+-----------------------+
| Certs per registered domain | 50 per week      | Rolling 7-day window  |
| Duplicate certificates      | 5 per week       | Rolling 7-day window  |
| Failed validations          | 5 per hour       | Rolling 1-hour window |
| New registrations (accounts)| 10 per IP/3 hrs  | Rolling 3-hour window |
| Pending authorizations      | 300 per account  | N/A                   |
+-----------------------------+------------------+-----------------------+

Using the Staging Environment

For testing, always use the staging environment. It has much higher rate limits and issues certificates signed by a fake CA (not trusted by browsers, but functionally identical).

# Use staging for testing
sudo certbot certonly --standalone \
  --staging \
  -d test.example.com

# The staging certificate will show issuer: "(STAGING) ..."

After confirming everything works with staging, remove the staging cert and run again without --staging:

# Delete the staging certificate
sudo certbot delete --cert-name test.example.com

# Get the real certificate
sudo certbot certonly --standalone -d test.example.com

Alternative Client: acme.sh

While certbot is the most popular ACME client, acme.sh is a lightweight pure-bash alternative. It has no dependencies beyond bash and curl.

Installation

# Install acme.sh (runs as your user, not root)
curl https://get.acme.sh | sh -s email=admin@example.com

# Reload your shell to get the acme.sh alias
source ~/.bashrc

Getting a Certificate

# Webroot mode
acme.sh --issue -d example.com -w /var/www/html

# Standalone mode
acme.sh --issue -d example.com --standalone

# DNS mode (Cloudflare)
export CF_Token="YOUR_CLOUDFLARE_API_TOKEN"
acme.sh --issue -d "*.example.com" -d example.com --dns dns_cf

Installing the Certificate

# Install certificate to the correct location and set up reload hook
acme.sh --install-cert -d example.com \
  --key-file /etc/ssl/private/example.com.key \
  --fullchain-file /etc/ssl/certs/example.com.fullchain.pem \
  --reloadcmd "systemctl reload nginx"

Why Choose acme.sh?

  • No root required (runs as regular user)
  • No Python dependency (pure bash)
  • Supports more DNS providers out of the box (over 100)
  • Built-in cron job for renewal
  • Lightweight and portable

Distro Note: acme.sh works identically across all Linux distributions since it only requires bash and curl, which are universally available.


Managing Certificates

List All Certificates

sudo certbot certificates
Found the following certs:
  Certificate Name: example.com
    Serial Number: 03a4b5c6d7e8f9...
    Key Type: RSA
    Domains: example.com www.example.com
    Expiry Date: 2026-05-22 10:30:00+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/example.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/example.com/privkey.pem

Revoke a Certificate

# Revoke using the certificate file
sudo certbot revoke --cert-path /etc/letsencrypt/live/example.com/cert.pem

# Revoke and delete all related files
sudo certbot revoke --cert-path /etc/letsencrypt/live/example.com/cert.pem --delete-after-revoke

Delete a Certificate (Without Revoking)

sudo certbot delete --cert-name example.com

Expand a Certificate (Add Domains)

# Add a new domain to an existing certificate
sudo certbot certonly --expand -d example.com -d www.example.com -d new.example.com

Debug This

A certbot renewal starts failing with this error:

Attempting to renew cert (example.com) from /etc/letsencrypt/renewal/example.com.conf
Cert is due for renewal, auto-renewing...
Could not choose appropriate plugin: The manual plugin is not working;
there may be problems with your existing configuration.

The certificate was originally obtained with --manual --preferred-challenges dns. What is wrong?

Answer: Manual challenges cannot be automated. When certbot tries to auto-renew, it cannot complete the DNS challenge because there is no automated DNS plugin configured. The fix is to switch to an automated method:

# Delete the old cert
sudo certbot delete --cert-name example.com

# Re-obtain with a DNS plugin (e.g., Cloudflare)
sudo certbot certonly --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  -d example.com -d www.example.com

# Or switch to HTTP-01 with the Nginx plugin
sudo certbot --nginx -d example.com -d www.example.com

Now auto-renewal will work because certbot can complete the challenge without human intervention.


What Just Happened?

+------------------------------------------------------------------+
|                  LET'S ENCRYPT & ACME                            |
+------------------------------------------------------------------+
|                                                                  |
|  LET'S ENCRYPT:                                                  |
|    Free, automated, open CA. No more excuses for no HTTPS.       |
|    90-day certs by design -- forces automation.                  |
|                                                                  |
|  ACME PROTOCOL:                                                  |
|    Client requests cert --> CA issues challenge -->              |
|    Client proves control --> CA issues certificate               |
|                                                                  |
|  CHALLENGES:                                                     |
|    HTTP-01: Place file at /.well-known/acme-challenge/ (port 80) |
|    DNS-01:  Create TXT record at _acme-challenge.domain          |
|    Wildcards require DNS-01.                                     |
|                                                                  |
|  CERTBOT METHODS:                                                |
|    --standalone   : certbot runs its own web server              |
|    --webroot      : uses existing web server's doc root          |
|    --nginx        : reads and modifies Nginx config              |
|    --apache       : reads and modifies Apache config             |
|                                                                  |
|  RENEWAL:                                                        |
|    certbot renew (via systemd timer or cron)                     |
|    Test with: certbot renew --dry-run                            |
|    Deploy hooks reload the web server after renewal              |
|                                                                  |
|  WILDCARDS:                                                      |
|    certbot certonly --dns-PLUGIN -d "*.example.com"              |
|    Requires DNS API access for auto-renewal                      |
|                                                                  |
|  RATE LIMITS:                                                    |
|    Use --staging for testing!                                    |
|    50 certs/week per registered domain                           |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Local Staging Test

Set up a VM with a public IP and a domain name pointing to it. Use certbot with --staging to practice the full certificate lifecycle: obtain, configure Nginx, verify HTTPS works, test renewal with --dry-run, then delete and re-obtain with the real CA.

Exercise 2: Multiple Sites

Configure two separate websites on the same Nginx server, each with their own Let's Encrypt certificate. Verify that SNI is working correctly by testing both sites with openssl s_client -servername.

Exercise 3: acme.sh Alternative

Install acme.sh alongside certbot and obtain a staging certificate using it. Compare the experience -- the directory structure, the renewal mechanism, and the configuration approach. Which do you prefer?

Bonus Challenge

Write a monitoring script that checks all certificates managed by certbot, reports their expiration dates, and sends an alert (to a log file or email) if any certificate will expire within 14 days. Combine this with the certificate expiration script from Chapter 40 to cover both Let's Encrypt and non-Let's Encrypt certificates.

SELinux & AppArmor

Why This Matters

Imagine your web server gets compromised through a vulnerability in your application code. The attacker gets a shell running as the www-data user. With traditional Linux permissions (DAC -- Discretionary Access Control), that attacker can read anything www-data can read: configuration files, other users' data, maybe even credentials stored in environment variables. They can open network connections to exfiltrate data. They can write to any directory www-data has write access to.

Traditional permissions answer the question "Is this user allowed to do this?" But they cannot answer "Should a web server process be reading /etc/passwd?" or "Should Apache be making outbound connections to port 4444?"

That is what Mandatory Access Control (MAC) systems like SELinux and AppArmor do. They confine processes to only what they are supposed to do, regardless of what the user permissions would allow. They are your last line of defense when everything else has failed, and they have prevented countless real-world breaches.


Try This Right Now

Check which MAC system your distribution uses:

# Check for SELinux
getenforce 2>/dev/null && echo "SELinux is available"

# Check for AppArmor
sudo aa-status 2>/dev/null && echo "AppArmor is available"

# Which one is active?
cat /sys/kernel/security/lsm

Typical results:

  • RHEL/Fedora/CentOS/Rocky/Alma: SELinux
  • Ubuntu/Debian/SUSE: AppArmor
  • Both are compiled into most modern kernels, but only one is active at a time.

MAC vs DAC

DAC (Discretionary Access Control) -- What You Already Know

DAC is the traditional Unix permission model: the owner of a file decides who can access it. These are the familiar rwxr-xr-x permissions, users, and groups from Chapter 6.

  DAC Decision:
  +-------------------+
  | Process runs as   |
  | user "www-data"   |
  +-------------------+
           |
           v
  +-------------------+
  | File owned by     |
  | root:root         |  --> Does "www-data" have read permission?
  | Mode: -rw-r--r--  |  --> Yes (world-readable) --> ACCESS GRANTED
  +-------------------+

The problem: DAC is permissive by default. The process gets all the permissions of its user, even if it should not need them.

MAC (Mandatory Access Control) -- The Extra Layer

MAC enforces policies defined by the system administrator, not by file owners. Even if DAC would allow access, MAC can deny it.

  Access Request: Apache (httpd_t) wants to read /etc/shadow

  +----------+     +---------+     +----------+
  | DAC      | --> | Allowed | --> | MAC      | --> DENIED
  | Check    |     | (r--r-- |     | Check    |     (httpd_t is not
  |          |     |  group) |     | (SELinux)|      allowed to read
  +----------+     +---------+     +----------+      shadow_t files)

MAC runs after DAC. Both must allow the access for it to succeed. This means MAC can only restrict further -- it cannot grant permissions that DAC denied.

  +---------------------------------------------------+
  |                                                   |
  |  DAC says YES  +  MAC says YES  =  ACCESS GRANTED |
  |  DAC says YES  +  MAC says NO   =  ACCESS DENIED  |
  |  DAC says NO   +  MAC says ???  =  ACCESS DENIED  |
  |  (DAC check happens first; if it fails, MAC       |
  |   is never consulted)                             |
  |                                                   |
  +---------------------------------------------------+

Think About It: If a process runs as root, DAC almost never denies access. Why does this make MAC even more important for processes that run as root?


SELinux

SELinux (Security-Enhanced Linux) was originally developed by the NSA and is now maintained by the open source community, with Red Hat as a major contributor. It uses labels (called security contexts) on everything -- files, processes, ports, and users. Policy rules define which labeled processes can access which labeled objects.

SELinux Modes

# Check the current mode
getenforce

Three modes:

+------------+----------------------------------------------------------+
| Mode       | Behavior                                                 |
+------------+----------------------------------------------------------+
| Enforcing  | Policies are enforced. Violations are denied and logged. |
| Permissive | Policies are NOT enforced, but violations are logged.    |
|            | Useful for debugging.                                    |
| Disabled   | SELinux is completely off. Not recommended.              |
+------------+----------------------------------------------------------+

# Switch between modes temporarily (until reboot)
sudo setenforce 1    # Enforcing
sudo setenforce 0    # Permissive

# Check persistent configuration
cat /etc/selinux/config
# /etc/selinux/config
SELINUX=enforcing
SELINUXTYPE=targeted

To change the mode persistently, edit /etc/selinux/config and reboot.

WARNING: Never set SELINUX=disabled in production. If you need to debug, use Permissive mode instead. Disabling and re-enabling SELinux requires a full filesystem relabel, which can take a very long time on large systems.

SELinux Contexts (Labels)

Everything in SELinux has a context with four parts:

  user:role:type:level

  Example:
  system_u:system_r:httpd_t:s0

  user    = system_u   (SELinux user, not Linux user)
  role    = system_r   (role, determines what types are allowed)
  type    = httpd_t    (the TYPE -- most important for policy decisions)
  level   = s0         (MLS level, usually s0 on targeted policy)

The type is the most important part. Almost all SELinux policy decisions under the targeted policy are based on the type -- a model known as Type Enforcement.
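Because tools and policy rules key off the type, it is often handy to pull it out of a context string. A quick sketch with standard text tools (the context shown is just a sample):

```shell
# The type is the third colon-separated field of a context
ctx="system_u:system_r:httpd_t:s0"
echo "$ctx" | cut -d: -f3    # httpd_t
```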

Viewing Contexts

# See file contexts
ls -Z /var/www/html/
unconfined_u:object_r:httpd_sys_content_t:s0 index.html

# See process contexts
ps -eZ | grep httpd
system_u:system_r:httpd_t:s0     1234 ?  00:00:05 httpd

# See your own context
id -Z

# See port contexts
sudo semanage port -l | grep http
http_port_t          tcp      80, 81, 443, 488, 8008, 8009, 8443, 9000

How SELinux Policy Works

The targeted policy (default on RHEL/Fedora) confines specific services while leaving most user processes unconfined.

  Apache process (httpd_t) wants to read a file:

  +--------------------------------------------+
  | File has type httpd_sys_content_t?         |
  |   YES --> Policy allows httpd_t to read    |
  |           httpd_sys_content_t --> ALLOWED  |
  +--------------------------------------------+

  +--------------------------------------------+
  | File has type user_home_t?                 |
  |   YES --> Policy does NOT allow httpd_t    |
  |           to read user_home_t --> DENIED   |
  +--------------------------------------------+

Changing File Contexts with chcon

chcon changes the context of a file temporarily (it does not survive a relabel):

# Change a file's type to httpd_sys_content_t
sudo chcon -t httpd_sys_content_t /var/www/custom/index.html

# Change recursively
sudo chcon -R -t httpd_sys_content_t /var/www/custom/

# Verify the change
ls -Z /var/www/custom/index.html

Restoring Default Contexts with restorecon

restorecon sets file contexts back to the system default (based on file path):

# Restore default context for a file
sudo restorecon -v /var/www/html/index.html

# Restore recursively
sudo restorecon -Rv /var/www/html/

# The -v flag shows what changed
Relabeled /var/www/html/index.html from unconfined_u:object_r:default_t:s0
to unconfined_u:object_r:httpd_sys_content_t:s0

Setting Permanent Default Contexts with semanage

semanage fcontext defines the default context rules. These survive relabels.

# See the default rules for /var/www
sudo semanage fcontext -l | grep /var/www

# Add a custom rule for a new directory
sudo semanage fcontext -a -t httpd_sys_content_t "/srv/myapp(/.*)?"

# Apply the rule to existing files
sudo restorecon -Rv /srv/myapp/

SELinux Booleans

Booleans are on/off switches that modify SELinux policy without writing new rules. They handle common configuration needs.

# List all booleans
sudo getsebool -a

# List booleans related to httpd
sudo getsebool -a | grep httpd
httpd_can_network_connect --> off
httpd_can_network_connect_db --> off
httpd_can_sendmail --> off
httpd_enable_cgi --> on
httpd_enable_homedirs --> off
httpd_use_nfs --> off

# Enable Apache to make network connections (e.g., to a backend API)
sudo setsebool -P httpd_can_network_connect on
# -P makes it persistent across reboots

# Allow Apache to connect to databases
sudo setsebool -P httpd_can_network_connect_db on

# Get a description of a boolean
sudo semanage boolean -l | grep httpd_can_network

Hands-On: Troubleshoot an SELinux Denial

Scenario: You move your website files from /var/www/html to /opt/website, update the Nginx config, but the site returns 403 Forbidden.

# 1. Check if SELinux is the cause
sudo ausearch -m avc -ts recent
type=AVC msg=audit(...): avc:  denied  { read } for  pid=1234
comm="nginx" name="index.html" dev="sda1" ino=12345
scontext=system_u:system_r:httpd_t:s0
tcontext=unconfined_u:object_r:default_t:s0
tclass=file permissive=0

Reading this denial message:

  • comm="nginx" -- Nginx was denied
  • { read } -- it tried to read
  • name="index.html" -- this file
  • scontext=...httpd_t -- Nginx runs as httpd_t
  • tcontext=...default_t -- the file has type default_t
  • httpd_t is not allowed to read default_t

# 2. Check the current context
ls -Z /opt/website/index.html
unconfined_u:object_r:default_t:s0 /opt/website/index.html

The file has default_t instead of httpd_sys_content_t.

# 3. Fix it -- set the permanent rule and apply
sudo semanage fcontext -a -t httpd_sys_content_t "/opt/website(/.*)?"
sudo restorecon -Rv /opt/website/

# 4. Verify
ls -Z /opt/website/index.html
unconfined_u:object_r:httpd_sys_content_t:s0 /opt/website/index.html

The site should now work.

Using sealert for Friendly Error Messages

On RHEL/Fedora, sealert provides human-readable explanations of SELinux denials:

# Install setroubleshoot
sudo dnf install setroubleshoot-server

# Analyze the audit log
sudo sealert -a /var/log/audit/audit.log

Example output:

SELinux is preventing nginx from read access on the file index.html.

*****  Plugin restorecon (99.5 confidence) suggests  ************************

If you want to fix the label:
  /opt/website/index.html default label should be httpd_sys_content_t.
  Then you can run restorecon. The access attempt may have been stopped
  due to insufficient permissions to access a parent directory, in which
  case try to change the following command accordingly.
  Do
  # /sbin/restorecon -v /opt/website/index.html

Using audit2allow

When you need to create a custom policy module (because no existing boolean covers your use case):

# Generate a policy module from recent denials
sudo ausearch -m avc -ts recent | audit2allow -M mypolicy

# This creates:
#   mypolicy.pp  (compiled policy module)
#   mypolicy.te  (human-readable policy source)

# Review what it would allow
cat mypolicy.te

# Install the module
sudo semodule -i mypolicy.pp

WARNING: Never blindly run audit2allow and install the result. Always read the .te file first. audit2allow might generate rules that are too permissive. Sometimes the right fix is a boolean or a context change, not a new policy module.


AppArmor

AppArmor (Application Armor) is the MAC system used by Ubuntu, Debian, and SUSE. It takes a different approach from SELinux: instead of labeling everything, AppArmor uses path-based profiles that define what each program is allowed to do.

AppArmor vs SELinux: Key Differences

+-------------------+---------------------------+---------------------------+
| Feature           | SELinux                   | AppArmor                  |
+-------------------+---------------------------+---------------------------+
| Approach          | Labels on every object    | Path-based profiles       |
| Complexity        | Steeper learning curve    | Simpler to understand     |
| Granularity       | Very fine-grained         | Good, but less granular   |
| Default distro    | RHEL, Fedora, CentOS      | Ubuntu, Debian, SUSE      |
| New file handling | Inherits parent's label   | Matched by path rules     |
| Moved files       | Keep their label          | Matched by new path       |
| Profile creation  | Policy modules (complex)  | Text profiles (readable)  |
+-------------------+---------------------------+---------------------------+

AppArmor Modes

Each profile can be in one of two modes:

  +----------+-----------------------------------------------------+
  | Mode     | Behavior                                            |
  +----------+-----------------------------------------------------+
  | Enforce  | Violations are denied and logged                    |
  | Complain | Violations are logged but allowed (like SELinux     |
  |          | permissive mode)                                    |
  +----------+-----------------------------------------------------+

Checking AppArmor Status

# Overall status
sudo aa-status
apparmor module is loaded.
44 profiles are loaded.
  39 profiles are in enforce mode.
    /snap/core/...
    /usr/bin/man
    /usr/sbin/mysqld
    /usr/sbin/ntpd
    ...
  5 profiles are in complain mode.
    /usr/sbin/cups-browsed
    ...
2 processes have profiles defined.
  2 are in enforce mode.
    /usr/sbin/mysqld (1234)
    /usr/sbin/ntpd (5678)
  0 are in complain mode.
  0 are unconfined but have a profile defined.

Understanding AppArmor Profiles

Profiles live in /etc/apparmor.d/ and are named after the program they confine (with slashes replaced by dots):

ls /etc/apparmor.d/
usr.sbin.mysqld
usr.sbin.ntpd
usr.bin.man
...
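The naming convention is mechanical enough to compute yourself. As a quick sanity check (a sketch, assuming the standard slash-to-dot convention described above):

```shell
# Derive the profile filename for a binary: drop the leading slash,
# then replace the remaining slashes with dots.
path="/usr/sbin/ntpd"
echo "${path#/}" | tr '/' '.'
# → usr.sbin.ntpd
```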

Let's look at a simplified profile:

sudo cat /etc/apparmor.d/usr.sbin.ntpd
# Profile for ntpd
/usr/sbin/ntpd {
  # Include common abstractions
  #include <abstractions/base>
  #include <abstractions/nameservice>

  # Capabilities the program needs
  capability net_bind_service,
  capability sys_time,

  # Files the program can read
  /etc/ntp.conf r,
  /etc/ntp/** r,
  /var/lib/ntp/** rw,
  /var/log/ntp.log w,

  # Runtime files
  /run/ntpd.pid rw,

  # Network access
  network inet dgram,
  network inet stream,
}

Profile rules use these permission flags:

  +------+---------------------------------+
  | Flag | Meaning                         |
  +------+---------------------------------+
  | r    | Read                            |
  | w    | Write                           |
  | a    | Append                          |
  | x    | Execute                         |
  | m    | Memory map as executable        |
  | k    | Lock                            |
  | l    | Link                            |
  | ix   | Execute, inheriting the profile |
  | px   | Execute, using target's profile |
  | ux   | Execute, unconfined             |
  +------+---------------------------------+

Managing Profiles

# Put a profile in enforce mode
sudo aa-enforce /etc/apparmor.d/usr.sbin.mysqld

# Put a profile in complain mode (for debugging)
sudo aa-complain /etc/apparmor.d/usr.sbin.mysqld

# Disable a profile
sudo aa-disable /etc/apparmor.d/usr.sbin.mysqld

# Reload a profile after editing
sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld

# Reload all profiles
sudo systemctl reload apparmor

Hands-On: Write a Simple AppArmor Profile

Let's create a profile for a simple script that should only be allowed to read one specific file and write to a log.

# Create the script
sudo tee /usr/local/bin/myapp.sh << 'SCRIPT'
#!/bin/bash
# Read config and log activity
CONFIG=$(cat /etc/myapp/config.txt)
echo "$(date) - Config loaded: $CONFIG" >> /var/log/myapp.log
echo "App running with config: $CONFIG"
SCRIPT
sudo chmod +x /usr/local/bin/myapp.sh

# Create the config and log locations
sudo mkdir -p /etc/myapp
echo "setting=value" | sudo tee /etc/myapp/config.txt
sudo touch /var/log/myapp.log

Now create the AppArmor profile:

sudo tee /etc/apparmor.d/usr.local.bin.myapp.sh << 'PROFILE'
# AppArmor profile for /usr/local/bin/myapp.sh
/usr/local/bin/myapp.sh {
  # Include base abstractions (libc, etc.)
  #include <abstractions/base>
  #include <abstractions/bash>

  # The script itself needs to be readable
  /usr/local/bin/myapp.sh r,

  # Allow reading the config file
  /etc/myapp/config.txt r,

  # Allow writing to the log file
  /var/log/myapp.log w,

  # Allow executing common utilities
  /usr/bin/cat ix,
  /usr/bin/echo ix,
  /usr/bin/date ix,
}
PROFILE

# Load the profile in complain mode first
sudo aa-complain /etc/apparmor.d/usr.local.bin.myapp.sh

# Test the script
sudo /usr/local/bin/myapp.sh

# Check for any violations
sudo dmesg | grep ALLOWED | tail -5

# If it works, switch to enforce mode
sudo aa-enforce /etc/apparmor.d/usr.local.bin.myapp.sh

# Test again -- should still work
sudo /usr/local/bin/myapp.sh

# Now try to do something NOT in the profile
# Edit the script to also try reading /etc/shadow
sudo /usr/local/bin/myapp.sh
# The /etc/shadow read will be denied

Generating Profiles with aa-genprof

AppArmor can generate a profile by watching what a program does:

# Install the utilities
sudo apt install apparmor-utils    # Debian/Ubuntu

# Start profile generation (interactive)
sudo aa-genprof /usr/local/bin/myapp.sh

In another terminal, run the program:

sudo /usr/local/bin/myapp.sh

Back in the aa-genprof terminal, press S to scan for events, then respond to each prompt about whether to allow or deny the observed accesses. Press F to finish and save the profile.

Troubleshooting AppArmor Denials

# Check for denials in the system log
sudo dmesg | grep "apparmor" | tail -20

# Or in the system journal
sudo journalctl -k | grep "apparmor" | tail -20

# Look for DENIED entries
sudo journalctl -k | grep "DENIED"

Example denial:

audit: type=1400 audit(...): apparmor="DENIED" operation="open"
  profile="/usr/local/bin/myapp.sh" name="/etc/shadow"
  pid=1234 comm="cat" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

Reading this:

  • apparmor="DENIED" -- Access was denied
  • profile="/usr/local/bin/myapp.sh" -- Which profile triggered it
  • name="/etc/shadow" -- What file was being accessed
  • requested_mask="r" -- What the program tried to do (read)

To fix: add the appropriate rule to the profile, or reconsider whether the program should actually need that access.
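If the access is legitimate, the fix is a one-line rule in the profile. A sketch, using a hypothetical /etc/myapp/extra.conf rather than /etc/shadow (which a script should almost never be allowed to read):

```
# In /etc/apparmor.d/usr.local.bin.myapp.sh, inside the profile's braces:
/etc/myapp/extra.conf r,
```

Then reload the profile with sudo apparmor_parser -r /etc/apparmor.d/usr.local.bin.myapp.sh so the change takes effect.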

Using aa-logprof to Update Profiles

After running a program and encountering denials, use aa-logprof to interactively update the profile:

sudo aa-logprof

It reads the denial logs and asks whether each denied access should be allowed. This is the easiest way to refine a profile.


When to Use Which?

Use SELinux When:

  • You are running RHEL, Fedora, CentOS, Rocky, or AlmaLinux (it is already there)
  • You need fine-grained control over process interactions
  • You are in an environment with compliance requirements (PCI-DSS, HIPAA)
  • You need network-level controls (port labeling)
  • You manage servers where many services interact

Use AppArmor When:

  • You are running Ubuntu, Debian, or SUSE (it is already there)
  • You want simpler, path-based profiles
  • You need to quickly confine a specific application
  • Your team is new to MAC systems
  • You want profiles that are easy to read and audit

General Advice

Do not disable your MAC system. The single most important piece of advice in this chapter is: do not turn it off. When something breaks and you suspect SELinux or AppArmor, switch to permissive/complain mode and investigate. Fix the problem properly. Then switch back to enforcing.

  +----------------------------------------------------------+
  |  "setenforce 0" is NOT a solution.                       |
  |  It is like removing the smoke detector because it was   |
  |  beeping. Find out WHY it is beeping.                    |
  +----------------------------------------------------------+

Think About It: You are setting up a new server. The distribution comes with SELinux in enforcing mode. A developer asks you to disable it because their application is getting permission denied errors. What is the right response?


Debug This

On a RHEL server, a developer deploys a new web application to /var/www/app and configures Nginx to serve it. The site works perfectly. Then the developer runs a backup script that uses tar to archive the site, extracts it to test on another server, and copies it back. Now the site returns 403 Forbidden. Nothing in the file permissions has changed (ls -l looks correct). SELinux is in enforcing mode.

What happened?

Answer: When tar extracted the files and the developer copied them back, the SELinux contexts were lost. The files now carry the type default_t (e.g. unconfined_u:object_r:default_t:s0) instead of httpd_sys_content_t. Standard tar and cp do not preserve SELinux contexts by default.

Fix:

# Restore the correct contexts
sudo restorecon -Rv /var/www/app/

Prevention: Use tar --selinux or cp --preserve=context to preserve SELinux contexts during backup and copy operations:

# Tar with SELinux context preservation
tar --selinux -czf backup.tar.gz /var/www/app/

# Copy with context preservation
cp --preserve=context -r /var/www/app/ /var/www/app-backup/

What Just Happened?

+------------------------------------------------------------------+
|                    SELINUX & APPARMOR                             |
+------------------------------------------------------------------+
|                                                                  |
|  MAC vs DAC:                                                     |
|    DAC: Owner controls access (rwxr-xr-x)                       |
|    MAC: System policy controls access (regardless of owner)      |
|    Both must allow --> access granted                            |
|                                                                  |
|  SELINUX (RHEL/Fedora):                                         |
|    Labels everything: user:role:type:level                       |
|    Key commands:                                                 |
|      getenforce / setenforce     -- check/set mode               |
|      ls -Z / ps -eZ             -- view contexts                 |
|      chcon                      -- change context (temporary)    |
|      restorecon                 -- restore default context       |
|      semanage fcontext          -- set permanent rules           |
|      getsebool / setsebool      -- manage booleans               |
|      ausearch -m avc            -- find denials                  |
|      sealert                    -- friendly error messages       |
|      audit2allow                -- generate policy (use caution) |
|                                                                  |
|  APPARMOR (Ubuntu/Debian/SUSE):                                  |
|    Path-based profiles per program                               |
|    Key commands:                                                 |
|      aa-status                  -- check status                  |
|      aa-enforce / aa-complain   -- set profile mode              |
|      aa-disable                 -- disable a profile             |
|      aa-genprof                 -- generate a new profile        |
|      aa-logprof                 -- update profile from logs      |
|      apparmor_parser -r         -- reload a profile              |
|                                                                  |
|  GOLDEN RULE:                                                    |
|    Never disable. Switch to permissive/complain, diagnose,       |
|    fix properly, switch back to enforcing.                       |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: SELinux Scavenger Hunt

On a RHEL/Fedora system (or a CentOS VM):

  1. List all SELinux booleans related to samba or nfs
  2. Find the default context for files in /srv/
  3. Create a directory /srv/mysite, put an HTML file in it, configure Nginx to serve it, and fix the SELinux denial without using setenforce 0
  4. Use audit2why (from the policycoreutils-python-utils package) to analyze a denial: sudo ausearch -m avc -ts recent | audit2why

Exercise 2: AppArmor Profile from Scratch

On an Ubuntu system:

  1. Write a simple Python or bash script that reads a config file, writes a log, and optionally makes a network request
  2. Use aa-genprof to generate a profile for it
  3. Switch to enforce mode
  4. Modify the script to try accessing a file not in the profile
  5. Observe the denial in the logs
  6. Use aa-logprof to decide whether to allow or deny the new access

Exercise 3: Compare and Contrast

If you have access to both a RHEL-family and Ubuntu system:

  1. Deploy the same simple web application on both
  2. Move the web files to a non-standard directory on both
  3. Fix the MAC denial on both systems
  4. Compare the experience: which was easier? Which gave better error messages? Which felt more secure?

Bonus Challenge

Create an AppArmor profile for curl that only allows it to connect to a specific list of domains. Test it by trying to curl an allowed domain (should work) and a disallowed domain (should be denied). This demonstrates how MAC can enforce network policies per-application.

HTTP Protocol Essentials

Why This Matters

Every time you open a browser, call an API, download a package with apt or dnf, or deploy a web application, you are using HTTP. It is the protocol that glues the web together. If you are going to manage web servers, set up reverse proxies, debug application issues, or secure web traffic, you need to understand HTTP at a level deeper than "it shows web pages."

Consider this real scenario: your company's API is returning intermittent 502 errors to users. The developers say "the app is fine." The load balancer logs show upstream timeouts. Without understanding HTTP status codes, headers, connection behavior, and the difference between the client, proxy, and backend, you will be guessing in the dark. This chapter gives you the foundation to diagnose and reason about every web request that flows through your infrastructure.


Try This Right Now

If you have any Linux system with curl installed (it is available on virtually every distribution), run this:

$ curl -v http://example.com 2>&1 | head -30

You should see something like:

*   Trying 93.184.216.34:80...
* Connected to example.com (93.184.216.34) port 80
> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1256
< Cache-Control: max-age=604800

Those lines starting with > are what your machine sent (the request). Lines starting with < are what the server replied (the response). You just watched an HTTP conversation happen in real time. Let's break it apart.


What HTTP Is

HTTP stands for HyperText Transfer Protocol. It is an application-layer protocol (Layer 7 in the OSI model) that defines how a client and a server communicate. The model is simple:

  1. The client sends a request.
  2. The server sends back a response.
  3. That is one transaction. Done.

┌──────────┐                        ┌──────────┐
│          │  ── HTTP Request ──>   │          │
│  Client  │                        │  Server  │
│  (curl,  │  <── HTTP Response ──  │  (nginx, │
│  browser)│                        │  apache) │
└──────────┘                        └──────────┘

HTTP is stateless -- each request-response pair is independent. The server does not remember your previous request unless something at the application layer (cookies, sessions, tokens) adds that memory.

HTTP/1.1 and HTTP/2 run on top of TCP (wrapped in TLS for HTTPS), while HTTP/3 runs on top of QUIC, which is built on UDP. Plain HTTP typically uses port 80, and HTTPS uses port 443.


Anatomy of an HTTP Request

An HTTP request has four parts:

┌─────────────────────────────────────────────────────┐
│  REQUEST LINE                                        │
│  GET /api/users?page=2 HTTP/1.1                     │
├─────────────────────────────────────────────────────┤
│  HEADERS                                             │
│  Host: api.example.com                              │
│  User-Agent: curl/8.5.0                             │
│  Accept: application/json                            │
│  Authorization: Bearer eyJhbGciOi...                │
├─────────────────────────────────────────────────────┤
│  BLANK LINE (separates headers from body)            │
├─────────────────────────────────────────────────────┤
│  BODY (optional)                                     │
│  {"name": "alice", "email": "alice@example.com"}    │
└─────────────────────────────────────────────────────┘

The Request Line

The first line contains three things:

  • Method -- what action to perform (GET, POST, PUT, DELETE, etc.)
  • URL/Path -- what resource you want (/api/users?page=2)
  • HTTP Version -- which version of the protocol (HTTP/1.1)

Headers

Headers are key-value pairs that carry metadata about the request. Each one sits on its own line in the format Header-Name: value. Headers are case-insensitive (Content-Type and content-type are the same).

Body

The body carries the actual data payload. GET requests usually have no body. POST and PUT requests typically do. The body is separated from headers by a blank line.
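To make the four-part structure concrete, here is a sketch that prints the raw bytes of a small POST request (the path and host are made up; computing Content-Length from the body avoids the classic off-by-one):

```shell
# Build a raw HTTP/1.1 POST request, byte for byte.
body='{"name": "alice"}'
printf 'POST /api/users HTTP/1.1\r\n'          # request line
printf 'Host: api.example.com\r\n'             # headers...
printf 'Content-Type: application/json\r\n'
printf 'Content-Length: %d\r\n' "${#body}"     # body length in bytes
printf '\r\n'                                  # blank line ends the headers
printf '%s' "$body"                            # body
```

Pipe the output to a listening server (e.g. with nc) and you are speaking HTTP by hand.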


HTTP Methods

HTTP defines several methods (sometimes called "verbs"). Here are the ones you will encounter constantly:

+---------+---------------------------------+-----------+-------------+-------+
| Method  | Purpose                         | Has Body? | Idempotent? | Safe? |
+---------+---------------------------------+-----------+-------------+-------+
| GET     | Retrieve a resource             | No        | Yes         | Yes   |
| POST    | Create a resource / submit data | Yes       | No          | No    |
| PUT     | Replace a resource entirely     | Yes       | Yes         | No    |
| PATCH   | Partially update a resource     | Yes       | No          | No    |
| DELETE  | Remove a resource               | Optional  | Yes         | No    |
| HEAD    | Like GET but no response body   | No        | Yes         | Yes   |
| OPTIONS | Ask what methods are supported  | No        | Yes         | Yes   |
+---------+---------------------------------+-----------+-------------+-------+

Idempotent means doing it once or doing it ten times produces the same result. Sending the same PUT request ten times replaces the resource with the same data each time -- same outcome. Sending the same POST ten times might create ten different records.

Safe means the method should not change anything on the server. GET and HEAD are safe -- they only read.
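A filesystem analogy makes the distinction tangible: overwriting a file is PUT-like (idempotent), while appending is POST-like (not). A quick sketch:

```shell
tmp=$(mktemp -d)

# "PUT": overwrite -- running it twice leaves the same one-line file.
echo "alice" > "$tmp/resource.txt"
echo "alice" > "$tmp/resource.txt"

# "POST": append -- running it twice creates two records.
echo "alice" >> "$tmp/records.txt"
echo "alice" >> "$tmp/records.txt"

wc -l < "$tmp/resource.txt"   # → 1
wc -l < "$tmp/records.txt"    # → 2
```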

Hands-On: Exploring Methods with curl

# A simple GET request
$ curl -X GET http://httpbin.org/get

# A POST request with JSON data
$ curl -X POST http://httpbin.org/post \
  -H "Content-Type: application/json" \
  -d '{"name": "linux-book", "topic": "HTTP"}'

# A PUT request
$ curl -X PUT http://httpbin.org/put \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name"}'

# A DELETE request
$ curl -X DELETE http://httpbin.org/delete

# A HEAD request (only headers, no body)
$ curl -I http://httpbin.org/get

# An OPTIONS request (check allowed methods)
$ curl -X OPTIONS http://httpbin.org/get -v 2>&1 | grep -i "allow"

Think About It: Why would a browser send an OPTIONS request before a POST? Look up "CORS preflight" -- it is directly related to this.


Anatomy of an HTTP Response

The server's response follows a similar structure:

┌─────────────────────────────────────────────────────┐
│  STATUS LINE                                         │
│  HTTP/1.1 200 OK                                    │
├─────────────────────────────────────────────────────┤
│  HEADERS                                             │
│  Content-Type: application/json                      │
│  Content-Length: 245                                  │
│  Cache-Control: no-cache                             │
│  X-Request-Id: a3f8c9d2                              │
├─────────────────────────────────────────────────────┤
│  BLANK LINE                                          │
├─────────────────────────────────────────────────────┤
│  BODY                                                │
│  {"users": [{"id": 1, "name": "alice"}, ...]}       │
└─────────────────────────────────────────────────────┘

The status line has the HTTP version, a status code (a three-digit number), and a reason phrase (a human-readable description).


HTTP Status Codes

Status codes are grouped into five classes. Memorize the common ones -- you will see them daily.

1xx -- Informational

+------+---------------------+---------------------------------------+
| Code | Meaning             | When You See It                       |
+------+---------------------+---------------------------------------+
| 100  | Continue            | Server says "go ahead, send the body" |
| 101  | Switching Protocols | Upgrading to WebSocket                |
+------+---------------------+---------------------------------------+

2xx -- Success

+------+------------+-----------------------------------------+
| Code | Meaning    | When You See It                         |
+------+------------+-----------------------------------------+
| 200  | OK         | Standard successful response            |
| 201  | Created    | Resource successfully created (POST)    |
| 204  | No Content | Success, but no body to return (DELETE) |
+------+------------+-----------------------------------------+

3xx -- Redirection

+------+-----------------------+--------------------------------------------+
| Code | Meaning               | When You See It                            |
+------+-----------------------+--------------------------------------------+
| 301  | Moved Permanently     | URL changed forever, update your bookmarks |
| 302  | Found (Temp Redirect) | Temporary redirect                         |
| 304  | Not Modified          | Cached version is still valid              |
| 307  | Temporary Redirect    | Like 302 but keeps the method              |
| 308  | Permanent Redirect    | Like 301 but keeps the method              |
+------+-----------------------+--------------------------------------------+

4xx -- Client Error

+------+--------------------+--------------------------------------+
| Code | Meaning            | When You See It                      |
+------+--------------------+--------------------------------------+
| 400  | Bad Request        | Malformed request syntax             |
| 401  | Unauthorized       | Authentication required              |
| 403  | Forbidden          | Authenticated but not authorized     |
| 404  | Not Found          | Resource does not exist              |
| 405  | Method Not Allowed | Used POST where only GET is accepted |
| 408  | Request Timeout    | Client took too long                 |
| 429  | Too Many Requests  | Rate limit exceeded                  |
+------+--------------------+--------------------------------------+

5xx -- Server Error

+------+-----------------------+------------------------------------------+
| Code | Meaning               | When You See It                          |
+------+-----------------------+------------------------------------------+
| 500  | Internal Server Error | Unhandled exception / generic server bug |
| 502  | Bad Gateway           | Proxy got invalid response from upstream |
| 503  | Service Unavailable   | Server overloaded or in maintenance      |
| 504  | Gateway Timeout       | Proxy timed out waiting for upstream     |
+------+-----------------------+------------------------------------------+
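Since the class is just the leading digit, a tiny shell helper (hypothetical, the kind of thing you might drop into a log-triage script) can bucket any code:

```shell
# Classify an HTTP status code by its leading digit.
status_class() {
  case "$1" in
    1??) echo "informational" ;;
    2??) echo "success" ;;
    3??) echo "redirection" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "unknown" ;;
  esac
}

status_class 200   # → success
status_class 502   # → server error
```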

Hands-On: Observing Status Codes

# 200 OK
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/200
200

# 404 Not Found
$ curl -o /dev/null -s -w "%{http_code}\n" http://httpbin.org/status/404
404

# 302 Redirect (httpbin's redirect-to endpoint; quote the URL so the
# shell does not try to glob the ?)
$ curl -o /dev/null -s -w "%{http_code}\n" "http://httpbin.org/redirect-to?url=http://example.com"
302

# Follow the redirect with -L
$ curl -L -o /dev/null -s -w "%{http_code}\n" "http://httpbin.org/redirect-to?url=http://example.com"
200

Think About It: You see 502 Bad Gateway errors. Is the problem on the client, the proxy, or the backend? What would you check first?


Essential HTTP Headers

Headers control everything from content negotiation to caching to authentication. Here are the ones you must know:

Request Headers

+---------------+-------------------------------------+--------------------------------+
| Header        | Purpose                             | Example                        |
+---------------+-------------------------------------+--------------------------------+
| Host          | Which virtual host to reach         | Host: api.example.com          |
| User-Agent    | Identifies the client software      | User-Agent: curl/8.5.0         |
| Accept        | What content types the client wants | Accept: application/json       |
| Content-Type  | Format of the request body          | Content-Type: application/json |
| Authorization | Credentials for authentication      | Authorization: Bearer eyJ...   |
| Cookie        | Session cookies                     | Cookie: session=abc123         |
| Cache-Control | Caching directives from the client  | Cache-Control: no-cache        |
| If-None-Match | Conditional request (ETag-based)    | If-None-Match: "abc123"        |
+---------------+-------------------------------------+--------------------------------+

Response Headers

+----------------+---------------------------------------+----------------------------------------+
| Header         | Purpose                               | Example                                |
+----------------+---------------------------------------+----------------------------------------+
| Content-Type   | Format of the response body           | Content-Type: text/html; charset=UTF-8 |
| Content-Length | Size of the response body in bytes    | Content-Length: 1256                   |
| Cache-Control  | How long clients/proxies can cache    | Cache-Control: max-age=3600            |
| Set-Cookie     | Send cookies to the client            | Set-Cookie: session=abc123; HttpOnly   |
| Location       | URL to redirect to (with 3xx codes)   | Location: https://example.com/new      |
| X-Request-Id   | Unique ID for tracing (custom header) | X-Request-Id: a3f8c9d2                 |
| Server         | Identifies the server software        | Server: nginx/1.24.0                   |
+----------------+---------------------------------------+----------------------------------------+

Hands-On: Inspecting Headers

# See all response headers
$ curl -I https://www.google.com

# Send custom headers
$ curl -H "Accept: application/json" \
       -H "X-Custom-Header: myvalue" \
       http://httpbin.org/headers

# See both request and response headers
$ curl -v http://httpbin.org/get 2>&1 | grep -E "^[<>]"

The Host Header and Virtual Hosting

One crucial thing to understand: a single server (one IP address) can host hundreds of different websites. How does the server know which site you want? The Host header.

GET / HTTP/1.1
Host: blog.example.com       <-- THIS tells the server which site

This is called name-based virtual hosting. When you configure Nginx or Apache with multiple server blocks (or VirtualHosts), they use the Host header to route the request to the right configuration.

# These hit the same IP but get different sites:
$ curl -H "Host: site-a.example.com" http://93.184.216.34/
$ curl -H "Host: site-b.example.com" http://93.184.216.34/

HTTP/1.1 vs HTTP/2

HTTP/1.1 (1997 -- still everywhere)

HTTP/1.1 is text-based and human-readable. Each request-response uses its own TCP connection (or reuses one with Connection: keep-alive, which is the default in HTTP/1.1).

The major bottleneck: head-of-line blocking. Each connection can carry only one request-response exchange at a time, so if you need 10 files, the requests queue up behind each other. Browsers work around this by opening 6-8 parallel TCP connections per host, but this is wasteful.

HTTP/2 (2015 -- widely adopted)

HTTP/2 solves these problems:

┌──────────────────────────────────────────────────────────┐
│                     HTTP/1.1                              │
│                                                          │
│  Connection 1: GET /style.css ──> response               │
│  Connection 2: GET /app.js   ──> response                │
│  Connection 3: GET /logo.png ──> response                │
│  Connection 4: GET /data.json ──> response               │
│  (One request per connection at a time)                   │
├──────────────────────────────────────────────────────────┤
│                     HTTP/2                                │
│                                                          │
│  Single Connection:                                       │
│    Stream 1: GET /style.css ──> response    ┐             │
│    Stream 2: GET /app.js   ──> response     │ All at once │
│    Stream 3: GET /logo.png ──> response     │ (multiplexed│
│    Stream 4: GET /data.json ──> response    ┘  binary)    │
└──────────────────────────────────────────────────────────┘

Key improvements in HTTP/2:

  • Multiplexing -- multiple requests/responses over a single TCP connection simultaneously
  • Binary framing -- more efficient parsing (not human-readable on the wire)
  • Header compression (HPACK) -- reduces redundant header data
  • Server push -- server can proactively send resources it predicts the client needs (little used in practice; major browsers have since dropped support)
  • Stream prioritization -- clients can hint which resources matter most

Hands-On: Checking HTTP/2 Support

# Check if a site supports HTTP/2
$ curl -I --http2 -s https://www.google.com | head -1
HTTP/2 200

# Force HTTP/1.1 for comparison
$ curl -I --http1.1 -s https://www.google.com | head -1
HTTP/1.1 200 OK

# Verbose to see the negotiation
$ curl -v --http2 https://example.com 2>&1 | grep -i "ALPN"
* ALPN: offers h2,http/1.1
* ALPN: server accepted h2

ALPN (Application-Layer Protocol Negotiation) is how the client and server agree to use HTTP/2 during the TLS handshake.


HTTPS: HTTP + TLS

HTTPS is not a different protocol -- it is HTTP wrapped in a TLS (Transport Layer Security) encrypted tunnel. Everything we have discussed (methods, headers, status codes) works identically; the difference is that the entire conversation is encrypted.

┌──────────────────────────────────────────────────────────┐
│                    HTTPS Flow                             │
│                                                          │
│  1. Client connects to port 443                          │
│  2. TLS handshake occurs:                                │
│     - Server presents its certificate                     │
│     - Client verifies the certificate                     │
│     - Both sides agree on encryption keys                 │
│  3. Encrypted tunnel established                          │
│  4. HTTP request/response flows inside the tunnel         │
│                                                          │
│  ┌────────┐    TLS Tunnel    ┌────────┐                  │
│  │ Client ├══════════════════┤ Server │                  │
│  │        │  HTTP inside     │        │                  │
│  └────────┘                  └────────┘                  │
│                                                          │
│  Anyone sniffing the network sees encrypted gibberish.    │
└──────────────────────────────────────────────────────────┘

Hands-On: Inspecting a TLS Connection

# See the full TLS handshake + certificate details
$ curl -v https://example.com 2>&1 | grep -E "(SSL|TLS|subject|issuer|expire)"

# Check certificate details specifically
$ openssl s_client -connect example.com:443 -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384

We covered TLS in depth in Chapters 39-41. The key point here: always use HTTPS in production. There is no excuse not to, especially with free certificates from Let's Encrypt.


Mastering curl for HTTP Exploration

curl is the Swiss Army knife of HTTP. Every sysadmin and developer should be fluent in it. Here is your reference:

# Basic GET request
$ curl http://example.com

# Save output to a file
$ curl -o page.html http://example.com

# Show response headers only
$ curl -I http://example.com

# Show the full conversation (verbose)
$ curl -v http://example.com

# Follow redirects
$ curl -L http://example.com

# POST with form data
$ curl -X POST -d "user=alice&pass=secret" http://httpbin.org/post

# POST with JSON
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"user": "alice"}' \
  http://httpbin.org/post

# Send custom headers
$ curl -H "Authorization: Bearer mytoken" http://httpbin.org/headers

# Show only the HTTP status code
$ curl -o /dev/null -s -w "%{http_code}\n" http://example.com

# Show timing information
$ curl -o /dev/null -s -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" https://example.com

# Download with progress bar
$ curl -# -O https://example.com/largefile.tar.gz

# Resume a broken download
$ curl -C - -O https://example.com/largefile.tar.gz

# Send a request with basic auth
$ curl -u username:password http://httpbin.org/basic-auth/username/password

# Ignore SSL certificate errors (testing only!)
$ curl -k https://self-signed.badssl.com/

Safety Warning: The -k flag disables certificate verification. Never use this in production scripts. It defeats the entire purpose of HTTPS.

Timing a Request End-to-End

This is invaluable for debugging slow responses:

$ curl -o /dev/null -s -w "\
   DNS Lookup:  %{time_namelookup}s\n\
   TCP Connect: %{time_connect}s\n\
   TLS Handshake: %{time_appconnect}s\n\
   First Byte:  %{time_starttransfer}s\n\
   Total Time:  %{time_total}s\n\
   Download Size: %{size_download} bytes\n\
" https://www.google.com

Example output:

   DNS Lookup:  0.012s
   TCP Connect: 0.025s
   TLS Handshake: 0.078s
   First Byte:  0.142s
   Total Time:  0.155s
   Download Size: 19876 bytes

If DNS lookup is slow, you have a DNS problem. If TLS handshake is slow, check the certificate chain. If time-to-first-byte (TTFB) is slow, the backend application is slow.
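
That same triage can be scripted. Below is a small helper (hypothetical, not part of curl) that takes the four cumulative timings curl reports and names the phase with the largest share, by converting cumulative timestamps into per-phase durations:

```shell
#!/bin/bash
# slowest_phase: given curl's cumulative timings (seconds) --
# time_namelookup, time_connect, time_appconnect, time_starttransfer --
# print the phase that consumed the most wall-clock time.
slowest_phase() {
    awk -v d="$1" -v c="$2" -v t="$3" -v f="$4" 'BEGIN {
        # curl reports each value as time-since-start, so subtract
        # neighbors to get per-phase durations
        phase["DNS"]     = d
        phase["TCP"]     = c - d
        phase["TLS"]     = t - c
        phase["Backend"] = f - t     # time-to-first-byte minus handshake
        best = ""; max = -1
        for (p in phase) if (phase[p] > max) { max = phase[p]; best = p }
        printf "%s (%.3fs)\n", best, max
    }'
}

# Using the sample numbers from the output above:
slowest_phase 0.012 0.025 0.078 0.142    # Backend (0.064s)
```

With the sample output above, the backend dominates: the 0.064s between TLS handshake completion and first byte is time the application spent thinking.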


Debug This

A developer reports that their API call is failing with "connection refused." They show you this curl command:

$ curl -v http://api.internal.company.com:8080/health
* Trying 10.0.1.50:8080...
* connect to 10.0.1.50 port 8080 failed: Connection refused
* Failed to connect to api.internal.company.com port 8080: Connection refused

Questions to work through:

  1. DNS resolved successfully (to 10.0.1.50). Is DNS the problem?
  2. "Connection refused" means TCP got a RST packet. What does this tell you about the server?
  3. What would you check on the server at 10.0.1.50?
  4. How is "Connection refused" different from "Connection timed out"?

Answers:

  1. No, DNS is fine. The name resolved to an IP.
  2. "Connection refused" means the server is reachable at the network level, but nothing is listening on port 8080. The TCP SYN got a RST back.
  3. Check if the application is running (ss -tlnp | grep 8080), check if it crashed (journalctl -u myapp), check if it is listening on a different port or only on localhost (127.0.0.1 instead of 0.0.0.0).
  4. "Connection timed out" means packets are being dropped (firewall, wrong IP, host down). "Connection refused" means the host is alive and actively rejecting the connection.
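
You can reproduce the "refused" case locally without any server. This sketch assumes curl is installed and that nothing is listening on loopback port 1 (almost always true); curl signals the failure mode through its exit code (7 = could not connect, 28 = operation timed out):

```shell
# Nothing listens on 127.0.0.1:1, so the kernel answers the SYN with a
# RST and curl fails fast with "Connection refused" (exit code 7).
# A firewalled/dropped packet would instead hang until --max-time and
# exit with code 28.
curl -s --max-time 2 http://127.0.0.1:1/ || rc=$?
echo "curl exit code: ${rc:-0}"    # 7 = refused; 28 would mean timed out
```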

Connection Keep-Alive and Persistent Connections

In HTTP/1.0, every request opened a new TCP connection and closed it after the response. This was wasteful -- TCP handshakes and TLS negotiations are expensive.

HTTP/1.1 introduced persistent connections (keep-alive) as the default. The TCP connection stays open for multiple request-response cycles:

Without Keep-Alive (HTTP/1.0):
  TCP connect → Request 1 → Response 1 → TCP close
  TCP connect → Request 2 → Response 2 → TCP close
  TCP connect → Request 3 → Response 3 → TCP close

With Keep-Alive (HTTP/1.1 default):
  TCP connect → Request 1 → Response 1
              → Request 2 → Response 2
              → Request 3 → Response 3
              → ... → TCP close (after timeout)

You can see this in action:

# curl reuses connections when given multiple URLs
$ curl -v http://example.com http://example.com 2>&1 | grep -E "(Connected|Re-using)"
* Connected to example.com (93.184.216.34) port 80
* Re-using existing connection with host example.com

Content Negotiation

When a client and server need to agree on the format of data, they use content negotiation headers:

# Client says "I want JSON"
$ curl -H "Accept: application/json" http://httpbin.org/get

# Client says "I want XML"
$ curl -H "Accept: application/xml" http://httpbin.org/get

# Client says "I'm sending JSON"
$ curl -H "Content-Type: application/json" \
       -d '{"key": "value"}' \
       http://httpbin.org/post

Common content types you will encounter:

Content-Type                        What It Is
text/html                           HTML web page
text/plain                          Plain text
application/json                    JSON data
application/xml                     XML data
application/x-www-form-urlencoded   HTML form data
multipart/form-data                 File uploads
application/octet-stream            Raw binary data

Caching Basics

HTTP has built-in caching mechanisms that reduce load and speed up responses:

┌────────┐     ┌───────────┐     ┌────────┐
│ Client │ ──> │   Cache   │ ──> │ Server │
│        │ <── │ (browser, │ <── │        │
│        │     │  proxy,   │     │        │
│        │     │  CDN)     │     │        │
└────────┘     └───────────┘     └────────┘

Key caching headers:

  • Cache-Control: max-age=3600 -- cache this for 3600 seconds
  • Cache-Control: no-cache -- always revalidate with server before using cache
  • Cache-Control: no-store -- never cache this at all
  • ETag: "abc123" -- a fingerprint of the content; client can ask "has it changed?"
  • If-None-Match: "abc123" -- client sends the old ETag; server returns 304 if unchanged

# See caching headers
$ curl -I https://www.google.com 2>/dev/null | grep -i cache
Cache-Control: private, max-age=0

# See ETag header
$ curl -I http://example.com 2>/dev/null | grep -i etag
ETag: "3147526947+gzip"
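
To make the ETag round-trip concrete, here is a toy server-side sketch (plain shell, not real HTTP): fingerprint the content, compare it to the client's If-None-Match value, and answer 304 when nothing changed. md5sum stands in for whatever fingerprint a real server uses:

```shell
# Toy ETag revalidation -- the status strings are illustrative only.
etag_of() { printf '%s' "$1" | md5sum | cut -d' ' -f1; }

revalidate() {
    local content=$1 client_etag=$2
    if [ "$(etag_of "$content")" = "$client_etag" ]; then
        echo "304 Not Modified"       # client's cached copy is current
    else
        echo "200 OK"                 # content changed: send a fresh body
    fi
}

tag=$(etag_of "hello world")
revalidate "hello world"  "$tag"    # 304 Not Modified
revalidate "hello world!" "$tag"    # 200 OK
```

The 304 path is the payoff: the server sends a few header bytes instead of the full body, and the client keeps using its cache.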

What Just Happened?

┌──────────────────────────────────────────────────────────┐
│                   Chapter 43 Recap                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  HTTP is a request-response protocol (client asks,       │
│  server answers). Each transaction is stateless.          │
│                                                          │
│  A REQUEST has: Method + URL + Headers + optional Body    │
│  A RESPONSE has: Status Code + Headers + optional Body    │
│                                                          │
│  Methods: GET (read), POST (create), PUT (replace),      │
│           DELETE (remove), HEAD (headers only),           │
│           OPTIONS (capabilities)                          │
│                                                          │
│  Status codes:                                            │
│    2xx = success    3xx = redirect    4xx = client error  │
│    5xx = server error                                     │
│                                                          │
│  Key headers: Host (virtual hosting), Content-Type        │
│  (data format), Authorization (credentials),              │
│  Cache-Control (caching behavior)                         │
│                                                          │
│  HTTP/2 = binary, multiplexed, single connection          │
│  HTTPS  = HTTP inside a TLS encrypted tunnel              │
│                                                          │
│  curl is your best friend for HTTP debugging.             │
│                                                          │
└──────────────────────────────────────────────────────────┘

Try This

Exercise 1: Decode a Full Request

Use curl -v against any public URL. Identify and label every part: the method, URL, HTTP version, each request header, the status code, each response header, and the body.

Exercise 2: Status Code Scavenger Hunt

Using httpbin.org/status/{code}, get curl to show you a 200, 301, 403, 404, 500, and 502. Observe how the responses differ.

$ for code in 200 301 403 404 500 502; do
    echo "=== $code ==="
    curl -o /dev/null -s -w "Status: %{http_code}\n" http://httpbin.org/status/$code
  done

Exercise 3: Timing Deep Dive

Use the curl timing format string from this chapter to measure the response time of five different websites. Which has the fastest TTFB? Which has the slowest DNS lookup?

Exercise 4: Content Negotiation

Send requests to httpbin.org/get with different Accept headers. Try application/json, text/html, application/xml, and text/plain. Compare the responses.

Bonus Challenge

Write a bash script that takes a URL as an argument and produces a "health check report" including: the HTTP status code, the server header, the content type, the TLS version (if HTTPS), and the total response time. Format it nicely.

#!/bin/bash
URL="${1:?Usage: $0 <url>}"
echo "=== Health Check: $URL ==="
curl -o /dev/null -s -w "\
  Status Code:    %{http_code}\n\
  Content Type:   %{content_type}\n\
  HTTP Version:   %{http_version}\n\
  Response Time:  %{time_total}s\n\
  Download Size:  %{size_download} bytes\n\
" "$URL"
# curl's -w format has no TLS-version variable, so for HTTPS URLs we
# scrape it from the verbose handshake output instead.
case "$URL" in
  https://*) curl -svo /dev/null "$URL" 2>&1 | grep -m1 -oE 'TLSv[0-9.]+' \
               | sed 's/^/  TLS Version:    /' ;;
esac

What Comes Next

Now that you understand HTTP at the protocol level, it is time to set up the software that actually speaks this protocol. In the next chapter, we will install Nginx and configure it from scratch to serve web content -- your first web server.

Nginx: From Zero to Production

Why This Matters

You have just deployed a web application. It works perfectly on localhost:3000. Now you need to make it available to the world on port 80 and 443, serve static files efficiently, handle thousands of concurrent connections, add HTTPS, and set up proper logging. You need a web server.

Nginx (pronounced "engine-x") is the most popular web server on the internet, powering over a third of all websites. It is used by Netflix, Cloudflare, WordPress.com, and countless others. It is fast, lightweight, and incredibly versatile -- it can serve static files, act as a reverse proxy, terminate TLS, and load-balance traffic.

This chapter takes you from installing Nginx to having a production-ready configuration. By the end, you will understand its architecture, know how to write server blocks, serve static content, and configure it securely.


Try This Right Now

On any Debian/Ubuntu system:

$ sudo apt update && sudo apt install -y nginx
$ sudo systemctl start nginx
$ curl -I http://localhost

You should see:

HTTP/1.1 200 OK
Server: nginx/1.24.0
Content-Type: text/html
...

Congratulations. You have a running web server. Now let us understand everything behind that simple response.

Distro Note: On RHEL/CentOS/Fedora, use sudo dnf install -y nginx. On Arch, use sudo pacman -S nginx. The package is called nginx everywhere, but the default configuration paths differ (we will cover this).


Nginx Architecture

Most traditional web servers (like Apache's prefork MPM) spawn a new process or thread for every connection. At 10,000 concurrent connections, you have 10,000 processes eating memory. Nginx takes a fundamentally different approach.

Master and Worker Processes

┌──────────────────────────────────────────────────────────┐
│                    Nginx Process Model                    │
│                                                          │
│  ┌──────────────────┐                                    │
│  │  Master Process   │  (runs as root)                   │
│  │  - Reads config   │  - PID 1234                       │
│  │  - Manages workers│  - Binds to ports 80/443          │
│  │  - Handles signals│  - Does NOT serve requests        │
│  └───────┬──────────┘                                    │
│          │                                               │
│    ┌─────┼──────────────────┐                            │
│    │     │                  │                             │
│  ┌─┴──┐ ┌┴───┐ ┌────┐ ┌────┐                            │
│  │ W1 │ │ W2 │ │ W3 │ │ W4 │  (run as www-data/nginx)   │
│  └────┘ └────┘ └────┘ └────┘                             │
│  Worker processes handle ALL connections                  │
│  Each worker uses an event loop (epoll/kqueue)            │
│  One worker can handle thousands of connections            │
└──────────────────────────────────────────────────────────┘

  • The master process runs as root (it needs to bind to ports 80 and 443). It reads the configuration, creates worker processes, and handles signals (reload, stop).
  • Worker processes do the actual work. They run as an unprivileged user (www-data on Debian, nginx on RHEL). Each worker uses an event-driven, non-blocking model with epoll (Linux) or kqueue (BSD).

The key insight: a single worker process can handle thousands of concurrent connections because it never blocks waiting for I/O. It uses the kernel's event notification system to efficiently multiplex connections.

Hands-On: See the Processes

# View master and worker processes
$ ps aux | grep nginx
root       1234  ...  nginx: master process /usr/sbin/nginx
www-data   1235  ...  nginx: worker process
www-data   1236  ...  nginx: worker process
www-data   1237  ...  nginx: worker process
www-data   1238  ...  nginx: worker process

# Count worker processes (should match CPU cores by default)
$ ps aux | grep 'nginx: worker' | grep -v grep | wc -l
4

# Check how many CPU cores you have
$ nproc
4

By default, Nginx creates one worker per CPU core. This is optimal -- no context-switching overhead between workers.

Think About It: Why does the master process run as root while workers run as an unprivileged user? What is the security benefit? (Hint: think about what happens if a worker is compromised.)


Configuration File Structure

Nginx configuration lives in /etc/nginx/. Understanding the file layout is essential.

Debian/Ubuntu Layout

/etc/nginx/
├── nginx.conf              # Main config (global settings)
├── sites-available/        # All site configs (available but not active)
│   └── default             # Default site config
├── sites-enabled/          # Symlinks to active site configs
│   └── default -> ../sites-available/default
├── conf.d/                 # Additional config fragments
├── snippets/               # Reusable config snippets
├── modules-available/      # Dynamic module configs
├── modules-enabled/        # Active module symlinks
├── mime.types              # Maps file extensions to MIME types
└── fastcgi_params          # FastCGI parameter defaults

RHEL/CentOS/Fedora Layout

/etc/nginx/
├── nginx.conf              # Main config (includes everything)
├── conf.d/                 # Site configs go here (*.conf auto-loaded)
│   └── default.conf        # Default site
├── default.d/              # Additional defaults
└── mime.types              # MIME type mappings

Distro Note: RHEL-based distributions do not use sites-available/sites-enabled. Instead, they drop .conf files directly into conf.d/. Both approaches work. The Debian style gives you a way to have configs "available but not enabled" without deleting them.

The Main Configuration File

$ cat /etc/nginx/nginx.conf

Here is a typical nginx.conf with annotations:

# --- Global Context ---
user www-data;                      # Worker process user
worker_processes auto;              # One worker per CPU core
pid /run/nginx.pid;                 # PID file location
error_log /var/log/nginx/error.log; # Global error log

# --- Events Context ---
events {
    worker_connections 1024;        # Max connections per worker
    # Total max connections = worker_processes x worker_connections
    # With 4 workers: 4 x 1024 = 4096 simultaneous connections
}

# --- HTTP Context ---
http {
    # MIME types
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    access_log /var/log/nginx/access.log;

    # Performance
    sendfile on;                    # Efficient file serving (kernel-level copy)
    tcp_nopush on;                  # Send headers and file data together
    tcp_nodelay on;                 # Disable Nagle's algorithm
    keepalive_timeout 65;           # Keep connections alive for 65 seconds

    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript;

    # Include site configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

The hierarchy of contexts:

Main context (global)
├── events { }          # Connection handling settings
└── http { }            # All HTTP-related settings
    ├── upstream { }    # Backend server pools
    └── server { }      # Virtual host (one per site)
        └── location { }  # URL pattern matching rules

Server Blocks (Virtual Hosts)

A server block is Nginx's equivalent of Apache's VirtualHost. It defines how to handle requests for a specific domain name (or IP/port combination).

Your First Server Block

Create a new site configuration:

$ sudo nano /etc/nginx/sites-available/mysite
server {
    listen 80;                          # Listen on port 80
    server_name mysite.example.com;     # Respond to this domain

    root /var/www/mysite;               # Document root
    index index.html index.htm;         # Default files to serve

    location / {
        try_files $uri $uri/ =404;      # Try file, then directory, then 404
    }
}

Enable it and create the content:

# Create the document root
$ sudo mkdir -p /var/www/mysite

# Create a simple page
$ echo '<h1>Hello from mysite!</h1>' | sudo tee /var/www/mysite/index.html

# Enable the site (Debian/Ubuntu)
$ sudo ln -s /etc/nginx/sites-available/mysite /etc/nginx/sites-enabled/

# Test the configuration
$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

# Reload Nginx (no downtime!)
$ sudo systemctl reload nginx

# Test it
$ curl -H "Host: mysite.example.com" http://localhost
<h1>Hello from mysite!</h1>

Distro Note: On RHEL/CentOS/Fedora, skip the symlink step. Instead, save your config directly to /etc/nginx/conf.d/mysite.conf (must end in .conf).

Multiple Sites on One Server

# /etc/nginx/sites-available/blog
server {
    listen 80;
    server_name blog.example.com;
    root /var/www/blog;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}

# /etc/nginx/sites-available/api
server {
    listen 80;
    server_name api.example.com;
    root /var/www/api;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}

Nginx uses the Host header (from Chapter 43) to determine which server block handles each request. If no server_name matches, Nginx uses the default server -- the first server block it encounters, or one explicitly marked with default_server:

server {
    listen 80 default_server;
    server_name _;              # Underscore = catch-all / invalid name
    return 444;                 # Close connection without response
}
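
The dispatch is easy to picture as a switch on the Host header. This toy shell function (hypothetical, mirroring the blog/api blocks above) is roughly the decision Nginx makes for every incoming request:

```shell
# Toy Host-header dispatch: pick a "server block" by exact server_name,
# falling back to the catch-all that returns 444.
select_vhost() {
    case "$1" in
        blog.example.com) echo "server block: blog" ;;
        api.example.com)  echo "server block: api" ;;
        *)                echo "default server: return 444" ;;
    esac
}

select_vhost api.example.com    # server block: api
select_vhost evil.example.net   # default server: return 444
```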

Location Blocks

Location blocks control what happens when a request matches a specific URL pattern. They are the core routing mechanism in Nginx.

Location Matching Rules

# Exact match (highest priority)
location = /health {
    return 200 "OK\n";
}

# Prefix match
location /images/ {
    root /var/www/static;       # Serves /var/www/static/images/
}

# Regular expression match (case-sensitive)
location ~ \.php$ {
    # Handle PHP files
}

# Regular expression match (case-insensitive)
location ~* \.(jpg|jpeg|png|gif)$ {
    expires 30d;                # Cache images for 30 days
}

# Preferential prefix match (like prefix but beats regex)
location ^~ /static/ {
    root /var/www;
}

Location Priority Order

Nginx evaluates locations in this order:

1.  = /exact/path          (exact match -- checked first, stops immediately)
2.  ^~ /prefix/path        (preferential prefix -- stops, skips regex)
3.  ~ or ~* regex          (regex -- first match wins)
4.  /prefix/path           (regular prefix -- longest match wins)

Hands-On: Understanding Location Matching

server {
    listen 80;
    server_name test.example.com;

    # Matches ONLY /
    location = / {
        return 200 "exact root\n";
    }

    # Matches /api, /api/, /api/anything
    location /api/ {
        return 200 "prefix api\n";
    }

    # Matches any .json file
    location ~* \.json$ {
        return 200 "regex json\n";
    }

    # Matches /static/ and below, skips regex check
    location ^~ /static/ {
        return 200 "preferential static\n";
    }

    # Catch-all
    location / {
        return 200 "default catch-all\n";
    }
}

Test it:

$ curl http://localhost/              # "exact root"
$ curl http://localhost/api/users     # "prefix api"
$ curl http://localhost/data.json     # "regex json"
$ curl http://localhost/api/data.json # "regex json" (regex beats the plain prefix)
$ curl http://localhost/static/x.json # "preferential static" (^~ beats regex)
$ curl http://localhost/anything      # "default catch-all"

Think About It: What happens if you request /api/data.json? The /api/ prefix matches, and the .json regex also matches. Which wins? (Answer: the regex wins. Nginx remembers the longest matching plain prefix, /api/, but a plain prefix does not stop the search -- the regex locations are still checked, and the first matching regex takes precedence, exactly as the priority list above describes. Only an exact = match or a ^~ preferential prefix skips the regex step.)
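
The selection rules are compact enough to mimic in a few lines of shell. This is a toy matcher (not Nginx code) for the test server above, applying the priority list in order -- exact match, then the ^~ prefix, then regexes, then plain prefixes and the catch-all:

```shell
# Toy version of Nginx location selection for the test server above.
match_location() {
    local uri=$1
    # 1. Exact match: stops immediately
    [ "$uri" = "/" ] && { echo "exact root"; return; }
    # 2. Preferential prefix (^~): wins and skips the regex step
    case "$uri" in /static/*) echo "preferential static"; return ;; esac
    # 3. Regexes, in config order
    case "$uri" in *.json) echo "regex json"; return ;; esac
    # 4. Plain prefixes (longest wins), else the catch-all
    case "$uri" in
        /api/*) echo "prefix api" ;;
        *)      echo "default catch-all" ;;
    esac
}

match_location /api/users        # prefix api
match_location /api/data.json    # regex json (regex beats a plain prefix)
match_location /static/x.json    # preferential static (^~ beats regex)
```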


The try_files Directive

try_files is one of the most used directives. It tells Nginx to try several options in order:

location / {
    try_files $uri $uri/ /index.html;
}

This means:

  1. Try to serve the file at $uri (the requested path)
  2. If not found, try it as a directory $uri/ (and serve its index file)
  3. If still not found, serve /index.html (a fallback -- perfect for SPAs)

# For a traditional site (return 404 if not found)
location / {
    try_files $uri $uri/ =404;
}

# For a single-page application (always serve index.html)
location / {
    try_files $uri $uri/ /index.html;
}

# For a PHP application (pass to PHP-FPM if not a static file)
location / {
    try_files $uri $uri/ /index.php?$query_string;
}
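
A scratch directory makes the fallback chain tangible. Here is a toy shell rendition of try_files (the filenames are invented for the demo), checking each candidate in order:

```shell
# Toy try_files: file, then directory, then the SPA fallback.
root=$(mktemp -d)
mkdir -p "$root/docs"
echo "spa shell" > "$root/index.html"
echo "readme"    > "$root/docs/readme.txt"

try_files() {
    local uri=$1
    if   [ -f "$root$uri" ]; then echo "serve file $uri"
    elif [ -d "$root$uri" ]; then echo "serve index of $uri/"
    else echo "serve fallback /index.html"
    fi
}

try_files /docs/readme.txt    # serve file /docs/readme.txt
try_files /docs               # serve index of /docs/
try_files /no/such/page      # serve fallback /index.html
```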

Serving Static Files

Nginx excels at serving static files. Here is a production-ready static file configuration:

server {
    listen 80;
    server_name www.example.com;
    root /var/www/example;

    # Homepage
    location / {
        try_files $uri $uri/ =404;
    }

    # Static assets with long cache
    location ~* \.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;         # Don't log static file requests
    }

    # Deny access to hidden files
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }
}

Hands-On: Build and Serve a Static Site

# Create site structure
$ sudo mkdir -p /var/www/demo/{css,js,images}

# Create HTML
$ sudo tee /var/www/demo/index.html > /dev/null << 'EOF'
<!DOCTYPE html>
<html>
<head>
    <title>Nginx Demo</title>
    <link rel="stylesheet" href="/css/style.css">
</head>
<body>
    <h1>Nginx is serving this page!</h1>
    <p>Served at: <span id="time"></span></p>
    <script src="/js/app.js"></script>
</body>
</html>
EOF

# Create CSS
$ sudo tee /var/www/demo/css/style.css > /dev/null << 'EOF'
body { font-family: sans-serif; max-width: 800px; margin: 50px auto; }
h1 { color: #2d8cf0; }
EOF

# Create JS
$ sudo tee /var/www/demo/js/app.js > /dev/null << 'EOF'
document.getElementById('time').textContent = new Date().toLocaleString();
EOF

# Set ownership
$ sudo chown -R www-data:www-data /var/www/demo

# Create Nginx config
$ sudo tee /etc/nginx/sites-available/demo > /dev/null << 'EOF'
server {
    listen 80;
    server_name demo.local;
    root /var/www/demo;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~* \.(css|js|png|jpg|gif|ico)$ {
        expires 7d;
        add_header Cache-Control "public";
    }
}
EOF

# Enable and reload
$ sudo ln -sf /etc/nginx/sites-available/demo /etc/nginx/sites-enabled/
$ sudo nginx -t && sudo systemctl reload nginx

# Test
$ curl -H "Host: demo.local" http://localhost
$ curl -I -H "Host: demo.local" http://localhost/css/style.css

Access and Error Logs

Nginx has excellent logging. Understanding the logs is critical for debugging and monitoring.

Access Log

Every request is logged to the access log (default: /var/log/nginx/access.log):

93.184.216.34 - - [15/Jan/2025:10:30:15 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com/" "Mozilla/5.0..."

Fields: remote_addr - remote_user [time] "request" status body_bytes_sent "referer" "user_agent"
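
Since the format is space-delimited (with the quoted request splitting into three fields), awk can pull the common fields out by position. A quick check against the sample line above:

```shell
# Field positions in the default log format:
#   $1 = client IP, $9 = status code, $10 = bytes sent
log='93.184.216.34 - - [15/Jan/2025:10:30:15 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com/" "Mozilla/5.0..."'
echo "$log" | awk '{ print "ip:", $1; print "status:", $9; print "bytes:", $10 }'
```

These positions ($7 for the URL, $9 for the status) are exactly what the log-analysis one-liners later in this chapter rely on.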

Custom Log Formats

http {
    # Define a custom log format
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time';

    # Use it in a server block
    server {
        access_log /var/log/nginx/mysite-access.log detailed;
    }
}

The $request_time and $upstream_response_time fields are gold for performance debugging:

  • $request_time -- total time Nginx spent handling the request
  • $upstream_response_time -- how long the backend took to respond
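
With the detailed format in place, the urt= field is a plain key=value token, so awk can aggregate it. A self-contained sketch, with two sample lines standing in for a real log:

```shell
# Average upstream (backend) response time from a 'detailed'-format log.
cat > /tmp/sample-access.log << 'EOF'
10.0.0.1 - - [15/Jan/2025:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "curl" rt=0.120 urt=0.100
10.0.0.2 - - [15/Jan/2025:10:00:02 +0000] "GET /b HTTP/1.1" 200 256 "-" "curl" rt=0.300 urt=0.280
EOF

awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^urt=/) { sub(/^urt=/, "", $i); sum += $i; n++ }
} END {
    printf "avg urt: %.3fs over %d requests\n", sum / n, n
}' /tmp/sample-access.log    # avg urt: 0.190s over 2 requests
```

If rt is consistently much larger than urt, Nginx (or the network between Nginx and the client) is the bottleneck; if they track each other, the backend is.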

Error Log

The error log captures warnings, errors, and debugging information:

# View recent errors
$ sudo tail -20 /var/log/nginx/error.log

# Watch errors in real time
$ sudo tail -f /var/log/nginx/error.log

You can control the error log verbosity:

error_log /var/log/nginx/error.log warn;    # warn, error, crit, alert, emerg
error_log /var/log/nginx/debug.log debug;   # Very verbose, for troubleshooting

Per-Site Logging

server {
    server_name api.example.com;
    access_log /var/log/nginx/api-access.log;
    error_log /var/log/nginx/api-error.log;
}

Hands-On: Analyzing Logs

# Top 10 most requested URLs
$ awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# Top 10 client IPs
$ awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# Count of each status code
$ awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# All 5xx errors
$ awk '$9 >= 500' /var/log/nginx/access.log

# Requests per minute (rough)
$ awk '{print $4}' /var/log/nginx/access.log | cut -d: -f1-3 | uniq -c | tail -10

Testing Configuration and Reload vs Restart

Always Test Before Reloading

# Test configuration syntax
$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

# If there is an error:
$ sudo nginx -t
nginx: [emerg] unknown directive "sevrer" in /etc/nginx/sites-enabled/demo:1
nginx: configuration file /etc/nginx/nginx.conf test failed

Always run nginx -t before reload. A bad config on reload will be rejected, but it is better to catch it explicitly.

Reload vs Restart

Action    Command                        Downtime?  What Happens
Reload    sudo systemctl reload nginx    No         Master re-reads config, spawns new workers; old workers finish in-flight requests, then exit
Restart   sudo systemctl restart nginx   Brief      Full stop, then start; all open connections are dropped
In production, always use reload. Restart is only needed when changing fundamental settings like the user directive or loading new binary modules.

# The production workflow:
$ sudo nano /etc/nginx/sites-available/mysite    # Edit config
$ sudo nginx -t                                   # Test
$ sudo systemctl reload nginx                     # Apply (zero downtime)
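
The test-then-reload habit is worth wrapping in a function so a bad config can never be applied on autopilot. A minimal sketch (NGINX_BIN is parameterized here only so the function is easy to exercise without a live nginx):

```shell
# Refuse to reload unless the configuration test passes.
NGINX_BIN=${NGINX_BIN:-nginx}

safe_reload() {
    if "$NGINX_BIN" -t; then
        sudo systemctl reload nginx
    else
        echo "refusing to reload: config test failed" >&2
        return 1
    fi
}
```

Run safe_reload after every edit; it only touches the running server when the syntax check succeeds.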

Basic Security Headers

A production Nginx configuration should include security headers:

server {
    listen 80;
    server_name www.example.com;

    # Prevent clickjacking
    add_header X-Frame-Options "SAMEORIGIN" always;

    # Prevent MIME type sniffing
    add_header X-Content-Type-Options "nosniff" always;

    # XSS protection (legacy header; modern browsers have removed the
    # XSS auditor, but some older clients still honor it)
    add_header X-XSS-Protection "1; mode=block" always;

    # Referrer policy
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;

    # Content Security Policy (customize per site)
    add_header Content-Security-Policy "default-src 'self';" always;

    # Hide Nginx version number
    server_tokens off;

    # Prevent access to hidden files (.git, .env, etc.)
    location ~ /\. {
        deny all;
        return 404;
    }

    # ... rest of config
}

Test the headers:

$ curl -I http://localhost
HTTP/1.1 200 OK
Server: nginx                          # No version number (server_tokens off)
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block

Safety Warning: Never expose .git directories, .env files, or backup files through your web server. The location ~ /\. block above prevents this. Always test with curl http://yoursite/.git/config to verify.


Debug This

You have configured a new server block, reloaded Nginx, but all requests are returning the default welcome page instead of your site.

Your config:

server {
    listen 80;
    server_name myapp.example.com;
    root /var/www/myapp;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}

Debugging steps:

# 1. Is the config actually loaded?
$ sudo nginx -t
# If this fails, the config file is not being included.
# Check: Is the file in sites-enabled (symlinked)?
$ ls -la /etc/nginx/sites-enabled/

# 2. Is the Host header correct?
$ curl -H "Host: myapp.example.com" http://localhost
# If this works, DNS is the problem (the domain isn't pointing to your server)

# 3. Is there a default_server catching everything?
$ grep -r "default_server" /etc/nginx/sites-enabled/

# 4. Does the document root exist and have correct permissions?
$ ls -la /var/www/myapp/
$ ls -la /var/www/myapp/index.html

# 5. Check the error log
$ sudo tail -20 /var/log/nginx/error.log

Common causes:

  • Missing symlink in sites-enabled
  • Config file does not end in .conf (for RHEL) so it is not included
  • The default site has default_server and is catching the request
  • Document root has wrong permissions (www-data cannot read it)

What Just Happened?

┌────────────────────────────────────────────────────────────┐
│                      Chapter 44 Recap                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Nginx uses a master/worker process architecture with      │
│  event-driven, non-blocking I/O. This lets it handle       │
│  thousands of connections with minimal memory.             │
│                                                            │
│  Configuration hierarchy:                                  │
│    nginx.conf -> http { } -> server { } -> location { }    │
│                                                            │
│  Key concepts:                                             │
│  - Server blocks = virtual hosts (matched by Host header)  │
│  - Location blocks = URL routing (exact > prefix > regex)  │
│  - try_files = graceful fallback chain                     │
│  - Always: nginx -t before reload                          │
│  - Always: reload, not restart (zero downtime)             │
│                                                            │
│  Files:                                                    │
│  - Config: /etc/nginx/ (nginx.conf, sites-available/)      │
│  - Logs:   /var/log/nginx/ (access.log, error.log)         │
│  - Webroot: /var/www/                                      │
│                                                            │
└────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Multiple Virtual Hosts

Set up three server blocks on one Nginx instance, each serving different content. Use curl -H "Host: ..." to test each one. Add per-site access logs and verify they log to separate files.

Exercise 2: Custom Error Pages

Create custom 404 and 500 error pages. Configure Nginx to use them:

error_page 404 /custom_404.html;
error_page 500 502 503 504 /custom_50x.html;

location = /custom_404.html {
    root /var/www/errors;
    internal;
}

Exercise 3: Directory Listing

Enable directory listing for a /files/ path using the autoindex module:

location /files/ {
    alias /var/www/shared-files/;
    autoindex on;
    autoindex_exact_size off;
    autoindex_localtime on;
}

Put some files in the directory and browse the listing.

Exercise 4: Log Analysis

Generate some traffic with a loop, then use awk to answer: What is the most requested URL? What is the average response time? How many 404s occurred?

# Generate traffic
$ for i in $(seq 1 100); do
    curl -s http://localhost/ > /dev/null
    curl -s http://localhost/nonexistent > /dev/null
  done
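If you want a starting point for the awk part: the stock "combined" log format puts the request path in field 7 and the status code in field 9. Note that it has no response-time field at all -- answering the average-response-time question requires adding $request_time to a custom log_format first. A sketch against a tiny sample log (point the same pipelines at /var/log/nginx/access.log on a real server):

```shell
# Build a tiny sample access log in the combined format
cat > /tmp/sample_access.log << 'EOF'
127.0.0.1 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl"
127.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl"
127.0.0.1 - - [01/Jan/2025:00:00:02 +0000] "GET /nonexistent HTTP/1.1" 404 153 "-" "curl"
EOF

# Most requested URL ($7 is the request path in the combined format)
awk '{print $7}' /tmp/sample_access.log | sort | uniq -c | sort -rn | head -1
# the top line shows "/" with a count of 2

# How many 404s? ($9 is the status code)
awk '$9 == 404' /tmp/sample_access.log | wc -l
# prints 1
```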

Bonus Challenge

Set up Nginx with HTTPS using a self-signed certificate. Configure it to redirect all HTTP traffic to HTTPS. (Hint: you will need ssl_certificate, ssl_certificate_key, and a return 301 https://... block.)


What Comes Next

You now know how to make Nginx serve static content. But most real applications live behind Nginx as a reverse proxy. In the next chapter, we will configure Nginx to proxy traffic to backend applications, load-balance across multiple servers, and cache responses -- the configuration patterns used in virtually every production deployment.

Reverse Proxy & Load Balancing with Nginx

Why This Matters

In the real world, almost no production application is served directly by its application process. Instead, a reverse proxy sits in front, handling tasks that the application should not care about: TLS termination, load balancing, caching, compression, rate limiting, and connection management.

Here is a scenario you will encounter: your team runs three instances of a Node.js API on ports 3001, 3002, and 3003. Users should hit a single URL (https://api.example.com). If one instance crashes, traffic should seamlessly go to the other two. Response times are slow, so you want to cache certain endpoints. And you need to add rate limiting to prevent abuse. Nginx handles all of this with a few dozen lines of configuration.

This chapter covers the patterns that power nearly every production web deployment.


Try This Right Now

Let us set up a minimal reverse proxy. First, start a simple backend (Python's built-in HTTP server works perfectly):

# Terminal 1: Start a backend on port 8001
$ mkdir -p /tmp/backend1 && echo "Hello from Backend 1" > /tmp/backend1/index.html
$ cd /tmp/backend1 && python3 -m http.server 8001 &

# Terminal 2: Create an Nginx reverse proxy config
$ sudo tee /etc/nginx/sites-available/proxy-demo > /dev/null << 'EOF'
server {
    listen 80;
    server_name proxy-demo.local;

    location / {
        proxy_pass http://127.0.0.1:8001;
    }
}
EOF

$ sudo ln -sf /etc/nginx/sites-available/proxy-demo /etc/nginx/sites-enabled/
$ sudo nginx -t && sudo systemctl reload nginx

# Test it
$ curl -H "Host: proxy-demo.local" http://localhost
Hello from Backend 1

That single proxy_pass line turned Nginx into a reverse proxy. The client talks to Nginx; Nginx talks to the backend.


What Is a Reverse Proxy?

A forward proxy sits in front of clients (like a corporate proxy that employees use to access the internet). A reverse proxy sits in front of servers, and the client usually does not know it exists.

┌──────────────────────────────────────────────────────────────┐
│                     Forward Proxy                             │
│                                                              │
│  ┌────────┐    ┌───────────┐    ┌────────────┐              │
│  │ Client ├───>│  Proxy    ├───>│  Internet  │              │
│  │        │    │ (client's │    │  Servers   │              │
│  │        │<───┤  side)    │<───┤            │              │
│  └────────┘    └───────────┘    └────────────┘              │
│  Client KNOWS about the proxy                                │
├──────────────────────────────────────────────────────────────┤
│                     Reverse Proxy                             │
│                                                              │
│  ┌────────┐    ┌───────────┐    ┌────────────┐              │
│  │ Client ├───>│  Nginx    ├───>│  Backend   │              │
│  │        │    │ (server's │    │  App(s)    │              │
│  │        │<───┤  side)    │<───┤            │              │
│  └────────┘    └───────────┘    └────────────┘              │
│  Client has NO idea the backend exists                       │
└──────────────────────────────────────────────────────────────┘

Why use a reverse proxy?

  • Security -- the backend is never directly exposed to the internet
  • TLS termination -- Nginx handles HTTPS; the backend speaks plain HTTP
  • Load balancing -- distribute traffic across multiple backends
  • Caching -- serve cached responses without hitting the backend
  • Compression -- Nginx compresses responses, saving backend CPU
  • Connection management -- Nginx handles thousands of client connections while maintaining only a few to the backend
  • Rate limiting -- protect backends from abuse

The proxy_pass Directive

proxy_pass is the heart of Nginx's reverse proxy functionality.

location / {
    proxy_pass http://127.0.0.1:8001;
}

URI Handling: Trailing Slash Matters

This is one of the most common Nginx gotchas:

# WITHOUT trailing slash in proxy_pass:
# Request: /api/users
# Proxied to: http://backend:8001/api/users  (path preserved)
location /api/ {
    proxy_pass http://backend:8001;
}

# WITH trailing slash in proxy_pass:
# Request: /api/users
# Proxied to: http://backend:8001/users  (location prefix stripped!)
location /api/ {
    proxy_pass http://backend:8001/;
}

The rule: if proxy_pass has a URI component (even just /), Nginx replaces the matched location prefix with that URI. If it has no URI, the full original path is forwarded.

Essential Proxy Headers

When Nginx proxies a request, the backend loses information about the original client. You need to pass it along with headers:

location / {
    proxy_pass http://127.0.0.1:8001;

    # Pass the original Host header
    proxy_set_header Host $host;

    # Pass the real client IP
    proxy_set_header X-Real-IP $remote_addr;

    # Pass the chain of proxies
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Tell the backend if the original request was HTTPS
    proxy_set_header X-Forwarded-Proto $scheme;

    # Timeouts
    proxy_connect_timeout 5s;       # Time to establish connection to backend
    proxy_send_timeout 10s;         # Time to send request to backend
    proxy_read_timeout 30s;         # Time to read response from backend
}

Without proxy_set_header Host, the backend receives Host: 127.0.0.1:8001 instead of Host: api.example.com. Without X-Real-IP, the backend sees all requests coming from Nginx's IP instead of the real client.

Think About It: Why is X-Forwarded-For a chain of IPs rather than a single IP? What happens when there are multiple proxies in front of the backend?
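(A sketch of the mechanics: each hop appends the address it saw, so after two proxies the backend receives one comma-separated chain. The left-most entry is the original client -- and it is only trustworthy when every later hop is a proxy you control, because clients can send a forged X-Forwarded-For of their own.)

```shell
# After two proxies, the backend sees: client, proxy1, proxy2 (example IPs)
xff="203.0.113.7, 10.0.0.5, 10.0.0.9"

# The left-most entry is the original client
printf '%s\n' "$xff" | awk -F', *' '{print $1}'
# prints 203.0.113.7
```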


Upstream Blocks and Load Balancing

An upstream block defines a group of backend servers that Nginx can distribute traffic across.

Basic Round-Robin Load Balancing

upstream backend_pool {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://backend_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

By default, Nginx uses round-robin: requests go to each backend in order (1, 2, 3, 1, 2, 3, ...).

Hands-On: Set Up Load Balancing

Let us create a realistic multi-backend setup:

# Start three backends
$ for port in 8001 8002 8003; do
    dir="/tmp/backend${port}"
    mkdir -p "$dir"
    echo "Response from backend on port ${port}" > "$dir/index.html"
    cd "$dir" && python3 -m http.server "$port" &
  done

# Verify they are running
$ curl http://localhost:8001
$ curl http://localhost:8002
$ curl http://localhost:8003

Now create the load balancer config:

# /etc/nginx/sites-available/loadbalancer
upstream app_backends {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;
    server_name lb.local;

    location / {
        proxy_pass http://app_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

$ sudo ln -sf /etc/nginx/sites-available/loadbalancer /etc/nginx/sites-enabled/
$ sudo nginx -t && sudo systemctl reload nginx

# Send 6 requests and watch round-robin in action
$ for i in $(seq 1 6); do
    curl -s -H "Host: lb.local" http://localhost
  done

Expected output:

Response from backend on port 8001
Response from backend on port 8002
Response from backend on port 8003
Response from backend on port 8001
Response from backend on port 8002
Response from backend on port 8003

Load Balancing Algorithms

Round-Robin (Default)

Distributes requests evenly in order. Simple and effective when backends have equal capacity.

upstream backend {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

Weighted Round-Robin

Give more traffic to more powerful servers:

upstream backend {
    server 10.0.1.10:8001 weight=5;    # Gets 5x the traffic
    server 10.0.1.11:8001 weight=3;    # Gets 3x the traffic
    server 10.0.1.12:8001 weight=1;    # Gets 1x the traffic
}

Out of every 9 requests: 5 go to server 1, 3 go to server 2, 1 goes to server 3.
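You can sanity-check that ratio with a quick simulation. (This naive version emits each cycle's picks back-to-back; nginx's actual "smooth" weighted round-robin interleaves the picks within a cycle, but the per-cycle totals are identical.)

```shell
# One 9-request cycle honouring weights 5/3/1, repeated 10 times = 90 requests
pool="A A A A A B B B C"
for i in $(seq 1 10); do printf '%s\n' $pool; done | sort | uniq -c
# counts: 50 A, 30 B, 10 C -- matching the 5:3:1 weights
```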

Least Connections

Send the request to the backend with the fewest active connections. Best when request processing time varies:

upstream backend {
    least_conn;
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

IP Hash

Always send the same client IP to the same backend. Useful for applications that store session state locally:

upstream backend {
    ip_hash;
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

Safety Warning: ip_hash provides "sticky sessions" but has drawbacks. If one backend fails, all sessions pinned to it are disrupted. And if many users share an IP (corporate NAT), one backend gets overloaded. Prefer stateless application design when possible.

Hash (Generic)

Hash on any variable for consistent routing:

upstream backend {
    hash $request_uri consistent;     # Same URL always goes to same backend
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001;
}

The consistent keyword uses a consistent hashing ring, which minimizes redistribution when backends are added or removed.

Comparison

┌──────────────────────────────────────────────────────────────┐
│              Load Balancing Algorithm Comparison               │
├────────────────┬─────────────────────────────────────────────┤
│ Round-Robin    │ Simple, even distribution, best default      │
│ Weighted       │ When backends have different capacities      │
│ Least Conn     │ When request duration varies widely          │
│ IP Hash        │ When sessions must be sticky (not ideal)     │
│ Hash URI       │ When caching per-URL on specific backends    │
└────────────────┴─────────────────────────────────────────────┘

Health Checks and Failure Detection

Nginx has built-in passive health checks. If a backend fails, Nginx temporarily removes it from the pool.

Passive Health Checks (Open Source Nginx)

upstream backend {
    server 10.0.1.10:8001 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8001 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8001 max_fails=3 fail_timeout=30s;
}

  • max_fails=3 -- after 3 failed requests, mark the server as down
  • fail_timeout=30s -- keep it marked as down for 30 seconds, then try again

What counts as a failure? A connection timeout, a connection refused, or an error response (configurable with proxy_next_upstream):

location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503;
    proxy_next_upstream_tries 2;      # Try at most 2 other backends
    proxy_next_upstream_timeout 10s;  # Give up after 10s total
}

Marking a Server as Down

upstream backend {
    server 10.0.1.10:8001;
    server 10.0.1.11:8001;
    server 10.0.1.12:8001 down;       # Temporarily removed from rotation
    server 10.0.1.13:8001 backup;     # Only used if all others are down
}

Think About It: Why is passive health checking imperfect? (Answer: it only detects failure when a real user request fails. Active health checks -- available in Nginx Plus or HAProxy -- proactively probe backends so failures are detected before any user is affected.)


Caching with proxy_cache

Nginx can cache backend responses, dramatically reducing backend load and improving response times.

# Define a cache zone in the http context
http {
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
                     max_size=1g inactive=60m use_temp_path=off;
}

Parameters explained:

  • /var/cache/nginx -- where cache files are stored on disk
  • levels=1:2 -- two-level directory structure (prevents too many files in one dir)
  • keys_zone=my_cache:10m -- 10 MB of shared memory for cache keys
  • max_size=1g -- maximum total cache size on disk
  • inactive=60m -- remove items not accessed for 60 minutes
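To see how levels=1:2 maps onto disk, you can predict a cache file's location by hand: the file name is the MD5 of the cache key (the default key concatenates $scheme$proxy_host$request_uri), the last hex character becomes the first directory level, and the two characters before it the second. A sketch with a hypothetical key:

```shell
# Hypothetical cache key for GET /api/users proxied to backend_pool
key="httpbackend_pool/api/users"
h=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
d1=$(printf '%s' "$h" | cut -c32)      # last hex char    -> level-1 dir
d2=$(printf '%s' "$h" | cut -c30-31)   # preceding two    -> level-2 dir
echo "/var/cache/nginx/$d1/$d2/$h"
```

After a few cache hits you can confirm the layout with `sudo ls -R /var/cache/nginx`.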

Using the Cache

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://backend_pool;
        proxy_cache my_cache;
        proxy_cache_valid 200 10m;         # Cache 200 responses for 10 min
        proxy_cache_valid 404 1m;          # Cache 404 responses for 1 min
        proxy_cache_use_stale error timeout updating http_500 http_502;

        # Add header so you can see cache status
        add_header X-Cache-Status $upstream_cache_status;
    }
}

The X-Cache-Status header tells you whether the response came from cache:

$ curl -I -H "Host: api.example.com" http://localhost
X-Cache-Status: MISS       # First request, not cached yet

$ curl -I -H "Host: api.example.com" http://localhost
X-Cache-Status: HIT        # Served from cache!

Possible values: MISS, HIT, EXPIRED, STALE, UPDATING, BYPASS.

Bypassing the Cache

Sometimes you need to skip the cache:

location / {
    proxy_pass http://backend_pool;
    proxy_cache my_cache;

    # Don't cache POST requests
    proxy_cache_methods GET HEAD;

    # Bypass cache if the client sends a specific header
    proxy_cache_bypass $http_x_no_cache;

    # Don't cache if backend says not to
    proxy_no_cache $http_set_cookie;
}

WebSocket Proxying

WebSocket connections start as HTTP and then upgrade to a persistent bidirectional connection. Nginx can proxy them, but you need to handle the upgrade:

location /ws/ {
    proxy_pass http://websocket_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;

    # WebSocket connections are long-lived
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}

The critical lines are Upgrade and Connection -- without them, the WebSocket handshake fails and you get a 400 error.
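One refinement worth knowing: hard-coding Connection "upgrade" is fine for a dedicated /ws/ location, but the map idiom from the nginx documentation sends "upgrade" only when the client actually asked for one, and closes the connection cleanly otherwise:

```nginx
# In the http { } context
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then, inside the location block:
#     proxy_set_header Connection $connection_upgrade;
```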

┌──────────────────────────────────────────────────────────────┐
│                     WebSocket Proxy Flow                     │
│                                                              │
│  Client                  Nginx                 Backend       │
│    │                       │                       │         │
│    │── GET /ws/ HTTP/1.1 ─>│                       │         │
│    │   Upgrade: websocket  │── GET /ws/ HTTP/1.1 ─>│         │
│    │   Connection: Upgrade │   Upgrade: websocket  │         │
│    │                       │   Connection: upgrade │         │
│    │                       │                       │         │
│    │<── 101 Switching ─────│<── 101 Switching ─────│         │
│    │                       │                       │         │
│    │<══ Bidirectional ════>│<══ Bidirectional ════>│         │
│    │    WebSocket data     │    WebSocket data     │         │
└──────────────────────────────────────────────────────────────┘

SSL/TLS Termination

SSL termination means Nginx handles HTTPS while talking to backends over plain HTTP. This simplifies backend configuration and consolidates certificate management.

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    # Modern TLS configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    # HSTS (tell browsers to always use HTTPS)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    location / {
        proxy_pass http://backend_pool;           # Plain HTTP to backend
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;   # Tells backend "this was HTTPS"
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}
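The certificate paths above assume Let's Encrypt. For local experiments (and the previous chapter's bonus challenge) you can mint a throwaway self-signed pair with openssl -- the paths and CN here are placeholders:

```shell
# Self-signed key + certificate, valid 365 days, no passphrase (-nodes)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/selfsigned.key -out /tmp/selfsigned.crt \
  -subj "/CN=api.example.com"

# Inspect what was generated
openssl x509 -in /tmp/selfsigned.crt -noout -subject -dates
```

Point ssl_certificate and ssl_certificate_key at the two files. Browsers and curl will warn because no CA signed the cert; test with curl -k, which skips verification.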

Rate Limiting

Protect your backends from abuse or accidental overload:

# Define rate limit zones in the http context
http {
    # 10 requests per second per client IP, 10MB zone
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    # Connection limit: max 20 simultaneous connections per IP
    limit_conn_zone $binary_remote_addr zone=conn_limit:10m;
}

server {
    listen 80;
    server_name api.example.com;

    location /api/ {
        # Allow bursts of 20 requests, then enforce the rate
        limit_req zone=api_limit burst=20 nodelay;
        limit_conn conn_limit 20;

        # Return 429 instead of 503 when rate limited
        limit_req_status 429;
        limit_conn_status 429;

        proxy_pass http://backend_pool;
    }
}

  • rate=10r/s -- 10 requests per second per client IP
  • burst=20 -- allow up to 20 requests to queue up
  • nodelay -- process burst requests immediately rather than throttling them

Hands-On: Testing Rate Limiting

# Send 30 rapid requests
$ for i in $(seq 1 30); do
    curl -s -o /dev/null -w "%{http_code} " -H "Host: api.example.com" http://localhost/api/
  done
$ echo ""

# You should see:
# 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 429 429 429 429 429 429 429 429 429 429
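Rather than eyeballing that line, tally the codes -- shown here on a literal sample of the loop's output:

```shell
# Split the space-separated codes onto lines, then count duplicates
printf '%s\n' "200 200 200 429 429" | tr ' ' '\n' | sort | uniq -c
# one count line per distinct status code: three 200s, two 429s
```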

Practical: Complete Multi-Backend Setup

Here is a production-style configuration that ties everything together:

# /etc/nginx/sites-available/production-app

upstream api_backends {
    least_conn;
    server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
}

upstream websocket_backends {
    ip_hash;
    server 10.0.1.10:3001;
    server 10.0.1.11:3001;
    server 10.0.1.12:3001;
}

# Cache zone
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m
                 max_size=500m inactive=30m;

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_rate:10m rate=20r/s;

server {
    listen 443 ssl http2;
    server_name app.example.com;

    # TLS
    ssl_certificate /etc/letsencrypt/live/app.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Strict-Transport-Security "max-age=31536000" always;
    server_tokens off;

    # Static files (served directly by Nginx)
    location /static/ {
        alias /var/www/app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # API endpoints (proxied, cached, rate-limited)
    location /api/ {
        limit_req zone=api_rate burst=40 nodelay;

        proxy_pass http://api_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Cache GET requests
        proxy_cache api_cache;
        proxy_cache_valid 200 5m;
        proxy_cache_methods GET HEAD;
        proxy_cache_bypass $http_authorization;
        add_header X-Cache-Status $upstream_cache_status;

        # Retry on backend failure
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

    # WebSocket endpoint
    location /ws/ {
        proxy_pass http://websocket_backends;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
    }

    # Health check endpoint (no proxy, instant response)
    location = /health {
        access_log off;
        return 200 "OK\n";
    }
}

# HTTP -> HTTPS redirect
server {
    listen 80;
    server_name app.example.com;
    return 301 https://$host$request_uri;
}

Debug This

Users report intermittent 502 Bad Gateway errors. The application team says all backends are healthy.

# Step 1: Check Nginx error log
$ sudo tail -50 /var/log/nginx/error.log | grep 502
upstream prematurely closed connection while reading response header

# Step 2: Check backend connectivity from the Nginx server
$ curl http://10.0.1.10:3000/health
curl: (7) Failed to connect to 10.0.1.10 port 3000: Connection refused

# Step 3: Check upstream response times in the access log
# (assumes a custom log_format that appends $upstream_response_time, e.g.
#  as "urt=..."; the default combined format records no timings at all)
$ awk '{print $NF}' /var/log/nginx/access.log | sort -n | tail -20
# Values near "urt=30.000" mean backends are hitting proxy_read_timeout

# Step 4: Check if it's a specific backend
$ grep "502" /var/log/nginx/error.log | grep -oP 'upstream: "\K[^"]+' | sort | uniq -c

Common causes:

  • Backend process crashed or is not listening
  • Backend is too slow and Nginx's proxy_read_timeout is exceeded
  • Backend closes the connection before sending a response (keepalive mismatch)
  • OS-level connection limits reached (check ulimit -n and ss -s)

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                       Chapter 45 Recap                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  A reverse proxy sits in front of backends, handling TLS,    │
│  load balancing, caching, and rate limiting.                 │
│                                                              │
│  Key directives:                                             │
│  - proxy_pass: forward requests to a backend                 │
│  - upstream { }: define a pool of backend servers            │
│  - proxy_cache: cache backend responses                      │
│  - limit_req: rate-limit requests                            │
│                                                              │
│  Load balancing algorithms:                                  │
│  - round-robin (default), weighted, least_conn, ip_hash      │
│                                                              │
│  Always set these proxy headers:                             │
│  - Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto       │
│                                                              │
│  For WebSockets: set Upgrade and Connection headers          │
│  For TLS: terminate at Nginx, plain HTTP to backends         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Weighted Load Balancing

Start three backends, give them different weights (5, 3, 1). Send 90 requests and count how many each backend receives. Does it match the expected ratio?

Exercise 2: Failure Detection

Set up three backends with max_fails=2 fail_timeout=15s. Kill one backend. Send requests and observe that Nginx stops sending traffic to the dead backend. Start it again and verify it rejoins the pool.

Exercise 3: Caching

Configure proxy_cache for a backend. Send the same request five times. Use the X-Cache-Status header to verify that the first request is a MISS and subsequent requests are HITs. Then add proxy_cache_bypass $http_cache_control to the location, send curl -H "Cache-Control: no-cache", and observe a BYPASS. (Nginx ignores the client's Cache-Control header unless you wire up a bypass like this.)

Exercise 4: Rate Limiting

Set a rate limit of 2 requests per second with burst=5. Use a bash loop to send 20 rapid requests. Count how many succeed (200) and how many are rate-limited (429).

Bonus Challenge

Set up Nginx as a reverse proxy for two different applications on the same domain: /api/ goes to a Node.js (or Python) backend, and / serves a static React/Vue build. Add caching for the API and long-lived cache headers for static assets. This is the most common production pattern you will encounter.


What Comes Next

Nginx is excellent for many scenarios, but it is not the only option. The next chapter covers Apache -- the original web server that is still widely used -- and helps you understand when to choose one over the other.

Apache Basics & When to Use What

Why This Matters

Apache HTTP Server (commonly just "Apache") has been around since 1995. It was the dominant web server for over two decades and still powers roughly a quarter of all websites. Many enterprise applications, hosting control panels (cPanel, Plesk), and PHP applications (WordPress, Drupal, Magento) are built with Apache in mind.

Even if your new projects use Nginx, you will inevitably encounter Apache in production environments, inherited infrastructure, or specific situations where Apache is the better choice. Understanding Apache is not optional -- it is a professional requirement.

This chapter covers Apache's architecture, configuration, key modules, and provides clear guidance on when to choose Apache versus Nginx.


Try This Right Now

On a Debian/Ubuntu system:

$ sudo apt update && sudo apt install -y apache2
$ sudo systemctl start apache2
$ curl -I http://localhost

You should see:

HTTP/1.1 200 OK
Server: Apache/2.4.58 (Ubuntu)
Content-Type: text/html; charset=UTF-8
...

Distro Note: On RHEL/CentOS/Fedora, the package is called httpd, not apache2:

$ sudo dnf install -y httpd
$ sudo systemctl start httpd

The service name, binary name, and configuration paths all differ between Debian and RHEL families. We will cover both.


Apache vs Nginx: When to Use Each

Before diving into Apache's internals, let us address the question you are already asking.

┌───────────────────────────────────────────────────────────────┐
│                        Apache vs Nginx                        │
├────────────────────┬──────────────────────┬───────────────────┤
│                    │  Apache              │  Nginx            │
├────────────────────┼──────────────────────┼───────────────────┤
│ Architecture       │ Process/thread-based │ Event-driven      │
│ Config model       │ Per-dir (.htaccess)  │ Centralized       │
│ Module loading     │ Dynamic (runtime)    │ Mostly compiled-in│
│ PHP integration    │ mod_php (embedded)   │ PHP-FPM (external)│
│ Static files       │ Good                 │ Excellent         │
│ Reverse proxy      │ Good (mod_proxy)     │ Excellent         │
│ Concurrency        │ Good (event MPM)     │ Excellent         │
│ Memory per conn    │ Higher               │ Lower             │
│ .htaccess support  │ Yes                  │ No                │
│ Shared hosting     │ Better (per-dir conf)│ Not suited        │
│ Learning curve     │ Gentle               │ Moderate          │
└────────────────────┴──────────────────────┴───────────────────┘

Choose Apache when:

  • You need .htaccess files (shared hosting, user-controlled directories)
  • You are running PHP with mod_php and want simplicity
  • You need dynamic module loading without recompiling
  • You are maintaining existing Apache infrastructure
  • You need per-directory configuration overrides

Choose Nginx when:

  • You need maximum performance for static files and reverse proxying
  • You are handling a large number of concurrent connections
  • You want a simpler, centralized configuration model
  • You are building microservice architectures with many backends
  • Memory efficiency matters (containers, cloud instances)

The most common pattern in production: Nginx in front as a reverse proxy, with Apache behind running PHP or legacy applications. You get the best of both worlds.
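A minimal sketch of that pattern (the hostname and ports are illustrative): move Apache to a loopback-only port -- Listen 8080 in ports.conf on Debian, httpd.conf on RHEL -- and let Nginx own port 80:

```nginx
# /etc/nginx/sites-available/legacy-app (hypothetical)
server {
    listen 80;
    server_name legacy.example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;   # Apache listening on loopback only
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Apache keeps serving PHP exactly as before; Nginx handles TLS, static files, and connection management in front.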


Apache Architecture: MPM Models

Apache uses Multi-Processing Modules (MPMs) to determine how it handles connections. Understanding MPMs is key to understanding Apache's behavior and performance.

prefork MPM

┌──────────────────────────────────────────────────────────┐
│                   prefork MPM                             │
│                                                          │
│  ┌──────────────┐                                        │
│  │ Master Process│                                       │
│  └──────┬───────┘                                        │
│         │                                                │
│    ┌────┼────────────────────────┐                       │
│    │    │         │              │                        │
│  ┌─┴─┐ ┌┴──┐ ┌───┴┐ ┌────┐ ┌───┴┐                      │
│  │P1 │ │P2 │ │ P3 │ │ P4 │ │ P5 │   One process per     │
│  │1  │ │1  │ │ 1  │ │ 1  │ │ 1  │   connection.          │
│  │req│ │req│ │ req│ │ req│ │ req│   No threads.          │
│  └───┘ └───┘ └────┘ └────┘ └────┘   Safe for non-       │
│                                       thread-safe libs.   │
└──────────────────────────────────────────────────────────┘

  • Creates a separate process for each connection
  • Maximum compatibility (safe for non-thread-safe PHP modules)
  • Highest memory usage (each process is a full copy)
  • Use case: Legacy PHP applications with non-thread-safe extensions

worker MPM

┌──────────────────────────────────────────────────────────┐
│                   worker MPM                              │
│                                                          │
│  ┌──────────────┐                                        │
│  │ Master Process│                                       │
│  └──────┬───────┘                                        │
│         │                                                │
│    ┌────┼──────────┐                                     │
│    │    │          │                                      │
│  ┌─┴──────┐ ┌─────┴───┐                                 │
│  │Process 1│ │Process 2│    Each process has multiple     │
│  │ T1  T2 │ │ T1  T2  │    threads. Each thread handles  │
│  │ T3  T4 │ │ T3  T4  │    one connection.               │
│  │ T5  T6 │ │ T5  T6  │    Better memory usage than      │
│  └────────┘ └─────────┘    prefork.                      │
└──────────────────────────────────────────────────────────┘

  • Each process spawns multiple threads
  • Each thread handles one connection
  • Less memory than prefork (threads share process memory)
  • Use case: High-traffic sites with thread-safe applications

event MPM

┌──────────────────────────────────────────────────────────┐
│                   event MPM                               │
│                                                          │
│  Like worker MPM, but with a dedicated listener thread.   │
│  Keep-alive connections don't tie up worker threads.       │
│                                                          │
│  ┌──────────┐                                            │
│  │Process 1 │  Listener thread handles idle keep-alive   │
│  │ Listener │  connections. Worker threads only handle    │
│  │ W1 W2 W3│  active requests. Much more efficient.      │
│  │ W4 W5 W6│                                             │
│  └──────────┘                                            │
└──────────────────────────────────────────────────────────┘
  • Improves on worker by handling keep-alive connections asynchronously
  • A listener thread manages idle connections without consuming a worker thread
  • Default on modern Apache installations
  • Use case: General purpose, the best choice for most new deployments
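If you need to tune the event MPM, these are the directives that matter. A hedged sketch with illustrative starting values (on Debian/Ubuntu they live in /etc/apache2/mods-available/mpm_event.conf; the numbers here are examples, not a recommendation for your hardware):

```apache
# mpm_event tuning -- illustrative values only
<IfModule mpm_event_module>
    StartServers             2      # processes launched at startup
    MinSpareThreads         25      # keep at least this many idle threads
    MaxSpareThreads         75      # reap processes above this many idle threads
    ThreadsPerChild         25      # worker threads per process
    MaxRequestWorkers      150      # hard cap on simultaneous requests
    MaxConnectionsPerChild   0      # 0 = never recycle processes
</IfModule>
```

MaxRequestWorkers is the one to watch: requests beyond the cap wait in the listen backlog, which shows up as latency rather than errors.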

Hands-On: Check Your MPM

# Debian/Ubuntu
$ apachectl -V | grep MPM
Server MPM:     event

# Or check the loaded modules
$ apachectl -M | grep mpm
 mpm_event_module (shared)

# RHEL/CentOS/Fedora
$ httpd -V | grep MPM
Server MPM:     event

Switching MPMs (Debian/Ubuntu)

# Disable current MPM, enable a different one
$ sudo a2dismod mpm_event
$ sudo a2enmod mpm_prefork
$ sudo systemctl restart apache2

Safety Warning: Switching MPMs requires a restart (not just reload) and will briefly drop all connections. Do this during a maintenance window.


Configuration File Structure

Debian/Ubuntu Layout

/etc/apache2/
├── apache2.conf            # Main config
├── ports.conf              # Listen directives (ports 80, 443)
├── envvars                 # Environment variables (user, group, paths)
├── sites-available/        # All site configs
│   ├── 000-default.conf    # Default HTTP site
│   └── default-ssl.conf    # Default HTTPS site template
├── sites-enabled/          # Symlinks to active sites
│   └── 000-default.conf -> ../sites-available/000-default.conf
├── mods-available/         # All available modules
├── mods-enabled/           # Symlinks to active modules
├── conf-available/         # Additional config fragments
└── conf-enabled/           # Active config fragments

RHEL/CentOS/Fedora Layout

/etc/httpd/
├── conf/
│   └── httpd.conf          # Main config (everything in one file)
├── conf.d/                 # Additional configs (*.conf auto-loaded)
│   ├── ssl.conf            # SSL/TLS configuration
│   └── welcome.conf        # Default welcome page
├── conf.modules.d/         # Module loading configs
│   ├── 00-base.conf
│   ├── 00-ssl.conf
│   └── ...
└── logs -> /var/log/httpd   # Log symlink

Distro Note: The Debian layout is more modular (separate dirs for sites, mods, confs). The RHEL layout is flatter (mostly in httpd.conf and conf.d/). Both achieve the same result.
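The a2ensite/a2enmod helpers are nothing magical: they create and remove symlinks from the *-enabled/ directories into *-available/, and apache2.conf pulls in everything under *-enabled/ with IncludeOptional lines. You can convince yourself with plain shell, no Apache required (the paths below are a throwaway temp directory, not real Apache config):

```shell
# Simulate what a2ensite does: a relative symlink into sites-enabled/
demo=$(mktemp -d)
mkdir -p "$demo/sites-available" "$demo/sites-enabled"
printf '<VirtualHost *:80>\n</VirtualHost>\n' > "$demo/sites-available/mysite.conf"

# "a2ensite mysite.conf" is essentially:
ln -s ../sites-available/mysite.conf "$demo/sites-enabled/mysite.conf"
readlink "$demo/sites-enabled/mysite.conf"
# -> ../sites-available/mysite.conf

# "a2dissite mysite.conf" is essentially:
rm "$demo/sites-enabled/mysite.conf"
```

The config file itself is untouched either way -- only the symlink in sites-enabled/ decides whether Apache loads it.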


VirtualHost Configuration

VirtualHosts are Apache's equivalent of Nginx's server blocks.

Basic VirtualHost

$ sudo nano /etc/apache2/sites-available/mysite.conf
<VirtualHost *:80>
    ServerName mysite.example.com
    ServerAlias www.mysite.example.com
    DocumentRoot /var/www/mysite

    <Directory /var/www/mysite>
        AllowOverride All
        Require all granted
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/mysite-error.log
    CustomLog ${APACHE_LOG_DIR}/mysite-access.log combined
</VirtualHost>

Enable it:

# Debian/Ubuntu
$ sudo a2ensite mysite.conf
$ sudo systemctl reload apache2

# RHEL (just drop the file in conf.d/)
$ sudo cp mysite.conf /etc/httpd/conf.d/
$ sudo systemctl reload httpd

Hands-On: Create a VirtualHost

# Create document root and content
$ sudo mkdir -p /var/www/mysite
$ echo '<h1>Apache says hello!</h1>' | sudo tee /var/www/mysite/index.html
$ sudo chown -R www-data:www-data /var/www/mysite

# Create VirtualHost config
$ sudo tee /etc/apache2/sites-available/mysite.conf > /dev/null << 'EOF'
<VirtualHost *:80>
    ServerName mysite.example.com
    DocumentRoot /var/www/mysite

    <Directory /var/www/mysite>
        AllowOverride All
        Require all granted
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/mysite-error.log
    CustomLog ${APACHE_LOG_DIR}/mysite-access.log combined
</VirtualHost>
EOF

# Enable and reload
$ sudo a2ensite mysite.conf
$ sudo apache2ctl configtest      # Test config (like nginx -t)
Syntax OK
$ sudo systemctl reload apache2

# Test
$ curl -H "Host: mysite.example.com" http://localhost
<h1>Apache says hello!</h1>

Disabling Sites

# Debian/Ubuntu
$ sudo a2dissite mysite.conf
$ sudo systemctl reload apache2

Think About It: Both Apache and Nginx use the Host header to route requests to virtual hosts. What happens if you do not set a ServerName in your VirtualHost? (Answer: Apache will use the first VirtualHost it finds as the default, similar to Nginx's default_server.)


The .htaccess File

This is Apache's killer feature that Nginx does not have. .htaccess files allow per-directory configuration without editing the main config or reloading Apache.

How .htaccess Works

When Apache receives a request for /var/www/mysite/blog/post.html, it checks for .htaccess files in every directory along the path:

/var/www/.htaccess            (if exists, apply it)
/var/www/mysite/.htaccess     (if exists, apply it -- overrides parent)
/var/www/mysite/blog/.htaccess (if exists, apply it -- overrides parent)

Enabling .htaccess

.htaccess processing is controlled by AllowOverride:

<Directory /var/www/mysite>
    AllowOverride All          # Allow .htaccess to override everything
    Require all granted
</Directory>

AllowOverride options:

  • None -- .htaccess files are completely ignored (best for performance)
  • All -- .htaccess can override any directive
  • FileInfo -- allows MIME types, redirects, rewriting
  • AuthConfig -- allows authentication directives
  • Indexes -- allows directory index settings
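These options can be combined, so you can grant exactly what an application needs instead of All. A minimal sketch (the directive names are real; the path is illustrative):

```apache
<Directory /var/www/mysite>
    # Allow rewrites/redirects and auth in .htaccess, nothing else
    AllowOverride FileInfo AuthConfig
    Require all granted
</Directory>
```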

Common .htaccess Uses

# /var/www/mysite/.htaccess

# Redirect HTTP to HTTPS
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Custom error pages
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html

# Password protection
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user

# Block access to sensitive files
<FilesMatch "\.(env|git|htpasswd|log)$">
    Require all denied
</FilesMatch>

# Cache static assets
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
</IfModule>

Safety Warning: Every .htaccess file causes a filesystem stat on every request for every directory in the path. On high-traffic sites, this has measurable performance impact. In production, prefer putting directives in the main config and setting AllowOverride None.


mod_rewrite: URL Rewriting

mod_rewrite is one of Apache's most powerful and most confusing modules. It rewrites URLs based on regular expressions.

Enabling mod_rewrite

$ sudo a2enmod rewrite
$ sudo systemctl restart apache2

Common Rewrite Rules

# In VirtualHost context (in .htaccess, drop each leading "/" --
# per-directory rewrites match paths relative to that directory)

RewriteEngine On

# Redirect old URLs to new ones
RewriteRule ^/old-page$ /new-page [R=301,L]

# Pretty URLs: /products/42 -> /product.php?id=42
RewriteRule ^/products/([0-9]+)$ /product.php?id=$1 [L,QSA]

# Remove trailing slashes
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]

# Front controller pattern (like WordPress, Laravel)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /index.php [L]

Rewrite flags:

  • [L] -- Last rule, stop processing
  • [R=301] -- External redirect with status code
  • [QSA] -- Append query string to the rewritten URL
  • [NC] -- Case-insensitive matching
  • [F] -- Return 403 Forbidden
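The capture-group mechanics are ordinary regular expressions, which you can experiment with before touching Apache. A sketch of the pretty-URL rule above, emulated with sed (illustration only -- Apache uses PCRE, sed -E uses POSIX ERE, but captures behave the same here):

```shell
# Emulate: RewriteRule ^/products/([0-9]+)$ /product.php?id=$1
echo "/products/42" | sed -E 's#^/products/([0-9]+)$#/product.php?id=\1#'
# -> /product.php?id=42

# A non-matching path passes through unchanged, like a rule that does not match
echo "/about" | sed -E 's#^/products/([0-9]+)$#/product.php?id=\1#'
# -> /about
```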

Hands-On: Set Up URL Rewriting

# Enable mod_rewrite
$ sudo a2enmod rewrite
$ sudo systemctl restart apache2

# Create a test site
$ sudo mkdir -p /var/www/rewrite-demo
$ echo "Main page" | sudo tee /var/www/rewrite-demo/index.html
$ echo "User profile page" | sudo tee /var/www/rewrite-demo/user.php

# Create .htaccess with rewrite rules
$ sudo tee /var/www/rewrite-demo/.htaccess > /dev/null << 'EOF'
RewriteEngine On
# /users/alice -> /user.php?name=alice
RewriteRule ^users/([a-z]+)$ user.php?name=$1 [L,QSA]
EOF

# Update VirtualHost to allow overrides
# (ensure AllowOverride All in the Directory block)

# Test
$ curl "http://localhost/users/alice"
# Should serve user.php with name=alice (as raw text here, since no PHP handler is configured)

Enabling and Disabling Modules

Apache's module system is one of its greatest strengths. Modules are loaded dynamically -- no recompilation needed.

Debian/Ubuntu: a2enmod / a2dismod

# List available modules
$ ls /etc/apache2/mods-available/ | head -20

# List enabled modules
$ apachectl -M

# Enable a module
$ sudo a2enmod ssl
$ sudo a2enmod headers
$ sudo a2enmod proxy
$ sudo a2enmod proxy_http

# Disable a module
$ sudo a2dismod autoindex

# Always restart after changing modules
$ sudo systemctl restart apache2

RHEL/CentOS/Fedora

On RHEL, modules are managed through config files in /etc/httpd/conf.modules.d/:

# List loaded modules
$ httpd -M

# Modules are loaded via LoadModule directives
$ cat /etc/httpd/conf.modules.d/00-base.conf
LoadModule mpm_event_module modules/mod_mpm_event.so
LoadModule unixd_module modules/mod_unixd.so
...

# To disable a module, comment out its LoadModule line
$ sudo sed -i 's/^LoadModule autoindex/# LoadModule autoindex/' /etc/httpd/conf.modules.d/00-base.conf
$ sudo systemctl restart httpd

Essential Modules

Module              Purpose
mod_ssl             HTTPS/TLS support
mod_rewrite         URL rewriting
mod_headers         Custom HTTP headers
mod_proxy           Reverse proxy functionality
mod_proxy_http      HTTP backend proxying
mod_proxy_wstunnel  WebSocket proxying
mod_deflate         Response compression (gzip)
mod_expires         Cache control headers
mod_auth_basic      HTTP Basic authentication
mod_security        Web Application Firewall (WAF)

Basic Authentication

Apache has built-in support for HTTP Basic Auth:

# Create a password file
$ sudo htpasswd -c /etc/apache2/.htpasswd admin
New password: ****
Re-type new password: ****
Adding password for user admin

# Add another user (-c creates the file; omit for subsequent users)
$ sudo htpasswd /etc/apache2/.htpasswd developer
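If you need to create users non-interactively (in a provisioning script, say), htpasswd -nb prints an entry instead of prompting, and openssl can generate the same apr1 hash format that htpasswd uses by default. A sketch -- the username, salt, and password here are made up:

```shell
# Generate an htpasswd-style entry without prompting.
# htpasswd's default format is Apache's apr1 (salted, iterated MD5).
hash=$(openssl passwd -apr1 -salt abcdefgh secret)
echo "admin:${hash}"
# The resulting line can be appended to /etc/apache2/.htpasswd
```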

In VirtualHost config:

<VirtualHost *:80>
    ServerName internal.example.com
    DocumentRoot /var/www/internal

    <Directory /var/www/internal>
        AuthType Basic
        AuthName "Internal Access Only"
        AuthUserFile /etc/apache2/.htpasswd
        Require valid-user
    </Directory>
</VirtualHost>

Or in .htaccess:

AuthType Basic
AuthName "Restricted"
AuthUserFile /etc/apache2/.htpasswd
Require user admin

Test it:

# Without credentials (401 Unauthorized)
$ curl -I http://internal.example.com
HTTP/1.1 401 Unauthorized

# With credentials
$ curl -u admin:password http://internal.example.com
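It is worth seeing what curl -u actually sends: Basic auth is just the credentials base64-encoded into a header, not encrypted, which is why it must only ever be used over HTTPS in production. You can reproduce the header value yourself:

```shell
# What "curl -u admin:password" puts in the Authorization header
printf 'admin:password' | base64
# -> YWRtaW46cGFzc3dvcmQ=
# i.e. the request carries: Authorization: Basic YWRtaW46cGFzc3dvcmQ=

# And base64 is trivially reversible -- this is encoding, not encryption
printf 'YWRtaW46cGFzc3dvcmQ=' | base64 -d
# -> admin:password
```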

Apache as a Reverse Proxy

Apache can function as a reverse proxy using mod_proxy:

# Enable required modules
$ sudo a2enmod proxy proxy_http proxy_balancer lbmethod_byrequests
$ sudo systemctl restart apache2

Simple Reverse Proxy

<VirtualHost *:80>
    ServerName api.example.com

    ProxyPreserveHost On
    ProxyPass / http://127.0.0.1:3000/
    ProxyPassReverse / http://127.0.0.1:3000/

    ErrorLog ${APACHE_LOG_DIR}/api-error.log
    CustomLog ${APACHE_LOG_DIR}/api-access.log combined
</VirtualHost>
  • ProxyPreserveHost On -- forward the original Host header to the backend
  • ProxyPass -- forward requests to the backend
  • ProxyPassReverse -- rewrite Location headers in responses so redirects work correctly

Load Balancing with mod_proxy_balancer

<VirtualHost *:80>
    ServerName app.example.com

    <Proxy "balancer://app_cluster">
        BalancerMember http://10.0.1.10:3000
        BalancerMember http://10.0.1.11:3000
        BalancerMember http://10.0.1.12:3000
        ProxySet lbmethod=byrequests      # Round-robin
    </Proxy>

    ProxyPreserveHost On
    ProxyPass / balancer://app_cluster/
    ProxyPassReverse / balancer://app_cluster/
</VirtualHost>

Load balancing methods:

  • byrequests -- round-robin by request count
  • bytraffic -- distribute by bytes transferred
  • bybusyness -- send to least busy worker
  • heartbeat -- use heartbeat monitoring

Hands-On: Complete Apache Setup

Let us set up a realistic Apache configuration:

# 1. Install Apache and enable essential modules
$ sudo apt update && sudo apt install -y apache2
$ sudo a2enmod rewrite ssl headers proxy proxy_http
$ sudo systemctl restart apache2

# 2. Create a site with URL rewriting and security headers
$ sudo mkdir -p /var/www/production
$ echo '<h1>Production Site</h1>' | sudo tee /var/www/production/index.html
$ sudo chown -R www-data:www-data /var/www/production

# 3. Create the VirtualHost
$ sudo tee /etc/apache2/sites-available/production.conf > /dev/null << 'CONF'
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/production

    <Directory /var/www/production>
        AllowOverride None
        Require all granted

        # Rewrite rules (in config, not .htaccess, for performance)
        RewriteEngine On
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^ /index.html [L]
    </Directory>

    # Security headers
    Header always set X-Frame-Options "SAMEORIGIN"
    Header always set X-Content-Type-Options "nosniff"
    Header always set X-XSS-Protection "1; mode=block"
    Header always set Referrer-Policy "strict-origin-when-cross-origin"

    # Hide Apache version
    ServerSignature Off

    # Block hidden files
    <FilesMatch "^\.">
        Require all denied
    </FilesMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/production-error.log
    CustomLog ${APACHE_LOG_DIR}/production-access.log combined
</VirtualHost>
CONF

# 4. Enable the site, disable the default
$ sudo a2ensite production.conf
$ sudo a2dissite 000-default.conf
$ sudo apache2ctl configtest
$ sudo systemctl reload apache2

# 5. Test
$ curl -I http://localhost
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff

Debug This

After enabling mod_rewrite and adding rewrite rules in .htaccess, your URLs are not being rewritten. All requests return the literal file path.

Debugging steps:

# 1. Is mod_rewrite loaded?
$ apachectl -M | grep rewrite
 rewrite_module (shared)
# If this is empty, run: sudo a2enmod rewrite && sudo systemctl restart apache2

# 2. Is AllowOverride set correctly?
$ grep -A3 "Directory /var/www" /etc/apache2/sites-enabled/mysite.conf
# If AllowOverride is "None", .htaccess is being ignored.
# Change to "AllowOverride All" (or "AllowOverride FileInfo")

# 3. Is RewriteEngine On in .htaccess?
$ cat /var/www/mysite/.htaccess
# Forgetting "RewriteEngine On" is the #1 mistake

# 4. Check Apache error log for rewrite issues
$ sudo tail -20 /var/log/apache2/error.log

# 5. Enable rewrite logging (temporarily, for debugging)
# In VirtualHost:
# LogLevel alert rewrite:trace3
$ sudo systemctl reload apache2
# Now check error.log for detailed rewrite processing

Common causes:

  • AllowOverride None in the VirtualHost (most common)
  • mod_rewrite not enabled
  • Missing RewriteEngine On in .htaccess
  • .htaccess file has wrong permissions (Apache cannot read it)

Configuration Testing

Always test Apache configuration before reloading:

# Debian/Ubuntu
$ sudo apache2ctl configtest
Syntax OK

# Or
$ sudo apachectl -t
Syntax OK

# RHEL/CentOS
$ sudo httpd -t
Syntax OK

For a verbose dump of the entire parsed configuration:

# Show all virtual hosts
$ sudo apachectl -S
VirtualHost configuration:
*:80                   www.example.com (/etc/apache2/sites-enabled/production.conf:1)

# Show all loaded modules
$ sudo apachectl -M

# Show compiled-in modules
$ sudo apachectl -l

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                     Chapter 46 Recap                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Apache HTTP Server uses MPM models for concurrency:          │
│  - prefork: one process per connection (legacy)               │
│  - worker: threads within processes                           │
│  - event: async keep-alive handling (recommended)             │
│                                                              │
│  Key concepts:                                                │
│  - VirtualHost = virtual host (like Nginx server blocks)      │
│  - .htaccess = per-directory config (Apache's unique feature) │
│  - mod_rewrite = powerful URL rewriting                       │
│  - a2enmod/a2dismod = enable/disable modules (Debian)         │
│                                                              │
│  Apache vs Nginx:                                             │
│  - Apache: .htaccess, mod_php, dynamic modules, shared hosting│
│  - Nginx: performance, memory efficiency, reverse proxy       │
│  - Best pattern: Nginx in front, Apache behind for PHP/legacy │
│                                                              │
│  Config:  /etc/apache2/ (Debian)  /etc/httpd/ (RHEL)         │
│  Logs:    /var/log/apache2/       /var/log/httpd/             │
│  Test:    apache2ctl configtest   httpd -t                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: MPM Comparison

Switch between event and prefork MPMs. Use ps aux | grep apache to compare the process model. With prefork, start ab -n 100 -c 10 http://localhost/ (Apache Bench) and watch processes spawn.

$ sudo apt install -y apache2-utils
$ ab -n 1000 -c 50 http://localhost/

Exercise 2: .htaccess Mastery

Create a site with these .htaccess rules:

  • Password-protect the /admin/ directory
  • Set up a custom 404 page
  • Redirect /old-blog to /blog with a 301
  • Block requests with empty User-Agent headers

Exercise 3: Reverse Proxy

Set up Apache as a reverse proxy for a Python (python3 -m http.server 8888) or Node.js backend. Verify that ProxyPreserveHost correctly passes the Host header.

Exercise 4: Module Exploration

Enable mod_status to get a real-time Apache status page:

<Location "/server-status">
    SetHandler server-status
    Require ip 127.0.0.1
</Location>

Visit http://localhost/server-status and explore the information provided.

Bonus Challenge

Set up Nginx in front of Apache. Nginx handles static files and proxies PHP requests to Apache. Apache runs mod_php to process the PHP. This is the "Nginx + Apache" pattern used by many high-traffic WordPress sites.


What Comes Next

Both Nginx and Apache can do load balancing, but they are web servers first and load balancers second. In the next chapter, we cover HAProxy -- a tool that is a load balancer first and foremost, with features that neither Nginx nor Apache can match.

HAProxy & Advanced Load Balancing

Why This Matters

Nginx and Apache can both load balance, but they are web servers that happen to do load balancing. HAProxy is a dedicated load balancer and reverse proxy -- load balancing is all it does, and it does it extraordinarily well.

HAProxy powers some of the highest-traffic sites on the internet: GitHub, Reddit, Stack Overflow, Tumblr, and Twitter have all relied on it. It handles millions of concurrent connections, provides fine-grained health checking, supports advanced traffic routing with ACLs, offers real-time stats, and is battle-tested in ways that few other pieces of software can claim.

If your infrastructure grows beyond a single Nginx instance handling everything, or if you need Layer 4 (TCP) load balancing, advanced health checks, or sophisticated traffic routing, HAProxy is the tool you reach for.


Try This Right Now

# Install HAProxy
$ sudo apt update && sudo apt install -y haproxy

# Check the version
$ haproxy -v
HAProxy version 2.8.5 ...

# Check if it is running
$ sudo systemctl status haproxy

Distro Note: On RHEL/CentOS/Fedora:

$ sudo dnf install -y haproxy
$ sudo systemctl enable --now haproxy

On Arch: sudo pacman -S haproxy.


What Makes HAProxy Different

┌──────────────────────────────────────────────────────────────┐
│              Web Server vs Dedicated Load Balancer             │
│                                                              │
│  Nginx / Apache:                                              │
│  ┌─────────────────────────────────────────┐                 │
│  │  Static files  │  Reverse proxy  │  LB  │                │
│  │  CGI/FastCGI   │  URL rewriting  │      │                │
│  │  Compression   │  TLS termination│      │                │
│  └─────────────────────────────────────────┘                 │
│  Load balancing is ONE feature among many.                    │
│                                                              │
│  HAProxy:                                                     │
│  ┌─────────────────────────────────────────┐                 │
│  │            LOAD BALANCING               │                 │
│  │  Layer 4 (TCP)  │  Layer 7 (HTTP)       │                 │
│  │  Health checks  │  ACL routing          │                 │
│  │  Stick tables   │  Rate limiting        │                 │
│  │  Stats dashboard│  Connection queuing    │                │
│  │  TLS termination│  Header manipulation  │                 │
│  └─────────────────────────────────────────┘                 │
│  Load balancing is THE ONLY job. Depth over breadth.          │
└──────────────────────────────────────────────────────────────┘

Key advantages of HAProxy:

  • Active health checks -- probes backends independently (not just on user request failures)
  • Layer 4 and Layer 7 -- can load balance any TCP protocol, not just HTTP
  • Connection queuing -- when all backends are at capacity, requests are queued rather than rejected
  • Stick tables -- track per-client state (request rates, connections) for advanced traffic management
  • Stats dashboard -- real-time visibility into backend health, request rates, and response times
  • Zero-downtime reloads -- seamless configuration changes with no dropped connections

HAProxy Configuration Structure

HAProxy's configuration file is typically at /etc/haproxy/haproxy.cfg. It is divided into four main section types (a fifth, listen, combines a frontend and backend in one block):

┌──────────────────────────────────────────────────────────────┐
│                HAProxy Configuration Sections                 │
│                                                              │
│  ┌─────────────────────┐                                     │
│  │  global              │  Process-wide settings              │
│  │  (security, tuning)  │  (user, chroot, logging)            │
│  └─────────────────────┘                                     │
│            │                                                  │
│  ┌─────────────────────┐                                     │
│  │  defaults            │  Default settings inherited by all  │
│  │  (timeouts, modes)   │  frontends and backends              │
│  └─────────────────────┘                                     │
│            │                                                  │
│  ┌─────────────────────┐   ┌─────────────────────┐           │
│  │  frontend            │──>│  backend             │          │
│  │  (listening side)    │   │  (server pool)       │          │
│  │  (binds to ports)    │   │  (health checks)     │          │
│  │  (ACL routing)       │   │  (load balancing)    │          │
│  └─────────────────────┘   └─────────────────────┘           │
└──────────────────────────────────────────────────────────────┘

A Complete Configuration Walkthrough

$ sudo cat /etc/haproxy/haproxy.cfg

Here is a well-annotated production configuration:

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log    local0          # Log to syslog
    log /dev/log    local1 notice
    chroot /var/lib/haproxy         # Chroot for security
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy                    # Run as unprivileged user
    group haproxy
    daemon                          # Run in background

    # TLS tuning
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2

#---------------------------------------------------------------------
# Default settings (inherited by all frontends/backends)
#---------------------------------------------------------------------
defaults
    log     global
    mode    http                    # Layer 7 mode (use "tcp" for Layer 4)
    option  httplog                 # Log full HTTP requests
    option  dontlognull             # Don't log health check probes
    timeout connect 5s              # Time to connect to backend
    timeout client  30s             # Time to wait for client data
    timeout server  30s             # Time to wait for backend response
    retries 3                       # Retry on connection failure
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

#---------------------------------------------------------------------
# Frontend: accepts incoming connections
#---------------------------------------------------------------------
frontend http_front
    bind *:80                       # Listen on port 80 on all interfaces
    default_backend web_servers     # Where to send traffic by default

#---------------------------------------------------------------------
# Backend: pool of servers
#---------------------------------------------------------------------
backend web_servers
    balance roundrobin              # Load balancing algorithm
    server web1 10.0.1.10:8080 check    # "check" enables health checking
    server web2 10.0.1.11:8080 check
    server web3 10.0.1.12:8080 check

Hands-On: Your First HAProxy Setup

# Start three backends
$ for port in 8001 8002 8003; do
    dir="/tmp/habackend${port}"
    mkdir -p "$dir"
    echo "Response from backend ${port}" > "$dir/index.html"
    cd "$dir" && python3 -m http.server "$port" &
  done

# Configure HAProxy
$ sudo tee /etc/haproxy/haproxy.cfg > /dev/null << 'EOF'
global
    log /dev/log local0
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend web_frontend
    bind *:80
    default_backend web_backend

backend web_backend
    balance roundrobin
    server backend1 127.0.0.1:8001 check
    server backend2 127.0.0.1:8002 check
    server backend3 127.0.0.1:8003 check
EOF

# Validate the configuration
$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid

# Start HAProxy
$ sudo systemctl restart haproxy

# Test round-robin
$ for i in $(seq 1 6); do curl -s http://localhost; done
Response from backend 8001
Response from backend 8002
Response from backend 8003
Response from backend 8001
Response from backend 8002
Response from backend 8003

Layer 4 vs Layer 7 Load Balancing

This is one of HAProxy's major differentiators.

Layer 4 (TCP Mode)

HAProxy forwards raw TCP connections. It does not inspect HTTP content -- it just passes bytes between client and backend. Use this for databases, mail servers, or any non-HTTP protocol.

defaults
    mode tcp                        # Layer 4 mode

frontend mysql_front
    bind *:3306
    default_backend mysql_servers

backend mysql_servers
    balance roundrobin
    server db1 10.0.1.20:3306 check
    server db2 10.0.1.21:3306 check

Layer 7 (HTTP Mode)

HAProxy understands HTTP. It can inspect headers, URLs, cookies, and make routing decisions based on content.

defaults
    mode http                       # Layer 7 mode

frontend http_front
    bind *:80

    # Route based on URL path
    acl is_api path_beg /api
    acl is_static path_end .css .js .png .jpg

    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend web_servers

Comparison

┌──────────────────────────────────────────────────────────────┐
│              Layer 4 vs Layer 7                                │
├──────────────────┬───────────────────────────────────────────┤
│                  │  Layer 4 (TCP)      │  Layer 7 (HTTP)     │
├──────────────────┼─────────────────────┼─────────────────────┤
│ Inspects content │  No                 │  Yes (headers, URLs)│
│ Protocols        │  Any TCP            │  HTTP/HTTPS only    │
│ Routing options  │  IP, port           │  URL, header, cookie│
│ Performance      │  Faster (no parsing)│  Slightly slower    │
│ Use cases        │  DB, mail, SSH      │  Web apps, APIs     │
│ SSL termination  │  Passthrough or term│  Full termination   │
└──────────────────┴─────────────────────┴─────────────────────┘

Think About It: You need to load-balance PostgreSQL connections across three database replicas. Should you use Layer 4 or Layer 7? Why? (Answer: Layer 4, because PostgreSQL does not speak HTTP. HAProxy only needs to forward the raw TCP connection.)


ACLs: Intelligent Traffic Routing

Access Control Lists (ACLs) let you define conditions and route traffic based on them. This is one of HAProxy's most powerful features.

frontend http_front
    bind *:80

    # Define ACLs
    acl is_api        path_beg /api/
    acl is_admin      path_beg /admin/
    acl is_websocket  hdr(Upgrade) -i websocket
    acl is_post       method POST
    acl from_internal src 10.0.0.0/8 192.168.0.0/16
    acl is_mobile     hdr_sub(User-Agent) -i mobile android iphone

    # Route based on ACLs
    use_backend api_servers      if is_api
    use_backend admin_servers    if is_admin from_internal     # AND logic
    use_backend ws_servers       if is_websocket
    use_backend upload_servers   if is_api is_post             # API POST -> upload servers
    default_backend web_servers
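Two composition rules are worth memorizing: multiple ACL names on one use_backend line are ANDed (as in the admin line above), repeating the same acl name on separate lines ORs the conditions, and ! negates. A hedged sketch (the ACL and backend names are illustrative):

```
    # "bad_client" matches if EITHER condition holds (repeated acl = OR)
    acl bad_client src 203.0.113.0/24
    acl bad_client hdr_sub(User-Agent) -i scanner

    http-request deny if bad_client

    # AND plus negation: internal traffic that is NOT an API call
    use_backend intranet_servers if from_internal !is_api
```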

ACL Matching Functions

Function   What It Matches                  Example
path_beg   URL path starts with             path_beg /api/
path_end   URL path ends with               path_end .php
path       Exact URL path match             path /health
hdr        HTTP header exact match          hdr(Host) -i api.example.com
hdr_beg    HTTP header starts with          hdr_beg(Host) -i api.
hdr_sub    HTTP header contains substring   hdr_sub(User-Agent) -i curl
src        Source IP address/range          src 10.0.0.0/8
method     HTTP method                      method POST
ssl_fc     Connection is over SSL           ssl_fc

Hands-On: Content-Based Routing

frontend http_front
    bind *:80

    # Different backends for different domains
    acl host_blog hdr(Host) -i blog.example.com
    acl host_api  hdr(Host) -i api.example.com
    acl host_shop hdr(Host) -i shop.example.com

    use_backend blog_backend if host_blog
    use_backend api_backend  if host_api
    use_backend shop_backend if host_shop
    default_backend web_backend      # fallback pool (defined like the others)

backend blog_backend
    balance roundrobin
    server blog1 10.0.1.10:8080 check
    server blog2 10.0.1.11:8080 check

backend api_backend
    balance leastconn
    server api1 10.0.2.10:3000 check
    server api2 10.0.2.11:3000 check
    server api3 10.0.2.12:3000 check

backend shop_backend
    balance source
    server shop1 10.0.3.10:8080 check
    server shop2 10.0.3.11:8080 check

Health Checks

HAProxy's health checking is far more sophisticated than Nginx's passive checks.

TCP Health Check (Default)

When you add check to a server line, HAProxy opens a TCP connection to verify the backend is alive:

server web1 10.0.1.10:8080 check inter 5s fall 3 rise 2
  • inter 5s -- check every 5 seconds
  • fall 3 -- mark as down after 3 consecutive failures
  • rise 2 -- mark as up after 2 consecutive successes
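The interplay of inter and fall determines how long a dead backend can keep receiving traffic before HAProxy notices. A back-of-the-envelope helper (a sketch; real timing also depends on when the last successful check ran):

```shell
#!/bin/sh
# Worst-case seconds before HAProxy marks a server DOWN:
# roughly one check interval per consecutive failure required.
detect_down_secs() {
    inter=$1   # seconds between checks (the "inter" value)
    fall=$2    # consecutive failures needed (the "fall" value)
    echo $(( inter * fall ))
}

detect_down_secs 5 3    # the server line above: prints 15
```

So with the settings above, a crashed backend can absorb traffic for up to about 15 seconds. Tighten inter and fall for faster detection, at the cost of more check traffic.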

HTTP Health Check

Verify the backend returns a proper HTTP response:

backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ localhost
    http-check expect status 200
    server web1 10.0.1.10:8080 check
    server web2 10.0.1.11:8080 check

This sends an actual HTTP GET to /health and expects a 200 status code. If the backend returns 500 (because its database is down, for example), HAProxy removes it from the pool.

Advanced Health Check

backend api_servers
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host localhost
    http-check expect status 200
    http-check expect string "OK"      # Also verify response body

    server api1 10.0.2.10:3000 check inter 3s fall 2 rise 3
    server api2 10.0.2.11:3000 check inter 3s fall 2 rise 3

Server States

┌──────────────────────────────────────────────────────────────┐
│                    HAProxy Server States                     │
│                                                              │
│  UP ──(health check fails)──> check fail count increases     │
│     ──(fall threshold)──> DOWN                               │
│                                                              │
│  DOWN ──(health check succeeds)──> check pass count increases│
│        ──(rise threshold)──> UP                              │
│                                                              │
│  Additional states:                                          │
│  - MAINT:  manually put into maintenance                     │
│  - DRAIN:  accepting no new connections, finishing existing  │
│  - NOLB:   not in load balancing pool but still checked      │
└──────────────────────────────────────────────────────────────┘

Stick Tables

Stick tables let HAProxy track per-client state. This is powerful for rate limiting, session persistence, and abuse detection -- all without external tools.

Session Persistence (Sticky Sessions)

backend app_servers
    balance roundrobin
    stick-table type ip size 200k expire 30m
    stick on src
    server app1 10.0.1.10:8080 check
    server app2 10.0.1.11:8080 check

This creates a table of client IPs. When a client first connects, they are assigned a backend server. Subsequent connections from the same IP go to the same server for 30 minutes.
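stick on src keys persistence to the client IP, which breaks down when many users sit behind one NAT or proxy address -- they all land on the same server. A common alternative is cookie-based persistence, where HAProxy inserts a cookie naming the chosen server. A sketch using the same backend:

```
backend app_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server app1 10.0.1.10:8080 check cookie app1
    server app2 10.0.1.11:8080 check cookie app2
```

Each browser gets a SERVERID cookie on its first response and is routed by it afterward, so persistence follows the client rather than the source address.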

Rate Limiting with Stick Tables

frontend http_front
    bind *:80

    # Track request rate per client IP
    stick-table type ip size 100k expire 30s store http_req_rate(10s)

    # Count requests
    http-request track-sc0 src

    # Deny if more than 100 requests in 10 seconds
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }

    default_backend web_servers

Connection Limiting

frontend http_front
    bind *:80

    stick-table type ip size 100k expire 30s store conn_cur

    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_conn_cur(0) gt 50 }

    default_backend web_servers

Think About It: How are HAProxy's stick tables different from Nginx's limit_req for rate limiting? (Answer: Stick tables are far more flexible. They can track any counter -- connection rate, request rate, bytes transferred, error rates -- and combine multiple conditions. They can also be synchronized between HAProxy nodes in a cluster.)
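That last point -- synchronizing stick tables between nodes -- uses a peers section. A minimal sketch, assuming two load balancers at 10.0.0.1 and 10.0.0.2:

```
peers lb_cluster
    # each peer name must match that node's hostname (or the haproxy -L name)
    peer lb1 10.0.0.1:10000
    peer lb2 10.0.0.2:10000

frontend http_front
    bind *:80
    stick-table type ip size 100k expire 30s store http_req_rate(10s) peers lb_cluster
    http-request track-sc0 src
    default_backend web_servers
```

With this, a client rate-limited on one node is already known to the other -- useful when a failover (covered below) would otherwise reset all counters.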


SSL/TLS Termination

HAProxy can terminate TLS and forward plain HTTP to backends:

frontend https_front
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    bind *:80

    # Redirect HTTP to HTTPS
    http-request redirect scheme https unless { ssl_fc }

    # Forward the protocol info to backends
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto http  unless { ssl_fc }

    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:8080 check
    server web2 10.0.1.11:8080 check

The certificate file must contain the certificate and private key concatenated:

# Combine cert and key into one file (HAProxy's required format)
$ sudo cat /etc/letsencrypt/live/example.com/fullchain.pem \
           /etc/letsencrypt/live/example.com/privkey.pem \
    | sudo tee /etc/haproxy/certs/example.com.pem > /dev/null   # don't echo the key

$ sudo chmod 600 /etc/haproxy/certs/example.com.pem
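Certificate renewals need the same concatenation every time, so it is worth scripting. A hedged sketch (the function name is ours; wire it into your renewal tooling, such as a certbot deploy hook, as you see fit):

```shell
#!/bin/sh
# Build HAProxy's single-file certificate format: full chain, then private key.
# Keeps the key off your terminal and locks down permissions.
make_haproxy_pem() {
    chain=$1    # e.g. /etc/letsencrypt/live/example.com/fullchain.pem
    key=$2      # e.g. /etc/letsencrypt/live/example.com/privkey.pem
    out=$3      # e.g. /etc/haproxy/certs/example.com.pem
    cat "$chain" "$key" > "$out"
    chmod 600 "$out"
}
```

After calling it, reload HAProxy so the new certificate is picked up.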

SNI-Based Routing (Multiple Domains on One IP)

frontend https_front
    bind *:443 ssl crt /etc/haproxy/certs/

    # Route based on the TLS SNI hostname
    use_backend blog_servers  if { ssl_fc_sni blog.example.com }
    use_backend api_servers   if { ssl_fc_sni api.example.com }
    default_backend web_servers

When crt points to a directory, HAProxy loads all .pem files in it and selects the right certificate based on the SNI hostname.


The Stats Dashboard

HAProxy comes with a built-in statistics dashboard that provides real-time visibility into your entire load balancing setup:

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:secretpassword
    stats admin if LOCALHOST          # Allow admin actions from localhost

$ sudo systemctl reload haproxy

# View stats in terminal
$ curl -u admin:secretpassword http://localhost:8404/stats

# Or open in a browser: http://your-server:8404/stats

The dashboard shows:

  • Frontend -- incoming connection rates, bytes in/out
  • Backend -- each server's status (UP/DOWN), current connections, request rate, response times, error counts
  • Server details -- health check status, weight, last status change
  • Session rates -- current, maximum, and limit

Safety Warning: The stats page exposes sensitive information about your infrastructure. Always protect it with authentication and restrict access by IP. Never expose it to the public internet.

Hands-On: Enable and Explore Stats

# Add stats to your configuration
$ sudo tee -a /etc/haproxy/haproxy.cfg > /dev/null << 'EOF'

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:haproxy123
EOF

$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg
$ sudo systemctl reload haproxy

# Access the stats
$ curl -u admin:haproxy123 http://localhost:8404/stats
# (Better viewed in a web browser for the HTML dashboard)

# You can also get stats in CSV format
$ curl -s -u admin:haproxy123 "http://localhost:8404/stats;csv"
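The CSV output is easy to post-process. A sketch that reduces it to one status line per row (in the CSV layout, field 1 is pxname, field 2 is svname, and field 18 is status; pipe the curl output above into it):

```shell
#!/bin/sh
# Print "proxy/server STATUS" for every row of HAProxy CSV stats on stdin.
stats_summary() {
    awk -F, 'NR > 1 { print $1 "/" $2 " " $18 }'
}

# Example with a captured line of CSV output:
printf '%s\n%s\n' \
    '# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status' \
    'web_backend,web1,0,0,2,5,,120,9000,81000,,0,,0,0,0,0,UP' | stats_summary
# prints: web_backend/web1 UP
```

The same pattern works for any column -- swap $18 for the field you care about, checking the header line for its position in your HAProxy version.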

HAProxy vs Nginx for Load Balancing

┌──────────────────────────────────────────────────────────────┐
│              HAProxy vs Nginx for Load Balancing               │
├──────────────────┬───────────────────┬───────────────────────┤
│ Feature          │ HAProxy           │ Nginx (open source)   │
├──────────────────┼───────────────────┼───────────────────────┤
│ Active health    │ Yes (built-in)    │ No (passive only;     │
│ checks           │                   │ active in Nginx Plus) │
│ Layer 4 (TCP)    │ Full support      │ Via stream module     │
│ Stats dashboard  │ Built-in, rich    │ Basic stub_status     │
│ Stick tables     │ Yes               │ No                    │
│ Connection queue │ Yes (with limits) │ No (rejects excess)   │
│ Zero-downtime    │ Yes (seamless)    │ Yes (reload)          │
│ reload           │                   │                       │
│ Serve static     │ No                │ Yes (excellent)       │
│ files            │                   │                       │
│ ACL routing      │ Very powerful     │ Via location/map      │
│ Rate limiting    │ Via stick tables  │ Via limit_req module  │
│ Config           │ Single flat file  │ Hierarchical includes │
│ Ecosystem        │ LB only           │ LB + web server + more│
└──────────────────┴───────────────────┴───────────────────────┘

In practice, many production architectures use both:

Internet → HAProxy (Layer 4/7 LB, global routing, stats)
            → Nginx (TLS termination, static files, caching)
               → Application backends

High Availability Concepts

A single HAProxy instance is a single point of failure. Production environments need redundancy.

Active-Passive with Keepalived

The most common HA pattern uses Keepalived with a Virtual IP (VIP):

┌──────────────────────────────────────────────────────────────┐
│              High Availability with Keepalived               │
│                                                              │
│                  ┌───────────────┐                           │
│                  │  Virtual IP   │                           │
│                  │ 192.168.1.100 │                           │
│                  └───────┬───────┘                           │
│                          │                                   │
│               ┌──────────┴──────────┐                        │
│               │                     │                        │
│         ┌─────┴──────┐       ┌─────┴──────┐                  │
│         │ HAProxy 1  │       │ HAProxy 2  │                  │
│         │  (MASTER)  │       │  (BACKUP)  │                  │
│         │   .1.101   │       │   .1.102   │                  │
│         └────────────┘       └────────────┘                  │
│                                                              │
│  - The VIP floats to whichever node is MASTER                │
│  - If the master fails, keepalived moves the VIP to backup   │
│  - Clients always connect to the VIP, never directly         │
│  - Failover happens in seconds                               │
└──────────────────────────────────────────────────────────────┘

# Install keepalived
$ sudo apt install -y keepalived

# /etc/keepalived/keepalived.conf on HAProxy node 1 (MASTER)
$ sudo tee /etc/keepalived/keepalived.conf > /dev/null << 'EOF'
vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecret
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        check_haproxy
    }
}
EOF

On the backup node, change state MASTER to state BACKUP and priority 101 to priority 100.


Practical: Complete Production Configuration

Here is a comprehensive HAProxy configuration that ties all the concepts together:

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    user haproxy
    group haproxy
    daemon
    maxconn 50000
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    option  forwardfor          # Add X-Forwarded-For header
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout http-keep-alive 10s
    timeout check   5s
    retries 3
    default-server inter 3s fall 3 rise 2

#---------------------------------------------------------------------
# Stats dashboard
#---------------------------------------------------------------------
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:secretpassword

#---------------------------------------------------------------------
# HTTP frontend (redirect to HTTPS)
#---------------------------------------------------------------------
frontend http_front
    bind *:80
    http-request redirect scheme https code 301 unless { ssl_fc }

#---------------------------------------------------------------------
# HTTPS frontend (main entry point)
#---------------------------------------------------------------------
frontend https_front
    bind *:443 ssl crt /etc/haproxy/certs/

    # Security headers
    http-response set-header X-Frame-Options SAMEORIGIN
    http-response set-header X-Content-Type-Options nosniff
    http-response set-header Strict-Transport-Security max-age=31536000

    # Rate limiting
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 200 }

    # ACL-based routing
    acl is_api        path_beg /api/
    acl is_websocket  hdr(Upgrade) -i websocket
    acl is_admin      path_beg /admin/
    acl from_office   src 203.0.113.0/24

    # Routing rules
    use_backend api_backend      if is_api
    use_backend ws_backend       if is_websocket
    use_backend admin_backend    if is_admin from_office
    default_backend web_backend

#---------------------------------------------------------------------
# Backends
#---------------------------------------------------------------------
backend web_backend
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server web1 10.0.1.10:8080 check weight 100
    server web2 10.0.1.11:8080 check weight 100
    server web3 10.0.1.12:8080 check weight 50    # Less powerful server

backend api_backend
    balance leastconn
    option httpchk GET /api/health
    http-check expect status 200
    timeout server 60s                  # APIs may need longer timeout
    server api1 10.0.2.10:3000 check
    server api2 10.0.2.11:3000 check
    server api3 10.0.2.12:3000 check

backend ws_backend
    balance source                      # Sticky for WebSocket
    timeout tunnel 3600s                # Long timeout for WebSocket
    server ws1 10.0.3.10:3001 check
    server ws2 10.0.3.11:3001 check

backend admin_backend
    balance roundrobin
    server admin1 10.0.4.10:8080 check

Debug This

Users report that some requests are taking 30 seconds before returning a 504 Gateway Timeout. The stats dashboard shows all backends as "UP."

# Step 1: Check the stats page for response time data
# Look at the "Avg" and "Max" time columns in the backend section
$ curl -s -u admin:pass http://localhost:8404/stats\;csv | \
    awk -F, '/web_backend/ {print $2, "rtime="$61}'
# (rtime, the average response time, is field 61 here; field positions can
#  shift between HAProxy versions, so check the CSV header line)

# Step 2: Check HAProxy logs for slow requests
$ sudo journalctl -u haproxy --since "10 minutes ago" | grep "30000"
# (30000ms = 30 seconds = timeout)

# Step 3: Is it a specific backend server?
# Check per-server stats in the dashboard

# Step 4: Check the backend directly (bypass HAProxy)
$ curl -o /dev/null -s -w "Total: %{time_total}s\n" http://10.0.1.10:8080/slow-endpoint

# Step 5: Check if the timeout is configured correctly
$ grep "timeout server" /etc/haproxy/haproxy.cfg
timeout server  30s
# 30s timeout matches the 30-second delay!

Common causes:

  • Backend application is slow on certain endpoints (increase timeout server for that backend)
  • Backend server is overloaded (check CPU/memory on the backend)
  • Connection pool exhaustion (check maxconn on server lines)
  • DNS resolution issues (HAProxy resolves at startup; backends changed IP?)

Configuration Validation and Reload

# Always validate before reloading
$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid

# Reload (zero downtime -- new process takes over from old)
$ sudo systemctl reload haproxy

# If reload fails, check the journal
$ sudo journalctl -u haproxy --since "1 minute ago"

HAProxy's reload is seamless: it starts a new process, transfers listening sockets to it, and the old process finishes existing connections before exiting. No connections are dropped.
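The validate-then-reload sequence is worth wrapping so a config typo can never take the proxy down. A sketch (the check and reload commands are injectable here only so the function is easy to exercise without a live HAProxy):

```shell
#!/bin/sh
# Reload HAProxy only if the config validates first.
# $2/$3 default to the real commands; override them for dry runs.
safe_reload() {
    cfg=$1
    check=${2:-"haproxy -c -f"}
    reload=${3:-"systemctl reload haproxy"}
    if $check "$cfg"; then
        $reload
    else
        echo "refusing to reload: $cfg failed validation" >&2
        return 1
    fi
}
```

In day-to-day use: safe_reload /etc/haproxy/haproxy.cfg (run as root). A bad edit produces the refusal message and leaves the running config untouched.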


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                     Chapter 47 Recap                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  HAProxy is a dedicated load balancer and reverse proxy.      │
│                                                              │
│  Configuration sections:                                      │
│  - global:    process-wide settings                           │
│  - defaults:  inherited defaults                              │
│  - frontend:  where connections come in (binds to ports)      │
│  - backend:   where connections go (server pools)             │
│                                                              │
│  Key capabilities:                                            │
│  - Layer 4 (TCP) and Layer 7 (HTTP) load balancing            │
│  - ACLs for content-based routing                             │
│  - Active health checks (probes backends independently)       │
│  - Stick tables for rate limiting and session persistence     │
│  - Built-in stats dashboard                                   │
│  - Zero-downtime reloads                                      │
│                                                              │
│  For high availability: pair with Keepalived + Virtual IP     │
│                                                              │
│  HAProxy excels at load balancing. Nginx excels at web        │
│  serving. They complement each other beautifully.             │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Multi-Backend Routing

Configure HAProxy to route traffic based on URL path: /api/ goes to one backend pool, /static/ goes to another, and everything else goes to a default backend. Verify with curl.

Exercise 2: Health Checks in Action

Set up HAProxy with three backends and HTTP health checks. Kill one backend and observe:

  • The stats dashboard shows it going DOWN
  • Requests are no longer sent to it
  • Start it back up and watch it return to UP

# Kill backend 2
$ kill $(lsof -ti:8002)
# Check stats: backend2 should show DOWN after a few seconds

# Restart it
$ cd /tmp/habackend8002 && python3 -m http.server 8002 &
# Check stats: backend2 should return to UP

Exercise 3: Rate Limiting

Implement rate limiting with stick tables (max 10 requests per second). Use a bash loop to send rapid requests and observe the 429 responses.

Exercise 4: Stats Dashboard

Enable the stats dashboard and explore it. Identify: current session rate, backend health status, response time averages, and error counts.

Bonus Challenge

Set up a complete HA architecture:

  1. Two HAProxy instances with Keepalived (MASTER/BACKUP) sharing a VIP
  2. Three web backends behind them
  3. HTTP health checks
  4. Stats dashboard on each HAProxy node
  5. Test failover by stopping HAProxy on the master node

This is the exact architecture used in many production environments.


What Comes Next

With web servers and load balancing mastered, you have the skills to build and manage production web infrastructure. The next section of the book moves into storage, backup, and disaster recovery -- because all that web traffic needs data, and data needs protection.

Disk Management in Production

Why This Matters

It is 2 AM and your monitoring alerts you: the database server's /var/lib/mysql partition is 95% full. The database will stop accepting writes within hours. If this were a traditional fixed partition, you would be looking at downtime -- adding a new disk, copying data, resizing partitions, and praying nothing goes wrong.

But this server uses LVM. You attach a new disk, extend the volume group, grow the logical volume, and resize the filesystem -- all without unmounting, all without downtime. The database never notices.

On another server, one of three disks in a RAID array has failed. The array is still serving data because RAID provides redundancy. You hot-swap the failed disk, add the replacement, and the array rebuilds itself while the application keeps running.

This is what production disk management looks like. LVM gives you flexibility. RAID gives you resilience. Together, they are the foundation of reliable storage in any serious Linux environment. This chapter teaches you both.


Try This Right Now

Check your current disk and partition layout:

$ lsblk
$ df -hT
$ cat /proc/mdstat    # shows RAID status (empty if no RAID)
$ sudo lvs 2>/dev/null    # shows logical volumes (empty if no LVM)
$ sudo pvs 2>/dev/null    # shows physical volumes
$ sudo vgs 2>/dev/null    # shows volume groups

If these LVM commands return nothing, you may not have LVM set up yet -- which is exactly what we are about to learn.


LVM: Logical Volume Management

The Problem LVM Solves

Traditional partitioning is rigid. When you create a 50 GB partition for /home, that is all you get. If you need more space, you have difficult choices: resize the partition (risky), move data to a bigger disk, or add a mount point and split your data.

LVM adds a layer of abstraction between your physical disks and your filesystems. This abstraction gives you the ability to:

  • Resize volumes while they are mounted and in use
  • Span a single volume across multiple physical disks
  • Take snapshots of volumes for backups
  • Move data between physical disks without downtime

The Three Layers of LVM

LVM has three layers, and understanding them is essential:

┌───────────────────────────────────────────────────────┐
│                  FILESYSTEMS                           │
│              /home    /var    /data                     │
├───────────────────────────────────────────────────────┤
│              LOGICAL VOLUMES (LV)                      │
│          lv_home   lv_var   lv_data                    │
│    (These are what you format and mount)               │
├───────────────────────────────────────────────────────┤
│              VOLUME GROUP (VG)                          │
│                   vg_main                              │
│    (A pool of storage from one or more PVs)            │
├───────────────────────────────────────────────────────┤
│             PHYSICAL VOLUMES (PV)                      │
│         /dev/sdb1       /dev/sdc1                      │
│    (Actual disk partitions or whole disks)              │
├───────────────────────────────────────────────────────┤
│              PHYSICAL DISKS                             │
│            /dev/sdb     /dev/sdc                        │
└───────────────────────────────────────────────────────┘

Physical Volume (PV): A disk or partition that has been initialized for use by LVM. Think of it as raw material entering a factory.

Volume Group (VG): A pool of storage formed by combining one or more PVs. Think of it as a warehouse where all the raw material is combined into one big pile.

Logical Volume (LV): A slice of storage carved out from a VG. This is what you actually format with a filesystem and mount. Think of it as the finished product cut from the pile.

Hands-On: Creating an LVM Setup

We will simulate this using loop devices (virtual block devices backed by files). This is safe to do on any system.

Step 1: Create virtual disks

# Create two 500 MB files to act as disks
$ sudo dd if=/dev/zero of=/tmp/disk1.img bs=1M count=500
$ sudo dd if=/dev/zero of=/tmp/disk2.img bs=1M count=500

# Attach them as loop devices
$ sudo losetup /dev/loop10 /tmp/disk1.img
$ sudo losetup /dev/loop11 /tmp/disk2.img

# Verify
$ losetup -a | grep loop1
/dev/loop10: []: (/tmp/disk1.img)
/dev/loop11: []: (/tmp/disk2.img)

Step 2: Create Physical Volumes

$ sudo pvcreate /dev/loop10 /dev/loop11
  Physical volume "/dev/loop10" successfully created.
  Physical volume "/dev/loop11" successfully created.

# Inspect them
$ sudo pvs
  PV           VG   Fmt  Attr PSize   PFree
  /dev/loop10       lvm2 ---  500.00m 500.00m
  /dev/loop11       lvm2 ---  500.00m 500.00m

$ sudo pvdisplay /dev/loop10
  "/dev/loop10" is a new physical volume of "500.00 MiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop10
  VG Name
  PV Size               500.00 MiB
  ...

Step 3: Create a Volume Group

$ sudo vgcreate vg_lab /dev/loop10 /dev/loop11
  Volume group "vg_lab" successfully created

$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  vg_lab   2   0   0 wz--n- 992.00m 992.00m

$ sudo vgdisplay vg_lab
  --- Volume group ---
  VG Name               vg_lab
  VG Size               992.00 MiB
  PE Size               4.00 MiB
  Total PE              248
  Free  PE / Size       248 / 992.00 MiB
  ...

Notice that the VG size (992 MB) is slightly less than the raw total (1000 MB) due to LVM metadata overhead.
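The "missing" space is explained by extents: a VG hands out capacity in fixed-size physical extents (PE), 4 MiB each here, and reserves a little metadata per PV. The vgdisplay numbers check out -- a throwaway helper, just to show the arithmetic:

```shell
#!/bin/sh
# Total PE * PE size = usable VG capacity in MiB.
pe_to_mib() {
    total_pe=$1
    pe_size_mib=$2
    echo $(( total_pe * pe_size_mib ))
}

pe_to_mib 248 4    # prints 992 -- matching the VG Size above
```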

Step 4: Create Logical Volumes

# Create a 400 MB logical volume
$ sudo lvcreate -n lv_data -L 400M vg_lab
  Logical volume "lv_data" created.

# Create another using 200 MB
$ sudo lvcreate -n lv_logs -L 200M vg_lab
  Logical volume "lv_logs" created.

$ sudo lvs
  LV      VG     Attr       LSize   Pool
  lv_data vg_lab -wi-a----- 400.00m
  lv_logs vg_lab -wi-a----- 200.00m

Step 5: Create filesystems and mount

# Format with ext4
$ sudo mkfs.ext4 /dev/vg_lab/lv_data
$ sudo mkfs.ext4 /dev/vg_lab/lv_logs

# Create mount points and mount
$ sudo mkdir -p /mnt/data /mnt/logs
$ sudo mount /dev/vg_lab/lv_data /mnt/data
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

# Verify
$ df -h /mnt/data /mnt/logs
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  388M  2.3M  362M   1% /mnt/data
/dev/mapper/vg_lab-lv_logs  190M  1.6M  175M   1% /mnt/logs

Think About It: We have 992 MB in the volume group, and we have allocated 600 MB to logical volumes. What happens to the remaining 392 MB? Can we use it later?


Extending and Reducing LVM Volumes

This is where LVM truly shines -- resizing storage on the fly.

Extending a Logical Volume

The /mnt/data volume is getting full. Let us add 200 MB from the free space in the volume group:

# Check free space in the VG
$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  vg_lab   2   2   0 wz--n- 992.00m 392.00m

# Extend the logical volume by 200 MB
$ sudo lvextend -L +200M /dev/vg_lab/lv_data
  Size of logical volume vg_lab/lv_data changed from 400.00 MiB to 600.00 MiB.
  Logical volume vg_lab/lv_data successfully resized.

# IMPORTANT: The LV is bigger, but the filesystem still sees the old size
$ df -h /mnt/data
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  388M  2.3M  362M   1% /mnt/data

# Resize the filesystem to fill the new space
$ sudo resize2fs /dev/vg_lab/lv_data
resize2fs 1.47.0 (5-Feb-2023)
Filesystem at /dev/vg_lab/lv_data is mounted on /mnt/data; on-line resizing required
Performing an on-line resize of /dev/vg_lab/lv_data to 614400 (1k) blocks.

# Now the filesystem sees the new size
$ df -h /mnt/data
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  580M  2.3M  545M   1% /mnt/data

You can combine both steps with a single command:

# The -r flag resizes the filesystem automatically
$ sudo lvextend -L +100M -r /dev/vg_lab/lv_data

Distro Note: For XFS filesystems (default on RHEL/CentOS/Fedora), use xfs_growfs /mnt/data instead of resize2fs. XFS can only grow, never shrink.

Adding a New Disk to a Volume Group

When the entire volume group is full, you can add another physical disk:

# Create a third virtual disk
$ sudo dd if=/dev/zero of=/tmp/disk3.img bs=1M count=500
$ sudo losetup /dev/loop12 /tmp/disk3.img

# Initialize it as a PV and add to the VG
$ sudo pvcreate /dev/loop12
$ sudo vgextend vg_lab /dev/loop12

$ sudo vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_lab   3   2   0 wz--n- <1.46g 692.00m

You just expanded your storage pool without touching existing data. No unmounting, no reformatting, no data copying.

Reducing a Logical Volume

WARNING: Reducing a volume can destroy data if done incorrectly. Always back up first. XFS filesystems cannot be shrunk at all.

# Unmount first (required for shrinking)
$ sudo umount /mnt/logs

# Check the filesystem
$ sudo e2fsck -f /dev/vg_lab/lv_logs

# Shrink filesystem first, then LV
$ sudo resize2fs /dev/vg_lab/lv_logs 100M
$ sudo lvreduce -L 100M /dev/vg_lab/lv_logs
  WARNING: Reducing active logical volume to 100.00 MiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
  Do you really want to reduce vg_lab/lv_logs? [y/n]: y

# Remount
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

Or use the safe combined approach:

$ sudo umount /mnt/logs
$ sudo lvreduce -L 100M -r /dev/vg_lab/lv_logs
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

LVM Snapshots

LVM snapshots create a point-in-time copy of a logical volume. They are invaluable for backups and for testing changes safely.

# Create some test data
$ sudo sh -c 'echo "Important data - version 1" > /mnt/data/config.txt'

# Create a snapshot (100M for storing changes)
$ sudo lvcreate -s -n snap_data -L 100M /dev/vg_lab/lv_data
  Logical volume "snap_data" created.

# Now modify the original
$ sudo sh -c 'echo "Important data - version 2 (BROKEN)" > /mnt/data/config.txt'

# Mount the snapshot (read-only) to recover
$ sudo mkdir -p /mnt/snap
$ sudo mount -o ro /dev/vg_lab/snap_data /mnt/snap

# The snapshot still has the original data
$ cat /mnt/snap/config.txt
Important data - version 1

# Recover the file
$ sudo cp /mnt/snap/config.txt /mnt/data/config.txt

# Cleanup
$ sudo umount /mnt/snap
$ sudo lvremove /dev/vg_lab/snap_data

Snapshots use copy-on-write: they only store blocks that change in the original volume after the snapshot is taken. The snapshot volume needs to be large enough to hold all the changes that occur while it exists.
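Because of copy-on-write, a snapshot's size is a budget for change, not a copy of the data -- and LVM invalidates a snapshot whose budget runs out. A toy check of whether a snapshot survives a given amount of churn (unique changed blocks, in MB):

```shell
#!/bin/sh
# Does a COW snapshot of snap_mb survive changed_mb of unique changed blocks?
# LVM drops (invalidates) a snapshot that fills completely.
snapshot_survives() {
    snap_mb=$1
    changed_mb=$2
    if [ "$changed_mb" -le "$snap_mb" ]; then echo yes; else echo no; fi
}

snapshot_survives 100 60     # prints yes
snapshot_survives 100 200    # prints no -- the snapshot is invalidated
```

In practice you would watch the Data% column of lvs and grow or remove a snapshot before it fills.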

Think About It: If you take a snapshot and then write 200 MB of new data to the original volume, but the snapshot only has 100 MB of space, what happens?


RAID: Redundant Array of Independent Disks

RAID combines multiple disks to provide redundancy, performance, or both. Linux supports software RAID through mdadm.

RAID Levels Explained

RAID 0 (Striping) - Performance, NO redundancy
┌─────────┐  ┌─────────┐
│ Disk 1   │  │ Disk 2   │
│ Block 1  │  │ Block 2  │
│ Block 3  │  │ Block 4  │
│ Block 5  │  │ Block 6  │
└─────────┘  └─────────┘
Min disks: 2 | Usable: 100% | Fault tolerance: NONE
If ANY disk fails, ALL data is lost.

RAID 1 (Mirroring) - Redundancy, reduced capacity
┌─────────┐  ┌─────────┐
│ Disk 1   │  │ Disk 2   │
│ Block 1  │  │ Block 1  │  (identical copy)
│ Block 2  │  │ Block 2  │  (identical copy)
│ Block 3  │  │ Block 3  │  (identical copy)
└─────────┘  └─────────┘
Min disks: 2 | Usable: 50% | Can lose 1 disk

RAID 5 (Striping + Distributed Parity)
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Disk 1   │  │ Disk 2   │  │ Disk 3   │
│ Data A1  │  │ Data A2  │  │ Parity A │
│ Data B1  │  │ Parity B │  │ Data B2  │
│ Parity C │  │ Data C1  │  │ Data C2  │
└─────────┘  └─────────┘  └─────────┘
Min disks: 3 | Usable: (N-1)/N | Can lose 1 disk

RAID 6 (Striping + Double Distributed Parity)
Same as RAID 5 but with two parity blocks per stripe.
Min disks: 4 | Usable: (N-2)/N | Can lose 2 disks

RAID 10 (Mirror + Stripe)
┌─────────┐ ┌─────────┐   ┌─────────┐ ┌─────────┐
│ Disk 1   │ │ Disk 2   │   │ Disk 3   │ │ Disk 4   │
│ Block 1  │ │ Block 1  │   │ Block 2  │ │ Block 2  │
│ Block 3  │ │ Block 3  │   │ Block 4  │ │ Block 4  │
└─────────┘ └─────────┘   └─────────┘ └─────────┘
  Mirror 1                   Mirror 2
  ─────────── Striped ──────────────
Min disks: 4 | Usable: 50% | Can lose 1 disk per mirror
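The usable-capacity column above reduces to a small formula per level. A helper to sanity-check a planned array (a sketch; it assumes equal-size disks):

```shell
#!/bin/sh
# Usable capacity (same units as disk_size) for n equal disks at a RAID level.
raid_usable() {
    level=$1; n=$2; disk_size=$3
    case $level in
        0)  echo $(( n * disk_size )) ;;          # striping, no redundancy
        1)  echo "$disk_size" ;;                  # every disk is a mirror
        5)  echo $(( (n - 1) * disk_size )) ;;    # one disk's worth of parity
        6)  echo $(( (n - 2) * disk_size )) ;;    # two disks' worth of parity
        10) echo $(( n / 2 * disk_size )) ;;      # mirrored pairs, striped
        *)  echo "unknown level" >&2; return 1 ;;
    esac
}

raid_usable 5 4 1000    # prints 3000 -- four 1 TB disks in RAID 5
```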

Hands-On: Creating a RAID 1 Array with mdadm

# Install mdadm
$ sudo apt install mdadm        # Debian/Ubuntu
$ sudo dnf install mdadm        # Fedora/RHEL

# Create two virtual disks for RAID
$ sudo dd if=/dev/zero of=/tmp/raid1.img bs=1M count=200
$ sudo dd if=/dev/zero of=/tmp/raid2.img bs=1M count=200
$ sudo losetup /dev/loop20 /tmp/raid1.img
$ sudo losetup /dev/loop21 /tmp/raid2.img

# Create a RAID 1 array
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/loop20 /dev/loop21
mdadm: array /dev/md0 started.

# Check the status
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 loop21[1] loop20[0]
      200576 blocks super 1.2 [2/2] [UU]

# The [UU] means both disks are Up. [U_] would mean one is missing.

# Detailed information
$ sudo mdadm --detail /dev/md0
/dev/md0:
         Version : 1.2
   Creation Time : Sat Jan 18 10:30:00 2025
      Raid Level : raid1
      Array Size : 200576 (195.89 MiB)
   Used Dev Size : 200576 (195.89 MiB)
    Raid Devices : 2
   Total Devices : 2
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
           State : clean

# Format and mount
$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir -p /mnt/raid
$ sudo mount /dev/md0 /mnt/raid

Simulating a Disk Failure and Recovery

# Write some data
$ sudo sh -c 'echo "Critical data on RAID" > /mnt/raid/important.txt'

# Simulate a disk failure
$ sudo mdadm --manage /dev/md0 --fail /dev/loop20
mdadm: set /dev/loop20 faulty in /dev/md0

$ cat /proc/mdstat
md0 : active raid1 loop21[1] loop20[0](F)
      200576 blocks super 1.2 [2/1] [_U]

# [_U] -- first disk is down, second is up
# But data is still accessible!
$ cat /mnt/raid/important.txt
Critical data on RAID

# Remove the failed disk
$ sudo mdadm --manage /dev/md0 --remove /dev/loop20

# Add a replacement disk
$ sudo dd if=/dev/zero of=/tmp/raid3.img bs=1M count=200
$ sudo losetup /dev/loop22 /tmp/raid3.img
$ sudo mdadm --manage /dev/md0 --add /dev/loop22

# Watch the rebuild
$ cat /proc/mdstat
md0 : active raid1 loop22[2] loop21[1]
      200576 blocks super 1.2 [2/1] [_U]
      [========>............]  recovery = 42.5% ...

# Wait for it to finish, then:
$ cat /proc/mdstat
md0 : active raid1 loop22[2] loop21[1]
      200576 blocks super 1.2 [2/2] [UU]
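Because the bracket notation is so regular, degraded arrays are easy to detect from a script. A minimal sketch that flags any array whose status brackets contain an underscore; it is fed a canned sample here so it runs without a real array, but on a live system you would read /proc/mdstat instead:

```shell
# Print the name of every md array whose [..] status contains '_'
# (a missing or failed member). Reads mdstat-format text on stdin.
degraded_arrays() {
    awk '
        /^md/        { array = $1 }                 # remember current array name
        /\[[U_]+\]/  { if ($0 ~ /_/) print array }  # status line with a gap
    '
}

# Canned sample; on a real system: degraded_arrays < /proc/mdstat
sample='md0 : active raid1 loop21[1] loop20[0](F)
      200576 blocks super 1.2 [2/1] [_U]
md1 : active raid1 loop31[1] loop30[0]
      200576 blocks super 1.2 [2/2] [UU]'

echo "$sample" | degraded_arrays    # prints: md0
```

A cron job wrapping this check is a cheap safety net alongside mdadm's own mail alerts.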

Monitoring RAID Health

# Check array status
$ sudo mdadm --detail /dev/md0

# Scan all arrays
$ sudo mdadm --examine --scan

# Set up email alerts for failures
$ sudo mdadm --monitor --mail=admin@example.com --delay=300 /dev/md0 &

# Or configure monitoring in mdadm.conf
$ cat /etc/mdadm/mdadm.conf
MAILADDR admin@example.com

Distro Note: The config file is /etc/mdadm/mdadm.conf on Debian/Ubuntu and /etc/mdadm.conf on Fedora/RHEL.

Disk Health Monitoring with smartctl

Disks warn you before they die -- if you are listening. SMART (Self-Monitoring, Analysis, and Reporting Technology) tracks disk health indicators.

# Install smartmontools
$ sudo apt install smartmontools    # Debian/Ubuntu
$ sudo dnf install smartmontools    # Fedora/RHEL

# Check if a disk supports SMART
$ sudo smartctl -i /dev/sda

# View overall health
$ sudo smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# View detailed attributes
$ sudo smartctl -A /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   14523
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   0

Key attributes to watch:

Attribute                 What It Means                     Worry When
Reallocated_Sector_Ct     Bad sectors replaced by spares    Any value > 0
Current_Pending_Sector    Sectors waiting to be remapped    Any value > 0
Offline_Uncorrectable     Sectors that could not be read    Any value > 0
Power_On_Hours            Total hours of operation          Approaching rated life
Temperature_Celsius       Current temperature               Above 50C for HDDs

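Those "any value > 0" rules can be automated. A sketch that scans smartctl -A style output for the three critical sector attributes; the sample text stands in for `sudo smartctl -A /dev/sda`, and the raw value of 12 is invented to show a failure:

```shell
# Exit non-zero and report if any critical SMART attribute has a raw value > 0.
check_smart() {
    awk '
        $1 == 5 || $1 == 197 || $1 == 198 {
            if ($NF + 0 > 0) { print "WARN: " $2 " = " $NF; bad = 1 }
        }
        END { exit bad }
    '
}

sample='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   0'

echo "$sample" | check_smart || echo "disk needs attention"
```

With the invented Reallocated_Sector_Ct of 12, this prints a WARN line and then "disk needs attention"; a healthy disk produces no output and exit status 0.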
# Run a short self-test
$ sudo smartctl -t short /dev/sda

# Run a long self-test (can take hours)
$ sudo smartctl -t long /dev/sda

# Check test results
$ sudo smartctl -l selftest /dev/sda

# Enable automatic monitoring daemon
$ sudo systemctl enable --now smartd

Distro Note: On RHEL/CentOS, the smartd configuration is at /etc/smartmontools/smartd.conf. On Debian/Ubuntu, it is at /etc/smartd.conf.


Debug This

A junior admin reports: "I extended the logical volume but the filesystem still shows the old size."

$ sudo lvs
  LV      VG      Attr       LSize
  lv_app  vg_prod -wi-ao---- 100.00g

$ df -h /app
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_prod-lv_app    50G   45G  2.5G  95% /app

The LV is 100 GB but the filesystem only sees 50 GB. What did they forget?

Answer: They forgot to resize the filesystem after extending the LV. The fix depends on the filesystem type:

# For ext4:
$ sudo resize2fs /dev/vg_prod/lv_app

# For XFS:
$ sudo xfs_growfs /app

This is one of the most common LVM mistakes. The -r flag on lvextend would have handled this automatically:

$ sudo lvextend -L 100G -r /dev/vg_prod/lv_app
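One way to catch this class of mistake before users notice is to compare the LV size with the mounted filesystem size. The byte counts below are illustrative stand-ins for what `lvs --units b` and `df -B1` would report in the scenario above; only the comparison logic is the point:

```shell
# Warn when the mounted filesystem is much smaller than the LV holding it,
# which usually means a forgotten resize2fs / xfs_growfs.
lv_bytes=107374182400    # 100 GiB -- what lvs reports for lv_app (example value)
fs_bytes=53687091200     # 50 GiB  -- what df reports for /app   (example value)

pct=$(( fs_bytes * 100 / lv_bytes ))
if [ "$pct" -lt 95 ]; then    # ~5% slack for filesystem metadata overhead
    echo "filesystem uses only ${pct}% of the LV: run resize2fs (ext4) or xfs_growfs (XFS)"
fi
```

The 95% threshold is a judgment call; a freshly resized filesystem sits within a few percent of its LV, so anything far below that deserves a look.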

Cleanup

If you followed along with the lab, clean up the loop devices:

$ sudo umount /mnt/data /mnt/logs /mnt/raid /mnt/snap 2>/dev/null
$ sudo lvremove -f vg_lab/lv_data vg_lab/lv_logs 2>/dev/null
$ sudo vgremove vg_lab 2>/dev/null
$ sudo pvremove /dev/loop10 /dev/loop11 /dev/loop12 2>/dev/null
$ sudo mdadm --stop /dev/md0 2>/dev/null
$ sudo losetup -d /dev/loop10 /dev/loop11 /dev/loop12 /dev/loop20 /dev/loop21 /dev/loop22 2>/dev/null
$ sudo rm -f /tmp/disk1.img /tmp/disk2.img /tmp/disk3.img /tmp/raid1.img /tmp/raid2.img /tmp/raid3.img

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  LVM provides flexible storage management:                │
│  - PV (Physical Volume) → raw disk/partition              │
│  - VG (Volume Group)    → pool of PVs                     │
│  - LV (Logical Volume)  → usable slice from a VG          │
│                                                           │
│  Key LVM operations:                                      │
│  - pvcreate/vgcreate/lvcreate → build the stack           │
│  - lvextend -r  → grow a volume + filesystem              │
│  - lvreduce -r  → shrink (backup first!)                  │
│  - lvcreate -s  → snapshot for backup/testing             │
│                                                           │
│  RAID provides disk redundancy:                           │
│  - RAID 0 = speed, no safety                              │
│  - RAID 1 = mirror, can lose one disk                     │
│  - RAID 5 = parity across 3+ disks                        │
│  - RAID 10 = mirror + stripe (production favorite)        │
│                                                           │
│  smartctl monitors disk health before failure.            │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. LVM basics: Create three loop devices, combine them into a volume group, create two logical volumes, format them with ext4, and mount them.

  2. Online resize: Write a 50 MB file to one of your logical volumes, then extend the volume by 200 MB using lvextend -r. Verify the file is still intact.

  3. Snapshot backup: Create a snapshot of a logical volume, write new files to the original, then mount the snapshot read-only and verify it still has the old data.

  4. RAID simulation: Create a RAID 5 array with three loop devices. Write data, mark one device as failed, verify data is still readable, then add a replacement and watch the rebuild.

  5. Bonus challenge: Combine LVM and RAID -- create a RAID 1 array with mdadm, then use the RAID device as a physical volume for LVM. This is how many production servers are configured.

NFS & Network Filesystems

Why This Matters

You have a team of five developers, each working on a separate server. They all need access to the same project files, configuration data, and shared libraries. You could copy files between servers manually, but every copy is stale the moment someone makes a change. You could use rsync on a schedule, but that introduces delays and conflicts.

Or you could use NFS -- the Network File System. With NFS, one server exports a directory, and all other servers mount it as if it were a local directory. When a developer saves a file on one server, every other server sees the change instantly. There is no copying, no syncing, no conflicts.

NFS has been the standard way to share filesystems across Unix and Linux machines since 1984. It is used everywhere: shared home directories in universities, shared media libraries, centralized configuration distribution, and shared data stores in compute clusters. If you manage more than one Linux server, you will eventually need NFS.

This chapter covers NFS server and client setup, performance tuning, security considerations, and alternatives like SSHFS and CIFS/Samba for Windows interoperability.


Try This Right Now

Check if your system already has NFS capabilities:

# Check if NFS client utilities are installed
$ which mount.nfs 2>/dev/null && echo "NFS client available" || echo "NFS client not installed"

# Check if NFS server is running (it likely is not on a workstation)
$ systemctl status nfs-server 2>/dev/null || systemctl status nfs-kernel-server 2>/dev/null

# Check for any existing NFS mounts
$ mount -t nfs,nfs4

# Check what your system might be exporting
$ cat /etc/exports 2>/dev/null

How NFS Works

NFS allows a server to share (export) directories over the network. Clients mount these exports and access files as though they were local. The key concept is transparency -- applications do not need to know they are using a network filesystem.

┌──────────────────────┐          ┌──────────────────────┐
│     NFS Server       │          │     NFS Client       │
│                      │          │                      │
│  /srv/shared/        │  Network │  /mnt/shared/        │
│    ├── project/      │◄────────►│    ├── project/      │
│    ├── data/         │  NFSv4   │    ├── data/         │
│    └── configs/      │  TCP/2049│    └── configs/      │
│                      │          │                      │
│  Exports via         │          │  Mounts remote       │
│  /etc/exports        │          │  export as local     │
└──────────────────────┘          └──────────────────────┘

NFS Versions

Version    Key Features
NFSv3      Stateless, uses multiple ports (portmap), UDP or TCP
NFSv4      Stateful, single port (2049/TCP), built-in security (Kerberos), ACL support
NFSv4.1    Parallel NFS (pNFS), session trunking
NFSv4.2    Server-side copy, sparse files, application I/O hints

Use NFSv4 unless you have a specific reason to use v3. It is simpler (one port), more secure, and performs better over WANs.


Setting Up an NFS Server

Install the NFS Server

# Debian/Ubuntu
$ sudo apt install nfs-kernel-server

# Fedora/RHEL
$ sudo dnf install nfs-utils

# Arch
$ sudo pacman -S nfs-utils

Create Directories to Share

# Create a shared directory
$ sudo mkdir -p /srv/nfs/shared
$ sudo mkdir -p /srv/nfs/readonly

# Put some content in them
$ sudo sh -c 'echo "Hello from the NFS server" > /srv/nfs/shared/welcome.txt'
$ sudo sh -c 'echo "Reference data" > /srv/nfs/readonly/reference.txt'

# Set ownership -- for simple setups, use nobody:nogroup
$ sudo chown -R nobody:nogroup /srv/nfs/shared
$ sudo chown -R nobody:nogroup /srv/nfs/readonly

Distro Note: The nogroup group is a Debian/Ubuntu convention. On Fedora/RHEL and Arch, use nobody:nobody instead.

Configure Exports

The /etc/exports file defines what gets shared and with whom:

$ sudo vim /etc/exports
# /etc/exports
#
# Syntax: directory    client(options) [client(options)] ...
#
# Share /srv/nfs/shared with the 192.168.1.0/24 network, read-write
/srv/nfs/shared    192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)

# Share /srv/nfs/readonly with everyone, read-only
/srv/nfs/readonly  *(ro,sync,no_subtree_check)

# Share with a specific host
# /srv/nfs/private  webserver.local(rw,sync,no_subtree_check)

Key export options explained:

Option              Meaning
rw                  Read-write access
ro                  Read-only access
sync                Write data to disk before replying (safe, slower)
async               Reply before data is written to disk (fast, risky)
no_subtree_check    Disables subtree checking (improves reliability)
root_squash         Map root (UID 0) on client to nobody (default, safer)
no_root_squash      Allow client root to act as root on server (needed for some use cases)
all_squash          Map all users to nobody
anonuid=1000        Map anonymous users to UID 1000
anongid=1000        Map anonymous groups to GID 1000

WARNING: no_root_squash is a security risk. A client with root access can read/write any file on the export as root. Only use it when truly necessary (e.g., for diskless clients or specific applications that require it).

Apply and Start

# Apply export changes
$ sudo exportfs -ra

# Verify what is exported
$ sudo exportfs -v
/srv/nfs/shared    192.168.1.0/24(rw,wdelay,no_root_squash,no_subtree_check,...)
/srv/nfs/readonly  <world>(ro,wdelay,root_squash,no_subtree_check,...)

# Start and enable the NFS server
$ sudo systemctl enable --now nfs-server

# Verify it is listening
$ sudo ss -tlnp | grep 2049
LISTEN  0  64  *:2049  *:*

Distro Note: On Debian/Ubuntu, the service is called nfs-kernel-server. On Fedora/RHEL/Arch, it is nfs-server.


Setting Up an NFS Client

Install Client Utilities

# Debian/Ubuntu
$ sudo apt install nfs-common

# Fedora/RHEL
$ sudo dnf install nfs-utils

# Arch
$ sudo pacman -S nfs-utils

Test What the Server Exports

# Show exports from a server
$ showmount -e 192.168.1.10
Export list for 192.168.1.10:
/srv/nfs/shared   192.168.1.0/24
/srv/nfs/readonly *

Note: showmount relies on the NFSv3 MOUNT protocol. Servers configured for NFSv4 only may not answer it -- if the export list comes back empty, just try the mount directly.

Mount an NFS Share

# Create mount point
$ sudo mkdir -p /mnt/shared

# Mount the NFS share
$ sudo mount -t nfs 192.168.1.10:/srv/nfs/shared /mnt/shared

# Verify
$ mount | grep nfs
192.168.1.10:/srv/nfs/shared on /mnt/shared type nfs4 (rw,relatime,...)

# Test it
$ cat /mnt/shared/welcome.txt
Hello from the NFS server

# Write something (if rw)
$ echo "Hello from the client" | sudo tee /mnt/shared/client_message.txt

Specifying NFS Mount Options

# Mount with specific options
$ sudo mount -t nfs -o vers=4,rw,hard,intr,timeo=600,retrans=2 \
    192.168.1.10:/srv/nfs/shared /mnt/shared

Key client mount options:

Option           Meaning
vers=4           Force NFSv4
hard             Retry NFS requests indefinitely (default; safe for data integrity)
soft             Give up after retrans retries (risks silent data loss)
intr             Allow signals to interrupt hung NFS operations (ignored since kernel 2.6.25)
timeo=N          Timeout in tenths of a second before retry
retrans=N        Number of retries before giving up (soft mounts)
rsize=1048576    Read buffer size in bytes
wsize=1048576    Write buffer size in bytes

Think About It: What happens to applications accessing an NFS mount when the server goes down? How do hard and soft mount options change this behavior?

Making Mounts Persistent with /etc/fstab

# Add to /etc/fstab for automatic mounting at boot
$ sudo vim /etc/fstab
# NFS mounts in /etc/fstab
192.168.1.10:/srv/nfs/shared   /mnt/shared   nfs   defaults,_netdev   0 0
192.168.1.10:/srv/nfs/readonly /mnt/readonly nfs   ro,_netdev          0 0

The _netdev option is critical -- it tells the system to wait for the network to be available before attempting the mount. Without it, the system may hang at boot waiting for an NFS mount.

# Test fstab entries without rebooting
$ sudo mount -a

# Verify
$ df -hT | grep nfs
192.168.1.10:/srv/nfs/shared  nfs4  50G  12G  38G  24% /mnt/shared
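Since a forgotten _netdev can stall the whole boot, it is worth linting for. A sketch that flags NFS entries missing the option, run against a canned fstab-style sample (on a real host, feed it /etc/fstab):

```shell
# Print any nfs/nfs4 fstab entry that is missing the _netdev option.
missing_netdev() {
    awk '$3 ~ /^nfs4?$/ && $4 !~ /_netdev/ { print NR ": " $0 }'
}

sample='192.168.1.10:/srv/nfs/shared   /mnt/shared   nfs   defaults,_netdev  0 0
192.168.1.10:/srv/nfs/readonly /mnt/readonly nfs   ro                0 0'

echo "$sample" | missing_netdev
# Flags line 2, the readonly mount without _netdev
```

On a real system the invocation would be `missing_netdev < /etc/fstab`; empty output means every NFS entry waits for the network.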

Autofs: Mount on Demand

Instead of mounting NFS shares permanently, autofs mounts them automatically when accessed and unmounts them after a period of inactivity. This is ideal for shares that are not needed constantly.

# Install autofs
$ sudo apt install autofs        # Debian/Ubuntu
$ sudo dnf install autofs        # Fedora/RHEL

# Configure the master map
$ sudo vim /etc/auto.master
# /etc/auto.master
# Mount point       Map file              Options
/mnt/auto           /etc/auto.nfs         --timeout=300

# Configure the NFS map
$ sudo vim /etc/auto.nfs
# /etc/auto.nfs
# Key       Options                           Location
shared      -rw,sync                          192.168.1.10:/srv/nfs/shared
readonly    -ro                               192.168.1.10:/srv/nfs/readonly

# Start autofs
$ sudo systemctl enable --now autofs

# Now just access the directory -- autofs mounts it automatically
$ ls /mnt/auto/shared
welcome.txt  client_message.txt

# After 300 seconds of inactivity, it unmounts automatically

The power of autofs is that it limits the damage from an unreachable server. If the NFS server is down, the mount attempt simply fails when the directory is accessed, instead of leaving a permanently hung entry in your mount table.


NFS Performance Tuning

Server-Side Tuning

# Increase the number of NFS daemon threads (default is 8)
# For busy servers, use one thread per CPU core or more
$ sudo vim /etc/nfs.conf
[nfsd]
threads = 16

# Or set it at runtime
$ sudo sh -c 'echo 16 > /proc/fs/nfsd/threads'

# Check current thread count
$ cat /proc/fs/nfsd/threads
16
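The "one thread per CPU core or more" guideline can be computed instead of guessed. A sketch; the 2x multiplier and the floor of 8 are common starting points, not official rules:

```shell
# Size the nfsd thread pool from the CPU count: two threads per core,
# with a floor of 8 (the historical default).
cores=$(nproc)
threads=$(( cores * 2 ))
[ "$threads" -lt 8 ] && threads=8

echo "suggested nfsd threads: $threads"
# To apply it (root required): echo "$threads" | sudo tee /proc/fs/nfsd/threads
```

Very busy servers sometimes run far more threads than this; watch `nfsstat -s` and the retrans counters on clients before tuning further.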

Client-Side Tuning

# Mount with larger read/write buffers
$ sudo mount -t nfs -o rsize=1048576,wsize=1048576 \
    192.168.1.10:/srv/nfs/shared /mnt/shared

# Check current NFS mount settings
$ nfsstat -m
/mnt/shared from 192.168.1.10:/srv/nfs/shared
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,...

Monitoring NFS Performance

# NFS statistics on the server
$ nfsstat -s
Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
14523      0          0          0          0

# NFS statistics on the client
$ nfsstat -c

# Per-mount statistics
$ cat /proc/self/mountstats | grep -A 20 "nfs"

# Quick bandwidth test with dd
$ dd if=/dev/zero of=/mnt/shared/testfile bs=1M count=100 oflag=direct
100+0 records out
104857600 bytes (105 MB) copied, 1.23 s, 85.2 MB/s

NFS Security Considerations

NFS was designed for trusted networks. By default, NFS trusts client machines to report user identities honestly. This means:

  1. IP-based access control only: NFS exports restrict access by IP address, not by user authentication.
  2. UID/GID must match: If user alice is UID 1001 on the client but UID 1002 on the server, she will access the wrong files.
  3. No encryption by default: NFSv3 traffic is sent in the clear. NFSv4 supports Kerberos but it requires additional setup.

Basic Security Practices

# Restrict exports to specific subnets, never use *
/srv/nfs/shared    192.168.1.0/24(rw,sync,no_subtree_check)

# Use root_squash (the default) to prevent client root from being server root
# Only disable it when absolutely necessary

# Use firewall rules to restrict NFS access
$ sudo iptables -A INPUT -p tcp --dport 2049 -s 192.168.1.0/24 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 2049 -j DROP

NFSv4 with Kerberos (Overview)

For environments that need stronger security, NFSv4 supports three Kerberos security modes:

Mode     Description
krb5     Authentication only (verifies identity)
krb5i    Authentication + integrity checking
krb5p    Authentication + integrity + encryption

Setting up Kerberos is beyond the scope of this chapter, but know that it exists when you need it.


SSHFS: The Simple Alternative

SSHFS mounts a remote directory over SSH. It is slower than NFS but requires zero server-side configuration -- if you can SSH to a machine, you can SSHFS to it.

# Install SSHFS
$ sudo apt install sshfs        # Debian/Ubuntu
$ sudo dnf install fuse-sshfs   # Fedora/RHEL

# Mount a remote directory
$ mkdir -p ~/remote_server
$ sshfs user@192.168.1.10:/home/user ~/remote_server

# Verify
$ ls ~/remote_server
Documents  Downloads  projects

# Unmount
$ fusermount -u ~/remote_server

SSHFS advantages:

  • No server-side setup required (just SSH)
  • Encrypted by default
  • Works through firewalls (port 22)
  • Non-root users can mount

SSHFS disadvantages:

  • Significantly slower than NFS (SSH encryption overhead)
  • Not suitable for high-throughput workloads
  • Can be unreliable on spotty connections

Making SSHFS Persistent

# Add to /etc/fstab (requires key-based SSH auth)
user@192.168.1.10:/home/user  /mnt/remote  fuse.sshfs  defaults,_netdev,allow_other,IdentityFile=/home/localuser/.ssh/id_ed25519  0 0

Think About It: When would you choose SSHFS over NFS? When would NFS be the clear winner?


CIFS/Samba: Windows Interoperability

If you need to share files between Linux and Windows, Samba implements the SMB/CIFS protocol.

Accessing Windows Shares from Linux

# Install CIFS utilities
$ sudo apt install cifs-utils    # Debian/Ubuntu
$ sudo dnf install cifs-utils    # Fedora/RHEL

# Mount a Windows share
$ sudo mkdir -p /mnt/windows_share
$ sudo mount -t cifs //windows-server/ShareName /mnt/windows_share \
    -o username=admin,domain=WORKGROUP

# With credentials file (more secure)
$ cat ~/.smbcredentials
username=admin
password=secret
domain=WORKGROUP

$ chmod 600 ~/.smbcredentials
$ sudo mount -t cifs //windows-server/ShareName /mnt/windows_share \
    -o credentials=/home/user/.smbcredentials

# fstab entry
//windows-server/ShareName /mnt/windows_share cifs credentials=/home/user/.smbcredentials,_netdev 0 0
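Because the credentials file holds a plaintext password, scripts that use it should verify it is private before proceeding. A sketch of that check, rehearsed against a throwaway file (GNU `stat -c` assumed):

```shell
# Refuse to proceed unless the credentials file is readable only by its owner.
creds=$(mktemp)          # stand-in for ~/.smbcredentials
chmod 600 "$creds"

mode=$(stat -c %a "$creds")
if [ "$mode" = "600" ]; then
    echo "credentials file permissions OK"
else
    echo "fix with: chmod 600 $creds" >&2
fi
rm -f "$creds"
```

mount.cifs itself does not enforce this, so a pre-mount check like this is the only thing standing between the password and other local users.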

Setting Up a Samba Server (Brief)

# Install Samba
$ sudo apt install samba

# Configure a share
$ sudo vim /etc/samba/smb.conf
[shared]
   path = /srv/samba/shared
   browseable = yes
   read only = no
   valid users = @smbgroup

# Create a Samba user
$ sudo smbpasswd -a username

# Restart Samba
$ sudo systemctl restart smbd

# Test configuration
$ testparm

Distro Note: The Samba service is named smbd on Debian/Ubuntu and smb on Fedora/RHEL.

Debug This

A user reports: "I can mount the NFS share from one client but not from another. Both are on the same subnet."

Server /etc/exports:

/srv/nfs/data    192.168.1.0/24(rw,sync,no_subtree_check)

Working client: 192.168.1.15 -- mounts successfully. Failing client: 192.168.1.25 -- gets "access denied."

Both clients can ping the server. What could be wrong?

Checklist to investigate:

# On the failing client, check if NFS utils are installed
$ which mount.nfs

# Check if the server's firewall is blocking the specific client
$ sudo iptables -L -n | grep 2049

# Check if the export was applied
$ sudo exportfs -v    # on the server

# Common gotcha: space between host and options
# WRONG (exports to everyone read-only):
/srv/nfs/data    192.168.1.0/24 (rw,sync)
#                              ^ this space is the problem!

# RIGHT (exports to subnet read-write):
/srv/nfs/data    192.168.1.0/24(rw,sync)

That accidental space is a classic NFS trap. With the space, 192.168.1.0/24 gets the default (read-only) export, and (rw,sync) becomes a separate entry exporting to everyone with rw. Remove the space and re-run exportfs -ra.
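The bug is mechanical enough to lint for. A sketch that flags any exports-style line where an options group has detached from its host across whitespace, run against a canned sample (point it at /etc/exports for real use):

```shell
# Flag exports lines containing a standalone "(...)" group, i.e. options
# separated from their host by a space -- the accidental world-export bug.
lint_exports() {
    grep -nE '[[:space:]]\(' || echo "no detached option groups found"
}

sample='/srv/nfs/data    192.168.1.0/24 (rw,sync)
/srv/nfs/good    192.168.1.0/24(rw,sync)'

echo "$sample" | lint_exports
# Flags line 1: the space before (rw,sync) exports rw to the world
```

Run as `lint_exports < /etc/exports` after every edit; any hit is almost certainly this exact mistake.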


┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  NFS shares filesystems across the network:               │
│  - Server exports directories via /etc/exports            │
│  - Clients mount them like local filesystems              │
│  - NFSv4 is the modern choice (single port, better       │
│    security)                                              │
│                                                           │
│  Key files and commands:                                  │
│  - /etc/exports         → server export config            │
│  - exportfs -ra         → apply export changes            │
│  - mount -t nfs         → mount on client                 │
│  - /etc/fstab + _netdev → persistent mounts               │
│  - autofs               → mount on demand                 │
│                                                           │
│  Alternatives:                                            │
│  - SSHFS: simple, encrypted, no server setup              │
│  - CIFS/Samba: Linux-Windows file sharing                 │
│                                                           │
│  Security: NFS trusts the network. Restrict by IP,        │
│  use root_squash, consider Kerberos for sensitive data.   │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. NFS server and client: If you have two Linux VMs (or use containers), set up an NFS server on one and mount the export on the other. Write files from the client and verify they appear on the server.

  2. Export options: Experiment with ro, all_squash, and root_squash. Create a file as root on the client with each option and check the ownership on the server.

  3. Autofs setup: Configure autofs to mount an NFS share on demand. Access the directory, verify the mount appears, then wait for the timeout and verify it unmounts.

  4. SSHFS experiment: Mount a remote directory over SSHFS. Compare the speed of copying a large file over SSHFS versus NFS (if both are available).

  5. Bonus challenge: Set up an NFS share that uses anonuid and anongid to map all clients to a specific user. Verify that files created by any client user end up owned by the target user on the server.

Backup Strategies

Why This Matters

A small startup stored everything on a single server -- code, databases, customer data, configuration. They had no backups. One night, a disk failed. They lost three years of work, their customer database, and ultimately, the company.

This is not an unusual story. Backups are the single most important thing you can do for any system you care about. Yet they are routinely neglected, improperly configured, or never tested. The most dangerous backup is one you have never tried to restore.

This chapter teaches you practical, battle-tested backup strategies using open source tools. You will learn the 3-2-1 rule, understand different backup types, and get hands-on with tar, rsync, borgbackup, and restic. By the end, you will have the knowledge to build a backup system that could save your job -- or your company.


Try This Right Now

Before we build anything new, check what backup mechanisms already exist on your system:

# Check if any cron jobs reference backup
$ sudo crontab -l 2>/dev/null | grep -i backup
$ crontab -l 2>/dev/null | grep -i backup

# Check for systemd backup timers
$ systemctl list-timers --all | grep -i backup

# Check if common backup tools are installed
$ which rsync && rsync --version | head -1
$ which borgbackup 2>/dev/null || which borg 2>/dev/null
$ which restic 2>/dev/null

# Check your disk usage (what needs backing up?)
$ df -h
$ du -sh /home /etc /var/log 2>/dev/null

The 3-2-1 Backup Rule

Before touching any tools, understand the strategy. The 3-2-1 rule is the gold standard:

┌──────────────────────────────────────────────────────┐
│              THE 3-2-1 BACKUP RULE                    │
│                                                       │
│  3  Keep at least THREE copies of your data           │
│     (1 primary + 2 backups)                           │
│                                                       │
│  2  Store backups on TWO different types of media     │
│     (local disk + cloud, or local disk + tape)        │
│                                                       │
│  1  Keep at least ONE copy offsite                    │
│     (different building, different city, cloud)       │
│                                                       │
│  WHY?                                                 │
│  - 1 copy: disk failure = total loss                  │
│  - 2 copies on same media: both can fail together     │
│  - 2 copies in same location: fire/flood = total loss │
│  - 3-2-1: survives any single disaster                │
└──────────────────────────────────────────────────────┘

Backup Types

Understanding the three main backup types helps you balance speed, storage, and recovery time:

Full Backup: Copies EVERYTHING every time
Day 1: [████████████████] 100 GB    ← complete copy
Day 2: [████████████████] 100 GB    ← another complete copy
Day 3: [████████████████] 100 GB    ← another complete copy
+ Simple to restore (just one backup needed)
- Uses the most storage and time

Incremental Backup: Copies only what changed SINCE LAST BACKUP
Day 1: [████████████████] 100 GB    ← full backup
Day 2: [██]               5 GB     ← changes since Day 1
Day 3: [█]                2 GB     ← changes since Day 2
Day 4: [███]               8 GB     ← changes since Day 3
+ Uses least storage
- Restore requires full + ALL incrementals in order

Differential Backup: Copies what changed SINCE LAST FULL
Day 1: [████████████████] 100 GB    ← full backup
Day 2: [██]               5 GB     ← changes since Day 1
Day 3: [███]               7 GB     ← changes since Day 1
Day 4: [█████]            12 GB     ← changes since Day 1
+ Restore requires only full + latest differential
- Uses more storage than incremental
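The storage trade-off between the three types is concrete enough to compute. One week of backups at the sizes pictured in the diagrams, as shell arithmetic:

```shell
# Total storage for one week of backups: 100 GB of data, ~5 GB changing per day.
full_gb=100 daily_change_gb=5 days=7

full_total=$(( full_gb * days ))                          # a complete copy every day
incr_total=$(( full_gb + daily_change_gb * (days - 1) ))  # one full, then daily deltas

# Differential: day N re-copies everything changed since the full,
# so it stores roughly N * daily_change_gb.
diff_total=$full_gb
d=1
while [ "$d" -lt "$days" ]; do
    diff_total=$(( diff_total + daily_change_gb * d ))
    d=$(( d + 1 ))
done

echo "full: ${full_total} GB, incremental: ${incr_total} GB, differential: ${diff_total} GB"
```

At these (illustrative) rates the week costs 700 GB of full backups versus 130 GB incremental and 205 GB differential, which is why pure daily fulls are rare in practice.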

Think About It: A critical server has 500 GB of data but only 2-3 GB changes per day. Which backup strategy would you choose and why? What if the recovery time objective is under 30 minutes?


tar: The Classic Backup Tool

tar (tape archive) has been the standard Unix backup tool since the 1970s. It creates archive files from directories.

Basic tar Backup

# Create a compressed backup of /etc
$ sudo tar -czf /backup/etc-$(date +%Y%m%d).tar.gz /etc
# -c = create
# -z = compress with gzip
# -f = filename

# List contents without extracting
$ tar -tzf /backup/etc-20250118.tar.gz | head -20
etc/
etc/hostname
etc/fstab
etc/passwd
...

# Extract to a specific directory
$ mkdir /tmp/restore_test
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/restore_test
# -x = extract
# -C = change to directory before extracting

# Extract a single file
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/ etc/fstab

Backup with Better Compression

# Use xz for better compression (slower but smaller files)
$ sudo tar -cJf /backup/etc-$(date +%Y%m%d).tar.xz /etc

# Use zstd for a good balance of speed and compression
$ sudo tar --zstd -cf /backup/etc-$(date +%Y%m%d).tar.zst /etc

# Compare sizes
$ ls -lh /backup/etc-20250118.*
-rw-r--r-- 1 root root  4.2M /backup/etc-20250118.tar.gz
-rw-r--r-- 1 root root  3.1M /backup/etc-20250118.tar.xz
-rw-r--r-- 1 root root  3.5M /backup/etc-20250118.tar.zst

Incremental Backups with tar

# Full backup (creates a snapshot file)
$ sudo tar -czf /backup/home-full-$(date +%Y%m%d).tar.gz \
    --listed-incremental=/backup/home.snar /home

# Next day: incremental backup (only changes since last)
$ sudo tar -czf /backup/home-inc-$(date +%Y%m%d).tar.gz \
    --listed-incremental=/backup/home.snar /home

# To restore: apply full first, then each incremental in order
$ tar -xzf /backup/home-full-20250118.tar.gz -C /
$ tar -xzf /backup/home-inc-20250119.tar.gz -C /
$ tar -xzf /backup/home-inc-20250120.tar.gz -C /
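The full-plus-incrementals cycle is worth rehearsing somewhere harmless before trusting it with real data. A self-contained run in a scratch directory (GNU tar assumed; all paths are throwaway):

```shell
# Rehearse a full + incremental tar cycle entirely inside a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/data" "$work/backup" "$work/restore"

echo "day one" > "$work/data/a.txt"
tar -czf "$work/backup/full.tar.gz" \
    --listed-incremental="$work/backup/snap.snar" -C "$work" data

echo "day two" > "$work/data/b.txt"     # created after the full backup
tar -czf "$work/backup/inc1.tar.gz" \
    --listed-incremental="$work/backup/snap.snar" -C "$work" data

# Restore: full first, then each incremental in order
tar -xzf "$work/backup/full.tar.gz" --listed-incremental=/dev/null -C "$work/restore"
tar -xzf "$work/backup/inc1.tar.gz" --listed-incremental=/dev/null -C "$work/restore"

restored=$(echo $(ls "$work/restore/data"))   # both days should be present
echo "$restored"
rm -rf "$work"
```

The rehearsal prints "a.txt b.txt": the full archive restored day one and the incremental layered day two on top, exactly the sequence you would follow in a real recovery.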

Excluding Files from Backups

# Exclude patterns
$ sudo tar -czf /backup/home.tar.gz \
    --exclude='*.tmp' \
    --exclude='.cache' \
    --exclude='node_modules' \
    --exclude='.local/share/Trash' \
    /home

# Or use an exclude file
$ cat /backup/exclude-list.txt
*.tmp
.cache
node_modules
.local/share/Trash
__pycache__

$ sudo tar -czf /backup/home.tar.gz \
    --exclude-from=/backup/exclude-list.txt /home

rsync: Efficient File Synchronization

While tar creates archives, rsync synchronizes files between locations. It only transfers differences, making it extremely efficient for repeated backups.

Basic rsync

# Sync a directory to a backup location
$ rsync -avh /home/user/projects/ /backup/projects/
# -a = archive mode (preserves permissions, timestamps, symlinks, etc.)
# -v = verbose
# -h = human-readable sizes

sending incremental file list
./
index.html
css/style.css
js/app.js

sent 45.23K bytes  received 92 bytes  90.64K bytes/sec
total size is 44.98K  speedup is 0.99

# Run it again -- only changes are transferred
$ rsync -avh /home/user/projects/ /backup/projects/
sending incremental file list

sent 234 bytes  received 12 bytes  492.00 bytes/sec
total size is 44.98K  speedup is 182.85

Notice the "speedup" -- the second run transferred almost nothing because nothing changed.

WARNING: Trailing slashes matter in rsync! /home/user/projects/ (with slash) syncs the contents of projects. /home/user/projects (without slash) syncs the directory itself, creating /backup/projects/projects/.

rsync with Delete (Mirror)

# Mirror source to destination (delete files in dest that are not in source)
$ rsync -avh --delete /home/user/projects/ /backup/projects/

WARNING: --delete removes files from the destination that no longer exist in the source. Always test with --dry-run first:

$ rsync -avh --delete --dry-run /home/user/projects/ /backup/projects/

rsync Over SSH

This is one of the most common backup patterns -- syncing data to a remote server over an encrypted SSH connection:

# Backup to a remote server
$ rsync -avh -e ssh /home/user/projects/ backupuser@backup-server:/backup/projects/

# With specific SSH key and port
$ rsync -avh -e "ssh -i ~/.ssh/backup_key -p 2222" \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

# Limit bandwidth to 5 MB/s (useful over WAN)
$ rsync -avh --bwlimit=5000 -e ssh \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

# Show progress
$ rsync -avh --progress -e ssh \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

rsync Backup Script

A practical daily backup script using rsync:

#!/bin/bash
# backup.sh - Daily rsync backup with rotation

BACKUP_SRC="/home /etc /var/www"
BACKUP_DST="/backup"
DATE=$(date +%Y%m%d)
LOG="/var/log/backup-${DATE}.log"

echo "=== Backup started at $(date) ===" | tee -a "$LOG"

for src in $BACKUP_SRC; do
    dirname=$(basename "$src")
    echo "Backing up $src..." | tee -a "$LOG"
    rsync -ah --delete \
        --exclude='.cache' \
        --exclude='*.tmp' \
        "$src/" "${BACKUP_DST}/${dirname}/" \
        >> "$LOG" 2>&1
done

echo "=== Backup finished at $(date) ===" | tee -a "$LOG"

# Delete logs older than 30 days
find /var/log -name "backup-*.log" -mtime +30 -delete

borgbackup: Deduplicated, Encrypted Backups

borgbackup (borg) is a modern backup tool that provides deduplication, compression, and encryption. It is the tool of choice for many Linux administrators.

Why borg?

  • Deduplication: If a file appears in 100 backups, it is stored only once
  • Compression: Multiple algorithms (lz4, zstd, zlib, lzma)
  • Encryption: AES-256 encryption at rest
  • Efficient: Only transfers and stores unique data chunks
  • Pruning: Automatic retention policy management

Installing borg

# Debian/Ubuntu
$ sudo apt install borgbackup

# Fedora/RHEL
$ sudo dnf install borgbackup

# Arch
$ sudo pacman -S borg

# Or via pip
$ pip install borgbackup

Hands-On: Complete borg Workflow

Step 1: Initialize a repository

# Create a local encrypted repository
$ borg init --encryption=repokey /backup/borg-repo
Enter new passphrase:
Enter same passphrase again:
Do you want your passphrase to be displayed for verification? [yN]: y

# CRITICAL: Export and save the key somewhere safe!
$ borg key export /backup/borg-repo /backup/borg-key-backup.txt

WARNING: If you lose both the passphrase and the key, your backups are irrecoverable. Store the key export separately from the backups.

Step 2: Create a backup (archive)

# Create an archive named with the date
$ borg create --stats --progress \
    /backup/borg-repo::home-$(date +%Y%m%d-%H%M) \
    /home \
    --exclude '/home/*/.cache' \
    --exclude '/home/*/Downloads' \
    --exclude '*.tmp'

Archive name: home-20250118-1430
Archive fingerprint: a1b2c3...
Time (start): Sat, 2025-01-18 14:30:00
Time (end):   Sat, 2025-01-18 14:32:15
Duration: 2 minutes 15 seconds
Number of files: 28547
                       Original size      Compressed size    Deduplicated size
This archive:               12.45 GB              8.23 GB              2.15 GB
All archives:               12.45 GB              8.23 GB              2.15 GB

Notice the "Deduplicated size" -- this is the actual new data stored. After the first backup, subsequent backups store much less.

Step 3: Create another backup and see deduplication in action

# Next day, create another archive
$ borg create --stats /backup/borg-repo::home-20250119-1430 /home \
    --exclude '/home/*/.cache'

                       Original size      Compressed size    Deduplicated size
This archive:               12.48 GB              8.25 GB             85.32 MB
All archives:               24.93 GB             16.48 GB              2.23 GB

The second backup was 12.48 GB of data, but only 85 MB of new (deduplicated) data was actually stored.

Step 4: List and inspect archives

# List all archives
$ borg list /backup/borg-repo
home-20250118-1430       Sat, 2025-01-18 14:30:00
home-20250119-1430       Sun, 2025-01-19 14:30:00

# List files in a specific archive
$ borg list /backup/borg-repo::home-20250118-1430 | head -10
drwxr-xr-x user   user      0 Sat, 2025-01-18 14:00:00 home/user
-rw-r--r-- user   user   4521 Sat, 2025-01-18 13:45:00 home/user/.bashrc
...

# Show archive info
$ borg info /backup/borg-repo::home-20250118-1430

Step 5: Restore from a borg backup

# Restore entire archive to a directory
$ mkdir /tmp/borg-restore
$ cd /tmp/borg-restore
$ borg extract /backup/borg-repo::home-20250118-1430

# Restore a specific file
$ borg extract /backup/borg-repo::home-20250118-1430 home/user/.bashrc

# Restore with a dry run (list what would be extracted)
$ borg extract --dry-run --list /backup/borg-repo::home-20250118-1430

Step 6: Prune old backups (retention policy)

# Keep the last 7 daily, 4 weekly, 6 monthly, and 1 yearly backups
$ borg prune --stats --list \
    --keep-daily=7 \
    --keep-weekly=4 \
    --keep-monthly=6 \
    --keep-yearly=1 \
    /backup/borg-repo

# Always compact after pruning to reclaim disk space
$ borg compact /backup/borg-repo

borg Over SSH (Remote Backups)

# Initialize a remote repository
$ borg init --encryption=repokey ssh://backupuser@backup-server/~/borg-repo

# Create a backup to the remote repository
$ borg create --stats \
    ssh://backupuser@backup-server/~/borg-repo::home-$(date +%Y%m%d) \
    /home

Think About It: borg deduplicates at the chunk level, not the file level. What does this mean if you have a 1 GB database dump that changes by only 5 MB each day?
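You can get a feel for the answer with a simplified experiment. borg actually uses content-defined chunking (which also tolerates insertions shifting the data), but fixed-size chunks are enough to show why a small in-place change stores almost nothing new:

```shell
#!/bin/sh
# Simulate chunk-level deduplication with fixed 64 KiB chunks
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A 1 MiB stand-in for the "database dump"
head -c 1048576 /dev/urandom > dump-v1

# Day 2: a few bytes change in the middle of the file
cp dump-v1 dump-v2
printf 'UPDATED-ROW' | dd of=dump-v2 bs=1 seek=500000 conv=notrunc 2>/dev/null

# Split each version into fixed 64 KiB chunks and hash every chunk
chunk_hashes() {
    mkdir -p "$2"
    split -b 65536 "$1" "$2/chunk-"
    sha256sum "$2"/chunk-* | awk '{print $1}'
}
chunk_hashes dump-v1 v1 > v1.hashes
chunk_hashes dump-v2 v2 > v2.hashes

# Only chunks whose hash changed would need to be stored again
changed=$(paste v1.hashes v2.hashes | awk '$1 != $2' | wc -l)
total=$(wc -l < v1.hashes)
echo "$changed of $total chunks changed"   # -> 1 of 16 chunks changed
```

One changed chunk out of sixteen: the 5 MB of daily churn in a 1 GB dump costs roughly 5 MB of new storage per backup, not 1 GB.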


restic: Another Excellent Option

restic is similar to borg but with a different design philosophy. It supports multiple backends (local, S3, SFTP, Azure, GCS) out of the box.

# Install restic
$ sudo apt install restic        # Debian/Ubuntu
$ sudo dnf install restic        # Fedora/RHEL

# Initialize a repository
$ restic init --repo /backup/restic-repo
enter password for new repository:
enter password again:
created restic repository at /backup/restic-repo

# Create a backup
$ restic -r /backup/restic-repo backup /home
enter password for repository:
repository opened
Files:       12345 new
Added to the repo: 2.15 GiB

# List snapshots
$ restic -r /backup/restic-repo snapshots
ID        Time                 Host        Tags    Paths
a1b2c3d4  2025-01-18 14:30:00  myhost              /home

# Restore a snapshot
$ restic -r /backup/restic-repo restore a1b2c3d4 --target /tmp/restore

# Backup to S3
$ export AWS_ACCESS_KEY_ID=your_key
$ export AWS_SECRET_ACCESS_KEY=your_secret
$ restic -r s3:s3.amazonaws.com/my-backup-bucket init
$ restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home

# Prune old snapshots
$ restic -r /backup/restic-repo forget --keep-daily 7 --keep-weekly 4 --prune

borg vs restic Quick Comparison

Feature          borgbackup                       restic
---------------  -------------------------------  -------------------------------
Deduplication    Yes (content-defined chunking)   Yes (content-defined chunking)
Encryption       AES-256-CTR                      AES-256-CTR
Compression      Multiple algorithms              zstd (since 0.14)
Cloud backends   SSH only (natively)              S3, Azure, GCS, SFTP, local
Speed            Generally faster for local       Better for cloud targets
Maturity         Longer track record              Newer, very active development

Both are excellent. If you back up to local disks or over SSH, borg is a great choice. If you back up to cloud storage, restic has more native backend support.


Backup Rotation and Retention

A common retention scheme:

┌──────────────────────────────────────────────────────┐
│               RETENTION POLICY EXAMPLE                │
│                                                       │
│  Daily backups:   Keep last 7 days                    │
│  Weekly backups:  Keep last 4 weeks                   │
│  Monthly backups: Keep last 12 months                 │
│  Yearly backups:  Keep last 3 years                   │
│                                                       │
│  Timeline:                                            │
│  ◄─ 7 days ──►◄── 4 weeks ──►◄── 12 months ──►      │
│  D D D D D D D W  W  W  W  M   M   M  ...  M  Y Y Y │
│                                                       │
│  Old backups are pruned automatically.                │
│  Backups at transition boundaries are promoted.       │
└──────────────────────────────────────────────────────┘

For tar and rsync backups, you implement rotation manually:

# Delete tar backups older than 30 days
$ find /backup -name "*.tar.gz" -mtime +30 -delete

# Keep only the last 7 daily rsync backups using dated directories
$ ls -d /backup/daily-* | head -n -7 | xargs -r rm -rf

For borg and restic, use their built-in prune commands as shown above.
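The manual rotation one-liners above are worth rehearsing on dummy data before pointing them at real backups. A sketch (directory names are assumed to sort chronologically, as date-stamped names do):

```shell
#!/bin/sh
# Exercise the "keep the newest 7" rotation on dummy dated directories
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Ten dated backup directories
for day in 10 11 12 13 14 15 16 17 18 19; do
    mkdir "daily-202501${day}"
done

# ls sorts ascending; head -n -7 drops the last (newest) 7 lines,
# so only the three oldest directories are deleted
ls -d daily-* | head -n -7 | xargs -r rm -rf

ls -d daily-* | wc -l   # -> 7
```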


Testing Restores

A backup that has never been tested is not a backup. It is a hope.

# Test a tar restore
$ mkdir /tmp/restore-test
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/restore-test
$ diff -r /etc /tmp/restore-test/etc --brief | head -20

# Test a borg restore
$ mkdir /tmp/borg-test
$ cd /tmp/borg-test
$ borg extract --dry-run --list /backup/borg-repo::home-20250118-1430
# If dry-run succeeds, the archive is readable

# Verify borg repository integrity
$ borg check /backup/borg-repo
$ borg check --verify-data /backup/borg-repo    # slower but thorough

# Test a restic restore
$ restic -r /backup/restic-repo check
$ restic -r /backup/restic-repo restore latest --target /tmp/restic-test

Schedule monthly restore tests. Add them to your calendar. Treat untested backups as nonexistent.
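That discipline is easy to automate. A minimal end-to-end restore test for a tar backup -- create, restore to scratch space, compare -- might look like this:

```shell
#!/bin/sh
# Automated restore test: back up, restore to a scratch dir, diff the trees
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for real data
mkdir -p data/conf
echo 'setting=1' > data/conf/app.conf
echo 'hello'    > data/readme.txt

tar -czf backup.tar.gz data

mkdir restore-test
tar -xzf backup.tar.gz -C restore-test

# diff -r exits non-zero on any difference, so a mismatch fails the script
diff -r data restore-test/data
echo 'restore test passed'
```

Point the same skeleton at a real archive and a real source directory, run it from cron or a timer, and alert on a non-zero exit.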


Automating Backups

With cron

# Edit root's crontab
$ sudo crontab -e
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Weekly full backup Sunday at 1 AM
0 1 * * 0 /usr/local/bin/full-backup.sh >> /var/log/backup.log 2>&1

With systemd Timers

# Create the service unit
$ sudo vim /etc/systemd/system/backup.service
[Unit]
Description=Daily Backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
User=root
# Create the timer unit
$ sudo vim /etc/systemd/system/backup.timer
[Unit]
Description=Run backup daily at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target
# Enable and start
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now backup.timer

# Check timer status
$ systemctl list-timers | grep backup
NEXT                         LEFT     LAST                         PASSED   UNIT
Sun 2025-01-19 02:00:00 UTC  11h left Sat 2025-01-18 02:00:00 UTC 12h ago  backup.timer

The Persistent=true setting ensures that if the system was off when the timer should have fired, it runs the backup as soon as the system boots.


Debug This

An admin's backup script has been running for months but the restores fail. Here is the script:

#!/bin/bash
tar -czf /backup/nightly.tar.gz /var/www /etc /home 2>/dev/null

What is wrong?

Problems:

  1. Same filename every night: nightly.tar.gz is overwritten each run. There is only ever one backup. If the current one is corrupt, there is nothing to fall back on.
  2. Errors are silenced: 2>/dev/null hides all error messages. If /var/www is too large or permissions fail, the admin never knows.
  3. No verification: No check that the archive is valid after creation.
  4. No rotation: No old backups kept.

Fixed version:

#!/bin/bash
set -euo pipefail
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP="/backup/nightly-${DATE}.tar.gz"
LOG="/var/log/backup-${DATE}.log"

echo "Starting backup at $(date)" | tee "$LOG"
tar -czf "$BACKUP" /var/www /etc /home 2>&1 | tee -a "$LOG"

# Verify the archive (with set -e, a bare failing command would abort
# the script before the check -- so test the exit status inside the if)
if tar -tzf "$BACKUP" > /dev/null 2>&1; then
    echo "Backup verified successfully" | tee -a "$LOG"
else
    echo "ERROR: Backup verification failed!" | tee -a "$LOG"
    exit 1
fi

# Remove backups older than 30 days
find /backup -name "nightly-*.tar.gz" -mtime +30 -delete

echo "Backup completed at $(date)" | tee -a "$LOG"

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  The 3-2-1 rule: 3 copies, 2 media types, 1 offsite.    │
│                                                           │
│  Backup types:                                            │
│  - Full: everything, every time (simple, large)           │
│  - Incremental: changes since last backup (small, complex)│
│  - Differential: changes since last full (middle ground)  │
│                                                           │
│  Tools:                                                   │
│  - tar: simple archives, great for /etc and small dirs    │
│  - rsync: efficient syncing, good for file-level backup   │
│  - borgbackup: dedup + encryption + compression           │
│  - restic: like borg with native cloud backend support    │
│                                                           │
│  Golden rules:                                            │
│  1. Automate backups (cron or systemd timers)             │
│  2. Test restores regularly                               │
│  3. Monitor for failures                                  │
│  4. Keep backups offsite                                  │
│  5. An untested backup is not a backup                    │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. tar basics: Create a compressed backup of /etc with a date-stamped filename. Extract it to /tmp and verify the contents match.

  2. rsync mirror: Use rsync to mirror a directory to a backup location. Modify some files, delete some files, and run rsync again with --delete. Verify the mirror is exact.

  3. borg workflow: Initialize a borg repository, create three archives (modifying some files between each), then prune to keep only the latest two. Verify pruning worked with borg list.

  4. Automate it: Write a backup script that uses either borg or rsync, add error checking, and schedule it with a systemd timer that runs daily at 3 AM.

  5. Bonus challenge: Set up borg to back up to a remote server over SSH. Configure a retention policy of 7 daily, 4 weekly, and 6 monthly backups. Write a script that creates the backup, prunes old archives, and compacts the repository, all in one run.

Disaster Recovery

Why This Matters

Your production web server will not boot. The GRUB bootloader is corrupted. You have customers waiting, your phone is ringing, and your boss is standing behind you. What do you do?

Or consider this: a filesystem has become corrupted after a power outage. The server comes back up, but /var will not mount. Your application logs, your database, your mail spool -- all on that filesystem. You need to get that data back.

Disaster recovery is not about preventing disasters -- that is what backups, RAID, and monitoring are for. DR is about what you do after something has gone catastrophically wrong. It is about having a plan, having the tools, and having the muscle memory to execute under pressure.

The time to practice DR is not during a disaster. The time to practice is now, in a lab, when nothing is on fire. This chapter walks you through the critical DR skills every Linux administrator needs: rescue media, GRUB recovery, filesystem repair, disk cloning, and restore procedures.


Try This Right Now

Prepare yourself for recovery by checking what you have available:

# Do you know your GRUB version?
$ grub-install --version 2>/dev/null || grub2-install --version 2>/dev/null

# Can you access a root shell from GRUB? (we will learn how)
# Check if you have rescue tools available
$ which fsck xfs_repair e2fsck 2>/dev/null
$ which ddrescue 2>/dev/null || echo "ddrescue not installed"
$ which clonezilla 2>/dev/null || echo "clonezilla not installed"

# Check your current boot setup
$ lsblk -o NAME,FSTYPE,MOUNTPOINT,SIZE
$ cat /etc/fstab

# Do you have a live USB ready? If not, make one today.

DR Planning Basics

Before any tools, you need a plan. Every organization should have a written DR plan that answers these questions:

┌──────────────────────────────────────────────────────────┐
│               DISASTER RECOVERY PLAN                      │
│                                                           │
│  1. WHAT could go wrong?                                  │
│     - Disk failure                                        │
│     - Filesystem corruption                               │
│     - Bootloader corruption                               │
│     - Accidental data deletion                            │
│     - Ransomware / security breach                        │
│     - Hardware failure (motherboard, PSU)                  │
│     - Natural disaster (fire, flood, earthquake)          │
│                                                           │
│  2. WHAT is the impact of each?                           │
│     - Which services go down?                             │
│     - How many users are affected?                        │
│     - What is the financial cost per hour of downtime?    │
│                                                           │
│  3. HOW do we recover from each?                          │
│     - Step-by-step procedures                             │
│     - Who is responsible?                                 │
│     - What tools and media are needed?                    │
│     - Where are the backups?                              │
│                                                           │
│  4. HOW LONG can we afford to be down?                    │
│     → RTO (Recovery Time Objective)                       │
│                                                           │
│  5. HOW MUCH DATA can we afford to lose?                  │
│     → RPO (Recovery Point Objective)                      │
└──────────────────────────────────────────────────────────┘

RTO and RPO

These two metrics drive every DR decision:

RTO (Recovery Time Objective): The maximum acceptable time to restore service after a disaster. If your RTO is 4 hours, you need to be back online within 4 hours.

RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time. If your RPO is 1 hour, you cannot lose more than 1 hour of data, which means you need backups at least hourly.
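An RPO is only meaningful if something checks it. A monitoring sketch -- the paths, the one-hour RPO, and the GNU `touch -d`/`stat -c` options are illustrative assumptions:

```shell
#!/bin/sh
# Alert when the newest backup is older than the RPO allows
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

BACKUP_DIR="$workdir/backup"   # stand-in for the real backup location
RPO_SECONDS=3600               # 1 hour RPO

# Simulate a backup that is already two hours old (GNU touch -d)
mkdir -p "$BACKUP_DIR"
touch -d '2 hours ago' "$BACKUP_DIR/nightly-20250118.tar.gz"

newest=$(ls -t "$BACKUP_DIR" | head -1)
age=$(( $(date +%s) - $(stat -c %Y "$BACKUP_DIR/$newest") ))   # GNU stat

if [ "$age" -gt "$RPO_SECONDS" ]; then
    echo "RPO VIOLATED: newest backup is ${age}s old (limit: ${RPO_SECONDS}s)"
    # in production: send an alert here
else
    echo "OK: newest backup is ${age}s old"
fi
```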

Timeline of a disaster:

Last backup    Disaster occurs    Service restored
    │               │                    │
    ▼               ▼                    ▼
────●───────────────●────────────────────●────────►
    │◄─── RPO ─────►│                    │
    │   (data loss)  │◄───── RTO ───────►│
    │                │   (downtime)       │

Example scenarios:

System                   RTO         RPO          Implication
-----------------------  ----------  -----------  -------------------------------
Personal blog            24 hours    1 week       Daily backups, manual restore
E-commerce site          1 hour      15 minutes   Hot standby, frequent backups
Bank transaction system  Near zero   Zero         Active-active replication
Internal wiki            8 hours     24 hours     Daily backups, next-day restore

Think About It: Your company's email server goes down. The boss says "get it back immediately" but has not approved budget for redundant servers. What questions should you ask to establish a realistic RTO and RPO?


Bootable Rescue Media

The first thing you need when a system will not boot is rescue media -- a bootable USB drive with tools to repair the system.

Creating a Bootable USB

# Download a rescue-focused distribution
# SystemRescue (formerly SystemRescueCd) is excellent for this:
# https://www.system-rescue.org/

# Write it to a USB drive
# FIRST: identify the USB device (be VERY careful here)
$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0  500G  0 disk
├─sda1   8:1    0  512M  0 part /boot/efi
└─sda2   8:2    0  499G  0 part /
sdb      8:16   1  32G   0 disk              ← USB drive
└─sdb1   8:17   1  32G   0 part

WARNING: The dd command below will DESTROY all data on the target device. Triple-check the device name. Writing to the wrong device can wipe your system disk.

# Write the ISO to the USB drive
$ sudo dd if=systemrescue-11.00-amd64.iso of=/dev/sdb bs=4M status=progress conv=fsync

Alternatively, most Linux distributions' live ISOs work well as rescue media. Ubuntu, Fedora, and Debian live images all include fsck, chroot, and disk utilities.

Booting Into Rescue Mode

When the system will not boot normally:

  1. Insert the rescue USB
  2. Enter BIOS/UEFI firmware settings (usually F2, F12, Del, or Esc at POST)
  3. Set USB as the first boot device
  4. Boot from USB
  5. You now have a working Linux environment with access to the broken system's disks

GRUB Recovery

GRUB (GRand Unified Bootloader) is the most common bootloader on Linux systems. When it breaks, the system will not boot at all.

Symptoms of GRUB Problems

  • grub rescue> prompt (GRUB cannot find its configuration)
  • error: unknown filesystem (GRUB cannot read the boot partition)
  • error: file not found (kernel or initramfs missing)
  • System boots to a black screen with a blinking cursor

Recovery from the GRUB Rescue Prompt

If you see grub rescue>, GRUB is loaded but cannot find its configuration:

# At the grub rescue prompt, find the boot partition
grub rescue> ls
(hd0) (hd0,msdos1) (hd0,msdos2)

grub rescue> ls (hd0,msdos1)/
./ ../ grub/ vmlinuz initrd.img

# Set the root and prefix
grub rescue> set root=(hd0,msdos1)
grub rescue> set prefix=(hd0,msdos1)/grub

# Load normal mode
grub rescue> insmod normal
grub rescue> normal

This should bring you to the normal GRUB menu. Once booted, fix it permanently.

Reinstalling GRUB from Live Media

Boot from rescue/live media, then:

# Identify your partitions
$ lsblk -f
NAME   FSTYPE LABEL MOUNTPOINT
sda
├─sda1 vfat
├─sda2 ext4
└─sda3 ext4

# Mount the root filesystem
$ sudo mount /dev/sda2 /mnt

# If you have a separate /boot partition, mount it too
# (adjust the device to match your layout)
$ sudo mount /dev/sda3 /mnt/boot

# For UEFI systems, mount the EFI partition (the vfat one -- sda1 here)
$ sudo mount /dev/sda1 /mnt/boot/efi

# Mount essential virtual filesystems
$ sudo mount --bind /dev /mnt/dev
$ sudo mount --bind /dev/pts /mnt/dev/pts
$ sudo mount --bind /proc /mnt/proc
$ sudo mount --bind /sys /mnt/sys

# Chroot into the broken system
$ sudo chroot /mnt

# Now you are "inside" the broken system. Reinstall GRUB.

# For BIOS/MBR systems:
$ grub-install /dev/sda
$ update-grub

# For UEFI systems:
$ grub-install --target=x86_64-efi --efi-directory=/boot/efi
$ update-grub

# Exit chroot and reboot
$ exit
$ sudo umount -R /mnt
$ sudo reboot

Distro Note: On Fedora/RHEL, use grub2-install and grub2-mkconfig -o /boot/grub2/grub.cfg instead of grub-install and update-grub.


Filesystem Repair

Filesystem corruption can happen after power outages, kernel panics, or hardware failures. The repair tools depend on the filesystem type.

ext4 Repair with fsck

# IMPORTANT: Never run fsck on a mounted filesystem!
# Unmount first, or boot from rescue media.

# Check and repair an ext4 filesystem
$ sudo fsck.ext4 -f /dev/sda2
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda2: 45231/3276800 files (0.5% non-contiguous), 982145/13107200 blocks

# Automatically fix all problems (use with caution)
$ sudo fsck.ext4 -fy /dev/sda2

# Check without making changes (dry run)
$ sudo fsck.ext4 -n /dev/sda2

WARNING: Running fsck on a mounted filesystem can cause severe data corruption. Always unmount first or run from rescue media. The root filesystem can be checked by booting into single-user mode or from live media.

XFS Repair with xfs_repair

# XFS uses xfs_repair, not fsck
$ sudo xfs_repair /dev/sda3

# If xfs_repair fails, try clearing the log first
$ sudo xfs_repair -L /dev/sda3
# WARNING: -L destroys the log, which may lose recent data

# Check without modifying (dry run)
$ sudo xfs_repair -n /dev/sda3

Checking the Root Filesystem

Since you cannot unmount / while the system is running, force a check at next boot:

# Force fsck on next boot (older sysvinit-style systems)
$ sudo touch /forcefsck

# On systemd-based systems, add fsck.mode=force to the kernel
# command line (edit the boot entry from the GRUB menu) instead

# Or set the filesystem to require a check
$ sudo tune2fs -C 100 -c 1 /dev/sda2
# Max mount count is now 1 and the current count is 100,
# so fsck runs at the next mount

Recovering Deleted Files

When a file is deleted, the data is not immediately erased -- only the directory entry and inode references are removed. The data blocks remain on disk until overwritten by new data.

Key Principles

  1. Stop writing to the filesystem immediately. Every write reduces the chance of recovery.
  2. Mount the filesystem read-only if possible.
  3. Work on a copy of the disk, not the original.

Tools for File Recovery

# extundelete - for ext3/ext4 filesystems
$ sudo apt install extundelete    # Debian/Ubuntu

# Recover all recently deleted files
$ sudo extundelete /dev/sda2 --restore-all
# Recovered files appear in RECOVERED_FILES/

# Recover a specific file
$ sudo extundelete /dev/sda2 --restore-file home/user/important.txt

# testdisk - for multiple filesystem types
$ sudo apt install testdisk

# Run testdisk interactively
$ sudo testdisk /dev/sda
# Follow the menu: Analyze → Quick Search → list files

# photorec - recovers files by signature (works even on damaged filesystems)
$ sudo photorec /dev/sda
# Recovers files by type (photos, documents, etc.)
# Does NOT preserve filenames

Think About It: Why does "stop writing to the filesystem" matter for file recovery? What happens to the deleted file's data blocks when new files are written?


Disk Cloning

Disk cloning creates an exact, bit-for-bit copy of a disk. This is essential for DR, migration, and forensic analysis.

Cloning with dd

# Clone entire disk to another disk
$ sudo dd if=/dev/sda of=/dev/sdb bs=64K status=progress conv=noerror,sync
# if = input file (source)
# of = output file (destination)
# bs = block size (64K is a good balance)
# status=progress = show progress
# conv=noerror = continue past read errors
# conv=sync = pad read errors with zeros

# Clone disk to an image file
$ sudo dd if=/dev/sda of=/backup/server-image-$(date +%Y%m%d).img \
    bs=64K status=progress

# Compress the image (disks have lots of empty space)
$ sudo dd if=/dev/sda bs=64K status=progress | gzip -c > /backup/server.img.gz

# Restore from compressed image
$ gunzip -c /backup/server.img.gz | sudo dd of=/dev/sda bs=64K status=progress

WARNING: dd does not ask for confirmation. Swapping if and of will overwrite your source disk with the contents of the destination. Triple-check your command before pressing Enter.
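After cloning, verify the copy before you trust it -- checksums catch a truncated or mis-targeted dd. The same technique works on real block devices; this sketch uses a small scratch file as a stand-in disk:

```shell
#!/bin/sh
# Verify a dd image byte-for-byte with checksums
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for /dev/sda: a 4 MiB file of random data
head -c 4194304 /dev/urandom > fake-disk

dd if=fake-disk of=disk.img bs=64K status=none

# Checksums must match, or the clone is not trustworthy
src=$(sha256sum fake-disk | awk '{print $1}')
img=$(sha256sum disk.img  | awk '{print $1}')
[ "$src" = "$img" ] && echo 'clone verified'   # -> clone verified
```

On real hardware, compare `sha256sum /dev/sda` against the image (only while the source is unmounted, or the checksum will never match).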

ddrescue: For Failing Disks

When a disk is failing with read errors, standard dd may hang or fail. ddrescue is designed for exactly this situation -- it copies what it can, skips bad sectors, and returns to retry them later.

# Install ddrescue
$ sudo apt install gddrescue        # Debian/Ubuntu (note: gddrescue, not ddrescue)
$ sudo dnf install ddrescue          # Fedora/RHEL

# Clone a failing disk (first pass: quick copy, skip errors)
$ sudo ddrescue -d -r0 /dev/sda /backup/rescue.img /backup/rescue.log
# -d = direct access (bypass kernel cache)
# -r0 = do not retry bad sectors yet
# The log file tracks which sectors were read

# Second pass: retry bad sectors
$ sudo ddrescue -d -r3 /dev/sda /backup/rescue.img /backup/rescue.log
# -r3 = retry bad sectors 3 times
# The log file ensures already-read sectors are skipped

The log file is critical -- it lets you stop and resume the rescue operation, and it ensures ddrescue does not re-read sectors it already copied successfully.

Clonezilla: Disk Imaging Made Easy

Clonezilla is a partition and disk imaging/cloning program -- an open source counterpart to Norton Ghost or Acronis True Image.

# Clonezilla is typically booted from a live USB
# Download from: https://clonezilla.org/

# Key Clonezilla features:
# - Clone disk to disk (device-to-device)
# - Clone disk to image (device-to-image file)
# - Multicasting (image one disk to many machines)
# - Supports ext4, XFS, Btrfs, NTFS, FAT32, and more
# - Only copies used blocks (much faster than dd)

Clonezilla works through a text-based menu system. Boot from the Clonezilla USB and follow the prompts. For scripted/automated cloning, Clonezilla provides a command-line interface as well.


Restoring from Backups

Having backups is only half the equation. You need to know how to restore from them under pressure.

Restoring from tar

# Full system restore from tar backup
# Boot from rescue media first, then:

# Mount the target filesystem
$ sudo mount /dev/sda2 /mnt

# Extract the backup
$ sudo tar -xzf /backup/full-system-20250118.tar.gz -C /mnt

# Restore GRUB
$ sudo mount --bind /dev /mnt/dev
$ sudo mount --bind /proc /mnt/proc
$ sudo mount --bind /sys /mnt/sys
$ sudo chroot /mnt
$ grub-install /dev/sda
$ update-grub
$ exit

# Unmount and reboot
$ sudo umount -R /mnt
$ sudo reboot

Restoring from borg

# Boot from rescue media with borg installed, then:

# Mount the target filesystem
$ sudo mount /dev/sda2 /mnt

# Mount the backup storage
$ sudo mount /dev/sdb1 /backup

# List available archives
$ borg list /backup/borg-repo
home-20250118-1430       Sat, 2025-01-18 14:30:00
home-20250119-1430       Sun, 2025-01-19 14:30:00
system-20250118-0200     Sat, 2025-01-18 02:00:00

# Restore the system archive
$ cd /mnt
$ borg extract /backup/borg-repo::system-20250118-0200

# Fix bootloader, fstab, etc.
$ sudo chroot /mnt
$ grub-install /dev/sda
$ update-grub
$ exit

Restoring from rsync

# rsync backups are just files, so restoration is straightforward
$ sudo mount /dev/sda2 /mnt
$ sudo rsync -avh /backup/system/ /mnt/
$ sudo chroot /mnt
$ grub-install /dev/sda
$ update-grub
$ exit

Documenting Recovery Procedures

A DR plan that exists only in someone's head is not a plan. Document everything.

What to Document

┌──────────────────────────────────────────────────────────┐
│          DR DOCUMENTATION CHECKLIST                       │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  For each critical server, document:                      │
│                                                           │
│  □ Server name, IP, role, and owner                       │
│  □ Disk layout (lsblk output, /etc/fstab)                │
│  □ LVM configuration (pvs, vgs, lvs output)              │
│  □ RAID configuration (mdadm --detail output)            │
│  □ Partition table (fdisk -l output)                     │
│  □ Installed packages list                                │
│  □ Custom configuration files locations                   │
│  □ Backup location, schedule, and retention               │
│  □ Step-by-step restore procedure                         │
│  □ Service startup order and dependencies                 │
│  □ Contact information for escalation                     │
│  □ Estimated recovery time                                │
│                                                           │
│  Store documentation:                                     │
│  - In the backup itself                                   │
│  - In a wiki/shared document                              │
│  - Printed copy in a secure location                      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Capturing System State for DR

Create a script that captures all the information you would need to rebuild a system:

#!/bin/bash
# dr-capture.sh - Capture system state for disaster recovery
DIR="/root/dr-docs"
mkdir -p "$DIR"
DATE=$(date +%Y%m%d)

echo "=== DR State Capture: $(hostname) - $(date) ===" > "$DIR/dr-info-${DATE}.txt"

# Disk layout
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT >> "$DIR/dr-info-${DATE}.txt"
echo "---" >> "$DIR/dr-info-${DATE}.txt"

# Partition tables
fdisk -l >> "$DIR/dr-info-${DATE}.txt" 2>/dev/null
echo "---" >> "$DIR/dr-info-${DATE}.txt"

# LVM (if used)
pvs >> "$DIR/dr-info-${DATE}.txt" 2>/dev/null
vgs >> "$DIR/dr-info-${DATE}.txt" 2>/dev/null
lvs >> "$DIR/dr-info-${DATE}.txt" 2>/dev/null
echo "---" >> "$DIR/dr-info-${DATE}.txt"

# RAID (if used)
cat /proc/mdstat >> "$DIR/dr-info-${DATE}.txt" 2>/dev/null
echo "---" >> "$DIR/dr-info-${DATE}.txt"

# Fstab
cp /etc/fstab "$DIR/fstab-${DATE}"

# Network
ip addr > "$DIR/network-${DATE}.txt"
ip route >> "$DIR/network-${DATE}.txt"
cat /etc/resolv.conf >> "$DIR/network-${DATE}.txt"

# Installed packages (pick the tool for your distro; running both commands
# unconditionally would let the second one truncate the file)
if command -v dpkg >/dev/null 2>&1; then
    dpkg --get-selections > "$DIR/packages-${DATE}.txt"    # Debian/Ubuntu
elif command -v rpm >/dev/null 2>&1; then
    rpm -qa > "$DIR/packages-${DATE}.txt"                  # RHEL/Fedora
fi

echo "DR state captured to $DIR"
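A capture is only useful if it is recent, so schedule it. This cron.d entry is an illustrative sketch; the install path /usr/local/sbin and the weekly schedule are assumptions to adapt:

```shell
# Illustrative: run the DR capture every Sunday at 03:00.
# The script path is an assumption -- adjust for your environment.
echo '0 3 * * 0 root /usr/local/sbin/dr-capture.sh' | sudo tee /etc/cron.d/dr-capture
```

Remember that files under /etc/cron.d use the system crontab format, which includes a username field before the command.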

DR Drills

Practice makes recovery possible. Schedule regular DR drills:

Drill 1: Boot Recovery

  1. Boot a VM from rescue media
  2. Deliberately break GRUB (rename /boot/grub/grub.cfg)
  3. Reboot and fix it using the GRUB rescue prompt or live media
  4. Time yourself. Can you do it in under 15 minutes?

Drill 2: Filesystem Repair

  1. Create a VM with test data
  2. Force a power-off (simulate a crash)
  3. Boot from rescue media
  4. Run fsck and repair the filesystem
  5. Verify data integrity

Drill 3: Full System Restore

  1. Back up a VM using borg or tar
  2. Delete the VM (or create a new empty VM)
  3. Restore from backup using rescue media
  4. Boot the restored system
  5. Verify all services are running

Drill 4: Partial Restore

  1. Back up a database directory
  2. Delete the database files
  3. Restore only the database from backup
  4. Start the database and verify data integrity
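Drill 4 can be rehearsed safely on any machine using tar and throwaway directories. This sketch stands in for a real database directory; every path here is a scratch location, not a production one:

```shell
# Rehearsal of Drill 4 with scratch directories (no real database involved).
DATA=$(mktemp -d)        # stands in for your database data directory
BACKUPS=$(mktemp -d)     # stands in for your backup destination
echo "important row" > "$DATA/table1.db"

# 1. Back up the "database" directory
tar -czf "$BACKUPS/db-backup.tar.gz" -C "$(dirname "$DATA")" "$(basename "$DATA")"

# 2. Simulate the disaster
rm -rf "$DATA"

# 3. Restore only that directory from the backup
tar -xzf "$BACKUPS/db-backup.tar.gz" -C "$(dirname "$DATA")"

# 4. Verify data integrity
cat "$DATA/table1.db"    # prints: important row
```

With a real database, stop the service before the backup (or use the database's own dump tool) so the files are consistent on disk.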

Think About It: How often should DR drills be performed? What is the cost of a drill versus the cost of discovering your DR plan does not work during an actual disaster?


Debug This

A server will not boot after a power outage. It drops to an emergency shell with:

[FAILED] Failed to mount /var.
[DEPEND] Dependency failed for Local File Systems.
You are in emergency mode. ...
Give root password for maintenance:

You enter the root password and get a shell. What do you do?

Step-by-step diagnosis and repair:

# Check what filesystem /var is
$ grep /var /etc/fstab
/dev/mapper/vg_sys-lv_var   /var   ext4   defaults   0 2

# Try to check and repair it
$ fsck.ext4 -f /dev/mapper/vg_sys-lv_var
# If it asks "Fix?" answer yes (or use -y flag)

# If fsck finds and fixes errors:
$ mount /var
$ exit    # or Ctrl+D to continue normal boot

# If the filesystem is severely damaged:
# Mount read-only first, then copy critical data
$ mount -o ro /dev/mapper/vg_sys-lv_var /var

# If the LV itself is damaged, check LVM
$ vgscan
$ vgchange -ay
$ lvscan

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  DR is about recovery AFTER something breaks:             │
│                                                           │
│  Planning:                                                │
│  - Define RTO (max downtime) and RPO (max data loss)     │
│  - Document everything: disk layout, configs, procedures │
│  - Practice with DR drills                                │
│                                                           │
│  Recovery tools:                                          │
│  - Rescue USB: SystemRescue or any Linux live image       │
│  - GRUB repair: chroot + grub-install + update-grub      │
│  - Filesystem repair: fsck (ext4) / xfs_repair (XFS)    │
│  - File recovery: extundelete, testdisk, photorec        │
│  - Disk cloning: dd, ddrescue, Clonezilla                │
│                                                           │
│  Critical rules:                                          │
│  - Never fsck a mounted filesystem                        │
│  - Stop writing when recovering deleted files             │
│  - Triple-check dd device names                           │
│  - Test restores regularly                                │
│  - Document procedures BEFORE you need them               │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Rescue media: Create a bootable USB with SystemRescue or your distribution's live ISO. Boot from it and explore the tools available.

  2. GRUB recovery: In a VM, rename /boot/grub/grub.cfg to simulate a GRUB failure. Reboot and recover from the GRUB rescue prompt. Then boot from live media and reinstall GRUB properly.

  3. Filesystem repair: In a VM, create an ext4 filesystem on a loop device, write some files, then corrupt it slightly with dd if=/dev/urandom of=/dev/loopX bs=1 count=100 seek=1024. Run fsck and observe the repair process.

  4. Disk cloning: Clone a small partition with dd to an image file. Mount the image file using a loop device and verify the contents are identical to the original.

  5. Bonus challenge: Perform a complete DR drill: back up a VM with borg, destroy the VM, create a new VM from scratch, restore from the borg backup, fix the bootloader, and boot the restored system. Document every step you take and time the entire process.

System Monitoring Tools

Why This Matters

Your web application is slow. Users are complaining. Is the CPU maxed out? Is the server swapping to disk? Is one process consuming all the memory? Is the disk I/O saturated?

You cannot fix what you cannot see. System monitoring tools are how you see inside a running Linux system. They reveal CPU usage, memory consumption, disk activity, process states, and system load -- the vital signs of your server.

Every experienced Linux administrator has these tools at their fingertips. When a system is misbehaving, the first thing they do is open top or htop and start reading the numbers. Within 30 seconds, they know whether the problem is CPU, memory, disk, or something else entirely. This chapter teaches you to read those numbers and know what they mean.


Try This Right Now

Open a terminal and run these commands. Look at what your system is doing right now:

# How long has the system been running? What is the load?
$ uptime
 14:23:15 up 12 days,  3:45,  2 users,  load average: 0.52, 0.38, 0.41

# Quick process overview
$ top -bn1 | head -20

# Memory at a glance
$ free -h

# Disk usage
$ df -h

# What is using the CPU right now?
$ ps aux --sort=-%cpu | head -10

Understanding Load Average

Before diving into tools, you need to understand load average -- the single most common metric you will encounter.

$ uptime
 14:23:15 up 12 days,  3:45,  2 users,  load average: 0.52, 0.38, 0.41
                                                        ^^^^  ^^^^  ^^^^
                                                        1min  5min  15min

Load average represents the average number of processes that are either running on a CPU or waiting for a CPU (in a runnable state), plus processes in uninterruptible sleep (usually waiting for disk I/O).

What Do the Numbers Mean?

Think of CPU cores as checkout lanes in a supermarket:

Single-core system (1 checkout lane):
  Load 0.5:  Lane is 50% utilized. No waiting.
  Load 1.0:  Lane is 100% utilized. Exactly saturated.
  Load 2.0:  Lane is full + 1 person waiting. Overloaded.

Quad-core system (4 checkout lanes):
  Load 2.0:  2 of 4 lanes busy. 50% utilized. Fine.
  Load 4.0:  All 4 lanes busy. 100% saturated.
  Load 8.0:  All lanes full + 4 people waiting. Overloaded.

Rule of thumb:
  Load < number of cores  → System is coping fine
  Load = number of cores  → System is at capacity
  Load > number of cores  → System is overloaded

# How many CPU cores do you have?
$ nproc
4

# Or from /proc
$ grep -c ^processor /proc/cpuinfo
4

# So for this system:
# Load < 4.0 → OK
# Load = 4.0 → at capacity
# Load > 4.0 → overloaded
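That comparison is easy to script. Here is a minimal sketch (the OK/WARN/CRIT wording is illustrative) that reads the 1-minute load and the core count:

```shell
# Compare the 1-minute load average against the number of CPU cores.
cores=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)

# awk handles the floating-point comparison that plain sh cannot
awk -v l="$load1" -v c="$cores" 'BEGIN {
    if      (l < c)  print "OK: load " l " is below " c " cores"
    else if (l == c) print "WARN: load " l " is at capacity (" c " cores)"
    else             print "CRIT: load " l " exceeds " c " cores"
}'
```

A check like this makes a reasonable starting point for a cron-driven alert, though real monitoring systems also look at the 5- and 15-minute values to avoid alerting on momentary spikes.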

Reading the Three Load Average Numbers

The three numbers (1-minute, 5-minute, 15-minute) tell a story:

Pattern       1min   5min   15min   Interpretation
Stable         2.0    2.0     2.0   Consistent moderate load
Spike          8.0    2.0     1.5   Sudden recent load (investigate now)
Recovering     2.0    4.0     6.0   Load is decreasing (getting better)
Worsening      6.0    4.0     2.0   Load is increasing (getting worse)

The "Recovering" row can look backwards at first. Remember: the 15-minute average is the oldest. If the 15-minute value is highest and the 1-minute value is lowest, the system was heavily loaded 15 minutes ago and is now getting better.

# Detailed load information
$ cat /proc/loadavg
0.52 0.38 0.41 2/345 28547
#                ^^^^^ ^^^^
#                |     Last PID assigned
#                running/total processes

Think About It: A server has 2 CPU cores and a load average of 4.0, 3.5, 1.0. Is this getting better or worse? What might have happened recently?


top: The Classic Process Monitor

top is installed on virtually every Linux system. It provides a real-time view of processes, CPU, and memory.

$ top
top - 14:30:00 up 12 days,  3:52,  2 users,  load average: 0.52, 0.38, 0.41
Tasks: 245 total,   1 running, 244 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.2 us,  2.1 sy,  0.0 ni, 91.8 id,  0.5 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :  16384.0 total,   8234.5 free,   4521.2 used,   3628.3 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  11458.4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1234 mysql     20   0  2.3g   512m    32m S   8.5   3.1  125:34.56 mysqld
 5678 www-data  20   0  856m   234m    12m S   3.2   1.4   45:12.89 apache2
  901 root      20   0  123m    45m     8m S   1.1   0.3   12:45.67 systemd-journald

Understanding the top Header

CPU line breakdown:

Field                     Meaning
us (user)                 Time running user processes
sy (system)               Time running kernel code
ni (nice)                 Time running niced (lower-priority) processes
id (idle)                 Time doing nothing
wa (I/O wait)             Time waiting for disk I/O
hi (hardware interrupt)   Time handling hardware interrupts
si (software interrupt)   Time handling software interrupts
st (steal)                Time stolen by the hypervisor (VMs only)

Key indicators of problems:

  • High wa → disk I/O bottleneck
  • High us + low id → CPU-bound process
  • High sy → lots of system calls (possible I/O or context switching)
  • High st → VM is not getting enough CPU from the host

top Interactive Commands

While top is running:

Key   Action
1     Toggle individual CPU core display
M     Sort by memory usage
P     Sort by CPU usage
T     Sort by cumulative time
k     Kill a process (enter PID)
r     Renice a process (change priority)
f     Choose which fields to display
c     Toggle full command-line display
H     Toggle thread display
q     Quit

Batch Mode for Scripts

# Run top once and capture output (for scripts/logs)
$ top -bn1 | head -30

# Run top 5 times with 2-second intervals
$ top -bn5 -d2 > /tmp/top-capture.txt

htop: top Made Beautiful

htop is an enhanced, interactive process viewer. If you use only one monitoring tool, make it htop.

# Install htop
$ sudo apt install htop    # Debian/Ubuntu
$ sudo dnf install htop    # Fedora/RHEL

$ htop
  0[||||||||||||||||                    35.2%]   Tasks: 245, 128 thr; 1 running
  1[||||||                              12.5%]   Load average: 0.52 0.38 0.41
  2[||||||||||||                        28.7%]   Uptime: 12 days, 03:52:15
  3[|||||                               10.1%]
  Mem[||||||||||||||||||||         4.52G/16.0G]
  Swp[                              0K/4.00G]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU%  MEM%   TIME+  Command
 1234 mysql     20   0 2354M  512M  32.4M S  8.5   3.1 125:34  /usr/sbin/mysqld
 5678 www-data  20   0  856M  234M  12.8M S  3.2   1.4  45:12  /usr/sbin/apache2

htop Advantages Over top

  • Color-coded CPU and memory bars
  • Mouse support (click to sort, scroll to navigate)
  • Horizontal and vertical scrolling (see full command lines)
  • Tree view (shows parent-child process relationships)
  • Easy process filtering and searching

htop Interactive Commands

Key   Action
F1    Help
F2    Setup (customize display)
F3    Search by name
F4    Filter processes
F5    Tree view
F6    Sort by column
F9    Kill process (choose signal)
F10   Quit
t     Toggle tree view
u     Filter by user
p     Toggle program path
H     Toggle user threads

Filtering and Searching in htop

# Press F4, type "nginx" to show only nginx processes
# Press F3, type "python" to find the next python process
# Press u, select "www-data" to show only that user's processes

vmstat: Virtual Memory Statistics

vmstat provides a snapshot of system-wide CPU, memory, I/O, and process statistics. It is ideal for spotting trends over time.

# Run vmstat every 2 seconds, 10 times
$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 8234496 123456 3628032  0    0     5    12  125  456  5  2 92  1  0
 0  0      0 8234112 123456 3628064  0    0     0     8  118  434  3  1 95  1  0
 2  0      0 8233728 123460 3628128  0    0     0    24  145  512  8  3 88  1  0

vmstat Column Reference

Column   Meaning                              Watch For
r        Processes waiting for CPU            > number of cores = CPU bottleneck
b        Processes in uninterruptible sleep   > 0 sustained = I/O bottleneck
swpd     Virtual memory used (KB)             Growing = memory pressure
si       Swap in (KB/s)                       > 0 sustained = needs more RAM
so       Swap out (KB/s)                      > 0 sustained = needs more RAM
bi       Blocks read from disk/s              High = lots of disk reads
bo       Blocks written to disk/s             High = lots of disk writes
us       CPU user time %                      High = CPU-bound workload
sy       CPU system time %                    High = lots of system calls
id       CPU idle time %                      Low = CPU bottleneck
wa       CPU I/O wait time %                  High = disk I/O bottleneck

Using vmstat to Diagnose

# "Is my system swapping?"
$ vmstat 1 5
# Look at si and so columns. If they are consistently > 0, you need more RAM.

# "Is my disk the bottleneck?"
# Look at the wa column and b column.
# High wa + high b = disk I/O saturation
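The swapping question can also be answered without vmstat at all: the cumulative swap counters live in /proc/vmstat, the same source the tools read. A small sketch that samples them twice and reports the difference:

```shell
# Sample the kernel's cumulative swap-in/swap-out page counters twice
# and report the delta. Sustained nonzero deltas mean active swapping.
read_swaps() {
    awk '/^pswpin /  {i=$2} /^pswpout / {o=$2} END {print i, o}' /proc/vmstat
}

before=$(read_swaps)
sleep 2
after=$(read_swaps)

echo "$before $after" | awk '{
    printf "over 2s: %d pages swapped in, %d pages swapped out\n", $3-$1, $4-$2
}'
```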

iostat: Disk I/O Statistics

iostat shows CPU and disk I/O statistics. It is part of the sysstat package.

# Install sysstat
$ sudo apt install sysstat    # Debian/Ubuntu
$ sudo dnf install sysstat    # Fedora/RHEL

# Basic iostat
$ iostat
Linux 6.1.0 (myhost)     01/18/2025    _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.23    0.00    2.14    0.52    0.00   92.11

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
sda              12.45        45.67        123.45         0.00    4567890   12345678

# Extended stats with 2-second interval
$ iostat -x 2
Device  r/s     w/s    rkB/s    wkB/s  rrqm/s  wrqm/s  %util  await  r_await  w_await
sda     5.23   7.22    45.67   123.45    0.45    2.34   3.45   1.23    0.89     1.45

Key fields:

  • %util -- percentage of time the device was busy. 100% means saturated.
  • await -- average time (ms) for I/O requests. High values = slow disk.
  • r/s, w/s -- reads and writes per second.

mpstat: Per-CPU Statistics

# Show all CPUs individually
$ mpstat -P ALL 2
14:30:00     CPU    %usr   %nice   %sys  %iowait   %irq   %soft  %steal  %idle
14:30:02     all    5.23    0.00   2.14     0.52    0.00    0.12    0.00  91.99
14:30:02       0    8.50    0.00   3.00     1.00    0.00    0.50    0.00  87.00
14:30:02       1   12.00    0.00   4.00     0.00    0.00    0.00    0.00  84.00
14:30:02       2    1.00    0.00   0.50     0.50    0.00    0.00    0.00  98.00
14:30:02       3    0.50    0.00   1.00     0.00    0.00    0.00    0.00  98.50

This reveals whether load is spread across cores or concentrated on one (which can indicate a single-threaded bottleneck).


sar: System Activity Reporter

sar (also from sysstat) collects and reports system activity data over time. It is the closest thing to a built-in monitoring system.

# Enable data collection (usually done by sysstat package)
$ sudo systemctl enable --now sysstat

# View today's CPU data
$ sar -u
14:00:01        CPU     %user     %nice   %system   %iowait   %steal     %idle
14:10:01        all      5.23      0.00      2.14      0.52      0.00     92.11
14:20:01        all      3.45      0.00      1.78      0.34      0.00     94.43
14:30:01        all     12.67      0.00      4.56      1.23      0.00     81.54
...

# View memory usage over time
$ sar -r

# View disk I/O over time
$ sar -d

# View network statistics
$ sar -n DEV

# View a specific day's data
$ sar -u -f /var/log/sysstat/sa18    # 18th of the month (Debian/Ubuntu path; RHEL uses /var/log/sa/sa18)

# View data for a specific time range
$ sar -u -s 14:00:00 -e 16:00:00

sar data is collected every 10 minutes by default (via a cron job or systemd timer). This gives you historical data to analyze trends and correlate with incidents.


Key /proc Files for Monitoring

The /proc filesystem is the source of truth for all monitoring tools. Here are the most useful files:

# Load average
$ cat /proc/loadavg
0.52 0.38 0.41 2/345 28547

# Memory details
$ cat /proc/meminfo
MemTotal:       16777216 kB
MemFree:         8234496 kB
MemAvailable:   11458560 kB
Buffers:          123456 kB
Cached:          3628032 kB
SwapTotal:       4194304 kB
SwapFree:        4194304 kB
...

# CPU information
$ cat /proc/cpuinfo | grep "model name" | head -1
model name    : Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

# Per-process information
$ cat /proc/1234/status    # replace 1234 with a PID
Name:   mysqld
State:  S (sleeping)
VmRSS:  524288 kB
Threads:  32
...

glances: A Modern All-in-One Monitor

glances is a cross-platform monitoring tool that combines top, iostat, iftop, and more into a single display.

# Install glances
$ sudo apt install glances        # Debian/Ubuntu
$ sudo dnf install glances        # Fedora/RHEL
$ pip install glances             # via pip

# Run glances
$ glances

glances automatically highlights values that need attention in yellow (warning) or red (critical). It shows CPU, memory, load, disk I/O, network, processes, and more in a single screen.

# Glances in web mode (access via browser)
$ glances -w
Glances Web UI started on http://0.0.0.0:61208

# Glances client-server mode
$ glances -s    # on the server
$ glances -c server-hostname    # from the client

Distro Note: On minimal server installations, glances may pull in many Python dependencies. In such environments, htop + iostat may be more practical.


Debug This

A developer reports that their application is "slow" on the server. Here is what you see:

$ uptime
 14:30:00 up 45 days,  load average: 12.34, 11.87, 8.45

$ nproc
4

$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  3 524288  45312   8192  65536  450  890   5600  1200  890 2345 15  8  2 75  0

What is the problem? Walk through the diagnosis:

  1. Load average: 12.34 on a 4-core system. That is 3x overloaded.
  2. vmstat r column: 12 processes waiting for CPU. Confirms CPU contention.
  3. vmstat b column: 3 processes blocked on I/O.
  4. vmstat si/so: Swapping in and out heavily (450/890 KB/s). Memory pressure.
  5. vmstat wa: 75% I/O wait. The CPU is mostly waiting for disk.
  6. vmstat free: Only 45 MB free. Very low.

Diagnosis: The system is severely memory-constrained. It is swapping heavily, which causes high I/O wait, which makes the CPU idle waiting for disk. The root cause is not CPU -- it is memory.

Fix: Find the memory-hungry process (htop, sort by MEM%), and either kill it, add more RAM, or reduce the application's memory usage.


┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  System monitoring gives you visibility:                  │
│                                                           │
│  Load Average:                                            │
│  - Represents demand on the CPU                           │
│  - Compare against number of cores (nproc)                │
│  - Three values: 1min, 5min, 15min trend                  │
│                                                           │
│  Tools:                                                   │
│  - top/htop: real-time process monitoring                 │
│  - vmstat: CPU, memory, swap, I/O snapshot                │
│  - iostat: disk I/O performance                           │
│  - mpstat: per-CPU breakdown                              │
│  - sar: historical data collection                        │
│  - glances: all-in-one modern dashboard                   │
│                                                           │
│  Diagnosis pattern:                                       │
│  1. Check load average (overloaded?)                      │
│  2. Check CPU (us/sy/wa/id in top)                        │
│  3. Check memory (free, swap usage)                       │
│  4. Check disk I/O (iostat %util, await)                  │
│  5. Find the guilty process (sort by CPU or MEM)          │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Read top: Run top, press 1 to see individual CPUs, then press M to sort by memory. Identify the top 5 memory consumers on your system.

  2. vmstat trending: Run vmstat 1 60 for one minute while doing something intensive (like compiling software or running stress --cpu 4). Watch how the numbers change.

  3. Historical data: Enable sysstat, wait a few hours, then use sar -u and sar -r to view CPU and memory trends. Identify the busiest period.

  4. Load average experiment: Run stress --cpu 8 --timeout 60 on a multi-core system. Watch the 1-minute load average rise in uptime while the 15-minute average stays low.

  5. Bonus challenge: Write a shell script that captures vmstat, iostat, and free output every minute for an hour, saves it to a log file, and then use awk or grep to find the moment of peak memory usage.

Memory Management

Why This Matters

Your monitoring dashboard shows the server using 95% of its 16 GB RAM. Should you panic? Maybe not. Linux aggressively uses free memory for disk caching. That 95% might include 8 GB of cache that can be reclaimed instantly when applications need it. But if applications are genuinely consuming all the RAM, the kernel's OOM (Out of Memory) killer will start terminating processes -- and it might choose your database.

Understanding Linux memory management is the difference between a calm "that is just cache" and a panicked "we need to add RAM immediately." It is the difference between knowing why your Java application was killed at 3 AM and being baffled by a mystery crash.

This chapter explains how Linux manages memory, how to read memory statistics correctly, how swap and the OOM killer work, and how to control memory usage with cgroups.


Try This Right Now

# See your memory usage
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            16Gi       4.5Gi       3.2Gi       256Mi       8.3Gi        11Gi
Swap:          4.0Gi          0B       4.0Gi

# What does the kernel think?
$ cat /proc/meminfo | head -10

# Which processes use the most memory?
$ ps aux --sort=-%mem | head -10

# Current swap usage
$ swapon --show

# OOM score of a running process (pick any PID)
$ cat /proc/1/oom_score

Virtual Memory Recap

Every process in Linux believes it has its own private, contiguous block of memory. This is virtual memory -- an illusion maintained by the kernel and the CPU's Memory Management Unit (MMU).

┌──────────────────────────────────────────────────────────┐
│                 VIRTUAL MEMORY                            │
│                                                           │
│  Process A sees:          Process B sees:                 │
│  ┌──────────────┐         ┌──────────────┐               │
│  │ 0x0000 Code  │         │ 0x0000 Code  │               │
│  │ 0x1000 Data  │         │ 0x1000 Data  │               │
│  │ 0x2000 Heap  │         │ 0x2000 Heap  │               │
│  │   ...        │         │   ...        │               │
│  │ 0xFFFF Stack │         │ 0xFFFF Stack │               │
│  └──────┬───────┘         └──────┬───────┘               │
│         │                        │                        │
│         │    Page Table          │    Page Table           │
│         ▼    Mapping             ▼    Mapping              │
│  ┌──────────────────────────────────────────┐             │
│  │          PHYSICAL RAM                     │             │
│  │  ┌────┬────┬────┬────┬────┬────┬────┐    │             │
│  │  │ A  │ B  │ A  │ K  │ B  │ A  │ B  │    │             │
│  │  └────┴────┴────┴────┴────┴────┴────┘    │             │
│  │     Pages scattered across physical RAM   │             │
│  └──────────────────────────────────────────┘             │
│                                                           │
│  When physical RAM is full, the kernel swaps              │
│  inactive pages to disk (swap space).                     │
└──────────────────────────────────────────────────────────┘

Key concepts:

  • Memory is divided into pages (typically 4 KB each)
  • The kernel maps virtual pages to physical RAM frames
  • Processes can allocate more virtual memory than physical RAM exists
  • When RAM is full, inactive pages are moved to swap on disk

The free Command: Reading It Correctly

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            16Gi       4.5Gi       3.2Gi       256Mi       8.3Gi        11Gi
Swap:          4.0Gi          0B       4.0Gi

What Each Column Means

Column       Meaning
total        Total physical RAM installed
used         RAM used by processes AND the kernel
free         RAM not being used for anything at all
shared       Memory used by tmpfs (shared memory)
buff/cache   Memory used for disk buffers and file cache
available    Memory available for new processes (free + reclaimable cache)

The Critical Insight: free vs available

Do not look at free. Look at available.

The free column shows memory that is completely unused. But Linux uses "free" memory for file caching -- keeping recently read file data in RAM so that future reads are fast. This cache is instantly reclaimable when applications need memory.

Example:
  total = 16 GB
  used  =  4.5 GB   (applications)
  free  =  3.2 GB   (truly idle)
  cache =  8.3 GB   (file cache, reclaimable)
  available = 11 GB  (free + reclaimable cache)

Is this server out of memory? NO!
It has 11 GB available for applications.
The 8.3 GB of cache is HELPING performance, not wasting RAM.
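The same sanity check can be computed in one line. This assumes the procps free(1) column order shown above; -b reports bytes so the arithmetic is unit-safe:

```shell
# Percentage of total RAM that is actually available to applications.
# Column 7 of the Mem: line is "available", column 2 is "total".
free -b | awk '/^Mem:/ { printf "available: %.1f%% of total\n", $7/$2*100 }'
```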

Think About It: A new administrator sees free showing 200 MB on a 64 GB server and wants to add more RAM. What would you tell them? What number should they actually check?


Buffers vs Cache

Both are forms of disk caching, but they serve different purposes:

$ cat /proc/meminfo | grep -E "^(Buffers|Cached|SReclaimable)"
Buffers:          123456 kB
Cached:          3504576 kB
SReclaimable:     245760 kB

Buffers: Cache for raw block device I/O (disk metadata, superblocks, directory entries). Small in size.

Cached: Cache for file content. When you read a file, its contents are kept in the page cache. This is usually the large portion.

SReclaimable: Slab memory that can be reclaimed (kernel data structures like inode cache, dentry cache).

# Watch cache fill up as you read files
$ free -h    # note the buff/cache
$ sudo find /usr -type f -exec cat {} \; > /dev/null 2>&1
$ free -h    # buff/cache should have grown

# Drop caches (for testing only, not for production!)
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ free -h    # cache dropped, free increased

WARNING: Dropping caches in production causes performance degradation as files must be re-read from disk. Only do this for diagnostic purposes.


Swap: The Overflow Lane

Swap is disk space used as an extension of RAM. When physical memory is full, the kernel moves inactive pages to swap, freeing RAM for active processes.

# View current swap
$ swapon --show
NAME      TYPE      SIZE   USED PRIO
/dev/sda3 partition 4G     0B   -2

# Or from /proc
$ cat /proc/swaps
Filename              Type        Size       Used    Priority
/dev/sda3             partition   4194300    0       -2

# View swap in free output
$ free -h | grep Swap
Swap:          4.0Gi          0B       4.0Gi

Creating Swap Space

# Create a swap file (2 GB)
$ sudo fallocate -l 2G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

# Verify
$ swapon --show

# Make it permanent
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Remove swap
$ sudo swapoff /swapfile
$ sudo rm /swapfile
# (and remove the fstab entry)
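On some filesystems, a fallocate-created file is not suitable for swap and swapon will refuse it with a complaint about holes (older XFS setups are a known case; btrfs needs special handling for swap files in general). The traditional fallback is dd, which writes the file out in full. Sizes here are examples:

```shell
# Fallback when swapon rejects a fallocate-created file:
# write the file out in full with dd (2 GB here; adjust count as needed).
sudo dd if=/dev/zero of=/swapfile bs=1M count=2048 status=progress
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```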

Swappiness: How Aggressively to Swap

The swappiness parameter controls how aggressively the kernel moves pages to swap:

# View current swappiness (default is usually 60)
$ cat /proc/sys/vm/swappiness
60

# Temporarily change it
$ sudo sysctl vm.swappiness=10

# Make it permanent
$ echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-swappiness.conf
$ sudo sysctl --system

Swappiness   Behavior
0            Avoid swap unless absolutely necessary (kernel may still swap)
10           Swap only under heavy memory pressure (good for databases)
60           Default -- balanced behavior
100          Swap aggressively (favors keeping cache over anonymous pages)

For database servers and latency-sensitive applications, a low swappiness (10-20) is common because swapping causes latency spikes. For general-purpose servers, the default 60 is usually fine.

Think About It: Why might you NOT want to set swappiness to 0 on a database server? What happens if a memory leak slowly consumes all RAM and there is no swap?


The OOM Killer

When the system runs out of both RAM and swap, the kernel invokes the OOM (Out of Memory) Killer. Its job is to kill processes to free memory and keep the system alive.

How the OOM Killer Chooses Victims

Every process has an OOM score from 0 to 1000. The process with the highest score gets killed first.

# View OOM score of a process
$ cat /proc/1234/oom_score
15

# View OOM score adjustment
$ cat /proc/1234/oom_score_adj
0

The OOM score is calculated based on:

  • How much memory the process is using (more memory = higher score)
  • How long the process has been running (shorter = higher score)
  • The oom_score_adj adjustment (-1000 to +1000)
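You can see the kernel's current ranking by reading these files for every process. A quick sketch (processes may exit mid-scan, hence the error suppression):

```shell
# List the ten processes with the highest OOM scores -- the ones the
# OOM killer would target first if memory ran out right now.
for p in /proc/[0-9]*; do
    score=$(cat "$p/oom_score" 2>/dev/null) || continue
    name=$(cat "$p/comm" 2>/dev/null)
    printf "%6s  %-7s %s\n" "$score" "${p#/proc/}" "$name"
done | sort -rn | head -10
```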

Protecting Critical Processes from OOM

# Make a process immune to the OOM killer
$ echo -1000 | sudo tee /proc/1234/oom_score_adj

# Make a process more likely to be killed
$ echo 500 | sudo tee /proc/5678/oom_score_adj

Common strategy:

Process      oom_score_adj   Rationale
sshd         -1000           Never kill SSH -- you need it to fix things
database     -500            Protect critical data
web server   0               Default priority
batch jobs   500             Kill these first

Setting OOM Protection in systemd Services

# In a systemd service file
[Service]
OOMScoreAdjust=-1000

Detecting OOM Kills

# Check kernel logs for OOM events
$ dmesg | grep -i "oom"
[12345.678901] Out of memory: Killed process 5678 (java) total-vm:8388608kB, anon-rss:6291456kB

# Or in journald
$ journalctl -k | grep -i oom

# Check if a specific process was OOM-killed
$ journalctl -k | grep "Killed process"

OOM Anatomy: What an OOM Kill Looks Like

Jan 18 14:30:00 server kernel: java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE)
Jan 18 14:30:00 server kernel: Mem-Info:
Jan 18 14:30:00 server kernel: active_anon:4012345 inactive_anon:23456 ...
Jan 18 14:30:00 server kernel: Free swap  = 0kB
Jan 18 14:30:00 server kernel: Total swap = 4194300kB
Jan 18 14:30:00 server kernel: Out of memory: Killed process 5678 (java)
                                total-vm:8388608kB, anon-rss:6291456kB, file-rss:12345kB

This tells you:

  • java triggered the OOM killer
  • Swap is 100% full (Free swap = 0kB)
  • The java process was using ~6 GB of RSS (resident set size)

/proc/meminfo Deep Dive

$ cat /proc/meminfo

Key fields explained:

Field             Meaning
MemTotal          Total usable RAM (slightly less than physical due to kernel reservations)
MemFree           Completely unused RAM
MemAvailable      Estimated memory available to applications (includes reclaimable cache)
Buffers           Raw disk block cache
Cached            File content cache (page cache)
SwapCached        Swap data also present in RAM (avoids re-reading from swap)
Active            Recently accessed memory (less likely to be reclaimed)
Inactive          Not recently accessed (candidate for reclamation or swap)
Active(anon)      Anonymous pages (application heap/stack) recently used
Inactive(anon)    Anonymous pages not recently used (swap candidates)
Active(file)      File-backed pages recently used
Inactive(file)    File-backed pages (reclaimable without swapping)
Dirty             Pages modified in memory but not yet written to disk
Slab              Kernel data structure cache
SReclaimable      Slab memory that can be freed
SUnreclaim        Slab memory that cannot be freed
Committed_AS      Total memory committed (promised) to processes
VmallocTotal      Total vmalloc address space
HugePages_Total   Number of hugepages allocated

# Quick check: is the system under memory pressure?
$ awk '/MemAvailable/ {avail=$2} /MemTotal/ {total=$2} END {printf "%.1f%% available\n", avail/total*100}' /proc/meminfo
68.3% available

Finding Memory-Hungry Processes

Using ps

# Top 10 memory consumers by RSS (Resident Set Size)
$ ps aux --sort=-%mem | head -11
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mysql     1234  2.3  12.5 2415648 2048000 ?     Ssl  Jan15  48:23 /usr/sbin/mysqld
www-data  5678  0.8   5.2  892416  851968 ?     S    Jan15  12:45 /usr/sbin/apache2
redis     9012  0.1   3.1  234567  507904 ?     Ssl  Jan15   2:34 /usr/bin/redis-server

# VSZ = Virtual Size (address space, often much larger than actual usage)
# RSS = Resident Set Size (actual physical RAM used)
# %MEM = RSS as a percentage of total RAM

Using smem (More Accurate)

# Install smem
$ sudo apt install smem

# smem accounts for shared memory properly
$ sudo smem -rkt -s pss | head -10
  PID User     Command                         Swap      USS      PSS      RSS
 1234 mysql    /usr/sbin/mysqld                   0   1.90G    1.92G    1.95G
 5678 www-data /usr/sbin/apache2                  0  78.5M   120.3M   234.5M

USS (Unique Set Size): Memory uniquely owned by this process. PSS (Proportional Set Size): USS + proportional share of shared memory. RSS (Resident Set Size): All physical memory used, including shared.

PSS is the most accurate measure of a process's true memory footprint.
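If smem is not installed, kernels 4.14 and later expose an aggregated PSS directly via /proc/PID/smaps_rollup. A sketch reading this shell's own value (no root needed for your own processes):

```shell
# smaps_rollup aggregates the per-mapping values into a single summary;
# $$ is this shell's own PID.
awk '/^Pss:/ { printf "%d kB PSS\n", $2 }' "/proc/$$/smaps_rollup"
```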

Using /proc/PID/status

$ cat /proc/1234/status | grep -E "^(Name|VmRSS|VmSize|VmSwap|RssAnon|RssFile)"
Name:   mysqld
VmSize:  2415648 kB    # Virtual memory size
VmRSS:   2048000 kB    # Physical memory in use
VmSwap:        0 kB    # Memory swapped to disk
RssAnon: 1900000 kB    # Anonymous (heap/stack) memory
RssFile:  148000 kB    # File-backed memory (mmap'd files)

Memory Leak Detection Basics

A memory leak occurs when a process continuously allocates memory without freeing it.

Spotting a Leak

# Watch a process's memory over time
$ while true; do
    ps -p 1234 -o pid,rss,vsz,comm --no-headers
    sleep 60
  done | tee /tmp/mem-watch.log

# If RSS grows continuously without leveling off, it is likely a leak.

# Or use pidstat (from sysstat)
$ pidstat -r -p 1234 60
14:30:00  PID  minflt/s  majflt/s     VSZ       RSS    %MEM  Command
14:31:00  1234   45.23      0.00    2415648   2048000   12.5  mysqld
14:32:00  1234   52.17      0.00    2415648   2052096   12.5  mysqld
14:33:00  1234   48.90      0.00    2419744   2060288   12.6  mysqld  ← growing
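A quick way to summarize such a log after the fact. Columns follow the ps loop above (pid rss vsz comm); the sample values here are illustrative:

```shell
# Report total RSS growth across the samples in a mem-watch log.
printf '1234 2048000 2415648 mysqld\n1234 2052096 2415648 mysqld\n1234 2060288 2419744 mysqld\n' |
  awk 'NR == 1 { first = $2 } { last = $2 }
       END { printf "RSS grew %.1f MiB over %d samples\n", (last - first) / 1024, NR }'
# Prints: RSS grew 12.0 MiB over 3 samples
```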

Using valgrind (For Development)

# Run a program under valgrind to detect leaks
$ valgrind --leak-check=full --show-leak-kinds=all ./my_application
==12345== LEAK SUMMARY:
==12345==    definitely lost: 1,234 bytes in 5 blocks
==12345==    indirectly lost: 5,678 bytes in 12 blocks

Cgroups Memory Limits

Control groups (cgroups) allow you to limit memory usage for a process or group of processes. This prevents a single application from consuming all system memory.

Using systemd (cgroups v2)

# Limit a service to 512 MB of memory
$ sudo systemctl set-property myapp.service MemoryMax=512M

# Or in the service file
$ sudo vim /etc/systemd/system/myapp.service
[Service]
MemoryMax=512M
MemorySwapMax=0        # No swap allowed
MemoryHigh=400M        # Start throttling at 400M

# Check current memory usage of a service
$ systemctl status myapp.service
    Memory: 234.5M (max: 512.0M available: 277.5M)

# Or via cgroupfs
$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.current
245891072

$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.max
536870912

Manual cgroups v2

# Create a cgroup
$ sudo mkdir /sys/fs/cgroup/mygroup

# Set memory limit (256 MB)
$ echo 268435456 | sudo tee /sys/fs/cgroup/mygroup/memory.max

# Add a process to the cgroup
$ echo 1234 | sudo tee /sys/fs/cgroup/mygroup/cgroup.procs

# Check usage
$ cat /sys/fs/cgroup/mygroup/memory.current

Hands-On: Memory Pressure Simulation

# Install stress-ng for memory testing
$ sudo apt install stress-ng

# Allocate 2 GB of memory for 30 seconds
$ stress-ng --vm 1 --vm-bytes 2G --timeout 30s &

# In another terminal, watch the memory impact
$ watch -n1 free -h

# Watch the OOM score change
$ watch -n1 "cat /proc/$(pgrep -f stress-ng | head -1)/oom_score"

Think About It: If you run stress-ng --vm 1 --vm-bytes 20G on a system with 16 GB of RAM and 4 GB of swap, what happens? Which kernel subsystem intervenes?


Debug This

A server has 32 GB of RAM. An application team says the server is "out of memory" and requests a RAM upgrade. Here is what you see:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            32Gi        28Gi       512Mi       128Mi       3.5Gi       3.8Gi
Swap:          8.0Gi       2.1Gi       5.9Gi

Questions to ask:

  1. Is the server truly out of memory? available shows 3.8 GB. There is still memory available.
  2. But swap is being used (2.1 GB). This means the system has been under memory pressure at some point. Active swapping causes performance issues.
  3. What is using the memory?
$ ps aux --sort=-%mem | head -5
USER    PID %CPU %MEM    VSZ    RSS    COMMAND
java   4567  5.2 62.5  24G    20G     java -Xmx20G ...

The Java application has a 20 GB heap configured. On a 32 GB system, that leaves only 12 GB for the OS, cache, and all other processes. The real fix is not more RAM -- it is right-sizing the Java heap.
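The headroom arithmetic, spelled out with the scenario's numbers (pure arithmetic, no system access):

```shell
# 32 GiB RAM, 20 GiB Java heap: what is left for everything else?
awk 'BEGIN { ram = 32; heap = 20
             printf "headroom: %d GiB (%.1f%% of RAM)\n", ram - heap, (ram - heap) / ram * 100 }'
# Prints: headroom: 12 GiB (37.5% of RAM)
```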


┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Linux memory management key points:                      │
│                                                           │
│  Reading free:                                            │
│  - Ignore "free" -- look at "available"                   │
│  - buff/cache is reclaimable, not wasted                  │
│  - Swap usage indicates past memory pressure              │
│                                                           │
│  Swap:                                                    │
│  - Overflow for when RAM is full                          │
│  - swappiness controls how aggressively pages are swapped │
│  - Set low (10-20) for latency-sensitive workloads        │
│                                                           │
│  OOM Killer:                                              │
│  - Last resort when RAM + swap are exhausted              │
│  - Kills the highest oom_score process                    │
│  - Protect critical services with oom_score_adj=-1000     │
│                                                           │
│  Controlling memory:                                      │
│  - cgroups/systemd MemoryMax to limit per-service usage   │
│  - Find leaks with pidstat/valgrind                       │
│  - Monitor with /proc/meminfo and ps/smem                 │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Read free correctly: Run free -h on your system. Calculate what percentage of total RAM is actually available. Is the system under memory pressure?

  2. Watch caching: Clear the page cache (echo 3 | sudo tee /proc/sys/vm/drop_caches), note the free and buff/cache values, then read a large file. Watch buff/cache grow and free shrink. Verify that available barely changes.

  3. Swappiness experiment: Check your current swappiness. Create a swap file, change swappiness to 100, and run a memory stress test. Then change swappiness to 10 and repeat. Compare how quickly swap is used.

  4. OOM exploration: Find the OOM scores of all your running processes. Which process would the OOM killer target first? Set oom_score_adj=-1000 on your SSH daemon to protect it.

  5. Bonus challenge: Create a systemd service with MemoryMax=100M that runs a script which tries to allocate 200 MB. Watch the OOM killer terminate it. Check journalctl -k for the OOM event.

Disk I/O & Performance

Why This Matters

Your database queries are taking 30 seconds instead of 3. The application is not doing anything differently. The CPU is mostly idle. Memory is fine. What is going on?

You check iostat and see the disk is 100% utilized with average I/O latency of 45 milliseconds. The disk is the bottleneck. Maybe a backup job is running and saturating the disk. Maybe the working set no longer fits in the page cache. Maybe the I/O scheduler is not suited for this workload.

Disk I/O is frequently the slowest component in any system. RAM operates in nanoseconds. SSDs in microseconds. Spinning hard drives in milliseconds. That is a gap of five to six orders of magnitude between RAM and HDD. Understanding I/O performance -- how to measure it, how to identify bottlenecks, and how to tune it -- is a fundamental skill for any Linux administrator.


Try This Right Now

# What disks do you have?
$ lsblk -d -o NAME,ROTA,SIZE,MODEL
# ROTA=1 means rotational (HDD), ROTA=0 means SSD/NVMe

# What I/O scheduler is in use?
$ cat /sys/block/sda/queue/scheduler
# or for NVMe:
$ cat /sys/block/nvme0n1/queue/scheduler

# Quick I/O stats
$ iostat -x 1 3

# Who is doing I/O right now?
$ sudo iotop -o -b -n 3 2>/dev/null || echo "Install iotop: sudo apt install iotop"

I/O Schedulers

The I/O scheduler determines the order in which disk I/O requests are served. Different schedulers optimize for different workloads.

Available Schedulers

# View available schedulers for a device
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

# The one in brackets is currently active
┌──────────────────────────────────────────────────────────┐
│                  I/O SCHEDULERS                           │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  mq-deadline                                              │
│  - Default for most setups                                │
│  - Ensures requests are served within a deadline          │
│  - Good for: databases, mixed workloads                   │
│  - Prevents starvation of reads by heavy writes           │
│                                                           │
│  bfq (Budget Fair Queueing)                               │
│  - Fair scheduling between processes                      │
│  - Good for: desktops, interactive workloads              │
│  - Higher CPU overhead than mq-deadline                   │
│  - Best when fairness matters more than throughput         │
│                                                           │
│  kyber                                                    │
│  - Lightweight, designed for fast devices (NVMe, SSD)     │
│  - Good for: SSDs with high IOPS capability               │
│  - Minimal CPU overhead                                   │
│  - Separates reads and writes into different queues        │
│                                                           │
│  none                                                     │
│  - No scheduling at all (FIFO)                            │
│  - Good for: NVMe devices with internal schedulers        │
│  - Minimum latency, no CPU overhead                       │
│  - Best when the device itself handles scheduling         │
│                                                           │
└──────────────────────────────────────────────────────────┘

Changing the I/O Scheduler

# Change at runtime (immediate, not persistent)
$ echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
$ echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

# Verify
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

# Make it persistent with a udev rule
$ sudo vim /etc/udev/rules.d/60-ioschedulers.rules
# Set mq-deadline for rotational (HDD) devices
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"

# Set none for non-rotational (SSD/NVMe) devices
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"

# Reload udev rules
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger

Which Scheduler for Which Situation?

Device Type   Workload              Recommended Scheduler
HDD           Database              mq-deadline
HDD           Desktop/interactive   bfq
HDD           General server        mq-deadline
SSD (SATA)    Database              mq-deadline or none
SSD (SATA)    General               mq-deadline
NVMe SSD      Any                   none
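You can apply the same logic mechanically by checking each device's rotational flag. This is a heuristic sketch, not a tuning command: it prints recommendations and changes nothing.

```shell
# Suggest a scheduler per block device: NVMe -> none,
# rotational (HDD) -> mq-deadline, other SSDs -> none.
for f in /sys/block/*/queue/rotational; do
  [ -e "$f" ] || continue
  dev=${f#/sys/block/}; dev=${dev%/queue/rotational}
  read -r rota < "$f"
  case "$dev" in
    nvme*) rec=none ;;                                 # device schedules itself
    *) if [ "$rota" = 1 ]; then rec=mq-deadline; else rec=none; fi ;;
  esac
  printf '%s: %s\n' "$dev" "$rec"
done
```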

Think About It: Why would none be the best scheduler for NVMe SSDs? What advantage does the NVMe device have that makes kernel-level scheduling unnecessary?


iostat Deep Dive

iostat is your primary tool for understanding disk I/O performance. Let us break down every column.

# Install if needed (part of sysstat)
$ sudo apt install sysstat    # Debian/Ubuntu
$ sudo dnf install sysstat    # Fedora/RHEL

# Extended statistics, 2-second interval
$ iostat -xz 2
Linux 6.1.0 (myhost)     01/18/2025    _x86_64_    (4 CPU)

Device  r/s     w/s    rkB/s    wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sda     45.23  123.45  5678.90  12345.67  2.34    8.90   4.92   6.72    0.89     1.45     0.15   125.56     100.01   0.45   7.60

Column Reference

Column    Full Name                 Meaning                              Watch For
r/s       Reads per second          IOPS for reads
w/s       Writes per second         IOPS for writes
rkB/s     Read KB/sec               Read throughput
wkB/s     Write KB/sec              Write throughput
rrqm/s    Read requests merged/s    Adjacent reads merged into one
wrqm/s    Write requests merged/s   Adjacent writes merged into one
r_await   Read await (ms)           Average read latency                 > 10ms (SSD) or > 20ms (HDD)
w_await   Write await (ms)          Average write latency                > 10ms (SSD) or > 20ms (HDD)
aqu-sz    Average queue size        How many I/O requests are queued     > 1 sustained = saturation
svctm     Service time (ms)         Time to actually service the I/O     Deprecated, use await
%util     Utilization               Percentage of time device was busy   > 80% = concerning

Reading iostat Like a Pro

# Scenario 1: Healthy SSD
Device  r/s     w/s    r_await  w_await  aqu-sz  %util
sda     150.0   200.0    0.15     0.25    0.05   3.50
# Low latency, low queue, low utilization. All good.

# Scenario 2: Saturated HDD
Device  r/s     w/s    r_await  w_await  aqu-sz  %util
sda     120.0    30.0   45.00    12.00    5.60  99.80
# High latency (45ms reads), deep queue, fully utilized. Bottleneck!

# Scenario 3: Write-heavy workload
Device  r/s     w/s    r_await  w_await  aqu-sz  %util
sda       5.0  500.0    0.50    15.00    7.50  85.00
# Lots of writes, high write latency, high utilization.
# Maybe a log-heavy application or database checkpoint.

iotop: Per-Process I/O

iotop shows which processes are performing the most I/O -- the I/O equivalent of top for CPU.

# Install iotop
$ sudo apt install iotop    # Debian/Ubuntu
$ sudo dnf install iotop    # Fedora/RHEL

# Run iotop (requires root)
$ sudo iotop
Total DISK READ:       5.23 M/s | Total DISK WRITE:      12.45 M/s
Current DISK READ:     5.23 M/s | Current DISK WRITE:    12.45 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1234 be/4  mysql      3.45 M/s    8.90 M/s  0.00 %  65.23 %  mysqld
 5678 be/4  www-data   1.23 M/s    2.34 M/s  0.00 %  12.45 %  apache2
 9012 be/4  root       0.50 M/s    1.21 M/s  0.00 %   5.67 %  rsync

iotop Options

# Show only processes actually doing I/O
$ sudo iotop -o

# Batch mode (for scripting)
$ sudo iotop -b -n 5 -d 2

# Show accumulated I/O instead of bandwidth
$ sudo iotop -a

# Show specific process
$ sudo iotop -p 1234

iotop Alternative: pidstat

If iotop is not available, pidstat from the sysstat package can show per-process I/O:

# Show I/O for all processes, every 2 seconds
$ pidstat -d 2
14:30:00   PID   kB_rd/s   kB_wr/s   kB_ccwr/s  iodelay  Command
14:30:02  1234   3534.00   9114.00       0.00      15     mysqld
14:30:02  5678   1260.60   2396.40       0.00       3     apache2

fio: Benchmarking Disk Performance

fio (Flexible I/O Tester) is the standard tool for benchmarking storage performance. It can simulate virtually any I/O workload.

# Install fio
$ sudo apt install fio    # Debian/Ubuntu
$ sudo dnf install fio    # Fedora/RHEL

# Random read test (simulates database workload)
$ fio --name=random-read \
      --ioengine=libaio \
      --direct=1 \
      --bs=4k \
      --iodepth=32 \
      --numjobs=4 \
      --size=1G \
      --rw=randread \
      --runtime=30 \
      --time_based \
      --directory=/tmp

# Key output:
#   read: IOPS=45678, BW=178MiB/s
#   lat (usec): min=45, max=2345, avg=89.23

# Sequential write test (simulates log writing)
$ fio --name=seq-write \
      --ioengine=libaio \
      --direct=1 \
      --bs=128k \
      --iodepth=8 \
      --numjobs=1 \
      --size=2G \
      --rw=write \
      --runtime=30 \
      --time_based \
      --directory=/tmp

# Mixed random read/write (70/30 split, simulates OLTP)
$ fio --name=mixed-rw \
      --ioengine=libaio \
      --direct=1 \
      --bs=4k \
      --iodepth=32 \
      --numjobs=4 \
      --size=1G \
      --rw=randrw \
      --rwmixread=70 \
      --runtime=30 \
      --time_based \
      --directory=/tmp

Understanding fio Output

random-read: (groupid=0, jobs=4): err= 0: pid=1234
  read: IOPS=45.7k, BW=178MiB/s (187MB/s)(5.22GiB/30001msec)
    slat (nsec): min=1200, max=123456, avg=2345.67
    clat (usec): min=45, max=12345, avg=89.23, stdev=34.56
     lat (usec): min=47, max=12348, avg=91.57, stdev=34.78
    clat percentiles (usec):
     |  1.00th=[   52],  5.00th=[   58], 10.00th=[   62],
     | 50.00th=[   82], 90.00th=[  120], 95.00th=[  145],
     | 99.00th=[  245], 99.50th=[  334], 99.90th=[  734],
     | 99.95th=[ 1123], 99.99th=[ 2345]

Key metrics:

  • IOPS: Number of I/O operations per second (higher = better for random workloads)
  • BW (Bandwidth): Throughput in MB/s (higher = better for sequential workloads)
  • clat (completion latency): Time from submission to completion (lower = better)
  • Percentiles: p99 latency matters more than average for databases
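fio's text output is line-oriented enough to scrape with sed. The line below is copied from the sample output above; for real scripting, fio's --output-format=json is sturdier.

```shell
# Pull the IOPS figure out of fio's human-readable summary line.
line='  read: IOPS=45.7k, BW=178MiB/s (187MB/s)(5.22GiB/30001msec)'
echo "$line" | sed -n 's/.*IOPS=\([^,]*\),.*/IOPS: \1/p'
# Prints: IOPS: 45.7k
```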

IOPS, Throughput, and Latency

┌──────────────────────────────────────────────────────────┐
│         THE THREE I/O PERFORMANCE METRICS                 │
│                                                           │
│  IOPS (I/O Operations Per Second)                         │
│  - How many read/write operations per second              │
│  - Critical for: databases, random I/O workloads          │
│  - HDD: ~100-200 IOPS | SSD: 10K-100K+ IOPS             │
│                                                           │
│  Throughput (MB/s)                                         │
│  - How much data transferred per second                   │
│  - Critical for: streaming, backups, sequential I/O       │
│  - HDD: ~100-200 MB/s | SSD: 500-7000 MB/s               │
│                                                           │
│  Latency (ms or us)                                        │
│  - How long each operation takes                          │
│  - Critical for: user-facing applications, databases      │
│  - HDD: 5-15ms | SSD: 0.1-1ms | NVMe: 0.01-0.1ms        │
│                                                           │
│  Relationship:                                            │
│  IOPS = 1 / Latency (approximately, with queue depth 1)  │
│  Throughput = IOPS x Block Size                           │
│                                                           │
└──────────────────────────────────────────────────────────┘
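A quick check of the Throughput = IOPS x Block Size relationship, using the numbers from the fio sample earlier in the chapter (45.7k IOPS at 4 KiB blocks should land near the reported 178 MiB/s):

```shell
# Throughput = IOPS x block size, converted from KiB/s to MiB/s.
awk 'BEGIN { iops = 45700; bs_kib = 4
             printf "%.1f MiB/s\n", iops * bs_kib / 1024 }'
# Prints: 178.5 MiB/s
```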

I/O Wait in top

When you see high wa (I/O wait) in top, it means CPUs are idle because processes are waiting for disk I/O.

%Cpu(s):  2.1 us,  1.3 sy,  0.0 ni,  21.5 id,  75.0 wa,  0.0 hi,  0.1 si,  0.0 st
                                                  ^^^^
                                              75% I/O wait!

I/O wait is NOT CPU usage -- it means the CPU has nothing to do because it is waiting for disk. High I/O wait indicates a disk bottleneck, not a CPU problem.

Diagnosing High I/O Wait

# Step 1: Confirm I/O wait
$ top -bn1 | head -5
# Look at %wa

# Step 2: Identify the saturated device
$ iostat -xz 1 3
# Look for %util near 100%

# Step 3: Find the guilty process
$ sudo iotop -o
# Identify the process with highest disk I/O

# Step 4: Understand what it is doing
$ sudo strace -e trace=read,write,open -p <PID> 2>&1 | head -20
# Shows exactly which files the process is reading/writing

Disk Cache and Page Cache

Linux uses free RAM as a read cache for disk data. This is the page cache -- one of the most important performance features of the kernel.

┌──────────────────────────────────────────────────────────┐
│                    PAGE CACHE                             │
│                                                           │
│   Application reads file "data.db"                        │
│         │                                                 │
│         ▼                                                 │
│   ┌─── Is it in the page cache? ───┐                     │
│   │                                 │                     │
│   YES                               NO                    │
│   │                                 │                     │
│   ▼                                 ▼                     │
│   Return from RAM                Read from disk           │
│   (~100 nanoseconds)             (~10 milliseconds)       │
│                                     │                     │
│                                     ▼                     │
│                               Store in page cache         │
│                               (for future reads)          │
│                                     │                     │
│                                     ▼                     │
│                               Return to application       │
│                                                           │
│   Speed difference: ~100,000x faster from cache!          │
└──────────────────────────────────────────────────────────┘
# See page cache usage
$ free -h | grep Mem
Mem:            16Gi       4.5Gi       3.2Gi       256Mi       8.3Gi        11Gi
#                                                              ^^^^
#                                                         8.3 GB of cache

# See cache hit rate with cachestat (if available)
# Part of BCC/BPF tools
$ sudo cachestat 1
    HITS   MISSES  DIRTIES HITRATIO   BUFFERS_MB  CACHED_MB
   45678     234      567   99.49%          120       8192

A 99%+ hit ratio means almost all reads come from cache -- excellent performance.
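The HITRATIO column is just hits over total lookups; you can reproduce it from the raw counters in the sample:

```shell
# Hit ratio = hits / (hits + misses), from the cachestat sample above.
awk 'BEGIN { hits = 45678; misses = 234
             printf "%.2f%%\n", hits / (hits + misses) * 100 }'
# Prints: 99.49%
```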


Tuning Dirty Page Writeback

When applications write data, it goes into the page cache first (dirty pages) and is flushed to disk later by background kernel threads. You can tune how quickly dirty pages are written to disk.

# View current settings
$ sysctl vm.dirty_ratio
vm.dirty_ratio = 20

$ sysctl vm.dirty_background_ratio
vm.dirty_background_ratio = 10

$ sysctl vm.dirty_expire_centisecs
vm.dirty_expire_centisecs = 3000

$ sysctl vm.dirty_writeback_centisecs
vm.dirty_writeback_centisecs = 500

Parameter                    Default   Meaning
dirty_ratio                  20        Max % of total RAM for dirty pages. Processes block when exceeded.
dirty_background_ratio       10        Start background writeback when dirty pages exceed this %
dirty_expire_centisecs       3000      Dirty pages older than 30s get written out
dirty_writeback_centisecs    500       Flush thread wakes up every 5s to check for dirty pages
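The ratios are percentages of RAM, so the absolute numbers scale with the machine. With 16 GiB of RAM and the default dirty_ratio of 20 (pure arithmetic):

```shell
# dirty_ratio=20 on a 16 GiB machine.
awk 'BEGIN { printf "up to %.1f GiB dirty before writers block\n", 16 * 0.20 }'
# Prints: up to 3.2 GiB dirty before writers block
```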

Tuning for Different Workloads

# Database server: flush to disk more aggressively (data safety)
$ sudo sysctl -w vm.dirty_ratio=5
$ sudo sysctl -w vm.dirty_background_ratio=2

# Write-heavy batch server: allow more dirty pages (throughput)
$ sudo sysctl -w vm.dirty_ratio=40
$ sudo sysctl -w vm.dirty_background_ratio=10

# Make persistent
$ sudo vim /etc/sysctl.d/99-disk-tuning.conf
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

Think About It: If you set dirty_ratio=80 on a server with 16 GB of RAM, up to 12.8 GB of data could be in memory but not yet on disk. What happens if the power fails?


SSD vs HDD Considerations

┌───────────────────────────────────────────────────────┐
│              SSD vs HDD CHARACTERISTICS                │
├──────────────────┬──────────────┬─────────────────────┤
│  Property         │  HDD         │  SSD/NVMe           │
├──────────────────┼──────────────┼─────────────────────┤
│  Random IOPS     │  100-200     │  10K-1M+            │
│  Sequential R/W  │  100-200 MB/s│  500-7000 MB/s      │
│  Latency         │  5-15 ms     │  0.01-1 ms          │
│  Seek time       │  Yes (slow)  │  No (instant)       │
│  Write endurance │  Unlimited   │  Limited (TBW)      │
│  Cost per GB     │  Low         │  Higher             │
│  Power usage     │  Higher      │  Lower              │
│  Noise           │  Yes         │  Silent             │
│  I/O scheduler   │  mq-deadline │  none               │
│  Defrag needed   │  Yes         │  No (harmful!)      │
└──────────────────┴──────────────┴─────────────────────┘

SSD-Specific Considerations

# Check if TRIM is supported
$ sudo hdparm -I /dev/sda | grep -i trim
    *    Data Set Management TRIM supported

# Enable periodic TRIM via systemd timer
$ sudo systemctl enable --now fstrim.timer

# Or use continuous TRIM in fstab (less preferred, slight overhead)
# /dev/sda1  /  ext4  defaults,discard  0 1

# Check SSD write endurance (Total Bytes Written)
$ sudo smartctl -A /dev/sda | grep -i "total.*written"
241 Total_LBAs_Written      0x0032   099   099   ---    Old_age   512345678
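Total_LBAs_Written is a raw sector count, not bytes. Assuming 512-byte LBAs (common, but some vendors report in other units -- check your model's datasheet), converting the sample value:

```shell
# Convert an LBA count to GiB, assuming 512-byte sectors.
awk 'BEGIN { lbas = 512345678
             printf "%.1f GiB written\n", lbas * 512 / (1024 ^ 3) }'
# Prints: 244.3 GiB written
```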

WARNING: Never defragment an SSD. Defragmentation writes data unnecessarily, reducing the SSD's lifespan without improving performance (SSDs have no seek time to optimize away).


Debug This

An administrator reports that the server "feels slow" after they enabled full database logging. Here is the diagnostic data:

$ iostat -x 1 3
Device  r/s     w/s    rkB/s    wkB/s   r_await  w_await  aqu-sz  %util
sda     12.0   850.0   48.0     65000.0   0.5     25.0     21.2   99.8

Analysis:

  1. 850 writes/second at 65 MB/s write throughput -- this is a write-heavy workload.
  2. Write latency is 25ms -- slow, indicating saturation.
  3. Queue depth is 21.2 -- deep queue means requests are piling up.
  4. %util is 99.8% -- the device is fully saturated.
  5. Read performance is fine (r_await 0.5ms) when reads can get through.

The problem: Full database logging is generating massive write I/O that is saturating the disk.

Solutions:

  1. Move the log files to a separate, faster disk (SSD/NVMe)
  2. Reduce logging verbosity
  3. Use async writes for logs (accept small risk of data loss on crash)
  4. Increase dirty page ratios to batch more writes together
  5. Upgrade to an SSD if currently on HDD

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  I/O schedulers control request ordering:                 │
│  - mq-deadline: good for HDDs, databases                  │
│  - bfq: good for interactive/desktop                      │
│  - kyber: lightweight, for fast SSDs                      │
│  - none: best for NVMe (device does its own scheduling)  │
│                                                           │
│  Key tools:                                               │
│  - iostat: device-level I/O statistics                    │
│  - iotop: per-process I/O usage                           │
│  - fio: I/O benchmarking                                  │
│                                                           │
│  Three metrics that matter:                               │
│  - IOPS: operations per second (random workloads)         │
│  - Throughput: MB/s (sequential workloads)                │
│  - Latency: response time per operation                   │
│                                                           │
│  Page cache makes reads fast by caching in RAM.           │
│  Dirty page settings control write buffering.             │
│  I/O wait (%wa in top) = CPU idle, waiting for disk.      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Identify your scheduler: Check which I/O scheduler each of your block devices is using. Is it appropriate for the device type (HDD vs SSD)?

  2. iostat monitoring: Run iostat -xz 2 while copying a large file. Observe the %util, await, and throughput columns changing in real time.

  3. fio benchmark: Benchmark your disk with fio using a 4K random read test. Record the IOPS and latency. Then run a sequential write test and record the throughput. How do your numbers compare to the theoretical maximums for your device type?

  4. Find the I/O hog: Use iotop during normal system operation to identify which processes are performing the most disk I/O. Are any of them surprising?

  5. Bonus challenge: Change the I/O scheduler on one of your devices, run the same fio benchmark, and compare results. Try mq-deadline vs bfq vs none and document the differences in IOPS and latency.

Network Monitoring

Why This Matters

A user reports that the application is "slow." You have checked CPU, memory, and disk -- they are all fine. The problem is the network. But where? Is it bandwidth saturation? Packet loss? High latency to a downstream service? A single application hogging all the bandwidth?

Network problems are notoriously difficult to diagnose because they involve multiple hosts, multiple hops, and multiple layers. You cannot just look at one number and know the answer. You need tools that show you bandwidth usage per interface, per process, and per connection. You need tools that measure latency along every hop. You need tools that capture and analyze individual packets.

This chapter covers the essential network monitoring toolkit: iftop, nethogs, iperf3, ss, nload, mtr, and tcpdump. These tools will give you visibility into what your network is doing and help you find problems fast.


Try This Right Now

# What network interfaces do you have?
$ ip -br link show
lo         UNKNOWN  00:00:00:00:00:00
eth0       UP       52:54:00:ab:cd:ef

# How much traffic has passed through them?
$ ip -s link show eth0

# What sockets are open?
$ ss -tuln

# Quick latency check to a well-known host
$ ping -c 5 1.1.1.1

# How many established connections?
$ ss -t state established | wc -l

iftop: Bandwidth by Connection

iftop shows real-time bandwidth usage per connection on a network interface. Think of it as top for network traffic.

# Install iftop
$ sudo apt install iftop    # Debian/Ubuntu
$ sudo dnf install iftop    # Fedora/RHEL

# Run on the default interface
$ sudo iftop

# Run on a specific interface
$ sudo iftop -i eth0

# Show port numbers instead of service names
$ sudo iftop -P
                     12.5Kb         25.0Kb         37.5Kb         50.0Kb
└────────────────────┴──────────────┴──────────────┴──────────────┘
myhost              => db-server.local               4.23Kb  3.12Kb  2.89Kb
                    <=                                1.45Kb  1.23Kb  1.12Kb
myhost              => cdn.example.com              12.50Kb  8.90Kb  7.45Kb
                    <=                               45.2Kb  34.5Kb  28.9Kb
myhost              => api.service.com               2.34Kb  1.89Kb  1.56Kb
                    <=                                5.67Kb  4.23Kb  3.45Kb

────────────────────────────────────────────────────────────────────
TX:             cum:   2.34MB   peak:   125Kb   rates:  19.1Kb  14.0Kb  11.9Kb
RX:                    5.67MB           234Kb           52.3Kb  40.0Kb  33.5Kb
TOTAL:                 8.01MB           359Kb           71.4Kb  54.0Kb  45.4Kb

iftop Interactive Commands

Key      Action
h        Help
n        Toggle DNS resolution
s        Toggle source host display
d        Toggle destination host display
S        Toggle source port display
D        Toggle destination port display
t        Cycle through display modes (2-line, 1-line, sent only, received only)
p        Toggle port display
P        Pause display
j/k      Scroll the list
1/2/3    Sort by 2s/10s/40s average
q        Quit

Filtering Traffic

# Only show traffic to/from a specific host
$ sudo iftop -f "host 192.168.1.10"

# Only show traffic on a specific port
$ sudo iftop -f "port 443"

# Combine filters
$ sudo iftop -f "host 192.168.1.10 and port 80"

nethogs: Bandwidth by Process

While iftop shows bandwidth per connection, nethogs shows bandwidth per process. This answers the question: "which application is using all the bandwidth?"

# Install nethogs
$ sudo apt install nethogs    # Debian/Ubuntu
$ sudo dnf install nethogs    # Fedora/RHEL

# Run on default interface
$ sudo nethogs

# Run on a specific interface
$ sudo nethogs eth0
NetHogs version 0.8.7

    PID USER     PROGRAM                          DEV        SENT      RECEIVED
   1234 user     /usr/bin/rsync                   eth0      45.234    123.456 KB/sec
   5678 www-data /usr/sbin/nginx                  eth0       8.901     23.456 KB/sec
   9012 root     /usr/bin/apt                     eth0       0.234     15.789 KB/sec
   3456 user     /usr/bin/ssh                     eth0       1.234      0.567 KB/sec

  TOTAL                                                     55.603    163.268 KB/sec

nethogs Interactive Commands

Key    Action
m      Cycle between KB/s, KB, B, MB
r      Sort by received
s      Sort by sent
q      Quit

Think About It: You see that rsync is consuming 90% of your bandwidth. Is this a problem? How would you decide whether to throttle it?


iperf3: Network Benchmarking

iperf3 measures maximum achievable bandwidth between two endpoints. It is the gold standard for network performance testing.

# Install iperf3
$ sudo apt install iperf3    # Debian/Ubuntu
$ sudo dnf install iperf3    # Fedora/RHEL

Basic Bandwidth Test

You need two machines: a server and a client.

# On the server side:
$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

# On the client side:
$ iperf3 -c server-ip-address
Connecting to host server-ip-address, port 5201
[  5] local 192.168.1.20 port 43210 connected to 192.168.1.10 port 5201
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec    0
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0
...
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec    0   sender
[  5]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec        receiver

Advanced iperf3 Tests

# Test with multiple parallel streams
$ iperf3 -c server-ip -P 4

# Test UDP performance (default is TCP)
$ iperf3 -c server-ip -u -b 100M
# -b sets target bandwidth for UDP

# Reverse test (server sends, client receives)
$ iperf3 -c server-ip -R

# Longer test (60 seconds instead of default 10)
$ iperf3 -c server-ip -t 60

# Test with specific window/buffer size
$ iperf3 -c server-ip -w 512K

# Bidirectional test
$ iperf3 -c server-ip --bidir

Interpreting iperf3 Results

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec    3   256 KBytes
Field      Meaning
Transfer   Amount of data transferred
Bitrate    Throughput achieved
Retr       TCP retransmissions (should be 0 or very low)
Cwnd       TCP congestion window size

High retransmissions indicate packet loss or network congestion.
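
You can also read the kernel's own lifetime retransmission counters without running a benchmark. A sketch that parses /proc/net/snmp, looking the columns up by name because field positions vary between kernels:

```shell
# The Tcp: section appears twice: a header line with field names,
# then a line with the matching values.
awk '/^Tcp:/ {
       if (!seen++) { for (i = 1; i <= NF; i++) col[$i] = i; next }
       out = $(col["OutSegs"]); re = $(col["RetransSegs"])
       if (out > 0)
         printf "%d of %d segments retransmitted (%.3f%%)\n",
                re, out, 100 * re / out
       else
         print "no TCP segments sent yet"
     }' /proc/net/snmp
```

Anything over roughly 1% sustained is worth investigating.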


ss: Socket Statistics Deep Dive

ss (socket statistics) is the modern replacement for netstat. It is faster and provides more detailed information.

Basic ss Usage

# All TCP connections
$ ss -t
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB  0       0       192.168.1.20:22     192.168.1.10:54321
ESTAB  0       0       192.168.1.20:80     203.0.113.50:12345

# All listening ports
$ ss -tln
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN 0       128     0.0.0.0:22           0.0.0.0:*
LISTEN 0       511     0.0.0.0:80           0.0.0.0:*
LISTEN 0       128     0.0.0.0:443          0.0.0.0:*

# Show process names
$ ss -tlnp
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN 0       128     0.0.0.0:22           0.0.0.0:*         users:(("sshd",pid=1234,fd=3))
LISTEN 0       511     0.0.0.0:80           0.0.0.0:*         users:(("nginx",pid=5678,fd=6))

Advanced ss Queries

# Show all connections to a specific port
$ ss -t dst :443

# Show connections from a specific IP
$ ss -t src 192.168.1.20

# Show connections in a specific state
$ ss -t state established
$ ss -t state time-wait
$ ss -t state close-wait

# Count connections by state (-H suppresses the header line)
$ ss -H -t | awk '{print $1}' | sort | uniq -c | sort -rn
   245 ESTAB
    12 TIME-WAIT
     3 CLOSE-WAIT

# Show detailed TCP info (congestion window, RTT, etc.)
$ ss -ti
ESTAB 0 0 192.168.1.20:22 192.168.1.10:54321
     cubic wscale:7,7 rto:204 rtt:1.234/0.567 ato:40 mss:1448 cwnd:10 ssthresh:20

# Show memory usage per socket
$ ss -tm
ESTAB 0 0 192.168.1.20:22 192.168.1.10:54321
     skmem:(r0,rb131072,t0,tb87040,f0,w0,o0,bl0,d0)

Finding Connection Problems with ss

# Large Recv-Q or Send-Q indicates a problem
$ ss -t | awk '$2 > 0 || $3 > 0'
# Recv-Q > 0: application is not reading data fast enough
# Send-Q > 0: peer is not acknowledging data (network issue or slow peer)

# Too many TIME-WAIT connections (can exhaust port space)
$ ss -t state time-wait | wc -l

# CLOSE-WAIT connections (application did not close the socket)
$ ss -t state close-wait
# These indicate a bug in the application
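
A one-off count cannot distinguish a transient blip from a leak; sample it over time and watch the trend. A minimal sketch (the interval and sample count are arbitrary):

```shell
# Print the CLOSE-WAIT count every 30 seconds, six times.
# -H suppresses the header so wc counts only connections.
for i in 1 2 3 4 5 6; do
  printf '%s %d\n' "$(date +%T)" "$(ss -H -t state close-wait | wc -l)"
  sleep 30
done
```

A count that only ever grows means the application abandons sockets faster than it closes them.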

nload and bmon: Simple Bandwidth Monitors

nload

# Install nload
$ sudo apt install nload

# Monitor all interfaces
$ nload

# Monitor a specific interface
$ nload eth0
Device eth0 [192.168.1.20] (1/2):
==========================================================
Incoming:
                       ####
                    ########
                  ############
Curr: 23.45 MBit/s
Avg:  18.90 MBit/s
Min:   0.12 MBit/s
Max:  95.23 MBit/s
Ttl: 123.45 GByte

Outgoing:
               ##
             #####
            #######
Curr:  5.67 MBit/s
Avg:   4.23 MBit/s
Min:   0.01 MBit/s
Max:  12.34 MBit/s
Ttl:  45.67 GByte

bmon

# Install bmon
$ sudo apt install bmon

# Run bmon
$ bmon

# bmon shows per-interface bandwidth with graphs
# Use arrow keys to select interfaces
# Press 'd' for detailed statistics
# Press 'g' to toggle graph

Monitoring Network Errors

# Show interface statistics including errors
$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
    link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    1234567890 9876543     0       0       0    1234
    TX:  bytes packets errors dropped carrier collsns
     987654321 8765432     0       0       0       0

# Key error counters:
# errors   - hardware-level errors (bad CRC, etc.)
# dropped  - packets dropped (often buffer overruns)
# carrier  - carrier sense errors (cable/physical issues)
# collsns  - collisions (should be 0 on full-duplex)

# More detailed statistics
$ ethtool -S eth0 | head -20
NIC statistics:
     rx_packets: 9876543
     tx_packets: 8765432
     rx_bytes: 1234567890
     tx_bytes: 987654321
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     rx_crc_errors: 0
     rx_frame_errors: 0

If you see errors or drops increasing, investigate:

  • Physical layer: Bad cables, loose connections, failing NIC
  • Buffer overruns: Increase ring buffer (ethtool -G eth0 rx 4096)
  • MTU mismatches: Jumbo frames on one side, standard on the other
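
The MTU-mismatch case is easy to confirm from either end with ping: forbid fragmentation and send a full-size packet. The target below is a placeholder:

```shell
# 1472 bytes of payload + 8 bytes ICMP header + 20 bytes IP header
# = exactly 1500 bytes; -M do sets the Don't Fragment bit.
ping -M do -s 1472 -c 3 server-ip
# "Message too long" here, while a smaller -s succeeds, means some
# hop on the path has an MTU below 1500.
```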

MTR: Ping + Traceroute Combined

mtr (My Traceroute) combines ping and traceroute into a single tool. It continuously sends packets and shows per-hop latency and packet loss.

# Install mtr
$ sudo apt install mtr-tiny    # Debian/Ubuntu
$ sudo dnf install mtr         # Fedora/RHEL

# Run mtr to a destination
$ mtr example.com
                             My traceroute  [v0.95]
myhost (192.168.1.20) -> example.com (93.184.216.34)
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                         Packets               Pings
 Host                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. gateway.local                       0.0%    50    0.5   0.6   0.3   1.2   0.2
 2. isp-router.example.net              0.0%    50    5.2   5.8   4.1   8.9   1.2
 3. core-router.isp.net                 0.0%    50    8.3   9.1   7.5  12.4   1.1
 4. peering.exchange.net                0.0%    50   15.2  16.8  14.1  22.3   2.1
 5. cdn-edge.example.com                0.5%    50   18.4  19.2  17.5  25.6   1.8
 6. example.com                         0.0%    50   20.1  21.3  18.9  28.4   2.3

Reading MTR Output

Column   Meaning
Loss%    Packet loss at this hop
Snt      Packets sent
Last     Most recent ping time (ms)
Avg      Average ping time
Best     Best (lowest) ping time
Wrst     Worst (highest) ping time
StDev    Standard deviation (consistency)

Diagnosing Network Problems with MTR

Scenario 1: Packet loss at one hop that continues downstream
 3. router-a     0.0%    5.2ms
 4. router-b    15.0%   45.2ms    ← Loss starts here
 5. router-c    14.8%   48.1ms    ← Loss continues
 6. destination 15.2%   52.3ms    ← Loss continues
Diagnosis: The problem is at hop 4.

Scenario 2: Loss at one hop but NOT downstream
 3. router-a     0.0%    5.2ms
 4. router-b    50.0%   45.2ms    ← Looks like loss
 5. router-c     0.0%   12.1ms    ← But no loss here!
 6. destination  0.0%   15.3ms    ← Or here!
Diagnosis: Router-b is simply deprioritizing ICMP packets.
           This is normal and NOT a problem.

Scenario 3: Latency spike at one hop
 3. router-a     0.0%    5.2ms
 4. router-b     0.0%   85.2ms    ← Huge jump
 5. router-c     0.0%   86.1ms    ← Stays high
 6. destination  0.0%   87.3ms    ← Stays high
Diagnosis: Congestion or routing issue at hop 4.

# Generate a report (non-interactive)
$ mtr -r -c 100 example.com
# -r = report mode
# -c 100 = send 100 packets

# Use TCP instead of ICMP (gets through more firewalls)
$ mtr -T -P 443 example.com

tcpdump: Packet Capture

tcpdump captures individual network packets. It is the most powerful network diagnostic tool and the one you reach for when nothing else shows you what is happening.

# Capture all traffic on an interface
$ sudo tcpdump -i eth0

# Capture traffic on a specific port
$ sudo tcpdump -i eth0 port 80

# Capture traffic to/from a specific host
$ sudo tcpdump -i eth0 host 192.168.1.10

# Capture only TCP SYN packets (new connections)
$ sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'

# Save capture to a file (for analysis with Wireshark)
$ sudo tcpdump -i eth0 -w /tmp/capture.pcap -c 1000
# -c 1000 = capture 1000 packets then stop

# Read a capture file
$ sudo tcpdump -r /tmp/capture.pcap

# Show packet contents in ASCII
$ sudo tcpdump -i eth0 -A port 80 | head -50

# Show packet contents in hex and ASCII
$ sudo tcpdump -i eth0 -X port 80 | head -50

# Capture with timestamps
$ sudo tcpdump -i eth0 -tttt port 443

Common tcpdump Filters

# DNS queries
$ sudo tcpdump -i eth0 port 53

# HTTP traffic
$ sudo tcpdump -i eth0 port 80 or port 443

# Traffic between two specific hosts
$ sudo tcpdump -i eth0 host 192.168.1.10 and host 192.168.1.20

# Only incoming traffic
$ sudo tcpdump -i eth0 dst host $(hostname -I | awk '{print $1}')

# Exclude SSH traffic (useful when capturing over SSH)
$ sudo tcpdump -i eth0 not port 22

# Large packets (possible MTU issues)
$ sudo tcpdump -i eth0 'greater 1500'

WARNING: tcpdump on a busy server generates enormous output. Always use filters to narrow down the traffic. Use -c to limit the number of packets captured. Capturing to a file (-w) is more efficient than displaying on screen.

Think About It: You are connected to a server via SSH and need to capture network traffic. If you run tcpdump -i eth0 without any filters, what problem will you immediately encounter?


Debug This

Users report intermittent slowness when accessing your web server. The application itself is healthy. You suspect a network issue.

Investigation steps:

# Step 1: Check for errors on the interface
$ ip -s link show eth0
# Look for errors, dropped packets

# Step 2: Check connection states
$ ss -t state established | wc -l
# If this number is very high (thousands), you may have connection exhaustion

# Step 3: Check for retransmissions
$ ss -ti | grep -c retrans
# High retransmission count indicates packet loss

# Step 4: MTR to clients (or from client to server)
$ mtr -r -c 100 client-ip
# Look for packet loss along the path

# Step 5: tcpdump for TCP retransmissions
$ sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0' -c 100
# Are SYN packets being retransmitted? That means connection setup is failing.

Common findings:

  • Packet loss at an intermediate router: contact ISP
  • NIC errors increasing: replace cable or NIC
  • Too many TIME-WAIT connections: tune net.ipv4.tcp_tw_reuse
  • Large Send-Q values: application or peer cannot keep up

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Network monitoring tools:                                │
│                                                           │
│  Bandwidth monitoring:                                    │
│  - iftop: per-connection bandwidth (who is talking?)      │
│  - nethogs: per-process bandwidth (which app?)            │
│  - nload/bmon: per-interface bandwidth (how much?)        │
│                                                           │
│  Benchmarking:                                            │
│  - iperf3: max throughput between two points              │
│                                                           │
│  Connection analysis:                                     │
│  - ss: socket states, queues, processes                   │
│  - ss -ti: TCP internals (RTT, cwnd, retrans)            │
│                                                           │
│  Path analysis:                                           │
│  - mtr: per-hop latency and packet loss                   │
│                                                           │
│  Packet capture:                                          │
│  - tcpdump: capture and analyze individual packets        │
│                                                           │
│  Error monitoring:                                        │
│  - ip -s link: interface error counters                   │
│  - ethtool -S: NIC-level statistics                       │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. iftop exploration: Run sudo iftop -P on your system and browse the web in another window. Watch the connections appear and disappear. Press n to toggle DNS resolution.

  2. nethogs discovery: Run sudo nethogs and start a large download or run apt update. Identify which process is consuming the most bandwidth.

  3. iperf3 benchmark: Set up iperf3 between two machines (or between your machine and a public iperf3 server). Measure your actual throughput and compare it to your link speed.

  4. ss deep dive: Run ss -t state established and identify all connections on your system. How many are SSH? HTTP? Are there any in CLOSE-WAIT state (indicating application bugs)?

  5. MTR diagnosis: Run mtr -r -c 100 to several different destinations (google.com, your ISP, a server in another country). Compare the latency and loss at each hop. Can you identify where the latency jumps significantly?

  6. Bonus challenge: Capture HTTP traffic with tcpdump -i eth0 -A port 80 while making a curl request to an HTTP (not HTTPS) site. Can you read the HTTP headers and body in the capture? Now try the same with port 443 -- can you read HTTPS traffic? Why or why not?

File Descriptors & Resource Limits

Why This Matters

It is 2 PM on a busy day. Your web server suddenly starts rejecting connections. The error log is full of messages like:

accept4(): Too many open files

The server is not out of CPU or memory. It has hit its file descriptor limit. Every network connection, every open file, every pipe, every socket -- each one consumes a file descriptor. When you run out, the process cannot open anything new. No new connections. No new files. No new logs.

This is one of the most common production issues, and it catches people off guard because "too many open files" sounds like a disk problem when it is actually a kernel resource limit problem. Understanding file descriptors and resource limits is essential for running any kind of server under load.
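
You can reproduce the failure harmlessly in a throwaway shell: lower the limit, then open descriptors until the kernel refuses. The {fd} redirection asks bash to allocate the next free descriptor at or above 10:

```shell
# With the limit at 12 and FDs 0-2 already open, only a couple of
# opens succeed before the shell hits "Too many open files".
bash -c 'ulimit -n 12
         for i in 1 2 3 4 5; do
           exec {fd}>/dev/null && echo "opened fd $fd"
         done'
```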


Try This Right Now

# How many files does your shell have open?
$ ls -l /proc/$$/fd
total 0
lrwx------ 1 user user 64 Jan 18 14:00 0 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 18 14:00 1 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 18 14:00 2 -> /dev/pts/0
lr-x------ 1 user user 64 Jan 18 14:00 255 -> /dev/pts/0

# What are your current limits?
$ ulimit -n    # max open files for this shell
1024

# How many files are open system-wide?
$ cat /proc/sys/fs/file-nr
3456    0    9223372036854775807
#  ^    ^    ^
#  |    |    System-wide maximum
#  |    Allocated but unused (0 on modern kernels)
#  Currently allocated

# What is the system-wide maximum?
$ cat /proc/sys/fs/file-max
9223372036854775807

What File Descriptors Are

A file descriptor (FD) is a small non-negative integer that the kernel uses to identify an open file, socket, pipe, or device within a process. It is an index into the process's table of open files.

┌──────────────────────────────────────────────────────────┐
│            PROCESS FILE DESCRIPTOR TABLE                   │
│                                                           │
│   FD    Points To                    Purpose              │
│   ──    ─────────                    ───────              │
│   0     /dev/pts/0                   stdin  (keyboard)    │
│   1     /dev/pts/0                   stdout (screen)      │
│   2     /dev/pts/0                   stderr (screen)      │
│   3     /var/log/app.log             log file             │
│   4     socket:[12345]               TCP connection       │
│   5     socket:[12346]               TCP connection       │
│   6     pipe:[67890]                 pipe to child proc   │
│   7     /etc/app.conf                config file          │
│   ...                                                     │
│                                                           │
│   Every open() or socket() call returns the next          │
│   available FD number. When closed, the number is freed.  │
└──────────────────────────────────────────────────────────┘

The Three Standard File Descriptors

Every process starts with three file descriptors already open:

FD    Name      Default               Shell Symbol
0     stdin     Terminal (keyboard)   < or 0<
1     stdout    Terminal (screen)     > or 1>
2     stderr    Terminal (screen)     2>

This is why shell redirection works the way it does:

# Redirect stdout (FD 1) to a file
$ command > output.txt       # same as: command 1> output.txt

# Redirect stderr (FD 2) to a file
$ command 2> errors.txt

# Redirect both stdout and stderr
$ command > output.txt 2>&1  # stderr goes where stdout goes

# Redirect stdin (FD 0)
$ command < input.txt        # same as: command 0< input.txt

# Discard stderr
$ command 2>/dev/null
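
You can watch the FD table change by opening a descriptor of your own. A quick experiment:

```shell
exec 3> /tmp/fd-demo.log      # open the file on FD 3
echo "written via fd 3" >&3   # any redirection can now target it
ls -l /proc/$$/fd/3           # the kernel shows where FD 3 points
exec 3>&-                     # close FD 3; the number is free again
cat /tmp/fd-demo.log
```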

What Consumes File Descriptors?

Everything that involves I/O in Linux uses file descriptors:

  • Regular files (open())
  • Directories
  • Network sockets (TCP, UDP, Unix domain)
  • Pipes (between processes)
  • Devices (/dev/*)
  • Event descriptors (eventfd, epoll, inotify)
  • Timer descriptors (timerfd)
  • Signal descriptors (signalfd)

A busy web server might have:

  • Hundreds of client TCP connections (one FD each)
  • Connections to databases (FDs for each)
  • Open log files
  • Pipes to CGI processes
  • epoll FD for event monitoring

Think About It: A web server handles 10,000 concurrent connections, has 5 log files open, 10 database connections, and a few internal pipes. Approximately how many file descriptors does it need? Is the default limit of 1024 sufficient?


Exploring /proc/PID/fd

Every process has its open file descriptors listed under /proc/PID/fd/:

# Find a process to inspect (e.g., your shell)
$ echo $$
1234

# List its file descriptors
$ ls -la /proc/1234/fd/
total 0
dr-x------ 2 user user 0 Jan 18 14:00 .
dr-xr-xr-x 9 user user 0 Jan 18 14:00 ..
lrwx------ 1 user user 64 Jan 18 14:00 0 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 18 14:00 1 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 18 14:00 2 -> /dev/pts/0
lr-x------ 1 user user 64 Jan 18 14:00 255 -> /dev/pts/0

# Count file descriptors for a process
$ ls /proc/1234/fd/ | wc -l
4

# Inspect nginx's file descriptors
$ sudo ls -la /proc/$(pgrep -f "nginx: master" | head -1)/fd/
lrwx------ 1 root root 64 Jan 18 14:00 0 -> /dev/null
lrwx------ 1 root root 64 Jan 18 14:00 1 -> /dev/null
l-wx------ 1 root root 64 Jan 18 14:00 2 -> /var/log/nginx/error.log
lrwx------ 1 root root 64 Jan 18 14:00 3 -> socket:[45678]
l-wx------ 1 root root 64 Jan 18 14:00 4 -> /var/log/nginx/access.log
lrwx------ 1 root root 64 Jan 18 14:00 5 -> socket:[45679]
lrwx------ 1 root root 64 Jan 18 14:00 6 -> socket:[45680]

# See file descriptor limits for a process
$ cat /proc/1234/limits | grep "Max open files"
Max open files            1024                 1048576              files
#                         ^^^^                 ^^^^^^^
#                      Soft limit            Hard limit

lsof: List Open Files

lsof (List Open Files) is the swiss army knife for investigating file descriptors.

Basic lsof Usage

# List all open files (WARNING: huge output!)
$ sudo lsof | wc -l
25678

# List open files for a specific process
$ sudo lsof -p 1234
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
nginx    1234 root  cwd    DIR  253,0     4096     2 /
nginx    1234 root  rtd    DIR  253,0     4096     2 /
nginx    1234 root  txt    REG  253,0  1234567 12345 /usr/sbin/nginx
nginx    1234 root  mem    REG  253,0   234567 23456 /lib/x86_64-linux-gnu/libc.so.6
nginx    1234 root    0u   CHR    1,3      0t0     5 /dev/null
nginx    1234 root    1u   CHR    1,3      0t0     5 /dev/null
nginx    1234 root    2w   REG  253,0    45678 34567 /var/log/nginx/error.log
nginx    1234 root    3u  IPv4  45678      0t0   TCP *:80 (LISTEN)
nginx    1234 root    4w   REG  253,0   123456 45678 /var/log/nginx/access.log

FD Column Meanings

FD            Meaning
cwd           Current working directory
rtd           Root directory
txt           Program text (executable)
mem           Memory-mapped file
0u, 1u, 2w    FD number with access mode (r=read, w=write, u=read/write)

Common lsof Queries

# What process has a specific file open?
$ sudo lsof /var/log/syslog
COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
rsyslogd  890    root    7w   REG  253,0   456789 12345 /var/log/syslog

# What files does a specific user have open?
$ sudo lsof -u www-data | head -20

# What network connections does a process have?
$ sudo lsof -i -p 1234
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nginx  1234 root    3u  IPv4  45678      0t0  TCP *:http (LISTEN)
nginx  1234 root    5u  IPv4  45679      0t0  TCP myhost:http->client:54321 (ESTABLISHED)

# What is listening on a specific port?
$ sudo lsof -i :80
$ sudo lsof -i :443

# Count open files per process
$ sudo lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
   4567 nginx
   2345 mysqld
   1234 java
    567 sshd
    234 systemd

# Find processes with the most FDs
$ for pid in /proc/[0-9]*/fd; do
    count=$(ls "$pid" 2>/dev/null | wc -l)
    procname=$(cat "${pid%/fd}/comm" 2>/dev/null)
    echo "$count $procname (${pid%/fd})"
  done | sort -rn | head -10

# Find deleted files that are still held open (common disk space issue!)
$ sudo lsof +L1
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NLINK  NODE NAME
java     5678 app    12w   REG  253,0 5583457280     0 56789 /var/log/app.log (deleted)

That last one is critical: a deleted file that is still held open by a process continues to consume disk space until the process closes it or exits. This is a common cause of "the disk is full but I cannot find what is using the space."
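
Once you know the holder (PID 5678, FD 12 in the lsof output above), you can usually recover without restarting the process, because the data stays reachable through /proc:

```shell
# Copy the deleted file's contents out if you still need them
sudo cp /proc/5678/fd/12 /tmp/app.log.rescued

# Or reclaim the disk space in place by truncating the still-open
# file; the process keeps its FD and simply continues writing
sudo truncate -s 0 /proc/5678/fd/12
```

Truncation is safe for append-mode writers like loggers; a process that seeks to a fixed offset will just recreate a sparse file.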


ulimit: Per-Process Resource Limits

ulimit controls resource limits for the current shell and its child processes.

Viewing Limits

# Show all limits
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63340
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024          ← This one!
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63340
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

# Show just the open files limit
$ ulimit -n
1024

# Show hard limit (maximum the soft limit can be raised to)
$ ulimit -Hn
1048576

# Show soft limit (currently enforced)
$ ulimit -Sn
1024

Soft vs Hard Limits

┌──────────────────────────────────────────────────────────┐
│              SOFT vs HARD LIMITS                          │
│                                                           │
│  Hard Limit: Maximum ceiling. Only root can raise it.     │
│  Soft Limit: Currently enforced limit. Users can raise    │
│              it up to the hard limit.                     │
│                                                           │
│  Example:                                                 │
│  Hard limit = 65536                                       │
│  Soft limit = 1024                                        │
│                                                           │
│  A regular user can do:                                   │
│    ulimit -n 65536    ← raises soft to hard limit (OK)    │
│    ulimit -n 100000   ← exceeds hard limit (DENIED)       │
│                                                           │
│  Root can do:                                             │
│    ulimit -Hn 100000  ← raises hard limit (OK)            │
│    ulimit -n 100000   ← then raises soft limit (OK)       │
│                                                           │
└──────────────────────────────────────────────────────────┘

Changing Limits

# Raise the soft limit for the current shell (up to hard limit)
$ ulimit -n 65536

# This only affects the current shell and its children
# To make it permanent, use limits.conf or systemd

/etc/security/limits.conf

This file sets limits for users and groups at login time (via PAM).

$ sudo vim /etc/security/limits.conf
# /etc/security/limits.conf
#
# Format: <domain> <type> <item> <value>
#
# domain: username, @groupname, or * for everyone
# type:   soft or hard
# item:   nofile, nproc, memlock, etc.
# value:  the limit

# Set limits for the nginx user
nginx    soft    nofile    65536
nginx    hard    nofile    65536

# Set limits for all users in the webapps group
@webapps soft    nofile    65536
@webapps hard    nofile    65536

# Set limits for all users
*        soft    nofile    4096
*        hard    nofile    65536

# Limit max processes for regular users (fork bomb protection)
*        soft    nproc     4096
*        hard    nproc     8192

# Allow the database user to lock memory
mysql    soft    memlock   unlimited
mysql    hard    memlock   unlimited

You can also use drop-in files in /etc/security/limits.d/:

$ sudo vim /etc/security/limits.d/99-custom.conf
# Custom limits for web applications
www-data  soft  nofile  65536
www-data  hard  nofile  65536

Distro Note: On systems using systemd, limits.conf only applies to user login sessions (via SSH or console). For services managed by systemd, you must use the systemd service configuration instead.


System-Wide Limits

Beyond per-process limits, there are system-wide kernel parameters:

fs.file-max

The maximum number of file descriptors the kernel will allocate system-wide.

# View current limit
$ cat /proc/sys/fs/file-max
9223372036854775807

# View current usage
$ cat /proc/sys/fs/file-nr
3456    0    9223372036854775807
# allocated  free  max

# Set a new limit (rarely needed on modern kernels)
$ sudo sysctl -w fs.file-max=2000000
$ echo 'fs.file-max = 2000000' | sudo tee -a /etc/sysctl.d/99-file-max.conf

fs.nr_open

The maximum number of file descriptors a single process can have. This is the upper bound for ulimit -n.

$ cat /proc/sys/fs/nr_open
1048576

# Increase it if you need per-process limits higher than 1M
$ sudo sysctl -w fs.nr_open=2000000

Relationship Between Limits

┌──────────────────────────────────────────────────────────┐
│              LIMIT HIERARCHY                              │
│                                                           │
│  fs.file-max (system-wide total)                          │
│     │                                                     │
│     ├── fs.nr_open (per-process maximum)                  │
│     │      │                                              │
│     │      ├── hard limit (per-user/group, in limits.conf)│
│     │      │      │                                       │
│     │      │      └── soft limit (currently enforced)     │
│     │      │                                              │
│     │      └── soft limit <= hard limit <= nr_open        │
│     │                                                     │
│     └── sum of all processes' open FDs <= file-max        │
│                                                           │
│  Example chain:                                           │
│  fs.file-max = 2000000                                    │
│  fs.nr_open  = 1048576                                    │
│  hard limit  = 65536                                      │
│  soft limit  = 1024                                       │
│                                                           │
│  A process can open up to 1024 files (soft limit).        │
│  User can raise to 65536 (hard limit).                    │
│  Root can raise to 1048576 (nr_open).                     │
│  Total across all processes: up to 2000000 (file-max).    │
└──────────────────────────────────────────────────────────┘
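
The whole chain for the current shell can be printed in one go, straight from /proc and the ulimit builtin:

```shell
echo "fs.file-max (system-wide):  $(cat /proc/sys/fs/file-max)"
echo "fs.nr_open  (per-process):  $(cat /proc/sys/fs/nr_open)"
echo "hard limit  (this shell):   $(ulimit -Hn)"
echo "soft limit  (this shell):   $(ulimit -Sn)"
echo "in use now  (system-wide):  $(awk '{print $1}' /proc/sys/fs/file-nr)"
```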

Troubleshooting "Too Many Open Files"

This is one of the most common production issues. Here is a systematic approach:

Step 1: Identify the Affected Process

# Check which process is hitting the limit
$ dmesg | grep "too many open files"
$ journalctl -xe | grep "too many open files"

# Or check the application's error log
$ grep -i "too many open files" /var/log/nginx/error.log

Step 2: Check Current FD Usage

# Count open FDs for the process
$ ls /proc/$(pgrep -f nginx | head -1)/fd | wc -l
1023

# Check the process's limit
$ cat /proc/$(pgrep -f nginx | head -1)/limits | grep "Max open files"
Max open files            1024                 1024                 files
#                         ^^^^ soft            ^^^^ hard

The process has 1023 of 1024 file descriptors in use -- it is at the limit.

Step 3: Investigate What the FDs Are

# Are they all network connections?
$ sudo lsof -p $(pgrep -f nginx | head -1) | awk '{print $5}' | sort | uniq -c | sort -rn
   890 IPv4     ← 890 network connections!
    45 REG      ← 45 regular files
    12 unix     ← 12 unix sockets
     3 DIR      ← 3 directories
     2 CHR      ← 2 character devices
     1 FIFO     ← 1 pipe

# Is there a file descriptor leak? (Are FDs increasing over time?)
$ while true; do echo "$(date): $(ls /proc/$(pgrep -f nginx | head -1)/fd | wc -l)"; sleep 10; done

Step 4: Fix It

# Option A: Increase limits in systemd service file
$ sudo systemctl edit nginx.service
[Service]
LimitNOFILE=65536
$ sudo systemctl daemon-reload
$ sudo systemctl restart nginx

# Verify the new limit
$ cat /proc/$(pgrep -f nginx | head -1)/limits | grep "Max open files"
Max open files            65536                65536                files
# Option B: If using limits.conf (for non-systemd processes)
$ echo "nginx soft nofile 65536" | sudo tee -a /etc/security/limits.d/nginx.conf
$ echo "nginx hard nofile 65536" | sudo tee -a /etc/security/limits.d/nginx.conf

# Option C: If it is a file descriptor LEAK, the real fix is fixing the application
# (Increasing limits just delays the inevitable)

systemd Resource Controls

For services managed by systemd, resource limits are set in the service unit file:

[Service]
# File descriptor limit
LimitNOFILE=65536

# Max processes
LimitNPROC=4096

# Max locked memory (for databases)
LimitMEMLOCK=infinity

# Max core dump size
LimitCORE=infinity

# CPU time limit (in seconds)
LimitCPU=infinity

# Max file size
LimitFSIZE=infinity

# Max address space
LimitAS=infinity
# Apply to an existing service without editing the main file
$ sudo systemctl edit nginx.service
# This creates an override file at
# /etc/systemd/system/nginx.service.d/override.conf

# Verify the effective limits
$ sudo systemctl show nginx.service | grep LimitNOFILE
LimitNOFILE=65536
LimitNOFILESoft=65536

Checking Resource Usage of systemd Services

# See resource usage of a service
$ systemctl status nginx.service
    Tasks: 5 (limit: 4096)
    Memory: 12.5M
    CPU: 2.345s

# Detailed cgroup view
$ systemd-cgtop
Control Group      Tasks   %CPU   Memory
/                  245      5.2   4.5G
/system.slice       78      3.1   2.1G
/system.slice/ngin   5      0.8  12.5M
/user.slice        167      2.1   2.4G

Debug This

A Java application fails to start with:

java.io.IOException: Too many open files

You check and find:

$ ulimit -n
1024
$ cat /etc/security/limits.d/java.conf
javauser soft nofile 65536
javauser hard nofile 65536

The limits.conf looks correct. But the limit is still 1024. Why?

Common causes:

  1. The application is started by systemd, and systemd does not read limits.conf. Fix: add LimitNOFILE=65536 to the systemd service file.

  2. The PAM module is not loaded. limits.conf requires pam_limits.so. Check:

$ grep pam_limits /etc/pam.d/common-session
session required pam_limits.so
  1. The user running the process is different from what you think. Check:
$ ps aux | grep java
root     5678  ... java -jar app.jar
# Running as root, not as javauser!
  1. The shell session was started before the limits.conf change. Log out and back in, or start a new session.

Think About It: If you increase the file descriptor limit to 1,000,000 for a process, does that mean it will actually use that many? What is the cost of a higher limit if the FDs are not actually used?


┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  File descriptors are how Linux tracks open files:        │
│  - FD 0 = stdin, FD 1 = stdout, FD 2 = stderr            │
│  - Every file, socket, pipe, device uses an FD            │
│  - /proc/PID/fd shows a process's open FDs                │
│  - lsof lists open files and their FD details             │
│                                                           │
│  Resource limits control FD usage:                        │
│  - ulimit -n: per-process soft limit                      │
│  - /etc/security/limits.conf: persistent limits           │
│  - LimitNOFILE in systemd: for managed services           │
│  - fs.file-max: system-wide total                         │
│  - fs.nr_open: max per-process ceiling                    │
│                                                           │
│  "Too many open files" troubleshooting:                   │
│  1. Find the process (dmesg, app logs)                    │
│  2. Count its FDs (ls /proc/PID/fd | wc -l)             │
│  3. Check its limits (cat /proc/PID/limits)              │
│  4. Determine if it is a limit issue or a leak            │
│  5. Fix via systemd LimitNOFILE or limits.conf            │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Explore your FDs: Run ls -la /proc/$$/fd/ to see your shell's file descriptors. Open a file with exec 3>/tmp/testfile, run ls -la /proc/$$/fd/ again, and see FD 3 appear. Close it with exec 3>&-.

  2. lsof investigation: Use lsof -u $USER to see all files you have open. Count them. How many are regular files vs sockets vs pipes?

  3. Limit testing: Set ulimit -n 10 in a shell. Then try to open many files with a simple script. Observe the "Too many open files" error.

  4. Process FD counting: Write a one-liner that finds the process with the most open file descriptors on your system. Use /proc/*/fd and wc -l.

  5. Bonus challenge: Create a systemd service that runs a simple script. Set LimitNOFILE=100 in the service file. Have the script try to open 200 files. Check journalctl for the failure. Then increase the limit to 300 and verify it works.

Package Managers Deep Dive

Why This Matters

It is 2 AM and your monitoring system fires an alert: a critical security vulnerability has been disclosed in OpenSSL. Every server you manage needs to be patched before business hours. Without a package manager, you would need to manually download source code on each server, compile it, figure out which files to replace, and hope you do not break anything. With a package manager, the fix is one command per server -- or one Ansible playbook for all of them.

Package managers are the backbone of software management on Linux. They handle downloading software, resolving dependencies, tracking installed files, applying updates, and cleanly removing programs. Every Linux administrator uses a package manager dozens of times a day. Understanding how they work -- not just the basic commands, but the underlying architecture -- separates administrators who react to problems from those who prevent them.

This chapter takes you deep into the three major package management ecosystems: APT (Debian/Ubuntu), DNF/YUM (Fedora/RHEL), and Pacman (Arch Linux). By the end, you will be able to install, query, pin, clean, and troubleshoot packages on any major distribution.


Try This Right Now

Check which package manager your system uses and how many packages are currently installed:

# Debian/Ubuntu
$ dpkg --list | wc -l

# Fedora/RHEL
$ rpm -qa | wc -l

# Arch Linux
$ pacman -Q | wc -l

You will likely see hundreds or even thousands of packages. Every one of those was installed, tracked, and has its dependencies satisfied by your package manager.


What Package Managers Actually Do

A package manager is not just an installer. It is a database administrator, a dependency solver, a file tracker, and a download manager rolled into one.

┌──────────────────────────────────────────────────────────┐
│                  PACKAGE MANAGER                          │
│                                                           │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐  │
│  │  Repository  │  │  Dependency  │  │   Local         │  │
│  │  Metadata    │  │  Resolver    │  │   Database      │  │
│  │  (what's     │  │  (what else  │  │   (what's       │  │
│  │  available)  │  │  is needed)  │  │   installed)    │  │
│  └──────┬──────┘  └──────┬───────┘  └────────┬────────┘  │
│         │                │                    │           │
│         v                v                    v           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │             Package Operations                      │  │
│  │  Install  |  Update  |  Remove  |  Query  | Verify  │  │
│  └─────────────────────────────────────────────────────┘  │
│                          │                                │
│                          v                                │
│  ┌─────────────────────────────────────────────────────┐  │
│  │               File System                           │  │
│  │   /usr/bin  /usr/lib  /etc  /usr/share  ...         │  │
│  └─────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘

The Core Functions

  1. Installation: Download a package and all its dependencies, verify integrity, extract files to the correct locations
  2. Dependency resolution: If package A needs libraries from packages B and C, install B and C first
  3. Tracking: Record every file installed by every package so nothing gets lost
  4. Updates: Compare installed versions against repository versions, upgrade what is outdated
  5. Removal: Remove a package and optionally its no-longer-needed dependencies
  6. Querying: Search for packages, list files owned by a package, find which package provides a file

Package Formats

The two dominant binary package formats on Linux are:

FormatExtensionUsed ByTool
Debian.debDebian, Ubuntu, Mint, Pop!_OSdpkg
RPM.rpmFedora, RHEL, CentOS, openSUSErpm

Arch Linux uses its own compressed tar archive format (.pkg.tar.zst).

Think About It: Why do Linux distributions use binary packages at all? Why not just distribute source code and compile everything on the user's machine? (Hint: think about time, resources, and reproducibility.)


The APT Ecosystem (Debian/Ubuntu)

APT is a layered system. Understanding the layers prevents confusion about which tool to use when.

┌───────────────────────────────────────────┐
│  apt (high-level, user-friendly CLI)      │  <-- Use this
├───────────────────────────────────────────┤
│  apt-get / apt-cache (classic CLI tools)  │  <-- Use in scripts
├───────────────────────────────────────────┤
│  libapt (APT library, C++)               │
├───────────────────────────────────────────┤
│  dpkg (low-level package installer)       │  <-- Installs .deb files
└───────────────────────────────────────────┘
  • dpkg: Installs, removes, and queries individual .deb files. Does not resolve dependencies or download packages.
  • apt-get / apt-cache: The classic high-level tools. They handle repositories, downloads, and dependency resolution. Stable interface, preferred in scripts.
  • apt: The modern unified command. Combines the most common functions of apt-get and apt-cache with a nicer interface (progress bars, color). Preferred for interactive use.

sources.list: Where Packages Come From

APT fetches packages from repositories defined in /etc/apt/sources.list and files under /etc/apt/sources.list.d/.

$ cat /etc/apt/sources.list

A typical entry looks like:

deb http://deb.debian.org/debian bookworm main contrib non-free-firmware
deb-src http://deb.debian.org/debian bookworm main contrib non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free-firmware

Breaking down a line:

deb       http://deb.debian.org/debian   bookworm   main contrib non-free-firmware
^^^       ^^^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
type      repository URL                 release    components
(binary
or source)
  • deb: Binary packages. deb-src: Source packages.
  • URL: The mirror hosting the packages.
  • Release: The codename (bookworm, jammy) or class (stable, testing).
  • Components: Sections within the repository. main is fully free software. contrib and non-free contain packages with different licensing.

Distro Note: Ubuntu uses components named main, restricted, universe, and multiverse. Debian uses main, contrib, and non-free. The naming differs but the concept is the same.

Modern DEB822 Format

Newer Debian and Ubuntu releases are migrating to the DEB822 .sources format in /etc/apt/sources.list.d/:

Types: deb deb-src
URIs: http://deb.debian.org/debian
Suites: bookworm bookworm-updates
Components: main contrib non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

This format is more readable and explicitly ties repositories to their signing keys.

Essential APT Commands

Update the package index (always do this first):

$ sudo apt update
Hit:1 http://deb.debian.org/debian bookworm InRelease
Get:2 http://security.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Fetched 48.0 kB in 1s (42.3 kB/s)
Reading package lists... Done
Building dependency tree... Done
15 packages can be upgraded. Run 'apt list --upgradable' to see them.

Install a package:

$ sudo apt install nginx
Reading package lists... Done
Building dependency tree... Done
The following additional packages will be installed:
  libnginx-mod-http-geoip2 libnginx-mod-http-image-filter ...
The following NEW packages will be installed:
  libnginx-mod-http-geoip2 libnginx-mod-http-image-filter nginx nginx-common nginx-core
0 upgraded, 5 newly installed, 0 to remove and 15 not upgraded.
Need to get 1,847 kB of archives.
Do you want to continue? [Y/n]

Upgrade all installed packages:

$ sudo apt upgrade          # Safe upgrade: never removes packages
$ sudo apt full-upgrade     # May remove packages to resolve conflicts

Safety Warning: apt full-upgrade can remove packages. Always review what it proposes before confirming. On production servers, prefer apt upgrade and handle conflicts manually.

Search for packages:

$ apt search "web server"
nginx/stable 1.22.1-9 amd64
  small, powerful, scalable web/reverse proxy server

apache2/stable 2.4.57-2 amd64
  Apache HTTP Server

Show package details:

$ apt show nginx
Package: nginx
Version: 1.22.1-9
Depends: nginx-core (<< 1.22.1-9.1~) | nginx-full (<< 1.22.1-9.1~) ...
Description: small, powerful, scalable web/reverse proxy server

Remove a package:

$ sudo apt remove nginx          # Removes the package, keeps config files
$ sudo apt purge nginx           # Removes the package AND config files
$ sudo apt autoremove             # Removes orphaned dependencies

List installed packages:

$ apt list --installed
$ apt list --installed | grep nginx

dpkg: The Low-Level Tool

When you need to work with individual .deb files or query the package database directly:

# Install a local .deb file
$ sudo dpkg -i package.deb

# List all installed packages
$ dpkg -l

# List files installed by a package
$ dpkg -L nginx
/.
/usr
/usr/sbin
/usr/sbin/nginx
/usr/share/doc/nginx
...

# Find which package owns a file
$ dpkg -S /usr/sbin/nginx
nginx-core: /usr/sbin/nginx

# Show package status
$ dpkg -s nginx

When dpkg -i fails due to missing dependencies, fix it with:

$ sudo dpkg -i some-package.deb
dpkg: dependency problems prevent configuration of some-package:
 some-package depends on libfoo; however: Package libfoo is not installed.

$ sudo apt install -f    # Fix broken dependencies

PPAs (Ubuntu)

Personal Package Archives let developers distribute packages outside the official repositories. They are common on Ubuntu.

# Add a PPA
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install python3.12

Safety Warning: PPAs are not officially vetted. Only add PPAs from sources you trust. A malicious PPA could replace system packages with compromised versions.

To remove a PPA:

$ sudo add-apt-repository --remove ppa:deadsnakes/ppa
$ sudo apt update

APT Pinning

Sometimes you want to hold a package at a specific version or prefer packages from one repository over another. This is called pinning.

Hold a package at its current version:

$ sudo apt-mark hold nginx
nginx set on hold.

$ sudo apt upgrade
The following packages have been kept back:
  nginx

Release the hold:

$ sudo apt-mark unhold nginx

Install a specific version:

# List available versions
$ apt list -a nginx
nginx/stable 1.22.1-9 amd64
nginx/oldstable 1.18.0-6.1+deb11u3 amd64

# Install a specific version
$ sudo apt install nginx=1.22.1-9

For advanced pinning, create a file in /etc/apt/preferences.d/:

Package: nginx
Pin: version 1.22.1-9
Pin-Priority: 1001

Priority values: 1001 forces a downgrade if needed, 500 is the default, 100 means only install if no other version is available.


The DNF/YUM Ecosystem (Fedora/RHEL)

DNF (Dandified YUM) is the successor to YUM. On RHEL 8+ and Fedora 22+, dnf is the standard. On older systems (RHEL 7 and earlier), yum is used. The command syntax is nearly identical.

┌──────────────────────────────────────────┐
│  dnf (high-level package manager)        │
├──────────────────────────────────────────┤
│  libdnf / hawkey (dependency solver)     │
├──────────────────────────────────────────┤
│  rpm (low-level package installer)       │
└──────────────────────────────────────────┘

Repository Configuration

Repos are defined in /etc/yum.repos.d/ as .repo files:

$ cat /etc/yum.repos.d/fedora.repo
[fedora]
name=Fedora $releasever - $basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch

Key fields:

  • enabled: 1 (active) or 0 (disabled)
  • gpgcheck: 1 to verify package signatures (always leave this on)
  • gpgkey: Path to the GPG key for signature verification
  • baseurl or metalink: Where to find packages

Essential DNF Commands

# Update repository metadata
$ sudo dnf check-update

# Install a package
$ sudo dnf install nginx
Dependencies resolved.
===========================================================================
 Package              Arch        Version              Repository    Size
===========================================================================
Installing:
 nginx                x86_64      1:1.24.0-1.fc39      updates      42 k
Installing dependencies:
 nginx-core           x86_64      1:1.24.0-1.fc39      updates     589 k
 nginx-filesystem     noarch      1:1.24.0-1.fc39      updates      11 k

Transaction Summary
===========================================================================
Install  3 Packages

Total download size: 642 k
Is this ok [y/N]:

Upgrade packages:

$ sudo dnf upgrade               # Upgrade all packages
$ sudo dnf upgrade nginx          # Upgrade a specific package
$ sudo dnf upgrade --security     # Only security updates

Search and info:

$ dnf search "web server"
$ dnf info nginx
$ dnf provides /usr/sbin/nginx     # Find which package provides a file

Remove packages:

$ sudo dnf remove nginx
$ sudo dnf autoremove               # Remove unneeded dependencies

List packages:

$ dnf list installed
$ dnf list available
$ dnf list updates

Transaction history:

One of DNF's most powerful features -- every operation is recorded and can be undone:

$ dnf history
ID     | Command line              | Date and time    | Action(s)  | Altered
-------+---------------------------+------------------+------------+--------
    15 | install nginx              | 2024-03-15 10:22 | Install    |    3
    14 | upgrade                    | 2024-03-14 22:00 | Upgrade    |   28
    13 | remove httpd               | 2024-03-10 14:15 | Removed    |    5

# Undo a transaction
$ sudo dnf history undo 15    # Removes nginx and its dependencies

RPM: The Low-Level Tool

# Install a local .rpm file
$ sudo rpm -ivh package.rpm

# Query all installed packages
$ rpm -qa

# Query info about an installed package
$ rpm -qi nginx

# List files in an installed package
$ rpm -ql nginx

# Find which package owns a file
$ rpm -qf /usr/sbin/nginx
nginx-core-1.24.0-1.fc39.x86_64

# Verify package integrity (check if files have been modified)
$ rpm -V nginx
S.5....T.  c /etc/nginx/nginx.conf

The verification output tells you exactly what changed. The flags mean:

  • S: File size differs
  • 5: MD5 checksum differs
  • T: Modification time differs
  • c: This is a config file

DNF Modules (RHEL/CentOS Stream)

RHEL uses modules to provide multiple versions of software from the same repository:

$ dnf module list nodejs
Name    Stream   Profiles                      Summary
nodejs  18       common [d], development       Javascript runtime
nodejs  20       common [d], development       Javascript runtime

$ sudo dnf module enable nodejs:20
$ sudo dnf install nodejs

Version Locking with DNF

# Install the versionlock plugin
$ sudo dnf install python3-dnf-plugin-versionlock

# Lock a package
$ sudo dnf versionlock add nginx
Adding versionlock on: nginx-1:1.24.0-1.fc39.*

# List locked packages
$ sudo dnf versionlock list

# Remove a lock
$ sudo dnf versionlock delete nginx

Think About It: DNF records a full transaction history that can be undone. Why is this valuable on production servers? What could go wrong if an upgrade breaks something?


The Pacman Ecosystem (Arch Linux)

Pacman is Arch Linux's package manager. It is fast, simple, and does not try to do more than it needs to.

Essential Pacman Commands

Pacman uses single-letter flags. The main operations are:

  • -S: Sync (install/update from repositories)
  • -R: Remove
  • -Q: Query (local database)
  • -F: File query (which package provides a file)
# Synchronize package databases and upgrade all packages
$ sudo pacman -Syu
:: Synchronizing package databases...
 core                  130.4 KiB  1234 KiB/s 00:00
 extra                   8.7 MiB  5.32 MiB/s 00:02
 multilib              147.5 KiB  1567 KiB/s 00:00
:: Starting full system upgrade...
 there is nothing to do

# Install a package
$ sudo pacman -S nginx

# Search for a package
$ pacman -Ss "web server"
extra/nginx 1.24.0-1
    Lightweight HTTP server and IMAP/POP3 proxy server

# Show package info
$ pacman -Si nginx       # Remote info
$ pacman -Qi nginx       # Local (installed) info

# List files installed by a package
$ pacman -Ql nginx

# Find which package owns a file
$ pacman -Qo /usr/bin/nginx

# Remove a package and its unneeded dependencies
$ sudo pacman -Rsu nginx

# Remove a package, dependencies, and config files
$ sudo pacman -Rns nginx

Safety Warning: On Arch Linux, always run pacman -Syu (full system upgrade) before installing new packages. Partial upgrades (installing a package without upgrading) can break your system because Arch is a rolling release.

The AUR (Arch User Repository)

The AUR is a community-driven repository of build scripts (PKGBUILDs) for software not in the official repos. It is one of the largest software repositories on Linux.

Important: AUR packages are not officially vetted. You should always inspect the PKGBUILD before building.

Using makepkg manually:

# Install base-devel group (build tools)
$ sudo pacman -S --needed base-devel git

# Clone the AUR package
$ git clone https://aur.archlinux.org/yay.git
$ cd yay

# Inspect the build script
$ cat PKGBUILD

# Build and install
$ makepkg -si

Using an AUR helper (yay):

Once you have yay installed, it wraps pacman for both official and AUR packages:

$ yay -S some-aur-package
$ yay -Syu                    # Upgrade everything, including AUR packages

Pacman Configuration

Pacman is configured in /etc/pacman.conf:

[options]
HoldPkg     = pacman glibc
Architecture = auto
Color
CheckSpace
ParallelDownloads = 5

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist

Useful options:

  • ParallelDownloads: Download multiple packages simultaneously (significant speed improvement)
  • Color: Colorize output
  • IgnorePkg: Skip specific packages during upgrades (equivalent to version pinning)
# Pin/ignore a package
IgnorePkg = linux linux-headers

Comparing the Three Ecosystems

TaskAPT (Debian/Ubuntu)DNF (Fedora/RHEL)Pacman (Arch)
Update repo metadataapt updatednf check-updatepacman -Sy
Installapt install pkgdnf install pkgpacman -S pkg
Removeapt remove pkgdnf remove pkgpacman -R pkg
Remove + depsapt autoremovednf autoremovepacman -Rsu pkg
Upgrade allapt upgradednf upgradepacman -Syu
Searchapt search termdnf search termpacman -Ss term
Show infoapt show pkgdnf info pkgpacman -Si pkg
List filesdpkg -L pkgrpm -ql pkgpacman -Ql pkg
Find owner of filedpkg -S /pathrpm -qf /pathpacman -Qo /path
List installedapt list --installeddnf list installedpacman -Q
Pin/hold versionapt-mark hold pkgdnf versionlock add pkgIgnorePkg in conf
Clean cacheapt cleandnf clean allpacman -Sc

Cleaning Package Caches

Package managers cache downloaded packages. Over time, this can consume significant disk space.

# APT: See cache size, then clean
$ du -sh /var/cache/apt/archives/
847M    /var/cache/apt/archives/

$ sudo apt clean             # Remove ALL cached packages
$ sudo apt autoclean         # Remove only obsolete cached packages

# DNF: Clean metadata and packages
$ sudo dnf clean all
$ sudo dnf clean packages    # Only remove cached packages

# Pacman: Clean cache
$ sudo pacman -Sc            # Remove old cached packages (keeps current)
$ sudo pacman -Scc           # Remove ALL cached packages

Distro Note: On servers with automated updates, the package cache can grow to several gigabytes. Schedule periodic cache cleaning in a cron job or systemd timer.


Security Updates

Applying security patches quickly is critical for any system exposed to the internet.

APT: Security Updates Only

# List security updates
$ apt list --upgradable 2>/dev/null | grep -i security

# Install only security updates (Debian)
$ sudo apt upgrade -y -o Dir::Etc::SourceList=/etc/apt/sources.list \
    -o Dir::Etc::SourceParts=/dev/null \
    -t bookworm-security

# Ubuntu has unattended-upgrades for automatic security patches
$ sudo apt install unattended-upgrades
$ sudo dpkg-reconfigure -plow unattended-upgrades

DNF: Security Updates Only

$ sudo dnf upgrade --security
$ dnf updateinfo list security

Automatic Updates

For production servers, consider enabling automatic security updates:

# Debian/Ubuntu: unattended-upgrades
$ cat /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};

# Fedora/RHEL: dnf-automatic
$ sudo dnf install dnf-automatic
$ sudo systemctl enable --now dnf-automatic-install.timer

Safety Warning: Even automatic security updates carry risk. A kernel update requires a reboot. A library update could break an application. On critical production servers, test updates in a staging environment first.


Hands-On: Package Investigation Lab

Let us practice querying the package database. These exercises work on any distribution -- just use the appropriate commands from the comparison table above.

Exercise 1: Trace a binary to its package

# Step 1: Find the full path of a command
$ which curl
/usr/bin/curl

# Step 2: Find which package owns it
# Debian/Ubuntu:
$ dpkg -S /usr/bin/curl
curl: /usr/bin/curl

# Fedora/RHEL:
$ rpm -qf /usr/bin/curl
curl-8.2.1-1.fc39.x86_64

# Arch:
$ pacman -Qo /usr/bin/curl
/usr/bin/curl is owned by curl 8.5.0-1

Exercise 2: Examine package dependencies

# Debian/Ubuntu: Show what a package depends on
$ apt depends curl
curl
  Depends: libc6 (>= 2.34)
  Depends: libcurl4 (= 7.88.1-10+deb12u5)
  Depends: zlib1g (>= 1:1.1.4)

# And the reverse -- what depends on this package
$ apt rdepends curl

# Fedora/RHEL:
$ dnf repoquery --requires curl
$ dnf repoquery --whatrequires curl

# Arch:
$ pacman -Si curl | grep Depends
$ pacman -Qi curl | grep "Required By"

Exercise 3: Find packages that provide a specific file

# You need a file but don't know which package provides it
# Debian/Ubuntu:
$ apt-file search libssl.so
libssl-dev: /usr/lib/x86_64-linux-gnu/libssl.so
libssl3: /usr/lib/x86_64-linux-gnu/libssl.so.3

# (Install apt-file first: sudo apt install apt-file && sudo apt-file update)

# Fedora/RHEL:
$ dnf provides */libssl.so

# Arch:
$ pacman -F libssl.so

Debug This

A colleague reports that apt update is failing on a Debian server:

$ sudo apt update
Err:1 http://deb.debian.org/debian bookworm InRelease
  Could not resolve 'deb.debian.org'
Err:2 http://security.debian.org/debian-security bookworm-security InRelease
  Could not resolve 'security.debian.org'
Reading package lists... Done
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm/InRelease
   Could not resolve 'deb.debian.org'
E: Some index files failed to download. They have been ignored, or old ones used instead.

What would you check?

  1. DNS resolution: Can the server resolve any hostname?

    $ nslookup deb.debian.org
    $ cat /etc/resolv.conf
    
  2. Network connectivity: Can you reach the internet at all?

    $ ping -c 3 8.8.8.8
    
  3. Proxy settings: Is the server behind a proxy?

    $ env | grep -i proxy
    

Now here is a different failure -- a colleague tries to install a package and gets:

$ sudo apt install libfoo-dev
E: Unable to locate package libfoo-dev

Checklist:

  1. Did you run sudo apt update first?
  2. Is the package name correct? (apt search libfoo)
  3. Is the package in a component that is not enabled? (Check universe on Ubuntu)
  4. Is the package only available for a different architecture?
  5. Has the package been renamed or replaced?

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                                                               │
│  In this chapter, you learned:                                │
│                                                               │
│  - Package managers handle installation, dependency           │
│    resolution, updates, removal, and tracking.                │
│                                                               │
│  - APT (Debian/Ubuntu): apt/apt-get for high-level ops,      │
│    dpkg for low-level .deb manipulation, sources.list         │
│    for repository configuration.                              │
│                                                               │
│  - DNF (Fedora/RHEL): dnf for high-level ops, rpm for        │
│    low-level .rpm work, .repo files in /etc/yum.repos.d/.    │
│    Transaction history with undo capability.                  │
│                                                               │
│  - Pacman (Arch): Single tool with -S/-R/-Q/-F flags.        │
│    AUR for community packages. Rolling release requires       │
│    full-system upgrades.                                      │
│                                                               │
│  - Version pinning prevents unwanted upgrades.                │
│                                                               │
│  - Cache cleaning recovers disk space.                        │
│                                                               │
│  - Security updates should be prioritized and can be          │
│    automated with unattended-upgrades or dnf-automatic.       │
│                                                               │
│  - You can trace any file to its package, examine             │
│    dependencies, and find which package provides a file.      │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Try This

Exercises

  1. Inventory exercise: List all installed packages on your system, sort them by installed size, and find the 10 largest. (Hint: on Debian/Ubuntu, try dpkg-query -W --showformat='${Installed-Size}\t${Package}\n' | sort -rn | head -10)

  2. Dependency tree: Pick a complex package (like firefox or nginx) and map out its dependency tree. How many packages does it pull in? (Hint: apt-cache depends --recurse or dnf repoquery --requires --resolve --recursive)

  3. File hunt: Without installing it, find out what files the strace package would place on your system. (Hint: on Debian, use apt-file list strace; on Fedora, use dnf repoquery -l strace)

  4. Repository management: On a test system, add a third-party repository (like the Docker official repo), install a package from it, then cleanly remove both the package and the repository.

  5. Version pinning: On a test system, install a package, pin it to the current version, run a full upgrade, and confirm the pinned package was held back.

Bonus Challenge

Write a shell script that works on both Debian/Ubuntu and Fedora/RHEL systems. The script should detect which distribution it is running on (using /etc/os-release), then use the appropriate package manager to: update the package index, list available security updates, and report how many packages need updating. This is the kind of multi-distro script you will write in real operations work.
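As a starting point, the distro-detection step can be sketched like this (detection only -- the update and reporting logic is left for you; the ID_LIKE handling and variable names are one possible approach, not the only one):

```shell
# Detect the distro family from /etc/os-release.
# Sourcing the file defines ID (e.g. ID=ubuntu) and often ID_LIKE (e.g. ID_LIKE=debian).
. /etc/os-release
case "$ID ${ID_LIKE:-}" in
    *debian*|*ubuntu*)        pkg=apt ;;
    *fedora*|*rhel*|*centos*) pkg=dnf ;;
    *arch*)                   pkg=pacman ;;
    *)                        pkg=unknown ;;
esac
echo "Detected package manager: $pkg"
```

From here, branch on $pkg to run the appropriate update and security-listing commands.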


What's Next

You now know how to install software from repositories, but what do you do when the package you need is not available in any repository, or you need a newer version with custom options? Chapter 58 teaches you how to compile software from source code -- the original way of installing software on Unix and Linux.

Compiling Software from Source

Why This Matters

You are setting up a high-performance web server and you need Nginx compiled with a specific third-party module that is not included in your distribution's package. Or you are on an embedded system with no package manager. Or a critical security fix was released two hours ago and your distribution has not packaged it yet. Or you simply need version 3.4 of a tool and your distro ships 3.1.

Compiling from source is one of the most empowering skills a Linux administrator can have. It is the original way software was distributed on Unix -- before package managers existed, everyone compiled from source. Today, you will not do it for every piece of software, but when you need it, nothing else will do.

This chapter walks you through the entire process: from installing build tools to the classic configure-make-install dance, CMake projects, understanding what Makefiles do, custom installation prefixes, creating packages from compiled software, and diagnosing the errors that inevitably appear along the way.


Try This Right Now

Check whether your system has the basic build tools installed:

$ gcc --version
gcc (Debian 12.2.0-14) 12.2.0
...

$ make --version
GNU Make 4.3

$ pkg-config --version
1.8.1

If any of these commands fail with "command not found," you need to install your distribution's development tools -- which is the first thing we cover below.


Why Compile from Source?

Before we dive into the how, let us be clear about the when and why. You should compile from source when:

  1. The package does not exist in your distribution's repository
  2. You need a newer version than what the repository provides
  3. You need custom compile options (enable/disable features, custom modules)
  4. You need to apply patches (security fixes, bug fixes, custom modifications)
  5. You are learning how software is built (understanding builds makes you a better debugger)
  6. You are on a minimal system with no package manager (embedded, rescue environments)

You should not compile from source when:

  1. The package is available in your distribution's repository (use the package manager instead)
  2. You are managing many servers (compiled software is harder to update and track)
  3. You cannot commit to monitoring for security updates (the package manager handles this for you)

┌───────────────────────────────────────────────────────────┐
│           Use Repository Package When:                     │
│  - Version in repo is sufficient                           │
│  - You want automatic security updates                     │
│  - You manage many identical servers                       │
│  - You want dependency tracking                            │
├───────────────────────────────────────────────────────────┤
│           Compile from Source When:                         │
│  - You need a specific version not in the repo             │
│  - You need custom compile-time options                    │
│  - The software is not packaged at all                     │
│  - You need to apply custom patches                        │
│  - You are on a minimal or embedded system                 │
└───────────────────────────────────────────────────────────┘

Think About It: If you compile Nginx from source on a production server, who is responsible for applying security patches to it? How does this differ from using the distribution's package?


Installing Build Prerequisites

Before you can compile anything, you need a compiler, linker, and basic build tools.

Debian/Ubuntu

$ sudo apt update
$ sudo apt install build-essential

The build-essential meta-package installs:

  • gcc -- the GNU C compiler
  • g++ -- the GNU C++ compiler
  • make -- the build automation tool
  • libc6-dev -- C library development headers
  • dpkg-dev -- Debian package development tools

For many projects, you will also need:

$ sudo apt install pkg-config autoconf automake libtool

Fedora/RHEL

$ sudo dnf groupinstall "Development Tools"

Or install individually:

$ sudo dnf install gcc gcc-c++ make autoconf automake libtool pkgconfig

Arch Linux

$ sudo pacman -S base-devel

Distro Note: The group/meta-package names differ, but they all install the same core tools: a C/C++ compiler, make, and essential development headers.
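A quick way to confirm the toolchain is in place on any distro is to loop over the expected tools (a minimal sketch -- extend the list for whatever your project needs):

```shell
# Report which core build tools are present and where they live.
for tool in gcc g++ make pkg-config autoconf automake libtool; do
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%-10s %s\n' "$tool" "$(command -v "$tool")"
    else
        printf '%-10s MISSING\n' "$tool"
    fi
done
```

Any line marked MISSING means the corresponding package from the commands above has not been installed yet.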


The Configure-Make-Install Dance

The vast majority of C and C++ projects on Linux follow the same three-step build process. Understanding it deeply will serve you for years.

┌──────────────────────────────────────────────────────────┐
│                                                           │
│  Step 1: ./configure                                      │
│  - Checks your system for required tools and libraries    │
│  - Detects compiler, OS, architecture                     │
│  - Generates a Makefile tailored to YOUR system           │
│                                                           │
│  Step 2: make                                             │
│  - Reads the Makefile                                     │
│  - Compiles source code into object files                 │
│  - Links object files into executables and libraries      │
│                                                           │
│  Step 3: make install                                     │
│  - Copies compiled binaries to /usr/local/bin             │
│  - Copies libraries to /usr/local/lib                     │
│  - Copies headers to /usr/local/include                   │
│  - Copies man pages to /usr/local/share/man               │
│                                                           │
└──────────────────────────────────────────────────────────┘

Hands-On: Compiling a Real Program

Let us compile jq, the popular JSON processor, from source. This is a real-world example that demonstrates the full process.

Step 1: Download the source code.

$ mkdir -p ~/src && cd ~/src
$ wget https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-1.7.1.tar.gz
$ tar xzf jq-1.7.1.tar.gz
$ cd jq-1.7.1

Alternatively, clone from Git:

$ git clone https://github.com/jqlang/jq.git
$ cd jq
$ git checkout jq-1.7.1
$ git submodule update --init    # Some projects have submodules
$ autoreconf -i                  # Generate configure script from Git source

Step 2: Inspect what you have.

$ ls
AUTHORS  COPYING  ChangeLog  Makefile.am  NEWS  README.md  configure  configure.ac  src/  ...

Key files:

  • configure: The configuration script (generated by autoconf)
  • configure.ac: The source for the configure script (autoconf macros)
  • Makefile.am or Makefile.in: Templates that configure turns into a Makefile
  • src/: The actual source code
  • COPYING or LICENSE: The license

Step 3: Run configure.

$ ./configure --help | head -30
`configure' configures jq 1.7.1 to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...

Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local]
  --exec-prefix=EPREFIX  install architecture-dependent files in EPREFIX
                          [PREFIX]

Optional Features:
  --enable-maintainer-mode  enable make rules and dependencies
  --disable-docs            do not build documentation
  --enable-all-static       link jq with static libraries only

Always check --help first. This shows you what you can customize.

$ ./configure --prefix=/usr/local
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
...
checking for oniguruma... yes
...
configure: creating ./config.status
config.status: creating Makefile

The configure script:

  • Checks that you have a working C compiler
  • Checks for required libraries (like oniguruma for jq's regex support)
  • Detects your operating system and architecture
  • Writes a Makefile customized for your system

If configure fails, it tells you what is missing:

configure: error: Package requirements (oniguruma) were not met:
No package 'oniguruma' found

The fix is to install the missing development library:

# Debian/Ubuntu
$ sudo apt install libonig-dev

# Fedora/RHEL
$ sudo dnf install oniguruma-devel

# Arch
$ sudo pacman -S oniguruma

Distro Note: Development headers for library libfoo are typically named libfoo-dev on Debian/Ubuntu and libfoo-devel on Fedora/RHEL. On Arch, the main package usually includes headers.

Step 4: Compile.

$ make
  CC       src/main.o
  CC       src/jq_parser.o
  CC       src/lexer.o
  CC       src/builtin.o
  ...
  CCLD     jq

To speed up compilation on a multi-core system:

$ make -j$(nproc)

nproc returns the number of CPU cores, so make -j$(nproc) runs one compile job per core. A fixed count also works: make -j4 allows up to 4 compile jobs in parallel.

Step 5: Test (optional but recommended).

Many projects include a test suite:

$ make check
PASS: tests/jqtest
PASS: tests/onigtest
PASS: tests/shtest
==================
All 3 tests passed
==================

Step 6: Install.

$ sudo make install
/usr/bin/install -c -d '/usr/local/bin'
/usr/bin/install -c jq '/usr/local/bin'
/usr/bin/install -c -d '/usr/local/lib'
/usr/bin/install -c -m 644 libjq.a '/usr/local/lib'
/usr/bin/install -c -d '/usr/local/include'
...

Step 7: Verify.

$ which jq
/usr/local/bin/jq

$ jq --version
jq-1.7.1

$ echo '{"name":"linux"}' | jq '.name'
"linux"

Understanding --prefix

The --prefix option controls where make install puts files. This is one of the most important options.

--prefix=/usr/local  (default)
  /usr/local/bin/jq
  /usr/local/lib/libjq.a
  /usr/local/include/jq.h
  /usr/local/share/man/man1/jq.1

--prefix=/opt/jq-1.7.1
  /opt/jq-1.7.1/bin/jq
  /opt/jq-1.7.1/lib/libjq.a
  /opt/jq-1.7.1/include/jq.h
  /opt/jq-1.7.1/share/man/man1/jq.1

--prefix=$HOME/.local
  ~/.local/bin/jq         (no sudo needed!)
  ~/.local/lib/libjq.a
  ~/.local/include/jq.h

Choosing a Prefix Strategy

Prefix                  Pros                                          Cons
/usr/local (default)    In default PATH, shared by all users          Harder to uninstall, may conflict with packages
/opt/program-version    Easy to remove (just delete the directory),   Need to add to PATH manually
                        multiple versions can coexist
$HOME/.local            No root required, user-isolated               Not available to other users

The /opt/program-version approach is especially useful on servers:

$ ./configure --prefix=/opt/nginx-1.25.3
$ make -j$(nproc)
$ sudo make install

# Create a symlink so the "current" version is easy to reference
$ sudo ln -sf /opt/nginx-1.25.3 /opt/nginx

# Add to PATH
$ export PATH=/opt/nginx/sbin:$PATH

When you upgrade, install the new version to /opt/nginx-1.25.4 and update the symlink. Rolling back is just changing the symlink.
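You can rehearse the symlink swap without root in a scratch directory (the nginx-style version numbers are just labels here). Note the -n flag: without it, ln follows the existing symlink and creates the new link inside the old directory instead of replacing it.

```shell
# Simulate the /opt symlink-swap upgrade and rollback in a temp directory.
dir=$(mktemp -d)
mkdir -p "$dir/nginx-1.25.3/sbin" "$dir/nginx-1.25.4/sbin"

ln -sfn "$dir/nginx-1.25.3" "$dir/nginx"   # "current" points at 1.25.3
readlink "$dir/nginx"

ln -sfn "$dir/nginx-1.25.4" "$dir/nginx"   # upgrade: repoint the symlink
readlink "$dir/nginx"

ln -sfn "$dir/nginx-1.25.3" "$dir/nginx"   # rollback: repoint it back
readlink "$dir/nginx"
```

On a real server the same two-command swap (install side by side, repoint) is the entire upgrade procedure.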

Think About It: Why does the default prefix put files in /usr/local instead of /usr? What problem does this separation solve?


Understanding Makefiles

When you run make, it reads a file called Makefile (or makefile). Understanding the basics of Makefiles helps you debug build problems.

A Makefile consists of rules:

target: dependencies
	command

# Example:
jq: src/main.o src/jq_parser.o src/lexer.o
	gcc -o jq src/main.o src/jq_parser.o src/lexer.o -ljq -lonig

src/main.o: src/main.c src/jq.h
	gcc -c src/main.c -o src/main.o

Reading this rule: "To build jq, first make sure main.o, jq_parser.o, and lexer.o are up to date, then link them together."

Key concepts:

  • target: What to build
  • dependencies: What must exist (and be up-to-date) first
  • command: How to build it (MUST be indented with a tab, not spaces)

Common Makefile targets:

$ make                # Build the default target (usually "all")
$ make all            # Build everything
$ make install        # Install to prefix
$ make clean          # Remove compiled files (object files, binaries)
$ make distclean      # Remove everything generated by configure
$ make check          # Run tests
$ make uninstall      # Remove installed files (not always available)
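You can watch make's timestamp logic in action with a throwaway two-rule Makefile. In this sketch, cp stands in for the compiler and linker so nothing beyond make itself is required; the \t in printf supplies the mandatory TAB indentation:

```shell
# Demonstrate that make only rebuilds targets older than their dependencies.
dir=$(mktemp -d); cd "$dir"
echo 'int main(void){return 0;}' > main.c

# Two rules: hello depends on main.o, main.o depends on main.c.
printf 'hello: main.o\n\tcp main.o hello\nmain.o: main.c\n\tcp main.c main.o\n' > Makefile

make          # first run: builds main.o, then hello
make          # second run: everything is up to date, nothing happens
touch main.c  # make main.c newer than main.o
make          # only the stale targets are rebuilt
```

The second invocation prints an "up to date" message instead of re-running the recipes -- that timestamp comparison is the core of what make does.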

CMake: The Modern Alternative

Many modern projects (especially C++ ones) use CMake instead of autotools. CMake generates Makefiles (or Ninja build files) from a CMakeLists.txt file.

The workflow is:

┌──────────────────────────────────────────────────────┐
│                                                       │
│  Step 1: mkdir build && cd build                      │
│  (CMake strongly prefers out-of-source builds)        │
│                                                       │
│  Step 2: cmake .. [options]                           │
│  (Configure the project -- like ./configure)          │
│                                                       │
│  Step 3: make -j$(nproc)                              │
│  (Compile -- same as autotools)                       │
│                                                       │
│  Step 4: sudo make install                            │
│  (Install -- same as autotools)                       │
│                                                       │
└──────────────────────────────────────────────────────┘

Hands-On: Compiling a CMake Project

Let us build htop (the interactive process viewer) from source, this time using the CMake workflow.

# Install CMake
# Debian/Ubuntu:
$ sudo apt install cmake

# Fedora/RHEL:
$ sudo dnf install cmake

# Arch:
$ sudo pacman -S cmake

# Download and extract
$ cd ~/src
$ wget https://github.com/htop-dev/htop/releases/download/3.3.0/htop-3.3.0.tar.xz
$ tar xJf htop-3.3.0.tar.xz
$ cd htop-3.3.0

# Note: htop actually supports both autotools and CMake.
# We will use CMake here to demonstrate the workflow.

# Create a build directory (out-of-source build)
$ mkdir build && cd build

# Configure
$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
-- The C compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info - done
-- Looking for ncursesw
-- Found ncursesw: /usr/lib/x86_64-linux-gnu/libncursesw.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/src/htop-3.3.0/build

# Compile
$ make -j$(nproc)
[  2%] Building C object CMakeFiles/htop.dir/Action.c.o
[  5%] Building C object CMakeFiles/htop.dir/AvailableColumnsPanel.c.o
...
[100%] Linking C executable htop

# Install
$ sudo make install

CMake options use -D prefix:

$ cmake .. \
    -DCMAKE_INSTALL_PREFIX=/opt/htop \
    -DCMAKE_BUILD_TYPE=Release \
    -DENABLE_UNICODE=ON

To see all available options:

$ cmake .. -LH

checkinstall: Creating Packages from Source

The biggest problem with make install is that your package manager does not know about the installed files. You cannot cleanly uninstall, and upgrades may conflict.

checkinstall solves this by intercepting make install and creating a .deb or .rpm package instead.

# Install checkinstall
# Debian/Ubuntu:
$ sudo apt install checkinstall

# Instead of: sudo make install
# Run:
$ sudo checkinstall --pkgname=jq-custom --pkgversion=1.7.1 --pkgrelease=1 \
    --default make install

Creating package jq-custom...
OK

**********************************************************************
Done. The new package has been installed and saved to
/home/user/src/jq-1.7.1/jq-custom_1.7.1-1_amd64.deb
**********************************************************************

Now your package manager knows about it:

$ dpkg -l | grep jq-custom
ii  jq-custom  1.7.1-1  amd64  Package created with checkinstall

# Clean uninstall through the package manager
$ sudo dpkg -r jq-custom

You can also save the .deb file and install it on other identical systems:

$ sudo dpkg -i jq-custom_1.7.1-1_amd64.deb

Distro Note: checkinstall supports creating .deb (Debian/Ubuntu), .rpm (Fedora/RHEL), and Slackware packages. On Fedora, you may need to install it from source since it is not always in the official repos.


Common Compilation Errors and Fixes

Compilation will fail. It is not a question of if but when. Here are the errors you will encounter most often and exactly how to fix them.

Error: "configure: error: no acceptable C compiler found"

configure: error: in `/home/user/src/project':
configure: error: no acceptable C compiler found in $PATH

Fix: Install the compiler.

# Debian/Ubuntu
$ sudo apt install build-essential

# Fedora/RHEL
$ sudo dnf install gcc gcc-c++ make

Error: Missing library or header file

configure: error: Package requirements (libxml-2.0 >= 2.9) were not met:
No package 'libxml-2.0' found

Or during compilation:

src/parser.c:15:10: fatal error: libxml/parser.h: No such file or directory
 #include <libxml/parser.h>
          ^~~~~~~~~~~~~~~~~
compilation terminated.

Fix: Install the development package for the missing library.

# Find the right package name
# Debian/Ubuntu:
$ apt search libxml2 | grep dev
libxml2-dev/stable 2.9.14+dfsg-1.3 amd64

$ sudo apt install libxml2-dev

# Fedora/RHEL:
$ dnf search libxml2 | grep devel
libxml2-devel.x86_64

$ sudo dnf install libxml2-devel

The pattern is consistent:

  • Missing libfoo -> install libfoo-dev (Debian) or libfoo-devel (Fedora)

Error: "make: *** No targets specified and no makefile found"

make: *** No targets specified and no makefile found.  Stop.

Fix: You forgot to run ./configure first, or configure failed. Check for a configure script:

$ ls configure

If there is no configure script, check for:

  • CMakeLists.txt -- use cmake
  • autogen.sh or bootstrap.sh -- run it to generate configure
  • configure.ac -- run autoreconf -i to generate configure
  • Makefile -- some projects ship a Makefile directly (just run make)
  • meson.build -- uses the Meson build system

Error: Linker errors ("undefined reference to")

/usr/bin/ld: src/main.o: undefined reference to `json_parse'
collect2: error: ld returned 1 exit status

Fix: A required library is not being linked. This usually means:

  1. The library is not installed (install the -dev/-devel package)
  2. The library is installed but not found (set PKG_CONFIG_PATH or LDFLAGS)

$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
$ export LDFLAGS="-L/usr/local/lib"
$ export CFLAGS="-I/usr/local/include"
$ ./configure

Error: "Permission denied" during make install

install: cannot create regular file '/usr/local/bin/jq': Permission denied

Fix: Use sudo:

$ sudo make install

Or install to a location you own:

$ ./configure --prefix=$HOME/.local
$ make
$ make install     # No sudo needed

Error: Version mismatch

configure: error: You need at least autoconf 2.69 to build this project

Fix: Install a newer version of the required tool, or download the release tarball instead of the Git source (release tarballs include pre-generated configure scripts and do not need autoconf).


Hands-On: Complete Build-from-Source Workflow

Let us put it all together with a structured example. We will compile tree (a simple directory listing program) from source.

# Step 1: Create a workspace
$ mkdir -p ~/src && cd ~/src

# Step 2: Download
$ wget https://gitlab.com/OldManProgrammer/unix-tree/-/archive/2.1.1/unix-tree-2.1.1.tar.gz
$ tar xzf unix-tree-2.1.1.tar.gz
$ cd unix-tree-2.1.1

# Step 3: Look at what we have
$ ls
CHANGES  INSTALL  LICENSE  Makefile  README.md  doc/  man/  tree.c  ...

# This project has a Makefile directly -- no configure step needed!

# Step 4: Read the install instructions
$ cat INSTALL
# (Always read the INSTALL or README file before building)

# Step 5: Build
$ make -j$(nproc)
gcc -O2 -Wall -o tree tree.c color.c hash.c html.c json.c unix.c xml.c

# Step 6: Test it
$ ./tree ~/src --dirsfirst -L 1
/home/user/src
├── htop-3.3.0
├── jq-1.7.1
└── unix-tree-2.1.1

# Step 7: Install to a custom prefix
$ sudo make PREFIX=/opt/tree-2.1.1 install
# Or use checkinstall:
$ sudo checkinstall --pkgname=tree-custom --pkgversion=2.1.1 --default \
    make PREFIX=/usr/local install

Build Systems Reference

Not every project uses autotools. Here is a quick reference for the build systems you will encounter:

Build System    Identifier                Configure Step      Build Step
Autotools       configure, configure.ac   ./configure         make
CMake           CMakeLists.txt            cmake ..            make or cmake --build .
Meson           meson.build               meson setup build   ninja -C build
Plain Makefile  Makefile only             None                make
Go              go.mod                    None                go build
Rust/Cargo      Cargo.toml                None                cargo build --release
Python          setup.py, pyproject.toml  None                pip install .

For Meson (increasingly common in GNOME/freedesktop projects):

$ sudo apt install meson ninja-build    # or dnf/pacman equivalent
$ meson setup build --prefix=/usr/local
$ ninja -C build
$ sudo ninja -C build install

Debug This

A colleague is trying to compile a project and hits this error:

$ ./configure
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for ZLIB... no
configure: error: zlib library not found

$ dpkg -l | grep zlib
ii  zlib1g   1:1.2.13.dfsg-1   amd64   compression library - runtime

They say: "But zlib IS installed! I can see it right there!"

What is the problem?

The runtime library (zlib1g) is installed, but the development headers (zlib1g-dev) are not. The runtime library contains the .so file that programs link against at runtime. The development package contains the .h header files and the pkg-config metadata that the configure script needs at compile time.

# The fix:
$ sudo apt install zlib1g-dev

# Now configure will find it:
$ ./configure
checking for ZLIB... yes

This is the single most common compilation issue on Linux. The runtime package and the development package are separate. You always need the -dev (Debian) or -devel (Fedora) package to compile against a library.
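You can ask pkg-config the same question configure asks. This sketch probes zlib; the package names in the message are the usual Debian and Fedora ones:

```shell
# pkg-config reads the .pc metadata shipped by -dev/-devel packages,
# so it reports exactly what a configure script would see.
if pkg-config --exists zlib; then
    echo "zlib development files present, version $(pkg-config --modversion zlib)"
else
    echo "zlib development files MISSING (install zlib1g-dev or zlib-devel)"
fi
```

Running this before ./configure tells you in advance whether the ZLIB check will pass.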


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                                                               │
│  In this chapter, you learned:                                │
│                                                               │
│  - When to compile from source vs. using packages:            │
│    custom versions, custom options, unavailable packages.     │
│                                                               │
│  - The build-essential / Development Tools packages           │
│    provide gcc, g++, make, and development headers.           │
│                                                               │
│  - The configure-make-install workflow:                        │
│    ./configure checks your system, make compiles,             │
│    make install copies files to the prefix.                   │
│                                                               │
│  - --prefix controls where files are installed.               │
│    /usr/local (default), /opt/name-version, ~/.local          │
│                                                               │
│  - CMake projects use: mkdir build && cd build &&             │
│    cmake .. && make && make install                           │
│                                                               │
│  - checkinstall creates .deb/.rpm packages from               │
│    make install, giving you clean uninstall via               │
│    the package manager.                                       │
│                                                               │
│  - Most compilation errors come from missing -dev/-devel      │
│    packages. The pattern: find the library name, install      │
│    its development package.                                   │
│                                                               │
│  - Always read INSTALL or README before building.             │
│    Always check ./configure --help for options.               │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Try This

Exercises

  1. Basic build: Download and compile GNU hello from source (https://ftp.gnu.org/gnu/hello/). This is the simplest possible autotools project -- the "Hello, World!" of compilation. Use ./configure --prefix=$HOME/.local, make, and make install.

  2. Custom prefix: Compile jq from source with --prefix=/opt/jq-custom. After installation, verify you can run it by adding /opt/jq-custom/bin to your PATH. Then remove it cleanly by deleting the /opt/jq-custom directory.

  3. CMake project: Find a small CMake-based project on GitHub (the CMake tutorial projects work well) and build it using the out-of-source build workflow.

  4. Error diagnosis: Intentionally try to compile a project without installing its required dependencies. Read the error messages carefully and identify which -dev packages you need to install.

  5. checkinstall: Compile any small program and use checkinstall to create a .deb or .rpm package. Install it, verify it works, then remove it using your package manager.

Bonus Challenge

Download the Nginx source code from nginx.org. Compile it with a custom set of modules:

  • Enable the http_ssl_module (requires libssl-dev)
  • Enable the http_v2_module
  • Disable the http_autoindex_module
  • Install to /opt/nginx-custom

Read ./configure --help to find the exact flags. This is a realistic task -- sysadmins frequently compile Nginx with custom module sets.


What's Next

When you compile and install software, the resulting binaries depend on shared libraries -- .so files that are loaded at runtime. Chapter 59 explains how shared libraries work, how the dynamic linker finds them, and how to troubleshoot the dreaded "cannot open shared object file" error.

Shared Libraries & Dynamic Linking

Why This Matters

It is Monday morning and your application will not start. The error is cryptic:

error while loading shared libraries: libssl.so.3: cannot open shared object file:
No such file or directory

If you do not understand how shared libraries and dynamic linking work, this error is baffling. The file is right there on disk -- you can see it with ls. But the program cannot find it. Why?

Shared libraries are the invisible plumbing of every Linux system. When you run almost any program -- ls, curl, python, nginx -- it does not contain all the code it needs inside its own binary. Instead, it relies on shared libraries (.so files) that are loaded into memory at runtime. This saves enormous amounts of disk space and RAM, and it means a security fix to a library like OpenSSL instantly protects every program that uses it.

But this architecture also means that when a library is missing, moved, or the wrong version, programs break in confusing ways. This chapter gives you the complete picture: how shared libraries work, how the dynamic linker finds them, and how to diagnose and fix every common library problem you will encounter.


Try This Right Now

Pick any command on your system and see which shared libraries it depends on:

$ ldd /usr/bin/curl
	linux-vdso.so.1 (0x00007ffd5a7e6000)
	libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f3a1c800000)
	libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1 (0x00007f3a1c7e0000)
	libssl.so.3 => /usr/lib/x86_64-linux-gnu/libssl.so.3 (0x00007f3a1c730000)
	libcrypto.so.3 => /usr/lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f3a1c200000)
	libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3a1c1f0000)
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a1c000000)
	...

That is curl depending on at least seven shared libraries. Every one of them must be present and findable at runtime.


Static vs Shared Libraries

There are two ways to package reusable code for programs to use:

Static Libraries (.a files)

A static library is an archive of compiled code that gets copied into the program at compile time (linking stage). The resulting binary contains everything it needs.

┌──────────────────────────────────────────────────────────┐
│  Compile Time                                             │
│                                                           │
│  program.c  +  libfoo.a  ──>  program (self-contained)   │
│                                                           │
│  The binary contains a copy of libfoo's code.             │
│  No external library needed at runtime.                   │
└──────────────────────────────────────────────────────────┘

Advantages:

  • No runtime dependencies -- the binary is self-contained
  • No "library not found" errors
  • Portable across systems with the same architecture

Disadvantages:

  • Larger binary size (every program has its own copy of the library code)
  • If a library has a bug fix, every program must be recompiled
  • More memory usage (each running program has its own copy in RAM)

Shared Libraries (.so files)

A shared library is compiled code that is loaded at runtime when the program starts. Multiple programs can share the same library in memory.

┌──────────────────────────────────────────────────────────┐
│  Compile Time                                             │
│                                                           │
│  program.c  ──>  program (contains reference to libfoo)   │
│                                                           │
│  Runtime                                                  │
│                                                           │
│  program starts                                           │
│       │                                                   │
│       ▼                                                   │
│  Dynamic linker (ld-linux.so) loads libfoo.so             │
│       │                                                   │
│       ▼                                                   │
│  program runs with libfoo's code available                │
│                                                           │
│  If another program also uses libfoo.so, the kernel       │
│  shares the same physical memory pages.                   │
└──────────────────────────────────────────────────────────┘

Advantages:

  • Smaller binaries
  • Update the library once, all programs benefit
  • Less memory usage (shared across processes via memory mapping)
  • Security patches are effective immediately

Disadvantages:

  • Runtime dependency -- the library must be present when the program runs
  • Version conflicts ("dependency hell")
  • Slightly slower startup (the dynamic linker must find and load libraries)

In practice, shared libraries are the default on Linux. Static linking is used in specific cases like Go binaries, containerized applications, and rescue utilities.
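You can see both halves of the arrangement -- the compile-time reference and the runtime lookup -- with a toy library. The libgreet name and its function are invented for this demo:

```shell
# Build a tiny shared library and a program that links against it.
dir=$(mktemp -d); cd "$dir"

cat > greet.c <<'EOF'
const char *greeting(void) { return "hello from libgreet"; }
EOF

cat > main.c <<'EOF'
#include <stdio.h>
const char *greeting(void);
int main(void) { puts(greeting()); return 0; }
EOF

gcc -fPIC -shared -o libgreet.so greet.c   # produce the .so
gcc -o demo main.c -L. -lgreet             # compile time: record the dependency
LD_LIBRARY_PATH=. ./demo                   # runtime: the dynamic linker must find it
# prints: hello from libgreet
```

Run ./demo without LD_LIBRARY_PATH and you get exactly the "cannot open shared object file" error this chapter is about -- the binary only stores a reference, not the code.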

Think About It: When a critical security vulnerability is found in OpenSSL, what is the advantage of shared libraries? How many programs need to be updated?


Library Naming Conventions

Shared library names on Linux follow a specific pattern that encodes versioning information:

libfoo.so          Linker name     (symlink, used at compile time)
     │
     ▼
libfoo.so.1        SONAME          (symlink, used at runtime for ABI version)
     │
     ▼
libfoo.so.1.4.2    Real name       (actual file with full version)

Let us see this in practice:

$ ls -la /usr/lib/x86_64-linux-gnu/libssl*
lrwxrwxrwx 1 root root     13 ... libssl.so -> libssl.so.3
lrwxrwxrwx 1 root root     17 ... libssl.so.3 -> libssl.so.3.0.11
-rw-r--r-- 1 root root 684544 ... libssl.so.3.0.11

Three files (two symlinks and one real file):

  1. libssl.so.3.0.11 -- The real file. Contains the actual library code. Version 3.0.11.
  2. libssl.so.3 -- The SONAME (shared object name). A symlink to the real file. Programs record this name when compiled, so at runtime the dynamic linker looks for libssl.so.3. Any version 3.x.y can satisfy this.
  3. libssl.so -- The linker name. A symlink used only at compile time. When you compile with -lssl, the linker looks for libssl.so. This file is typically only installed with the -dev package.

Why Three Names?

This three-level system enables backward compatibility:

┌────────────────────────────────────────────────────┐
│  Your program was compiled against libssl.so.3     │
│                                                    │
│  libssl.so.3 -> libssl.so.3.0.11                   │
│                                                    │
│  The library gets a security update:               │
│  libssl.so.3 -> libssl.so.3.0.12   (NEW)           │
│                                                    │
│  Your program still works! It looks for            │
│  libssl.so.3, and the symlink was updated.         │
│  No recompilation needed.                          │
│                                                    │
│  But if the ABI changes (breaking change):         │
│  The new library becomes libssl.so.4               │
│  Your program still looks for libssl.so.3          │
│  Both can exist simultaneously.                    │
└────────────────────────────────────────────────────┘

You can check a library's SONAME:

$ objdump -p /usr/lib/x86_64-linux-gnu/libssl.so.3.0.11 | grep SONAME
  SONAME               libssl.so.3

How the Dynamic Linker Finds Libraries

When you run a program, the kernel loads the binary and then hands control to the dynamic linker (ld-linux-x86-64.so.2 on x86-64 systems; other architectures use their own, such as ld-linux-aarch64.so.1 on ARM64). The dynamic linker's job is to find and load all required shared libraries.

The search order is:

┌──────────────────────────────────────────────────────────┐
│  Dynamic Linker Library Search Order                      │
│                                                           │
│  1. RPATH encoded in the binary (compile-time setting)    │
│  2. LD_LIBRARY_PATH environment variable                  │
│  3. RUNPATH encoded in the binary (compile-time setting)  │
│  4. /etc/ld.so.cache (precomputed lookup table)           │
│  5. Default paths: /lib, /usr/lib                         │
│                                                           │
│  The linker searches in this order and uses the FIRST     │
│  matching library it finds.                               │
└──────────────────────────────────────────────────────────┘

/etc/ld.so.conf and ldconfig

The file /etc/ld.so.conf (and files in /etc/ld.so.conf.d/) list additional directories where the dynamic linker should look for libraries:

$ cat /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf

$ ls /etc/ld.so.conf.d/
libc.conf
x86_64-linux-gnu.conf

$ cat /etc/ld.so.conf.d/x86_64-linux-gnu.conf
/usr/local/lib/x86_64-linux-gnu
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

After modifying these files, you must run ldconfig to rebuild the cache:

$ sudo ldconfig

ldconfig does three things:

  1. Scans the directories listed in /etc/ld.so.conf
  2. Creates the SONAME symlinks (e.g., libssl.so.3 -> libssl.so.3.0.11)
  3. Updates /etc/ld.so.cache (a binary file that the dynamic linker reads for fast lookups)

You can see what is in the cache:

$ ldconfig -p | head -20
1847 libs found in cache `/etc/ld.so.cache'
	libz.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libz.so.1
	libxml2.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libxml2.so.2
	...

# Search for a specific library
$ ldconfig -p | grep libssl
	libssl.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libssl.so.3

LD_LIBRARY_PATH

This environment variable adds directories to the library search path at runtime. It is the quick-and-dirty way to help a program find its libraries.

# Set it for a single command
$ LD_LIBRARY_PATH=/opt/myapp/lib ./myapp

# Or export it
$ export LD_LIBRARY_PATH=/opt/myapp/lib:$LD_LIBRARY_PATH
$ ./myapp

Safety Warning: Do NOT set LD_LIBRARY_PATH globally (in .bashrc or system-wide profile) as a permanent fix. It affects every program you run and can cause subtle breakage. Use it for testing only. For permanent solutions, add the directory to /etc/ld.so.conf.d/ and run ldconfig.

Safety Warning: Do not expect LD_LIBRARY_PATH to work with setuid programs. The dynamic linker ignores it (and other LD_* variables) for setuid/setgid binaries as a security measure, because letting an unprivileged user inject arbitrary libraries into a privileged process would enable privilege escalation.


The ldd Command

ldd shows the shared libraries required by a program and whether the dynamic linker can find them:

$ ldd /usr/bin/python3
	linux-vdso.so.1 (0x00007ffdb23fe000)
	libpython3.11.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.11.so.1.0 (0x00007f...)
	libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f...)
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
	...

Each line shows:

  • The library name the binary expects
  • => where it was found on disk
  • The address where it is mapped into memory (randomized on each run by ASLR)

When a library is not found:

$ ldd ./myapp
	libcustom.so.1 => not found
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)

That not found is exactly what causes the "cannot open shared object file" error at runtime.

Safety Warning: Do not run ldd on untrusted binaries. On some systems, ldd may actually execute the binary to determine its dependencies. For untrusted binaries, use objdump -p binary | grep NEEDED instead.

Checking Libraries Without ldd

# Using objdump (safer for untrusted binaries)
$ objdump -p /usr/bin/curl | grep NEEDED
  NEEDED               libcurl.so.4
  NEEDED               libz.so.1
  NEEDED               libpthread.so.0
  NEEDED               libc.so.6

# Using readelf
$ readelf -d /usr/bin/curl | grep NEEDED
 0x0000000000000001 (NEEDED)   Shared library: [libcurl.so.4]
 0x0000000000000001 (NEEDED)   Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)   Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]

Troubleshooting "cannot open shared object file"

This is the most common library error. Let us walk through a systematic diagnosis.

The Error

$ ./myapp
./myapp: error while loading shared libraries: libfoo.so.2: cannot open
shared object file: No such file or directory

Step-by-Step Diagnosis

Step 1: Confirm which library is missing.

$ ldd ./myapp | grep "not found"
	libfoo.so.2 => not found

Step 2: Search for the library on disk.

# Is it installed anywhere?
$ find / -name "libfoo.so*" 2>/dev/null
/opt/custom/lib/libfoo.so.2.1.0
/opt/custom/lib/libfoo.so.2

The library exists, but in a non-standard location.

Step 3: Check if the path is in the linker cache.

$ ldconfig -p | grep libfoo
# (no output -- it's not in the cache)

Step 4: Fix it.

Option A -- Add the path to the linker configuration (recommended for permanent fix):

$ echo "/opt/custom/lib" | sudo tee /etc/ld.so.conf.d/custom.conf
$ sudo ldconfig

# Verify
$ ldconfig -p | grep libfoo
	libfoo.so.2 (libc6,x86-64) => /opt/custom/lib/libfoo.so.2

Option B -- Use LD_LIBRARY_PATH (for testing):

$ LD_LIBRARY_PATH=/opt/custom/lib ./myapp

Option C -- The library is genuinely not installed. Find and install the package that provides it:

# Debian/Ubuntu
$ apt-file search libfoo.so.2
libfoo2: /usr/lib/x86_64-linux-gnu/libfoo.so.2

$ sudo apt install libfoo2

# Fedora/RHEL
$ dnf provides */libfoo.so.2

Step 5: Handle version mismatches.

Sometimes the library exists but with a different version:

$ ls /usr/lib/x86_64-linux-gnu/libfoo*
libfoo.so.3
libfoo.so.3.1.0

The program wants libfoo.so.2 but only version 3 is installed. This means:

  • The program was compiled against an older version of the library
  • Version 3 has breaking ABI changes (hence the different SONAME)
  • You need to either install the older library version alongside the new one, or recompile the program against the new version

Think About It: Why can libfoo.so.2 and libfoo.so.3 coexist on the same system? What does the number after .so. represent?


Using strace for Library Debugging

When ldd does not give you enough information, strace shows you exactly what the dynamic linker is doing at the system call level.

$ strace -e openat ./myapp 2>&1 | grep "\.so"
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libfoo.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libfoo.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/libfoo.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT

This reveals exactly which directories the linker searched and that every attempt returned ENOENT (file not found).

You can also use the LD_DEBUG environment variable for detailed linker diagnostics:

$ LD_DEBUG=libs ./myapp 2>&1 | head -40
     12345:	find library=libfoo.so.2 [0]; searching
     12345:	 search cache=/etc/ld.so.cache
     12345:	 search path=/usr/lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu:...
     12345:	  trying file=/usr/lib/x86_64-linux-gnu/libfoo.so.2
     12345:	  trying file=/lib/x86_64-linux-gnu/libfoo.so.2
     12345:
     12345:	./myapp: error while loading shared libraries: libfoo.so.2:
     cannot open shared object file: No such file or directory

Other useful LD_DEBUG values:

$ LD_DEBUG=files ./myapp     # Show file operations
$ LD_DEBUG=bindings ./myapp  # Show symbol binding
$ LD_DEBUG=versions ./myapp  # Show version dependencies
$ LD_DEBUG=all ./myapp       # Show everything (very verbose)
$ LD_DEBUG=help ./myapp      # List all available debug options

pkg-config: Finding Library Information

pkg-config is a helper tool that provides the compiler and linker flags needed to use a library. It reads .pc files installed by library development packages.

# What compiler flags does openssl need?
$ pkg-config --cflags openssl
-I/usr/include/openssl

# What linker flags?
$ pkg-config --libs openssl
-lssl -lcrypto

# What version is installed?
$ pkg-config --modversion openssl
3.0.11

# Does a specific version satisfy a requirement?
$ pkg-config --exists "openssl >= 3.0" && echo "yes" || echo "no"
yes

This is how ./configure scripts typically detect libraries:

# In a configure script, this is essentially what happens:
PKG_CHECK_MODULES([OPENSSL], [openssl >= 1.1])
# Translates to: pkg-config --exists "openssl >= 1.1"

The .pc files live in standard locations:

$ pkg-config --variable pc_path pkg-config
/usr/local/lib/x86_64-linux-gnu/pkgconfig:/usr/local/lib/pkgconfig:...

$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/openssl.pc
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include

Name: OpenSSL
Description: Secure Sockets Layer and cryptography libraries and tools
Version: 3.0.11
Requires: libssl libcrypto

If pkg-config cannot find a library you know is installed, you may need to set PKG_CONFIG_PATH:

$ export PKG_CONFIG_PATH=/opt/custom/lib/pkgconfig:$PKG_CONFIG_PATH
$ pkg-config --libs customlib

RPATH and RUNPATH

Sometimes a program needs to carry its own library search path inside its binary. This is done with RPATH or RUNPATH.

What Are They?

Both RPATH and RUNPATH are paths stored inside the ELF binary that tell the dynamic linker where to search for libraries. The key difference is search order:

With RPATH:     RPATH → LD_LIBRARY_PATH → /etc/ld.so.cache → defaults
With RUNPATH:   LD_LIBRARY_PATH → RUNPATH → /etc/ld.so.cache → defaults

RPATH takes precedence over LD_LIBRARY_PATH; RUNPATH does not. Modern toolchains emit RUNPATH by default because it is more flexible (you can still override it with LD_LIBRARY_PATH). When a binary contains both, the dynamic linker ignores RPATH in favor of RUNPATH.

Checking RPATH/RUNPATH

$ readelf -d /usr/bin/someapp | grep -E "RPATH|RUNPATH"
 0x000000000000001d (RUNPATH)     Library runpath: [/opt/myapp/lib]

# Or with objdump
$ objdump -p /usr/bin/someapp | grep -E "RPATH|RUNPATH"
  RUNPATH              /opt/myapp/lib

Setting RPATH/RUNPATH at Compile Time

# Using gcc directly
$ gcc -o myapp myapp.c -lfoo -Wl,-rpath,/opt/myapp/lib

# Using CMake
$ cmake .. -DCMAKE_INSTALL_RPATH=/opt/myapp/lib

# Using autotools
$ ./configure LDFLAGS="-Wl,-rpath,/opt/myapp/lib"

Modifying RPATH After Compilation

The patchelf tool can modify the RPATH/RUNPATH of an existing binary:

$ sudo apt install patchelf    # or dnf/pacman equivalent

# View current RPATH
$ patchelf --print-rpath ./myapp
/opt/old/lib

# Set a new RPATH
$ patchelf --set-rpath /opt/new/lib ./myapp

# Remove RPATH entirely
$ patchelf --remove-rpath ./myapp

The $ORIGIN Variable

RPATH supports a special variable $ORIGIN that resolves to the directory containing the executable. This is useful for self-contained application bundles:

/opt/myapp/
├── bin/
│   └── myapp         (RUNPATH = $ORIGIN/../lib)
└── lib/
    └── libfoo.so.2

$ gcc -o myapp myapp.c -lfoo -Wl,-rpath,'$ORIGIN/../lib'

No matter where /opt/myapp is installed, the binary will find its libraries relative to its own location.


Hands-On: Library Exploration Lab

Exercise 1: Map a program's complete library dependencies

# Start with a complex program
$ ldd /usr/bin/python3 | wc -l
12

# Now recursively find ALL libraries (libraries that depend on libraries)
$ ldd /usr/bin/python3
	linux-vdso.so.1 (0x00007ffd...)
	libpython3.11.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.11.so.1.0
	libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6
	libexpat.so.1 => /usr/lib/x86_64-linux-gnu/libexpat.so.1
	libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1
	...

# Check what libpython itself depends on
$ ldd /usr/lib/x86_64-linux-gnu/libpython3.11.so.1.0
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6
	libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6
	libpthread.so.0 => ...

Exercise 2: Create and use a shared library

# Step 1: Write a simple library
$ cat > mylib.c << 'EOF'
#include <stdio.h>

void greet(const char *name) {
    printf("Hello, %s! Greetings from a shared library.\n", name);
}
EOF

# Step 2: Write a header file
$ cat > mylib.h << 'EOF'
void greet(const char *name);
EOF

# Step 3: Write a program that uses it
$ cat > main.c << 'EOF'
#include "mylib.h"

int main() {
    greet("Linux learner");
    return 0;
}
EOF

# Step 4: Compile the shared library, embedding its SONAME so that
# programs linked against it record libmylib.so.1 as the dependency
$ gcc -shared -fPIC -Wl,-soname,libmylib.so.1 -o libmylib.so.1.0.0 mylib.c
$ ln -sf libmylib.so.1.0.0 libmylib.so.1      # SONAME symlink
$ ln -sf libmylib.so.1 libmylib.so            # Linker name symlink

# Step 5: Compile the program
$ gcc -o myapp main.c -L. -lmylib -Wl,-rpath,.

# Step 6: Run it
$ ./myapp
Hello, Linux learner! Greetings from a shared library.

# Step 7: Verify the library dependency
$ ldd ./myapp | grep mylib
	libmylib.so.1 => ./libmylib.so.1 (0x00007f...)

# Step 8: What happens without the library?
$ mv libmylib.so.1.0.0 /tmp/
$ ./myapp
./myapp: error while loading shared libraries: libmylib.so.1:
cannot open shared object file: No such file or directory

# Step 9: Restore it
$ mv /tmp/libmylib.so.1.0.0 .
$ ./myapp
Hello, Linux learner! Greetings from a shared library.

Exercise 3: The linker cache in action

# Step 1: Install your library system-wide
$ sudo cp libmylib.so.1.0.0 /usr/local/lib/
$ sudo ln -sf libmylib.so.1.0.0 /usr/local/lib/libmylib.so.1
$ sudo ln -sf libmylib.so.1 /usr/local/lib/libmylib.so

# Step 2: Update the cache
$ sudo ldconfig

# Step 3: Verify it is in the cache
$ ldconfig -p | grep mylib
	libmylib.so.1 (libc6,x86-64) => /usr/local/lib/libmylib.so.1

# Step 4: Recompile without -rpath and it still works
$ gcc -o myapp main.c -L/usr/local/lib -lmylib
$ ldd ./myapp | grep mylib
	libmylib.so.1 => /usr/local/lib/libmylib.so.1 (0x00007f...)
$ ./myapp
Hello, Linux learner! Greetings from a shared library.

# Step 5: Clean up
$ sudo rm /usr/local/lib/libmylib.so*
$ sudo ldconfig

Debug This

A developer deploys an application to a production server. The application works perfectly on their development machine but fails on the server:

$ /opt/myapp/bin/server
/opt/myapp/bin/server: error while loading shared libraries: libboost_system.so.1.74.0:
cannot open shared object file: No such file or directory

They check:

$ ldconfig -p | grep libboost_system
	libboost_system.so.1.83.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.83.0

The library exists, but version 1.83.0 is installed. The binary wants version 1.74.0.

What happened and how would you fix it?

Diagnosis: The binary was compiled on a system with Boost 1.74 and recorded the SONAME libboost_system.so.1.74.0. The production server has Boost 1.83, which has a different SONAME (Boost encodes the full version in its SONAME, so the changed name signals a potential ABI break).

Options:

  1. Install the matching library version alongside the newer one (if the package exists):

    $ sudo apt install libboost-system1.74.0
    
  2. Recompile the application on the production server (or a system with matching libraries).

  3. Create a symlink (risky -- only if the ABI is actually compatible):

    # DANGEROUS -- only as a last resort after testing
    $ sudo ln -s libboost_system.so.1.83.0 /usr/lib/x86_64-linux-gnu/libboost_system.so.1.74.0
    $ sudo ldconfig
    
  4. Ship the libraries with the application and set RUNPATH:

    $ patchelf --set-rpath '/opt/myapp/lib' /opt/myapp/bin/server
    # Then copy the correct libraries to /opt/myapp/lib/
    

Option 4 is the most robust for deployment -- it makes the application self-contained.


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                                                               │
│  In this chapter, you learned:                                │
│                                                               │
│  - Static libraries (.a) are linked at compile time.          │
│    Shared libraries (.so) are loaded at runtime.              │
│                                                               │
│  - Library naming: libfoo.so (linker name) ->                 │
│    libfoo.so.1 (SONAME) -> libfoo.so.1.2.3 (real file).      │
│    The SONAME enables backward-compatible updates.            │
│                                                               │
│  - The dynamic linker searches: RPATH -> LD_LIBRARY_PATH ->   │
│    RUNPATH -> /etc/ld.so.cache -> default paths.              │
│                                                               │
│  - ldd shows a binary's library dependencies.                 │
│    Use objdump -p for untrusted binaries.                     │
│                                                               │
│  - /etc/ld.so.conf.d/ configures library search paths.        │
│    Run ldconfig after changes to rebuild the cache.           │
│                                                               │
│  - "cannot open shared object file" means the dynamic         │
│    linker cannot find a required library. Fix by installing    │
│    the library, adding its path to ld.so.conf, or setting     │
│    LD_LIBRARY_PATH (for testing only).                        │
│                                                               │
│  - strace and LD_DEBUG reveal exactly what the linker          │
│    is searching for and where.                                │
│                                                               │
│  - pkg-config provides compiler/linker flags for libraries.   │
│                                                               │
│  - RPATH/RUNPATH embed library search paths in binaries.      │
│    patchelf can modify them after compilation.                │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Try This

Exercises

  1. Library census: Run ldd on five different programs (curl, python3, bash, vim, ssh). Which shared libraries appear in all of them? What does that tell you about those libraries?

  2. Create a shared library: Follow the hands-on exercise above to create libmylib.so, but add a second version (libmylib.so.2.0.0) with a different function signature. Install both versions and verify they can coexist.

  3. Break and fix: Compile a program that depends on a shared library. Then move the library to a non-standard location and observe the error. Fix it three different ways: (a) LD_LIBRARY_PATH, (b) /etc/ld.so.conf.d/, (c) patchelf --set-rpath.

  4. strace investigation: Run strace -e openat /usr/bin/curl --version 2>&1 | grep '\.so' and trace every library the dynamic linker opens. How many files does it try before finding each library?

  5. pkg-config audit: Run pkg-config --list-all to see every library with pkg-config support on your system. Pick three and examine their .pc files.

Bonus Challenge

Write a script that checks all binaries in /usr/local/bin/ for missing library dependencies. For each binary, run ldd and report any libraries marked as not found. This is a useful health check after system upgrades.


What's Next

Now that you understand shared libraries and how the dynamic linker works, you have the foundation to understand something much larger: the Linux kernel itself. Chapter 60 covers how to upgrade your kernel, what DKMS does for third-party modules, and how to build a custom kernel from source -- the ultimate compilation project.

Kernel Upgrades & Custom Kernels

Why This Matters

Your cloud provider just announced a new generation of instances with hardware features that require kernel 6.5 or newer, but your distribution ships kernel 5.15. Or a kernel vulnerability (like the Dirty Pipe exploit, CVE-2022-0847) is announced and you need to patch every server in your fleet within hours. Or you are building an embedded system where the default kernel includes thousands of drivers you will never need, and you need to strip it down to reduce boot time and attack surface.

The kernel is the most critical piece of software on your system. Every process, every file operation, every network packet passes through it. Understanding how to upgrade it safely, how to roll back when things go wrong, and how to build a custom kernel when you need one -- these are skills that separate system administrators from system engineers.

This chapter covers kernel versioning, upgrading via package manager, DKMS for third-party modules, building a custom kernel from source, configuring it with menuconfig, installing it alongside your existing kernel, updating GRUB, and rolling back when things go wrong.


Try This Right Now

Check your current kernel version and see what other kernels are available on your system:

# What kernel are you running right now?
$ uname -r
6.1.0-18-amd64

# What other kernels are installed?
# Debian/Ubuntu:
$ dpkg --list | grep linux-image
ii  linux-image-6.1.0-17-amd64   6.1.69-1    amd64  Linux 6.1 for 64-bit PCs
ii  linux-image-6.1.0-18-amd64   6.1.76-1    amd64  Linux 6.1 for 64-bit PCs

# Fedora/RHEL:
$ rpm -qa | grep kernel-core
kernel-core-6.6.9-200.fc39.x86_64
kernel-core-6.7.3-200.fc39.x86_64

# Arch:
$ pacman -Q linux
linux 6.7.4.arch1-1

Notice that most distributions keep at least two kernel versions installed. This is your safety net.


Kernel Versioning

Linux kernel versions follow a specific numbering scheme that tells you exactly what you are running.

6.1.76
^ ^ ^^
│ │ └── Patch version (bug fixes, security patches)
│ └──── Minor version (new features within the major series)
└────── Major version

Since kernel 3.0 (2011), the major version number carries no technical meaning: Linus Torvalds bumps it whenever the minor number would get unwieldy (5.19 was followed by 6.0, for example). Features can land in any release.

Kernel Release Types

┌───────────────────────────────────────────────────────────┐
│  Kernel Release Types                                      │
│                                                            │
│  Mainline     Latest development release from Linus        │
│               (e.g., 6.8-rc3)                              │
│               New features land here first.                │
│                                                            │
│  Stable       Bug-fix-only releases from mainline          │
│               (e.g., 6.7.5)                                │
│               New stable release every ~1 week.            │
│                                                            │
│  LTS          Long-term support, maintained for 2-6 years  │
│  (Longterm)   (e.g., 6.1.76, 5.15.148, 5.10.209)         │
│               Distribution kernels are typically based      │
│               on LTS releases.                             │
│                                                            │
│  Distro       Distribution-specific patched kernel         │
│               (e.g., 6.1.0-18-amd64 on Debian)            │
│               May include backported features and          │
│               distribution-specific patches.               │
└───────────────────────────────────────────────────────────┘

You can check the latest versions at kernel.org:

# Quick check from the command line
$ curl -s https://www.kernel.org/finger_banner
The latest stable version of the Linux kernel is:    6.7.5
The latest mainline version of the Linux kernel is:  6.8-rc5
The latest stable 6.6 version is:                    6.6.17
The latest longterm 6.1 version is:                  6.1.78
The latest longterm 5.15 version is:                 5.15.148
...

Think About It: Your distribution ships kernel 6.1.76. The latest mainline kernel is 6.8-rc5. Should you upgrade to 6.8 on a production server? What are the trade-offs?


Upgrading the Kernel via Package Manager

The safest way to upgrade a kernel is through your distribution's package manager. The distribution maintainers test the kernel, apply patches, and handle the GRUB configuration for you.

Debian/Ubuntu

# Check available kernel updates
$ apt list --upgradable 2>/dev/null | grep linux-image
linux-image-6.1.0-19-amd64/stable 6.1.82-1 amd64 [upgradable from: 6.1.76-1]

# Install the new kernel
$ sudo apt install linux-image-6.1.0-19-amd64

# The package post-install script automatically:
# - Installs the kernel image to /boot
# - Generates an initramfs (initial RAM filesystem)
# - Updates GRUB configuration
# - Does NOT reboot (you choose when to reboot)

# Verify it is installed
$ ls /boot/vmlinuz-*
/boot/vmlinuz-6.1.0-17-amd64
/boot/vmlinuz-6.1.0-18-amd64
/boot/vmlinuz-6.1.0-19-amd64

# Reboot to use the new kernel
$ sudo reboot

# After reboot, confirm
$ uname -r
6.1.0-19-amd64

Installing matching kernel headers (needed for building kernel modules):

$ sudo apt install linux-headers-$(uname -r)

Fedora/RHEL

# Check for kernel updates
$ dnf check-update kernel
kernel.x86_64    6.7.4-200.fc39    updates

# Install (DNF keeps multiple kernel versions by default)
$ sudo dnf upgrade kernel

# Fedora keeps the last 3 kernels by default
# Configured in /etc/dnf/dnf.conf:
#   installonly_limit=3

# Reboot
$ sudo reboot

# Confirm
$ uname -r
6.7.4-200.fc39.x86_64

Arch Linux

# Arch upgrades the kernel as part of a full system upgrade
$ sudo pacman -Syu

# This replaces the kernel (Arch does not keep old versions by default)
# Reboot to use the new kernel
$ sudo reboot

Distro Note: Arch Linux does not keep old kernel versions by default. If a kernel update breaks something, you need a rescue medium to roll back. Consider installing the linux-lts package as a fallback kernel.

Safety Warning: Never reboot a production server right after a kernel upgrade without a plan to roll back. Test on a staging system first. Make sure you can access the GRUB menu (or have console access) to select a previous kernel if the new one fails to boot.


Understanding /boot

The /boot directory contains everything needed to start the kernel:

$ ls -lh /boot/
-rw-r--r-- 1 root root 256K  config-6.1.0-18-amd64
-rw-r--r-- 1 root root  62M  initrd.img-6.1.0-18-amd64
-rw-r--r-- 1 root root 3.5M  System.map-6.1.0-18-amd64
-rw-r--r-- 1 root root 7.5M  vmlinuz-6.1.0-18-amd64
drwxr-xr-x 5 root root 4.0K  grub
File                           Purpose
vmlinuz-*                      The compressed kernel image (the actual kernel binary)
initrd.img-* or initramfs-*    The initial RAM disk -- a small filesystem loaded
                               into RAM with drivers needed to mount the real
                               root filesystem
config-*                       The kernel configuration file (every option used
                               to build this kernel)
System.map-*                   Symbol table for the kernel (maps memory addresses
                               to function names, useful for debugging)
grub/                          GRUB bootloader configuration

GRUB: Selecting and Configuring Kernels

GRUB (GRand Unified Bootloader) is the bootloader on most Linux distributions. It presents a menu at boot time where you can select which kernel to boot.

Viewing GRUB Configuration

# The main GRUB config (auto-generated, do NOT edit directly)
$ cat /boot/grub/grub.cfg | grep menuentry
menuentry 'Debian GNU/Linux, with Linux 6.1.0-19-amd64' ...
menuentry 'Debian GNU/Linux, with Linux 6.1.0-18-amd64' ...
menuentry 'Debian GNU/Linux, with Linux 6.1.0-17-amd64' ...

Customizing GRUB

Edit /etc/default/grub for configuration, then regenerate:

$ cat /etc/default/grub
GRUB_DEFAULT=0                # Boot the first entry by default
GRUB_TIMEOUT=5                # Show the menu for 5 seconds
GRUB_CMDLINE_LINUX_DEFAULT="quiet"   # Default kernel parameters
GRUB_CMDLINE_LINUX=""                 # Kernel parameters for ALL entries

After editing, regenerate the GRUB configuration:

# Debian/Ubuntu
$ sudo update-grub

# Fedora/RHEL
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# Arch
$ sudo grub-mkconfig -o /boot/grub/grub.cfg

Kernel Parameters

Kernel parameters are passed on the command line at boot. They control kernel behavior.

Common parameters:

Parameter          Effect
quiet              Suppress most boot messages
splash             Show a graphical splash screen instead of text
nomodeset          Disable kernel mode-setting (useful for graphics issues)
single or 1        Boot into single-user mode (recovery)
init=/bin/bash     Skip init, drop to a root shell (emergency recovery)
mem=4G             Limit usable RAM to 4 GB
maxcpus=2          Limit to 2 CPUs
panic=10           Reboot automatically 10 seconds after a kernel panic
net.ifnames=0      Use classic network interface names (eth0 instead of enp0s3)

Adding temporary kernel parameters (one-time at boot):

  1. At the GRUB menu, press e to edit the selected entry
  2. Find the line starting with linux (the kernel command line)
  3. Add your parameter at the end of that line
  4. Press Ctrl+X or F10 to boot

Adding permanent kernel parameters:

# Edit /etc/default/grub
$ sudo nano /etc/default/grub

# Add to the GRUB_CMDLINE_LINUX_DEFAULT line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet panic=10"

# Regenerate GRUB config
$ sudo update-grub

Checking current kernel parameters:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.1.0-18-amd64 root=UUID=a1b2c3d4... ro quiet

Rolling Back to a Previous Kernel

When a kernel upgrade causes problems (drivers not loading, system not booting, performance regression), you need to roll back.

Method 1: Select at GRUB Menu

The simplest approach -- just select the old kernel at the GRUB menu:

  1. Reboot the system
  2. Hold Shift (BIOS) or Esc (UEFI) during boot to show the GRUB menu
  3. Select "Advanced options"
  4. Choose the previous kernel version
  5. The system boots with the old kernel

Method 2: Change Default Kernel

To make the older kernel the default:

# List available kernels with their menu entries
# Debian/Ubuntu:
$ grep menuentry /boot/grub/grub.cfg

# Set the default to a specific menu entry
# (entries are numbered starting from 0, or use the full string)
$ sudo nano /etc/default/grub
GRUB_DEFAULT="Advanced options for Debian GNU/Linux>Debian GNU/Linux, with Linux 6.1.0-18-amd64"

$ sudo update-grub

Method 3: Remove the Problematic Kernel

# Debian/Ubuntu (make sure you are NOT running the kernel you are removing)
$ uname -r    # Verify you are on the OLD kernel
6.1.0-18-amd64

$ sudo apt remove linux-image-6.1.0-19-amd64
$ sudo update-grub

# Fedora/RHEL
$ sudo dnf remove kernel-core-6.7.4-200.fc39.x86_64

Safety Warning: Never remove the kernel you are currently running (uname -r). Always keep at least two kernel versions installed so you have a fallback.

Think About It: On a remote server where you cannot physically access the GRUB menu, how would you handle a kernel upgrade that might fail? (Hint: think about kexec, remote console access, or cloud provider console features.)
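One widely used answer combines GRUB's saved-default mechanism with grub-reboot, which boots a chosen entry exactly once: if the new kernel hangs, a power cycle from the provider's console falls back to the old default. The fragment below is a sketch -- the menu entry string is illustrative and must match your own grub.cfg:

```shell
# /etc/default/grub -- assumed fragment (illustrative values)
GRUB_DEFAULT=saved                            # boot whatever grub-set-default / grub-reboot chose
GRUB_CMDLINE_LINUX_DEFAULT="quiet panic=10"   # auto-reboot 10 seconds after a kernel panic

# After regenerating the GRUB config, try the new kernel exactly once:
#   sudo grub-reboot "Advanced options for Debian GNU/Linux>Debian GNU/Linux, with Linux 6.7.5"
#   sudo reboot
# Any later reboot (including a panic-triggered one) returns to the old default.
# Once verified: sudo grub-set-default <same entry> makes the new kernel permanent.
```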


DKMS: Dynamic Kernel Module Support

When you install third-party kernel modules (like VirtualBox guest additions, NVIDIA drivers, or ZFS), those modules are compiled against a specific kernel version. When you upgrade the kernel, those modules need to be recompiled.

DKMS automates this process. It automatically rebuilds registered modules whenever a new kernel is installed.

┌───────────────────────────────────────────────────────────┐
│  Without DKMS:                                             │
│                                                            │
│  1. Install kernel 6.1.0-18                                │
│  2. Compile VirtualBox modules for 6.1.0-18  ✓             │
│  3. Upgrade to kernel 6.1.0-19                             │
│  4. VirtualBox modules are MISSING for 6.1.0-19  ✗         │
│  5. Manually recompile modules  (annoying)                 │
│                                                            │
├───────────────────────────────────────────────────────────┤
│  With DKMS:                                                │
│                                                            │
│  1. Install kernel 6.1.0-18                                │
│  2. Install VirtualBox modules via DKMS  ✓                 │
│  3. Upgrade to kernel 6.1.0-19                             │
│  4. DKMS automatically rebuilds modules for 6.1.0-19  ✓    │
│  5. Everything just works                                  │
└───────────────────────────────────────────────────────────┘

Using DKMS

# Install DKMS
$ sudo apt install dkms    # or dnf/pacman

# Check registered DKMS modules
$ dkms status
virtualbox-guest/7.0.14, 6.1.0-18-amd64, x86_64: installed
zfs/2.2.2, 6.1.0-18-amd64, x86_64: installed

# Manually rebuild all modules for the current kernel
$ sudo dkms autoinstall -k $(uname -r)

# Build and install a specific module
$ sudo dkms build -m virtualbox-guest -v 7.0.14 -k 6.1.0-19-amd64
$ sudo dkms install -m virtualbox-guest -v 7.0.14 -k 6.1.0-19-amd64

DKMS modules are stored in /usr/src/:

$ ls /usr/src/
linux-headers-6.1.0-18-amd64/
virtualbox-guest-7.0.14/
zfs-2.2.2/

Each directory contains the module source code and a dkms.conf file that tells DKMS how to build it.
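A minimal dkms.conf looks like this -- a sketch for a hypothetical module named hello (the field names are standard DKMS ones; ${kernel_source_dir} and ${dkms_tree} are variables DKMS provides at build time):

```shell
# /usr/src/hello-1.0/dkms.conf -- hypothetical example module
PACKAGE_NAME="hello"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="hello"            # the build produces hello.ko
DEST_MODULE_LOCATION[0]="/updates"      # destination under /lib/modules/<kernel>/
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build modules"
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build clean"
AUTOINSTALL="yes"                       # rebuild automatically when a new kernel is installed
```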


Why Build a Custom Kernel?

The distribution kernel works for most cases. You should only build a custom kernel when you have a specific reason:

  1. Hardware support: You need a newer kernel for new hardware, but your distro has not released one
  2. Performance tuning: You want to enable/disable specific scheduler options, change the tick rate, or enable real-time preemption
  3. Minimalism: You want a kernel with only the drivers you need (smaller, faster boot, smaller attack surface)
  4. Patching: You need to apply a patch that has not been merged upstream or into your distro's kernel
  5. Learning: Understanding the kernel build process deepens your understanding of Linux
┌───────────────────────────────────────────────────────────┐
│  DO build a custom kernel when:                            │
│  - You need hardware support not in your distro kernel     │
│  - You need specific performance tuning options             │
│  - You are building an embedded or specialized system       │
│  - You need to apply custom patches                        │
│  - You want to learn                                       │
│                                                            │
│  DO NOT build a custom kernel when:                        │
│  - Your distro kernel works fine                           │
│  - You just want "the latest" (wait for your distro)       │
│  - You manage many servers (use distro kernels + DKMS)     │
│  - You cannot commit to maintaining it                     │
└───────────────────────────────────────────────────────────┘

Building a Custom Kernel

This is a complete walkthrough of building a Linux kernel from source. We will compile and install it alongside your existing kernel so you always have a fallback.

Step 1: Install Prerequisites

# Debian/Ubuntu
$ sudo apt install build-essential libncurses-dev bison flex libssl-dev \
    libelf-dev bc dwarves

# Fedora/RHEL
$ sudo dnf install gcc make ncurses-devel bison flex elfutils-libelf-devel \
    openssl-devel bc dwarves perl

# Arch
$ sudo pacman -S base-devel xmlto kmod inetutils bc libelf git cpio perl

Step 2: Download the Kernel Source

$ cd /usr/src
$ sudo wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.5.tar.xz

# Verify the download -- the signature covers the uncompressed tarball,
# so decompress first (always verify for security)
$ sudo wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.5.tar.sign
$ sudo unxz linux-6.7.5.tar.xz

# Import the kernel signing keys and verify (optional but recommended)
$ gpg --locate-keys torvalds@kernel.org gregkh@kernel.org
$ gpg --verify linux-6.7.5.tar.sign linux-6.7.5.tar

# Extract
$ sudo tar xf linux-6.7.5.tar
$ cd linux-6.7.5

Step 3: Configure the Kernel

The kernel has thousands of configuration options. You need to decide which features, drivers, and modules to include.

Option A: Start from your current kernel's config (recommended for first build).

# Copy your running kernel's configuration
$ cp /boot/config-$(uname -r) .config

# Update it for the new kernel version
# (answers new options with their default values)
$ make olddefconfig

Option B: Use menuconfig for interactive configuration.

$ make menuconfig

This launches a text-based menu interface:

┌──────────── Linux/x86 6.7.5 Kernel Configuration ─────────────┐
│  Arrow keys navigate the menu. <Enter> selects submenus -->.   │
│  Highlighted letters are hotkeys. Pressing <Y> includes, <N>   │
│  excludes, <M> builds as a module. <Esc><Esc> to go back.      │
│                                                                 │
│       General setup  --->                                       │
│       Processor type and features  --->                         │
│       Power management and ACPI options  --->                   │
│       Bus options (PCI etc.)  --->                              │
│       Firmware Drivers  --->                                    │
│       [*] Networking support  --->                              │
│       Device Drivers  --->                                      │
│       File systems  --->                                        │
│       Security options  --->                                    │
│       -*- Cryptographic API  --->                               │
│       Library routines  --->                                    │
│       Kernel hacking  --->                                      │
│                                                                 │
│         <Select>   < Exit >   < Help >   < Save >   < Load >   │
└─────────────────────────────────────────────────────────────────┘

Key concepts in menuconfig:

  • [*] = Built into the kernel (always loaded)
  • [M] = Built as a loadable module (loaded on demand)
  • [ ] = Not included
  • Press Y to include, N to exclude, M for module
  • Press / to search for a specific option
  • Press ? on any option for help text

Important configuration areas:

General setup --->
    Local version: -custom1          # Appended to the version string
    (This becomes part of `uname -r`, e.g., 6.7.5-custom1)

Processor type and features --->
    Processor family: (Generic-x86-64)
    # Change to match your specific CPU for slight performance gains

Device Drivers --->
    # Enable only the drivers you need
    # Common: SATA/SCSI, network drivers, USB, input devices
    # Disable drivers for hardware you don't have to reduce kernel size

File systems --->
    # Make sure your root filesystem type is built-in [*], not module [M]
    # ext4, XFS, or Btrfs depending on your setup

Networking support --->
    Networking options --->
        # TCP/IP, Netfilter (iptables/nftables) for firewalls

Safety Warning: If you are building a kernel for a system that boots from a specific filesystem or block device, make sure the relevant drivers are built-in ([*]), not as modules ([M]). If they are modules, the kernel cannot read the disk to find the modules. Alternatively, ensure the initramfs includes the necessary modules.
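The built-in vs module distinction matters often enough that it is worth checking from a script. This sketch classifies a config symbol given config-file text; on a real system you would feed it the contents of /boot/config-$(uname -r):

```shell
# Sketch: report whether a kernel config symbol is built-in, a module, or absent.
classify() {
  symbol=$1; config=$2
  case "$(printf '%s\n' "$config" | grep "^${symbol}=")" in
    "${symbol}=y") echo "${symbol}: built-in [*]" ;;
    "${symbol}=m") echo "${symbol}: module [M]" ;;
    *)             echo "${symbol}: not set" ;;
  esac
}

# Sample config text; live equivalent: config=$(cat /boot/config-$(uname -r))
config="CONFIG_EXT4_FS=y
CONFIG_E1000E=m"

classify CONFIG_EXT4_FS "$config"    # built-in: safe for a root filesystem
classify CONFIG_E1000E  "$config"    # module: needs /lib/modules or initramfs
classify CONFIG_BTRFS_FS "$config"
```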

Option C: Use localmodconfig for a minimal kernel based on currently loaded modules.

# This configures only the modules currently loaded on your running system
$ make localmodconfig

This produces a much smaller kernel but will only include drivers for hardware currently connected. It is excellent for dedicated servers where the hardware is fixed.

Step 4: Compile the Kernel

# Compile the kernel image and all modules
$ make -j$(nproc)

This will take 15-90 minutes depending on your hardware and configuration. On a modern multi-core system, make -j$(nproc) uses all available CPU cores.

You will see output like:

  CALL    scripts/checksyscalls.sh
  CC      init/main.o
  CC      init/version.o
  ...
  LD      vmlinux
  SORTTAB vmlinux
  SYSMAP  System.map
  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  ...
  BUILD   arch/x86/boot/bzImage
Setup is 17500 bytes (padded to 17920 bytes).
System is 12148 kB
Kernel: arch/x86/boot/bzImage is ready  (#1)

When you see "bzImage is ready," the kernel compiled successfully.

Step 5: Install Modules

$ sudo make modules_install

This copies compiled modules to /lib/modules/6.7.5-custom1/.

$ ls /lib/modules/
6.1.0-18-amd64/     # Your old kernel's modules
6.7.5-custom1/      # Your new kernel's modules

Step 6: Install the Kernel

$ sudo make install

This copies the kernel image and configuration to /boot/ and runs the distribution's kernel installation hooks (which typically update GRUB and generate an initramfs).

On some distributions, you may need to generate the initramfs manually:

# Debian/Ubuntu
$ sudo update-initramfs -c -k 6.7.5-custom1

# Fedora/RHEL
$ sudo dracut --force /boot/initramfs-6.7.5-custom1.img 6.7.5-custom1

Step 7: Update GRUB

# Debian/Ubuntu
$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.7.5-custom1
Found initrd image: /boot/initrd.img-6.7.5-custom1
Found linux image: /boot/vmlinuz-6.1.0-18-amd64
Found initrd image: /boot/initrd.img-6.1.0-18-amd64
done

# Fedora/RHEL
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Step 8: Reboot and Test

$ sudo reboot

At the GRUB menu, you should see your custom kernel as an option. Select it and boot.

After booting:

$ uname -r
6.7.5-custom1

$ uname -a
Linux myhost 6.7.5-custom1 #1 SMP PREEMPT_DYNAMIC Sat Feb 10 14:22:33 UTC 2024 x86_64 GNU/Linux

If the custom kernel fails to boot, select your previous kernel at the GRUB menu.


Hands-On: Complete Custom Kernel Build Summary

Here is the entire process in a concise reference:

# 1. Install prerequisites
$ sudo apt install build-essential libncurses-dev bison flex libssl-dev \
    libelf-dev bc dwarves

# 2. Download and extract kernel source
$ cd /usr/src
$ sudo wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.5.tar.xz
$ sudo tar xJf linux-6.7.5.tar.xz
$ cd linux-6.7.5

# 3. Configure
$ cp /boot/config-$(uname -r) .config
$ make olddefconfig          # or: make menuconfig

# 4. Compile
$ make -j$(nproc)

# 5. Install modules
$ sudo make modules_install

# 6. Install kernel
$ sudo make install

# 7. Update bootloader
$ sudo update-grub           # Debian/Ubuntu
# or: sudo grub2-mkconfig -o /boot/grub2/grub.cfg  (Fedora/RHEL)

# 8. Reboot
$ sudo reboot

# 9. Verify
$ uname -r

Removing a Custom Kernel

If you no longer need a custom kernel:

# Remove the kernel files from /boot
$ sudo rm /boot/vmlinuz-6.7.5-custom1
$ sudo rm /boot/initrd.img-6.7.5-custom1
$ sudo rm /boot/config-6.7.5-custom1
$ sudo rm /boot/System.map-6.7.5-custom1

# Remove the modules
$ sudo rm -rf /lib/modules/6.7.5-custom1/

# Update GRUB
$ sudo update-grub

Kernel Modules

Kernel modules are pieces of kernel code that can be loaded and unloaded at runtime without rebooting. Most device drivers are modules.

Working with Modules

# List currently loaded modules
$ lsmod
Module                  Size  Used by
nf_tables             303104  0
xt_conntrack            16384  1
nf_conntrack          176128  1 xt_conntrack
btrfs                1654784  1
...

# Show info about a module
$ modinfo e1000e
filename:       /lib/modules/6.1.0-18-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version:        3.8.7-NAPI
license:        GPL v2
description:    Intel(R) PRO/1000 Network Driver
...

# Load a module
$ sudo modprobe e1000e

# Unload a module (only if not in use)
$ sudo modprobe -r e1000e

# Load a module with parameters
$ sudo modprobe bonding mode=4 miimon=100

Persisting Module Options

To load a module at boot:

# Create a file in /etc/modules-load.d/
$ echo "bonding" | sudo tee /etc/modules-load.d/bonding.conf

To set module parameters:

# Create a file in /etc/modprobe.d/
$ echo "options bonding mode=4 miimon=100" | sudo tee /etc/modprobe.d/bonding.conf

To blacklist a module (prevent it from loading):

$ echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
$ sudo update-initramfs -u    # Rebuild initramfs (Debian/Ubuntu; Fedora/RHEL: sudo dracut --force)

Debug This

After building and installing a custom kernel, a colleague reports that the system boots but networking does not work. WiFi and Ethernet are both dead. ip link shows only the loopback interface.

$ uname -r
6.7.5-custom1

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

What happened?

When they configured the kernel with make menuconfig, they likely:

  1. Disabled the network drivers needed for their hardware, OR
  2. Built the drivers as modules ([M]) but the modules were not loaded, OR
  3. Used make localmodconfig on a system where the network interface was not active at the time

How to diagnose:

# Step 1: What network hardware does the system have?
$ lspci | grep -i net
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM
03:00.0 Network controller: Intel Corporation Wireless 8265

# Step 2: What driver does this hardware need?
# (Check from a working kernel)
$ lspci -k | grep -A3 "Ethernet"
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM
	Kernel driver in use: e1000e
	Kernel modules: e1000e

# Step 3: Is the module available in the custom kernel?
$ find /lib/modules/6.7.5-custom1/ -name "e1000e*"
# (no output -- the module was not compiled)

# Step 4: Check the kernel config
$ grep E1000E /boot/config-6.7.5-custom1
# CONFIG_E1000E is not set

The fix:

$ cd /usr/src/linux-6.7.5
$ make menuconfig
# Navigate to: Device Drivers -> Network device support -> Ethernet driver support
# Enable Intel(R) PRO/1000 PCI-Express (e1000e) as [M] or [*]
$ make -j$(nproc)
$ sudo make modules_install
$ sudo make install
$ sudo reboot

Kernel Upgrade Checklist for Production

Before upgrading a kernel on a production server, follow this checklist:

┌──────────────────────────────────────────────────────────────┐
│  Pre-Upgrade Checklist                                        │
│                                                               │
│  [ ] Test the new kernel on a staging/test system first       │
│  [ ] Verify all DKMS modules build successfully               │
│  [ ] Verify all critical applications work on the new kernel  │
│  [ ] Ensure at least one previous kernel is installed          │
│  [ ] Ensure you have console access (IPMI, cloud console)     │
│      in case the new kernel fails to boot                     │
│  [ ] Schedule a maintenance window                            │
│  [ ] Notify stakeholders                                      │
│  [ ] Take a snapshot/backup if possible                       │
│                                                               │
│  Post-Upgrade Verification                                    │
│                                                               │
│  [ ] uname -r shows the expected new version                 │
│  [ ] dmesg shows no critical errors                          │
│  [ ] All network interfaces are up                           │
│  [ ] All filesystems are mounted                             │
│  [ ] All critical services are running                       │
│  [ ] DKMS modules loaded (dkms status shows "installed")     │
│  [ ] Application smoke tests pass                            │
│                                                               │
└──────────────────────────────────────────────────────────────┘
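The verification half of the checklist is easy to turn into a script. This sketch uses a generic compare helper with hard-coded sample values so it runs anywhere; in real use you would feed it live values from uname -r, dkms status, ip -br link, and so on (all names below are illustrative):

```shell
# Sketch: post-upgrade verification helper -- prints PASS/FAIL per check.
verify() {  # verify <label> <expected> <actual>
  if [ "$2" = "$3" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 (expected '$2', got '$3')"
  fi
}

# Sample values; live equivalents shown in the comments
verify "kernel release" "6.7.5-custom1" "6.7.5-custom1"   # $(uname -r)
verify "zfs dkms state" "installed"     "installed"       # parsed from: dkms status
verify "eth0 link"      "UP"            "DOWN"            # parsed from: ip -br link
```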

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                                                               │
│  In this chapter, you learned:                                │
│                                                               │
│  - Kernel versions follow major.minor.patch. LTS kernels      │
│    are maintained for years. Distribution kernels are based   │
│    on LTS releases with additional patches.                   │
│                                                               │
│  - Upgrading via package manager is the safest approach.      │
│    apt/dnf handle kernel image, initramfs, and GRUB.         │
│                                                               │
│  - GRUB allows selecting between installed kernels.           │
│    Kernel parameters are set in /etc/default/grub.           │
│                                                               │
│  - DKMS automatically rebuilds third-party kernel modules     │
│    when a new kernel is installed.                            │
│                                                               │
│  - Building a custom kernel:                                  │
│    1. Download source from kernel.org                        │
│    2. Configure with make menuconfig (or olddefconfig)       │
│    3. Compile with make -j$(nproc)                           │
│    4. Install with make modules_install && make install       │
│    5. Update GRUB                                            │
│                                                               │
│  - Always keep a previous working kernel as a fallback.       │
│                                                               │
│  - Kernel modules can be loaded, unloaded, configured,        │
│    and blacklisted at runtime.                                │
│                                                               │
│  - Always test kernel upgrades on staging before production.  │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Try This

Exercises

  1. Kernel inventory: Run uname -r and look up your kernel version on kernel.org. Is it an LTS kernel? When will it reach end of life? Is there a newer patch version available?

  2. GRUB exploration: Read /boot/grub/grub.cfg (do not edit it). Identify the menu entries for each installed kernel. Find the kernel command line parameters for each entry.

  3. Module exploration: Run lsmod and pick three modules you do not recognize. Use modinfo to find out what they do. Then check if your hardware actually uses them with lspci -k.

  4. Kernel config comparison: Compare the config files for two installed kernels: diff /boot/config-VERSION1 /boot/config-VERSION2. What changed between versions?

  5. DKMS status: If you have DKMS installed, run dkms status to see what modules are managed. If not, install dkms and examine its directory structure in /usr/src/.

Bonus Challenge

On a virtual machine or test system (never production), download the latest stable kernel from kernel.org and build it from source. Use make localmodconfig to create a minimal configuration based on your currently loaded modules. Measure the compile time, the resulting kernel image size, and the number of modules compared to your distribution kernel. Boot the custom kernel and verify that all hardware works.


What's Next

With a solid understanding of packages, compilation, shared libraries, and the kernel, you have mastered the software stack from userspace down to the kernel itself. Part XIV takes you into virtualization and containers -- how Linux creates isolated environments that behave like separate machines, all on the same kernel.

Virtualization Concepts

Why This Matters

You are a sysadmin managing a rack of physical servers. Each server runs one application: a web server on one box, a database on another, a mail server on a third. Most of them sit at 5-10% CPU utilization all day, burning electricity, generating heat, and consuming rack space. When the web team needs a staging environment, they file a ticket and wait three weeks for new hardware to arrive.

Virtualization changed all of that. By running multiple virtual machines on a single physical host, organizations went from needing hundreds of physical servers to dozens. Provisioning dropped from weeks to minutes. Testing became trivial -- spin up a VM, break it, throw it away. Entire data centers shrank.

Today, virtualization is a foundational technology. Whether you are running KVM on a production hypervisor, QEMU for kernel development, or VirtualBox on your laptop for testing, understanding how virtualization works at the Linux level is essential. This chapter takes you from the concepts all the way to running your first virtual machine with open-source tools.


Try This Right Now

Check if your CPU supports hardware virtualization:

# On Intel CPUs, look for "vmx"; on AMD, look for "svm"
$ grep -E '(vmx|svm)' /proc/cpuinfo | head -1

You should see something like:

flags : ... vmx ...

If you see vmx or svm, your CPU supports hardware-assisted virtualization. If not, you may need to enable it in your BIOS/UEFI settings (look for "Intel VT-x" or "AMD-V").

Check if the KVM kernel module is loaded:

$ lsmod | grep kvm

Expected output (on Intel):

kvm_intel             368640  0
kvm                  1028096  1 kvm_intel
irqbypass              16384  1 kvm

If you see kvm_intel or kvm_amd, your system is ready for KVM virtualization.


What Is Virtualization?

At its core, virtualization is the act of creating a software-based (virtual) version of something -- a computer, an operating system, a storage device, or a network. The most common form is hardware virtualization, where you run multiple virtual machines (VMs) on a single physical host, each behaving as if it were its own independent computer.

┌─────────────────────────────────────────────────────────────┐
│                    Physical Hardware                         │
│              (CPU, RAM, Disk, Network)                       │
├─────────────────────────────────────────────────────────────┤
│                 Hypervisor / VMM                             │
│          (Manages and isolates VMs)                          │
├──────────────┬──────────────┬───────────────────────────────┤
│    VM 1      │    VM 2      │    VM 3                       │
│ ┌──────────┐ │ ┌──────────┐ │ ┌──────────┐                 │
│ │  App A   │ │ │  App B   │ │ │  App C   │                 │
│ ├──────────┤ │ ├──────────┤ │ ├──────────┤                 │
│ │  Libs    │ │ │  Libs    │ │ │  Libs    │                 │
│ ├──────────┤ │ ├──────────┤ │ ├──────────┤                 │
│ │ Guest OS │ │ │ Guest OS │ │ │ Guest OS │                 │
│ │ (Ubuntu) │ │ │ (CentOS) │ │ │ (Debian) │                 │
│ └──────────┘ │ └──────────┘ │ └──────────┘                 │
└──────────────┴──────────────┴───────────────────────────────┘

Each VM gets its own:

  • Virtual CPUs (vCPUs)
  • Virtual RAM
  • Virtual disk (a file on the host, typically)
  • Virtual network interface
  • Its own kernel, running its own operating system

The hypervisor (also called a Virtual Machine Monitor or VMM) is the software that creates and manages these virtual machines.


The Two Types of Hypervisors

Not all hypervisors are created equal. The industry classifies them into two types based on where they sit in the software stack.

Type 1: Bare-Metal Hypervisors

A Type 1 hypervisor runs directly on the physical hardware, with no host operating system underneath. The hypervisor is the operating system (or is deeply integrated into it).

┌───────────────────────────────────┐
│   VM 1    │   VM 2    │   VM 3   │
├───────────┴───────────┴──────────┤
│     Type 1 Hypervisor            │
│  (runs directly on hardware)     │
├──────────────────────────────────┤
│        Physical Hardware          │
└──────────────────────────────────┘

Open-source examples:

  • KVM (Kernel-based Virtual Machine) -- built into the Linux kernel itself
  • Xen -- used by AWS for years, powers many cloud providers

KVM is technically interesting because the Linux kernel becomes the hypervisor. When you load the KVM module, your Linux host becomes a Type 1 hypervisor that also happens to run a general-purpose operating system.

Type 2: Hosted Hypervisors

A Type 2 hypervisor runs as an application on top of a conventional operating system, just like any other program.

┌───────────────────────────────────┐
│   VM 1    │   VM 2    │   VM 3   │
├───────────┴───────────┴──────────┤
│    Type 2 Hypervisor             │
│   (runs as application)          │
├──────────────────────────────────┤
│     Host Operating System        │
│      (Linux, Windows, macOS)     │
├──────────────────────────────────┤
│       Physical Hardware           │
└──────────────────────────────────┘

Open-source examples:

  • QEMU (Quick EMUlator) -- can emulate entirely in software
  • VirtualBox -- popular desktop virtualization tool

Think About It: KVM is often called a Type 1 hypervisor even though it runs within Linux. Why? Because once the KVM module is loaded, the Linux kernel itself is the hypervisor -- it runs directly on hardware and manages VMs at the kernel level. The distinction between "bare metal" and "hosted" gets blurry with KVM, and that is actually one of its strengths.


Full Virtualization vs Paravirtualization

There are two fundamental approaches to virtualizing hardware.

Full Virtualization

In full virtualization, the guest operating system runs completely unmodified. It believes it is running on real hardware. The hypervisor traps privileged instructions from the guest and emulates them.

Pros:

  • Run any OS without modification (Windows, BSD, etc.)
  • Guests need zero awareness of virtualization

Cons:

  • Historically slower due to trap-and-emulate overhead -- though
    hardware-assisted virtualization (VT-x/AMD-V) has largely eliminated
    the performance gap

KVM with hardware-assisted virtualization is full virtualization -- your guest OS does not need to know it is virtualized.

Paravirtualization

In paravirtualization, the guest OS is modified to be aware it is running in a virtual environment. Instead of executing privileged instructions that need to be trapped, the guest makes explicit "hypercalls" to the hypervisor.

Pros:

  • Can be faster for certain I/O-heavy operations
  • Efficient communication with the hypervisor

Cons:

  • Requires a modified guest kernel
  • Cannot run unmodified proprietary operating systems

Xen pioneered paravirtualization. Today, even full-virtualization setups use paravirtual drivers (like virtio in KVM) for disk and network I/O because they are faster than emulating real hardware.

Full Virtualization:                   Paravirtualization:

Guest OS (unmodified)                 Guest OS (modified)
    │                                     │
    │ privileged instruction              │ hypercall
    │                                     │
    ▼                                     ▼
Hypervisor traps & emulates          Hypervisor handles directly
    │                                     │
    ▼                                     ▼
  Hardware                             Hardware

KVM and QEMU: The Linux Virtualization Stack

KVM and QEMU are the two workhorses of Linux virtualization. They are separate projects that work together beautifully.

KVM (Kernel-based Virtual Machine)

KVM is a Linux kernel module that turns the kernel into a hypervisor. It leverages hardware virtualization extensions (Intel VT-x or AMD-V) to run guest code directly on the CPU at near-native speed.

What KVM handles:

  • CPU virtualization (using hardware VT-x/AMD-V)
  • Memory virtualization (using hardware EPT/NPT)
  • Interrupt handling

What KVM does not handle:

  • Device emulation (disks, network cards, displays)
  • VM management (creating, configuring VMs)

That is where QEMU comes in.

QEMU (Quick EMUlator)

QEMU is a user-space emulator that can emulate an entire computer system -- CPU, memory, devices, everything -- purely in software. It can run a guest OS for a completely different CPU architecture (e.g., run ARM code on x86).

When paired with KVM, QEMU delegates CPU and memory virtualization to KVM (which uses hardware acceleration) and handles everything else: disk controllers, network cards, USB devices, display output.

┌───────────────────────────────────────────┐
│            User Space                      │
│   ┌─────────────────────────────────┐     │
│   │          QEMU Process           │     │
│   │  ┌────────┐  ┌──────────────┐   │     │
│   │  │ Device │  │ VM Config    │   │     │
│   │  │ Emulat.│  │ & Management │   │     │
│   │  └────────┘  └──────────────┘   │     │
│   └──────────────┬──────────────────┘     │
│                  │ /dev/kvm               │
├──────────────────┼────────────────────────┤
│   Kernel Space   │                        │
│   ┌──────────────▼──────────────────┐     │
│   │          KVM Module             │     │
│   │  (CPU & Memory Virtualization)  │     │
│   └─────────────────────────────────┘     │
├───────────────────────────────────────────┤
│      Hardware (VT-x / AMD-V)              │
└───────────────────────────────────────────┘

Hands-On: Running a VM with QEMU/KVM

Let us install the tools and run a minimal VM.

Install on Debian/Ubuntu:

$ sudo apt update
$ sudo apt install -y qemu-system-x86 qemu-utils libvirt-daemon-system \
    libvirt-clients virtinst virt-manager

Install on Fedora/RHEL:

$ sudo dnf install -y qemu-kvm libvirt virt-install virt-manager

Distro Note: On Arch Linux, install qemu-full, libvirt, virt-manager, and dnsmasq. Then enable and start libvirtd.service.

Download a small test image (Alpine Linux, ~60MB):

$ mkdir -p ~/vms && cd ~/vms
$ wget https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/x86_64/alpine-virt-3.19.1-x86_64.iso

Launch a VM directly with QEMU (without libvirt):

# Create a 2GB virtual disk
$ qemu-img create -f qcow2 alpine-test.qcow2 2G

# Boot the VM with KVM acceleration
$ qemu-system-x86_64 \
    -enable-kvm \
    -m 512 \
    -cpu host \
    -smp 2 \
    -drive file=alpine-test.qcow2,format=qcow2 \
    -cdrom alpine-virt-3.19.1-x86_64.iso \
    -boot d \
    -net nic -net user \
    -nographic

Let us break down those flags:

Flag                            Meaning
-enable-kvm                     Use KVM hardware acceleration
-m 512                          512 MB of RAM
-cpu host                       Pass through the host CPU model
-smp 2                          2 virtual CPUs
-drive file=...,format=qcow2    Virtual hard disk
-cdrom ...                      ISO image as virtual CD-ROM
-boot d                         Boot from CD-ROM first
-net nic -net user              User-mode networking (NAT)
-nographic                      Console output in terminal

Press Ctrl+A then X to exit the QEMU console.
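
The flag list maps cleanly onto a data structure. Here is a minimal Python sketch that assembles the same argv programmatically -- qemu_args is a hypothetical helper for illustration, not part of QEMU or libvirt:

```python
# Sketch: assemble the qemu-system-x86_64 argument list from parameters.
# qemu_args is an illustrative helper, not a real QEMU tool.

def qemu_args(disk, iso, mem_mb=512, vcpus=2):
    """Build the same argv as the hands-on command above."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",                   # hardware acceleration via /dev/kvm
        "-m", str(mem_mb),               # RAM in MB
        "-cpu", "host",                  # pass through the host CPU model
        "-smp", str(vcpus),              # virtual CPU count
        "-drive", f"file={disk},format=qcow2",
        "-cdrom", iso,
        "-boot", "d",                    # boot from CD-ROM first
        "-net", "nic", "-net", "user",   # user-mode NAT networking
        "-nographic",                    # serial console in the terminal
    ]

args = qemu_args("alpine-test.qcow2", "alpine-virt-3.19.1-x86_64.iso")
print(" ".join(args))
```

Building argv as a list (rather than one shell string) also sidesteps quoting bugs when disk paths contain spaces.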


The libvirt Ecosystem

Running QEMU directly with command-line flags works, but it does not scale. Managing dozens of VMs with raw QEMU commands would be a nightmare. This is where libvirt comes in.

libvirt is a toolkit that provides a unified API for managing virtualization platforms. It supports KVM/QEMU, Xen, LXC, VirtualBox, and more. Think of it as an abstraction layer.

┌─────────────────────────────────────────────┐
│             Management Tools                │
│  ┌──────────┐ ┌───────────┐ ┌───────────┐   │
│  │  virsh   │ │   virt-   │ │ Cockpit / │   │
│  │  (CLI)   │ │  manager  │ │   oVirt   │   │
│  │          │ │   (GUI)   │ │   (Web)   │   │
│  └────┬─────┘ └─────┬─────┘ └─────┬─────┘   │
│       └─────────────┼─────────────┘         │
│                     ▼                       │
│             ┌───────────────┐               │
│             │   libvirtd    │               │
│             │   (daemon)    │               │
│             └───────┬───────┘               │
│                     │                       │
│        ┌────────────┼────────────┐          │
│        ▼            ▼            ▼          │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐     │
│   │  QEMU/  │  │   Xen   │  │   LXC   │     │
│   │   KVM   │  │         │  │         │     │
│   └─────────┘  └─────────┘  └─────────┘     │
└─────────────────────────────────────────────┘

Key components:

  • libvirtd -- the daemon that manages VMs
  • virsh -- the command-line interface
  • virt-manager -- a graphical desktop application
  • virt-install -- command-line tool for creating new VMs

Hands-On: Managing VMs with virsh

Start the libvirt daemon:

$ sudo systemctl enable --now libvirtd

List all VMs (called "domains" in libvirt terminology):

$ sudo virsh list --all
 Id   Name   State
-----------------------

Create a VM using virt-install:

$ sudo virt-install \
    --name alpine-test \
    --ram 512 \
    --vcpus 2 \
    --disk path=/var/lib/libvirt/images/alpine-test.qcow2,size=2 \
    --cdrom ~/vms/alpine-virt-3.19.1-x86_64.iso \
    --os-variant alpinelinux3.19 \
    --network network=default \
    --graphics none \
    --console pty,target_type=serial

Distro Note: If --os-variant alpinelinux3.19 is not recognized, run osinfo-query os | grep alpine to find valid values, or use --os-variant generic.

Common virsh commands:

# List running VMs
$ sudo virsh list

# Start a VM
$ sudo virsh start alpine-test

# Graceful shutdown
$ sudo virsh shutdown alpine-test

# Force power off (like pulling the plug)
$ sudo virsh destroy alpine-test

# Connect to VM console
$ sudo virsh console alpine-test

# View VM details
$ sudo virsh dominfo alpine-test

# View VM's XML configuration
$ sudo virsh dumpxml alpine-test

# Suspend (pause) a VM
$ sudo virsh suspend alpine-test

# Resume a paused VM
$ sudo virsh resume alpine-test

# Take a snapshot
$ sudo virsh snapshot-create-as alpine-test snap1 "First snapshot"

# List snapshots
$ sudo virsh snapshot-list alpine-test

# Revert to a snapshot
$ sudo virsh snapshot-revert alpine-test snap1

# Delete a VM (keeps disk by default)
$ sudo virsh undefine alpine-test

# Delete a VM and its disk
$ sudo virsh undefine alpine-test --remove-all-storage

Safety Warning: virsh destroy does NOT delete the VM. It force-powers it off (equivalent to pulling the power cord). virsh undefine --remove-all-storage permanently deletes the VM and its disk image.

virt-manager: The Graphical Interface

If you are on a desktop with X11/Wayland, virt-manager provides a full GUI for creating and managing VMs:

$ virt-manager

It connects to libvirtd and shows all your VMs. You can:

  • Create new VMs with a wizard
  • View VM consoles (graphical or serial)
  • Adjust CPU, memory, and disk settings
  • Manage virtual networks and storage pools
  • Take and manage snapshots

For headless servers, virt-manager can connect to remote libvirtd instances over SSH:

$ virt-manager --connect qemu+ssh://user@remote-host/system

Containers vs Virtual Machines

This is one of the most important architectural distinctions in modern infrastructure. Let us compare them clearly.

Virtual Machines:                   Containers:

┌──────────┐ ┌──────────┐           ┌──────────┐ ┌──────────┐
│  App A   │ │  App B   │           │  App A   │ │  App B   │
├──────────┤ ├──────────┤           ├──────────┤ ├──────────┤
│  Libs    │ │  Libs    │           │  Libs    │ │  Libs    │
├──────────┤ ├──────────┤           └────┬─────┘ └────┬─────┘
│ Guest OS │ │ Guest OS │                │            │
│ (kernel) │ │ (kernel) │           ─────┴────────────┴─────
└────┬─────┘ └────┬─────┘           │   Host OS Kernel     │
─────┴────────────┴─────            ├──────────────────────┤
│     Hypervisor       │            │      Hardware        │
├──────────────────────┤            └──────────────────────┘
│      Hardware        │
└──────────────────────┘

Feature             Virtual Machines            Containers
------------------------------------------------------------------------
Isolation           Full hardware-level         Process-level (namespaces, cgroups)
Kernel              Each VM has its own         Shares the host kernel
Boot time           Seconds to minutes          Milliseconds to seconds
Resource overhead   Hundreds of MB per VM       A few MB per container
Disk footprint      GBs per VM image            MBs per container image
Security boundary   Strong (separate kernel)    Weaker (shared kernel attack surface)
OS flexibility      Run any OS (Windows, BSD)   Linux guests on Linux host only
Density             Tens per host               Hundreds to thousands per host
Use case            Multi-OS, strong isolation  Microservices, CI/CD, app packaging

When to Use VMs

  • You need to run different operating systems (Windows on a Linux host)
  • You require strong security isolation (multi-tenant environments)
  • You are running untrusted workloads
  • You need different kernel versions for different applications
  • Your application requires kernel modules not available on the host

When to Use Containers

  • You want fast startup times and high density
  • You are deploying microservices
  • You need consistent development/production environments
  • You want lightweight, portable packaging of applications
  • Your CI/CD pipeline needs to spin up environments rapidly

Think About It: Many production environments use both. VMs provide the security boundary between tenants, and containers run inside those VMs for application packaging. Cloud providers like AWS run your containers inside micro-VMs (Firecracker) for exactly this reason.


Debug This

A colleague complains their VM will not start. Here is the error:

$ sudo virsh start myvm
error: Failed to start domain 'myvm'
error: internal error: Cannot find suitable emulator for x86_64

What is wrong?

The QEMU binary is not installed or not in the expected path. libvirt cannot find qemu-system-x86_64.

Fix:

# Install QEMU
$ sudo apt install qemu-system-x86    # Debian/Ubuntu
$ sudo dnf install qemu-kvm           # Fedora/RHEL

# Verify it is installed
$ which qemu-system-x86_64
/usr/bin/qemu-system-x86_64

# Restart libvirtd
$ sudo systemctl restart libvirtd

# Try again
$ sudo virsh start myvm

Another common issue -- KVM acceleration not available:

$ sudo virsh start myvm
error: internal error: process exited while connecting to monitor:
Could not access KVM kernel module: No such file or directory

Fix: Load the KVM module:

$ sudo modprobe kvm_intel    # Intel CPUs
$ sudo modprobe kvm_amd      # AMD CPUs

# Verify
$ lsmod | grep kvm

If modprobe fails, check your BIOS settings -- hardware virtualization (VT-x or AMD-V) may be disabled.


Virtual Networking

libvirt sets up a default virtual network using a bridge called virbr0. VMs connect to this bridge and can reach the internet through NAT on the host.

# View virtual networks
$ sudo virsh net-list --all
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

# View network details
$ sudo virsh net-info default
Name:           default
UUID:           a1b2c3d4-...
Active:         yes
Persistent:     yes
Autostart:      yes
Bridge:         virbr0

# Check the bridge on the host
$ ip addr show virbr0
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0

VMs on this network get IPs in the 192.168.122.0/24 range via DHCP served by dnsmasq (which libvirt manages automatically).


Virtual Disk Formats

QEMU supports several disk image formats:

Format   Description
--------------------------------------------------------------------------------------------
qcow2    QEMU Copy-On-Write v2. Supports snapshots, compression, encryption. The standard choice.
raw      No overhead, slightly faster I/O, but no snapshots or compression.
vmdk     VMware format. Useful for compatibility.
vdi      VirtualBox format.

Working with qcow2 images:

# Create a 20GB thin-provisioned image (only uses space as data is written)
$ qemu-img create -f qcow2 disk.qcow2 20G

# Check image details
$ qemu-img info disk.qcow2

# Convert between formats
$ qemu-img convert -f raw -O qcow2 disk.raw disk.qcow2

# Resize a disk image
$ qemu-img resize disk.qcow2 +10G
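
The qcow2 header is a simple big-endian structure, which takes some of the mystery out of qemu-img info. Here is a sketch that reads the virtual size straight out of a header, assuming only the documented layout (magic 'QFI\xfb' at offset 0, version at offset 4, virtual disk size as a 64-bit value at offset 24):

```python
# Sketch: parse the virtual size from a qcow2 header (big-endian fields).
import struct

def qcow2_virtual_size(header: bytes) -> int:
    magic, version = struct.unpack_from(">4sI", header, 0)
    if magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    (size,) = struct.unpack_from(">Q", header, 24)   # virtual size in bytes
    return size

# Build a minimal fake header for demonstration (2 GiB virtual size)
fake = bytearray(104)
struct.pack_into(">4sI", fake, 0, b"QFI\xfb", 3)    # magic + version 3
struct.pack_into(">Q", fake, 24, 2 * 1024**3)       # virtual size field
print(qcow2_virtual_size(bytes(fake)) // 1024**3, "GiB")   # prints: 2 GiB
```

On a real image, the same function works on the first 104 bytes of the file -- the "disk size" reported by qemu-img info is exactly this field, independent of how much space the file actually occupies.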

What Just Happened?

┌─────────────────────────────────────────────────────────────┐
│                        CHAPTER RECAP                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Virtualization runs multiple OS instances on one machine.  │
│                                                             │
│  Type 1 hypervisors (KVM, Xen) run on bare metal.           │
│  Type 2 hypervisors (VirtualBox, QEMU) run as apps.         │
│                                                             │
│  KVM is a kernel module; QEMU handles device emulation.     │
│  Together they form the standard Linux virtualization       │
│  stack.                                                     │
│                                                             │
│  libvirt (virsh, virt-manager) provides unified VM          │
│  management across hypervisor backends.                     │
│                                                             │
│  Full virtualization = unmodified guest OS.                 │
│  Paravirtualization = modified guest with hypercalls.       │
│  virtio = paravirtual drivers for best I/O performance.     │
│                                                             │
│  VMs provide strong isolation with full kernels.            │
│  Containers are lightweight but share the host kernel.      │
│  Production environments often use both.                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Try This

  1. Basic VM creation: Install QEMU/KVM and libvirt on your system. Download the Alpine Linux ISO and create a VM using virt-install. Boot it, log in, and run uname -a inside the VM. How does the kernel differ from your host?

  2. Snapshot practice: Create a snapshot of your running VM. Make a change inside the VM (create a file, install a package). Revert to the snapshot and verify the change is gone.

  3. Explore virsh: Use virsh dominfo, virsh domstats, and virsh dumpxml to inspect your VM. Find out how much memory is allocated and how many vCPUs it has.

  4. Disk management: Create a second qcow2 disk image and attach it to your running VM using virsh attach-disk. Inside the VM, partition, format, and mount it.

  5. Bonus Challenge: Set up a second VM on the same virtual network. From VM 1, ping VM 2 by IP address. This demonstrates that libvirt's virtual network provides layer-2 connectivity between guests.

Cgroups & Namespaces

Why This Matters

Every time you run a Docker container, Kubernetes pod, or LXC system container, two Linux kernel features do all the heavy lifting behind the scenes: namespaces and cgroups. Namespaces provide isolation -- making a process believe it is alone on the system. Cgroups provide resource control -- ensuring a runaway process cannot eat all your CPU or memory.

Understanding these two primitives is the difference between "I run containers" and "I understand how containers work." When a container breaks, leaks memory, or behaves strangely with networking, knowing namespaces and cgroups lets you diagnose the problem at the kernel level instead of blindly restarting things.

In this chapter, we will not just read about these concepts. We will use unshare to create namespaces by hand, manually set up cgroups to limit CPU and memory, and -- in the final exercise -- build a mini container from scratch using nothing but standard Linux utilities.


Try This Right Now

See how many namespaces your current shell process belongs to:

$ ls -la /proc/$$/ns/
lrwxrwxrwx 1 user user 0 Feb 21 10:00 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 mnt -> 'mnt:[4026531841]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 net -> 'net:[4026531840]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 user -> 'user:[4026531837]'
lrwxrwxrwx 1 user user 0 Feb 21 10:00 uts -> 'uts:[4026531838]'

Each line is a namespace. The numbers in brackets are inode identifiers. Two processes with the same number for a given namespace type share that namespace. That is literally how the kernel tracks isolation.
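
That inode-comparison rule is easy to encode. A small sketch -- parse_ns and same_namespace are illustrative helpers, not standard library functions:

```python
# Sketch: parse '/proc/<pid>/ns' symlink targets and compare two processes.
# Two processes share a namespace iff the inode numbers match.
import re

def parse_ns(link_target: str):
    """'uts:[4026531838]' -> ('uts', 4026531838)"""
    m = re.fullmatch(r"(\w+):\[(\d+)\]", link_target)
    if not m:
        raise ValueError(f"unexpected ns link: {link_target!r}")
    return m.group(1), int(m.group(2))

def same_namespace(a: str, b: str) -> bool:
    return parse_ns(a) == parse_ns(b)

# A container's UTS namespace differs from the host's:
print(same_namespace("uts:[4026531838]", "uts:[4026532388]"))   # False
# Same inode means shared namespace:
print(same_namespace("net:[4026531840]", "net:[4026531840]"))   # True
```

On a live system you would feed this the result of os.readlink(f"/proc/{pid}/ns/uts") for each process you want to compare.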

Now check cgroups:

$ cat /proc/$$/cgroup
0::/user.slice/user-1000.slice/session-1.scope

That tells you which cgroup your shell belongs to (on a cgroups v2 system). Every process on Linux is in a cgroup.


Linux Namespaces: The Isolation Engine

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of that resource.

Linux provides eight types of namespaces:

Namespace   Flag              What It Isolates
------------------------------------------------------------------------
PID         CLONE_NEWPID      Process IDs -- process sees itself as PID 1
Network     CLONE_NEWNET      Network stack -- own interfaces, routing, firewall
Mount       CLONE_NEWNS       Filesystem mount points
UTS         CLONE_NEWUTS      Hostname and domain name
IPC         CLONE_NEWIPC      System V IPC, POSIX message queues
User        CLONE_NEWUSER     User and group IDs (UID/GID mapping)
Cgroup      CLONE_NEWCGROUP   Cgroup root directory
Time        CLONE_NEWTIME     System clocks (since kernel 5.6)

Normal Linux:                        With Namespaces:

All processes see:                   Process in container sees:
  - All PIDs (1-30000+)                - Only its own PIDs (1-50)
  - All network interfaces             - Its own eth0, lo
  - All mount points                   - Its own /proc, /sys, /tmp
  - The real hostname                  - Its own hostname
  - All users                          - Mapped UIDs (root=0 inside)

PID Namespace

The PID namespace gives a process its own view of the process tree. The first process in a new PID namespace becomes PID 1 (the init process for that namespace).

# Create a new PID namespace and run bash in it
$ sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace:
root# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   7236  4016 pts/0    S    10:00   0:00 bash
root         2  0.0  0.0  10072  3344 pts/0    R+   10:00   0:00 ps aux

Notice there are only two processes. Your bash is PID 1. The host's thousands of processes are invisible. But from the host, this bash process has a normal PID (say, 12345).

# Exit the namespace
root# exit

Network Namespace

A network namespace gives a process its own network stack: its own interfaces, routing table, firewall rules, and port space.

# Create a named network namespace
$ sudo ip netns add testns

# List network namespaces
$ sudo ip netns list
testns

# Run a command inside the namespace
$ sudo ip netns exec testns ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Only the loopback interface exists, and it is DOWN. This namespace is completely isolated from the host network. Let us connect it:

# Create a veth pair (virtual ethernet cable)
$ sudo ip link add veth-host type veth peer name veth-ns

# Move one end into the namespace
$ sudo ip link set veth-ns netns testns

# Configure the host end
$ sudo ip addr add 10.0.0.1/24 dev veth-host
$ sudo ip link set veth-host up

# Configure the namespace end
$ sudo ip netns exec testns ip addr add 10.0.0.2/24 dev veth-ns
$ sudo ip netns exec testns ip link set veth-ns up
$ sudo ip netns exec testns ip link set lo up

# Test connectivity
$ sudo ip netns exec testns ping -c 2 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.048 ms

This is exactly how Docker and other container runtimes set up container networking.

┌─────────────────────────┐    ┌────────────────────────┐
│     Host Namespace      │    │   testns Namespace     │
│                         │    │                        │
│   veth-host             │    │          veth-ns       │
│   10.0.0.1/24 ─────────┼────┼──────── 10.0.0.2/24   │
│                         │    │                        │
│   eth0 (real NIC)       │    │   (no real NIC)        │
│   192.168.1.100         │    │                        │
└─────────────────────────┘    └────────────────────────┘
         veth pair = virtual cable

Clean up:

$ sudo ip netns delete testns
$ sudo ip link delete veth-host 2>/dev/null
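
Container runtimes script exactly this sequence. Here is a sketch that generates the same ip(8) commands from parameters -- it assembles the commands rather than executing them, since running them still requires root (veth_setup is an illustrative helper, and the names match the hands-on: testns, veth-host, veth-ns):

```python
# Sketch: generate the veth-pair plumbing commands for a network namespace.

def veth_setup(ns, host_if, ns_if, host_ip, ns_ip, cidr=24):
    return [
        f"ip netns add {ns}",
        f"ip link add {host_if} type veth peer name {ns_if}",
        f"ip link set {ns_if} netns {ns}",            # move one end inside
        f"ip addr add {host_ip}/{cidr} dev {host_if}",
        f"ip link set {host_if} up",
        f"ip netns exec {ns} ip addr add {ns_ip}/{cidr} dev {ns_if}",
        f"ip netns exec {ns} ip link set {ns_if} up",
        f"ip netns exec {ns} ip link set lo up",
    ]

for cmd in veth_setup("testns", "veth-host", "veth-ns", "10.0.0.1", "10.0.0.2"):
    print("sudo", cmd)
```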

UTS Namespace

The UTS namespace isolates the hostname. Containers use this so each one can have its own hostname.

$ sudo unshare --uts bash

root# hostname container-demo
root# hostname
container-demo

root# exit

# Host hostname is unchanged
$ hostname
your-real-hostname

Mount Namespace

The mount namespace gives a process its own view of the filesystem mount points. Mounts made inside the namespace are invisible outside.

$ sudo unshare --mount bash

root# mkdir /tmp/private-mount
root# mount -t tmpfs tmpfs /tmp/private-mount
root# echo "secret" > /tmp/private-mount/file.txt
root# cat /tmp/private-mount/file.txt
secret

root# exit

# From the host, the mount does not exist
$ ls /tmp/private-mount/
# The directory itself exists (mkdir modified the shared filesystem),
# but it is empty: the tmpfs mount was visible only inside the namespace

User Namespace

The user namespace maps UIDs and GIDs. A process can be root (UID 0) inside a user namespace while being an unprivileged user on the host. This is the foundation of rootless containers.

# As a regular user, create a new user namespace
$ unshare --user --map-root-user bash

root# id
uid=0(root) gid=0(root) groups=0(root)

root# whoami
root

# But on the host, you are still your regular user
root# cat /proc/$$/uid_map
         0       1000          1
# UID 0 inside maps to UID 1000 outside

root# exit
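
Each uid_map line has the form "<inside-start> <outside-start> <count>". Here is a sketch of the translation the kernel performs with it (map_uid is an illustrative helper):

```python
# Sketch: apply a /proc/<pid>/uid_map the way the kernel does. An inside
# UID falling in a mapped range translates to outside-start + offset.

def map_uid(uid_map: str, inside_uid: int) -> int:
    for line in uid_map.strip().splitlines():
        inside, outside, count = map(int, line.split())
        if inside <= inside_uid < inside + count:
            return outside + (inside_uid - inside)
    raise LookupError(f"UID {inside_uid} is unmapped in this namespace")

# The mapping from the hands-on: UID 0 inside is UID 1000 on the host
print(map_uid("0 1000 1", 0))   # 1000
```

Rootless container runtimes typically install a wider map (via /etc/subuid), e.g. "0 100000 65536", so that in-container UIDs 0-65535 land on an unprivileged host range.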

Think About It: User namespaces are what make rootless Podman and rootless Docker possible. The container process believes it is running as root (UID 0), but on the host it is actually your regular unprivileged user. If the container is compromised, the attacker only has your user's privileges, not root.


Cgroups: The Resource Control Engine

While namespaces provide isolation (what a process can see), cgroups provide limitation (what a process can use). Cgroups control:

  • CPU -- how much processor time a group of processes gets
  • Memory -- how much RAM (and swap) processes can consume
  • I/O -- disk read/write bandwidth limits
  • PIDs -- maximum number of processes
  • Network (indirectly) -- via traffic shaping

Cgroups v1 vs Cgroups v2

Linux has two versions of cgroups, and the distinction matters.

Feature               Cgroups v1                                  Cgroups v2
------------------------------------------------------------------------------------
Hierarchy             Multiple hierarchies (one per controller)   Single unified hierarchy
Filesystem            /sys/fs/cgroup/<controller>/                /sys/fs/cgroup/ (unified)
Status                Legacy (still supported)                    Current standard
systemd integration   Works but messy                             Native integration

Check which version you are using:

$ stat -fc %T /sys/fs/cgroup/

  • cgroup2fs = cgroups v2 (unified)
  • tmpfs = cgroups v1 (or hybrid)

# On cgroups v2, list controllers
$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

Most modern distributions (Fedora 31+, Ubuntu 21.10+, Debian 12+, Arch) default to cgroups v2.

The Cgroup Filesystem

Cgroups are managed entirely through a virtual filesystem. Creating directories creates cgroups. Writing to files configures limits. It is elegant in its simplicity.

/sys/fs/cgroup/                          (root cgroup)
├── cgroup.controllers                   (available controllers)
├── cgroup.subtree_control               (enabled controllers for children)
├── user.slice/                          (user sessions)
│   └── user-1000.slice/
│       └── session-1.scope/
│           ├── cgroup.procs             (PIDs in this cgroup)
│           ├── memory.current           (current memory usage)
│           └── cpu.stat                 (CPU statistics)
├── system.slice/                        (system services)
│   ├── sshd.service/
│   ├── nginx.service/
│   └── docker.service/
└── init.scope/                          (PID 1)

Hands-On: Creating Cgroups and Setting Limits

Let us manually create a cgroup and limit memory.

Step 1: Create a cgroup (cgroups v2):

# Enable memory and pids controllers for children of root
$ sudo sh -c 'echo "+memory +pids +cpu" > /sys/fs/cgroup/cgroup.subtree_control'

# Create a new cgroup
$ sudo mkdir /sys/fs/cgroup/demo-group

# Verify controllers are available
$ cat /sys/fs/cgroup/demo-group/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

Step 2: Set a memory limit:

# Limit to 50MB of memory
$ sudo sh -c 'echo 52428800 > /sys/fs/cgroup/demo-group/memory.max'

# Verify
$ cat /sys/fs/cgroup/demo-group/memory.max
52428800

Step 3: Add a process to the cgroup:

# Move the current shell into the cgroup ($$ expands to this shell's PID)
$ sudo sh -c "echo $$ > /sys/fs/cgroup/demo-group/cgroup.procs"

# Verify
$ cat /proc/$$/cgroup
0::/demo-group

Step 4: Test the limit:

# Try to allocate more than 50MB
$ python3 -c "
data = []
try:
    while True:
        data.append('A' * 1024 * 1024)  # 1MB at a time
        print(f'Allocated {len(data)} MB')
except MemoryError:
    print(f'Hit memory limit at {len(data)} MB')
"
Allocated 1 MB
Allocated 2 MB
...
Allocated 45 MB
Killed

The kernel's OOM killer terminated the process because it exceeded the cgroup memory limit. That is exactly how Docker's --memory flag works.

Safety Warning: Be careful when moving your current shell into a resource-limited cgroup. If you set the memory limit too low, your shell itself may be killed.

Step 5: Set CPU limits:

# Limit to 20% of one CPU core
# Format: $MAX $PERIOD (in microseconds)
# 20000 out of 100000 = 20%
$ sudo sh -c 'echo "20000 100000" > /sys/fs/cgroup/demo-group/cpu.max'

# Verify
$ cat /sys/fs/cgroup/demo-group/cpu.max
20000 100000

Step 6: Set a PID limit:

# Maximum 10 processes in this cgroup
$ sudo sh -c 'echo 10 > /sys/fs/cgroup/demo-group/pids.max'

Cleanup:

# Move our shell back to the root cgroup first
$ sudo sh -c "echo $$ > /sys/fs/cgroup/cgroup.procs"

# Remove the cgroup (must be empty)
$ sudo rmdir /sys/fs/cgroup/demo-group

Think About It: When Docker runs a container with --memory=256m --cpus=0.5, it is literally creating a cgroup, writing 268435456 to memory.max, and writing 50000 100000 to cpu.max. The container runtime is just an automation layer over these kernel primitives.


systemd and Cgroups

systemd uses cgroups extensively. Every service, user session, and scope gets its own cgroup. This is how systemd tracks all processes belonging to a service (even after forks) and applies resource limits.

# View the cgroup hierarchy as a tree
$ systemd-cgls
Control group /:
-.slice
├─user.slice
│ └─user-1000.slice
│   ├─session-1.scope
│   │ ├─1234 bash
│   │ └─5678 vim
│   └─user@1000.service
│     └─init.scope
│       └─1111 /lib/systemd/systemd --user
├─init.scope
│ └─1 /sbin/init
└─system.slice
  ├─sshd.service
  │ └─800 sshd: /usr/sbin/sshd -D
  ├─nginx.service
  │ ├─900 nginx: master process
  │ ├─901 nginx: worker process
  │ └─902 nginx: worker process
  └─docker.service
    └─1000 /usr/bin/dockerd

Set resource limits for a service via systemd:

# Edit a service's cgroup limits
$ sudo systemctl edit myapp.service

[Service]
MemoryMax=512M
CPUQuota=50%
TasksMax=100

# View current resource usage for a service
$ systemctl status nginx.service

The "CGroup" line shows which cgroup the service belongs to. You can also use:

# Real-time resource usage by cgroup
$ systemd-cgtop
Control Group                          Tasks   %CPU   Memory
/                                        150    5.2     1.8G
/system.slice                             45    3.1   800.0M
/system.slice/docker.service              12    1.5   400.0M
/user.slice                               30    1.2   500.0M

How Containers Use Namespaces + Cgroups

A container is not a special kernel feature. It is the combination of namespaces (for isolation) and cgroups (for resource control), orchestrated by a container runtime.

What a container runtime does:

1. Create namespaces:
   └─ PID namespace    → container gets its own PID 1
   └─ Network namespace → container gets its own eth0
   └─ Mount namespace   → container sees its own filesystem
   └─ UTS namespace     → container gets its own hostname
   └─ IPC namespace     → container gets isolated IPC
   └─ User namespace    → UID mapping (rootless containers)

2. Create cgroup:
   └─ Set memory.max   → --memory flag
   └─ Set cpu.max      → --cpus flag
   └─ Set pids.max     → --pids-limit flag

3. Set up filesystem:
   └─ Mount container image as root filesystem
   └─ Set up overlay filesystem (layers)
   └─ Mount /proc, /sys, /dev

4. Apply security:
   └─ Drop capabilities
   └─ Apply seccomp filters
   └─ Apply SELinux/AppArmor labels

5. Execute the container's entrypoint process
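
Step 1 above is ultimately one system call: unshare(2), with a bitmask of CLONE_NEW* flags. A sketch of that call from Python via ctypes -- the flag values are the kernel's own constants from <linux/sched.h>, but actually invoking unshare() requires root (or an enclosing user namespace), so this sketch only assembles the call:

```python
# Sketch: the namespace-creation step as a raw unshare(2) call.
import ctypes, os

CLONE_NEWNS  = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000   # hostname
CLONE_NEWIPC = 0x08000000   # System V IPC
CLONE_NEWPID = 0x20000000   # process IDs
CLONE_NEWNET = 0x40000000   # network stack

CONTAINER_FLAGS = (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC
                   | CLONE_NEWPID | CLONE_NEWNET)

def unshare(flags: int) -> None:
    """Detach this process into new namespaces (needs root on Linux)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

print(hex(CONTAINER_FLAGS))   # the mask a runtime would pass: 0x6c020000
```

The unshare command used throughout this chapter is a thin wrapper around exactly this call; each --pid/--net/--uts option sets one of these bits.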

Hands-On: Build a Mini Container

Let us build a minimal container using only unshare, chroot, and cgroups. No Docker, no Podman -- just raw Linux primitives.

Step 1: Create a minimal root filesystem:

# Create a directory for our container's filesystem
$ mkdir -p ~/minicontainer/rootfs

# Use debootstrap to create a minimal Debian filesystem
# (On Debian/Ubuntu)
$ sudo apt install -y debootstrap
$ sudo debootstrap --variant=minbase bookworm ~/minicontainer/rootfs

Distro Note: On Fedora/RHEL, you can use dnf --installroot instead:

$ sudo dnf --releasever=39 --installroot=$HOME/minicontainer/rootfs \
    install -y bash coreutils procps-ng iproute

On Arch, use pacstrap from the arch-install-scripts package.

Step 2: Enter the container with namespaces:

$ sudo unshare \
    --pid \
    --fork \
    --mount \
    --uts \
    --ipc \
    --mount-proc \
    chroot ~/minicontainer/rootfs /bin/bash

Step 3: Explore the container:

# You are now inside your mini container!
root# hostname minicontainer
root# hostname
minicontainer

root# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   7236  3904 ?        S    10:00   0:00 /bin/bash
root         8  0.0  0.0  10072  3360 ?        R+   10:00   0:00 ps aux

root# ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

root# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

You are looking at PID 1, inside a Debian filesystem, with your own hostname, isolated from the host -- and you did it without any container runtime.

Step 4: Exit the container:

root# exit

Step 5: Add cgroup limits (from the host):

To limit the resources of our mini container, we can combine unshare with cgroups:

# Create a cgroup for our container
$ sudo mkdir /sys/fs/cgroup/minicontainer
$ sudo sh -c 'echo 104857600 > /sys/fs/cgroup/minicontainer/memory.max'  # 100MB
$ sudo sh -c 'echo "50000 100000" > /sys/fs/cgroup/minicontainer/cpu.max'  # 50% CPU
$ sudo sh -c 'echo 50 > /sys/fs/cgroup/minicontainer/pids.max'  # 50 processes

# Launch the container and add it to the cgroup
$ sudo unshare --pid --fork --mount --uts --ipc --mount-proc \
    sh -c "echo \$\$ > /sys/fs/cgroup/minicontainer/cgroup.procs && \
    exec chroot $HOME/minicontainer/rootfs /bin/bash"

Congratulations -- you just built a container by hand. It has:

  • Isolated PIDs (PID namespace)
  • Isolated filesystem (mount namespace + chroot)
  • Isolated hostname (UTS namespace)
  • Isolated IPC (IPC namespace)
  • Memory and CPU limits (cgroups)

This is, at its core, what Docker and Podman do. They just add layers of convenience: image management, networking, storage drivers, and a nice CLI.


Inspecting Namespaces and Cgroups of Running Containers

When troubleshooting containers, you can inspect their namespaces and cgroups directly.

# Find a container's PID on the host
$ docker inspect --format '{{.State.Pid}}' my-container
12345

# View its namespaces
$ sudo ls -la /proc/12345/ns/
lrwxrwxrwx 1 root root 0 Feb 21 10:00 cgroup -> 'cgroup:[4026532456]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 ipc -> 'ipc:[4026532389]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 mnt -> 'mnt:[4026532387]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 net -> 'net:[4026532391]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 pid -> 'pid:[4026532390]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Feb 21 10:00 uts -> 'uts:[4026532388]'

# View its cgroup
$ cat /proc/12345/cgroup
0::/system.slice/docker-abc123def456.scope

# Check memory limit
$ cat /sys/fs/cgroup/system.slice/docker-abc123def456.scope/memory.max
268435456

# Check current memory usage
$ cat /sys/fs/cgroup/system.slice/docker-abc123def456.scope/memory.current
52428800

# Enter a container's namespace directly (like docker exec)
$ sudo nsenter --target 12345 --mount --uts --ipc --net --pid -- /bin/bash

The nsenter command is the underlying mechanism behind docker exec. It enters the namespaces of an existing process.


Debug This

A developer reports their container keeps getting OOM-killed even though the host has plenty of free memory.

$ docker logs my-app
... application started ...
Killed

$ dmesg | tail
[12345.678] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),
  task=java,pid=3456,uid=0
[12345.678] Memory cgroup out of memory. Kill process 3456 (java)
  total-vm:2097152kB, anon-rss:262144kB, file-rss:0kB

Diagnosis: The key phrase is CONSTRAINT_MEMCG -- this means the OOM kill was triggered by a cgroup memory limit, not system-wide memory pressure. The container has a memory limit set.

Investigation:

# Find the container's cgroup memory limit
$ docker inspect --format '{{.HostConfig.Memory}}' my-app
268435456

# That is 256MB. The Java app likely needs more.

Fix:

# Restart with more memory
$ docker run --memory=1g my-app

# Or for a Java application, also set JVM heap limits
$ docker run --memory=1g -e JAVA_OPTS="-Xmx768m" my-app

The lesson: a cgroup memory limit is a hard wall. The kernel OOM killer will terminate processes that exceed it, even if the host has gigabytes of free RAM.
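
For JVM workloads specifically, it helps to sanity-check that the heap fits under the cgroup wall with headroom for metaspace, thread stacks, and native allocations. A rough sketch -- heap_fits is an illustrative helper, and the 25% headroom figure is a rule of thumb, not a Docker or JVM constant:

```python
# Sketch: does an -Xmx heap leave enough headroom under memory.max?

def heap_fits(memory_max_bytes: int, xmx_bytes: int, headroom: float = 0.25) -> bool:
    """True if the heap leaves `headroom` of the limit for non-heap memory."""
    return xmx_bytes <= memory_max_bytes * (1 - headroom)

MB = 1024**2
print(heap_fits(256 * MB, 256 * MB))   # False: no room for non-heap memory
print(heap_fits(1024 * MB, 768 * MB))  # True: matches the fix above
```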


What Just Happened?

┌─────────────────────────────────────────────────────────────┐
│                        CHAPTER RECAP                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Namespaces provide ISOLATION:                              │
│    PID    → isolated process tree                           │
│    Net    → isolated network stack                          │
│    Mount  → isolated filesystem mounts                      │
│    UTS    → isolated hostname                               │
│    IPC    → isolated inter-process communication            │
│    User   → UID/GID mapping (rootless containers)           │
│                                                             │
│  Cgroups provide RESOURCE CONTROL:                          │
│    memory.max  → RAM limit                                  │
│    cpu.max     → CPU time limit                             │
│    pids.max    → process count limit                        │
│    io.max      → disk I/O limit                             │
│                                                             │
│  Cgroups v2 is the modern unified hierarchy.                │
│  systemd uses cgroups to track and limit services.          │
│                                                             │
│  A container = namespaces + cgroups + filesystem image      │
│                + security policies.                         │
│                                                             │
│  unshare creates namespaces; nsenter joins them.            │
│  The /sys/fs/cgroup filesystem manages cgroups.             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Try This

  1. PID namespace exploration: Use unshare --pid --fork --mount-proc bash to create a PID namespace. Run ps aux inside and outside. Verify that the host cannot see the namespace's PID 1 as PID 1, but can see it under its real host PID.

  2. Network namespace lab: Create two network namespaces and connect them with a veth pair. Assign IP addresses and verify you can ping between them.

  3. Memory cgroup limit: Create a cgroup with a 30MB memory limit. Write a script that allocates memory in a loop. Observe it being killed when it exceeds the limit. Check dmesg for the OOM message.

  4. CPU throttling: Create a cgroup with a 10% CPU limit. Run a CPU-intensive process (like stress --cpu 1) inside it. Use top to verify it stays around 10%.

  5. Bonus Challenge: Extend the mini container exercise. Add a network namespace to your hand-built container. Create a veth pair, assign it an IP, and add NAT on the host so the container can reach the internet. You will have essentially rebuilt what Docker does for networking.

Docker

Why This Matters

It is 2 AM and the pager goes off. The application that works perfectly on every developer's laptop is crashing in production. "It works on my machine" has become a meme because the gap between development and production environments has caused more outages than anyone can count. Different library versions, missing configuration files, subtle OS differences -- the list goes on.

Docker solved this problem by packaging an application together with everything it needs -- libraries, dependencies, configuration, runtime -- into a single portable unit called a container image. If it runs in a Docker container on your laptop, it will run the same way on your production server, your colleague's machine, or a CI/CD pipeline.

Docker did not invent containerization (LXC existed before it, and namespaces/cgroups are kernel features we covered in Chapter 62), but Docker made containers accessible. It gave the world a simple CLI, a standardized image format, and a public registry (Docker Hub) that transformed how software is built, shipped, and run.


Try This Right Now

If Docker is already installed on your system:

$ docker run --rm hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
...

That single command just:

  1. Checked for the hello-world image locally
  2. Downloaded it from Docker Hub (if not found locally)
  3. Created a container from that image
  4. Ran the container (which printed the message)
  5. Removed the container (--rm flag)

If Docker is not installed, the next section walks you through installation.


Docker Architecture

Docker uses a client-server architecture:

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│   docker CLI ──────────► dockerd (daemon) ──────► containerd │
│   (client)                (Docker Engine)          │         │
│       │                        │                   │         │
│       │ REST API               │                   ▼         │
│       │ (unix socket)          │               runc          │
│       │                        │           (OCI runtime)     │
│       │                        │                             │
│       │                   ┌────┴────┐                        │
│       │                   │ Images  │                        │
│       │                   │ Volumes │                        │
│       │                   │Networks │                        │
│       │                   └─────────┘                        │
│       │                                                      │
│       └──────────────► Docker Hub / Registry                 │
│                        (pull/push images)                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Key components:

  • docker CLI -- the command-line tool you interact with
  • dockerd -- the Docker daemon that manages containers, images, volumes, and networks
  • containerd -- the container runtime that manages container lifecycle
  • runc -- the low-level OCI runtime that actually creates containers using Linux namespaces and cgroups
  • Docker Hub -- the default public image registry

The docker CLI communicates with dockerd via a Unix socket at /var/run/docker.sock.
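You can talk to that socket directly. The sketch below builds the raw HTTP request the CLI effectively sends for a version query; build_request and query_docker are hypothetical helper names, and the connection step only works on a machine where dockerd is running and you have permission on the socket.

```python
import socket

DOCKER_SOCK = "/var/run/docker.sock"

def build_request(path):
    """Raw HTTP/1.1 request for the Docker Engine API -- what the CLI
    sends over the Unix socket (minus API-version negotiation)."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: docker\r\n"
            f"Connection: close\r\n\r\n").encode()

def query_docker(path="/version"):
    """Send the request over the Unix socket. Requires a running dockerd
    and access to /var/run/docker.sock (root or the docker group)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(DOCKER_SOCK)
        s.sendall(build_request(path))
        reply = b""
        while chunk := s.recv(4096):
            reply += chunk
    return reply.decode()

# query_docker() returns an HTTP response whose JSON body contains the
# engine version -- the same data `docker version` prints.
```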


Installing Docker

Debian/Ubuntu

# Remove old versions
$ sudo apt remove docker docker-engine docker.io containerd runc 2>/dev/null

# Install prerequisites
$ sudo apt update
$ sudo apt install -y ca-certificates curl gnupg

# Add Docker's official GPG key
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository
$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
$ sudo apt update
$ sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start and enable Docker
$ sudo systemctl enable --now docker

# Add your user to the docker group (log out and back in after)
$ sudo usermod -aG docker $USER

Fedora/RHEL

# Add Docker repo
$ sudo dnf config-manager --add-repo \
    https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker
$ sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start and enable
$ sudo systemctl enable --now docker

# Add your user to docker group
$ sudo usermod -aG docker $USER

Distro Note: On Arch Linux: sudo pacman -S docker and then sudo systemctl enable --now docker.

Safety Warning: Adding a user to the docker group grants them root-equivalent access to the system. The Docker daemon runs as root, and anyone who can talk to it can mount the host filesystem, access any file, or escalate privileges. In multi-user environments, consider rootless Docker or Podman instead.

Verify the installation:

$ docker version
$ docker info
$ docker run --rm hello-world

Images vs Containers

This distinction is fundamental:

  • An image is a read-only template containing the application, libraries, and filesystem. Think of it as a class in object-oriented programming.
  • A container is a running instance of an image. Think of it as an object (instance of a class).

Image (read-only template)          Container (running instance)
┌─────────────────────┐            ┌─────────────────────┐
│  Layer 4: App code  │            │  Writable layer     │ ← changes here
├─────────────────────┤            ├─────────────────────┤
│  Layer 3: pip install│           │  Layer 4: App code  │
├─────────────────────┤            ├─────────────────────┤
│  Layer 2: apt install│           │  Layer 3: pip install│
├─────────────────────┤            ├─────────────────────┤
│  Layer 1: Ubuntu base│           │  Layer 2: apt install│
└─────────────────────┘            ├─────────────────────┤
                                   │  Layer 1: Ubuntu base│
You can create many                └─────────────────────┘
containers from one image.         Has its own writable layer.

Images are built from layers. Each layer represents a filesystem change. Layers are shared between images, saving disk space and download time.
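Layer sharing works because layers are content-addressed: each layer is identified by a digest of its bytes, so two images built on the same base reference a single stored copy. A toy store sketch (my own illustration, not Docker's actual storage driver):

```python
import hashlib

class LayerStore:
    """Toy content-addressed store: layers are keyed by the SHA-256 of
    their bytes, so identical layers are stored exactly once."""
    def __init__(self):
        self.layers = {}            # digest -> content

    def add(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.layers.setdefault(digest, content)   # dedupe on digest
        return digest

store = LayerStore()
base = store.add(b"ubuntu base filesystem")

# Two different images share the same base layer...
image_a = [base, store.add(b"apt install nginx")]
image_b = [base, store.add(b"apt install postgres")]

# ...but the base layer's bytes are stored only once.
```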

# List local images
$ docker images
REPOSITORY    TAG       IMAGE ID       CREATED        SIZE
ubuntu        22.04     a8780b506fa4   2 weeks ago    77.8MB
nginx         latest    a6bd71f48f68   3 weeks ago    187MB
hello-world   latest    9c7a54a9a43c   6 months ago   13.3kB

# List running containers
$ docker ps

# List all containers (including stopped)
$ docker ps -a

Running Containers

Basic docker run

# Run Ubuntu interactively
$ docker run -it ubuntu:22.04 bash
root@a1b2c3d4:/# cat /etc/os-release
root@a1b2c3d4:/# exit

# Run in the background (detached)
$ docker run -d --name my-nginx -p 8080:80 nginx

# Now access http://localhost:8080 in your browser

# View logs
$ docker logs my-nginx

# Follow logs in real-time
$ docker logs -f my-nginx

# Execute a command in a running container
$ docker exec -it my-nginx bash
root@e5f6g7h8:/# nginx -v
root@e5f6g7h8:/# exit

# Stop the container
$ docker stop my-nginx

# Remove the container
$ docker rm my-nginx

Key docker run flags:

Flag                       Meaning
-it                        Interactive + TTY (for shell access)
-d                         Detached (run in background)
--name                     Give the container a name
-p 8080:80                 Map host port 8080 to container port 80
--rm                       Remove container when it stops
-e VAR=value               Set environment variable
-v /host:/container        Bind mount a directory
--memory=256m              Memory limit
--cpus=0.5                 CPU limit (half a core)
--restart=unless-stopped   Restart policy

Think About It: When you run docker run -p 8080:80 nginx, Docker creates a network namespace for the container with its own network stack. Port 80 inside the container's namespace is mapped to port 8080 on the host via iptables rules. The -p flag is networking namespace plumbing made simple.


The Dockerfile

A Dockerfile is a text file containing instructions to build an image. Each instruction creates a layer.

Anatomy of a Dockerfile

# Start from a base image
FROM python:3.12-slim

# Set metadata
LABEL maintainer="you@example.com"
LABEL description="A simple Python web application"

# Set the working directory inside the container
WORKDIR /app

# Copy dependency file first (for cache efficiency)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Create a non-root user
RUN useradd --create-home appuser
USER appuser

# Document which port the app uses
EXPOSE 8000

# Define the startup command
CMD ["python", "app.py"]

Key Dockerfile instructions:

Instruction   Purpose
FROM          Base image to build upon
RUN           Execute a command during build (creates a layer)
COPY          Copy files from host into the image
ADD           Like COPY but can handle URLs and tar extraction
WORKDIR       Set the working directory for subsequent instructions
ENV           Set environment variables
EXPOSE        Document which port the app listens on
CMD           Default command to run when container starts
ENTRYPOINT    Configure the container to run as an executable
USER          Set the user to run subsequent commands as
VOLUME        Create a mount point for persistent data
ARG           Define build-time variables

ENTRYPOINT vs CMD

This catches many people off guard:

  • CMD provides default arguments that can be overridden: docker run myimage /bin/sh replaces the CMD.
  • ENTRYPOINT sets the main executable that always runs. CMD then provides default arguments to it.

# CMD only -- easy to override
CMD ["python", "app.py"]
# docker run myimage              → python app.py
# docker run myimage /bin/bash    → /bin/bash (CMD replaced)

# ENTRYPOINT + CMD -- flexible and robust
ENTRYPOINT ["python"]
CMD ["app.py"]
# docker run myimage              → python app.py
# docker run myimage test.py      → python test.py (CMD replaced, ENTRYPOINT stays)
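The combination rule fits in a few lines of Python. effective_command is a hypothetical helper of mine, not Docker code, but it mirrors how exec-form ENTRYPOINT, CMD, and docker run arguments combine:

```python
def effective_command(entrypoint, cmd, run_args):
    """What actually executes in the container: arguments passed to
    docker run replace CMD; ENTRYPOINT (if set) is always prepended.
    Models exec-form (JSON list) semantics only."""
    args = run_args if run_args else (cmd or [])
    return (entrypoint or []) + args

# CMD only: the whole command is easy to override
assert effective_command(None, ["python", "app.py"], []) == ["python", "app.py"]
assert effective_command(None, ["python", "app.py"], ["/bin/bash"]) == ["/bin/bash"]

# ENTRYPOINT + CMD: arguments change, the executable stays
assert effective_command(["python"], ["app.py"], []) == ["python", "app.py"]
assert effective_command(["python"], ["app.py"], ["test.py"]) == ["python", "test.py"]
```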

Hands-On: Build an Image

Create a simple Python application:

$ mkdir -p ~/docker-demo && cd ~/docker-demo

Create app.py:

from http.server import HTTPServer, SimpleHTTPRequestHandler
import os

class Handler(SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        hostname = os.uname().nodename
        self.wfile.write(f"Hello from container {hostname}\n".encode())

if __name__ == '__main__':
    server = HTTPServer(('0.0.0.0', 8000), Handler)
    print("Server running on port 8000...")
    server.serve_forever()

Create requirements.txt (empty for this example):

# No external dependencies

Create Dockerfile:

FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["python", "app.py"]

Build and run:

# Build the image
$ docker build -t my-python-app .

# Watch the layers being built
Step 1/9 : FROM python:3.12-slim
 ---> a1b2c3d4e5f6
Step 2/9 : WORKDIR /app
 ---> Running in f6e5d4c3b2a1
...
Successfully built 9a8b7c6d5e4f
Successfully tagged my-python-app:latest

# Run it
$ docker run -d --name myapp -p 8000:8000 my-python-app

# Test it
$ curl http://localhost:8000
Hello from container a1b2c3d4e5f6

# View the layers
$ docker history my-python-app
IMAGE          CREATED          CREATED BY                                      SIZE
9a8b7c6d5e4f   30 seconds ago   CMD ["python" "app.py"]                         0B
...

# Clean up
$ docker stop myapp && docker rm myapp

Docker Compose

Docker Compose lets you define and run multi-container applications with a single YAML file.

docker-compose.yml Anatomy

# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://db:5432/myapp
    depends_on:
      - db
    restart: unless-stopped

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secretpassword
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    restart: unless-stopped

volumes:
  pgdata:

Hands-On: Running a Compose Stack

$ mkdir -p ~/compose-demo && cd ~/compose-demo

Create a docker-compose.yml:

services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api

  api:
    image: python:3.12-slim
    working_dir: /app
    command: python -m http.server 5000
    expose:
      - "5000"

# Create content for the web server
$ mkdir -p html
$ echo "<h1>Hello from Docker Compose</h1>" > html/index.html

# Start all services
$ docker compose up -d

# View running services
$ docker compose ps

# View logs from all services
$ docker compose logs

# View logs from one service
$ docker compose logs web

# Stop all services
$ docker compose down

# Stop and remove volumes too
$ docker compose down -v

Common Docker Compose commands:

$ docker compose up -d          # Start in background
$ docker compose down           # Stop and remove containers
$ docker compose ps             # List running services
$ docker compose logs -f        # Follow logs
$ docker compose exec web sh    # Shell into a running service
$ docker compose build          # Rebuild images
$ docker compose pull           # Pull latest images
$ docker compose restart        # Restart all services

Volumes and Bind Mounts

Containers are ephemeral. When a container is removed, any data written inside it is lost. Volumes solve this.

Bind Mount:                         Named Volume:
(host path → container path)        (Docker-managed storage)

Host filesystem                     Docker storage area
/home/user/data/  ──────────►       /var/lib/docker/volumes/
                   mount              mydata/_data/  ──────────►
                                                      mount
Container sees:                     Container sees:
/app/data/                          /app/data/

# Named volume (Docker manages the storage location)
$ docker volume create mydata
$ docker run -d -v mydata:/app/data my-app

# Bind mount (you specify the host path)
$ docker run -d -v /home/user/config:/app/config:ro my-app
#                                                ^^
#                                          read-only mount

# List volumes
$ docker volume ls

# Inspect a volume
$ docker volume inspect mydata

# Remove unused volumes
$ docker volume prune

Safety Warning: Bind mounts give the container access to host files. A container running as root with a bind mount to / has full access to your host filesystem. Always use the :ro (read-only) flag unless write access is truly needed.


Docker Networking

Docker creates several networks by default:

$ docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
a1b2c3d4e5f6   bridge    bridge    local
f6e5d4c3b2a1   host      host      local
9a8b7c6d5e4f   none      null      local

Network   Description
bridge    Default. Containers get their own IP on a private network. Accessed via port mapping.
host      Container shares the host's network stack. No isolation, but no port mapping needed.
none      No networking. Container is completely isolated.

Bridge Network (default):

┌─────────────────────────────────────────────────┐
│  Host                                            │
│                                                  │
│  ┌───────────┐    ┌───────────┐                  │
│  │Container A│    │Container B│                  │
│  │172.17.0.2 │    │172.17.0.3 │                  │
│  └─────┬─────┘    └─────┬─────┘                  │
│        │                │                        │
│  ──────┴────────────────┴───────                 │
│        docker0 bridge (172.17.0.1)               │
│                │                                  │
│        NAT (iptables masquerade)                 │
│                │                                  │
│          eth0 (host NIC)                         │
└────────────────┼────────────────────────────────┘
                 │
            Internet

User-Defined Bridge Networks

The default bridge network does not provide DNS resolution between containers. Create a user-defined network for that:

# Create a custom network
$ docker network create mynet

# Run containers on the custom network
$ docker run -d --name web --network mynet nginx
$ docker run -d --name api --network mynet python:3.12-slim \
    python -m http.server 5000

# Containers can reach each other by name!
$ docker exec web ping -c 2 api
PING api (172.18.0.3): 56 data bytes
64 bytes from 172.18.0.3: seq=0 ttl=64 time=0.089 ms

# Clean up
$ docker network rm mynet

Think About It: Docker's networking is built on the Linux network namespaces and veth pairs we explored in Chapter 62. Each container gets its own network namespace. The docker0 bridge connects them. iptables rules handle port mapping and NAT. The complexity is hidden behind simple flags.


Essential Docker Commands Reference

# Container lifecycle
$ docker run                     # Create and start a container
$ docker start <container>       # Start a stopped container
$ docker stop <container>        # Graceful stop (SIGTERM, then SIGKILL)
$ docker kill <container>        # Immediate stop (SIGKILL)
$ docker rm <container>          # Remove a stopped container
$ docker rm -f <container>       # Force remove (stop + remove)

# Inspection
$ docker ps                      # List running containers
$ docker ps -a                   # List all containers
$ docker logs <container>        # View container logs
$ docker logs -f <container>     # Follow logs
$ docker inspect <container>     # Detailed JSON info
$ docker stats                   # Real-time resource usage
$ docker top <container>         # Running processes in container

# Interaction
$ docker exec -it <container> bash   # Shell into running container
$ docker cp file.txt container:/path  # Copy file to container
$ docker cp container:/path file.txt  # Copy file from container

# Images
$ docker images                  # List local images
$ docker pull <image>            # Download an image
$ docker build -t name .         # Build image from Dockerfile
$ docker rmi <image>             # Remove an image
$ docker image prune             # Remove unused images

# System
$ docker system df               # Disk usage
$ docker system prune            # Remove all unused data
$ docker system prune -a         # Remove everything unused

Image Registries

Docker Hub is the default public registry, but you can use other registries or run your own.

# Pull from Docker Hub (default)
$ docker pull nginx:latest

# Pull from a specific registry
$ docker pull ghcr.io/owner/image:tag
$ docker pull quay.io/organization/image:tag

# Tag an image for a registry
$ docker tag my-app:latest registry.example.com/my-app:v1.0

# Push to a registry (requires login)
$ docker login registry.example.com
$ docker push registry.example.com/my-app:v1.0

Security Best Practices

Running containers securely requires deliberate choices:

1. Do Not Run as Root

# BAD: runs as root by default
FROM python:3.12-slim
COPY app.py .
CMD ["python", "app.py"]

# GOOD: create and use a non-root user
FROM python:3.12-slim
RUN useradd --create-home appuser
WORKDIR /home/appuser
COPY --chown=appuser:appuser app.py .
USER appuser
CMD ["python", "app.py"]

2. Use Minimal Base Images

# Larger attack surface (full toolchain; roughly 1GB on disk)
FROM python:3.12

# Smaller attack surface (~130MB)
FROM python:3.12-slim

# Smallest attack surface (~50MB, musl-based; some packages
# need compiling or behave differently under musl)
FROM python:3.12-alpine

3. Do Not Store Secrets in Images

# BAD: secret baked into the image forever
ENV API_KEY=supersecret123

# GOOD: pass secrets at runtime
# docker run -e API_KEY=supersecret123 my-app

4. Pin Image Versions

# BAD: unpredictable, could change any time
FROM python:latest

# GOOD: specific version
FROM python:3.12.1-slim

# BEST: pin to a digest
FROM python@sha256:abc123def456...

5. Use .dockerignore

Create a .dockerignore file to prevent sensitive files from being copied into the image:

.git
.env
*.secret
node_modules
__pycache__
*.pyc
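A rough way to reason about what those patterns exclude is a matcher like the one below. This is a sketch using Python's fnmatch; real .dockerignore matching follows Go's filepath.Match plus ** support, so edge cases differ, and is_ignored is a name of my choosing.

```python
from fnmatch import fnmatch

IGNORE = [".git", ".env", "*.secret", "node_modules", "__pycache__", "*.pyc"]

def is_ignored(path, patterns=IGNORE):
    """True if any pattern matches the path or one of its leading
    directories -- an approximation of .dockerignore semantics."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:i + 1]) for i in range(len(parts))]
    return any(fnmatch(p, pat) for p in prefixes for pat in patterns)

assert is_ignored(".env")
assert is_ignored("api.secret")
assert is_ignored("node_modules/express/index.js")   # parent dir matches
assert not is_ignored("app.py")
```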

6. Scan Images for Vulnerabilities

# Docker Scout (built-in scanning)
$ docker scout cve my-app:latest

# Trivy (open-source scanner)
$ trivy image my-app:latest

Debug This

A developer's container starts but the application inside is not accessible:

$ docker run -d --name web -p 8080:80 my-web-app
$ curl http://localhost:8080
curl: (56) Recv failure: Connection reset by peer

Diagnosis steps:

# Is the container actually running?
$ docker ps
# Yes, it shows as running

# Check the logs
$ docker logs web
# "Server listening on 0.0.0.0:8000"

# The application binds to port 8000 inside the container,
# but we mapped host:8080 → container:80

The problem: The port mapping forwards host port 8080 to container port 80. But the application inside the container listens on port 8000, so nothing is listening on port 80 and the connection is reset.

Fix:

$ docker rm -f web
$ docker run -d --name web -p 8080:8000 my-web-app
$ curl http://localhost:8080
# Works!

The host port and container port in -p are HOST:CONTAINER. The container port must match what the application actually listens on.
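A small parser makes the order memorable. parse_port_flag is a hypothetical helper covering the two common short forms of -p (not the IP-prefixed variants):

```python
def parse_port_flag(spec):
    """Split a -p value into (host_port, container_port).
    'HOST:CONTAINER' maps explicitly; a bare 'CONTAINER' means Docker
    picks a random high host port (represented here as None)."""
    parts = spec.split(":")
    if len(parts) == 1:
        return (None, int(parts[0]))
    if len(parts) == 2:
        return (int(parts[0]), int(parts[1]))
    raise ValueError(f"unsupported port spec: {spec}")

assert parse_port_flag("8080:8000") == (8080, 8000)   # the working fix
assert parse_port_flag("8080:80") == (8080, 80)       # the broken mapping
assert parse_port_flag("8000") == (None, 8000)
```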


What Just Happened?

┌─────────────────────────────────────────────────────────────┐
│                    CHAPTER RECAP                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Docker packages applications into portable containers.     │
│                                                             │
│  Architecture: CLI → dockerd → containerd → runc            │
│                                                             │
│  Image = read-only template (layers).                       │
│  Container = running instance + writable layer.             │
│                                                             │
│  Dockerfile: FROM, RUN, COPY, CMD, ENTRYPOINT, USER         │
│    → Each instruction creates an image layer.               │
│                                                             │
│  Docker Compose: multi-container apps in one YAML file.     │
│                                                             │
│  Volumes persist data beyond container lifecycle.           │
│  Bind mounts map host paths into containers.                │
│                                                             │
│  Networking: bridge (default, NAT), host, none.             │
│  User-defined networks provide DNS between containers.      │
│                                                             │
│  Security: non-root users, minimal images, no secrets       │
│  in images, pinned versions, .dockerignore, scanning.       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Try This

  1. Build and run: Create a Dockerfile for a simple web application (use Python, Node.js, or any language you like). Build it, run it, and verify it responds to HTTP requests.

  2. Multi-container app: Write a docker-compose.yml that runs a web application with a PostgreSQL database and a Redis cache. Verify the web application can connect to both.

  3. Volume persistence: Run a PostgreSQL container with a named volume. Insert some data. Stop and remove the container. Start a new PostgreSQL container with the same volume. Verify your data survived.

  4. Networking exploration: Create a custom Docker network. Run two containers on it. From one container, ping the other by container name. Then inspect the network with docker network inspect and find the IP addresses.

  5. Image optimization: Take a Dockerfile that uses a full python:3.12 base image. Rewrite it to use python:3.12-slim. Compare the image sizes with docker images. How much space did you save?

  6. Bonus Challenge: Write a multi-stage Dockerfile. Use a build stage with full build tools to compile your application, then copy only the compiled binary into a minimal final image (like alpine or scratch). This is how production Go and Rust applications are containerized.

Podman

Why This Matters

Docker changed the world, but it has a design choice that makes security-conscious sysadmins uneasy: the Docker daemon. Every Docker command talks to dockerd, a long-running daemon that runs as root. If that daemon is compromised, an attacker has root on your host. If the daemon crashes, every container on the system goes down with it.

Podman was created by Red Hat to address these concerns. It is daemonless (no central service to crash or compromise), rootless (containers can run without any root privileges), and CLI-compatible with Docker (you can literally alias docker=podman and most workflows continue to work). Podman also introduces the concept of pods -- groups of containers that share namespaces -- which mirrors Kubernetes pod architecture.

If you are on RHEL, CentOS Stream, Fedora, or any environment that values security, Podman is likely your default container runtime. Understanding Podman means understanding the future direction of Linux containerization.


Try This Right Now

If you are on Fedora or RHEL 8+, Podman is probably already installed:

$ podman --version
podman version 4.9.4

$ podman run --rm docker.io/library/hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
...

Notice that the hello-world image comes from Docker Hub (docker.io/library/). Podman can pull from the same registries as Docker.

Now check whether you are running rootless:

$ podman info --format '{{.Host.Security.Rootless}}'
true

If it says true, you are running containers without any root privileges. That is Podman's default behavior when run as a regular user.


Podman vs Docker: Key Differences

Docker Architecture:                 Podman Architecture:

┌────────┐     ┌──────────┐         ┌────────┐
│docker  │────►│ dockerd  │         │podman  │
│CLI     │     │ (daemon) │         │CLI     │
└────────┘     │ (root)   │         └───┬────┘
               └────┬─────┘             │
                    │                   │ (direct fork/exec)
                    ▼                   ▼
              ┌──────────┐        ┌──────────┐
              │containerd│        │ conmon   │
              └────┬─────┘        └────┬─────┘
                   ▼                   ▼
              ┌──────────┐        ┌──────────┐
              │  runc    │        │  crun    │
              └──────────┘        └──────────┘

Single point of failure ✗         No daemon ✓
Requires root daemon ✗            Rootless by default ✓
Daemon crash kills all ✗          Process-per-container ✓

Feature               Docker                      Podman
Daemon                Requires dockerd (root)     Daemonless
Rootless              Possible but not default    Default mode
CLI compatibility     N/A (it IS Docker)          Nearly identical
Pod support           No native pods              First-class pods
systemd integration   Limited                     podman generate systemd
Image format          OCI + Docker v2             OCI + Docker v2
Compose               Docker Compose              podman-compose or podman compose
Default runtime       runc                        crun (faster, written in C)
Default on            Ubuntu, most cloud          Fedora, RHEL, CentOS

Installing Podman

Fedora/RHEL/CentOS

# Usually pre-installed, but if not:
$ sudo dnf install -y podman

Debian/Ubuntu

$ sudo apt update
$ sudo apt install -y podman

Arch Linux

$ sudo pacman -S podman

Distro Note: On older Ubuntu (20.04), the packaged Podman version may be outdated. Use the Kubic project repository for a newer version. On Ubuntu 22.04+, the version in the default repos is usually adequate.

Verify installation:

$ podman --version
$ podman info

Rootless Setup

Podman runs rootless out of the box, but it needs subordinate UID/GID ranges configured:

# Check that your user has subuid/subgid entries
$ grep $USER /etc/subuid
user:100000:65536

$ grep $USER /etc/subgid
user:100000:65536

If those entries are missing:

$ sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 $USER

These subordinate UIDs allow user namespace mapping -- the mechanism that lets a rootless container have a "root" user (UID 0) that maps to an unprivileged UID on the host.


Using Podman: The Familiar CLI

If you know Docker, you know Podman. The commands are nearly identical.

Running Containers

# Run an interactive container
$ podman run -it docker.io/library/ubuntu:22.04 bash

# Run a container in the background
$ podman run -d --name my-nginx -p 8080:80 docker.io/library/nginx

# Test it
$ curl http://localhost:8080

# View logs
$ podman logs my-nginx
$ podman logs -f my-nginx

# Execute commands in a running container
$ podman exec -it my-nginx bash

# Stop and remove
$ podman stop my-nginx
$ podman rm my-nginx

# List containers
$ podman ps -a

# List images
$ podman images

Registry Configuration

Podman requires fully qualified image names by default (unlike Docker which assumes docker.io/library/). You can configure search registries:

# Pull with full path
$ podman pull docker.io/library/nginx:latest

# Or configure default search registries
$ cat /etc/containers/registries.conf
unqualified-search-registries = ["docker.io", "quay.io"]

With that configuration:

# Now short names work
$ podman pull nginx:latest

Think About It: Podman requiring fully qualified image names is a security feature. When you type docker pull python, how do you know which registry that comes from? Is it Docker Hub? A compromised mirror? Podman forces you to be explicit: docker.io/library/python. This prevents accidental pulls from unexpected sources.
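The expansion Docker applies silently (and Podman makes you spell out) looks roughly like this. qualify is my own sketch of the normalization, not the exact rules, which also handle registry ports, digests, and localhost:

```python
def qualify(name):
    """Expand a short image reference roughly the way Docker does:
    bare names gain docker.io/library/ and :latest; references that
    already carry a registry host are kept as-is."""
    if "/" not in name:
        repo = "docker.io/library/" + name
    elif "." in name.split("/")[0]:       # first component is a registry host
        repo = name
    else:                                 # user/image shorthand on Docker Hub
        repo = "docker.io/" + name
    if ":" not in repo.split("/")[-1]:    # no tag on the final component
        repo += ":latest"
    return repo

assert qualify("nginx") == "docker.io/library/nginx:latest"
assert qualify("nginx:1.25") == "docker.io/library/nginx:1.25"
assert qualify("owner/image") == "docker.io/owner/image:latest"
assert qualify("ghcr.io/owner/image:tag") == "ghcr.io/owner/image:tag"
```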


Rootless Containers in Detail

When you run podman run as a regular user, here is what happens:

Host System:                           Container:
┌──────────────────────────┐          ┌──────────────────────┐
│ Your user (UID 1000)     │          │ root (UID 0)         │
│                          │   maps   │                      │
│ Subordinate UIDs:        │ -------> │ Container users:     │
│   100000 → UID 1 in      │          │   UID 1              │
│   100001 → UID 2 in      │          │   UID 2              │
│   ...                    │          │   ...                │
│   165535 → UID 65536     │          │   UID 65536          │
│                          │          │                      │
│ Your user owns the       │          │ "root" inside has    │
│ container process        │          │ no host root privs   │
└──────────────────────────┘          └──────────────────────┘

# Run a rootless container
$ podman run -d --name rootless-test docker.io/library/nginx

# Check the process on the host
$ ps aux | grep nginx
user     12345  0.0  0.1  ... nginx: master process
user     12346  0.0  0.1  ... nginx: worker process

# The nginx processes run as YOUR user, not root!

# Inside the container, it appears to be root
$ podman exec rootless-test id
uid=0(root) gid=0(root) groups=0(root)

# But the UID mapping shows the truth
$ podman exec rootless-test cat /proc/1/uid_map
         0       1000          1
         1     100000      65536

Rootless containers have some limitations:

  • Cannot bind to ports below 1024 (by default)
  • Cannot use some network features that require root
  • Storage performance may differ slightly

# Binding to low ports as rootless user:
$ podman run -p 80:80 nginx
Error: rootlessport cannot expose privileged port 80

# Fix: use a high port
$ podman run -p 8080:80 nginx

# Or allow low port binding system-wide (requires root once)
$ sudo sysctl net.ipv4.ip_unprivileged_port_start=80

Pods: Podman's Unique Feature

A pod is a group of containers that share the same network namespace, PID namespace, and IPC namespace. This directly mirrors the Kubernetes pod concept.

┌─────────────────────────────────────────┐
│                   Pod                   │
│  ┌──────────┐  ┌──────────┐             │
│  │  Web App │  │  Sidecar │             │
│  │ Container│  │ Container│             │
│  │ port 8000│  │ port 9090│             │
│  └────┬─────┘  └────┬─────┘             │
│       │             │                   │
│  ─────┴─────────────┴─────────          │
│  Shared network namespace               │
│  (containers see each other             │
│   on localhost)                         │
│                                         │
│  Infra container (pause)                │
│  Holds namespaces open                  │
└─────────────────────────────────────────┘
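What "sharing a namespace" means is observable on any Linux host, no Podman required: each process has namespace symlinks under /proc, and processes in the same namespace see the same inode. A quick sketch:

```shell
# "Shared network namespace" is visible in /proc: each process has a
# ns/net symlink, and processes in the same namespace resolve to the
# same inode. Two ordinary host shells share one, exactly like
# containers in a pod:
first=$(sh -c 'readlink /proc/$$/ns/net')
second=$(sh -c 'readlink /proc/$$/ns/net')
echo "$first"
echo "$second"
# Both lines show the same net:[...] inode. Containers in the same pod
# match like this; standalone containers each get their own inode.
```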

Hands-On: Working with Pods

# Create a pod with port mapping
$ podman pod create --name my-pod -p 8080:80

# List pods
$ podman pod list
POD ID        NAME     STATUS   CREATED        INFRA ID      # OF CONTAINERS
a1b2c3d4e5f6  my-pod   Created  5 seconds ago  f6e5d4c3b2a1  1

# Add an nginx container to the pod
$ podman run -d --pod my-pod --name web docker.io/library/nginx

# Add a sidecar container to the same pod
$ podman run -d --pod my-pod --name sidecar docker.io/library/alpine \
    sh -c "while true; do wget -qO- http://localhost:80 && sleep 5; done"

# The sidecar can reach nginx on localhost because they share a network namespace!

# View pod details
$ podman pod inspect my-pod

# View containers in the pod
$ podman ps --pod
CONTAINER ID  IMAGE                           COMMAND               STATUS         PORTS                 POD ID        PODNAME
f6e5d4c3b2a1  localhost/podman-pause:4.9.4-0  /pause                Up 2 minutes   0.0.0.0:8080->80/tcp  a1b2c3d4e5f6  my-pod
9a8b7c6d5e4f  docker.io/library/nginx:latest  nginx -g daemon o...  Up 2 minutes   0.0.0.0:8080->80/tcp  a1b2c3d4e5f6  my-pod
1a2b3c4d5e6f  docker.io/library/alpine:latest sh -c while true...   Up 1 minute    0.0.0.0:8080->80/tcp  a1b2c3d4e5f6  my-pod

# Stop the entire pod
$ podman pod stop my-pod

# Remove the pod and all its containers
$ podman pod rm my-pod

Think About It: The pod model is powerful because related containers (app + log forwarder, app + metrics collector, app + TLS proxy) can communicate over localhost, just as if they were processes on the same machine. This is the design pattern Kubernetes uses for sidecar containers.


Generating systemd Units

One of Podman's killer features for production use is generating systemd service files directly from containers. This lets systemd manage your containers like any other service -- starting them at boot, restarting on failure, and managing dependencies.

# Run a container
$ podman run -d --name my-web -p 8080:80 docker.io/library/nginx

# Generate a systemd unit file
$ podman generate systemd --name my-web --new --files
/home/user/container-my-web.service

# View the generated file
$ cat container-my-web.service
[Unit]
Description=Podman container-my-web.service
Wants=network-online.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
TimeoutStopSec=70
ExecStartPre=/bin/rm -f %t/%n.ctr-id
ExecStart=/usr/bin/podman run \
    --cidfile=%t/%n.ctr-id \
    --cgroups=no-conmon \
    --rm \
    --sdnotify=conmon \
    -d \
    --replace \
    --name my-web \
    -p 8080:80 \
    docker.io/library/nginx
ExecStop=/usr/bin/podman stop --ignore --cidfile=%t/%n.ctr-id
ExecStopPost=/usr/bin/podman rm -f --ignore --cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

Install and enable:

# For rootless containers (user-level systemd)
$ mkdir -p ~/.config/systemd/user/
$ cp container-my-web.service ~/.config/systemd/user/
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-my-web.service
$ systemctl --user status container-my-web.service

# For system-level containers (requires root)
$ sudo cp container-my-web.service /etc/systemd/system/
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now container-my-web.service

For user-level services to run at boot (even before login):

$ sudo loginctl enable-linger $USER

Distro Note: On Podman 4.4 and newer, Quadlet files are the recommended replacement for podman generate systemd. Quadlet uses .container files in ~/.config/containers/systemd/ that are simpler to write and maintain.

Quadlet (Modern Approach)

Create ~/.config/containers/systemd/my-web.container:

[Container]
Image=docker.io/library/nginx
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=default.target

$ systemctl --user daemon-reload
$ systemctl --user start my-web
$ systemctl --user status my-web

Buildah: Building Images Without a Daemon

Buildah is a companion tool for building OCI container images. While Podman can also build images (using podman build), Buildah offers more flexibility and does not require a Dockerfile.

# Install Buildah
$ sudo dnf install -y buildah      # Fedora/RHEL
$ sudo apt install -y buildah      # Debian/Ubuntu

Building with a Dockerfile

# Buildah can build from Dockerfiles
$ buildah build -t my-app -f Dockerfile .

# This is equivalent to
$ podman build -t my-app -f Dockerfile .

Building Without a Dockerfile

Buildah's unique feature is scripted builds:

# Create a container from a base image
$ container=$(buildah from docker.io/library/ubuntu:22.04)

# Run commands inside it
$ buildah run $container -- apt-get update
$ buildah run $container -- apt-get install -y python3
$ buildah run $container -- mkdir /app

# Copy files in
$ buildah copy $container app.py /app/app.py

# Set configuration
$ buildah config --cmd "python3 /app/app.py" $container
$ buildah config --port 8000 $container
$ buildah config --author "Your Name" $container

# Commit as a new image
$ buildah commit $container my-scripted-app

# Clean up the working container
$ buildah rm $container

# The image is now available
$ podman images | grep my-scripted-app

Why use Buildah over a Dockerfile?

  • Shell scripting (loops, conditionals, variables)
  • No daemon required
  • Fine-grained layer control
  • Can mount host directories during build without COPY
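The first bullet is the big one: a Buildah build is just shell, so ordinary loops, conditionals, and variables work. Here is a dry-run sketch -- plan_install is a made-up helper that only prints the buildah commands a package loop would issue:

```shell
#!/bin/sh
# Dry-run sketch of a scripted Buildah build. plan_install is a made-up
# helper: it prints the buildah commands a package loop would issue.
# Pipe the output to sh to execute (assumes buildah is installed and
# the working container was created with: container=$(buildah from ...)).
plan_install() {
    container=$1
    shift
    for pkg in "$@"; do
        printf 'buildah run %s -- apt-get install -y %s\n' "$container" "$pkg"
    done
}

plan_install working-container python3 python3-pip curl
```

A Dockerfile would need one RUN line (or a hand-joined &&-chain) for this; in a script, the package list can come from a file, an argument, or another command.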

Skopeo: Inspecting and Copying Images

Skopeo inspects and copies container images between registries without pulling them to local storage first.

# Install Skopeo
$ sudo dnf install -y skopeo       # Fedora/RHEL
$ sudo apt install -y skopeo       # Debian/Ubuntu

# Inspect a remote image without downloading it
$ skopeo inspect docker://docker.io/library/nginx:latest
{
    "Name": "docker.io/library/nginx",
    "Tag": "latest",
    "Digest": "sha256:...",
    "Created": "2024-01-15T...",
    "Architecture": "amd64",
    "Os": "linux",
    ...
}

# Copy an image between registries (no local pull needed)
$ skopeo copy docker://docker.io/library/nginx:latest \
    docker://registry.example.com/nginx:latest

# Copy an image to a local directory (for air-gapped environments)
$ skopeo copy docker://docker.io/library/nginx:latest \
    dir:/tmp/nginx-image

# Copy an image to a local OCI archive
$ skopeo copy docker://docker.io/library/nginx:latest \
    oci-archive:/tmp/nginx.tar

# List tags for a remote image
$ skopeo list-tags docker://docker.io/library/nginx

Skopeo is invaluable for:

  • Inspecting images before pulling them
  • Copying images between registries
  • Mirroring images for air-gapped environments
  • Checking image digests and signatures
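The mirroring use case is a natural loop. A dry-run sketch -- mirror_cmd is a made-up helper and registry.example.com is a placeholder for your internal registry; pipe the output to sh (with skopeo installed) to actually run it:

```shell
#!/bin/sh
# Dry-run sketch of a mirror job for an air-gapped registry.
# mirror_cmd is a made-up helper and registry.example.com is a
# placeholder; the printed commands are plain skopeo copy invocations.
mirror_cmd() {
    src=$1
    # rewrite docker.io/library/nginx:1.25 -> registry.example.com/library/nginx:1.25
    dst="registry.example.com/${src#docker.io/}"
    printf 'skopeo copy docker://%s docker://%s\n' "$src" "$dst"
}

for img in docker.io/library/nginx:1.25 docker.io/library/alpine:3.19; do
    mirror_cmd "$img"
done
```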

Podman Compose

Podman supports Docker Compose files through two methods:

Method 1: podman-compose (standalone tool)

$ sudo dnf install -y podman-compose   # Fedora
$ pip install podman-compose           # Any distro

# Uses the same docker-compose.yml files
$ podman-compose up -d
$ podman-compose ps
$ podman-compose down

Method 2: podman compose (built-in, uses docker-compose or podman-compose)

# Podman 4.1+ can use the compose subcommand
$ podman compose up -d

Both methods work with standard docker-compose.yml files. Here is an example:

# docker-compose.yml (works with both Docker and Podman)
services:
  web:
    image: docker.io/library/nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro

Migrating from Docker to Podman

If you are coming from Docker, the migration is straightforward.

Step 1: The alias trick

# Add to ~/.bashrc
alias docker=podman

Most Docker commands work identically with Podman. The exceptions are Docker-specific features like docker swarm.

Step 2: Update image references

# Docker (implicit docker.io)
$ docker pull nginx

# Podman (explicit registry recommended)
$ podman pull docker.io/library/nginx

Step 3: Replace Docker socket for tools that need it

Some tools expect /var/run/docker.sock. Podman can emulate this:

# Enable the Podman socket (rootless)
$ systemctl --user enable --now podman.socket

# The socket is at
$ ls $XDG_RUNTIME_DIR/podman/podman.sock

# Set DOCKER_HOST for compatibility
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock

Step 4: Handle compose files

Replace docker compose with podman-compose or podman compose.

Step 5: Replace systemd integration

Instead of Docker's --restart=always, use podman generate systemd or Quadlet files for proper systemd integration.


Debug This

A user tries to run a rootless container but gets a permission error:

$ podman run -d -p 8080:80 docker.io/library/nginx
Error: error creating container storage: ... operation not permitted

Diagnosis:

# Check subuid/subgid configuration
$ grep $USER /etc/subuid
# (empty -- no output)

$ grep $USER /etc/subgid
# (empty -- no output)

The problem: The user does not have subordinate UID/GID ranges configured. Rootless containers need these for user namespace mapping.

Fix:

$ sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 $USER

# Reset Podman's user storage
$ podman system migrate

# Try again
$ podman run -d -p 8080:80 docker.io/library/nginx
# Works!

What Just Happened?

┌─────────────────────────────────────────────────────────┐
│                      CHAPTER RECAP                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Podman is a daemonless, rootless container engine.     │
│                                                         │
│  No daemon = no single point of failure, smaller        │
│  attack surface, no root daemon process.                │
│                                                         │
│  Rootless by default = containers run as your user,     │
│  UID 0 inside maps to unprivileged UID on host.         │
│                                                         │
│  CLI is nearly identical to Docker. alias docker=podman │
│  works for most workflows.                              │
│                                                         │
│  Pods group containers sharing network/IPC namespaces,  │
│  mirroring Kubernetes pod architecture.                 │
│                                                         │
│  podman generate systemd / Quadlet files integrate      │
│  containers with systemd for production use.            │
│                                                         │
│  Buildah builds images without a daemon.                │
│  Skopeo inspects and copies images between registries.  │
│                                                         │
└─────────────────────────────────────────────────────────┘

Try This

  1. Rootless basics: Run an nginx container with Podman as your regular user. Verify with ps aux that the container processes run as your user, not root. Access the web server to confirm it works.

  2. Pod creation: Create a pod with two containers: an nginx web server and an Alpine container that periodically curls http://localhost:80. Verify the sidecar container can reach nginx via localhost.

  3. systemd integration: Generate a systemd unit file for a container with podman generate systemd. Install it as a user-level service. Reboot and verify the container starts automatically.

  4. Buildah scripted build: Use Buildah commands (not a Dockerfile) to create an image that contains your favorite programming language runtime and a simple script. Commit it and run it with Podman.

  5. Skopeo exploration: Use skopeo inspect to examine the python:3.12-slim image on Docker Hub without downloading it. Find out the image size, creation date, and architecture.

  6. Bonus Challenge: Set up a rootless Podman environment on a system where Docker is also installed. Configure the Podman socket and set DOCKER_HOST so that docker-compose (the Docker tool) actually uses Podman as its backend. This demonstrates the socket compatibility layer.

LXC/LXD & System Containers

Why This Matters

When most people hear "containers," they think of Docker -- packaging a single application into a lightweight image. But there is another kind of container that behaves more like a virtual machine: the system container.

Imagine you need to give 50 students each their own Linux environment for a class. Virtual machines would work, but each one needs its own kernel, gobbling up RAM and boot time. Docker containers run single applications, not full OS environments. What you want is something in between: a container that feels like a full Linux system -- with systemd, multiple services, user accounts, SSH access -- but runs as lightweight as a container.

That is exactly what LXC (Linux Containers) and LXD (together with Incus, its community fork) provide. System containers run a full init system, support multiple processes, and feel like virtual machines but start in seconds and share the host kernel.

If you are setting up development environments, CI/CD build farms, multi-tenant hosting, or any scenario where you need many lightweight Linux instances, system containers are the right tool.


Try This Right Now

If you have LXD or Incus installed:

$ lxc launch ubuntu:22.04 my-first-container
Creating my-first-container
Starting my-first-container

$ lxc exec my-first-container -- bash
root@my-first-container:~# systemctl status
● my-first-container
    State: running
     Jobs: 0 queued
   Failed: 0 units
root@my-first-container:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
root@my-first-container:~# exit

You just launched a full Ubuntu system in seconds. It has systemd, can run multiple services, has its own network, and behaves exactly like a lightweight VM.


System Containers vs Application Containers

This is the key distinction that defines when to use LXC/LXD versus Docker/Podman.

Application Container (Docker/Podman):     System Container (LXC/LXD):

┌─────────────────────────┐              ┌─────────────────────────┐
│    Single Application   │              │   Full Init System      │
│    (nginx, python, etc.)│              │   (systemd/openrc)      │
│                         │              │                         │
│    No init system       │              │   Multiple services     │
│    One main process     │              │   (sshd, cron, nginx,   │
│    No SSH daemon        │              │    logging, etc.)       │
│    No cron, no syslog   │              │                         │
│    Ephemeral            │              │   User accounts         │
│    Immutable image      │              │   Package manager       │
│                         │              │   Feels like a VM       │
└─────────────────────────┘              └─────────────────────────┘

Use for: microservices,                  Use for: dev environments,
CI/CD, app packaging                     VPS hosting, testing,
                                         full OS simulation

+────────────────────+────────────────────────────────+───────────────────────────────+
│      Feature       │     Application Container      │       System Container        │
+────────────────────+────────────────────────────────+───────────────────────────────+
│ Init system        │ None (PID 1 is the app)        │ Full (systemd, etc.)          │
│ Processes          │ Single process (ideally)       │ Multiple services             │
│ Image model        │ Layered, immutable             │ Full OS, mutable              │
│ Lifecycle          │ Create, run, destroy           │ Long-lived, like a VM         │
│ Package management │ Baked into image at build time │ apt/dnf inside the container  │
│ SSH access         │ Not typical (use exec)         │ Supported and common          │
│ Boot time          │ Milliseconds                   │ 1-3 seconds                   │
│ Kernel             │ Shares host kernel             │ Shares host kernel            │
│ Density            │ Thousands per host             │ Hundreds per host             │
+────────────────────+────────────────────────────────+───────────────────────────────+
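A quick field test for the distinction, sketched here in a form that runs on any Linux system: look at what PID 1 is.

```shell
# PID 1 tells you which model you are in: an application container's
# PID 1 is the app itself (nginx, python, ...); a system container or
# a full host runs an init system such as systemd.
cat /proc/1/comm
```

On a systemd host this prints systemd; run inside a Docker or Podman nginx container, it would print nginx.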

LXC: The Foundation

LXC (Linux Containers) is the original Linux container technology. It uses the same kernel features as Docker (namespaces, cgroups, chroot) but is designed to run full system environments rather than single applications.

LXC provides:

  • Low-level container runtime
  • Configuration files for each container
  • Template-based container creation
  • Direct mapping to kernel features
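Those kernel features are observable on any Linux system, container or not -- a quick sketch:

```shell
# Every process already belongs to one namespace of each type and to a
# control group; LXC (like Docker and Podman) creates containers by
# placing processes into fresh namespaces and dedicated cgroups.
ls /proc/self/ns/        # the namespace types (pid, net, mnt, uts, ...)
cat /proc/self/cgroup    # the cgroup(s) this process is accounted to
```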

# Install LXC
$ sudo apt install -y lxc          # Debian/Ubuntu
$ sudo dnf install -y lxc          # Fedora/RHEL

LXC works but is relatively low-level. Most users today interact with it through LXD or Incus, which provide a much better user experience.


LXD and Incus: The Modern Manager

LXD was created by Canonical as a user-friendly management layer on top of LXC. In 2023, Canonical moved LXD under their corporate control, and the Linux Containers community forked it into Incus. Incus is the community-maintained successor.

┌──────────────────────────────────────┐
│          Management Layer            │
│  ┌───────────┐    ┌───────────┐      │
│  │    LXD    │    │   Incus   │      │
│  │(Canonical)│    │(Community)│      │
│  └─────┬─────┘    └─────┬─────┘      │
│        └───────┬────────┘            │
│                ▼                     │
│     ┌──────────────────────┐         │
│     │    LXC (runtime)     │         │
│     └──────────┬───────────┘         │
│                ▼                     │
│     ┌──────────────────────┐         │
│     │ Namespaces + Cgroups │         │
│     │    (Linux Kernel)    │         │
│     └──────────────────────┘         │
└──────────────────────────────────────┘

LXD's client is the lxc command; Incus ships an equivalent incus command with identical syntax. This chapter uses lxc commands throughout -- substitute incus if you are on Incus.

Installing LXD

On Ubuntu (snap-based):

$ sudo snap install lxd
$ sudo lxd init

The lxd init wizard configures storage, networking, and defaults:

Would you like to use LXD clustering? (yes/no) [default=no]: no
Do you want to configure a new storage pool? (yes/no) [default=yes]: yes
Name of the new storage pool [default=default]: default
Name of the storage backend (btrfs, dir, lvm, zfs) [default=zfs]: dir
Would you like to connect to a MAAS server? (yes/no) [default=no]: no
Would you like to create a new local network bridge? (yes/no) [default=yes]: yes
What should the new bridge be called? [default=lxdbr0]: lxdbr0
What IPv4 address should be used? (CIDR, "auto" or "none") [default=auto]: auto
What IPv6 address should be used? (CIDR, "auto" or "none") [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: yes
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: no

Distro Note: On Fedora, use Incus instead: sudo dnf install incus and sudo incus admin init. The incus command replaces lxc but works identically. On Arch Linux: sudo pacman -S incus.

Add your user to the lxd group -- note that membership is effectively root-equivalent, since members can use the LXD API to gain root on the host, so only add trusted users:

$ sudo usermod -aG lxd $USER
# Log out and back in

Hands-On: Working with Containers

Launching Containers

# Launch an Ubuntu container
$ lxc launch ubuntu:22.04 web-server

# Launch a Debian container
$ lxc launch images:debian/12 db-server

# Launch a CentOS container
$ lxc launch images:centos/9-Stream build-server

# Launch an Alpine container (very small)
$ lxc launch images:alpine/3.19 tiny

# List all containers
$ lxc list
+──────────────+─────────+──────────────────────+──────────────+────────────+
│     NAME     │  STATE  │        IPV4          │     TYPE     │ SNAPSHOTS  │
+──────────────+─────────+──────────────────────+──────────────+────────────+
│ web-server   │ RUNNING │ 10.10.10.100 (eth0)  │ CONTAINER    │ 0          │
│ db-server    │ RUNNING │ 10.10.10.101 (eth0)  │ CONTAINER    │ 0          │
│ build-server │ RUNNING │ 10.10.10.102 (eth0)  │ CONTAINER    │ 0          │
│ tiny         │ RUNNING │ 10.10.10.103 (eth0)  │ CONTAINER    │ 0          │
+──────────────+─────────+──────────────────────+──────────────+────────────+

Interacting with Containers

# Execute a command
$ lxc exec web-server -- cat /etc/os-release

# Get a shell
$ lxc exec web-server -- bash

# Run as a specific user
$ lxc exec web-server -- su - ubuntu

# Push a file into the container
$ lxc file push local-file.txt web-server/root/file.txt

# Pull a file from the container
$ lxc file pull web-server/var/log/syslog ./syslog-copy

# Edit a file inside the container (opens in $EDITOR)
$ lxc file edit web-server/etc/nginx/nginx.conf

Container Lifecycle

# Stop a container (graceful shutdown)
$ lxc stop web-server

# Start a container
$ lxc start web-server

# Restart a container
$ lxc restart web-server

# Force stop (like power off)
$ lxc stop web-server --force

# Pause (freeze) a container
$ lxc pause web-server

# Delete a container
$ lxc delete web-server

# Delete a running container (force)
$ lxc delete web-server --force

Safety Warning: lxc delete permanently removes the container and all its data. There is no undo. Always use snapshots or backups before deleting containers with important data.

Installing Services Inside System Containers

Because system containers run a full OS, you can use them like regular machines:

$ lxc exec web-server -- bash

# Inside the container -- it is a full system!
root@web-server:~# apt update
root@web-server:~# apt install -y nginx
root@web-server:~# systemctl enable --now nginx
root@web-server:~# curl http://localhost
<!DOCTYPE html>
<html>
<head><title>Welcome to nginx!</title></head>
...
root@web-server:~# exit

Think About It: You just installed nginx inside a container using apt install, and it is running under systemd with systemctl enable. You could not do this in a Docker container. This is what makes system containers feel like VMs -- they run a real init system and manage services normally.


Profiles

Profiles are reusable configuration templates that you can apply to containers. They control resource limits, networking, storage, and more.

# List profiles
$ lxc profile list
+─────────+──────────+
│  NAME   │ USED BY  │
+─────────+──────────+
│ default │    4     │
+─────────+──────────+

# View the default profile
$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default

Creating a Custom Profile

# Create a profile for web servers
$ lxc profile create web-server

$ lxc profile edit web-server
config:
  limits.cpu: "2"
  limits.memory: 1GB
  limits.memory.swap: "false"
  security.nesting: "false"
description: Profile for web server containers
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 10GB
    type: disk

# Launch a container with this profile
$ lxc launch ubuntu:22.04 prod-web --profile web-server

# Apply a profile to an existing container
$ lxc profile add existing-container web-server

# View container configuration
$ lxc config show prod-web

Setting Resource Limits Directly

# Set CPU limit
$ lxc config set web-server limits.cpu 2

# Set memory limit
$ lxc config set web-server limits.memory 512MB

# Set disk quota (requires btrfs or zfs storage)
$ lxc config device set web-server root size=5GB

# View resource usage
$ lxc info web-server

Storage Pools

LXD manages storage through pools. Different backends offer different features:

+─────────+───────────────+────────+────────────+───────────────+
│ Backend │   Snapshots   │ Quotas │ Fast Clone │   Best For    │
+─────────+───────────────+────────+────────────+───────────────+
│ dir     │ Yes (slow)    │ No     │ No         │ Simple setups │
│ btrfs   │ Yes (instant) │ Yes    │ Yes (CoW)  │ Development   │
│ zfs     │ Yes (instant) │ Yes    │ Yes (CoW)  │ Production    │
│ lvm     │ Yes           │ Yes    │ Yes        │ Enterprise    │
+─────────+───────────────+────────+────────────+───────────────+

# List storage pools
$ lxc storage list

# Create a new storage pool
$ lxc storage create fast-pool zfs size=50GB

# View pool details
$ lxc storage show fast-pool

# View pool usage
$ lxc storage info fast-pool

Networking

LXD provides several networking options.

Default Bridge Network

By default, containers connect to a managed bridge (lxdbr0). LXD runs its own DHCP and DNS server on this bridge.

# View network configuration
$ lxc network show lxdbr0
config:
  ipv4.address: 10.10.10.1/24
  ipv4.nat: "true"
  ipv4.dhcp: "true"
  dns.mode: managed
name: lxdbr0
type: bridge

Containers on the bridge can reach each other by name:

$ lxc exec web-server -- ping -c 2 db-server
PING db-server (10.10.10.101): 56 data bytes
64 bytes from 10.10.10.101: seq=0 ttl=64 time=0.050 ms

Port Forwarding

To make a container service accessible from the host network:

# Forward host port 80 to container port 80
$ lxc config device add web-server http proxy \
    listen=tcp:0.0.0.0:80 connect=tcp:127.0.0.1:80

# Remove the port forward
$ lxc config device remove web-server http

Macvlan (Direct Network Access)

For containers that need to appear directly on the physical network:

# Create a macvlan profile
$ lxc profile create direct-net
$ lxc profile device add direct-net eth0 nic \
    nictype=macvlan parent=eth0

# Launch a container with direct network access
$ lxc launch ubuntu:22.04 direct-vm --profile direct-net
# This container gets an IP from your physical network's DHCP

Snapshots

Snapshots capture the complete state of a container at a point in time.

# Create a snapshot
$ lxc snapshot web-server clean-install

# List snapshots
$ lxc info web-server
...
Snapshots:
  clean-install (taken at 2024/02/21 10:00 UTC) (stateless)

# Restore from a snapshot
$ lxc restore web-server clean-install

# Create a new container from a snapshot
$ lxc copy web-server/clean-install web-server-clone

# Delete a snapshot
$ lxc delete web-server/clean-install

# Automatic snapshots (create one daily, keep 7)
$ lxc config set web-server snapshots.schedule "0 2 * * *"
$ lxc config set web-server snapshots.schedule.stopped "false"
$ lxc config set web-server snapshots.expiry 7d

Think About It: With ZFS or Btrfs storage backends, snapshots are nearly instant and consume no extra space initially (copy-on-write). This makes them incredibly useful for testing: snapshot, make changes, restore if something breaks.


Image Management

LXD pulls images from remote image servers.

# List configured remotes
$ lxc remote list
+──────────────+──────────────────────────────+──────────+
│    NAME      │             URL              │ PROTOCOL │
+──────────────+──────────────────────────────+──────────+
│ images       │ https://images.linuxcontai...│ simplestr│
│ ubuntu       │ https://cloud-images.ubunt...│ simplestr│
│ ubuntu-daily │ https://cloud-images.ubunt...│ simplestr│
+──────────────+──────────────────────────────+──────────+

# List available images from the "images" remote
$ lxc image list images: | head -30

# Search for Debian images
$ lxc image list images: debian

# List locally cached images
$ lxc image list

# Create an image from an existing container
$ lxc publish web-server --alias my-web-template

# Launch from your custom image
$ lxc launch my-web-template web-server-2

# Export an image to a file (for transfer)
$ lxc image export my-web-template /tmp/web-template

# Import an image from a file
$ lxc image import /tmp/web-template.tar.gz --alias imported-web

Use Cases for System Containers

1. Development Environments

Give each developer their own full Linux environment:

$ lxc launch ubuntu:22.04 dev-alice --profile dev-workstation
$ lxc launch ubuntu:22.04 dev-bob --profile dev-workstation

# Each developer gets SSH access to their own container

2. CI/CD Build Environments

Clean build environments that spin up in seconds:

# Create a golden image with build tools
$ lxc launch ubuntu:22.04 build-template
$ lxc exec build-template -- bash
root# apt install -y build-essential git cmake python3-pip
root# exit
$ lxc publish build-template --alias ci-base
$ lxc delete build-template

# Each CI job gets a fresh clone
$ lxc launch ci-base build-job-42
# ... run build ...
$ lxc delete build-job-42 --force

3. Multi-Tenant Hosting

Lightweight VPS hosting:

# Create profiles with resource limits
$ lxc profile create tenant-small
# limits.cpu: 1, limits.memory: 512MB, root size: 10GB

$ lxc profile create tenant-medium
# limits.cpu: 2, limits.memory: 2GB, root size: 50GB

# Launch tenant containers
$ lxc launch ubuntu:22.04 tenant-acme --profile tenant-small
$ lxc launch ubuntu:22.04 tenant-bigco --profile tenant-medium

LXD vs Docker: When to Use Which

Need a full Linux system?                    → LXD
Need to package a single application?         → Docker
Need systemd, cron, SSH inside?              → LXD
Need fast, reproducible app deployment?       → Docker
Need persistent, long-lived environments?     → LXD
Need ephemeral, disposable environments?      → Docker
Need to run multiple services per container?  → LXD
Need microservice architecture?               → Docker
Need VM-like behavior without VM overhead?   → LXD
Need CI/CD pipeline integration?              → Both work well

You can even run Docker inside LXD containers (with security.nesting enabled):

$ lxc config set my-container security.nesting true
$ lxc exec my-container -- bash
root# apt install -y docker.io
root# systemctl start docker
root# docker run hello-world

Debug This

A container will not start and shows a permissions error:

$ lxc start my-container
Error: Failed to run: ... apparmor ... DENIED

Diagnosis:

# Check the container log
$ lxc info my-container --show-log

The error is usually related to AppArmor or security profiles blocking the container's operations.

Fix:

# Check if the LXD AppArmor profile is loaded
$ sudo aa-status | grep lxd

# If profiles are missing, reload them
$ sudo systemctl restart snap.lxd.daemon  # snap install
$ sudo systemctl restart lxd              # package install

# As a last resort, set the container to unconfined (not recommended for production)
$ lxc config set my-container raw.lxc "lxc.apparmor.profile=unconfined"

Another common issue -- running out of storage:

$ lxc launch ubuntu:22.04 test
Error: Failed to ... no space left on device

Fix:

# Check storage pool usage
$ lxc storage info default

# If using dir backend, check host disk space
$ df -h

# Clean up unused images: find the fingerprint, then delete
$ lxc image list
$ lxc image delete <fingerprint>

What Just Happened?

┌─────────────────────────────────────────────────────────────┐
│                    CHAPTER RECAP                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  System containers run full Linux systems with init,        │
│  multiple services, and user accounts.                     │
│                                                             │
│  LXC is the low-level runtime; LXD/Incus provide           │
│  the user-friendly management layer.                       │
│                                                             │
│  lxc launch = create + start a container in seconds.        │
│  lxc exec = run commands inside containers.                │
│  lxc snapshot = point-in-time recovery.                    │
│                                                             │
│  Profiles define reusable configuration templates           │
│  for CPU, memory, storage, and networking.                  │
│                                                             │
│  Storage backends (ZFS, Btrfs) enable instant snapshots.    │
│                                                             │
│  Networking: bridge (default), macvlan (direct),            │
│  proxy devices (port forwarding).                          │
│                                                             │
│  Use system containers for: dev environments, CI/CD,        │
│  multi-tenant hosting, VM-like workloads.                  │
│  Use application containers for: microservices,             │
│  app packaging, single-process workloads.                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Try This

  1. Launch and explore: Launch containers from three different distributions (Ubuntu, Debian, Alpine). Log into each and observe the differences -- package manager, init system, default packages.

  2. Profile practice: Create a profile called limited that restricts containers to 1 CPU and 256MB of RAM. Launch a container with this profile. Inside the container, run free -m and nproc to verify the limits are enforced.

  3. Snapshot workflow: Launch a container, install nginx, create a snapshot. Then break nginx (delete its config files). Restore from the snapshot and verify nginx works again.

  4. Custom image: Configure a container with your preferred development tools (git, vim, your favorite language runtime). Publish it as a custom image. Launch three clones from that image and verify they are all identical.

  5. Networking lab: Launch two containers. Verify they can ping each other by name. Add a proxy device to make one container's nginx accessible from the host on port 8080.

  6. Bonus Challenge: Launch a container, enable security.nesting, install Docker inside it, and run a Docker container within the LXD container. You are now running containers inside containers. Verify the nested container can access the network.

Container Orchestration Primer

Why This Matters

You have learned to build and run containers. You can spin up an nginx container, a database, a Python app -- all on a single machine. Things are going well.

Then the business grows. Your application now needs to handle ten times the traffic. You need to run 20 copies of your web server across 5 machines. When one copy crashes at 3 AM, something needs to restart it automatically. When you push a new version, you want to update containers one at a time with zero downtime. The containers need to find each other across machines. Health checks need to run. Logs need to be centralized. Secrets need to be distributed securely.

Doing all of this manually is a full-time job. Container orchestration automates it. An orchestrator takes your desired state ("I want 5 copies of my web server, always running, behind a load balancer") and continuously works to make reality match that desired state.
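
That reconcile loop -- compare actual state to desired state, then act to close the gap -- is the core idea behind every orchestrator. As a purely illustrative sketch (the function names are made up; a real orchestrator would query a container runtime and schedule containers on nodes), it fits in a few lines of shell:

```shell
# Toy reconciliation loop -- the control-loop idea behind every orchestrator.
# count_replicas/start_replica are stand-ins for real container-runtime calls.
DESIRED=5
RUNNING=2                       # pretend two replicas survived a host failure

count_replicas() {              # a real orchestrator would query the runtime
    echo "$RUNNING"
}

start_replica() {               # a real orchestrator would schedule a container
    RUNNING=$((RUNNING + 1))
}

reconcile() {
    while [ "$(count_replicas)" -lt "$DESIRED" ]; do
        start_replica
    done
    echo "reconciled: $(count_replicas)/$DESIRED replicas running"
}

reconcile                       # prints: reconciled: 5/5 replicas running
```

Run it a second time and the loop does nothing but report -- it is naturally idempotent, which is exactly the property Kubernetes relies on when it continuously re-runs reconciliation.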

This chapter introduces orchestration concepts and gives you hands-on experience with k3s, a lightweight Kubernetes distribution that you can run on a single machine to learn the fundamentals.


Try This Right Now

Install k3s (a lightweight Kubernetes distribution) and run your first pod:

# Install k3s (takes about 30 seconds)
$ curl -sfL https://get.k3s.io | sh -

# Wait for it to be ready
$ sudo k3s kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
myhost     Ready    control-plane,master   30s   v1.29.2+k3s1

# Run your first pod
$ sudo k3s kubectl run my-nginx --image=nginx --port=80

# Check it is running
$ sudo k3s kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
my-nginx   1/1     Running   0          10s

You just deployed a container to a Kubernetes cluster. The cluster is managing it -- if the container crashes, Kubernetes will restart it automatically.


Why Orchestration?

Running a single container on a single machine is straightforward. Running containers at scale requires solving many problems simultaneously:

Without Orchestration:              With Orchestration:

"Container crashed at 3 AM"         Automatic restart
"How do I scale to 10 copies?"      kubectl scale --replicas=10
"Zero-downtime deployment?"         Rolling update, built-in
"Which host has capacity?"          Automatic scheduling
"How do containers find each       Service discovery + DNS
 other?"
"One host died, 5 containers       Auto-rescheduled to
 went down"                         healthy hosts
"Distribute secrets securely"       Secret management
"Health checking"                   Liveness + readiness probes

The Core Problems Orchestration Solves

Problem              Solution
───────              ────────
Scaling              Run N copies of a container, add more as needed
Self-healing         Automatically restart failed containers
Service discovery    Containers find each other by name, across hosts
Load balancing       Distribute traffic across container replicas
Rolling updates      Deploy new versions with zero downtime
Scheduling           Place containers on hosts with available resources
Secret management    Securely inject passwords, API keys, certificates
Configuration        Separate config from code, inject at runtime
Storage              Manage persistent storage across hosts
Networking           Overlay networks spanning multiple hosts

Kubernetes Concepts

Kubernetes (often shortened to "k8s") is the industry standard for container orchestration. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF).

The Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Control Plane                             │
│  ┌────────────┐ ┌────────────┐ ┌──────────────────────┐    │
│  │ API Server │ │ Scheduler  │ │ Controller Manager   │    │
│  │ (kube-     │ │ (decides   │ │ (ensures desired     │    │
│  │ apiserver) │ │ where pods │ │  state matches       │    │
│  │            │ │ run)       │ │  actual state)       │    │
│  └─────┬──────┘ └────────────┘ └──────────────────────┘    │
│        │                                                    │
│  ┌─────┴──────┐                                            │
│  │   etcd     │  (distributed key-value store)             │
│  │ (cluster   │  (stores all cluster state)                │
│  │  state)    │                                            │
│  └────────────┘                                            │
├─────────────────────────────────────────────────────────────┤
│                    Worker Nodes                              │
│  ┌──────────────────────┐  ┌──────────────────────┐        │
│  │      Node 1          │  │      Node 2          │        │
│  │ ┌──────┐ ┌──────┐   │  │ ┌──────┐ ┌──────┐   │        │
│  │ │Pod A │ │Pod B │   │  │ │Pod C │ │Pod D │   │        │
│  │ └──────┘ └──────┘   │  │ └──────┘ └──────┘   │        │
│  │                      │  │                      │        │
│  │ kubelet  kube-proxy  │  │ kubelet  kube-proxy  │        │
│  │ (agent)  (networking)│  │ (agent)  (networking)│        │
│  └──────────────────────┘  └──────────────────────┘        │
└─────────────────────────────────────────────────────────────┘

Key Kubernetes Objects

Let us walk through the fundamental building blocks.

Pods

A pod is the smallest deployable unit in Kubernetes. It is one or more containers that share networking and storage. In most cases, a pod runs a single container.

# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-web
  labels:
    app: web
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80

Why pods and not just containers? Because sometimes closely related containers need to share resources. A web server and its log shipper, for example, can run in the same pod, sharing the filesystem and network namespace (just like Podman pods from Chapter 64).
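
That sidecar pattern can be sketched as a manifest. The nginx and busybox images are real public images, but the volume layout and the shipper command are illustrative only -- a sketch of the idea, not a production logging setup:

```yaml
# sidecar-pod.yaml -- two containers, one pod, sharing a volume (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
spec:
  volumes:
    - name: logs
      emptyDir: {}                  # shared scratch space, lives as long as the pod
  containers:
    - name: nginx
      image: nginx:1.25
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
    - name: log-shipper             # reads what nginx writes
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /logs/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /logs
```

Both containers also share a network namespace, so the shipper could equally reach nginx at localhost:80.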

Deployments

A Deployment manages a set of identical pods. You tell it how many replicas you want, and it ensures that many are always running. It also handles rolling updates.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "250m"

This says: "I always want 3 copies of this nginx pod running. Each one gets at least 64MB RAM and 0.1 CPU, and must not exceed 128MB RAM and 0.25 CPU."

Services

A Service provides a stable network endpoint to access a set of pods. Pods are ephemeral -- they can be created, destroyed, and rescheduled anywhere. A Service gives you a stable IP address and DNS name that routes to whatever pods are currently healthy.

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP

                   ┌─────────────────┐
                   │   web-service   │
                   │  10.43.0.100:80 │
                   │  (stable IP)    │
                   └────────┬────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
         ┌────────┐   ┌────────┐   ┌────────┐
         │ Pod 1  │   │ Pod 2  │   │ Pod 3  │
         │ nginx  │   │ nginx  │   │ nginx  │
         └────────┘   └────────┘   └────────┘

Service types:

  • ClusterIP (default) -- accessible only within the cluster
  • NodePort -- exposes on each node's IP at a static port (30000-32767)
  • LoadBalancer -- provisions an external load balancer (in cloud environments)

Namespaces

Kubernetes namespaces provide logical separation within a cluster (not to be confused with Linux kernel namespaces, though they serve a similar conceptual purpose -- grouping and isolation).

# Default namespaces
$ kubectl get namespaces
NAME              STATUS   AGE
default           Active   1d
kube-system       Active   1d
kube-public       Active   1d
kube-node-lease   Active   1d

You can create namespaces for different teams, environments, or applications:

$ kubectl create namespace staging
$ kubectl create namespace production

Think About It: The word "namespace" appears in two different contexts in this book. Linux kernel namespaces (Chapter 62) isolate processes at the OS level. Kubernetes namespaces organize resources within a cluster. They share the concept of logical separation but operate at completely different layers.


Hands-On: k3s Kubernetes

k3s is a lightweight, production-ready Kubernetes distribution. It is perfect for learning, edge computing, IoT, and small-scale production use. It packages the entire Kubernetes control plane into a single ~70MB binary.

Installing k3s

# Install k3s
$ curl -sfL https://get.k3s.io | sh -

# Verify installation
$ sudo k3s kubectl get nodes

To avoid typing sudo k3s kubectl every time:

# Set up kubectl alias and kubeconfig
$ mkdir -p ~/.kube
$ sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
$ sudo chown $USER:$USER ~/.kube/config

# Now you can use kubectl directly
$ kubectl get nodes

Distro Note: k3s works on all major Linux distributions. On RHEL/CentOS, you may need to disable firewalld or open the required ports (6443 for API server, 8472 for flannel VXLAN).

Deploying Your First Application

Step 1: Create a Deployment

$ kubectl create deployment hello-web --image=nginx:1.25 --replicas=3
deployment.apps/hello-web created

Step 2: Watch the pods come up

$ kubectl get pods -w
NAME                         READY   STATUS              RESTARTS   AGE
hello-web-6b7b4f5c9d-abc12  0/1     ContainerCreating   0          2s
hello-web-6b7b4f5c9d-def34  0/1     ContainerCreating   0          2s
hello-web-6b7b4f5c9d-ghi56  0/1     ContainerCreating   0          2s
hello-web-6b7b4f5c9d-abc12  1/1     Running             0          5s
hello-web-6b7b4f5c9d-def34  1/1     Running             0          6s
hello-web-6b7b4f5c9d-ghi56  1/1     Running             0          7s

Press Ctrl+C to stop watching.

Step 3: Expose it as a Service

$ kubectl expose deployment hello-web --type=NodePort --port=80
service/hello-web exposed

$ kubectl get service hello-web
NAME        TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
hello-web   NodePort   10.43.0.200   <none>        80:31234/TCP   5s

The NodePort is 31234 (yours will differ). Access the web server:

$ curl http://localhost:31234
<!DOCTYPE html>
<html>
<head><title>Welcome to nginx!</title></head>
...

Step 4: Test self-healing

# Delete a pod (simulate a crash)
$ kubectl delete pod hello-web-6b7b4f5c9d-abc12
pod "hello-web-6b7b4f5c9d-abc12" deleted

# Watch Kubernetes immediately create a replacement
$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
hello-web-6b7b4f5c9d-def34  1/1     Running   0          5m
hello-web-6b7b4f5c9d-ghi56  1/1     Running   0          5m
hello-web-6b7b4f5c9d-xyz99  1/1     Running   0          3s    ← new pod!

Kubernetes detected that the desired state (3 replicas) did not match the actual state (2 running), and it automatically created a new pod.

Step 5: Scale up and down

# Scale to 5 replicas
$ kubectl scale deployment hello-web --replicas=5
deployment.apps/hello-web scaled

$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
hello-web-6b7b4f5c9d-def34  1/1     Running   0          10m
hello-web-6b7b4f5c9d-ghi56  1/1     Running   0          10m
hello-web-6b7b4f5c9d-xyz99  1/1     Running   0          5m
hello-web-6b7b4f5c9d-aaa11  1/1     Running   0          5s
hello-web-6b7b4f5c9d-bbb22  1/1     Running   0          5s

# Scale down to 2
$ kubectl scale deployment hello-web --replicas=2

$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
hello-web-6b7b4f5c9d-def34  1/1     Running   0          11m
hello-web-6b7b4f5c9d-ghi56  1/1     Running   0          11m

Step 6: Rolling update

# Update to a new image version
$ kubectl set image deployment/hello-web nginx=nginx:1.26

# Watch the rollout
$ kubectl rollout status deployment/hello-web
Waiting for deployment "hello-web" rollout to finish: 1 out of 2 new replicas updated...
Waiting for deployment "hello-web" rollout to finish: 1 out of 2 new replicas updated...
deployment "hello-web" successfully rolled out

# Verify the new version
$ kubectl describe deployment hello-web | grep Image
    Image: nginx:1.26

# If something goes wrong, roll back
$ kubectl rollout undo deployment/hello-web
deployment.apps/hello-web rolled back

Deploying from YAML Files

While imperative commands are great for learning, production Kubernetes is managed declaratively with YAML files.

Create app.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "128Mi"
              cpu: "200m"
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 3
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 80
  type: NodePort

Apply it:

# Create/update resources from the YAML file
$ kubectl apply -f app.yaml
deployment.apps/my-app created
service/my-app-service created

# View all resources
$ kubectl get all

# Delete resources defined in the file
$ kubectl delete -f app.yaml

Essential kubectl Commands

# Cluster info
$ kubectl cluster-info
$ kubectl get nodes
$ kubectl get namespaces

# Viewing resources
$ kubectl get pods                    # List pods
$ kubectl get pods -o wide            # List with more details
$ kubectl get deployments             # List deployments
$ kubectl get services                # List services
$ kubectl get all                     # List everything

# Inspecting resources
$ kubectl describe pod <pod-name>     # Detailed pod info
$ kubectl describe deployment <name>  # Detailed deployment info
$ kubectl logs <pod-name>             # View pod logs
$ kubectl logs -f <pod-name>          # Follow pod logs

# Interacting
$ kubectl exec -it <pod-name> -- bash # Shell into a pod
$ kubectl port-forward <pod> 8080:80  # Forward local port to pod

# Debugging
$ kubectl get events                  # Cluster events
$ kubectl top pods                    # Resource usage
$ kubectl top nodes                   # Node resource usage

Docker Swarm: A Brief Overview

Docker Swarm is Docker's built-in orchestration tool. It is simpler than Kubernetes but less feature-rich. It is worth knowing about because it is easy to set up and may be sufficient for smaller deployments.

# Initialize a swarm
$ docker swarm init

# Deploy a service with 3 replicas
$ docker service create --name web --replicas 3 -p 8080:80 nginx

# List services
$ docker service ls

# View service details
$ docker service ps web

# Scale the service
$ docker service scale web=5

# Update the image (rolling update)
$ docker service update --image nginx:1.26 web

# Remove the service
$ docker service rm web

# Leave the swarm
$ docker swarm leave --force

Kubernetes vs Docker Swarm

Feature            Kubernetes                      Docker Swarm
───────            ──────────                      ────────────
Complexity         Steep learning curve            Easy to set up
Scaling            Excellent, auto-scaling         Good, manual
Ecosystem          Vast (Helm, operators, etc.)    Limited
Self-healing       Advanced health checks          Basic restart
Rolling updates    Configurable strategies         Basic rolling
Community          Massive, industry standard      Declining
Best for           Production at scale             Small deployments

For most new projects, Kubernetes (or a lightweight variant like k3s) is the better investment. Docker Swarm is being maintained but is not seeing significant new development.


When You Need Orchestration (and When You Do Not)

This is a critical decision. Orchestration adds complexity, and complexity has costs.

You Probably Need Orchestration When:

  • You run more than a handful of containers across multiple hosts
  • You need zero-downtime deployments
  • You need automatic failover when containers or hosts die
  • Your application has multiple interconnected services (microservices)
  • You need automatic scaling based on load
  • You need to manage secrets and configuration centrally
  • Multiple teams deploy to the same infrastructure

You Probably Do NOT Need Orchestration When:

  • You run 1-5 containers on a single server
  • Docker Compose handles your needs
  • Your application is a monolith (one container)
  • You do not need high availability
  • You are a small team with simple infrastructure
  • systemd + Podman (or Docker restart policies) is enough

A quick decision guide:

Single server, 1-5 containers:
  → Docker Compose or Podman with systemd

Single server, 5-20 containers:
  → Docker Compose or Podman pods
  → Consider k3s if you want self-healing

Multiple servers, production:
  → Kubernetes (k3s, k0s, or managed K8s)

Large scale, multiple teams:
  → Full Kubernetes with proper infrastructure

Think About It: One of the most common mistakes in modern infrastructure is adopting Kubernetes before you need it. A single server running Docker Compose with proper backups and monitoring can handle a surprising amount of traffic. Start simple, add complexity only when the problems you face genuinely require it. The best orchestrator is the one that solves your actual problems without creating new ones.


Debug This

You deployed an application to Kubernetes, but the pods keep restarting:

$ kubectl get pods
NAME                    READY   STATUS             RESTARTS      AGE
my-app-abc12-def34      0/1     CrashLoopBackOff   5 (30s ago)   3m

Diagnosis:

# Check the pod logs
$ kubectl logs my-app-abc12-def34
Error: Cannot connect to database at db:5432

# Check events
$ kubectl describe pod my-app-abc12-def34
...
Events:
  Type     Reason     Message
  ----     ------     -------
  Normal   Pulled     Container image pulled
  Normal   Created    Container created
  Normal   Started    Container started
  Warning  BackOff    Back-off restarting failed container

The problem: The application expects a database service at db:5432, but there is no database pod or service named db in the cluster.

Fix: Either deploy the database first, or fix the application's database connection string:

# Deploy a database (the official postgres image needs a password to start)
$ kubectl create deployment db --image=postgres:16
$ kubectl set env deployment/db POSTGRES_PASSWORD=example
$ kubectl expose deployment db --port=5432

# Or create the full stack with a YAML file that includes both

The CrashLoopBackOff status means Kubernetes is doing exactly what it should: restarting the failed container, but with increasing delays (back-off) because it keeps failing. Once the dependency is available, the next restart will succeed.


Cleaning Up

When you are done experimenting:

# Delete all resources you created
$ kubectl delete deployment hello-web my-app
$ kubectl delete service hello-web my-app-service

# To completely uninstall k3s
$ /usr/local/bin/k3s-uninstall.sh

Safety Warning: The k3s uninstall script removes all cluster data, including any persistent volumes. Make sure you do not need any data before running it.


What Just Happened?

┌─────────────────────────────────────────────────────────────┐
│                    CHAPTER RECAP                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Container orchestration automates: scaling, self-healing,  │
│  service discovery, rolling updates, and scheduling.       │
│                                                             │
│  Kubernetes is the industry standard orchestrator.          │
│                                                             │
│  Key Kubernetes objects:                                    │
│    Pod         → one or more containers                    │
│    Deployment  → manages replicas of pods                  │
│    Service     → stable network endpoint for pods          │
│    Namespace   → logical grouping of resources             │
│                                                             │
│  k3s is lightweight Kubernetes in a single binary.          │
│  kubectl is the CLI for interacting with the cluster.      │
│                                                             │
│  Desired state: you declare what you want.                 │
│  Kubernetes continuously reconciles actual → desired.      │
│                                                             │
│  Docker Swarm is simpler but less capable.                 │
│                                                             │
│  Not every project needs orchestration. Start with         │
│  Compose/systemd; adopt K8s when complexity demands it.    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Try This

  1. Deploy and scale: Install k3s, deploy an nginx application with 2 replicas, and expose it as a NodePort service. Verify you can access it. Scale to 5 replicas and confirm all pods are running.

  2. Self-healing test: With 3 replicas running, delete one pod. Time how long it takes Kubernetes to create a replacement. Delete two pods simultaneously and observe the recovery.

  3. Rolling update: Deploy nginx:1.25, then update to nginx:1.26 using kubectl set image. Watch the rollout with kubectl rollout status. Then roll back with kubectl rollout undo and verify the original version is restored.

  4. YAML deployment: Write a complete YAML file that defines a Deployment (3 replicas) and a Service (NodePort). Include resource limits and health checks. Apply it with kubectl apply -f and verify everything works.

  5. Debugging practice: Deliberately deploy an image that does not exist (e.g., nginx:nonexistent). Use kubectl describe pod, kubectl get events, and kubectl logs to diagnose why the pod is stuck in ImagePullBackOff.

  6. Bonus Challenge: Deploy a two-tier application: a web frontend and a backend API as separate Deployments. Create a Service for the backend so the frontend can reach it by name. This demonstrates service discovery in Kubernetes.

Infrastructure as Code Concepts

Why This Matters

Picture this: your team manages 50 servers. A junior admin logs into one and tweaks the Nginx config. A senior admin fixes a firewall rule on another -- but only that one. Someone else installs a newer version of Python on a staging box "just to test." Six months later, nobody can explain why staging works but production does not. The servers that were supposed to be identical have quietly drifted apart, and nobody documented a thing.

This is configuration drift, and it has caused countless outages, security breaches, and sleepless nights. Infrastructure as Code (IaC) exists to solve this problem. Instead of logging into servers and making changes by hand, you describe your desired infrastructure in files, store those files in version control, and let tools apply them automatically.

The result? Every change is tracked. Every environment is reproducible. Every server is consistent. And when something breaks, you can see exactly what changed, when, and who approved it.

This chapter introduces the concepts behind IaC -- the philosophy, the approaches, and the major tools. The next chapters put those tools to work.


Try This Right Now

You do not need any IaC tool installed yet to see the core idea. Open a terminal and create a simple shell script that describes how a server should be configured:

$ mkdir -p ~/iac-demo && cat > ~/iac-demo/setup-webserver.sh << 'EOF'
#!/bin/bash
# Desired state: nginx installed, running, and serving a custom page

# Ensure nginx is installed
if ! command -v nginx &>/dev/null; then
    sudo apt-get update && sudo apt-get install -y nginx
fi

# Ensure the index page has our content
echo "<h1>Managed by IaC</h1>" | sudo tee /var/www/html/index.html > /dev/null

# Ensure nginx is running
sudo systemctl enable --now nginx

echo "Server configured successfully."
EOF
$ chmod +x ~/iac-demo/setup-webserver.sh

Now look at what you just did: you wrote a file that describes how a server should look. You can run it on any fresh Ubuntu box and get the same result. You can store it in Git. You can review changes before applying them. That is the essence of Infrastructure as Code -- even if real IaC tools are far more sophisticated than a shell script.


What Is Infrastructure as Code?

Infrastructure as Code is the practice of managing and provisioning computing infrastructure through machine-readable definition files rather than through manual processes.

┌─────────────────────────────────────────────────────────────────┐
│                     THE OLD WAY                                  │
│                                                                  │
│   Admin → SSH into server → Type commands → Hope it works        │
│   Admin → SSH into next server → Try to remember same commands   │
│   Admin → Forget to update the third server                      │
│   Result: Snowflake servers, drift, mystery configs              │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│                     THE IaC WAY                                  │
│                                                                  │
│   Admin → Write config file → Commit to Git → Tool applies it   │
│   Same file → Applied to ALL servers → Identical state           │
│   Change needed? → Edit file → Review → Merge → Auto-apply      │
│   Result: Consistent, versioned, reproducible infrastructure     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The Core Principles

1. Everything is a file. Server configs, network rules, user accounts, installed packages -- all described in text files (YAML, JSON, HCL, or domain-specific formats).

2. Version control is the source of truth. Those files live in Git. The Git history is your change log. If production broke after the last commit, you can diff and see exactly why.

3. Automation replaces manual steps. No one logs into a server to make changes. Tools read your files and make the infrastructure match.

4. Environments are reproducible. Need a new staging environment? Apply the same files. Need to rebuild after a disaster? Apply the same files.
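
Principle 2 is easy to demonstrate with nothing but Git. This throwaway example (the file name and commit messages are purely illustrative) shows the history and diff you would consult after a bad change:

```shell
# Principle 2 in action: a throwaway repo whose history is the change log.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "worker_processes 2;" > nginx.conf
git add nginx.conf
git commit -qm "nginx: start with 2 workers"

echo "worker_processes 4;" > nginx.conf
git commit -qam "nginx: scale to 4 workers"

git log --oneline -- nginx.conf    # every change to this file, newest first
git diff HEAD~1 -- nginx.conf      # exactly what the last change did
```

When production breaks at 2 AM, `git log` and `git diff` answer "what changed?" in seconds -- no archaeology on the server required.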


Declarative vs. Imperative Approaches

This is the most important conceptual distinction in IaC. Understanding it will shape how you think about every tool.

Imperative: "Here Are the Steps"

The imperative approach tells the tool how to reach the desired state, step by step. Think of it as a recipe:

1. Install nginx
2. Copy the config file to /etc/nginx/nginx.conf
3. Start nginx
4. Open port 80 in the firewall

Shell scripts are imperative. They say "do this, then this, then this." The problem? If you run the script twice, it might try to install nginx again (maybe that is fine, maybe it is not). If nginx is already running, trying to start it might produce a warning or error. You have to handle every edge case yourself.
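
Even something as small as "create a directory" shows the problem on the second run:

```shell
# Edge cases pile up fast in imperative scripts.
tmp=$(mktemp -d)

mkdir "$tmp/app"                     # first run: succeeds
if ! mkdir "$tmp/app" 2>/dev/null; then
    echo "second run failed -- a plain mkdir is not idempotent"
fi

mkdir -p "$tmp/app"                  # -p: succeed whether or not it exists
echo "mkdir -p is safe to run any number of times"
```

The `-p` flag turns an imperative step into something closer to a declarative statement: "this directory should exist."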

Declarative: "Here Is What I Want"

The declarative approach tells the tool what the desired state is, and the tool figures out how to get there:

# Declarative: describe the desired state
web_server:
  package: nginx
  state: installed

  config_file:
    path: /etc/nginx/nginx.conf
    source: templates/nginx.conf

  service:
    state: running
    enabled: true

  firewall:
    port: 80
    state: open

Run it once, it installs and configures everything. Run it again -- nothing happens, because the system already matches the desired state. That property has a name: idempotency.

┌───────────────────────────────────────────────────────────────┐
│                   IMPERATIVE vs DECLARATIVE                    │
│                                                                │
│  IMPERATIVE (How)           │  DECLARATIVE (What)              │
│  ─────────────────          │  ──────────────────              │
│  "Install nginx"            │  "nginx should be installed"     │
│  "Start the service"        │  "service should be running"     │
│  "Create user bob"          │  "user bob should exist"         │
│                             │                                  │
│  You manage the steps       │  Tool manages the steps          │
│  You handle edge cases      │  Tool handles edge cases         │
│  Order matters              │  Order usually does not matter   │
│                             │                                  │
│  Examples:                  │  Examples:                       │
│  - Shell scripts            │  - Ansible playbooks             │
│  - Chef recipes             │  - Terraform configs             │
│  - Some Ansible ad-hoc      │  - Puppet manifests              │
│    commands                 │  - Kubernetes manifests          │
│                             │                                  │
└───────────────────────────────────────────────────────────────┘

Most modern IaC tools lean declarative, though some (like Ansible) blend both approaches.

Think About It: You have a script that creates a user account. If the user already exists, the script fails. How would you make this script idempotent? What would a declarative tool do differently?


Idempotency: The Most Important Word in IaC

An operation is idempotent if running it once produces the same result as running it multiple times. This is not just a nice-to-have -- it is fundamental to reliable automation.

Consider these two approaches:

# NOT idempotent -- fails on second run
useradd deploy

# Idempotent -- safe to run repeatedly
id deploy &>/dev/null || useradd deploy

# NOT idempotent -- appends duplicate line every run
echo "export PATH=/opt/app/bin:$PATH" >> /etc/profile

# Idempotent -- only adds if not present
grep -qxF 'export PATH=/opt/app/bin:$PATH' /etc/profile \
  || echo "export PATH=/opt/app/bin:$PATH" >> /etc/profile

IaC tools handle idempotency for you. When you write an Ansible task that says "ensure package nginx is installed," Ansible checks first and only installs if needed. When Terraform says "ensure this server exists," it checks its state file and only creates what is missing.

This is why IaC tools are better than raw scripts for managing infrastructure at scale. Writing truly idempotent shell scripts for complex configurations is surprisingly difficult. IaC tools have already solved those edge cases.


Configuration Management vs. Provisioning

IaC tools generally fall into two categories, and understanding the distinction helps you pick the right one.

Provisioning Tools

Provisioning tools create infrastructure: servers, networks, load balancers, DNS records, storage volumes. They answer the question: "What machines and resources should exist?"

┌─────────────────────────────────────────────────┐
│              PROVISIONING                        │
│                                                  │
│   "Create 3 VMs with 4 CPU, 8GB RAM each"       │
│   "Create a virtual network with subnet 10.0.1"  │
│   "Create a load balancer on port 443"           │
│   "Create a DNS record pointing to the LB"      │
│                                                  │
│   Tools: Terraform, OpenTofu, Pulumi             │
│                                                  │
└─────────────────────────────────────────────────┘

Configuration Management Tools

Configuration management tools configure existing machines: install packages, manage files, set up services, create users. They answer the question: "How should these machines be configured?"

┌─────────────────────────────────────────────────┐
│          CONFIGURATION MANAGEMENT                │
│                                                  │
│   "Install nginx on all web servers"             │
│   "Deploy this config file to /etc/nginx/"       │
│   "Ensure sshd is running and hardened"          │
│   "Create the deploy user with these SSH keys"   │
│                                                  │
│   Tools: Ansible, Puppet, Chef, Salt             │
│                                                  │
└─────────────────────────────────────────────────┘

They Work Together

In a real workflow, you often use both:

┌──────────────┐        ┌───────────────────┐        ┌────────────┐
│  Terraform   │ ────►  │   3 new servers    │ ────►  │  Ansible   │
│  provisions  │        │   are created      │        │  configures│
│  servers     │        │   (bare OS)        │        │  them all  │
└──────────────┘        └───────────────────┘        └────────────┘

Terraform creates the VMs; Ansible installs software and configures them. Some tools blur this line -- Ansible can do light provisioning, and Terraform can run scripts on new machines -- but the distinction remains useful.


Overview of Major IaC Tools

All of the following tools are open source. Each has different strengths, and understanding them will help you choose wisely.

Ansible

  • Type: Configuration management (with some provisioning)
  • Language: YAML (playbooks), Python (modules)
  • Approach: Declarative (mostly), push-based
  • Agent: None -- uses SSH
  • License: GPL v3

Ansible connects to machines over SSH and runs tasks. No agent software needs to be installed on the managed nodes. You write playbooks in YAML that describe the desired state, and Ansible modules do the work. It is the easiest IaC tool to start with and is covered in depth in Chapter 68.

Terraform / OpenTofu

  • Type: Provisioning
  • Language: HCL (HashiCorp Configuration Language)
  • Approach: Declarative
  • Agent: None -- uses provider APIs
  • License: OpenTofu is MPL 2.0 (Terraform changed to BSL in 2023; OpenTofu is the community fork)

Terraform (and its open-source fork OpenTofu) excels at provisioning cloud and on-premises infrastructure. You describe resources in .tf files, and Terraform calculates a plan showing exactly what will be created, modified, or destroyed before applying changes. It maintains a state file that maps your config to real-world resources.
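
As a small taste, a resource block in a .tf file looks like this -- the resource type is real Terraform/AWS-provider syntax, but the values are illustrative placeholders, not examples from this book:

```hcl
# Hypothetical Terraform resource: three identical VMs on AWS.
resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0123456789abcdef0"   # placeholder image ID
  instance_type = "t3.small"

  tags = {
    Role = "webserver"
  }
}
```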

Puppet

  • Type: Configuration management
  • Language: Puppet DSL (Ruby-like)
  • Approach: Declarative, pull-based
  • Agent: Yes -- requires Puppet agent on managed nodes
  • License: Apache 2.0

Puppet was one of the first modern configuration management tools. Managed nodes run the Puppet agent, which periodically pulls the desired configuration from a Puppet server and applies it. Puppet uses its own domain-specific language to describe resources (packages, files, services, users). It has a steep learning curve but scales well for very large environments.
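
A minimal, illustrative manifest gives the flavor of the DSL:

```puppet
# Minimal Puppet manifest: declare the desired state for nginx.
package { 'nginx':
  ensure => installed,
}

service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],   # ordering via dependency, not position
}
```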

Chef

  • Type: Configuration management
  • Language: Ruby (recipes and cookbooks)
  • Approach: Imperative (mostly), pull-based
  • Agent: Yes -- requires Chef client on managed nodes
  • License: Apache 2.0

Chef uses Ruby-based "recipes" grouped into "cookbooks." Nodes run the Chef client, which pulls recipes from a Chef server. Chef appeals to teams comfortable with Ruby and who prefer imperative step-by-step logic. Its community has shrunk compared to Ansible and Terraform, but it remains in use in many enterprises.
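
A minimal, illustrative recipe looks like this (note that this is Chef DSL, so it only runs inside a Chef client, not as plain Ruby):

```ruby
# Minimal Chef recipe: same goal as the Puppet example, in Ruby DSL.
package 'nginx'

service 'nginx' do
  action [:enable, :start]
end
```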

SaltStack (Salt)

  • Type: Configuration management and remote execution
  • Language: YAML (state files), Python (modules)
  • Approach: Declarative (states) and imperative (remote execution)
  • Agent: Optional -- can use agent (minion) or SSH
  • License: Apache 2.0

Salt is fast thanks to its ZeroMQ-based messaging. It supports both agent-based (minion) and agentless (salt-ssh) modes. Salt states are written in YAML and describe the desired configuration. It is particularly strong for large-scale environments and real-time remote execution.
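
A minimal, illustrative state file (SLS) looks like this:

```yaml
# Minimal Salt state: declare nginx installed and running.
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx
```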

Comparison at a Glance

┌────────────┬──────────┬────────────┬────────┬────────────────┐
│ Tool       │ Type     │ Approach   │ Agent? │ Language       │
├────────────┼──────────┼────────────┼────────┼────────────────┤
│ Ansible    │ Config   │ Declarative│ No     │ YAML           │
│ Terraform  │ Provision│ Declarative│ No     │ HCL            │
│ OpenTofu   │ Provision│ Declarative│ No     │ HCL            │
│ Puppet     │ Config   │ Declarative│ Yes    │ Puppet DSL     │
│ Chef       │ Config   │ Imperative │ Yes    │ Ruby           │
│ Salt       │ Config   │ Both       │ Opt.   │ YAML           │
└────────────┴──────────┴────────────┴────────┴────────────────┘

Think About It: Your team has 200 servers. You want to ensure every server has the same SSH hardening config. Would you reach for a provisioning tool or a configuration management tool? Why?


Push vs. Pull Models

IaC tools deliver changes in two ways:

Push Model

The control machine connects to each managed node and pushes changes. The admin initiates the action.

┌──────────────────┐
│  Control Machine │
│  (your laptop)   │
│                  │
│  ansible-playbook│──── SSH ────► Node 1  ✓
│  site.yml        │──── SSH ────► Node 2  ✓
│                  │──── SSH ────► Node 3  ✓
└──────────────────┘

Examples: Ansible, Salt (in SSH mode)

Pros: Simple, no agents to install, you control when changes happen. Cons: Must have network access to all nodes from the control machine.

Pull Model

Each managed node has an agent that periodically contacts a central server and pulls its configuration.

┌──────────────────┐
│  Central Server  │
│  (Puppet Server) │
│                  │◄──── Pull ──── Node 1 (agent)
│  Stores desired  │◄──── Pull ──── Node 2 (agent)
│  configurations  │◄──── Pull ──── Node 3 (agent)
└──────────────────┘
        Every 30 minutes, agents check in

Examples: Puppet, Chef, Salt (in minion mode)

Pros: Nodes self-correct automatically; works even if admin workstation is offline. Cons: Requires agent software on every node; more complex setup.


The GitOps Workflow

GitOps takes IaC to its logical conclusion: Git is the single source of truth for infrastructure, and all changes flow through Git.

┌──────────────────────────────────────────────────────────────┐
│                      GitOps Workflow                          │
│                                                              │
│  1. Developer writes IaC change                              │
│                    │                                         │
│                    ▼                                         │
│  2. Push to feature branch                                   │
│                    │                                         │
│                    ▼                                         │
│  3. Open Pull/Merge Request                                  │
│                    │                                         │
│                    ▼                                         │
│  4. CI runs: lint, validate, plan                            │
│     (e.g., "terraform plan" or "ansible-playbook --check")   │
│                    │                                         │
│                    ▼                                         │
│  5. Team reviews the change AND the plan output              │
│                    │                                         │
│                    ▼                                         │
│  6. Merge to main branch                                     │
│                    │                                         │
│                    ▼                                         │
│  7. CD pipeline applies changes automatically                │
│     (e.g., "terraform apply" or "ansible-playbook")          │
│                    │                                         │
│                    ▼                                         │
│  8. Infrastructure updated, state committed                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Why GitOps Works

  • Audit trail: Every change is a commit with an author, timestamp, and message.
  • Code review: Infrastructure changes get the same review as application code.
  • Rollback: Revert a commit to undo a change.
  • Collaboration: Multiple people can work on infrastructure without stepping on each other.
  • Testing: CI pipelines can validate configs before they reach production.
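
Step 4 of the workflow might look like this as a generic CI job -- the job name and pipeline syntax are illustrative, not tied to any particular CI system:

```yaml
# Hypothetical CI job for the "lint, validate, plan" step.
validate-infra:
  script:
    - terraform fmt -check           # reject unformatted code
    - terraform validate             # catch syntax and reference errors
    - terraform plan -out=plan.bin   # the output reviewers read in step 5
```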

Hands-On: Seeing Drift in Action

You do not need any IaC tool to understand drift. Let us simulate it.

Step 1: Create a "desired state" file:

$ mkdir -p ~/iac-demo/desired-state
$ cat > ~/iac-demo/desired-state/motd.txt << 'EOF'
Welcome to the production server.
Managed by IaC -- do not modify manually.
EOF

Step 2: "Deploy" it (simulate):

$ sudo cp ~/iac-demo/desired-state/motd.txt /etc/motd
$ cat /etc/motd
Welcome to the production server.
Managed by IaC -- do not modify manually.

Step 3: Simulate drift -- someone manually edits the file:

$ echo "Temporary fix by Bob - 2am" | sudo tee -a /etc/motd > /dev/null
$ cat /etc/motd
Welcome to the production server.
Managed by IaC -- do not modify manually.
Temporary fix by Bob - 2am

Step 4: Detect the drift:

$ diff ~/iac-demo/desired-state/motd.txt /etc/motd
2a3
> Temporary fix by Bob - 2am

Step 5: Remediate -- enforce the desired state:

$ sudo cp ~/iac-demo/desired-state/motd.txt /etc/motd
$ diff ~/iac-demo/desired-state/motd.txt /etc/motd

No output means the files match again. This detect-and-remediate cycle is exactly what IaC tools do automatically, at scale, across hundreds of machines.
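
The whole detect-and-remediate cycle fits in a few lines of shell. This toy version uses temporary files instead of /etc/motd, so it is safe to run anywhere without sudo:

```shell
# Toy detect-and-remediate loop using temp files (no sudo needed).
desired=$(mktemp)
live=$(mktemp)

printf 'Welcome to the production server.\n' > "$desired"
cp "$desired" "$live"                            # initial "deploy"
echo "Temporary fix by Bob - 2am" >> "$live"     # drift creeps in

drift=no
if ! diff -q "$desired" "$live" > /dev/null; then
  drift=yes
  cp "$desired" "$live"                          # enforce desired state
fi

in_sync=no
diff -q "$desired" "$live" > /dev/null && in_sync=yes
echo "drift detected: $drift, now in sync: $in_sync"
rm -f "$desired" "$live"
```

This prints drift detected: yes, now in sync: yes. Wrap it in a timer and you have a crude pull-model agent.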


IaC Best Practices

1. Store Everything in Version Control

Every IaC file belongs in Git. No exceptions. This includes:

  • Playbooks, manifests, and modules
  • Variable files (but NOT secrets -- see below)
  • Documentation about your infrastructure decisions

2. Never Store Secrets in Plain Text

Passwords, API keys, and private keys should never appear in your Git repository. Use tools designed for secrets:

  • ansible-vault for Ansible
  • Environment variables injected by CI/CD
  • External secret managers (HashiCorp Vault, etc.)

3. Use Modules and Roles for Reusability

Do not copy-paste configurations. Structure your code into reusable components:

  • Ansible: roles
  • Terraform: modules
  • Puppet: modules

4. Test Before Applying

Always preview changes before applying them:

  • ansible-playbook --check (dry run)
  • terraform plan (shows what will change)
  • puppet agent --noop (no-operation mode)

5. Make Small, Incremental Changes

Large changes are hard to review, hard to test, and hard to roll back. Make small, focused commits.

6. Use Environments (Dev, Staging, Production)

Test changes in dev, validate in staging, then promote to production. Use the same IaC code with different variable files for each environment.

7. Document the "Why," Not the "What"

Your IaC files already describe what the infrastructure looks like. Comments and commit messages should explain why decisions were made.

8. Enforce Code Review for Infrastructure Changes

Infrastructure changes should go through the same pull request process as application code. A second pair of eyes catches mistakes before they reach production.

Distro Note: IaC tools work across distributions, but package names and service names differ. Ansible handles this with ansible_os_family facts. Terraform does not care -- it works at the API level. When writing IaC for mixed environments, always test against each target distribution.


Debug This

Your colleague wrote this shell script to configure servers and claims it is "Infrastructure as Code":

#!/bin/bash
apt-get install nginx
echo "server { listen 80; root /var/www; }" > /etc/nginx/sites-available/default
service nginx restart
useradd deploy
echo "deploy:s3cret" | chpasswd

What problems can you identify? Think about:

  1. Is it idempotent? What happens if you run it twice?
  2. Is it secure? What is wrong with the last two lines?
  3. Is it portable? Will it work on CentOS?
  4. Is it version-controlled? How would you track changes?
  5. What happens if apt-get install fails but the script continues?

Answers:

  1. Not idempotent: useradd deploy fails on second run; the echo overwrites any manual Nginx config changes without checking.
  2. A plaintext password in a script is a security disaster. If this script is in Git, the password is in the history forever.
  3. Not portable: apt-get is Debian/Ubuntu only; the service command is a legacy wrapper -- on systemd systems you would use systemctl.
  4. The script itself can be in Git, but there is no structure for variables, roles, or environment separation.
  5. No error handling: if apt-get fails, the script happily continues to configure a nonexistent Nginx.

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 67 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Infrastructure as Code = managing infrastructure through    │
│  version-controlled definition files, not manual changes.    │
│                                                              │
│  Key concepts:                                               │
│  • Declarative ("what I want") vs Imperative ("how to do")   │
│  • Idempotency: run it 10 times, same result as once         │
│  • Configuration management (Ansible, Puppet, Chef, Salt)    │
│    configures existing machines                              │
│  • Provisioning (Terraform, OpenTofu) creates resources      │
│  • Push model: control machine sends changes                 │
│  • Pull model: agents fetch changes periodically             │
│  • GitOps: Git as the single source of truth                 │
│                                                              │
│  Best practices:                                             │
│  • Version control everything                                │
│  • Never store secrets in plain text                         │
│  • Test before applying (dry run)                            │
│  • Small, incremental, reviewed changes                      │
│  • Use environments (dev → staging → production)             │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Write a Declarative Spec

Write a plain-text file (in YAML, JSON, or any format you like) that describes the desired state of a web server. Include: packages to install, files to create, services to enable, firewall rules to set. Do not worry about making it runnable -- focus on describing what the server should look like.

Exercise 2: Make a Script Idempotent

Take this non-idempotent script and rewrite it to be safe to run multiple times:

#!/bin/bash
mkdir /opt/myapp
useradd appuser
echo "export APP_HOME=/opt/myapp" >> /etc/profile

Hint: check before acting. Does the directory exist? Does the user exist? Is the line already in the file?

Exercise 3: Explore an IaC Tool

Install Ansible on your machine (covered in detail in Chapter 68) and run:

$ ansible localhost -m setup | head -50

This shows the "facts" Ansible gathers about your system -- the information it uses to make decisions. Notice how it detects your OS, package manager, network interfaces, and more.

Bonus Challenge

Set up a Git repository for infrastructure code. Create a directory structure like this:

infra/
├── inventory/
│   ├── dev
│   └── production
├── playbooks/
│   ├── webserver.yml
│   └── database.yml
├── roles/
│   └── common/
│       └── tasks/
│           └── main.yml
└── README.md

Even without writing any real Ansible code yet, building this structure teaches you how IaC projects are organized. You will fill it in during Chapter 68.

Ansible: Agentless Automation

Why This Matters

You just got promoted. Congratulations -- you now manage 30 servers instead of three. Your boss asks you to deploy a security patch to all of them before end of business. You could SSH into each one, run the same commands 30 times, and pray you do not make a typo on server 17. Or you could write five lines of YAML and let Ansible do it in parallel across all 30 machines in under a minute.

Ansible is the most approachable Infrastructure as Code tool in the Linux ecosystem. It requires no agent software on the managed machines -- just SSH access and Python (which is already on virtually every Linux box). You describe tasks in plain YAML, and Ansible handles the rest: connecting, executing, reporting, and ensuring idempotency.

If you manage more than one server, Ansible will change your life. If you manage hundreds, it is indispensable.


Try This Right Now

If you have Ansible installed (if not, we will install it shortly), try this one-liner:

$ ansible localhost -m ping

Expected output:

localhost | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

That just used Ansible's ping module to verify it can connect to and execute on localhost. No SSH was needed for localhost, but the same command works against remote machines.

If Ansible is not installed yet, read on -- installation is two commands away.


Installing Ansible

Ansible runs on the control node -- your workstation or a management server. The managed nodes (the servers you are configuring) need nothing installed beyond SSH and Python.

On Debian/Ubuntu

$ sudo apt update
$ sudo apt install -y ansible

On Fedora

$ sudo dnf install -y ansible

On RHEL/AlmaLinux/Rocky (with EPEL)

$ sudo dnf install -y epel-release
$ sudo dnf install -y ansible-core

On any system with pip

$ python3 -m pip install --user ansible

Verify the installation

$ ansible --version
ansible [core 2.16.3]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  python version = 3.11.6
  jinja version = 3.1.2
  libyaml = True

Distro Note: The ansible package on older distributions may be quite outdated. Using pip install ansible gives you the latest version regardless of distribution. On RHEL-family systems, you may need ansible-core rather than ansible from the base repos.


The Inventory: Telling Ansible What to Manage

Before Ansible can do anything, it needs to know which machines to manage. This is the inventory.

Static Inventory

The simplest inventory is a plain text file listing hostnames or IP addresses:

$ mkdir -p ~/ansible-lab
$ cat > ~/ansible-lab/inventory << 'EOF'
[webservers]
web1.example.com
web2.example.com
192.168.1.50

[databases]
db1.example.com
db2.example.com

[all:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3
EOF

Key concepts:

  • Groups are defined in [brackets]. A host can belong to multiple groups.
  • [all:vars] sets variables for all hosts.
  • ansible_user tells Ansible which SSH user to connect as.
  • Two built-in groups always exist: all (every host) and ungrouped (hosts not in any explicit group).

INI vs. YAML Inventory Format

The same inventory in YAML:

# ~/ansible-lab/inventory.yml
all:
  vars:
    ansible_user: deploy
    ansible_python_interpreter: /usr/bin/python3
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
        192.168.1.50:
    databases:
      hosts:
        db1.example.com:
        db2.example.com:

Testing Your Inventory

$ ansible -i ~/ansible-lab/inventory all --list-hosts
  hosts (5):
    web1.example.com
    web2.example.com
    192.168.1.50
    db1.example.com
    db2.example.com
$ ansible -i ~/ansible-lab/inventory webservers --list-hosts
  hosts (3):
    web1.example.com
    web2.example.com
    192.168.1.50

Dynamic Inventory

For cloud environments where servers come and go, static files become stale. Dynamic inventory scripts or plugins query your cloud provider's API to get the current list of machines:

# Example: using an AWS EC2 dynamic inventory plugin
# ansible-inventory -i aws_ec2.yml --list

Dynamic inventory is configured through YAML plugin files. Cloud-specific plugins exist for AWS, GCP, Azure, DigitalOcean, and many others.
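
For example, a hypothetical plugin file for AWS might look like this (the region and grouping values are illustrative):

```yaml
# aws_ec2.yml -- hypothetical dynamic inventory plugin configuration
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
keyed_groups:
  - key: tags.Role        # e.g. hosts tagged Role=webserver ...
    prefix: role          # ... land in a group named "role_webserver"
```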

Think About It: You have 50 servers across 3 environments (dev, staging, production). How would you organize your inventory files so you can target just production web servers, or all databases across all environments?


Ad-Hoc Commands: Quick One-Off Tasks

Before writing playbooks, let us use Ansible's ad-hoc mode for quick tasks. Ad-hoc commands use the ansible command (not ansible-playbook).

For these examples, we will use localhost so you can follow along without remote servers:

# Ping localhost
$ ansible localhost -m ping
localhost | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

# Gather system facts
$ ansible localhost -m setup -a "filter=ansible_distribution*"
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_distribution": "Ubuntu",
        "ansible_distribution_file_variety": "Debian",
        "ansible_distribution_major_version": "22",
        "ansible_distribution_release": "jammy",
        "ansible_distribution_version": "22.04"
    },
    "changed": false
}

# Run a shell command
$ ansible localhost -m shell -a "uptime"
localhost | CHANGED | rc=0 >>
 14:32:07 up 5 days,  3:21,  2 users,  load average: 0.15, 0.20, 0.18

# Check disk space
$ ansible localhost -m shell -a "df -h /"
localhost | CHANGED | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   12G   36G  25% /

Ad-Hoc with Remote Servers

If you have SSH access to remote machines, the syntax is the same but you specify the inventory:

# Ping all web servers
$ ansible -i inventory webservers -m ping

# Install a package on all databases
$ ansible -i inventory databases -m apt -a "name=postgresql state=present" --become

# Restart a service on all web servers
$ ansible -i inventory webservers -m service -a "name=nginx state=restarted" --become

The -m flag specifies the module, -a passes arguments, and --become runs with sudo.


Playbook Anatomy: The Heart of Ansible

Playbooks are YAML files that define a series of tasks to execute on a group of hosts. Here is the structure:

# ~/ansible-lab/first-playbook.yml
---
- name: Configure web servers         # Play name (describes the goal)
  hosts: webservers                    # Target group from inventory
  become: yes                          # Run tasks with sudo

  vars:                                # Variables for this play
    http_port: 80
    doc_root: /var/www/html

  tasks:                               # List of tasks to execute
    - name: Install nginx              # Task name (describes the action)
      apt:                             # Module to use
        name: nginx                    # Module arguments
        state: present
        update_cache: yes

    - name: Deploy index page
      copy:
        content: "<h1>Hello from {{ inventory_hostname }}</h1>"
        dest: "{{ doc_root }}/index.html"
        owner: www-data
        group: www-data
        mode: '0644'

    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:                            # Special tasks triggered by notify
    - name: Restart nginx
      service:
        name: nginx
        state: restarted

┌──────────────────────────────────────────────────────────────┐
│                    PLAYBOOK STRUCTURE                         │
│                                                              │
│  Playbook                                                    │
│  └── Play 1: "Configure web servers"                         │
│      ├── hosts: webservers                                   │
│      ├── become: yes                                         │
│      ├── vars:                                               │
│      │   └── http_port: 80                                   │
│      ├── tasks:                                              │
│      │   ├── Task 1: Install nginx                           │
│      │   ├── Task 2: Deploy index page                       │
│      │   └── Task 3: Ensure nginx is running                 │
│      └── handlers:                                           │
│          └── Handler 1: Restart nginx                        │
│  └── Play 2: "Configure databases"                           │
│      └── ...                                                 │
│                                                              │
│  A playbook can contain multiple plays.                      │
│  Each play targets a group of hosts.                         │
│  Each play has a list of tasks.                              │
│  Tasks are executed in order, one at a time.                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Handlers

Handlers are special tasks that only run when notified by another task. They are typically used to restart services after a config change:

  tasks:
    - name: Update nginx config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx          # Only triggers if the file changed

  handlers:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted

If the template has not changed, the handler does not fire. This prevents unnecessary restarts.

Variables

Variables make playbooks reusable:

  vars:
    app_user: deploy
    app_dir: /opt/myapp
    packages:
      - git
      - python3
      - python3-pip

  tasks:
    - name: Create application user
      user:
        name: "{{ app_user }}"
        shell: /bin/bash

    - name: Install required packages
      apt:
        name: "{{ packages }}"
        state: present

Variables can come from many sources: playbook vars, inventory, external files, command line (-e), or gathered facts.
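
For instance, variables can live in an external file and still be overridden from the command line (extra vars passed with -e take the highest precedence). The file name and values below are illustrative:

```yaml
# Hypothetical: load variables from an external file.
# Override at runtime with:  ansible-playbook deploy.yml -e "app_user=ci"
- name: Deploy application
  hosts: webservers
  vars_files:
    - vars/app.yml        # defines app_user, app_dir, packages
  tasks:
    - name: Show the effective user
      debug:
        msg: "Deploying as {{ app_user }}"
```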


Hands-On: Your First Real Playbook

Let us write a playbook that works on localhost so you can run it right now.

Step 1: Create the playbook:

$ cat > ~/ansible-lab/local-setup.yml << 'PLAYBOOK'
---
- name: Configure local development environment
  hosts: localhost
  connection: local
  become: yes

  vars:
    dev_packages:
      - git
      - curl
      - wget
      - htop
      - tree
      - jq
    motd_message: "This machine is managed by Ansible. Manual changes will be overwritten."

  tasks:
    - name: Update package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install development packages (Debian)
      apt:
        name: "{{ dev_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    - name: Install development packages (RedHat)
      dnf:
        name: "{{ dev_packages }}"
        state: present
      when: ansible_os_family == "RedHat"

    - name: Set message of the day
      copy:
        content: "{{ motd_message }}\n"
        dest: /etc/motd
        owner: root
        group: root
        mode: '0644'

    - name: Ensure important directories exist
      file:
        path: "{{ item }}"
        state: directory
        mode: '0755'
      loop:
        - /opt/scripts
        - /opt/backups
        - /opt/logs

    - name: Display completion message
      debug:
        msg: "Local environment configured successfully!"
PLAYBOOK

Step 2: Run it in check mode first (dry run):

$ ansible-playbook ~/ansible-lab/local-setup.yml --check

Check mode shows what would change without actually changing anything.

Step 3: Run it for real:

$ ansible-playbook ~/ansible-lab/local-setup.yml

Expected output:

PLAY [Configure local development environment] ********************************

TASK [Gathering Facts] ********************************************************
ok: [localhost]

TASK [Update package cache] ***************************************************
changed: [localhost]

TASK [Install development packages (Debian)] **********************************
changed: [localhost]

TASK [Install development packages (RedHat)] **********************************
skipping: [localhost]

TASK [Set message of the day] *************************************************
changed: [localhost]

TASK [Ensure important directories exist] *************************************
changed: [localhost] => (item=/opt/scripts)
changed: [localhost] => (item=/opt/backups)
changed: [localhost] => (item=/opt/logs)

TASK [Display completion message] *********************************************
ok: [localhost] => {
    "msg": "Local environment configured successfully!"
}

PLAY RECAP ********************************************************************
localhost                  : ok=6    changed=4    unreachable=0    failed=0    skipped=1

Step 4: Run it again -- observe idempotency:

$ ansible-playbook ~/ansible-lab/local-setup.yml

This time, most tasks report ok instead of changed. Ansible checked and found the system already matched the desired state.
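
Idempotency is simply check-before-act. A rough shell equivalent of what a module like lineinfile does internally (ensure_line is our sketch, not Ansible code):

```shell
# Check-then-act: only modify the file (and report "changed") when the
# desired line is missing; otherwise report "ok" and touch nothing.
ensure_line() {
    file=$1; line=$2
    if grep -qxF "$line" "$file" 2>/dev/null; then
        echo "ok"
    else
        echo "$line" >> "$file"
        echo "changed"
    fi
}

f=$(mktemp)
first=$(ensure_line "$f" "managed by ansible")    # line absent: acts
second=$(ensure_line "$f" "managed by ansible")   # state already matches
echo "$first $second"
rm -f "$f"
```

Run it and you get "changed ok" -- the same changed-then-ok pattern you just saw in the play recap.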


Essential Ansible Modules

Modules are the building blocks of Ansible tasks. Here are the ones you will use most.

Package Management

# Debian/Ubuntu
- name: Install packages
  apt:
    name:
      - nginx
      - postgresql
    state: present         # present, absent, latest
    update_cache: yes

# RHEL/Fedora
- name: Install packages
  dnf:
    name:
      - httpd
      - mariadb-server
    state: present

File Management

# Copy a file
- name: Copy config file
  copy:
    src: files/app.conf           # Local file on control node
    dest: /etc/app/app.conf       # Destination on managed node
    owner: root
    group: root
    mode: '0644'
    backup: yes                   # Keep backup of original

# Create a file from a Jinja2 template
- name: Deploy nginx config from template
  template:
    src: templates/nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify: Restart nginx

# Manage files and directories
- name: Create directory
  file:
    path: /opt/myapp
    state: directory
    owner: deploy
    group: deploy
    mode: '0755'

- name: Create a symlink
  file:
    src: /opt/myapp/current
    dest: /var/www/app
    state: link

Service Management

- name: Ensure nginx is running and enabled
  service:
    name: nginx
    state: started         # started, stopped, restarted, reloaded
    enabled: yes           # Start on boot

User and Group Management

- name: Create application user
  user:
    name: deploy
    shell: /bin/bash
    groups: sudo,docker
    append: yes            # Add to groups without removing from others
    create_home: yes

- name: Add SSH key for deploy user
  authorized_key:
    user: deploy
    key: "{{ lookup('file', 'files/deploy_id_rsa.pub') }}"
    state: present

Command Execution

# Run a command (only if needed)
- name: Initialize the database
  command: /opt/app/bin/init-db
  args:
    creates: /opt/app/data/initialized    # Skip if this file exists

# Run a shell command (supports pipes, redirects)
- name: Check disk usage
  shell: df -h / | tail -1
  register: disk_usage

- name: Show disk usage
  debug:
    var: disk_usage.stdout

Think About It: Why does Ansible have both command and shell modules? When would you choose one over the other?
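
A hint you can try in any terminal: the shell module wraps your string in /bin/sh -c, while the command module executes the arguments directly with no shell in between, so metacharacters like | arrive as literal text. A sketch of both behaviors:

```shell
# shell module behavior: /bin/sh -c interprets the pipe
with_shell=$(/bin/sh -c "echo linux | tr 'a-z' 'A-Z'")

# command module behavior: no shell is involved, so the escaped '|'
# reaches echo as an ordinary argument instead of creating a pipeline
without_shell=$(echo linux \| tr a-z A-Z)

echo "$with_shell"      # LINUX
echo "$without_shell"   # linux | tr a-z A-Z
```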


Roles: Organizing Playbooks at Scale

As your Ansible codebase grows, stuffing everything into one playbook becomes unmanageable. Roles provide a standard way to organize tasks, files, templates, and variables into reusable units.

Role Directory Structure

roles/
└── webserver/
    ├── tasks/
    │   └── main.yml          # Main list of tasks
    ├── handlers/
    │   └── main.yml          # Handlers
    ├── templates/
    │   └── nginx.conf.j2     # Jinja2 templates
    ├── files/
    │   └── index.html        # Static files
    ├── vars/
    │   └── main.yml          # Role variables
    ├── defaults/
    │   └── main.yml          # Default variable values (lowest priority)
    └── meta/
        └── main.yml          # Role metadata (dependencies, etc.)
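
Nothing about this layout is magic -- it is plain directories, each holding a main.yml. If ansible-galaxy is not available, the same skeleton can be built by hand (sketched here inside a throwaway temp directory):

```shell
# Build the standard role layout by hand in a scratch directory.
base=$(mktemp -d)
for d in tasks handlers templates files vars defaults meta; do
    mkdir -p "$base/roles/webserver/$d"
done
ls "$base/roles/webserver"
```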

Creating a Role

$ mkdir -p ~/ansible-lab/roles
$ cd ~/ansible-lab/roles
$ ansible-galaxy init webserver

This creates the full directory structure. Now populate it:

$ cat > ~/ansible-lab/roles/webserver/tasks/main.yml << 'EOF'
---
- name: Install nginx
  apt:
    name: nginx
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"

- name: Deploy nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: Restart nginx

- name: Deploy website
  copy:
    src: index.html
    dest: /var/www/html/index.html
    owner: www-data
    group: www-data

- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: yes
EOF

$ cat > ~/ansible-lab/roles/webserver/handlers/main.yml << 'EOF'
---
- name: Restart nginx
  service:
    name: nginx
    state: restarted
EOF

$ cat > ~/ansible-lab/roles/webserver/defaults/main.yml << 'EOF'
---
nginx_worker_processes: auto
nginx_worker_connections: 1024
server_name: localhost
EOF

Using a Role in a Playbook

# ~/ansible-lab/site.yml
---
- name: Configure web servers
  hosts: webservers
  become: yes
  roles:
    - webserver

That one line -- - webserver -- pulls in all tasks, handlers, templates, files, and variables from the role. Clean, reusable, and readable.


Ansible Galaxy: Community Roles

Ansible Galaxy is a repository of community-contributed roles. Instead of writing everything from scratch, you can install pre-built roles:

# Search for roles
$ ansible-galaxy search nginx

# Install a role
$ ansible-galaxy install geerlingguy.nginx

# List installed roles
$ ansible-galaxy list

You can also define role dependencies in a requirements.yml:

# requirements.yml
---
roles:
  - name: geerlingguy.nginx
    version: "3.1.0"
  - name: geerlingguy.postgresql
    version: "3.4.0"

$ ansible-galaxy install -r requirements.yml


Ansible Vault: Managing Secrets

Never put passwords or API keys in plain text. Ansible Vault encrypts sensitive data.

Encrypting a Variable File

$ cat > ~/ansible-lab/secrets.yml << 'EOF'
---
db_password: "SuperSecret123!"
api_key: "ak_live_abc123def456"
EOF

$ ansible-vault encrypt ~/ansible-lab/secrets.yml
New Vault password:
Confirm New Vault password:
Encryption successful

The file is now encrypted with AES-256:

$ cat ~/ansible-lab/secrets.yml
$ANSIBLE_VAULT;1.1;AES256
36383836656233613766623335383666316137663262383633303732356134343130613636326230
...

Using Encrypted Files in Playbooks

---
- name: Deploy application
  hosts: appservers
  become: yes
  vars_files:
    - secrets.yml

  tasks:
    - name: Configure database connection
      template:
        src: db_config.j2
        dest: /opt/app/config/database.yml
        mode: '0600'

# Run with vault password prompt
$ ansible-playbook deploy.yml --ask-vault-pass

# Or use a password file
$ ansible-playbook deploy.yml --vault-password-file ~/.vault_pass

Safety Warning: Never commit your vault password file to Git. Add it to .gitignore. The whole point of vault is to keep secrets safe -- do not undermine it by exposing the master password.

Encrypting a Single Variable

You can also encrypt individual variables instead of whole files:

$ ansible-vault encrypt_string 'SuperSecret123!' --name 'db_password'
db_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          61326634356234633166...

Paste this directly into your variable files.


Practical Playbook: Configure a Web Server

Here is a complete, production-style playbook:

# ~/ansible-lab/webserver-playbook.yml
---
- name: Deploy and configure web server
  hosts: localhost
  connection: local
  become: yes

  vars:
    server_name: myapp.example.com
    doc_root: /var/www/myapp
    nginx_worker_processes: auto
    nginx_worker_connections: 1024

  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - ufw
        state: present
        update_cache: yes
      when: ansible_os_family == "Debian"

    - name: Create document root
      file:
        path: "{{ doc_root }}"
        state: directory
        owner: www-data
        group: www-data
        mode: '0755'

    - name: Deploy application files
      copy:
        content: |
          <!DOCTYPE html>
          <html>
          <head><title>{{ server_name }}</title></head>
          <body>
            <h1>Welcome to {{ server_name }}</h1>
            <p>Deployed by Ansible on {{ ansible_date_time.iso8601 }}</p>
            <p>Running on {{ ansible_distribution }} {{ ansible_distribution_version }}</p>
          </body>
          </html>
        dest: "{{ doc_root }}/index.html"
        owner: www-data
        group: www-data
        mode: '0644'

    - name: Deploy nginx site configuration
      copy:
        content: |
          server {
              listen 80;
              server_name {{ server_name }};
              root {{ doc_root }};
              index index.html;

              access_log /var/log/nginx/{{ server_name }}_access.log;
              error_log  /var/log/nginx/{{ server_name }}_error.log;

              location / {
                  try_files $uri $uri/ =404;
              }
          }
        dest: /etc/nginx/sites-available/{{ server_name }}
        owner: root
        group: root
        mode: '0644'
      notify: Reload nginx

    - name: Enable the site
      file:
        src: /etc/nginx/sites-available/{{ server_name }}
        dest: /etc/nginx/sites-enabled/{{ server_name }}
        state: link
      notify: Reload nginx

    - name: Remove default site
      file:
        path: /etc/nginx/sites-enabled/default
        state: absent
      notify: Reload nginx

    - name: Test nginx configuration
      command: nginx -t
      changed_when: false

    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: Reload nginx
      service:
        name: nginx
        state: reloaded

Run it:

$ ansible-playbook ~/ansible-lab/webserver-playbook.yml

Debug This

Your colleague wrote this playbook and it fails. Can you spot the issues?

---
- name: Setup app server
  hosts: appservers
  become: true

  tasks:
    - name: Install packages
      apt:
        name: [nginx, python3]
        state: installed

    - name: Copy config
      copy:
        src: /home/admin/nginx.conf
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

  handlers:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted

Problems:

  1. state: installed is not valid. The correct value for the apt module is state: present (or latest, absent). Ansible will throw an error.

  2. Handler name mismatch. The task notifies restart nginx (lowercase 'r') but the handler is named Restart nginx (uppercase 'R'). Handler names are case-sensitive. The handler will never fire.

  3. Absolute path in src of the copy module. When src is an absolute path, Ansible copies from that exact path on the control node. This works, but if you meant to use a file relative to the playbook, use a relative path like files/nginx.conf.

  4. No update_cache: yes on the apt task. On a fresh server, the apt cache may be empty and package installation will fail.


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 68 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Ansible = agentless automation over SSH using YAML.         │
│                                                              │
│  Key components:                                             │
│  • Inventory: defines which machines to manage               │
│  • Ad-hoc commands: quick one-off tasks (ansible -m)         │
│  • Playbooks: YAML files defining plays, tasks, handlers     │
│  • Modules: apt, dnf, copy, template, service, user, file   │
│  • Roles: reusable, structured collections of tasks          │
│  • Galaxy: community role repository                         │
│  • Vault: encrypted secrets management                       │
│                                                              │
│  Workflow:                                                   │
│  1. Define inventory (who to manage)                         │
│  2. Write playbook (what to do)                              │
│  3. Run with --check first (dry run)                         │
│  4. Apply with ansible-playbook                              │
│  5. Run again to verify idempotency                          │
│                                                              │
│  Ansible checks before acting -- it only makes changes       │
│  when the current state differs from the desired state.      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Inventory Practice

Create an inventory file with three groups: webservers, databases, and monitoring. Add localhost to all three groups. Then run:

$ ansible -i your_inventory webservers --list-hosts
$ ansible -i your_inventory databases --list-hosts
$ ansible -i your_inventory all --list-hosts

Verify that all shows localhost only once even though it is in multiple groups.

Exercise 2: Write a Playbook

Write a playbook that:

  1. Creates a user called appuser with a home directory
  2. Creates the directory /opt/myapp owned by appuser
  3. Copies a simple script to /opt/myapp/health.sh that echoes "OK"
  4. Makes the script executable

Run it on localhost and verify everything was created correctly.

Exercise 3: Explore Facts

Run ansible localhost -m setup and examine the output. Find:

  • Your operating system name and version
  • Total memory
  • All network interfaces and their IP addresses
  • The number of CPU cores

Exercise 4: Vault Practice

Create a file with a fake database password, encrypt it with ansible-vault, then write a playbook that uses the encrypted variable. Run the playbook with --ask-vault-pass.

Bonus Challenge

Create a role called hardening that:

  • Disables root SSH login (modifies /etc/ssh/sshd_config)
  • Sets a secure MOTD
  • Installs and enables fail2ban
  • Configures the firewall to allow only SSH (port 22) and HTTP (port 80)

Use the role in a playbook, and test with --check mode before applying.

CI/CD Pipelines on Linux

Why This Matters

A developer pushes a code change at 3pm on Friday. The change breaks the application, but nobody notices until Monday morning when customers start complaining. It takes the team four hours to figure out which of the 47 commits merged over the weekend caused the issue.

Now imagine the alternative: the developer pushes the change, and within five minutes an automated pipeline builds the code, runs 200 tests, and rejects the change because three tests fail. The developer sees the failure immediately, fixes it, pushes again, and all tests pass. The code deploys automatically to staging for further validation. Nobody's weekend is ruined.

That is CI/CD -- Continuous Integration and Continuous Delivery (or Deployment). It is the backbone of modern software engineering, and it runs almost entirely on Linux.


Try This Right Now

If you have Git installed, you can simulate a basic pipeline with a shell script:

$ mkdir -p ~/cicd-demo && cd ~/cicd-demo
$ git init

$ cat > app.py << 'EOF'
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b
EOF

$ cat > test_app.py << 'EOF'
from app import add, subtract

def test_add():
    assert add(2, 3) == 5

def test_subtract():
    assert subtract(5, 3) == 2
EOF

$ cat > pipeline.sh << 'PIPELINE'
#!/bin/bash
set -e
echo "=== Stage 1: Lint ==="
python3 -m py_compile app.py && echo "PASS: Syntax OK"

echo "=== Stage 2: Test ==="
python3 -c "
from app import add, subtract
assert add(2,3) == 5, 'add failed'
assert subtract(5,3) == 2, 'subtract failed'
print('PASS: All tests passed')
"

echo "=== Stage 3: Build ==="
echo "Building artifact..."
tar czf app-$(date +%Y%m%d%H%M%S).tar.gz app.py
echo "PASS: Build complete"

echo "=== Pipeline Succeeded ==="
PIPELINE
$ chmod +x pipeline.sh

$ ./pipeline.sh

Expected output:

=== Stage 1: Lint ===
PASS: Syntax OK
=== Stage 2: Test ===
PASS: All tests passed
=== Stage 3: Build ===
Building artifact...
PASS: Build complete
=== Pipeline Succeeded ===

That is a CI/CD pipeline in its most primitive form. Real tools add automation, parallelism, artifact management, and much more -- but the concept is the same.


CI/CD Concepts

Continuous Integration (CI)

Continuous Integration means developers merge their code changes into a shared repository frequently -- at least daily. Each merge triggers an automated build and test process.

┌────────────────────────────────────────────────────────────┐
│               CONTINUOUS INTEGRATION                        │
│                                                            │
│  Developer A ──push──┐                                     │
│                      │    ┌──────────┐    ┌──────────┐     │
│  Developer B ──push──├──► │  Build   │──► │  Test    │     │
│                      │    └──────────┘    └──────────┘     │
│  Developer C ──push──┘         │              │            │
│                                │              │            │
│                           Pass? ──► Merge     │            │
│                           Fail? ──► Reject  Fail? ──► Fix  │
│                                                            │
└────────────────────────────────────────────────────────────┘

The goal: catch bugs early, when they are cheap to fix. A bug found in CI costs minutes to fix. The same bug found in production costs hours, money, and customer trust.

Continuous Delivery (CD)

Continuous Delivery extends CI by automatically preparing code for release to production. Every change that passes CI is deployable, but a human still decides when to deploy.

Continuous Deployment

Continuous Deployment goes one step further: every change that passes all automated tests is deployed to production automatically, with no human intervention.

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  CI only:       Code → Build → Test → ✓ (done)              │
│                                                              │
│  CI + Delivery: Code → Build → Test → Stage → [Human] → Prod│
│                                                              │
│  CI + Deploy:   Code → Build → Test → Stage → Prod (auto)   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Think About It: When would you choose Continuous Delivery over Continuous Deployment? Think about regulated industries, user-facing applications, and infrastructure changes.


Pipeline Stages

A typical CI/CD pipeline has these stages:

┌─────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐   ┌──────────┐
│  Lint   │──►│  Build  │──►│  Test   │──►│  Package │──►│  Deploy  │
│         │   │         │   │         │   │          │   │          │
│ Syntax  │   │ Compile │   │ Unit    │   │ Docker   │   │ Staging  │
│ Style   │   │ Bundle  │   │ Integr. │   │ Tarball  │   │ Prod     │
│ Checks  │   │ Resolve │   │ E2E     │   │ RPM/DEB  │   │          │
└─────────┘   └─────────┘   └─────────┘   └──────────┘   └──────────┘
     │              │             │              │              │
     ▼              ▼             ▼              ▼              ▼
  Fail fast    Fail fast     Fail fast     Artifacts       Rollback
  on errors    on errors     on errors     stored          on failure

Lint

Check code quality before doing anything expensive:

  • Syntax validation
  • Style compliance (PEP 8, ESLint, shellcheck)
  • Security scanning (static analysis)

Build

Compile code, resolve dependencies, bundle assets:

  • make, gcc for C/C++
  • mvn package for Java
  • npm run build for JavaScript
  • go build for Go

Test

Run automated tests at multiple levels:

  • Unit tests: Test individual functions
  • Integration tests: Test components working together
  • End-to-end tests: Test the full application flow

Package

Create deployable artifacts:

  • Docker images
  • Tarballs
  • RPM/DEB packages
  • Application archives

Deploy

Release to target environments:

  • Staging for validation
  • Production for users


Git-Based Workflows

CI/CD pipelines are triggered by Git events. The most common workflow:

┌────────────────────────────────────────────────────────────────┐
│                   BRANCHING WORKFLOW                            │
│                                                                │
│  main ──────●──────────────●──────────────●──── (always stable)│
│              \            / \            /                      │
│  feature/     \──●──●──●──   \──●──●──●──                      │
│  login           │  │  │        │  │  │                        │
│                  CI CI CI       CI CI CI                        │
│                  runs  │        runs  │                        │
│                     Merge PR      Merge PR                     │
│                                                                │
│  Every push to a branch triggers CI.                           │
│  Merging to main triggers CD (deploy to staging/production).   │
│                                                                │
└────────────────────────────────────────────────────────────────┘


Self-Hosted Git: Gitea and Forgejo

Before choosing a CI/CD tool, you need a Git hosting platform. For self-hosted, open-source options:

Gitea

Gitea is a lightweight, self-hosted Git service written in Go. It is fast, requires minimal resources, and provides a GitHub-like interface.

# Install Gitea (binary method)
$ wget -O /tmp/gitea https://dl.gitea.com/gitea/1.21/gitea-1.21-linux-amd64
$ sudo mv /tmp/gitea /usr/local/bin/gitea
$ sudo chmod +x /usr/local/bin/gitea

# Create system user
$ sudo adduser --system --shell /bin/bash --group --home /home/gitea gitea

# Create required directories
$ sudo mkdir -p /var/lib/gitea/{custom,data,log}
$ sudo chown -R gitea:gitea /var/lib/gitea
$ sudo mkdir -p /etc/gitea
$ sudo chown root:gitea /etc/gitea
$ sudo chmod 770 /etc/gitea

Forgejo

Forgejo is a community fork of Gitea, focused on remaining fully community-governed. It began as a drop-in replacement for Gitea, using the same configuration format and a largely compatible API, though the two projects have since diverged.

Distro Note: Both Gitea and Forgejo are available as single binaries, Docker images, or distribution packages. On Debian-based systems, check if your distro ships a package. Otherwise, the binary or Docker approach works everywhere.


Open-Source CI/CD Tools

Woodpecker CI

Woodpecker CI is a community fork of Drone CI, fully open source under the Apache 2.0 license. It integrates tightly with Gitea, Forgejo, GitHub, and GitLab.

Installing Woodpecker CI

Woodpecker is typically deployed with Docker alongside your Git server:

# docker-compose.yml for Woodpecker + Gitea
version: '3'
services:
  woodpecker-server:
    image: woodpeckerci/woodpecker-server:latest
    ports:
      - "8000:8000"
    volumes:
      - woodpecker-server-data:/var/lib/woodpecker/
    environment:
      - WOODPECKER_OPEN=true
      - WOODPECKER_HOST=http://your-server:8000
      - WOODPECKER_GITEA=true
      - WOODPECKER_GITEA_URL=http://gitea:3000
      - WOODPECKER_GITEA_CLIENT=your-oauth-client-id
      - WOODPECKER_GITEA_SECRET=your-oauth-secret
      - WOODPECKER_SECRET=a-shared-secret

  woodpecker-agent:
    image: woodpeckerci/woodpecker-agent:latest
    depends_on:
      - woodpecker-server
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WOODPECKER_SERVER=woodpecker-server:9000
      - WOODPECKER_SECRET=a-shared-secret

volumes:
  woodpecker-server-data:

Writing a Woodpecker Pipeline

Pipelines are defined in .woodpecker.yml in your repository:

# .woodpecker.yml
steps:
  lint:
    image: python:3.11
    commands:
      - pip install flake8
      - flake8 --max-line-length=120 .

  test:
    image: python:3.11
    commands:
      - pip install pytest
      - pip install -r requirements.txt
      - pytest tests/ -v

  build:
    image: python:3.11
    commands:
      - python setup.py sdist bdist_wheel

  deploy:
    image: alpine
    commands:
      - echo "Deploying to staging..."
      - apk add --no-cache openssh-client
      - scp dist/*.whl deploy@staging:/opt/app/
    when:
      branch: main
      event: push

Each step runs in an isolated container. If any step fails, the pipeline stops.

Jenkins

Jenkins is the most widely used open-source CI/CD server, and one of the oldest, tracing its roots back to the Hudson project. It is written in Java and has a massive plugin ecosystem.

Installing Jenkins

# On Debian/Ubuntu
$ sudo apt update
$ sudo apt install -y openjdk-17-jre
$ curl -fsSL https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key | sudo tee \
    /usr/share/keyrings/jenkins-keyring.asc > /dev/null
$ echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] \
    https://pkg.jenkins.io/debian-stable binary/" | sudo tee \
    /etc/apt/sources.list.d/jenkins.list > /dev/null
$ sudo apt update
$ sudo apt install -y jenkins
$ sudo systemctl enable --now jenkins

Jenkins runs on port 8080 by default. The initial admin password is at:

$ sudo cat /var/lib/jenkins/secrets/initialAdminPassword

Jenkinsfile: Pipeline as Code

Modern Jenkins uses a Jenkinsfile in the repository root:

// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Lint') {
            steps {
                sh 'python3 -m py_compile app.py'
            }
        }

        stage('Test') {
            steps {
                sh 'python3 -m pytest tests/ -v --junitxml=results.xml'
            }
            post {
                always {
                    junit 'results.xml'
                }
            }
        }

        stage('Build') {
            steps {
                sh 'tar czf app.tar.gz *.py requirements.txt'
                archiveArtifacts artifacts: 'app.tar.gz'
            }
        }

        stage('Deploy to Staging') {
            when {
                branch 'main'
            }
            steps {
                sh './deploy.sh staging'
            }
        }
    }

    post {
        failure {
            echo 'Pipeline failed! Check the logs.'
        }
    }
}

GitLab CI/CD

GitLab includes CI/CD built directly into the platform. Pipelines are defined in .gitlab-ci.yml:

# .gitlab-ci.yml
stages:
  - lint
  - test
  - build
  - deploy

lint:
  stage: lint
  image: python:3.11
  script:
    - pip install flake8
    - flake8 --max-line-length=120 .

test:
  stage: test
  image: python:3.11
  script:
    - pip install pytest
    - pip install -r requirements.txt
    - pytest tests/ -v --junitxml=report.xml
  artifacts:
    reports:
      junit: report.xml

build:
  stage: build
  image: python:3.11
  script:
    - python setup.py sdist bdist_wheel
  artifacts:
    paths:
      - dist/

deploy_staging:
  stage: deploy
  script:
    - echo "Deploying to staging..."
    - scp dist/*.whl deploy@staging:/opt/app/
  only:
    - main
  environment:
    name: staging

GitLab CI/CD uses runners -- agents that execute pipeline jobs. You can use shared runners provided by GitLab or install your own:

# Install GitLab Runner
$ curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
$ sudo apt install -y gitlab-runner

# Register the runner
$ sudo gitlab-runner register


Hands-On: Build a Simple CI Pipeline

Let us create a complete, working pipeline using shell scripts to understand the mechanics before using a CI tool.

Step 1: Set up a project with tests:

$ mkdir -p ~/cicd-project/tests
$ cd ~/cicd-project
$ git init

Step 2: Create the application:

$ cat > ~/cicd-project/app.py << 'EOF'
"""Simple calculator application."""

def add(a, b):
    """Add two numbers."""
    return a + b

def subtract(a, b):
    """Subtract b from a."""
    return a - b

def multiply(a, b):
    """Multiply two numbers."""
    return a * b

def divide(a, b):
    """Divide a by b."""
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
EOF

Step 3: Create tests:

$ cat > ~/cicd-project/tests/test_app.py << 'EOF'
import sys
sys.path.insert(0, '.')
from app import add, subtract, multiply, divide

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

def test_subtract():
    assert subtract(5, 3) == 2
    assert subtract(0, 5) == -5

def test_multiply():
    assert multiply(3, 4) == 12
    assert multiply(0, 5) == 0

def test_divide():
    assert divide(10, 2) == 5.0
    assert divide(7, 2) == 3.5

def test_divide_by_zero():
    try:
        divide(1, 0)
        assert False, "Should have raised ValueError"
    except ValueError:
        pass

if __name__ == "__main__":
    test_add()
    test_subtract()
    test_multiply()
    test_divide()
    test_divide_by_zero()
    print("All tests passed!")
EOF

Step 4: Create the pipeline script:

$ cat > ~/cicd-project/run-pipeline.sh << 'PIPELINE'
#!/bin/bash
set -euo pipefail

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

stage() {
    echo -e "\n${YELLOW}=== $1 ===${NC}"
}

pass() {
    echo -e "${GREEN}PASS: $1${NC}"
}

fail() {
    echo -e "${RED}FAIL: $1${NC}"
    exit 1
}

# Stage 1: Lint
stage "LINT"
python3 -m py_compile app.py && pass "Syntax check" || fail "Syntax error in app.py"

# Stage 2: Test
stage "TEST"
python3 tests/test_app.py && pass "All tests" || fail "Tests failed"

# Stage 3: Build
stage "BUILD"
BUILD_ID="build-$(date +%Y%m%d-%H%M%S)"
mkdir -p artifacts
tar czf "artifacts/${BUILD_ID}.tar.gz" app.py tests/
pass "Artifact created: artifacts/${BUILD_ID}.tar.gz"

# Stage 4: Deploy (simulated)
stage "DEPLOY"
echo "Would deploy artifacts/${BUILD_ID}.tar.gz to staging"
pass "Deployment simulated"

echo -e "\n${GREEN}=== PIPELINE SUCCEEDED ===${NC}"
PIPELINE
$ chmod +x ~/cicd-project/run-pipeline.sh

Step 5: Run it:

$ cd ~/cicd-project && ./run-pipeline.sh
=== LINT ===
PASS: Syntax check

=== TEST ===
All tests passed!
PASS: All tests

=== BUILD ===
PASS: Artifact created: artifacts/build-20250615-143022.tar.gz

=== DEPLOY ===
Would deploy artifacts/build-20250615-143022.tar.gz to staging
PASS: Deployment simulated

=== PIPELINE SUCCEEDED ===

Step 6: Now break something and see the pipeline catch it:

$ echo "this is not valid python" >> ~/cicd-project/app.py
$ cd ~/cicd-project && ./run-pipeline.sh
=== LINT ===
FAIL: Syntax error in app.py

The pipeline stopped at the lint stage, exactly as it should.


Artifacts, Variables, and Secrets

Artifacts

Artifacts are files produced by a pipeline that need to be preserved -- compiled binaries, test reports, Docker images, packages.

# GitLab CI example
build:
  stage: build
  script:
    - make build
  artifacts:
    paths:
      - build/myapp
    expire_in: 30 days
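
Whichever CI tool you use, it is good practice to store a checksum next to each artifact so a later stage can verify integrity before deploying. A tool-agnostic shell sketch (filenames and the version string are illustrative):

```shell
# Package an artifact and record its SHA-256 so a later stage can verify it.
work=$(mktemp -d)
cd "$work"
mkdir -p dist
echo 'print("hello")' > app.py

version="1.0.0"
tar czf "dist/app-${version}.tar.gz" app.py
sha256sum "dist/app-${version}.tar.gz" > "dist/app-${version}.tar.gz.sha256"

# A deploy stage would re-run the check before touching production:
sha256sum -c "dist/app-${version}.tar.gz.sha256" && status=ok
echo "$status"
```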

Environment Variables

Pipelines use environment variables for configuration:

# Woodpecker CI
steps:
  deploy:
    image: alpine
    environment:
      - APP_ENV=staging
      - APP_PORT=8080
    commands:
      - echo "Deploying to $APP_ENV on port $APP_PORT"

Secrets Management

Secrets (API keys, passwords, deploy keys) should never be in the pipeline file or the repository. CI/CD platforms provide secret storage:

  • GitLab: Settings > CI/CD > Variables (masked, protected)
  • Jenkins: Credentials plugin
  • Woodpecker: Repository secrets in the UI or via API
  • GitHub Actions: Repository or organization secrets
# GitLab CI -- using a secret variable
deploy:
  script:
    - echo "$DEPLOY_KEY" > /tmp/deploy.key
    - chmod 600 /tmp/deploy.key
    - scp -i /tmp/deploy.key build/app deploy@prod:/opt/
  after_script:
    - rm -f /tmp/deploy.key

Safety Warning: Never echo or print secret variables in pipeline output. Most CI platforms mask them automatically, but do not rely on that alone. Also be cautious about secrets in pull request pipelines -- a malicious PR could add a step that exfiltrates secrets.


Deployment Strategies

How you deploy to production matters as much as what you deploy.

Rolling Deployment

Update servers one at a time (or in batches). If something goes wrong, stop and roll back.

┌─────────────────────────────────────────────────┐
│             ROLLING DEPLOYMENT                   │
│                                                  │
│  Time 0: [v1] [v1] [v1] [v1]  (all on v1)       │
│  Time 1: [v2] [v1] [v1] [v1]  (updating...)     │
│  Time 2: [v2] [v2] [v1] [v1]  (updating...)     │
│  Time 3: [v2] [v2] [v2] [v1]  (updating...)     │
│  Time 4: [v2] [v2] [v2] [v2]  (complete)        │
│                                                  │
│  Pros: No extra infrastructure needed            │
│  Cons: Mixed versions during deployment          │
│                                                  │
└─────────────────────────────────────────────────┘
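
The loop above fits in a few lines of shell. This is a sketch, not a real deployer: the host list is made up, and deploy_to and health_check are stub functions standing in for your actual deploy command and health endpoint check.

```shell
#!/bin/sh
# Rolling deploy sketch: one host at a time, stop on the first failure.
# HOSTS and both functions are placeholders -- substitute real commands,
# e.g. deploy via rsync + systemctl restart, health check via curl -f.
HOSTS="web1 web2 web3 web4"

deploy_to()    { echo "  deploying v2 to $1"; }
health_check() { echo "  $1 healthy"; }   # return non-zero to abort rollout

for host in $HOSTS; do
    echo "== $host =="
    deploy_to "$host"
    if ! health_check "$host"; then
        echo "!! $host failed health check -- stopping rollout" >&2
        exit 1
    fi
done
echo "Rollout complete: all hosts on v2"
```

Because the loop exits on the first unhealthy host, a bad release stops after touching one server instead of all four -- the "Time 1" row in the diagram rather than "Time 4."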

Blue-Green Deployment

Maintain two identical environments. Deploy to the inactive one, test, then switch traffic.

┌─────────────────────────────────────────────────┐
│           BLUE-GREEN DEPLOYMENT                  │
│                                                  │
│  Before:                                         │
│  Users ──► Load Balancer ──► BLUE (v1) ← active  │
│                              GREEN (idle)        │
│                                                  │
│  Deploy v2 to GREEN, test it:                    │
│  Users ──► Load Balancer ──► BLUE (v1) ← active  │
│                              GREEN (v2) ← ready  │
│                                                  │
│  Switch:                                         │
│  Users ──► Load Balancer ──► BLUE (v1) ← idle    │
│                              GREEN (v2) ← active │
│                                                  │
│  Rollback = switch back to BLUE                  │
│                                                  │
│  Pros: Instant rollback, no mixed versions       │
│  Cons: Need double the infrastructure            │
│                                                  │
└─────────────────────────────────────────────────┘
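
On a single machine you can get the same effect with a symlink flip: the load balancer's role is played by a current symlink that your web server or process manager points at. A sketch using throwaway paths in a temp directory:

```shell
#!/bin/sh
# Blue-green in miniature: two release directories plus a "current" symlink.
# Paths are demo paths; in production these would live under e.g. /opt/myapp.
set -e
cd "$(mktemp -d)"
mkdir blue green
echo "v1" > blue/VERSION
echo "v2" > green/VERSION

ln -s blue current                      # blue is live
echo "active: $(cat current/VERSION)"

# Deploy and test v2 in green, then switch. The ln + mv -T pair makes the
# switch atomic: readers see the old tree or the new one, never neither.
ln -s green current.new && mv -T current.new current
echo "active: $(cat current/VERSION)"

# Rollback is the same two commands with "blue" again.
```

The `mv -T` (GNU coreutils) is what makes this safe: it renames the new symlink over the old one in a single rename() call, instead of deleting and recreating it.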

Canary Deployment

Route a small percentage of traffic to the new version. If it works, gradually increase.

┌─────────────────────────────────────────────────┐
│            CANARY DEPLOYMENT                     │
│                                                  │
│  Phase 1:  95% ──► v1 (stable)                   │
│             5% ──► v2 (canary)   ← monitor       │
│                                                  │
│  Phase 2:  75% ──► v1                            │
│            25% ──► v2            ← still OK?     │
│                                                  │
│  Phase 3:  50% ──► v1                            │
│            50% ──► v2            ← looking good  │
│                                                  │
│  Phase 4: 100% ──► v2           ← full rollout   │
│                                                  │
│  Pros: Minimal blast radius for bugs             │
│  Cons: More complex routing and monitoring       │
│                                                  │
└─────────────────────────────────────────────────┘
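
With nginx in front, a crude canary needs nothing more than upstream weights. A sketch -- the addresses are made up, and real canary setups often route by header or cookie instead, so a given user sticks to one version:

```nginx
# Phase 1 of the diagram: ~95% of requests to v1, ~5% to the canary
upstream app {
    server 10.0.0.10:8080 weight=95;   # v1 (stable)
    server 10.0.0.20:8080 weight=5;    # v2 (canary)
}
```

Advancing a phase means editing the weights and running nginx -s reload; promoting the canary means removing the v1 server entirely.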

Think About It: You are deploying a database schema change. Which deployment strategy would you use? Why are database migrations particularly tricky for blue-green and canary deployments?


Debug This

A pipeline keeps failing with this error:

Step 3/5: Test
ERROR: ModuleNotFoundError: No module named 'requests'
Pipeline FAILED at test stage

The .gitlab-ci.yml looks like this:

test:
  stage: test
  image: python:3.11
  script:
    - pytest tests/ -v

What is wrong?

Answer: The pipeline does not install dependencies before running tests. The requests module (used by the application) is not installed in the clean Python container. Fix:

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt    # Install dependencies first!
    - pytest tests/ -v

Every pipeline job starts with a clean environment -- in GitLab CI, a fresh container per job. Dependencies must be installed explicitly every run, or cached between runs.
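
Caching is the standard fix for reinstalling the same packages on every run. A sketch using GitLab CI's built-in cache -- the directory names are arbitrary choices, but cache:, key:, and the CI_* variables are standard GitLab CI features:

```yaml
# Reuse pip's download cache across pipeline runs on the same branch
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key: "$CI_COMMIT_REF_SLUG"        # one cache per branch
  paths:
    - .cache/pip

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt   # fast when the cache is warm
    - pytest tests/ -v
```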


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 69 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  CI/CD automates the build-test-deploy cycle.                │
│                                                              │
│  CI = merge frequently, test automatically                   │
│  CD (Delivery) = always deployable, human triggers deploy    │
│  CD (Deployment) = deploy automatically on every merge       │
│                                                              │
│  Pipeline stages: Lint → Build → Test → Package → Deploy     │
│                                                              │
│  Open-source CI/CD tools:                                    │
│  • Woodpecker CI: lightweight, Docker-native, Gitea-friendly │
│  • Jenkins: veteran, plugin-rich, Jenkinsfile-based          │
│  • GitLab CI/CD: integrated with GitLab, .gitlab-ci.yml      │
│                                                              │
│  Self-hosted Git: Gitea or Forgejo                           │
│                                                              │
│  Deployment strategies:                                      │
│  • Rolling: update one at a time                             │
│  • Blue-green: switch between two identical environments     │
│  • Canary: send small traffic to new version first           │
│                                                              │
│  Never put secrets in pipeline files or repositories.        │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Expand the Pipeline

Take the shell-based pipeline from the hands-on section and add:

  • A code coverage stage that reports what percentage of code is tested
  • An artifact cleanup stage that removes builds older than 7 days

Exercise 2: Write a Woodpecker Pipeline

Create a .woodpecker.yml for a project of your choice. Include lint, test, build, and deploy stages. Use when conditions to only deploy on the main branch.

Exercise 3: Set Up Gitea

Install Gitea on your local machine (using Docker or a binary). Create a repository, push code to it, and explore the web interface.

Exercise 4: Jenkins Exploration

Install Jenkins, create a freestyle project, and configure it to poll a Git repository and run tests on every push.

Bonus Challenge

Build a pipeline that:

  1. Runs tests in parallel (unit tests and integration tests at the same time)
  2. Only builds a Docker image if all tests pass
  3. Deploys to staging automatically
  4. Requires manual approval before deploying to production
  5. Sends a notification (even just an echo) on success or failure

Think about how each CI/CD tool handles parallel stages and manual gates.

Monitoring & Alerting Stack

Why This Matters

It is 2am and your phone buzzes. The e-commerce site is down. You SSH into the server and discover the disk is full -- something has been writing massive log files for weeks. If only someone had been watching.

Now imagine the alternative: three weeks ago, your monitoring system noticed disk usage crossing 80%. It sent an alert. You investigated, found the runaway log, set up rotation, and went back to sleep. No outage. No 2am phone call.

Monitoring is not optional for any system that matters. Without it, you are flying blind. With it, you can detect problems before they become outages, understand performance trends, and make informed capacity decisions.

This chapter builds the most popular open-source monitoring stack from the ground up: Prometheus for metrics collection, Grafana for visualization, node_exporter for system metrics, and Alertmanager for sending alerts.


Try This Right Now

Even without Prometheus, you can see what system metrics look like. Run these commands and think about which ones you would want to track over time:

# CPU load averages
$ cat /proc/loadavg
0.35 0.42 0.38 2/487 12345

# Memory usage
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7964        2134        1204         125        4625        5425
Swap:          2048          12        2036

# Disk usage
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   28G   20G  59% /

# Network connections
$ ss -s
Total: 487
TCP:   42 (estab 12, closed 8, orphaned 0, timewait 6)

If you could see these numbers on a graph, updated every 15 seconds, over the last 30 days -- that is what this chapter builds.
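
You can fake the "over time" part today with nothing but cron and a text file. This sketch appends one sample per run to a CSV (the filename and column choices are arbitrary):

```shell
#!/bin/sh
# Append one timestamped sample of load, memory, and disk usage to a CSV.
# Run it from cron every minute and you have a crude metrics history.
OUT="${1:-metrics.csv}"

# Write the header only on the first run
[ -f "$OUT" ] || echo "timestamp,load1,mem_used_mb,disk_used_pct" > "$OUT"

ts=$(date +%s)
load1=$(cut -d' ' -f1 /proc/loadavg)
mem_used=$(free -m | awk '/^Mem:/ {print $3}')
disk_pct=$(df -P / | awk 'NR==2 {sub("%","",$5); print $5}')

echo "$ts,$load1,$mem_used,$disk_pct" >> "$OUT"
tail -n 1 "$OUT"
```

A real monitoring system is this idea industrialized: scraping on a schedule, storing the history compactly, and making it queryable.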


Monitoring Philosophy: The Three Pillars

Modern observability rests on three pillars:

┌──────────────────────────────────────────────────────────────┐
│                  THE THREE PILLARS                            │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   METRICS   │  │    LOGS     │  │   TRACES    │          │
│  │             │  │             │  │             │          │
│  │ Numbers     │  │ Events      │  │ Request     │          │
│  │ over time   │  │ with        │  │ paths       │          │
│  │             │  │ context     │  │ through     │          │
│  │ CPU: 45%    │  │ "Error:     │  │ services    │          │
│  │ RAM: 62%    │  │  connection │  │             │          │
│  │ Req/s: 150  │  │  refused"   │  │ A → B → C  │          │
│  │ Latency: 8ms│  │             │  │ (200ms)     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│                                                              │
│  This chapter focuses on METRICS (Prometheus + Grafana).     │
│  Logs were covered in Chapter 17.                            │
│  Traces are used in microservice architectures.              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Metrics

Metrics are numeric measurements collected at regular intervals: CPU usage, memory consumption, request rate, error count, response time. They are cheap to store, fast to query, and excellent for dashboards and alerts.

Logs

Logs are timestamped records of discrete events: "User john logged in," "Connection to database refused," "Payment processed successfully." They provide context but are expensive to store and search at scale.

Traces

Traces follow a single request as it moves through multiple services. They show where time is spent and where failures occur. Essential for microservices, less critical for single-server setups.


Prometheus Architecture

Prometheus is the de facto standard for open-source metrics monitoring. It was created at SoundCloud, inspired by Google's internal Borgmon system, and is now a graduated CNCF project.

┌──────────────────────────────────────────────────────────────┐
│                 PROMETHEUS ARCHITECTURE                       │
│                                                              │
│  ┌──────────────┐                                            │
│  │ Prometheus   │◄── scrapes ─── node_exporter (port 9100)   │
│  │ Server       │◄── scrapes ─── nginx_exporter (port 9113)  │
│  │              │◄── scrapes ─── app /metrics endpoint       │
│  │ ┌──────────┐ │                                            │
│  │ │  TSDB    │ │    Time Series Database stores all metrics │
│  │ │ (local)  │ │                                            │
│  │ └──────────┘ │                                            │
│  │ ┌──────────┐ │                                            │
│  │ │ PromQL   │ │    Query language for metrics              │
│  │ │ Engine   │ │                                            │
│  │ └──────────┘ │                                            │
│  │ ┌──────────┐ │                                            │
│  │ │ Alert    │ │    Evaluates alert rules                   │
│  │ │ Rules    │ │                                            │
│  │ └──────────┘ │                                            │
│  └──────┬───────┘                                            │
│         │                                                    │
│         ▼                                                    │
│  ┌──────────────┐         ┌──────────────┐                   │
│  │ Alertmanager │────────►│ Notifications│                   │
│  │              │         │ Email, Slack │                   │
│  └──────────────┘         └──────────────┘                   │
│         ▲                                                    │
│  ┌──────────────┐                                            │
│  │   Grafana    │  Queries Prometheus for dashboard data     │
│  │              │                                            │
│  └──────────────┘                                            │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Key Design Decisions

Pull-based scraping: Prometheus pulls metrics from targets at regular intervals (15 seconds is the common choice, and the one our config uses; Prometheus's own default is 1 minute). Targets expose metrics on HTTP endpoints. This is fundamentally different from push-based systems where agents send metrics to a central server.

Local time-series database (TSDB): Prometheus stores data on local disk in an efficient, compressed format. No external database needed.

Multi-dimensional data model: Every metric has a name and a set of key-value labels:

http_requests_total{method="GET", handler="/api/users", status="200"} 14523
http_requests_total{method="POST", handler="/api/users", status="201"} 892
http_requests_total{method="GET", handler="/api/users", status="500"} 3
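
Labels are what make this model powerful: you can slice and aggregate along any dimension. Two PromQL examples over the series above (the =~ operator is a regex match):

```promql
# Requests per status code, summed across methods
sum by (status) (http_requests_total{handler="/api/users"})

# Fraction of requests to this handler that returned a 5xx error
sum(http_requests_total{handler="/api/users", status=~"5.."})
  / sum(http_requests_total{handler="/api/users"})
```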

Think About It: Why would a pull-based model be preferred over push-based for monitoring? Think about what happens when a target goes down -- how does each model detect it?


Installing node_exporter

node_exporter exposes Linux system metrics in Prometheus format. It is the first thing to install on any server you want to monitor.

Download and Install

# Download node_exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

# Extract
$ tar xzf node_exporter-1.7.0.linux-amd64.tar.gz

# Move binary
$ sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Create a system user
$ sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter

Create a systemd Service

$ sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now node_exporter

Verify It Works

$ curl -s http://localhost:9100/metrics | head -20
# HELP go_gc_duration_seconds A summary of pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.5717e-05
...
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78
node_cpu_seconds_total{cpu="0",mode="system"} 4567.89
node_cpu_seconds_total{cpu="0",mode="user"} 7890.12

Those lines are Prometheus metrics. Every metric has a name, optional labels, and a numeric value. node_exporter exposes hundreds of metrics covering CPU, memory, disk, network, filesystem, and more.
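
Because the format is plain text, you can pick metrics out with ordinary shell tools. The sample below is pasted into a heredoc so it runs anywhere; against a live exporter you would pipe in curl -s http://localhost:9100/metrics instead:

```shell
#!/bin/sh
# Extract the idle-CPU counter from exposition-format metrics.
# The heredoc stands in for: curl -s http://localhost:9100/metrics
awk '/^node_cpu_seconds_total/ && /mode="idle"/ {print "idle seconds:", $2}' << 'EOF'
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78
node_cpu_seconds_total{cpu="0",mode="system"} 4567.89
node_cpu_seconds_total{cpu="0",mode="user"} 7890.12
EOF
# prints: idle seconds: 123456.78
```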


Installing Prometheus

Download and Install

# Download Prometheus
$ wget https://github.com/prometheus/prometheus/releases/download/v2.49.0/prometheus-2.49.0.linux-amd64.tar.gz

# Extract
$ tar xzf prometheus-2.49.0.linux-amd64.tar.gz

# Move binaries
$ sudo mv prometheus-2.49.0.linux-amd64/prometheus /usr/local/bin/
$ sudo mv prometheus-2.49.0.linux-amd64/promtool /usr/local/bin/

# Create directories
$ sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Copy console libraries
$ sudo mv prometheus-2.49.0.linux-amd64/consoles /etc/prometheus/
$ sudo mv prometheus-2.49.0.linux-amd64/console_libraries /etc/prometheus/

# Create system user
$ sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

Configure Prometheus

$ sudo tee /etc/prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s          # How often to scrape targets
  evaluation_interval: 15s      # How often to evaluate alert rules
  scrape_timeout: 10s           # Timeout for each scrape

scrape_configs:
  - job_name: 'prometheus'      # Monitor Prometheus itself
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'            # Monitor system via node_exporter
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'my-server'
          environment: 'production'
EOF

$ sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

Create a systemd Service

$ sudo tee /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus Monitoring System
After=network-online.target
Wants=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --storage.tsdb.retention.time=30d \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now prometheus

Verify Prometheus Is Running

$ curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool

You should see your targets listed with "health": "up".

Open http://localhost:9090 in a browser to access the Prometheus web UI.


PromQL Basics

PromQL (Prometheus Query Language) is how you ask questions about your metrics. You will use it in Prometheus's web UI and in Grafana dashboards.

Simple Queries

# Current CPU usage across all modes
node_cpu_seconds_total

# Filter by label: only idle CPU time
node_cpu_seconds_total{mode="idle"}

# Current memory available in bytes
node_memory_MemAvailable_bytes

# Filesystem usage
node_filesystem_avail_bytes{mountpoint="/"}

Rate and Aggregation

# CPU usage rate over last 5 minutes (per second)
rate(node_cpu_seconds_total{mode="idle"}[5m])

# Total CPU utilization percentage
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Disk I/O rate (bytes per second read)
rate(node_disk_read_bytes_total[5m])

# Network traffic rate (bytes per second received)
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# Memory usage percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
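
rate() can feel abstract, so here is the same computation done by hand: sample the kernel's cumulative CPU counters twice and turn the deltas into a busy percentage -- what the total-utilization query above expresses, just over 1 second instead of 5 minutes.

```shell
#!/bin/sh
# Manual version of: 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
# Counts idle+iowait as "idle"; field layout per the proc(5) man page.
read_cpu() {
    awk '/^cpu / {idle=$5+$6; total=0; for (i=2; i<=9; i++) total+=$i; print idle, total}' /proc/stat
}

set -- $(read_cpu); idle1=$1; total1=$2
sleep 1
set -- $(read_cpu); idle2=$1; total2=$2

awk -v di="$((idle2 - idle1))" -v dt="$((total2 - total1))" \
    'BEGIN { if (dt <= 0) dt = 1; printf "CPU busy: %.1f%%\n", 100 * (1 - di / dt) }'
```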

Try It in Prometheus UI

Navigate to http://localhost:9090/graph, type a query, and click "Execute." Switch to the "Graph" tab to see the metric over time.

# Good first query -- see memory usage as a percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

Installing Grafana

Grafana provides beautiful, interactive dashboards for your metrics.

On Debian/Ubuntu

$ sudo apt-get install -y apt-transport-https software-properties-common
$ sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
$ echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" \
    | sudo tee /etc/apt/sources.list.d/grafana.list
$ sudo apt-get update
$ sudo apt-get install -y grafana
$ sudo systemctl enable --now grafana-server

On RHEL/Fedora

$ sudo tee /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
EOF
$ sudo dnf install -y grafana
$ sudo systemctl enable --now grafana-server

Grafana runs on port 3000. Default credentials: admin / admin (you will be forced to change the password on first login).


Hands-On: Connect Grafana to Prometheus

Step 1: Open Grafana at http://localhost:3000 and log in.

Step 2: Add Prometheus as a data source:

  • Navigate to Configuration (gear icon) > Data Sources > Add data source
  • Select "Prometheus"
  • Set URL to http://localhost:9090
  • Click "Save & Test" -- you should see "Data source is working"

Step 3: Import a pre-built dashboard for node_exporter:

  • Navigate to Dashboards > Import
  • Enter dashboard ID: 1860 (Node Exporter Full)
  • Select your Prometheus data source
  • Click "Import"

You should immediately see a dashboard with CPU usage, memory, disk I/O, network traffic, and dozens of other system metrics, with data going back to when you started Prometheus.

Step 4: Create a custom dashboard panel:

  • Click "+ New dashboard" > "Add visualization"
  • Select your Prometheus data source
  • In the query field, enter:
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
  • Set the panel title to "Memory Usage %"
  • Under Standard options, set Unit to "Percent (0-100)"
  • Click "Apply"

You now have a live memory usage graph.

Think About It: You want to monitor 50 servers with Prometheus and Grafana. Would you install Prometheus on each server, or have one central Prometheus instance scrape all 50? What are the trade-offs?


Alertmanager: Sending Alerts

Monitoring without alerting means someone has to watch dashboards all day. Alertmanager sends notifications when metrics cross thresholds.

Install Alertmanager

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
$ tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
$ sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
$ sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
$ sudo mkdir -p /etc/alertmanager
$ sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager

Configure Alertmanager

$ sudo tee /etc/alertmanager/alertmanager.yml << 'EOF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s           # Wait before sending first notification
  group_interval: 10m       # Wait between notifications for same group
  repeat_interval: 3h       # Re-send if alert still firing
  receiver: 'default'

  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://localhost:5001/alert'   # Example webhook endpoint

  - name: 'critical-alerts'
    webhook_configs:
      - url: 'http://localhost:5001/critical'
EOF

For email notifications, configure the email_configs receiver:

receivers:
  - name: 'email-alerts'
    email_configs:
      - to: 'ops-team@example.com'
        from: 'prometheus@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'prometheus@example.com'
        auth_password: 'smtp-password'

Create a systemd Service for Alertmanager

$ sudo tee /etc/systemd/system/alertmanager.service << 'EOF'
[Unit]
Description=Prometheus Alertmanager
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/

[Install]
WantedBy=multi-user.target
EOF

$ sudo mkdir -p /var/lib/alertmanager
$ sudo chown alertmanager:alertmanager /var/lib/alertmanager
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now alertmanager

Defining Alert Rules

Alert rules are defined in Prometheus, not Alertmanager. Prometheus evaluates rules and sends firing alerts to Alertmanager.

Create Alert Rules

$ sudo tee /etc/prometheus/alert_rules.yml << 'EOF'
groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes (current: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85% (current: {{ $value }}%)"

      - alert: DiskSpaceRunningLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space running low"
          description: "Root filesystem has less than 15% free space (current: {{ $value }}%)"

      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been unreachable for 1 minute"

      - alert: HighNetworkTraffic
        expr: rate(node_network_receive_bytes_total{device="eth0"}[5m]) > 100000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High inbound network traffic"
          description: "Network receive rate exceeds 100MB/s for 5 minutes"
EOF

Update Prometheus Configuration

Add the alert rules file and Alertmanager connection to prometheus.yml:

$ sudo tee /etc/prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
EOF

$ sudo chown prometheus:prometheus /etc/prometheus/alert_rules.yml
$ sudo systemctl restart prometheus

Verify Alert Rules

# Check rule syntax
$ promtool check rules /etc/prometheus/alert_rules.yml
Checking /etc/prometheus/alert_rules.yml
  SUCCESS: 5 rules found

Visit http://localhost:9090/alerts to see alert states: inactive (green), pending (yellow), or firing (red).


The Complete Stack: How It All Fits Together

┌──────────────────────────────────────────────────────────────┐
│              COMPLETE MONITORING STACK                        │
│                                                              │
│  YOUR SERVERS                                                │
│  ┌──────────────────────┐   ┌──────────────────────┐         │
│  │ Server 1             │   │ Server 2             │         │
│  │ ┌──────────────────┐ │   │ ┌──────────────────┐ │         │
│  │ │  node_exporter   │ │   │ │  node_exporter   │ │         │
│  │ │  :9100           │ │   │ │  :9100           │ │         │
│  │ └──────────────────┘ │   │ └──────────────────┘ │         │
│  └──────────┬───────────┘   └──────────┬───────────┘         │
│             │    scrape every 15s       │                     │
│             ▼                           ▼                     │
│  ┌──────────────────────────────────────────────────┐        │
│  │            Prometheus :9090                       │        │
│  │  • Scrapes targets                                │        │
│  │  • Stores metrics in TSDB                         │        │
│  │  • Evaluates alert rules                          │        │
│  │  • Serves PromQL queries                          │        │
│  └────────┬─────────────────────┬────────────────────┘        │
│           │                     │                             │
│           ▼                     ▼                             │
│  ┌────────────────┐    ┌────────────────┐                    │
│  │ Alertmanager   │    │   Grafana      │                    │
│  │ :9093          │    │   :3000        │                    │
│  │                │    │                │                    │
│  │ Routes alerts  │    │ Dashboards     │                    │
│  │ to receivers   │    │ from PromQL    │                    │
│  └───────┬────────┘    └────────────────┘                    │
│          │                                                   │
│          ▼                                                   │
│  Email, Slack, Webhook, PagerDuty, etc.                      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Debug This

Your monitoring stack is set up, but Prometheus shows the node target as "DOWN" even though node_exporter is running. The error message says "connection refused."

$ sudo systemctl status node_exporter
● node_exporter.service - Prometheus Node Exporter
     Active: active (running)

$ curl http://localhost:9100/metrics
curl: (7) Failed to connect to localhost port 9100: Connection refused

What is going on?

Diagnosis steps:

# Check what port node_exporter is actually listening on
$ ss -tlnp | grep node_exporter

# Check if a firewall is blocking the port
$ sudo iptables -L -n | grep 9100
$ sudo ufw status

# Check if node_exporter is bound to a specific interface
$ journalctl -u node_exporter -n 20

Common causes:

  1. node_exporter is listening on a different interface (e.g., 127.0.0.1 vs 0.0.0.0)
  2. A firewall is blocking port 9100
  3. Another service is using port 9100, and node_exporter silently failed to bind
  4. SELinux or AppArmor is blocking the connection
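
If the culprit is cause 1, the bind address is controlled by node_exporter's --web.listen-address flag. A clean fix is a systemd drop-in (create it with sudo systemctl edit node_exporter, then restart the service) -- the path below is where systemctl edit writes it:

```ini
# /etc/systemd/system/node_exporter.service.d/override.conf
[Service]
# The empty ExecStart= clears the original before replacing it
ExecStart=
ExecStart=/usr/local/bin/node_exporter --web.listen-address=0.0.0.0:9100
```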

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 70 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Monitoring = metrics + logs + traces                        │
│                                                              │
│  The Prometheus Stack:                                       │
│  • node_exporter: exposes system metrics on :9100            │
│  • Prometheus: scrapes targets, stores TSDB, evaluates rules │
│  • Grafana: visualizes metrics in dashboards                 │
│  • Alertmanager: routes alerts to email, Slack, etc.         │
│                                                              │
│  PromQL essentials:                                          │
│  • rate() for per-second rates of counters                   │
│  • avg(), sum(), max() for aggregation                       │
│  • Label filters: metric{label="value"}                      │
│                                                              │
│  Alert rules:                                                │
│  • Defined in Prometheus config                              │
│  • expr: PromQL expression that triggers the alert           │
│  • for: how long the condition must be true                  │
│  • Routed by Alertmanager to notification channels           │
│                                                              │
│  Key metrics to monitor:                                     │
│  • CPU usage, memory usage, disk space, disk I/O             │
│  • Network traffic, service availability (up/down)           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: PromQL Practice

With Prometheus running, write PromQL queries for:

  1. Total number of CPU cores
  2. Current disk I/O write rate for all devices
  3. Total network bytes transmitted in the last hour
  4. System uptime in days
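If you get stuck, here are possible answers for the first and last queries, assuming the default node_exporter metric names; the disk and network queries follow the same pattern using node_disk_written_bytes_total and node_network_transmit_bytes_total:

```promql
# 1. Total CPU cores: node_cpu_seconds_total has one "idle" series per core
count(node_cpu_seconds_total{mode="idle"})

# 4. Uptime in days: now minus boot time, converted from seconds
(time() - node_boot_time_seconds) / 86400
```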

Exercise 2: Custom Dashboard

Create a Grafana dashboard with four panels:

  1. CPU usage over time (line graph)
  2. Memory usage gauge (current %)
  3. Disk space for all mount points (bar chart)
  4. Network traffic in/out (dual line graph)

Exercise 3: Alert Testing

Create an alert rule that fires when filesystem usage exceeds 50% (a threshold low enough to test easily). Verify it appears in the Prometheus alerts page. Try to trigger it by creating a large temporary file.
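One possible shape for such a rule, assuming the default node_exporter filesystem metrics (adjust the fstype filter for your system):

```yaml
groups:
  - name: disk-alerts
    rules:
      - alert: FilesystemOverHalfFull
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} is over 50% full"
```

After adding the file to rule_files in prometheus.yml and reloading Prometheus, a large file created with fallocate -l 5G /tmp/bigfile should walk the alert from inactive to pending (while the for: timer runs) to firing.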

Exercise 4: Monitor a Service

Install nginx and configure Prometheus to scrape nginx metrics. You will need the nginx-prometheus-exporter. Set up an alert for when nginx is down.
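A sketch of the Prometheus side, assuming nginx-prometheus-exporter is listening on its default port 9113 (verify against the exporter's documentation):

```yaml
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: ["localhost:9113"]
```

The "nginx is down" alert can then be as simple as expr: up{job="nginx"} == 0, since Prometheus sets the up metric to 0 whenever a scrape fails.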

Bonus Challenge

Set up monitoring for multiple machines. Install node_exporter on a second machine (or VM or container), add it to your Prometheus config, and create a Grafana dashboard that compares CPU and memory usage across both machines side by side.
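The Prometheus side of this challenge is just more targets under one job; the hostnames here are placeholders for your own machines:

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["machine1:9100", "machine2:9100"]
```

Each target gets its own instance label, so a query like avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) gives Grafana one series per machine to compare side by side.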

Enterprise Linux

Why This Matters

A hospital runs a patient records system on Linux. If that system goes down for even an hour, patients could be harmed. A bank processes millions of transactions daily on Linux servers. If a kernel update introduces a bug, the financial consequences are staggering. An aerospace company runs simulations on Linux clusters certified for safety-critical work. They cannot just apt upgrade and hope for the best.

These organizations do not pick a Linux distribution the way a hobbyist does. They need guaranteed security patches for a decade. They need vendors who answer the phone at 3am. They need certified hardware compatibility and compliance with regulations like HIPAA, PCI-DSS, and SOC 2.

This is the world of enterprise Linux. It is not glamorous, but it is where Linux makes the most money and runs the most critical workloads. Understanding it is essential if you want a career in Linux systems administration.


Try This Right Now

Regardless of which distribution you are running, you can see some enterprise-relevant information about your system:

# What distribution are you running?
$ cat /etc/os-release

# How long has your system been up?
$ uptime

# What kernel are you running?
$ uname -r

# When does your distribution reach end of life?
# (Check your distro's documentation for the exact date)
$ grep -iE "support|eol|version" /etc/os-release

# Is your system receiving security updates?
# Debian/Ubuntu:
$ apt list --upgradable 2>/dev/null | head -10

# RHEL/Fedora/Alma/Rocky:
$ dnf check-update --security 2>/dev/null | head -10

If you are on a system that was installed and never updated, you might be surprised by the number of pending security patches.
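Because /etc/os-release is a shell-sourceable key=value file, scripts can read it directly instead of grepping. A minimal sketch (SUPPORT_END is an optional field defined by os-release(5); many distributions do not set it):

```shell
#!/bin/sh
# Source the os-release file and report distro identity and EOL info.
. /etc/os-release
echo "Distro: $NAME ${VERSION_ID:-rolling}"
echo "EOL:    ${SUPPORT_END:-not declared in os-release}"
```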


What "Enterprise Linux" Means

Enterprise Linux is not a specific distribution. It is a category defined by these characteristics:

┌──────────────────────────────────────────────────────────────┐
│              ENTERPRISE vs COMMUNITY LINUX                    │
│                                                              │
│  COMMUNITY DISTROS              ENTERPRISE DISTROS           │
│  ────────────────               ─────────────────            │
│  Fedora, Debian Sid             RHEL, SLES, Ubuntu LTS       │
│  Latest packages                Stable, tested packages      │
│  Short support (1-2 yrs)        Long support (5-10+ yrs)     │
│  Community support only         Vendor support contracts      │
│  Frequent major upgrades        Predictable release cycles   │
│  Bleeding edge                  Conservative, proven          │
│  Free                           Free or paid subscriptions   │
│                                                              │
│  Great for: learning,           Great for: production,       │
│  development, personal use      regulated industries,        │
│                                 mission-critical systems     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The Pillars of Enterprise Linux

Long-term support: Security patches and bug fixes for 5 to 10+ years without requiring a major version upgrade.

Vendor support: A company you can call when things break. SLAs (Service Level Agreements) that guarantee response times.

Certification: Hardware vendors certify their equipment works with specific enterprise Linux versions. Software vendors (Oracle, SAP, etc.) certify their applications run on specific distributions.

Compliance: Documentation, audit trails, and security configurations that meet regulatory requirements.

Predictability: Known release schedules, clear upgrade paths, backported security fixes that do not change behavior.


The RHEL Ecosystem

Red Hat Enterprise Linux (RHEL) is the most widely deployed enterprise Linux distribution. Understanding its ecosystem is essential.

RHEL (Red Hat Enterprise Linux)

  • Vendor: Red Hat (owned by IBM since 2019)
  • Cost: Subscription-based (free for development, paid for production support)
  • Support lifecycle: 10 years (Full Support: 5 years, Maintenance Support: 5 years, Extended Life: additional)
  • Use case: Production servers, enterprise applications, SAP, Oracle

┌──────────────────────────────────────────────────────────────┐
│                        RHEL LIFECYCLE                        │
│                                                              │
│  Year:  0    1    2    3    4    5    6    7    8    9   10  │
│         │    │    │    │    │    │    │    │    │    │    │  │
│         ├────────────────────────┤                           │
│         │   Full Support         │                           │
│         │   (new features,       │                           │
│         │   bug fixes, security) │                           │
│         │                        ├────────────────────────┤  │
│         │                        │  Maintenance Support   │  │
│         │                        │  (security fixes,      │  │
│         │                        │  critical bug fixes)   │  │
│                                                              │
│  RHEL 8: released 2019, full support until 2024,             │
│          maintenance until 2029, ELS until 2031              │
│  RHEL 9: released 2022, full support until 2027,             │
│          maintenance until 2032                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

CentOS Stream

After Red Hat discontinued traditional CentOS (the free RHEL clone) in 2021, CentOS Stream became the upstream development branch for RHEL. It sits between Fedora and RHEL:

Fedora ──► CentOS Stream ──► RHEL
(bleeding    (near-RHEL,      (stable,
 edge)       rolling preview)  supported)

CentOS Stream receives updates before RHEL, making it a preview of the next RHEL minor release. It is suitable for development and testing but is not a direct RHEL replacement for production use.

AlmaLinux

  • Vendor: AlmaLinux OS Foundation (community-governed)
  • Cost: Free
  • Compatibility: 1:1 binary compatible with RHEL
  • Support lifecycle: Matches RHEL lifecycle

AlmaLinux emerged after the CentOS discontinuation as a free, community-owned RHEL clone. It is built from RHEL source packages and aims for bug-for-bug compatibility.

# Converting CentOS 8 to AlmaLinux
$ curl -O https://raw.githubusercontent.com/AlmaLinux/almalinux-deploy/master/almalinux-deploy.sh
$ sudo bash almalinux-deploy.sh

Rocky Linux

  • Vendor: Rocky Enterprise Software Foundation
  • Cost: Free
  • Compatibility: 1:1 binary compatible with RHEL
  • Founded by: Gregory Kurtzer (CentOS co-founder)

Rocky Linux has the same goal as AlmaLinux: a free, community-owned RHEL rebuild. The two distributions are nearly identical in practice; the choice often comes down to community preference.

Think About It: If AlmaLinux and Rocky Linux are both free RHEL clones, why would any organization pay for RHEL? What do they get for the subscription cost?


The SUSE Ecosystem

SUSE Linux Enterprise Server (SLES)

  • Vendor: SUSE
  • Cost: Subscription-based
  • Support lifecycle: 10 years (General Support) + 3 years (Long Term Service Pack Support)
  • Strength: Strong in European markets, SAP deployments, mainframes

SLES uses zypper as its package manager and RPM packages, similar to RHEL but with different configuration tools (YaST).

openSUSE

openSUSE is the community counterpart to SLES, available in two flavors:

  • openSUSE Leap: Shares core packages with SLES, predictable releases
  • openSUSE Tumbleweed: Rolling release, bleeding edge

openSUSE Tumbleweed ──► openSUSE Leap ──► SLES
(rolling release)       (stable,          (enterprise,
                        SLES-based)       supported)

Ubuntu in the Enterprise

Ubuntu LTS

  • Vendor: Canonical
  • Cost: Free (Ubuntu Pro adds extended support for free on up to 5 machines, paid at scale)
  • LTS Support: 5 years standard, 10 years with Ubuntu Pro (ESM)
  • Release cycle: New LTS every 2 years (April of even years: 22.04, 24.04, 26.04)

Ubuntu dominates the cloud. The majority of cloud instances run Ubuntu, and most cloud providers offer Ubuntu as a first-class option.

Ubuntu Pro

Ubuntu Pro extends security updates from 5 to 10 years and covers the full software universe (not just main). It also includes:

  • FIPS 140-2 certified crypto modules
  • CIS hardening tools
  • Kernel livepatch (apply kernel security fixes without rebooting)

# Check Ubuntu Pro status
$ pro status

# Attach to Ubuntu Pro (free for up to 5 machines)
$ sudo pro attach <token>

Distro Note: When choosing between RHEL-family and Ubuntu for enterprise use, consider your vendor ecosystem. SAP and Oracle have traditionally certified on RHEL/SLES. Cloud-native workloads often favor Ubuntu. Both are excellent choices; let your application requirements guide the decision.


Support Contracts and Lifecycle

What a Support Contract Gets You

┌──────────────────────────────────────────────────────────────┐
│                 ENTERPRISE SUPPORT TIERS                      │
│                                                              │
│  SELF-SUPPORT (free)                                         │
│  • Community forums                                          │
│  • Documentation                                             │
│  • Best-effort community patches                             │
│                                                              │
│  STANDARD SUPPORT ($$)                                       │
│  • Business-hours support                                    │
│  • Response time SLA (e.g., 4 hours for critical)            │
│  • Access to knowledge base                                  │
│  • Software updates and patches                              │
│                                                              │
│  PREMIUM SUPPORT ($$$)                                       │
│  • 24/7 support                                              │
│  • 1-hour response for critical issues                       │
│  • Dedicated technical account manager                       │
│  • Proactive monitoring and recommendations                  │
│  • Certified software/hardware compatibility                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Understanding Software Lifecycle

Every enterprise distribution publishes a lifecycle policy. Knowing where your OS falls in its lifecycle determines your patching and migration strategy.

# Check RHEL lifecycle status
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)

# Check which repositories are enabled
$ dnf repolist

# Check for available security updates
$ dnf updateinfo summary

# Check Ubuntu lifecycle status (newer releases)
$ pro security-status
# On older releases:
$ ubuntu-support-status
$ hwe-support-status --verbose

Compliance and Certification

Enterprise environments often must meet specific compliance standards.

Common Compliance Frameworks

  • PCI-DSS: Payment card industry -- requires specific security controls, logging, access management
  • HIPAA: Healthcare -- protects patient data, requires audit trails
  • SOC 2: General data security -- proves your organization handles data responsibly
  • FedRAMP: US government cloud security
  • FIPS 140-2/140-3: Cryptographic module validation

How Enterprise Linux Helps

Enterprise distributions provide:

# RHEL: SCAP security profiles
$ sudo dnf install -y scap-security-guide
$ oscap info /usr/share/xml/scap/ssg/content/ssg-rhel9-ds.xml

# Apply a CIS benchmark profile
$ sudo oscap xccdf eval \
    --profile cis \
    --report /tmp/compliance-report.html \
    /usr/share/xml/scap/ssg/content/ssg-rhel9-ds.xml

# Ubuntu: CIS hardening with Ubuntu Pro
$ sudo apt install -y ubuntu-advantage-tools
$ sudo pro enable usg    # replaces "ua enable cis" on older tooling

These tools automate the hundreds of individual security settings required by compliance frameworks.


Change Management and Patch Management

In enterprise environments, you never just run apt upgrade on a Friday afternoon. Changes follow a structured process.

The Change Management Process

┌──────────────────────────────────────────────────────────────┐
│                CHANGE MANAGEMENT WORKFLOW                     │
│                                                              │
│  1. REQUEST                                                  │
│     "Kernel CVE-2024-1234 needs patching"                    │
│                    │                                         │
│                    ▼                                         │
│  2. ASSESS                                                   │
│     Risk analysis, impact assessment, rollback plan          │
│                    │                                         │
│                    ▼                                         │
│  3. APPROVE                                                  │
│     Change Advisory Board (CAB) reviews and approves         │
│                    │                                         │
│                    ▼                                         │
│  4. TEST                                                     │
│     Apply to dev → staging → pre-production                  │
│                    │                                         │
│                    ▼                                         │
│  5. IMPLEMENT                                                │
│     Apply to production during maintenance window            │
│                    │                                         │
│                    ▼                                         │
│  6. VERIFY                                                   │
│     Confirm systems are healthy, services are running        │
│                    │                                         │
│                    ▼                                         │
│  7. DOCUMENT                                                 │
│     Record what changed, when, and by whom                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Practical Patch Management

# RHEL/AlmaLinux/Rocky: check for security updates
$ sudo dnf check-update --security

# Apply only security updates
$ sudo dnf update --security

# See what vulnerabilities are fixed
$ sudo dnf updateinfo list --security

# Ubuntu: check for security updates
$ apt list --upgradable 2>/dev/null | grep -i security

# Apply only security updates
$ sudo unattended-upgrade --dry-run  # Preview
$ sudo unattended-upgrade            # Apply
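Which updates unattended-upgrade is allowed to apply is controlled by /etc/apt/apt.conf.d/50unattended-upgrades. The security-only default looks roughly like this:

```
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    // "${distro_id}:${distro_codename}-updates";
};
```

Leaving the -updates origin commented out keeps the tool applying security fixes only, which matches the change-management posture described above.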

Kernel Live Patching

Enterprise distributions offer live kernel patching -- applying critical security fixes without rebooting:

# RHEL: kpatch
$ sudo dnf install -y kpatch
$ sudo kpatch list

# Ubuntu: Canonical Livepatch
$ sudo canonical-livepatch status

This is critical for systems with strict uptime requirements (databases, financial systems, healthcare).


Enterprise Considerations

High Availability (HA)

Enterprise systems need to keep running even when hardware fails. Linux HA clusters use:

  • Pacemaker: Cluster resource manager
  • Corosync: Cluster communication and membership
  • DRBD: Distributed Replicated Block Device (network RAID)

┌──────────────────────────────────────────────────────────────┐
│                   HA CLUSTER ARCHITECTURE                    │
│                                                              │
│  ┌─────────────────┐         ┌─────────────────┐             │
│  │   Node 1        │◄───────►│   Node 2        │             │
│  │   (Active)      │ Corosync│   (Standby)     │             │
│  │                 │heartbeat│                 │             │
│  │ ┌─────────────┐ │         │ ┌─────────────┐ │             │
│  │ │ PostgreSQL  │ │         │ │ PostgreSQL  │ │             │
│  │ │ (running)   │ │         │ │ (standby)   │ │             │
│  │ └─────────────┘ │         │ └─────────────┘ │             │
│  │ ┌─────────────┐ │  DRBD   │ ┌─────────────┐ │             │
│  │ │ /data       │ │◄───────►│ │ /data       │ │             │
│  │ │ (primary)   │ │ repl.   │ │ (secondary) │ │             │
│  │ └─────────────┘ │         │ └─────────────┘ │             │
│  │ ┌─────────────┐ │         │                 │             │
│  │ │ Virtual IP  │ │         │  Pacemaker will │             │
│  │ │ 10.0.0.100  │ │         │  move resources │             │
│  │ └─────────────┘ │         │  here if Node 1 │             │
│  └─────────────────┘         │  fails          │             │
│                              └─────────────────┘             │
│                                                              │
└──────────────────────────────────────────────────────────────┘
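For a concrete feel, here is a sketch of the Corosync side of such a two-node cluster in /etc/corosync/corosync.conf; the cluster name, node names, and addresses are illustrative:

```
totem {
    version: 2
    cluster_name: pgcluster
    transport: knet
}

nodelist {
    node {
        name: node1
        nodeid: 1
        ring0_addr: 10.0.0.11
    }
    node {
        name: node2
        nodeid: 2
        ring0_addr: 10.0.0.12
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

The two_node: 1 setting matters: with only two votes, a strict majority quorum would fail the moment either node died, so Corosync relaxes the quorum rule for two-node clusters.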

Shared Storage

Enterprise environments use shared storage for clusters and large-scale deployments:

  • SAN (Storage Area Network): Fibre Channel or iSCSI block storage
  • NFS: Network file shares (covered in Chapter 49)
  • GFS2/OCFS2: Cluster filesystems for shared block devices
  • Ceph: Distributed storage system providing block, object, and file storage

Hardware Compatibility

Enterprise distributions maintain Hardware Compatibility Lists (HCLs). Before deploying RHEL on a server, check that the specific model is certified:

# RHEL: Check hardware certification
# Visit https://catalog.redhat.com/hardware

# Check if loaded drivers are supported
$ lspci -k | grep "Kernel driver"

Hands-On: Enterprise System Health Check

Run this checklist on any system to evaluate its enterprise readiness:

# 1. Check OS and support status
echo "=== OS Information ==="
cat /etc/os-release

# 2. Check kernel version
echo "=== Kernel ==="
uname -r

# 3. Check uptime
echo "=== Uptime ==="
uptime

# 4. Check pending updates
echo "=== Pending Updates ==="
if command -v dnf &>/dev/null; then
    dnf check-update --security 2>/dev/null | tail -5
elif command -v apt &>/dev/null; then
    apt list --upgradable 2>/dev/null | wc -l
fi

# 5. Check if SELinux/AppArmor is enabled
echo "=== Security Framework ==="
if command -v getenforce &>/dev/null; then
    getenforce
elif command -v aa-status &>/dev/null; then
    sudo aa-status | head -3
fi

# 6. Check if firewall is active
echo "=== Firewall ==="
if command -v firewall-cmd &>/dev/null; then
    firewall-cmd --state
elif command -v ufw &>/dev/null; then
    ufw status
fi

# 7. Check NTP synchronization
echo "=== Time Sync ==="
timedatectl | grep "synchronized"

# 8. Check disk space
echo "=== Disk Space ==="
df -h / | tail -1

# 9. Check failed services
echo "=== Failed Services ==="
systemctl --failed --no-pager

Any system heading to production should pass all of these checks.

Think About It: Your company is choosing between AlmaLinux (free) and RHEL (paid subscription). The CFO asks why you would spend money on RHEL when AlmaLinux is identical. What arguments would you make for each option?


Debug This

A new employee set up a production RHEL server but made several enterprise-unfriendly decisions. Find the problems:

  1. SELinux is in permissive mode
  2. The firewall is disabled
  3. The server has not been updated in 6 months
  4. root login is enabled over SSH
  5. NTP is not configured
  6. No monitoring agent is installed
  7. The only user with sudo is the root account

Why each is a problem:

  1. SELinux permissive = it logs policy violations but does not enforce them. In production, it should be enforcing.
  2. Firewall disabled = all ports are open. Enterprise servers should only expose required services.
  3. 6 months without updates = known vulnerabilities are unpatched. In regulated industries, this is a compliance violation.
  4. root SSH login = anyone who obtains the root password can log in remotely. Use named accounts with sudo instead.
  5. No NTP = time drift causes log correlation failures, Kerberos authentication breaks, and certificate validation can fail.
  6. No monitoring = you will not know something is wrong until users report it.
  7. Only root has sudo = no accountability. All actions are attributed to "root" with no individual tracking.

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 71 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Enterprise Linux = long support, vendor backing,            │
│  compliance, predictability, certification.                  │
│                                                              │
│  RHEL ecosystem:                                             │
│  • RHEL: paid subscription, 10-year support                  │
│  • CentOS Stream: RHEL upstream preview                      │
│  • AlmaLinux, Rocky Linux: free RHEL clones                  │
│                                                              │
│  SUSE ecosystem: SLES (enterprise) + openSUSE (community)    │
│  Ubuntu: LTS releases + Ubuntu Pro for extended support      │
│                                                              │
│  Enterprise practices:                                       │
│  • Change management (request → test → approve → deploy)     │
│  • Patch management (security updates, live patching)        │
│  • Compliance (PCI-DSS, HIPAA, SOC 2, FIPS)                 │
│  • High availability (Pacemaker + Corosync)                  │
│  • Shared storage (SAN, Ceph, GFS2)                          │
│                                                              │
│  The choice of enterprise distro depends on your             │
│  application ecosystem, support needs, and budget.           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Lifecycle Research

Look up the exact end-of-life dates for:

  • RHEL 8 and RHEL 9
  • Ubuntu 22.04 LTS and Ubuntu 24.04 LTS
  • AlmaLinux 9

How many years of security updates does each provide?
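Once you have the dates, a little GNU date arithmetic turns them into something actionable. The EOL date below is only an example; substitute the real date you found:

```shell
#!/bin/sh
# Days remaining until a given end-of-life date (requires GNU date).
eol="2032-05-31"                      # example date -- substitute your own
now=$(date +%s)
end=$(date -d "$eol" +%s)
echo "Days until $eol: $(( (end - now) / 86400 ))"
```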

Exercise 2: Compliance Scan

If you are on a RHEL-family system, install scap-security-guide and run an OpenSCAP scan against the CIS benchmark. Review the report and identify which checks pass and which fail.

Exercise 3: Patch Audit

On your system, list all security updates applied in the last 30 days. On RHEL-family systems, use dnf history list and dnf history info <ID>. On Ubuntu, check /var/log/apt/history.log.

Exercise 4: HA Concepts

Draw a diagram of a two-node HA cluster for a PostgreSQL database. Include: Pacemaker, Corosync, a virtual IP, and a shared storage mechanism. Describe what happens when Node 1 fails.

Bonus Challenge

Set up a test environment with two AlmaLinux or Rocky Linux VMs. Install Pacemaker and Corosync, configure a virtual IP that floats between the two nodes, and test failover by stopping the cluster service on the active node.

Embedded Linux

Why This Matters

Look around you. Your Wi-Fi router runs Linux. Your smart TV runs Linux. The infotainment system in your car almost certainly runs Linux. Medical devices, industrial robots, drones, security cameras, smart home hubs, point-of-sale terminals, digital billboards -- Linux is everywhere, and most of it is not running on anything resembling a traditional server.

This is embedded Linux: Linux running on specialized hardware with limited resources, often without a screen or keyboard, performing a specific set of tasks reliably for years without human intervention.

Understanding embedded Linux matters for several reasons. If you work in IoT, automotive, or industrial automation, you will encounter it daily. Even if you are a server administrator, understanding how Linux works under extreme resource constraints deepens your knowledge of the operating system fundamentals. And if you own a Raspberry Pi, you are already running embedded Linux -- even if you did not know it.


Try This Right Now

If you have any Linux system, you can see how your current system compares to an embedded one:

# How much RAM do you have?
$ free -m | grep Mem
Mem:           7964        2134        1204         125        4625        5425

# An embedded device might have 64MB or even 16MB of RAM.

# How big is your root filesystem?
$ df -h / | tail -1
/dev/sda1        50G   28G   20G  59% /

# An embedded device might have 256MB of flash storage total.

# How many processes are running?
$ ps aux | wc -l
247

# An embedded device might run 15-20 processes.

# How big is your kernel?
$ ls -lh /boot/vmlinuz-$(uname -r)
-rw-r--r-- 1 root root 11M Jan 15 10:00 /boot/vmlinuz-6.1.0-18-amd64

# An embedded kernel can be stripped down to 1-2MB.

The difference is dramatic. Embedded Linux is about doing more with less.


What Is Embedded Linux?

An embedded system is a computer designed to perform a dedicated function within a larger system. Unlike a general-purpose computer, an embedded device runs a fixed set of software tailored to its specific purpose.

┌──────────────────────────────────────────────────────────────┐
│            GENERAL-PURPOSE vs EMBEDDED LINUX                  │
│                                                              │
│  GENERAL-PURPOSE (Desktop/Server)                            │
│  ─────────────────────────────────                           │
│  • Full distro (Ubuntu, Fedora)                              │
│  • Thousands of packages available                           │
│  • User installs and runs any software                       │
│  • 4-128 GB RAM, 100+ GB storage                             │
│  • Keyboard, display, network                                │
│  • Boots in 15-60 seconds                                    │
│  • Updated regularly by user                                 │
│                                                              │
│  EMBEDDED                                                    │
│  ────────                                                    │
│  • Minimal custom Linux build                                │
│  • Only needed software included                             │
│  • Runs a specific application                               │
│  • 16 MB - 1 GB RAM, 32 MB - 4 GB storage                   │
│  • Often headless (no display)                               │
│  • Boots in 1-5 seconds                                      │
│  • Updated via firmware OTA or manual flash                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Where Embedded Linux Runs

  • Networking: Routers, switches, firewalls, access points (OpenWrt)
  • Consumer electronics: Smart TVs, set-top boxes, streaming devices
  • Automotive: Infotainment systems, telematics, dashcams (Automotive Grade Linux)
  • Industrial: PLCs, HMIs, factory robots, CNC machines
  • Medical: Patient monitors, imaging devices, infusion pumps
  • IoT: Smart home hubs, sensors, environmental monitors
  • Aerospace: Drones, satellite subsystems, ground control stations
  • Retail: Point-of-sale terminals, digital signage, vending machines

The Embedded Linux Stack

An embedded Linux system has the same fundamental components as a desktop system, but each is minimized and customized:

┌──────────────────────────────────────────────────────────────┐
│                     EMBEDDED LINUX STACK                     │
│                                                              │
│  ┌────────────────────────────────────────┐                  │
│  │        Application                     │                  │
│  │   (your specific software)             │                  │
│  ├────────────────────────────────────────┤                  │
│  │        Root Filesystem                 │                  │
│  │   (BusyBox, libraries, configs)        │                  │
│  ├────────────────────────────────────────┤                  │
│  │        Linux Kernel                    │                  │
│  │   (custom-configured, stripped down)   │                  │
│  ├────────────────────────────────────────┤                  │
│  │        Bootloader                      │                  │
│  │   (U-Boot, barebox)                    │                  │
│  ├────────────────────────────────────────┤                  │
│  │        Hardware                        │                  │
│  │   (SoC: CPU + RAM + peripherals)       │                  │
│  └────────────────────────────────────────┘                  │
│                                                              │
│  Total image size: 8 MB - 500 MB (vs. 2+ GB for desktop)     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The Bootloader: U-Boot

Most embedded Linux devices use U-Boot (Universal Boot Loader) instead of GRUB. U-Boot supports dozens of CPU architectures and is specifically designed for embedded systems.

Boot sequence on embedded device:
1. Hardware powers on → CPU loads bootloader from flash
2. U-Boot initializes RAM, sets up hardware
3. U-Boot loads the Linux kernel and device tree
4. Kernel initializes, mounts root filesystem
5. Init system starts the application
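Steps 3 and 4 are usually driven by a small boot script. Here is a sketch of what one might look like for an ARM64 board booting from an SD card; the load addresses come from U-Boot's environment, and the filenames are board-specific placeholders:

```
# boot.cmd -- compiled into boot.scr with U-Boot's mkimage tool
setenv bootargs console=ttyS0,115200 root=/dev/mmcblk0p2 rootwait
load mmc 0:1 ${kernel_addr_r} Image
load mmc 0:1 ${fdt_addr_r} board.dtb
booti ${kernel_addr_r} - ${fdt_addr_r}
```

The - in the booti command means "no initramfs"; many embedded systems mount flash storage directly as root and skip the initramfs entirely.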

The Kernel: Custom-Configured

An embedded kernel is stripped to the bare minimum:

# A desktop kernel has thousands of modules
$ find /lib/modules/$(uname -r) -name "*.ko*" | wc -l
5847

# An embedded kernel might compile everything needed directly in,
# with zero loadable modules, for a smaller and faster kernel.

Embedded engineers use make menuconfig to carefully select only the drivers and features needed for their specific hardware.


Cross-Compilation

Most embedded devices use ARM, MIPS, or RISC-V processors -- not x86. Compiling software directly on a device with 64MB of RAM is impractical at best. Instead, you cross-compile: build on your powerful x86 workstation, targeting the embedded architecture.

┌──────────────────────────────────────────────────────────────┐
│                  CROSS-COMPILATION                            │
│                                                              │
│  ┌─────────────────┐          ┌─────────────────┐            │
│  │  Build Machine  │          │ Target Device   │            │
│  │  (x86_64)       │          │ (ARM)           │            │
│  │                 │          │                 │            │
│  │  Cross-compiler │ ──────►  │  Runs the       │            │
│  │  arm-linux-gcc  │ deploy   │  compiled       │            │
│  │                 │          │  binary         │            │
│  └─────────────────┘          └─────────────────┘            │
│                                                              │
│  The compiler runs on x86 but produces ARM binaries.         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Installing a Cross-Compiler

# On Debian/Ubuntu: install ARM cross-compilation toolchain
$ sudo apt install -y gcc-aarch64-linux-gnu

# Verify
$ aarch64-linux-gnu-gcc --version

Cross-Compiling a Simple Program

$ cat > hello.c << 'EOF'
#include <stdio.h>
int main() {
    printf("Hello from embedded Linux!\n");
    return 0;
}
EOF

# Compile for ARM64
$ aarch64-linux-gnu-gcc -o hello-arm64 hello.c

# Check the binary -- it is ARM, not x86
$ file hello-arm64
hello-arm64: ELF 64-bit LSB executable, ARM aarch64, ...

# Compare with native compilation
$ gcc -o hello-x86 hello.c
$ file hello-x86
hello-x86: ELF 64-bit LSB executable, x86-64, ...

You cannot run hello-arm64 on your x86 machine (unless you use QEMU emulation), but you can copy it to an ARM device and run it there.
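
If you want to see the ARM binary run without hardware, user-mode QEMU can execute it on your x86 machine. Building statically sidesteps the need for ARM shared libraries on the host (the emulator typically comes from a package named qemu-user or qemu-user-static):

```shell
# Build a static ARM64 binary so no ARM libraries are needed on the host
$ aarch64-linux-gnu-gcc -static -o hello-arm64-static hello.c

# Run it on x86 via user-mode emulation
$ qemu-aarch64 ./hello-arm64-static
Hello from embedded Linux!
```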

Think About It: Why is cross-compilation necessary for embedded development? Could you install a compiler directly on the target device? What would be the trade-offs?


BusyBox: The Swiss Army Knife

On a desktop system, basic commands like ls, cp, cat, grep, mount, and sh are separate binaries, each tens to hundreds of kilobytes. On an embedded device with 32 MB of storage, that is wasteful.

BusyBox combines hundreds of common Unix utilities into a single small binary:

# On a desktop, each command is a separate binary:
$ ls -l /bin/ls /bin/cp /bin/cat /bin/grep
-rwxr-xr-x 1 root root 142144 /bin/ls
-rwxr-xr-x 1 root root 153432 /bin/cp
-rwxr-xr-x 1 root root  43416 /bin/cat
-rwxr-xr-x 1 root root 219456 /bin/grep
# Total: ~550 KB for just 4 commands

# BusyBox provides 300+ commands in ~1 MB

How BusyBox Works

BusyBox is a single binary. Symbolic links point to it with different names. When you run ls, BusyBox checks what name it was called with and executes that command:

/bin/ls     → /bin/busybox
/bin/cp     → /bin/busybox
/bin/cat    → /bin/busybox
/bin/grep   → /bin/busybox
/bin/mount  → /bin/busybox
/bin/sh     → /bin/busybox
... (300+ more)
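
You can demonstrate the same trick with a plain shell script -- a toy stand-in for BusyBox, not BusyBox itself: one file, several names, and behavior chosen by the name it was invoked under ($0).

```shell
#!/bin/sh
# Toy multi-call "binary": dispatch on the name we were invoked as,
# exactly like /bin/ls -> /bin/busybox.
set -eu
dir=$(mktemp -d)

cat > "$dir/multicall" << 'EOF'
#!/bin/sh
case "$(basename "$0")" in
    hello)   echo "applet: hello" ;;
    goodbye) echo "applet: goodbye" ;;
    *)       echo "unknown applet" ;;
esac
EOF
chmod +x "$dir/multicall"

# Two symlinks, one implementation
ln -s multicall "$dir/hello"
ln -s multicall "$dir/goodbye"

h=$("$dir/hello")       # same file, $0 ends in "hello"
g=$("$dir/goodbye")
echo "$h"               # applet: hello
echo "$g"               # applet: goodbye
rm -r "$dir"
```

BusyBox does the same in C: main() inspects argv[0] and jumps to the matching applet.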

Trying BusyBox on Your System

# Install BusyBox
$ sudo apt install -y busybox    # Debian/Ubuntu
$ sudo dnf install -y busybox    # Fedora/RHEL

# See all included commands
$ busybox --list | head -20

# Use BusyBox versions of commands
$ busybox ls /
$ busybox df -h
$ busybox uname -a

# Check BusyBox binary size
$ ls -lh $(which busybox)
-rwxr-xr-x 1 root root 1.1M ... /bin/busybox

Distro Note: BusyBox is available on all major distributions for testing, but it is primarily used in embedded systems, minimal container images, and recovery environments. Alpine Linux uses BusyBox as its default userland.


Buildroot: Building an Embedded Linux System

Buildroot is a tool that automates building a complete embedded Linux system: cross-compiler, kernel, root filesystem, and bootloader -- all from source.

How Buildroot Works

┌──────────────────────────────────────────────────────────────┐
│                     BUILDROOT WORKFLOW                        │
│                                                              │
│  1. Select target architecture (ARM, MIPS, x86, RISC-V)     │
│  2. Configure kernel, packages, filesystem format            │
│  3. Buildroot downloads source code for everything           │
│  4. Cross-compiles the entire system                         │
│  5. Produces a ready-to-flash image                          │
│                                                              │
│  ┌──────────┐    ┌──────────────┐    ┌────────────────┐      │
│  │ Config   │──► │  Buildroot   │──► │ Output images  │      │
│  │ (.config)│    │  (make)      │    │ - kernel       │      │
│  │          │    │              │    │ - rootfs.ext4  │      │
│  │          │    │  Downloads,  │    │ - sdcard.img   │      │
│  │          │    │  compiles,   │    │                │      │
│  │          │    │  assembles   │    │ Flash to device│      │
│  └──────────┘    └──────────────┘    └────────────────┘      │
│                                                              │
│  Build time: 15-60 minutes depending on packages selected    │
│  Output size: 8 MB - 200 MB depending on configuration      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Quick Start with Buildroot

# Download Buildroot
$ wget https://buildroot.org/downloads/buildroot-2024.02.tar.gz
$ tar xzf buildroot-2024.02.tar.gz
$ cd buildroot-2024.02

# List available board configs
$ ls configs/ | grep raspberry
raspberrypi0_defconfig
raspberrypi3_64_defconfig
raspberrypi4_64_defconfig

# Configure for Raspberry Pi 4
$ make raspberrypi4_64_defconfig

# Customize (optional)
$ make menuconfig

# Build (this takes a while -- go get coffee)
$ make -j$(nproc)

# Output images are in output/images/
$ ls output/images/
sdcard.img  rootfs.ext4  Image  bcm2711-rpi-4-b.dtb

The sdcard.img file can be written directly to an SD card and booted on a Raspberry Pi 4. It contains a minimal Linux system you built from source.
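
Writing the image is one dd command. Safety Warning: dd overwrites the target device without asking. /dev/sdX below is a placeholder -- confirm your SD card's actual device name first.

```shell
# Identify the SD card FIRST -- picking the wrong device destroys its data
$ lsblk

# Write the image (replace /dev/sdX with your SD card's device)
$ sudo dd if=output/images/sdcard.img of=/dev/sdX bs=4M status=progress conv=fsync
```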


Yocto / OpenEmbedded: Industrial-Grade Build System

For production embedded products, Yocto (built on OpenEmbedded) is the industry standard. It is more complex than Buildroot but offers greater flexibility, better dependency management, and support for commercial products.

┌──────────────────────────────────────────────────────────────┐
│              BUILDROOT vs YOCTO                               │
│                                                              │
│  BUILDROOT                       YOCTO / OPENEMBEDDED        │
│  ─────────                       ───────────────────         │
│  Simple, Makefile-based          Complex, BitBake-based      │
│  Good for small projects         Industry standard           │
│  Full rebuild on changes         Incremental builds          │
│  Smaller learning curve          Steep learning curve        │
│  Single config file              Layer-based architecture    │
│  Limited package management      Full package management     │
│                                  (RPM, DEB, IPK)            │
│                                                              │
│  Choose Buildroot for:           Choose Yocto for:           │
│  - Learning                      - Commercial products       │
│  - Simple projects               - Long-term maintenance     │
│  - Quick prototyping             - Large teams               │
│                                  - BSP vendor support        │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Yocto Concepts

  • Recipe: Instructions for building a single package (like a Makefile on steroids)
  • Layer: A collection of related recipes (meta-networking, meta-python, etc.)
  • BitBake: The build engine that processes recipes
  • Poky: The reference distribution that includes BitBake and core layers
  • BSP Layer: Board Support Package -- hardware-specific recipes provided by chip vendors

# Clone Poky (Yocto reference distribution)
$ git clone git://git.yoctoproject.org/poky
$ cd poky
$ git checkout kirkstone   # LTS release

# Initialize build environment
$ source oe-init-build-env

# Build a minimal image (takes 1-3 hours on first build)
$ bitbake core-image-minimal

Device Trees

Desktop PCs have firmware -- BIOS/UEFI with ACPI tables -- that describes the hardware to the operating system. Embedded boards do not. Instead, they use device trees -- data structures that describe the hardware layout.

┌──────────────────────────────────────────────────────────────┐
│                    DEVICE TREE CONCEPT                        │
│                                                              │
│  Problem: The kernel needs to know what hardware exists.     │
│           PCs have ACPI/UEFI. Embedded boards do not.        │
│                                                              │
│  Solution: A device tree (.dtb) file describes the hardware: │
│  - CPU type and speed                                        │
│  - Memory address ranges                                     │
│  - Peripheral locations (UART, SPI, I2C, GPIO)              │
│  - Interrupt routing                                         │
│  - Clock frequencies                                         │
│                                                              │
│  Boot: U-Boot loads kernel + device tree → kernel reads DTB  │
│        and knows how to talk to all hardware                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

A device tree source file (.dts) looks like this:

/ {
    model = "My Custom Board";
    compatible = "mycompany,myboard";

    memory@80000000 {
        device_type = "memory";
        reg = <0x80000000 0x10000000>;  /* 256MB at address 0x80000000 */
    };

    leds {
        compatible = "gpio-leds";
        status-led {
            label = "status";
            gpios = <&gpio1 15 0>;      /* GPIO1 pin 15 */
        };
    };

    serial@44e09000 {
        compatible = "ti,omap3-uart";
        reg = <0x44e09000 0x2000>;
        interrupts = <72>;
    };
};

The .dts source is compiled into a binary .dtb file that the bootloader passes to the kernel.
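
The tool for that step is dtc, the device tree compiler (packaged as device-tree-compiler on Debian/Ubuntu):

```shell
# Compile device tree source into the binary blob the bootloader loads
$ dtc -I dts -O dtb -o myboard.dtb myboard.dts

# The reverse also works: decompile a .dtb back into readable source
$ dtc -I dtb -O dts -o recovered.dts myboard.dtb
```

Decompiling a vendor-supplied .dtb is a handy way to learn how an existing board is described.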


Boot Process Differences

The embedded boot process differs significantly from desktop/server Linux:

┌──────────────────────────────────────────────────────────────┐
│          DESKTOP/SERVER BOOT      vs    EMBEDDED BOOT         │
│                                                              │
│  BIOS/UEFI                       ROM bootloader              │
│       │                               │                      │
│       ▼                               ▼                      │
│  GRUB (bootloader)               U-Boot                      │
│       │                               │                      │
│       ▼                               ▼                      │
│  Kernel + initramfs              Kernel + device tree         │
│       │                               │                      │
│       ▼                               ▼                      │
│  systemd (full init)             BusyBox init or custom       │
│       │                               │                      │
│       ▼                               ▼                      │
│  200+ services start             5-10 services start          │
│       │                               │                      │
│       ▼                               ▼                      │
│  Login prompt (30-60s)           Application ready (1-5s)     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Fast boot is critical in embedded systems. A car's infotainment system cannot take 30 seconds to boot. Techniques for fast boot include:

  • Minimal kernel with only required drivers compiled in (no modules)
  • No initramfs (mount root directly)
  • Minimal init (skip systemd, use BusyBox init or a direct exec of the application)
  • Kernel command-line tuning (quiet, lpj= to skip calibration)
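
Put together, a fast-boot kernel command line might look like this (values are illustrative; the lpj value is measured once on the real hardware, then hard-coded):

```shell
# Passed by U-Boot via bootargs -- each option shaves boot time:
#   ro rootwait      mount root read-only, wait for the device to appear
#   quiet            suppress console output (serial printing is slow)
#   lpj=...          skip BogoMIPS calibration (pre-measured value)
#   init=...         exec the application directly, no init system
console=ttyS0,115200 root=/dev/mmcblk0p2 ro rootwait quiet lpj=4980736 init=/usr/bin/myapp
```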

Resource Constraints

Embedded development is the art of working within tight limits:

┌──────────────────────────────────────────────────────────────┐
│              RESOURCE COMPARISON                              │
│                                                              │
│  Resource       Desktop/Server    Embedded Device            │
│  ─────────      ──────────────    ────────────────           │
│  RAM            8-256 GB          16 MB - 1 GB               │
│  Storage        100 GB - 10 TB    32 MB - 4 GB (flash)       │
│  CPU            2-128 cores       1-4 cores (low clock)      │
│  Network        1-100 Gbps        10/100 Mbps or WiFi        │
│  Power          200-2000 W        0.5-10 W                   │
│  Cooling        Fans, liquid      Passive (no fans)          │
│  Display        1-4 monitors      None, or small LCD         │
│  Lifetime       3-5 years         5-15 years                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Managing Flash Storage

Flash memory wears out after a limited number of write cycles (typically 10,000-100,000). Embedded Linux must be careful about writes:

  • Use read-only root filesystem where possible
  • Mount /tmp and /var/log as tmpfs (RAM-based)
  • Use wear-leveling filesystems like UBIFS, JFFS2, or F2FS
  • Avoid excessive logging to flash

# Common embedded filesystem mount strategy:
# /           → read-only squashfs or ext4 (ro)
# /tmp        → tmpfs (in RAM, lost on reboot)
# /var/log    → tmpfs or size-limited log partition
# /data       → read-write partition for persistent data
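
As an /etc/fstab, that strategy might look like this (device names are illustrative -- adjust them to your partition layout):

```shell
# Read-only root: flash is never written during normal operation
/dev/mmcblk0p2  /         ext4   ro,noatime          0  1
# RAM-backed scratch space: writes cost nothing, contents lost on reboot
tmpfs           /tmp      tmpfs  size=16M,mode=1777  0  0
tmpfs           /var/log  tmpfs  size=10M            0  0
# The only writable flash partition, for data that must survive reboots
/dev/mmcblk0p3  /data     ext4   rw,noatime          0  2
```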

Real-Time Linux: PREEMPT_RT

Some embedded applications need real-time guarantees: a robot arm must respond to sensor input within microseconds, or an airbag controller must fire within a strict deadline.

Standard Linux is not a real-time operating system. It optimizes for throughput, not guaranteed latency. The PREEMPT_RT patch set modifies the kernel to provide real-time behavior with bounded worst-case latency:

  • Makes nearly all kernel code preemptible
  • Converts spinlocks to sleeping locks
  • Provides priority inheritance to prevent priority inversion
  • Reduces worst-case latency from milliseconds to microseconds

# Check if your kernel has PREEMPT_RT
$ uname -v | grep -i preempt_rt
# or
$ grep -i preempt /boot/config-$(uname -r)
CONFIG_PREEMPT_VOLUNTARY=y       # Desktop default (not RT)
# vs
CONFIG_PREEMPT_RT=y              # Real-time kernel
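
Real-time scheduling policies (SCHED_FIFO, SCHED_RR) are available on any kernel via chrt from util-linux; what PREEMPT_RT adds is a bound on how late such a task can run. Sample output below is illustrative:

```shell
# Inspect the current shell's scheduling policy
$ chrt -p $$
pid 1234's current scheduling policy: SCHED_OTHER
pid 1234's current scheduling priority: 0

# Run a control loop under SCHED_FIFO at priority 80 (needs root)
$ sudo chrt -f 80 ./control-loop
```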

Think About It: Why would you not use a real-time kernel for everything? What are the trade-offs of PREEMPT_RT in terms of overall system throughput?


Raspberry Pi as a Learning Platform

The Raspberry Pi is the perfect platform for learning embedded Linux because it is cheap, widely available, well-documented, and powerful enough to run a full Linux distribution.

┌──────────────────────────────────────────────────────────────┐
│              RASPBERRY PI FOR EMBEDDED LEARNING               │
│                                                              │
│  Hardware: Raspberry Pi 4 / Pi 5                             │
│  • ARM Cortex-A76 CPU (Pi 5) / Cortex-A72 (Pi 4)            │
│  • 1-8 GB RAM                                                │
│  • MicroSD for storage                                       │
│  • GPIO pins for hardware interfacing                        │
│  • HDMI, USB, Ethernet, WiFi, Bluetooth                      │
│                                                              │
│  Good for learning:                                          │
│  • Cross-compilation                                         │
│  • Building custom kernels                                   │
│  • Buildroot / Yocto images                                  │
│  • GPIO programming                                         │
│  • Device tree overlays                                      │
│  • Boot process customization                                │
│  • Real-time applications (with PREEMPT_RT)                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Hands-On: Raspberry Pi Exploration

If you have a Raspberry Pi running Raspberry Pi OS:

# Check the hardware
$ cat /proc/cpuinfo | grep -i "model name\|hardware\|revision"

# See device tree information
$ ls /proc/device-tree/
compatible  model  name  serial-number ...

$ cat /proc/device-tree/model
Raspberry Pi 4 Model B Rev 1.4

# Check GPIO pins
$ cat /sys/kernel/debug/gpio

# See the boot config (newer Raspberry Pi OS moved it to /boot/firmware/config.txt)
$ cat /boot/config.txt

# Check temperature (thermal management is important for embedded)
$ vcgencmd measure_temp
temp=42.3'C

# Check CPU frequency (may throttle under thermal stress)
$ vcgencmd measure_clock arm
frequency(48)=1500000000

Hands-On: Build a Minimal Root Filesystem

You can understand embedded Linux rootfs structure by building a minimal one on your workstation:

# Create a minimal root filesystem structure
$ mkdir -p ~/embedded-rootfs/{bin,sbin,etc,proc,sys,dev,tmp,var/log}

# Copy BusyBox as the userland
$ cp $(which busybox) ~/embedded-rootfs/bin/

# Create symlinks for common commands
$ cd ~/embedded-rootfs/bin
$ for cmd in sh ls cat echo mount mkdir; do
    ln -s busybox $cmd
  done
$ cd ~

# Create a minimal init script
$ cat > ~/embedded-rootfs/etc/init.d/rcS << 'EOF'
#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /tmp
echo "Embedded Linux booted!"
echo "Hostname: $(cat /proc/sys/kernel/hostname)"
echo "Uptime: $(cat /proc/uptime | cut -d' ' -f1) seconds"
EOF
$ chmod +x ~/embedded-rootfs/etc/init.d/rcS

# See how small it is
$ du -sh ~/embedded-rootfs/
2.4M    /home/user/embedded-rootfs/

That 2.4MB directory contains a functional (if minimal) Linux userland. With a kernel and bootloader, this could boot on real hardware.


Debug This

An embedded device you are developing is experiencing these symptoms:

  • The device boots normally the first 50-100 times
  • After that, the boot starts failing with filesystem corruption errors
  • The syslog shows thousands of writes per minute to /var/log/syslog

What is going on, and how do you fix it?

Diagnosis: The flash storage is wearing out. The syslog daemon is writing excessively to the flash-based root filesystem, exceeding the write endurance of the flash chips.

Fixes:

  1. Mount /var/log as tmpfs: tmpfs /var/log tmpfs size=10M 0 0 in /etc/fstab
  2. Reduce log verbosity or disable unnecessary logging
  3. Use a log rotation scheme with strict size limits
  4. Make the root filesystem read-only and use a separate wear-leveled partition for writes
  5. Use an appropriate flash filesystem (UBIFS) instead of ext4

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 72 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Embedded Linux = Linux on resource-constrained hardware     │
│  running dedicated applications.                             │
│                                                              │
│  Key concepts:                                               │
│  • Cross-compilation: build on x86, run on ARM/MIPS          │
│  • BusyBox: 300+ utilities in a 1MB binary                   │
│  • Device trees: describe hardware to the kernel             │
│  • U-Boot: embedded bootloader (replaces GRUB)               │
│  • Fast boot: 1-5 seconds (vs. 30-60 for desktop)           │
│                                                              │
│  Build systems:                                              │
│  • Buildroot: simple, fast, good for learning                │
│  • Yocto/OpenEmbedded: industry standard, complex            │
│                                                              │
│  Constraints:                                                │
│  • Limited RAM, storage, CPU, power                          │
│  • Flash wear: minimize writes                               │
│  • Read-only root filesystems                                │
│  • PREEMPT_RT for real-time requirements                     │
│                                                              │
│  Raspberry Pi is the best learning platform for              │
│  embedded Linux development.                                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: BusyBox Exploration

Install BusyBox on your system and compare the output of busybox ls -la / with /bin/ls -la /. Note any differences in output format or options.

Exercise 2: Cross-Compile a Program

Install the ARM cross-compiler, write a simple C program, and cross-compile it. Use file to verify it is an ARM binary. If you have a Raspberry Pi, copy it over and run it.

Exercise 3: Minimal Rootfs

Expand the minimal root filesystem from the hands-on section. Add:

  • A /etc/passwd file with a root user
  • A /etc/hostname file
  • An inittab for BusyBox init

Research how to package it into a cpio archive that could be used as an initramfs.

Exercise 4: Buildroot Configuration

Download Buildroot and run make menuconfig. Explore the options without building. Note how you can select:

  • Target architecture
  • Kernel version
  • BusyBox configuration
  • Additional packages
  • Filesystem format (ext4, squashfs, cpio)

Bonus Challenge

If you have a Raspberry Pi, build a custom Buildroot image for it. Include only: BusyBox, an SSH server (dropbear -- a lightweight SSH implementation), and a simple web server (busybox httpd). Flash it to an SD card and boot it. Your goal: a Linux system that boots in under 5 seconds, runs SSH and HTTP, and uses less than 32MB of storage.

Linux on Cloud

Why This Matters

A startup has a brilliant idea. Ten years ago, they would have needed to buy servers, rent data center space, run cables, set up cooling, and wait weeks before writing a single line of code. Today, they open a browser, click a few buttons, and have a Linux server running in under 60 seconds. They pay only for what they use. If their app goes viral and they need 100 servers instead of one, they can scale up in minutes, not months.

This is cloud computing, and it runs overwhelmingly on Linux. Over 90% of cloud workloads run on Linux. Every major cloud provider defaults to Linux instances. The tools that manage cloud infrastructure -- Terraform, Ansible, Kubernetes, Docker -- are Linux-native.

Whether you are deploying a personal project or managing enterprise infrastructure, understanding how Linux behaves in the cloud is essential. Cloud Linux has different networking, storage, initialization, and management patterns compared to bare-metal or VM installations.


Try This Right Now

Even without a cloud account, you can explore how cloud instances identify themselves:

# Check if you are running in a cloud environment
$ systemd-detect-virt
# Returns: kvm, xen, microsoft, oracle, amazon, google, or "none"

# Check for cloud-init (the standard cloud initialization tool)
$ which cloud-init && cloud-init status
# If installed: "status: done"

# See if a metadata service is available (cloud instances only)
$ curl -s --connect-timeout 2 http://169.254.169.254/ 2>/dev/null
# Returns metadata API on cloud instances, timeout on local machines

# Check your system's DMI information for cloud indicators
$ sudo dmidecode -s system-manufacturer 2>/dev/null
# Might show: "Amazon EC2", "Google Compute Engine", "Microsoft Corporation"

If you are on a local machine, these will mostly return empty or "none." That is fine -- it shows you what to look for on cloud instances.


Cloud Computing Basics

Service Models

Cloud computing is divided into service models based on how much the provider manages:

┌──────────────────────────────────────────────────────────────┐
│                CLOUD SERVICE MODELS                           │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │                     YOU MANAGE                        │    │
│  │                                                      │    │
│  │  On-Premises    IaaS          PaaS          SaaS     │    │
│  │  ───────────    ────          ────          ────     │    │
│  │  Application    Application   Application   ----     │    │
│  │  Data           Data          Data          ----     │    │
│  │  Runtime        Runtime       ----          ----     │    │
│  │  Middleware     Middleware    ----          ----     │    │
│  │  OS             OS            ----          ----     │    │
│  │  ─ ─ ─ ─ ─ ─   ─ ─ ─ ─ ─    ─ ─ ─ ─ ─    ─ ─ ─   │    │
│  │  Virtualization ----          ----          ----     │    │
│  │  Servers        ----          ----          ----     │    │
│  │  Storage        ----          ----          ----     │    │
│  │  Networking     ----          ----          ----     │    │
│  │                                                      │    │
│  │                     PROVIDER MANAGES                  │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  IaaS = Infrastructure as a Service (VMs, networks, storage) │
│  PaaS = Platform as a Service (managed runtime/database)     │
│  SaaS = Software as a Service (just use the application)     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

IaaS is where Linux knowledge matters most. You get a virtual machine, install an OS (usually Linux), and manage everything from the OS up.

Open Cloud Platforms

While the largest cloud providers are commercial, several open-source platforms let you build your own cloud:

  • OpenStack: The most mature open-source cloud platform, used by many telecom companies and research institutions
  • Apache CloudStack: Powers large cloud deployments
  • Proxmox VE: Combines KVM virtualization and LXC containers with a web interface
  • oVirt: Open-source virtualization management platform, originally developed by Red Hat

Cloud Images vs. Regular Installs

When you install Linux on a physical machine, you boot from an ISO, answer setup questions, and wait for package installation. Cloud instances do not work that way.

Cloud instances boot from cloud images -- pre-built, minimal OS images designed for instant deployment:

┌──────────────────────────────────────────────────────────────┐
│          TRADITIONAL INSTALL vs CLOUD IMAGE                   │
│                                                              │
│  TRADITIONAL INSTALL              CLOUD IMAGE                │
│  ───────────────────              ───────────                │
│  Boot from ISO                    Image already built        │
│  Answer setup wizard              Configuration via API      │
│  Install packages (10-30 min)     Boot in 30-60 seconds      │
│  Set hostname manually            Hostname set by cloud-init │
│  Configure network manually       Network auto-configured    │
│  Create users manually            SSH keys injected          │
│  Set up SSH manually              SSH ready immediately      │
│  Unique installation each time    Same image, every time     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Cloud images are:

  • Minimal: No GUI, no unnecessary packages
  • Generic: Work on any cloud provider's hypervisor
  • Pre-configured for cloud-init: Accept configuration at first boot
  • Compressed: Small download size (300-800 MB)

# Download an Ubuntu cloud image (for local testing with QEMU/KVM)
$ wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Check the image format
$ qemu-img info jammy-server-cloudimg-amd64.img
image: jammy-server-cloudimg-amd64.img
file format: qcow2
virtual size: 2.2 GiB
disk size: 626 MiB
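
You can boot that image locally and feed it cloud-init configuration through the NoCloud data source -- a good way to test user data without paying for an instance. The sketch below assumes cloud-localds (from the cloud-image-utils package on Debian/Ubuntu):

```shell
# Minimal user data: set a password so you can log in on the console
$ printf '#cloud-config\npassword: changeme\nchpasswd: { expire: false }\n' > user-data

# Pack it into a seed disk that cloud-init's NoCloud source reads
$ cloud-localds seed.img user-data

# Boot the cloud image with the seed attached as a second disk
$ qemu-system-x86_64 -m 1024 -nographic \
    -drive file=jammy-server-cloudimg-amd64.img,format=qcow2 \
    -drive file=seed.img,format=raw
```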

cloud-init: Instance Initialization

cloud-init is the industry standard for initializing cloud instances on first boot. Nearly every Linux cloud image includes it. It reads configuration from the cloud provider's metadata service and sets up the instance.

What cloud-init Does

┌──────────────────────────────────────────────────────────────┐
│                 cloud-init STAGES                             │
│                                                              │
│  Instance boots for the first time:                          │
│                                                              │
│  1. DETECT    → Identify cloud platform (AWS, GCP, etc.)     │
│  2. INIT      → Set hostname, configure networking           │
│  3. CONFIG    → Install packages, write files, run commands  │
│  4. FINAL     → Run user scripts, signal ready               │
│                                                              │
│  Data sources:                                               │
│  • Metadata service (http://169.254.169.254/)                │
│  • User-data (custom scripts/configs)                        │
│  • Vendor-data (provider defaults)                           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

cloud-init Configuration (User Data)

When launching a cloud instance, you provide "user data" -- a cloud-init configuration file that runs on first boot:

#cloud-config

# Set the hostname
hostname: web-server-01

# Create users
users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... deploy@laptop

# Install packages
package_update: true
packages:
  - nginx
  - htop
  - curl
  - fail2ban

# Write files
write_files:
  - path: /var/www/html/index.html
    content: |
      <h1>Server provisioned by cloud-init</h1>
    owner: www-data:www-data
    permissions: '0644'

# Run commands on first boot
runcmd:
  - systemctl enable --now nginx
  - systemctl enable --now fail2ban
  - echo "Instance provisioned at $(date)" >> /var/log/provision.log

# Configure timezone
timezone: UTC
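
Before launching anything, you can check a user-data file for syntax and schema errors locally (the schema subcommand is available in recent cloud-init releases):

```shell
# Validate the cloud-config before it ever reaches an instance
$ cloud-init schema --config-file user-data.yaml
```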

Checking cloud-init Status

# Check if cloud-init finished successfully
$ cloud-init status
status: done

# See detailed cloud-init output
$ cloud-init status --long

# View cloud-init logs
$ cat /var/log/cloud-init-output.log

# See what cloud-init configured
$ cloud-init query instance_id
$ cloud-init query region
$ cloud-init query local_hostname

Think About It: cloud-init runs only on first boot by default. If you change the user data and reboot, the changes will not apply. How would you handle configuration changes after the initial boot? (Hint: think about what we learned in Chapters 67 and 68.)


The Metadata Service

Every cloud instance has access to a metadata service at a well-known IP address: 169.254.169.254. This link-local address is routed internally by the cloud provider and provides information about the instance.

# Query the metadata service (example for a generic cloud instance)
$ curl -s http://169.254.169.254/latest/meta-data/

# Common metadata endpoints (vary by provider):
$ curl -s http://169.254.169.254/latest/meta-data/instance-id
$ curl -s http://169.254.169.254/latest/meta-data/local-ipv4
$ curl -s http://169.254.169.254/latest/meta-data/public-ipv4
$ curl -s http://169.254.169.254/latest/meta-data/hostname
$ curl -s http://169.254.169.254/latest/meta-data/instance-type

# Retrieve user-data (your cloud-init config)
$ curl -s http://169.254.169.254/latest/user-data

Safety Warning: The metadata service can expose sensitive information, including temporary security credentials. If your instance runs a web application, ensure that users cannot proxy requests to 169.254.169.254. This is a well-known attack vector called SSRF (Server-Side Request Forgery). AWS mitigates it with IMDSv2, which requires a session token for every metadata request; other providers offer similar protections.
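
On AWS, IMDSv2 works like this: you PUT for a short-lived token, then present it with every metadata request. A plain GET -- the kind an SSRF attack typically makes -- returns nothing:

```shell
# Step 1: request a session token (PUT, with a TTL header)
$ TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Step 2: present the token on each metadata request
$ curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/instance-id
```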


Cloud Networking

Cloud networking differs significantly from physical networking:

┌──────────────────────────────────────────────────────────────┐
│              CLOUD NETWORKING CONCEPTS                        │
│                                                              │
│  VPC (Virtual Private Cloud)                                 │
│  └── Your isolated network in the cloud                      │
│      ├── Subnet A (10.0.1.0/24) - Public                     │
│      │   ├── Instance 1 (10.0.1.10) + Public IP              │
│      │   └── Instance 2 (10.0.1.11) + Public IP              │
│      ├── Subnet B (10.0.2.0/24) - Private                    │
│      │   ├── Database (10.0.2.10) - No public IP             │
│      │   └── Cache (10.0.2.11) - No public IP                │
│      ├── Internet Gateway - connects VPC to internet         │
│      ├── NAT Gateway - lets private instances reach out      │
│      └── Route Tables - control traffic flow                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Security Groups vs. Firewalls

Traditional Linux firewalls (iptables, nftables) work at the OS level. Cloud security groups work at the hypervisor level, before traffic reaches your instance:

┌──────────────────────────────────────────────────────────────┐
│           SECURITY GROUPS vs IPTABLES                        │
│                                                              │
│  SECURITY GROUP (cloud-level)                                │
│  ─────────────────────────────                               │
│  • Managed through cloud API/console                         │
│  • Stateful (return traffic auto-allowed)                    │
│  • Applied per-instance or per-network interface             │
│  • Default: deny all inbound, allow all outbound             │
│  • Cannot see or modify from inside the instance             │
│                                                              │
│  IPTABLES/NFTABLES (OS-level)                                │
│  ──────────────────────────────                              │
│  • Managed from inside the instance                          │
│  • Additional layer of defense                               │
│  • Can do things security groups cannot (rate limits, etc.)  │
│                                                              │
│  Best practice: USE BOTH. Security groups for broad rules,   │
│  OS firewall for fine-grained control.                       │
│                                                              │
└──────────────────────────────────────────────────────────────┘
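
As a concrete example of that last point, here is a sketch of an nftables ruleset fragment that rate-limits new SSH connections -- something a security group cannot express (the port and rate are illustrative):

table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 22 ct state new limit rate 4/minute accept
    tcp dport 22 ct state new drop
  }
}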

Cloud Storage

Cloud storage comes in several types, and understanding them is critical:

Instance Storage (Ephemeral)

  • Temporary storage attached directly to the host
  • Data is lost when the instance stops or terminates
  • Very fast (local SSD)
  • Use for: temp files, caches, scratch data

Block Storage (Persistent)

  • Network-attached volumes (like a virtual hard drive)
  • Persists independently of the instance
  • Can be detached and reattached to different instances
  • Use for: databases, application data, anything that must survive reboots
# On a cloud instance, check your block devices
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
xvda    202:0    0   20G  0 disk
└─xvda1 202:1    0   20G  0 part /
xvdb    202:16   0  100G  0 disk                  ← Additional volume

Safety Warning: mkfs.ext4 destroys all existing data on the target device. Double-check the device name with lsblk before formatting.

# Format and mount an additional volume
$ sudo mkfs.ext4 /dev/xvdb
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data

# Make it persistent
$ echo '/dev/xvdb /data ext4 defaults 0 2' | sudo tee -a /etc/fstab
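
Device names like /dev/xvdb can change across reboots on some platforms, so production fstab entries usually reference the filesystem UUID instead. You can try the idea without a spare disk by building a file-backed filesystem (the /data mountpoint and nofail option below are illustrative):

```shell
# Create a small file-backed ext4 filesystem (no root needed; -F lets
# mkfs operate on a regular file instead of a block device)
truncate -s 64M /tmp/demo.img
mkfs.ext4 -qF /tmp/demo.img

# Read the UUID and print the fstab line you would append
UUID=$(blkid -s UUID -o value /tmp/demo.img)
echo "UUID=$UUID /data ext4 defaults,nofail 0 2"
```

The nofail option keeps the system booting even if the volume is absent, which matters for secondary data disks.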

Object Storage

  • Stores files as objects with metadata
  • Accessed via HTTP API (not mounted as a filesystem)
  • Virtually unlimited capacity
  • Use for: backups, static assets, log archives, media files
# Using s3cmd (open-source S3-compatible client)
$ sudo apt install -y s3cmd
$ s3cmd --configure

# Upload a file
$ s3cmd put backup.tar.gz s3://my-bucket/backups/

# List bucket contents
$ s3cmd ls s3://my-bucket/backups/

# Download a file
$ s3cmd get s3://my-bucket/backups/backup.tar.gz

Auto-Scaling Concepts

One of the most powerful cloud features is automatic scaling -- adding or removing instances based on demand:

┌──────────────────────────────────────────────────────────────┐
│                    AUTO-SCALING                              │
│                                                              │
│  Low traffic (night):                                        │
│  Load Balancer ──► [Instance 1] [Instance 2]                 │
│                    (2 instances, 15% CPU each)               │
│                                                              │
│  Normal traffic (day):                                       │
│  Load Balancer ──► [Inst 1] [Inst 2] [Inst 3] [Inst 4]      │
│                    (4 instances, 40% CPU each)               │
│                                                              │
│  Traffic spike (sale event):                                 │
│  Load Balancer ──► [1] [2] [3] [4] [5] [6] [7] [8]         │
│                    (8 instances, 60% CPU each)               │
│                                                              │
│  Scaling rules:                                              │
│  • Scale up when avg CPU > 70% for 5 minutes                │
│  • Scale down when avg CPU < 30% for 10 minutes             │
│  • Minimum: 2 instances (for redundancy)                     │
│  • Maximum: 20 instances (cost control)                      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

For auto-scaling to work, your application must be stateless -- any instance can handle any request. Shared state goes in a database or cache, not on local disk.


Hands-On: Infrastructure with Terraform

Terraform (and its open-source fork OpenTofu) describes cloud infrastructure in code. Here is a taste of how it works.

Install Terraform/OpenTofu

# Install OpenTofu (open-source Terraform fork)
# Review the script before piping anything to sudo bash
$ curl -fsSL https://get.opentofu.org/install-opentofu.sh | sudo bash -s -- --install-method standalone

# Or install Terraform
$ wget https://releases.hashicorp.com/terraform/1.7.0/terraform_1.7.0_linux_amd64.zip
$ unzip terraform_1.7.0_linux_amd64.zip
$ sudo mv terraform /usr/local/bin/

Terraform Basics

Terraform uses HCL (HashiCorp Configuration Language) to describe resources:

# main.tf -- Example infrastructure definition

# Configure the provider
terraform {
  required_providers {
    # This is an example -- specific providers vary by cloud
    libvirt = {
      source = "dmacvicar/libvirt"
    }
  }
}

# Define a virtual machine
resource "libvirt_domain" "web_server" {
  name   = "web-server-01"
  memory = "2048"
  vcpu   = 2

  disk {
    volume_id = libvirt_volume.web_disk.id
  }

  network_interface {
    network_name = "default"
  }

  cloudinit = libvirt_cloudinit_disk.web_init.id
}

# Define a cloud-init disk
resource "libvirt_cloudinit_disk" "web_init" {
  name      = "web-init.iso"
  user_data = <<-EOF
    #cloud-config
    hostname: web-server-01
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF
}

# Define the disk volume
resource "libvirt_volume" "web_disk" {
  name   = "web-disk.qcow2"
  pool   = "default"
  source = "/var/lib/libvirt/images/ubuntu-cloud.img"
  format = "qcow2"
}

The Terraform Workflow

# Initialize (download providers)
$ terraform init

# Preview changes
$ terraform plan

# Apply changes (create infrastructure)
$ terraform apply

# Destroy infrastructure when done
$ terraform destroy
┌──────────────────────────────────────────────────────────────┐
│                TERRAFORM WORKFLOW                            │
│                                                              │
│  terraform init ──► Download providers and modules           │
│         │                                                    │
│         ▼                                                    │
│  terraform plan ──► Show what will be created/changed        │
│         │                                                    │
│         ▼                                                    │
│  terraform apply ──► Create/modify infrastructure            │
│         │                                                    │
│         ▼                                                    │
│  terraform.tfstate ──► State file (tracks what exists)       │
│                                                              │
│  Later:                                                      │
│  terraform destroy ──► Remove all managed infrastructure     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The state file (terraform.tfstate) is critical -- it maps your configuration to real-world resources. In a team setting, store it in a shared backend (S3, Consul, etc.), never in Git.
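
For example, a remote backend is declared inside the terraform block. This sketch uses the s3 backend; the bucket name, key, and region are placeholders:

terraform {
  backend "s3" {
    bucket = "example-tfstate-bucket"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}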

Distro Note: Terraform/OpenTofu work identically regardless of your local Linux distribution. The provider plugins handle cloud-specific differences. You write the same HCL whether you are on Ubuntu, Fedora, or Arch.


Cloud CLI Tools

Each cloud platform has a CLI tool. For open-source and self-hosted clouds:

# OpenStack CLI
$ pip install python-openstackclient
$ openstack server list
$ openstack server create --flavor m1.small --image ubuntu-22.04 my-server

# Proxmox (via API)
$ curl -s https://proxmox:8006/api2/json/nodes/pve/qemu \
    -H "Authorization: PVEAPIToken=user@pam!token=uuid"

For working with cloud-compatible storage (S3-compatible APIs):

# MinIO client (open-source, works with any S3-compatible storage)
$ wget https://dl.min.io/client/mc/release/linux-amd64/mc
$ chmod +x mc && sudo mv mc /usr/local/bin/

# Configure a connection
$ mc alias set myminio http://minio-server:9000 ACCESS_KEY SECRET_KEY

# Basic operations
$ mc ls myminio/
$ mc mb myminio/my-bucket
$ mc cp file.txt myminio/my-bucket/

Debug This

A cloud instance launched with the following cloud-init user data, but nginx is not running and the deploy user was not created:

#cloud-config

users:
  - name: deploy
    groups: sudo
    ssh-authorized-keys:
      - ssh-rsa AAAA... deploy@laptop

packages:
  - nginx

runcmd:
  - systemctl enable nginx
  - systemctl start nginx

The cloud-init log shows: status: done with no errors. What went wrong?

Answer: The YAML key for SSH keys is wrong. It should be ssh_authorized_keys (underscores), not ssh-authorized-keys (hyphens). cloud-init silently ignores unknown keys. The deploy user was created but without SSH key access, and since there was no way to log in as deploy, it appeared the user was not created.

Also, runcmd should combine enable and start: systemctl enable --now nginx. Note that key order in the YAML does not control execution order -- cloud-init installs packages in an earlier boot stage than runcmd, so the packages are in place before runcmd runs unless installation itself failed. Adding package_update: true refreshes the package index first, which prevents the most common install failure on fresh cloud images.
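
Putting both fixes together, a corrected version of the user data might look like this:

#cloud-config
package_update: true

users:
  - name: deploy
    groups: sudo
    ssh_authorized_keys:
      - ssh-rsa AAAA... deploy@laptop

packages:
  - nginx

runcmd:
  - systemctl enable --now nginx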

Always check /var/log/cloud-init-output.log for detailed diagnostics.


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 73 RECAP                          │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Cloud computing lets you create Linux infrastructure        │
│  on demand, pay for what you use, and scale instantly.       │
│                                                              │
│  Key concepts:                                               │
│  • IaaS/PaaS/SaaS: how much the provider manages            │
│  • Cloud images: pre-built, minimal, instant-boot            │
│  • cloud-init: configures instances on first boot            │
│  • Metadata service (169.254.169.254): instance info         │
│  • Security groups: cloud-level firewall                     │
│  • Block storage: persistent virtual disks                   │
│  • Object storage: S3-compatible file storage                │
│  • Auto-scaling: add/remove instances based on demand        │
│                                                              │
│  Tools:                                                      │
│  • Terraform/OpenTofu: provision cloud resources as code     │
│  • cloud-init: initialize instances declaratively            │
│  • OpenStack, Proxmox: open-source cloud platforms           │
│                                                              │
│  Best practices:                                             │
│  • Use cloud-init for initial setup, IaC for ongoing config  │
│  • Layer security: security groups + OS firewall             │
│  • Separate persistent data to block/object storage          │
│  • Design for failure: instances can disappear               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: cloud-init Local Testing

You can test cloud-init configurations locally without a cloud account:

$ sudo apt install -y cloud-init
$ cloud-init devel schema --config-file your-config.yml

Write a cloud-init config that installs three packages, creates a user, and writes a custom /etc/motd. Validate it with the schema tool.

Exercise 2: Terraform Exploration

Install OpenTofu or Terraform and explore the CLI:

$ tofu version    # or terraform version
$ tofu providers  # List providers required by the configuration

Write a simple .tf file (it does not need to connect to a real cloud). Run tofu init and tofu plan to see how Terraform processes your configuration.

Exercise 3: Metadata Service

If you have access to any cloud instance, query the metadata service and document all the information it provides. Think about which fields might be sensitive.

Exercise 4: Cloud Image Investigation

Download an Ubuntu or AlmaLinux cloud image. Mount it locally using qemu-nbd or guestmount and explore its contents. Compare the installed packages and filesystem size to a regular installation.

Bonus Challenge

Set up a local cloud using Proxmox VE or OpenStack DevStack in a VM. Create a Linux instance using a cloud image, pass it cloud-init user data, and verify the configuration was applied. Then manage the same infrastructure using Terraform with the appropriate provider.

Database Ops Basics

Why This Matters

At 3am, the on-call page goes off: "Database connection errors." You are the Linux admin, not the DBA. But the DBA is on vacation. The application team is panicking. You need to figure out whether the database server is healthy, whether it is accepting connections, why it is slow, and possibly restore from a backup -- all things that require basic database knowledge.

Every significant application stores data in a database. As a Linux administrator, you will install databases, manage their services, configure authentication, set up backups, monitor performance, and troubleshoot failures. You do not need to write complex SQL queries or design schemas, but you absolutely need to know how to keep database services running.

This chapter covers the two most popular open-source databases from an operations perspective: PostgreSQL and MariaDB (the open-source MySQL fork).


Try This Right Now

Check if you have any databases installed:

# Check for PostgreSQL
$ which psql && psql --version
psql (PostgreSQL) 15.5

# Check for MariaDB/MySQL
$ which mysql && mysql --version
mysql  Ver 15.1 Distrib 10.11.4-MariaDB

# Check for running database processes
$ systemctl list-units --type=service | grep -iE "postgres|mysql|mariadb"

# Check which ports databases typically use
$ ss -tlnp | grep -E ':5432|:3306'
LISTEN 0 128  0.0.0.0:5432  0.0.0.0:*  users:(("postgres",pid=1234,fd=6))
LISTEN 0 80   0.0.0.0:3306  0.0.0.0:*  users:(("mariadbd",pid=5678,fd=22))

If nothing is installed, do not worry -- we will install them next.


PostgreSQL: Installation and Basic Administration

PostgreSQL (often called "Postgres") is the most advanced open-source relational database. It is known for data integrity, standards compliance, and extensibility.

Installing PostgreSQL

# Debian/Ubuntu
$ sudo apt update
$ sudo apt install -y postgresql postgresql-client

# Fedora
$ sudo dnf install -y postgresql-server postgresql
$ sudo postgresql-setup --initdb   # Initialize data directory
# RHEL/AlmaLinux/Rocky
$ sudo dnf install -y postgresql-server postgresql
$ sudo postgresql-setup --initdb
# Start and enable
$ sudo systemctl enable --now postgresql

Distro Note: On Debian/Ubuntu, PostgreSQL is initialized automatically during package installation. On RHEL-family systems, you must run postgresql-setup --initdb manually before starting the service.

Verify Installation

$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
     Active: active (running)

$ sudo -u postgres psql -c "SELECT version();"
                          version
------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc...

The postgres User

PostgreSQL creates a system user called postgres. By default, authentication uses "peer" mode -- the Linux username must match the PostgreSQL username:

# Switch to the postgres user to administer the database
$ sudo -u postgres psql
postgres=#

You are now in the PostgreSQL interactive shell. Type \q to exit.

Creating Databases and Users

# Method 1: Using command-line tools
$ sudo -u postgres createuser --interactive myappuser
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) y
Shall the new role be allowed to create more new roles? (y/n) n

$ sudo -u postgres createdb --owner=myappuser myappdb

# Method 2: Using SQL
$ sudo -u postgres psql
postgres=# CREATE USER myappuser WITH PASSWORD 'SecurePassword123!';
CREATE ROLE
postgres=# CREATE DATABASE myappdb OWNER myappuser;
CREATE DATABASE
postgres=# GRANT ALL PRIVILEGES ON DATABASE myappdb TO myappuser;
GRANT
postgres=# \q

Useful psql Commands

$ sudo -u postgres psql
-- List all databases
\l

-- Connect to a database
\c myappdb

-- List all tables in current database
\dt

-- Describe a table's structure
\d tablename

-- List all users/roles
\du

-- Show current connection info
\conninfo

-- Run a query
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database;

-- Show running queries
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Exit
\q

Configuring PostgreSQL Authentication: pg_hba.conf

pg_hba.conf (Host-Based Authentication) controls who can connect and how they authenticate:

# Find the file
$ sudo -u postgres psql -c "SHOW hba_file;"
              hba_file
-------------------------------------
 /etc/postgresql/15/main/pg_hba.conf
$ sudo cat /etc/postgresql/15/main/pg_hba.conf

Key lines:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             postgres                                peer
local   all             all                                     peer
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             ::1/128                 scram-sha-256
host    all             all             10.0.0.0/8              scram-sha-256
┌──────────────────────────────────────────────────────────────┐
│              pg_hba.conf AUTHENTICATION METHODS              │
│                                                              │
│  METHOD          DESCRIPTION                                 │
│  ──────          ───────────                                 │
│  peer            Match Linux username to PostgreSQL role      │
│                  (local connections only)                     │
│  scram-sha-256   Password authentication (most secure)       │
│  md5             Password auth (older, less secure)           │
│  trust           No authentication (NEVER in production!)    │
│  reject          Deny connection                             │
│                                                              │
│  TYPE            DESCRIPTION                                 │
│  ────            ───────────                                 │
│  local           Unix socket connection                      │
│  host            TCP/IP connection (with or without SSL)     │
│  hostssl         TCP/IP with SSL required                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

After changing pg_hba.conf, reload PostgreSQL:

$ sudo systemctl reload postgresql

Safety Warning: Never set a production database to use trust authentication. This allows anyone who can reach the database to connect without a password. Always use scram-sha-256 for password-based connections.

Configuring PostgreSQL: postgresql.conf

The main configuration file controls performance and behavior:

$ sudo -u postgres psql -c "SHOW config_file;"
                config_file
-------------------------------------------
 /etc/postgresql/15/main/postgresql.conf

Key settings:

# Listen on all interfaces (default: localhost only)
listen_addresses = '*'           # Or specific IP: '10.0.0.5'

# Maximum connections
max_connections = 200

# Memory settings
shared_buffers = 2GB             # 25% of total RAM
effective_cache_size = 6GB       # 75% of total RAM
work_mem = 16MB                  # Per-operation memory
maintenance_work_mem = 512MB     # For VACUUM, CREATE INDEX

# Logging
log_directory = 'log'
log_min_duration_statement = 1000  # Log queries slower than 1 second

Think About It: Why would setting max_connections too high actually hurt performance? Think about memory usage per connection and CPU context switching.
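
One way to quantify that: work_mem is allocated per sort or hash operation, per connection, so the worst case scales multiplicatively. The numbers below are illustrative:

```shell
# Worst case if every connection runs one operation using full work_mem
connections=1000
work_mem_mb=16
echo "$((connections * work_mem_mb)) MB of potential work_mem usage"
```

That is before per-connection process overhead and the CPU cost of scheduling thousands of backends -- which is why a connection pooler usually beats a huge max_connections.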


MariaDB/MySQL: Installation and Basic Administration

MariaDB is a community-maintained fork of MySQL, created by MySQL's original developer. It is drop-in compatible with MySQL but developed independently.

Installing MariaDB

# Debian/Ubuntu
$ sudo apt update
$ sudo apt install -y mariadb-server mariadb-client

# Fedora
$ sudo dnf install -y mariadb-server mariadb

# RHEL/AlmaLinux/Rocky
$ sudo dnf install -y mariadb-server mariadb
# Start and enable
$ sudo systemctl enable --now mariadb

Secure the Installation

MariaDB ships with insecure defaults. Always run the security script first:

$ sudo mysql_secure_installation

This script:

  1. Sets a root password (or switches to unix_socket authentication)
  2. Removes anonymous users
  3. Disables remote root login
  4. Removes the test database
  5. Reloads privilege tables

Answer Y to all prompts for a secure installation.

Connecting to MariaDB

# Connect as root (uses unix socket authentication on modern installs)
$ sudo mysql
MariaDB [(none)]>

# Or with password
$ mysql -u root -p
Enter password:

Creating Databases and Users

-- Create a database
CREATE DATABASE myappdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Create a user
CREATE USER 'myappuser'@'localhost' IDENTIFIED BY 'SecurePassword123!';

-- Grant privileges
GRANT ALL PRIVILEGES ON myappdb.* TO 'myappuser'@'localhost';

-- For remote access (specific IP only)
CREATE USER 'myappuser'@'10.0.0.%' IDENTIFIED BY 'SecurePassword123!';
GRANT ALL PRIVILEGES ON myappdb.* TO 'myappuser'@'10.0.0.%';

-- Apply changes
FLUSH PRIVILEGES;

-- Verify
SHOW GRANTS FOR 'myappuser'@'localhost';

Useful MySQL/MariaDB Commands

-- Show all databases
SHOW DATABASES;

-- Use a database
USE myappdb;

-- Show all tables
SHOW TABLES;

-- Describe a table
DESCRIBE tablename;

-- Show all users
SELECT user, host FROM mysql.user;

-- Show running queries
SHOW PROCESSLIST;

-- Show database sizes
SELECT table_schema AS "Database",
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)"
FROM information_schema.tables
GROUP BY table_schema;

-- Show server status
SHOW STATUS LIKE 'Conn%';
SHOW VARIABLES LIKE 'max_connections';

Backup and Restore

Backups are not optional. If you manage a database and do not have tested backups, you are one hardware failure away from disaster.

PostgreSQL Backup

# Dump a single database
$ sudo -u postgres pg_dump myappdb > myappdb_backup.sql

# Dump with compression
$ sudo -u postgres pg_dump myappdb | gzip > myappdb_$(date +%Y%m%d).sql.gz

# Dump all databases
$ sudo -u postgres pg_dumpall > all_databases.sql

# Custom format (supports parallel restore)
$ sudo -u postgres pg_dump -Fc myappdb > myappdb.dump

PostgreSQL Restore

# Restore from SQL dump
$ sudo -u postgres psql myappdb < myappdb_backup.sql

# Restore from custom format
$ sudo -u postgres pg_restore -d myappdb myappdb.dump

# Create database and restore
$ sudo -u postgres createdb myappdb_restored
$ sudo -u postgres pg_restore -d myappdb_restored myappdb.dump

MariaDB Backup

# Dump a single database
$ sudo mysqldump myappdb > myappdb_backup.sql

# Dump with compression
$ sudo mysqldump myappdb | gzip > myappdb_$(date +%Y%m%d).sql.gz

# Dump all databases
$ sudo mysqldump --all-databases > all_databases.sql

# Dump with routines and triggers
$ sudo mysqldump --routines --triggers myappdb > myappdb_full.sql

MariaDB Restore

# Restore from SQL dump
$ sudo mysql myappdb < myappdb_backup.sql

# Or within the MySQL shell
MariaDB [(none)]> SOURCE /path/to/myappdb_backup.sql;

Automated Backup Script

#!/bin/bash
# /opt/scripts/db-backup.sh

BACKUP_DIR="/opt/backups/databases"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

mkdir -p "$BACKUP_DIR"

# PostgreSQL backups
for db in $(sudo -u postgres psql -At -c "SELECT datname FROM pg_database WHERE datistemplate = false AND datname != 'postgres'"); do
    echo "Backing up PostgreSQL database: $db"
    sudo -u postgres pg_dump -Fc "$db" > "${BACKUP_DIR}/pg_${db}_${DATE}.dump"
done

# MariaDB backups
for db in $(sudo mysql -BNe "SHOW DATABASES" | grep -v -E "^(information_schema|performance_schema|mysql|sys)$"); do
    echo "Backing up MariaDB database: $db"
    sudo mysqldump "$db" | gzip > "${BACKUP_DIR}/mysql_${db}_${DATE}.sql.gz"
done

# Remove old backups
find "$BACKUP_DIR" -type f -mtime +${RETENTION_DAYS} -delete

echo "Backup completed: $(date)"

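You can sanity-check the retention logic from the script in a scratch directory; find -mtime +30 matches files last modified more than 30 days ago:

```shell
# Simulate one stale and one fresh backup, then prune
dir=$(mktemp -d)
touch -d '40 days ago' "$dir/pg_olddb.dump"
touch "$dir/pg_newdb.dump"
find "$dir" -type f -mtime +30 -delete
ls "$dir"    # only pg_newdb.dump should remain
```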
Schedule it with cron (see Chapter 24):

$ sudo crontab -e
# Daily at 2am
0 2 * * * /opt/scripts/db-backup.sh >> /var/log/db-backup.log 2>&1

Safety Warning: Always test your backups by restoring them to a test database. A backup you have never restored is a backup you cannot trust. Schedule restore tests at least monthly.


Monitoring Database Processes

PostgreSQL Monitoring

# Check active connections
$ sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"

# Find long-running queries
$ sudo -u postgres psql -c "
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start < now() - interval '5 seconds'
ORDER BY duration DESC;"

# Check database sizes
$ sudo -u postgres psql -c "
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;"

# Check for locks
$ sudo -u postgres psql -c "
SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_locks AS blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks AS blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
  AND blocked_locks.relation = blocking_locks.relation
JOIN pg_stat_activity AS blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted
  AND blocked.pid != blocking.pid;"

# Kill a runaway query
$ sudo -u postgres psql -c "SELECT pg_terminate_backend(12345);"

MariaDB Monitoring

# Check active connections
$ sudo mysql -e "SHOW STATUS LIKE 'Threads_connected';"

# Show running queries
$ sudo mysql -e "SHOW FULL PROCESSLIST;"

# Check database sizes
$ sudo mysql -e "
SELECT table_schema AS db,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY SUM(data_length + index_length) DESC;"

# Kill a runaway query
$ sudo mysql -e "KILL 12345;"

# Show InnoDB status (detailed engine info)
$ sudo mysql -e "SHOW ENGINE INNODB STATUS\G" | head -50

Hands-On: Set Up a Database from Scratch

Let us install PostgreSQL and create a working database.

Step 1: Install and start PostgreSQL:

$ sudo apt install -y postgresql postgresql-client
$ sudo systemctl enable --now postgresql

Step 2: Create a user and database:

$ sudo -u postgres psql << 'SQL'
CREATE USER webapp WITH PASSWORD 'WebApp2024!';
CREATE DATABASE webapp_production OWNER webapp;
\c webapp_production
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO users (username, email) VALUES
    ('alice', 'alice@example.com'),
    ('bob', 'bob@example.com'),
    ('carol', 'carol@example.com');
GRANT ALL ON ALL TABLES IN SCHEMA public TO webapp;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO webapp;
SQL

Step 3: Test the connection:

$ PGPASSWORD='WebApp2024!' psql -U webapp -d webapp_production -h localhost -c "SELECT * FROM users;"
 id | username |       email        |         created_at
----+----------+--------------------+----------------------------
  1 | alice    | alice@example.com  | 2025-06-15 10:30:00.000000
  2 | bob      | bob@example.com    | 2025-06-15 10:30:00.000000
  3 | carol    | carol@example.com  | 2025-06-15 10:30:00.000000

Step 4: Take a backup and restore:

# Backup
$ sudo -u postgres pg_dump -Fc webapp_production > /tmp/webapp_backup.dump

# Create a test database and restore
$ sudo -u postgres createdb webapp_test
$ sudo -u postgres pg_restore -d webapp_test /tmp/webapp_backup.dump

# Verify
$ sudo -u postgres psql -d webapp_test -c "SELECT count(*) FROM users;"
 count
-------
     3

Log Configuration

Database logs are critical for troubleshooting. Configure them properly from day one.

PostgreSQL Logging

Edit postgresql.conf:

# Log destination
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_rotation_size = 100MB

# What to log
log_min_duration_statement = 500    # Log queries taking > 500ms
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0                  # Log all temp file usage

# Log format
log_line_prefix = '%t [%p] %u@%d '  # timestamp [pid] user@database

MariaDB Logging

Edit /etc/mysql/mariadb.conf.d/50-server.cnf:

[mysqld]
# Error log
log_error = /var/log/mysql/error.log

# Slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow-query.log
long_query_time = 1                 # Log queries taking > 1 second

# General query log (WARNING: very verbose, use only for debugging)
# general_log = 1
# general_log_file = /var/log/mysql/general.log

Connection Pooling and Replication Overview

Connection Pooling

Database connections are expensive. Each connection consumes memory and CPU. Connection poolers sit between applications and databases, reusing connections efficiently.

┌──────────────────────────────────────────────────────────────┐
│              CONNECTION POOLING                              │
│                                                              │
│  WITHOUT POOLER:                                             │
│  App Instance 1 ──── 20 connections ──┐                      │
│  App Instance 2 ──── 20 connections ──├──► Database          │
│  App Instance 3 ──── 20 connections ──┘    (60 connections)  │
│                                                              │
│  WITH POOLER (PgBouncer):                                    │
│  App Instance 1 ──── 20 connections ──┐                      │
│  App Instance 2 ──── 20 connections ──├──► PgBouncer ──► DB  │
│  App Instance 3 ──── 20 connections ──┘    (10 actual conn.) │
│                                                              │
│  60 app connections share 10 real database connections.      │
│  Result: much lower database resource usage.                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

For PostgreSQL, PgBouncer is the standard connection pooler:

$ sudo apt install -y pgbouncer
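For illustration, here is a minimal /etc/pgbouncer/pgbouncer.ini matching the diagram's numbers. The database name, addresses, and pool sizes are placeholders, not recommendations:

```ini
[databases]
; Clients connecting to "appdb" are routed to the real server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 60
default_pool_size = 10
```

Applications then connect to port 6432 instead of 5432; PgBouncer opens at most default_pool_size real connections per database/user pair.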

Replication Overview

Replication copies data from one database server to others for redundancy and read scaling:

┌──────────────────────────────────────────────────────────────┐
│                    REPLICATION                                │
│                                                              │
│  PRIMARY (read-write)                                        │
│      │                                                       │
│      ├──── Streaming replication ──► REPLICA 1 (read-only)   │
│      │                                                       │
│      └──── Streaming replication ──► REPLICA 2 (read-only)   │
│                                                              │
│  Writes go to PRIMARY only.                                  │
│  Reads can be served by any replica.                         │
│  If PRIMARY fails, a replica can be promoted.                │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Setting up replication is beyond the scope of this basics chapter, but knowing the concept is essential for understanding production database architectures.


Debug This

A developer reports: "The application cannot connect to the database." You are the Linux admin. Walk through the diagnosis:

# 1. Is the database service running?
$ sudo systemctl status postgresql
# or
$ sudo systemctl status mariadb

# 2. Is it listening on the expected port?
$ ss -tlnp | grep -E ':5432|:3306'

# 3. Can you connect locally?
$ sudo -u postgres psql -c "SELECT 1;"
# or
$ sudo mysql -e "SELECT 1;"

# 4. Is the firewall blocking connections?
$ sudo iptables -L -n | grep -E '5432|3306'

# 5. Does the authentication config allow the connection?
# PostgreSQL: check pg_hba.conf
# MariaDB: check user host permissions

# 6. Is the database out of connections?
$ sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"
$ sudo -u postgres psql -c "SHOW max_connections;"

# 7. Check the database logs
$ sudo tail -50 /var/log/postgresql/postgresql-15-main.log
# or
$ sudo tail -50 /var/log/mysql/error.log

Common causes:

  1. Service not running (crash, OOM kill)
  2. Listening only on localhost but application connects from another host
  3. pg_hba.conf or MySQL user host restrictions blocking the connection
  4. Firewall blocking the port
  5. max_connections reached
  6. Wrong password or user does not exist

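Cause 2 deserves a closer look, because the service looks perfectly healthy from the database host itself. A minimal sketch of the step-2 check, run here against a canned ss-style line (the sample output is illustrative):

```shell
# Canned `ss -tln`-style line: PostgreSQL bound to loopback only,
# so connections from other hosts are refused (illustrative sample).
sample='LISTEN 0 244 127.0.0.1:5432 0.0.0.0:*'

listens_on() {
  # Is there a listener on the given port anywhere in the output?
  echo "$1" | grep -q ":$2 " && echo yes || echo no
}

listens_on "$sample" 5432   # yes -- but note the 127.0.0.1 bind address:
                            # remote application servers still cannot connect
```

A listener on 0.0.0.0:5432 or on the host's LAN address is what a remote application actually needs; for PostgreSQL that means adjusting listen_addresses in postgresql.conf.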
What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 74 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Database ops basics every Linux admin needs:                │
│                                                              │
│  PostgreSQL:                                                 │
│  • Default port: 5432                                        │
│  • Config: postgresql.conf + pg_hba.conf                     │
│  • Tools: psql, createdb, createuser, pg_dump, pg_restore    │
│  • Auth: peer (local), scram-sha-256 (network)               │
│                                                              │
│  MariaDB/MySQL:                                              │
│  • Default port: 3306                                        │
│  • First step: mysql_secure_installation                     │
│  • Tools: mysql, mysqldump                                   │
│  • Auth: unix_socket (local), password (network)             │
│                                                              │
│  Critical practices:                                         │
│  • Automated daily backups with retention policy             │
│  • TEST your backups by restoring them                       │
│  • Monitor connections, slow queries, disk usage             │
│  • Configure logging for troubleshooting                     │
│  • Never use trust/no-password auth in production            │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: PostgreSQL Setup

Install PostgreSQL, create a user and database, insert sample data, and take a backup. Restore the backup to a different database name and verify the data is intact.

Exercise 2: MariaDB Setup

Install MariaDB, run mysql_secure_installation, create a user with limited privileges (only SELECT and INSERT on a specific database), and verify the restrictions work.

Exercise 3: Monitoring Script

Write a shell script that checks:

  1. Is the database service running?
  2. How many active connections are there?
  3. What is the database disk usage?
  4. Are there any queries running longer than 60 seconds?

Run this script via cron every 5 minutes and log the output.

Exercise 4: Backup Automation

Set up the automated backup script from this chapter. Verify it creates dated backups and cleans up old ones. Test a restore from the most recent backup.

Bonus Challenge

Set up both PostgreSQL and MariaDB on the same server. Create a simple benchmark: insert 100,000 rows into a test table in each database and compare the time. Configure slow query logging on both and find any queries that exceed your threshold.

Time Synchronization

Why This Matters

A financial trading system executes a transaction at 14:32:07.003. The audit log on a different server records it at 14:32:08.147. A third server shows 14:31:59.891. When regulators ask for the exact sequence of events, nobody can say which happened first. This is not a hypothetical scenario -- it has caused real regulatory fines.

Or consider this: Kerberos authentication, used in Active Directory environments, rejects tickets if the time difference between client and server exceeds 5 minutes. Your users cannot log in to their desktops because a server's clock drifted by 6 minutes over the weekend. All because nobody set up time synchronization.

Accurate time is not a luxury. It is a dependency for:

  • Log correlation: Matching events across multiple servers
  • Authentication: Kerberos, TOTP (two-factor auth), TLS certificates
  • Distributed systems: Database replication, consensus protocols
  • Compliance: Financial regulations, healthcare audit trails
  • Backups: Determining which files have changed
  • Cron jobs: Running tasks at the right time

Getting time right is one of those things that seems trivial until it goes wrong. This chapter ensures it never goes wrong on your systems.


Try This Right Now

Check your system's time configuration in 30 seconds:

$ timedatectl

Expected output:

               Local time: Sat 2025-06-15 14:32:07 UTC
           Universal time: Sat 2025-06-15 14:32:07 UTC
                 RTC time: Sat 2025-06-15 14:32:07
                Time zone: UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

The critical lines:

  • System clock synchronized: yes -- your clock is being synchronized
  • NTP service: active -- an NTP client is running
  • RTC in local TZ: no -- the hardware clock stores UTC (correct for servers)

If "System clock synchronized" shows "no," your system's time is drifting without correction. Fix it by the end of this chapter.


UTC vs. Local Time

┌──────────────────────────────────────────────────────────────┐
│                   UTC vs LOCAL TIME                           │
│                                                              │
│  UTC (Coordinated Universal Time)                            │
│  ─────────────────────────────────                           │
│  • The global reference time                                 │
│  • Does not change for daylight saving                       │
│  • Standard for servers, logs, databases                     │
│  • Eliminates ambiguity when correlating events              │
│                                                              │
│  LOCAL TIME                                                  │
│  ──────────                                                  │
│  • UTC + timezone offset                                     │
│  • Changes with daylight saving time                         │
│  • Convenient for humans                                     │
│  • Problematic for automation and logging                    │
│                                                              │
│  BEST PRACTICE:                                              │
│  • Servers: set timezone to UTC                              │
│  • Logs: always log in UTC                                   │
│  • Databases: store timestamps in UTC                        │
│  • Display: convert to local time for users in the app       │
│                                                              │
└──────────────────────────────────────────────────────────────┘
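You can see the distinction from any shell: the same instant renders differently depending on TZ (GNU date assumed):

```shell
# One instant, three renderings; only the offset changes, never the instant.
ts='2025-06-15 14:32:07 UTC'
TZ=UTC              date -d "$ts" '+%F %T %Z'   # 2025-06-15 14:32:07 UTC
TZ=America/New_York date -d "$ts" '+%F %T %Z'   # 2025-06-15 10:32:07 EDT
TZ=Asia/Tokyo       date -d "$ts" '+%F %T %Z'   # 2025-06-15 23:32:07 JST
```

Note the New York rendering is EDT, not EST: daylight saving changed the offset, which is exactly the ambiguity UTC avoids.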

Setting the Timezone

# View current timezone
$ timedatectl show --property=Timezone
Timezone=UTC

# List available timezones
$ timedatectl list-timezones | grep America
America/Chicago
America/Denver
America/Los_Angeles
America/New_York
...

# Set timezone to UTC (recommended for servers)
$ sudo timedatectl set-timezone UTC

# Set timezone to a local zone (if needed)
$ sudo timedatectl set-timezone America/New_York

# The timezone is stored as a symlink
$ ls -la /etc/localtime
lrwxrwxrwx 1 root root 36 Jun 15 10:00 /etc/localtime -> /usr/share/zoneinfo/America/New_York

Think About It: Why is daylight saving time a problem for servers? Imagine a cron job scheduled for 2:30 AM on the night clocks "spring forward" from 2:00 to 3:00. What happens? What about when clocks "fall back" and 2:30 AM happens twice?


Hardware Clock vs. System Clock

Your Linux system maintains two separate clocks:

┌──────────────────────────────────────────────────────────────┐
│           HARDWARE CLOCK vs SYSTEM CLOCK                     │
│                                                              │
│  HARDWARE CLOCK (RTC)             SYSTEM CLOCK               │
│  ────────────────────             ────────────               │
│  • Physical chip on motherboard   • Maintained by the kernel │
│  • Runs when system is off        • Only while system is on  │
│  • Battery-backed (CMOS)          • Initialized from RTC     │
│  • Low precision (drifts)           at boot                  │
│  • Accessed via /dev/rtc          • Corrected by NTP         │
│  • Set with hwclock               • Set with timedatectl     │
│                                                              │
│  BOOT SEQUENCE:                                              │
│  1. System powers on                                         │
│  2. Kernel reads hardware clock → sets system clock          │
│  3. NTP client starts → corrects system clock                │
│  4. System clock is now accurate                             │
│                                                              │
│  SHUTDOWN:                                                   │
│  1. System writes system clock → hardware clock              │
│  2. System powers off                                        │
│  3. Hardware clock keeps ticking (drifts)                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Working with the Hardware Clock

# Read the hardware clock
$ sudo hwclock --show
2025-06-15 14:32:07.123456+00:00

# Sync system clock to hardware clock
$ sudo hwclock --hctosys

# Sync hardware clock to system clock
$ sudo hwclock --systohc

# Check if hardware clock stores UTC or local time
$ timedatectl | grep "RTC in local TZ"
RTC in local TZ: no     # Good -- RTC stores UTC

# Set RTC to UTC (if it is wrong)
$ sudo timedatectl set-local-rtc 0

Distro Note: On dual-boot systems with Windows, Windows historically sets the RTC to local time while Linux prefers UTC. If you dual-boot, either configure Windows to use UTC for the RTC, or run sudo timedatectl set-local-rtc 1 on Linux (not recommended for servers).


NTP Protocol Basics

The Network Time Protocol (NTP) synchronizes clocks over the network. It has been doing this reliably since 1985 and is one of the oldest Internet protocols still in active use.

How NTP Works

┌──────────────────────────────────────────────────────────────┐
│                   NTP HIERARCHY (STRATA)                      │
│                                                              │
│  Stratum 0: Reference clocks                                 │
│             (GPS receivers, atomic clocks, radio clocks)     │
│                    │                                         │
│                    ▼                                         │
│  Stratum 1: Primary time servers                             │
│             (directly connected to stratum 0)                │
│             e.g. time.nist.gov                               │
│                    │                                         │
│                    ▼                                         │
│  Stratum 2: Secondary servers                                │
│             (synchronized to stratum 1)                      │
│             pool.ntp.org servers                             │
│                    │                                         │
│                    ▼                                         │
│  Stratum 3: Your servers                                     │
│             (synchronized to stratum 2)                      │
│                    │                                         │
│                    ▼                                         │
│  Stratum 4: Your clients                                     │
│             (synchronized to your servers)                   │
│                                                              │
│  Each stratum adds a tiny bit of inaccuracy.                 │
│  Stratum 16 = unsynchronized (not trustworthy).              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The NTP Exchange

┌──────────┐                              ┌──────────┐
│  Client  │                              │  Server  │
│          │                              │          │
│    t1 ───┤──── request packet ────────► │          │
│          │                              │ t2       │
│          │                              │          │
│          │                              │ t3       │
│    t4 ◄──┤──── response packet ◄────────┤          │
│          │                              │          │
└──────────┘                              └──────────┘

The client records 4 timestamps:
  t1 = when client sent the request
  t2 = when server received the request
  t3 = when server sent the response
  t4 = when client received the response

Offset = ((t2 - t1) + (t3 - t4)) / 2
Delay  = (t4 - t1) - (t3 - t2)

The client adjusts its clock by the calculated offset.
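Worked through with hypothetical numbers (milliseconds, chosen so the server clock runs ahead of the client's):

```shell
# Hypothetical timestamps, in milliseconds since some shared epoch
t1=1000   # client sends request        (client clock)
t2=1050   # server receives the request (server clock, running ahead)
t3=1052   # server sends the response   (server clock)
t4=1010   # client receives it          (client clock)

offset=$(( ((t2 - t1) + (t3 - t4)) / 2 ))   # (50 + 42) / 2 = 46
delay=$(( (t4 - t1) - (t3 - t2) ))          # 10 - 2 = 8

echo "offset=${offset}ms delay=${delay}ms"  # offset=46ms delay=8ms
```

These numbers are self-consistent: if the server is 46 ms ahead, each one-way trip took 4 ms, which is exactly half the 8 ms round-trip delay.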

NTP does not simply "jump" the clock. It gradually slews (speeds up or slows down) the system clock to converge on the correct time, avoiding the problems that sudden jumps cause for running applications. Only when the offset is very large (over 128ms by default) does it step the clock.


chronyd vs. ntpd

Two NTP client implementations are common on Linux:

┌──────────────────────────────────────────────────────────────┐
│              chronyd vs ntpd                                  │
│                                                              │
│  chronyd (chrony)                   ntpd (classic NTP)       │
│  ────────────────                   ────────────────         │
│  Modern, actively developed         Classic, mature          │
│  Fast initial sync (seconds)        Slow initial sync        │
│  Handles intermittent networks      Needs stable connection  │
│  Good for VMs and laptops           Designed for servers     │
│  Low memory footprint               Larger footprint         │
│  Default on RHEL, Fedora, SUSE      Legacy, being replaced   │
│                                                              │
│  RECOMMENDATION: Use chronyd unless you have a specific      │
│  reason to use ntpd. It is the default on modern distros.    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Configuring chrony

Installation

# Debian/Ubuntu
$ sudo apt install -y chrony

# Fedora/RHEL/AlmaLinux/Rocky
$ sudo dnf install -y chrony

# Start and enable
$ sudo systemctl enable --now chronyd

Configuration File

$ cat /etc/chrony/chrony.conf    # Debian/Ubuntu; RHEL-family: /etc/chrony.conf

A typical configuration:

# NTP servers to synchronize with
pool 2.pool.ntp.org iburst

# Record the rate at which the system clock gains/loses time
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC)
rtcsync

# Specify directory for log files
logdir /var/log/chrony

Customizing NTP Servers

For better accuracy and reliability, use servers geographically close to you:

# Use NTP pool for your region
pool 0.us.pool.ntp.org iburst
pool 1.us.pool.ntp.org iburst
pool 2.us.pool.ntp.org iburst
pool 3.us.pool.ntp.org iburst

# Or use specific servers
server time.cloudflare.com iburst
server time.google.com iburst

The iburst option sends multiple requests at startup for faster initial synchronization.

After editing the configuration:

$ sudo systemctl restart chronyd

NTP Pool Servers

The NTP Pool Project (pool.ntp.org) is a cluster of thousands of volunteer time servers. When you configure pool.ntp.org, DNS returns different server IPs on each query, distributing load:

# See which pool servers your system resolved to
$ chronyc sources

For enterprise environments, consider running your own internal NTP server that synchronizes with the pool, and have all internal machines sync from it. This reduces external dependencies and ensures consistency.


Verifying Time Synchronization

chronyc: The chrony Control Tool

# Show current time sources
$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ntp1.example.com              2   6   377    34   +142us[ +187us] +/-   15ms
^+ ntp2.example.com              2   6   377    35   -523us[ -478us] +/-   18ms
^+ ntp3.example.com              3   6   377    34   +1124us[+1169us] +/-   22ms
^- ntp4.example.com              3   6   377    36   -2351us[-2306us] +/-   45ms

Understanding the output:

  • ^* = current best source (the one being used)
  • ^+ = acceptable source, ready to be selected
  • ^- = acceptable source, but too far from best
  • ^? = source not yet evaluated
  • ^x = source deemed unreliable
  • Stratum = distance from reference clock
  • Poll = polling interval (log2 seconds: 6 = 64 seconds)
  • Reach = reachability register (377 = last 8 attempts all successful)
  • Last sample = offset from this source
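That Reach register is octal: each bit records one of the last eight polls (1 = reply received). A quick sketch decoding 377 by hand:

```shell
reach=377                      # value from the Reach column (octal)
dec=$(printf '%d' "0$reach")   # leading 0 makes printf parse it as octal: 255

# Print the 8 bits, oldest poll first
bits=""
for i in 7 6 5 4 3 2 1 0; do
  bits="${bits}$(( (dec >> i) & 1 ))"
done
echo "$bits"                   # 11111111 -- all of the last 8 polls succeeded
```

A value like 376 (11111110) would mean the most recent poll got no reply; a steadily shrinking Reach means the source has gone unreachable.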

# Show detailed tracking information
$ chronyc tracking
Reference ID    : C0A80101 (ntp1.example.com)
Stratum         : 3
Ref time (UTC)  : Sat Jun 15 14:32:07 2025
System time     : 0.000000142 seconds fast of NTP time
Last offset     : +0.000000187 seconds
RMS offset      : 0.000000523 seconds
Frequency       : 3.128 ppm slow
Residual freq   : +0.001 ppm
Skew            : 0.012 ppm
Root delay      : 0.015234 seconds
Root dispersion : 0.001234 seconds
Update interval : 64.0 seconds
Leap status     : Normal

Key fields:

  • System time: How far the system clock is from NTP time (should be microseconds)
  • Frequency: How much the local clock drifts (in parts per million)
  • Leap status: Should be "Normal" (changes during leap second events)

# Show sources with statistics
$ chronyc sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
ntp1.example.com           25  14   26m     -0.003      0.017   +142us    63us
ntp2.example.com           25  12   26m     +0.012      0.023   -523us    89us

Quick Health Check

# One-liner: is time synchronized?
$ timedatectl show --property=NTPSynchronized --value
yes

# Chrony: am I synced?
$ chronyc tracking | grep "System time"
System time     : 0.000000142 seconds fast of NTP time

If the system time offset is in microseconds, you are in excellent shape. Milliseconds are acceptable. Seconds mean something is wrong.
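That rule of thumb can be scripted. A sketch that parses the offset out of a canned tracking line (the 100 ms threshold is an arbitrary example, not a standard):

```shell
# Canned `chronyc tracking` line (format as shown above)
line='System time     : 0.000000142 seconds fast of NTP time'

# Pull out the numeric offset in seconds
offset=$(echo "$line" | awk -F': ' '{print $2}' | awk '{print $1}')
echo "$offset"    # 0.000000142

# Alert if the offset exceeds 100 ms
awk -v o="$offset" 'BEGIN { exit !(o < 0.1) }' \
  && echo "OK: offset within 100ms" || echo "WARN: clock offset too large"
```

In a real check you would feed it `chronyc tracking | grep "System time"` instead of the canned line.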


Hands-On: Setting Up chrony from Scratch

Let us configure time synchronization on a fresh system.

Step 1: Install chrony:

$ sudo apt install -y chrony   # Debian/Ubuntu
# or
$ sudo dnf install -y chrony   # RHEL/Fedora

Step 2: Configure NTP sources:

$ sudo tee /etc/chrony/chrony.conf << 'EOF'
# NTP servers
pool 0.pool.ntp.org iburst
pool 1.pool.ntp.org iburst
pool 2.pool.ntp.org iburst
pool 3.pool.ntp.org iburst

# Drift file
driftfile /var/lib/chrony/drift

# Allow initial large time correction
makestep 1.0 3

# Sync hardware clock
rtcsync

# Logging
logdir /var/log/chrony
log measurements statistics tracking
EOF

Step 3: Restart and verify:

$ sudo systemctl restart chronyd
$ chronyc sources

Wait 30-60 seconds for initial synchronization, then check:

$ chronyc tracking

Step 4: Verify with timedatectl:

$ timedatectl

Confirm:

  • NTP service: active
  • System clock synchronized: yes

Step 5: Test resilience -- force a time offset and watch chrony correct it:

# Check current time
$ date

# Deliberately set wrong time (chrony will correct it)
$ sudo date -s "14:00:00"

# Watch chrony fix it (within seconds due to makestep)
$ watch -n1 chronyc tracking

Safety Warning: Manually setting the system time on a production server can cause issues with running applications, especially databases. The makestep directive in chrony handles large initial offsets gracefully, but do not manually manipulate time on production systems.


Running Your Own NTP Server

For environments with many machines, running an internal NTP server reduces external dependencies and ensures all machines agree on time:

# On the NTP server, add to chrony.conf:
$ sudo tee -a /etc/chrony/chrony.conf << 'EOF'

# Allow NTP clients on the local network
allow 10.0.0.0/8
allow 192.168.0.0/16

# Serve time even when not synchronized (optional, for isolated networks)
# local stratum 10
EOF

$ sudo systemctl restart chronyd

# On client machines, point to your internal server:
$ sudo tee /etc/chrony/chrony.conf << 'EOF'
server ntp.internal.example.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
EOF

$ sudo systemctl restart chronyd

┌──────────────────────────────────────────────────────────────┐
│              INTERNAL NTP ARCHITECTURE                        │
│                                                              │
│  Internet                                                    │
│  ┌─────────────────────┐                                     │
│  │  pool.ntp.org       │                                     │
│  │  (stratum 2)        │                                     │
│  └────────┬────────────┘                                     │
│           │                                                  │
│  ─────────┼──────────── Firewall ────────────────────        │
│           │                                                  │
│  ┌────────▼────────────┐                                     │
│  │  Internal NTP       │                                     │
│  │  ntp.internal       │                                     │
│  │  (stratum 3)        │                                     │
│  └────────┬────────────┘                                     │
│           │                                                  │
│     ┌─────┼─────┐                                            │
│     │     │     │                                            │
│     ▼     ▼     ▼                                            │
│  Server Server Server                                        │
│  1      2      3     (stratum 4)                             │
│                                                              │
│  Only the internal NTP server needs internet access.         │
│  All clients sync from it.                                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Think About It: You have an isolated network with no internet access. How would you provide accurate time to the servers? Consider GPS receivers, PTP grandmaster clocks, and the local stratum directive.


PTP: Precision Time Protocol

For applications requiring sub-microsecond accuracy (financial trading, telecom, scientific instruments), NTP's millisecond-level accuracy is not enough. PTP (IEEE 1588) provides nanosecond-level synchronization.

┌──────────────────────────────────────────────────────────────┐
│                NTP vs PTP ACCURACY                            │
│                                                              │
│  Protocol    Typical Accuracy    Use Case                    │
│  ────────    ────────────────    ────────                    │
│  NTP         1-50 ms            General servers, logs        │
│  Chrony      0.01-1 ms          Good server deployments      │
│  PTP         < 1 microsecond    Financial, telecom, science  │
│                                                              │
│  PTP requires hardware support:                              │
│  • PTP-capable network cards (hardware timestamping)         │
│  • PTP-capable switches (transparent or boundary clocks)     │
│  • PTP grandmaster clock (GPS-disciplined)                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

PTP on Linux is implemented by linuxptp:

# Install linuxptp
$ sudo apt install -y linuxptp    # Debian/Ubuntu
$ sudo dnf install -y linuxptp    # RHEL/Fedora

# Check if your NIC supports hardware timestamping
$ ethtool -T eth0 | grep -i ptp
PTP Hardware Clock: 0

# Run ptp4l (PTP daemon)
$ sudo ptp4l -i eth0 -m

PTP is specialized and requires compatible hardware throughout the network path. For most Linux administration tasks, chrony with NTP provides more than sufficient accuracy.


Debug This

Users report that Kerberos authentication is failing intermittently. The error message says "Clock skew too great." You check the affected server:

$ timedatectl
               Local time: Sat 2025-06-15 14:38:12 UTC
           Universal time: Sat 2025-06-15 14:38:12 UTC
                 RTC time: Sat 2025-06-15 14:38:12
                Time zone: UTC (UTC, +0000)
System clock synchronized: no
              NTP service: inactive
          RTC in local TZ: no

What is wrong and how do you fix it?

Diagnosis:

  1. NTP service: inactive -- no NTP client is running
  2. System clock synchronized: no -- the clock is drifting freely

Fix:

# Install and start chrony
$ sudo apt install -y chrony
$ sudo systemctl enable --now chronyd

# Verify synchronization
$ chronyc sources
$ timedatectl

# Force immediate sync if needed
$ sudo chronyc makestep

Root cause: The server was installed without NTP configured. Over days or weeks, the hardware clock drifted by more than 5 minutes (the default Kerberos tolerance), causing authentication failures.

Prevention: Include chrony installation in your server provisioning playbook (see Chapter 68) and monitor NTP synchronization status with your monitoring stack (see Chapter 70).


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 75 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Accurate time is critical for logs, authentication,         │
│  distributed systems, and compliance.                        │
│                                                              │
│  Key concepts:                                               │
│  • UTC for servers, local time for display only              │
│  • Hardware clock (RTC): persists when off, drifts           │
│  • System clock: maintained by kernel, corrected by NTP      │
│  • NTP: network protocol for time synchronization            │
│  • Stratum: distance from reference clock (lower = better)   │
│                                                              │
│  Tools:                                                      │
│  • timedatectl: view/set time, timezone, NTP status          │
│  • chronyd: modern NTP client (recommended)                  │
│  • chronyc: control and query chrony                         │
│  • hwclock: manage hardware clock                            │
│                                                              │
│  Essential commands:                                         │
│  • timedatectl                (check status)                 │
│  • chronyc sources            (see NTP servers)              │
│  • chronyc tracking           (see sync accuracy)            │
│  • chronyc makestep           (force immediate correction)   │
│                                                              │
│  PTP provides nanosecond accuracy for specialized needs.     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Time Audit

Run timedatectl and chronyc sources on every Linux system you have access to. Document which are synchronized and which are not. Fix any that are not.

Exercise 2: NTP Server List

Configure chrony to use a specific set of NTP servers (not the default pool). Choose servers geographically close to you from the NTP pool project. Verify they are being used with chronyc sources.

Exercise 3: Drift Measurement

Stop chrony, wait 24 hours, then check how far your clock has drifted:

$ sudo systemctl stop chronyd
# Wait 24 hours
$ chronyc tracking   # Before restart
$ sudo systemctl start chronyd

This shows your hardware clock's natural drift rate.

Exercise 4: Internal NTP Server

Set up a chrony server that serves time to your local network. Configure a second machine to sync from it instead of the public pool. Verify with chronyc sources on the client.

Bonus Challenge

Write a monitoring check (shell script or Prometheus alert rule) that verifies:

  1. chronyd is running
  2. The system clock offset is less than 100 milliseconds
  3. At least one NTP source is reachable (Reach > 0)
  4. The stratum is less than 10

Integrate this into your monitoring stack from Chapter 70.

Troubleshooting Methodology

Why This Matters

It is 2am. The monitoring system is screaming. The website is down. Customers are tweeting about it. Your manager is on Slack asking for updates every three minutes. And you have no idea what is wrong.

This is the moment that separates an experienced Linux administrator from a beginner. Not because the experienced admin magically knows the answer, but because they have a systematic approach to finding it. They do not panic. They do not randomly restart services hoping something sticks. They follow a methodology.

Every chapter in this book has taught you specific skills: networking, storage, processes, services, security. This chapter teaches you how to combine those skills under pressure into a systematic troubleshooting process. This is arguably the most important chapter in the book, because real-world Linux work is primarily troubleshooting.


Try This Right Now

The next time something goes wrong on your system, resist the urge to immediately start fixing it. Instead, spend 60 seconds gathering information:

# What is the system's overall health?
$ uptime
$ free -m
$ df -h
$ dmesg | tail -20

# What changed recently?
$ last -10
$ journalctl --since "1 hour ago" -p err
$ rpm -qa --last | head -10     # RHEL-family
$ zcat /var/log/apt/history.log.*.gz | head -20   # Debian-family

# What is happening right now?
$ top -bn1 | head -20
$ ss -tlnp
$ systemctl --failed

Those commands take under a minute to run and will tell you more than 10 minutes of guessing.


The Systematic Troubleshooting Process

┌──────────────────────────────────────────────────────────────┐
│           SYSTEMATIC TROUBLESHOOTING                         │
│                                                              │
│  1. DEFINE THE PROBLEM                                       │
│     What exactly is broken? What should be happening?        │
│                    │                                         │
│                    ▼                                         │
│  2. GATHER INFORMATION                                       │
│     Logs, metrics, error messages, recent changes            │
│                    │                                         │
│                    ▼                                         │
│  3. FORM A HYPOTHESIS                                        │
│     Based on evidence, what is the most likely cause?        │
│                    │                                         │
│                    ▼                                         │
│  4. TEST THE HYPOTHESIS                                      │
│     Design a test that proves or disproves your theory       │
│                    │                                         │
│                    ▼                                         │
│  5. IMPLEMENT THE FIX                                        │
│     Apply the solution                                       │
│                    │                                         │
│                    ▼                                         │
│  6. VERIFY                                                   │
│     Confirm the problem is actually resolved                 │
│                    │                                         │
│                    ▼                                         │
│  7. DOCUMENT                                                 │
│     Record what happened, what caused it, how it was fixed   │
│                                                              │
│  If your hypothesis is wrong at step 4, go back to step 3.  │
│  Do NOT skip step 6 -- "it seems to work" is not enough.    │
│  Do NOT skip step 7 -- future you will thank present you.   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Step 1: Define the Problem

Before you can fix something, you must understand what is broken. Vague problem statements lead to wasted effort.

Bad: "The server is down." Better: "Users cannot load the website. The server responds to ping but port 443 returns connection refused."

Bad: "It's slow." Better: "Page load times increased from 200ms to 8 seconds starting at 14:00 today."

Ask:

  • What is the expected behavior?
  • What is the actual behavior?
  • When did it start?
  • Who is affected?
  • What changed before it started?

Step 2: Gather Information

This is where your Linux toolbox comes in. Gather facts, not opinions.

# System overview
$ uptime                           # Load and uptime
$ free -m                          # Memory usage
$ df -h                            # Disk usage
$ top -bn1 | head -30              # Top processes

# Recent events
$ journalctl --since "30 min ago" -p err --no-pager
$ dmesg | tail -30                 # Kernel messages
$ last -5                          # Recent logins

# Service status
$ systemctl status <service>       # Specific service
$ systemctl --failed               # All failed services

# Network
$ ss -tlnp                         # Listening ports
$ ip addr                          # IP addresses
$ ping -c3 <gateway>               # Basic connectivity
$ dig <hostname>                   # DNS resolution

# Recent changes
$ journalctl -u <service> --since "1 hour ago"
$ stat /etc/<config-file>          # When was config last changed?

Step 3: Form a Hypothesis

Based on the evidence, propose the most likely cause. Start with the simplest explanation -- is the service running? Is the disk full? Is the network up?

Step 4: Test the Hypothesis

Design a test that will either confirm or eliminate your hypothesis. Do not change multiple things at once -- that makes it impossible to know what fixed the problem.

Step 5: Implement the Fix

Apply the minimum change needed to resolve the issue. Document what you change before you change it.

Step 6: Verify

Confirm the problem is fully resolved, not just partially. Check from the user's perspective, not just from the server.

Step 7: Document

Write down:

  • What the symptoms were
  • What caused the problem
  • What you did to fix it
  • How to prevent it in the future
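
A lightweight way to build the habit is an append-only plain-text journal. The helper below is a sketch -- the file path and the field names are arbitrary choices, not any standard format.

```shell
# log_incident: append a skeleton entry to a plain-text ops journal.
# Usage: log_incident ~/ops-journal.txt "nginx OOM-killed after v2.3.1 deploy"
log_incident() {
    local journal=$1 title=$2
    {
        echo "== $(date '+%Y-%m-%d %H:%M') -- $title"
        echo "Symptoms:"
        echo "Cause:"
        echo "Fix:"
        echo "Prevention:"
        echo
    } >> "$journal"
}
```

Five minutes after each fix, fill in the four fields while the details are fresh.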

The 5 Whys Technique

When you find the immediate cause, keep asking "Why?" to find the root cause.

Problem: The website is down.
  Why? → Nginx is not running.
    Why? → It crashed due to out-of-memory.
      Why? → A memory leak in the PHP application.
        Why? → A new deployment introduced a bug in session handling.
          Why? → The code change was not reviewed and had no tests.

ROOT CAUSE: Missing code review and test coverage.
FIX: Fix the memory leak AND add code review + tests.

If you only fix "Nginx is not running" by restarting it, the problem will return. The 5 Whys drives you to the real root cause.

Think About It: A server's disk filled up because log files grew too large. "Delete the logs" fixes the immediate problem. What are the 5 Whys, and what is the real fix?


Reading Error Messages and Logs

The most underrated troubleshooting skill is actually reading the error message. Most errors tell you exactly what is wrong if you read carefully.

Common Error Patterns

┌──────────────────────────────────────────────────────────────┐
│              COMMON ERROR MESSAGES AND WHAT THEY MEAN         │
│                                                              │
│  "Permission denied"                                         │
│  → File permissions, SELinux/AppArmor, or capability issue   │
│    Check: ls -la, getenforce, journalctl for AVC denials     │
│                                                              │
│  "No such file or directory"                                 │
│  → Path is wrong, file was deleted, or filesystem not mounted│
│    Check: ls, mount, findmnt                                 │
│                                                              │
│  "Connection refused"                                        │
│  → Service is not running or not listening on that port      │
│    Check: systemctl status, ss -tlnp                         │
│                                                              │
│  "Connection timed out"                                      │
│  → Firewall blocking, network unreachable, or service hung   │
│    Check: iptables, ping, traceroute                         │
│                                                              │
│  "No space left on device"                                   │
│  → Disk full OR inodes exhausted                             │
│    Check: df -h, df -i                                       │
│                                                              │
│  "Address already in use"                                    │
│  → Another process is using that port                        │
│    Check: ss -tlnp | grep <port>                             │
│                                                              │
│  "Name or service not known"                                 │
│  → DNS resolution failure                                    │
│    Check: dig, cat /etc/resolv.conf, resolvectl status       │
│                                                              │
│  "Out of memory: Killed process"                             │
│  → OOM killer terminated a process                           │
│    Check: dmesg | grep -i oom, journalctl -k                 │
│                                                              │
│  "Segmentation fault"                                        │
│  → Application bug (accessing invalid memory)                │
│    Check: coredump, application logs                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
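
The "No space left on device" entry hides a classic trap: the disk can report free blocks while the filesystem is out of inodes. A small helper makes checking both a single step (a sketch; note that df wraps long device names onto two lines, which this simple version does not handle):

```shell
# space_or_inodes: report both block usage and inode usage for a mount point.
space_or_inodes() {
    local mnt=${1:-/}
    df -h "$mnt" | awk 'NR == 2 { print "blocks: " $5 " used on " $6 }'
    df -i "$mnt" | awk 'NR == 2 { print "inodes: " $5 " used on " $6 }'
}
# Usage: space_or_inodes /var
```

If blocks show 40% but inodes show 100%, something (often a cache or mail spool) has created millions of tiny files.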

Where to Find Logs

# Systemd journal (most services)
$ journalctl -u <service-name> --since "1 hour ago"

# System-wide errors
$ journalctl -p err --since today

# Kernel messages
$ dmesg | tail -50

# Traditional log files
$ ls /var/log/
$ tail -50 /var/log/syslog           # Debian/Ubuntu
$ tail -50 /var/log/messages          # RHEL-family

# Application-specific logs
$ tail -50 /var/log/nginx/error.log
$ tail -50 /var/log/postgresql/postgresql-15-main.log
$ tail -50 /var/log/mysql/error.log

Troubleshooting Scenarios

Let us walk through the most common real-world problems with a systematic approach.

Scenario 1: Cannot SSH into a Server

# Step 1: Define the problem
# "SSH connection to 10.0.0.5 hangs and eventually times out"

# Step 2: Gather information
# From another machine that CAN reach the server (or console access):

# Is the machine up?
$ ping -c3 10.0.0.5

# Is SSH listening?
$ ss -tlnp | grep :22

# Is sshd running?
$ systemctl status sshd

# Check firewall
$ sudo iptables -L -n | grep 22
$ sudo firewall-cmd --list-all

# Check SSH config
$ sudo sshd -T | grep -i "listen\|permit\|allow\|deny"

# Check for failed login attempts
$ journalctl -u sshd --since "1 hour ago" | tail -20

# Check if fail2ban blocked the IP
$ sudo fail2ban-client status sshd

Common causes and fixes:

  • sshd not running: sudo systemctl start sshd
  • Firewall blocking: sudo firewall-cmd --add-service=ssh --permanent && sudo firewall-cmd --reload
  • fail2ban banned the IP: sudo fail2ban-client set sshd unbanip <IP>
  • /etc/hosts.deny blocking: Check and edit
  • Wrong port: Check Port directive in /etc/ssh/sshd_config
  • Key authentication failed: Check ~/.ssh/authorized_keys permissions (must be 600)
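
The last item trips people up constantly, because sshd silently ignores keys when permissions are too loose. A quick checker might look like this sketch (it tests the common requirements -- ~/.ssh at 700 and authorized_keys at 600 -- but StrictModes also rejects a group/world-writable home directory, which this does not cover):

```shell
# check_ssh_perms: warn if a user's SSH key files have unsafe permissions.
# Usage: check_ssh_perms /home/alice
check_ssh_perms() {
    local home=$1 rc=0
    [ "$(stat -c %a "$home/.ssh" 2>/dev/null)" = "700" ] \
        || { echo "WARN: $home/.ssh should be mode 700"; rc=1; }
    [ "$(stat -c %a "$home/.ssh/authorized_keys" 2>/dev/null)" = "600" ] \
        || { echo "WARN: $home/.ssh/authorized_keys should be mode 600"; rc=1; }
    return $rc
}
```

Run it for the affected user before digging through sshd debug logs; it is the cheapest check on the list.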

Scenario 2: Website Down (HTTP Error)

# Step 1: What does the user see?
$ curl -I http://example.com
# Connection refused? 500 error? Timeout?

# Step 2: Check the web server
$ systemctl status nginx
# or
$ systemctl status apache2

# Check error logs
$ tail -30 /var/log/nginx/error.log

# Is it listening?
$ ss -tlnp | grep -E ':80|:443'

# Check config syntax
$ nginx -t

# Check backend application
$ systemctl status myapp

# Check disk space (full disk = can't write logs = crash)
$ df -h

# Check memory (OOM = killed process)
$ free -m
$ dmesg | grep -i oom

Common causes:

  • Web server not running (restart it)
  • Config syntax error (fix config, run nginx -t)
  • Backend application crashed (check app logs)
  • Disk full (clean up, check log rotation)
  • Permissions changed on document root
  • SSL certificate expired (check with openssl s_client)
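
The expired-certificate case is worth automating, since openssl can answer "does this certificate expire within N days?" directly with -checkend. A sketch (the live-endpoint form needs network access and assumes HTTPS on port 443):

```shell
# cert_expiry_ok FILE DAYS -- succeed if the PEM cert is valid for DAYS more days.
cert_expiry_ok() {
    local pem=$1 days=${2:-14}
    openssl x509 -in "$pem" -noout -checkend $(( days * 86400 ))
}

# Live check against a running server:
# echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
#     | openssl x509 -noout -dates
```

Drop the function into a daily cron job and you will never be surprised by an expiry again.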

Scenario 3: Disk Full

# Step 1: Confirm and identify
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   50G     0 100% /

# Step 2: Find what is using the space
$ sudo du -sh /* 2>/dev/null | sort -rh | head -10
28G     /var
12G     /home
5G      /usr
3G      /opt

$ sudo du -sh /var/* | sort -rh | head -5
25G     /var/log
2G      /var/lib

$ sudo du -sh /var/log/* | sort -rh | head -5
22G     /var/log/myapp
2G      /var/log/syslog

# Step 3: Identify the specific culprit
$ sudo ls -lhS /var/log/myapp/ | head -5
-rw-r--r-- 1 myapp myapp 20G Jun 15 14:32 application.log

# Step 4: Fix
# Immediate: truncate the file (not delete -- deleting a file held open does not free space)
$ sudo truncate -s 0 /var/log/myapp/application.log

# Long-term: set up log rotation
$ sudo tee /etc/logrotate.d/myapp << 'EOF'
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
    maxsize 500M
}
EOF

Safety Warning: If you delete a file that is still open by a process, the space is not freed until the process releases the file handle. Use truncate -s 0 instead, or restart the process after deleting. Check with lsof +L1 to find deleted-but-open files.
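
You can see this behavior safely with a throwaway file. The sketch below inspects /proc directly; on a real server, lsof +L1 does the system-wide version of the same lookup:

```shell
# Demonstrate a deleted-but-open file (safe: everything lives in a temp dir).
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big.log" bs=1M count=10 status=none   # fake 10MB log
tail -f "$demo/big.log" > /dev/null 2>&1 &                      # a process holds it open
tpid=$!
sleep 1
rm "$demo/big.log"                       # "deleted" -- but the space is NOT freed yet
ls -l "/proc/$tpid/fd" | grep deleted    # the open fd still points at the file
# lsof +L1 would list every deleted-but-open file on the system.
kill "$tpid"                             # the holder exits -- now the space is released
rm -rf "$demo"
```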

Scenario 4: High Load Average

# Step 1: Check load average
$ uptime
 14:32:07 up 30 days,  3 users,  load average: 24.5, 22.3, 18.7
# On a 4-CPU system, load > 4 means processes are waiting

# Step 2: Identify what is causing the load
$ top -bn1 | head -20
# Look at CPU% and state columns

# Is it CPU-bound or I/O-bound?
$ vmstat 1 5
# High 'wa' (wait) = I/O bound
# High 'us' (user) or 'sy' (system) = CPU bound

# If I/O bound, check disk I/O
$ iostat -x 1 5
# Look for high %util, high await

# If CPU bound, find the hungry processes
$ ps aux --sort=-%cpu | head -10

# Check for process storms
$ ps aux | wc -l
# If unusually high, something might be forking excessively
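
The wa/us/sy reading can be scripted. This classifier is a sketch: the 20% and 80% cutoffs are arbitrary starting points, and the field positions assume the standard 17-column vmstat layout.

```shell
# classify_load: read `vmstat 1 2` style output, report whether the last
# sample looks I/O-bound (high wa) or CPU-bound (high us+sy).
classify_load() {
    awk '{ us = $13; sy = $14; wa = $16 }
         END {
             if (wa > 20)           print "I/O-bound (wa=" wa "%)"
             else if (us + sy > 80) print "CPU-bound (us+sy=" us + sy "%)"
             else                   print "not saturated -- check run queue and memory"
         }'
}
# Usage: vmstat 1 2 | classify_load
```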

Scenario 5: Service Will Not Start

# Step 1: Check the status
$ systemctl status myservice
# Read the error message -- it usually tells you exactly what is wrong

# Step 2: Check the journal
$ journalctl -u myservice --since "5 min ago" --no-pager

# Step 3: Common causes checklist
# Config syntax error?
$ myservice --check-config   # Many services support this

# Missing dependency?
$ systemctl list-dependencies myservice

# Port already in use?
$ ss -tlnp | grep <port>

# Permission issue?
$ ls -la /etc/myservice/
$ ls -la /var/run/myservice/
$ namei -l /var/run/myservice/myservice.sock   # Check path permissions

# SELinux blocking?
$ sudo ausearch -m avc --start recent
$ sudo sealert -a /var/log/audit/audit.log

Scenario 6: Network Unreachable

# Step 1: Check local network config
$ ip addr
$ ip route

# Step 2: Can you reach the gateway?
$ ping -c3 $(ip route | grep default | awk '{print $3}')

# Step 3: Can you reach external IPs?
$ ping -c3 1.1.1.1

# Step 4: Is DNS working?
$ dig google.com
# or
$ nslookup google.com

# Step 5: Check for firewall issues
$ sudo iptables -L -n
$ sudo nft list ruleset

# Step 6: Check physical layer
$ ip link show
$ ethtool eth0 | grep -i "link detected"

# Decision tree:
# Can't reach gateway → local network/cable/interface issue
# Can reach gateway but not internet → routing or upstream issue
# Can reach IPs but not names → DNS issue
# Can reach some hosts but not others → firewall or routing issue
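
The decision tree translates almost line-for-line into a script. A sketch (the 1.1.1.1 probe and the google.com lookup are arbitrary well-known targets; substitute your own):

```shell
# diagnose_network: walk the connectivity decision tree from the bottom up.
default_gw() { awk '/^default/ { print $3; exit }'; }   # parse `ip route` output

diagnose_network() {
    local gw
    gw=$(ip route | default_gw)
    if [ -z "$gw" ]; then
        echo "No default route -- check interface config / DHCP"; return 1
    fi
    if ! ping -c1 -W2 "$gw" > /dev/null 2>&1; then
        echo "Cannot reach gateway $gw -- local network/cable/interface issue"; return 1
    fi
    if ! ping -c1 -W2 1.1.1.1 > /dev/null 2>&1; then
        echo "Gateway OK, internet unreachable -- routing or upstream issue"; return 1
    fi
    if ! getent hosts google.com > /dev/null; then
        echo "IP connectivity OK, names failing -- DNS issue"; return 1
    fi
    echo "Basic connectivity OK -- suspect firewall rules or per-host routing"
}
# Usage: diagnose_network
```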

Scenario 7: DNS Not Resolving

# Step 1: Confirm DNS is the issue
$ ping 1.1.1.1           # Works? Then network is fine.
$ ping google.com         # Fails? DNS is the problem.

# Step 2: Check DNS configuration
$ cat /etc/resolv.conf
$ resolvectl status       # systemd-resolved systems

# Step 3: Test DNS directly
$ dig @1.1.1.1 google.com    # Use a known-good DNS server
$ dig @$(grep nameserver /etc/resolv.conf | head -1 | awk '{print $2}') google.com

# Step 4: Check if systemd-resolved is running
$ systemctl status systemd-resolved

# Step 5: Common fixes
# Add a working nameserver
$ echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf

# Restart systemd-resolved
$ sudo systemctl restart systemd-resolved

Scenario 8: Permission Denied

# Step 1: Check standard Unix permissions
$ ls -la /path/to/file
$ id                    # Who am I?

# Step 2: Check the entire path
$ namei -l /path/to/file
# Every directory in the path needs execute permission

# Step 3: Check ACLs
$ getfacl /path/to/file

# Step 4: Check SELinux/AppArmor
$ ls -Z /path/to/file                    # SELinux context
$ sudo ausearch -m avc --start recent    # Recent SELinux denials
$ sudo aa-status                          # AppArmor status

# Step 5: Check if running as correct user
$ ps aux | grep <process>
# Is the process running as the user that has permission?

# Step 6: Check capabilities (for privileged operations)
$ getcap /path/to/binary

Think About It: A web server returns "Permission denied" when trying to read files in /var/www/html, but ls -la shows the files are readable by everyone. What else could be blocking access? (Hint: think beyond standard permissions.)


Building a Troubleshooting Toolkit

Keep a cheat sheet of the most useful commands for each category:

┌──────────────────────────────────────────────────────────────┐
│              TROUBLESHOOTING TOOLKIT                          │
│                                                              │
│  SYSTEM OVERVIEW                                             │
│  • uptime, free -m, df -h, top, vmstat                       │
│                                                              │
│  PROCESSES                                                   │
│  • ps aux, top, htop, pidof, pgrep, kill, strace             │
│                                                              │
│  LOGS                                                        │
│  • journalctl, dmesg, tail /var/log/*                        │
│                                                              │
│  NETWORK                                                     │
│  • ip addr, ss -tlnp, ping, traceroute, dig, curl            │
│  • tcpdump, nmap (when needed)                               │
│                                                              │
│  DISK                                                        │
│  • df -h, df -i, du -sh, lsblk, iostat, lsof                │
│                                                              │
│  SERVICES                                                    │
│  • systemctl status/start/stop/restart/enable                │
│  • systemctl --failed, systemctl list-units                  │
│                                                              │
│  PERMISSIONS                                                 │
│  • ls -la, namei -l, getfacl, ls -Z (SELinux)               │
│                                                              │
│  PERFORMANCE                                                 │
│  • top, htop, vmstat, iostat, sar, perf                      │
│                                                              │
│  HISTORY                                                     │
│  • last, lastlog, history, journalctl --since                │
│  • rpm -qa --last, /var/log/apt/history.log                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
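
One way to make the toolkit muscle memory is to bundle the overview commands into a snapshot script you run the moment an alert fires (a sketch; the output path is an arbitrary choice, and commands missing on a given system simply record an error in the file):

```shell
# snapshot: capture first-look system state into a timestamped file.
snapshot() {
    local out="/tmp/triage-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "=== uptime ===";        uptime
        echo "=== memory ===";        free -m
        echo "=== disk ===";          df -h
        echo "=== failed units ==="; systemctl --failed --no-pager
        echo "=== listeners ===";     ss -tlnp
        echo "=== kernel tail ===";   dmesg | tail -20
        echo "=== recent errors ==="; journalctl -p err --since "1 hour ago" --no-pager | tail -30
    } > "$out" 2>&1
    echo "$out"   # print the path so you can open or share it
}
```

Run it first, then read the file while you think -- the state is preserved even if the system changes under you.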

Incident Response Basics

When a serious outage occurs, troubleshooting alone is not enough. You need a structured incident response.

The Incident Timeline

┌──────────────────────────────────────────────────────────────┐
│                  INCIDENT RESPONSE                           │
│                                                              │
│  1. DETECT                                                   │
│     Monitoring alert fires, user reports, or you notice      │
│                    │                                         │
│                    ▼                                         │
│  2. TRIAGE                                                   │
│     How severe is this? Who is affected? Is it getting worse?│
│     Assign severity: SEV1 (critical), SEV2 (major),         │
│                      SEV3 (minor), SEV4 (cosmetic)          │
│                    │                                         │
│                    ▼                                         │
│  3. COMMUNICATE                                              │
│     Notify stakeholders. Start an incident channel/thread.   │
│     Post regular updates (every 15-30 min for SEV1).        │
│                    │                                         │
│                    ▼                                         │
│  4. MITIGATE                                                 │
│     Stop the bleeding. This might mean a rollback,          │
│     failover, or temporary workaround -- not a full fix.    │
│                    │                                         │
│                    ▼                                         │
│  5. RESOLVE                                                  │
│     Fix the root cause properly.                             │
│                    │                                         │
│                    ▼                                         │
│  6. POST-MORTEM                                              │
│     Blameless review of what happened, why, and how to      │
│     prevent it from happening again.                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The Golden Rule of Incidents

Mitigate first, investigate later. If rolling back a deployment fixes the problem, do that now. You can figure out why the deployment broke things tomorrow when the pressure is off.


Post-Mortems

After every significant incident, write a post-mortem (also called a "retrospective" or "incident review"). The goal is not to blame anyone -- it is to prevent the same problem from happening again.

Post-Mortem Template

INCIDENT POST-MORTEM
====================
Date: 2025-06-15
Duration: 2 hours 15 minutes
Severity: SEV2
Author: [Your name]

SUMMARY
-------
[One paragraph describing what happened]

TIMELINE
--------
14:00 - Monitoring alert: HTTP 500 error rate > 5%
14:05 - On-call engineer begins investigation
14:12 - Identified: application server OOM killed
14:15 - Attempted restart; server OOM killed again within 2 minutes
14:25 - Identified memory leak in recent deployment (v2.3.1)
14:30 - Rolled back to v2.3.0
14:35 - Service restored, error rate returning to normal
16:15 - Root cause fix deployed (v2.3.2) after code review

ROOT CAUSE
----------
[What actually caused the problem]
A memory leak in the session handler introduced in v2.3.1.
Each user request allocated 2MB of memory that was never freed.

IMPACT
------
[Who and what was affected]
- 45 minutes of degraded service for all users
- 30 minutes of complete outage for checkout flow
- Estimated 200 failed transactions

WHAT WENT WELL
--------------
- Monitoring detected the issue within 5 minutes
- Rollback was quick and effective
- Team communicated clearly throughout

WHAT COULD BE IMPROVED
-----------------------
- Memory leak was not caught in staging because load testing
  was skipped for this release
- No automated canary deployment to catch issues early

ACTION ITEMS
------------
[ ] Add memory usage alerts (threshold: 80% for warning)
[ ] Require load testing for all releases
[ ] Implement canary deployment strategy
[ ] Add memory leak detection to CI pipeline

Hands-On: Troubleshooting Practice

Let us simulate a problem and walk through the methodology.

Simulate the problem:

# Create a service that will "break"
$ sudo mkdir -p /opt/scripts
$ sudo tee /opt/scripts/fake-webapp.sh << 'SCRIPT'
#!/bin/bash
# Simulate a web application that creates a log file
while true; do
    echo "$(date) - Request processed" >> /tmp/fake-webapp.log
    sleep 0.1
done
SCRIPT
$ sudo chmod +x /opt/scripts/fake-webapp.sh

$ sudo tee /etc/systemd/system/fake-webapp.service << 'UNIT'
[Unit]
Description=Fake Web Application
After=network.target

[Service]
ExecStart=/opt/scripts/fake-webapp.sh
Restart=always
User=nobody

[Install]
WantedBy=multi-user.target
UNIT

$ sudo systemctl daemon-reload
$ sudo systemctl start fake-webapp

Now the service is running and writing to /tmp/fake-webapp.log.

Simulate the symptom: "The service is writing too much to disk."

# Step 1: Define the problem
# "fake-webapp is writing to disk continuously"

# Step 2: Gather information
$ systemctl status fake-webapp
$ ls -lh /tmp/fake-webapp.log
# Watch the file grow
$ watch -n1 'ls -lh /tmp/fake-webapp.log'

# Step 3: Hypothesis
# "The application is logging every request with no rotation"

# Step 4: Test
$ tail -5 /tmp/fake-webapp.log
# Confirms: one line every 0.1 seconds = 864,000 lines/day

# Step 5: Fix
# Immediate: stop the bleeding
$ sudo truncate -s 0 /tmp/fake-webapp.log

# Long-term: implement log rotation or reduce log verbosity
$ sudo tee /etc/logrotate.d/fake-webapp << 'EOF'
/tmp/fake-webapp.log {
    hourly
    rotate 4
    compress
    copytruncate
    maxsize 10M
}
EOF
# Note: 'hourly' only takes effect if logrotate itself runs hourly (cron/systemd timer)

# Step 6: Verify
$ ls -lh /tmp/fake-webapp.log   # Should be small again
$ sleep 10 && ls -lh /tmp/fake-webapp.log  # Growing slowly

# Step 7: Document
# "fake-webapp writes ~10 lines/second to its log.
#  Added logrotate config to cap at 10MB per file, keep 4 rotations.
#  TODO: reduce log verbosity to only log errors, not every request."

# Cleanup
$ sudo systemctl stop fake-webapp
$ sudo systemctl disable fake-webapp
$ sudo rm /etc/systemd/system/fake-webapp.service
$ sudo systemctl daemon-reload

Debug This: Multi-Symptom Scenario

A developer reports multiple issues on a production server:

  1. "The app is slow"
  2. "Some pages show 500 errors"
  3. "Cron jobs are failing"

All three symptoms started at approximately the same time. How do you approach this?

Methodology: Look for a single root cause that explains all symptoms.

# Check disk space first (explains slow + errors + cron failures)
$ df -h
/dev/sda1        50G   50G     0 100% /

# Bingo! A full disk explains everything:
# - App is slow: can't write to disk (temp files, sessions)
# - 500 errors: can't write logs or temporary data
# - Cron failures: can't create lock files or write output

# Find the culprit
$ sudo du -sh /var/* | sort -rh | head -5

# Fix it
$ sudo truncate -s 0 /var/log/huge-log-file.log

# Verify all three symptoms are resolved
$ curl -s -o /dev/null -w "%{http_code}" http://localhost/
200

$ sudo -u cronuser crontab -l | head -1
# Run a test cron job manually

# Prevent recurrence
# Set up disk space monitoring and log rotation

Lesson: When multiple seemingly unrelated things break simultaneously, look for a single common cause. Disk full, memory exhaustion, and network outages are the most common culprits that produce cascading failures.


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 76 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Systematic troubleshooting methodology:                     │
│  1. Define the problem (be specific)                         │
│  2. Gather information (logs, metrics, status)               │
│  3. Form a hypothesis (start simple)                         │
│  4. Test the hypothesis (change one thing at a time)         │
│  5. Implement the fix                                        │
│  6. Verify (from the user's perspective)                     │
│  7. Document (for future you and your team)                  │
│                                                              │
│  Key principles:                                             │
│  • Read the error message -- it usually tells you what       │
│  • Use the 5 Whys to find root causes                        │
│  • Mitigate first, investigate later during outages          │
│  • Multiple symptoms often share a single root cause         │
│  • Write blameless post-mortems after incidents              │
│                                                              │
│  Essential toolkit:                                          │
│  • System: uptime, free, df, top, vmstat                     │
│  • Logs: journalctl, dmesg, /var/log/*                       │
│  • Network: ip, ss, ping, dig, curl, traceroute              │
│  • Services: systemctl, systemctl --failed                   │
│  • Disk: du, lsblk, lsof, iostat                             │
│  • Permissions: ls -la, namei, getfacl, ausearch             │
│                                                              │
│  The best troubleshooters are systematic, not lucky.         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Error Message Drill

For each error message below, write down three things you would check immediately:

  1. nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
  2. FATAL: password authentication failed for user "webapp"
  3. bash: /opt/app/run.sh: Permission denied
  4. kernel: Out of memory: Killed process 1234 (java)
  5. sshd: Connection closed by 10.0.0.5 port 22 [preauth]

Exercise 2: Simulate and Fix

Create a problem on a test system and fix it using the systematic methodology:

  1. Fill a filesystem to 100% and observe what breaks
  2. Create a service with a broken config file and troubleshoot why it will not start
  3. Block a port with iptables and diagnose the resulting connection failures

Exercise 3: Write a Post-Mortem

Think of a real incident you have experienced (even a minor one like a laptop crashing or a personal project failing). Write a post-mortem using the template from this chapter. Focus on action items that would prevent recurrence.

Exercise 4: Build a Runbook

Create a troubleshooting runbook for a service you manage. Include:

  • Common failure modes and their symptoms
  • Step-by-step diagnostic commands for each failure mode
  • Known fixes and workarounds
  • Escalation path (who to contact if you cannot fix it)

Bonus Challenge

Set up a "game day" on a test system. Have a colleague (or write a script that) introduces a problem -- full disk, killed service, wrong DNS config, firewall rule, permissions change -- and practice diagnosing it under a timer. Record your time-to-diagnosis and track improvement over multiple rounds. This is how the best ops teams train.
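
If you go the scripted route, the injector can be tiny. The sketch below only defines faults and picks one; every fault name, path, and service here is illustrative, the commands are deliberately reversible, and it must never run outside a disposable test system.

```shell
#!/bin/bash
# game-day.sh -- fault injector sketch. TEST SYSTEMS ONLY.
# Each fault is reversible: remove the fill file, start the service,
# restore the .bak file, or delete the iptables rule.

fault_cmd() {
    case $1 in
        0) echo "fallocate -l 2G /tmp/gameday.fill" ;;                          # near-full disk
        1) echo "systemctl stop nginx" ;;                                       # dead service
        2) echo "sed -i.bak 's/^nameserver/#nameserver/' /etc/resolv.conf" ;;   # broken DNS
        3) echo "iptables -I INPUT -p tcp --dport 80 -j DROP" ;;                # blocked port
        *) return 1 ;;
    esac
}

if [ "${1:-}" = "--inject" ]; then
    pick=$(( RANDOM % 4 ))
    echo "Injecting fault #$pick -- start your timer"
    eval "$(fault_cmd "$pick")"
fi
```

Have the injector log which fault it picked somewhere the trainee cannot see, so you can score the diagnosis afterwards.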

Command Reference Cheat Sheet

This appendix is your quick-reference survival guide. It is organized by category so you can find what you need fast. For each command you get the syntax, the flags you will actually use, and a brief example. Print this out, laminate it, tape it to your monitor -- whatever works.

Tip: Every command listed here has a manual page. When in doubt, run man <command> or <command> --help.


File Operations

ls -- List directory contents

$ ls                    # List files in current directory
$ ls -la                # Long format, show hidden files
$ ls -lh                # Long format, human-readable sizes
$ ls -lt                # Sort by modification time (newest first)
$ ls -lS                # Sort by file size (largest first)
$ ls -R                 # List recursively into subdirectories

cd -- Change directory

$ cd /var/log           # Go to an absolute path
$ cd ..                 # Go up one level
$ cd ~                  # Go to your home directory
$ cd -                  # Go back to the previous directory

cp -- Copy files and directories

$ cp file.txt backup.txt              # Copy a file
$ cp -r /src/dir /dst/dir             # Copy a directory recursively
$ cp -p file.txt backup.txt           # Preserve permissions, timestamps
$ cp -i file.txt backup.txt           # Prompt before overwriting
$ cp -a /src/ /dst/                   # Archive mode (preserves everything)

mv -- Move or rename files

$ mv old.txt new.txt                  # Rename a file
$ mv file.txt /tmp/                   # Move a file to another directory
$ mv -i file.txt /tmp/               # Prompt before overwriting
$ mv -n file.txt /tmp/               # Never overwrite

rm -- Remove files and directories

$ rm file.txt                         # Delete a file
$ rm -r directory/                    # Delete a directory and its contents
$ rm -i file.txt                      # Prompt before each removal
$ rm -f file.txt                      # Force removal (no prompts, no errors)

Warning: rm -rf / will destroy your entire system. There is no undo, no trash can. Double-check your paths.

mkdir -- Create directories

$ mkdir newdir                        # Create a single directory
$ mkdir -p parent/child/grandchild    # Create nested directories
$ mkdir -m 755 newdir                 # Create with specific permissions

touch -- Create empty files or update timestamps

$ touch newfile.txt                   # Create an empty file (or update timestamp)
$ touch -t 202501011200 file.txt      # Set a specific timestamp

find -- Search for files

$ find /var/log -name "*.log"                  # Find by name
$ find / -type f -size +100M                   # Find files larger than 100MB
$ find . -type f -mtime -7                     # Modified in the last 7 days
$ find . -type f -name "*.tmp" -delete         # Find and delete
$ find . -type f -perm 777                     # Find by permissions
$ find . -type f -user alice                   # Find by owner
$ find /etc -name "*.conf" -exec grep -l "port" {} \;  # Find and grep

locate -- Find files by name (uses a database)

$ locate nginx.conf                   # Fast filename search
$ sudo updatedb                       # Update the locate database

stat -- Display detailed file information

$ stat file.txt                       # Show inode, size, timestamps, permissions

file -- Determine file type

$ file mystery_file                   # Is it a binary? Text? Image?

Text Processing

cat -- Concatenate and display files

$ cat file.txt                        # Display entire file
$ cat -n file.txt                     # Display with line numbers
$ cat file1.txt file2.txt > merged.txt  # Concatenate files

less -- Paged file viewer

$ less /var/log/syslog                # View a large file page by page

Inside less: Space = next page, b = previous page, /pattern = search, q = quit.

head and tail -- View beginning or end of files

$ head -n 20 file.txt                 # First 20 lines
$ tail -n 20 file.txt                 # Last 20 lines
$ tail -f /var/log/syslog             # Follow a log file in real time
$ tail -F /var/log/syslog             # Follow, even if file is rotated

grep -- Search text with patterns

$ grep "error" logfile.txt            # Search for a string
$ grep -i "error" logfile.txt         # Case-insensitive search
$ grep -r "TODO" /src/               # Search recursively in a directory
$ grep -n "error" logfile.txt         # Show line numbers
$ grep -c "error" logfile.txt         # Count matching lines
$ grep -v "debug" logfile.txt         # Invert match (lines NOT matching)
$ grep -E "error|warning" logfile.txt # Extended regex (OR)
$ grep -l "error" *.log              # List only filenames with matches

sed -- Stream editor

$ sed 's/old/new/' file.txt           # Replace first occurrence per line
$ sed 's/old/new/g' file.txt          # Replace all occurrences
$ sed -i 's/old/new/g' file.txt       # Edit file in place
$ sed -n '10,20p' file.txt            # Print lines 10-20
$ sed '/^#/d' file.txt                # Delete comment lines
$ sed -i.bak 's/old/new/g' file.txt   # In-place edit with backup

awk -- Pattern scanning and processing

$ awk '{print $1}' file.txt           # Print the first column
$ awk -F: '{print $1, $3}' /etc/passwd  # Custom delimiter, print fields 1 and 3
$ awk '$3 > 1000' file.txt            # Print lines where field 3 > 1000
$ awk '{sum += $1} END {print sum}' file.txt  # Sum a column
$ awk 'NR==5,NR==10' file.txt         # Print lines 5 through 10

sort -- Sort lines

$ sort file.txt                       # Alphabetical sort
$ sort -n file.txt                    # Numeric sort
$ sort -r file.txt                    # Reverse sort
$ sort -k2 -t: file.txt              # Sort by field 2, delimiter is :
$ sort -u file.txt                    # Sort and remove duplicates
$ sort -h file.txt                    # Sort human-readable sizes (1K, 2M, 3G)

uniq -- Report or filter repeated lines

$ sort file.txt | uniq                # Remove adjacent duplicates
$ sort file.txt | uniq -c             # Count occurrences
$ sort file.txt | uniq -d             # Show only duplicates

cut -- Remove sections from lines

$ cut -d: -f1 /etc/passwd             # Extract field 1, delimiter is :
$ cut -c1-10 file.txt                 # Extract characters 1-10

wc -- Word, line, and byte counts

$ wc -l file.txt                      # Count lines
$ wc -w file.txt                      # Count words
$ wc -c file.txt                      # Count bytes

tr -- Translate or delete characters

$ echo "hello" | tr 'a-z' 'A-Z'      # Convert to uppercase
$ echo "hello   world" | tr -s ' '   # Squeeze repeated spaces
$ echo "hello123" | tr -d '0-9'      # Delete digits

tee -- Read from stdin, write to stdout AND a file

$ command | tee output.log            # See output and save it
$ command | tee -a output.log         # Append instead of overwrite

diff -- Compare files line by line

$ diff file1.txt file2.txt            # Show differences
$ diff -u file1.txt file2.txt         # Unified diff format (most readable)
$ diff -r dir1/ dir2/                 # Compare directories recursively

xargs -- Build and execute commands from stdin

$ find . -name "*.log" | xargs rm           # Delete found files
$ find . -name "*.log" -print0 | xargs -0 rm  # Handle filenames with spaces
$ cat urls.txt | xargs -n1 curl             # Run curl for each URL
$ seq 10 | xargs -P4 -I{} wget "http://example.com/{}"  # Parallel execution

Process Management

ps -- Report process status

$ ps aux                              # All processes, detailed view
$ ps -ef                              # Full format listing
$ ps aux --sort=-%mem                 # Sort by memory usage (descending)
$ ps -p 1234                          # Show process with specific PID
$ ps -u alice                         # Processes owned by user alice

top -- Dynamic process viewer

$ top                                 # Interactive process monitor

Inside top: P = sort by CPU, M = sort by memory, k = kill a process, q = quit.

htop -- Better interactive process viewer

$ htop                                # Much nicer than top

kill -- Send signals to processes

$ kill 1234                           # Send SIGTERM (graceful shutdown)
$ kill -9 1234                        # Send SIGKILL (force kill)
$ kill -HUP 1234                      # Send SIGHUP (reload config)
$ kill -0 1234                        # Check if process exists (no signal sent)
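In scripts, kill -0 is the standard liveness probe. A small self-contained sketch using the shell's own PID:

```shell
# kill -0 sends no signal at all; it only checks that the PID exists and
# that we are allowed to signal it. Exit status 0 means "alive".
pid=$$
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is alive"
else
    echo "process $pid is gone"
fi
```

The 2>/dev/null matters: kill prints an error for nonexistent PIDs, and in a probe you usually only want the exit status.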

killall and pkill -- Kill processes by name

$ killall nginx                       # Kill all processes named nginx
$ pkill -f "python script.py"        # Kill by full command line match
$ pkill -u alice                      # Kill all of alice's processes

bg, fg, jobs -- Job control

$ command &                           # Run command in background
$ jobs                                # List background jobs
$ fg %1                               # Bring job 1 to foreground
$ bg %1                               # Resume job 1 in background
$ Ctrl+Z                              # Suspend the current foreground job

nohup -- Run a command immune to hangups

$ nohup long_running_script.sh &      # Continues running after you log out

nice and renice -- Process scheduling priority

$ nice -n 10 ./cpu_intensive_task     # Start with lower priority
$ renice -n 5 -p 1234                 # Change priority of running process

lsof -- List open files

$ lsof -i :80                         # What process is using port 80?
$ lsof -u alice                       # Files opened by user alice
$ lsof -p 1234                        # Files opened by PID 1234
$ lsof +D /var/log/                   # Processes with open files in a directory

strace -- Trace system calls

$ strace ls                           # Trace system calls of ls
$ strace -p 1234                      # Attach to a running process
$ strace -e open,read ls              # Trace only specific syscalls
$ strace -c ls                        # Summary of syscall counts and timing

Networking

ip -- Modern network configuration tool

$ ip addr show                        # Show all IP addresses
$ ip addr add 192.168.1.100/24 dev eth0  # Assign an IP address
$ ip link show                        # Show network interfaces
$ ip link set eth0 up                 # Bring an interface up
$ ip route show                       # Show routing table
$ ip route add default via 192.168.1.1  # Add default gateway
$ ip neigh show                       # Show ARP table

ss -- Socket statistics (replacement for netstat)

$ ss -tlnp                            # TCP listening sockets with process info
$ ss -ulnp                            # UDP listening sockets with process info
$ ss -s                               # Summary statistics
$ ss -t state established             # Show established connections
$ ss -t dst 10.0.0.1                  # Connections to a specific host

ping -- Test connectivity

$ ping google.com                     # Continuous ping
$ ping -c 4 google.com               # Send 4 pings then stop
$ ping -i 0.5 google.com             # Ping every 0.5 seconds

traceroute / tracepath -- Trace the route to a host

$ traceroute google.com               # Show each hop to the destination
$ tracepath google.com                # Similar, does not require root

dig -- DNS lookup

$ dig example.com                     # Query A record
$ dig example.com MX                  # Query MX record
$ dig @8.8.8.8 example.com           # Query using a specific DNS server
$ dig +short example.com              # Just the answer, no fluff
$ dig -x 93.184.216.34               # Reverse DNS lookup

curl -- Transfer data from URLs

$ curl http://example.com             # Fetch a URL
$ curl -o file.html http://example.com  # Save to file
$ curl -O http://example.com/file.tar.gz  # Save with remote filename
$ curl -I http://example.com          # Show only HTTP headers
$ curl -u user:pass http://example.com  # Basic authentication
$ curl -X POST -d "key=value" http://example.com  # POST request
$ curl -k https://self-signed.example.com  # Skip TLS verification

wget -- Non-interactive web downloader

$ wget http://example.com/file.tar.gz              # Download a file
$ wget -c http://example.com/file.tar.gz           # Resume a broken download
$ wget -r -l 2 http://example.com                  # Recursive download, depth 2
$ wget --mirror http://example.com                  # Mirror an entire site

scp -- Secure copy over SSH

$ scp file.txt user@host:/remote/path/             # Copy to remote
$ scp user@host:/remote/file.txt ./                # Copy from remote
$ scp -r local_dir/ user@host:/remote/             # Copy directory recursively
$ scp -P 2222 file.txt user@host:/path/            # Use a non-standard port

rsync -- Efficient file sync

$ rsync -avz /src/ /dst/                           # Local sync (archive, verbose, compress)
$ rsync -avz /src/ user@host:/dst/                 # Sync to remote
$ rsync -avz --delete /src/ /dst/                  # Mirror (delete extra files at dst)
$ rsync -avz --exclude='*.log' /src/ /dst/         # Exclude patterns
$ rsync -avzn /src/ /dst/                          # Dry run (show what would change)

ssh -- Secure shell

$ ssh user@host                                    # Connect to a remote host
$ ssh -p 2222 user@host                            # Non-standard port
$ ssh -i ~/.ssh/mykey user@host                    # Specify identity file
$ ssh -L 8080:localhost:80 user@host               # Local port forwarding
$ ssh -D 1080 user@host                            # SOCKS proxy
$ ssh -t user@host 'sudo reboot'                   # Force pseudo-terminal
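If you find yourself typing the same flags repeatedly, put them in ~/.ssh/config instead. The host alias, address, and key path below are placeholders; substitute your own:

```
# ~/.ssh/config -- per-host defaults, so "ssh web1" does all of this at once
Host web1
    HostName 203.0.113.10
    User deploy
    Port 2222
    IdentityFile ~/.ssh/mykey
```

Once defined, scp and rsync understand the alias too: scp file.txt web1:/tmp/ just works.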

nmap -- Network scanner

$ nmap 192.168.1.0/24                              # Scan a subnet
$ nmap -sV -p 22,80,443 host                       # Scan specific ports, detect versions
$ nmap -sn 192.168.1.0/24                          # Ping sweep (no port scan)

netcat (nc) -- The networking Swiss army knife

$ nc -zv host 80                                   # Test if port 80 is open
$ nc -l 1234                                       # Listen on port 1234
$ echo "hello" | nc host 1234                      # Send data to a port

Disk & Storage

df -- Report filesystem disk space usage

$ df -h                               # Human-readable sizes
$ df -hT                              # Include filesystem type
$ df -i                               # Show inode usage

du -- Estimate file space usage

$ du -sh /var/log                     # Total size of a directory
$ du -h --max-depth=1 /               # Size of each top-level directory
$ du -ah /var/log | sort -rh | head -10  # Top 10 largest files/dirs

mount and umount -- Mount/unmount filesystems

$ mount /dev/sdb1 /mnt/usb           # Mount a partition
$ mount -t nfs server:/share /mnt/nfs  # Mount an NFS share
$ mount -o remount,rw /               # Remount root filesystem read-write
$ umount /mnt/usb                     # Unmount
$ mount | column -t                   # List all mounts, nicely formatted

lsblk -- List block devices

$ lsblk                               # Show block devices as a tree
$ lsblk -f                            # Include filesystem info
$ lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT  # Custom columns

fdisk and parted -- Partition management

$ sudo fdisk -l                        # List all partitions
$ sudo fdisk /dev/sdb                  # Interactive partitioning
$ sudo parted /dev/sdb print           # Show partitions (GPT-aware)

mkfs -- Create a filesystem

$ sudo mkfs.ext4 /dev/sdb1            # Create ext4 filesystem
$ sudo mkfs.xfs /dev/sdb1             # Create XFS filesystem

fsck -- Filesystem check and repair

$ sudo fsck /dev/sdb1                 # Check and repair (unmount first!)
$ sudo fsck -n /dev/sdb1              # Check only, no changes

dd -- Low-level block copy

$ dd if=/dev/sda of=/backup/disk.img bs=4M status=progress  # Disk image
$ dd if=/dev/zero of=/dev/sdb bs=4M                         # Wipe a disk
$ dd if=/dev/urandom of=testfile bs=1M count=100            # Create 100MB test file

Warning: dd does exactly what you tell it, with no confirmation. Swap if= and of=, or name the wrong disk, and your data is gone instantly. Check device names with lsblk before every run.

swapon / swapoff / mkswap -- Manage swap space

$ sudo swapon --show                  # Show active swap
$ sudo mkswap /dev/sdb2               # Create swap space
$ sudo swapon /dev/sdb2               # Enable swap
$ sudo swapoff /dev/sdb2              # Disable swap

User Management

useradd / adduser -- Create users

$ sudo useradd -m -s /bin/bash alice   # Create user with home dir and shell
$ sudo adduser alice                   # Interactive (Debian/Ubuntu)
$ sudo useradd -m -G sudo,docker alice # Create user with supplementary groups

usermod -- Modify a user account

$ sudo usermod -aG docker alice        # Add alice to docker group
$ sudo usermod -s /bin/zsh alice       # Change shell
$ sudo usermod -L alice                # Lock the account
$ sudo usermod -U alice                # Unlock the account

userdel -- Delete a user

$ sudo userdel alice                   # Delete user (keep home dir)
$ sudo userdel -r alice                # Delete user and home directory

passwd -- Change passwords

$ passwd                               # Change your own password
$ sudo passwd alice                    # Change another user's password
$ sudo passwd -l alice                 # Lock an account
$ sudo passwd -e alice                 # Force password change at next login

groups / id -- Show group membership

$ groups                               # Your groups
$ groups alice                         # alice's groups
$ id                                   # UID, GID, and all groups
$ id alice                             # Same, for alice

su and sudo -- Switch users, elevated privileges

$ su - alice                           # Switch to alice (login shell)
$ sudo command                         # Run command as root
$ sudo -u alice command                # Run command as alice
$ sudo -i                             # Root login shell
$ sudo -l                             # List your sudo privileges

Package Management

APT (Debian, Ubuntu, Mint)

$ sudo apt update                      # Refresh package index
$ sudo apt upgrade                     # Upgrade installed packages
$ sudo apt install nginx               # Install a package
$ sudo apt remove nginx                # Remove (keep config)
$ sudo apt purge nginx                 # Remove and delete config
$ sudo apt autoremove                  # Remove unused dependencies
$ apt search nginx                     # Search for packages
$ apt show nginx                       # Show package details
$ dpkg -l | grep nginx                 # List installed packages matching a pattern
$ dpkg -L nginx                        # List files installed by a package

DNF / YUM (RHEL, Fedora, Rocky, Alma)

$ sudo dnf update                      # Upgrade all packages
$ sudo dnf install nginx               # Install a package
$ sudo dnf remove nginx                # Remove a package
$ dnf search nginx                     # Search for packages
$ dnf info nginx                       # Show package details
$ dnf list installed                   # List all installed packages
$ rpm -ql nginx                        # List files installed by a package
$ rpm -qf /usr/sbin/nginx             # Which package owns this file?

pacman (Arch, Manjaro)

$ sudo pacman -Syu                     # Full system upgrade
$ sudo pacman -S nginx                 # Install a package
$ sudo pacman -R nginx                 # Remove a package
$ sudo pacman -Rs nginx                # Remove with unused dependencies
$ pacman -Ss nginx                     # Search for packages
$ pacman -Qi nginx                     # Show installed package info
$ pacman -Ql nginx                     # List files owned by a package

systemd Service Management

systemctl -- Control the systemd system and service manager

$ systemctl status nginx               # Check service status
$ sudo systemctl start nginx           # Start a service
$ sudo systemctl stop nginx            # Stop a service
$ sudo systemctl restart nginx         # Restart a service
$ sudo systemctl reload nginx          # Reload configuration without restart
$ sudo systemctl enable nginx          # Start automatically at boot
$ sudo systemctl disable nginx         # Do not start at boot
$ sudo systemctl enable --now nginx    # Enable and start in one command
$ systemctl is-active nginx            # Check if running (returns active/inactive)
$ systemctl is-enabled nginx           # Check if enabled at boot
$ systemctl list-units --type=service  # List all loaded services
$ systemctl list-units --failed        # List failed services
$ systemctl daemon-reload              # Reload unit files after editing

journalctl -- Query the systemd journal

$ journalctl -u nginx                  # Logs for a specific service
$ journalctl -u nginx --since "1 hour ago"  # Recent logs
$ journalctl -u nginx -f               # Follow logs in real time
$ journalctl -b                        # Logs since last boot
$ journalctl -b -1                     # Logs from previous boot
$ journalctl -p err                    # Only error-level and above
$ journalctl --disk-usage              # How much disk space journals use
$ sudo journalctl --vacuum-size=500M   # Shrink journals to 500MB

timedatectl, hostnamectl, localectl -- System settings

$ timedatectl                          # Show time/date/timezone info
$ sudo timedatectl set-timezone Asia/Kolkata  # Set timezone
$ hostnamectl                          # Show hostname info
$ sudo hostnamectl set-hostname myserver  # Set hostname
$ localectl                            # Show locale info

Permissions & Ownership

chmod -- Change file permissions

$ chmod 755 script.sh                  # rwxr-xr-x
$ chmod 644 file.txt                   # rw-r--r--
$ chmod u+x script.sh                 # Add execute for owner
$ chmod g-w file.txt                  # Remove write for group
$ chmod o= file.txt                   # Remove all permissions for others
$ chmod -R 755 /var/www/              # Apply recursively

Permission reference:

7 = rwx    (read + write + execute)
6 = rw-    (read + write)
5 = r-x    (read + execute)
4 = r--    (read only)
3 = -wx    (write + execute)
2 = -w-    (write only)
1 = --x    (execute only)
0 = ---    (no permissions)
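To see how the octal and symbolic forms line up on a real file, GNU stat can print both at once (the %a and %A format specifiers are GNU-specific; BSD/macOS stat uses different flags):

```shell
# Create a scratch file, set a known mode, and print octal + symbolic forms.
touch /tmp/perm-demo.txt
chmod 640 /tmp/perm-demo.txt
stat -c '%a %A' /tmp/perm-demo.txt    # prints: 640 -rw-r-----
rm /tmp/perm-demo.txt
```

This is handy when reading someone else's chmod commands: translate the number, then sanity-check it with stat.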

chown -- Change file owner and group

$ sudo chown alice file.txt            # Change owner
$ sudo chown alice:devs file.txt       # Change owner and group
$ sudo chown -R alice:devs /var/www/   # Apply recursively
$ sudo chown :devs file.txt            # Change group only

chgrp -- Change group ownership

$ sudo chgrp devs file.txt            # Change group
$ sudo chgrp -R devs /project/        # Apply recursively

umask -- Set default file creation permissions

$ umask                                # Show current umask
$ umask 022                            # New files: 644, new dirs: 755
$ umask 077                            # New files: 600, new dirs: 700
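The umask removes permission bits from the defaults (666 for new files, 777 for new directories). A quick demonstration, run in a subshell so your interactive shell's umask is untouched:

```shell
# With umask 027: new files get 666 & ~027 = 640, new dirs get 777 & ~027 = 750.
(
  umask 027
  rm -rf /tmp/umask-demo.txt /tmp/umask-demo.dir
  touch /tmp/umask-demo.txt
  mkdir /tmp/umask-demo.dir
  stat -c '%a %n' /tmp/umask-demo.txt /tmp/umask-demo.dir
  rm -r /tmp/umask-demo.txt /tmp/umask-demo.dir
)
```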

Special permissions

$ chmod u+s /usr/bin/program           # Set SUID bit
$ chmod g+s /shared/dir                # Set SGID bit
$ chmod +t /tmp                        # Set sticky bit
$ find / -perm -4000 -ls              # Find all SUID files

getfacl / setfacl -- Access Control Lists

$ getfacl file.txt                     # Show ACLs
$ setfacl -m u:alice:rw file.txt       # Grant alice read/write
$ setfacl -m g:devs:rx /project/       # Grant devs group read/execute
$ setfacl -x u:alice file.txt          # Remove ACL for alice
$ setfacl -b file.txt                  # Remove all ACLs

Compression & Archiving

tar -- Archive files

$ tar cf archive.tar files/            # Create archive
$ tar czf archive.tar.gz files/        # Create gzip-compressed archive
$ tar cjf archive.tar.bz2 files/       # Create bzip2-compressed archive
$ tar cJf archive.tar.xz files/        # Create xz-compressed archive
$ tar xf archive.tar.gz                # Extract (auto-detects compression)
$ tar xf archive.tar.gz -C /dst/       # Extract to a specific directory
$ tar tf archive.tar.gz                 # List contents without extracting
$ tar czf backup.tar.gz --exclude='*.log' /data/  # Exclude patterns

Memory aid for tar flags:

c = create        x = extract       t = list
z = gzip          j = bzip2         J = xz
f = filename      v = verbose       C = change directory

gzip / gunzip -- Compress/decompress

$ gzip file.txt                        # Compress (replaces original with file.txt.gz)
$ gunzip file.txt.gz                   # Decompress
$ gzip -k file.txt                     # Keep the original file
$ gzip -9 file.txt                     # Maximum compression
$ zcat file.txt.gz                     # View compressed file without extracting

zip / unzip -- Create and extract zip archives

$ zip archive.zip file1.txt file2.txt  # Create a zip file
$ zip -r archive.zip directory/        # Zip a directory
$ unzip archive.zip                    # Extract
$ unzip -l archive.zip                 # List contents
$ unzip archive.zip -d /dst/           # Extract to a specific directory

xz -- High-ratio compression

$ xz file.txt                          # Compress (replaces original)
$ xz -d file.txt.xz                   # Decompress
$ xz -k file.txt                      # Keep the original

System Information

uname -- System information

$ uname -a                             # All system info
$ uname -r                             # Kernel version only
$ uname -m                             # Machine architecture

hostname -- Show or set the hostname

$ hostname                             # Show hostname
$ hostname -I                          # Show all IP addresses

uptime -- How long has the system been running

$ uptime                               # Uptime, users, load averages

free -- Memory usage

$ free -h                              # Human-readable memory info
$ free -h -s 5                         # Repeat every 5 seconds

dmesg -- Kernel ring buffer messages

$ dmesg                                # All kernel messages
$ dmesg -T                             # Human-readable timestamps
$ dmesg | tail -20                     # Most recent kernel messages
$ dmesg -w                             # Follow new messages in real time

vmstat, iostat, mpstat -- Performance statistics

$ vmstat 1 5                           # System stats every 1 second, 5 times
$ iostat -xz 1                         # I/O stats every second
$ mpstat -P ALL 1                      # Per-CPU stats every second

Miscellaneous Power Tools

watch -- Execute a command repeatedly

$ watch -n 2 df -h                     # Run df -h every 2 seconds
$ watch -d ls -la                      # Highlight changes between runs

alias -- Create command shortcuts

$ alias ll='ls -la'                    # Create an alias
$ alias gs='git status'                # Shorter git status
$ unalias ll                           # Remove an alias

history -- Command history

$ history                              # Show command history
$ history | grep ssh                   # Search history
$ !123                                 # Re-run command number 123
$ !!                                   # Re-run the last command
$ sudo !!                              # Re-run the last command with sudo

screen / tmux -- Terminal multiplexers

$ tmux new -s work                     # New named session
$ tmux ls                              # List sessions
$ tmux attach -t work                  # Reattach to a session
$ Ctrl+b d                             # Detach from a session
$ Ctrl+b c                             # New window
$ Ctrl+b %                             # Split pane vertically
$ Ctrl+b "                             # Split pane horizontally

date -- Show or set the date and time

$ date                                 # Current date and time
$ date +%Y-%m-%d                       # Formatted date (2025-01-15)
$ date +%s                             # Unix timestamp
$ date -d @1700000000                  # Convert timestamp to date

crontab -- Manage cron jobs

$ crontab -l                           # List your cron jobs
$ crontab -e                           # Edit your cron jobs
$ sudo crontab -l -u alice             # List alice's cron jobs

Cron format:

* * * * * command
│ │ │ │ │
│ │ │ │ └── Day of week (0-7, 0 and 7 = Sunday)
│ │ │ └──── Month (1-12)
│ │ └────── Day of month (1-31)
│ └──────── Hour (0-23)
└────────── Minute (0-59)

Examples:

0 2 * * *      command    # Every day at 2:00 AM
*/15 * * * *   command    # Every 15 minutes
0 9 * * 1-5    command    # Weekdays at 9:00 AM
0 0 1 * *      command    # First of every month at midnight

Quick Pipelines Worth Memorizing

These are not individual commands but combinations that solve common real-world problems:

# Top 10 largest files under current directory
$ find . -type f -exec du -h {} + | sort -rh | head -10

# Count lines of code in a project (excluding blanks)
$ find . -name "*.py" -exec cat {} + | grep -cv '^$'

# Watch who is consuming the most memory right now
$ ps aux --sort=-%mem | head -10

# Find which process is listening on port 443
$ sudo ss -tlnp | grep :443

# Show recent failed SSH login attempts (Debian/Ubuntu; on RHEL use /var/log/secure)
$ grep "Failed password" /var/log/auth.log | tail -20

# Live monitor of network connections per state
$ watch -n 1 'ss -s'

# Quick disk usage audit
$ du -h --max-depth=1 / 2>/dev/null | sort -rh | head -15

# Count unique client IP addresses in an access log
$ awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l

# Check if a service is running, restart if it is not
$ systemctl is-active --quiet nginx || sudo systemctl restart nginx

That is your cheat sheet. Bookmark this page, keep it handy, and with practice, these commands will become muscle memory. When you need more detail on any command, the man pages are always one man <command> away.

Common Config File Reference

One of the first things you learn about Linux is that everything is configured through text files. There is no hidden registry, no opaque binary database. Every service, every daemon, every system behavior is governed by a plain-text file sitting somewhere under /etc or in your home directory.

This appendix is a reference guide to the configuration files you will encounter most often as a Linux sysadmin. For each file you get its purpose, format, key fields, and a working example snippet you can use as a starting point.

Golden rule: Before editing any config file, make a backup. cp /etc/somefile /etc/somefile.bak.$(date +%Y%m%d) takes two seconds and can save you hours.
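That rule is worth wrapping in a tiny shell function (the name bak is my invention) so the backup becomes one word instead of a long cp:

```shell
# Copy a file to name.bak.YYYYMMDD, preserving mode and timestamps (-p).
bak() {
    cp -p "$1" "$1.bak.$(date +%Y%m%d)"
}

# Demo on a throwaway file rather than a real config:
echo "setting=1" > /tmp/demo.conf
bak /tmp/demo.conf
ls /tmp/demo.conf.bak.*
```

Drop the function into your ~/.bashrc and backing up before every edit stops feeling like a chore.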


/etc/passwd -- User Account Database

Purpose: Stores basic information about every user account on the system. Despite the name, it does not contain actual passwords (those live in /etc/shadow).

Format: Colon-delimited, one user per line.

username:x:UID:GID:comment:home_directory:shell

Fields:

Field           Meaning
username        Login name (up to 32 characters)
x               Password placeholder (actual password is in /etc/shadow)
UID             User ID number. 0 = root. 1-999 = system accounts. 1000+ = regular users
GID             Primary group ID
comment         Full name or description (also called the GECOS field)
home_directory  User's home directory
shell           Default login shell. /usr/sbin/nologin or /bin/false = no interactive login

Example:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1001:1001:Alice Johnson:/home/alice:/bin/bash
nginx:x:998:998:Nginx web server:/var/lib/nginx:/usr/sbin/nologin

Things to know:

  • This file is world-readable. Anyone can see usernames and UIDs. This is by design.
  • Never edit this file directly. Use useradd, usermod, and userdel instead. If you must edit it directly, use vipw which locks the file to prevent concurrent edits.
  • A UID of 0 grants root privileges regardless of the username. This is how the system identifies root.
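Because the file is plain colon-delimited text, awk queries it directly. Two quick sanity checks (the exact output depends on the accounts on your system):

```shell
# Which account(s) have UID 0? On a healthy system, only root.
awk -F: '$3 == 0 {print $1}' /etc/passwd

# List regular human accounts: UID >= 1000 with a real login shell.
awk -F: '$3 >= 1000 && $7 !~ /(nologin|false)$/ {print $1, $3, $7}' /etc/passwd
```

The first query is also a cheap security audit: a second UID-0 entry is a classic sign of compromise.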

/etc/shadow -- Password Hashes

Purpose: Stores the actual encrypted passwords and password aging information. Only readable by root.

Format: Colon-delimited, one user per line.

username:password_hash:lastchanged:min:max:warn:inactive:expire:reserved

Fields:

Field          Meaning
username       Must match an entry in /etc/passwd
password_hash  The hashed password. ! or * = locked account. !! = password never set
lastchanged    Days since Jan 1, 1970 that the password was last changed
min            Minimum days between password changes
max            Maximum days before the password must be changed
warn           Days before expiry to warn the user
inactive       Days after expiry before the account is locked
expire         Days since Jan 1, 1970 when the account expires
reserved       Reserved for future use

Example:

root:$6$rounds=5000$salt$hashvalue:19500:0:99999:7:::
alice:$y$j9T$salt$hashvalue:19650:0:90:14:30::
nginx:!:19400:::::

Things to know:

  • The hash prefix tells you the algorithm: $1$ = MD5 (ancient, avoid), $5$ = SHA-256, $6$ = SHA-512, $y$ = yescrypt (modern default on many distros).
  • Use passwd to change passwords, never edit this file directly. If you must, use vipw -s.
  • Permissions should be 640 owned by root:shadow. If this file is world-readable, you have a serious security problem.
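You can practice decoding the aging fields without touching the real file by running awk over a sample line in the same format (the hash below is a placeholder). On a live system, sudo chage -l alice prints the same information in readable form:

```shell
# Field 3 is days-since-epoch of the last password change; fields 4-7 hold
# the aging policy (min, max, warn, inactive).
line='alice:$y$j9T$salt$hash:19650:0:90:14:30::'
echo "$line" | awk -F: '{print "user=" $1, "max_days=" $5, "warn_days=" $6}'
```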

/etc/group -- Group Definitions

Purpose: Defines all groups on the system and their membership.

Format: Colon-delimited, one group per line.

groupname:password:GID:member1,member2,member3

Fields:

Field      Meaning
groupname  Name of the group
password   Group password (almost never used; usually x or empty)
GID        Group ID number
members    Comma-separated list of users in this group (no spaces!)

Example:

root:x:0:
sudo:x:27:alice,bob
docker:x:999:alice,deploy
devs:x:1002:alice,bob,charlie

Things to know:

  • A user's primary group (from /etc/passwd GID field) does not need to be listed here. The user is automatically a member.
  • To add a user to a group: sudo usermod -aG groupname username. The -a flag is critical -- without it, the user is removed from all other supplementary groups.
  • Use vigr to edit this file safely.
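A sketch of the verify step after changing group membership. Remember that id reads the current database, while a user's existing sessions keep their old groups until the next login:

```shell
# Every group the current user belongs to, one per line:
id -nG | tr ' ' '\n'

# Cross-check a specific group's member list straight from the file
# ("docker" is just an example; substitute any group on your system):
awk -F: '$1 == "docker" {print $4}' /etc/group
```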

/etc/sudoers -- Sudo Privileges

Purpose: Controls who can use sudo and what commands they can run.

Format: Custom syntax. Never edit directly -- always use visudo, which validates syntax before saving. A syntax error in this file can lock you out of sudo entirely.

Key syntax patterns:

# User privilege specification
# who    where=(as_whom)  what
root      ALL=(ALL:ALL)    ALL
alice     ALL=(ALL)        NOPASSWD: ALL
bob       ALL=(ALL)        /usr/bin/systemctl restart nginx, /usr/bin/journalctl

# Group-based rules (group names prefixed with %)
%sudo     ALL=(ALL:ALL)    ALL
%devops   ALL=(ALL)        NOPASSWD: /usr/bin/docker, /usr/bin/systemctl

# Aliases for cleaner rules
Cmnd_Alias WEBSERVER = /usr/bin/systemctl restart nginx, /usr/bin/systemctl reload nginx
User_Alias WEBADMINS = alice, bob, charlie
WEBADMINS  ALL=(ALL)  NOPASSWD: WEBSERVER

Drop-in directory: Modern systems use /etc/sudoers.d/ for additional rules. Files in this directory are included automatically. This is the preferred approach -- leave the main sudoers file untouched and add your rules as separate files:

$ sudo visudo -f /etc/sudoers.d/deploy-user
deploy  ALL=(ALL)  NOPASSWD: /usr/bin/systemctl restart myapp, /usr/bin/journalctl -u myapp

Things to know:

  • NOPASSWD: lets users run commands without entering their password. Use sparingly and only for specific commands, not ALL.
  • Rules are evaluated top to bottom. The last matching rule wins.
  • The Defaults directive controls behavior: Defaults timestamp_timeout=15 extends the sudo password cache to 15 minutes.

/etc/fstab -- Filesystem Mount Table

Purpose: Defines which filesystems are mounted at boot and with what options.

Format: Space or tab-delimited, six fields per line.

# <device>                <mount point>  <type>  <options>           <dump> <fsck>
UUID=abc123-def456         /              ext4    errors=remount-ro   0      1
UUID=789ghi-012jkl         /home          ext4    defaults            0      2
UUID=mno345-pqr678         none           swap    sw                  0      0
/dev/sdb1                  /data          xfs     defaults,noatime    0      2
server:/export/share       /mnt/nfs       nfs     defaults,_netdev   0      0
tmpfs                      /tmp           tmpfs   defaults,noatime,size=2G  0  0

Fields:

Field        Meaning
device       Block device, UUID, or LABEL. UUIDs are preferred (they survive disk reordering)
mount point  Where to mount the filesystem. none for swap
type         Filesystem type: ext4, xfs, btrfs, nfs, swap, tmpfs, etc.
options      Mount options. defaults = rw, suid, dev, exec, auto, nouser, async
dump         0 = do not dump (backup). 1 = dump. Almost always 0 these days
fsck         Boot-time fsck order. 0 = skip. 1 = check first (root). 2 = check after root

Common mount options:

Option   Meaning
noatime  Do not update access times (improves performance)
noexec   Prevent execution of binaries on this filesystem
nosuid   Ignore SUID/SGID bits
ro       Read-only
_netdev  Wait for network before mounting (essential for NFS, iSCSI)
nofail   Do not fail boot if the device is missing

Things to know:

  • Get UUIDs with blkid or lsblk -f.
  • A bad fstab entry can prevent your system from booting. Always test with sudo mount -a after editing.
  • For temporary mounts, use the mount command directly instead of editing fstab.
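
The six-field layout is easy to audit with a little awk. This runs against an inline sample; swap the here-document for /etc/fstab on a real system (findmnt --verify is the purpose-built checker):

```shell
# Pretty-print the first four fstab fields, skipping comments and any
# malformed line that does not have all six fields.
awk '!/^#/ && NF >= 6 {
    printf "%-20s on %-10s type %-6s (%s)\n", $1, $2, $3, $4
}' <<'EOF'
# <device>        <mount point>  <type>  <options>          <dump> <fsck>
UUID=abc123       /              ext4    errors=remount-ro  0      1
/dev/sdb1         /data          xfs     defaults,noatime   0      2
EOF
```

The NF >= 6 guard is the useful part: a line with a missing field is exactly the kind of typo that can stall the next boot.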

/etc/hosts -- Static Hostname Resolution

Purpose: Maps hostnames to IP addresses, consulted before DNS (unless NSS is configured otherwise).

Format: IP address followed by hostnames, space-separated.

127.0.0.1       localhost
127.0.1.1       myserver.example.com myserver
::1             localhost ip6-localhost ip6-loopback

# Internal servers
192.168.1.10    db01.internal db01
192.168.1.11    web01.internal web01
192.168.1.12    web02.internal web02
192.168.1.20    monitoring.internal grafana

Things to know:

  • Resolution order is controlled by /etc/nsswitch.conf. The line hosts: files dns means check /etc/hosts first, then DNS.
  • This file is great for small labs and development environments. For anything larger, use proper DNS.
  • The 127.0.1.1 entry is a Debian/Ubuntu convention that maps the machine's own hostname to a loopback address.
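
Matching works line by line, first hit wins, and any of the hostname columns can match. Here is a small awk sketch of that lookup against sample data (getent hosts name performs the real lookup, honoring nsswitch.conf):

```shell
# Return the IP for the first non-comment line whose hostname columns
# contain the requested name -- the same first-match rule /etc/hosts uses.
awk -v name=db01 '
    !/^#/ { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
' <<'EOF'
127.0.0.1       localhost
192.168.1.10    db01.internal db01
192.168.1.11    web01.internal web01
EOF
```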

/etc/resolv.conf -- DNS Resolver Configuration

Purpose: Tells the system which DNS servers to use and how to search for hostnames.

Format:

# DNS servers (up to 3)
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 192.168.1.1

# Search domains: short names get these appended
search example.com internal.example.com

# Options
options timeout:2 attempts:3 rotate

Key directives:

Directive          Meaning
nameserver         IP of a DNS server (maximum 3)
search             Domain search list. If you type ssh web01, it tries web01.example.com first
domain             Default domain (mutually exclusive with search)
options timeout:N  Seconds before retrying a different nameserver
options rotate     Round-robin between nameservers instead of always trying the first one

Things to know:

  • On systems with systemd-resolved or NetworkManager, this file may be a symlink or auto-generated. Check with ls -la /etc/resolv.conf.
  • If using systemd-resolved, the real config is managed by resolvectl and the file often points to ../run/systemd/resolve/stub-resolv.conf.
  • To set permanent DNS servers on a system with NetworkManager, use nmcli or edit the connection profile, not resolv.conf directly.

/etc/hostname -- System Hostname

Purpose: Contains the system's hostname. Just one line.

Format:

myserver

That is it. One line, one hostname.

Things to know:

  • Change it with sudo hostnamectl set-hostname newname rather than editing the file directly.
  • The hostname should also be reflected in /etc/hosts.
  • The FQDN (fully qualified domain name) is usually configured in /etc/hosts rather than here.

/etc/ssh/sshd_config -- SSH Server Configuration

Purpose: Configures the OpenSSH server daemon (sshd).

Format: Keyword Value pairs, one per line. Comments start with #.

Example with recommended security settings:

# Listen on a non-standard port (optional, not a security measure by itself)
Port 22

# Only use protocol version 2 (the default; the Protocol directive is
# obsolete in modern OpenSSH, which no longer speaks protocol 1 at all)
Protocol 2

# Authentication
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# Limit who can log in
AllowUsers alice bob deploy
# Or restrict by group:
# AllowGroups sshusers

# Timeouts and limits
LoginGraceTime 30
MaxAuthTries 3
MaxSessions 5
ClientAliveInterval 300
ClientAliveCountMax 2

# Disable unused features
X11Forwarding no
PermitEmptyPasswords no
# (renamed KbdInteractiveAuthentication in OpenSSH 8.7+)
ChallengeResponseAuthentication no

# Logging
LogLevel VERBOSE

# SFTP subsystem
Subsystem sftp /usr/lib/openssh/sftp-server

Critical settings to know:

Setting                 Recommended     Why
PermitRootLogin         no              Force users to log in as themselves, then sudo
PasswordAuthentication  no              Use SSH keys only. Eliminates brute-force attacks
PubkeyAuthentication    yes             Enable key-based authentication
AllowUsers              specific users  Whitelist who can SSH in
MaxAuthTries            3               Limit failed attempts per connection
ClientAliveInterval     300             Probe silent clients every 5 minutes; dead connections get dropped

Things to know:

  • After editing, always validate: sudo sshd -t. If it says nothing, the config is valid.
  • Reload the service: sudo systemctl reload sshd. Do NOT restart if you are connected remotely -- if the config is broken, you lose access.
  • Drop-in overrides can go in /etc/ssh/sshd_config.d/ on modern systems.
  • Keep a second SSH session open when testing config changes. If the new config locks you out, you still have the old session.

/etc/nginx/nginx.conf -- Nginx Web Server Configuration

Purpose: Main configuration file for the Nginx web server and reverse proxy.

Format: Block-based configuration with nested contexts.

Example:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
error_log /var/log/nginx/error.log warn;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;          # Hide Nginx version in responses

    # MIME types
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript;
    gzip_min_length 1000;

    # Include site configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

A typical site config (in /etc/nginx/sites-available/mysite):

server {
    listen 80;
    server_name example.com www.example.com;
    root /var/www/example.com/html;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location /api/ {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Deny access to hidden files
    location ~ /\. {
        deny all;
    }
}

Things to know:

  • Test configuration before reloading: sudo nginx -t.
  • Reload without downtime: sudo systemctl reload nginx.
  • Site configs go in /etc/nginx/sites-available/ and are enabled by symlinking to /etc/nginx/sites-enabled/.
  • On RHEL-based systems, the convention is /etc/nginx/conf.d/*.conf instead of sites-available/sites-enabled.

/etc/systemd/system/*.service -- systemd Unit Files

Purpose: Define how systemd manages a service: how to start it, when to start it, what to do if it crashes.

Format: INI-style with three main sections.

Example -- a custom application service:

[Unit]
Description=My Application Server
Documentation=https://docs.example.com
After=network.target postgresql.service
Wants=postgresql.service

[Service]
Type=simple
User=appuser
Group=appgroup
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/server --config /etc/myapp/config.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
Environment=NODE_ENV=production
EnvironmentFile=/etc/myapp/env

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/myapp /var/log/myapp
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Key directives:

Section    Directive   Meaning
[Unit]     After       Start this unit after the listed units
[Unit]     Wants       Weak dependency (start but do not fail if dependency fails)
[Unit]     Requires    Strong dependency (fail if dependency fails)
[Service]  Type        simple (default), forking, oneshot, notify
[Service]  ExecStart   Command to start the service
[Service]  Restart     on-failure, always, on-abnormal, no
[Service]  RestartSec  Seconds to wait before restarting
[Service]  User/Group  Run as this user/group
[Install]  WantedBy    Which target enables this service (usually multi-user.target)

Things to know:

  • Custom unit files go in /etc/systemd/system/. Distribution-provided ones live in /lib/systemd/system/.
  • After creating or modifying a unit file: sudo systemctl daemon-reload.
  • To override a distribution unit without modifying it: sudo systemctl edit nginx creates a drop-in override file.
  • The security directives (ProtectSystem, PrivateTmp, etc.) are extremely useful for hardening services. Use them.

/etc/crontab -- System-Wide Cron Schedule

Purpose: System-wide scheduled tasks. Unlike user crontabs, this one includes a username field.

Format:

# m  h  dom mon dow  user     command
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=admin@example.com

# Run system maintenance at 2 AM
0  2  *   *   *     root     /usr/local/bin/daily-maintenance.sh

# Rotate application logs weekly
0  3  *   *   0     root     /usr/sbin/logrotate /etc/logrotate.conf

# Database backup every 6 hours
0  */6 *  *   *     postgres /opt/scripts/db-backup.sh

# Cleanup temp files daily at midnight
0  0  *   *   *     root     find /tmp -type f -atime +7 -delete

Things to know:

  • System crontab has a user field between the time spec and the command. User crontabs (edited with crontab -e) do not.
  • Drop-in scripts can go in /etc/cron.daily/, /etc/cron.hourly/, /etc/cron.weekly/, /etc/cron.monthly/. These are run by anacron or a cron entry.
  • MAILTO controls where error output is sent. Set it to "" to disable email.
  • Cron uses a minimal PATH. Always use full paths to commands in cron jobs, or set PATH at the top.
  • On systemd systems, consider using systemd timers instead. They offer better logging, dependency management, and randomized delays.
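
The */6 step syntax in the hour field means "every sixth hour starting at 0". A one-liner makes that schedule concrete:

```shell
# Expand the hour field "*/6" (hours 0-23 in steps of 6): the db-backup
# entry above fires at minute 0 of each of these hours.
seq 0 6 23
# prints 0, 6, 12, 18, one per line
```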

/etc/exports -- NFS Shared Directories

Purpose: Defines which directories are shared via NFS and who can access them.

Format: One export per line: directory followed by client specifications.

# Share /data/shared with the 192.168.1.0/24 network, read-write
/data/shared    192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)

# Share /srv/public read-only to everyone
/srv/public     *(ro,sync,no_subtree_check)

# Share /home to specific hosts
/home           web01.internal(rw,sync) web02.internal(rw,sync)

Common options:

Option            Meaning
rw                Read-write access
ro                Read-only access
sync              Write data to disk before replying (safer)
async             Reply before data is written to disk (faster, riskier)
no_subtree_check  Disables subtree checking (improves reliability)
no_root_squash    Trust root on the client (dangerous in production)
root_squash       Map client root to anonymous user (default, recommended)
all_squash        Map all users to anonymous (useful for public shares)

Things to know:

  • After editing, apply changes with: sudo exportfs -ra.
  • View current exports: sudo exportfs -v.
  • Make sure NFS services are running: sudo systemctl enable --now nfs-server.
  • No space between the client specification and the options in parentheses. /data host(rw) is correct. /data host (rw) is wrong -- that exports to host with default options AND to the entire world with (rw).
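
That last pitfall is easy to lint for. This grep flags any exports line with whitespace before an opening parenthesis, using inline sample data (point it at /etc/exports for real use):

```shell
# Flag exports lines with a space before the option parenthesis -- the
# classic "/data host (rw)" mistake that silently exports to the world.
# -n prints the offending line number; exit status 0 means a match was found.
grep -nE '[[:space:]]\(' <<'EOF'
/data/shared    192.168.1.0/24(rw,sync)
/srv/oops       host1 (rw)
EOF
```

Only the second sample line is reported, which is exactly the one that would open the share up to everyone.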

/etc/sysctl.conf -- Kernel Parameter Tuning

Purpose: Sets kernel parameters at boot time. These parameters can also be changed at runtime.

Format: parameter = value, one per line.

Example -- common tuning parameters:

# Enable IP forwarding (required for routers, VPNs, containers)
net.ipv4.ip_forward = 1

# Harden network stack
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.tcp_syncookies = 1

# Increase connection tracking for busy servers
net.netfilter.nf_conntrack_max = 1048576

# Virtual memory tuning
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Increase file descriptor limits
fs.file-max = 2097152

# Increase maximum number of memory map areas
vm.max_map_count = 262144

# Increase network buffer sizes for high-throughput servers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Things to know:

  • Apply changes without rebooting: sudo sysctl -p or sudo sysctl --system.
  • View a current value: sysctl net.ipv4.ip_forward.
  • Set a value temporarily (until reboot): sudo sysctl -w net.ipv4.ip_forward=1.
  • Drop-in files go in /etc/sysctl.d/. For example, /etc/sysctl.d/99-custom.conf. The numbering controls load order.
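
Every sysctl key is a thin wrapper over a file under /proc/sys: dots become slashes. A tiny sketch of the mapping:

```shell
# Translate a sysctl key name into the /proc/sys path it reads and writes.
key="net.ipv4.ip_forward"
path="/proc/sys/$(echo "$key" | tr '.' '/')"
echo "$path"
# prints /proc/sys/net/ipv4/ip_forward
# On a Linux box, cat "$path" shows the same value as: sysctl net.ipv4.ip_forward
```

This is why sysctl -w and echoing into /proc/sys are equivalent; sysctl is just the safer, more scriptable front end.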

~/.bashrc -- Bash Shell Customization

Purpose: Executed for every new interactive non-login Bash shell. This is where you put your personal customizations.

Format: Bash script.

Example:

# ~/.bashrc

# If not running interactively, don't do anything
case $- in
    *i*) ;;
      *) return;;
esac

# History settings
HISTSIZE=10000
HISTFILESIZE=20000
HISTCONTROL=ignoreboth    # Ignore duplicates and commands starting with space
shopt -s histappend        # Append to history, don't overwrite

# Check window size after each command
shopt -s checkwinsize

# Custom prompt: user@host:path (green for normal user, red for root)
if [ "$(id -u)" -eq 0 ]; then
    PS1='\[\e[1;31m\]\u@\h:\w#\[\e[0m\] '
else
    PS1='\[\e[1;32m\]\u@\h:\w$\[\e[0m\] '
fi

# Useful aliases
alias ll='ls -alFh'
alias la='ls -A'
alias ..='cd ..'
alias ...='cd ../..'
alias grep='grep --color=auto'
alias df='df -h'
alias du='du -h'
alias free='free -h'
alias ports='ss -tlnp'
alias myip='curl -s ifconfig.me'

# Safety nets
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Custom PATH
export PATH="$HOME/.local/bin:$HOME/bin:$PATH"

# Default editor
export EDITOR=vim
export VISUAL=vim

# Colored man pages
export LESS_TERMCAP_mb=$'\e[1;32m'
export LESS_TERMCAP_md=$'\e[1;32m'
export LESS_TERMCAP_me=$'\e[0m'
export LESS_TERMCAP_se=$'\e[0m'
export LESS_TERMCAP_so=$'\e[01;33m'
export LESS_TERMCAP_ue=$'\e[0m'
export LESS_TERMCAP_us=$'\e[1;4;31m'

# Source local customizations if they exist
if [ -f ~/.bashrc.local ]; then
    source ~/.bashrc.local
fi

Things to know:

  • .bashrc runs for interactive non-login shells. .bash_profile (or .profile) runs for login shells. Usually .bash_profile sources .bashrc.
  • Changes take effect in new shells. To apply immediately: source ~/.bashrc.
  • System-wide defaults live in /etc/bash.bashrc (Debian/Ubuntu) or /etc/bashrc (RHEL).
  • Keep .bashrc clean and fast. Complex operations here slow down every new terminal.

~/.ssh/config -- SSH Client Configuration

Purpose: Configures the SSH client. Lets you define shortcuts, default options, and per-host settings so you never have to type long SSH commands.

Format: Block-based, with Host patterns.

Example:

# Default settings for all connections
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
    AddKeysToAgent yes
    IdentitiesOnly yes

# Quick access to production web server
Host prod-web
    HostName 203.0.113.50
    User deploy
    Port 2222
    IdentityFile ~/.ssh/prod_key

# Jump through a bastion host to reach internal servers
Host bastion
    HostName bastion.example.com
    User alice
    IdentityFile ~/.ssh/bastion_key

Host internal-*
    ProxyJump bastion
    User alice
    IdentityFile ~/.ssh/internal_key

Host internal-db
    HostName 10.0.1.50

Host internal-web
    HostName 10.0.1.51

# Development VM
Host devbox
    HostName 192.168.56.10
    User vagrant
    IdentityFile ~/.vagrant.d/insecure_private_key
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

# GitHub (useful when you have multiple keys)
Host github.com
    HostName github.com
    User git
    IdentityFile ~/.ssh/github_key

Common directives:

Directive              Meaning
HostName               The actual hostname or IP to connect to
User                   Default username for this host
Port                   Default port
IdentityFile           Path to the private key
ProxyJump              Jump through another host (bastion/jump box)
LocalForward           Set up a local port forward automatically
ServerAliveInterval    Send keepalive every N seconds
StrictHostKeyChecking  ask (default), yes, no
IdentitiesOnly         Only try the specified key, not all keys in the agent

Things to know:

  • With the config above, ssh prod-web is all you need. No more ssh -p 2222 -i ~/.ssh/prod_key deploy@203.0.113.50.
  • Host patterns support wildcards: Host *.example.com matches any subdomain.
  • Settings are applied first-match-wins. Put specific hosts before general patterns.
  • File permissions must be 600 (or 644). The .ssh directory must be 700.
  • This file is for the SSH client. The SSH server config is /etc/ssh/sshd_config.

Quick Reference Table

Here is a summary of where to find what:

What you need to configure    File
User accounts                 /etc/passwd
Passwords and aging           /etc/shadow
Groups                        /etc/group
Sudo privileges               /etc/sudoers (use visudo)
Filesystem mounts             /etc/fstab
Static hostname resolution    /etc/hosts
DNS resolver                  /etc/resolv.conf
System hostname               /etc/hostname
SSH server                    /etc/ssh/sshd_config
SSH client (per user)         ~/.ssh/config
Nginx web server              /etc/nginx/nginx.conf
Custom systemd services       /etc/systemd/system/*.service
System-wide cron jobs         /etc/crontab
NFS exports                   /etc/exports
Kernel parameters             /etc/sysctl.conf
Bash customization            ~/.bashrc
Name resolution order         /etc/nsswitch.conf
PAM authentication            /etc/pam.d/*
Log rotation                  /etc/logrotate.conf
Time zone                     /etc/timezone or timedatectl
Network (modern)              /etc/netplan/*.yaml or nmcli

This is not every config file on a Linux system -- there are thousands. But master these and you will be able to troubleshoot and configure the vast majority of what you encounter in the real world.

Lab Setup Guide

You cannot learn Linux by reading about it. You learn Linux by breaking things, fixing them, breaking them again, and eventually understanding why they broke. For that, you need a lab -- a safe environment where you can experiment without fear of destroying anything important.

This appendix walks you through every practical option for setting up a Linux practice lab, from the simplest (spinning up a single VM) to the more sophisticated (multi-machine Vagrant environments). Pick the approach that fits your hardware, your operating system, and your comfort level. You can always graduate to a more complex setup later.


Option 1: VirtualBox -- The Classic Approach

Oracle VirtualBox is free, open source, and runs on Windows, macOS, and Linux. It is the most common choice for beginners and works well for most of the exercises in this book.

Installing VirtualBox

On Windows:

  1. Download the installer from virtualbox.org.
  2. Run the installer. Accept the defaults.
  3. If prompted about network interfaces being temporarily reset, click "Yes." This is normal.
  4. Reboot if asked.

On macOS:

  1. Download the macOS .dmg from the VirtualBox downloads page.
  2. Open the .dmg and run the installer.
  3. You will need to allow the kernel extension in System Preferences > Security & Privacy.
  4. On Apple Silicon (M1/M2/M3) Macs, VirtualBox support is available but still maturing. Consider UTM or multipass as alternatives.

On Linux (Ubuntu/Debian):

$ sudo apt update
$ sudo apt install virtualbox virtualbox-ext-pack

On Linux (Fedora):

$ sudo dnf install VirtualBox

Creating Your First VM

  1. Open VirtualBox and click "New."
  2. Give it a name (e.g., "ubuntu-lab"), choose type "Linux," version "Ubuntu (64-bit)."
  3. Allocate memory: 2048 MB minimum, 4096 MB if you can spare it.
  4. Create a virtual hard disk: VDI, dynamically allocated, 25 GB minimum.
  5. Before starting the VM, go to Settings:
    • System > Processor: Give it 2 CPUs if you have them to spare.
    • Storage: Click the empty optical drive, then click the disk icon and choose your downloaded ISO.
    • Network: The default NAT adapter is fine for internet access from the VM.
  6. Start the VM. The ISO will boot and you can proceed with the Linux installation.

Downloading a Linux ISO

For the exercises in this book, we recommend:

  • Ubuntu Server 24.04 LTS -- Best for beginners. Massive community. ubuntu.com/download/server
  • Rocky Linux 9 or AlmaLinux 9 -- RHEL-compatible, excellent for learning enterprise Linux. rockylinux.org or almalinux.org
  • Debian 12 (Bookworm) -- Rock-solid, minimalist. debian.org

Download the server edition (not the desktop edition). Server installs are leaner, and you will learn more by working from the command line.

VirtualBox Networking Modes

Understanding the networking modes is critical for lab work. Here is what each one does:

┌──────────────────────────────────────────────────────┐
│                  Your Host Machine                   │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │                   VirtualBox                   │  │
│  │                                                │  │
│  │  ┌──────────┐   ┌──────────┐   ┌──────────┐    │  │
│  │  │   VM 1   │   │   VM 2   │   │   VM 3   │    │  │
│  │  └────┬─────┘   └────┬─────┘   └────┬─────┘    │  │
│  │       │              │              │          │  │
│  │  ┌────┴──────────────┴──────────────┴─────┐    │  │
│  │  │      Internal Network / Host-Only      │    │  │
│  │  │            192.168.56.0/24             │    │  │
│  │  └────────────────────────────────────────┘    │  │
│  └────────────────────────────────────────────────┘  │
│                          │ NAT                       │
└──────────────────────────┼───────────────────────────┘
                           │
                       Internet

Mode              VMs → Internet?  Host → VMs?                  VMs → VMs?  Best for
NAT               Yes              No (unless port forwarding)  No          Single VM that needs internet
Bridged           Yes              Yes                          Yes         VM on the real network like a physical machine
Host-Only         No               Yes                          Yes         Isolated lab network
Internal Network  No               No                           Yes         Fully isolated VM-to-VM network
NAT Network       Yes              No (unless port forwarding)  Yes         VMs need internet AND must talk to each other

Recommended setup for this book: Give each VM two network adapters:

  • Adapter 1: NAT (for internet access to install packages)
  • Adapter 2: Host-Only (so you can SSH from your host into the VM, and VMs can talk to each other)

To create a Host-Only network:

  1. Go to File > Host Network Manager (or Tools > Network in newer versions).
  2. Create a new network. Use the defaults (usually 192.168.56.1/24 with DHCP).
  3. In each VM's settings, add a second adapter set to "Host-Only Adapter" using this network.

VirtualBox Guest Additions

Install Guest Additions in your VM for shared folders, better screen resolution, and shared clipboard:

# Inside the VM (Ubuntu/Debian)
$ sudo apt update
$ sudo apt install build-essential dkms linux-headers-$(uname -r)
# Insert the Guest Additions CD from the VirtualBox menu: Devices > Insert Guest Additions CD
$ sudo mount /dev/cdrom /mnt
$ sudo /mnt/VBoxLinuxAdditions.run
$ sudo reboot

VirtualBox Snapshots

This is your safety net. Before doing anything risky in a lab exercise, take a snapshot:

  1. In VirtualBox Manager, select your VM.
  2. Click the "Snapshots" tab (or the camera icon).
  3. Click "Take." Name it something meaningful like "clean-install" or "before-nginx-config."

If things go sideways, restore the snapshot and you are back to a known good state in seconds. Use this liberally.


Option 2: QEMU/KVM -- The Linux-Native Approach

If your host is already running Linux, QEMU/KVM is the superior virtualization option. It uses hardware virtualization directly and gives better performance than VirtualBox.

Installation

# Ubuntu/Debian
$ sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
$ sudo usermod -aG libvirt $USER
$ sudo usermod -aG kvm $USER
# Log out and back in for group changes to take effect

# Fedora/RHEL
$ sudo dnf install @virtualization
$ sudo systemctl enable --now libvirtd

Creating a VM with virt-manager (GUI)

virt-manager provides a graphical interface very similar to VirtualBox:

  1. Open Virtual Machine Manager.
  2. Click "Create a new virtual machine."
  3. Choose "Local install media (ISO)."
  4. Browse to your downloaded ISO.
  5. Set RAM (2048 MB) and CPUs (2).
  6. Create a disk (25 GB).
  7. Name the VM and choose the network (default NAT is fine).

Creating a VM from the command line

For those who prefer the terminal:

# Create a disk image
$ qemu-img create -f qcow2 /var/lib/libvirt/images/ubuntu-lab.qcow2 25G

# Install from ISO
$ virt-install \
    --name ubuntu-lab \
    --ram 2048 \
    --vcpus 2 \
    --disk path=/var/lib/libvirt/images/ubuntu-lab.qcow2,format=qcow2 \
    --cdrom /path/to/ubuntu-24.04-live-server-amd64.iso \
    --os-variant ubuntu24.04 \
    --network network=default \
    --graphics vnc,listen=0.0.0.0 \
    --noautoconsole

Managing VMs with virsh

$ virsh list --all                     # List all VMs
$ virsh start ubuntu-lab               # Start a VM
$ virsh shutdown ubuntu-lab            # Graceful shutdown
$ virsh destroy ubuntu-lab             # Force stop (like pulling the power)
$ virsh snapshot-create-as ubuntu-lab clean-install  # Take a snapshot
$ virsh snapshot-revert ubuntu-lab clean-install     # Revert to snapshot
$ virsh console ubuntu-lab             # Connect to serial console

Option 3: WSL2 on Windows -- Linux Without a VM

If you are on Windows 10 or 11, Windows Subsystem for Linux 2 (WSL2) gives you a real Linux kernel running inside a lightweight virtual machine, integrated directly into Windows. It is the fastest way to get a Linux terminal on Windows.

Installation

Open PowerShell as Administrator:

wsl --install

That single command installs WSL2 and Ubuntu. Reboot when prompted.

After reboot, Ubuntu will launch automatically. Create a username and password when asked.

Installing additional distributions

# List available distributions
wsl --list --online

# Install a specific distribution
wsl --install -d Debian
wsl --install -d Ubuntu-24.04

# List installed distributions
wsl --list --verbose

WSL2 configuration

Create or edit ~/.wslconfig in your Windows home directory (e.g., C:\Users\YourName\.wslconfig):

[wsl2]
memory=4GB
processors=2
swap=2GB
localhostForwarding=true

And inside the Linux distribution, /etc/wsl.conf:

[boot]
systemd=true

[automount]
enabled=true
options="metadata,uid=1000,gid=1000,umask=22,fmask=11"

[network]
generateResolvConf=true

What WSL2 can and cannot do

Works great for:

  • All command-line tools and exercises in this book
  • Shell scripting, text processing, git, SSH, development tools
  • Running servers (nginx, databases) accessible from Windows via localhost
  • Docker (Docker Desktop integrates with WSL2)

Limitations:

  • No systemd by default on older WSL2 versions (but modern versions support it with the [boot] systemd=true setting)
  • No direct hardware access (you cannot practice disk partitioning on real block devices)
  • Networking is somewhat abstracted (NAT through the Windows host)
  • Cannot practice kernel module loading or custom kernel builds easily
  • Not a substitute for a real VM when practicing boot process, GRUB, or disk management

For most of this book, WSL2 is excellent. For chapters on boot process, disk management, and kernel topics, use a proper VM.


Option 4: Multipass -- Quick Ubuntu VMs

Canonical's multipass is the fastest way to spin up Ubuntu VMs on any platform. It is free, open source, and incredibly simple.

Installation

# macOS
$ brew install multipass

# Windows: Download from multipass.run

# Linux (snap)
$ sudo snap install multipass

Basic usage

# Launch an Ubuntu VM (defaults to the latest LTS)
$ multipass launch --name lab1 --cpus 2 --memory 2G --disk 20G

# List running instances
$ multipass list

# Open a shell in the VM
$ multipass shell lab1

# Run a command without entering the VM
$ multipass exec lab1 -- uname -a

# Transfer files
$ multipass transfer localfile.txt lab1:/home/ubuntu/
$ multipass transfer lab1:/home/ubuntu/remotefile.txt ./

# Stop, start, delete
$ multipass stop lab1
$ multipass start lab1
$ multipass delete lab1
$ multipass purge                      # Permanently remove deleted instances

Cloud-init support

Multipass supports cloud-init, so you can automate VM setup:

# Save as lab-init.yaml
#cloud-config
package_update: true
packages:
  - nginx
  - vim
  - tmux
  - net-tools
  - curl
  - git

users:
  - name: student
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... your-key-here

runcmd:
  - systemctl enable --now nginx

$ multipass launch --name web-lab --cloud-init lab-init.yaml

Multipass is perfect when you need a quick, disposable Ubuntu instance and do not want to deal with VirtualBox configuration.


Option 5: Vagrant -- Reproducible Environments

Vagrant is a tool by HashiCorp that automates VM creation using simple configuration files called Vagrantfiles. It is open source and works with VirtualBox, libvirt/KVM, and other providers.

The killer feature: your entire lab environment is defined in a text file that you can version-control, share, and recreate in minutes.

Installation

# macOS
$ brew install vagrant

# Ubuntu/Debian
$ sudo apt install vagrant

# Fedora
$ sudo dnf install vagrant

# Windows: Download from vagrantup.com

Vagrant also needs a provider (a hypervisor to create the VMs). VirtualBox is the default.

Your first Vagrantfile

Create a directory for your lab and initialize it:

$ mkdir ~/linux-lab && cd ~/linux-lab
$ vagrant init ubuntu/jammy64

This creates a Vagrantfile. Let us look at a cleaned-up version:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"
  config.vm.hostname = "lab1"

  # Forward port 80 on the VM to 8080 on your host
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Private network so you can SSH and the VM gets a static IP
  config.vm.network "private_network", ip: "192.168.56.10"

  # Sync a folder from host to guest (create it on the host if missing)
  config.vm.synced_folder "./shared", "/home/vagrant/shared", create: true

  # VM resources
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "2048"
    vb.cpus = 2
    vb.name = "linux-lab-1"
  end

  # Provisioning: run a script on first `vagrant up`
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y nginx vim tmux net-tools curl git
    systemctl enable --now nginx
  SHELL
end

Essential Vagrant commands

$ vagrant up                           # Create and start the VM
$ vagrant ssh                          # SSH into the VM
$ vagrant halt                         # Graceful shutdown
$ vagrant destroy                      # Delete the VM entirely
$ vagrant reload                       # Restart (re-reads Vagrantfile)
$ vagrant provision                    # Re-run provisioning scripts
$ vagrant snapshot save clean          # Take a snapshot
$ vagrant snapshot restore clean       # Restore a snapshot
$ vagrant status                       # Check VM status
$ vagrant global-status                # All VMs across all projects

Multi-Machine Vagrantfile

This is where Vagrant really shines. Need a web server and a database server for practice? Define both in one file:

Vagrant.configure("2") do |config|

  # Web server
  config.vm.define "web" do |web|
    web.vm.box = "ubuntu/jammy64"
    web.vm.hostname = "web01"
    web.vm.network "private_network", ip: "192.168.56.10"
    web.vm.provider "virtualbox" do |vb|
      vb.memory = "1024"
      vb.cpus = 1
    end
    web.vm.provision "shell", inline: <<-SHELL
      apt-get update
      apt-get install -y nginx
      systemctl enable --now nginx
    SHELL
  end

  # Database server
  config.vm.define "db" do |db|
    db.vm.box = "ubuntu/jammy64"
    db.vm.hostname = "db01"
    db.vm.network "private_network", ip: "192.168.56.11"
    db.vm.provider "virtualbox" do |vb|
      vb.memory = "2048"
      vb.cpus = 1
    end
    db.vm.provision "shell", inline: <<-SHELL
      apt-get update
      apt-get install -y postgresql postgresql-contrib
      systemctl enable --now postgresql
    SHELL
  end

  # Load balancer
  config.vm.define "lb" do |lb|
    lb.vm.box = "ubuntu/jammy64"
    lb.vm.hostname = "lb01"
    lb.vm.network "private_network", ip: "192.168.56.12"
    lb.vm.provider "virtualbox" do |vb|
      vb.memory = "512"
      vb.cpus = 1
    end
    lb.vm.provision "shell", inline: <<-SHELL
      apt-get update
      apt-get install -y haproxy
    SHELL
  end

end

$ vagrant up                           # Starts all three VMs
$ vagrant ssh web                      # SSH into the web server
$ vagrant ssh db                       # SSH into the database server
$ vagrant destroy -f                   # Tear down everything

Vagrant with libvirt (KVM)

If you are on Linux and prefer KVM over VirtualBox:

# Debian/Ubuntu package
$ sudo apt install vagrant-libvirt

# Or install it as a Vagrant plugin
$ vagrant plugin install vagrant-libvirt

# Use a libvirt-compatible box
$ vagrant init generic/ubuntu2204

In your Vagrantfile, swap the provider:

config.vm.provider "libvirt" do |lv|
  lv.memory = "2048"
  lv.cpus = 2
end
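Putting the pieces together, a minimal libvirt Vagrantfile might look like this. The box name and resource values are illustrative -- adjust them to your setup:

```ruby
# Vagrantfile -- minimal libvirt setup (illustrative values)
Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu2204"   # a libvirt-compatible box
  config.vm.hostname = "lab1"

  config.vm.provider "libvirt" do |lv|
    lv.memory = "2048"
    lv.cpus = 2
  end
end
```

If libvirt is not your default provider, bring it up with vagrant up --provider=libvirt.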

Lab Topologies

Here are some lab designs matched to what you are learning:

Topology 1: Single VM (Chapters 1-27)

For the first half of the book -- shell, filesystem, users, processes, scripting, text processing -- a single VM is all you need.

┌───────────────────────────┐
│     Your Host Machine     │
│                           │
│   ┌───────────────────┐   │
│   │   Ubuntu Server   │   │
│   │   192.168.56.10   │   │
│   │   2 CPU, 2GB RAM  │   │
│   └───────────────────┘   │
│                           │
└───────────────────────────┘

Topology 2: Two VMs (Chapters 28-37, Networking)

For networking chapters, you need at least two machines to practice SSH, file transfers, firewalls, and routing.

┌───────────────────────────────────────────────────┐
│                 Your Host Machine                 │
│                                                   │
│   ┌──────────────────┐     ┌──────────────────┐   │
│   │   server01       │     │   server02       │   │
│   │   192.168.56.10  │<───>│   192.168.56.11  │   │
│   │   (SSH server)   │     │   (SSH client)   │   │
│   └──────────────────┘     └──────────────────┘   │
│             │                       │             │
│             └───────────┬───────────┘             │
│                     Host-Only                     │
│                  192.168.56.0/24                  │
└───────────────────────────────────────────────────┘

Topology 3: Three VMs (Chapters 44-47, Web + Load Balancing)

For web server and load balancing practice:

┌───────────────────────────────────────────────┐
│               Your Host Machine               │
│                                               │
│       ┌──────────────┐                        │
│       │  lb01        │  HAProxy / Nginx LB    │
│       │  .56.12      │                        │
│       └──────┬───────┘                        │
│              │                                │
│        ┌─────┴──────┐                         │
│        │            │                         │
│  ┌─────┴────┐  ┌────┴─────┐                   │
│  │  web01   │  │  web02   │  Nginx web servers│
│  │  .56.10  │  │  .56.11  │                   │
│  └──────────┘  └──────────┘                   │
│                                               │
└───────────────────────────────────────────────┘

Topology 4: Full Lab (Chapters 44-70+, Production Practice)

For the later chapters on monitoring, automation, and containers:

┌──────────────────────────────────────────────────────────┐
│                    Your Host Machine                     │
│                                                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐     │
│  │ web01    │ │ web02    │ │ db01     │ │ monitor  │     │
│  │ .56.10   │ │ .56.11   │ │ .56.20   │ │ .56.30   │     │
│  │ Nginx    │ │ Nginx    │ │ Postgres │ │ Grafana  │     │
│  │ 1CPU/1GB │ │ 1CPU/1GB │ │ 1CPU/2GB │ │ 1CPU/2GB │     │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘     │
│                                                          │
│  ┌──────────┐ ┌──────────┐                               │
│  │ ansible  │ │ docker   │                               │
│  │ .56.40   │ │ .56.50   │                               │
│  │ Control  │ │ Docker   │                               │
│  │ 1CPU/1GB │ │ 2CPU/4GB │                               │
│  └──────────┘ └──────────┘                               │
│                                                          │
│               Host-Only: 192.168.56.0/24                 │
└──────────────────────────────────────────────────────────┘

For this full lab you will need at least 16 GB of RAM on your host machine. If you do not have that much, bring VMs up and down as needed. You rarely need all of them running simultaneously.
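One way to manage that is a small wrapper that brings up only the machines a given practice scenario needs. This is a sketch: the lab function and its scenario names are invented, the VM names follow the full-lab topology above, and the VAGRANT variable exists only so you can dry-run it with VAGRANT=echo.

```shell
# lab: bring up just the VMs a practice scenario needs.
# VAGRANT overrides the vagrant binary (e.g. VAGRANT=echo for a dry run).
lab() {
  local vagrant="${VAGRANT:-vagrant}"
  case "$1" in
    web)     "$vagrant" up web01 web02 ;;          # web chapters
    db)      "$vagrant" up db01 ;;                 # database practice
    monitor) "$vagrant" up web01 db01 monitor ;;   # monitoring stack
    full)    "$vagrant" up ;;                      # everything
    down)    "$vagrant" halt ;;                    # stop all running VMs
    *)       echo "usage: lab {web|db|monitor|full|down}" >&2; return 1 ;;
  esac
}
```

Running lab web then starts only the two web servers instead of all six machines.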


Cloud Free Tiers for Practice

If your local machine does not have enough resources, cloud providers offer free tiers that are perfect for practice.

Oracle Cloud Free Tier

Oracle offers an "Always Free" tier that is genuinely generous:

  • 2 AMD-based VM instances (1 CPU, 1 GB RAM each)
  • Up to 4 Arm-based Ampere A1 instances (total of 4 CPUs and 24 GB RAM)
  • 200 GB of block storage
  • This is not a trial -- it is permanently free.

Sign up at cloud.oracle.com.

AWS Free Tier

  • 750 hours/month of t2.micro (1 vCPU, 1 GB RAM) or t3.micro (2 vCPUs, 1 GB RAM) for 12 months
  • Good for a single always-on practice server
  • Sign up at aws.amazon.com/free

Google Cloud Platform

  • $300 in credits for 90 days for new accounts
  • Always-free e2-micro instance (2 vCPU, 1 GB RAM) in select regions
  • Sign up at cloud.google.com/free

Azure Free Tier

  • $200 in credits for 30 days
  • 12 months of free B1s VM (1 CPU, 1 GB RAM)
  • Sign up at azure.microsoft.com/free

DigitalOcean / Linode / Vultr

These are not "free tier" but are very affordable:

  • $4-6/month for a basic VPS (1 CPU, 1 GB RAM, 25 GB disk)
  • Simple interfaces, great for learning
  • DigitalOcean and Linode frequently offer $100-200 in credits for new accounts through promotional links

Tip: If you use a cloud instance for practice, remember to shut it down or destroy it when you are done. Cloud bills can add up if you forget.


Hardware Resource Recommendations

How much machine do you need? Here is a practical guide:

Minimum (can learn, will be tight)

  • CPU: 2 cores
  • RAM: 8 GB
  • Disk: 50 GB free
  • What you can run: 1-2 lightweight VMs simultaneously, or WSL2

Recommended (comfortable)

  • CPU: 4 cores
  • RAM: 16 GB
  • Disk: 100 GB free (SSD strongly recommended)
  • What you can run: 3-4 VMs simultaneously, Docker, comfortable multitasking

Ideal (no compromises)

  • CPU: 6+ cores
  • RAM: 32 GB
  • Disk: 250 GB+ SSD
  • What you can run: Full multi-VM lab, Docker, Kubernetes, monitoring stack, everything at once

SSD vs HDD

If there is one upgrade that makes the biggest difference for running VMs, it is an SSD. VM performance on a spinning hard drive is painful. On an SSD, VMs boot in seconds, snapshots are nearly instant, and everything feels responsive. If you are still on an HDD, this is the upgrade to make.


Setting Up SSH Access to Your VMs

Regardless of which virtualization tool you use, you will want to SSH into your VMs from a terminal on your host. This is much more comfortable than working in the VM's console window.

Generate an SSH key (if you do not already have one)

$ ssh-keygen -t ed25519 -C "lab-key"
# Press Enter for the default path (~/.ssh/id_ed25519)
# Optionally set a passphrase

Copy your key to the VM

$ ssh-copy-id user@192.168.56.10       # e.g. ubuntu@ for Ubuntu cloud images, vagrant@ for Vagrant boxes

Set up your SSH config for convenience

Add to ~/.ssh/config:

Host lab1
    HostName 192.168.56.10
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519

Host lab2
    HostName 192.168.56.11
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519

Now you can simply type ssh lab1 instead of remembering IP addresses.
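With more than a couple of VMs, writing these blocks by hand gets tedious. A small function can generate them for you -- lab_ssh_config is a made-up helper name, not a standard tool; redirect its output into ~/.ssh/config yourself:

```shell
# lab_ssh_config: print an ~/.ssh/config Host block for a lab VM.
# Usage: lab_ssh_config <alias> <ip> [user]   (user defaults to ubuntu)
lab_ssh_config() {
  local alias="$1" ip="$2" user="${3:-ubuntu}"
  printf 'Host %s\n' "$alias"
  printf '    HostName %s\n' "$ip"
  printf '    User %s\n' "$user"
  printf '    IdentityFile ~/.ssh/id_ed25519\n'
  printf '\n'
}

# Example: append an entry for a third VM
# lab_ssh_config lab3 192.168.56.12 >> ~/.ssh/config
```

The same trick works in a loop when you rebuild a whole topology and want fresh config entries for every machine.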


Tips for Effective Lab Work

  1. Snapshot before every experiment. It takes seconds and saves hours. Make this a habit.

  2. Keep a lab journal. A simple text file where you note what you tried, what worked, and what broke. Your future self will thank you.

  3. Destroy and rebuild regularly. Do not just configure a VM once and treat it as precious. Practice rebuilding from scratch. In the real world, servers are disposable and reproducible setups are king.

  4. Simulate real problems. Kill a process and see what happens. Fill a disk to 100%. Misconfigure a firewall. Then fix it. The debugging skills you build this way are more valuable than any certification.

  5. Use version control for your configs. Keep your Vagrantfiles, scripts, and ansible playbooks in a git repository. This teaches you infrastructure-as-code habits from day one.

  6. Start simple, add complexity later. One VM is enough for months of learning. Do not build a six-machine cluster on day one. Add machines when you have a concrete reason to.

  7. Make it easy to start. The biggest enemy of learning is friction. If it takes you 20 minutes to get your lab running, you will not practice as often. Automate the setup with Vagrant or a simple shell script so you can go from zero to working lab in one command.
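To make the last tip concrete, here is a sketch of a one-command starter for a Multipass-based lab. The lab_up name and the VM sizes are illustrative, and the MP variable exists only so you can dry-run it with MP=echo:

```shell
# lab_up: start the lab VM, creating it on first run, then open a shell.
# MP overrides the multipass binary (e.g. MP=echo for a dry run).
lab_up() {
  local mp="${MP:-multipass}"
  local name="${1:-lab1}"
  if "$mp" list | grep -q "^$name "; then
    "$mp" start "$name"      # VM already exists: just start it
  else
    "$mp" launch --name "$name" --cpus 2 --memory 2G --disk 20G
  fi
  "$mp" shell "$name"        # drop straight into the VM
}

# Usage: lab_up            (or: lab_up web-lab for a different VM)
```

Put it in your shell profile and you are one command away from a working lab, whether or not the VM exists yet.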


That is everything you need to get started. Pick the option that matches your situation, spin up a VM, and start working through the chapters. The lab is where the real learning happens.

Glossary

A comprehensive glossary of Linux, system administration, networking, security, and DevOps terms you will encounter in this book and in the real world. Terms are listed alphabetically. Where a term is covered in depth in a specific chapter, the chapter number is noted.


ACL (Access Control List) -- An extension of traditional Unix file permissions that allows you to grant fine-grained access to specific users or groups beyond the standard owner/group/other model. Managed with getfacl and setfacl. (Chapter 6)

ACME (Automatic Certificate Management Environment) -- A protocol used by Let's Encrypt and other CAs to automate the issuance and renewal of TLS certificates. Tools like certbot and acme.sh implement this protocol. (Chapter 41)

Ansible -- An open source, agentless configuration management and automation tool. It connects to remote machines via SSH and executes tasks defined in YAML playbooks. (Chapter 68)

Apache (httpd) -- The Apache HTTP Server, one of the oldest and most widely used web servers. It uses a module-based architecture and is configured through .conf files and .htaccess overrides. (Chapter 46)

AppArmor -- A Linux Security Module that confines programs to a limited set of resources using per-program profiles. It is the default MAC system on Ubuntu and SUSE-based distributions. (Chapter 42)

ARP (Address Resolution Protocol) -- A protocol that maps IP addresses to MAC (hardware) addresses on a local network. When a machine knows an IP but not the MAC address, it broadcasts an ARP request. (Chapter 32)

Bash (Bourne Again Shell) -- The default interactive shell on most Linux distributions. It is a superset of the original Bourne shell (sh) with added features like command-line editing, job control, and scripting improvements. (Chapters 18-19)

Bastion Host (Jump Box) -- A hardened server that serves as the single entry point for SSH access to an internal network. All administrators connect to the bastion first, then jump to internal servers. Configured via ProxyJump in SSH config. (Chapter 36)

BGP (Border Gateway Protocol) -- The routing protocol that glues the internet together. Autonomous systems use BGP to exchange routing information and determine the best paths for traffic.

Bind Mount -- A way to mount a directory to another location in the filesystem, making the same content accessible from two paths. Created with mount --bind /source /target. Frequently used in container setups.

Block Device -- A device that reads and writes data in fixed-size blocks (sectors). Hard drives, SSDs, and USB drives are block devices. Represented in /dev/ as files like /dev/sda. (Chapter 7)

Boot Loader -- Software that loads the operating system kernel into memory at startup. GRUB2 is the most common boot loader on Linux systems. (Chapter 14)

Bridge -- A virtual or physical network device that connects two or more network segments at Layer 2 (data link layer). Linux bridges are commonly used in virtualization to give VMs access to the host network.

BSS (Block Started by Symbol) -- In the context of ELF binaries, the BSS segment contains uninitialized global variables. In wireless networking, a BSS is a set of stations communicating with an access point.

Btrfs (B-tree Filesystem) -- A modern copy-on-write filesystem for Linux that supports snapshots, subvolumes, compression, and built-in RAID. It is the default on openSUSE and Fedora workstation editions.

CA (Certificate Authority) -- An entity that issues digital certificates, vouching for the identity of the certificate holder. Let's Encrypt is the most widely used free CA. (Chapter 39)

Cgroup (Control Group) -- A Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, I/O, network) of a collection of processes. Cgroups are a foundational building block of containers. (Chapter 62)

Character Device -- A device that reads and writes data one character (byte) at a time, without buffering. Terminals (/dev/tty), serial ports, and /dev/null are character devices.

Chroot -- A system call that changes the apparent root directory for a process and its children. It provides basic filesystem isolation but is not a full security boundary. Containers provide much stronger isolation.

CI/CD (Continuous Integration / Continuous Delivery) -- A set of practices where code changes are automatically built, tested, and deployed. Tools like Jenkins, GitLab CI, and GitHub Actions are common CI/CD platforms. (Chapter 69)

Container -- A lightweight, isolated environment that runs applications using the host kernel's namespaces and cgroups. Unlike virtual machines, containers share the host kernel and are much more efficient. Docker and Podman are common container runtimes. (Chapters 63-66)

Copy-on-Write (CoW) -- A resource management technique where copies of data share the same storage until one of them is modified. Used by fork(), container image layers, and filesystems like Btrfs and ZFS.

Cron -- A time-based job scheduler. Users define scheduled tasks in crontab files, and the cron daemon executes them at the specified times. (Chapter 24)

DAC (Discretionary Access Control) -- The traditional Unix permission model where the file owner decides who can access the file. The standard read/write/execute permissions for owner/group/others are DAC. Contrast with MAC (Mandatory Access Control). (Chapter 6)

Daemon -- A background process that runs without direct user interaction, typically started at boot. Examples: sshd, nginx, crond. The name comes from Greek mythology -- a daemon is a helpful spirit that works behind the scenes.

D-Bus -- A message bus system that provides a mechanism for inter-process communication on Linux. systemd, NetworkManager, and many desktop applications communicate via D-Bus.

DHCP (Dynamic Host Configuration Protocol) -- A network protocol that automatically assigns IP addresses, subnet masks, gateways, and DNS servers to devices on a network. (Chapter 32)

DNS (Domain Name System) -- The distributed hierarchical system that translates human-readable domain names (like example.com) into IP addresses. Often called "the phone book of the internet." (Chapter 31)

Docker -- An open source platform for building, shipping, and running applications in containers. It popularized containerization and introduced the Dockerfile and image layer concepts. (Chapter 63)

DPKG -- The low-level package manager for Debian-based systems. It installs, removes, and inspects individual .deb package files. APT builds on top of dpkg to handle dependencies. (Chapter 57)

ELF (Executable and Linkable Format) -- The standard binary format for executables, object code, shared libraries, and core dumps on Linux. When you compile a C program, the result is an ELF binary.

Environment Variable -- A named value in the shell's environment that is inherited by child processes. Common examples: PATH, HOME, USER, EDITOR. Set with export VAR=value. (Chapter 18)

Ephemeral Port -- A short-lived port number automatically assigned by the operating system for outbound connections. Typically in the range 32768-60999 on Linux.

etcd -- A distributed key-value store used as the backing store for Kubernetes cluster state. It uses the Raft consensus algorithm to maintain consistency across nodes.

ext4 (Fourth Extended Filesystem) -- The most widely used Linux filesystem. It supports journaling, files up to 16 TB, and volumes up to 1 EB. It is the default on most Debian/Ubuntu and RHEL-based distributions. (Chapter 7)

File Descriptor -- An integer that the kernel uses to identify an open file or I/O resource within a process. File descriptor 0 is stdin, 1 is stdout, and 2 is stderr. All I/O in Linux happens through file descriptors. (Chapter 56)

Firewall -- Software or hardware that filters network traffic based on rules. On Linux, the kernel's netfilter framework provides firewalling, configured via iptables, nftables, or frontends like ufw and firewalld. (Chapter 34)

Fork -- The system call that creates a new process by duplicating the calling process. The new process (child) is a copy of the parent. Nearly all process creation on Linux involves fork() followed by exec(). (Chapter 10)

FQDN (Fully Qualified Domain Name) -- The complete domain name for a specific host, including all parent domains up to the root. For example, web01.internal.example.com. is an FQDN.

FUSE (Filesystem in Userspace) -- A framework that lets non-privileged users create their own filesystems without modifying kernel code. SSHFS and rclone use FUSE.

Gateway -- A network device that routes traffic between different networks. The default gateway is the router that handles traffic destined for networks outside the local subnet. Configured with ip route add default via <gateway_ip>.

GID (Group ID) -- A numeric identifier for a group. The root group has GID 0. Defined in /etc/group. (Chapter 9)

GNU -- A recursive acronym for "GNU's Not Unix." The GNU Project, started by Richard Stallman in 1983, created the userland tools (gcc, bash, coreutils, glibc) that, combined with the Linux kernel, form a complete operating system. (Chapter 1)

GPT (GUID Partition Table) -- A modern disk partitioning scheme that replaces MBR. It supports up to 128 partitions, disks larger than 2 TB, and provides redundancy through a backup partition table at the end of the disk. (Chapter 7)

GRUB (Grand Unified Bootloader) -- The most common boot loader on Linux systems. GRUB2 loads the kernel and initial ramdisk (initrd/initramfs) and passes control to the kernel at boot time. (Chapter 14)

HAProxy -- A high-performance, open source TCP/HTTP load balancer and proxy server, widely used for distributing traffic across multiple backend servers. (Chapter 47)

Hardlink -- A directory entry that points to the same inode as another file. Both names are equally valid references to the same data on disk. Hardlinks cannot cross filesystem boundaries. (Chapter 8)

Hypervisor -- Software that creates and manages virtual machines. Type 1 (bare-metal) hypervisors like KVM run directly on hardware. Type 2 (hosted) hypervisors like VirtualBox run on top of an operating system. (Chapter 61)

ICMP (Internet Control Message Protocol) -- A network protocol used for diagnostic and error-reporting purposes. ping and traceroute use ICMP. (Chapter 32)

Initramfs (Initial RAM Filesystem) -- A temporary root filesystem loaded into memory during the boot process. It contains drivers and scripts needed to mount the real root filesystem. (Chapter 14)

Inode -- A data structure on a filesystem that stores metadata about a file (permissions, ownership, timestamps, block locations) but not the filename or the file's content. Every file has exactly one inode. (Chapter 8)

I/O Scheduler -- A kernel component that reorders and merges I/O requests to optimize disk performance. Modern Linux uses schedulers like mq-deadline, bfq, and none (for NVMe SSDs). (Chapter 54)

IPC (Inter-Process Communication) -- Mechanisms that allow processes to exchange data. Linux provides pipes, named pipes (FIFOs), message queues, shared memory, semaphores, signals, and Unix domain sockets. (Chapter 12)

iptables -- The traditional user-space tool for configuring the Linux kernel's netfilter packet filtering framework. Being gradually replaced by nftables on modern distributions. (Chapter 34)

Journald -- The systemd journal service that collects and stores log data from the kernel, services, and applications. Queried with journalctl. (Chapter 17)

Journaling (Filesystem) -- A technique where the filesystem writes changes to a log (journal) before applying them to the main data structures. If the system crashes, the journal can be replayed to restore consistency without a full filesystem check. ext4, XFS, and Btrfs all use journaling.

Kernel -- The core of the operating system. It manages hardware resources, provides system calls to applications, handles process scheduling, memory management, and device drivers. Linux refers specifically to the kernel created by Linus Torvalds. (Chapter 13)

Kernel Module -- A piece of code that can be loaded into or unloaded from the kernel at runtime, extending its functionality without requiring a reboot. Device drivers are commonly implemented as kernel modules. Managed with modprobe, lsmod, insmod, rmmod. (Chapter 60)

KVM (Kernel-based Virtual Machine) -- A Linux kernel module that turns Linux into a Type 1 hypervisor. Combined with QEMU for device emulation, it provides high-performance virtualization. (Chapter 61)

LDAP (Lightweight Directory Access Protocol) -- A protocol for accessing and maintaining distributed directory information services. Often used for centralized user authentication in enterprise environments.

Load Average -- A measure of the number of processes actively running or waiting to run on the CPU. Displayed by uptime, top, and w. The three numbers represent 1-minute, 5-minute, and 15-minute averages.

Logical Volume Manager (LVM) -- A device mapper framework that provides logical volume management for the Linux kernel. It allows you to resize, snapshot, and manage disk partitions flexibly. (Chapter 48)

Logrotate -- A utility that manages the automatic rotation, compression, and deletion of log files. Configured via /etc/logrotate.conf and drop-in files in /etc/logrotate.d/. Prevents log files from consuming all disk space.

Loopback Interface -- The virtual network interface lo with address 127.0.0.1 (IPv4) and ::1 (IPv6). Traffic sent to the loopback address stays on the local machine and never reaches the physical network.

LXC/LXD -- Linux Containers (LXC) is an OS-level virtualization method that runs multiple isolated Linux systems on a single host. LXD is a system container manager built on top of LXC, offering a better user experience. (Chapter 65)

MAC (Mandatory Access Control) -- A security model where the operating system enforces access policies that cannot be overridden by users. SELinux and AppArmor implement MAC on Linux. (Chapter 42)

MBR (Master Boot Record) -- A legacy disk partitioning scheme that stores the partition table and boot loader in the first 512 bytes of a disk. Supports up to 4 primary partitions and disks up to 2 TB. Being replaced by GPT.

Mount Point -- A directory in the filesystem where a storage device or partition is attached. When you mount /dev/sdb1 at /data, the contents of the partition become accessible at /data. (Chapter 7)

MTU (Maximum Transmission Unit) -- The largest packet size (in bytes) that can be sent over a network link without fragmentation. The default MTU for Ethernet is 1500 bytes. Jumbo frames use 9000 bytes.

Namespace -- A Linux kernel feature that isolates and virtualizes system resources for a group of processes. Types include PID, network, mount, UTS, IPC, user, and cgroup namespaces. Namespaces are a foundational building block of containers. (Chapter 62)

NAT (Network Address Translation) -- A method of mapping private IP addresses to a public IP address for outbound internet traffic. Your home router almost certainly uses NAT.

Netfilter -- The kernel framework that provides packet filtering, network address translation, and packet mangling. It is the engine behind iptables, nftables, and firewalld. (Chapter 34)

NFS (Network File System) -- A distributed filesystem protocol that allows a client machine to access files over a network as if they were local. (Chapter 49)

nftables -- The modern replacement for iptables, providing a simpler and more consistent syntax for configuring the Linux kernel's packet filtering framework. (Chapter 34)

Nginx -- A high-performance web server and reverse proxy. Known for its event-driven architecture, low memory footprint, and ability to handle many concurrent connections. (Chapters 44-45)

Nice Value -- A scheduling priority hint that ranges from -20 (highest priority) to 19 (lowest priority). A "nice" process (high nice value) yields CPU time to other processes. Set with nice and renice. (Chapter 10)

NTP (Network Time Protocol) -- A protocol for synchronizing clocks of computer systems over a network. chrony and systemd-timesyncd are common NTP clients on Linux. (Chapter 75)

OCI (Open Container Initiative) -- A set of standards for container image formats and runtimes. OCI ensures that container images built with Docker can run on Podman, containerd, or any other OCI-compliant runtime. (Chapter 63)

OOM Killer (Out-of-Memory Killer) -- A kernel mechanism that selects and kills processes when the system runs critically low on memory. Each process has an OOM score; the highest score gets killed first. (Chapter 53)

OpenSSL -- An open source toolkit that implements the SSL and TLS protocols and provides a general-purpose cryptography library. Used for generating certificates, testing TLS connections, and encrypting data. (Chapter 40)

OSI Model -- The Open Systems Interconnection model, a seven-layer conceptual framework for understanding network communication: Physical, Data Link, Network, Transport, Session, Presentation, Application. (Chapter 28)

Package Manager -- A tool that automates the installation, upgrade, configuration, and removal of software. APT (Debian/Ubuntu), DNF (Fedora/RHEL), and pacman (Arch) are common package managers. (Chapter 57)

Page Cache -- A kernel memory cache that stores recently read file data in RAM. When a program reads a file, the kernel keeps the data in the page cache so subsequent reads are served from memory instead of disk. This is why "available" memory in free is much more relevant than "free" memory. (Chapter 53)

PAM (Pluggable Authentication Modules) -- A framework that provides a flexible mechanism for authenticating users. It allows system administrators to configure authentication policies without changing individual applications.

Partition -- A logically distinct section of a physical disk. Each partition can contain a different filesystem. Managed with fdisk, parted, or gdisk. (Chapter 7)

PATH -- An environment variable containing a colon-separated list of directories that the shell searches when you type a command. If a command's executable is not in a directory listed in PATH, you must use its full path. (Chapter 18)

PID (Process ID) -- A unique numeric identifier assigned to every process by the kernel. PID 1 is the init system (typically systemd). The PID namespace allows containers to have their own PID numbering. (Chapter 10)

Pipe -- A mechanism for passing the output of one command as input to another. The | character creates an anonymous pipe. Named pipes (FIFOs) persist in the filesystem and are created with mkfifo. (Chapters 4, 12)

PKI (Public Key Infrastructure) -- The framework of policies, procedures, and technology used to manage digital certificates and public-key encryption. It enables trust relationships for TLS/SSL communication. (Chapter 39)

Podman -- A daemonless, rootless container engine compatible with Docker. It can run containers without requiring root privileges, improving security. (Chapter 64)

Port -- A 16-bit number (0-65535) that identifies a specific process or network service on a host. Well-known ports (0-1023) are reserved for standard services (22=SSH, 80=HTTP, 443=HTTPS). (Chapter 30)

POSIX (Portable Operating System Interface) -- A family of IEEE standards that define the API and shell/utility interface for Unix-like operating systems. POSIX compliance ensures portability across different Unix variants.

Process -- A running instance of a program. Each process has its own address space, file descriptors, and execution context. The kernel tracks processes using task structures. (Chapter 10)

Proxy -- An intermediary that sits between clients and servers. A forward proxy acts on behalf of clients (e.g., web proxy). A reverse proxy sits in front of servers (e.g., Nginx as a reverse proxy for application servers). (Chapter 45)

QEMU -- An open source machine emulator and virtualizer. When combined with KVM, it provides near-native performance virtualization. Standalone, it can emulate different CPU architectures. (Chapter 61)

RAID (Redundant Array of Independent Disks) -- A technology that combines multiple physical disks into a single logical unit for redundancy and/or performance. Common levels: RAID 0 (striping), RAID 1 (mirroring), RAID 5 (striping with parity), RAID 10 (mirroring + striping). (Chapter 48)

RAM (Random Access Memory) -- Volatile system memory used for running programs and caching data. The kernel manages RAM allocation and can use swap space on disk when RAM is exhausted. (Chapter 53)

Regex (Regular Expression) -- A pattern-matching language used to search, match, and manipulate text. Used extensively in grep, sed, awk, and virtually every programming language. (Chapter 20)
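
A small taste with grep's extended regex mode (the input lines are invented for the demo):

```shell
# -E enables extended regex: keep lines containing an IPv4-looking address
printf 'gateway 10.0.0.1\nno address here\n' \
    | grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}'
# prints: gateway 10.0.0.1
```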

Reverse Proxy -- A server that sits in front of one or more backend servers and forwards client requests to them. It provides load balancing, SSL termination, caching, and security benefits. Nginx and HAProxy are common reverse proxies. (Chapter 45)

RPM (Red Hat Package Manager) -- The low-level package format and manager for Red Hat-based distributions. Individual .rpm files are managed by rpm; dependency resolution is handled by dnf (or the older yum). (Chapter 57)

Rsync -- A utility for efficiently synchronizing files between locations. It transfers only the changed parts of files (delta transfer), making it ideal for backups and remote file synchronization. (Chapter 50)

Runlevel -- A legacy System V init concept defining the state of the machine (e.g., 3 = multi-user with networking, 5 = graphical). In systemd, runlevels are replaced by targets (e.g., multi-user.target, graphical.target).

SATA (Serial ATA) -- A computer bus interface for connecting storage devices. SATA drives are the most common type of hard drive and SSD in consumer and server hardware.

Scheduler (Process) -- The kernel component that decides which process runs on the CPU next, and for how long. Linux used the Completely Fair Scheduler (CFS) for many years; kernels 6.6 and later replace it with EEVDF (Earliest Eligible Virtual Deadline First). (Chapter 10)
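
You cannot pick the scheduling algorithm from the shell, but you can influence its decisions through niceness — a sketch relying on the fact that field 19 of /proc/[pid]/stat is the process's nice value:

```shell
# nice adds 10 to the niceness (higher niceness = lower priority);
# awk inherits it and reads its own nice value back out of /proc
nice -n 10 awk '{ print $19 }' /proc/self/stat   # 10, from a default of 0
```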

Seccomp -- A Linux kernel security feature that restricts the system calls a process can make. Used by container runtimes and application sandboxes to reduce the attack surface. Docker applies a default seccomp profile to all containers.

SELinux (Security-Enhanced Linux) -- A Linux Security Module that implements Mandatory Access Control. Developed by the NSA, it is the default MAC system on Red Hat-based distributions. Every file, process, and port has a security context (label). (Chapter 42)

Semaphore -- An IPC synchronization primitive that controls access to a shared resource. Counting semaphores allow N concurrent accessors; binary semaphores act like mutexes. (Chapter 12)

Shared Library -- A library file (.so on Linux) that is loaded into memory once and shared among multiple programs at runtime. This saves memory and disk space compared to static linking. Managed with ldconfig and ldd. (Chapter 59)

Shared Memory -- An IPC mechanism that allows multiple processes to access the same region of memory. It is the fastest form of IPC because data does not need to be copied between processes. (Chapter 12)

Shell -- A command-line interpreter that provides the user interface to the operating system. Bash, Zsh, Fish, and Dash are common shells. The shell reads commands, interprets them, and executes them. (Chapter 4)

Signal -- A software interrupt delivered to a process to notify it of an event. Common signals: SIGTERM (graceful termination), SIGKILL (forced termination), SIGHUP (hangup/reload), SIGINT (Ctrl+C). (Chapter 11)
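
Watch a signal arrive — the receiving process's exit status encodes which signal killed it (128 + the signal number):

```shell
sleep 300 &
pid=$!

kill -TERM "$pid"                      # polite request (kill's default signal)
wait "$pid" || echo "exit status: $?"  # 143 = 128 + 15 (SIGTERM)

# kill -KILL "$pid" would force termination -- the process gets no chance
# to clean up, so reach for SIGKILL only when SIGTERM fails
```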

SMTP (Simple Mail Transfer Protocol) -- The standard protocol for sending email between mail servers. Common open source SMTP servers on Linux include Postfix and Exim.

Socket -- An endpoint for network communication. A socket is identified by an IP address and port number. Unix domain sockets provide IPC on the same machine using filesystem paths instead of network addresses.

SSH (Secure Shell) -- A cryptographic network protocol for secure remote login, command execution, and file transfer. OpenSSH is the standard implementation on Linux. (Chapter 36)

SSL/TLS (Secure Sockets Layer / Transport Layer Security) -- Cryptographic protocols that provide secure communication over a network. SSL is deprecated; TLS (1.2 and 1.3) is the current standard. Used for HTTPS, secure email, VPNs, and more. (Chapter 39)

Sticky Bit -- A special permission bit that, when set on a directory, prevents users from deleting files they do not own. /tmp has the sticky bit set (permission drwxrwxrwt). (Chapter 6)
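
See it on /tmp, then reproduce it on a directory of your own (the directory name is arbitrary):

```shell
# World-writable, but the trailing 't' protects each user's files
ls -ld /tmp              # drwxrwxrwt

mkdir scratch
chmod 1777 scratch       # leading 1 is the sticky bit (or: chmod +t scratch)
ls -ld scratch           # mode now ends in 't'
rm -r scratch
```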

Strace -- A diagnostic tool that traces the system calls made by a process. Invaluable for debugging "why is this program not working" problems. Usage: strace -p <pid> or strace <command>. (Chapter 10)

SUID (Set User ID) -- A special permission bit that, when set on an executable, causes it to run with the permissions of the file owner rather than the user who executes it. The passwd command uses SUID to write to /etc/shadow as root. (Chapter 6)
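
You can set the bit on a scratch file to see how it displays (passwd is configured the same way, with root as the owner):

```shell
touch demo
chmod 4755 demo          # leading 4 is the SUID bit
ls -l demo               # -rwsr-xr-x : 's' replaces the owner's execute 'x'
rm demo

# Audit a real system: list every SUID executable under /usr/bin
# find /usr/bin -perm -4000 -type f
```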

System Call (syscall) -- The interface between user-space applications and the kernel. When a program needs to access hardware, create a process, or open a file, it makes a system call. Common syscalls: open, read, write, fork, exec, mmap. (Chapter 13)

Subnet -- A logical subdivision of an IP network. Subnetting allows you to divide a large network into smaller, manageable segments. Defined by a subnet mask (e.g., 255.255.255.0 or /24). (Chapter 29)
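
The /N prefix is just a count of fixed bits; the remaining 32 - N bits address hosts. Shell arithmetic can check the math (minus the network and broadcast addresses):

```shell
# Usable host addresses in a /24 network
prefix=24
echo $(( (1 << (32 - prefix)) - 2 ))    # prints: 254
```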

Sudo -- A program that allows a permitted user to execute a command as the superuser or another user, as specified in /etc/sudoers. (Chapter 9)

Superblock -- A data structure on a filesystem that contains metadata about the filesystem itself: its size, block size, number of inodes, and other global properties. (Chapter 7)

Swap -- Disk space used as an extension of RAM. When physical memory is full, the kernel moves less-used pages to swap. Swap can be a dedicated partition or a swap file. (Chapter 53)
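
The kernel reports swap totals directly; friendlier tools sit on top of the same data:

```shell
# Swap totals straight from the kernel (values in kB)
grep '^Swap' /proc/meminfo

# Friendlier views (procps / util-linux):
#   free -h            -- RAM and swap usage
#   swapon --show      -- active swap areas and their priorities
```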

Symlink (Symbolic Link) -- A file that contains a reference (path) to another file or directory. Unlike hardlinks, symlinks can cross filesystem boundaries and link to directories. Created with ln -s. (Chapter 8)
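
Compare a symlink with a hardlink side by side (file names are arbitrary):

```shell
echo "data" > original.txt
ln -s original.txt soft.txt   # symlink: a file containing a path
ln original.txt hard.txt      # hardlink: another name for the same inode

ls -li original.txt soft.txt hard.txt   # the hardlink shares the inode number
readlink soft.txt                       # prints: original.txt
rm original.txt soft.txt hard.txt
```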

Syslog -- A standard logging protocol and the traditional Linux logging system. Messages are categorized by facility (kern, auth, mail, etc.) and severity (emerg, alert, crit, err, warning, etc.). rsyslog and syslog-ng are common implementations. (Chapter 17)

Systemd -- The init system and service manager used by most modern Linux distributions. It manages the boot process, services, timers, mounts, and more. Controlled with systemctl and journalctl. (Chapters 15-16)

Target (systemd) -- A systemd unit type that groups other units and represents a system state. Targets replace the concept of runlevels. Common targets: multi-user.target (text mode), graphical.target (desktop), rescue.target (single-user recovery). (Chapter 15)

TCP (Transmission Control Protocol) -- A connection-oriented, reliable transport protocol. It guarantees ordered delivery of data and handles retransmissions. Used for HTTP, SSH, SMTP, and most internet traffic. (Chapter 30)

Terraform -- An open source infrastructure-as-code tool by HashiCorp. It uses declarative configuration files to provision and manage cloud and on-premises resources. (Chapter 67)

TLS (Transport Layer Security) -- See SSL/TLS above.

Tmpfs -- A temporary filesystem that resides in RAM (and swap). Data in tmpfs is lost on reboot. Often used for /tmp and /run.

TTL (Time to Live) -- In networking, a value in an IP packet header that limits the number of hops the packet can traverse. Each router decrements the TTL by 1; when it reaches 0, the packet is discarded. In DNS, TTL specifies how long a record should be cached.

TTY -- Historically a teletypewriter terminal. In modern Linux, TTY refers to terminal devices. Physical consoles are /dev/tty1 through /dev/tty6. Pseudo-terminals (used by SSH and terminal emulators) are /dev/pts/*.

Tunable -- A kernel parameter that can be adjusted at runtime via /proc/sys/ or sysctl. Tunables control network behavior, memory management, filesystem limits, and other kernel settings. (Chapter 13)
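
Reading a tunable works the same either way, since sysctl names map onto /proc/sys paths with dots in place of slashes:

```shell
# kernel.pid_max <-> /proc/sys/kernel/pid_max
cat /proc/sys/kernel/pid_max     # highest PID the kernel will hand out

# Equivalent:                 sysctl -n kernel.pid_max
# Change at runtime (root):   sysctl -w vm.swappiness=10
# Persist across reboots:     drop a .conf file into /etc/sysctl.d/
```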

UDP (User Datagram Protocol) -- A connectionless transport protocol. It does not guarantee delivery, ordering, or duplicate protection, but has lower overhead than TCP. Used for DNS queries, NTP, video streaming, and VPNs. (Chapter 30)

UID (User ID) -- A numeric identifier for a user account. UID 0 is root. UIDs 1-999 are typically reserved for system accounts. Regular users start at UID 1000 on most distributions. (Chapter 9)

Umask -- A value that determines the default permissions for newly created files and directories. A umask of 022 means new files get 644 (rw-r--r--) and new directories get 755 (rwxr-xr-x). (Chapter 6)
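
Verify the arithmetic yourself (666 - 022 = 644 for files, 777 - 022 = 755 for directories):

```shell
umask 022
touch newfile
mkdir newdir
ls -ld newfile newdir    # -rw-r--r-- and drwxr-xr-x
rm -r newfile newdir
```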

Unix Domain Socket -- An IPC mechanism that uses filesystem paths instead of network addresses. Faster than TCP sockets for communication between processes on the same machine. Nginx, Docker, and systemd use Unix domain sockets extensively.

Ulimit -- A shell builtin that controls the resource limits for processes launched from the shell. Limits include maximum open files, stack size, and CPU time. Persistent limits are configured in /etc/security/limits.conf. (Chapter 56)
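
A quick look at the limits your shell is running under:

```shell
ulimit -a        # every limit for this shell and its children
ulimit -n        # soft limit on open file descriptors
ulimit -Hn       # hard ceiling (only root may raise it)
```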

UUID (Universally Unique Identifier) -- A 128-bit identifier used to uniquely identify resources. In Linux, UUIDs are commonly used to identify filesystems and partitions (e.g., in /etc/fstab). View with blkid.
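
For example, a hypothetical /etc/fstab entry — the UUID shown here is invented for illustration; substitute the value blkid reports for your own partition:

```
# Mount by UUID: survives device renames (/dev/sda1 becoming /dev/sdb1)
# The UUID below is a made-up example value
UUID=1b2f7a10-3c4d-4e5f-8a9b-0c1d2e3f4a5b  /data  ext4  defaults  0  2
```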

Vagrant -- An open source tool by HashiCorp for building and managing virtual machine environments. It uses Vagrantfiles (Ruby-based config files) to define reproducible VMs. (Appendix C)

Virtual Memory -- An abstraction that gives each process the illusion of having its own contiguous address space. The kernel maps virtual addresses to physical RAM (or swap) using page tables. This isolation is fundamental to process security and stability. (Chapter 53)

VFS (Virtual File System) -- An abstraction layer in the kernel that provides a uniform interface to different filesystem types. It allows programs to use open(), read(), write() regardless of whether the underlying filesystem is ext4, XFS, NFS, or procfs. (Chapter 8)

Vim -- A powerful, modal text editor that is available on virtually every Unix and Linux system. It descends from the original vi editor. Learning at least basic Vim navigation is essential for any Linux administrator. (Chapter 25)

VLAN (Virtual LAN) -- A technology that creates logically separate networks on the same physical infrastructure. VLANs use 802.1Q tagging to isolate traffic at Layer 2.

VPN (Virtual Private Network) -- A technology that creates an encrypted tunnel between two points over a public network. WireGuard and OpenVPN are popular open source VPN solutions on Linux. (Chapter 37)

Watchdog -- A timer-based mechanism (hardware or software) that reboots a system if the operating system becomes unresponsive. The Linux kernel supports hardware watchdogs via /dev/watchdog and software watchdogs through systemd.

WireGuard -- A modern, high-performance VPN protocol implemented as a Linux kernel module. It is simpler to configure than IPsec or OpenVPN and uses state-of-the-art cryptography. (Chapter 37)

XFS -- A high-performance journaling filesystem originally developed by SGI. It is the default filesystem on Red Hat-based distributions and excels at handling large files and parallel I/O. (Chapter 7)

YAML (YAML Ain't Markup Language) -- A human-readable data serialization format widely used in DevOps tooling. Ansible playbooks, Docker Compose files, Kubernetes manifests, and cloud-init configs all use YAML. Indentation-sensitive -- spaces only, never tabs.

yum (Yellowdog Updater, Modified) -- The legacy package manager for Red Hat-based distributions. It has been replaced by DNF on modern Fedora and RHEL systems, though the yum command often still works as an alias. (Chapter 57)

ZFS (Zettabyte File System) -- A combined filesystem and logical volume manager originally developed by Sun Microsystems. It provides built-in RAID, snapshots, compression, and data integrity verification. Available on Linux through the OpenZFS project, though its CDDL license is widely considered incompatible with the GPL, so it is not included in the mainline kernel.

Zombie Process -- A process that has terminated but whose parent has not yet read its exit status (via wait()). It takes up a PID slot but no other resources. Shown as state Z in ps output. A large number of zombies indicates a bug in the parent process. (Chapter 10)

Zone (DNS) -- A portion of the DNS namespace managed by a specific organization or administrator. A zone file contains the DNS records (A, AAAA, MX, CNAME, NS, etc.) for a domain. (Chapter 31)

Zone (Firewall) -- In firewalld (used on RHEL-based systems), a zone is a named set of rules that define the trust level for network connections. Common zones include public, trusted, internal, and drop. (Chapter 34)

Zsh (Z Shell) -- An extended Bourne shell with many improvements over Bash, including better tab completion, spelling correction, and plugin support through frameworks like Oh My Zsh. It is the default shell on macOS and popular among power users on Linux.


This glossary covers the core terminology you will encounter throughout this book and in your career as a Linux professional. When you hit an unfamiliar term, check here first, then dive into the relevant chapter for the full story.

Further Reading & Resources

This book gives you a thorough foundation, but Linux is a vast ecosystem that evolves constantly. The resources listed here will help you go deeper into specific topics, stay current, and connect with the community. Everything listed is either free or reasonably priced, and every tool or platform mentioned is open source or freely accessible.


Essential Books

These are the books that experienced Linux professionals come back to again and again. If you are going to buy physical books, start here.

The Classics

UNIX and Linux System Administration Handbook (5th Edition) by Evi Nemeth, Garth Snyder, Trent Hein, Ben Whaley, and Dan Mackin

This is the definitive sysadmin reference. Affectionately known as "the Nemeth book" or "the ULSAH book," it covers everything from booting to DNS to configuration management. The fifth edition is current and includes cloud and DevOps topics. If you can only buy one book besides the one you are reading, make it this one.

How Linux Works (3rd Edition) by Brian Ward

An excellent mid-level book that explains what happens under the hood -- how the boot process works, how the kernel manages devices, how networking operates. It bridges the gap between "I can use Linux" and "I understand Linux." Highly recommended after you finish this book.

The Linux Command Line (2nd Edition) by William Shotts

A thorough, beginner-friendly introduction to the command line and shell scripting. The full text is available for free at linuxcommand.org. This is a great companion if you want more practice with the shell and scripting topics covered in Part V of this book.

Networking & Security

TCP/IP Illustrated, Volume 1 (2nd Edition) by Kevin Fall and W. Richard Stevens

The most thorough treatment of TCP/IP networking ever written. It is dense but incredibly rewarding. If you want to truly understand how networking works at the protocol level, this is the book.

Linux Firewalls by Steve Suehring

A practical guide to iptables and network security on Linux. Covers packet filtering, NAT, logging, and firewall design.

SSH Mastery (2nd Edition) by Michael W. Lucas

A focused, practical book on OpenSSH. Covers key management, tunneling, proxying, agent forwarding, and all the things you can do with SSH that most people never discover.

Shell Scripting

Classic Shell Scripting by Arnold Robbins and Nelson H.F. Beebe

Goes deep into shell scripting, text processing, and the Unix philosophy. Covers awk, sed, and the standard Unix toolkit in great detail.

Bash Cookbook (2nd Edition) by Carl Albing, JP Vossen, and Cameron Newham

A problem-solution format book. Great when you have a specific scripting problem and want to see how experienced Bash programmers solve it.

Performance & Internals

Systems Performance (2nd Edition) by Brendan Gregg

The definitive guide to Linux and Unix performance analysis. Covers CPUs, memory, filesystems, disks, networking, and more. Brendan Gregg's work on performance observability tools (perf, bpftrace, flame graphs) has been hugely influential.

Linux Kernel Development (3rd Edition) by Robert Love

If you want to understand the kernel internals -- process scheduling, memory management, VFS, the block I/O layer -- this is the most accessible introduction. You do not need to be a kernel developer to benefit from this book.

Understanding the Linux Kernel (3rd Edition) by Daniel Bovet and Marco Cesati

More detailed than Robert Love's book and closer to the actual kernel source code. A heavy read but invaluable if you want deep kernel understanding.

Containers & DevOps

Docker Deep Dive by Nigel Poulton

A practical, hands-on guide to Docker that covers images, containers, networking, volumes, and orchestration. Updated frequently.

The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford

A novel (yes, a novel) about DevOps. It tells the story of an IT manager trying to rescue a failing project and introduces DevOps principles in an engaging narrative format. Surprisingly effective at explaining why DevOps practices matter.

Infrastructure as Code (2nd Edition) by Kief Morris

Covers the principles and practices of managing infrastructure using code and automation. Vendor-neutral and focused on concepts that apply regardless of which tools you use.


Online Resources

Reference Wikis and Documentation

Arch Wiki (wiki.archlinux.org)

The single best Linux documentation resource on the internet, period. Despite being written for Arch Linux, the vast majority of its content applies to any distribution. Covers everything from filesystem encryption to Bluetooth troubleshooting with clear, accurate, regularly updated instructions. Bookmark this now.

man pages

The original Linux documentation system, installed on every Linux system. Get in the habit of reading man pages: man ls, man ssh, man 5 fstab (section 5 covers file formats). The quality varies, but many man pages are excellent. Use man -k keyword to search for relevant pages.

info pages

GNU's extended documentation system. Some GNU tools (like coreutils, bash, sed, awk) have more detailed info pages than man pages. Access with info coreutils or info bash.

The Linux Documentation Project (TLDP) (tldp.org)

A large collection of HOWTOs, guides, and FAQs. Some content is dated, but the foundational guides (Bash Guide for Beginners, Advanced Bash-Scripting Guide, Linux Network Administrator's Guide) are still valuable.

Red Hat Documentation (docs.redhat.com)

Comprehensive, professionally maintained documentation for RHEL. Useful for anyone working with RHEL, CentOS Stream, Rocky Linux, or AlmaLinux.

Debian Administrator's Handbook (debian-handbook.info)

A free book covering Debian system administration. Excellent for Debian and Ubuntu users.

Ubuntu Server Guide (ubuntu.com/server/docs)

Official Ubuntu Server documentation. Clear, practical, and well-maintained.

Interactive Learning Sites

Linux Journey (linuxjourney.com)

A free, beginner-friendly site that teaches Linux fundamentals through short lessons with quizzes. Good for reinforcing the basics.

Explainshell (explainshell.com)

Paste any shell command and it will break it down, explaining each part with references to man pages. Incredibly useful when you encounter a long, cryptic command.

Regex101 (regex101.com)

An interactive regular expression tester that explains each part of your regex. Invaluable when building or debugging regular expressions.

ShellCheck (shellcheck.net)

An online linter for shell scripts. Paste your Bash script and it will find bugs, suggest improvements, and explain common pitfalls. Also available as a command-line tool (apt install shellcheck).


Community Resources

Learning Linux is not a solitary activity. These communities are where you can ask questions, find answers, and learn from others.

Stack Exchange Network

Unix & Linux Stack Exchange (unix.stackexchange.com)

The best Q&A site for Linux questions. Well-moderated, high-quality answers, and a massive archive of solved problems. Search here before asking elsewhere.

Server Fault (serverfault.com)

Stack Exchange for system administrators. More focused on professional sysadmin topics: networking, infrastructure, enterprise configurations.

Super User (superuser.com)

Stack Exchange for power users. Good for desktop Linux and general computing questions.

Reddit

r/linux -- General Linux news and discussion.

r/linuxadmin -- Professional system administration discussions. Great for real-world advice.

r/linuxquestions -- Beginner-friendly Q&A. No question is too basic.

r/commandline -- Tips, tricks, and discussions about the command line.

r/selfhosted -- A community around self-hosting services on Linux. Great for learning by running your own infrastructure.

r/homelab -- Discussions about home lab setups. Inspirational if you are building out a practice environment.

IRC and Chat

Libera.Chat (libera.chat)

The successor to Freenode as the home of open source IRC channels. Channels like #linux, #bash, #debian, #ubuntu, #fedora, #nginx, and hundreds more. IRC may feel old-fashioned, but the quality of help you can get from experienced developers and sysadmins in these channels is unmatched.

Discord communities

Many Linux distributions and open source projects now have Discord servers. The Linux Mint, Fedora, and Arch communities are active on Discord. Search for them on Discord or check the project's website for invite links.

Matrix/Element

Matrix is an open source, federated chat protocol. Many open source projects are migrating their real-time chat from IRC to Matrix. The #linux:matrix.org and #sysadmin:matrix.org rooms are active.

Mailing Lists

LKML (Linux Kernel Mailing List) (lkml.org)

Where Linux kernel development happens. Reading LKML is like watching master craftspeople at work. Not for asking beginner questions, but an incredible resource for understanding how the kernel evolves.

Distribution mailing lists

Most distributions maintain user and developer mailing lists. These are excellent for distribution-specific questions and staying informed about changes.


Certification Paths

Certifications are not required to be a great Linux admin, but they provide structured learning paths and are valued by many employers. Here are the most relevant ones.

Red Hat Certifications

RHCSA (Red Hat Certified System Administrator)

The most widely recognized Linux certification. It is a hands-on, performance-based exam (no multiple choice). You are given a real RHEL system and must complete tasks within a time limit. Topics include user management, file permissions, SELinux, systemd, networking, storage, and shell scripting. This cert alone can open many doors.

RHCE (Red Hat Certified Engineer)

Builds on RHCSA and focuses on Ansible automation and advanced system administration. Also a hands-on exam.

Preparation resources:

  • Red Hat's own training courses (expensive but thorough)
  • Sander van Vugt's RHCSA/RHCE video courses and books
  • Practice on Rocky Linux or AlmaLinux (RHEL-compatible, free)
  • github.com -- search for "RHCSA practice labs"

Linux Foundation Certifications

LFCS (Linux Foundation Certified System Administrator)

A distribution-neutral certification. You can choose to take the exam on Ubuntu, CentOS, or openSUSE. Covers essential commands, operation of running systems, user management, networking, and service configuration.

LFCE (Linux Foundation Certified Engineer)

The advanced version of LFCS. Covers network administration, advanced storage, security, and service management.

Preparation resources:

  • Linux Foundation's free courses on training.linuxfoundation.org
  • "Introduction to Linux" (LFS101x) on edX -- free and excellent
  • "Essentials of Linux System Administration" (LFS201) -- paid but comprehensive

CompTIA Linux+

CompTIA Linux+ (XK0-005)

A vendor-neutral certification covering hardware configuration, system operation, security, scripting, and automation. It is multiple-choice (not hands-on), so it tests knowledge rather than practical skill. Good as a first certification, especially if your employer values CompTIA certs.

Which certification should you get?

If you are starting out and want maximum industry recognition: RHCSA. It is hands-on, widely respected, and forces you to actually know how to do things (not just answer quiz questions). Prepare for it by doing every exercise in this book on a Rocky Linux or AlmaLinux VM.

If you prefer a distribution-neutral approach: LFCS. The Linux Foundation certs are well-regarded and more flexible in terms of which distribution you use.

If you need something quickly for your resume: CompTIA Linux+ is the easiest of the bunch but also the least impressive to experienced hiring managers.


Practice Platforms

Reading and labbing on your own VM is essential, but these platforms provide structured challenges and real-world scenarios.

Wargames and CTFs

OverTheWire: Bandit (overthewire.org/wargames/bandit/)

A free wargame designed to teach the basics of the Linux command line through a series of SSH-based challenges. Start with Bandit -- it is beginner-friendly and teaches file operations, piping, searching, and basic scripting through progressively harder levels. This is one of the best ways to practice what you learn in Part I of this book.

OverTheWire: other wargames

After Bandit, try Leviathan (basic exploitation), Natas (web security), and Narnia (binary exploitation). Each one teaches different Linux and security concepts.

HackTheBox (hackthebox.com)

A platform for practicing penetration testing and security skills on intentionally vulnerable machines. The free tier gives you access to several active machines. This is more security-focused but teaches an enormous amount about how Linux systems work (and how they break).

TryHackMe (tryhackme.com)

Similar to HackTheBox but more beginner-friendly, with guided paths and rooms that walk you through concepts step by step. The "Linux Fundamentals" path is an excellent supplement to this book.

Practice Labs

KodeKloud (kodekloud.com)

Hands-on labs for Linux, DevOps, Kubernetes, and cloud topics. The labs spin up real environments in your browser. There is a free tier, and the paid plans are reasonably priced. Their Linux-specific labs are well-designed.

Katacoda / O'Reilly Interactive (learning.oreilly.com)

Katacoda's interactive Linux and DevOps scenarios are now part of O'Reilly's learning platform. If you have an O'Reilly subscription (many employers provide this), the interactive labs are excellent.

Linux Survival (linuxsurvival.com)

A free, browser-based Linux tutorial that teaches basic commands interactively. Very beginner-friendly but limited in depth. Good for absolute beginners who want to get comfortable before touching a real terminal.

Self-Hosted Practice Projects

Nothing teaches like building real things. Here are projects that will exercise the skills from this book:

  1. Set up a personal web server. Install Nginx, configure virtual hosts, add TLS with Let's Encrypt, set up a reverse proxy to a backend application.

  2. Build a home monitoring stack. Install Prometheus and Grafana on a VM, configure node_exporter on your other VMs, build dashboards, set up alerting.

  3. Deploy a self-hosted Git server. Install Gitea or Forgejo. Configure SSH access, backups, and a reverse proxy.

  4. Create a VPN with WireGuard. Set up a WireGuard server on a cloud instance and connect your devices.

  5. Automate everything with Ansible. Take the manual setup of any of the above projects and convert it into Ansible playbooks. Practice idempotent automation.

  6. Set up a Pi-hole or AdGuard Home. DNS filtering at the network level. Teaches DNS, networking, systemd, and Linux administration all in one project.

  7. Run a Nextcloud instance. A self-hosted cloud storage and productivity platform. Covers web servers, databases, TLS, backups, and ongoing maintenance.


Blogs, Newsletters, and Podcasts

Blogs

Brendan Gregg's Blog (brendangregg.com)

The go-to resource for Linux performance analysis. Brendan Gregg literally wrote the book on systems performance and regularly publishes detailed articles about tracing, profiling, and debugging.

Julia Evans (b0rk) (jvns.ca)

Julia writes incredibly clear, often illustrated explanations of Linux and systems concepts. Her "zines" on topics like networking, debugging, and the command line are beloved by the community. If something confuses you, check if Julia has written about it.

Ops School (opsschool.org)

A free, community-maintained curriculum for operations engineers. Covers Linux basics, security, monitoring, and more.

Tanel Poder's Blog (tanelpoder.com)

Deep technical content on Linux performance, particularly memory management, CPU profiling, and kernel internals.

Percona Blog (percona.com/blog)

If you work with databases on Linux (MySQL, PostgreSQL, MongoDB), Percona's blog is an excellent resource for performance tuning and operational best practices.

Newsletters

Linux Weekly News (LWN) (lwn.net)

The premier news source for Linux kernel development and the broader Linux ecosystem. Some content requires a subscription, but older articles become free. The quality of analysis is unmatched.

DevOps Weekly (devopsweekly.com)

A curated weekly newsletter covering DevOps news, tools, and practices. Free.

SRE Weekly (sreweekly.com)

A newsletter focused on site reliability engineering, including incident reports, postmortems, and reliability practices. Free.

Cron Weekly (cronweekly.com)

A weekly newsletter about Linux and open source, curated by Mattias Geniar. Short, focused, and consistently useful.

Podcasts

Linux Unplugged (jupiterbroadcasting.com)

A weekly show about Linux and open source. Conversational format, good for staying current on community news and distro developments.

Self-Hosted (selfhosted.show)

A podcast about self-hosting applications and services on Linux. Practical, project-focused episodes.

Command Line Heroes (redhat.com/commandlineheroes)

Produced by Red Hat, this podcast tells the stories of open source software and the people who build it. Well-produced and insightful.

FLOSS Weekly (twit.tv/shows/floss-weekly)

Interviews with open source project maintainers and contributors. A good way to discover new tools and understand the projects behind them.


Open Source Projects to Contribute To

Contributing to open source is one of the best ways to deepen your Linux knowledge. You learn how real-world software is built, reviewed, and maintained. Start with these approachable projects:

coreutils (github.com/coreutils/coreutils)

The GNU core utilities (ls, cat, cp, mv, etc.), written in C. Contributing here means working directly with the tools you use every day. (A separate from-scratch Rust rewrite also exists at github.com/uutils/coreutils, if Rust is more your speed.)

systemd (github.com/systemd/systemd)

The init system that runs most of the Linux world. The project is massive but has well-tagged "good first issue" items.

Ansible (github.com/ansible/ansible)

Ansible is written in Python and has an enormous collection of modules. Contributing a module, fixing a bug, or improving documentation is a great way to learn both Python and Linux automation.

Nginx (nginx.org)

Nginx's open source version accepts contributions. If you have used Nginx extensively (Chapters 44-45), you may have encountered edge cases or have ideas for documentation improvements.

Documentation contributions

Almost every open source project needs better documentation. If you find unclear documentation while working through this book, that is an opportunity. Submit a documentation fix. It is the lowest-friction way to start contributing, and maintainers love it.

Your distribution

Every Linux distribution has ways to contribute: packaging, bug triage, documentation, testing. Debian, Fedora, Ubuntu, Arch, and openSUSE all have well-documented contribution guides.


A Final Word on Continuous Learning

Linux has been around since 1991, and the ecosystem grows larger every year. Nobody knows all of it. The best Linux professionals are not the ones who have memorized every command -- they are the ones who have built strong mental models and know how to find answers quickly.

Here is a practical learning strategy:

  1. Master the fundamentals. The shell, filesystems, processes, permissions, and networking do not change. They are the foundation everything else is built on.

  2. Build real things. Theory without practice fades quickly. Run your own servers, break them, fix them.

  3. Read source code and man pages. When documentation does not answer your question, the source code always will.

  4. Teach what you learn. Write blog posts, help people on forums, mentor colleagues. Teaching forces you to truly understand a topic.

  5. Stay curious. When you encounter something you do not understand, dig into it. Follow the rabbit hole. That is where the deepest learning happens.

The resources in this appendix will serve you for years. Bookmark the ones that interest you, start with one or two, and expand your reading as your skills grow. The Linux community is vast, generous, and welcoming. Jump in.