The Kernel Up Close

Why This Matters

Every command you have run so far in this book -- every file you opened, every process you started, every network packet you sent -- went through the Linux kernel. The kernel is the one piece of software that sits between your programs and the hardware. It manages memory, schedules processes, handles disk I/O, drives network interfaces, and enforces security.

Yet most Linux users never look at the kernel directly. They interact with it through commands, system calls, and virtual filesystems without realizing it. This chapter pulls back the curtain. You will learn what the kernel actually does, how to inspect it, how to load and unload kernel modules, how to tune kernel behavior at runtime, and how to read the kernel's own log messages.

This knowledge is essential for performance tuning, hardware troubleshooting, security hardening, and understanding why things work the way they do.

Try This Right Now

# What kernel are you running?
uname -a

# Kernel version only
uname -r

# How long has this kernel been running?
uptime

# See kernel log messages (most recent; may need sudo if kernel.dmesg_restrict is 1)
dmesg | tail -20

# How many kernel modules are loaded?
lsmod | wc -l

# Peek at the kernel's view of your CPU
cat /proc/cpuinfo | head -20

# How much memory does the kernel see?
cat /proc/meminfo | head -10

Kernel vs. Userspace

The most fundamental distinction in Linux is between kernel space and user space.

+--------------------------------------------------+
|                User Space                        |
|                                                  |
|   +--------+  +--------+  +--------+  +------+   |
|   | bash   |  | nginx  |  | python |  | top  |   |
|   +--------+  +--------+  +--------+  +------+   |
|                                                  |
|   Applications, libraries (glibc), utilities     |
|                                                  |
+========================+=========================+
|        System Call Interface (syscall)           |
+========================+=========================+
|                                                  |
|                Kernel Space                      |
|                                                  |
|   +-----------+  +---------+  +----------+       |
|   | Process   |  | Memory  |  | Network  |       |
|   | Scheduler |  | Manager |  | Stack    |       |
|   +-----------+  +---------+  +----------+       |
|                                                  |
|   +-----------+  +---------+  +----------+       |
|   | VFS       |  | Device  |  | Security |       |
|   |           |  | Drivers |  | (LSM)    |       |
|   +-----------+  +---------+  +----------+       |
|                                                  |
+========================+=========================+
|              Hardware                            |
|   CPU, RAM, Disk, Network, USB, GPU, ...         |
+--------------------------------------------------+

Why Two Spaces?

  • Kernel space has unrestricted access to hardware. A bug here can crash the entire system.
  • User space is restricted. A bug in your application cannot (usually) crash the kernel or affect other users.

The CPU enforces this split in hardware. On x86, the mechanism is protection rings:

  • Ring 0: Kernel mode (full hardware access)
  • Ring 3: User mode (restricted)

When your program needs something that requires kernel privileges (opening a file, sending a network packet, allocating memory), it makes a system call.


System Calls: The Gateway

A system call (syscall) is how user-space programs request services from the kernel. Anything that reaches beyond the process's own memory (files, devices, the network, other processes) eventually becomes a system call.

  Your Program (user space)
       |
       | printf("hello\n")
       |
       v
  C Library (glibc)
       |
       | write(1, "hello\n", 6)  <-- system call wrapper
       |
       v
  Kernel (kernel space)
       |
       | Actually writes bytes to the terminal device
       |
       v
  Hardware (terminal/screen)

Common System Calls

System Call   What It Does                         You Use It When...
-----------   ----------------------------------   ---------------------------
open()        Open a file                          Opening any file
read()        Read from a file descriptor          Reading file contents
write()       Write to a file descriptor           Writing to a file or stdout
close()       Close a file descriptor              Done with a file
fork()        Create a child process               Starting a new process
exec()        Replace process with new program     Running a command
mmap()        Map file/memory into address space   Memory allocation, file I/O
socket()      Create a network socket              Any network operation
ioctl()       Device-specific control              Hardware configuration

Watching System Calls with strace

strace lets you see every system call a process makes:

# Trace a simple command
strace ls /tmp 2>&1 | head -30

# Trace a running process
sudo strace -p $(pgrep nginx | head -1) -e trace=read,write

# Count system calls (summary mode)
strace -c ls /tmp

# Trace file-related calls only
strace -e trace=file ls /tmp

# Trace network-related calls only
strace -e trace=network curl -s example.com > /dev/null

# Trace with timestamps
strace -t ls /tmp 2>&1 | head -10

Example output from strace -c ls /tmp:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000050          10         5           openat
 20.00    0.000040           5         8           mmap
 15.00    0.000030           4         7           close
 10.00    0.000020           3         6           fstat
 10.00    0.000020           4         5           read
  5.00    0.000010          10         1           getdents64
  ...
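
The summary table is easier to compare across commands once it is reduced to call counts. The following is a minimal sketch, assuming the column layout shown above (call count in the fourth column, syscall name last). The helper name top_syscalls is hypothetical and not part of strace.

```shell
# top_syscalls [N]: read `strace -c` summary text on stdin and print the
# N most frequent syscalls as "calls name" (N defaults to 5).
# Assumes the column layout shown above; the helper name is illustrative.
top_syscalls() {
    awk '$1 ~ /^[0-9.]+$/ && NF >= 5 && $NF != "total" { print $4, $NF }' |
        sort -rn |
        head -n "${1:-5}"
}
```

Because strace -c writes its summary to stderr, a typical call would be: strace -c ls /tmp 2>&1 >/dev/null | top_syscalls 3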

Think About It: When you run echo "hello", how many system calls happen? Try strace echo "hello" 2>&1 | wc -l to find out. Why are there so many for such a simple command?


Kernel Modules

The Linux kernel is modular. Rather than compiling every possible driver and feature into the kernel image, Linux loads functionality on demand through kernel modules. These are .ko (kernel object) files.

  +----------------------------------+
  |         Linux Kernel             |
  |                                  |
  |  Core (always loaded):           |
  |  - Process scheduler             |
  |  - Memory manager                |
  |  - VFS layer                     |
  |                                  |
  |  Modules (loaded on demand):     |
  |  +----------+ +----------+       |
  |  | ext4.ko  | | e1000.ko |       |
  |  +----------+ +----------+       |
  |  +----------+ +----------+       |
  |  | nf_tables| | usb_hid  |       |
  |  +----------+ +----------+       |
  +----------------------------------+

Listing Loaded Modules

# List all currently loaded modules
lsmod

# Output format:
# Module                  Size  Used by
# nf_tables             303104  0
# e1000                  151552  0
# ext4                   806912  1
# ...

The columns are:

  • Module: Module name
  • Size: Memory used (bytes)
  • Used by: Count of dependents, and which modules depend on it

# Filter for a specific module
lsmod | grep ext4

# Count loaded modules
lsmod | wc -l
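
Since the Size column is plain bytes, the lsmod output can be sorted to find the largest modules. A small sketch, assuming the three-column layout described above; largest_modules is a hypothetical helper name.

```shell
# largest_modules [N]: read `lsmod` output on stdin and print the N largest
# modules as "size name", biggest first (N defaults to 5).
# The first line of lsmod output is the column header, so it is skipped.
largest_modules() {
    awk 'NR > 1 { print $2, $1 }' | sort -rn | head -n "${1:-5}"
}
```

Usage: lsmod | largest_modules 3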

Getting Module Information

# Detailed info about a module
modinfo ext4

# Key fields:
# filename:       /lib/modules/.../ext4.ko
# license:        GPL
# description:    Fourth Extended Filesystem
# depends:        jbd2,mbcache,crc16
# parm:           ...  (module parameters)

# Just show the description
modinfo -d ext4

# Show module parameters
modinfo -p ext4

# Show the file path
modinfo -n ext4

Loading and Unloading Modules

# Load a module (resolves dependencies automatically)
sudo modprobe snd_dummy

# Verify it loaded
lsmod | grep snd_dummy

# Unload a module
sudo modprobe -r snd_dummy

# Load with parameters
sudo modprobe loop max_loop=64

WARNING: Be very careful loading and unloading kernel modules on production systems. modprobe -r refuses to remove a module that is in use, but forcing removal (rmmod -f) bypasses that check and can cause a kernel panic.
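
Before unloading, it can help to look at the module's reference count, which the kernel exposes as /sys/module/<name>/refcnt for loadable modules. The following is a minimal sketch; module_refcount is a hypothetical helper, and its optional second argument exists only so the function can be pointed at a different directory for testing.

```shell
# module_refcount NAME [BASE]: print how many users a loaded module has.
# BASE defaults to /sys/module. Directory names there use underscores,
# so hyphens in NAME are converted first.
module_refcount() {
    local name base="${2:-/sys/module}"
    name=$(printf '%s' "$1" | tr '-' '_')
    if [ ! -d "$base/$name" ]; then
        echo "$name: not loaded" >&2
        return 1
    fi
    if [ ! -r "$base/$name/refcnt" ]; then
        # Built-in code has a directory here but no refcnt file.
        echo "$name: built in, or the kernel lacks module-unload support" >&2
        return 2
    fi
    cat "$base/$name/refcnt"
}
```

A result of 0 suggests nothing currently holds the module; it is a snapshot, so modprobe -r may still refuse a moment later.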

Module Dependencies

Modules can depend on other modules. modprobe handles this automatically, but you can see the dependency tree:

# Show what a module depends on
modinfo ext4 | grep depends

# Show the full dependency tree
modprobe --show-depends ext4

Blacklisting Modules

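Sometimes you need to prevent a module from loading automatically (conflicting drivers, security). A blacklist entry stops alias-based autoloading; the module can still be loaded by name with modprobe or pulled in as a dependency of another module: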

# Create a blacklist file
sudo tee /etc/modprobe.d/blacklist-example.conf << 'EOF'
# Prevent the nouveau driver from loading (example)
blacklist nouveau
EOF

# After blacklisting, update initramfs
sudo update-initramfs -u      # Debian/Ubuntu
sudo dracut --force            # RHEL/Fedora

Distro Note: Module blacklisting syntax is the same across distributions, but the command to rebuild initramfs differs. Debian/Ubuntu use update-initramfs, RHEL/Fedora use dracut.
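
To confirm that a blacklist entry is actually in place, the configuration files can be searched directly. This is a rough sketch that checks a single directory only (modprobe reads other configuration directories as well); is_blacklisted is a hypothetical helper name, and the module name must be spelled the way it appears in the file.

```shell
# is_blacklisted MODULE [DIR]: succeed if some *.conf file in DIR
# (default /etc/modprobe.d) contains an active "blacklist MODULE" line.
# Commented-out lines do not count.
is_blacklisted() {
    local module="$1" dir="${2:-/etc/modprobe.d}"
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*(#.*)?$" \
        "$dir"/*.conf
}
```

Usage: is_blacklisted nouveau && echo "nouveau is blacklisted"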


Exploring /proc -- The Process Filesystem

/proc is a virtual filesystem. Nothing in it exists on disk -- the kernel generates its contents on the fly when you read them. It is your window into the kernel's state.
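
One way to see this on-the-fly generation is to compare the size a file claims to have with the number of bytes a read actually returns. A short sketch, assuming a stat that accepts -c (GNU coreutils and BusyBox do):

```shell
# Compare the size recorded for a /proc file with the bytes a read returns.
file=/proc/version
reported=$(stat -c %s "$file")   # size field from stat
actual=$(wc -c < "$file")        # bytes generated by the kernel on read
echo "$file: stat reports $reported bytes, reading returns $actual bytes"
```

The size field is not meaningful for these files because the content does not exist until something reads it.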

System-Wide Information

# Kernel version
cat /proc/version

# CPU information
cat /proc/cpuinfo

# Memory statistics
cat /proc/meminfo

# Uptime (in seconds)
cat /proc/uptime

# Load average
cat /proc/loadavg

# Mounted filesystems
cat /proc/mounts

# Block devices and partitions known to the kernel
cat /proc/partitions

# Network statistics
cat /proc/net/dev

# Open file count system-wide
cat /proc/sys/fs/file-nr

# Maximum number of open files
cat /proc/sys/fs/file-max

# Kernel command line (boot parameters)
cat /proc/cmdline

Per-Process Information

Each PID has its own directory (covered in Chapter 10, but here is the kernel-focused view):

# Pick a PID (your own shell)
PID=$$

# Command that started this process
cat /proc/$PID/cmdline | tr '\0' ' '; echo

# Process status (kernel's view)
cat /proc/$PID/status

# Memory map
cat /proc/$PID/maps | head -10

# Open file descriptors
ls -l /proc/$PID/fd/

# Limits applied to this process
cat /proc/$PID/limits

# cgroup membership
cat /proc/$PID/cgroup

# Namespace information
ls -l /proc/$PID/ns/

# Scheduling information
cat /proc/$PID/sched | head -20
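
The status file is a list of "Field: value" lines, so single fields are easy to pull out in scripts. A minimal sketch; proc_status_field is a hypothetical helper name.

```shell
# proc_status_field PID FIELD: print one field from /proc/PID/status,
# for example Name, State, VmRSS, or Threads.
proc_status_field() {
    awk -v key="$2:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print }' \
        "/proc/$1/status"
}
```

Usage: proc_status_field $$ VmRSS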

Interesting /proc Files

# Random number entropy available
cat /proc/sys/kernel/random/entropy_avail

# Hostname
cat /proc/sys/kernel/hostname

# OS type
cat /proc/sys/kernel/ostype

# Swappiness (how aggressively kernel swaps)
cat /proc/sys/vm/swappiness

# IP forwarding enabled?
cat /proc/sys/net/ipv4/ip_forward

# Maximum PID value (upper bound on the PIDs the kernel hands out)
cat /proc/sys/kernel/pid_max

# Kernel taint flags (non-zero means something unusual)
cat /proc/sys/kernel/tainted

Think About It: /proc files have a size of 0 bytes according to ls -l, yet cat can read content from them. Why? What does this tell you about how /proc works?


Exploring /sys -- The Device Filesystem

/sys (sysfs) is another virtual filesystem, focused on devices and kernel subsystems:

# Block devices (disks)
ls /sys/block/

# Network devices and their MAC addresses
ls /sys/class/net/
cat /sys/class/net/eth0/address 2>/dev/null

# CPU frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null

# Disk queue scheduler
cat /sys/block/sda/queue/scheduler 2>/dev/null

While /proc is a mix of process info and kernel state (and is older, predating Linux 1.0), /sys is a cleaner, hierarchical view focused on devices and drivers (introduced in Linux 2.6). Both are virtual -- nothing on disk.
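
Because sysfs uses one small file per attribute, a loop over a class directory is often all a script needs. The sketch below lists network interfaces with their MAC address and link state; list_net_devices is a hypothetical helper, and its optional argument exists only so it can be tested against a scratch directory.

```shell
# list_net_devices [BASE]: print "name mac state" for each network interface.
# BASE defaults to /sys/class/net.
list_net_devices() {
    local base="${1:-/sys/class/net}" dev
    for dev in "$base"/*; do
        [ -e "$dev/address" ] || continue   # skip non-interface entries
        printf '%s %s %s\n' "${dev##*/}" \
            "$(cat "$dev/address")" \
            "$(cat "$dev/operstate" 2>/dev/null || echo unknown)"
    done
}
```

Usage: list_net_devices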


uname: Kernel Identity

# All information at once
uname -a

# Most commonly used flags
uname -r     # Kernel release: 6.1.0-18-amd64
uname -m     # Architecture: x86_64
uname -s     # Kernel name: Linux
uname -n     # Hostname

The kernel version string decoded:

  6.1.0-18-amd64
  | | |  |    |
  | | |  |    +-- Architecture variant
  | | |  +------- Distro patch level
  | | +---------- Patch version
  | +------------ Minor version
  +-------------- Major version
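
The same breakdown can be done in a script. This is a sketch only: the part after the first hyphen is distribution-specific, so the hypothetical helper parse_release reports it as one opaque field instead of trying to interpret it.

```shell
# parse_release RELEASE: split a `uname -r` string into its parts.
# Everything after the first "-" is reported as a single "extra" field.
parse_release() {
    local release="$1" version extra major minor patch
    version=${release%%-*}          # e.g. "6.1.0"
    extra=${release#"$version"}     # e.g. "-18-amd64" (may be empty)
    extra=${extra#-}                # e.g. "18-amd64"
    IFS=. read -r major minor patch <<EOF
$version
EOF
    echo "major=$major minor=$minor patch=$patch extra=$extra"
}
```

Usage: parse_release "$(uname -r)"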

Kernel Parameters with sysctl

sysctl reads and writes kernel parameters at runtime. These correspond to files under /proc/sys/:

# List all kernel parameters
sysctl -a 2>/dev/null | head -20

# Read a specific parameter
sysctl net.ipv4.ip_forward
# Same as: cat /proc/sys/net/ipv4/ip_forward

# Set a parameter temporarily (until reboot)
sudo sysctl net.ipv4.ip_forward=1

# Set a parameter permanently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-forwarding.conf
sudo sysctl --system    # Reload all sysctl config
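
The mapping between a sysctl name and its file is mechanical: the dots become slashes under /proc/sys. A minimal sketch (sysctl_path is a hypothetical helper; names whose components themselves contain dots, such as some VLAN interface names, do not map this simply):

```shell
# sysctl_path NAME: print the /proc/sys file behind a dotted sysctl name.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

Usage: cat "$(sysctl_path kernel.ostype)"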

Important sysctl Parameters

# Network
sysctl net.ipv4.ip_forward                    # IP routing
sysctl net.ipv4.tcp_syncookies                # SYN flood protection
sysctl net.core.somaxconn                     # Max socket listen backlog
sysctl net.ipv4.tcp_max_syn_backlog           # SYN queue size

# Virtual memory
sysctl vm.swappiness                          # Swap aggressiveness (0-100; 0-200 since kernel 5.8)
sysctl vm.dirty_ratio                         # % of memory that can be dirty before writers must flush
sysctl vm.overcommit_memory                   # Memory overcommit policy

# Kernel
sysctl kernel.pid_max                         # Maximum PID value
sysctl kernel.hostname                        # System hostname
sysctl kernel.panic                           # Seconds before reboot on panic (0=hang)

# File system
sysctl fs.file-max                            # Maximum open files system-wide
sysctl fs.inotify.max_user_watches            # inotify watch limit

Practical: Tuning for a Web Server

# Increase connection backlog for high-traffic servers
sudo sysctl net.core.somaxconn=65535
sudo sysctl net.ipv4.tcp_max_syn_backlog=65535

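# Raise the system-wide file handle limit (check the current value first:
# some distributions already default far higher, and this would lower it)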
sudo sysctl fs.file-max=2097152

# Increase inotify watches (for file-watching dev tools)
sudo sysctl fs.inotify.max_user_watches=524288

# Make changes permanent
sudo tee /etc/sysctl.d/99-webserver.conf << 'EOF'
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
EOF

sudo sysctl --system
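
After reloading, it is worth confirming that the live values match the file. The sketch below compares each "key = value" line with the corresponding file under /proc/sys; check_sysctl_conf is a hypothetical helper, and its optional second argument exists only so it can be run against a scratch directory.

```shell
# check_sysctl_conf FILE [ROOT]: compare each "key = value" line in FILE with
# the live value under ROOT (default /proc/sys). Prints one line per key and
# returns non-zero if any value differs or cannot be read.
check_sysctl_conf() {
    local conf="$1" root="${2:-/proc/sys}"
    # Drop comment lines and lines without "=", then emit "key value".
    sed -e '/^[[:space:]]*[#;]/d' -e '/=/!d' \
        -e 's/[[:space:]]*=[[:space:]]*/ /' "$conf" | {
        status=0
        while read -r key want; do
            path="$root/$(printf '%s' "$key" | tr . /)"
            if [ ! -r "$path" ]; then
                echo "UNREADABLE $key"; status=1; continue
            fi
            # Multi-value entries are tab-separated in /proc/sys; normalize both.
            have=$(tr -s '\t' ' ' < "$path")
            want=$(printf '%s' "$want" | tr -s '\t' ' ')
            if [ "$have" = "$want" ]; then
                echo "OK       $key = $have"
            else
                echo "MISMATCH $key: live=$have file=$want"; status=1
            fi
        done
        [ "$status" -eq 0 ]
    }
}
```

Usage: check_sysctl_conf /etc/sysctl.d/99-webserver.conf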

dmesg: The Kernel Ring Buffer

dmesg displays the kernel ring buffer -- a circular log where the kernel writes messages about hardware detection, driver loading, errors, and other events:

# View all kernel messages
dmesg

# View with human-readable timestamps
dmesg -T

# View with color
dmesg --color=always | less -R

# Show only errors and warnings
dmesg --level=err,warn

# Follow new messages in real time (like tail -f)
dmesg -w

# Show the oldest messages still in the buffer (usually early boot)
dmesg -T | head -50

# Print, then clear, the ring buffer (root only; -C clears without printing)
sudo dmesg -c

What to Look for in dmesg

# Hardware detection at boot
dmesg | grep -i "cpu\|memory\|disk\|network\|usb"

# Disk/storage messages
dmesg | grep -i "sd[a-z]\|nvme\|ext4\|xfs"

# Network interface detection
dmesg | grep -i "eth\|ens\|wlan\|link"

# Errors (these are important!)
dmesg --level=err

# Out of memory events
dmesg | grep -i "oom\|out of memory"

# USB device events
dmesg | grep -i usb

# Firewall drops (if logging is enabled)
dmesg | grep -i "iptables\|nftables\|DROP"
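
For out-of-memory events, the useful detail is usually which process was killed. The sketch below extracts the PID and name from OOM-killer lines; it assumes the message contains "Out of memory: Killed process PID (name)" (or the older "Kill process" wording), which varies between kernel versions. oom_victims is a hypothetical helper name.

```shell
# oom_victims: read kernel log text on stdin and print "pid name" for each
# process the OOM killer reported killing. The message format is an
# assumption and may differ on your kernel.
oom_victims() {
    sed -n 's/.*[Oo]ut of memory: Kill[a-z]* process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

Usage: dmesg | oom_victims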

dmesg and journalctl

On systemd systems, kernel messages are also captured by journald:

# Kernel messages via journalctl
journalctl -k

# Kernel messages from current boot
journalctl -k -b 0

# Kernel messages from previous boot
journalctl -k -b -1

# Follow kernel messages
journalctl -kf

Think About It: The kernel ring buffer has a fixed size (typically 256KB-1MB). What happens when it fills up? How does this affect your ability to investigate boot problems hours after the system started?


Debug This: Identifying a Missing Driver

A new USB device is plugged in but does not work:

# Step 1: Check dmesg for recent USB events
dmesg -T | tail -30

# You might see something like:
# [timestamp] usb 1-1: new high-speed USB device number 4
# [timestamp] usb 1-1: New USB device found, idVendor=1234, idProduct=5678
# [timestamp] usb 1-1: New USB device strings: Mfr=1, Product=2, Serial=3

# Step 2: Check if a driver was loaded
dmesg -T | grep -i "driver\|module\|bound"

# Step 3: Find the vendor/product ID
lsusb
# Bus 001 Device 004: ID 1234:5678 Unknown Device

# Step 4: Search for a matching module (files may be compressed, e.g. .ko.xz or .ko.zst)
find /lib/modules/$(uname -r) -name "*.ko*" | xargs modinfo 2>/dev/null | grep -B5 "1234"

# Step 5: Check if the module exists but isn't loaded
modprobe --show-depends relevant_module

# Step 6: Try loading it manually
sudo modprobe relevant_module

# Step 7: Check dmesg again
dmesg -T | tail -10
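
A quicker alternative to scanning every module file in Step 4 is the modules.alias index that depmod generates. The sketch below looks up a vendor/product pair there; find_usb_module is a hypothetical helper, and it assumes alias lines of the form "alias usb:vVVVVpPPPP... module" with uppercase hex IDs. Many drivers match on interface class instead of vendor/product, so an empty result does not prove that no driver exists.

```shell
# find_usb_module VENDOR PRODUCT [ALIAS_FILE]: print modules whose alias
# begins with the given USB vendor/product ID (as shown by lsusb).
find_usb_module() {
    local vendor product aliases
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')    # index uses uppercase hex
    product=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    aliases="${3:-/lib/modules/$(uname -r)/modules.alias}"
    awk -v prefix="usb:v${vendor}p${product}" \
        '$1 == "alias" && index($2, prefix) == 1 { print $3 }' "$aliases" |
        sort -u
}
```

Usage: find_usb_module 1234 5678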

Hands-On: Kernel Exploration Lab

# 1. Determine your exact kernel version and architecture
uname -r
uname -m

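# 2. How many system calls does the kernel support?
# (On x86_64 systems; the header ships with the kernel headers package
# and its path varies by distribution)
grep -c '^#define __NR_' /usr/include/asm/unistd_64.h 2>/dev/null || \
grep -c '^#define __NR_' /usr/include/x86_64-linux-gnu/asm/unistd_64.h 2>/dev/null || \
ausyscall --dump 2>/dev/null | wc -l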

# 3. What kernel modules are loaded for your filesystem?
lsmod | grep -E "ext4|xfs|btrfs"

# 4. What is the kernel's view of your disks?
cat /proc/partitions

# 5. Check kernel taint status (0 = clean, non-zero = something unusual)
cat /proc/sys/kernel/tainted

# 6. See the kernel command line (how it was booted)
cat /proc/cmdline

# 7. What interrupts are firing?
cat /proc/interrupts | head -20

# 8. Check the current swappiness
sysctl vm.swappiness

# 9. Temporarily change swappiness and verify
sudo sysctl vm.swappiness=10
sysctl vm.swappiness
# Reset it (60 is the usual default; use your original value if it differed)
sudo sysctl vm.swappiness=60

# 10. Trace system calls of a simple command
strace -c date 2>&1

What Just Happened?

+------------------------------------------------------------------+
|  Chapter 13 Recap: The Kernel Up Close                           |
|------------------------------------------------------------------|
|                                                                  |
|  - The kernel manages hardware, processes, memory, and I/O.      |
|  - User space programs access the kernel via system calls.       |
|  - strace lets you watch system calls in real time.              |
|  - Kernel modules (.ko) load functionality on demand.            |
|  - lsmod, modprobe, modinfo manage modules.                      |
|  - /proc is a virtual filesystem exposing kernel state.          |
|  - /sys exposes device and driver information hierarchically.    |
|  - uname -r shows your kernel version.                           |
|  - sysctl reads and tunes kernel parameters at runtime.          |
|  - dmesg shows the kernel ring buffer (hardware, drivers, errors)|
|  - Kernel parameters can be made permanent in /etc/sysctl.d/.    |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: System Call Counting

Use strace -c on three different commands: ls /tmp, cat /etc/passwd, and curl -s example.com > /dev/null. Compare the number and types of system calls. Which command makes the most? Why?

Exercise 2: Module Investigation

Run lsmod and pick three modules you do not recognize. Use modinfo to learn about each one: what does it do, what license is it under, and what parameters does it accept?

Exercise 3: /proc Scavenger Hunt

Using only files in /proc, determine: (a) how many CPUs/cores the kernel sees, (b) total installed RAM, (c) current load average, (d) the kernel's command line boot parameters, and (e) how many file descriptors are currently in use system-wide.

Exercise 4: sysctl Tuning

Read the current values of vm.swappiness, net.ipv4.ip_forward, and fs.file-max. Change vm.swappiness to 10, verify the change took effect, then set it back to the original value. Write the appropriate line for /etc/sysctl.d/ to make it permanent.

Bonus Challenge

Write a script called kernel-report.sh that outputs a comprehensive report: kernel version, architecture, uptime, number of loaded modules, top 5 modules by memory usage, number of running processes, file descriptor usage, and any errors in the kernel ring buffer from the last hour. Format the output cleanly with headers and dividers.