The Kernel Up Close
Why This Matters
Every command you have run so far in this book -- every file you opened, every process you started, every network packet you sent -- went through the Linux kernel. The kernel is the one piece of software that sits between your programs and the hardware. It manages memory, schedules processes, handles disk I/O, drives network interfaces, and enforces security.
Yet most Linux users never look at the kernel directly. They interact with it through commands, system calls, and virtual filesystems without realizing it. This chapter pulls back the curtain. You will learn what the kernel actually does, how to inspect it, how to load and unload kernel modules, how to tune kernel behavior at runtime, and how to read the kernel's own log messages.
This knowledge is essential for performance tuning, hardware troubleshooting, security hardening, and understanding why things work the way they do.
Try This Right Now
# What kernel are you running?
uname -a
# Kernel version only
uname -r
# How long has this kernel been running?
uptime
# See kernel log messages (most recent; needs sudo if kernel.dmesg_restrict is set)
dmesg | tail -20
# How many kernel modules are loaded?
lsmod | wc -l
# Peek at the kernel's view of your CPU
cat /proc/cpuinfo | head -20
# How much memory does the kernel see?
cat /proc/meminfo | head -10
Kernel vs. Userspace
The most fundamental distinction in Linux is between kernel space and user space.
+--------------------------------------------------+
|                    User Space                    |
|                                                  |
|  +--------+  +--------+  +--------+  +------+    |
|  | bash   |  | nginx  |  | python |  | top  |    |
|  +--------+  +--------+  +--------+  +------+    |
|                                                  |
|  Applications, libraries (glibc), utilities      |
|                                                  |
+==================================================+
|         System Call Interface (syscall)          |
+==================================================+
|                                                  |
|                   Kernel Space                   |
|                                                  |
|  +-----------+  +---------+  +----------+        |
|  | Process   |  | Memory  |  | Network  |        |
|  | Scheduler |  | Manager |  | Stack    |        |
|  +-----------+  +---------+  +----------+        |
|                                                  |
|  +-----------+  +---------+  +----------+        |
|  | VFS       |  | Device  |  | Security |        |
|  |           |  | Drivers |  | (LSM)    |        |
|  +-----------+  +---------+  +----------+        |
|                                                  |
+==================================================+
|                     Hardware                     |
|      CPU, RAM, Disk, Network, USB, GPU, ...      |
+--------------------------------------------------+
Why Two Spaces?
- Kernel space has unrestricted access to hardware. A bug here can crash the entire system.
- User space is restricted. A bug in your application cannot (usually) crash the kernel or affect other users.
The CPU enforces this split in hardware. On x86 it does so with protection rings (other architectures provide equivalent privilege levels):
- Ring 0: Kernel mode (full hardware access)
- Ring 3: User mode (restricted)
When your program needs something that requires kernel privileges (opening a file, sending a network packet, allocating memory), it makes a system call.
System Calls: The Gateway
A system call (syscall) is how user-space programs request services from the kernel. Any operation that touches a file, the network, another process, or a device eventually becomes one or more system calls.
Your Program (user space)
|
| printf("hello\n")
|
v
C Library (glibc)
|
| write(1, "hello\n", 6) <-- system call wrapper
|
v
Kernel (kernel space)
|
| Actually writes bytes to the terminal device
|
v
Hardware (terminal/screen)
Common System Calls
| System Call | What It Does | You Use It When... |
|---|---|---|
| open() | Open a file | Opening any file |
| read() | Read from a file descriptor | Reading file contents |
| write() | Write to a file descriptor | Writing to a file or stdout |
| close() | Close a file descriptor | Done with a file |
| fork() | Create a child process | Starting a new process |
| execve() | Replace the process image with a new program | Running a command |
| mmap() | Map file/memory into address space | Memory allocation, file I/O |
| socket() | Create a network socket | Any network operation |
| ioctl() | Device-specific control | Hardware configuration |
Watching System Calls with strace
strace lets you see every system call a process makes:
# Trace a simple command
strace ls /tmp 2>&1 | head -30
# Trace a running process
sudo strace -p $(pgrep nginx | head -1) -e trace=read,write
# Count system calls (summary mode)
strace -c ls /tmp
# Trace file-related calls only
strace -e trace=file ls /tmp
# Trace network-related calls only
strace -e trace=network curl -s example.com > /dev/null
# Trace with timestamps
strace -t ls /tmp 2>&1 | head -10
Example output from strace -c ls /tmp:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000050          10         5           openat
 20.00    0.000040           5         8           mmap
 15.00    0.000030           4         7           close
 10.00    0.000020           3         6           fstat
 10.00    0.000020           4         5           read
  5.00    0.000010          10         1           getdents64
...
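The summary is easy to post-process. The sketch below is one possible way to rank syscalls by call count; the helper name top_syscalls is made up for this example, and it assumes the column layout shown above (call count in the fourth column, syscall name last).

```shell
#!/bin/sh
# Hypothetical helper: rank syscalls from `strace -c` summary text on stdin.
# Data rows start with a percentage; the call count is field 4 and the
# syscall name is always the last field (the errors column may be empty).
top_syscalls() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" { print $4, $NF }' |
        sort -k1,1nr | head -n "${1:-3}"
}

# Demo on the sample summary shown above.
top_syscalls 3 <<'EOF'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000050          10         5           openat
 20.00    0.000040           5         8           mmap
 15.00    0.000030           4         7           close
 10.00    0.000020           3         6           fstat
 10.00    0.000020           4         5           read
  5.00    0.000010          10         1           getdents64
EOF
```

On a live system, strace writes the summary to stderr, so something like strace -c ls /tmp 2>&1 >/dev/null | top_syscalls 5 should feed it the right text.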
Think About It: When you run echo "hello", how many system calls happen? Try strace echo "hello" 2>&1 | wc -l to find out. Why are there so many for such a simple command?
Kernel Modules
The Linux kernel is modular. Rather than compiling every possible driver and feature into the kernel image, Linux loads functionality on demand through kernel modules. These are .ko (kernel object) files, which many distributions store compressed (for example .ko.xz or .ko.zst).
+----------------------------------+
|           Linux Kernel           |
|                                  |
|  Core (always loaded):           |
|    - Process scheduler           |
|    - Memory manager              |
|    - VFS layer                   |
|                                  |
|  Modules (loaded on demand):     |
|    +-----------+  +----------+   |
|    | ext4.ko   |  | e1000.ko |   |
|    +-----------+  +----------+   |
|    +-----------+  +----------+   |
|    | nf_tables |  | usb_hid  |   |
|    +-----------+  +----------+   |
+----------------------------------+
Listing Loaded Modules
# List all currently loaded modules
lsmod
# Output format:
# Module Size Used by
# nf_tables 303104 0
# e1000 151552 0
# ext4 806912 1
# ...
The columns are:
- Module: Module name
- Size: Memory used (bytes)
- Used by: Reference count, followed by the names of any loaded modules that depend on it
# Filter for a specific module
lsmod | grep ext4
# Count loaded modules
lsmod | wc -l
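Because the Size column is a plain byte count, the listing is easy to sort. Here is a minimal sketch; largest_modules is a hypothetical helper name, and the demo input is the sample lsmod output shown above rather than real data from your machine.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest modules from lsmod-style text.
# Skips the header row, then sorts numerically on the Size column.
largest_modules() {
    awk 'NR > 1 { print $2, $1 }' | sort -k1,1nr | head -n "${1:-5}"
}

# Demo on the sample listing from this section.
largest_modules 2 <<'EOF'
Module                  Size  Used by
nf_tables             303104  0
e1000                 151552  0
ext4                  806912  1
EOF
```

To run it against your own system, pipe the real listing in: lsmod | largest_modules 5.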
Getting Module Information
# Detailed info about a module
modinfo ext4
# Key fields:
# filename: /lib/modules/.../ext4.ko
# license: GPL
# description: Fourth Extended Filesystem
# depends: jbd2,mbcache,crc16
# parm: ... (module parameters)
# Just show the description
modinfo -d ext4
# Show module parameters
modinfo -p ext4
# Show the file path
modinfo -n ext4
Loading and Unloading Modules
# Load a module (resolves dependencies automatically)
sudo modprobe snd_dummy
# Verify it loaded
lsmod | grep snd_dummy
# Unload a module
sudo modprobe -r snd_dummy
# Load with parameters
sudo modprobe loop max_loop=64
WARNING: Be very careful loading and unloading kernel modules on production systems. Unloading a module that is in use can crash the system. modprobe -r will refuse if the module is in use, but forcing removal (rmmod -f) can cause a kernel panic.
Module Dependencies
Modules can depend on other modules. modprobe handles this automatically, but you can see the dependency tree:
# Show what a module depends on
modinfo ext4 | grep depends
# Show the full dependency tree
modprobe --show-depends ext4
Blacklisting Modules
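Sometimes you need to prevent a module from loading automatically (conflicting drivers, security). A blacklist entry stops the kernel from auto-loading the module for matching hardware; the module can still be loaded by hand with modprobe or pulled in as a dependency of another module: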
# Create a blacklist file
sudo tee /etc/modprobe.d/blacklist-example.conf << 'EOF'
# Prevent the nouveau driver from loading (example)
blacklist nouveau
EOF
# After blacklisting, rebuild the initramfs (run only the command for your distribution)
sudo update-initramfs -u # Debian/Ubuntu
sudo dracut --force # RHEL/Fedora
Distro Note: Module blacklisting syntax is the same across distributions, but the command to rebuild initramfs differs. Debian/Ubuntu use update-initramfs, RHEL/Fedora use dracut.
Exploring /proc -- The Process Filesystem
/proc is a virtual filesystem. Nothing in it exists on disk -- the kernel generates its contents on the fly when you read them. It is your window into the kernel's state.
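You can see the "generated on the fly" behavior for yourself. The short sketch below compares the size the filesystem reports for a /proc file with the number of bytes you get when you read it. It assumes a stat that accepts -c (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Compare the size the filesystem reports with the bytes actually returned.
f=/proc/version
reported=$(stat -c %s "$f")     # size recorded in the file's metadata
delivered=$(wc -c < "$f")       # bytes produced when the file is read
echo "$f: stat reports $reported bytes, read returned $delivered bytes"
```

The kernel has no stored copy to measure, so the metadata size stays at zero while the read still returns text.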
System-Wide Information
# Kernel version
cat /proc/version
# CPU information
cat /proc/cpuinfo
# Memory statistics
cat /proc/meminfo
# Uptime (in seconds)
cat /proc/uptime
# Load average
cat /proc/loadavg
# Mounted filesystems
cat /proc/mounts
# Currently active partitions
cat /proc/partitions
# Network statistics
cat /proc/net/dev
# File handles system-wide: allocated, unused, maximum
cat /proc/sys/fs/file-nr
# Maximum number of open files
cat /proc/sys/fs/file-max
# Kernel command line (boot parameters)
cat /proc/cmdline
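Most of these files are designed to be parsed by simple tools. As a small illustration, this sketch splits /proc/sys/fs/file-nr into its three fields; the variable names are chosen for this example only.

```shell
#!/bin/sh
# /proc/sys/fs/file-nr holds three numbers: allocated file handles,
# allocated-but-unused handles, and the system-wide maximum (fs.file-max).
read -r allocated unused maximum < /proc/sys/fs/file-nr
echo "file handles: $allocated allocated, $unused unused, limit $maximum"
```

The third number is the same value you get from /proc/sys/fs/file-max.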
Per-Process Information
Each PID has its own directory (covered in Chapter 10, but here is the kernel-focused view):
# Pick a PID (your own shell)
PID=$$
# Command that started this process
cat /proc/$PID/cmdline | tr '\0' ' '; echo
# Process status (kernel's view)
cat /proc/$PID/status
# Memory map
cat /proc/$PID/maps | head -10
# Open file descriptors
ls -l /proc/$PID/fd/
# Limits applied to this process
cat /proc/$PID/limits
# cgroup membership
cat /proc/$PID/cgroup
# Namespace information
ls -l /proc/$PID/ns/
# Scheduling information
cat /proc/$PID/sched | head -20
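The status file is a list of "Key: value" lines, which makes it convenient to filter. The following sketch pulls out a handful of fields for a given PID; proc_summary is a hypothetical helper, and the choice of fields is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print selected fields from /proc/<pid>/status.
# Each line looks like "Key:<tab>value", so split on the colon and any
# whitespace that follows it.
proc_summary() {
    awk -F':[ \t]*' '$1 ~ /^(Name|State|PPid|Threads|VmRSS)$/ { print $1 "=" $2 }' \
        "/proc/$1/status"
}

# Summarize the current shell.
proc_summary $$
```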
Interesting /proc Files
# Random number entropy available
cat /proc/sys/kernel/random/entropy_avail
# Hostname
cat /proc/sys/kernel/hostname
# OS type
cat /proc/sys/kernel/ostype
# Swappiness (how aggressively kernel swaps)
cat /proc/sys/vm/swappiness
# IP forwarding enabled?
cat /proc/sys/net/ipv4/ip_forward
# Maximum PID value (an upper bound on PIDs, not a limit on the process count)
cat /proc/sys/kernel/pid_max
# Kernel taint flags (non-zero means something unusual)
cat /proc/sys/kernel/tainted
Think About It: /proc files have a size of 0 bytes according to ls -l, yet cat can read content from them. Why? What does this tell you about how /proc works?
Exploring /sys -- The Device Filesystem
/sys (sysfs) is another virtual filesystem, focused on devices and kernel subsystems:
# Block devices (disks)
ls /sys/block/
# Network devices and their MAC addresses
ls /sys/class/net/
cat /sys/class/net/eth0/address 2>/dev/null
# CPU frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null
# Disk queue scheduler
cat /sys/block/sda/queue/scheduler 2>/dev/null
While /proc is a mix of process info and kernel state (and is the older of the two, dating from the early 1990s), /sys is a cleaner, hierarchical view focused on devices and drivers (introduced in Linux 2.6). Both are virtual: nothing in them is stored on disk.
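The one-value-per-file layout of /sys lends itself to simple loops. As a rough sketch, this lists every network interface with its MAC address and link state, without assuming an interface named eth0 exists; list_interfaces is a name invented for this example.

```shell
#!/bin/sh
# List each network interface with its MAC address and link state.
list_interfaces() {
    for dev in /sys/class/net/*; do
        # Skip entries that are not interfaces (or an unmatched glob).
        [ -r "$dev/address" ] || continue
        printf '%-12s %-18s %s\n' "${dev##*/}" \
            "$(cat "$dev/address")" "$(cat "$dev/operstate" 2>/dev/null)"
    done
}

list_interfaces
```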
uname: Kernel Identity
# All information at once
uname -a
# Most commonly used flags
uname -r # Kernel release: 6.1.0-18-amd64
uname -m # Architecture: x86_64
uname -s # Kernel name: Linux
uname -n # Hostname
The kernel version string decoded:
6.1.0-18-amd64
| | | |  |
| | | |  +-- Architecture variant
| | | +----- Distro patch level
| | +------- Patch version
| +--------- Minor version
+----------- Major version
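The same decoding can be done in a script with plain parameter expansion. This is a sketch that assumes the MAJOR.MINOR.PATCH-SUFFIX shape of the example string; release strings on other distributions are laid out differently, so treat the variable names as illustrative.

```shell
#!/bin/sh
# Split a release string of the form MAJOR.MINOR.PATCH-SUFFIX.
# The sample value is the one decoded above; substitute "$(uname -r)"
# to try your own kernel.
release="6.1.0-18-amd64"
version=${release%%-*}    # text before the first "-"
suffix=${release#*-}      # text after the first "-"
major=${version%%.*}      # text before the first "."
rest=${version#*.}        # MINOR.PATCH
minor=${rest%%.*}
patch=${rest#*.}
echo "major=$major minor=$minor patch=$patch suffix=$suffix"
# prints: major=6 minor=1 patch=0 suffix=18-amd64
```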
Kernel Parameters with sysctl
sysctl reads and writes kernel parameters at runtime. These correspond to files under /proc/sys/:
# List all kernel parameters
sysctl -a 2>/dev/null | head -20
# Read a specific parameter
sysctl net.ipv4.ip_forward
# Same as: cat /proc/sys/net/ipv4/ip_forward
# Set a parameter temporarily (until reboot)
sudo sysctl net.ipv4.ip_forward=1
# Set a parameter permanently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-forwarding.conf
sudo sysctl --system # Reload all sysctl config
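The mapping between a sysctl name and its file is mechanical: swap the dots for slashes and put /proc/sys/ in front. A tiny sketch of that rule follows; sysctl_path is a hypothetical helper, and the simple substitution would not handle keys that embed an interface name containing a dot (such as a VLAN interface).

```shell
#!/bin/sh
# Hypothetical helper: map a dotted sysctl name to its file under /proc/sys.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

sysctl_path net.ipv4.ip_forward    # prints /proc/sys/net/ipv4/ip_forward
sysctl_path vm.swappiness          # prints /proc/sys/vm/swappiness
```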
Important sysctl Parameters
# Network
sysctl net.ipv4.ip_forward # IP routing
sysctl net.ipv4.tcp_syncookies # SYN flood protection
sysctl net.core.somaxconn # Max socket listen backlog
sysctl net.ipv4.tcp_max_syn_backlog # SYN queue size
# Virtual memory
sysctl vm.swappiness # Swap aggressiveness (0-100; up to 200 on kernel 5.8+)
sysctl vm.dirty_ratio # % of RAM for dirty pages before sync
sysctl vm.overcommit_memory # Memory overcommit policy
# Kernel
sysctl kernel.pid_max # Maximum PID value
sysctl kernel.hostname # System hostname
sysctl kernel.panic # Seconds before reboot on panic (0=hang)
# File system
sysctl fs.file-max # Maximum open files system-wide
sysctl fs.inotify.max_user_watches # inotify watch limit
Practical: Tuning for a Web Server
# Increase connection backlog for high-traffic servers
sudo sysctl net.core.somaxconn=65535
sudo sysctl net.ipv4.tcp_max_syn_backlog=65535
# Increase file descriptor limits
sudo sysctl fs.file-max=2097152
# Increase inotify watches (for file-watching dev tools)
sudo sysctl fs.inotify.max_user_watches=524288
# Make changes permanent
sudo tee /etc/sysctl.d/99-webserver.conf << 'EOF'
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
EOF
sudo sysctl --system
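After reloading, it is worth confirming that the running values match what the file asks for. The sketch below is one way to do that by reading /proc/sys directly; check_sysctl_conf is a made-up helper, it only understands simple "key = value" lines, and it compares values as exact strings (so multi-value parameters separated by tabs may show up as differences).

```shell
#!/bin/sh
# Hypothetical helper: compare "key = value" lines in a sysctl config file
# against the values the kernel is currently using.
check_sysctl_conf() {
    conf=$1
    status=0
    while IFS='=' read -r key value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case $key in ''|'#'*|';'*) continue ;; esac    # blank lines, comments
        path=/proc/sys/$(printf '%s' "$key" | tr . /)
        if [ ! -r "$path" ]; then
            echo "SKIP $key (not readable)"
            continue
        fi
        current=$(cat "$path")
        if [ "$current" = "$value" ]; then
            echo "OK   $key = $current"
        else
            echo "DIFF $key: running=$current wanted=$value"
            status=1
        fi
    done < "$conf"
    return $status
}

# Demo with a throwaway file; kernel.ostype is used because it needs no root.
conf=$(mktemp)
printf '%s\n' '# sample config' 'kernel.ostype = Linux' > "$conf"
check_sysctl_conf "$conf"
rm -f "$conf"
```

For the web server settings above, you would point it at the real file: check_sysctl_conf /etc/sysctl.d/99-webserver.conf.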
dmesg: The Kernel Ring Buffer
dmesg displays the kernel ring buffer -- a circular log where the kernel writes messages about hardware detection, driver loading, errors, and other events:
# View all kernel messages
dmesg
# View with human-readable timestamps
dmesg -T
# View with color
dmesg --color=always | less -R
# Show only errors and warnings
dmesg --level=err,warn
# Follow new messages in real time (like tail -f)
dmesg -w
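# Show the oldest messages still in the buffer (early boot, unless the buffer has wrapped)
dmesg -T | head -50
# Print, then clear the ring buffer (root only; use -C to clear without printing)
sudo dmesg -c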
What to Look for in dmesg
# Hardware detection at boot
dmesg | grep -i "cpu\|memory\|disk\|network\|usb"
# Disk/storage messages
dmesg | grep -i "sd[a-z]\|nvme\|ext4\|xfs"
# Network interface detection
dmesg | grep -i "eth\|ens\|wlan\|link"
# Errors (these are important!)
dmesg --level=err
# Out of memory events
dmesg | grep -i "oom\|out of memory"
# USB device events
dmesg | grep -i usb
# Firewall drops (if logging is enabled)
dmesg | grep -i "iptables\|nftables\|DROP"
dmesg and journalctl
On systemd systems, kernel messages are also captured by journald:
# Kernel messages via journalctl
journalctl -k
# Kernel messages from current boot
journalctl -k -b 0
# Kernel messages from previous boot
journalctl -k -b -1
# Follow kernel messages
journalctl -kf
Think About It: The kernel ring buffer has a fixed size (typically 256KB-1MB). What happens when it fills up? How does this affect your ability to investigate boot problems hours after the system started?
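The default buffer size is fixed when the kernel is built, through the CONFIG_LOG_BUF_SHIFT option: the buffer holds 2 to the power of that value, in bytes. The sketch below does the arithmetic and, if your distribution ships the kernel config under /boot (an assumption that does not hold everywhere), looks up the value for the running kernel. The helper name log_buf_bytes is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a CONFIG_LOG_BUF_SHIFT value to bytes.
# The ring buffer holds 2^shift bytes.
log_buf_bytes() {
    echo $((1 << $1))
}

log_buf_bytes 18    # prints 262144  (256 KiB)
log_buf_bytes 20    # prints 1048576 (1 MiB)

# Assumption: the kernel build config is available as /boot/config-<release>.
cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ]; then
    shift_val=$(sed -n 's/^CONFIG_LOG_BUF_SHIFT=//p' "$cfg")
    if [ -n "$shift_val" ]; then
        echo "this kernel: $(log_buf_bytes "$shift_val") bytes by default"
    fi
fi
```

The size can also be raised at boot with the log_buf_len= kernel parameter, so the built-in default is not necessarily what a given machine is running with.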
Debug This: Identifying a Missing Driver
A new USB device is plugged in but does not work:
# Step 1: Check dmesg for recent USB events
dmesg -T | tail -30
# You might see something like:
# [timestamp] usb 1-1: new high-speed USB device number 4
# [timestamp] usb 1-1: New USB device found, idVendor=1234, idProduct=5678
# [timestamp] usb 1-1: New USB device strings: Mfr=1, Product=2, Serial=3
# Step 2: Check if a driver was loaded
dmesg -T | grep -i "driver\|module\|bound"
# Step 3: Find the vendor/product ID
lsusb
# Bus 001 Device 004: ID 1234:5678 Unknown Device
# Step 4: Search for a matching module
# (match "*.ko*" so compressed modules such as .ko.xz or .ko.zst are included)
find /lib/modules/$(uname -r) -name "*.ko*" | xargs modinfo 2>/dev/null | grep -B5 "1234"
# Step 5: Check if the module exists but isn't loaded
modprobe --show-depends relevant_module
# Step 6: Try loading it manually
sudo modprobe relevant_module
# Step 7: Check dmesg again
dmesg -T | tail -10
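Steps 3 and 4 can be tightened up a little. USB module aliases encode the vendor and product IDs as vVVVVpPPPP in uppercase hex, so it helps to build that pattern from the dmesg line first. The sketch below does only that conversion; usb_alias_pattern is a hypothetical helper, and the input is the example line from Step 1.

```shell
#!/bin/sh
# Hypothetical helper: turn a dmesg "New USB device found" line into the
# vendor/product pattern used in USB module aliases (usb:vVVVVpPPPP...).
usb_alias_pattern() {
    sed -n 's/.*idVendor=\([0-9A-Fa-f]\{4\}\), idProduct=\([0-9A-Fa-f]\{4\}\).*/v\1p\2/p' |
        tr 'a-f' 'A-F'    # aliases use uppercase hex digits
}

echo '[timestamp] usb 1-1: New USB device found, idVendor=1234, idProduct=5678' |
    usb_alias_pattern
# prints: v1234p5678
```

You could then search the alias index for that pattern, for example with grep v1234p5678 /lib/modules/$(uname -r)/modules.alias. No match suggests no installed module claims the device.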
Hands-On: Kernel Exploration Lab
# 1. Determine your exact kernel version and architecture
uname -r
uname -m
# 2. How many system calls does the kernel support?
# (On x86_64 systems; the header path varies by distribution)
grep -c "^#define __NR_" /usr/include/asm/unistd_64.h 2>/dev/null || \
ausyscall --dump 2>/dev/null | wc -l
# 3. What kernel modules are loaded for your filesystem?
lsmod | grep -E "ext4|xfs|btrfs"
# 4. What is the kernel's view of your disks?
cat /proc/partitions
# 5. Check kernel taint status (0 = clean, non-zero = something unusual)
cat /proc/sys/kernel/tainted
# 6. See the kernel command line (how it was booted)
cat /proc/cmdline
# 7. What interrupts are firing?
cat /proc/interrupts | head -20
# 8. Check the current swappiness
sysctl vm.swappiness
# 9. Temporarily change swappiness and verify
OLD_SWAPPINESS=$(sysctl -n vm.swappiness)
sudo sysctl vm.swappiness=10
sysctl vm.swappiness
# Restore the previous value
sudo sysctl vm.swappiness="$OLD_SWAPPINESS"
# 10. Trace system calls of a simple command
strace -c date 2>&1
What Just Happened?
+------------------------------------------------------------------+
| Chapter 13 Recap: The Kernel Up Close |
|------------------------------------------------------------------|
| |
| - The kernel manages hardware, processes, memory, and I/O. |
| - User space programs access the kernel via system calls. |
| - strace lets you watch system calls in real time. |
| - Kernel modules (.ko) load functionality on demand. |
| - lsmod, modprobe, modinfo manage modules. |
| - /proc is a virtual filesystem exposing kernel state. |
| - /sys exposes device and driver information hierarchically. |
| - uname -r shows your kernel version. |
| - sysctl reads and tunes kernel parameters at runtime. |
| - dmesg shows the kernel ring buffer (hardware, drivers, errors)|
| - Kernel parameters can be made permanent in /etc/sysctl.d/. |
| |
+------------------------------------------------------------------+
Try This
Exercise 1: System Call Counting
Use strace -c on three different commands: ls /tmp, cat /etc/passwd, and curl -s example.com > /dev/null. Compare the number and types of system calls. Which command makes the most? Why?
Exercise 2: Module Investigation
Run lsmod and pick three modules you do not recognize. Use modinfo to learn about each one: what does it do, what license is it under, and what parameters does it accept?
Exercise 3: /proc Scavenger Hunt
Using only files in /proc, determine: (a) how many CPUs/cores the kernel sees, (b) total installed RAM, (c) current load average, (d) the kernel's command line boot parameters, and (e) how many file descriptors are currently in use system-wide.
Exercise 4: sysctl Tuning
Read the current values of vm.swappiness, net.ipv4.ip_forward, and fs.file-max. Change vm.swappiness to 10, verify the change took effect, then set it back to the original value. Write the appropriate line for /etc/sysctl.d/ to make it permanent.
Bonus Challenge
Write a script called kernel-report.sh that outputs a comprehensive report: kernel version, architecture, uptime, number of loaded modules, top 5 modules by memory usage, number of running processes, file descriptor usage, and any errors in the kernel ring buffer from the last hour. Format the output cleanly with headers and dividers.